Pig Latin code error

data-mining apache-hadoop bigdata
2022-02-11 04:36:22

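When I run the Pig script below, I get an error on line 4 (the WORDBYCOUNT statement) if I write GROUP. If I change GROUP to group on line 4, the script runs. What is the difference between group and GROUP?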

LINES = LOAD '/user/cloudera/datapeople.csv' USING PigStorage(',') AS ( firstname:chararray, lastname:chararray, address:chararray, city:chararray, state:chararray, zip:chararray );

WORDS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(zip)) AS ZIPS;

WORDSGROUPED = GROUP WORDS BY ZIPS;

WORDBYCOUNT = FOREACH WORDSGROUPED GENERATE GROUP AS ZIPS, COUNT(WORDS);

WORDSSORT = ORDER WORDBYCOUNT BY $1 DESC;

DUMP WORDSSORT;

1 Answer

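When Pig groups data, it creates a new key field named "group", collects all tuples that match each key value into a bag, and associates that bag with the key. After the group operation, the schema of the grouped relation looks like this: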

raw = load '$input' using PigStorage('\u0001') as (id1:int, name:chararray);
groupdata1 = group raw by (id1,name);  
describe groupdata1;
{group: (id1: int,name: chararray),raw: {(id1: int,name: chararray)}}

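The "GROUP" you are trying to access on line 4 is one of the fields in the schema produced by the preceding GROUP statement. Pig Latin keywords (such as the GROUP operator on line 3) are case-insensitive, but field names are case-sensitive. Writing GROUP as a field reference therefore fails with an error saying that it does not exist in the schema. You just have to use lowercase "group" to access it.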
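
Applied to the script in the question, the fix could look like the sketch below. It assumes the same input path and column layout as the question; the DESCRIBE statement and the zip_count alias are additions for illustration and are not part of the original script.

```pig
-- Load the CSV with the same schema as in the question.
LINES = LOAD '/user/cloudera/datapeople.csv' USING PigStorage(',')
        AS (firstname:chararray, lastname:chararray, address:chararray,
            city:chararray, state:chararray, zip:chararray);

-- One row per token found in the zip column.
WORDS = FOREACH LINES GENERATE FLATTEN(TOKENIZE(zip)) AS ZIPS;

-- GROUP here is the operator keyword, so its case does not matter.
WORDSGROUPED = GROUP WORDS BY ZIPS;

-- Added for illustration: the grouping key shows up as a lowercase field named group.
DESCRIBE WORDSGROUPED;

-- group here is a field name, so it must be lowercase.
-- zip_count is a hypothetical alias added so the sort can refer to the count by name.
WORDBYCOUNT = FOREACH WORDSGROUPED GENERATE group AS ZIPS, COUNT(WORDS) AS zip_count;

WORDSSORT = ORDER WORDBYCOUNT BY zip_count DESC;

DUMP WORDSSORT;
```

Running DESCRIBE on a grouped relation before writing the FOREACH is a quick way to check the exact field names, including their case.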