2017-03-25 3 views

Pig: unable to DUMP a relation

I have two datasets, one for movies and one for ratings. The ratings data looks like this:

UserID#MovieID#Ratings#RatingsTimestamp 
1#1193#5#978300760 
1#661#3#978302109 
1#914#3#978301968 

The movie data looks like this:

MovieID#Title#Genre 
1#Toy Story (1995)#Animation|Children's|Comedy 
2#Jumanji (1995)#Adventure|Children's|Fantasy 
3#Grumpier Old Men (1995)#Comedy|Romance 

My script is as follows:

1) movies_data = LOAD '/user/admin/MoviesDataset/movies_new.dat' USING PigStorage('#') AS (movieid:int,
    moviename:chararray, moviegenere:chararray);

2) ratings_data = LOAD '/user/admin/RatingsDataset/ratings_new.dat' USING PigStorage('#') AS (Userid:int,
    movieid:int, ratings:int, timestamp:long);

3) moviedata_ratingsdata_join = JOIN movies_data BY movieid, ratings_data BY movieid;

4) moviedata_ratingsdata_join_group = GROUP moviedata_ratingsdata_join BY movies_data.movieid;

5) moviedata_ratingsdata_averagerating = FOREACH moviedata_ratingsdata_join_group GENERATE group,
    AVG(moviedata_ratingsdata_join.ratings) AS Averageratings, (moviedata_ratingsdata_join.Userid) AS userid;

6) DUMP moviedata_ratingsdata_averagerating;

If I remove line 6), the script executes successfully. Why can't I DUMP the relation generated in line 5)?

With line 6) in place, I get this error:

2017-03-25 06:46:50,332 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: moviedata_ratingsdata_join_group: Local Rearrange[tuple]{int}(false) - scope-95 Operator Key: scope-95): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: moviedata_ratingsdata_averagerating: New For Each(false,false)[bag] - scope-83 Operator Key: scope-83): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1,Toy Story (1995),Animation|Children's|Comedy), 2nd :(2,Jumanji (1995),Adventure|Children's|Fantasy) (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar") 

What is causing this error?

Answer


Use the disambiguate operator (::) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.

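Both relations, movies_data and ratings_data, have a movieid column. When creating the relation moviedata_ratingsdata_join_group, use the :: operator to identify which movieid column to GROUP on. Written with a dot, movies_data.movieid is read as a scalar projection of the whole movies_data relation, which has more than one row, hence the "Scalar has more than one row in the output" error.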

So line 4) would become:

4) moviedata_ratingsdata_join_group = GROUP moviedata_ratingsdata_join BY movies_data::movieid;
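
For reference, here is a sketch of how the whole script might look with that change applied. It reuses the paths and aliases from the question. The DESCRIBE statement, the `group AS movieid` rename, and the fully qualified `ratings_data::ratings` and `ratings_data::Userid` inside the FOREACH are additions made here for illustration; they are a defensive choice, not something the fix strictly requires.

```pig
-- Load both datasets; fields are separated by '#'.
movies_data = LOAD '/user/admin/MoviesDataset/movies_new.dat' USING PigStorage('#')
    AS (movieid:int, moviename:chararray, moviegenere:chararray);

ratings_data = LOAD '/user/admin/RatingsDataset/ratings_new.dat' USING PigStorage('#')
    AS (Userid:int, movieid:int, ratings:int, timestamp:long);

-- After the JOIN, every field carries its source relation as a prefix,
-- e.g. movies_data::movieid and ratings_data::movieid.
moviedata_ratingsdata_join = JOIN movies_data BY movieid, ratings_data BY movieid;

-- Optional: print the joined schema to check the prefixed field names.
DESCRIBE moviedata_ratingsdata_join;

-- Group on the disambiguated column (:: instead of .).
moviedata_ratingsdata_join_group = GROUP moviedata_ratingsdata_join BY movies_data::movieid;

-- Average rating per movie, plus the bag of users who rated it.
moviedata_ratingsdata_averagerating = FOREACH moviedata_ratingsdata_join_group GENERATE
    group AS movieid,
    AVG(moviedata_ratingsdata_join.ratings_data::ratings) AS Averageratings,
    moviedata_ratingsdata_join.ratings_data::Userid AS userid;

DUMP moviedata_ratingsdata_averagerating;
```

This also suggests why the script appeared to succeed without line 6): Pig builds its execution plan lazily and only runs it when it reaches a DUMP or STORE, so the faulty GROUP expression is not evaluated until then.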