2017-12-14 23 views

How to merge rows (items) of the same relation in Apache Pig

I am new to Apache Pig. I have data like the following:

tempdata = 

(linsys4f-PORT42-0211201516244460,dnis=3007047505) 
(linsys4f-PORT42-0211201516244460,incoming_tfn=8778816235,tfn_location=Ashburn Avaya,ivr_location=Ashburn Avaya,state=NC) 
(linsys4f-PORT42-0211201516244460,language=ENGLISH) 
(linsys4f-PORT42-0211201516244460,outcome=Transfer to CSR,exitType=Transfer,exitState=SETDIR2^7990019) 
(linsys4f-PORT43-0211201516245465,outcome=Transfer to CSR,exitType=Transfer,exitState=SETDIR2^7990019) 
(linsys4f-PORT44-0211201516291287,dnis=3007047505) 
(linsys4f-PORT44-0211201516291287,incoming_tfn=8778816235,tfn_location=Ashburn Avaya,ivr_location=Ashburn Avaya,state=NC) 

I need to merge the rows by key, i.e. linsys4f-PORT42-0211201516244460, linsys4f-PORT43-0211201516245465 and linsys4f-PORT44-0211201516291287, so that the output looks like this:

(linsys4f-PORT42-0211201516244460,dnis=3007047505,incoming_tfn=8778816235,tfn_location=Ashburn Avaya,ivr_location=Ashburn Avaya,state=NC,language=ENGLISH,outcome=Transfer to CSR,exitType=Transfer,exitState=SETDIR2^7990019) 

(linsys4f-PORT43-0211201516245465,dnis=3007047505,incoming_tfn=8778816235,tfn_location=Ashburn Avaya,ivr_location=Ashburn Avaya,state=NC,language=SPANISH) 

(linsys4f-PORT43-0211201516245465,outcome=Transfer to CSR,exitType=Transfer,exitState=SETDIR2^7990019,dnis=3007047505,incoming_tfn=8778816235,tfn_location=Ashburn Avaya,ivr_location=Ashburn Avaya,state=NC). 

How can I merge these rows? Any help is appreciated.

Answer


Try the GROUP BY operator together with FLATTEN to solve this. For a clearer picture, I have split the first field into link, port name and port ID:

A = LOAD '/home/coe_user_1/del/data.txt' USING PigStorage(',') AS 
(port : CHARARRAY, dnis : CHARARRAY, incoming_tfn : CHARARRAY, tfn_location : CHARARRAY, ivr_location : CHARARRAY,state : CHARARRAY, language : CHARARRAY, outcome : CHARARRAY, exitType : CHARARRAY, exitState : CHARARRAY); 

B = FOREACH A GENERATE 
     -- pid is kept as chararray: the 16-digit ID does not fit in an int and starts with a zero
     FLATTEN(STRSPLIT(port, '-', 3)) AS (link: chararray, port: chararray, pid: chararray), 
     dnis AS dnis, 
     incoming_tfn AS incoming_tfn, 
     tfn_location AS tfn_location, 
     ivr_location AS ivr_location, 
     state AS state, 
     language AS language, 
     outcome AS outcome, 
     exitType AS exitType, 
     exitState AS exitState; 
C = FOREACH B GENERATE 
     port AS port, 
     --pid AS pid, 
     dnis AS dnis, 
     incoming_tfn AS incoming_tfn, 
     tfn_location AS tfn_location, 
     ivr_location AS ivr_location, 
     state AS state, 
     language AS language, 
     outcome AS outcome, 
     exitType AS exitType, 
     exitState AS exitState; 
D = GROUP C BY port; 

E = FOREACH D GENERATE 
     group AS port, 
     FLATTEN(BagToTuple(C.dnis)) AS dnis, 
     FLATTEN(BagToTuple(C.incoming_tfn)) AS incoming_tfn, 
     FLATTEN(BagToTuple(C.tfn_location)) AS tfn_location, 
     FLATTEN(BagToTuple(C.ivr_location)) AS ivr_location, 
     FLATTEN(BagToTuple(C.state)) AS state, 
     FLATTEN(BagToTuple(C.language)) AS language, 
     FLATTEN(BagToTuple(C.outcome)) AS outcome, 
     FLATTEN(BagToTuple(C.exitType)) AS exitType, 
     FLATTEN(BagToTuple(C.exitState)) AS exitState; 

DUMP E; 

Output:

(PORT42,outcome=Transfer to CSR,language=ENGLISH,incoming_tfn=8778816235,dnis=3007047505,exitType=Transfer,,tfn_location=Ashburn Avaya,,exitState=SETDIR2^7990019,,ivr_location=Ashburn Avaya,,,,state=NC,,,,,,,,,,,,,,,,,,,,,) 
(PORT43,outcome=Transfer to CSR,exitType=Transfer,exitState=SETDIR2^7990019,,,,,,) 
(PORT44,incoming_tfn=8778816235,dnis=3007047505,tfn_location=Ashburn Avaya,,ivr_location=Ashburn Avaya,,state=NC,,,,,,,,,,,)
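
The empty fields in this output appear because the rows hold a varying number of key=value pairs, while the LOAD schema assigns them to fixed positional columns; every column a row does not fill becomes a null, and BagToTuple carries those nulls into the merged tuple. If the empty fields are unwanted, one possible variation is to keep everything after the key as a single string and join the strings per key. The following is only a sketch under assumptions that are not confirmed by the question: the input file has one record per line without the surrounding parentheses, the key never contains a comma, and Pig 0.11 or later is available (for BagToString). The relation names raw, kv, grp and merged are illustrative.

```pig
-- Read each record as one unparsed line (assumes no surrounding parentheses in the file).
raw = LOAD '/home/coe_user_1/del/data.txt' USING TextLoader() AS (line: chararray);

-- Split only at the first comma: the key on the left, all key=value pairs on the right.
kv = FOREACH raw GENERATE
     FLATTEN(STRSPLIT(line, ',', 2)) AS (key: chararray, attrs: chararray);

-- Collect all rows that share the same key.
grp = GROUP kv BY key;

-- Join the attribute strings of each group into one comma-separated string.
merged = FOREACH grp GENERATE
     group AS key,
     BagToString(kv.attrs, ',') AS attrs;

DUMP merged;
```

This groups on the full key (link, port and ID) rather than on the port name alone, and Pig does not guarantee the order of tuples inside a bag, so the key=value pairs may come out in a different order than in the input.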