Input records do not match output records in Python MapReduce

I am writing a MapReduce program in Python. The mapper works perfectly well when I run it locally with:

cat input.csv|python mapper.py > output.tsv

But when I run it with the command below, it does not produce the output I want:

nohup hadoop jar /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/tools/lib/hadoop-streaming-2.7.0-mapr-1607.jar -Dmapreduce.job.queuename=queue_name -Dmapred.map.tasks=1000 -Dmapred.reduce.tasks=0 -input /path/sample_reduce.csv -output /path/map_output -mapper "mapper_try.py" -reducer NONE -file mapper_try.py > mapp_try2.out &

It says the job completed successfully, but it shows the following:

Map-Reduce Framework 
      Map input records=1096 
      Map output records=92 
      Input split bytes=122610 
      Spilled Records=0 
      Failed Shuffles=0 
      Merged Map outputs=0 
      GC time elapsed (ms)=0 
      CPU time spent (ms)=840560 
      Physical memory (bytes) snapshot=353314721792 
      Virtual memory (bytes) snapshot=4310996582400 
      Total committed heap usage (bytes)=2036214005760

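I cannot figure out why the input record count does not match the output record count. All the output files are created, but only 92 of them contain a row (one row each) and the rest are empty. Any help would be appreciated. Thanks in advance.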


2017-05-06 Akshat Agrawal

Try this:

-mapper "python /path/to/mapper_try.py"

instead of:

-mapper "mapper_try.py"
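
The answer does not say why this might help. One plausible reason (an assumption, not something stated in the thread) is that a mapper named without an interpreter is executed directly, so the script needs a shebang line and execute permission. A minimal sketch of a pre-flight check follows; `check_streaming_script` is a hypothetical helper name, not part of Hadoop:

```python
#!/usr/bin/env python
"""Pre-flight check for a script that will be executed directly (sketch)."""
import os
import sys


def check_streaming_script(path):
    """Return a list of problems that would stop `path` from running directly."""
    problems = []
    with open(path, "rb") as fh:
        first_line = fh.readline()
    # Without a shebang the OS does not know which interpreter to start.
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line")
    # A Windows line ending makes the interpreter name end in "\r".
    if first_line.endswith(b"\r\n"):
        problems.append("CRLF line ending on first line")
    # At least one execute bit must be set on the file.
    if not os.stat(path).st_mode & 0o111:
        problems.append("no execute permission")
    return problems


if __name__ == "__main__":
    for script in sys.argv[1:]:
        for problem in check_streaming_script(script):
            print("%s: %s" % (script, problem))
```

Saved under any name, this could be run against mapper_try.py before submitting the job. An empty result only rules out these three causes.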


2017-05-06 10:44:57 MaxU

Tried it, with no success.
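
Since the thread ends without a resolution, one way to narrow the problem down is to have the mapper count the rows it reads, emits, and skips, and report those numbers on stderr in the `reporter:counter:<group>,<counter>,<amount>` format that Hadoop Streaming recognizes. The sketch below rests on assumptions: the real mapper_try.py is not shown in the question, so `transform` is a hypothetical stand-in for its per-row logic, and the counter group `mapper_debug` is an arbitrary name.

```python
#!/usr/bin/env python
"""Streaming mapper skeleton that counts read, emitted, and skipped rows."""
import sys


def transform(line):
    # Hypothetical placeholder for the real per-row logic in mapper_try.py.
    # Returns a tab-separated output row, or None when the row is skipped.
    fields = line.rstrip("\n").split(",")
    if len(fields) < 2:
        return None
    return "\t".join(fields)


def run_mapper(lines, out, err):
    read = emitted = skipped = 0
    for line in lines:
        read += 1
        try:
            row = transform(line)
        except Exception as exc:
            # Anything written to stderr ends up in the task logs.
            err.write("bad row %d: %r\n" % (read, exc))
            row = None
        if row is None:
            skipped += 1
            continue
        out.write(row + "\n")
        emitted += 1
    # Counter updates that Hadoop Streaming picks up from stderr.
    for name, value in (("read", read), ("emitted", emitted), ("skipped", skipped)):
        err.write("reporter:counter:mapper_debug,%s,%d\n" % (name, value))
    return read, emitted, skipped


# Only consume stdin when input is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

If `skipped` accounts for the gap between 1096 input records and 92 output records, the rows are being dropped inside the mapper logic rather than lost by Hadoop. Because each map task sees only its own split, logic that relies on a header row or on state carried across rows can behave differently on the cluster than in the local `cat` pipeline.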
