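Simple Hadoop streaming Python example does not work

I have an input file already uploaded to HDFS at /tmp/input. Its fields are delimited by the non-printable character ^A; the listing after the reducer shows the file as it appears in vi. The mapper, mapper.py: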
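import sys

# Read ^A-delimited "name^Ascore" records from stdin.
for line in sys.stdin:
    name, score = line.strip().split(chr(1))
    # Emit "name<TAB>score+1" for the reducer.
    print('\t'.join([name, str(int(score) + 1)]))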
The reducer, reducer.py, is as follows (adapted from a similar example):
import sys
from datetime import datetime


def calc(inputList):
    # Aggregate all values of one key: keep the smallest.
    return min(inputList)


def main():
    current_key = None
    value_list = []
    key = None
    value = None
    result = None

    # Input arrives sorted by key, one "key<TAB>value" pair per line.
    for line in sys.stdin:
        try:
            line = line.strip()
            key, value = line.split('\t', 1)
            try:
                value = eval(value)
            except:
                # Skip values that cannot be parsed.
                continue

            if current_key == key:
                value_list.append(value)
            else:
                # Key changed: emit the result for the previous key.
                if current_key:
                    try:
                        result = str(calc(value_list))
                    except:
                        pass
                    print('%s\t%s' % (current_key, result))
                value_list = [value]
                current_key = key
        except:
            pass

    # Emit the result for the last key.
    print('%s\t%s' % (current_key, str(calc(value_list))))


if __name__ == '__main__':
    main()
A^A10
A^A7
A^A10
A^A5
A^A10
A^A8
B^A1
A^A9
B^A1
A^A9
B^A1
A^A9
B^A1
A^A9
B^A1
A^A9
B^A1
A^A9
The listing above is the input file as viewed in vi. The mapper I wrote for it is the first script in this post.
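For anyone who wants to reproduce the input, here is a minimal sketch that recreates the rows listed above with chr(1) (^A) as the delimiter. The helper name `write_sample` and the local target path are hypothetical, chosen only for illustration. The file still has to be uploaded to HDFS separately.

```python
import os
import tempfile

DELIM = chr(1)  # the non-printable ^A character that mapper.py splits on

# The rows from the listing above, as (name, score) pairs.
ROWS = [("A", 10), ("A", 7), ("A", 10), ("A", 5), ("A", 10), ("A", 8)]
ROWS += [("B", 1), ("A", 9)] * 6


def write_sample(path):
    """Hypothetical helper: write one 'name^Ascore' record per line."""
    with open(path, "w") as handle:
        for name, score in ROWS:
            handle.write("%s%s%d\n" % (name, DELIM, score))


if __name__ == "__main__":
    # Assumed local location; any writable path works.
    target = os.path.join(tempfile.gettempdir(), "input")
    write_sample(target)
    print("wrote %d rows to %s" % (len(ROWS), target))
```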
I tested the mapper and reducer in the shell, and they work for me:
$ cat input | python mapper.py | sort -t$'\t' -k1 | python reducer.py
A 6
B 2
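The same map, sort, and reduce flow can also be checked in a single Python process. This is only a sketch of what the shell pipeline above does, not how Hadoop runs the job. The names `map_line` and `run_local` are hypothetical, and the reduce step is hard-wired to `min`, mirroring `calc` in reducer.py.

```python
from itertools import groupby

DELIM = chr(1)  # ^A, the field delimiter used in the input file


def map_line(line):
    """Mirror mapper.py: split on ^A and add 1 to the score."""
    name, score = line.strip().split(DELIM)
    return name, int(score) + 1


def run_local(lines):
    """Map every line, sort by key, then keep the minimum value per key."""
    mapped = sorted(map_line(line) for line in lines)
    return [
        (key, min(value for _, value in group))
        for key, group in groupby(mapped, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    sample = ["A\x0110", "A\x015", "B\x011", "A\x019"]
    for key, value in run_local(sample):
        print("%s\t%s" % (key, value))
```

Run on the full input file, this should give the same two rows as the shell test.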
But running the same job through Hadoop streaming fails with the following output:
13/10/07 15:59:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/07 15:59:02 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/07 15:59:02 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-a59347/mapred/local]
13/10/07 15:59:02 INFO streaming.StreamJob: Running job: job_201309301959_0089
13/10/07 15:59:02 INFO streaming.StreamJob: To kill this job, run:
13/10/07 15:59:02 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=url1:8021 -kill job_201309301959_0089
13/10/07 15:59:02 INFO streaming.StreamJob: Tracking URL: http://url1:50030/jobdetails.jsp?jobid=job_201309301959_0089
13/10/07 15:59:03 INFO streaming.StreamJob: map 0% reduce 0%
13/10/07 15:59:10 INFO streaming.StreamJob: map 50% reduce 0%
13/10/07 16:00:10 INFO streaming.StreamJob: map 100% reduce 0%
13/10/07 16:00:26 INFO streaming.StreamJob: map 100% reduce 1%
13/10/07 16:00:32 INFO streaming.StreamJob: map 100% reduce 2%
13/10/07 16:00:37 INFO streaming.StreamJob: map 100% reduce 100%
13/10/07 16:00:37 INFO streaming.StreamJob: To kill this job, run:
13/10/07 16:00:37 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=url1:8021 -kill job_201309301959_0089
13/10/07 16:00:37 INFO streaming.StreamJob: Tracking URL: http://url1:50030/jobdetails.jsp?jobid=job_201309301959_0089
13/10/07 16:00:37 ERROR streaming.StreamJob: Job not successful. Error: NA
13/10/07 16:00:37 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
This is the command I ran:
/usr/bin/hadoop jar /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar \
    -file mapper.py \
    -mapper mapper.py \
    -file reducer.py \
    -reducer reducer.py \
    -input /tmp/input \
    -output /tmp/output
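One difference from the shell test may be worth ruling out. There the scripts are started as `python mapper.py`, while `-mapper mapper.py` asks the streaming task to execute the file itself, which normally requires a `#!` interpreter line and the execute bit. This is an assumption about the cause, not something the log confirms. The sketch below uses a hypothetical helper, `launch_problems`, and assumes a POSIX file system.

```python
import os
import stat


def launch_problems(path):
    """Hypothetical check: list reasons a script cannot be executed directly."""
    problems = []
    with open(path, "rb") as handle:
        # A directly executed script needs an interpreter line such as
        # "#!/usr/bin/env python" as its very first bytes.
        if handle.read(2) != b"#!":
            problems.append("no shebang line")
    # The owner execute bit must be set for the file to be run as a command.
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append("not executable by owner")
    return problems


if __name__ == "__main__":
    for script in ("mapper.py", "reducer.py"):
        if os.path.exists(script):
            print("%s: %s" % (script, launch_problems(script) or "looks launchable"))
```

If either script is flagged, adding the interpreter line and running `chmod +x` on it, or passing the interpreter explicitly as in `-mapper "python mapper.py"`, would be the things to try first.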
The error output is the log shown above. What am I doing wrong?
How does it fail? Can you post the output printed on screen when you run the '/usr/bin/hadoop jar ...' command? – cabad
@cabad Thanks for the reminder. Is that what you need?