
Pig job submitted on Google Cloud Dataproc does not add custom jar to the Pig classpath

I'm trying to submit a Pig job via Google Cloud Dataproc and include a custom jar that implements a custom load function I use in the Pig script. I can't find out how to do it.

Adding my custom jar through the UI does not add it to the Pig classpath.

Here is the output of the Pig job, showing that it fails to find my class:

17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL 
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE 
17/03/29 16:12:21 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType 
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0 (r: unknown) compiled Nov 27 2016, 23:14:51 
2017-03-29 16:12:21,961 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log 
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 
2017-03-29 16:12:22,379 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 
2017-03-29 16:12:22,379 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aspen-dp-central-m 
2017-03-29 16:12:22,404 [main] INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS version: 1.6.0-hadoop2 
2017-03-29 16:12:22,890 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-e53a2851-efe5-4e74-bf33-89dfe0733386 
2017-03-29 16:12:22,890 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false 
2017-03-29 16:12:23,247 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Failed to parse: Pig script failed to parse: 
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199) 
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1819) 
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1527) 
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:460) 
    at org.apache.pig.PigServer.executeBatch(PigServer.java:485) 
    at org.apache.pig.PigServer.executeBatch(PigServer.java:471) 
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:172) 
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:742) 
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206) 
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81) 
    at org.apache.pig.Main.run(Main.java:532) 
    at org.apache.pig.Main.main(Main.java:176) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
Caused by: 
<line 8, column 13> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1339) 
    at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1324) 
    at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5184) 
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3515) 
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625) 
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) 
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) 
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) 
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191) 
    ... 19 more 
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:671) 
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1336) 
    ... 27 more 
2017-03-29 16:12:23,251 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve com.turner.pig.load.HBaseMultiScanLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Details at logfile: /tmp/cb3b0696-3f30-4db4-a6a7-bb716d2a8a89/pig_1490803941959.log 
2017-03-29 16:12:23,269 [main] INFO org.apache.pig.Main - Pig script completed in 1 second and 477 milliseconds (1477 ms) 
Job output is complete 

Answer


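Registering the custom jar inside the Pig script solved the problem. So, basically:

  1. Add my jar file to Google Cloud Storage
  2. Register the jar inside the script
  3. Submit the Pig job, either through the UI or from the command line as below: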

gcloud dataproc jobs submit pig --cluster eduboom-central --file custom.pig --jars=gs://eduboom-dataproc/custom/eduboom.jar

custom.pig:

register eduboom.jar; 
raw = LOAD 'hbase://eduboom_table' 
    USING com.eduboom.pig.load.HBaseMultiScanLoader('2017-03-30T14:00Z_00', '2017-03-30T14:01Z_25', 'cf:*') 
    AS (key:chararray, data); 
DUMP raw;
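The upload and submit steps above can be scripted. What follows is a minimal bash sketch that assumes `gsutil` and `gcloud` are installed and authenticated. The helper names `run` and `submit_pig_with_jar` and the `DRY_RUN` switch are hypothetical. The bucket, cluster, and file names are the ones from this answer. The sketch does not edit custom.pig for you: the script must still contain the `register eduboom.jar;` statement from step 2. Newer gcloud releases may also require a `--region` flag on the submit command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage a custom Pig loader jar in GCS and submit a Pig
# job that ships it with --jars. Set DRY_RUN=1 to print the commands instead
# of executing them.
set -euo pipefail

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: submit_pig_with_jar <local-jar> <gcs-dir> <cluster> <pig-script>
submit_pig_with_jar() {
  local jar="$1" gcs_dir="$2" cluster="$3" script="$4"
  # Build the GCS object path, tolerating a trailing slash on the directory.
  local gcs_jar="${gcs_dir%/}/$(basename "$jar")"

  # Step 1: copy the jar to Google Cloud Storage.
  run gsutil cp "$jar" "$gcs_jar"

  # Step 3: submit the job, passing the GCS jar with --jars.
  # The Pig script itself must REGISTER the jar (step 2).
  run gcloud dataproc jobs submit pig --cluster="$cluster" --file="$script" --jars="$gcs_jar"
}
```

For example, `DRY_RUN=1 submit_pig_with_jar eduboom.jar gs://eduboom-dataproc/custom eduboom-central custom.pig` prints the `gsutil cp` and `gcloud dataproc jobs submit pig` commands without running them. This lets you check the paths before touching the bucket or the cluster.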