
While submitting a Spark application to YARN, I am getting the error below for the container. The Hadoop (2.7.3) / Spark (2.1) environment is running in pseudo-distributed mode on a single-node cluster. The application works perfectly in local mode, but when I try to verify correctness in cluster mode with YARN as the RM and run some load blocks, I hit this failure. Being new to this world, I am looking for help. org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout

--- Application log

2017-04-11 07:13:28 INFO Client:58 - Submitting application 1 to ResourceManager 
2017-04-11 07:13:28 INFO YarnClientImpl:174 - Submitted application application_1491909036583_0001 to ResourceManager at /0.0.0.0:8032 
2017-04-11 07:13:29 INFO Client:58 - Application report for application_1491909036583_0001 (state: ACCEPTED) 
2017-04-11 07:13:29 INFO Client:58 - 
    client token: N/A 
    diagnostics: N/A 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1491909208425 
    final status: UNDEFINED 
    tracking URL: http://ip-xxx.xx.xx.xxx:8088/proxy/application_1491909036583_0001/ 
    user: xxxx 
2017-04-11 07:13:30 INFO Client:58 - Application report for application_1491909036583_0001 (state: ACCEPTED) 
2017-04-11 07:13:31 INFO Client:58 - Application report for application_1491909036583_0001 (state: ACCEPTED) 
2017-04-11 07:13:32 INFO Client:58 - Application report for application_1491909036583_0001 (state: ACCEPTED) 
2017-04-11 07:17:37 INFO Client:58 - Application report for application_1491909036583_0001 (state: FAILED) 
2017-04-11 07:17:37 INFO Client:58 - 
    client token: N/A 
    diagnostics: Application application_1491909036583_0001 failed 2 times due to AM Container for appattempt_1491909036583_0001_000002 exited with exitCode: 10 
For more detailed output, check application tracking page:http://"hostname":8088/cluster/app/application_1491909036583_0001Then, click on links to logs of each attempt. 
Diagnostics: Exception from container-launch. 
Container id: container_1491909036583_0001_02_000001 
Exit code: 10 
Stack trace: ExitCodeException exitCode=10: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:582) 
    at org.apache.hadoop.util.Shell.run(Shell.java:479) 
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773) 
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 

--- Container logs

2017-04-11 07:13:30 INFO ApplicationMaster:47 - Registered signal handlers for [TERM, HUP, INT] 
2017-04-11 07:13:31 INFO ApplicationMaster:59 - ApplicationAttemptId: appattempt_1491909036583_0001_000001 
2017-04-11 07:13:32 INFO SecurityManager:59 - Changing view acls to: root,xxxx 
2017-04-11 07:13:32 INFO SecurityManager:59 - Changing modify acls to: root,xxxx 
2017-04-11 07:13:32 INFO SecurityManager:59 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, xxxx); users with modify permissions: Set(root, xxxx) 
2017-04-11 07:13:32 INFO Slf4jLogger:80 - Slf4jLogger started 
2017-04-11 07:13:32 INFO Remoting:74 - Starting remoting 
2017-04-11 07:13:32 INFO Remoting:74 - Remoting started; listening on addresses :[akka.tcp://[email protected]:45446] 
2017-04-11 07:13:32 INFO Remoting:74 - Remoting now listens on addresses: [akka.tcp://[email protected]:45446] 
2017-04-11 07:13:32 INFO Utils:59 - Successfully started service 'sparkYarnAM' on port 45446. 
2017-04-11 07:13:32 INFO ApplicationMaster:59 - Waiting for Spark driver to be reachable. 
2017-04-11 07:13:32 INFO ApplicationMaster:59 - Driver now available: xxx.xx.xx.xxx:47503 
2017-04-11 07:15:32 ERROR ApplicationMaster:96 - Uncaught exception: 
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout 
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242) 
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:98) 
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:116) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.runAMEndpoint(ApplicationMaster.scala:279) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:473) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:315) 
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:157) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:625) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:623) 
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:646) 
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:241) 
    ... 16 more 
2017-04-11 07:15:32 INFO ApplicationMaster:59 - Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout) 
2017-04-11 07:15:32 INFO ShutdownHookManager:59 - Shutdown hook called 
-- upon failure

--- YARN NodeManager log

2017-04-11 07:15:18,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30015 for container-id container_1491909036583_0001_01_000001: 201.6 MB of 1 GB physical memory used; 2.3 GB of 4 GB virtual memory used 
2017-04-11 07:15:21,735 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30015 for container-id container_1491909036583_0001_01_000001: 201.6 MB of 1 GB physical memory used; 2.3 GB of 4 GB virtual memory used 
2017-04-11 07:15:24,742 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30015 for container-id container_1491909036583_0001_01_000001: 201.6 MB of 1 GB physical memory used; 2.3 GB of 4 GB virtual memory used 
2017-04-11 07:15:27,749 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30015 for container-id container_1491909036583_0001_01_000001: 201.6 MB of 1 GB physical memory used; 2.3 GB of 4 GB virtual memory used 
2017-04-11 07:15:30,756 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30015 for container-id container_1491909036583_0001_01_000001: 201.6 MB of 1 GB physical memory used; 2.3 GB of 4 GB virtual memory used 
2017-04-11 07:15:33,018 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1491909036583_0001_01_000001 is : 10 
2017-04-11 07:15:33,019 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1491909036583_0001_01_000001 and exit code: 10 
ExitCodeException exitCode=10: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:582) 

--- Spark parameters (as mentioned, in place when I see the SparkContext issue and the job halts)

<!-- Spark Configuration --> 
<bean id="sparkInfo" class="SparkInfo"> 
    <property name="appName" value="framework"></property> 
    <property name="master" value="yarn-client"></property> 
    <property name="dynamicAllocation" value="false"></property> 
    <property name="executorInstances" value="2"></property> 
    <property name="executorMemory" value="1g"></property> 
    <property name="executorCores" value="4"></property> 
    <property name="executorCoresMax" value="2"></property> 
    <property name="taskCpus" value="4"></property> 
    <property name="executorClassPath" value="/usr/hadoop/hadoop-2.7.3/share/hadoop/yarn/lib/*"></property> 
    <property name="yarnJar" 
     value="${framework.hdfsURI}/app/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar"></property> 
    <property name="yarnQueue" value="default"></property> 
    <property name="memoryFraction" value="0.4"></property> 
</bean> 

spark-defaults.conf

spark.driver.memory    1g 
spark.executor.extraJavaOptions -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=256m 
spark.rpc.lookupTimeout   600s 

yarn-site.xml

<!-- Site specific YARN configuration properties --> 
    <property> 
     <name>yarn.nodemanager.aux-services</name> 
     <value>mapreduce_shuffle</value> 
    </property> 
    <property> 
    <name>yarn.scheduler.minimum-allocation-mb</name> 
    <value>1024</value> 
    </property> 
    <property> 
    <name>yarn.scheduler.maximum-allocation-mb</name> 
    <value>3096</value> 
    </property> 
    <property> 
    <name>yarn.nodemanager.resource.memory-mb</name> 
    <value>3096</value> 
    </property> 
    <property> 
    <name>yarn.nodemanager.vmem-pmem-ratio</name> 
    <value>4</value> 
    </property> 
</configuration> 

Try setting `spark.network.timeout` to a higher value in `SparkConf` (e.g. `200s`). – himanshuIIITian
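As a sketch, the comment's suggestion might look like this in code (a minimal example built on the standard SparkConf API; the app name and master value are taken from the bean config above, the timeout values are illustrative):

```scala
import org.apache.spark.SparkConf

// Hedged sketch: raise the network/RPC timeouts before building the SparkContext.
val conf = new SparkConf()
  .setAppName("framework")
  .setMaster("yarn-client")
  .set("spark.network.timeout", "600s")   // broad setting covering most network timeouts
  .set("spark.rpc.lookupTimeout", "600s") // the timeout named in the exception
```

Note that values set programmatically in `SparkConf` take precedence over spark-defaults.conf, which can matter if the defaults file is not being picked up.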

Answer


You can keep increasing `spark.network.timeout` as suggested by himanshuIIITian in the comments.
Timeout exceptions can occur when Spark is under heavy load. If you have low executor memory, GC can keep the system very busy, increasing that load. Check the logs for out-of-memory errors. Enable `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps` in `spark.executor.extraJavaOptions` and inspect the logs to see whether full GC is invoked multiple times before a task completes. If so, increase `executorMemory`. That may solve your problem.
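Put together, the advice above might look like the following spark-defaults.conf fragment (values are illustrative starting points, not tuned recommendations):

```
# Raise the broad network timeout (per the comment above)
spark.network.timeout           600s
# Surface GC activity in the executor logs
spark.executor.extraJavaOptions -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
# Increase executor memory if full GCs dominate the logs
spark.executor.memory           2g
```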


The same Spark application works correctly on Spark standalone with dynamic allocation enabled. Only when I switch the master to YARN do I face this issue. – Sparknewbie


I also updated the `spark.network.timeout` parameter to 600, but the exception still says "Futures timed out after [120 seconds]". The only information I see is: 2017-04-13 15:34:51,370 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1492111885369_0001_01_000001 is : 10 2017-04-13 15:34:51,371 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1492111885369_0001_01_000001 and exit code: 10 ExitCodeException exitCode=10: – Sparknewbie