2017-05-09 15 views

Ignite map-reduce job fails: connection was closed

Environment:

Ignite servers: CentOS 6.5, kernel 2.6.32-431.el6.x86_64

Ignite version: 1.9

Hadoop version: 2.6.2

Each of the 3 server nodes is started with '-Xms16g -Xmx16g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=256m'.
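
One way to confirm that the heap flags actually took effect on a node is to start a small Java program with the same -Xmx setting and print the limit the JVM reports. The sketch below is only an illustration: HeapCheck and its methods are hypothetical names, not part of Ignite or Hadoop, and the aggregate figure assumes every node uses the same per-node limit.

```java
// Hypothetical helper (not part of Ignite): reports the heap limit the JVM applied.
class HeapCheck {
    // Converts a byte count to GiB.
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    // Aggregate heap, assuming every node runs with the same per-node limit.
    static long totalHeapGiB(int nodes, long perNodeGiB) {
        return nodes * perNodeGiB;
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap this JVM will attempt to use.
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap on this JVM: %.2f GiB%n", toGiB(max));
        System.out.println("aggregate for 3 nodes at 16 GiB each: "
                + totalHeapGiB(3, 16) + " GiB");
    }
}
```

The value reported by maxMemory() can be slightly below the -Xmx figure, depending on the garbage collector in use.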

I ran a test job with Ignite map-reduce. The job simply computes the average number for each person. The data looks like this:

jack 0.35
tom 0.78
lily 0.92
jack 0.28
tom 0.18
...
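
The job described above amounts to grouping the values by name and averaging each group: the map step would emit (name, value) pairs and the reduce step would average the values per name. The following is a minimal plain-Java sketch of that logic. It is not the MRTest code from the question (which is not shown) and uses no Hadoop or Ignite classes; AverageSketch and averages are names made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the per-person average (no Hadoop or Ignite classes).
class AverageSketch {
    // Parses lines of the form "<name> <value>" and returns name -> mean value.
    static Map<String, Double> averages(List<String> lines) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // name -> {sum, count}
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            double[] sumCount = acc.computeIfAbsent(parts[0], k -> new double[2]);
            sumCount[0] += Double.parseDouble(parts[1]);
            sumCount[1] += 1;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // A few lines in the same "<name> <value>" format as the data set.
        List<String> sample = Arrays.asList("tom 0.78", "jack 0.28", "tom 0.18");
        System.out.println(averages(sample));
    }
}
```

Keeping only a running sum and count per name means memory grows with the number of distinct names, not with the number of input lines.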

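At first I generated a data set of 100M lines, about 2.53GB. The job finished correctly in about 30 seconds. Then I generated a data set of 1 billion lines, about 25.3GB. The job always failed with an exception. I tried several times with the same result.

The job configuration is as follows: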

Configuration configuration = new Configuration(); 
configuration.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME); 
configuration.set(MRConfig.MASTER_ADDRESS, "172.31.68.202:11211"); 
configuration.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem"); 
configuration.set("fs.default.name", "igfs://[email protected]/"); 

The job client threw the following exception:

java.io.IOException: Failed to get job status: job_1fbf9083-9a44-4be9-9199-695a97652dc2_0002 
    at org.apache.ignite.internal.processors.hadoop.impl.proto.HadoopClientProtocol.getJobStatus(HadoopClientProtocol.java:197) 
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326) 
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) 
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323) 
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:611) 
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1357) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1318) 
    at com.tscloud.sdk.test.ignite.MRTest.run(MRTest.java:81) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
    at com.tscloud.sdk.test.ignite.MRTest.main(MRTest.java:53) 
Caused by: class org.apache.ignite.internal.client.impl.connection.GridClientConnectionResetException: Failed to perform request (connection failed): /172.31.68.204:11211 
    at org.apache.ignite.internal.client.impl.connection.GridClientConnection.getCloseReasonAsException(GridClientConnection.java:491) 
    at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:339) 
    at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:299) 
    at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onDisconnected(GridClientConnectionManagerAdapter.java:630) 
    at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionClosed(GridNioFilterChain.java:253) 
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93) 
    at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionClosed(GridNioCodecFilter.java:70) 
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93) 
    at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3005) 
    at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:147) 
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2306) 
    at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:929) 
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2026) 
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1863) 
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1568) 
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) 
    at java.lang.Thread.run(Thread.java:745) 

The Ignite server node threw the exception below:

[15:06:56,804][ERROR][sys-#2740%null%][GridTcpRestProtocol] Failed to process client request [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [[email protected], idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]], msg=GridClientTaskRequest [taskName=o.a.i.i.processors.hadoop.proto.HadoopProtocolJobStatusTask, arg=HadoopProtocolTaskArguments []]] 
class org.apache.ignite.IgniteCheckedException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [[email protected], idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]] 
    at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7239) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:170) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119) 
    at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:264) 
    at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:261) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:228) 
    at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:261) 
    at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:229) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332) 
    at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:158) 
    at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:155) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332) 
    at org.apache.ignite.internal.util.future.GridFutureChainListener.applyCallback(GridFutureChainListener.java:78) 
    at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:70) 
    at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:30) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332) 
    at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:294) 
    at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:257) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389) 
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355) 
    at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1579) 
    at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1547) 
    at org.apache.ignite.internal.processors.task.GridTaskWorker.reduce(GridTaskWorker.java:1157) 
    at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:942) 
    at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:996) 
    at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1221) 
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222) 
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850) 
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:108) 
    at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:790) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.io.IOException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [[email protected], idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]] 
    at org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:554) 
    at org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:494) 
    at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3036) 
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118) 
    at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:94) 
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118) 
    at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:264) 
    at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:189) 
    at org.apache.ignite.internal.util.nio.GridNioSessionImpl.send(GridNioSessionImpl.java:108) 
    at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:258) 
    ... 40 more 

After the job failed, I checked the node status with ignitevisorcmd.sh. All server nodes were fine, although on some runs one of the server nodes had gone down. I do not know why it behaves this way.

Any help would be appreciated.

Edit (2017-05-16): I changed Hadoop's core-site.xml and added the hadoop.tmp.dir property as shown below:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/hadoop-2.6.2/tmp</value>
</property>

I then formatted HDFS and uploaded the 25.3GB data file. The test ran successfully. It turned out that my HDFS was misconfigured; reconfiguring the namenode solved the problem.

Before the steps above, I checked the JVM heap usage in VisualVM.

VisualVM monitor snapshot of one of the server nodes

Answers

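If a node went down temporarily, it makes sense that the job would either fail over to another node or, if there are no other nodes, fail altogether. I would check that the network was not reset and did not go down. You should also check whether there are any software firewalls between the nodes (it is best to disable the operating-system firewall).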
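
As a first check of the network and firewall between the client and a server node, a plain TCP connect to the address from the stack trace (172.31.68.204:11211) can be attempted. This is a minimal sketch using only the Java standard library; PortCheck and canConnect are hypothetical names, and the 3-second timeout is an arbitrary assumption. A successful connect only shows that the port is reachable at that moment. It does not prove that a long-lived connection will stay up while the job runs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the server address and port from the client stack trace.
        String host = args.length > 0 ? args[0] : "172.31.68.204";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 11211;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 3000));
    }
}
```

Running it from the client machine against each server node, and from each server node against the others, would show whether a firewall blocks the port in either direction.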

Thanks for your reply. The network is fine. A node going down did not happen every time. With the small data set, the job always finishes correctly. –

Are you running out of memory? – Dmitriy

No. The data file is about 25.3GB, and the server nodes have 48GB of JVM heap in total (16GB each). –