스파크 작업 내가 실행하고 있습니다 :Neo4j : 세션에 대한 작업자 .. 크래싱. 자바 힙 공간 OutOfMemoryError가
그것은 자바에서 스칼라로 변환 '병렬'한 아주 간단한 프로그램입니다 (이것은 병렬로 실행되도록 의도되지하지만에 실험이다) spark와 neo4j를 배우고 b) 더 많은 작업을하는 더 많은 노드를 가진 스파크 클러스터에서 실행하여 속도 향상을 얻을 수 있는지 확인하십시오. 그 이유는 큰 bottle neck이 neo4j cypher 스크립트 (withinDistance 호출) 내의 공간 호출이기 때문입니다. 테스트 데이터 세트는 52,000 개의 노드와 약 140MB의 데이터베이스 크기로 매우 작습니다. neo4j가 시작 내가 그 파일을 열라고 생각하기 때문에 그것이 나에게 이상하다
Starting Neo4j.
WARNING: Max 4096 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
/usr/share/neo4j/bin/neo4j: line 411: /var/run/neo4j/neo4j.pid: No such file or directory
의 경고를주고 내가 방법 높은 것과을 설정하는 시스템 관리자에게 물었다 또한
? (ulimit -Hn
이걸 확인하는 것 같아요? ulimit -a는 4096 (softlimit)에서 열린 파일을 보여 주지만 neo4j가 보는 것과 같아요.
또한 Mac OS X에서 로컬로 실행했습니다. 소프트웨어를 실행하고 약 14 시간 정도 (어쩌면 9)를 실행하고 나서 데이터베이스가 스파크와 대화를 멈추는 콘솔에서 볼 수 있습니다. 그것은 다운되지 않았거나 작업이 시간 초과 될 것이고 나는 여전히 데이터베이스에 cypher-shell 할 수 있습니다. 그러나 그것은 어쨌든 스파크 일에 연결을 잃어 버렸기 때문에 그들이 시도 할 것이고 마침내 스파크 제출은 포기하고 멈출 것입니다.
C02RH2U9G8WM:scala-2.11 little.mac$ ulimit -Hn
unlimited
작업의 일부 코드 비트 (스칼라 이식 된 코드를 사용하여 (마지막 편집 난 이제 힙 크기에 대한 4 기가 바이트 최대 메모리와 neo4j의 conf 더 내 한계를 올렸다 또한 이후) spark 데이터 프레임을 추가했습니다. 제대로 병행되지 않았지만 앞으로 누르기 전에 작동 할 것으로 기대하고있었습니다.). 나는 자바의 코드와 비슷한 하이브리드 프로그램을 구축하고 있었지만 (neo4j에 연결된) 스파크에서 데이터 프레임을 사용했다.
기본적으로 (의사 코드) :
while (going through all these lat and lons)
{
def DoCalculation()
{
val noBbox="call spatial.bbox('geom', {lat:" + minLat +",lon:"+minLon +"}, {lat:"+maxLat+",lon:" + maxLon +"}) yield node return node.altitude as altitude, node.gtype as gtype, node.toDateFormatLong as toDateFormatLong, node.latitude as latitude, node.longitude as longitude, node.fromDateFormatLong as fromDateFormatLong, node.fromDate as fromDate, node.toDate as toDate ORDER BY node.toDateFormatLong DESC";
try {
//not overly sure what the partitions and batch are really doing for me.
val initialDf2 = neo.cypher(noBbox).partitions(5).batch(10000).loadDataFrame
val theRow = initialDf2.collect() //was someStr
for(i <- 0 until theRow.length){
//do more calculations
var radius2= 100
//this call is where the biggest bottle neck is,t he spatial withinDistance is where i thought
//I could put this code ons park and make the calls through data frames and do the same long work
//but by batching it out to many nodes would get more speed gains?
val pointQuery="call spatial.withinDistance('geom', {lat:" + lat + ",lon:"+ lon +"}, " + radius2 + ") yield node, distance WITH node, distance match (node:POINT) WHERE node.toDateFormatLong < " + toDateFormatLong + " return node.fromDateFormatLong as fromDateFormatLong, node.toDateFormatLong as toDateFormatLong";
try {
val pointResults = neo.cypher(pointQuery).loadDataFrame; //did i need to batch here?
var prRow = pointResults.collect();
//do stuff with prRow loadDataFrame
} catch {
case e: Exception => e.printStackTrace
}
//do way more stuff with the data just in some scala/java datastructures
}
} catch {
case e: Exception => println("EMPTY COLLECTION")
}
}
}
나는 내가 neo4j는 것을 알고 /var/log/neo4j/neo4j.log
java.lang.OutOfMemoryError: Java heap space
2017-12-27 03:17:13.969+0000 ERROR Worker for session '13662816-0a86-4c95-8b7f-cea9d92440c8' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:88)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109)
2017-12-27 03:17:23.244+0000 ERROR Worker for session '75983e7c-097a-4770-bcab-d63f78300dc5' crashed. Java heap space
java.lang.OutOfMemoryError: Java heap space
에서 이러한 오류를 얻을 Neo4j에 연결하기 위해 스파크 커넥터를 useses 불꽃 제출 작업을 실행 .conf 파일 나는 heapsizes (현재 주석 처리되었지만 512m로 설정 됨)를 변경할 수 있습니다. 내가 묻는 것은 conf 파일에서 말하는 것입니다 :
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
그렇다면 내가 설정할 수있는 것 이상으로 계산되면 혼자서 heapsizes int int conf로 남겨 두어야하지 않습니까? (이 기계는 8 코어와 8GB 램을 가짐). 아니면 구체적으로 설정하는 것이 정말 도움이 될까요? 어쩌면 2000 년까지 (메가 바이트 단위로), 2 회의 공연을 할 수 있을까요? 오류 로그 파일에 메모리 오류가 발생했다고 생각하기 때문에 물어 보지만 실제로는 다른 이유로 인해 발생합니다.
디버그에서 내 jvm 값을 편집하십시오.그냥이 참고로, 나는 아직도 자바 힙 오류를 얻을 수가
2017-12-27 16:17:30.740+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-27 16:17:30.749+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 4.23 GB
2017-12-27 16:17:30.750+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.19 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Free memory: 1.89 GB
2017-12-27 16:17:30.751+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 1.95 GB
2017-12-27 16:17:30.752+0000 INFO [o.n.k.i.DiagnosticsManager] Max memory: 1.95 GB
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-27 16:17:30.777+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.89 MB, max=240.00 MB, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.42 MB, max=-1.00 B, threshold=0.00 B
2017-12-27 16:17:30.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=105.00 MB, used=59.00 MB, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=0.00 B, used=0.00 B, max=-1.00 B, threshold=?
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=1.85 GB, used=0.00 B, max=1.95 GB, threshold=0.00 B
2017-12-27 16:17:30.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-27 16:17:30.780+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-27 16:17:30.781+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: [email protected]
2017-12-27 16:17:30.785+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-27 16:17:30.814+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-27 16:17:30.815+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-Xms2000m, -Xmx2000m, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-27 16:17:30.816+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:
:
2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] NETWORK
2017-12-26 16:24:06.768+0000 INFO [o.n.k.i.DiagnosticsManager] System memory information:
2017-12-26 16:24:06.771+0000 INFO [o.n.k.i.DiagnosticsManager] Total Physical memory: 7.79 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Free Physical memory: 5.49 GB
2017-12-26 16:24:06.772+0000 INFO [o.n.k.i.DiagnosticsManager] Committed virtual memory: 5.62 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total swap space: 16.50 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free swap space: 16.49 GB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] JVM memory information:
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Free memory: 85.66 MB
2017-12-26 16:24:06.773+0000 INFO [o.n.k.i.DiagnosticsManager] Total memory: 126.00 MB
2017-12-26 16:24:06.774+0000 INFO [o.n.k.i.DiagnosticsManager] Max memory: 1.95 GB
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Young Generation: [G1 Eden Space, G1 Survivor Space]
2017-12-26 16:24:06.776+0000 INFO [o.n.k.i.DiagnosticsManager] Garbage Collector: G1 Old Generation: [G1 Eden Space, G1 Survivor Space, G1 Old Gen]
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Code Cache (Non-heap memory): committed=4.94 MB, used=4.93 MB, max=240.00 MB, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Metaspace (Non-heap memory): committed=14.38 MB, used=13.41 MB, max=-1.00 B, threshold=0.00 B
2017-12-26 16:24:06.777+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: Compressed Class Space (Non-heap memory): committed=1.88 MB, used=1.64 MB, max=1.00 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Eden Space (Heap memory): committed=39.00 MB, used=35.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Survivor Space (Heap memory): committed=3.00 MB, used=3.00 MB, max=-1.00 B, threshold=?
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Memory Pool: G1 Old Gen (Heap memory): committed=84.00 MB, used=1.34 MB, max=1.95 GB, threshold=0.00 B
2017-12-26 16:24:06.778+0000 INFO [o.n.k.i.DiagnosticsManager] Operating system information:
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Operating System: Linux; version: 3.10.0-693.5.2.el7.x86_64; arch: amd64; cpus: 8
2017-12-26 16:24:06.779+0000 INFO [o.n.k.i.DiagnosticsManager] Max number of file descriptors: 90000
2017-12-26 16:24:06.780+0000 INFO [o.n.k.i.DiagnosticsManager] Number of open file descriptors: 103
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Process id: [email protected]
2017-12-26 16:24:06.782+0000 INFO [o.n.k.i.DiagnosticsManager] Byte order: LITTLE_ENDIAN
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] Local timezone: Etc/GMT
2017-12-26 16:24:06.793+0000 INFO [o.n.k.i.DiagnosticsManager] JVM information:
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Name: OpenJDK 64-Bit Server VM
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Vendor: Oracle Corporation
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] VM Version: 25.151-b12
2017-12-26 16:24:06.794+0000 INFO [o.n.k.i.DiagnosticsManager] JIT compiler: HotSpot 64-Bit Tiered Compilers
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] VM Arguments: [-XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -Djdk.tls.ephemeralDHKeySize=2048, -Dunsupported.dbms.udc.source=rpm, -Dfile.encoding=UTF-8]
2017-12-26 16:24:06.795+0000 INFO [o.n.k.i.DiagnosticsManager] Java classpath:
AFTER :
전에 로그인합니다. 이 기계들은 (단지 제품 개발자가 아님) 각각 8GB가 있습니다
각 쿼리는 얼마나 많은 데이터를 반환합니까? –
배치를 사용할 때 'WITH ... SKIP {_skip} LIMIT {_limit}' –
문자열 연결이 아닌 매개 변수 사용 –