I'm new to Spark and am learning it on Cloudera Distribution for Hadoop (CDH). I'm trying to run the PageRank and BFS functions from a Jupyter notebook, and both fail with the errors shown below. I started the notebook with the following command:

pyspark --packages graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-csv_2.11:1.2.0 

The PageRank call I'm trying to run:

ranks = tripGraph.pageRank(resetProbability=0.15, maxIter=5) 

Output (the BFS call further below fails with the same error):

--------------------------------------------------------------------------- 
Py4JJavaError        Traceback (most recent call last) 
<ipython-input-20-34d549cc033e> in <module>() 
----> 1 ranks = tripGraph.pageRank(resetProbability=0.15, maxIter=5) 
     2 ranks.vertices.orderBy(ranks.vertices.pagerank.desc()).limit(20).show() 

/tmp/spark-3bdc323d-a439-4f0a-ac1d-4e64ef4d1396/userFiles-0c248c5c-29fc-44c7-bfd9-3543500350dc/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in pageRank(self, resetProbability, sourceId, maxIter, tol) 

/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) 
    811   answer = self.gateway_client.send_command(command) 
    812   return_value = get_return_value(
--> 813    answer, self.gateway_client, self.target_id, self.name) 
    814 
    815   for temp_arg in temp_args: 

/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 
    43  def deco(*a, **kw): 
    44   try: 
---> 45    return f(*a, **kw) 
    46   except py4j.protocol.Py4JJavaError as e: 
    47    s = e.java_exception.toString() 

/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 
    306     raise Py4JJavaError(
    307      "An error occurred while calling {0}{1}{2}.\n". 
--> 308      format(target_id, ".", name), value) 
    309    else: 
    310     raise Py4JError(

Py4JJavaError: An error occurred while calling o106.run. 
: java.lang.AbstractMethodError 
    at org.apache.spark.Logging$class.log(Logging.scala:50) 
    at org.apache.spark.graphx.lib.backport.PageRank$.log(PageRank.scala:65) 
    at org.apache.spark.Logging$class.logInfo(Logging.scala:58) 
    at org.apache.spark.graphx.lib.backport.PageRank$.logInfo(PageRank.scala:65) 
    at org.apache.spark.graphx.lib.backport.PageRank$.runWithOptions(PageRank.scala:148) 
    at org.graphframes.lib.PageRank$.run(PageRank.scala:130) 
    at org.graphframes.lib.PageRank.run(PageRank.scala:104) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
    at py4j.Gateway.invoke(Gateway.java:259) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:209) 
    at java.lang.Thread.run(Thread.java:745) 

The BFS call I'm trying to run:

filteredPaths = tripGraph.bfs(
    fromExpr = "id = 'SEA'", 
    toExpr = "id = 'SFO'", 
    maxPathLength = 1) 

Output:

--------------------------------------------------------------------------- 
Py4JJavaError        Traceback (most recent call last) 
<ipython-input-22-74394b11f50d> in <module>() 
     4 fromExpr = "id = 'SEA'", 
     5 toExpr = "id = 'SFO'", 
----> 6 maxPathLength = 1) 
     7 
     8 filteredPaths.show() 

/tmp/spark-3bdc323d-a439-4f0a-ac1d-4e64ef4d1396/userFiles-0c248c5c-29fc-44c7-bfd9-3543500350dc/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in bfs(self, fromExpr, toExpr, edgeFilter, maxPathLength) 

/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) 
    811   answer = self.gateway_client.send_command(command) 
    812   return_value = get_return_value(
--> 813    answer, self.gateway_client, self.target_id, self.name) 
    814 
    815   for temp_arg in temp_args: 

/usr/lib/spark/python/pyspark/sql/utils.py in deco(*a, **kw) 
    43  def deco(*a, **kw): 
    44   try: 
---> 45    return f(*a, **kw) 
    46   except py4j.protocol.Py4JJavaError as e: 
    47    s = e.java_exception.toString() 

/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 
    306     raise Py4JJavaError(
    307      "An error occurred while calling {0}{1}{2}.\n". 
--> 308      format(target_id, ".", name), value) 
    309    else: 
    310     raise Py4JError(

Py4JJavaError: An error occurred while calling o147.run. 
: java.lang.AbstractMethodError 
    at org.apache.spark.Logging$class.log(Logging.scala:50) 
    at org.graphframes.lib.BFS$.log(BFS.scala:131) 
    at org.apache.spark.Logging$class.logInfo(Logging.scala:58) 
    at org.graphframes.lib.BFS$.logInfo(BFS.scala:131) 
    at org.graphframes.lib.BFS$.org$graphframes$lib$BFS$$run(BFS.scala:212) 
    at org.graphframes.lib.BFS.run(BFS.scala:126) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) 
    at py4j.Gateway.invoke(Gateway.java:259) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:209) 
    at java.lang.Thread.run(Thread.java:745) 

Could you please tell me what the problem is?

Thanks, Sasi.

Answer

You are using incompatible Scala versions:

  • graphframes:graphframes:0.1.0-spark1.6 - Scala 2.10
  • com.databricks:spark-csv_2.11:1.2.0 - Scala 2.11
  • Spark installation - probably Scala 2.10.

All components must use the same Scala version (com.databricks:spark-csv_2.10:1.2.0 if Spark was compiled with Scala 2.10). For details, see "Resolving dependency problems in Apache Spark".
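To make the mismatch concrete, here is a minimal sketch that scans a `--packages` string for Scala suffixes in the artifact IDs and flags any that differ from the version Spark was built with. The helper names (`scala_suffixes`, `mismatched`, `binary_version`) are hypothetical and not part of Spark or GraphFrames. It can only see explicit suffixes such as `_2.11`, so an artifact without one (like the graphframes coordinate here) is reported as unknown rather than checked.

```python
import re

# Matches a Scala binary-version suffix such as "_2.10" at the end of an artifact ID.
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)$")


def scala_suffixes(packages):
    """Map each Maven coordinate in a --packages string to its Scala suffix, or None."""
    result = {}
    for coord in packages.split(","):
        coord = coord.strip()
        if not coord:
            continue
        parts = coord.split(":")  # group:artifact:version
        artifact = parts[1] if len(parts) > 1 else ""
        match = _SCALA_SUFFIX.search(artifact)
        result[coord] = match.group(1) if match else None
    return result


def mismatched(packages, expected):
    """Return coordinates whose explicit Scala suffix differs from `expected`."""
    return [c for c, v in scala_suffixes(packages).items()
            if v is not None and v != expected]


def binary_version(version_string):
    """Extract major.minor from text such as 'version 2.10.5'.

    Assumption: in a live PySpark session, such a string can usually be read with
    sc._jvm.scala.util.Properties.versionString(); that call is not run here.
    """
    match = re.search(r"(\d+\.\d+)", version_string)
    return match.group(1) if match else None


original = "graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-csv_2.11:1.2.0"
fixed = "graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-csv_2.10:1.2.0"

spark_scala = binary_version("version 2.10.5")  # example value, assumed for illustration
print(mismatched(original, spark_scala))  # flags the spark-csv_2.11 coordinate
print(mismatched(fixed, spark_scala))     # nothing flagged
```

If the check comes back clean for the `fixed` string, restarting the notebook with that `--packages` value is the change the answer describes.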

Sorry for the late acknowledgement. This helped, and I'm slowly getting the hang of things! Many thanks. – Sasi

No problem. Could you [accept the answer](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) and/or upvote it? – user8371915

Done (I'm new to Stack Overflow as well :)) – Sasi