
ClassCastException in Spark ML: scala.collection.immutable.List to scala.collection.Seq

I am getting the exception below when trying to train a linear regression model. (The very same job used to run correctly when it was run in a separate JVM.) Training the model fails with:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 28, impetus-dsrv07.impetus.co.in, executor 2): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80) 
    at org.apache.spark.scheduler.Task.run(Task.scala:108) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:748) 
Driver stacktrace: 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486) 
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) 
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) 
    at scala.Option.foreach(Option.scala:257) 
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669) 
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658) 
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) 
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043) 
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062) 
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336) 
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) 
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2853) 
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153) 
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153) 
    at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2837) 
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) 
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2836) 
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2153) 
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2160) 
    at org.apache.spark.sql.Dataset.first(Dataset.scala:2167) 
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:198) 
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:76) 
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:118) 
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.trainLR(LRTrainer.java:88) 
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.processLRTraining(LRTrainer.java:83) 
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.execute(LRTrainer.java:54) 
    at com.impetus.idw.turin.core.Sequence.runSequence(Sequence.java:122) 
    at com.impetus.idw.turin.core.Status.runStatus(Status.java:93) 
    at com.impetus.idw.turin.core.Action.runAction(Action.java:83) 
    at com.impetus.idw.turin.core.Node.runNode(Node.java:156) 
    at com.impetus.idw.turin.core.Node.run(Node.java:96) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:748) 
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD 
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) 
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) 
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245) 
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169) 
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027) 
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) 
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) 
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75) 
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114) 
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80) 
    at org.apache.spark.scheduler.Task.run(Task.scala:108) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
    ... 3 common frames omitted 

Here is the code I am using to create the training dataset; inputDS is a Dataset loaded from a CSV file:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.ml.linalg.DenseVector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Keep only the label and feature columns and force the label to Double.
Dataset<Row> data1 = inputDS.select(label, features);
Dataset<Row> data2 = data1.withColumn("label", data1.col(label).cast("Double"));

// Flatten each row into (label, features as double[], prediction).
data2.map(new MapFunction<Row, Row>() {
    @Override
    public Row call(Row row) throws Exception {
        double label = row.getAs("label");
        double prediction = row.getAs("prediction");
        DenseVector features = row.getAs("features");
        return RowFactory.create(label, features.toArray(), prediction);
    }
}, Encoders.bean(Row.class));

I am getting the exception at this point:

lrModel = lRegression.fit(ds); 
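
(An aside on the snippet above: Row is an interface rather than a Java bean, so Encoders.bean(Row.class) will not round-trip these rows as intended. If the goal is a (label, features) dataset for LinearRegression, the usual Spark 2.x route is a RowEncoder with an explicit schema. A rough sketch, with the column names and types assumed from the code above:)

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.ml.linalg.SQLDataTypes;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Explicit schema: LinearRegression expects a Double label column and a
// Vector features column, so the features vector is kept as-is here
// instead of being flattened to a double[].
StructType schema = new StructType()
        .add("label", DataTypes.DoubleType)
        .add("features", SQLDataTypes.VectorType());

Dataset<Row> training = data2.map(
        (MapFunction<Row, Row>) row -> RowFactory.create(
                ((Number) row.getAs("label")).doubleValue(),
                row.getAs("features")),
        RowEncoder.apply(schema));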

Answer


Downgrade your Scala version to 2.10. Alternatively, inspect your code (Analyze -> Inspect Code...) and look for deprecated serialization-related methods, then fix those.
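
This particular ClassCastException (a List$SerializationProxy being assigned to RDD.dependencies_) is most often reported when the driver and the executors are running different Spark/Scala builds, which is why aligning the Scala version can help. As a quick diagnostic, you can compare the Scala version on the driver with the one the executors actually see. A minimal sketch, assuming a running SparkSession named spark:

// Scala version the driver is running with.
String driverScala = scala.util.Properties.versionString();

// Scala version inside an executor task, shipped back to the driver.
String executorScala = spark.range(1).javaRDD()
        .map(i -> scala.util.Properties.versionString())
        .first();

// If these differ (e.g. "version 2.11.8" vs "version 2.10.6"), the job jar
// and the cluster are built against different Scala versions, and Java
// serialization of RDD internals can break exactly like this.
System.out.println("Driver Scala:   " + driverScala);
System.out.println("Executor Scala: " + executorScala);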