2017-01-10 9 views
0

ClassCastException when reading an XML file with XmlSource in Google Cloud Dataflow

I am using XmlSource.from to read from an XML file stored in a Cloud Storage bucket.

XmlSource<Data> source = XmlSource.<Data>from("gs://<my-url>/TestData.xml") 
     .withRootElement("data") 
     .withRecordElement("record") 
     .withRecordClass(Data.class); 

p.apply(Read.from(source))
    .apply(RemoveDuplicates.<Data>create())
    .apply(ParDo.of(new XMLPipeline.CreateItemQtyMapping()))
    .apply(Combine.<String, Integer>perKey(new SumIntegers()))
    .apply("FormatResults", MapElements.via(
        new SimpleFunction<KV<String, Integer>, String>() {
            @Override
            public String apply(KV<String, Integer> input) {
                return input.getKey() + "," + input.getValue();
            }
        }))
    .apply(TextIO.Write.to("gs://<my-url>.appspot.com/pos-pipeline-output/ItemCounts"));

p.run(); 

However, I am getting this exception:

017-01-09T14:01:31.107Z: Error: (c88c756cabe0dbec): java.io.IOException: Failed to start reading from source: StaticValueProvider{value=gs://<my-url>/TestData.xml} range [48524, 97048) 
at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:534) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:387) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:217) 
at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:182) 
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:69) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:284) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:220) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:170) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:192) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:172) 
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:159) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.ClassCastException: com.sun.xml.internal.stream.XMLInputFactoryImpl cannot be cast to org.codehaus.stax2.XMLInputFactory2 
    at com.google.cloud.dataflow.sdk.io.XmlSource$XMLReader.setUpXMLParser(XmlSource.java:490) 
    at com.google.cloud.dataflow.sdk.io.XmlSource$XMLReader.startReading(XmlSource.java:356) 
    at com.google.cloud.dataflow.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:528) 
    at com.google.cloud.dataflow.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:281) 
    at com.google.cloud.dataflow.sdk.runners.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:531) 
    ... 14 more 

These are the dependencies in my pom.xml file:

<dependencies> 
<dependency> 
    <groupId>com.google.cloud.dataflow</groupId> 
    <artifactId>google-cloud-dataflow-java-sdk-all</artifactId> 
    <version>1.9.0</version> 
</dependency> 

<dependency> 
    <groupId>com.google.cloud</groupId> 
    <artifactId>google-cloud-storage</artifactId> 
    <version>0.7.0</version> 
</dependency> 

<dependency> 
    <groupId>org.codehaus.woodstox</groupId> 
    <artifactId>stax2-api</artifactId> 
    <version>4.0.0</version> 
</dependency> 

I can't tell what is wrong here. Could someone give me some pointers?

Thanks,

Abhishek

+0

This may be a bug. I will look into it further, but as a workaround you can use SDK 1.8.0.

Answers

1

This is a bit subtle, but it looks like you also need to include the appropriate runtime dependency. According to https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/XmlSource, you need to:

1. Explicitly declare a dependency on org.codehaus.woodstox:stax2-api
2. Include a compatible implementation on the classpath at run time, such as org.codehaus.woodstox:woodstox-core-asl

It looks like you have done #1 correctly, but not #2.
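One way to confirm whether a Stax2-capable implementation is actually being picked up is to ask StAX which factory it resolves to at run time. The following is a minimal sketch using only the JDK; the class name StaxFactoryCheck and the helper implementsStax2 are made up for illustration. It looks up org.codehaus.stax2.XMLInputFactory2 by name so that it still compiles and runs when stax2-api is absent. Run it with the same classpath as the pipeline.

```java
import javax.xml.stream.XMLInputFactory;

public class StaxFactoryCheck {

    // Hypothetical helper: true if the factory is an instance of the Stax2
    // type that XmlSource casts to. The type is looked up by name so this
    // compiles and runs even when stax2-api is not on the classpath.
    static boolean implementsStax2(XMLInputFactory factory) {
        try {
            Class<?> stax2 = Class.forName("org.codehaus.stax2.XMLInputFactory2");
            return stax2.isInstance(factory);
        } catch (ClassNotFoundException e) {
            return false; // stax2-api itself is missing
        }
    }

    public static void main(String[] args) {
        // Uses the standard StAX lookup to pick an implementation.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("StAX factory in use: " + factory.getClass().getName());
        System.out.println("Castable to XMLInputFactory2: " + implementsStax2(factory));
    }
}
```

If the second line prints false, the factory in use is not a Stax2 implementation, which matches the cast failure in the stack trace.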

0

To resolve java.lang.ClassCastException: com.sun.xml.internal.stream.XMLInputFactoryImpl cannot be cast to org.codehaus.stax2.XMLInputFactory2, the answer was to declare only the dependency on org.codehaus.woodstox:woodstox-core-asl, which already pulls in stax and stax2 as indirect dependencies (javax.xml.stream:stax-api, org.codehaus.woodstox:stax2-api).
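To check that the dependency really brings all three pieces onto the classpath, a small lookup by class name can help. This is only a sketch: WoodstoxClasspathCheck and isOnClasspath are hypothetical names, and com.ctc.wstx.stax.WstxInputFactory is assumed to be the factory class shipped in woodstox-core-asl.

```java
public class WoodstoxClasspathCheck {

    // Hypothetical helper: reports whether a class can be located by name.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, WoodstoxClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] expected = {
            "javax.xml.stream.XMLInputFactory",     // StAX API
            "org.codehaus.stax2.XMLInputFactory2",  // Stax2 API (stax2-api)
            "com.ctc.wstx.stax.WstxInputFactory"    // Woodstox implementation (assumed class name)
        };
        for (String name : expected) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Any line reporting MISSING points at the artifact that did not make it onto the runtime classpath.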