Databricks ETL -> BigQuery: WRITE_TRUNCATE not working

I'm running an ETL in Databricks and writing the results to BigQuery. I'm trying to get "WRITE_TRUNCATE" to work, i.e. to have the job overwrite the data every time it runs. This involves changing the BigQuery configuration.

I've tried many things but cannot get it to work. Here is my current code:

import com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration
val conf = sc.hadoopConfiguration
conf.set(BigQueryConfiguration.OUTPUT_TABLE_WRITE_DISPOSITION_KEY, "WRITE_TRUNCATE")

It fails with this error:

"error: value OUTPUT_TABLE_WRITE_DISPOSITION_KEY is not a member of object com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration"

Any ideas? Thank you!

Source

2017-09-05 Ashley O

Have you seen [link](https://github.com/GoogleCloudPlatform/bigdata-interop/issues/43)? Specifically: `conf.set("mapreduce.job.outputformat.class", classOf[IndirectBigQueryOutputFormat[_, _]].getName)`

Yes, I saw that and tried many variations, but the one I did get to run still duplicated the data.

Try the following code block to set the configuration:

import com.google.cloud.hadoop.io.bigquery.BigQueryFileFormat
import com.google.cloud.hadoop.io.bigquery.output.BigQueryOutputConfiguration
import com.google.cloud.hadoop.io.bigquery.output.IndirectBigQueryOutputFormat
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Describe the output table and the temporary GCS path used for staging.
BigQueryOutputConfiguration.configure(conf, projectId, outputDatasetId, outputTableId, outputSchema, Temp_Gcs_path, BigQueryFileFormat.NEWLINE_DELIMITED_JSON, classOf[TextOutputFormat[_, _]])
...
// Write through the indirect output format (files are staged in GCS, then loaded into BigQuery).
conf.set("mapreduce.job.outputformat.class", classOf[IndirectBigQueryOutputFormat[_, _]].getName)

It would be easier to understand the problem if you could add more details to the question, i.e. what you are trying to achieve, the use case, the complete pseudocode, etc.
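One more thing that may be worth trying, as a sketch rather than a confirmed fix: the compile error in the question only says that the constant is missing from the connector version on the cluster, so the property could be set by its string key instead. The key name below is an assumption (it is what I believe newer connector versions use behind OUTPUT_TABLE_WRITE_DISPOSITION_KEY), so please check it against the connector jar you actually have. A connector old enough to lack the constant may also simply ignore the property, in which case upgrading the connector would be the real fix.

```scala
import org.apache.hadoop.conf.Configuration

// Assumption: this is the property name behind
// BigQueryConfiguration.OUTPUT_TABLE_WRITE_DISPOSITION_KEY in connector
// versions that define the constant. Verify it for your connector version.
val writeDispositionKey = "mapred.bq.output.table.writedisposition"

// A fresh Configuration keeps the sketch self-contained;
// in a Databricks notebook this would be sc.hadoopConfiguration.
val conf = new Configuration()

// Ask the load into BigQuery to replace the table contents on every run.
conf.set(writeDispositionKey, "WRITE_TRUNCATE")

// Read the value back to confirm it was stored in the configuration.
println(conf.get(writeDispositionKey))
```

Setting the key as a plain string avoids the compile-time dependency on the missing constant, but it does not by itself prove that the connector honors the setting; re-running the job and checking that the row count does not grow would confirm it.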

Source

2017-09-15 10:04:01
