Multiple Hive partitions with kafka-connect

I have been trying to use kafka-connect to stream data into HDFS, with Hive integration enabled during the process.

In my use case I need to use "FieldPartitioner" as the partitioner class.

My problem is that I am unable to get multiple partitions.

Example:

{ 
    "_id": "582d666ff6e02edad83cae28", 
    "index": "ENAUT", 
    "mydate": "03-01-2016", 
    "hour": 120000, 
    "balance": "$2,705.80" 
}

I want to have partitions based on 'mydate' and 'hour'.

For my example JSON, I tried the following configuration:

name=hdfs-sink 
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector 
tasks.max=1 
topics=fieldPartition_test_hdfs 
hdfs.url=hdfs://quickstart.cloudera:8020 
flush.size=3 

partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner 
partition.field.name={mydate,hour} 

locale=en 
timezone=GMT 

hive.database=weblogs 
hive.integration=true 
hive.metastore.uris=thrift://quickstart.cloudera:9083 
schema.compatibility=BACKWARD

Attempts: I have tried specifying partition.field.name as

partition.field.name={'mydate','hour'}

and

partition.field.name=mydate,hour

and many more such combinations.

Any help on this issue would be greatly appreciated.

Thanks.

Source

2016-11-18 Khal Drogo

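I tried this every possible way and later started digging into the source code.

The code of FieldPartitioner is here!

And the last commit to the file here shows "Revert 'Support multi partition fields'", made 2 months ago.

Please do let me know if you have any other solution.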
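
To make the goal concrete, here is a minimal sketch (in Python, not the connector's Java code) of what a multi-field partitioner would have to compute: a Hive-style directory path with one field=value segment per partition field. The function name encode_partition and its parameters are hypothetical and are not part of the Confluent API. In practice, equivalent logic would presumably have to live in a custom partitioner class referenced from partitioner.class, which is not shown here.

```python
# Hypothetical sketch: illustrates multi-field partition paths only.
# This is NOT the Confluent FieldPartitioner implementation.

def encode_partition(record, field_names, delim="/"):
    """Build a Hive-style partition path, one 'field=value' segment per field."""
    parts = []
    for name in field_names:
        if name not in record:
            # A record missing a partition field cannot be placed in a partition.
            raise KeyError(f"record has no partition field {name!r}")
        parts.append(f"{name}={record[name]}")
    return delim.join(parts)


# The example record from the question.
record = {
    "_id": "582d666ff6e02edad83cae28",
    "index": "ENAUT",
    "mydate": "03-01-2016",
    "hour": 120000,
    "balance": "$2,705.80",
}

path = encode_partition(record, ["mydate", "hour"])
print(path)  # mydate=03-01-2016/hour=120000
```

With the record from the question, the two fields yield the nested layout mydate=03-01-2016/hour=120000, which is the directory structure Hive expects for a table partitioned by mydate and then hour.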

Source

2016-11-18 11:35:41
