pyspark: distinctCount - AnalysisException: u"cannot resolve 'X' given input columns"

I have the following data frame, and I want to keep only the rows whose Id has distinctCount(field_A) = 1:

Id | field_A | field_B | field_C | field_D 
1 | cat | 12  | black | 11 
1 | dog | 128  | white | 19 
2 | dog | 35  | yellow | 20 
2 | dog | 21  | brown | 4 
3 | bird | 10  | blue | 7 
4 | cow | 99  | brown | 34

That is, each Id that is kept has only ONE TYPE of animal. The final result should be:

Id | field_A | field_B | field_C | field_D 
2 | dog | 35  | yellow | 20 
2 | dog | 21  | brown | 4 
3 | bird | 10  | blue | 7 
4 | cow | 99  | brown | 34

I started with the approach below:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).alias('distinct_A_count').filter('distinct_A_count = 1').show(20,False)

but then I got the following error:

AnalysisException: u"cannot resolve 'distinct_A_count' given input columns: [Id, count(field_A)];"

Does anyone know what I did wrong? Thanks!

Source

2016-06-24 Edamame
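
To pin down what the question asks for, here is a minimal plain-Python sketch of the intended filter, run on the sample rows from the table above. It does not use Spark. The names `rows`, `types_by_id`, and `expected` are illustrative assumptions, not part of the original post.

```python
from collections import defaultdict

# Sample rows copied from the question: (Id, field_A, field_B, field_C, field_D).
rows = [
    (1, "cat", 12, "black", 11),
    (1, "dog", 128, "white", 19),
    (2, "dog", 35, "yellow", 20),
    (2, "dog", 21, "brown", 4),
    (3, "bird", 10, "blue", 7),
    (4, "cow", 99, "brown", 34),
]

# Collect the distinct field_A values seen for each Id.
types_by_id = defaultdict(set)
for row in rows:
    types_by_id[row[0]].add(row[1])

# Keep only the rows whose Id maps to exactly one distinct field_A value.
expected = [row for row in rows if len(types_by_id[row[0]]) == 1]

for row in expected:
    print(" | ".join(str(value) for value in row))
```

Id 1 is dropped because it has two distinct animals (cat and dog). Ids 2, 3, and 4 each have exactly one, so all of their rows are kept.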

I got it to work by using withColumnRenamed instead of alias:

myDF.groupBy(['Id']).agg(countDistinct('field_A')).withColumnRenamed('count(field_A)','distinct_A_count').filter('distinct_A_count = 1').show(20,False)

Source

2016-06-24 21:51:07 Edamame
