2016-09-26 7 views
3

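Querying a pandas DataFrame to filter rows where a column is not NaN

I am new to Python and pandas. I want to query a DataFrame and keep only the rows where one of the columns is not NaN.

I tried:

a=dictionarydf.label.isnull() 

but a is just populated with True or False values. I also tried:

dictionarydf.query(dictionarydf.label.isnull()) 

but, as expected, that raised an error.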

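Sample data is in the first table below. I would like to filter the data to the rows where label is not NaN; the expected output is in the second table.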

 reference_word   all_matching_words label review 
0   account    fees - account NaN  N 
1   account   mobile - account NaN  N 
2   account   monthly - account NaN  N 
3 administration delivery - administration NaN  N 
4 administration  fund - administration NaN  N 
5   advisor    fees - advisor NaN  N 
6   advisor   optimum - advisor NaN  N 
7   advisor    sub - advisor NaN  N 
8    aichi   delivery - aichi NaN  N 
9    aichi    pref - aichi NaN  N 
10   airport    biz - airport travel  N 
11   airport    cfo - airport travel  N 
12   airport   cfomtg - airport travel  N 
13   airport   meeting - airport travel  N 
14   airport   summit - airport travel  N 
15   airport    taxi - airport travel  N 
16   airport   train - airport travel  N 
17   airport   transfer - airport travel  N 
18   airport    trip - airport travel  N 
19    ais    admin - ais NaN  N 
20    ais    alpine - ais NaN  N 
21    ais     fund - ais NaN  N 
22  allegiance  custody - allegiance NaN  N 
23  allegiance   fees - allegiance NaN  N 
24   alpha    late - alpha NaN  N 
25   alpha    meal - alpha NaN  N 
26   alpha    taxi - alpha NaN  N 
27   alpine    admin - alpine NaN  N 
28   alpine    ais - alpine NaN  N 
29   alpine    fund - alpine NaN  N 

 reference_word   all_matching_words label review 
0   airport    biz - airport travel  N 
1   airport    cfo - airport travel  N 
2   airport   cfomtg - airport travel  N 
3   airport   meeting - airport travel  N 
4   airport   summit - airport travel  N 
5   airport    taxi - airport travel  N 
6   airport   train - airport travel  N 
7   airport   transfer - airport travel  N 
8   airport    trip - airport travel  N 

Answers

3

You can use dropna:

df = df.dropna(subset=['label']) 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 
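
For anyone who wants to run this without the original data, here is a minimal sketch of the same dropna call. The four-row frame is a made-up miniature of the question's table, not the asker's real dictionarydf, and the names filtered and renumbered are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the original frame).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "taxi - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Drop only the rows whose 'label' is NaN; other columns are not checked.
filtered = df.dropna(subset=["label"])

# The surviving rows keep their original index labels (1 and 2 here).
# reset_index(drop=True) renumbers them from 0 if that is what you want.
renumbered = filtered.reset_index(drop=True)

print(filtered)
print(renumbered)
```

Note that the expected output in the question is numbered from 0, while the output above keeps the original index values, so the reset_index step is probably what closes that gap.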

Another solution is boolean indexing with notnull:

df = df[df.label.notnull()] 

print (df) 
    reference_word all_matching_words label review 
10  airport  biz - airport travel  N 
11  airport  cfo - airport travel  N 
12  airport cfomtg - airport travel  N 
13  airport meeting - airport travel  N 
14  airport summit - airport travel  N 
15  airport  taxi - airport travel  N 
16  airport  train - airport travel  N 
17  airport transfer - airport travel  N 
18  airport  trip - airport travel  N 
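
A small self-contained sketch of the boolean-indexing approach follows, again on a made-up three-row frame rather than the asker's data. It also shows one way the query call from the question could be made to work: query takes a string expression, not a boolean Series, which is likely why the original attempt failed. Passing engine="python" is an assumption made here to keep the method call inside the expression on the safe side.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the original frame).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "ais"],
    "label": [np.nan, "travel", np.nan],
})

# notnull() returns a boolean Series, one value per row.
mask = df["label"].notnull()

# Indexing the frame with that Series keeps only the rows where it is True.
by_mask = df[mask]

# query() expects the condition as a string expression.
by_query = df.query("label.notnull()", engine="python")

print(by_mask)
print(by_query)
```

Both by_mask and by_query are new DataFrame objects, and df itself is left unchanged. notna is an alias of notnull in current pandas, so either spelling should behave the same.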
+0

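Thanks @jezrael for the quick answer that solved the problem. I chose boolean indexing because I don't need to drop rows or create a duplicate DataFrame. Both solutions worked perfectly. – Dileep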