2017-12-17

pandas Categorical.from_codes

I need to assign labels to categorical data. Take the iris example:

import pandas as pd 
import numpy as np 
from sklearn.datasets import load_iris 

iris = load_iris() 

print "targets: ", np.unique(iris.target) 
print "targets: ", iris.target.shape 
print "target_names: ", np.unique(iris.target_names) 
print "target_names: ", iris.target_names.shape 

This prints:

targets: [0 1 2]
targets: (150L,)
target_names: ['setosa' 'versicolor' 'virginica']
target_names: (3L,)

To produce the desired labels, I use pandas.Categorical.from_codes:

print pd.Categorical.from_codes(iris.target, iris.target_names) 

[setosa, setosa, setosa, setosa, setosa, ..., virginica, virginica, virginica, virginica, virginica]
Length: 150
Categories (3, object): [setosa, versicolor, virginica]

Let's try a different example:

# I define new targets 
target = np.array([123,123,54,123,123,54,2,54,2]) 
target = np.array([1,1,3,1,1,3,2,3,2]) 
target_names = np.array(['paglia','gioele','papa']) 
#--- 
print "targets: ", np.unique(target) 
print "targets: ", target.shape 
print "target_names: ", np.unique(target_names) 
print "target_names: ", target_names.shape 

I try again to convert the categorical values into labels:

print pd.Categorical.from_codes(target, target_names)

I get this error message:

C:\Users\ianni\Anaconda2\lib\site-packages\pandas\core\categorical.pyc in from_codes(cls, codes, categories, ordered)
    459
    460         if len(codes) and (codes.max() >= len(categories) or codes.min() < -1):
--> 461             raise ValueError("codes need to be between -1 and "
    462                              "len(categories)-1")
    463

ValueError: codes need to be between -1 and len(categories)-1

Do you know why?

Answer

Do you know why?

If you take a closer look at the error traceback, you will see that the error is raised when the following condition is true:

codes.max() >= len(categories) 

Here is the traceback in your case:

In [128]: pd.Categorical.from_codes(target, target_names) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-128-c2b4f6ac2369> in <module>() 
----> 1 pd.Categorical.from_codes(target, target_names) 

~\Anaconda3_5.0\envs\py36\lib\site-packages\pandas\core\categorical.py in from_codes(cls, codes, categories, ordered) 
    619 
    620   if len(codes) and (codes.max() >= len(categories) or codes.min() < -1): 
--> 621    raise ValueError("codes need to be between -1 and " 
    622        "len(categories)-1") 
    623 

ValueError: codes need to be between -1 and len(categories)-1 

You can see that this condition is met:

In [133]: target.max() >= len(target_names) 
Out[133]: True 

That is, the largest value in target (123) is not a valid index into the three categories:

In [173]: target 
Out[173]: array([123, 123, 54, 123, 123, 54, 2, 54, 2]) 
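To make the range check concrete, here is a minimal sketch in Python 3 syntax. The sample code arrays are made up for illustration; only `target_names` comes from the question. It shows that codes are treated as positions in the categories array, that -1 is accepted as a marker for a missing value, and that anything larger than `len(categories) - 1` is rejected.

```python
import numpy as np
import pandas as pd

target_names = np.array(['paglia', 'gioele', 'papa'])

# Valid: every code is a position in target_names; -1 marks a missing value.
ok = pd.Categorical.from_codes(np.array([0, 2, 1, -1]), categories=target_names)

# Invalid: 123 and 54 are not positions in a 3-element category array.
bad_codes = np.array([123, 123, 54, 2])
try:
    pd.Categorical.from_codes(bad_codes, categories=target_names)
    raised = False
except ValueError:
    raised = True

print(ok)
print("ValueError raised for bad codes:", raised)
```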

As a workaround, build helper dicts that map the values onto valid codes:

In [174]: mapping = dict(zip(np.unique(target), np.arange(len(target_names)))) 

In [175]: mapping 
Out[175]: {2: 0, 54: 1, 123: 2} 

In [176]: reverse_mapping = {v:k for k,v in mapping.items()} 

In [177]: reverse_mapping 
Out[177]: {0: 2, 1: 54, 2: 123} 

Build the categorical:

In [178]: ser = pd.Categorical.from_codes(pd.Series(target).map(mapping), target_names) 

In [179]: ser 
Out[179]: 
[papa, papa, gioele, papa, papa, gioele, paglia, gioele, paglia] 
Categories (3, object): [paglia, gioele, papa] 

This works because pd.Categorical.from_codes() expects codes to be sequential integers from 0 up to len(categories) - 1, which is what the mapping produces.
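One caveat with a dict-based mapping: `Series.map` returns NaN for any value that is not a key in the dict, and NaN is not a valid code. The sketch below (Python 3 syntax) assumes a hypothetical unseen value, 999, and turns it into the -1 code so that it shows up as a missing entry instead of raising. Whether unseen values should be treated as missing is an assumption here, not something stated in the question.

```python
import numpy as np
import pandas as pd

target_names = np.array(['paglia', 'gioele', 'papa'])
mapping = {2: 0, 54: 1, 123: 2}

# 999 is a hypothetical value that has no entry in the mapping.
values = pd.Series([123, 54, 2, 999])

# Unmapped values become NaN after map(); convert them to the -1 "missing" code.
codes = values.map(mapping).fillna(-1).astype(int).to_numpy()

cat = pd.Categorical.from_codes(codes, categories=target_names)
print(cat)
```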

Reverse mapping:

In [180]: pd.Series(ser.codes).map(reverse_mapping) 
Out[180]: 
0 123 
1 123 
2  54 
3 123 
4 123 
5  54 
6  2 
7  54 
8  2 
dtype: int64
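As an alternative to the helper dicts, the same codes can be obtained with `np.unique(..., return_inverse=True)`, which returns the sorted unique values together with the position of each element among them. This is a sketch in Python 3 syntax, not part of the original answer. Like the dict approach, it assumes that `target_names` is listed in the order of the sorted values (2, 54, 123).

```python
import numpy as np
import pandas as pd

target = np.array([123, 123, 54, 123, 123, 54, 2, 54, 2])
target_names = np.array(['paglia', 'gioele', 'papa'])

# uniques holds the sorted distinct values; codes[i] is the position
# of target[i] within uniques, so codes always lie in 0..len(uniques)-1.
uniques, codes = np.unique(target, return_inverse=True)

labels = pd.Categorical.from_codes(codes, categories=target_names)

# Reverse mapping: index the unique values with the categorical's codes.
restored = uniques[labels.codes]

print(labels)
print(restored)
```

This only works if the number of names equals the number of distinct values; with fewer names than distinct values, from_codes raises the same ValueError as in the question.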