Pandas DataFrame from NumPy array - incorrect data types and can't change

I am trying to sort the following pandas DataFrame in Python 2.7:

import numpy as np 
import pandas as pd 

heading_cols = ["Video Title", "Up Ratings", "Down Ratings", "Views", "User Name","Subscribers"] 
column_1 = ["Adelaide","Brisbane","Darwin","Hobart","Sydney","Melbourne","Perth"] 
column_2 = [1295, 5905, 112, 1357, 2058, 1566, 5386] 
column_3 = [1158259, 1857594, 120900, 205556, 4336374, 3806092, 1554769] 
column_4 = [600.5, 1146.4, 1714.7, 619.5, 1214.8, 646.9, 869.4] 
column_5 = ["Bob","Tom","Dave","Sally","Rick","Mary","Roberta"] 
column_6 = [25000,30000,15000,15005,20000,31111,11000] 

#Generate data: 
xdata_arr = np.array([column_1,column_2,column_3,column_4,column_5,column_6]).T 

# Generate the DataFrame: 
df = pd.DataFrame(xdata_arr, columns=heading_cols) 
print df

The following two lines cause problems:

# Print DataFrame and basic stats: 
print df["Up Ratings"].describe() 
print df.sort('Views', ascending=False)

Problems:

Sorting by a column does not work.
The statistics should include mean, std, min, max and so on, but none of these are displayed.

The cause is that df.dtypes reports "object" for every column. That is wrong: some of the columns should be integers, but I can't figure out how to convert only the numeric ones. I tried:

df.convert_objects(convert_numeric=True)

But this has no effect. So I went back to the NumPy array and tried to set the dtypes there:

dt = np.dtype([(heading_cols[0], np.str_), (heading_cols[1], np.int16), (heading_cols[2], np.int16), (heading_cols[3], np.int16), (heading_cols[4], np.str_), (heading_cols[5], np.int16) ])

But this doesn't work either.

Is there a way to manually change the dtype to a numeric one?
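
For context on why every column comes back as "object": a plain NumPy array holds a single dtype, so mixing text and numbers turns every element into a string before pandas ever sees the data. The following is only a minimal sketch on a three-row subset of the question's data. Building the frame from a dict of columns is an alternative that is not part of the original thread, and sort_values is the sorting method in current pandas (the question's df.sort belongs to the older API).

```python
import numpy as np
import pandas as pd

titles = ["Adelaide", "Brisbane", "Darwin"]
up_ratings = [1295, 5905, 112]
views = [600.5, 1146.4, 1714.7]

# One array, one dtype: text mixed with numbers becomes all text.
mixed = np.array([titles, up_ratings, views]).T
from_array = pd.DataFrame(mixed, columns=["Video Title", "Up Ratings", "Views"])

# Assumption (not from the thread): pass the columns as a dict instead,
# so pandas infers a separate dtype for each column.
from_dict = pd.DataFrame({
    "Video Title": titles,
    "Up Ratings": up_ratings,
    "Views": views,
})

print(from_dict.dtypes)
print(from_dict["Up Ratings"].describe())

# With numeric dtypes, sorting compares numbers rather than strings.
by_views = from_dict.sort_values("Views", ascending=False)
print(by_views)
```

With numeric columns in place, describe() reports mean, std, min and max, and the sort orders rows by value.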

Source

2014-10-10 W R

`convert_object()`, like most pandas methods, returns a new object: `df = df.convert_object(convert_numeric=True)` – Jeff

OK, I just tried that, but I get this message: Traceback (most recent call last): File "C:\Python27\testing.py" df = pd.DataFrame(xdata_arr, columns=heading_cols).convert_object(convert_numeric=True) File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1843, in __getattr__ (type(self).__name__, name) AttributeError: 'DataFrame' object has no attribute 'convert_object' –

Typo: `convert_objects` – Jeff

As with most things in pandas, convert_objects returns a NEW object.

In [20]: df.convert_objects(convert_numeric=True)
Out[20]:
   Video Title  Up Ratings  Down Ratings   Views  User Name  Subscribers
0     Adelaide        1295       1158259   600.5        Bob        25000
1     Brisbane        5905       1857594  1146.4        Tom        30000
2       Darwin         112        120900  1714.7       Dave        15000
3       Hobart        1357        205556   619.5      Sally        15005
4       Sydney        2058       4336374  1214.8       Rick        20000
5    Melbourne        1566       3806092   646.9       Mary        31111
6        Perth        5386       1554769   869.4    Roberta        11000

In [21]: df.convert_objects(convert_numeric=True).dtypes
Out[21]:
Video Title      object
Up Ratings        int64
Down Ratings      int64
Views           float64
User Name        object
Subscribers       int64
dtype: object
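
Note that convert_objects was later deprecated and eventually removed from pandas. On current versions, one possible substitute is to apply pd.to_numeric column by column. The sketch below assumes a small all-string frame like the one in the question. The helper name to_numeric_where_possible is hypothetical and not a pandas function.

```python
import pandas as pd

# Small all-string frame, mimicking what the NumPy round trip produces.
df = pd.DataFrame({
    "Video Title": ["Adelaide", "Brisbane", "Darwin"],
    "Up Ratings": ["1295", "5905", "112"],
    "Views": ["600.5", "1146.4", "1714.7"],
})


def to_numeric_where_possible(frame):
    """Hypothetical helper: return a NEW frame in which every column
    that parses completely as numbers has been converted."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column holds non-numeric text (e.g. names); leave it alone.
            pass
    return out


converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

Like convert_objects, the helper leaves the original frame untouched, so the result has to be assigned to a name.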

Source

2014-10-10 14:49:48 Jeff

Thanks. Following your example, I was able to do everything in one line: create the DataFrame and convert it at the same time. More importantly, I hadn't realized the "returns a new object" part. To get around that, why can't I use "inplace=True", as in df.convert_objects(convert_numeric=True, inplace=True)? <--- I tried this and got an error message. –

Right, `convert_objects` has no inplace option, and in general I'm -1 on using inplace at all. It is not intuitive for the reader and offers almost no performance benefit. – Jeff
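
The "returns a new object" behaviour discussed in these comments can be illustrated with any non-mutating method. The sketch below uses astype as a stand-in (an assumption made for illustration, since convert_objects is not available in current pandas) to show that the result must be assigned back to a name.

```python
import pandas as pd

df = pd.DataFrame({"Subscribers": ["25000", "30000", "15000"]})

# The converted frame is returned and then discarded; df is unchanged.
df.astype({"Subscribers": "int64"})
dtype_before = df["Subscribers"].dtype

# Rebinding the name keeps the converted frame.
df = df.astype({"Subscribers": "int64"})
dtype_after = df["Subscribers"].dtype

print(dtype_before, dtype_after)
```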

Thanks. That answers all of my questions. –
