2013-10-25 5 views

For bureaucratic reasons, I am working with Pandas 0.8.1 in an environment where upgrading is not possible. I have run into many problems with datetime.date in set_index, groupby, and apply on pandas 0.8.1.

You may want to skip to the "Simplified problem" section below before reading all about the initial problem and my goal.

My goal: group a DataFrame by a categorical column "D"; then, for each group, sort by a date column "dt", set the index to "dt", perform a rolling OLS regression, and return a DataFrame beta of regression coefficients indexed by date.

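The end result would be a stack of beta frames, each one specific to a particular value of the categorical variable, so the final index would have two levels: one for the category ID and one for the date.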
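The goal described above can be made concrete with a small library-independent sketch. It is plain Python rather than pandas; the helper names ols_fit and rolling_betas and the sample data are made up for illustration, and the window handling only mirrors the window/min_periods idea, not the internals of pandas.ols.

```python
import datetime
from collections import defaultdict


def ols_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept.

    Assumes the x values are not all equal (otherwise sxx is zero).
    """
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rolling_betas(rows, window=60, min_periods=12):
    """rows: iterable of (category, date, x, y) tuples.

    Returns {(category, date): (slope, intercept)}, i.e. results keyed by
    category first and date second.
    """
    groups = defaultdict(list)
    for cat, dt, x, y in rows:
        groups[cat].append((dt, x, y))

    betas = {}
    for cat, obs in groups.items():
        obs.sort(key=lambda r: r[0])  # sort each group by date
        for end in range(len(obs)):
            chunk = obs[max(0, end - window + 1):end + 1]
            if len(chunk) < min_periods:
                continue  # not enough observations in the window yet
            xs = [r[1] for r in chunk]
            ys = [r[2] for r in chunk]
            betas[(cat, obs[end][0])] = ols_fit(xs, ys)
    return betas


# Hypothetical data: two categories, daily dates, and y = 2x + 1 exactly.
start = datetime.date(2000, 1, 1)
rows = [(i % 2, start + datetime.timedelta(days=i), float(i), 2.0 * i + 1.0)
        for i in range(40)]
betas = rolling_betas(rows, window=10, min_periods=5)
```

Each key of betas pairs a category with a date, which is the two-level layout the stacked beta frames are meant to have.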

If I do something like

my_dataframe.groupby("D").apply(some_wrapped_OLS_caller)

then I get the often hopelessly uninformative KeyError: 0 error, and the traceback seems to choke on a date problem:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity) 
    2287    arrays.append(level) 
    2288 
-> 2289   index = MultiIndex.from_arrays(arrays, names=keys) 
    2290 
    2291   if verify_integrity and not index.is_unique: 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names) 
    1505   if len(arrays) == 1: 
    1506    name = None if names is None else names[0] 
-> 1507    return Index(arrays[0], name=name) 
    1508 
    1509   cats = [Categorical.from_array(arr) for arr in arrays] 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name) 
    102   if dtype is None: 
    103    if (lib.is_datetime_array(subarr) 
--> 104     or lib.is_datetime64_array(subarr) 
    105     or lib.is_timestamp_array(subarr)): 
    106     from pandas.tseries.index import DatetimeIndex 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key) 
    427  def __getitem__(self, key): 
    428   try: 
--> 429    return self.index.get_value(self, key) 
    430   except InvalidIndexError: 
    431    pass 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key) 
    639   """ 
    640   try: 
--> 641    return self._engine.get_value(series, key) 
    642   except KeyError, e1: 
    643    if len(self) > 0 and self.inferred_type == 'integer': 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)() 

KeyError: 0 

If I perform the regression steps manually on each group of the groupby object, one at a time, everything works without a hitch.

Here is what happens when I try this with groupby and apply (foo and dfrm_test are defined in the code further down):

In [102]: dfrm_test.groupby("d").apply(foo) 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-102-345a8d45df50> in <module>() 
----> 1 dfrm_test.groupby("d").apply(foo) 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs) 
    267   applied : type depending on grouped object and function 
    268   """ 
--> 269   return self._python_apply_general(func, *args, **kwargs) 
    270 
    271  def aggregate(self, func, *args, **kwargs): 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs) 
    402    group_axes = _get_axes(group) 
    403 
--> 404    res = func(group, *args, **kwargs) 
    405 
    406    if not _is_indexed_like(res, group_axes): 

<ipython-input-101-8b9184c63365> in foo(zz) 
     1 def foo(zz): 
----> 2  zz1 = zz.sort("dt", ascending=True).set_index("dt") 
     3  r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12) 
     4  return r1.beta 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity) 
    2287    arrays.append(level) 
    2288 
-> 2289   index = MultiIndex.from_arrays(arrays, names=keys) 
    2290 
    2291   if verify_integrity and not index.is_unique: 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names) 
    1505   if len(arrays) == 1: 
    1506    name = None if names is None else names[0] 
-> 1507    return Index(arrays[0], name=name) 
    1508 
    1509   cats = [Categorical.from_array(arr) for arr in arrays] 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name) 
    102   if dtype is None: 
    103    if (lib.is_datetime_array(subarr) 
--> 104     or lib.is_datetime64_array(subarr) 
    105     or lib.is_timestamp_array(subarr)): 
    106     from pandas.tseries.index import DatetimeIndex 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key) 
    427  def __getitem__(self, key): 
    428   try: 
--> 429    return self.index.get_value(self, key) 
    430   except InvalidIndexError: 
    431    pass 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key) 
    639   """ 
    640   try: 
--> 641    return self._engine.get_value(series, key) 
    642   except KeyError, e1: 
    643    if len(self) > 0 and self.inferred_type == 'integer': 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)() 

KeyError: 0 

The code that defines foo and the test data:

import numpy as np 
import pandas 
import datetime 
from dateutil.relativedelta import relativedelta as drr 

def foo(zz): 
    zz1 = zz.sort("dt", ascending=True).set_index("dt") 
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12) 
    return r1.beta 

dfrm_test = pandas.DataFrame({"x":np.random.rand(731), 
           "y1":np.random.rand(731), 
           "y2":np.random.rand(731), 
           "z":np.random.rand(731)}) 

dfrm_test['d'] = np.random.randint(0,2, size= (len(dfrm_test),)) 
dfrm_test['dt'] = [datetime.date(2000, 1, 1) + drr(days=i) 
        for i in range(len(dfrm_test))] 

If I save the groupby object and try to apply foo to each group myself, it fails in exactly the same way.

In [103]: grps = dfrm_test.groupby("d") 

In [104]: for grp in grps: 
    foo(grp[1]) 
    .....: 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-104-f215ff55c12b> in <module>() 
     1 for grp in grps: 
----> 2  foo(grp[1]) 
     3 

<ipython-input-101-8b9184c63365> in foo(zz) 
     1 def foo(zz): 
----> 2  zz1 = zz.sort("dt", ascending=True).set_index("dt") 
     3  r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12) 
     4  return r1.beta 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity) 
    2287    arrays.append(level) 
    2288 
-> 2289   index = MultiIndex.from_arrays(arrays, names=keys) 
    2290 
    2291   if verify_integrity and not index.is_unique: 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names) 
    1505   if len(arrays) == 1: 
    1506    name = None if names is None else names[0] 
-> 1507    return Index(arrays[0], name=name) 
    1508 
    1509   cats = [Categorical.from_array(arr) for arr in arrays] 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name) 
    102   if dtype is None: 
    103    if (lib.is_datetime_array(subarr) 
--> 104     or lib.is_datetime64_array(subarr) 
    105     or lib.is_timestamp_array(subarr)): 
    106     from pandas.tseries.index import DatetimeIndex 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key) 
    427  def __getitem__(self, key): 
    428   try: 
--> 429    return self.index.get_value(self, key) 
    430   except InvalidIndexError: 
    431    pass 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key) 
    639   """ 
    640   try: 
--> 641    return self._engine.get_value(series, key) 
    642   except KeyError, e1: 
    643    if len(self) > 0 and self.inferred_type == 'integer': 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)() 

KeyError: 0 

But if I store one of the group DataFrames and then call foo on it, it works just fine ... ??

In [105]: for grp in grps: 
    x = grp[1] 
    .....: 

In [106]: x.head() 
Out[106]: 
      x  y1  y2   z   dt d 
0 0.240858 0.235135 0.196027 0.940180 2000-01-01 1 
1 0.115784 0.802576 0.870014 0.482418 2000-01-02 1 
2 0.081640 0.939411 0.344041 0.846485 2000-01-03 1 
5 0.608413 0.100349 0.306595 0.739987 2000-01-06 1 
6 0.429635 0.678575 0.449520 0.362761 2000-01-07 1 

In [107]: foo(x) 
Out[107]: 
<class 'pandas.core.frame.DataFrame'> 
Index: 360 entries, 2000-01-17 to 2001-12-29 
Data columns: 
x   360 non-null values 
intercept 360 non-null values 
dtypes: float64(2) 

What is going on here? Is it related to when the logic that triggers conversion to the wrong date/time type gets tripped? How can I work around it?

Simplified problem

I can reduce the problem to just the set_index call inside the applied function. But this is where it gets really weird. Here is an example with a simpler test DataFrame, using only set_index.

In [154]: tdf = pandas.DataFrame(
    {"dt":([datetime.date(2000,1,i+1) for i in range(12)] + 
      [datetime.date(2001,3,j+1) for j in range(13)]), 
    "d":np.random.randint(1,4,(25,)), 
    "x":np.random.rand(25)}) 

In [155]: tdf 
Out[155]: 
    d   dt   x 
0 1 2000-01-01 0.430667 
1 3 2000-01-02 0.159652 
2 1 2000-01-03 0.719015 
3 1 2000-01-04 0.175328 
4 3 2000-01-05 0.233810 
5 3 2000-01-06 0.581176 
6 1 2000-01-07 0.912615 
7 1 2000-01-08 0.534971 
8 3 2000-01-09 0.373345 
9 1 2000-01-10 0.182665 
10 1 2000-01-11 0.286681 
11 3 2000-01-12 0.054054 
12 3 2001-03-01 0.861348 
13 1 2001-03-02 0.093717 
14 2 2001-03-03 0.729503 
15 1 2001-03-04 0.888558 
16 1 2001-03-05 0.263055 
17 1 2001-03-06 0.558430 
18 3 2001-03-07 0.064216 
19 3 2001-03-08 0.018823 
20 3 2001-03-09 0.207845 
21 2 2001-03-10 0.735640 
22 2 2001-03-11 0.908427 
23 2 2001-03-12 0.819994 
24 2 2001-03-13 0.798267 

set_index works here without any problem, and the dates are not altered.

In [156]: tdf.set_index("dt") 
Out[156]: 
      d   x 
dt 
2000-01-01 1 0.430667 
2000-01-02 3 0.159652 
2000-01-03 1 0.719015 
2000-01-04 1 0.175328 
2000-01-05 3 0.233810 
2000-01-06 3 0.581176 
2000-01-07 1 0.912615 
2000-01-08 1 0.534971 
2000-01-09 3 0.373345 
2000-01-10 1 0.182665 
2000-01-11 1 0.286681 
2000-01-12 3 0.054054 
2001-03-01 3 0.861348 
2001-03-02 1 0.093717 
2001-03-03 2 0.729503 
2001-03-04 1 0.888558 
2001-03-05 1 0.263055 
2001-03-06 1 0.558430 
2001-03-07 3 0.064216 
2001-03-08 3 0.018823 
2001-03-09 3 0.207845 
2001-03-10 2 0.735640 
2001-03-11 2 0.908427 
2001-03-12 2 0.819994 
2001-03-13 2 0.798267 

But groupby cannot successfully call set_index (note that it errors out before it could hit any problem with unpacking awkwardly sized results; it simply cannot set the index at all).

In [157]: tdf.groupby("d").apply(lambda x: x.set_index("dt")) 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-157-cf2d3964f4d3> in <module>() 
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt")) 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs) 
    267   applied : type depending on grouped object and function 
    268   """ 
--> 269   return self._python_apply_general(func, *args, **kwargs) 
    270 
    271  def aggregate(self, func, *args, **kwargs): 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs) 
    402    group_axes = _get_axes(group) 
    403 
--> 404    res = func(group, *args, **kwargs) 
    405 
    406    if not _is_indexed_like(res, group_axes): 

<ipython-input-157-cf2d3964f4d3> in <lambda>(x) 
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt")) 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity) 
    2287    arrays.append(level) 
    2288 
-> 2289   index = MultiIndex.from_arrays(arrays, names=keys) 
    2290 
    2291   if verify_integrity and not index.is_unique: 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names) 
    1505   if len(arrays) == 1: 
    1506    name = None if names is None else names[0] 
-> 1507    return Index(arrays[0], name=name) 
    1508 
    1509   cats = [Categorical.from_array(arr) for arr in arrays] 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name) 
    102   if dtype is None: 
    103    if (lib.is_datetime_array(subarr) 
--> 104     or lib.is_datetime64_array(subarr) 
    105     or lib.is_timestamp_array(subarr)): 
    106     from pandas.tseries.index import DatetimeIndex 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key) 
    427  def __getitem__(self, key): 
    428   try: 
--> 429    return self.index.get_value(self, key) 
    430   except InvalidIndexError: 
    431    pass 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key) 
    639   """ 
    640   try: 
--> 641    return self._engine.get_value(series, key) 
    642   except KeyError, e1: 
    643    if len(self) > 0 and self.inferred_type == 'integer': 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)() 

KeyError: 0 

Here is the very weird part

I save the group objects and try to call set_index on them manually. This does not work. Even if I save the specific DataFrame element of the group, it does not work.

In [159]: grps = tdf.groupby("d") 

In [160]: grps 
Out[160]: <pandas.core.groupby.DataFrameGroupBy at 0x7600bd0> 

In [161]: grps_list = [(x,y) for x,y in grps] 

In [162]: grps_list[2][1].set_index("dt") 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-162-77f985a6e063> in <module>() 
----> 1 grps_list[2][1].set_index("dt") 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity) 
    2287    arrays.append(level) 
    2288 
-> 2289   index = MultiIndex.from_arrays(arrays, names=keys) 
    2290 
    2291   if verify_integrity and not index.is_unique: 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names) 
    1505   if len(arrays) == 1: 
    1506    name = None if names is None else names[0] 
-> 1507    return Index(arrays[0], name=name) 
    1508 
    1509   cats = [Categorical.from_array(arr) for arr in arrays] 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name) 
    102   if dtype is None: 
    103    if (lib.is_datetime_array(subarr) 
--> 104     or lib.is_datetime64_array(subarr) 
    105     or lib.is_timestamp_array(subarr)): 
    106     from pandas.tseries.index import DatetimeIndex 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key) 
    427  def __getitem__(self, key): 
    428   try: 
--> 429    return self.index.get_value(self, key) 
    430   except InvalidIndexError: 
    431    pass 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key) 
    639   """ 
    640   try: 
--> 641    return self._engine.get_value(series, key) 
    642   except KeyError, e1: 
    643    if len(self) > 0 and self.inferred_type == 'integer': 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)() 

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)() 

KeyError: 0 

But if I build a direct copy of the group's DataFrame by hand, set_index does work on the manual reconstruction? To quote the pirates from the first few episodes of Archer season 3: what hell damn guy?

In [163]: grps_list[2][1] 
Out[163]: 
    d   dt   x 
1 3 2000-01-02 0.159652 
4 3 2000-01-05 0.233810 
5 3 2000-01-06 0.581176 
8 3 2000-01-09 0.373345 
11 3 2000-01-12 0.054054 
12 3 2001-03-01 0.861348 
18 3 2001-03-07 0.064216 
19 3 2001-03-08 0.018823 
20 3 2001-03-09 0.207845 

In [165]: recreation = pandas.DataFrame(
    {"d":[3,3,3,3,3,3,3,3,3], 
    "dt":[datetime.date(2000,1,2), datetime.date(2000,1,5), datetime.date(2000,1,6), 
      datetime.date(2000,1,9), datetime.date(2000,1,12), datetime.date(2001,3,1), 
      datetime.date(2001,3,7), datetime.date(2001,3,8), datetime.date(2001,3,9)], 
    "x":[0.159, 0.233, 0.581, 0.3733, 0.054, 0.861, 0.064, 0.0188, 0.2078]}) 

In [166]: recreation 
Out[166]: 
    d   dt  x 
0 3 2000-01-02 0.1590 
1 3 2000-01-05 0.2330 
2 3 2000-01-06 0.5810 
3 3 2000-01-09 0.3733 
4 3 2000-01-12 0.0540 
5 3 2001-03-01 0.8610 
6 3 2001-03-07 0.0640 
7 3 2001-03-08 0.0188 
8 3 2001-03-09 0.2078 

In [167]: recreation.set_index("dt") 
Out[167]: 
      d  x 
dt 
2000-01-02 3 0.1590 
2000-01-05 3 0.2330 
2000-01-06 3 0.5810 
2000-01-09 3 0.3733 
2000-01-12 3 0.0540 
2001-03-01 3 0.8610 
2001-03-07 3 0.0640 
2001-03-08 3 0.0188 
2001-03-09 3 0.2078 


Answers


This appears to be caused by groupby changing the group index to a MultiIndex.

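def foo(zz): 
    # reset_index() replaces the index inherited from the groupby group
    # with a fresh integer index before "dt" is set as the new index.
    zz1 = zz.sort("dt", ascending=True).reset_index().set_index("dt")
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12) 
    return r1.beta 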

Adding a call that resets the index inside the function passed to apply, as above, removes the problem, which gives at least one workaround.
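As a rough illustration of the same reset-then-set pattern on the simplified problem, here is a sketch that assumes a current pandas release rather than 0.8.1; whether the identical call sequence behaves the same way on 0.8.1 is not verified here. The frame is a made-up stand-in for tdf, the helper name index_by_date is hypothetical, and selecting the columns explicitly after groupby is an addition of this sketch so that the grouping column is handled the same way across recent versions.

```python
import datetime

import numpy as np
import pandas as pd

# Stand-in for tdf from the question: a datetime.date column "dt",
# a grouping column "d" and a value column "x" (values are made up).
rng = np.random.RandomState(0)
tdf = pd.DataFrame({
    "dt": [datetime.date(2000, 1, i + 1) for i in range(12)],
    "d": [1, 2, 3] * 4,
    "x": rng.rand(12),
})


def index_by_date(group):
    # Workaround pattern from the answer: reset the group's index first,
    # then set "dt" as the new index.
    return group.reset_index().set_index("dt")


# Assumption of this sketch: pick the non-grouping columns explicitly.
result = tdf.groupby("d")[["dt", "x"]].apply(index_by_date)
```

The result is indexed by the group key and then by date, which matches the two-level index the question is aiming for.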