How to split data into 3 or more parts using sklearn

I want to split my data into stratified train, test, and validation datasets, but sklearn only provides cross_validation.train_test_split, which can only divide the data into 2 pieces. What should I do if I want to do this?

Source

2017-09-15 loseryao

If you want a stratified train/test split, you can use StratifiedKFold in sklearn. Suppose X is your features and y is your labels. Based on the example here:

from sklearn.model_selection import StratifiedKFold

# X holds the features and y the labels (both NumPy arrays).
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
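To see what the stratification does, here is a minimal runnable sketch of the same loop. The toy arrays X and y (12 samples, two balanced classes) and the fold_class_counts list are assumptions made up for illustration; they are not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (assumed for illustration): 12 samples, 2 features, 2 balanced classes.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 6 + [1] * 6)

skf = StratifiedKFold(n_splits=3)
fold_class_counts = []
for train_index, test_index in skf.split(X, y):
    # Count how many samples of each class land in this test fold.
    counts = np.bincount(y[test_index], minlength=2)
    fold_class_counts.append(counts.tolist())
    print("TEST indices:", test_index, "class counts:", counts)
```

Each test fold should keep the class ratio of the full dataset. Note that this produces 3 folds for cross-validation, not three parts of different sizes.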

Update:

To split the data into, say, 3 different percentages, you can use numpy.split() like this:

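import numpy as np
# Cut points at 70% and 80% of the rows give 70% / 10% / 20% parts, in order (no shuffling).
X_train, X_test, X_validate = np.split(X, [int(.7*len(X)), int(.8*len(X))])
y_train, y_test, y_validate = np.split(y, [int(.7*len(y)), int(.8*len(y))])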
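Since numpy.split() only slices the arrays in order, it does not stratify. One possible way to get three stratified parts is to call train_test_split twice with its stratify argument. The following is only a sketch under assumptions: the helper name stratified_three_way_split, the default fractions, and the toy data are hypothetical and not from the original answer.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_three_way_split(X, y, fractions=(0.7, 0.2, 0.1), random_state=0):
    """Hypothetical helper: split X, y into three stratified parts."""
    n = len(y)
    # Use integer sizes so floating-point rounding cannot shift a sample.
    n_first = int(round(fractions[0] * n))
    n_second = int(round(fractions[1] * n))

    # Step 1: carve off the first part, keeping class ratios.
    X_a, X_rest, y_a, y_rest = train_test_split(
        X, y, train_size=n_first, stratify=y, random_state=random_state
    )
    # Step 2: split the remainder into the second and third parts.
    X_b, X_c, y_b, y_c = train_test_split(
        X_rest, y_rest, train_size=n_second, stratify=y_rest,
        random_state=random_state,
    )
    return (X_a, y_a), (X_b, y_b), (X_c, y_c)


# Toy data (assumed for illustration): 100 samples, 2 balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

(X_train, y_train), (X_test, y_test), (X_validate, y_validate) = (
    stratified_three_way_split(X, y)
)
print(len(y_train), len(y_test), len(y_validate))
```

With the default fractions this yields a 70% / 20% / 10% split, and each part should keep roughly the class balance of the full dataset. Stratifying requires every class to have enough samples to appear in each part.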

Source

2017-09-15 05:57:50

Thanks for your answer, but I want to split the dataset into three pieces, such as [70%, 20%, 10%]. StratifiedKFold may not help here. – loseryao

@loseryao Oh, sorry, I thought you meant 3 folds. –

Thanks a lot .............. – loseryao
