Creating dynamic, arbitrary subsets of an ndarray (Python & NumPy)

I'm looking for a general way to do this:

import itertools
import numpy as np

raw_data = np.array(somedata)  # somedata: the input table (not shown)
filterColumn1 = raw_data[:, 1]
filterColumn2 = raw_data[:, 3]
cartesian_product = itertools.product(np.unique(filterColumn1), np.unique(filterColumn2))
for val1, val2 in cartesian_product:
    fixed_mask = (filterColumn1 == val1) & (filterColumn2 == val2)
    subset = raw_data[fixed_mask]

I need to be able to use an arbitrary number of filterColumns. So what I want is this:

filterColumns = [filterColumn1, filterColumn2, ...] 
uniqueValues = map(np.unique, filterColumns) 
cartesian_product = itertools.product(*uniqueValues) 
for combination in cartesian_product: 
    variable_mask = ???? 
    subset = raw_data[variable_mask]

Is there a simple syntax for what I want? If not, should I try a different approach?

Edit: something like this

cartesian_product = itertools.product(*uniqueValues)
for combination in cartesian_product:

    variable_mask = True
    for idx, fc in enumerate(filterColumns):
        variable_mask &= (fc == combination[idx])

    subset = raw_data[variable_mask]

Source

2014-10-03 Joe Bashe

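You could use numpy.all and index broadcasting for this:

filter_matrix = np.array(filterColumns)  # shape (n_filters, n_rows)
combination_array = np.array(combination)  # shape (n_filters,)
# compare every filter column with its target value, then require all to match
bool_matrix = filter_matrix == combination_array[:, np.newaxis]
subset = raw_data[np.all(bool_matrix, axis=0)]

There are, however, simpler ways of doing the same thing if your filters are columns of the matrix itself, notably with numpy argsort and rolling the axes. First roll the axes until your filter columns come first, then sort on them and slice the array vertically to get the rest of the matrix.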
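As an illustration of the broadcasting idea, here is a minimal sketch on a small made-up array. The toy data, the choice of columns 1 and 3, and the `subsets` dictionary are assumptions for the example, not part of the original answer.

```python
import itertools
import numpy as np

# Hypothetical toy data; columns 1 and 3 act as the filter columns.
raw_data = np.array([
    [10, 0, 5, 1, 7],
    [11, 0, 5, 2, 7],
    [12, 1, 5, 1, 7],
    [13, 0, 5, 1, 7],
])
filter_indexes = [1, 3]

# One row per filter column: shape (n_filters, n_rows).
filter_matrix = raw_data[:, filter_indexes].T
unique_values = [np.unique(row) for row in filter_matrix]

subsets = {}
for combination in itertools.product(*unique_values):
    combination_array = np.array(combination)
    # Broadcast the combination down the filter axis, then require
    # every filter column to match for a row to be selected.
    mask = np.all(filter_matrix == combination_array[:, np.newaxis], axis=0)
    if mask.any():  # skip combinations that never occur in the data
        subsets[tuple(int(v) for v in combination)] = raw_data[mask]
```

This still loops over the combinations, but the inner loop over filter columns is replaced by a single broadcast comparison.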

In general, if a for loop can be avoided in Python, it is better to avoid it.

Update:

Here is the full code without a for loop:

import numpy as np

# select filtering indexes
filter_indexes = [1, 3]
# generate the test data
raw_data = np.random.randint(0, 4, size=(50, 5))
# create the columns that we will use for indexing
index_columns = raw_data[:, filter_indexes]
# sort the index columns in lexicographic order over all the indexing columns
argsorts = np.lexsort(index_columns.T)
# sort both the index and the data columns
sorted_index = index_columns[argsorts, :]
sorted_data = raw_data[argsorts, :]
# in each indexing column, find whether the numbers in row and row-1 are identical,
# then check whether all numbers in corresponding positions of row and row-1 are identical
autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1)
# find the breakpoints: these are the positions where row and row-1 are not identical
breakpoints = np.nonzero(np.logical_not(autocorrelation))[0] + 1
# finally find the desired subsets
subsets = np.split(sorted_data, breakpoints)

An alternative implementation would be to convert the indexing matrix into a string matrix, sum it row-wise, get an argsort over the now-unique indexing column, and split as above.
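The string-key alternative might look roughly like the sketch below. The toy data, the `"_"` separator, and the variable names are assumptions made for illustration; the separator keeps keys such as (1, 12) and (11, 2) from colliding. Because strings sort lexically ("10" before "9"), the groups may come out in a different order than with lexsort, but the grouping itself is the same.

```python
import numpy as np

# Hypothetical toy data; columns 1 and 3 act as the filter columns.
raw_data = np.array([
    [10, 0, 5, 1, 7],
    [11, 0, 5, 2, 7],
    [12, 1, 5, 1, 7],
    [13, 0, 5, 1, 7],
])
filter_indexes = [1, 3]

# Convert the index columns to strings and "sum" them row-wise
# (string concatenation), giving one key per row such as "0_1".
str_cols = raw_data[:, filter_indexes].astype(str)
keys = str_cols[:, 0]
for j in range(1, str_cols.shape[1]):
    keys = np.char.add(np.char.add(keys, "_"), str_cols[:, j])

# Sort by the single key column; a stable sort keeps the original
# row order inside each group.
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
sorted_data = raw_data[order]

# A breakpoint is every position where the key differs from the previous row.
breakpoints = np.nonzero(sorted_keys[1:] != sorted_keys[:-1])[0] + 1
subsets = np.split(sorted_data, breakpoints)
```

The only Python-level loop here runs over the filter columns, not over the rows.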

As a convention, it might be nicer to first roll the index columns until they are all at the beginning of the matrix, so that the sorting done above is clearer.
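One possible way to put the index columns first is plain column reordering with fancy indexing, sketched below. This swaps in fancy indexing for axis rolling, and the toy data and names such as `other_indexes` are assumptions for the example. Note that np.lexsort treats the last key as the primary one, so the key columns are reversed to make column 0 the primary sort key.

```python
import numpy as np

# Hypothetical toy data; columns 1 and 3 act as the filter columns.
raw_data = np.array([
    [10, 0, 5, 1, 7],
    [11, 0, 5, 2, 7],
    [12, 1, 5, 1, 7],
    [13, 0, 5, 1, 7],
])
filter_indexes = [1, 3]

# Move the filter columns to the front, keeping the others in order.
other_indexes = [j for j in range(raw_data.shape[1]) if j not in filter_indexes]
reordered = raw_data[:, filter_indexes + other_indexes]
n_keys = len(filter_indexes)

# np.lexsort uses the LAST key as the primary key, so reverse the
# key rows to sort by column 0 first, then column 1.
order = np.lexsort(reordered[:, :n_keys].T[::-1])
sorted_data = reordered[order]
```

After this step the first `n_keys` columns of `sorted_data` are the sorted keys, and the remaining columns are the payload, so the breakpoint search can work on `sorted_data[:, :n_keys]` directly.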

Source

2014-10-03 13:28:47 chiffa

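I would like to accept your answer, but not everyone can rotate n-dimensional matrices in their head. ;) That is, I'm not sure how to implement this solution for my problem. I dug into the argsort and rollaxis docs a bit, but how to apply them to get the subsets is beyond me. My data is quite large and loops do not run well on it, so avoiding loops where possible would be good. –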

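See the update. It is actually not argsort, which gives the index array for sorting along an axis by a single element, but lexsort, which sorts by multiple elements along an axis, that I was thinking of. :D – chiffa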

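Thank you very much for the detailed update! I follow your logic now and have learned a better way to think about data manipulation in numpy. Is using autocorrelation and breakpoints this way fairly standard? It seems it would be hard for a beginner to understand what the code is doing without the comments. –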

Something like this seems to work?

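# start with a boolean mask that selects all rows; dtype=bool matters,
# otherwise the result is an integer array and indexes rows instead of masking
variable_mask = np.ones_like(filterColumns[0], dtype=bool)
for column, val in zip(filterColumns, combination):
    variable_mask &= (column == val)
subset = raw_data[variable_mask]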
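The same accumulation can be written without the explicit `&=` loop by reducing a list of per-column comparisons with `np.logical_and.reduce`. This is a sketch on made-up data; the toy array, `filter_columns`, and the chosen `combination` are assumptions for the example.

```python
import numpy as np

# Hypothetical toy data; columns 1 and 3 act as the filter columns.
raw_data = np.array([
    [10, 0, 5, 1, 7],
    [11, 0, 5, 2, 7],
    [12, 1, 5, 1, 7],
    [13, 0, 5, 1, 7],
])
filter_columns = [raw_data[:, 1], raw_data[:, 3]]
combination = (0, 1)  # one value per filter column

# Build one boolean array per filter column, then AND them together.
mask = np.logical_and.reduce(
    [column == val for column, val in zip(filter_columns, combination)]
)
subset = raw_data[mask]
```

The result is always a boolean array, which avoids the integer-mask pitfall of starting from `np.ones_like` without `dtype=bool`.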

Source

2014-10-03 13:23:48 r3m0t
