Rank of each element of a matrix using CUDA


Is there a way to find the rank of each element of a matrix individually using CUDA, or any function for this provided by NVIDIA?

Source

2017-02-01 Rasmi Ranjan Khansama

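Could you explain your question in more detail? – Soeren

Problem details, e.g.: row elements = [4,1,7,1], ranks = [1,0,2,0]. Equal values are assigned the same rank. –

I'm not aware of a built-in ranking or argsort function in CUDA or in any of the libraries I am familiar with.

You can certainly build such a function out of lower-level operations, for example using Thrust. Here is a (non-optimized) outline of one possible approach: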

$ cat t84.cu 
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/sequence.h>
#include <thrust/functional.h>
#include <thrust/adjacent_difference.h>
#include <thrust/transform.h>
#include <thrust/iterator/permutation_iterator.h>
#include <iostream>
#include <iterator>

typedef int mytype; 

// Maps 0 to 0 and any nonzero difference to 1.
struct clamp
{
    template <typename T>
    __host__ __device__
    T operator()(T data){
        if (data == 0) return 0;
        return 1;
    }
};

int main(){ 

    mytype data[] = {4,1,7,1}; 
    int dsize = sizeof(data)/sizeof(data[0]); 
    thrust::device_vector<mytype> d_data(data, data+dsize); 
    thrust::device_vector<int> d_idx(dsize); 
    thrust::device_vector<int> d_result(dsize); 

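    // d_idx = 0,1,2,...: remembers each element's original position
    thrust::sequence(d_idx.begin(), d_idx.end());

    // sort the values, carrying the original positions along
    thrust::sort_by_key(d_data.begin(), d_data.end(), d_idx.begin(), thrust::less<mytype>());
    thrust::device_vector<int> d_diff(dsize);
    // nonzero difference marks the start of a new distinct value
    thrust::adjacent_difference(d_data.begin(), d_data.end(), d_diff.begin());
    d_diff[0] = 0; // adjacent_difference copies the first input element; the smallest value gets rank 0
    thrust::transform(d_diff.begin(), d_diff.end(), d_diff.begin(), clamp());
    // running sum of the 0/1 flags gives the rank of each sorted element
    thrust::inclusive_scan(d_diff.begin(), d_diff.end(), d_diff.begin());

    // write each rank back to the element's original position
    thrust::copy(d_diff.begin(), d_diff.end(), thrust::make_permutation_iterator(d_result.begin(), d_idx.begin()));
    thrust::copy(d_result.begin(), d_result.end(), std::ostream_iterator<int>(std::cout, ","));
    std::cout << std::endl;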
} 

$ nvcc -arch=sm_61 -o t84 t84.cu 
$ ./t84 
1,0,2,0, 
$

Source

2017-02-06 16:38:16

Thank you. Why is it not optimized? If I am not mistaken, your solution works on a vector. If I want to do the same operation on a matrix, will the solution still work? Can it be used from pyCUDA? –

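It's not optimized because I haven't thought through all the different ways of building such a function, so I imagine there are more optimal methods. Even for what is shown, clever use of Thrust fusion could probably improve performance. The method described is only a concept sketch showing how a row-ranking function could be implemented. As for extending it to process all rows of a matrix at once, I imagine it can be done, since Thrust operations can be extended this way (see Thrust). Regarding pyCUDA, if you google "thrust pycuda" you will find interop examples. –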
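As a rough illustration of the row-wise extension mentioned above, here is a minimal sketch that ranks every row of a row-major matrix in one set of Thrust calls. It is only one possible approach, not necessarily what the answerer had in mind: it swaps in a two-pass stable sort (by value, then by row) to get a segmented sort, fuses the difference and clamp steps into a single transform, and restarts the scan at each row with `inclusive_scan_by_key`. The names `rank_rows`, `row_of`, and `new_value_flag` are hypothetical helpers made up for this example, and `int` data is assumed.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/scatter.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <iostream>
#include <vector>

// Maps a flat index to its row (row-major storage with ncols columns).
struct row_of
{
    int ncols;
    explicit row_of(int n) : ncols(n) {}
    __host__ __device__ int operator()(int i) const { return i / ncols; }
};

// 1 where sorted element i starts a new distinct value inside its row, else 0.
struct new_value_flag
{
    const int *vals;
    const int *rows;
    new_value_flag(const int *v, const int *r) : vals(v), rows(r) {}
    __host__ __device__ int operator()(int i) const
    {
        if (i == 0 || rows[i] != rows[i - 1]) return 0; // first element of a row
        return (vals[i] != vals[i - 1]) ? 1 : 0;
    }
};

// Hypothetical helper: rank of every element within its own row
// (equal values share a rank, as in the question's example).
std::vector<int> rank_rows(const std::vector<int> &h, int ncols)
{
    const int n = static_cast<int>(h.size());
    thrust::device_vector<int> d_val(h.begin(), h.end());
    thrust::device_vector<int> d_idx(n), d_row(n), d_flag(n), d_rank(n), d_result(n);
    thrust::sequence(d_idx.begin(), d_idx.end());

    // Pass 1: sort all values globally, carrying the original flat indices.
    thrust::stable_sort_by_key(d_val.begin(), d_val.end(), d_idx.begin());
    // Pass 2: stable sort by row id, so each row ends up contiguous and still sorted by value.
    thrust::transform(d_idx.begin(), d_idx.end(), d_row.begin(), row_of(ncols));
    thrust::stable_sort_by_key(d_row.begin(), d_row.end(),
        thrust::make_zip_iterator(thrust::make_tuple(d_val.begin(), d_idx.begin())));

    // Flag the start of each new distinct value, then sum the flags within each row.
    thrust::transform(thrust::counting_iterator<int>(0), thrust::counting_iterator<int>(n),
        d_flag.begin(),
        new_value_flag(thrust::raw_pointer_cast(d_val.data()),
                       thrust::raw_pointer_cast(d_row.data())));
    thrust::inclusive_scan_by_key(d_row.begin(), d_row.end(), d_flag.begin(), d_rank.begin());

    // Write each rank back to the element's original position.
    thrust::scatter(d_rank.begin(), d_rank.end(), d_idx.begin(), d_result.begin());

    std::vector<int> out(n);
    thrust::copy(d_result.begin(), d_result.end(), out.begin());
    return out;
}

int main()
{
    const int ncols = 4;
    const int data[] = {4, 1, 7, 1,   // row 0
                        3, 3, 0, 9};  // row 1
    std::vector<int> h(data, data + 8);
    std::vector<int> r = rank_rows(h, ncols);
    for (int i = 0; i < static_cast<int>(r.size()); ++i)
        std::cout << r[i] << (((i + 1) % ncols == 0) ? "\n" : ",");
    return 0;
}
```

For the two rows above, this should give ranks 1,0,2,0 for the first row (matching the example in the question) and 1,1,0,2 for the second. The two global sorts are simple but not necessarily the fastest option; for many short rows, a dedicated segmented sort may be a better fit.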
