What method does RapidMiner use to compute the correlation matrix, and why do two categorical/nominal attributes show a negative correlation?

I'm stuck and hoping someone can answer this for me.

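What methodology does RapidMiner use in its Correlation Matrix? An answer covering every combination of data types would be good, but the most important case is a nominal/categorical data set.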

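I am using RapidMiner to generate a correlation matrix and have taken care to type every attribute appropriately as numeric, binominal, polynominal, and so on. The matrix shows negative correlations for some of the nominal/nominal combinations. As far as I can tell, that is not possible with the methods I would normally think of (Phi, Cramér's V, contingency coefficient). I thought the correlation should be positive for these tests, and a "negative" correlation between categories such as gender and city makes no sense, because it would imply an order in the data.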

Is a different test being used, or is there some dummy coding going on? And if dummy coding is used, how reliable are the resulting values?

Thanks in advance to anyone who can help. I hate being lost, but I need a map here.

Source

2016-07-14 schradera

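I have included the XML of a process that calculates the correlation matrix for an example set containing nominal values, and then again for the same example set with the nominals converted to numbers. The process shows that the same matrix is produced when the nominals are converted to simple numbers, that is, value1 becomes 0, value2 becomes 1, and so on.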

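According to the help for the Correlation Matrix operator, the difference between each attribute value and the mean of that attribute is taken. These differences are multiplied together for each pair of attributes and summed over all examples. The sum is then divided by the number of examples minus 1 and by the product of the standard deviations of the two attributes. I recreated the calculation in a spreadsheet, so I know that the standard deviation used is the sample one rather than the population one.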
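The calculation described above can be sketched in Python. This is a minimal illustration, not RapidMiner's own code: the helper names `integer_code` and `pearson`, the first-appearance ordering of the integer codes, and the sample values are all assumptions made for the example.

```python
import math


def integer_code(values):
    """Map each distinct nominal value to 0, 1, 2, ... in order of first
    appearance (an assumed ordering, chosen only for this sketch)."""
    mapping = {}
    return [mapping.setdefault(v, len(mapping)) for v in values]


def pearson(xs, ys):
    """Sum of products of deviations from the mean, divided by (n - 1)
    and by the product of the two sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cross / ((n - 1) * sd_x * sd_y)


# Hypothetical nominal attributes with no natural order.
att1 = ["x", "x", "x", "y", "y", "y"]
att2 = ["p", "q", "q", "p", "p", "p"]

r = pearson(integer_code(att1), integer_code(att2))
print(round(r, 4))
```

With these values the result is negative even though neither attribute has an order, which is the behaviour the question describes. The sketch does not guard against an attribute with a single value, where the standard deviation is zero.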

Here is the process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?> 
<process version="7.1.001"> 
    <context> 
    <input/> 
    <output/> 
    <macros/> 
    </context> 
    <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process"> 
    <process expanded="true"> 
     <operator activated="true" class="generate_nominal_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Nominal Data" width="90" x="45" y="85"> 
     <parameter key="number_examples" value="20"/> 
     <parameter key="number_of_attributes" value="3"/> 
     <parameter key="number_of_values" value="3"/> 
     </operator> 
     <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85"> 
     <parameter key="attribute_filter_type" value="subset"/> 
     <parameter key="attributes" value="label"/> 
     <parameter key="invert_selection" value="true"/> 
     <parameter key="include_special_attributes" value="true"/> 
     </operator> 
     <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="313" y="85"/> 
     <operator activated="true" class="nominal_to_numerical" compatibility="7.1.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="447" y="289"> 
     <parameter key="coding_type" value="unique integers"/> 
     <list key="comparison_groups"/> 
     </operator> 
     <operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix" width="90" x="581" y="85"/> 
     <operator activated="true" class="correlation_matrix" compatibility="7.1.001" expanded="true" height="103" name="Correlation Matrix (2)" width="90" x="581" y="289"/> 
     <connect from_op="Generate Nominal Data" from_port="output" to_op="Select Attributes" to_port="example set input"/> 
     <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/> 
     <connect from_op="Multiply" from_port="output 1" to_op="Correlation Matrix" to_port="example set"/> 
     <connect from_op="Multiply" from_port="output 2" to_op="Nominal to Numerical" to_port="example set input"/> 
     <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Correlation Matrix (2)" to_port="example set"/> 
     <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/> 
     <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/> 
     <connect from_op="Correlation Matrix (2)" from_port="example set" to_port="result 3"/> 
     <connect from_op="Correlation Matrix (2)" from_port="matrix" to_port="result 4"/> 
     <portSpacing port="source_input 1" spacing="0"/> 
     <portSpacing port="sink_result 1" spacing="0"/> 
     <portSpacing port="sink_result 2" spacing="0"/> 
     <portSpacing port="sink_result 3" spacing="0"/> 
     <portSpacing port="sink_result 4" spacing="0"/> 
     <portSpacing port="sink_result 5" spacing="0"/> 
    </process> 
    </operator> 
</process>

I hope that helps as a start.
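To show why the sign of such a value should not be interpreted, here is a small Python sketch that computes the Pearson correlation of two integer-coded binary attributes under two different codings and compares it with Cramér's V. The attribute values, the coding dictionaries, and the function names are hypothetical; which integer RapidMiner assigns to which nominal value is not something this sketch claims to reproduce.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; the (n - 1) factors cancel, so sums of squares suffice."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def cramers_v(a, b):
    """Cramér's V from the chi-squared statistic of the contingency table."""
    n = len(a)
    joint, row, col = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for r_val in row:
        for c_val in col:
            expected = row[r_val] * col[c_val] / n
            chi2 += (joint[(r_val, c_val)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row), len(col)) - 1)))


# Hypothetical data: two unordered binary attributes.
gender = ["f", "f", "f", "m", "m", "m"]
city = ["Oslo", "Rome", "Rome", "Oslo", "Oslo", "Oslo"]

gender_codes = [{"f": 0, "m": 1}[g] for g in gender]
city_codes_a = [{"Oslo": 0, "Rome": 1}[c] for c in city]
city_codes_b = [{"Oslo": 1, "Rome": 0}[c] for c in city]  # same data, codes swapped

r_a = pearson(gender_codes, city_codes_a)
r_b = pearson(gender_codes, city_codes_b)
v = cramers_v(gender, city)
print(r_a, r_b, v)
```

Swapping the two codes of one attribute flips the sign of the Pearson value while Cramér's V stays the same. For two binary attributes the magnitude of the Pearson value equals Cramér's V; for attributes with three or more values the integer-coded result depends on the arbitrary ordering in magnitude as well as sign.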

Source

2016-07-14 22:56:56 awchisholm

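Thank you, that is a great start. I suspected they were using that kind of coding rather than one of the methods I mentioned above, but I needed to be sure before relying on the output. Being able to see the XML is a neat trick. How did you do that? – schradera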

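The XML is one of the views available from the menu options in RapidMiner Studio. It lets you share processes with other people: copy the whole XML into the view, click the small tick icon to validate it, and you have a process that you can edit. – awchisholm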
