2017-05-15

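RapidMiner row maximum

Sorry, I am completely new to RapidMiner and have only worked through the basic tutorials.

I have a data set like

MatchID  Value1  Value2  Value3
1        5       1       2
1        4.5     1.5     2
...

and would like to know whether it is possible to get the highest value per column (for example, for Value1) and use it for further calculations (with Generate Attributes).

Thanks.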

Answers

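There are several ways to do this. Here is one that uses the Aggregate operator to find the maximum values, a Join to attach them to the original data, and Generate Attributes to do the calculation.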

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.2.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.2.003" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.2.003" expanded="true" height="82" name="Aggregate" width="90" x="179" y="34">
        <parameter key="use_default_aggregation" value="true"/>
        <parameter key="default_aggregation_function" value="maximum"/>
        <list key="aggregation_attributes"/>
      </operator>
      <operator activated="true" class="join" compatibility="7.2.003" expanded="true" height="82" name="Join" width="90" x="313" y="34">
        <parameter key="join_type" value="outer"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.2.003" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
        <list key="function_descriptions">
          <parameter key="deltaA1" value="[maximum(a1)]-a1"/>
          <parameter key="deltaA2" value="[maximum(a2)]-a2"/>
          <parameter key="deltaA3" value="[maximum(a3)]-a3"/>
          <parameter key="deltaA4" value="[maximum(a4)]-a4"/>
        </list>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Aggregate" from_port="original" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
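
For readers who find the three steps easier to follow as code, here is a rough plain-Python sketch of the same logic. It is not RapidMiner code and is only an illustration: the function name add_deltas and the delta_ prefix are hypothetical, and the sample rows are just the two rows shown in the question.

```python
def add_deltas(rows, columns):
    """Return copies of the rows with delta_<col> = max(col) - value added."""
    # Step 1 (like Aggregate): compute one maximum per column.
    maxima = {col: max(row[col] for row in rows) for col in columns}

    # Steps 2 and 3 (like Join + Generate Attributes): make the maximum
    # available to every row and compute the difference from it.
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        for col in columns:
            new_row["delta_" + col] = maxima[col] - row[col]
        result.append(new_row)
    return result


# The two example rows from the question (assumed to be the whole data set here).
rows = [
    {"MatchID": 1, "Value1": 5.0, "Value2": 1.0, "Value3": 2.0},
    {"MatchID": 1, "Value1": 4.5, "Value2": 1.5, "Value3": 2.0},
]
deltas = add_deltas(rows, ["Value1", "Value2", "Value3"])
print(deltas[1]["delta_Value1"])  # difference of the second row from max(Value1)
```

The sketch assumes a non-empty list of rows with numeric values in every listed column.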

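Another way is to use the Extract Macro operator with statistics set to max. This stores the maximum value of the given attribute as a macro value, which can then be used in Generate Attributes.

This has the advantage that you do not modify the original data set and do not need the Join or Multiply operators.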

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.5.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="7.5.000" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
        <parameter key="macro" value="maxA1"/>
        <parameter key="macro_type" value="statistics"/>
        <parameter key="statistics" value="max"/>
        <parameter key="attribute_name" value="a1"/>
        <list key="additional_macros"/>
        <description align="center" color="transparent" colored="false" width="126">extract maximum of attribute a1 and store it in a macro</description>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
        <list key="function_descriptions">
          <parameter key="DifferenceA1" value="parse(%{maxA1})-a1"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">calculate the difference of a1 from the maximum using the macro value</description>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Hint: since macro values are stored as text, you first have to use parse to get their numeric value.
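
The hint about parse can be illustrated with a small plain-Python analogy. This is not RapidMiner code: the dictionary stands in for the macro store, float() plays the role of parse, and the names macros and difference_from_macro are hypothetical.

```python
def difference_from_macro(macros, macro_name, values):
    """Subtract each value from a maximum that is stored as text."""
    # The stored value is a string, so it must be converted to a number
    # before arithmetic (the role parse(%{maxA1}) plays in the process above).
    maximum = float(macros[macro_name])
    return [maximum - value for value in values]


values = [5.0, 4.5]                   # stand-in for attribute a1
macros = {"maxA1": str(max(values))}  # stored as text, like a macro value
diffs = difference_from_macro(macros, "maxA1", values)
print(diffs)
```

Skipping the conversion in this analogy would raise a TypeError, because a string cannot be subtracted from a number.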

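A third option is to Sort the example set and keep only the example with the maximum value using Filter Example Range. This is handy if you are mainly interested in the values of the other attributes when a particular attribute is at its maximum.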

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.5.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="sort" compatibility="7.5.000" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
        <parameter key="attribute_name" value="a1"/>
        <parameter key="sorting_direction" value="decreasing"/>
        <description align="center" color="transparent" colored="false" width="126">sorting the example set on a1 decreasing</description>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="7.5.000" expanded="true" height="82" name="Filter Example Range" width="90" x="313" y="34">
        <parameter key="first_example" value="1"/>
        <parameter key="last_example" value="1"/>
        <description align="center" color="transparent" colored="false" width="126">only keeping the first example, which has the maximum for a1</description>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
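
The sort-then-keep-first idea can also be sketched in plain Python. Again, this is only an illustration and not RapidMiner code: the function name row_with_maximum is hypothetical, and the rows are the two example rows from the question.

```python
def row_with_maximum(rows, column):
    """Return the row in which the given column has its largest value."""
    # Like Sort (decreasing) followed by Filter Example Range (first example only).
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return ordered[0]


rows = [
    {"MatchID": 1, "Value1": 5.0, "Value2": 1.0, "Value3": 2.0},
    {"MatchID": 1, "Value1": 4.5, "Value2": 1.5, "Value3": 2.0},
]
best = row_with_maximum(rows, "Value2")
print(best["Value1"])  # value of Value1 in the row where Value2 is largest
```

The sketch assumes at least one row. If several rows share the maximum, it returns the one that comes first in the input, because Python's sort is stable.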