Parallelising MapReduce

I am new to parallel programming and Hadoop MapReduce. The following example was taken from a tutorial website.

https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

How does MapReduce parallelise the mappers and reducers so that they run together, and how would I introduce multithreading here (i.e. apply parallel programming)?

Can the Mapper run on one machine while the Reducer runs on another machine at the same time?

Apologies if I have not explained this very well.

package hadoop; 

import java.util.*; 

import java.io.IOException; 

import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.conf.*; 
import org.apache.hadoop.io.*; 
import org.apache.hadoop.mapred.*; 
import org.apache.hadoop.util.*; 

public class ProcessUnits 
{ 
    //Mapper class 
    public static class E_EMapper extends MapReduceBase implements 
    Mapper<LongWritable ,/*Input key Type */ 
    Text,    /*Input value Type*/ 
    Text,    /*Output key Type*/ 
    IntWritable>  /*Output value Type*/ 
    { 

     //Map function 
     public void map(LongWritable key, Text value, 
     OutputCollector<Text, IntWritable> output, 
     Reporter reporter) throws IOException 
     { 
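     String line = value.toString(); 
     String lasttoken = null; 
     // Input lines are tab-separated: the first token is the year, 
     // the last token is that year's average. 
     StringTokenizer s = new StringTokenizer(line,"\t"); 
     if (!s.hasMoreTokens()) 
     { 
      return; // skip blank lines 
     } 
     String year = s.nextToken(); 

     while(s.hasMoreTokens()) 
      { 
       lasttoken=s.nextToken(); 
      } 

     if (lasttoken == null) 
     { 
      return; // the line has only a year and no average to emit 
     } 

     int avgprice = Integer.parseInt(lasttoken.trim()); 
     output.collect(new Text(year), new IntWritable(avgprice)); 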
     } 
    } 


    //Reducer class 
    public static class E_EReduce extends MapReduceBase implements 
    Reducer< Text, IntWritable, Text, IntWritable > 
    { 

     //Reduce function 
     public void reduce(Text key, Iterator <IntWritable> values, 
     OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException 
     { 
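      // Emit every value for this year that exceeds the threshold 
      // (this filters values; it does not compute a maximum). 
      int maxavg=30; 
      int val=Integer.MIN_VALUE; 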

      while (values.hasNext()) 
      { 
       if((val=values.next().get())>maxavg) 
       { 
        output.collect(key, new IntWritable(val)); 
       } 
      } 

     } 
    } 


    //Main function 
    public static void main(String args[])throws Exception 
    { 
     JobConf conf = new JobConf(ProcessUnits.class); 

     conf.setJobName("max_eletricityunits"); 
     conf.setOutputKeyClass(Text.class); 
     conf.setOutputValueClass(IntWritable.class); 
     conf.setMapperClass(E_EMapper.class); 
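     // The combiner runs on each map task's output before the shuffle; 
     // reusing E_EReduce is safe here because filtering values > 30 twice 
     // gives the same result as filtering once. 
     conf.setCombinerClass(E_EReduce.class); 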
     conf.setReducerClass(E_EReduce.class); 
     conf.setInputFormat(TextInputFormat.class); 
     conf.setOutputFormat(TextOutputFormat.class); 

     FileInputFormat.setInputPaths(conf, new Path(args[0])); 
     FileOutputFormat.setOutputPath(conf, new Path(args[1])); 

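     // Submit the job and block until it finishes; Hadoop schedules the 
     // map and reduce tasks on the cluster's nodes. 
     JobClient.runJob(conf); 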
    } 
}
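To check what this job computes without a cluster, its map and reduce logic can be mirrored in plain Java. The sketch below is a local simulation only, not Hadoop code: the class LocalProcessUnits and the sample lines are hypothetical, and the TreeMap grouping step stands in for Hadoop's shuffle and sort.

```java
import java.util.*;

// Hypothetical plain-Java mirror of ProcessUnits (no Hadoop dependency),
// useful for checking the job's logic on a few lines locally.
class LocalProcessUnits {
    static final int THRESHOLD = 30; // same cut-off as maxavg in E_EReduce

    // Map step: first tab-separated token is the year, last token is the average.
    // Returns null (emit nothing) when the line has no average.
    static Map.Entry<String, Integer> map(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length < 2) return null;
        return Map.entry(tokens[0], Integer.parseInt(tokens[tokens.length - 1].trim()));
    }

    // Reduce step: keep every value for the key that exceeds the threshold.
    static List<Integer> reduce(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            if (v > THRESHOLD) out.add(v);
        }
        return out;
    }

    static SortedMap<String, List<Integer>> run(List<String> lines) {
        // Shuffle stand-in: group map outputs by key, sorted like Hadoop sorts keys.
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            if (kv == null) continue;
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Keys whose values are all filtered out produce no output, as in E_EReduce.
        SortedMap<String, List<Integer>> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            List<Integer> kept = reduce(e.getValue());
            if (!kept.isEmpty()) result.put(e.getKey(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: year, some monthly figures, then the average.
        List<String> lines = List.of("1979\t23\t25", "1981\t31\t34", "1984\t39\t40", "1985\t38\t45");
        System.out.println(run(lines));
    }
}
```

With these sample lines, 1979 is dropped because its average (25) is not above 30, so only the other three years appear in the output.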


2017-05-01 Novice Programmer

I'm not sure whether you explained it very well, but I'm struggling at the same Hadoop level, so I'll be glad to read the answers to this post. – strash

@strash Yeah, sorry, I tried to explain it properly, but my limited knowledge got in the way. Hopefully we get an answer. –

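Hadoop parallelises the work for you; you don't need to do anything beyond running hadoop jar.

For MapReduce in general, keep in mind that the map phase and the reduce phase happen sequentially (not in parallel), because reduce depends on the output of map. However, multiple mappers can run in parallel, and once they have finished, multiple reducers run in parallel (depending on the job, of course). Again, Hadoop starts and coordinates them for you.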
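The phase ordering described in this answer can be sketched in a single JVM with java.util.concurrent: map tasks run concurrently on a thread pool, ExecutorService.invokeAll acts as the barrier between the phases, and reduce tasks then run concurrently, one per partition. This is a hypothetical illustration, not how Hadoop is implemented internally; threads stand in for the separate task processes Hadoop would launch on different nodes, and the class name, input splits, and pool size are assumptions.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical single-JVM illustration of MapReduce phase ordering:
// parallel map tasks -> barrier -> shuffle -> parallel reduce tasks.
class ParallelMapReduceSketch {
    static final int THRESHOLD = 30;

    // Same formula as Hadoop's default HashPartitioner, applied here to
    // String.hashCode (Hadoop would use the key class's own hashCode).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    static SortedMap<String, List<Integer>> run(List<List<String>> splits, int numReducers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Map phase: one task per input split, all submitted together.
            List<Callable<List<Map.Entry<String, Integer>>>> mapTasks = new ArrayList<>();
            for (List<String> split : splits) {
                mapTasks.add(() -> {
                    List<Map.Entry<String, Integer>> out = new ArrayList<>();
                    for (String line : split) {
                        String[] t = line.split("\t");
                        out.add(Map.entry(t[0], Integer.parseInt(t[t.length - 1].trim())));
                    }
                    return out;
                });
            }
            // invokeAll returns only when every map task has completed:
            // this is the map -> reduce barrier.
            List<Future<List<Map.Entry<String, Integer>>>> mapResults = pool.invokeAll(mapTasks);

            // Shuffle: route each pair to its reducer partition, grouped by key.
            List<SortedMap<String, List<Integer>>> partitions = new ArrayList<>();
            for (int i = 0; i < numReducers; i++) partitions.add(new TreeMap<>());
            for (Future<List<Map.Entry<String, Integer>>> f : mapResults) {
                for (Map.Entry<String, Integer> kv : f.get()) {
                    partitions.get(partition(kv.getKey(), numReducers))
                              .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                              .add(kv.getValue());
                }
            }

            // Reduce phase: one task per partition, running concurrently.
            List<Callable<SortedMap<String, List<Integer>>>> reduceTasks = new ArrayList<>();
            for (SortedMap<String, List<Integer>> part : partitions) {
                reduceTasks.add(() -> {
                    SortedMap<String, List<Integer>> out = new TreeMap<>();
                    part.forEach((key, values) -> {
                        List<Integer> kept = new ArrayList<>();
                        for (int v : values) if (v > THRESHOLD) kept.add(v);
                        if (!kept.isEmpty()) out.put(key, kept);
                    });
                    return out;
                });
            }
            // Partitions hold disjoint keys, so merging their outputs loses nothing.
            SortedMap<String, List<Integer>> result = new TreeMap<>();
            for (Future<SortedMap<String, List<Integer>>> f : pool.invokeAll(reduceTasks)) {
                result.putAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input already divided into two splits (two map tasks).
        List<List<String>> splits = List.of(
            List.of("1979\t23\t25", "1981\t31\t34"),
            List.of("1984\t39\t40", "1985\t38\t45"));
        System.out.println(run(splits, 2));
    }
}
```

Because the final result is collected into a TreeMap, the output is the same regardless of which thread finishes first.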


2017-05-01 13:44:34

