Hi, Habr! In previous articles we described the MapReduce paradigm and showed in practice how to implement and run a MapReduce application on the Hadoop stack. Now it is time to describe various techniques that let you use MapReduce effectively for practical problems, and to show some Hadoop features that simplify development or significantly speed up the execution of MapReduce jobs on a cluster.

Map only job
As we remember, MapReduce consists of the Map, Shuffle and Reduce stages. In practical tasks the Shuffle stage is usually the heaviest, since this is where the data is sorted. However, there are many tasks that can be solved with the Map stage alone. Examples of such tasks:
- Filtering data (for example, "Find all records from IP address 123.123.123.123" in web server logs);
- Transforming data ("Delete a column from CSV logs");
- Loading and unloading data to and from an external source ("Insert all records from the log into a database").
Such tasks are solved with Map-only jobs. To create a Map-only job in Hadoop, set the number of reducers to zero.
Figure: Map-only job.
Example of configuring a map-only job in Hadoop:
Native interface: specify zero reducers when configuring the job:
job.setNumReduceTasks(0);
A more extensive example is available at the link.
Hadoop Streaming interface: do not specify a reducer, and set the number of reducers to zero. Example:
hadoop jar hadoop-streaming.jar \
  -D mapred.reduce.tasks=0 \
  -input input_dir \
  -output output_dir \
  -mapper "python mapper.py" \
  -file "mapper.py"
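To make the filtering example concrete, here is a minimal sketch of the logic a mapper.py for such a map-only job might contain. It assumes tab-separated log lines of the form timestamp, ip, url; the function name filter_by_ip is hypothetical.

```python
from typing import Iterable, Iterator


def filter_by_ip(lines: Iterable[str], target_ip: str) -> Iterator[str]:
    """Map-only filter: pass through only the log lines whose IP field matches.

    Assumes each line looks like 'timestamp<TAB>ip<TAB>url'.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # skip malformed lines instead of failing the whole task
        if len(fields) == 3 and fields[1] == target_ip:
            yield line
```

With zero reducers, whatever the mapper prints goes straight to the output directory, so the streaming script would simply end with `sys.stdout.writelines(filter_by_ip(sys.stdin, "123.123.123.123"))` (after `import sys`).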
Map-only jobs can be very useful in practice. For example, the Facetz.DCA platform identifies user characteristics from their behavior with a single large map-only job: each mapper takes a user's data as input and outputs that user's characteristics.
Combine
As I already wrote, the heaviest stage of a MapReduce job is usually the shuffle stage. This is because the intermediate results (the mappers' output) are written to disk, sorted and transferred over the network. However, there are tasks where this behavior is not very reasonable. For example, in the same word-count task you can pre-aggregate the outputs of several mappers on one node of the MapReduce job and send the reducer values already accumulated on each machine.
Figure: Combine (image taken from the link).
For this purpose Hadoop lets you define a combining function that processes the output of part of the mappers. The combining function is very similar to reduce: it takes the output of some of the mappers and produces an aggregated result for them, which is why the reducer is often reused as the combiner. An important difference from reduce is that not all values for a given key reach a single call of the combining function. Moreover, Hadoop does not guarantee that the combining function will be applied to a mapper's output at all (and it may also be applied more than once). Therefore a combiner is not always applicable: for example, it cannot be used to find the median value per key. Nevertheless, in tasks where a combiner is applicable, it can speed up a MapReduce job significantly.
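The difference between a sum and a median can be checked with a small local simulation. This is a sketch in plain Python, not Hadoop code: the hypothetical run_reduce function plays the reducer for a single key, each inner list stands for the values one mapper emitted for that key, and the optional combiner pre-aggregates each mapper's list before the reducer sees it.

```python
from statistics import median
from typing import Callable, List, Optional


def run_reduce(per_mapper: List[List[float]],
               reduce_fn: Callable[[List[float]], float],
               combine_fn: Optional[Callable[[List[float]], float]] = None) -> float:
    """Reduce the values of one key, optionally combining each mapper's output first."""
    if combine_fn is None:
        # no combiner: the reducer sees every value emitted for the key
        values = [v for mapper_values in per_mapper for v in mapper_values]
    else:
        # combiner: each mapper's values are pre-aggregated locally
        values = [combine_fn(mv) for mv in per_mapper if mv]
    return reduce_fn(values)


counts = [[1, 1, 1], [1], [1, 1]]      # occurrences of one word, seen by 3 mappers
values = [[1, 2, 100], [3], [4, 5]]    # values of one key, seen by 3 mappers

print(run_reduce(counts, sum), run_reduce(counts, sum, sum))              # 6 6
print(run_reduce(values, median), run_reduce(values, median, median))     # 3.5 3
```

The sum is the same with or without the combiner (and however many times it is applied), because addition is associative and commutative; the median of per-mapper medians is not the median of all values.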
Using a Combiner in Hadoop:
Native interface: when configuring the job, specify the Combiner class. As a rule, it is the same as the Reducer:
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
Hadoop streaming: pass the -combiner option on the command line. As a rule, its command is the same as the reducer command. Example:
hadoop jar hadoop-streaming.jar \
  -input input_dir \
  -output output_dir \
  -mapper "python mapper.py" \
  -reducer "python reducer.py" \
  -combiner "python reducer.py" \
  -file "mapper.py" \
  -file "reducer.py"
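Reusing the reducer as a combiner only works if the reducer's output has the same format as its input. Below is a minimal sketch of the logic of a word-count reducer.py for Hadoop streaming that satisfies this, assuming the mapper emits word<TAB>1 lines; the helper names parse and sum_counts are illustrative. Streaming delivers the lines sorted by key, so equal words arrive next to each other and can be grouped with itertools.groupby.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split 'word<TAB>count' lines into (word, count) pairs."""
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        yield word, int(count)


def sum_counts(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts of adjacent equal keys; the output format equals the input format."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield f"{word}\t{sum(count for _, count in group)}\n"
```

Because sum_counts emits word<TAB>count lines, its output can be fed to it again, which is exactly what happens when the same script runs first as -combiner and then as -reducer. In reducer.py it would be wired up as `sys.stdout.writelines(sum_counts(sys.stdin))`.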
MapReduce task chains
Sometimes a single MapReduce job is not enough to solve a problem. For example, consider a slightly modified WordCount task: given a set of text documents, count how many words occur in the collection from 1 to 1000 times, how many from 1001 to 2000 times, how many from 2001 to 3000 times, and so on.
To solve it we need 2 MapReduce jobs:
- A modified WordCount that, for each word, computes which interval its count falls into;
- A MapReduce job that counts how many times each interval occurs in the output of the first one.
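Pseudocode of the solution:
#map1
def map(doc):
    for word in doc.split():
        yield word, 1

#reduce1
def reduce(word, values):
    # interval index: 0 for counts 1..1000, 1 for 1001..2000, ...
    yield (sum(values) - 1) // 1000, 1

#map2
def map(line):
    interval, cnt = line.split()
    yield int(interval), int(cnt)

#reduce2
def reduce(interval, values):
    # the interval is identified by its lower bound: 1, 1001, 2001, ...
    yield interval * 1000 + 1, sum(values)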
To run a sequence of MapReduce jobs on Hadoop, it is enough to pass the output directory of the first job as the input directory of the second one and run the jobs one after another.
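To see how the output of the first job becomes the input of the second, the two jobs described above can be run locally with a tiny in-memory driver. This is a sketch under assumptions: run_mapreduce is a hypothetical helper that imitates map, shuffle (grouping by key) and reduce in a single process, and intervals are identified by their lower bound (1, 1001, 2001, ...).

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Imitate one MapReduce job in memory: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # a Hadoop reducer also sees its keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


def map1(doc):
    for word in doc.split():
        yield word, 1


def reduce1(word, ones):
    # interval index: 0 for counts 1..1000, 1 for 1001..2000, ...
    yield (sum(ones) - 1) // 1000, 1


def map2(pair):
    interval, one = pair  # one record of job 1's output
    yield interval, one


def reduce2(interval, ones):
    yield interval * 1000 + 1, sum(ones)  # key: lower bound of the interval


def count_words_per_interval(docs):
    intermediate = run_mapreduce(docs, map1, reduce1)  # job 1 writes its output...
    return run_mapreduce(intermediate, map2, reduce2)  # ...which job 2 reads as input
```

On a real cluster the `intermediate` list corresponds to the output directory of the first job.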
In practice, chains of MapReduce jobs can be quite complex sequences in which jobs are connected both in series and in parallel. To simplify managing such execution plans there are separate tools such as Oozie and Luigi, which will be covered in a separate article in this series.
Figure: an example of a chain of MapReduce jobs.
Distributed cache
An important mechanism in Hadoop is the Distributed Cache. It lets you add files (for example, text files, archives, jar files) to the environment in which a MapReduce job runs.
You can add files stored in HDFS as well as local files (local to the machine from which the job is launched). I have already implicitly shown how to use the Distributed Cache with Hadoop streaming: the mapper.py and reducer.py files were added via the -file option. In fact, you can add not only mapper.py and reducer.py but arbitrary files, and then use them as if they were in a local folder.
Using Distributed Cache:
Native API:
// Job configuration (old org.apache.hadoop.mapred API)
JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);

// Using the cached files in the mapper:
public static class MapClass extends MapReduceBase
    implements Mapper<K, V, K, V> {

  private Path[] localArchives;
  private Path[] localFiles;

  public void configure(JobConf job) {
    // Get the cached archives/files
    File f = new File("./map.zip/some/file/in/zip.txt");
  }

  public void map(K key, V value,
                  OutputCollector<K, V> output, Reporter reporter)
      throws IOException {
    // Use data from the cached archives/files here
    // ...
    output.collect(k, v);
  }
}
In Hadoop 2 the DistributedCache class is deprecated; equivalent methods are available on org.apache.hadoop.mapreduce.Job (for example, job.addCacheFile(uri)).
Hadoop streaming:
# Add files to the distributed cache with the -files option.
# -files is a generic option, so it must come before the streaming options.
yarn jar hadoop-streaming.jar \
  -files mapper.py,reducer.py,some_cached_data.txt \
  -input '/some/input/path' \
  -output '/some/output/path' \
  -mapper 'python mapper.py' \
  -reducer 'python reducer.py'
In mapper.py:
import sys

# read the cached file once; it is available in the task's working directory
with open('some_cached_data.txt') as f:
    data = f.read()

for line in sys.stdin:
    # process the input line,
    # using data here
    pass
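As a sketch of this pattern, here is the logic of a hypothetical word-count mapper that drops stop words listed in a file shipped to every node with -files. The file name stopwords.txt and both helper functions are assumptions for illustration; the point is that the cached file is read once when the task starts and then reused for every input line.

```python
from typing import Iterable, Iterator, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build the stop-word set from the cached file's lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def map_words(lines: Iterable[str], stopwords: Set[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word that is not a stop word."""
    for line in lines:
        for word in line.split():
            if word.lower() not in stopwords:
                yield f"{word}\t1\n"
```

In mapper.py this would be used as `with open('stopwords.txt') as f: stopwords = load_stopwords(f)` followed by `sys.stdout.writelines(map_words(sys.stdin, stopwords))`, with the job launched using `-files mapper.py,reducer.py,stopwords.txt` as in the command above.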
Reduce join
People who are used to relational databases often rely on the very convenient Join operation, which lets them process the contents of several tables together by combining them on some key. The same problem sometimes arises when working with big data. Consider the following example:
There are logs from two web servers; each log line has the form:
timestamp\tip\turl
Example of a log fragment:
1446792139 178.78.82.1 /sphingosine/unhurrying.css
1446792139 126.31.163.222 /accentually.js
1446792139 154.164.149.83 /pyroacid/unkemptly.jpg
1446792139 202.27.13.181 /Chawia.js
1446792139 67.123.248.174 /morphographical/dismain.css
1446792139 226.74.123.135 /phanerite.php
1446792139 157.109.106.104 /bisonant.css
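For each IP address, determine which of the 2 servers it visited more often. The result should have the form ip\t(first or second). Example of part of the result:
178.78.82.1 first
126.31.163.222 second
154.164.149.83 second
226.74.123.135 first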
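Unfortunately, unlike in relational databases, joining two logs by key (in this case, by IP address) is in general a fairly heavy operation, and it is solved with 3 MapReduce jobs and the Reduce Join pattern: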
Figure: ReduceJoin.
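ReduceJoin works as follows:
First, a separate MapReduce job (map-only) is run on each input log, converting the input data to the form:
key -> (type, value)
where key is the key on which the tables are joined, type is the table type (first or second in our case), and value is any additional data associated with the key.
Then the outputs of both jobs are fed as input to a 3rd MapReduce job, which performs the actual join. This job contains an empty Mapper that simply copies its input. Shuffle then groups the data by key and passes it to the reducer in the form: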
key -> [(type, value)]
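The important point is that the reducer now receives records from both logs, and the type field tells it which log each value came from. That is enough data to solve the original problem: in our case the reducer just has to count, for each key, which type occurs more often, and output that type.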
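The three jobs can be imitated in a few lines of Python to make the data flow concrete. This is a single-process sketch, not Hadoop code: tag_log plays the map-only job run on each log, group_by_key stands in for the identity mapper plus shuffle of the 3rd job, and join_reducer is its reducer. Log lines are assumed to be timestamp<TAB>ip<TAB>url; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def tag_log(lines: Iterable[str], log_type: str) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Map-only job: 'timestamp<TAB>ip<TAB>url' -> ip -> (type, url)."""
    for line in lines:
        _timestamp, ip, url = line.rstrip("\n").split("\t")
        yield ip, (log_type, url)


def group_by_key(pairs) -> Dict[str, List[Tuple[str, str]]]:
    """Identity mapper + shuffle: key -> [(type, value)]."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join_reducer(ip: str, tagged: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Output the log type (server) that has more records for this ip."""
    counts = Counter(log_type for log_type, _url in tagged)
    # on a tie, most_common returns the type that was encountered first
    return ip, counts.most_common(1)[0][0]


def reduce_join(first_log: Iterable[str], second_log: Iterable[str]) -> Dict[str, str]:
    tagged = list(tag_log(first_log, "first")) + list(tag_log(second_log, "second"))
    groups = group_by_key(tagged)
    return dict(join_reducer(ip, values) for ip, values in sorted(groups.items()))
```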
MapJoin
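The ReduceJoin pattern describes the general case of joining two logs by key. However, there is a special case in which the task can be simplified and sped up considerably: when one of the logs is much smaller than the other. Consider the following task:
There are 2 logs. The first is a web-server log (the same as in the previous task); the second, a file of about 100 KB, contains a URL -> topic mapping. Example of the 2nd file:
/toyota.php auto
/football/spartak.html sport
/cars auto
/finances/money business
For each IP address, determine the category of pages that were loaded most often from that IP address.
Here we also need to join the 2 logs, this time by URL. However, there is no need to run 3 MapReduce jobs, because the second log fits entirely in memory. To solve the task with a single MapReduce job, we can put the second log into the Distributed Cache and, when the Mapper is initialized, simply read it into memory as a url -> topic dictionary.
The task is then solved as follows: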
Map:
# find the topic of each page in the first log
input_line -> [ip, topic]
Reduce:
Ip -> [topics] -> [ip, most_popular_topic]
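Reduce receives an ip and the list of all its topics, and simply computes which topic occurred most often. Thus the task is solved with a single MapReduce job, and the Join itself happens entirely inside map (so if no additional aggregation by key were needed, a map-only job would be enough):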
Figure: MapJoin.
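A single-process sketch of the MapJoin version, assuming the same web-server log format (timestamp<TAB>ip<TAB>url) and a topic file with one whitespace-separated url/topic pair per line. load_topics stands for what the mapper would do once at initialization with the file taken from the Distributed Cache; the join itself is just a dictionary lookup inside map. Function names are hypothetical, and skipping URLs without a known topic is an assumption the task statement does not cover.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def load_topics(lines: Iterable[str]) -> Dict[str, str]:
    """Read the small URL -> topic file into memory."""
    topics = {}
    for line in lines:
        if line.strip():
            url, topic = line.split()
            topics[url] = topic
    return topics


def map_join(lines: Iterable[str], topics: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Map: input_line -> (ip, topic); the join is a lookup in the in-memory dict."""
    for line in lines:
        _timestamp, ip, url = line.rstrip("\n").split("\t")
        topic = topics.get(url)
        if topic is not None:  # assumption: pages without a known topic are skipped
            yield ip, topic


def most_popular_topic(log_lines: Iterable[str], topic_lines: Iterable[str]) -> Dict[str, str]:
    """Shuffle + reduce: ip -> [topics] -> most frequent topic."""
    topics = load_topics(topic_lines)
    by_ip = defaultdict(list)
    for ip, topic in map_join(log_lines, topics):
        by_ip[ip].append(topic)
    return {ip: Counter(ts).most_common(1)[0][0] for ip, ts in sorted(by_ip.items())}
```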
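In this article we looked at several patterns and techniques for solving problems with MapReduce, showed how to combine MapReduce jobs into chains, and how to join logs by key.
In the following articles we will look at the Hadoop architecture in more detail, as well as at tools that simplify working with MapReduce and help work around its shortcomings.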
Other articles in the series:
» Part 1: Principles of working with big data, the MapReduce paradigm
» Part 2: Hadoop
» Part 4: Hbase
IP- 2- . : \t. :
178.78.82.1 first 126.31.163.222 second 154.164.149.83 second 226.74.123.135 first
, , ( β IP-) 3- MapReduce Reduce Join:
ReduceJoin
ReduceJoin :
MapReduce (Map only), :
key -> (type, value)
key β , , Type β (first second ), Value β , .
MapReduce 3- MapReduce, , , . MapReduce Mapper, . shuffle :
key -> [(type, value)]
, type , . , . reducere , type type.
MapJoin
ReduceJoin . , . , , . :
2 . web-c ( ), ( 100) URL-> . 2- :
/toyota.php auto /football/spartak.html sport /cars auto /finances/money business
IP- IP- .
Join 2- URL. 3 MapReduce, . , 1- MapReduce, Distributed Cache, Mapper'a , -> topic.
:
Map:
# </em> input_line -> [ip, topic]
Reduce:
Ip -> [topics] -> [ip, most_popular_topic]
Reduce ip , , . 1- MapReduce, Join map ( β MapOnly job-):
MapJoin
MapReduce, , MapReduce- join- .
Hadoop, , MapReduce .
Youtube-
Β» 1: , MapReduce
Β» 2: Hadoop
Β» 4: Hbase
\t\t. :
1446792139 178.78.82.1 /sphingosine/unhurrying.css 1446792139 126.31.163.222 /accentually.js 1446792139 154.164.149.83 /pyroacid/unkemptly.jpg 1446792139 202.27.13.181 /Chawia.js 1446792139 67.123.248.174 /morphographical/dismain.css 1446792139 226.74.123.135 /phanerite.php 1446792139 157.109.106.104 /bisonant.css
IP- 2- . : \t. :
178.78.82.1 first 126.31.163.222 second 154.164.149.83 second 226.74.123.135 first
, , ( β IP-) 3- MapReduce Reduce Join:
ReduceJoin
ReduceJoin :
MapReduce (Map only), :
key -> (type, value)
key β , , Type β (first second ), Value β , .
MapReduce 3- MapReduce, , , . MapReduce Mapper, . shuffle :
key -> [(type, value)]
, type , . , . reducere , type type.
MapJoin
ReduceJoin . , . , , . :
2 . web-c ( ), ( 100) URL-> . 2- :
/toyota.php auto /football/spartak.html sport /cars auto /finances/money business
IP- IP- .
Join 2- URL. 3 MapReduce, . , 1- MapReduce, Distributed Cache, Mapper'a , -> topic.
:
Map:
# </em> input_line -> [ip, topic]
Reduce:
Ip -> [topics] -> [ip, most_popular_topic]
Reduce ip , , . 1- MapReduce, Join map ( β MapOnly job-):
MapJoin
MapReduce, , MapReduce- join- .
Hadoop, , MapReduce .
Youtube-
Β» 1: , MapReduce
Β» 2: Hadoop
Β» 4: Hbase
\t\t. :
1446792139 178.78.82.1 /sphingosine/unhurrying.css 1446792139 126.31.163.222 /accentually.js 1446792139 154.164.149.83 /pyroacid/unkemptly.jpg 1446792139 202.27.13.181 /Chawia.js 1446792139 67.123.248.174 /morphographical/dismain.css 1446792139 226.74.123.135 /phanerite.php 1446792139 157.109.106.104 /bisonant.css
IP- 2- . : \t. :
178.78.82.1 first 126.31.163.222 second 154.164.149.83 second 226.74.123.135 first
, , ( β IP-) 3- MapReduce Reduce Join:
ReduceJoin
ReduceJoin :
MapReduce (Map only), :
key -> (type, value)
key β , , Type β (first second ), Value β , .
MapReduce 3- MapReduce, , , . MapReduce Mapper, . shuffle :
key -> [(type, value)]
, type , . , . reducere , type type.
MapJoin
ReduceJoin . , . , , . :
2 . web-c ( ), ( 100) URL-> . 2- :
/toyota.php auto /football/spartak.html sport /cars auto /finances/money business
IP- IP- .
Join 2- URL. 3 MapReduce, . , 1- MapReduce, Distributed Cache, Mapper'a , -> topic.
:
Map:
# </em> input_line -> [ip, topic]
Reduce:
Ip -> [topics] -> [ip, most_popular_topic]
Reduce ip , , . 1- MapReduce, Join map ( β MapOnly job-):
MapJoin
MapReduce, , MapReduce- join- .
Hadoop, , MapReduce .
Youtube-
Β» 1: , MapReduce
Β» 2: Hadoop
Β» 4: Hbase
\t\t. :
1446792139 178.78.82.1 /sphingosine/unhurrying.css 1446792139 126.31.163.222 /accentually.js 1446792139 154.164.149.83 /pyroacid/unkemptly.jpg 1446792139 202.27.13.181 /Chawia.js 1446792139 67.123.248.174 /morphographical/dismain.css 1446792139 226.74.123.135 /phanerite.php 1446792139 157.109.106.104 /bisonant.css
IP- 2- . : \t. :
178.78.82.1 first 126.31.163.222 second 154.164.149.83 second 226.74.123.135 first
, , ( β IP-) 3- MapReduce Reduce Join:
ReduceJoin
ReduceJoin :
MapReduce (Map only), :
key -> (type, value)
key β , , Type β (first second ), Value β , .
MapReduce 3- MapReduce, , , . MapReduce Mapper, . shuffle :
key -> [(type, value)]
, type , . , . reducere , type type.
MapJoin
ReduceJoin . , . , , . :
2 . web-c ( ), ( 100) URL-> . 2- :
/toyota.php auto /football/spartak.html sport /cars auto /finances/money business
IP- IP- .
Join 2- URL. 3 MapReduce, . , 1- MapReduce, Distributed Cache, Mapper'a , -> topic.
:
Map:
# </em> input_line -> [ip, topic]
Reduce:
Ip -> [topics] -> [ip, most_popular_topic]
Reduce ip , , . 1- MapReduce, Join map ( β MapOnly job-):
MapJoin
MapReduce, , MapReduce- join- .
Hadoop, , MapReduce .
Youtube-
Β» 1: , MapReduce
Β» 2: Hadoop
Β» 4: Hbase