This article is a practical guide for beginner Hadoop administrators: building, initial configuration, and checking the health of a Hadoop installation. We will look at how to build Hadoop from source, configure it, start it, and verify that everything works as it should. You will not find the theoretical part here; if you have not come across Hadoop before and do not know what parts it consists of or how they interact, the official documentation is a good place to start.

Install the Oracle JDK and export JAVA_HOME:

cd ~
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u112-b15/jdk-8u112-linux-x64.tar.gz
tar xvf ~/jdk-8u112-linux-x64.tar.gz
mv ~/jdk1.8.0_112 /opt/java
echo "PATH=\"/opt/java/bin:\$PATH\"" >> ~/.bashrc
echo "export JAVA_HOME=\"/opt/java\"" >> ~/.bashrc

Install Apache Maven:

cd ~
wget http://apache.rediris.es/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
tar xvf ~/apache-maven-3.3.9-bin.tar.gz
mv ~/apache-maven-3.3.9 ~/maven
echo "PATH=\"/root/maven/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc

Install the toolchain and libraries needed to compile the native Hadoop libraries:

yum -y install gcc gcc-c++ autoconf automake libtool cmake
yum -y install zlib-devel openssl openssl-devel snappy snappy-devel bzip2 bzip2-devel protobuf protobuf-devel

Download the Hadoop 2.7.3 sources and build the distribution (with native libraries, skipping the tests):

cd ~
wget http://apache.rediris.es/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
tar -xvf ~/hadoop-2.7.3-src.tar.gz
mv ~/hadoop-2.7.3-src ~/hadoop-src
cd ~/hadoop-src
mvn package -Pdist,native -DskipTests -Dtar

Unpack the resulting distribution into /opt/hadoop and add it to PATH:

tar -C /opt -xvf ~/hadoop-src/hadoop-dist/target/hadoop-2.7.3.tar.gz
mv /opt/hadoop-* /opt/hadoop
echo "PATH=\"/opt/hadoop/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc

Point the Hadoop scripts at our JDK:

sed -i '1iJAVA_HOME=/opt/java' /opt/hadoop/etc/hadoop/hadoop-env.sh
sed -i '1iJAVA_HOME=/opt/java' /opt/hadoop/etc/hadoop/yarn-env.sh

Now write the configuration files. core-site.xml:

cat << EOF > /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>
</configuration>
EOF

hdfs-site.xml:

cat << EOF > /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/data/dfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/data/dfs/dn</value></property>
  <property><name>dfs.namenode.checkpoint.dir</name><value>/data/dfs/snn</value></property>
</configuration>
EOF

yarn-site.xml:

cat << EOF > /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>localhost</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>4</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-vcores</name><value>1</value></property>
  <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
  <property><name>yarn.nodemanager.local-dirs</name><value>/data/yarn</value></property>
  <property><name>yarn.nodemanager.log-dirs</name><value>/data/yarn/log</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
</configuration>
EOF
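A quick note on the YARN settings above: each NodeManager offers 4096 MB and 4 vcores to the scheduler, while a single container may request at most 1024 MB and 1 vcore, so one node can run roughly four maximum-size containers at a time. A back-of-the-envelope check (an illustration only, not the scheduler's exact accounting):

# Per-node container capacity implied by yarn-site.xml
echo "$(( 4096 / 1024 )) containers by memory"
echo "$(( 4 / 1 )) containers by vcores"

This matches the Number-of-Running-Containers values we will see per node later, once a large enough job is running.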
mapred-site.xml:

cat << EOF > /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>localhost:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>localhost:19888</value></property>
  <property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.8</value></property>
  <property><name>yarn.app.mapreduce.am.resource.cpu-vcores</name><value>1</value></property>
  <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1024</value></property>
  <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
  <property><name>mapreduce.map.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.map.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
  <property><name>mapreduce.reduce.cpu.vcores</name><value>1</value></property>
  <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
  <property><name>mapreduce.reduce.java.opts</name><value>-Djava.net.preferIPv4Stack=true -Xmx768m</value></property>
</configuration>
EOF

Create the data directory, format the NameNode and start the daemons:

mkdir /data
hadoop namenode -format
/opt/hadoop/sbin/hadoop-daemon.sh start namenode
/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

The installation directory now looks like this:

/opt/hadoop/
├── bin
├── etc
│   └── hadoop
├── include
├── lib
│   └── native
├── libexec
├── logs
├── sbin
└── share
    ├── doc
    │   └── hadoop
    └── hadoop
        ├── common
        ├── hdfs
        ├── httpfs
        ├── kms
        ├── mapreduce
        ├── tools
        └── yarn

Check what has appeared in HDFS:

hdfs dfs -ls -R /
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history/done_intermediate

And on the local file system under /data:

/data/
├── dfs
│   ├── dn
│   │   ├── current
│   │   │   ├── BP-1600342399-192.168.122.70-1483626613224
│   │   │   │   ├── current
│   │   │   │   │   ├── finalized
│   │   │   │   │   ├── rbw
│   │   │   │   │   └── VERSION
│   │   │   │   ├── scanner.cursor
│   │   │   │   └── tmp
│   │   │   └── VERSION
│   │   └── in_use.lock
│   └── nn
│       ├── current
│       │   ├── edits_inprogress_0000000000000000001
│       │   ├── fsimage_0000000000000000000
│       │   ├── fsimage_0000000000000000000.md5
│       │   ├── seen_txid
│       │   └── VERSION
│       └── in_use.lock
└── yarn
    ├── filecache
    ├── log
    ├── nmPrivate
    └── usercache

Upload a file to HDFS and check that it arrived:

hdfs dfs -put /var/log/messages /tmp/
hdfs dfs -ls /tmp/messages
-rw-r--r--   1 root supergroup     375974 2017-01-05 09:33 /tmp/messages

The DataNode directory now contains the block and its checksum file:

/data/dfs/dn
├── current
│   ├── BP-1600342399-192.168.122.70-1483626613224
│   │   ├── current
│   │   │   ├── finalized
│   │   │   │   └── subdir0
│   │   │   │       └── subdir0
│   │   │   │           ├── blk_1073741825
│   │   │   │           └── blk_1073741825_1001.meta
│   │   │   ├── rbw
│   │   │   └── VERSION
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock

Run a test MapReduce job (estimating Pi with 3 map tasks):

yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 3 100000
…
Job Finished in 37.837 seconds
Estimated value of Pi is 3.14168000000000000000
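Since log aggregation is enabled, the status and logs of the finished job can also be pulled from the command line. A small sketch, using the application ID that YARN assigned to the run above:

# List finished YARN applications
yarn application -list -appStates FINISHED

# Fetch the aggregated container logs for the Pi job
yarn logs -applicationId application_1483628783579_0001 | less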
While the job is running, the NodeManager's local directories under /data/yarn fill up with per-container logs, launch scripts and localized resources:

/data/yarn/
├── filecache
├── log
│   └── application_1483628783579_0001
│       ├── container_1483628783579_0001_01_000001
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       ├── container_1483628783579_0001_01_000002
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       ├── container_1483628783579_0001_01_000003
│       │   ├── stderr
│       │   ├── stdout
│       │   └── syslog
│       └── container_1483628783579_0001_01_000004
│           ├── stderr
│           ├── stdout
│           └── syslog
├── nmPrivate
│   └── application_1483628783579_0001
│       ├── container_1483628783579_0001_01_000001
│       │   ├── container_1483628783579_0001_01_000001.pid
│       │   ├── container_1483628783579_0001_01_000001.tokens
│       │   └── launch_container.sh
│       ├── container_1483628783579_0001_01_000002
│       │   ├── container_1483628783579_0001_01_000002.pid
│       │   ├── container_1483628783579_0001_01_000002.tokens
│       │   └── launch_container.sh
│       ├── container_1483628783579_0001_01_000003
│       │   ├── container_1483628783579_0001_01_000003.pid
│       │   ├── container_1483628783579_0001_01_000003.tokens
│       │   └── launch_container.sh
│       └── container_1483628783579_0001_01_000004
│           ├── container_1483628783579_0001_01_000004.pid
│           ├── container_1483628783579_0001_01_000004.tokens
│           └── launch_container.sh
└── usercache
    └── root
        ├── appcache
        │   └── application_1483628783579_0001
        │       ├── container_1483628783579_0001_01_000001
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── jobSubmitDir
        │       │   │   ├── job.split -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/12/job.split
        │       │   │   └── job.splitmetainfo -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/10/job.splitmetainfo
        │       │   ├── job.xml -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/13/job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       │       └── Jetty_0_0_0_0_37883_mapreduce____.rposvq
        │       │           └── webapp
        │       │               └── webapps
        │       │                   └── mapreduce
        │       ├── container_1483628783579_0001_01_000002
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── container_1483628783579_0001_01_000003
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── container_1483628783579_0001_01_000004
        │       │   ├── container_tokens
        │       │   ├── default_container_executor_session.sh
        │       │   ├── default_container_executor.sh
        │       │   ├── job.jar -> /data/yarn/usercache/root/appcache/application_1483628783579_0001/filecache/11/job.jar
        │       │   ├── job.xml
        │       │   ├── launch_container.sh
        │       │   └── tmp
        │       ├── filecache
        │       │   ├── 10
        │       │   │   └── job.splitmetainfo
        │       │   ├── 11
        │       │   │   └── job.jar
        │       │   │       └── job.jar
        │       │   ├── 12
        │       │   │   └── job.split
        │       │   └── 13
        │       │       └── job.xml
        │       └── work
        └── filecache

42 directories, 50 files

Once the application finishes, the per-container directories are cleaned up, and the logs disappear from the local disk because log aggregation moves them to HDFS:

/data/yarn/
├── filecache
├── log
├── nmPrivate
└── usercache
    └── root
        ├── appcache
        └── filecache
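As a side note, the block we saw earlier under /data/dfs/dn can also be traced from the HDFS side. A quick check of the uploaded file, assuming it is still in place:

# Show the file, its block IDs and the DataNodes holding the replicas
hdfs fsck /tmp/messages -files -blocks -locations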
After the run, HDFS contains the job history files and the aggregated container logs:

hdfs dfs -ls -R /
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging
drwxrwx---   - root supergroup          0 2017-01-05 10:07 /tmp/hadoop-yarn/staging/history
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000
-rwxrwx---   1 root supergroup      46338 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000/job_1483628783579_0001-1483629144632-root-QuasiMonteCarlo-1483629179995-3-1-SUCCEEDED-default-1483629156270.jhist
-rwxrwx---   1 root supergroup     117543 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done/2017/01/05/000000/job_1483628783579_0001_conf.xml
drwxrwxrwt   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/history/done_intermediate/root
drwx------   - root supergroup          0 2017-01-05 10:12 /tmp/hadoop-yarn/staging/root
drwx------   - root supergroup          0 2017-01-05 10:13 /tmp/hadoop-yarn/staging/root/.staging
drwxrwxrwt   - root supergroup          0 2017-01-05 10:12 /tmp/logs
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/logs/root
drwxrwx---   - root supergroup          0 2017-01-05 10:12 /tmp/logs/root/logs
drwxrwx---   - root supergroup          0 2017-01-05 10:13 /tmp/logs/root/logs/application_1483628783579_0001
-rw-r-----   1 root supergroup      65829 2017-01-05 10:13 /tmp/logs/root/logs/application_1483628783579_0001/master.local_37940
drwxr-xr-x   - root supergroup          0 2017-01-05 10:12 /user
drwxr-xr-x   - root supergroup          0 2017-01-05 10:13 /user/root

Now extend the single-node setup to a small cluster. Switch the configuration from localhost to the master's host name:

cd /opt/hadoop/etc/hadoop
sed -i 's/localhost/master.local/' core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml

All hosts are listed in /etc/hosts:

cat /etc/hosts
…
192.168.122.70 master.local
192.168.122.59 slave1.local
192.168.122.217 slave2.local

Clear the DataNode directory and start the daemons again on the master:

rm -rf /data/dfs/dn/*
/opt/hadoop/sbin/hadoop-daemon.sh start namenode
/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

Then start the DataNode and NodeManager on the slaves:

/opt/hadoop/sbin/hadoop-daemon.sh start datanode
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager

Check that all three DataNodes have registered with the NameNode:

hdfs dfsadmin -report
...
Live datanodes (3):
…
Name: 192.168.122.70:50010 (master.local)
...
Name: 192.168.122.59:50010 (slave1.local)
...
Name: 192.168.122.217:50010 (slave2.local)
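Another way to confirm that the expected daemons are running on every machine is jps from the JDK. A minimal sketch, assuming passwordless SSH from the master to the slaves and the JDK installed under /opt/java everywhere:

# Expect NameNode, DataNode, ResourceManager, NodeManager and JobHistoryServer on the master,
# and DataNode plus NodeManager on the slaves
for host in master.local slave1.local slave2.local; do
    echo "== $host =="
    ssh "$host" /opt/java/bin/jps
done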
YARN should also see all three NodeManagers:

yarn node -list -all
17/01/06 06:17:52 INFO client.RMProxy: Connecting to ResourceManager at master.local/192.168.122.70:8032
Total Nodes:3
           Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
slave2.local:39694         RUNNING  slave2.local:8042                             0
slave1.local:36880         RUNNING  slave1.local:8042                             0
master.local:44373         RUNNING  master.local:8042                             0
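The same picture is available over the ResourceManager REST API, which is handy for scripting health checks. For example (assuming the default web UI port 8088 and that curl is installed):

# Aggregate cluster metrics: active nodes, total/allocated memory and vcores, running applications
curl -s http://master.local:8088/ws/v1/cluster/metrics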
Finally, run a heavier job (30 map tasks) and watch how the containers are spread across the cluster:

yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 30 1000
...
           Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
slave2.local:39694         RUNNING  slave2.local:8042                             4
slave1.local:36880         RUNNING  slave1.local:8042                             4
master.local:44373         RUNNING  master.local:8042                             4

Each node is running four containers, which is exactly the per-node maximum implied by our YARN memory and vcore settings.
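As one more end-to-end check, you can run a different example job against the file uploaded earlier (assuming /tmp/messages is still in HDFS; the output path /tmp/wordcount-out is arbitrary and must not exist yet):

# Word count over the log file stored in HDFS
yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    wordcount /tmp/messages /tmp/wordcount-out

# Look at the output of the single reducer
hdfs dfs -cat /tmp/wordcount-out/part-r-00000 | head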
Source: https://habr.com/ru/post/319048/