In the previous article, "Apache Hadoop 2.5.0 セットアップ手順 その1 – ローカル実行からシングルノードクラスター起動まで" (setup part 1: from local execution to starting a single-node cluster), we walked through the Apache Hadoop 2.x setup following the official documentation. This time we will build a cluster distributed across multiple nodes.

The environment is Ubuntu 14.04.

Install Hadoop on each node

On each node, follow the steps in the previous article up to the "Installing Hadoop" section.
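
For reference, a minimal sketch of that install, assuming the 2.5.0 tarball is fetched from the Apache archive and a JDK is already installed as described in the previous article:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/hadoop-2.5.0.tar.gz
$ tar xzf hadoop-2.5.0.tar.gz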

Configure hostname resolution between the nodes

Add every node that joins the cluster to /etc/hosts on all nodes:

192.168.33.11 master
192.168.33.12 slave01


Set the hostname as well:

$ sudo hostname master

Also set it in /etc/hostname so that it persists across reboots:

master

Note
Delete or comment out the localhost line:

127.0.0.1 localhost

If this line is left in place, the following exception is thrown and the results are never reported back to the master:

java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "slave01/192.168.33.12"; destination host is: "master":9000; 

Configure passphrase-less ssh from the master to each slave node

The DataNode and NodeManager daemons on the slave nodes are started via passphrase-less ssh from the master node.

To set up access from master to slave01:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub
ssh-dss AAAAB3...jmyLDA== vagrant@master

Append the contents of id_dsa.pub to .ssh/authorized_keys on slave01:

$ echo "ssh-dss AAAAB3...jmyLDA== vagrant@master"  >> ~/.ssh/authorized_keys
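
Confirm from the master that you can now log in to the slave without being prompted for a passphrase:

$ ssh slave01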

Add the Hadoop environment variables

Set the following environment variables:

export HADOOP_HOME=/path/to/hadoop-2.5.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
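
These typically go in ~/.bashrc (or an equivalent shell profile) on every node; adding $HADOOP_HOME/bin and $HADOOP_HOME/sbin to PATH also lets the hadoop command and the start scripts be called without full paths:

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ source ~/.bashrc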

Hadoop configuration files

The configuration files are located under etc/hadoop.

Master node
List the hostnames of the slave nodes (the DataNode and NodeManager nodes) in etc/hadoop/slaves, one per line:

slave01
slave02

Common to all nodes

<?xml version="1.0" encoding="UTF-8"?>
...
<configuration>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
<?xml version="1.0" encoding="UTF-8"?>
...
<configuration>
   <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>     
        <name>dfs.permissions</name>     
        <value>false</value>
   </property>
</configuration>
<?xml version="1.0"?>
...

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
<?xml version="1.0"?>
...
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
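
Apart from etc/hadoop/slaves, these files are identical on every node, so one way to distribute them from the master is with scp (a sketch that assumes the same install path on all nodes):

$ scp $HADOOP_CONF_DIR/*.xml slave01:$HADOOP_CONF_DIR/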

Format HDFS

$ hadoop namenode -format
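
This only needs to be run once, on the master node. In Hadoop 2.x the hadoop namenode form is deprecated; the equivalent hdfs command is:

$ hdfs namenode -format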

Starting the Hadoop daemons

Starting the HDFS daemons
This launches the NameNode and SecondaryNameNode on the master node and a DataNode on each slave node.

$ $HADOOP_HOME/sbin/start-dfs.sh

Starting the YARN daemons
This launches the ResourceManager on the master node and a NodeManager on each slave node.

$ $HADOOP_HOME/sbin/start-yarn.sh
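
The matching stop scripts shut the cluster down again when needed:

$ $HADOOP_HOME/sbin/stop-yarn.sh
$ $HADOOP_HOME/sbin/stop-dfs.sh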

Checking the running daemons
Master node:

$ jps
3453 Jps
2838 NameNode
3196 ResourceManager
3047 SecondaryNameNode

Slave node:

$ jps
2793 NodeManager
2645 DataNode
2884 Jps
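
The web UIs are another quick way to check the cluster state, assuming the default ports:

http://master:50070/ (NameNode)
http://master:8088/ (ResourceManager)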

Running the MapReduce example

Run the grep example on the master node to confirm that the cluster works:

$ hadoop fs -mkdir /user
$ hadoop fs -mkdir /user/<username>
$ cd hadoop-2.5.0
$ hadoop fs -put etc/hadoop input
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
14/09/15 11:43:43 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.11:8032
14/09/15 11:43:44 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/09/15 11:43:44 INFO input.FileInputFormat: Total input paths to process : 26
14/09/15 11:43:44 INFO mapreduce.JobSubmitter: number of splits:26
14/09/15 11:43:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410780302807_0003
14/09/15 11:43:45 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/15 11:43:45 INFO impl.YarnClientImpl: Submitted application application_1410780302807_0003
14/09/15 11:43:45 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1410780302807_0003/
14/09/15 11:43:45 INFO mapreduce.Job: Running job: job_1410780302807_0003
14/09/15 11:44:00 INFO mapreduce.Job: Job job_1410780302807_0003 running in uber mode : false
14/09/15 11:44:00 INFO mapreduce.Job:  map 0% reduce 0%
14/09/15 11:44:51 INFO mapreduce.Job:  map 23% reduce 0%
14/09/15 11:45:31 INFO mapreduce.Job:  map 27% reduce 0%
14/09/15 11:45:32 INFO mapreduce.Job:  map 46% reduce 0%
14/09/15 11:46:08 INFO mapreduce.Job:  map 46% reduce 15%
14/09/15 11:46:17 INFO mapreduce.Job:  map 65% reduce 15%
14/09/15 11:46:18 INFO mapreduce.Job:  map 65% reduce 22%
14/09/15 11:46:54 INFO mapreduce.Job:  map 69% reduce 22%
14/09/15 11:46:55 INFO mapreduce.Job:  map 85% reduce 22%
14/09/15 11:46:58 INFO mapreduce.Job:  map 85% reduce 28%
14/09/15 11:47:25 INFO mapreduce.Job:  map 88% reduce 28%
14/09/15 11:47:26 INFO mapreduce.Job:  map 92% reduce 28%
14/09/15 11:47:27 INFO mapreduce.Job:  map 92% reduce 29%
14/09/15 11:47:30 INFO mapreduce.Job:  map 100% reduce 31%
14/09/15 11:47:32 INFO mapreduce.Job:  map 100% reduce 100%
14/09/15 11:47:32 INFO mapreduce.Job: Job job_1410780302807_0003 completed successfully
14/09/15 11:47:32 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=371
		FILE: Number of bytes written=2622985
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=65917
		HDFS: Number of bytes written=469
		HDFS: Number of read operations=81
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=27
		Launched reduce tasks=1
		Data-local map tasks=27
		Total time spent by all maps in occupied slots (ms)=1070384
		Total time spent by all reduces in occupied slots (ms)=119112
		Total time spent by all map tasks (ms)=1070384
		Total time spent by all reduce tasks (ms)=119112
		Total vcore-seconds taken by all map tasks=1070384
		Total vcore-seconds taken by all reduce tasks=119112
		Total megabyte-seconds taken by all map tasks=1096073216
		Total megabyte-seconds taken by all reduce tasks=121970688
	Map-Reduce Framework
		Map input records=1625
		Map output records=25
		Map output bytes=614
		Map output materialized bytes=521
		Input split bytes=3128
		Combine input records=25
		Combine output records=14
		Reduce input groups=12
		Reduce shuffle bytes=521
		Reduce input records=14
		Reduce output records=12
		Spilled Records=28
		Shuffled Maps =26
		Failed Shuffles=0
		Merged Map outputs=26
		GC time elapsed (ms)=12836
		CPU time spent (ms)=17720
		Physical memory (bytes) snapshot=5688336384
		Virtual memory (bytes) snapshot=21223825408
		Total committed heap usage (bytes)=3586301952
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=62789
	File Output Format Counters 
		Bytes Written=469
14/09/15 11:47:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.11:8032
14/09/15 11:47:32 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/09/15 11:47:32 INFO input.FileInputFormat: Total input paths to process : 1
14/09/15 11:47:32 INFO mapreduce.JobSubmitter: number of splits:1
14/09/15 11:47:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410780302807_0004
14/09/15 11:47:32 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/15 11:47:32 INFO impl.YarnClientImpl: Submitted application application_1410780302807_0004
14/09/15 11:47:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1410780302807_0004/
14/09/15 11:47:32 INFO mapreduce.Job: Running job: job_1410780302807_0004
14/09/15 11:47:49 INFO mapreduce.Job: Job job_1410780302807_0004 running in uber mode : false
14/09/15 11:47:49 INFO mapreduce.Job:  map 0% reduce 0%
14/09/15 11:48:00 INFO mapreduce.Job:  map 100% reduce 0%
14/09/15 11:48:09 INFO mapreduce.Job:  map 100% reduce 100%
14/09/15 11:48:10 INFO mapreduce.Job: Job job_1410780302807_0004 completed successfully
14/09/15 11:48:11 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=317
		FILE: Number of bytes written=193785
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=599
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=9597
		Total time spent by all reduces in occupied slots (ms)=7410
		Total time spent by all map tasks (ms)=9597
		Total time spent by all reduce tasks (ms)=7410
		Total vcore-seconds taken by all map tasks=9597
		Total vcore-seconds taken by all reduce tasks=7410
		Total megabyte-seconds taken by all map tasks=9827328
		Total megabyte-seconds taken by all reduce tasks=7587840
	Map-Reduce Framework
		Map input records=12
		Map output records=12
		Map output bytes=287
		Map output materialized bytes=317
		Input split bytes=130
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=317
		Reduce input records=12
		Reduce output records=12
		Spilled Records=24
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=162
		CPU time spent (ms)=1560
		Physical memory (bytes) snapshot=320286720
		Virtual memory (bytes) snapshot=1578164224
		Total committed heap usage (bytes)=168497152
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=469
	File Output Format Counters 
		Bytes Written=215

Check that the results were written correctly:

$ hadoop fs -cat output/*
6	dfs.audit.logger
4	dfs.class
3	dfs.server.namenode.
2	dfs.period
2	dfs.audit.log.maxfilesize
2	dfs.audit.log.maxbackupindex
1	dfsmetrics.log
1	dfsadmin
1	dfs.servers
1	dfs.replication
1	dfs.permissions
1	dfs.file
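
To run the example again, delete the output directory first; otherwise the job fails because the output directory already exists:

$ hadoop fs -rm -r output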

That completes building a cluster with a minimal configuration and running a MapReduce job on it.

Additional notes

NTP configuration is a must

If the clocks on the servers are out of sync, the following error is raised:

org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Unauthorized request to start container.
This token is expired. current time is 1410781184294 found 1410780933225

http://stackoverflow.com/questions/20257878/yarnexception-unauthorized-request-to-start-container
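
On Ubuntu 14.04, a minimal way to keep the clocks in sync is to install ntp on every node (a sketch; your environment may already use a different time-sync mechanism):

$ sudo apt-get install -y ntp
$ sudo service ntp restart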

hdfs-site.xml

dfs.name.dir (dfs.namenode.name.dir in Hadoop 2.x)
The NameNode metadata is stored under /tmp by default, so it is wiped on reboot.
This parameter should always be set explicitly.
dfs.data.dir (dfs.datanode.data.dir in Hadoop 2.x)
The DataNode block data is also stored under /tmp by default, so it is wiped on reboot.
This parameter should always be set explicitly, too.
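
As a sketch, the corresponding hdfs-site.xml entries could point these directories at persistent locations (the /var/hadoop paths are placeholders; adjust them to your environment):

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///var/hadoop/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///var/hadoop/hdfs/datanode</value>
    </property>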

References

Steps to install Hadoop 2.x release (Yarn or Next-Gen) on multi-node cluster