WARNING
This guide is a work in progress and does not currently result in a fully working Hadoop installation. Please see CentOS 6: Install Single-node Hadoop from Cloudera CDH instead.
Overview
A guide to setting up a single-node Hadoop cluster on CentOS using the Apache Bigtop repository.
Versions
- CentOS 6.3
- Oracle Java JDK 1.6
- Apache Bigtop 0.5.0
- Hadoop 2.0.2-alpha
Prerequisites
- A working Oracle Java JDK 1.6 installation (Hadoop requires Java; see the Versions list above)
Install
1. Download the yum repo file:
    sudo wget -O /etc/yum.repos.d/bigtop.repo http://www.apache.org/dist/bigtop/bigtop-0.5.0/repos/centos6/bigtop.repo
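To confirm yum picked up the new repo (an optional sanity check; the exact repo id comes from the downloaded file, so the grep pattern below is a guess):

    # The Bigtop repo should appear in the enabled repo list
    yum repolist | grep -i bigtop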
2. Install the Hadoop packages:
    # Quote the glob so the shell passes it to yum instead of expanding it locally
    sudo yum install 'hadoop*'
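To see exactly which packages came in (optional; a plain rpm query against the hadoop* package names):

    # List the installed Hadoop packages
    rpm -qa 'hadoop*' | sort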
Configure
Separate the directories where the namenode and datanode store their files.
1. Edit /etc/hadoop/conf/hdfs-site.xml and set the following properties to the values shown in the listing below:
- dfs.namenode.name.dir
- dfs.namenode.checkpoint.dir
- dfs.datanode.data.dir
    ...
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///var/lib/hadoop-hdfs/namenode/${user.name}/dfs/name</value>
    </property>
    <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>file:///var/lib/hadoop-hdfs/namenode/${user.name}/dfs/namesecondary</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///var/lib/hadoop-hdfs/datanode/${user.name}/dfs/data</value>
    </property>
    ...
Note: this step is not part of the official Apache Bigtop instructions, but it was required to avoid errors when running a datanode on the same machine as the namenode.
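If the directories referenced above do not exist yet, they may need to be created and handed to the hdfs user before formatting. This is an assumption based on the paths in the listing, not part of the Bigtop instructions; ${user.name} resolves to the daemon user at runtime, so only the parent directories are created here:

    # Create the storage roots referenced in hdfs-site.xml (hypothetical step)
    sudo mkdir -p /var/lib/hadoop-hdfs/namenode /var/lib/hadoop-hdfs/datanode
    # Assumes the HDFS daemons run as the hdfs user, per the Bigtop package setup
    sudo chown -R hdfs:hdfs /var/lib/hadoop-hdfs/namenode /var/lib/hadoop-hdfs/datanode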
2. Format the namenode
    sudo -u hdfs hadoop namenode -format
Output:
    ...
    13/03/18 03:26:48 INFO namenode.FSImage: Image file of size 119 saved in 0 seconds.
    13/03/18 03:26:48 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    13/03/18 03:26:48 INFO namenode.FileJournalManager: Purging logs older than 0
    13/03/18 03:26:48 INFO util.ExitUtil: Exiting with status 0
    13/03/18 03:26:48 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
    ************************************************************/
Note: formatting the datanode is not required; *however*, if you have a previous install, you may have to remove /var/lib/hadoop-hdfs/datanode to clear stale locks.
3. Start the Hadoop daemons
    sudo service hadoop-hdfs-namenode start
    sudo service hadoop-hdfs-datanode start
    sudo service hadoop-httpfs start
    sudo service hadoop-mapreduce-historyserver start
    sudo service hadoop-yarn-nodemanager start
    sudo service hadoop-yarn-proxyserver start
    sudo service hadoop-yarn-resourcemanager start
TODO: figure out why hadoop-hdfs-zkfc doesn't start (the ZKFC daemon is only used for an HA NameNode setup backed by ZooKeeper, so it may simply have nothing to do in a single-node configuration)
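A quick way to confirm the daemons actually came up, using the stock SysV status action (the service list simply mirrors the start commands above):

    # Report the status of each Hadoop daemon started above
    for svc in hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-httpfs \
               hadoop-mapreduce-historyserver hadoop-yarn-nodemanager \
               hadoop-yarn-proxyserver hadoop-yarn-resourcemanager; do
        sudo service "$svc" status
    done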
4. Start services on boot
    sudo chkconfig hadoop-hdfs-namenode on
    sudo chkconfig hadoop-hdfs-datanode on
    sudo chkconfig hadoop-httpfs on
    sudo chkconfig hadoop-mapreduce-historyserver on
    sudo chkconfig hadoop-yarn-nodemanager on
    sudo chkconfig hadoop-yarn-proxyserver on
    sudo chkconfig hadoop-yarn-resourcemanager on
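To double-check the boot-time registration (a plain chkconfig query):

    # Each hadoop service should show 2:on 3:on 4:on 5:on
    chkconfig --list | grep ^hadoop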
5. Optional: Create a home directory on HDFS
    sudo -u hdfs hadoop fs -mkdir /user
    sudo -u hdfs hadoop fs -mkdir /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER
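To verify the chown took effect, list the directory (note that $USER expands in your shell before sudo runs, so it refers to your login user rather than hdfs, which is what we want here):

    # The new home directory should be owned by your user
    sudo -u hdfs hadoop fs -ls /user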
6. Create (or edit) /etc/profile.d/hadoop.sh with the following contents:
    export HADOOP_HOME=/usr/lib/hadoop
    export HADOOP_VERSION=2.0.2-alpha
7. Load it into the current session:
    source /etc/profile.d/hadoop.sh
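A quick check that the variables landed in the session (the values come straight from the file above):

    # Should print: /usr/lib/hadoop 2.0.2-alpha
    echo $HADOOP_HOME $HADOOP_VERSION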
Test
1. Download the examples jar (the 2.0.2-alpha build is missing from the Maven repository for some reason, so the 2.0.3-alpha build is used instead):
    sudo wget -O /usr/lib/hadoop/hadoop-examples.jar http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/2.0.3-alpha/hadoop-mapreduce-examples-2.0.3-alpha.jar
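As a sanity check, running the jar with no arguments should print a usage message listing the bundled example programs rather than a Java error (this relies on the examples jar setting its driver class as the manifest Main-Class, which is an assumption about this particular build):

    # A valid examples jar lists programs such as pi and wordcount
    hadoop jar /usr/lib/hadoop/hadoop-examples.jar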
2. Get a recursive directory listing from HDFS:
    sudo -u hdfs hadoop fs -lsr /
3. Run one of the examples
    sudo -u hdfs hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 1000
TODO: while the cluster appears to be working, this example hangs. :[
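A hedged starting point for debugging the hang: check whether YARN accepted the job and whether it is stuck waiting for a container (this assumes the yarn CLI's application subcommand behaves in 2.0.2-alpha as it does in later 2.x releases; the ResourceManager web UI on its default port 8088 shows the same information):

    # If the application sits in ACCEPTED and never reaches RUNNING, the
    # NodeManager probably isn't offering enough memory for a container
    yarn application -list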
4. Navigate your browser to http://<hostname>:50070 (the namenode web UI)
5. Click on “Live Nodes”
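For a scriptable version of the Live Nodes check, the namenode's /jmx servlet can be queried (a sketch; the bean and metric names below match later 2.x releases and are an assumption for this alpha build):

    # NumLiveDataNodes should report 1 on a healthy single-node setup
    curl -s 'http://localhost:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' | grep NumLiveDataNodes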
Comments
You say, "change the following properties." I say, "to what?" I keep thinking I'm going to see a list of datanode hostnames or IP addresses for the secondary NN and the job tracker in one of these how-tos, but I never do.
Hey Mark! Not sure what your question is exactly, but the code listing below the property names shows what the properties should be set to. I'll admit it's not the clearest. If you have suggestions on how to present this type of file edit more clearly, I'm all ears. Also, this configuration is for single-node testing, so the namenode, datanode, etc. are all on the same box.
The stock Bigtop packages gave me an hdfs-site.xml file with the values you posted here. Probably this was not the case when you wrote this. Never mind.