Overview
A guide to setting up a single-node, pseudo-distributed Hadoop installation on CentOS using the Cloudera CDH4 yum repository.
Versions
- CentOS 6.4
- Oracle Java JDK 1.6
- CDH 4
- Hadoop 0.20 (MRv1, as packaged in CDH 4)
Prerequisites
- Oracle Java JDK 1.6 installed (CDH 4 requires the Oracle JDK)
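A quick way to confirm a suitable JDK is already on the PATH (the exact version string will vary by update level):

java -version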
Install
1. Download the yum repo file:
sudo wget -O /etc/yum.repos.d/cloudera-cdh4.repo http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo
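Optionally, import Cloudera's GPG key so yum can verify package signatures. The key URL below is inferred from the same repository layout, so treat it as an assumption and check it against Cloudera's documentation:

# Import the repository signing key (URL assumed from the repo layout)
sudo rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera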
2. Install the pseudo-distributed configuration package (this pulls in the Hadoop core packages as dependencies):

sudo yum install hadoop-0.20-conf-pseudo
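To see which files the package put in place (a plain rpm query, useful for locating the configuration under /etc/hadoop/conf):

rpm -ql hadoop-0.20-conf-pseudo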
Configure
1. Format the NameNode:

sudo -u hdfs hdfs namenode -format
Output:
...
13/03/26 15:24:01 INFO namenode.FSImage: Saving image file /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/03/26 15:24:01 INFO namenode.FSImage: Image file of size 119 saved in 0 seconds.
13/03/26 15:24:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/03/26 15:24:01 INFO util.ExitUtil: Exiting with status 0
13/03/26 15:24:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
2. Start the namenode/datanode services:

sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start
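A quick sanity check that the daemons came up, using the init scripts and jps (which ships with the JDK):

sudo service hadoop-hdfs-namenode status
sudo service hadoop-hdfs-datanode status
# Or list running JVMs; expect NameNode, SecondaryNameNode, and DataNode
sudo jps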
3. Optional: Start the services on boot:

sudo chkconfig hadoop-hdfs-namenode on
sudo chkconfig hadoop-hdfs-secondarynamenode on
sudo chkconfig hadoop-hdfs-datanode on
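To confirm the runlevel registrations took effect:

chkconfig --list | grep hadoop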
4. Create the base HDFS directories:

sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
sudo -u hdfs hadoop fs -mkdir /user
5. Create the map/reduce staging directories:

sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
6. Start the map/reduce services:

sudo service hadoop-0.20-mapreduce-jobtracker start
sudo service hadoop-0.20-mapreduce-tasktracker start
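As with the HDFS daemons, the init scripts can report whether the JobTracker and TaskTracker are running:

sudo service hadoop-0.20-mapreduce-jobtracker status
sudo service hadoop-0.20-mapreduce-tasktracker status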
7. Optional: Start the services on boot:

sudo chkconfig hadoop-0.20-mapreduce-jobtracker on
sudo chkconfig hadoop-0.20-mapreduce-tasktracker on
8. Optional: Create a home directory on HDFS for the current user:

sudo -u hdfs hadoop fs -mkdir /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
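With a home directory in place, the current user can work with HDFS without sudo. A minimal sketch (the file name is a placeholder):

# Copy a local file into the user's HDFS home and list it
echo "hello hadoop" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /user/$USER/
hadoop fs -ls /user/$USER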
9. Edit /etc/profile.d/hadoop.sh and add:

export HADOOP_HOME=/usr/lib/hadoop
10. Load it into the current session:

source /etc/profile.d/hadoop.sh
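Confirm the variable is set:

echo $HADOOP_HOME
# Expected: /usr/lib/hadoop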
Test
1. Get a recursive directory listing from HDFS:

sudo -u hdfs hadoop fs -ls -R /
Output:
drwxrwxrwt - hdfs supergroup 0 2012-04-19 15:14 /tmp
drwxr-xr-x - hdfs supergroup 0 2013-03-26 15:38 /user
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs
drwxr-xr-x - hdfs supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt - mapred supergroup 0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
Note: results will vary based on the user directories created.
2. Navigate a browser to http://<hostname>:50070 (the NameNode web UI)
3. Navigate a browser to http://<hostname>:50030 (the JobTracker web UI)
4. Run one of the bundled examples, e.g. the pi estimator:
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 1000
Output:
Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/03/26 15:48:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/03/26 15:48:05 INFO mapred.FileInputFormat: Total input paths to process : 10
13/03/26 15:48:05 INFO mapred.JobClient: Running job: job_201303261534_0001
13/03/26 15:48:06 INFO mapred.JobClient:  map 0% reduce 0%
13/03/26 15:48:11 INFO mapred.JobClient:  map 20% reduce 0%
13/03/26 15:48:13 INFO mapred.JobClient:  map 40% reduce 0%
13/03/26 15:48:15 INFO mapred.JobClient:  map 60% reduce 0%
13/03/26 15:48:16 INFO mapred.JobClient:  map 80% reduce 0%
13/03/26 15:48:18 INFO mapred.JobClient:  map 100% reduce 26%
13/03/26 15:48:21 INFO mapred.JobClient:  map 100% reduce 100%
13/03/26 15:48:21 INFO mapred.JobClient: Job complete: job_201303261534_0001
13/03/26 15:48:21 INFO mapred.JobClient: Counters: 33
13/03/26 15:48:21 INFO mapred.JobClient:   File System Counters
13/03/26 15:48:21 INFO mapred.JobClient:     FILE: Number of bytes read=226
13/03/26 15:48:21 INFO mapred.JobClient:     FILE: Number of bytes written=2016361
13/03/26 15:48:21 INFO mapred.JobClient:     FILE: Number of read operations=0
13/03/26 15:48:21 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/03/26 15:48:21 INFO mapred.JobClient:     FILE: Number of write operations=0
13/03/26 15:48:21 INFO mapred.JobClient:     HDFS: Number of bytes read=2390
13/03/26 15:48:21 INFO mapred.JobClient:     HDFS: Number of bytes written=215
13/03/26 15:48:21 INFO mapred.JobClient:     HDFS: Number of read operations=31
13/03/26 15:48:21 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/03/26 15:48:21 INFO mapred.JobClient:     HDFS: Number of write operations=3
13/03/26 15:48:21 INFO mapred.JobClient:   Job Counters
13/03/26 15:48:21 INFO mapred.JobClient:     Launched map tasks=10
13/03/26 15:48:21 INFO mapred.JobClient:     Launched reduce tasks=1
13/03/26 15:48:21 INFO mapred.JobClient:     Data-local map tasks=10
13/03/26 15:48:21 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=19270
13/03/26 15:48:21 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=10129
13/03/26 15:48:21 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/03/26 15:48:21 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/26 15:48:21 INFO mapred.JobClient:   Map-Reduce Framework
13/03/26 15:48:21 INFO mapred.JobClient:     Map input records=10
13/03/26 15:48:21 INFO mapred.JobClient:     Map output records=20
13/03/26 15:48:21 INFO mapred.JobClient:     Map output bytes=180
13/03/26 15:48:21 INFO mapred.JobClient:     Input split bytes=1210
13/03/26 15:48:21 INFO mapred.JobClient:     Combine input records=0
13/03/26 15:48:21 INFO mapred.JobClient:     Combine output records=0
13/03/26 15:48:21 INFO mapred.JobClient:     Reduce input groups=2
13/03/26 15:48:21 INFO mapred.JobClient:     Reduce shuffle bytes=280
13/03/26 15:48:21 INFO mapred.JobClient:     Reduce input records=20
13/03/26 15:48:21 INFO mapred.JobClient:     Reduce output records=0
13/03/26 15:48:21 INFO mapred.JobClient:     Spilled Records=40
13/03/26 15:48:21 INFO mapred.JobClient:     CPU time spent (ms)=4110
13/03/26 15:48:21 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2668306432
13/03/26 15:48:21 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=8668086272
13/03/26 15:48:21 INFO mapred.JobClient:     Total committed heap usage (bytes)=2210988032
13/03/26 15:48:21 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
13/03/26 15:48:21 INFO mapred.JobClient:     BYTES_READ=240
Job Finished in 16.69 seconds
Estimated value of Pi is 3.14080000000000000000
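If you want a second smoke test, the same examples jar includes a grep driver. A minimal sketch, run as a user with an HDFS home directory (step 8); the input/output paths here are placeholders:

# Stage some input, run the grep example, then read back the result
hadoop fs -mkdir /user/$USER/grep-input
hadoop fs -put /etc/hadoop/conf/*.xml /user/$USER/grep-input
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep /user/$USER/grep-input /user/$USER/grep-output 'dfs[a-z.]+'
hadoop fs -cat /user/$USER/grep-output/part-00000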