Category Archives: hadoop

[DEPRECATED] CentOS 6: Install Single-node Hadoop from Cloudera CDH

Overview

Guide for setting up a single-node Hadoop on CentOS using the Cloudera CDH repository.

Versions

  • CentOS 6.4
  • Oracle Java JDK 1.6
  • CDH 4
  • Hadoop 0.20 (MRv1)

Prerequisites

  • Oracle Java JDK 1.6 installed (see Versions above)

Install

1. Download the yum repo file:
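
The original listing was not preserved; at the time, Cloudera published the CDH 4 repo file for CentOS 6 on archive.cloudera.com, so the step was presumably along these lines:

    # fetch the Cloudera CDH 4 repo definition into yum's config directory
    sudo wget -O /etc/yum.repos.d/cloudera-cdh4.repo \
        http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo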

2. Install
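
A sketch assuming Cloudera's pseudo-distributed MRv1 setup; this metapackage pulls in the namenode, datanode, jobtracker, and tasktracker services used in the following steps:

    sudo yum install hadoop-0.20-conf-pseudo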

Configure

1. Format the name node
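
Presumably the stock CDH 4 command, run as the hdfs user:

    sudo -u hdfs hdfs namenode -format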

Output:

2. Start namenode/datanode services
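
Most likely via the init scripts installed by the CDH 4 packages:

    sudo service hadoop-hdfs-namenode start
    sudo service hadoop-hdfs-datanode start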

3. Optional: Start services on boot
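
On CentOS 6 this would be chkconfig:

    sudo chkconfig hadoop-hdfs-namenode on
    sudo chkconfig hadoop-hdfs-datanode on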

4. Create directories
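
The original commands were not preserved; CDH 4 setups conventionally create a world-writable /tmp on the hdfs, so presumably:

    sudo -u hdfs hadoop fs -mkdir /tmp
    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp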

5. Create map/reduce directories
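
A sketch following CDH 4's MRv1 conventions; the exact staging path is an assumption:

    sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
    sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
    sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred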

6. Start map/reduce services
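
Presumably the MRv1 init scripts:

    sudo service hadoop-0.20-mapreduce-jobtracker start
    sudo service hadoop-0.20-mapreduce-tasktracker start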

7. Optional: Start services on boot
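
Again via chkconfig:

    sudo chkconfig hadoop-0.20-mapreduce-jobtracker on
    sudo chkconfig hadoop-0.20-mapreduce-tasktracker on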

8. Optional: Create a home directory on the hdfs for the current user
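
Presumably the standard pattern, run as the hdfs superuser ($USER expands to the current login):

    sudo -u hdfs hadoop fs -mkdir -p /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER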

9. Edit /etc/profile.d/hadoop.sh
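
The file's contents were not preserved; a plausible sketch, with every path an assumption:

    # /etc/profile.d/hadoop.sh -- assumed contents; the original listing was lost
    export JAVA_HOME=/usr/java/default                         # assumption: Oracle JDK path
    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce   # CDH 4 MRv1 install dir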

10. Load into session
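
Most likely just sourcing the file:

    source /etc/profile.d/hadoop.sh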

Test

1. Get a directory listing from hadoop hdfs
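
A minimal check that the client can reach the namenode:

    hadoop fs -ls /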

Output:

Note: results will vary depending on which user directories have been created

2. Navigate browser to http://<hostname>:50070
(Screenshot: the Hadoop NameNode web UI at localhost:8020)

3. Navigate browser to http://<hostname>:50030
(Screenshot: the Hadoop Map/Reduce administration page)

4. Run one of the examples
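
Presumably the pi estimator from the bundled MRv1 examples jar; the jar path is CDH 4's conventional location, and the arguments (10 maps, 100 samples each) are arbitrary:

    hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 100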

Output:

Sources

[DEPRECATED] CentOS 6: Install Hadoop from Apache Bigtop

WARNING

This guide is a work-in-progress and currently does not result in a fully working Hadoop installation. Please see CentOS 6: Install Single-node Hadoop from Cloudera CDH instead.

Overview

Guide for setting up a single-node Hadoop on CentOS using the Apache Bigtop repository.

Versions

  • CentOS 6.3
  • Oracle Java JDK 1.6
  • Apache Bigtop 0.5.0
  • Hadoop 2.0.2-alpha

Prerequisites

  • Oracle Java JDK 1.6 installed (see Versions above)

Install

1. Download the yum repo file:
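
The original listing was not preserved; Bigtop 0.5.0 published per-distribution repo files under its release tree (now on archive.apache.org), so presumably:

    sudo wget -O /etc/yum.repos.d/bigtop.repo \
        http://archive.apache.org/dist/bigtop/bigtop-0.5.0/repos/centos6/bigtop.repo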

2. Install
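
Likely Bigtop's documented catch-all, which also pulls in hadoop-hdfs-zkfc (mentioned in the TODO under step 3 of Configure):

    sudo yum install hadoop\*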

Configure

Separate where the namenode and datanode store their files

1. Edit /etc/hadoop/conf/hdfs-site.xml and change the following properties (a sketch of the resulting listing appears after the note below):

  • dfs.namenode.name.dir
  • dfs.namenode.checkpoint.dir
  • dfs.datanode.data.dir

Note: this step is not part of the official Apache Bigtop instructions, but was required to avoid errors when running a datanode on the same machine as the namenode.
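
A sketch of the changed properties (they belong inside the <configuration> element). The datanode path matches the directory mentioned in the note under step 2; the namenode and checkpoint paths are assumptions:

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///var/lib/hadoop-hdfs/namenode</value>    <!-- assumed path -->
    </property>
    <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>file:///var/lib/hadoop-hdfs/checkpoint</value>  <!-- assumed path -->
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///var/lib/hadoop-hdfs/datanode</value>    <!-- path from the note under step 2 -->
    </property>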

2. Format the name node
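
Presumably the stock Hadoop 2.x command, run as the hdfs user:

    sudo -u hdfs hdfs namenode -format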

Output:

Note: formatting the datanode is not required; *however*, if you have a previous install, you may have to remove /var/lib/hadoop-hdfs/datanode to clear stale locks

3. Start hadoop namenode and datanode
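
Presumably via the init scripts installed by the packages:

    sudo service hadoop-hdfs-namenode start
    sudo service hadoop-hdfs-datanode start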

TODO: figure out why hadoop-hdfs-zkfc doesn’t start

4. Start services on boot
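
On CentOS 6, presumably chkconfig again:

    sudo chkconfig hadoop-hdfs-namenode on
    sudo chkconfig hadoop-hdfs-datanode on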

5. Optional: Create a home directory on the hdfs
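
Presumably the same pattern as in the CDH guide ($USER expands to the current login):

    sudo -u hdfs hadoop fs -mkdir -p /user/$USER
    sudo -u hdfs hadoop fs -chown $USER /user/$USER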

6. Edit /etc/profile.d/hadoop.sh
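
The file's contents were not preserved; a plausible sketch, with every path an assumption:

    # /etc/profile.d/hadoop.sh -- assumed contents; the original listing was lost
    export JAVA_HOME=/usr/java/default    # assumption: Oracle JDK path
    export HADOOP_HOME=/usr/lib/hadoop    # assumption: Bigtop install dir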

7. Load into session
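
Most likely just sourcing the file:

    source /etc/profile.d/hadoop.sh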

Test

1. Download the examples jar (for some reason it is missing from the 2.0.2-alpha packages)
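
The original command was not preserved; the jar is published on Maven Central, so presumably something like:

    wget http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/2.0.2-alpha/hadoop-mapreduce-examples-2.0.2-alpha.jar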

2. Get a directory listing from hadoop hdfs
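
A minimal check that the client can reach the namenode:

    hadoop fs -ls /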

3. Run one of the examples
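
Presumably the pi estimator from the jar downloaded in step 1; the arguments (10 maps, 100 samples each) are arbitrary:

    hadoop jar hadoop-mapreduce-examples-2.0.2-alpha.jar pi 10 100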

TODO: while the cluster appears to be working, this example hangs. :[

4. Navigate browser to http://<hostname>:50070
(Screenshot: the Hadoop NameNode web UI at localhost:8020)

5. Click on “Live Nodes”
(Screenshot: the Live Nodes page of the NameNode web UI)

Sources