How To Set Up a Hadoop Cluster

In this tutorial I will show you the steps required to set up a multi-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Linux-based operating systems.

What is Hadoop?
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Source : http://en.wikipedia.org/wiki/Hadoop

I will guide you through each of the steps required to set up a multi-node cluster.

STEP 1
To set up Hadoop we need some prerequisites.

1. Download and configure the JDK
    Java 1.6.x is recommended.

2. Download Hadoop
    Download the latest stable Hadoop release from the Apache Hadoop website.

All nodes must run the same version of the JDK and of Hadoop core.

STEP 2
Establish Authentication among nodes


Suppose a user on node_A wants to log in to a remote node_B using SSH. By default, node_B asks for a password. Entering a password every time the master node needs to operate on a slave node is impractical, so we use public key authentication instead. Every node generates a public/private key pair, and node_A can log in to node_B without a password only if node_B holds a copy of node_A's public key. In a Hadoop cluster, every slave node must have a copy of the master node's public key.

To do this, log in to each node and run the following command.
ssh-keygen -t rsa
When prompted, simply press Enter to accept the defaults. Two files, "id_rsa" and "id_rsa.pub", are then created under /home/username/.ssh/
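
If you have many nodes, pressing Enter at every prompt gets tedious. Here is a minimal sketch of the same key generation without prompts. The function name generate_key is my own, and the empty passphrase (-N "") is an assumption that matches pressing Enter at the passphrase prompt.

```shell
# generate_key KEY_FILE
# Creates an RSA key pair at KEY_FILE (and KEY_FILE.pub) unless it already exists.
generate_key() {
    key_file=$1
    mkdir -p "$(dirname "$key_file")"
    chmod 700 "$(dirname "$key_file")"
    if [ ! -f "$key_file" ]; then
        # -N "" sets an empty passphrase, -f names the output file, -q keeps it quiet
        ssh-keygen -t rsa -N "" -f "$key_file" -q
    fi
}

# Usage:
#   generate_key "$HOME/.ssh/id_rsa"
```

Because the function skips generation when the key file already exists, running it twice does not overwrite an existing key.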

Now log in to the master node and run the following commands.

  • cat /home/username/.ssh/id_rsa.pub >> /home/username/.ssh/authorized_keys
  • scp /home/username/.ssh/id_rsa.pub ip_address_of_slavenode:/home/username/.ssh/master.pub
Then log in to each slave node and run the following command.
cat /home/username/.ssh/master.pub >> /home/username/.ssh/authorized_keys
Then log back in to the master node and run the following command to test whether the master node can log in to a slave node without a password.
ssh ip_address_of_slave_node
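
Running the cat ... >> commands above more than once adds the same key again each time. The sketch below appends a key only if it is not already present and also tightens the file permissions, since sshd commonly ignores an authorized_keys file that other users can write to. The function name append_key_once is hypothetical, not part of SSH or Hadoop.

```shell
# append_key_once PUBKEY_FILE AUTH_FILE
# Appends the key in PUBKEY_FILE to AUTH_FILE unless that exact line is already there.
append_key_once() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: match whole lines, -F: fixed strings, -f: read the patterns from a file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 700 "$(dirname "$auth_file")"
    chmod 600 "$auth_file"
}

# Usage on a slave node, after master.pub has been copied over:
#   append_key_once "$HOME/.ssh/master.pub" "$HOME/.ssh/authorized_keys"
```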

STEP 5
In this step we install Hadoop on each slave node. Download Hadoop, extract it to a directory, and set the HADOOP_INSTALL variable to that directory.
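
As a rough sketch, the extract-and-set step could look like this. The helper name extract_hadoop, the tarball name hadoop-x.y.z.tar.gz, and the target directory are all placeholders of mine. The sketch also assumes the archive unpacks into a single top-level directory.

```shell
# extract_hadoop TARBALL DEST_DIR
# Unpacks the tarball into DEST_DIR and prints the top-level directory it created.
extract_hadoop() {
    tarball=$1
    dest=$2
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest"
    # The first entry of the archive names its top-level directory.
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    echo "$dest/$top"
}

# Usage (placeholder file name and target directory):
#   HADOOP_INSTALL=$(extract_hadoop hadoop-x.y.z.tar.gz "$HOME/opt")
#   export HADOOP_INSTALL
#   export PATH="$PATH:$HADOOP_INSTALL/bin"
```

Putting the two export lines in your shell profile keeps the variables set after you log out and back in.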

STEP 6
Hadoop Configuration

Set the JAVA_HOME and HADOOP_INSTALL system variables.

Modify "hadoop-env.sh" in the conf/ directory of your Hadoop installation (HADOOP_INSTALL/conf/). Remove the leading '#' from the JAVA_HOME line under the comment "The java implementation to use" and fill in the path to your JDK.
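
If you would rather not edit the file by hand on every node, the change can be scripted. This is a sketch only: set_java_home is a name I made up, and it assumes hadoop-env.sh contains a line of the form "# export JAVA_HOME=..." and that the JDK path has no '|' or '&' characters in it.

```shell
# set_java_home ENV_FILE JAVA_HOME_PATH
# Uncomments (or rewrites) the "export JAVA_HOME=" line in ENV_FILE.
set_java_home() {
    env_file=$1
    java_home=$2
    tmp=$(mktemp)
    # Match the line whether or not it is still commented out.
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$tmp"
    cat "$tmp" > "$env_file"   # write back in place, keeping the file's permissions
    rm -f "$tmp"
}

# Usage (the JDK path is a placeholder):
#   set_java_home "$HADOOP_INSTALL/conf/hadoop-env.sh" /path/to/your/jdk
```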

Modify hdfs-site.xml, mapred-site.xml and core-site.xml to match the files in the archive below.
Download Link : http://hotfile.com/dl/85903416/b760647/XMLs.tar.gz.html
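
To give an idea of what these three files contain, here is a minimal sketch that writes one property into each. These are not the files from the archive. The property names are the ones used by Hadoop 0.20 and 1.x releases, while the host name "master", the ports 9000 and 9001, the replication factor 2, and the helper write_site_xml are all assumptions of mine.

```shell
# write_site_xml FILE NAME VALUE
# Writes a Hadoop configuration file containing a single property.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Assumed values: the host name "master" and the ports are placeholders.
CONF_DIR="${CONF_DIR:-./conf-example}"
mkdir -p "$CONF_DIR"
write_site_xml "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://master:9000
write_site_xml "$CONF_DIR/mapred-site.xml" mapred.job.tracker master:9001
write_site_xml "$CONF_DIR/hdfs-site.xml"   dfs.replication    2
```

The sketch writes into a scratch directory so that you can inspect the result before copying anything into the conf/ directory of a real installation.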

STEP 7
Start Hadoop

First you have to format the namenode. On the master node, run
hadoop namenode -format
Then start the cluster, also from the master node.
start-all.sh
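
Once the cluster has started, you can check that the daemons are actually running with jps, the process lister that ships with the JDK. The helper below is a sketch (missing_daemons is my own name). It assumes a Hadoop 0.20 or 1.x setup with the default layout, where the master runs NameNode, SecondaryNameNode and JobTracker and each slave runs DataNode and TaskTracker.

```shell
# missing_daemons EXPECTED...
# Reads jps output ("<pid> <name>" per line) on stdin and prints every
# expected daemon name that does not appear in it.
missing_daemons() {
    running=$(awk '{print $2}')
    for daemon in "$@"; do
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Usage on the master node:
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
# Usage on a slave node:
#   jps | missing_daemons DataNode TaskTracker
```

If the function prints nothing, every daemon you listed was found. If it prints a name, the log files of that daemon are the first place to look.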


Some useful links.
NameNode web interface: http://ip_add_of_namenode:50070
JobTracker web interface: http://ip_add_of_jobtracker:50030
TaskTracker web interface: http://ip_add_of_map_reduce:50060
