What is Hadoop?
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Source: http://en.wikipedia.org/wiki/Hadoop
In this tutorial I will guide you through the steps required to set up a multi-node cluster.
STEP 1
Setting up Hadoop requires a few prerequisites.
1. Download and configure the JDK
Java 1.6.x is recommended.
2. Download Hadoop
Download the latest stable release of Hadoop.
All nodes must run the same version of the JDK and of Hadoop core.
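One way to check the version requirement above is to collect the version string each node reports and compare them. The sketch below assumes a hypothetical helper named all_same_version that reads "node version" pairs on standard input; the helper, its name and the node names in the comment are illustrative and not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: read lines of the form "node version" on stdin and
# succeed only if every line carries the same version string.
all_same_version() {
    awk '{ seen[$2] = 1 }
         END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Example use (node names are placeholders; needs SSH access to each node):
#   for node in node_A node_B; do
#       printf '%s %s\n' "$node" \
#           "$(ssh "$node" 'java -version 2>&1 | head -n 1' | tr ' ' '_')"
#   done | all_same_version && echo "JDK versions match"
```

The same loop could call "hadoop version" instead of "java -version" to compare the Hadoop releases.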
STEP 2
Establish Authentication among nodes
Suppose a user on node_A wants to log in to a remote node_B using SSH. By default, node_B asks for a password to authenticate the user. Entering a password every time the master node needs to operate on a slave node is impractical, so we use public key authentication instead. Every node generates a pair of keys, one public and one private, and node_A can log in to node_B without a password only if node_B holds a copy of node_A's public key. In a Hadoop cluster, every slave node must hold a copy of the master node's public key.
To do this, log in to each node and run the following command.
ssh-keygen -t rsa
When prompted, simply press Enter to accept the defaults. Two files, "id_rsa" and "id_rsa.pub", are then created under /home/username/.ssh/.
Now log in to the master node and run the following commands.
cat /home/username/.ssh/id_rsa.pub >> /home/username/.ssh/authorized_keys
scp /home/username/.ssh/id_rsa.pub ip_address_of_slavenode:/home/username/.ssh/master.pub
Then log in to each slave node and run the following command.
cat /home/username/.ssh/master.pub >> /home/username/.ssh/authorized_keys
Then log back in to the master node and run the following command to test whether the master node can log in to the slave node without a password.
ssh ip_address_of_slave_node
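With more than a couple of slave nodes, repeating the copy by hand gets tedious. The following is a minimal sketch of a loop that does the same thing in one pass from the master node. The variables SLAVES, PUBKEY and DRY_RUN and the function push_key are hypothetical names chosen for this example; by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch: append the master's public key to each slave's authorized_keys.
# SLAVES, PUBKEY and DRY_RUN are hypothetical variables for this example.
PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

# Command executed on the slave: create ~/.ssh if needed, append the key
# read from stdin, and keep the file private so sshd will accept it.
REMOTE_CMD='mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

push_key() {
    slave="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh $slave \"$REMOTE_CMD\" < $PUBKEY"
    else
        ssh "$slave" "$REMOTE_CMD" < "$PUBKEY"
    fi
}

# SLAVES is a space-separated list of slave addresses; empty means no-op.
for slave in ${SLAVES:-}; do
    push_key "$slave"
done
```

Saved under a name such as push_keys.sh (an arbitrary choice), it could be run as DRY_RUN=0 SLAVES="ip_address_of_slavenode" sh push_keys.sh, which asks for each slave's password one last time. Afterwards, ssh -o BatchMode=yes ip_address_of_slave_node true fails instead of prompting when key authentication is not working, which makes the final check scriptable.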
STEP 5
In this step we install Hadoop on each slave node. Download Hadoop, extract it to a directory, and set the HADOOP_INSTALL variable to that directory.
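The extraction step might look like the sketch below. It assumes the release is a gzipped tarball containing a single top-level directory; install_hadoop is a hypothetical helper name, and the tarball name and target directory in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Hadoop tarball into a target directory and
# print the resulting path, which can then be used as HADOOP_INSTALL.
install_hadoop() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    # Assumption: the archive has exactly one top-level directory.
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    echo "$dest/$top"
}

# Example use (file name and directory are placeholders):
#   HADOOP_INSTALL=$(install_hadoop hadoop-x.y.z.tar.gz "$HOME/hadoop")
#   export HADOOP_INSTALL
#   export PATH="$PATH:$HADOOP_INSTALL/bin"
```

Adding the two export lines to the shell startup file on every node keeps the variable set across logins.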
STEP 6
Hadoop Configuration
Set the JAVA_HOME and HADOOP_INSTALL environment variables.
Edit "hadoop-env.sh" in HADOOP_HOME/conf/ (the conf directory of your Hadoop installation). Under the comment "The java implementation to use", delete the leading '#' from the JAVA_HOME line and fill in the path to your JDK.
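The edit to hadoop-env.sh can also be scripted, which helps when the same change has to be made on every node. The sketch below assumes the file contains a commented-out line starting with "# export JAVA_HOME="; set_java_home is a hypothetical helper name and the paths in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the JAVA_HOME line in hadoop-env.sh and
# point it at the given JDK directory. The JDK path must not contain '|',
# because '|' is used as the sed delimiter.
set_java_home() {
    env_file="$1"
    jdk="$2"
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$env_file" \
        > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
}

# Example use (both paths are placeholders):
#   set_java_home "$HADOOP_INSTALL/conf/hadoop-env.sh" /path/to/jdk
```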
Modify hdfs-site.xml, mapred-site.xml and core-site.xml to match the sample files in the following archive.
Download link: http://hotfile.com/dl/85903416/b760647/XMLs.tar.gz.html
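For readers who cannot fetch the archive, the sketch below writes a minimal version of the three files. It is not a copy of the archive's contents: the master host name, the ports 9000 and 9001 and the replication factor of 2 are assumptions made for illustration, and write_site_configs is a hypothetical helper name. The property names are the ones used by Hadoop releases that keep their configuration in conf/.

```shell
#!/bin/sh
# Hypothetical helper: write minimal site files into a conf directory.
# Host name, ports and replication factor are illustrative assumptions.
write_site_configs() {
    conf_dir="$1"
    master="$2"
    mkdir -p "$conf_dir" || return 1

    # core-site.xml: where clients and daemons find the NameNode
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$master:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: how many copies of each block HDFS keeps
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

    # mapred-site.xml: where TaskTrackers find the JobTracker
    cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>$master:9001</value>
  </property>
</configuration>
EOF
}

# Example use (overwrites existing files; the host name is a placeholder):
#   write_site_configs "$HADOOP_INSTALL/conf" ip_address_of_masternode
```

The same files should be present on every node. Hadoop releases of this generation also read the list of slave hosts from conf/slaves, one per line, when start-all.sh runs.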
STEP 7
Start Hadoop
First, format the namenode by running the following command.
hadoop namenode -format
Then start the cluster.
start-all.sh
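After start-all.sh returns, the JDK's jps tool lists the Java processes running on a node, which gives a quick way to see whether the expected daemons came up. The sketch below assumes a hypothetical helper named missing_daemons that reads jps output on standard input and prints every expected daemon it does not find.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <name>" per line) on stdin
# and print each expected daemon name (given as arguments) that is absent.
missing_daemons() {
    jps_out=$(cat)
    for daemon in "$@"; do
        echo "$jps_out" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Example use on the master node:
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
# Example use on a slave node:
#   jps | missing_daemons DataNode TaskTracker
```

No output means every listed daemon is running. If a name is printed, the log files of that daemon on the affected node are the first place to look.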
Some useful links:
http://ip_add_of_namenode:50070 (NameNode web interface)
http://ip_add_of_jobtracker:50030 (JobTracker web interface)
http://ip_add_of_map_reduce:50060 (TaskTracker web interface)