
Configuration
=============
1. Set HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME.
   export HADOOP_HOME=/Users/nkutuzov/Projects/hadoop/hadoop-2.2.0
   export PATH=${PATH}:$HADOOP_HOME/bin
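
   A quick way to confirm the variables took effect in the current shell
   (paths are the example values from above; adjust to your install):

   ```shell
   # Example values from step 1; adjust HADOOP_HOME to your installation.
   export HADOOP_HOME=/Users/nkutuzov/Projects/hadoop/hadoop-2.2.0
   export PATH=${PATH}:$HADOOP_HOME/bin
   # Confirm the bin directory is on PATH; "hadoop version" should work afterwards.
   echo "$PATH" | grep -q "$HADOOP_HOME/bin" && echo "PATH OK"
   ```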
2. etc/hadoop/core-site.xml

      <configuration>
         <property>
             <name>fs.defaultFS</name>
             <value>hdfs://localhost:9000</value>
         </property>
         
         <property>
             <name>hadoop.security.authorization</name>
             <value>true</value>
         </property>
         
         <property>
           <name>hadoop.security.authentication</name>
            <value>simple</value> <!-- "simple" disables authentication; set "kerberos" to enable it -->
         </property>
      </configuration>
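
   A typo in any of these site files causes cryptic start-up failures, so a
   well-formedness check is worthwhile after editing. A self-contained sketch
   (it validates a throwaway copy of the fragment above; in practice run the
   last line against etc/hadoop/core-site.xml):

   ```shell
   # Write a throwaway copy of the core-site.xml fragment for illustration.
   CONF=$(mktemp)
   cat > "$CONF" <<'EOF'
   <configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
      </property>
   </configuration>
   EOF
   # Parse it; any XML syntax error aborts with a traceback.
   python3 -c "import sys, xml.dom.minidom as m; m.parse(sys.argv[1])" "$CONF" && echo well-formed
   ```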

3. etc/hadoop/hdfs-site.xml

      <configuration>
         <property>
             <name>dfs.replication</name>
             <value>1</value>
         </property>
         <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/Users/nkutuzov/Projects/hadoop/data/dfs/namenode</value>
         </property>
         <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/Users/nkutuzov/Projects/hadoop/data/dfs/datanode</value>
         </property>
         
         <property>
             <name>dfs.namenode.http-address</name>
             <value>localhost:50070</value>
         </property>
      </configuration>

  Note: the namenode and datanode directories listed above must exist; create them before formatting the namenode.
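
  A sketch of creating them, with an illustrative DATA_DIR (substitute the
  parent of the dfs.namenode.name.dir / dfs.datanode.data.dir values above):

  ```shell
  # DATA_DIR is illustrative; use the parent of the paths you configured.
  DATA_DIR="${DATA_DIR:-/tmp/hadoop-data}"
  mkdir -p "$DATA_DIR/dfs/namenode" "$DATA_DIR/dfs/datanode"
  ls "$DATA_DIR/dfs"
  ```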

4. etc/hadoop/yarn-site.xml

      <configuration>
         <!-- Site specific YARN configuration properties -->
         <property>
             <name>yarn.nodemanager.aux-services</name>
             <value>mapreduce_shuffle</value>
         </property>
         <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
         </property>
      
         <!-- optional -->
         <property>
            <name>yarn.resourcemanager.address</name>
            <value>localhost:8032</value>
         </property>
          
         <!--
         <property>
            <name>yarn.nodemanager.local-dirs</name>
            <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local</value>
         </property>
         <property>
            <name>yarn.nodemanager.log-dirs</name>
            <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs</value>
         </property>
         <property>
            <description>Where to aggregate logs</description>
            <name>yarn.nodemanager.remote-app-log-dir</name>
            <value>/var/log/hadoop-yarn/apps</value>
         </property>
         -->
         
         <property>
             <name>yarn.nodemanager.remote-app-log-dir</name>
             <value>/tmp/hadoop-logs/yarn-nodemanager-remote</value>
         </property>
      
         <property>
             <name>yarn.app.container.log.dir</name>
             <value>/tmp/hadoop-logs/yarn-app-container</value>
         </property>
      </configuration>
      

5. etc/hadoop/mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template)

      <configuration>
         <property>
            <!-- legacy MRv1 name; ignored when mapreduce.framework.name is yarn -->
            <name>mapred.job.tracker</name>
            <value>localhost:9001</value>
         </property>
      
         <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
      </configuration>

6. Set up passphraseless ssh; "ssh localhost" must log in without prompting for a password.

  ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  chmod 0600 ~/.ssh/authorized_keys
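
  Note that recent OpenSSH releases disable DSA keys by default, so RSA may be
  needed instead. A self-contained sketch of the same steps, run against a
  throwaway directory (substitute ~/.ssh for real use):

  ```shell
  # Same key setup as above, demonstrated in a temporary directory;
  # replace KEYDIR with ~/.ssh for the real setup.
  KEYDIR=$(mktemp -d)
  ssh-keygen -t rsa -b 2048 -P '' -f "$KEYDIR/id_rsa" -q
  cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
  chmod 0600 "$KEYDIR/authorized_keys"
  ls -l "$KEYDIR"
  ```

  After the real keys are in ~/.ssh, "ssh localhost" should log in without a prompt.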

7. Add the datanode's hostname and address to /etc/hosts.
   The hostname must also resolve on client machines, e.g.:
   
   10.1.0.4 SST-LX-HADOOP-SERVER
   
   or 
   
   10.1.0.4 SST-LX-HADOOP-SERVER.cloudapp.net
   

Starting
==========================
Grant the user privileges to create symbolic links (required on some systems).
1. Format namenode
   ./bin/hdfs namenode -format
2. Start HDFS (Namenode and Datanode)
   ./sbin/start-dfs.sh
3. Start MapReduce aka YARN (Resource Manager and Node Manager)
   ./sbin/start-yarn.sh

Verifying
============
Resource Manager: http://localhost:8088
Node Manager:     http://localhost:8042
Namenode:         http://localhost:50070

Stop HDFS & MapReduce
======================
./sbin/stop-yarn.sh
./sbin/stop-dfs.sh


Samples
===============
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/nkutuzov
bin/hdfs dfs -put etc/hadoop input

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep input output 'dfs[a-z.]+'

bin/hdfs dfs -cat output/*
bin/hdfs dfs -get output output
cat output/*

hdfs dfsadmin -refreshServiceAcl
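
The refreshServiceAcl command above reloads the service-level ACLs that
hadoop.security.authorization=true (set in core-site.xml) turns on. They are
read from etc/hadoop/hadoop-policy.xml; a minimal fragment opening client
access to everyone (the property name comes from the stock file):

```xml
<property>
    <name>security.client.protocol.acl</name>
    <value>*</value>
</property>
```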

  