Friday, December 02, 2011

Memory allocation settings in Hadoop

Edit file conf/mapred-site.xml to change the amount of memory (in MB) allocated to the buffer used for sorting map output:

<property>
    <name>io.sort.mb</name>
    <value>300</value>
</property>

Edit file conf/mapred-site.xml to change the maximum JVM heap size of each map/reduce task (here 800 MB):

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx800m</value>
</property>
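
The sort buffer is allocated inside the task JVM's heap, so io.sort.mb has to stay below the -Xmx value given in mapred.child.java.opts. A minimal sketch of that sanity check, assuming both sizes are passed as plain megabyte counts (the function name sort_fits_heap is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the sort buffer is smaller than the task heap.
# $1 = io.sort.mb value in MB, $2 = task heap (-Xmx) in MB.
sort_fits_heap() {
    sort_mb=$1
    heap_mb=$2
    [ "$sort_mb" -lt "$heap_mb" ]
}

# Values taken from the two snippets above.
if sort_fits_heap 300 800; then
    echo "io.sort.mb=300 fits in -Xmx800m"
else
    echo "io.sort.mb=300 does not fit in -Xmx800m"
fi
```

The check only confirms that the buffer is smaller than the heap. How much headroom the task code itself needs depends on the job.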

Edit file conf/hadoop-env.sh to change the maximum heap size (in MB) of each Hadoop daemon:

export HADOOP_HEAPSIZE=1000
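
To see how these settings add up on one machine, a rough heap budget can be computed as daemons x daemon heap + task slots x task heap. The sketch below is only an estimate under stated assumptions: the node runs two daemons (for example a DataNode and a TaskTracker) and has four task slots in total. Both counts are illustrative, the function name node_heap_mb is hypothetical, and JVM overhead beyond the heap is ignored.

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound on heap use for one node, in MB.
# $1 = number of daemons, $2 = heap per daemon (HADOOP_HEAPSIZE),
# $3 = number of task slots, $4 = heap per task (-Xmx in mapred.child.java.opts).
node_heap_mb() {
    daemons=$1
    daemon_heap_mb=$2
    task_slots=$3
    task_heap_mb=$4
    echo $(( daemons * daemon_heap_mb + task_slots * task_heap_mb ))
}

# 2 daemons at 1000 MB plus 4 task slots at 800 MB (slot and daemon counts are assumed).
node_heap_mb 2 1000 4 800   # prints 5200
```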

Change ports used by Hadoop

Edit file conf/hdfs-site.xml to change the ports used by HDFS:

    <property>
        <name>dfs.secondary.http.address</name>
        <value>0.0.0.0:51090</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:51010</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:51075</value>
    </property>
    <property>
        <name>dfs.datanode.https.address</name>
        <value>0.0.0.0:51475</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:51020</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:51070</value>
    </property>
    <property>
        <name>dfs.https.address</name>
        <value>0.0.0.0:51470</value>
    </property>

Edit file conf/mapred-site.xml to change the ports used by MapReduce:

    <property>
        <name>mapred.job.tracker.http.address</name>
        <value>0.0.0.0:51030</value>
    </property>

    <property>
        <name>mapred.task.tracker.http.address</name>
        <value>0.0.0.0:51060</value>
    </property>
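
After editing several address properties it is easy to assign the same port twice. The following is a minimal sketch for listing the configured ports, assuming each <name> and <value> element sits on its own line as in the snippets above. It is not a real XML parser, and the function name list_ports is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "property-name port" for every host:port value in a site file.
# Assumes one <name>...</name> or <value>...</value> element per line.
list_ports() {
    awk -F'[<>]' '
        /<name>/  { name = $3 }                      # remember the latest property name
        /<value>/ { n = split($3, parts, ":")        # host:port splits into two parts
                    if (n == 2) print name, parts[2] }
    ' "$1"
}
```

Running list_ports conf/hdfs-site.xml or list_ports conf/mapred-site.xml prints one line per address property. Values without a colon, such as 300 or -Xmx800m, are skipped.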

Exclude directories when using GNU tar

tar zvcf name.tar.gz --exclude path/to/dir1 --exclude path/to/dir2 path/to/tar

Notes:

  1. Do not include a trailing '/' in the path of an excluded directory. Otherwise, the directory is not excluded.
  2. Put the --exclude options before the directory/file to be tarred.
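
The two notes can be tried on a throwaway directory tree. This is a minimal sketch, assuming GNU tar and mktemp are available. The directory names project, src and cache and the function name demo_exclude are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: archive a small tree without its "cache" directory, then list the archive.
demo_exclude() {
    work=$(mktemp -d)
    mkdir -p "$work/project/src" "$work/project/cache"
    echo "keep" > "$work/project/src/main.txt"
    echo "skip" > "$work/project/cache/tmp.txt"
    # --exclude comes before the path to archive, and the excluded path has no trailing '/'.
    tar -czf "$work/project.tar.gz" -C "$work" --exclude project/cache project
    # List the archive members; nothing under project/cache should appear.
    tar -tzf "$work/project.tar.gz"
    rm -rf "$work"
}

demo_exclude
```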