Friday, December 31, 2010

Hadoop tips

  • Change logging level
    • Each daemon exposes a servlet at http://daemon_address:port/logLevel through which you can get and set the logging level of any named logger.
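      This is the same servlet the daemonlog command below talks to, so you can also drive it directly over HTTP. A minimal sketch, assuming the servlet's log and level query parameters and a placeholder daemon address:
        curl "http://daemon_address:port/logLevel?log=org.apache.hadoop.mapred.JobTracker&level=DEBUG"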
    • Use the command line
        hadoop daemonlog -getLevel daemon_address:port fullyQualifiedClassName
        hadoop daemonlog -setLevel daemon_address:port fullyQualifiedClassName logLevel
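      For example, to read and then raise the JobTracker's log level (assuming its web UI listens on the default port 50030; the hostname is a placeholder):
        hadoop daemonlog -getLevel jobtracker.example.com:50030 org.apache.hadoop.mapred.JobTracker
        hadoop daemonlog -setLevel jobtracker.example.com:50030 org.apache.hadoop.mapred.JobTracker DEBUG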
    • Permanent change
      Edit conf/log4j.properties; the change takes effect when the daemon is restarted. Example:
          log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
          log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
          log4j.logger.org.apache.hadoop.fs.FSNamesystem=DEBUG

  • Commission and decommission nodes
    The following four configuration parameters are relevant:
    dfs.hosts
    dfs.hosts.exclude
    mapreduce.jobtracker.hosts.filename (mapred.hosts in older versions)
    mapreduce.jobtracker.hosts.exclude.filename (mapred.hosts.exclude in older versions)
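    For HDFS, the include and exclude files are pointed to from hdfs-site.xml. A minimal sketch (the file paths are placeholders chosen for illustration):
      <property>
        <name>dfs.hosts</name>
        <value>/etc/hadoop/conf/include</value>
      </property>
      <property>
        <name>dfs.hosts.exclude</name>
        <value>/etc/hadoop/conf/exclude</value>
      </property>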

    For HDFS, execute "hadoop dfsadmin -refreshNodes" after you change the include file or exclude file.
    According to the mailing list, "mradmin -refreshNodes" was added in 0.21, so for MapReduce you can run "hadoop mradmin -refreshNodes" after changing the include or exclude file to commission or decommission a node.
    To permanently add or remove a node, you also need to update the slaves file, conf/slaves; see the walkthrough below.
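    Putting it together, decommissioning a datanode might look like this (the hostname and exclude-file path are assumptions for illustration):
      echo "dn3.example.com" >> /etc/hadoop/conf/exclude   # list the node in the exclude file
      hadoop dfsadmin -refreshNodes                        # the namenode starts re-replicating its blocks
      hadoop dfsadmin -report                              # wait for "Decommission Status : Decommissioned"
    Once the node reports as decommissioned, remove it from conf/slaves so it is not started again.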
  • Block scanner report
    http://datanode_address:50075/blockScannerReport
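    If I remember correctly, the report also takes a listblocks query parameter that appends the scan status of every block the datanode holds:
      http://datanode_address:50075/blockScannerReport?listblocks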
  • If you want to check the blocks and block locations of a specific file, use the following command:
      hadoop fsck file_to_check -files -blocks -locations -racks
    Note: you should execute it on the master node.
    Use "hadoop fsck /" to check the health of the whole file system.