Friday, December 31, 2010

Hadoop tips

  • Change logging level
    • Each daemon serves a page at http://daemon_address:port/logLevel where you can get and set logging levels.
    • Use the command line:
        hadoop daemonlog -getLevel daemon_address:port fullQualifiedClassName
        hadoop daemonlog -setLevel daemon_address:port fullQualifiedClassName logLevel
    • Permanent change
      Edit conf/log4j.properties, for example:
          log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
          log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
          log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem=DEBUG

  • Commission and decommission nodes
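    The following four configuration parameters name the include and exclude files:
    dfs.hosts (HDFS include file)
    dfs.hosts.exclude (HDFS exclude file)
    mapreduce.jobtracker.hosts.filename (MapReduce include file; mapred.hosts in older versions)
    mapreduce.jobtracker.hosts.exclude.filename (MapReduce exclude file; mapred.hosts.exclude in older versions)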

    For HDFS, run "hadoop dfsadmin -refreshNodes" after changing the include or exclude file.
    According to the mailing list, "mradmin -refreshNodes was added in 0.21", so for MapReduce (0.21 and later) run "hadoop mradmin -refreshNodes" after changing the include or exclude file. Listing a host in the include file commissions it; listing it in the exclude file decommissions it.
    To add or remove a node permanently, you also need to update the slaves file, conf/slaves.
  • Block scanner report
    http://datanode_address:50075/blockScannerReport
  • To check the blocks and block locations of a specific file, use the following command:
      hadoop fsck file_to_check -files -blocks -locations -racks
    Note: execute it on the master node.
    Use "hadoop fsck /" to check the health of the whole file system.
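A minimal bash sketch of two of the tips above. The function names loglevel_url and exclude_host are hypothetical. The query parameters log and level are the ones the logLevel page accepts (it is the same page hadoop daemonlog talks to); exclude_host assumes the file you pass is the one named by dfs.hosts.exclude and by its MapReduce counterpart.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file paths are hypothetical.

# Print the /logLevel URL of a daemon. With a LEVEL, opening the URL sets
# the level; without one, the page only reports the current level.
# Usage: loglevel_url host:port fully.qualified.ClassName [LEVEL]
loglevel_url() {
  local addr=$1 class=$2 level=${3:-}
  local url="http://${addr}/logLevel?log=${class}"
  if [ -n "$level" ]; then
    url="${url}&level=${level}"
  fi
  printf '%s\n' "$url"
}

# Add a host to an exclude file (only once), then ask HDFS and MapReduce
# to re-read their include/exclude files.
# Assumes EXCLUDE_FILE is the file named by dfs.hosts.exclude and
# mapred.hosts.exclude.
# Usage: exclude_host hostname EXCLUDE_FILE
exclude_host() {
  local host=$1 file=$2
  touch "$file"
  if ! grep -qxF "$host" "$file"; then
    printf '%s\n' "$host" >> "$file"
  fi
  hadoop dfsadmin -refreshNodes
  hadoop mradmin -refreshNodes   # requires 0.21 or later
}
```

For example, curl "$(loglevel_url jobtracker.example.com:50030 org.apache.hadoop.mapred.JobTracker DEBUG)" would set the JobTracker's log level to DEBUG; the host name is a placeholder, and 50030 is the default JobTracker web port.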

Thursday, December 23, 2010

Extract some continuous lines from a file

Sometimes I want to extract a range of consecutive lines from a file, e.g. line 10 to line 100. I wondered whether there was a Linux command for that, but I did not find a dedicated one in vanilla Linux distros. Then I realized it can be achieved by combining the commands head and tail.

Let's say you want to extract lines min to max, both inclusive.
Calculate nlines=(max-min+1). Then use the following command:

head -n <max> <filename> | tail -n <nlines>
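The head/tail trick wrapped in a small bash function, next to sed -n 'MIN,MAXp', a standard alternative that prints the same range. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print lines MIN..MAX (inclusive) of FILE with head and tail.
# Usage: extract_lines FILE MIN MAX
extract_lines() {
  local file=$1 min=$2 max=$3
  local nlines=$((max - min + 1))
  head -n "$max" "$file" | tail -n "$nlines"
}

# The same range with sed: -n suppresses default output and
# 'MIN,MAXp' prints only lines MIN through MAX.
# Usage: extract_lines_sed FILE MIN MAX
extract_lines_sed() {
  local file=$1 min=$2 max=$3
  sed -n "${min},${max}p" "$file"
}
```

One difference: if FILE has fewer than MAX lines, head returns the whole file and tail -n nlines can then include lines before MIN, while the sed version never prints lines outside the range.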

Friday, December 17, 2010

vim quickfix and location list

Command          Description
copen            open the quickfix window
cclose           close the quickfix window
cwindow          open the quickfix window if the list has recognized errors (close it if not)
cc [nr]          display error [nr]
cr               display the first error
cfirst           display the first error
clast            display the last error
[count]cn        display the [count]th next error
[count]cp        display the [count]th previous error
[count]cnf       display the first error in the [count]th next file
[count]cpf       display the first error in the [count]th previous file

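For the location list, replace the leading 'c' with 'l' in the commands above (lopen, lclose, lwindow, lr, lfirst, llast, lp, lnf, lpf). Two exceptions: cc becomes ll, and cn becomes lne (:lnext), because :ln is short for :lnoremap.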

Thursday, December 16, 2010

Compile and Install vim on Linux

Commands

wget ftp://ftp.vim.org/pub/vim/unix/vim-7.3.tar.bz2
tar jvxf vim-7.3.tar.bz2
cd vim73
./configure --enable-gui=no \
            --enable-multibyte \
            --enable-cscope \
            --disable-netbeans \
            --prefix=<your_desired_vim_home>
make
make test
make install

Add the vim bin directory to your PATH environment variable. The following commands are for bash.

Temporary change
    export PATH="<your_vim_home>/bin:$PATH"

Permanent change
    echo -e '\nexport PATH="<your_vim_home>/bin:$PATH"' >> ~/.bash_profile
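Running the echo command more than once appends the same line again each time. A minimal bash sketch that appends it only if it is not already there; the function name add_vim_to_path and the default target ~/.bash_profile are assumptions.

```shell
#!/usr/bin/env bash
# Append 'export PATH="VIM_HOME/bin:$PATH"' to a bash startup file unless
# that exact line is already present. $PATH is written literally so it is
# expanded when the startup file runs, not now.
# Usage: add_vim_to_path VIM_HOME [PROFILE]   (PROFILE defaults to ~/.bash_profile)
add_vim_to_path() {
  local vim_home=$1 profile=${2:-$HOME/.bash_profile}
  local line="export PATH=\"${vim_home}/bin:\$PATH\""
  touch "$profile"
  if ! grep -qxF "$line" "$profile"; then
    printf '\n%s\n' "$line" >> "$profile"
  fi
}
```

For example, add_vim_to_path /opt/vim adds export PATH="/opt/vim/bin:$PATH" to ~/.bash_profile the first time and does nothing on later runs (/opt/vim is just an example path).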

Wednesday, December 15, 2010

Delete executables and object (.o) files

The following two commands delete all ELF executables and ELF object (.o) files under the current directory.

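# file -b prints only the type, so file names containing ':' or "ELF" cannot confuse the test
find ./ -executable -type f -exec sh -c 'file -b "$1" | grep -q "^ELF" && rm -- "$1"' _ {} \;

find ./ -name "*.o" -type f -exec sh -c 'file -b "$1" | grep -q "^ELF" && rm -- "$1"' _ {} \;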

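Some executables are missed because their "x" bit is not set. The following command removes every file that file reports as ELF, whatever its permissions (this includes object files and shared libraries, so use it with care):

find ./ -type f -exec sh -c 'file -b "$1" | grep -q "^ELF" && rm -- "$1"' _ {} \;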

Note:
    -executable is not supported in older versions of find.
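As an alternative to calling file on every candidate, a bash sketch can read the first four bytes of each regular file and compare them with the ELF magic number (0x7f 'E' 'L' 'F'). Paths are passed NUL-separated, so names with spaces or colons are safe, and nothing is deleted unless --delete is given. The function name find_elf is hypothetical, and this magic-byte check is a different technique from the file-based commands above.

```shell
#!/usr/bin/env bash
# List regular files under DIR whose first four bytes are 7f 45 4c 46
# (0x7f 'E' 'L' 'F'); delete them instead when the second argument is --delete.
# Usage: find_elf DIR [--delete]
find_elf() {
  local dir=$1 action=${2:-}
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' f; do
      # Hex dump of the first 4 bytes with spaces and newline removed.
      magic=$(head -c 4 -- "$f" | od -An -tx1 | tr -d ' \n')
      if [ "$magic" = "7f454c46" ]; then
        if [ "$action" = "--delete" ]; then
          rm -f -- "$f"
        else
          printf '%s\n' "$f"
        fi
      fi
    done
}
```

Run find_elf ./ first to review the list, then find_elf ./ --delete once it looks right.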

Monday, December 13, 2010

How to change hostname in Ubuntu

Temporary change

    sudo hostname <new_host_name>

Permanent change

  1. Edit /etc/hostname to specify your new hostname
    sudoedit /etc/hostname
  2. sudo service hostname start

Ubuntu init scripts and upstart jobs

Init script

These scripts live in /etc/init.d. Note: some of them have been converted to upstart jobs (see the next section) and should not be invoked directly. To check whether a specific job is an upstart job, check whether the file /etc/init/<job>.conf exists.
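The check described above as a small bash function. The function name job_kind and its output words are hypothetical; the optional directory arguments exist only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Print "upstart" if JOB has a conf file in INIT_CONF_DIR, "init-script" if
# it only has a script in INIT_D_DIR, and "unknown" otherwise.
# Usage: job_kind JOB [INIT_CONF_DIR] [INIT_D_DIR]
job_kind() {
  local job=$1 conf_dir=${2:-/etc/init} initd_dir=${3:-/etc/init.d}
  if [ -f "$conf_dir/$job.conf" ]; then
    echo upstart
  elif [ -e "$initd_dir/$job" ]; then
    echo init-script
  else
    echo unknown
  fi
}
```

The upstart test comes first because a converted job can still have an entry in /etc/init.d.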

upstart jobs

Project page: http://upstart.ubuntu.com/. Use "man 5 init" to see the syntax of the job configuration files.

Each upstart job has a configuration file /etc/init/<job>.conf. Do not invoke the init script directly to start or stop such a job; use initctl instead, e.g. "initctl restart hostname".
"initctl list" lists the upstart jobs and their current status.

service

service works with both upstart jobs and regular init scripts. For an upstart job it does not run /etc/init.d/<job>; instead it runs "start <job>" directly.

E.g. service hostname status

invoke-rc.d

Another tool to start and stop init jobs. In my opinion you should use service instead, because invoke-rc.d does NOT detect whether a job is an upstart job or a regular init script. Usually this is not a big deal, because the /etc/init.d script of an upstart job automatically calls the corresponding initctl commands (start, stop, reload, etc.).

Example - network

Here is an example of how network interfaces are managed by the init daemon.

As you may know, ifup and ifdown bring network interfaces up and down.

ifup and ifdown read /etc/network/interfaces to learn how your system should connect to the network.

Sample file

# interfaces lo and eth0 should be started when ifup -a is invoked.
auto lo eth0
# eth1 may be brought up by the hotplug subsystem.
allow-hotplug eth1
# Interface lo uses the IPv4 address family (inet) and is a loopback device.
iface lo inet loopback
# Interface eth1 uses IPv4 (inet) and is configured by DHCP.
iface eth1 inet dhcp
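ifup -a brings up the interfaces listed on auto lines. A rough bash/awk sketch that prints those names from a file like the sample above; it is a simplified parse (it only looks at lines whose first word is auto), and the function name auto_interfaces is hypothetical.

```shell
#!/usr/bin/env bash
# Print the interface names that appear on "auto" lines of an interfaces file.
# Usage: auto_interfaces [FILE]   (FILE defaults to /etc/network/interfaces)
auto_interfaces() {
  local file=${1:-/etc/network/interfaces}
  # Matching $1 == "auto" skips comments and other stanzas;
  # fields 2..NF are interface names.
  awk '$1 == "auto" { for (i = 2; i <= NF; i++) print $i }' "$file"
}
```

For the sample file this prints lo and eth0, the interfaces ifup -a would try to bring up; eth1 is left to the hotplug subsystem.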

  • For upstart job networking, its config file is /etc/init/networking.conf:

description "configure virtual network devices"

start on (local-filesystems
      and stopped udevtrigger)

task

pre-start exec mkdir -p /var/run/network

exec ifup -a

Notice the last line? Yes, it invokes ifup to bring up the interfaces that are marked "auto" in /etc/network/interfaces.

  • Job network-interface is used when a network interface is added or removed. Its config file is /etc/init/network-interface.conf

description "configure network device"

start on net-device-added
stop on net-device-removed INTERFACE=$INTERFACE

instance $INTERFACE

pre-start script
    if [ "$INTERFACE" = lo ]; then
    # bring this up even if /etc/network/interfaces is broken
    ifconfig lo 127.0.0.1 up || true
    initctl emit -n net-device-up \
        IFACE=lo LOGICAL=lo ADDRFAM=inet METHOD=loopback || true
    fi
    mkdir -p /var/run/network
    exec ifup --allow auto $INTERFACE
end script

post-stop exec ifdown --allow auto $INTERFACE

The line "exec ifup --allow auto $INTERFACE" brings up the newly added interface if it is marked to be brought up automatically. The triggering events, "net-device-added" and "net-device-removed", are emitted by upstart-udev-bridge, which forwards events it receives from udev to the init daemon. So when udev detects your network interface (e.g. eth0), a net-device-added event eventually reaches the network-interface job, which runs ifup to bring the interface up.

  • The upstart job hostname (/etc/init/hostname.conf) includes the following line:
      exec hostname -b -F /etc/hostname
    Now you know how to change the hostname permanently:
    edit /etc/hostname, then run "sudo service hostname start", "sudo start hostname", or "sudo initctl start hostname".

Recover corrupted partition table

The partition table of my Linux drive was corrupted recently, and my Ubuntu would no longer start.

I burned an Ubuntu CD, but booting the live CD always gave me errors, apparently a CD burning or CD drive problem. A live USB drive I made afterwards worked great. The following two tools can be used to "guess" the partition table.

  • gpart
    This program is old and no longer maintained. It recognizes only some file systems (ext3, ext4, etc. are not recognized correctly).
  • testdisk
    A great tool with a text UI; just follow its instructions. Check that the "guessed" partition table matches your real partition table (if you have a backup, you are lucky).

Then run fsck to check the integrity of your file systems.

Afterthoughts:

  1. Back up your partition table!
  2. A USB drive was more reliable than a CD in this case.
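For the first afterthought, a minimal sketch using sfdisk from util-linux: sfdisk -d DEVICE dumps the partition table as text that sfdisk DEVICE can read back in. Older sfdisk versions only handle MBR (DOS) partition tables. The function names and file names are assumptions; both commands need root, and restoring overwrites the disk's current table.

```shell
#!/usr/bin/env bash
# Save DEVICE's partition table to DUMP_FILE as text.
# Usage: backup_ptable DEVICE DUMP_FILE
backup_ptable() {
  local dev=$1 dump=$2
  sfdisk -d "$dev" > "$dump"
}

# Write a saved table back to DEVICE. This overwrites the current table.
# Usage: restore_ptable DEVICE DUMP_FILE
restore_ptable() {
  local dev=$1 dump=$2
  sfdisk "$dev" < "$dump"
}
```

For example, run backup_ptable /dev/sda sda.ptable as root and keep the dump somewhere other than that disk, such as the live USB drive.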