Sunday, November 14, 2010

How to build Hadoop in Eclipse

Subclipse

  1. Install SVN.
    I installed "Slik-SVN" because it provides the JavaHL library for 64-bit Windows.
  2. Install the Subclipse plugin in Eclipse.
  3. Edit eclipse.ini and add one of the following parameters after -vmargs:
    -Djava.library.path=/usr/share/jni/lib (for Linux)
    -Djava.library.path=<svn_install_dir>/bin (for Windows)
  4. Start Eclipse.
  5. Go to Window --> Preferences --> Team --> SVN.
    In the "SVN Interface" section, it should say something like "JavaHL (JNI) … SlikSvn". If it says "JavaHL(JNI) not available", Subclipse cannot find the JavaHL library. Recheck step 3.
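
If you would rather script step 3 than edit the file by hand, here is a minimal Python sketch. It is not part of my original setup: the function name add_library_path is made up for illustration, and it assumes eclipse.ini keeps one option per line with -vmargs introducing the JVM arguments.

```python
def add_library_path(ini_text, lib_dir):
    """Return eclipse.ini content with -Djava.library.path placed right after -vmargs.

    Hypothetical helper for illustration; it only transforms text and does
    not touch any file on disk.
    """
    option = "-Djava.library.path=" + lib_dir
    # Drop any existing java.library.path setting so the option is not duplicated.
    lines = [line.strip() for line in ini_text.splitlines()
             if not line.strip().startswith("-Djava.library.path=")]
    if "-vmargs" not in lines:
        # Without -vmargs the launcher would not treat the option as a JVM argument.
        lines.append("-vmargs")
    lines.insert(lines.index("-vmargs") + 1, option)
    return "\n".join(lines) + "\n"
```

To use it, read eclipse.ini into a string, pass it through the function together with the directory from step 3, and write the result back. Keep a backup copy of the original file first.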

Code checkout

Example: svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/
You can also check out the code from within Eclipse using the Subclipse plugin.
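
The same checkout can be driven from a script. This is only a sketch: the helper names and the target directory "hadoop-0.21.0" are my own choices for illustration, and it assumes the svn command-line client is on your PATH.

```python
import subprocess

# Tag URL from the example above.
HADOOP_TAG_URL = "http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/"


def checkout_command(url, dest):
    """Build the argument list for 'svn checkout <url> <dest>'."""
    return ["svn", "checkout", url, dest]


def checkout(url, dest):
    """Run the checkout; raises CalledProcessError if svn exits with an error."""
    subprocess.run(checkout_command(url, dest), check=True)
```

Calling checkout(HADOOP_TAG_URL, "hadoop-0.21.0") would place the working copy in a directory named hadoop-0.21.0 under the current directory.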

Install Ant and Ivy

See http://ant.apache.org/ for instructions on installing Ant.

Download the Ivy jar and put it in the ANT_HOME/lib/ directory. If ANT_HOME is not set explicitly, it defaults to the Ant installation directory.
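
Copying the jar is easy to do by hand, but here is a small Python sketch of the same step in case you want to script it. The function name install_ivy_jar is hypothetical, and you have to supply the path to the jar you downloaded and your own ANT_HOME.

```python
import shutil
from pathlib import Path


def install_ivy_jar(ivy_jar, ant_home):
    """Copy the Ivy jar into <ant_home>/lib and return the new file's path."""
    lib_dir = Path(ant_home) / "lib"
    if not lib_dir.is_dir():
        # A missing lib directory usually means ant_home points at the wrong place.
        raise FileNotFoundError("no lib directory under " + str(ant_home))
    target = lib_dir / Path(ivy_jar).name
    shutil.copy2(ivy_jar, target)
    return target
```

If ANT_HOME is set, you can pass os.environ["ANT_HOME"] as the second argument. Otherwise pass the Ant installation directory.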

IvyDE

Hadoop's dependencies are managed by Ivy, so you need the IvyDE Eclipse plugin. See http://ant.apache.org/ivy/ivyde/ for more info. IvyDE bundles its own Ivy jar, so it does not use the Ivy jar you installed in the previous step. Also, it seems that within Eclipse the ANT_HOME variable is set to <eclipse_dir>\plugins\org.apache.ant_1.7.1.v20090120-1145 (the version number may differ for you).

Shell and Unix commands on Windows

Hadoop's build file invokes sh and other Unix commands such as tr and sed to build the project. These commands are not available on Windows by default.

The following two projects port Unix tools to Windows:

  1. http://sourceforge.net/projects/win-bash/files/win-bash/
  2. http://unxutils.sourceforge.net/

I use the first one. Download the tarball and decompress it into a directory. Ant must be able to find this directory. The usual way is to add it to the PATH environment variable, which Ant picks up automatically when it is run from the command line. This does not work well for Ant within Eclipse; the following sections explain how to pass PATH to Ant in Eclipse.
For command-line use, try the command "ant compile".
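
Before running Ant, it can help to check that the Unix tools are actually visible on PATH. Here is a rough Python sketch. The function name missing_tools is made up, and the default tool list contains only the three commands mentioned above, so the build may need others as well.

```python
import shutil


def missing_tools(tools=("sh", "tr", "sed"), path=None):
    """Return the tools that cannot be found on the given search path.

    path=None makes shutil.which fall back to the PATH environment variable.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

An empty list from missing_tools() means all three commands were found. Anything it returns is a command Ant will probably fail to launch.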

Create Eclipse Project for Hadoop

  1. New --> Java Project
    Select "Create project from existing source". Then select the directory where code is located.
    Click "Next"
  2. Important: change the output directory to <workspace_name>/build.
    The default output directory, bin, is already used by Hadoop for other purposes.
    Click "Finish"
  3. Add the JDK's tools.jar to the build path. It is not included in the JRE.
  4. Set the source folders to tell Eclipse which directories contain Java source code.
    Right-click the project name --> Build Path --> Configure Build Path… --> Source
  5. Let IvyDE manage the Ivy dependencies.
    Right-click the project name --> Build Path --> Configure Build Path… --> Libraries --> Add Library --> IvyDE Managed Dependencies --> Next --> (a couple of IvyDE settings steps) --> OK.
    It may take some time for IvyDE to resolve dependencies.

Create Run Configuration

  1. Right-click "build.xml" --> Run As --> Ant Build … (the entry with the ellipsis, which opens a configuration dialog, not plain "Ant Build")
  2. A dialog should pop up:
    1. Switch to the "Targets" tab: select the target (e.g. compile) you want to execute.
    2. Switch to the "JRE" tab: select "separate JRE".
    3. This step is for Windows users.
      Switch to the "Environment" tab and set PATH so that it includes the directory where the Unix tools are installed on Windows.
      You can click "Select" and choose the variable "Path". In my case, however, its value did NOT include the full content of the variable (compare it with the output of the "path" command on the command line). Eclipse probably limits the length of an environment variable's value and truncates it if it is too long.
    4. Click "Run"
  3. See Console for messages.
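
Because of the truncation problem in the Windows step above, I find it safer to type a short PATH value into the "Environment" tab myself. The Python sketch below builds one by putting the tools directory first and dropping duplicate and empty entries. The function name and the example directory C:\tools\win-bash are hypothetical, and a shorter value is not guaranteed to fit within whatever limit Eclipse applies.

```python
import os


def build_ant_path(tools_dir, current_path, sep=os.pathsep):
    """Return a PATH value with tools_dir first and duplicate entries removed."""
    entries = []
    seen = set()
    for entry in [tools_dir] + current_path.split(sep):
        entry = entry.strip()
        key = os.path.normcase(entry)  # case-insensitive comparison on Windows
        if entry and key not in seen:
            seen.add(key)
            entries.append(entry)
    return sep.join(entries)
```

For example, print(build_ant_path(r"C:\tools\win-bash", os.environ["PATH"])) prints a value you can paste into the PATH entry of the run configuration.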

Customize project builder

Using a builder is more convenient than right-clicking "build.xml" --> Run As --> Ant Build … --> Run every time. The following steps set up Ant as the default builder, so that you can use "Project --> Build Project" to build the project, just as with any regular Eclipse Java project.

  1. Right-click the project name --> Properties --> Builders --> New --> Ant Builder
    1. Select the build file (usually "build.xml").
    2. Switch to the "Targets" tab. Specify which targets are executed when the project is built or cleaned.
    3. This step is for Windows users.
      Switch to the "Environment" tab. Add the PATH environment variable if needed, so that it includes the directory where the Unix tools are installed on Windows.
  2. Deselect the default Java builder.

1 comment:

Arun said...

Hi Gerald Guo !

Very nice info !
Useful to many i guess including me.
Build was successful.
How can we run and debug some tests or hdfs or mapreduce codes ?

Regards,
Arun