Sunday, November 14, 2010

XLink

 http://www.xml.com/lpt/a/1038

Can XLink be transformed to RDF?

An RDF document could use an include mechanism to pull in other RDF documents.

XLink and HLink: http://www.xml.com/lpt/a/1038

Enhancements over the HTML link

  1. When to actuate the link
    In the current HTML link implementation, a link is actuated when it is clicked. In XLink and HLink, a link can also be actuated when the page containing it is loaded.
  2. More effects when a link is actuated.
    embed, new, replace, etc.
  3. HLink supports declaring arbitrary elements as links.
    <hlink namespace="http://www.example.com/markup"
           element="home"
           locator="/"
           effect="replace"
           actuate="onRequest"/>
    <hlink namespace="http://www.example.com/markup"
           element="home"
           locator="/icons/home.png"
           effect="embed"
           actuate="onLoad"/>
    
    <home/>
  4. XLink supports creation of links among more than two resources.
  5. Add more metadata to links
  6. Links can be specified outside the linked resources.
    In HTML, users can only specify links within the source resource.
    When you write
     <a href="destination.resource">source</a>
    this piece of code must be located in the source html. In other words, the user cannot specify links among external resources.
    XLink adds this support.
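
To make points 4 and 6 concrete, here is a minimal Python sketch that reads an out-of-line XLink extended link and resolves its arcs. The element names (links, loc, go) and the file names a.html and b.html are made up for illustration; only the xlink:* attributes and the namespace URI come from the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical linkbase kept in its own file: neither a.html nor b.html
# contains any of this markup.
LINKBASE = """
<links xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <loc xlink:type="locator" xlink:href="a.html" xlink:label="src"/>
  <loc xlink:type="locator" xlink:href="b.html" xlink:label="dst"/>
  <go xlink:type="arc" xlink:from="src" xlink:to="dst"
      xlink:show="replace" xlink:actuate="onRequest"/>
</links>
"""


def xlink_attr(elem, name):
    """Read an attribute from the XLink namespace, or None if absent."""
    return elem.get("{%s}%s" % (XLINK_NS, name))


def resolve_arcs(xml_text):
    """Return (from_href, to_href, show, actuate) for every arc in the link."""
    root = ET.fromstring(xml_text.strip())
    # Map each label to the hrefs of the locators carrying it.
    labels = {}
    for child in root:
        if xlink_attr(child, "type") == "locator":
            labels.setdefault(xlink_attr(child, "label"), []).append(
                xlink_attr(child, "href"))
    arcs = []
    for child in root:
        if xlink_attr(child, "type") == "arc":
            for src in labels.get(xlink_attr(child, "from"), []):
                for dst in labels.get(xlink_attr(child, "to"), []):
                    arcs.append((src, dst,
                                 xlink_attr(child, "show"),
                                 xlink_attr(child, "actuate")))
    return arcs


print(resolve_arcs(LINKBASE))
```

Adding a third locator and a second arc to the same linkbase would connect more than two resources without touching any of them.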

 

METS, DIDL, ORE

http://www.oreillynet.com/xml/blog/2008/06/oaiore_compound_documents_draf.html

http://www.oreillynet.com/xml/blog/2008/05/bad_xml.html

http://www.dehora.net/journal/2008/06/18/dates-in-atom/

http://dret.net/netdret/docs/wilde-cacm2008-xml-fever.html

http://www.tbray.org/ongoing/When/200x/2006/01/09/On-XML-Language-Design

Google AppEngine mail test

Mail

How to test: http://aralbalkan.com/1311

Two bugs:

http://code.google.com/p/googleappengine/issues/detail?id=626
For this bug, you can upgrade Python to a newer version (2.5.4 or later).

http://code.google.com/p/googleappengine/issues/detail?id=1061

http://groups.google.com/group/app-engine-patch/browse_thread/thread/1662f95d9cacee24

Deb package manipulation notes (deb make, view, install, etc)

The directory /var/lib/dpkg/info/ contains package-related files. For each package, its conffiles, md5sums, preinst, postinst, prerm, postrm, list of installed files, etc. are kept there.

dpkg-dev

debian/files: "The list of generated files which are part of the upload being prepared."

.changes: upload control file

dpkg-buildpackage
build binary or source packages from sources

dpkg-architecture: set and determine the architecture for package building

dpkg-checkbuilddeps: check build dependencies and conflicts. By default, debian/control is read.

dpkg-distaddfile: adds an entry for a named file to debian/files.
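dpkg-genchanges: generate Debian .changes files
dpkg-gencontrol: generate Debian control files
dpkg-gensymbols: generate symbols files (shared library dependency information)

dpkg-name: rename Debian packages to full package names
dpkg-scanpackages: create Packages index files
dpkg-scansources: create Sources index files
dpkg-shlibdeps: generate shared library substvar dependencies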
dpkg-source: packs and unpacks Debian source archives.
dpkg-vendor: query vendor information
dpkg-parsechangelog: get changelog information

Vendor

/etc/dpkg/origins/default

devscripts

debchange

debhelper

dh-make

This package is useful when you have a regular source package (not a Debian source package) and want to debianize it.
dh_make must be invoked within a directory containing the source code, which must be named <packagename>-<version>. The <packagename> must be all lowercase, digits and dashes.
There are two types of Debian source packages: native and non-native (see "Deb Source Package" below).
For a non-native package, you need the original source tree because the Debian tools use it to generate the diff. dh_make makes sure the original source tarball (<packagename>_<version>.orig.tar.gz) exists.
Option -f can be used to specify the location of the tarball.
If -f is not given, dh_make searches the parent directory for the file <packagename>_<version>.orig.tar.gz and the directory <packagename>_<version>.orig. If either of them exists, dh_make proceeds. If neither exists, dh_make complains and exits.
If you want to create an original source tarball based on the code in the current directory, use option "--createorig". The current directory is then copied to <packagename>_<version>.orig in the parent directory.
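
The naming rules above can be checked before running dh_make. The sketch below is an illustration only: splitting the directory name at the last dash is my assumption, and dh_make's own parsing may differ. The function names are made up.

```python
import re

# Package names: lowercase letters, digits and dashes only (rule quoted above).
PACKAGE_NAME_RE = re.compile(r"^[a-z0-9-]+$")


def split_source_dir(dirname):
    """Split '<packagename>-<version>' into (packagename, version).

    Assumption: the version is whatever follows the LAST dash.
    """
    name, sep, version = dirname.rpartition("-")
    if not sep or not name or not version:
        raise ValueError("expected <packagename>-<version>: %r" % dirname)
    if not PACKAGE_NAME_RE.match(name):
        raise ValueError("invalid package name: %r" % name)
    return name, version


def expected_orig_tarball(dirname):
    """Name of the tarball dh_make looks for in the parent directory."""
    name, version = split_source_dir(dirname)
    return "%s_%s.orig.tar.gz" % (name, version)


print(expected_orig_tarball("hello-world-1.0"))
```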

key: public key
secret: private key

Trusted pub keys are stored in file /etc/apt/trusted.gpg (not /etc/apt/trustdb.gpg)

apt-key list

gpg --keyserver keyserver.ubuntu.com --recv-keys key_ID_here
gpg --export --armor key_ID_here | sudo apt-key add -

http://wiki.debian.org/SecureApt
https://help.ubuntu.com/community/SecureApt

Downloaded deb packages are stored at /var/cache/apt/archives/ and /var/cache/apt/archives/partial/.

Low-level understanding

Deb binary package format

man deb
The manual describes the Debian binary package format.
A deb package is an ar archive, so you can list the contents of a deb package using command:
    ar tf pkg_name.deb
On my machine, the output is
    debian-binary 
    control.tar.gz 
    data.tar.gz 

Extract the contents of a deb package using command:
  ar xof pkg_name.deb
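
To see why ar can read a deb file, here is a small Python sketch that walks the ar container by hand: an 8-byte global magic, then for each member a 60-byte header followed by the data, padded to an even length. It is only meant to illustrate the layout, not to replace ar or dpkg-deb. The helper make_member builds a fake member for the demo, and its field values (timestamps, mode) are placeholders.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def list_ar_members(data):
    """Return [(name, size)] for each member of an ar archive given as bytes."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii").rstrip()
        if name.endswith("/"):          # GNU ar terminates names with '/'
            name = name[:-1]
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded with one byte when its size is odd.
        pos += HEADER_SIZE + size + (size % 2)
    return members


def make_member(name, payload):
    """Build one ar member (demo helper; mtime/uid/gid/mode are placeholders)."""
    header = "%-16s%-12s%-6s%-6s%-8s%-10d`\n" % (
        name, "0", "0", "0", "100644", len(payload))
    padding = b"\n" if len(payload) % 2 else b""
    return header.encode("ascii") + payload + padding


# A fake "package" with the three member names shown above.
fake_deb = (AR_MAGIC
            + make_member("debian-binary", b"2.0\n")
            + make_member("control.tar.gz", b"abc")
            + make_member("data.tar.gz", b""))
print(list_ar_members(fake_deb))
```

With a real package you would pass the bytes of pkg_name.deb (read in binary mode) to list_ar_members.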

Deb control

control.tar.gz holds the package's control files. From man deb:
"It is a gzipped tar archive containing the package control information, as a series of plain files, of which the file control is mandatory and contains the core control information."
Use command tar zvxf control.tar.gz to extract the control files. The most important file is control; its format is described in man deb-control.
conffiles: this file lists all configuration files used by this package.
control:
md5sums 
postinst
postrm  
preinst 
prerm
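
The control file itself is a list of "Field: value" lines, where a line starting with a space or tab continues the previous field. A minimal Python sketch of a reader follows. It handles a single paragraph only and ignores details such as case-insensitive field names and the " ." blank-line marker in descriptions. The sample content is invented.

```python
SAMPLE_CONTROL = """\
Package: hello
Version: 2.4-1ubuntu1
Architecture: amd64
Depends: libc6 (>= 2.7)
Description: example package
 A longer description
 spanning two lines.
"""


def parse_control(text):
    """Parse one control paragraph into a dict of field name -> value."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line: belongs to the most recent field.
            if last is None:
                raise ValueError("continuation line before any field")
            fields[last] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a field line: %r" % line)
            last = name.strip()
            fields[last] = value.strip()
    return fields


control = parse_control(SAMPLE_CONTROL)
print(control["Package"], control["Version"])
```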

Deb data

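data.tar.gz contains the package's filesystem as a tar archive. According to man deb, the archive is either not compressed or compressed (with gzip in the listing above).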

High-level understanding

Ubuntu provides some tools that make it more convenient to manipulate deb packages, so that users don't need to run ar, tar, etc. to extract files or information manually.

First, the command dpkg-deb comes in really handy:

    dpkg-deb -I: show information about a deb package. (Extracts info from the control file)
    dpkg-deb -c: list the contents of the package. (Extracts info from data.tar.gz)
    dpkg-deb -x: extract a deb archive
    dpkg-deb -X: extract a deb archive and print the list of extracted files.
    dpkg-deb -e: extract control information, into a directory named DEBIAN if none is specified.
                 (Extracts files from control.tar.gz)
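
For comparison, this is roughly what "list the contents" amounts to once data.tar.gz has been pulled out of the ar container, sketched with Python's tarfile module. It is not how dpkg-deb is implemented. The sample archive and the file name usr/bin/hello are made up so the example runs without a real package.

```python
import io
import tarfile


def list_data_tar(data_tar_gz):
    """Return [(path, size)] for each entry in a gzipped data tarball (bytes)."""
    with tarfile.open(fileobj=io.BytesIO(data_tar_gz), mode="r:gz") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]


def make_sample_data_tar():
    """Build a tiny in-memory data.tar.gz with a single file (demo only)."""
    payload = b"#!/bin/sh\necho hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="usr/bin/hello")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


print(list_data_tar(make_sample_data_tar()))
```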

Deb Source Package

Format of source package is described in section "SOURCE PACKAGE FORMATS" within manual "man dpkg-source".
Also read http://www.debian.org/doc/debian-policy/ch-source.html for more info.

There are two types of source packages: native and non-native.
Layout of a native package:

  .dsc: includes package info and MD5 checksums for the package content.
  .tar.gz: source code (including the debian/ dir)
Layout of a non-native package:
  .dsc: Debian source control
  .orig.tar.gz: original source code
  .diff.gz: 1) patches applied to the source code; 2) Debian packaging (debian/ dir)

Download a source package instead of a binary package:

  apt-get source pkg_name                  #Download and unpack
  apt-get source --download-only pkg_name  #only download

The command dpkg-source then comes in handy for manipulating the source package.

  dpkg-source -x pkg_name.dsc    # Extract a source package.

If you use command "apt-get source pkg_name", the package has already been downloaded and extracted, so you don't need to execute this command. If you use command "apt-get source --download-only pkg_name", you can use this command to extract the downloaded package and apply the patch.
If you don't want the patch to be applied, add option --skip-debianization.

If the directory where you execute command "dpkg-source -x" is different from the directory where the downloaded source package is stored, options -sn, -sp and -su control whether the original source tarball is copied to the current directory.

In all cases any existing original source tree will be removed! So be sure to back up your code if it is in the current directory.

  dpkg-source -sn -x pkg_name.dsc    #original source tarball is not copied to the current directory, but the source tree is unpacked there and the patch is applied
  dpkg-source -sp -x pkg_name.dsc    #source tarball is copied to the current directory and unpacked, and the patch is applied
  dpkg-source -su -x pkg_name.dsc    #source tarball is copied to the current directory; both the original source tree and the patched source tree are extracted

If you also want the original source tree extracted, use command "dpkg-source -su -x pkg_name.dsc".
When I extracted the source package, I got the following warning:
gpgv: Can't check signature: public key not found
dpkg-source: warning: failed to verify signature on ./pkg_name.dsc

This means the public key needed to verify the package's signature was not found. The dpkg-source manual can tell you more:

--no-check
  Do not check signatures and checksums before unpacking.
--require-valid-signature
  Refuse to unpack the source package if it doesn't contain an OpenPGP signature that can be verified either with the user's trustedkeys.gpg keyring, one of the vendor-specific keyrings, or one of the official Debian keyrings (/usr/share/keyrings/debian-keyring.gpg and /usr/share/keyrings/debian-maintainers.gpg).

debian/ directory

Version:
https://wiki.ubuntu.com/PackagingGuide/Complete#changelog

changelog

The default file is debian/changelog. The changelog contains a list of changes. Note: it has a specific format. The command debchange can be used to edit the file.

debchange -a        #append a changelog entry at the current version
debchange -i        #increase the release number for non-native packages (2.4-1ubuntu1 -> 2.4-1ubuntu2).
debchange -v        #create a changelog entry for an arbitrary new version.
debchange --create  #create a new changelog file
debchange -c changelogfile  #edit a specified changelog file instead of the default one.

Read http://www.debian.org/doc/debian-policy/ch-source.html#s-dpkgchangelog for more info.
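
The "specific format" mostly concerns the first line of each entry, which Debian policy gives as: package (version) distribution(s); urgency=urgency. Below is a hedged Python sketch that parses such a header line. The regular expression is my own approximation and is looser than what the real tools accept. The sample line is invented.

```python
import re

# Approximation of the entry header: "package (version) dist [dist...]; key=value[,key=value]"
HEADER_RE = re.compile(
    r"^(?P<package>[a-z0-9][a-z0-9+.-]*) "
    r"\((?P<version>[^()\s]+)\) "
    r"(?P<dists>[^;]+); "
    r"(?P<meta>.*)$")


def parse_changelog_header(line):
    """Parse the first line of a changelog entry into a dict."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not a changelog header: %r" % line)
    metadata = {}
    for item in match.group("meta").split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            metadata[key] = value
    return {
        "package": match.group("package"),
        "version": match.group("version"),
        "distributions": match.group("dists").split(),
        "metadata": metadata,
    }


header = parse_changelog_header("hello (2.4-1ubuntu2) lucid; urgency=low")
print(header)
```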

dpkg-source -b   # build source package
man deb-version
Debian package version number format
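
As described in man deb-version, a version has the shape [epoch:]upstream-version[-debian-revision]. The following Python sketch splits a version string into those three parts. It does no validation of the allowed characters and does not implement version comparison. The function name is made up.

```python
def parse_deb_version(version):
    """Split a version string into (epoch, upstream_version, debian_revision).

    The epoch is the number before the first colon (0 if absent); the
    revision is whatever follows the last hyphen ('' if absent).
    """
    epoch_text, sep, rest = version.partition(":")
    if sep:
        if not epoch_text.isdigit():
            raise ValueError("epoch must be a number: %r" % version)
        epoch = int(epoch_text)
    else:
        epoch, rest = 0, version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        # No hyphen at all: the whole string is the upstream version.
        upstream, revision = rest, ""
    return epoch, upstream, revision


for example in ("2.4-1ubuntu1", "1:1.2.3-4", "2.4"):
    print(example, "->", parse_deb_version(example))
```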

Export:
gpg --export-secret-keys keyID
gpg --export keyID    #export public key
gpg --gen-key
gpg -k   #list pub keys
gpg -K   #list secret keys


copyright

Read this: https://wiki.ubuntu.com/PackagingGuide/Basic#Copyright%20Information

control

https://wiki.ubuntu.com/PackagingGuide/Complete#control

rules
It specifies how to compile and install the app and how to create the .deb package.
https://wiki.ubuntu.com/PackagingGuide/Complete#rules

DEBFULLNAME:
DEBEMAIL:

Package build

Binary package

dpkg-buildpackage
debuild: wraps dpkg-buildpackage and some other tools.
Use debuild -kKEYID to specify the key used to sign the package, or set variable DEBSIGN_KEYID to the key ID.
If you want to pass parameters to dpkg-buildpackage, set variable DEBUILD_DPKG_BUILDPACKAGE_OPTS.

debsign -kkeyID
debsign -m'LastName FirstName (Comment) <email_address>'

Source package: debuild -S

 

lintian
Debian package checker
  lintian -Ivai *.dsc

sudo pbuilder build pkg_name.dsc

dpkg-query -s pkg_name    #conf files are listed

https://wiki.ubuntu.com/PackagingGuide/Complete
https://wiki.ubuntu.com/DebootstrapChroot
https://wiki.ubuntu.com/PackagingGuide/Basic
https://wiki.ubuntu.com/PbuilderHowto
https://help.ubuntu.com/community/GnuPrivacyGuardHowto

http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html

http://www.debian.org/doc/manuals/maint-guide/index.en.html

http://www.debian.org/doc/debian-policy/

How to build Hadoop in Eclipse

Subclipse

  1. Install SVN
    I installed "Slik-SVN" because it provides JavaHL lib for 64bit Windows.
  2. Install Subclipse plugin to Eclipse.
  3. Change eclipse.ini to add the following parameter after -vmargs:
    -Djava.library.path=/usr/share/jni/lib (For linux)
    -Djava.library.path=<svn_install_dir>/bin (For Windows)
  4. Start eclipse
  5. Go to Window --> Preferences --> Team --> SVN
    In section "SVN Interface", it should say something like "JavaHL (JNI) … SlikSvn". If it says "JavaHL(JNI) not available", it means Subclipse cannot find the JavaHL library. Check step 3.

Code checkout

Example: svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/
You can also check out code within Eclipse using Subclipse plugin.

Install Ant and Ivy

Read http://ant.apache.org/ for how to install Ant.

Download the Ivy jar and put it into directory ANT_HOME/lib/. If ANT_HOME is not set explicitly, it is the Ant installation directory.

IvyDE

Hadoop's dependencies are managed by Ivy. You need the IvyDE Eclipse plugin. Read http://ant.apache.org/ivy/ivyde/ for more info. IvyDE includes the Ivy jar file itself, so it does not use the Ivy jar you installed in the previous step. Also, it seems that the ANT_HOME variable is set to <eclipse_dir>\plugins\org.apache.ant_1.7.1.v20090120-1145 (the version number may vary for you).

Shell and Unix commands on Windows

Hadoop's build file invokes sh and some other Linux commands such as tr and sed to build the project. Those commands don't exist on Windows by default.

The following two projects port Linux tools to Windows:

  1. http://sourceforge.net/projects/win-bash/files/win-bash/
  2. http://unxutils.sourceforge.net/

I use the first one. Download the tarball and decompress it to a directory. Ant must be able to find this directory; the usual way is to add it to the environment variable "PATH", and Ant picks it up automatically. That works for command-line use of Ant, but not well for Ant within Eclipse. The sections below explain how to pass PATH to Ant in Eclipse.
For command line use, try command "ant compile".
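
Before running Ant, it can help to check that the tools named above are actually reachable through PATH. A small Python sketch using shutil.which follows. The tool list is taken from the text (sh, tr, sed) and is not a complete list of what the Hadoop build needs.

```python
import shutil

# Tools mentioned above; the real build may need more.
REQUIRED_TOOLS = ["sh", "tr", "sed"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found on the given search path.

    path=None means "use the PATH of the current process".
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


missing = missing_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("All required tools found on PATH")
```

Running the same check from inside the environment that launches Ant shows whether that environment sees the same PATH as your command line.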

Create Eclipse Project for Hadoop

  1. New --> Java Project
    Select "Create project from existing source". Then select the directory where code is located.
    Click "Next"
  2. CHANGE OUTPUT DIRECTORY TO <workspace_name>/build.
    The default directory bin is used by Hadoop for different purposes.
    Click "Finish"
  3. Add JDK's tools.jar to build path.  It is not included in JRE.
  4. Change source directories to tell Eclipse which directories include Java source code.
    Right click project name --> Build Path --> Configure Build Path… --> Source
  5. Let IvyDE manage the Ivy dependencies.
    Right click project name --> Build Path --> Configure Build Path… --> Libraries --> Add Library --> IvyDE Managed Dependencies --> Next --> (a couple of IvyDE settings steps) --> OK.
    It may take some time for IvyDE to resolve dependencies.

Create Run Configuration

  1. Right click "build.xml" --> Run As --> Ant Build … (not "Ant Build")
  2. A dialog should pop up
    1. Switch to "Targets" tab: select corresponding target (e.g. compile) you want to execute.
    2. Switch to "JRE" tab: select "separate JRE"
    3. This step is for Windows users.
      Switch to the "Environment" tab: set PATH so that it includes the directory where the Linux tools are installed on Windows.
      You can click "Select" and choose variable "Path". But in my case, its value did NOT include the full content of the variable (compare with the output of the "path" command on the command line). Probably Eclipse has some restriction on the length of an environment variable's value: if it's too long, it gets truncated.
    4. Click "Run"
  3. See Console for messages.

Customize project builder

It's more convenient to use a "Builder" than to right click "build.xml" --> Run As --> Ant Build … --> Run. The following steps show how to use Ant as the default builder. Then you can use "Project --> Build Project" to build your project (same as any regular native Eclipse Java project).

  1. Right click project name --> Properties --> Builders --> New --> Ant Builder
    1. Select the build file (usually "build.xml").
    2. Switch to "Targets" tab.  Specify which targets are executed when the project is built or cleaned.
    3. This step is for Windows users.
      Switch to "Environment" tab. Add the PATH environment variable if needed, so that it includes the directory where the Linux tools are installed on Windows.
  2. Deselect the default Java Builder.