try
{
    // get() is a static method of FileSystem; it returns the file system named in the job configuration (HDFS here).
    FileSystem dfs = FileSystem.get(hadoopJobConfiguration);
    final FileStatus[] sts = dfs.listStatus(new Path(this.hdfsDirectory));
    for ( FileStatus s : sts )
    {
        if ( s.getPath().getName().endsWith(".jar") )
        {
            log.info("Jar found: " + s.getPath());
            // Pass only the path component (no scheme, host or port) to DistributedCache.
            DistributedCache.addFileToClassPath(new Path(s.getPath().toUri().getPath()), hadoopJobConfiguration);
        }
    }
}
catch (IOException e)
{
    throw new MyException("FileSystem exception while caching JAR files: ", e);
}

Hadoop still manages to surprise me every day. It would certainly make sense to take a Path object from DistributedFileSystem and feed it straight to DistributedCache’s addFileToClassPath. It would. But it doesn’t work.

In fact, a full Path in Hadoop looks like hdfs://hadoop-master-host:9000/path/to/the/file. But if you want to use this path with DistributedCache, you need to chop off everything but the path itself, which is /path/to/the/file in this example. And of course, there’s no way to find this out other than to try (I only figured it out because I had some hard-coded constants which did work, while the nice and clean code didn’t).
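To see what the chopping actually does, here is a minimal sketch that needs no Hadoop on the classpath. Hadoop's Path.toUri() returns a plain java.net.URI, so the same getPath() call can be tried on a bare URI. The class name PathStripDemo and the helper stripToPath are made up for illustration, and the host and port are just the example values from above.

```java
import java.net.URI;

public class PathStripDemo {
    // Hypothetical helper: keeps only the path component of a fully
    // qualified URI, dropping the scheme, host and port.
    static String stripToPath(String qualified) {
        return URI.create(qualified).getPath();
    }

    public static void main(String[] args) {
        // Example value only; a real one would come from FileStatus.getPath().
        String full = "hdfs://hadoop-master-host:9000/path/to/the/file";
        System.out.println(stripToPath(full)); // prints /path/to/the/file
    }
}
```

In the job setup code above, the equivalent step is s.getPath().toUri().getPath(), wrapped in a new Path before it is handed to DistributedCache.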
