sigizmund.com
Hadoop’s “DistributedFileSystem vs DistributedCache” mystery
By sigizmund on March 23, 2010
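```java
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Inside the job-setup method:
try {
    // Resolves to the static FileSystem.get(), which returns the job's
    // default file system (HDFS in this setup)
    FileSystem dfs = DistributedFileSystem.get(hadoopJobConfiguration);
    final FileStatus[] sts = dfs.listStatus(new Path(this.hdfsDirectory));
    for (FileStatus s : sts) {
        if (s.getPath().toString().endsWith(".jar")) {
            log.info("Jar found: " + s.getPath().toString());
            // s.getPath() is fully qualified (hdfs://host:port/...);
            // pass only its path component to DistributedCache
            DistributedCache.addFileToClassPath(
                    new Path(s.getPath().toUri().getPath()), hadoopJobConfiguration);
        }
    }
} catch (IOException e) {
    throw new MyException("FileSystem exception while caching JAR files: ", e);
}
```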
Hadoop still manages to surprise me every day. It would make perfect sense to take a Path object obtained from DistributedFileSystem and pass it straight to DistributedCache’s addFileToClassPath. It would. But it doesn’t work.
In fact, a fully qualified Path in Hadoop looks like hdfs://hadoop-master-host:9000/path/to/the/file. To use it with DistributedCache, you need to chop off everything but the path itself, which is /path/to/the/file in this example; s.getPath().toUri().getPath() in the code above does exactly that. And of course, there’s no way to find this out other than by trying (I only figured it out because some hard-coded constants I had did work, while the nice, clean code didn’t).
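The stripping step can be tried out without a cluster. Hadoop’s Path.toUri() returns a plain java.net.URI, and URI.getPath() keeps only the path component, dropping scheme, host and port. The minimal sketch below shows that step in isolation using only the JDK; the class name HdfsPathStrip and the helper stripToPath are hypothetical names for illustration, not part of Hadoop.

```java
import java.net.URI;

public class HdfsPathStrip {
    /**
     * Hypothetical helper: reduces a fully qualified URI such as
     * hdfs://hadoop-master-host:9000/path/to/the/file to its path
     * component by discarding the scheme and authority (host:port).
     */
    static String stripToPath(String qualified) {
        // URI.getPath() returns only the path part, e.g. /path/to/the/file
        return URI.create(qualified).getPath();
    }

    public static void main(String[] args) {
        String full = "hdfs://hadoop-master-host:9000/path/to/the/file";
        System.out.println(stripToPath(full)); // prints /path/to/the/file

        // A path that is already bare passes through unchanged
        System.out.println(stripToPath("/path/to/the/file")); // prints /path/to/the/file
    }
}
```

Because the bare-path case is a no-op, applying the same stripping unconditionally before calling addFileToClassPath is harmless.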