hadoop - Convert HDFS SequenceFile from SnappyCodec to DefaultCodec
Given the choice, I'd love to be able to do this from the Hadoop shell rather than writing a MapReduce job. I have a few files that need to be converted.
Some untested code that should do the trick (obviously the file names are made up; sequence files don't typically have extensions):
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path inputPath = new Path("part-r-00000.snappy");
    Path outputPath = new Path("part-r-00000.deflate");
    FSDataOutputStream dos = fs.create(outputPath);

    // Read the existing Snappy-compressed sequence file
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputPath, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(
            reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(
            reader.getValueClass(), conf);

    // Write a new sequence file with the same key/value types and
    // compression type, but using DefaultCodec (deflate) instead
    CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
    CompressionCodec codec = ccf.getCodecByClassName(DefaultCodec.class
            .getName());
    SequenceFile.Writer writer = SequenceFile.createWriter(conf, dos,
            key.getClass(), value.getClass(), reader.getCompressionType(),
            codec);

    while (reader.next(key, value)) {
        writer.append(key, value);
    }
    reader.close();
    writer.close();
    dos.close();
You should acquire the Configuration via the ToolRunner / Tool pattern. Here's a similar question that outlines it, in case the principle is new to you:
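For reference, the ToolRunner / Tool pattern mentioned above looks roughly like the sketch below. The class and method names other than the Hadoop API ones (e.g. `RecompressDriver`) are illustrative, not from the original post; the recompression logic itself is elided.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; extend Configured and implement Tool so that
// ToolRunner can parse generic options (-D, -conf, -fs, ...) for you.
public class RecompressDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns a Configuration already populated with any
        // generic options passed on the command line.
        Configuration conf = getConf();
        // ... open the SequenceFile.Reader / Writer using conf here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RecompressDriver(), args));
    }
}
```

Run it with `hadoop jar yourjar.jar RecompressDriver -D key=value ...` and the `-D` overrides land in the Configuration without any manual argument parsing.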