hadoop - How to run external program within mapper or reducer giving HDFS files as input and storing output files in HDFS?
I have an external program that takes a file as input and produces an output file.
For example, with input file in_file and output file out_file, the external program is run as:

./vx < ${in_file} > ${out_file}
I want both the input and output files to be in HDFS.
I have a cluster of 8 nodes, and I have 8 input files, each containing 1 line:
1st input file 1.txt: 1:0,0,0
2nd input file 2.txt: 2:0,0,128
3rd input file 3.txt: 3:0,128,0
4th input file 4.txt: 4:0,128,128
5th input file 5.txt: 5:128,0,0
6th input file 6.txt: 6:128,0,128
7th input file 7.txt: 7:128,128,0
8th input file 8.txt: 8:128,128,128
I am using KeyValueTextInputFormat:
key: file name, value: initial coordinates
For example, for the 5th file:
key: 5, value: 128,0,0
Each map task generates a huge amount of data according to its initial coordinates.
Now I want to run the external program in each map task and generate an output file.
But I am confused about how to do that when the files are in HDFS.
Can I use 0 reducers and create the files in HDFS like this?

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outFile = new Path(input_file_name);
FSDataOutputStream out = fs.create(outFile);
// generating data ........ and writing to HDFS
out.writeUTF(lon + ";" + lat + ";" + depth + ";");
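For context, configuring such a map-only job with KeyValueTextInputFormat could look roughly like this. This is a minimal sketch, assuming the Hadoop 2.x (org.apache.hadoop.mapreduce) API; MapOnlyJob and MyMapper are illustrative names, and the ':' separator setting matches the sample files above:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    public static class MyMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            // key = file id (e.g. "5"), value = initial coordinates (e.g. "128,0,0");
            // the real work (generating data, calling the external program) would go here
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // split each input line on ':' into (key, value), matching lines like "5:128,0,0"
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ":");
        Job job = Job.getInstance(conf, "run-external-program");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(MyMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output goes straight to HDFS
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}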
I am confused about how to run the external program on an HDFS file without first fetching the file to a local directory with:
dfs -get
Without using MapReduce, I can get the results with the following shell script:
#!/bin/bash
if [ $# -lt 2 ]; then
    printf "usage: %s: <infile> <outfile>\n" $(basename $0) >&2
    exit 1
fi

in_file=/users/x34/data/$1
out_file=/users/x34/data/$2

cd "/users/x34/projects/externalprogram/model/"

./vx < ${in_file} > ${out_file}

paste ${in_file} ${out_file} | awk '{print $1,"\t",$2,"\t",$3,"\t",$4,"\t",$5,"\t",$22,"\t",$23,"\t",$24}' > /users/x34/data/combined

if [ $? -ne 0 ]; then
    exit 1
fi

exit 0
And I run it with:
ProcessBuilder pb = new ProcessBuilder("shell_script", "in", "out");
Process p = pb.start();
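Not in the snippet above, but worth adding: waiting for the process and checking its exit status, mirroring the $? check in the shell script. A small sketch:

Process p = pb.start();
int exitCode = p.waitFor(); // blocks until the script finishes; throws InterruptedException
if (exitCode != 0) {
    throw new RuntimeException("shell_script exited with code " + exitCode);
}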
I would appreciate any idea of how to use Hadoop Streaming, or some other way, to run the external program. I want both the input and output files kept in HDFS for further processing. Please suggest how to go about it.
So, assuming your external program doesn't know how to recognize or read from HDFS, you will want to load the file from Java and pass it directly to the program as input:
Path path = new Path("hdfs/path/to/input/file");
FileSystem fs = FileSystem.get(configuration);
FSDataInputStream fin = fs.open(path);

ProcessBuilder pb = new ProcessBuilder("shell_script");
Process p = pb.start();
OutputStream os = p.getOutputStream();

BufferedReader br = new BufferedReader(new InputStreamReader(fin));
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os));

String line = null;
while ((line = br.readLine()) != null) {
    writer.write(line);
    writer.newLine();
}
writer.close(); // close the program's stdin so it sees end-of-input
br.close();
The output can be handled in the reverse manner: take the InputStream from the process and write it to an FSDataOutputStream pointing into HDFS.
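A minimal sketch of that reverse direction, continuing from the snippet above (the HDFS output path is an illustrative assumption):

FSDataOutputStream fout = fs.create(new Path("hdfs/path/to/output/file"));
BufferedReader stdout = new BufferedReader(new InputStreamReader(p.getInputStream()));
BufferedWriter hdfsWriter = new BufferedWriter(new OutputStreamWriter(fout));

String outLine = null;
while ((outLine = stdout.readLine()) != null) {
    hdfsWriter.write(outLine);
    hdfsWriter.newLine();
}
hdfsWriter.close(); // flush and commit the file to HDFS
stdout.close();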
Essentially your program, doing these two things, becomes an adapter that converts HDFS into input for the external program and its output back into HDFS.
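One caveat with this approach: if the external program writes a lot of output before its input has been fully consumed, the pipe buffers can fill up and both sides deadlock, so in practice it is safer to pump the process's stdin and stdout on separate threads.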