java - The loss of key-value pairs of Map output


I wrote a Hadoop program. The input of the mapper is the text file hdfs://192.168.1.8:7000/export/hadoop-1.0.1/bin/input/paths.txt, into which the program ./readwritepaths writes paths of the local file system (identical on all computers of the cluster) on a single line, separated by the character |. At first the mapper reads the number of slave nodes of the cluster from the file /usr/countcomputers.txt; it equals 2 and, judging by the program execution, is read correctly. Then the contents of the input file arrive as the value on the input of the mapper, are converted to a String and split by the separator |, and the resulting paths are added to an ArrayList<String> paths.

    package org.myorg;

    import java.io.*;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class ParallelIndexation {
        public static class Map extends MapReduceBase implements
                Mapper<LongWritable, Text, Text, LongWritable> {
            private final static LongWritable zero = new LongWritable(0);
            private Text word = new Text();

            public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
                    throws IOException {
                String line = value.toString();
                int countComputers;
                // read the number of slave nodes from a local file
                FileInputStream fstream = new FileInputStream(
                        "/usr/countcomputers.txt");
                BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
                String result = br.readLine();
                countComputers = Integer.parseInt(result);
                br.close();
                fstream.close();
                System.out.println("countComputers=" + countComputers);
                // split the input line into separate paths by the separator |
                ArrayList<String> paths = new ArrayList<String>();
                StringTokenizer tokenizer = new StringTokenizer(line, "|");
                while (tokenizer.hasMoreTokens()) {
                    paths.add(tokenizer.nextToken());
                }

Then, as a check, I write the elements of ArrayList<String> paths out to the file /export/hadoop-1.0.1/bin/readpathsfromdatabase.txt; its contents are given below and confirm that ArrayList<String> paths was filled correctly.

                PrintWriter zzz = null;
                try {
                    zzz = new PrintWriter(new FileOutputStream("/export/hadoop-1.0.1/bin/readpathsfromdatabase.txt"));
                } catch (FileNotFoundException e) {
                    System.out.println("Error");
                    System.exit(0);
                }
                for (int i = 0; i < paths.size(); i++) {
                    zzz.println("paths[" + i + "]=" + paths.get(i) + "\n");
                }
                zzz.close();

Then these paths are concatenated in groups, joined by the character \n, and the joined results are written into the array String[] concatPaths = new String[countComputers].

                String[] concatPaths = new String[countComputers];
                int numberOfElementConcatPaths = 0;
                if (paths.size() % countComputers == 0) {
                    for (int i = 0; i < countComputers; i++) {
                        concatPaths[i] = paths.get(numberOfElementConcatPaths);
                        numberOfElementConcatPaths += paths.size() / countComputers;
                        for (int j = 1; j < paths.size() / countComputers; j++) {
                            concatPaths[i] += "\n"
                                    + paths.get(i * paths.size() / countComputers + j);
                        }
                    }
                } else {
                    numberOfElementConcatPaths = 0;
                    for (int i = 0; i < paths.size() % countComputers; i++) {
                        concatPaths[i] = paths.get(numberOfElementConcatPaths);
                        numberOfElementConcatPaths += paths.size() / countComputers + 1;
                        for (int j = 1; j < paths.size() / countComputers + 1; j++) {
                            concatPaths[i] += "\n"
                                    + paths.get(i * (paths.size() / countComputers + 1) + j);
                        }
                    }
                    for (int k = paths.size() % countComputers; k < countComputers; k++) {
                        concatPaths[k] = paths.get(numberOfElementConcatPaths);
                        numberOfElementConcatPaths += paths.size() / countComputers;
                        for (int j = 1; j < paths.size() / countComputers; j++) {
                            concatPaths[k] += "\n"
                                    + paths.get((k - paths.size() % countComputers)
                                            * paths.size() / countComputers
                                            + paths.size() % countComputers
                                            * (paths.size() / countComputers + 1)
                                            + j);
                        }
                    }
                }

I write the cells of the array String[] concatPaths out to the file /export/hadoop-1.0.1/bin/concatpaths.txt to check the correctness of the concatenation. The text of that file, given below, confirms that the previous stages worked correctly.

                PrintWriter zzz1 = null;
                try {
                    zzz1 = new PrintWriter(new FileOutputStream("/export/hadoop-1.0.1/bin/concatpaths.txt"));
                } catch (FileNotFoundException e) {
                    System.out.println("Error");
                    System.exit(0);
                }
                for (int i = 0; i < concatPaths.length; i++) {
                    zzz1.println("concatPaths[" + i + "]=" + concatPaths[i] + "\n");
                }
                zzz1.close();

The cells of the array String[] concatPaths, i.e. the joined paths, are emitted as the output of the mapper.

                for (int i = 0; i < concatPaths.length; i++) {
                    word.set(concatPaths[i]);
                    output.collect(word, zero);
                }
            }
        }

In the reducer the input key is split into parts by the separator \n, and the resulting paths are written into an ArrayList<String> processedPaths.

        public static class Reduce extends MapReduceBase implements
                Reducer<Text, IntWritable, Text, LongWritable> {
            public native long traveser(String path);

            public native void configure(String path);

            public void reduce(Text key, Iterator<IntWritable> value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
                    throws IOException {
                long count = 0;
                String line = key.toString();
                // split the joined key back into separate paths by \n
                ArrayList<String> processedPaths = new ArrayList<String>();
                StringTokenizer tokenizer = new StringTokenizer(line, "\n");
                while (tokenizer.hasMoreTokens()) {
                    processedPaths.add(tokenizer.nextToken());
                }

Then, to check that the joined keys are split into separate paths correctly, I write the elements of ArrayList<String> processedPaths out to the file /export/hadoop-1.0.1/bin/processedpaths.txt. The contents of this file turned out to be identical on both slave nodes and contain only the separate paths of the second joined key, despite the fact that two different joined keys arrived from the mapper output. Even more surprising is the result of the subsequent lines of the reducer, which index the files at the received paths and insert the words from those files into a database table: only one file, /export/hadoop-1.0.1/bin/error.txt, which belongs to the first joined key, was indexed.

                PrintWriter zzz2 = null;
                try {
                    zzz2 = new PrintWriter(new FileOutputStream("/export/hadoop-1.0.1/bin/processedpaths.txt"));
                } catch (FileNotFoundException e) {
                    System.out.println("Error");
                    System.exit(0);
                }
                for (int i = 0; i < processedPaths.size(); i++) {
                    zzz2.println("processedPaths[" + i + "]=" + processedPaths.get(i) + "\n");
                }
                zzz2.close();
                // index the files at the received paths via the native methods
                configure("/etc/nsindexer.conf");
                for (int i = 0; i < processedPaths.size(); i++) {
                    count = traveser(processedPaths.get(i));
                }
                output.collect(key, new LongWritable(count));
            }
        }
    }
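One thing the post does not show is how the JNI library that implements the native methods traveser and configure is loaded on the task nodes. A class that declares native methods normally loads its library in a static initializer; a minimal sketch is given below, where the library name nsindexer is only an assumption.

    package org.myorg;

    // Hypothetical sketch, not part of the original post: the JNI library behind
    // the native methods has to be loaded in every task JVM before they are called.
    // "nsindexer" is an assumed library name (i.e. libnsindexer.so on Linux).
    public class NativeIndexer {
        static {
            System.loadLibrary("nsindexer");
        }

        public native long traveser(String path);

        public native void configure(String path);
    }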

The program was run by means of the following bash script:

    #!/bin/bash
    cd /export/hadoop-1.0.1/bin
    ./hadoop namenode -format
    ./start-all.sh
    ./hadoop fs -rmr hdfs://192.168.1.8:7000/export/hadoop-1.0.1/bin/output
    ./hadoop fs -rmr hdfs://192.168.1.8:7000/export/hadoop-1.0.1/bin/input
    ./hadoop fs -mkdir hdfs://192.168.1.8:7000/export/hadoop-1.0.1/input
    ./readwritepaths
    sleep 120
    ./hadoop fs -put /export/hadoop-1.0.1/bin/input/paths.txt hdfs://192.168.1.8:7000/export/hadoop-1.0.1/bin/input/paths.txt 1> copyinhdfs.txt 2>&1
    ./hadoop jar /export/hadoop-1.0.1/bin/ParallelIndexation.jar org.myorg.ParallelIndexation /export/hadoop-1.0.1/bin/input /export/hadoop-1.0.1/bin/output -D mapred.map.tasks=1 -D mapred.reduce.tasks=2 1> resultofexecute.txt 2>&1
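The job driver (the main method that submits the job) is not shown in the post, so below is a minimal sketch of what such a driver could look like for the old mapred API used above; the class name ParallelIndexationDriver and the job name are assumptions. Note that the -D options from the command line are applied only if the driver passes its arguments through ToolRunner or GenericOptionsParser, and even then mapred.map.tasks is only a hint to the framework: the actual number of map tasks is determined by the input splits.

    package org.myorg;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver sketch (not shown in the original post), assuming the
    // Map and Reduce classes above.
    public class ParallelIndexationDriver extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            JobConf conf = new JobConf(getConf(), ParallelIndexation.class);
            conf.setJobName("parallelindexation");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(LongWritable.class);
            conf.setMapperClass(ParallelIndexation.Map.class);
            conf.setReducerClass(ParallelIndexation.Reduce.class);
            // mapred.map.tasks is only a hint; the real number of map tasks
            // is determined by the splits computed by the InputFormat.
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner extracts the -D options from args and puts them into
            // the Configuration before run() is called.
            System.exit(ToolRunner.run(new ParallelIndexationDriver(), args));
        }
    }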

According to the last command there should be only one mapper. Despite this, the files /export/hadoop-1.0.1/bin/readpathsfromdatabase.txt and /export/hadoop-1.0.1/bin/concatpaths.txt appeared on both slave nodes. Below I give the contents of the above-mentioned files.

hdfs://192.168.1.8:7000/export/hadoop-1.0.1/bin/input/paths.txt:

/export/hadoop-1.0.1/bin/error.txt|/root/nexenta_search/nsindexer.conf|/root/nexenta_search/traverser.c|/root/nexenta_search/buf_read.c|/root/nexenta_search/main.c|/root/nexenta_search/avl_tree.c| 

/export/hadoop-1.0.1/bin/readpathsfromdatabase.txt

    paths[0]=/export/hadoop-1.0.1/bin/error.txt
    paths[1]=/root/nexenta_search/nsindexer.conf
    paths[2]=/root/nexenta_search/traverser.c
    paths[3]=/root/nexenta_search/buf_read.c
    paths[4]=/root/nexenta_search/main.c
    paths[5]=/root/nexenta_search/avl_tree.c

/export/hadoop-1.0.1/bin/concatpaths.txt

    concatPaths[0]=/export/hadoop-1.0.1/bin/error.txt
    /root/nexenta_search/nsindexer.conf
    /root/nexenta_search/traverser.c

    concatPaths[1]=/root/nexenta_search/buf_read.c
    /root/nexenta_search/main.c
    /root/nexenta_search/avl_tree.c

/export/hadoop-1.0.1/bin/processedpaths.txt

    processedPaths[0]=/root/nexenta_search/buf_read.c
    processedPaths[1]=/root/nexenta_search/main.c
    processedPaths[2]=/root/nexenta_search/avl_tree.c

In this connection I want to ask three questions:

  1. Why are the contents of the file /export/hadoop-1.0.1/bin/processedpaths.txt identical on both nodes and equal to what is shown here?
  2. Why, as a result, was only one file, /export/hadoop-1.0.1/bin/error.txt, indexed?
  3. Why was the mapper executed on both slave nodes?

