hadoop - Pattern match input files for Amazon Elastic MapReduce


I am trying to run a MapReduce streaming job that takes its input files from directories in an S3 bucket that match a given pattern. The pattern is bucket-name/[date]/product/logs/[hour]/[logfilename]. An example log would be in a file such as bucket-name/2013-05-02/product/logs/05/log123456789.

I can get the job to work by passing only the hour portion of the path as a wildcard. For example: bucket-name/2013-05-02/product/logs/*/. This picks up each log file for each hour and passes them individually to the mappers.

The problem comes when I try to make the date a wildcard as well, for example: bucket-name/*/product/logs/*/. When I do this, the job gets created but no tasks are created, and it fails. This error is printed in the syslog.

2013-05-02 08:03:41,549 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not successful. Error: Job initialization failed: java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Matcher.<init>(Matcher.java:207)
    at java.util.regex.Pattern.matcher(Pattern.java:888)
    at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:378)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:418)
    at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:523)
    at org.apache.hadoop.mapred.SkipBadRecords.getMapperMaxSkipRecords(SkipBadRecords.java:247)
    at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:146)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:722)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4238)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2013-05-02 08:03:41,549 INFO org.apache.hadoop.streaming.StreamJob (main): killJob...
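To make the difference between the two input patterns concrete, here is a minimal Python sketch of segment-by-segment glob matching, where `*` never crosses a `/`. It is only an illustration: the `matches` helper and the sample keys are hypothetical, it is not how Hadoop or Elastic MapReduce expands globs internally, and it does not by itself explain the heap error above.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, key: str) -> bool:
    """Match a key against a directory pattern one path segment at a time."""
    p_parts = pattern.strip("/").split("/")
    k_parts = key.strip("/").split("/")
    # The pattern names directories, so the key must be at least as deep.
    if len(k_parts) < len(p_parts):
        return False
    # Compare only the leading segments; the file name itself is not constrained.
    return all(fnmatchcase(k, p) for p, k in zip(p_parts, k_parts))


# Hypothetical object keys, made up for this illustration.
keys = [
    "bucket-name/2013-05-01/product/logs/23/log123456788",
    "bucket-name/2013-05-02/product/logs/05/log123456789",
    "bucket-name/2013-05-02/other/logs/05/log000000001",
]

hour_only = "bucket-name/2013-05-02/product/logs/*/"
date_and_hour = "bucket-name/*/product/logs/*/"

hour_matches = [k for k in keys if matches(hour_only, k)]
both_matches = [k for k in keys if matches(date_and_hour, k)]

print(len(hour_matches), len(both_matches))
```

With these sample keys, the date wildcard selects a superset of what the hour-only pattern selects, so the second pattern can only increase the number of input files the job has to initialize.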

On further testing, it looks like the multiple wildcard syntax works as expected in the command line client. I had trouble getting it to work at first, before realizing that requiring Ruby 1.8.7 meant it requires exactly Ruby 1.8.7, and nothing later.

