hadoop - Pattern match input files for Amazon Elastic MapReduce
I am trying to run a MapReduce streaming job that takes its input files from directories in an S3 bucket that match a given pattern. The pattern is bucket-name/[date]/product/logs/[hour]/[logfilename]. An example log would be in a file like bucket-name/2013-05-02/product/logs/05/log123456789.
I can get the job to work by passing only the hour portion of the path as a wildcard. For example: bucket-name/2013-05-02/product/logs/*/. This picks up each log file from each hour and passes them individually to the mappers.
The problem comes when I try to make the date a wildcard as well, for example: bucket-name/*/product/logs/*/. When I do this, the job gets created, but no tasks are created, and it fails. This error is printed in the syslog:
2013-05-02 08:03:41,549 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not successful. Error: Job initialization failed:
java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Matcher.<init>(Matcher.java:207)
    at java.util.regex.Pattern.matcher(Pattern.java:888)
    at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:378)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:418)
    at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:523)
    at org.apache.hadoop.mapred.SkipBadRecords.getMapperMaxSkipRecords(SkipBadRecords.java:247)
    at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:146)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:722)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4238)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2013-05-02 08:03:41,549 INFO org.apache.hadoop.streaming.StreamJob (main): killJob...
On further testing, it looks like the multiple-wildcard syntax works as expected in the command line client. I had trouble getting the client to work at first, before realizing that "requires Ruby 1.8.7" meant it requires exactly Ruby 1.8.7, and nothing later.
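
To make the intent of the two patterns concrete, here is a minimal Python sketch of which hour directories each one should select. It is not Hadoop's glob implementation, which also understands `?`, `[abc]` and `{a,b}`. It only handles `*`, treated as "any characters within one path segment". The helper names `glob_to_regex` and `matching_dirs` and the sample keys are made up for illustration.

```python
import re


def glob_to_regex(pattern):
    """Translate a simple glob into a regex; '*' never crosses a '/'."""
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + "[^/]*".join(parts) + "$")


def matching_dirs(pattern, keys):
    """Return the sorted parent directories of keys that match the pattern."""
    regex = glob_to_regex(pattern.rstrip("/"))
    dirs = {key.rsplit("/", 1)[0] for key in keys}
    return sorted(d for d in dirs if regex.match(d))


# Hypothetical keys that follow the layout described in the question.
keys = [
    "bucket-name/2013-05-02/product/logs/05/log123456789",
    "bucket-name/2013-05-02/product/logs/06/log123456790",
    "bucket-name/2013-05-03/product/logs/05/log123456791",
    "bucket-name/2013-05-02/other/logs/05/log123456792",
]

# Hour wildcard only: one date, every hour.
hour_only = matching_dirs("bucket-name/2013-05-02/product/logs/*/", keys)

# Date and hour wildcards: every date, every hour, still only "product".
date_and_hour = matching_dirs("bucket-name/*/product/logs/*/", keys)

print(hour_only)
print(date_and_hour)
```

With the sample keys, the hour-only pattern selects the two hour directories under 2013-05-02, and the double-wildcard pattern additionally selects the one under 2013-05-03. Neither selects the directory under "other". On a real bucket, the second pattern can expand to far more input paths than the first.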