java - Massive multiprogramming and read-only file access


I am trying to create a dictionary-based tagger that runs on a Hadoop cluster using Pig. Basically, what it does is: for each document (fairly large text documents, a few MBs), run each word of each sentence against a dictionary and read out the corresponding value.

There will be a few hundred Java programs (not threads) running in parallel, all using the dictionary file in read-only mode. The idea is to load the dictionary text and build a map to query against.

My question: is there anything I should be prepared for? Is it at all sensible to read the same file from many programs in a multiprogramming environment, or should I first copy the (relatively small) file for each instance of the program? And what reader (a BufferedReader?) should I use while reading the file? A minimal sketch of what I have in mind follows.
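For concreteness, something like this is roughly what I mean by loading the dictionary into a map (the word<TAB>value line format, the class name and the path are just assumptions about my setup, not fixed):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class DictionaryLoader {

    // Reads a dictionary file (assumed format: word<TAB>value per line)
    // into an in-memory map. Opening the same file in read-only mode
    // from many separate JVM processes at once needs no locking.
    public static Map<String, String> load(String path) throws IOException {
        Map<String, String> dict = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]);
                }
            }
        }
        return dict;
    }
}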

There is very little structured documentation on multiprogramming (compared to multithreading), so I am a bit afraid of running into a wall here.

Note: you are allowed to answer that my way of thinking is totally wrong, as long as you provide me with a better way ;-)

I think your approach is fine. You should ship the dictionary to the nodes via the DistributedCache, load it into memory once per task, and then do all lookups against the memory-loaded dictionary (e.g., a HashMap). A sketch of how that could look in a Pig UDF is below.
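As a rough sketch (not a drop-in implementation), a Pig EvalFunc can declare the dictionary as a cache file and load it lazily on first use; the HDFS path /dicts/tagger-dict.txt, the word<TAB>tag format and the class name here are placeholders, not something from your setup:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class DictLookup extends EvalFunc<String> {

    private Map<String, String> dictionary;

    @Override
    public List<String> getCacheFiles() {
        // Pig ships this HDFS file to every task node via the distributed
        // cache; "#dict" makes it appear in the task's working directory
        // under that name.
        return Arrays.asList("/dicts/tagger-dict.txt#dict");
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        if (dictionary == null) {
            // Loaded once per JVM, not once per record.
            dictionary = loadDictionary("./dict");
        }
        return dictionary.get((String) input.get(0));
    }

    // Assumed dictionary format: word<TAB>tag per line.
    private static Map<String, String> loadDictionary(String path) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    map.put(parts[0], parts[1]);
                }
            }
        }
        return map;
    }
}

In the Pig script you would then simply apply the UDF to each word; each JVM reads the dictionary from local disk once and all subsequent lookups hit the in-memory map.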

