Loading multiple CSV files into MySQL
I'm working on a metrics project for my team. I have to load several different reports into a central repository, create tables, and build reports off of that data.
The data sources are:
- CSV files
- PDFs
- ad-hoc/manual data
I'm playing with Talend and MySQL, and I'm a little confused about how to load the CSV files. Should I have a collection of directories and one or more scheduled tasks that load the files?
Another thought is to write a custom file processor that loads each file based on a naming convention. Thoughts?
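A minimal sketch of the naming-convention idea in Python (the thread itself contains no code). The file pattern `<report>_<YYYYMMDD>.csv`, the report names and the `staging_*` table names are all hypothetical; a scheduled task (cron, Windows Task Scheduler, or a Talend job) could call `pending_files()` on each run and hand the results to whatever does the actual import.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming convention: <report>_<YYYYMMDD>.csv, e.g. "sales_20240131.csv".
FILENAME_PATTERN = re.compile(r"^(?P<report>[a-z_]+?)_(?P<date>\d{8})\.csv$", re.IGNORECASE)

# Hypothetical mapping from the report part of the file name to a target table.
REPORT_TABLES = {
    "sales": "staging_sales",
    "support_tickets": "staging_support_tickets",
}


def target_table(path: Path) -> Optional[str]:
    """Return the target table for a CSV file, or None if the name is not recognised."""
    match = FILENAME_PATTERN.match(path.name)
    if match is None:
        return None
    return REPORT_TABLES.get(match.group("report").lower())


def pending_files(inbox: Path) -> list[tuple[Path, str]]:
    """List (file, table) pairs for every recognised CSV directly inside `inbox`."""
    jobs = []
    for path in sorted(inbox.glob("*.csv")):
        table = target_table(path)
        if table is not None:  # unrecognised files are simply left in place
            jobs.append((path, table))
    return jobs
```

With a convention like this, a single input directory can serve every report type, because the file name alone decides where the data goes.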
"PDF" is complicated. PDF... "Ad-hoc/manual data" needs more details.
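If we focus on CSV, and if I'm right that your question is about that, I'd write an app that calls a stored procedure (SP) in the MySQL DB, handing over the full path of the CSV (and any additional data, such as the table's "user friendly name" if needed, or other metadata you'd store), and that executes the import using MySQL's LOAD DATA. One caveat: MySQL does not permit LOAD DATA inside stored procedures, so in practice the app would issue LOAD DATA itself and then call the SP for the post-import processing.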
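A sketch of the statement such an import app might send, assuming comma-separated files with one header line and optionally double-quoted fields. The helper names are hypothetical, and the app builds and runs the statement itself because MySQL rejects LOAD DATA inside stored procedures.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(value: str) -> str:
    """Quote a MySQL string literal (assumes the NO_BACKSLASH_ESCAPES mode is off)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_load_data(csv_path: str, table: str, skip_header: bool = True) -> str:
    """Build a LOAD DATA LOCAL INFILE statement for one CSV file."""
    sql = (
        f"LOAD DATA LOCAL INFILE {quote_string(csv_path)} "
        f"INTO TABLE {quote_identifier(table)} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"  # MySQL reads the two characters \n as a newline
    )
    if skip_header:
        sql += " IGNORE 1 LINES"  # skip the CSV header row
    return sql
```

With a driver such as mysql-connector-python, LOCAL INFILE generally has to be enabled on both sides (the server's `local_infile` variable and the client's `allow_local_infile=True` connection option) before `cursor.execute(build_load_data(path, table))` succeeds; a follow-up `cursor.callproc(...)` on a procedure of your own could then apply the business rules.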
The reason is that there can be many rules in the "business logic" applied after the CSV is imported, and it's easier to maintain an app as business requirements change than to keep changing the DB's behavior over time. And if something goes terribly wrong, the DB is safe and only the "import manager app" fails; you don't have to store either the app or the CSVs on the same system as the DB.
DBs, and relational DBs in particular, are for storing data and retrieving it rapidly based on "set theory", not for taking care of how the data gets into the system.
Think about these questions before you start implementing anything:
- What happens to a CSV after it's processed? Can it be deleted? Should it be moved, e.g. to a "processed" folder? Should it remain intact where it is?
- If it should stay where it was, how will you know the file has been processed? (Set a "ready to archive" flag, for instance? Touch its "last modified" date and set it to 1950.01.01? Add a property to the file?)
- What should happen if a CSV import fails (e.g. invalid data in the file, or a null value in a column that shouldn't have nulls)? Display an error? Mark the CSV as unusable? Send an e-mail? Move it to a "processing_failed" folder?
- What if the file count in the input folder grows huge?
- How can you change the import/processing if the business logic or the CSV format changes?
And so on. Think through the options you have, and decide.
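One possible set of answers to those questions, sketched in Python: every CSV in a hypothetical inbox folder is checked for required, non-empty columns, passed to a caller-supplied `load` function, and then moved to `processed/` or `processing_failed/`, with the error message saved next to the failed file. Only the two folder names come from the answer above; the function names, the validation rule and the sidecar `.error.txt` file are assumptions.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable


def validate(path: Path, required: list[str]) -> None:
    """Raise ValueError if a required column is missing or has an empty value."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in required if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        # Line numbers assume a single header line and no multi-line fields.
        for line_no, row in enumerate(reader, start=2):
            for col in required:
                if not (row[col] or "").strip():  # short rows yield None
                    raise ValueError(f"line {line_no}: empty value in {col!r}")


def process_inbox(inbox: Path, required: list[str],
                  load: Callable[[Path], None]) -> dict[str, list[str]]:
    """Validate and load every CSV in `inbox`, then move it out of the inbox."""
    processed = inbox / "processed"
    failed = inbox / "processing_failed"
    processed.mkdir(exist_ok=True)
    failed.mkdir(exist_ok=True)
    result: dict[str, list[str]] = {"processed": [], "failed": []}
    for path in sorted(inbox.glob("*.csv")):
        try:
            validate(path, required)
            load(path)  # e.g. run LOAD DATA, then any post-import procedure
        except Exception as exc:  # any failure sends the file to processing_failed/
            shutil.move(str(path), str(failed / path.name))
            (failed / f"{path.name}.error.txt").write_text(str(exc), encoding="utf-8")
            result["failed"].append(path.name)
        else:
            shutil.move(str(path), str(processed / path.name))
            result["processed"].append(path.name)
    return result
```

Moving every file out of the inbox also keeps the input folder from growing without bound; whether `processed/` is later deleted or archived is a separate policy decision.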
I hope I've answered your question ;)