numpy - Extract data from a txt file that contains data blocks and strings using Python
I need to postprocess the output file of a model using Python. The output file has a combination of data and strings. First, I want to separate the strings from the data, and then save columns 0, 1, and 2 for each output time (only data, no strings) in a separate text file. So, for the example below, I will have 3 text files (for time=0, time=0.01, and time=0.04), each containing the data for one output time without the header or other strings in them. A short form of the model's output file looks like this:
    ******* program ******
    ******* program ******
    ******* program ******
    date: 26. 4. time: 15:40:32
    units: l = cm , t = days , m = mmol

    time: 0.000000

    node  depth      head  moisture     headf  moisturef       flux
            [l]       [l]       [-]       [l]        [-]      [l/t]
       1   0.00  -1000.00    0.1088  -1000.00   0.002508  -0.562e-03
       2  -0.04  -1000.00    0.1088  -1000.00   0.002508  -0.562e-03
       3  -0.08  -1000.00    0.1088  -1000.00   0.002508  -0.562e-03
    end

    time: 0.010000

    node  depth      head  moisture     headf  moisturef       flux
            [l]       [l]       [-]       [l]        [-]      [l/t]
       1   0.00   -666.06    0.1304    -14.95   0.139033  -0.451e-02
       2  -0.04   -666.11    0.1304    -15.01   0.138715  -0.887e-02
       3  -0.08   -666.35    0.1304    -15.06   0.138394  -0.174e-01
    end

    time: 0.040000

    node  depth      head  moisture     headf  moisturef       flux
            [l]       [l]       [-]       [l]        [-]      [l/t]
       1   0.00   -324.87    0.1720    -12.30   0.157799  -0.315e-02
       2  -0.04   -324.84    0.1720    -12.31   0.157724  -0.628e-02
       3  -0.08   -324.83    0.1720    -12.32   0.157649  -0.125e-01
    end
I found the following code in a question posted on Stack Overflow earlier.
That problem is similar to mine; however, I have trouble modifying the code to solve my problem. How should I modify it? Or should I use a different strategy to approach this problem?
    def parse_dpt(lines):
        dpt = []
        while lines:
            line = lines.pop(0).lstrip()
            if line == ' ' or line.startswith('*'):
                continue
            if line.startswith('*'):
                lines.insert(0, line)
                break
            data = line.split(' ')
            # pick columns 0, 1, 2 and
            # convert to the appropriate numeric format
            # and append to the list for the current dpt and time step
            dpt.append([int(data[0]), float(data[1]), float(data[2])])
        return dpt

    raw = []
    with open('nod_inftest.txt') as nit:
        lines = nit.readlines()
        while lines:
            line = lines.pop(0)
            if line.startswith(''):
                if line.find('time:') > -1:
                    raw.append(parse_dpt(lines))

    from pprint import pprint
    for raw_step in zip(raw):
        print 'raw:'
        pprint(raw_step)
Here is the error message from Python:
    'import sitecustomize' failed; use -v for traceback
    Traceback (most recent call last):
      File "c:\users\desktop\python test\p-test3.py", line 58, in <module>
        raw.append(parse_dpt(lines))
      File "c:\users\desktop\python test\p-test3.py", line 35, in parse_dpt
        dpt.append([int(data[0]), float(data[1]), float(data[2])])
    ValueError: invalid literal for int() with base 10: 'units:'
If I understood your question correctly, this code should do the trick:
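    import re

    with open('in.txt', 'r') as in_file:
        file_content = in_file.read()

    # Each block runs from a "time: <decimal number>" header to the next "end".
    # The clock time in the file header (15:40:32) has no decimal point, so it is not matched.
    blocks = re.findall(r'time:\s*\d+\.\d*(.*?)end', file_content, re.DOTALL)

    file_number = 1
    for block in blocks:
        with open('out%s.txt' % str(file_number), 'w') as out_file:
            # Capture the first three numeric columns of every data row
            for row in re.findall(r'\s*(-?\d+\.?\d*)\s*(-?\d+\.?\d*)\s*(-?\d+\.?\d*).*', block):
                out_file.write(row[0] + ' ' + row[1] + ' ' + row[2] + '\n')
        file_number += 1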
The code assumes that the file containing the text is called in.txt.
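If the regular expressions are hard to adapt, the same splitting can also be done line by line. The following is only a sketch under a few assumptions: the function names `split_blocks` and `write_blocks` and the output name pattern are made up for illustration, and the input is assumed to look like the sample in the question (a `time: <number>` line opens a block, `end` closes it, and data rows start with an integer node number).

```python
def split_blocks(lines):
    """Return a list of (time, rows); each row holds the first three columns as strings."""
    blocks = []
    current = None  # rows of the block being read, or None when outside a block
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line
        key = tokens[0].lower()
        if key == 'time:' and len(tokens) == 2:
            try:
                time = float(tokens[1])
            except ValueError:
                continue  # e.g. a clock time such as 15:40:32
            current = []
            blocks.append((time, current))
        elif key == 'end':
            current = None
        elif current is not None and len(tokens) >= 3:
            try:
                # Validate the three columns without changing their text
                int(tokens[0])
                float(tokens[1])
                float(tokens[2])
            except ValueError:
                continue  # column names or the units row
            current.append(tokens[:3])
    return blocks


def write_blocks(blocks, pattern='out{}.txt'):
    """Write one file per block; the file name pattern is an assumption."""
    for number, (time, rows) in enumerate(blocks, start=1):
        with open(pattern.format(number), 'w') as out_file:
            for row in rows:
                out_file.write(' '.join(row) + '\n')
```

With the file from the question, something like `write_blocks(split_blocks(open('in.txt')))` would be expected to produce out1.txt, out2.txt and out3.txt. Because the header line starts with `date:` and the next line with `units:`, neither is mistaken for a block header, which is the situation that led to the `ValueError` in the question.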