string - Need to speed up this Python script - I think StringIO is very slow
This is working, but slowly.
I have written a custom .py script to convert .gpx files to .kml. It works as I need, but it is far too slow: for a small .gpx of 477k, writing a 207k .kml file takes 198 seconds to complete! That's absurd, and I haven't even got to a meaty .gpx size yet.
My hunch is that StringIO.StringIO(x) is what's slow. Any ideas how to speed it up?
Thanks in anticipation.
Here are the key snips only:
    f = open(filename, "r")
    x = f.read()
    x = re.sub(r'\n', '', x, re.S)  # remove newline returns
    name = re.search('<name>(.*)</name>', x, re.S)
    print "Attachment name (as recorded by GPS device): " + name.group(1)
    x = re.sub(r'<(.*)<trkseg>', '', x, re.S)  # strip header
    x = x.replace("</trkseg></trk></gpx>", "")  # strip footer
    x = x.replace("<trkpt", "\n<trkpt")  # make the file into lines
    x = re.sub(r'<speed>(.*?)</speed>', '', x, re.S)  # strip speed
    x = re.sub(r'<extensions>(.*?)</extensions>', '', x, re.S)  # strip out extensions
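A side note on the re.sub calls above, offered as a small Python 3 sketch rather than as a diagnosis of the slowness. The signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional re.S lands in count, not in flags. Since re.S has the integer value 16, such a call replaces at most 16 matches and leaves DOTALL switched off. The sample string below is made up for illustration.

```python
import re

# Made-up sample: 20 short lines, so 20 newline characters.
sample = "<trkpt/>\n" * 20

# What a fourth positional re.S amounts to: count=16, flags=0.
capped = re.sub(r'\n', '', sample, count=int(re.S))

# Passing the flag by keyword leaves count at its default of 0 (replace all).
stripped = re.sub(r'\n', '', sample, flags=re.S)

remaining_capped = capped.count("\n")      # only the first 16 were removed
remaining_stripped = stripped.count("\n")  # all were removed
print(int(re.S), remaining_capped, remaining_stripped)
```

The flags keyword has been available since Python 2.7, so the keyword form should also work in the Python 2 script.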
Then:
    # .kml header goes here
    kmltrack = """<?xml version="1.0" encoding="utf-8"?><kml xmlns="http://www.ope......etc etc
Then:
    buf = StringIO.StringIO(x)
    for line in buf:
        if line is not None:
            timm = re.search('time>(.*?)</time', line, re.S)
            if timm is not None:
                kmltrack += (" <when>" + timm.group(1) + "</when>\n")
                checksuma += 1
    buf = StringIO.StringIO(x)
    for line in buf:
        if line is not None:
            lat = re.search('lat="(.*?)" lo', line, re.S)
            lon = re.search('lon="(.*?)"><ele>', line, re.S)
            ele = re.search('<ele>(.*?)</ele>', line, re.S)
            if lat is not None:
                kmltrack += (" <gx:coord>" + lon.group(1) + " " + lat.group(1) + " " + ele.group(1) + "</gx:coord>\n")
                checksumb += 1
    if checksuma == checksumb:
        # put footer on
        kmltrack += """ </gx:Track></Placemark></Document></kml>"""
    else:
        print ("checksum error")
        return None
    with open("outfile.kml", "a") as myfile:
        myfile.write(kmltrack)
    return ("Successful .kml file-write completed in :" + str(c.seconds) + " seconds.")
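The two loops above read the same lines twice and grow kmltrack with +=. Here is a minimal sketch of one alternative, assuming each line holds one <trkpt> in the shape the regexes above expect: scan once with precompiled patterns, collect the pieces in lists, and join at the end. The name build_track_body and the sample data are hypothetical, and the sketch is written for Python 3.

```python
import re

# Precompiled once instead of being looked up on every line.
TIME_RE = re.compile(r'time>(.*?)</time')
POINT_RE = re.compile(r'lat="(.*?)" lon="(.*?)"><ele>(.*?)</ele>')

def build_track_body(x):
    """Return the <when> lines followed by the <gx:coord> lines."""
    whens, coords = [], []
    for line in x.splitlines():  # no StringIO needed to iterate lines
        t = TIME_RE.search(line)
        if t:
            whens.append("<when>" + t.group(1) + "</when>")
        p = POINT_RE.search(line)
        if p:
            lat, lon, ele = p.groups()
            coords.append("<gx:coord>%s %s %s</gx:coord>" % (lon, lat, ele))
    # Same sanity check as the two checksum counters.
    if len(whens) != len(coords):
        raise ValueError("checksum error")
    return "\n".join(whens + coords) + "\n"

# Hypothetical input in the one-trkpt-per-line form produced earlier.
sample_x = (
    '\n<trkpt lat="51.5" lon="-0.12"><ele>35.0</ele>'
    '<time>2012-01-01T10:00:00Z</time></trkpt>'
    '\n<trkpt lat="51.6" lon="-0.13"><ele>36.0</ele>'
    '<time>2012-01-01T10:00:05Z</time></trkpt>'
)
body = build_track_body(sample_x)
print(body)
```

Given the profile further down, this loop is not where the time goes, so treat it as tidying rather than as the fix.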
Once again, this is working, but very slowly. If anyone can see how to speed it up, please let me know! Cheers
UPDATED
Thanks for the suggestions, all. I'm new to Python and appreciated hearing about profiling. I found out about it and added it to my script. It looks like it comes down to one thing: 208 of the total 209 seconds of run time happen in one line. Here's a snip:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   ....
     4052    0.013    0.000    0.021    0.000 StringIO.py:139(readline)
     8104    0.004    0.000    0.004    0.000 StringIO.py:38(_complain_ifclosed)
        2    0.000    0.000    0.000    0.000 StringIO.py:54(__init__)
        2    0.000    0.000    0.000    0.000 StringIO.py:65(__iter__)
     4052    0.010    0.000    0.033    0.000 StringIO.py:68(next)
     8101    0.018    0.000    0.078    0.000 re.py:139(search)
        4    0.000    0.000  208.656   52.164 re.py:144(sub)
     8105    0.016    0.000    0.025    0.000 re.py:226(_compile)
       35    0.000    0.000    0.000    0.000 rpc.py:149(debug)
        5    0.000    0.000    0.010    0.002 rpc.py:208(remotecall)
   ......
There are 4 calls averaging 52 seconds per call. cProfile says it happens on line number 144, but my script only goes to 94 lines. How do I move on from this? Thanks very much.
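On the line number: each row of the report reads filename:lineno(function), so re.py:144(sub) points at the line in the standard library's re.py where sub is defined, not at a line in the script. One way to see which call site in your own code is responsible is print_callers from pstats. The following is a sketch for Python 3; strip_header, convert and the sample string are hypothetical stand-ins for the real script.

```python
import cProfile
import io
import pstats
import re

def strip_header(text):
    # Stand-in for one of the script's re.sub calls.
    return re.sub(r'<\?xml.*?<trkseg>', '', text, flags=re.S)

def convert(text):
    return strip_header(text)

sample = '<?xml version="1.0"?><gpx><trk><trkseg><trkpt/></trkseg></trk></gpx>'

profiler = cProfile.Profile()
profiler.enable()
convert(sample)
profiler.disable()

# Collect the report in a string; the restriction is a regex matched
# against the "filename:lineno(function)" label of each profiled function.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats('cumulative').print_callers(r'\(sub\)')
report = out.getvalue()
print(report)
```

The report lists every function that called sub along with the time spent under each, which narrows four anonymous calls down to named call sites. For this to help in the original script, each re.sub would need to sit in its own small function, or be timed individually.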
OK all. cProfile showed it was a re.sub call, though I wasn't sure which one. Through trial and error, it didn't take long to isolate it. The solution was to change the re.sub from being a 'greedy' call to a 'non-greedy' one.
So the old header strip call
    x = re.sub(r'<(.*)<trkseg>', '', x, re.S)  # strip header
becomes
    x = re.sub(r'<?xml(.*?)<trkseg>', '', x, re.S)  # strip header fast
It now finishes heavy .gpx conversions in 0 seconds. What a difference a ? makes!
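To make the difference between the two patterns concrete, here is a small Python 3 sketch on a made-up two-segment GPX string (not the original file). It only shows what each pattern removes and makes no timing claim. Two details stand out: the greedy form runs through to the last <trkseg>, and in the new pattern an unescaped <? means "an optional <", so the match starts at the literal xml and leaves the leading <? behind. The escaped variant <\?xml is my assumption about what was intended.

```python
import re

# Made-up input with two track segments.
gpx = ('<?xml version="1.0"?><gpx><trk>'
       '<trkseg><trkpt a/></trkseg>'
       '<trkseg><trkpt b/></trkseg></trk></gpx>')

# Greedy: .* grabs as much as possible, so the match ends at the LAST <trkseg>.
greedy = re.sub(r'<(.*)<trkseg>', '', gpx, flags=re.S)

# Non-greedy, as posted: "<?" is an optional "<", so the match begins at "xml".
lazy = re.sub(r'<?xml(.*?)<trkseg>', '', gpx, flags=re.S)

# Non-greedy with the "?" escaped: matches the literal "<?xml" declaration.
escaped = re.sub(r'<\?xml(.*?)<trkseg>', '', gpx, flags=re.S)

print(greedy)
print(lazy)
print(escaped)
```

One plausible contributor to the speed-up, beyond laziness, is that the new pattern starts with literal text, so the engine has far fewer starting positions to try than with a bare <. That is a guess and was not measured here.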