Need regex to extract fields from string


I need to extract the title, location, and price from a string like this:

10' starcraft pop camper (newport) $5500 

It should be obvious which is which.

However, there are cases like this:

10' (approx.) starcraft pop camper (drigg's town, pa) $5500 


When I use a simple regex, I can match the first string correctly, but not the second:

^(?<title>.+?) \((?<area>.+?)\) \$(?<price>[\d]+)$ 


I'm pretty sure lookaheads/backreferences can handle this, but I don't know how. Can someone help me out with an explanation? (And maybe a reference to an easy-to-read article on the subject.)

With only 2 examples, the best I can suggest is to change the lazy quantifier to a greedy quantifier in the title capturing group:

^(?<title>.+) \((?<area>.+?)\) \$(?<price>[\d]+)$
          ^^
          here

Effectively, the pattern in the area capturing group will capture the text inside the last pair of brackets () (provided that it is followed by text that can be matched by the price capturing group).

The greedy quantifier in title consumes as much text as possible, and so forces the area capturing group to take the furthest possible match.
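
To see the difference, here is a minimal Python sketch that runs both versions against the two sample strings. Note that Python spells named groups as (?P<name>...) instead of (?<name>...); the variable names are just for illustration, and the trailing whitespace from the samples is left out.

```python
import re

# Same patterns as above, with Python's (?P<name>...) named-group syntax.
LAZY_TITLE = re.compile(r"^(?P<title>.+?) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
GREEDY_TITLE = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")

samples = [
    "10' starcraft pop camper (newport) $5500",
    "10' (approx.) starcraft pop camper (drigg's town, pa) $5500",
]

for text in samples:
    for label, pattern in (("lazy", LAZY_TITLE), ("greedy", GREEDY_TITLE)):
        m = pattern.match(text)
        # groupdict() maps each named group to the text it captured.
        print(label, m.groupdict() if m else None)
```

On the second sample the lazy version still matches, but it stops the title at 10' and pushes everything from approx. up to pa into area. The greedy version splits it at the last pair of brackets.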


Another way is to make sure that the sub-pattern in the area capturing group does not contain ():

^(?<title>.+) \((?<area>[^()]+)\) \$(?<price>[\d]+)$
          ^^            ^^^^^^
          here          here

I removed the lazy quantifier, since it is redundant. There is only 1 way to match the bracket () characters: immediately before and after the text captured by the area capturing group.
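
A rough Python sketch of this second pattern, wrapped in a small helper. The name parse_listing is made up for this example, and the strip() call is an assumption in case the input carries trailing whitespace.

```python
import re

# Area is restricted to characters other than ( and ), so it cannot span brackets.
NO_BRACKETS_IN_AREA = re.compile(
    r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$"
)


def parse_listing(text):
    """Return a dict with title, area and price, or None if the text does not match."""
    m = NO_BRACKETS_IN_AREA.match(text.strip())
    return m.groupdict() if m else None


print(parse_listing("10' starcraft pop camper (newport) $5500"))
print(parse_listing("10' (approx.) starcraft pop camper (drigg's town, pa) $5500 "))
```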


The 2 solutions above assume that the area never contains bracket () characters. The pattern is going to be more complicated if you want to allow that.
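
To illustrate that assumption, here is a small Python sketch with a made-up input whose area itself contains brackets. The string is hypothetical, not taken from the question. The first pattern splits it in the wrong place, and the second one does not match at all.

```python
import re

GREEDY_TITLE = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
NO_BRACKETS_IN_AREA = re.compile(
    r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$"
)

# Hypothetical listing: the intended area would be "town (north)".
nested = "pop camper (town (north)) $5500"

greedy_match = GREEDY_TITLE.match(nested)
strict_match = NO_BRACKETS_IN_AREA.match(nested)

# The greedy-title pattern matches, but cuts at the inner opening bracket.
print(greedy_match.groupdict() if greedy_match else None)
# The negated-class pattern finds no valid split.
print(strict_match)
```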

