Need regex to extract fields from a string
I need to extract the title, location, and price from a string like this:
10' starcraft pop camper (newport) $5500
It should be obvious which is which.
However, there are cases like this:
10' (approx.) starcraft pop camper (drigg's town, pa) $5500
When I use a simple regex, I can match the first string correctly, but not the second:
^(?<title>.+?) \((?<area>.+?)\) \$(?<price>[\d]+)$
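To make the failure concrete, here is a minimal sketch in Python (my choice of language; the question does not name one). Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly. The helper name fields is made up for illustration.

```python
import re

# The question's pattern, adapted to Python's (?P<name>...) named-group syntax.
LAZY = re.compile(r"^(?P<title>.+?) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")

first = "10' starcraft pop camper (newport) $5500"
second = "10' (approx.) starcraft pop camper (drigg's town, pa) $5500"


def fields(pattern, text):
    """Return the named groups as a dict, or None if there is no match."""
    m = pattern.match(text)
    return m.groupdict() if m else None


lazy_first = fields(LAZY, first)
lazy_second = fields(LAZY, second)

print(lazy_first)
print(lazy_second)
```

The second string does match, but the lazy title stops at the first " (" it finds, so title comes out as just 10' and area swallows everything up to the last closing bracket.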
I'm pretty sure lookaheads/backreferences can handle this, but I don't know how. Can someone help me out with an explanation? (And maybe a reference to an easy-to-read article on the subject.)
With only 2 examples, the best I can suggest is to change the lazy quantifier to a greedy quantifier in the title capturing group (.+ instead of .+?):
^(?<title>.+) \((?<area>.+?)\) \$(?<price>[\d]+)$
Effectively, the pattern in the area capturing group will capture the text inside the last pair of brackets () (provided that it is followed by text that can be matched by the price capturing group).
The greedy quantifier in the title group consumes as much text as possible, and so forces the area capturing group to take the furthest possible match.
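A quick way to check the greedy version against both sample strings, sketched in Python. Note that Python's re module writes named groups as (?P<name>...), so the pattern is adapted from the (?<name>...) form; the variable names are only illustrative.

```python
import re

# Greedy title (.+), lazy area (.+?), in Python's named-group syntax.
GREEDY = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")

samples = [
    "10' starcraft pop camper (newport) $5500",
    "10' (approx.) starcraft pop camper (drigg's town, pa) $5500",
]

# Both samples match, so m is never None here.
greedy_results = [GREEDY.match(s).groupdict() for s in samples]

for row in greedy_results:
    print(row)
```

The title group first grabs the whole line and then backtracks only as far as the last " (" in the string, which is why the (approx.) part stays inside the title.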
Another way is to make sure the sub-pattern in the area capturing group cannot contain ():
^(?<title>.+) \((?<area>[^()]+)\) \$(?<price>[\d]+)$
The changes are in the title group (.+) and the area group ([^()]+). I removed the lazy quantifier, since it is redundant: there is only 1 way to match the bracket () characters before and after the text captured by the area capturing group.
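The claim that laziness is redundant once the area group excludes brackets can be checked with a small Python sketch. Python's re module uses (?P<name>...) for named groups, so the pattern is adapted; the lazy-title variant is my own addition for comparison and is not part of the answer.

```python
import re

# Area restricted to non-bracket characters, in Python's named-group syntax.
STRICT = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$")
# Same pattern with a lazy title, added only to compare the results.
STRICT_LAZY_TITLE = re.compile(
    r"^(?P<title>.+?) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$"
)

samples = [
    "10' starcraft pop camper (newport) $5500",
    "10' (approx.) starcraft pop camper (drigg's town, pa) $5500",
]

# Both patterns match both samples, so match() never returns None here.
strict_results = [STRICT.match(s).groupdict() for s in samples]
lazy_title_results = [STRICT_LAZY_TITLE.match(s).groupdict() for s in samples]

print(strict_results)
print(strict_results == lazy_title_results)
```

Because [^()]+ cannot cross a bracket, the area has to sit in the final pair of brackets whichever quantifier the title uses, so both variants give the same groups on these inputs.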
The 2 solutions above assume the area will never contain bracket () characters. The pattern is going to be more complicated if you want to allow that.
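To see what that assumption costs, here is a small Python sketch run on a made-up input whose area contains brackets. The input string is hypothetical (it does not come from the question), and the patterns use Python's (?P<name>...) named-group spelling.

```python
import re

GREEDY = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
STRICT = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$")

# Hypothetical listing whose area itself contains a bracketed part.
nested = "pop camper (newport (north)) $5500"

greedy_match = GREEDY.match(nested)
strict_match = STRICT.match(nested)

# The greedy-title pattern matches, but splits at the inner bracket.
print(greedy_match.groupdict() if greedy_match else None)
# The bracket-free pattern finds no match at all.
print(strict_match)
```

On this input the first pattern returns a wrong split (the area comes out as north) with a stray closing bracket), while the second rejects the string outright, which is arguably the safer failure.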