Need regex to extract fields from string
I need to extract the title, location, and price from a string like this:
10' starcraft pop camper (newport) $5500
It should be obvious which is which.
However, there are cases like this:
10' (approx.) starcraft pop camper (drigg's town, pa) $5500
When I use a simple regex, I can match the first string correctly, but not the second:
^(?<title>.+?) \((?<area>.+?)\) \$(?<price>[\d]+)$
I'm pretty sure lookaheads or backreferences can handle this, but I don't know how. Can anyone help me out with an explanation? (And maybe a reference to an easy-to-read article on the subject.)
With only 2 examples, the best I can suggest is to change the lazy quantifier to a greedy quantifier in the title capturing group:
^(?<title>.+) \((?<area>.+?)\) \$(?<price>[\d]+)$
          ^^
          here
Effectively, the pattern in the area capturing group will capture the text inside the last pair of brackets () (provided that it is followed by text that can be matched by the price capturing group).
The greedy quantifier in the title capturing group consumes as much text as possible, and so forces the area capturing group to take the furthest possible match.
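To see the difference in practice, here is a minimal Python sketch that runs the lazy and the greedy version against both sample strings. One assumption to flag: Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the patterns are adapted for that. The names LAZY, GREEDY, SAMPLES and extract are made up for illustration.

```python
import re

# Python's re module writes named groups as (?P<name>...), so the patterns
# from the post are adapted accordingly; nothing else is changed.
LAZY = re.compile(r"^(?P<title>.+?) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
GREEDY = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")

SAMPLES = [
    "10' starcraft pop camper (newport) $5500",
    "10' (approx.) starcraft pop camper (drigg's town, pa) $5500",
]


def extract(pattern, text):
    """Return the named groups as a dict, or None if there is no match."""
    match = pattern.match(text)
    return match.groupdict() if match else None


for sample in SAMPLES:
    print("lazy:  ", extract(LAZY, sample))
    print("greedy:", extract(GREEDY, sample))
```

With the lazy title, the second string still matches, but the title stops at 10' and the area swallows everything from approx. up to the last closing bracket. With the greedy title, both strings split where you would expect.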
Another way is to make sure that the sub-pattern in the area capturing group does not contain ():
^(?<title>.+) \((?<area>[^()]+)\) \$(?<price>[\d]+)$
          ^^            ^^^^^^
          here          here
I removed the lazy quantifier, since it is redundant: with brackets excluded from the character class, there is only 1 way to match the bracket () characters immediately before and after the text captured by the area capturing group.
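As a rough usage sketch of the negated character class version, again in Python and again with the named groups respelled as (?P<name>...) because that is what the re module requires. The helper name parse_listing and the conversion of the price to an integer are my own additions, not part of the pattern itself.

```python
import re

# The second pattern from the post, with Python's (?P<name>...) spelling.
PATTERN = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$")


def parse_listing(text):
    """Hypothetical helper: split a listing into title, area and price."""
    match = PATTERN.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["price"] = int(fields["price"])  # convenience conversion (an addition)
    return fields


print(parse_listing("10' starcraft pop camper (newport) $5500"))
print(parse_listing("10' (approx.) starcraft pop camper (drigg's town, pa) $5500"))
```

Because [^()]+ cannot cross a bracket, the area can only be the contents of the final bracket pair, and the earlier (approx.) has to end up in the title.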
The 2 solutions above assume that the area will never contain bracket () characters. The pattern is going to be more complicated if you want to allow that.
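To illustrate that assumption, here is a small Python sketch with a made-up listing whose area itself contains brackets. The input string is invented for the example, and the patterns use Python's (?P<name>...) spelling for named groups.

```python
import re

GREEDY = re.compile(r"^(?P<title>.+) \((?P<area>.+?)\) \$(?P<price>[\d]+)$")
NO_BRACKETS = re.compile(r"^(?P<title>.+) \((?P<area>[^()]+)\) \$(?P<price>[\d]+)$")

# Invented listing: the area "newport (north)" contains brackets of its own.
tricky = "pop camper (newport (north)) $5500"

greedy_match = GREEDY.match(tricky)
strict_match = NO_BRACKETS.match(tricky)

# The greedy version matches, but splits the string in the wrong place.
print(greedy_match.groupdict() if greedy_match else None)
# The negated character class version refuses to match at all.
print(strict_match)
```

The first pattern puts part of the area into the title, and the second one fails outright, which is arguably the safer of the two failure modes.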