Python Regex doesn't match . (dot) as a character
I have a regex that matches 3-character words in a string:
\b[^\s]{3}\b
When I use this string:
and the tiger attacked you.
I get this result:
regex = re.compile(r"\b[^\s]{3}\b")
regex.findall(string)
[u'and', u'the', u'you']
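One Python detail worth checking in the snippet above: the pattern should be a raw string (`r"..."`). In a normal string literal Python turns `"\b"` into a backspace character before the regex engine sees it, so the pattern no longer means "word boundary". The sketch below uses the sample sentence from the question; the variable names are illustrative only.

```python
import re

text = "and the tiger attacked you."

# Without the r prefix, "\b" becomes a backspace character (\x08), so the
# compiled pattern searches for literal backspaces and finds nothing.
plain = re.compile("\b[^\\s]{3}\b")
print(plain.pattern == "\x08[^\\s]{3}\x08")  # True
print(plain.findall(text))                  # []

# With a raw string, "\b" reaches the regex engine as a word boundary.
raw = re.compile(r"\b[^\s]{3}\b")
print(raw.findall(text))                    # ['and', 'the', 'you']
```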
As you can see, it matches the words of 3 characters, but I want the expression to treat "you." (including the ".") as a 4-character word.
I have the same problem with ",", ";", ":", etc.
I'm pretty new to regex, but I guess this happens because those characters are treated as word boundaries.
Is there a way of doing this?
Thanks in advance,
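The guess in the question is right: `\b` is a zero-width match at any edge between a word character (`\w`) and a non-word character, and ".", ",", ";" and ":" are all non-word characters. This small sketch lists every boundary position in the sample sentence and contrasts it with splitting on whitespace, which is the notion of "word" the question is after.

```python
import re

text = "and the tiger attacked you."

# Every position where \b matches (between \w and non-\w, or at a string edge next to \w).
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 4, 7, 8, 13, 14, 22, 23, 26]

# Position 26 sits between "u" and ".", so \b[^\s]{3}\b can stop after "you".
print(text[23:26])  # you

# Splitting on whitespace keeps the punctuation attached, so "you." is 4 characters.
print([w for w in text.split() if len(w) == 3])  # ['and', 'the']
```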
EDIT
Thanks to the answers of @brenbarn and @kendall frey, I managed to get the regex I was looking for:
(?<!\w)[^\s]{3}(?=$|\s)
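A quick way to check the final pattern is to run it on the original sentence and on a second, made-up sentence that ends in a 3-character token with punctuation. The negative lookbehind `(?<!\w)` also accepts a match at the very start of the string, and `(?=$|\s)` accepts one at the very end.

```python
import re

# Three non-space characters, not preceded by a word character,
# followed by whitespace or the end of the string.
three_chars = re.compile(r"(?<!\w)[^\s]{3}(?=$|\s)")

print(three_chars.findall("and the tiger attacked you."))  # ['and', 'the']
print(three_chars.findall("and the tiger ate it."))        # ['and', 'the', 'ate', 'it.']
```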
If you want to make sure the word is preceded and followed by a space (and not by a period, as is happening in your case), use lookaround.
(?<=\s)\w{3}(?=\s)
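A sketch of the lookaround version on the question's sentence. Because both lookarounds require an actual whitespace character, a word at the very start or end of the string is not matched unless the string is padded; that is one reason the asker's final pattern uses `(?<!\w)` and `$` instead.

```python
import re

# Three word characters with a whitespace character on each side.
spaced_word = re.compile(r"(?<=\s)\w{3}(?=\s)")

print(spaced_word.findall("and the tiger attacked you."))    # ['the']
print(spaced_word.findall(" and the tiger attacked you. "))  # ['and', 'the']
```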
If you need to match punctuation as part of words (such as 'in.'), then \w
won't be adequate, and you can use \S
(anything but whitespace) instead:
(?<=\s)\S{3}(?=\s)
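The case of the letter matters here: `\S` (capital) means any non-whitespace character, while `\s` (lowercase) means whitespace only. The example sentence below is made up for illustration.

```python
import re

# \S{3}: three non-whitespace characters, so trailing punctuation counts.
spaced_token = re.compile(r"(?<=\s)\S{3}(?=\s)")
print(spaced_token.findall("we sat in. then left"))  # ['sat', 'in.']

# With lowercase \s the pattern looks for runs of whitespace, not words.
print(re.findall(r"(?<=\s)\s{3}(?=\s)", "we sat in. then left"))  # []
```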