php - Hebrew special characters in regexp
This code:
preg_replace('/[^{hebrew}a-zA-Z0-9_ %\[\]\.\(\)%&-]/s', '', $q);
is supposed to accept a-z, A-Z, 0-9, the listed punctuation, single white spaces and Hebrew characters.
I tried many variations, but couldn't get it to work.
Thanks in advance!
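To see why the attempt above fails, here is a small sketch that runs the pattern from the question unchanged on a made-up input (the sample string is an assumption, not from the question). It assumes the PHP file itself is saved as UTF-8.

```php
<?php
// The pattern from the question, unchanged (note: no /u modifier).
$original = '/[^{hebrew}a-zA-Z0-9_ %\[\]\.\(\)%&-]/s';

// Inside a character class, "{hebrew}" is just the literal characters
// {, h, e, b, r, w and }; it does not refer to the Hebrew script.
$input = "שלום abc";
$stripped = preg_replace($original, '', $input);

// Without /u, PCRE works on the raw UTF-8 bytes. Each Hebrew letter is two
// bytes, neither of which is in the class, so all Hebrew letters are removed.
var_dump($stripped); // string(4) " abc"
```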
In PCRE, \p{xx} and \P{xx} can take either a Unicode category name or a Unicode script name. The list can be found in the PHP documentation or in the PCRE man page.
For the Hebrew script, you need to use \p{Hebrew}.
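A minimal sketch of the difference between a script name, a category name and the negated \P{...} form, using a made-up sample string. It assumes PHP 5.4+ (where the $matches argument of preg_match_all is optional) and a UTF-8 source file.

```php
<?php
$text = "Shalom שלום";

// \p{Hebrew}: a script name; matches the four Hebrew letters.
$hebrew = preg_match_all('/\p{Hebrew}/u', $text);

// \p{Lu}: a general category ("uppercase letter"); matches only "S".
$upper = preg_match_all('/\p{Lu}/u', $text);

// \P{Hebrew}: the negation; matches every character that is NOT in the
// Hebrew script, i.e. the six letters of "Shalom" plus the space.
$notHebrew = preg_match_all('/\P{Hebrew}/u', $text);

printf("%d %d %d\n", $hebrew, $upper, $notHebrew); // 4 1 7
```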
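I removed the escape \ from ., ( and ), since they lose their special meaning inside a character class []. I also dropped the duplicate %. The s flag (dotall) is useless, since there is no dot metacharacter in your regex. On the other hand, you need the u flag: it makes PCRE treat the pattern and the subject as UTF-8, and without it each Hebrew letter is seen as two separate bytes that \p{Hebrew} cannot match.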
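The corrected pattern can be wrapped in a small helper; the function name sanitizeQuery, the exception type and the sample input are hypothetical, chosen only for illustration. The sketch assumes PHP 7+ and UTF-8 input, and treats a null return from preg_replace (which happens, for example, on invalid UTF-8 when /u is set) as an error.

```php
<?php
// The corrected pattern: \p{Hebrew}, unescaped . ( ) inside the class, and /u.
const QUERY_FILTER = '/[^\p{Hebrew}a-zA-Z0-9_ %\[\].()&-]/u';

// Hypothetical helper; not part of the original answer.
function sanitizeQuery(string $q): string
{
    $result = preg_replace(QUERY_FILTER, '', $q);
    // preg_replace returns null on failure, e.g. when $q is not valid UTF-8.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// "." "(" ")" "%" are allowed, so they survive; "," "!" "<" ">" are removed.
echo sanitizeQuery("Hello, עולם! (test) 100% <b>"), "\n"; // Hello עולם (test) 100% b
```

Note that \p{Hebrew} covers the whole Hebrew script, so niqqud and Hebrew punctuation are kept too, not only the letters.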
preg_replace('/[^\p{Hebrew}a-zA-Z0-9_ %\[\].()&-]/u', '', $q);

Appendix
From the Unicode FAQ, which explains the difference between blocks and scripts. For your information, PCRE only supports matching by Unicode script and by Unicode category (character property).
Q: If Unicode blocks aren't code pages, what are they?
A: Blocks in the Unicode Standard are named ranges of code points. They are used to organize the standard into groupings of related kinds of characters, for convenience in reference. They are also used by a charting program to define the ranges of characters printed together in the code charts seen in the book or posted online.
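Since a block is just a named range of code points, and PCRE only offers scripts and categories, one way to approximate block membership is an explicit code-point range. This is a sketch under that assumption; the helper name inHebrewBlock is hypothetical. The Hebrew block is U+0590..U+05FF; the "\u{...}" escape needs PHP 7+.

```php
<?php
// Hypothetical helper: tests whether a single character lies in the
// Hebrew *block* (U+0590..U+05FF), as opposed to the Hebrew *script*.
function inHebrewBlock(string $ch): bool
{
    return preg_match('/^[\x{0590}-\x{05FF}]$/u', $ch) === 1;
}

var_dump(inHebrewBlock("ש"));        // bool(true)

// U+FB2A HEBREW LETTER SHIN WITH SHIN DOT is in the Alphabetic
// Presentation Forms block, outside the Hebrew block.
var_dump(inHebrewBlock("\u{FB2A}")); // bool(false)
```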
Q: Do Unicode blocks have defined character properties?
A: No. Character properties are associated with the encoded characters themselves, rather than with the blocks they are encoded in.
Q: Does this apply to the script of characters?
A: Yes. For example, the Thai block contains Thai characters that have the Thai script property, but it also contains the baht currency sign, which is used in Thai text but, of course, is defined to have the Common script property. To find the script property value for any character, you need to rely on the Unicode Character Database data file, Scripts.txt, rather than on the block value alone.
Q: So the block value is not the same as the script value?
A: Correct. In some cases, such as Latin, the encoded characters are spread across many dozens of different Unicode blocks. That is unfortunate, but it is simply a result of the history of the standard. In other instances, a single block may contain characters of more than one script. For example, the Greek and Coptic block contains mostly characters of the Greek script, but also a few historic characters of the Coptic script.
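A short sketch of the block/script mismatch in PCRE terms: \p{Latin} follows the script property, so it matches Latin letters taken from several different blocks, and \p{Hebrew} matches a Hebrew character outside the Hebrew block. The sample characters are chosen for illustration; the "\u{...}" escape assumes PHP 7+.

```php
<?php
// Latin-script letters from four different blocks:
// U+0061 (Basic Latin), U+00E9 (Latin-1 Supplement),
// U+0101 (Latin Extended-A), U+1E01 (Latin Extended Additional).
$latin = ["a", "\u{00E9}", "\u{0101}", "\u{1E01}"];

$allLatin = true;
foreach ($latin as $ch) {
    // Each character must match the Latin *script* property on its own.
    $allLatin = $allLatin && preg_match('/^\p{Latin}$/u', $ch) === 1;
}
var_dump($allLatin); // bool(true)

// \p{Hebrew} also follows the script, so it matches U+FB2A, which sits in
// the Alphabetic Presentation Forms block rather than the Hebrew block.
var_dump(preg_match('/^\p{Hebrew}$/u', "\u{FB2A}")); // int(1)
```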