Regex to detect count of email addresses in email header?


I have a regex to detect an email address. I'm trying to create a regex that looks in the header of an email message, counts the email addresses, and ignores email addresses from a specific domain (abc.com).

For example, if there are ten email addresses like 1@test.com, they should be counted, while an 11th address such as 2@abc.com should be ignored.

Current regex:

^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$

Consider the following PowerShell example of a universal regex.

To find email addresses:

  • <(.*?)> is handy if the server surrounds email addresses with brackets.
  • (?<!content-type(.|\n){0,10000000})([a-zA-Z0-9.!#$%&''*+-/=?\^_``{|}~-]+@(?!abc.com)[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*) is for when you don't have brackets around the email addresses in the header. Note that this particular regex was copied from a community wiki answer on Stack Overflow (201323) and modified here to prevent @abc.com. There are edge cases this regex will not work for. On the same page there is a complex regex that looks like it will match every email address; I don't have time to modify that one to skip @abc.com.
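Since the requester is said to be using Java, the bracket-free pattern can also be tried there. The following is only a minimal sketch under stated assumptions: the class name HeaderAddressCounter, the helper findAddresses, and the three-line sample header are made up for illustration, the character class is slightly simplified, the dot in abc\.com is escaped, and the content-type lookbehind is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class HeaderAddressCounter {
    // Simplified address pattern; (?!abc\.com) rejects a match when
    // "abc.com" comes directly after the '@'.
    private static final Pattern ADDRESS = Pattern.compile(
        "[a-z0-9.!#$%&'*+/=?^_`{|}~-]+@(?!abc\\.com)[a-z0-9-]+(?:\\.[a-z0-9-]+)*",
        Pattern.CASE_INSENSITIVE);

    // Collects every address in the text that passes the lookahead.
    static List<String> findAddresses(String header) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(header);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample, shortened from the header used in this post.
        String header = "from: taylor evans <remember@to.vote>\n"
                      + "to: jon smith <example_to@mail.abc.com>\n"
                      + "message-id: <4129f3ca.2020509@abc.com>\n";
        List<String> found = findAddresses(header);
        // prints: 2 matching addresses: [remember@to.vote, example_to@mail.abc.com]
        System.out.println(found.size() + " matching addresses: " + found);
    }
}
```

The count is simply the size of the returned list. Note that the message-id at abc.com is skipped while the mail.abc.com address is still counted.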

Example

    $matches = @()
    $string = 'return-path: <example_from@abc123.com>
    x-spamcatcher-score: 1 [x]
    received: [136.167.40.119] (helo abc.com)
        fe3.abc.com (communigate pro smtp 4.1.8)
        esmtp-tls id 61258719 example_to@mail.abc.com;
    message-id: <4129f3ca.2020509@abc.com>
    date: wed, 21 jan 2009 12:52:00 -0500 (est)
    from: taylor evans <remember@to.vote>
    user-agent: mozilla/5.0 (windows; u; windows nt 5.1; en-us; rv:1.0.1)
    x-accept-language: en-us, en
    mime-version: 1.0
    to: jon smith <example_to@mail.abc.com>
    subject: business development meeting
    content-type: text/plain; charset=us-ascii; format=flowed
    content-transfer-encoding: 7bit
    content-type: multipart/alternative; boundary="------------060102080402030702040100"
    multi-part message in mime format.
    --------------060102080402030702040100
    content-type: text/plain; charset=iso-8859-15; format=flowed
    content-transfer-encoding: 7bit
    hello, html mail, has *bold*, /italic /and _underlined_ text. , have table here: cell(1,1) cell(2,1) cell(1,2) cell(2,2) , put picture here: image alt text that''s it.
    --------------060102080402030702040100
    content-type: multipart/related; boundary="------------030904080004010009060206"
    --------------030904080004010009060206
    content-type: text/html; charset=iso-8859-15
    content-transfer-encoding: 7bit
    <!doctype html public "-//w3c//dtd html 4.01 transitional//en">
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-15">
    </head>
    <body bgcolor="#ffffff" text="#000000">
    hello,<br> <br> html mail, has <b>bold</b>, <i>italic </i>and <u>underlined</u> text.<br> , have table here:<br>
    <table border="1" cellpadding="2" cellspacing="2" height="62" width="401">
    <tbody>
    <tr> <td valign="top">cell(1,1)<br> </td> <td valign="top">cell(2,1)</td> </tr>
    <tr> <td valign="top">cell(1,2)</td> <td valign="top">cell(2,2)</td> </tr>
    </tbody>
    </table>
    <br> , put picture here:<br> <br>
    <img alt="image alt text" src="cid:part1.ffffffff.5555555@example.com" height="79" width="98"><br> <br>
    that''s it. email me @ test@email.com<br>
    subject: <br>
    </body>
    </html>'

    # write-host start
    # write-host $string
    write-host
    write-host found
    [array]$found = ([regex]'(?<!content-type(.|\n){0,10000000})([a-zA-Z0-9.!#$%&''*+-/=?\^_`{|}~-]+@(?!abc.com)[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)').matches($string)

    # print the position and text of each match; the whole match is used here
    # because group 1 is the (.|\n) group inside the lookbehind, not the address
    $found | foreach {
        write-host "key @ $($_.index) = '$($_.value)'"
    } # next match
    write-host "found $($found.count) matching addresses"

Yields

    found
    key @ 14 = 'example_from@abc123.com'
    key @ 200 = 'example_to@mail.abc.com'
    key @ 331 = 'remember@to.vote'
    key @ 485 = 'example_to@mail.abc.com'
    found 4 matching addresses
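Only header addresses show up in that output because the lookbehind stops matching once content-type has been seen. A different way to get a similar restriction in Java, without the very large lookbehind, is to cut the message at the first blank line and search only the part before it. This is a sketch of that alternative technique, not the method used in this answer: it assumes the raw message still has the usual blank line between header and body, and the names HeaderOnlyCount, headerOf and countHeaderAddresses are hypothetical. Unlike the lookbehind, it also keeps header lines that come after the first content-type.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating a header/body split before matching.
final class HeaderOnlyCount {
    // The first empty line (CRLF or LF line endings) ends the header block.
    private static final Pattern BLANK_LINE = Pattern.compile("\\r?\\n\\r?\\n");

    // Simplified address pattern that skips domains starting with abc.com.
    private static final Pattern ADDRESS = Pattern.compile(
        "[a-z0-9._%+-]+@(?!abc\\.com)[a-z0-9-]+(?:\\.[a-z0-9-]+)*",
        Pattern.CASE_INSENSITIVE);

    // Returns the text before the first blank line, or the whole
    // message if no blank line is present.
    static String headerOf(String message) {
        Matcher m = BLANK_LINE.matcher(message);
        return m.find() ? message.substring(0, m.start()) : message;
    }

    // Counts addresses found in the header part only.
    static int countHeaderAddresses(String message) {
        Matcher m = ADDRESS.matcher(headerOf(message));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample message with a CRLF blank line before the body.
        String message = "from: taylor evans <remember@to.vote>\r\n"
                       + "to: jon smith <example_to@mail.abc.com>\r\n"
                       + "\r\n"
                       + "email me at test@email.com\r\n";
        System.out.println(countHeaderAddresses(message)); // prints 2
    }
}
```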

Summary

  • (?<!content-type(.|\n){0,10000000}) prevents content-type from appearing within 10,000,000 characters before the email address. This has the effect of preventing email address matches in the body of the message. Because the requester is using Java, and Java doesn't support the use of * inside lookbehinds, I'm using {0,10000000} instead. (See "Regex look-behind without obvious maximum length in Java".) Be aware that this may introduce some edge cases which may not be captured as expected.
  • <(.*?@(?!abc.com).*?)>
    • ( starts the capture group that is returned
    • [a-zA-Z0-9.!#$%&''*+-/=?\^_``{|}~-]+ matches 1 or more of the allowed characters. The doubled single quote escapes the single quote character for PowerShell, and the doubled backtick escapes the backtick for Stack Overflow.
    • @ includes the first @ sign
    • (?!abc.com) rejects the find if abc.com comes directly after the @ sign; a subdomain such as mail.abc.com still passes, as the output above shows
    • [a-zA-Z0-9-]+ continues matching the remaining characters up to the first dot or the end of the string
    • (?:\.[a-zA-Z0-9-]+)*) continues matching any further chunks of characters that each follow a dot; the final ) closes the capture group
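The scope of the (?!abc.com) lookahead is easy to misread, so here is a small Java sketch that checks a few addresses against two patterns. PREFIX_ONLY mirrors the lookahead described above, with the dot escaped and a simplified local part. WITH_SUBDOMAINS is a hypothetical stricter variant, not from the original answer, that also rejects subdomains of abc.com while letting unrelated domains such as abc.company.org through. The class and method names are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing two negative lookaheads.
final class DomainLookaheadDemo {
    // Mirrors the answer's lookahead: only the text directly after '@' is checked.
    static final Pattern PREFIX_ONLY = Pattern.compile(
        "[a-z0-9._%+-]+@(?!abc\\.com)[a-z0-9-]+(?:\\.[a-z0-9-]+)*",
        Pattern.CASE_INSENSITIVE);

    // Assumed stricter variant: skips any leading labels, then rejects the
    // address when the domain ends exactly in "abc.com".
    static final Pattern WITH_SUBDOMAINS = Pattern.compile(
        "[a-z0-9._%+-]+@(?!(?:[a-z0-9-]+\\.)*abc\\.com(?![a-z0-9.-]))"
        + "[a-z0-9-]+(?:\\.[a-z0-9-]+)*",
        Pattern.CASE_INSENSITIVE);

    // True when the whole string is accepted as an address by the pattern.
    static boolean accepts(Pattern p, String address) {
        return p.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "user@abc.com", "user@mail.abc.com", "user@abc.company.org", "user@test.com"
        };
        for (String s : samples) {
            System.out.println(s + " prefix-only=" + accepts(PREFIX_ONLY, s)
                + " with-subdomains=" + accepts(WITH_SUBDOMAINS, s));
        }
    }
}
```

With the prefix-only form, user@mail.abc.com is accepted and user@abc.company.org is rejected; the stricter variant reverses both results. Which behaviour is wanted depends on whether subdomains of abc.com should count.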
