Regex: Retrieve values from one of the two HTML tags


I'm using OutWit Hub to scrape company names from a website.

On some pages, the HTML tag looks like this:

<p style="font-weight: bold;">company name</p> 

While on other pages, it looks like this:

<span style="font-weight: bold;">company name</span> 

All pages use one of the two options above, never both.

If you're not familiar with OutWit Hub, it works by asking for a marker before, and a marker after, the piece of information you want.

I'm trying to create a regex that retrieves the company name regardless of which of the two markers is used, whether before or after.

So far I have tried this for the 'before' tag, but it doesn't work:

/[<p style="font-weight: bold;">]|[<p>name of company: <span style="font-weight: bold;">]/ 

Can anyone help?

Lose the square brackets ([...]). These are used to specify a character class or character set, not a sequence of characters.

/<p style="font-weight: bold;">|<p>name of company: <span style="font-weight: bold;">/ 
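To see the difference concretely, here is a minimal Python sketch. It assumes Python's `re` module behaves like OutWit Hub's regex engine for these patterns, which I haven't verified. The company name `Acme Ltd`, the sample pages, and the `extract_name` helper are made up for illustration.

```python
import re

# Original attempt: each [...] is a character class, so the whole
# pattern matches exactly ONE character drawn from either set.
bracketed = re.compile(
    r'[<p style="font-weight: bold;">]|'
    r'[<p>name of company: <span style="font-weight: bold;">]'
)

# Corrected: plain alternation between the two literal marker strings.
alternation = re.compile(
    r'<p style="font-weight: bold;">|'
    r'<p>name of company: <span style="font-weight: bold;">'
)

# Hypothetical sample pages, one per markup variant.
page_a = '<p style="font-weight: bold;">Acme Ltd</p>'
page_b = '<p>name of company: <span style="font-weight: bold;">Acme Ltd</span></p>'

bracketed_hit = bracketed.search(page_a).group(0)    # just "<"
marker_a = alternation.search(page_a).group(0)       # the full <p ...> marker
marker_b = alternation.search(page_b).group(0)       # the full <p>...<span ...> marker

# Hypothetical helper: capture the text between a bold <p> or <span>
# and its matching closing tag, using a backreference (\1) so that
# <p ...> pairs with </p> and <span ...> pairs with </span>.
name_pattern = re.compile(r'<(p|span) style="font-weight: bold;">(.*?)</\1>')

def extract_name(html):
    match = name_pattern.search(html)
    return match.group(2) if match else None
```

With the brackets, the match is a single character; without them, the match is the whole marker. If your tool accepts a single pattern for the whole element instead of separate before/after markers, something like `name_pattern` handles both variants at once.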

For understanding and debugging regular expressions, check out regexpr.

However, as others have commented, regular expressions aren't a reliable approach to parsing HTML. For example, how do you know there will never be other paragraphs or spans on the page with a style of font-weight: bold?

If you know C#, the HTML Agility Pack is a useful library for parsing HTML. It may be overkill for your needs, though.
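If you'd rather not pull in C#, the same parser-based idea can be sketched with Python's standard-library `html.parser`. This is a stand-in for the HTML Agility Pack, not its API. The class name `BoldTextCollector`, the function `bold_texts`, and the sample markup are all hypothetical. It collects the text of every `<p>` or `<span>` whose style contains `font-weight: bold`.

```python
from html.parser import HTMLParser

class BoldTextCollector(HTMLParser):
    """Collect text inside <p> or <span> elements styled font-weight: bold."""

    def __init__(self):
        super().__init__()
        self._open_tag = None   # tag currently being captured, if any
        self._buffer = []       # text pieces seen inside that tag
        self.names = []         # finished results, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        style = dict(attrs).get("style") or ""
        normalized = style.replace(" ", "").lower()
        if (self._open_tag is None and tag in ("p", "span")
                and "font-weight:bold" in normalized):
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.names.append("".join(self._buffer).strip())
            self._open_tag = None

def bold_texts(html):
    parser = BoldTextCollector()
    parser.feed(html)
    parser.close()
    return parser.names
```

Calling `bold_texts` on a page returns a list rather than a single value, which makes the caveat above visible: if the list has more than one entry, the page has several bold paragraphs or spans and you need some other clue to pick the company name.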

