regex - ColdFusion - Simple HTML Parsing


We have articles posted on our site. They can contain the following types of HTML:

<p>this article<br> <img src="someimage"> </p>
<p>this article<br> <img src="someimage"> </p>
<p>this article<br> <img src="someimage"> </p>
<p>this article<br> <img src="someimage"> </p>

or

<p><img src="someimage"> article<br> </p>
<p>this article<br> <img src="someimage"> </p>
<p><img src="someimage"> article<br> </p>

Some other HTML tags may sometimes be inside as well. I can't get my head around how to scrape the page using ColdFusion to achieve this.

Essentially, I need to grab hold of the first paragraph's text and the image, and be able to arrange them.

Is this possible using ColdFusion 8? Is anyone able to point me in the right direction on how to learn this?

100% possible!

Now, don't be put off by what I'm going to suggest; it's really easy to get going with this.

Download a library called jsoup. Its sole purpose is scraping the contents of the DOM in a web page:

http://jsoup.org/

You can then use the Java class by doing something like:

<!--- Get the page. --->
<cfhttp method="get" url="http://example.com/" resolveurl="true" useragent="#cgi.http_user_agent#" result="myPage" timeout="10" charset="utf-8">
    <cfhttpparam type="header" name="Accept-Encoding" value="*" />
    <cfhttpparam type="header" name="TE" value="deflate;q=0" />
</cfhttp>

<!--- Load jsoup and parse the document with it. The Java class name is case-sensitive, and the jsoup JAR must be on ColdFusion's classpath. --->
<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(myPage.filecontent) />

<!--- Search the parsed document for the contents of the title tag. --->
<cfset title = document.select("title").first() />

<!--- Let's see what we got. --->
<cfdump var="#title#" />

This example is pretty simple, but it shows how easy jsoup is to work with. Scraping images and whatever else is easy if you check out the jsoup docs.

There are examples on this page, and you can use CSS-style selectors:
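As a rough sketch of how the same approach could cover the case in the question (first paragraph text plus its image), something like the following might work. The variable names (html, paragraphs, firstParagraph, articleText, imageSrc) are made up for illustration, and the markup is the sample from the question rather than a fetched page; in practice the string would come from the cfhttp result.

```cfml
<!--- Sample markup from the question; a real page would be passed in from cfhttp instead. --->
<cfset html = '<p>this article<br> <img src="someimage"> </p> <p>this article<br> <img src="someimage"> </p>' />

<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(html) />

<!--- All paragraphs, in document order. --->
<cfset paragraphs = document.select("p") />

<cfif paragraphs.size() gt 0>
    <cfset firstParagraph = paragraphs.first() />

    <!--- text() returns the paragraph's text with the tags stripped out. --->
    <cfset articleText = firstParagraph.text() />

    <!--- Look for images inside that paragraph only. --->
    <cfset images = firstParagraph.select("img") />
    <cfset imageSrc = "" />
    <cfif images.size() gt 0>
        <cfset imageSrc = images.first().attr("src") />
    </cfif>

    <!--- Arrange the pieces however the layout needs: here, image first, then text. --->
    <cfoutput>
        <cfif len(imageSrc)>
            <img src="#htmlEditFormat(imageSrc)#" alt="" />
        </cfif>
        <p>#htmlEditFormat(articleText)#</p>
    </cfoutput>
</cfif>
```

Checking size() before calling first() sidesteps the null that jsoup returns when nothing matches, which ColdFusion 8 would otherwise turn into an undefined variable.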

http://jsoup.org/cookbook/extracting-data/selector-syntax
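To give a feel for the selector syntax, here is a small sketch run against the second sample from the question. The variable names are illustrative only, and the two selectors are just examples of what the syntax allows, not the only way to do it.

```cfml
<!--- Sample markup from the question. --->
<cfset html = '<p><img src="someimage"> article<br> </p> <p>this article<br> <img src="someimage"> </p>' />

<cfset jsoup = createObject("java", "org.jsoup.Jsoup") />
<cfset document = jsoup.parse(html) />

<!--- "p:has(img)" matches only paragraphs that contain an image. --->
<cfset withImages = document.select("p:has(img)") />

<!--- "p > img[src]" matches images that are direct children of a paragraph and have a src attribute. --->
<cfset images = document.select("p > img[src]") />

<cfoutput>
    <p>Paragraphs with an image: #withImages.size()#</p>

    <!--- The result is a Java list, so it is zero-indexed; javaCast passes get() a real int. --->
    <cfloop from="1" to="#images.size()#" index="i">
        <cfset image = images.get(javaCast("int", i - 1)) />
        <p>Image #i#: #htmlEditFormat(image.attr("src"))#</p>
    </cfloop>
</cfoutput>
```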

Try to avoid using regex for this task. Believe me, I've tried, and it's an absolute can of worms!

Hope that helps. Mikey.

