c# - Which solution is faster for extracting content in a web crawler?


I have made a web crawler using ASP.NET, and it works well. The problem comes when I want to extract content from the pages it fetches: most of the content is wrapped in HTML tags. I have a few candidate solutions for extracting it, but I don't know which one is better. It should perform well and be easy to implement.

  1. Using regex with many patterns to extract the content.

  2. Using LINQ to XML to extract the content.

  3. Using XPath to extract the content.

Could somebody please help me choose the better solution? I think I'll go with XPath, but I'm not sure whether its performance is better than regex or LINQ to XML.

Thanks for any ideas.

None of these solutions is particularly good.

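  1. HTML is not a regular language, and as such it is not a good fit for regular expressions. See the standard response to parsing HTML with regex.
  2. HTML is not, in general, valid XML, so XML-based APIs such as LINQ to XML will refuse to load most real-world pages.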
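To make both points concrete, here is a minimal sketch using only the .NET base class library. The class name `HtmlPitfalls`, its methods, and the sample markup are hypothetical, chosen just for illustration: a "quick" regex misses a `<title>` that carries an attribute, and `XDocument.Parse` (LINQ to XML) rejects ordinary HTML containing a `<br>` with no closing tag and the HTML-only entity `&nbsp;`.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class used only to illustrate the two pitfalls above.
public static class HtmlPitfalls
{
    // Ordinary real-world HTML: an attribute on <title>, a line break inside it,
    // a void <br> element with no closing tag, and the HTML-only entity &nbsp;.
    public const string SampleHtml =
        "<html><head><title lang=\"en\">Price\nList</title></head>" +
        "<body><p>Line one<br>Line two&nbsp;end</p></body></html>";

    // A typical "quick" pattern: it silently assumes <title> has no attributes
    // and, without RegexOptions.Singleline, that the title has no line breaks.
    public static bool NaiveRegexFindsTitle(string html) =>
        Regex.IsMatch(html, "<title>(.*?)</title>");

    // LINQ to XML only accepts well-formed XML and throws XmlException otherwise.
    public static bool IsWellFormedXml(string html)
    {
        try
        {
            XDocument.Parse(html);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Both lines print False for the sample markup.
        Console.WriteLine($"Naive regex found title: {HtmlPitfalls.NaiveRegexFindsTitle(HtmlPitfalls.SampleHtml)}");
        Console.WriteLine($"Parses as XML: {HtmlPitfalls.IsWellFormedXml(HtmlPitfalls.SampleHtml)}");
    }
}
```

You can keep patching the regex or pre-cleaning the markup, but each real site tends to break it in a new way, which is the core of the argument above.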

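Instead, you should use an HTML parsing library such as the HTML Agility Pack, which tolerates malformed markup and still lets you run XPath queries against the parsed document.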
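As a rough sketch of what that looks like, the snippet below assumes the HtmlAgilityPack NuGet package is referenced (for example via `dotnet add package HtmlAgilityPack`). `ContentExtractor`, `ExtractText`, the sample markup and the XPath expression are hypothetical; adapt the XPath to the pages your crawler actually fetches.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: returns the text of every node matched by an XPath query.
public static class ContentExtractor
{
    public static List<string> ExtractText(string html, string xpath)
    {
        // HtmlDocument tolerates markup that an XML parser would reject,
        // such as an unclosed <br> or the &nbsp; entity.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return results;

        foreach (var node in nodes)
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());

        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        const string html =
            "<html><body>" +
            "<div class=\"article\"><h2>First</h2><p>Body one<br>continued</p></div>" +
            "<div class=\"article\"><h2>Second</h2><p>Body&nbsp;two</p></div>" +
            "</body></html>";

        // Prints "First" and then "Second".
        foreach (var title in ContentExtractor.ExtractText(html, "//div[@class='article']/h2"))
            Console.WriteLine(title);
    }
}
```

Because the query is plain XPath, the approach you were already leaning towards (option 3) still applies; the difference is that the Agility Pack builds the tree from real-world HTML instead of requiring well-formed XML.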