java - How can I get content of all scripts in HTML


I am making a Java program that involves extracting tags from a webpage. For parsing I am using jsoup, which is working fine. But there is a problem with the number of tags in the downloaded page. I have 4 files:

  1. goog1.htm (saved from https://www.google.co.in through the browser)
  2. goog2.html (downloaded using the command ' wget https://www.google.co.in ')
  3. goog3.html (downloaded through a Java program using BufferedReader & InputStreamReader)
  4. goog4.html (made by copying the whole code from ' view-source:https://www.google.co.in/ ')

When I searched for the string "< script/>" in these 4 files, it gave different results:

  • goog1.htm - 16 times
  • goog2.html - 5 times
  • goog3.html - 5 times
  • goog4.html - 10 times

What is the reason for this difference? How can I get all the script tags of the page?

Which file should I use for testing my program?

Thanks in advance...

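1) The reason for the different number of script tags is that there can be multiple script tags defined in an HTML page, and the count depends on how the page was obtained. A page saved from the browser includes script tags that were added while the page ran, whereas wget or a plain Java download only gets the HTML the server sent.

2) All the script tags in the page are loaded and run. If you want to test the script code, you need to test all of them. It depends on your testing scope.

3) If you handle this by taking the content as text in your Java program, you can get the script tags' content by parsing it with substring methods. I recommend using the Apache Commons StringUtils class for this.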

import org.apache.commons.lang.StringUtils;

public class ScriptContentRetriever {

    public static void main(String[] args) {
        String yourScriptContent = "<script>this script 1 content</script><script>this script 2 content</script>";
        // Returns every substring found between the two delimiters.
        String[] scriptStrings = StringUtils.substringsBetween(yourScriptContent, "<script>", "</script>");
        for (String scriptString : scriptStrings) {
            // Do whatever you want with the script content right here.
            System.out.println(scriptString);
        }
    }
}
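
Note that the literal delimiter "<script>" only matches tags without attributes, so tags such as <script type="text/javascript"> or <SCRIPT> would be skipped. A minimal sketch of a more tolerant version using only the standard library is below. The class name ScriptExtractor and the method extractScripts are made up for illustration, and the regex is an approximation rather than a real HTML parser (for example, it would be confused by a ">" inside an attribute value). Since you already use jsoup, a real parser is likely the more robust choice for actual pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or Apache Commons.
class ScriptExtractor {

    // Matches <script ...>body</script>, ignoring case and allowing attributes.
    // DOTALL lets the body span several lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the body of every script element found in the given HTML text.
    // External scripts (src="...") yield an empty string.
    static List<String> extractScripts(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String html = "<html><script>var a = 1;</script>"
                + "<SCRIPT type=\"text/javascript\" src=\"x.js\"></SCRIPT>"
                + "<script async>\nvar b = 2;\n</script></html>";
        List<String> scripts = extractScripts(html);
        System.out.println(scripts.size() + " script tags found");
        for (String s : scripts) {
            System.out.println("[" + s.trim() + "]");
        }
    }
}
```

Running the same method over each of the four saved files is one way to compare how many script tags each download really contains.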
