java - How can I get the content of all scripts in an HTML page?
I am making a Java program that involves extracting tags from a webpage. For parsing I am using jsoup, which is working fine. But there is a problem with the number of tags in the downloaded page. I have 4 files:
- goog1.htm (saved from https://www.google.co.in through the browser)
- goog2.html (downloaded using the command 'wget https://www.google.co.in')
- goog3.html (downloaded through a Java program using BufferedReader and InputStreamReader)
- goog4.html (created by copying the whole code from 'view-source:https://www.google.co.in/')
When I searched for the string "< script/>" in these 4 files, they gave different results:
- goog1.htm - 16 times
- goog2.html - 5 times
- goog3.html - 5 times
- goog4.html - 10 times
What is the reason for this difference? How can I get all the script tags of the page?
Which file should I use for testing my program?
Thanks in advance...
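1) Reason for the different number of script tags: there can be multiple script tags defined in an HTML page. The counts likely differ because a browser saves the page after its scripts have run, and those scripts can add further script tags, whereas wget and a plain Java download store only the HTML the server sent.
2) All script tags in the page are loaded and run. If you want to test the script code, you need to test all of them. It depends on your testing scope.
3) If you handle the page content as text in your Java program, you can get the content of the script tags by parsing it with substring methods. I recommend using the Apache Commons StringUtils class for this.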
import org.apache.commons.lang.StringUtils;

public class ScriptContentRetriever {
    public static void main(String[] args) {
        String yourScriptContent = "<script>this script 1 content</script><script>this script 2 content</script>";
        // Returns every substring found between the two delimiters.
        String[] scriptStrings = StringUtils.substringsBetween(yourScriptContent, "<script>", "</script>");
        for (String scriptString : scriptStrings) {
            // Do whatever you want with the script content right here.
            System.out.println(scriptString);
        }
    }
}
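Note that the substring approach above only matches a bare <script> opening tag, so tags that carry attributes (type, src, nonce and so on) or use different letter case would be skipped. Below is a minimal sketch of a more tolerant variant using only the standard library's java.util.regex. The class name ScriptExtractor, the method extractScripts and the sample HTML are hypothetical names chosen for illustration, and a regular expression is not a full HTML parser, so this is an approximation and not a definitive implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup or Apache Commons.
final class ScriptExtractor {

    // Matches <script ...>body</script>, ignoring letter case and allowing
    // attributes in the opening tag and line breaks in the body.
    // Group 1 captures the body. Assumption: no '>' inside attribute values.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the body of every script tag, in document order.
    // External scripts (src attribute, empty body) yield an empty string.
    static List<String> extractScripts(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Made-up sample input covering plain, attributed and upper-case tags.
        String html = "<script>var a = 1;</script>"
                + "<p>text</p>"
                + "<script type=\"text/javascript\" src=\"app.js\"></script>"
                + "<SCRIPT nonce=\"xyz\">\nvar b = 2;\n</SCRIPT>";
        List<String> scripts = extractScripts(html);
        System.out.println("script tags found: " + scripts.size());
        for (String s : scripts) {
            System.out.println("[" + s.trim() + "]");
        }
    }
}
```

The size of the returned list also gives a count of script tags that can be compared across the four saved files. Since the question already uses jsoup, selecting the script elements with it (for example doc.select("script")) is probably the more robust route for real pages.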