python - Establishing session with web app to crawl -
i planning write website crawler in python using requests , pyquery.
however, site targeting requires me signed account. using requests, possible me establish session server (using credentials site), , use session crawl sites have access when logged in?
i hope question clear, thank you.
yes possible.
i don't know pyquery i've made crawlers log in sites using urllib2. need use cookiejar handle cookies , send login form using request.
if ask more specific try more explicit too.
le: urllib2 not mess. it's best library such things in opinion.
here's code snipet log in site (after can parse site normally):
import urllib import urllib2 import cookielib """adding cookie support""" cj = cookielib.cookiejar() opener = urllib2.build_opener(urllib2.httpcookieprocessor(cj)) urllib2.install_opener(opener) """next log in site. actual url different , data. should check log in form see parameters takes , values. """ data = {'username' : 'foo', 'password' : 'bar' } data = urllib.urlencode(data) urllib2.urlopen('http://www.siteyouwanttoparse.com/login', data) #this should log in """now can parse site""" html = urllib2.urlopen('http://www.siteyoutwanttoparse.com').read() print html
Comments
Post a Comment