python - Establishing a session with a web app to crawl


I am planning to write a website crawler in Python using Requests and PyQuery.

However, the site I am targeting requires me to be signed in to an account. Using Requests, is it possible for me to establish a session with the server (using my credentials for the site), and then use that session to crawl pages that I only have access to when logged in?

I hope this question is clear, thank you.

Yes, it is possible.

I don't know PyQuery, but I've made crawlers that log in to sites using urllib2. You need to use a CookieJar to handle the cookies, and send the login form using a request.

If you ask something more specific, I'll try to be more explicit too.

Edit: urllib2 is not a mess. It's the best library for such things, in my opinion.

Here's a code snippet that logs in to a site (after that you can parse the site normally):

    import urllib
    import urllib2
    import cookielib

    """Adding cookie support"""
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)

    """Next, log in to the site. The actual URL will be different, and so will the data.
    You should check the login form to see what parameters it takes and their values.
    """
    data = {'username': 'foo',
            'password': 'bar'}
    data = urllib.urlencode(data)
    urllib2.urlopen('http://www.siteyouwanttoparse.com/login', data)  # this should log you in

    """Now you can parse the site"""
    html = urllib2.urlopen('http://www.siteyouwanttoparse.com').read()
    print html
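The snippet above targets Python 2. In Python 3, urllib2 and cookielib were folded into urllib.request and http.cookiejar, so a rough equivalent might look like the sketch below. The helper names (build_opener_with_cookies, encode_form, login_and_fetch) are hypothetical, and the URLs and form fields are the same placeholders as above; a real login form may expect different parameters.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_with_cookies():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields into the bytes body that a POST request needs."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_fetch(opener, login_url, page_url, fields):
    """POST the login form, then fetch a page with the same opener.

    Reusing one opener means cookies set by the login response are sent
    along with the second request.
    """
    with opener.open(login_url, data=encode_form(fields)):
        pass  # only the cookies set by the login response matter here
    with opener.open(page_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (placeholder URLs and credentials, left commented out):
# opener, jar = build_opener_with_cookies()
# html = login_and_fetch(
#     opener,
#     "http://www.siteyouwanttoparse.com/login",
#     "http://www.siteyouwanttoparse.com",
#     {"username": "foo", "password": "bar"},
# )
```

Returning the cookie jar alongside the opener makes it easy to inspect which cookies the site set after the login request, which helps when checking whether the login actually succeeded.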
