I'm trying to crawl my college website. I set a cookie and add headers, then do this:
    homepage = opener.open("website")
    content = homepage.read()
    print content
Sometimes I get the page source, but sometimes I get nothing at all.
I can't figure out what is happening.
Is my code wrong?
Or is the website itself the cause?
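To narrow this down, something like the sketch below could report what actually came back when the body is empty. The helper name describe_response is hypothetical; it only uses the read(), getcode(), geturl() and info() methods that urllib2 responses provide.

```python
def describe_response(response):
    """Summarise a urllib2-style response.

    Hypothetical helper: it reads the body once and reports the status
    code, the URL that was finally fetched, the Content-Length the server
    declared (None if absent), and the number of bytes actually read.
    """
    body = response.read()
    return {
        "status": response.getcode(),
        "final_url": response.geturl(),
        "declared_length": response.info().get("Content-Length"),
        "actual_length": len(body),
    }
```

Calling describe_response(opener.open("website")) on a run that prints nothing should show whether the server really sent an empty body or whether the request ended up on a different URL than expected.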
Also, can geturl() be used to follow two or more redirects in a row?
    import urllib2

    redirect = urllib2.urlopen(info_url)
    redirect_url = redirect.geturl()
    print redirect_url
It usually returns the final URL, but sometimes it gives me an intermediate one.
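One possible explanation, which I have not confirmed for this site: urllib2 only follows HTTP-level redirects (3xx responses), so a hop made with a meta refresh tag or JavaScript would leave geturl() at the page before it. A rough sketch for logging each hop that urllib2 does follow is below. RedirectLogger and open_with_redirect_log are hypothetical names; redirect_request is the standard HTTPRedirectHandler hook.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback


class RedirectLogger(urllib2.HTTPRedirectHandler):
    """Records every HTTP-level redirect that the opener follows."""

    def __init__(self):
        self.hops = []  # list of (status_code, new_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Note the hop, then let the default handler build the next request.
        self.hops.append((code, newurl))
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


def open_with_redirect_log(url):
    """Open url and return (final_url, hops). Performs a real request."""
    logger = RedirectLogger()
    opener = urllib2.build_opener(logger)
    response = opener.open(url)
    return response.geturl(), logger.hops
```

If the list of hops ends at the same intermediate URL that geturl() reports, the remaining redirect is probably not an HTTP redirect, and the page body would need to be inspected for it.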