
Scraping the second page of a website in Python does not work

python,python-2.7,web-scraping,beautifulsoup,urlopen

The page is quite asynchronous: the search results are built by XHR requests, which you can simulate in your code using requests. Sample code as a starting point for you:

from bs4 import BeautifulSoup
import requests

url = 'http://www.amazon.com/Best-Sellers-Books-Architecture/zgbs/books/173508/#2'
ajax_url = "http://www.amazon.com/Best-Sellers-Books-Architecture/zgbs/books/173508/ref=zg_bs_173508_pg_2"

def get_books(data):
    soup = BeautifulSoup(data)
    for title...
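The answer above relies on BeautifulSoup, a third-party package. As a dependency-free illustration of the same idea, here is a sketch that uses the standard library's html.parser to pull titles out of an XHR response fragment; the zg_title class name and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text of elements marked with a (hypothetical) zg_title class."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if ("class", "zg_title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        self.in_title = False

    def handle_data(self, data):
        if self.in_title and data.strip():
            self.titles.append(data.strip())

# In real use the fragment would be the body of the XHR response.
parser = TitleParser()
parser.feed('<div class="zg_title"><a href="#">Book One</a></div>'
            '<div class="zg_title"><a href="#">Book Two</a></div>')
```

The same collect-while-inside-a-marker pattern works for any attribute the real response uses to tag its result items.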

urllib.request.urlopen(url) how to use this function with ip address?

network-programming,urllib2,urllib,urlopen,urllib3

Solved the problem when I looked into urllib literally. What I actually need is urllib2, but since I'm using Python 3.4 I shouldn't import urllib itself: that makes Python use the urllib package, not urllib2. After importing urllib.request only, and writing the URL as http://192.168.1.2 instead of 192.168.1.2, it works...
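The fix the answer describes, prepending a scheme to a bare IP address before calling urlopen, can be sketched as a small helper (the function name is mine):

```python
from urllib.parse import urlparse

def normalize_url(address):
    # urlopen requires a full URL with a scheme; a bare address like
    # "192.168.1.2" raises ValueError ("unknown url type").
    if not urlparse(address).scheme:
        address = "http://" + address
    return address
```

With this in place, both `normalize_url("192.168.1.2")` and an already-complete URL can be passed straight to `urllib.request.urlopen`.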

Improve URL reachable check

python,multithreading,url,urlopen

When I was running a crawler, I had all of my URLs prioritized by domain name. Basically, my queue of URLs to crawl was really a queue of domain names, and each domain name had a list of URLs. When it came time to get the next URL to crawl,...
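The per-domain queue described above can be sketched like this; the class and method names are my own, and domains are simply served round-robin.

```python
from collections import deque
from urllib.parse import urlparse

class DomainQueue:
    """Queue of URLs grouped by domain name; domains take turns."""
    def __init__(self):
        self.domains = deque()   # round-robin order of domain names
        self.urls = {}           # domain -> deque of URLs to crawl

    def push(self, url):
        domain = urlparse(url).netloc
        if domain not in self.urls:
            self.urls[domain] = deque()
            self.domains.append(domain)
        self.urls[domain].append(url)

    def pop(self):
        # Take the next domain in line and crawl one of its URLs.
        domain = self.domains.popleft()
        url = self.urls[domain].popleft()
        if self.urls[domain]:
            self.domains.append(domain)  # still has URLs: back of the line
        else:
            del self.urls[domain]
        return url
```

Pushing two URLs for one domain and one for another, pops then alternate between the domains instead of hammering the first one.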

with urllib urlopen read function but get none

python,urlopen

Hmm, I tried with the Python package requests and first got an error: requests.exceptions.TooManyRedirects: Exceeded 30 redirects. It seems the page redirects from one URL to another in a loop, which may be why it failed with urllib too. I also checked the docs for urlopen, and there seem to be some problems with https requests....
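The redirect loop that requests complains about can be reproduced offline. This sketch follows a hypothetical url-to-location map and gives up after the same limit of 30; the function name and the map are made up for illustration.

```python
def follow_redirects(url, redirect_map, max_redirects=30):
    # Follow Location headers (modelled here as a plain dict) until we
    # reach a URL that does not redirect, or exceed the hop limit, which
    # is what requests' TooManyRedirects signals.
    hops = 0
    while url in redirect_map:
        url = redirect_map[url]
        hops += 1
        if hops > max_redirects:
            raise RuntimeError("Exceeded %d redirects" % max_redirects)
    return url
```

A two-entry map pointing the URLs at each other triggers the limit, just as a real redirect loop does.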

Using urlopen to open list of urls

python,urllib,urlopen

There are some errors in your code:

  • You define getUrls with a variable-length argument list (hence the tuple in your error);
  • You treat the getUrls argument as a single variable (it is a list instead).

You can try this code:

import urllib2
import shutil

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(urls):
    for url in urls:...
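The difference between the two signatures, which is what produces the tuple mentioned in the error, can be shown without any network access; the function names here are mine.

```python
def get_urls_wrong(*urls):
    # *urls packs every positional argument into a tuple, so passing a
    # list yields a one-element tuple containing that list.
    return urls

def get_urls_right(urls):
    # Take the list itself as a single argument and iterate over it.
    return [url for url in urls]
```

Calling the first form with a list gives `(['...'],)`, which is why iterating over it hands you the whole list at once instead of individual URLs.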

python encoding json with 'æøå'

python,json,urlopen

Your decoded data contains Unicode strings, so you need to look things up using Unicode strings:

print addressline % \
    (adresse[u'etrs89koordinat'][u'øst'], adresse[u'etrs89koordinat'][u'nord'])

(You might find it works for strings that only contain unaccented characters whether you use Unicode strings or not, because of the automatic conversion between Unicode and your...
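The answer above is Python 2, where the u'' prefixes matter. In Python 3 every str is already Unicode, so the lookup just works; the coordinate values in this sketch are made up for illustration.

```python
import json

# json.loads always returns Unicode strings for keys, so a non-ASCII key
# such as 'øst' is looked up with an ordinary (Unicode) string literal.
adresse = json.loads('{"etrs89koordinat": {"øst": 557500.0, "nord": 6316000.0}}')
koordinat = adresse["etrs89koordinat"]
```

Under Python 2 the same lookup would need `adresse[u'etrs89koordinat'][u'øst']`, exactly as the answer shows.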

data from url open in list

python,list,urlopen

I think you're overcomplicating it:

print "Downloading with urllib2"
f = urllib2.urlopen(malwareurl)
ips = f.read().split("\r\n")

# If you want to add '/32' to each IP
ips = [x + "/32" for x in ips if x]
...
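The split-and-suffix step can be pulled into a small helper so it is testable without the download; the function name is mine.

```python
def to_cidr_list(text):
    # Split the downloaded blocklist on CRLF and tag each IP as a /32
    # host route, skipping the empty entry left by a trailing newline.
    return [ip + "/32" for ip in text.split("\r\n") if ip]
```

Feeding it the raw body that `f.read()` returns yields the same list as the answer's two-step version.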

Python try/except statement for socket error Errno 10060

python,urlopen,try-except

As your code raises an IOError, run this code, substituting the line that raises the error for the raise:

try:
    raise IOError
except IOError:
    time.sleep(20)
    pass
else:
    break
...
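Wrapped in a loop, the same try/except pattern becomes a reusable retry helper. This sketch is my own generalization, with the number of attempts and the sleep made configurable; fetch stands in for whatever call raises the socket error.

```python
import time

def fetch_with_retry(fetch, attempts=3, delay=0):
    # Call fetch() until it succeeds, sleeping between IOError failures
    # (socket.error is a subclass of IOError on modern Pythons).
    for _ in range(attempts):
        try:
            return fetch()
        except IOError:
            time.sleep(delay)
    raise IOError("gave up after %d attempts" % attempts)
```

A callable that fails twice and then succeeds returns normally on the third attempt; one that always fails re-raises after the limit.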

Python urllib2.urlopen returns a HTTP error 503 [closed]

python,urllib2,urlopen

The service is not currently working. curl:

curl -i "http://www.gametracker.com/server_info/94.250.218.247:25200/top_players/"

also returns a 503:

HTTP/1.1 503 Service Temporarily Unavailable
Date: Mon, 08 Dec 2014 09:37:17 GMT
Content-Type: text/html; charset=UTF-8
Server: cloudflare-nginx

The service is using CloudFlare, which provides a form of DDoS protection that requires you to use a full...
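To inspect a status line like the one above programmatically, you can parse the raw response text that curl -i prints; this helper is my own illustration, not part of the answer.

```python
def parse_status(raw_response):
    # The first line of an HTTP response is "<version> <code> <reason>";
    # split it into at most three parts so the reason phrase stays whole.
    status_line = raw_response.splitlines()[0]
    version, code, reason = status_line.split(" ", 2)
    return int(code), reason

code, reason = parse_status(
    "HTTP/1.1 503 Service Temporarily Unavailable\r\n"
    "Server: cloudflare-nginx\r\n"
)
```

Checking the parsed code against 503 distinguishes "the service is down" from a problem in your own request.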