I get the following log when crawling:
DEBUG: Crawled (302) <GET http://fuyuanxincun.fang.com/xiangqing/> (referer: http://esf.hz.fang.com/housing/151__1_0_0_0_2_0_0/)
DEBUG: Scraped from <302 http://fuyuanxincun.fang.com/xiangqing/>
But it actually returns nothing. How can I deal with responses that have status 302?
Any help would be much appreciated!
Best How To :
HTTP status 302 means "Found" (called "Moved Temporarily" in HTTP/1.0): the server is redirecting the client to another URL, normally given in the Location header. When I send an HTTP GET request to http://fuyuanxincun.fang.com/xiangqing/, I get an HTTP 200 status. It is common for a server to send no body along with a 302 status code (although technically a 302 response can include one).
The reason why you get a HTTP 302 status can be one of the following:
- The website does not serve its content when a specific referer (such as http://esf.hz.fang.com/housing/151__1_0_0_0_2_0_0/) is present.
- You didn't send the HTTP headers the server wants to see, for example a certain User-Agent. The website can decide to reject requests that lack a specific header by sending an HTTP 302 status instead of an HTTP 200 status.
- The IP address you are sending the request from is blocked by the website you are trying to scrape.
I would recommend that you:
- Make the request look like a "real" browser request (send similar headers).
- Try to send the request from another IP address.
- Try to send the request with a (randomized) User-Agent.
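To try the header-related suggestions outside the crawler, something like the following may help. It is only a sketch using Python's standard library: the User-Agent strings, the header values, and the names build_request, NoRedirect and fetch_status are my own assumptions, not anything the site is known to require. It sends browser-like headers with a randomly chosen User-Agent and does not follow redirects, so you can see the 302 and where it points.

```python
import random
import urllib.error
import urllib.request

# Hypothetical pool of browser-like User-Agent strings; replace with current ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, referer=None):
    """Build a GET request that carries browser-like headers."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
    }
    if referer is not None:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 302 itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, referer=None):
    """Return (status code, Location header or None) without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, referer), timeout=10) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 3xx response surfaces as an HTTPError.
        return err.code, err.headers.get("Location")
```

Calling fetch_status(url) once without a referer and once with the referer from your log should show whether the referer is what triggers the redirect.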
I made my request at 07:30:29 UTC on Wednesday, 13 May 2015; the website's behavior could have changed between your request and mine.
It would also be helpful to post the full raw HTTP request and response.
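One possible way to capture that raw exchange is the debug output of Python's http.client, sketched below. The function name dump_exchange is hypothetical, and the sketch assumes a plain http:// URL (as in the question) and does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def dump_exchange(url, headers=None):
    """Send one GET and print the raw request line and headers in both directions."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    # At debug level 1, http.client prints the bytes it sends and each
    # status/header line it receives.
    conn.set_debuglevel(1)
    try:
        conn.request("GET", target, headers=headers or {})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, dict(resp.getheaders()), body
    finally:
        conn.close()
```

The printed "send:", "reply:" and "header:" lines, together with the returned body, are what would be worth posting.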