I am building a Spider in Scrapy that follows all the links it can find and sends the URL to a pipeline. At the moment, this is my code:
```python
from scrapy import Spider
from scrapy.http import Request
from scrapy.http import TextResponse
from scrapy.selector import Selector
from scrapyTest.items import TestItem
import urlparse

class TestSpider(Spider):
    name = 'TestSpider'
    allowed_domains = ['pyzaist.com']
    start_urls = ['http://pyzaist.com/drone']

    def parse(self, response):
        item = TestItem()
        item["url"] = response.url
        yield item
        links = response.xpath("//a/@href").extract()
        for link in links:
            yield Request(urlparse.urljoin(response.url, link))
```
This does the job, but it throws an error whenever the response is a plain Response rather than a TextResponse or HtmlResponse, because Response has no xpath() method. I tried to test for this by doing:
```python
if type(response) is TextResponse:
    links = response.xpath("//a/@href").extract()
    ...
```
But to no avail. When I do that, it never enters the if statement. I am new to Python, so it might be a language thing. I appreciate any help.
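In case it helps, here is a minimal, Scrapy-free sketch of the behavior I suspect is the culprit. The stand-in classes below only mirror my understanding of Scrapy's hierarchy (HtmlResponse subclassing TextResponse); they are not the real classes:

```python
# Hypothetical stand-ins for Scrapy's response classes,
# assuming HtmlResponse subclasses TextResponse, which subclasses Response.
class Response:
    pass

class TextResponse(Response):
    pass

class HtmlResponse(TextResponse):
    pass

response = HtmlResponse()

# An exact type check does not match subclasses...
print(type(response) is TextResponse)      # False
# ...while isinstance() does.
print(isinstance(response, TextResponse))  # True
```

If the responses my spider receives are actually HtmlResponse objects, that would explain why the exact type check never matches, and isinstance() might be the fix.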