I'm trying to estimate a spider's progress by counting how many of its start_urls it has processed, but I'm not sure how to detect this. I know this is nowhere near a real measure of progress, since the spider has no idea how large the remaining sites to be crawled are. Any ideas on how to get the current count of processed start_urls?
Answer:
It looks like you might be able to accomplish this through the use of signals, specifically the item_scraped signal, which lets you register a callback that fires after an item is scraped. In that callback, check whether response.url is in the spider's start_urls list.
scrapy.signals.item_scraped(item, response, spider)
More info on the scrapy docs page: http://doc.scrapy.org/en/latest/topics/signals.html