My spider class is as follows:
class MySpider(BaseSpider):
    name = "dropzone"
    allowed_domains = ["dropzone.com"]
    start_urls = ["http://www.dropzone.com/cgi-bin/forum/gforum.cgi?post=4724043"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        reply = response.xpath('//*[@id="wrapper"]/div/div/table/tbody/tr/td/div/div/center/table/tbody/tr/td/table/tbody/tr/td/font/table/tbody/tr/td/table/tbody/tr/td/font/b')
        dates = response.xpath('//*[@id="wrapper"]/div/div/table/tbody/tr/td/div/div/center/table/tbody/tr/td/table/tbody/tr/td/font/table/tbody/tr/td/font/small')
        items = []
        for posts, day in zip(reply, dates):
            item = DozenItem()
            item["Reply"] = posts.re('/text()')
            item["Date"] = day.re('/text()')
            items.append(item)
        return items
I selected the element in Chrome's inspector, right-clicked, chose "Copy XPath", and pasted the result directly into my spider.
But of course it isn't working. The shell output doesn't show anything being crawled or scraped, and my CSV is empty.
I originally wrote my own XPath, as I normally do, but that wasn't working either, and the Chrome option intrigued me. Normally my XPaths only go 3 or 4 tags deep. Is that enough for the HTML provided below?
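To illustrate the difference between the two styles of XPath, here is a minimal sketch using only the standard library. The `RAW` fragment is a simplified, well-formed stand-in that I made up for the page source, not the real markup. Like the table in the HTML further down, it has no `<tbody>` element; browsers add `<tbody>` to the DOM when the source omits it, so a path copied from the inspector may not match what the scraper downloads.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the raw page source (an assumption,
# not the real markup). Note that it contains no <tbody> element.
RAW = """
<div id="wrapper">
  <table>
    <tr>
      <td><font><b>Re: [pleasedtomeet] Skydiving with tinnitus?</b></font></td>
    </tr>
  </table>
</div>
"""

root = ET.fromstring(RAW)

# Path in the style of a browser "Copy XPath": it expects <tbody>,
# so it matches nothing in the raw source.
copied = root.findall("./table/tbody/tr/td/font/b")

# Short relative path that only relies on the last few tags.
short = root.findall(".//td/font/b")

print(len(copied))
print([b.text for b in short])
```

In this sketch the long path finds nothing while the 3-tag path finds the subject, which suggests a short path can be enough as long as it matches the downloaded source rather than the browser's DOM.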
The site is a forum, and I just want a self-updating scraper that crawls one specific thread for replies to the original post and exports Date/Post.
The HTML around a post's date, which I think includes enough tags:
<br>
<br>
<!-- FORUM MINI PROFILE -->
Registered: Sep 6, 2012<BR>
Posts: 1850<BR><BR>
</small></font>
Apr 26, 2015, 7:51 AM
<br>
Post #2 of 11 (195 views)
<br>
<a href="/cgi-bin/forum/gforum.cgi?post=4724045#4724045">Shortcut</a>
<br>
<img src="http://www.dropzone.com/graphics/forum/clear_shim.gif" width="180" height="1">
</font>
</td>
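Since the date is bare text that follows the closing `</small></font>` rather than sitting in its own tag, one option is to match it with a regular expression. Below is a rough sketch with the standard `re` module; the pattern and the `extract_post_dates` helper are my own assumptions based on the single date format visible above, so other formats on the site would need a different pattern.

```python
import re
from datetime import datetime

# Part of the snippet above, used here as sample input.
SNIPPET = (
    "Registered: Sep 6, 2012<BR> Posts: 1850<BR><BR> </small></font> "
    "Apr 26, 2015, 7:51 AM <br> Post #2 of 11 (195 views) <br>"
)

# Assumed format: "Apr 26, 2015, 7:51 AM". The "Registered:" date has no
# time part, so this pattern skips it.
DATE_RE = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4}, \d{1,2}:\d{2} [AP]M")


def extract_post_dates(html):
    """Return every post timestamp found in the given HTML text."""
    return [
        datetime.strptime(match, "%b %d, %Y, %I:%M %p")
        for match in DATE_RE.findall(html)
    ]


dates = extract_post_dates(SNIPPET)
print(dates)
```

A selector's `.re()` method likewise expects a regular expression, not an XPath step such as `/text()`.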
The subject of each reply starts with "Re:", which should let me exclude the original post from the results:
<td valign="top" width="100%" style="border-left: 1px solid #CCD2DE">
<!-- Adult Content Filter -->
<table border=0 width="100%">
<tr>
<td valign="top" align="left">
<font face="Verdana,Arial,Helvetica" size=2 color="#212126">
<b> Re: [pleasedtomeet] Skydiving with tinnitus? </b>
[<small><a href="#4724043">In reply to</a></small>]
</font>
</td>
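The "Re:" filter I have in mind could look roughly like the sketch below, which uses the standard library's `html.parser` to collect the text of each `<b>` element and keeps only subjects that start with "Re:". The `BoldTextParser` class, the `reply_subjects` helper, and the first subject in `SAMPLE` (a guess at the original post's subject line) are hypothetical and only there for illustration.

```python
from html.parser import HTMLParser


class BoldTextParser(HTMLParser):
    """Collect the whitespace-normalized text of every <b> element."""

    def __init__(self):
        super().__init__()
        self._in_b = False
        self._buf = []
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_b = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "b" and self._in_b:
            self._in_b = False
            # Collapse runs of whitespace and trim both ends.
            self.subjects.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._in_b:
            self._buf.append(data)


def reply_subjects(html):
    """Return only the subjects that mark a reply, i.e. start with 'Re:'."""
    parser = BoldTextParser()
    parser.feed(html)
    parser.close()
    return [s for s in parser.subjects if s.startswith("Re:")]


# The first subject is an assumed original-post subject, not taken from the page.
SAMPLE = """
<font size=2><b> Skydiving with tinnitus? </b></font>
<font size=2><b> Re: [pleasedtomeet] Skydiving with tinnitus? </b>
[<small><a href="#4724043">In reply to</a></small>]</font>
"""

print(reply_subjects(SAMPLE))
```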