I wrote a Ruby web crawler that retrieves data from a third-party website. I use Nokogiri to extract information from a specific CSS div and specific fields (accessing children and elements of the nodes I extract).
From time to time, the structure of the third-party website changes, which breaks the crawler (for example, an expression like element.children might need to be changed to a different traversal).
So far, I have a utility that prints the structure of the node I extract, which lets me quickly fix the parser when the structure changes. I also have an automated check that verifies the crawler can still extract "some" values.
I would like to know if there is a more elegant way to deal with this issue. How would one write a crawler that is easy to maintain?