I have managed to identify the culprit here. As my development machine is Windows-based, this seems to have been an issue with Mechanize (or one of its dependencies) on Windows. Specifying the b (binary) flag in the second argument of File.new made the problem go away. tl;dr:...
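For reference, a minimal sketch of that fix, assuming the file is being saved after a Mechanize download (the URL and filename are placeholders; agent.get and response.body are the real Mechanize API):

    require 'mechanize'

    agent = Mechanize.new
    # Placeholder download URL, for illustration only.
    response = agent.get('http://example.com/report.pdf')

    # "wb" opens the file in binary mode; without the "b", Windows
    # performs newline translation and corrupts binary content.
    f = File.new('report.pdf', 'wb')
    f.write(response.body)
    f.close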
Here is one way to do it:

Edit (for the added requirement to not skip any URLs):

    metadesc = page.at("head meta[name='description']")
    puts "%s, %s, %s" % [line.chomp, title, metadesc ? metadesc[:content] : "N/A"]

...
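For context, a minimal sketch of the surrounding loop this snippet assumes (the urls.txt filename is hypothetical; line, page, and title are the variables the snippet references):

    require 'mechanize'

    agent = Mechanize.new
    # Hypothetical input file with one URL per line.
    File.foreach('urls.txt') do |line|
      page  = agent.get(line.chomp)
      title = page.title
      metadesc = page.at("head meta[name='description']")
      # Print "N/A" instead of skipping URLs that lack a meta description.
      puts "%s, %s, %s" % [line.chomp, title, metadesc ? metadesc[:content] : "N/A"]
    end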
Thanks to the help of @theTinMan, @MarkThomas and a colleague, I've managed to log into Jenkins and collect the page's XML through Mechanize and Nokogiri:

    require 'rubygems'
    require 'nokogiri'
    require 'net/https'
    require 'openssl'
    require 'mechanize'

    # JenkinsXML logs into Jenkins and gets an...
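The pasted code is cut off above. As a rough sketch of how the rest of the flow could look, assuming HTTPS with basic auth rather than a login form (the host and credentials are placeholders; add_auth, verify_mode, and Nokogiri::XML are real API, and /api/xml is Jenkins' standard XML endpoint):

    agent = Mechanize.new
    # Only needed if Jenkins uses a self-signed certificate.
    agent.verify_mode = OpenSSL::SSL::VERIFY_NONE
    agent.add_auth('https://jenkins.example.com', 'user', 'password')

    page = agent.get('https://jenkins.example.com/api/xml')
    xml  = Nokogiri::XML(page.body)
    puts xml.root.name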
Looks like there is a meta refresh in there (per your description). Try adding this to your Mechanize object:

    a.follow_meta_refresh = true

Also, you may want to set your user_agent to an accepted value instead of your custom one:

    require 'mechanize'
    Mechanize::AGENT_ALIASES.each { |k,v| puts k }
    => Mechanize
    => Linux Firefox
    ...
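Putting both suggestions together, a small sketch (the target URL is a placeholder; follow_meta_refresh and user_agent_alias are real Mechanize accessors):

    require 'mechanize'

    a = Mechanize.new
    a.follow_meta_refresh = true            # transparently follow <meta http-equiv="refresh">
    a.user_agent_alias = 'Linux Firefox'    # any key from Mechanize::AGENT_ALIASES works

    page = a.get('http://example.com')      # placeholder URL
    puts page.title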