Navigation utility without a browser: lightweight and fail-proof

By : raju

I have a use case where I need to fill in a form on a website, but I don't have access to an API. Currently we use WebDriver with a browser, but it is very heavy and not foolproof because the process is asynchronous. Is there a way to do this without a browser, and also make the process synchronous by closely monitoring the pending requests?

CasperJS and HtmlUnitDriver seem to be some of the best options I have. Can someone explain their advantages and disadvantages in terms of maintenance, fail-proofness and weight?

I need to navigate many complex and varied web pages, some of which are heavily JS-driven.

Can Scrapy be used for this purpose?
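The "synchronous" behaviour described in the question, blocking until no requests are pending, boils down to a polling wait with a timeout, the same idea as Selenium's explicit waits. Below is a minimal Python sketch under stated assumptions: `wait_until` is a hypothetical helper, and the pending-request counter is simulated, because whichever tool you use would have to expose that number itself.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.1):
    """Block until condition() returns a truthy value.

    Raises TimeoutError if the condition is still false after `timeout` seconds,
    so a page that never settles fails loudly instead of hanging forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical stand-in for a real pending-request counter: pretend one
    # request finishes on every poll. A real driver would report this itself.
    pending = [3]

    def no_pending_requests():
        if pending[0] > 0:
            pending[0] -= 1
        return pending[0] == 0

    wait_until(no_pending_requests, timeout=5.0, interval=0.01)
    print("page idle")
```

The timeout matters as much as the polling: without it, a request that never completes turns "synchronous" into "stuck".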



Well, I have worked with a LOT of different ways of doing this, depending on how intelligent/advanced you want the system to be. I'm on Ruby, and in Ruby it's quite easy to do. Below are the methods I found most useful (of course pretty Ruby-biased):

  • Mechanize: Super lightweight, super fast and super reliable. It handles most of what a browser does (cookies, forms, redirects), except for JS. Under the hood it's an HTTP client and an HTML/XML parser (Nokogiri) with a nice interface layer on top and a little extra spice. Check out the tutorials in the documentation. I think it's also available for Python and other languages.
  • Poltergeist: Fast, real-browser-like behaviour, pretty reliable and lightweight, and it supports JS. Under the hood it's a PhantomJS-driven driver for Capybara (but without all the nasty dependencies, and completely headless). Even though it's built for testing with e.g. RSpec, it's easy to use in other ways or standalone; just search Google.
  • Watir-webdriver: A super powerful library for driving REAL browsers like Firefox, IE, Chrome or Safari. It's actually pretty stable. However, if you don't have a physical display attached (e.g. on a server), you need to run Xvfb and point the browser's display output at it. This is very easy to do with the headless gem.

So in other words, if you don't need JS support, go with Mechanize.
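For the no-JS case, the Mechanize idea (plain HTTP requests plus an HTML parser, no browser) can be sketched with Python's standard library alone. This is not Mechanize's API: `FormParser` and `build_submission` are hypothetical names, only `<input>` fields of the first `<form>` are read, and cookies, `<select>`/`<textarea>` and JS are deliberately ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

SKIPPED_TYPES = ("submit", "button", "image", "file", "reset")


class FormParser(HTMLParser):
    """Collects the first <form> on a page: its action, method and <input> defaults."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif self._in_form and tag == "input":
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            if not name or kind in SKIPPED_TYPES:
                return
            # Unchecked checkboxes/radios are not submitted by browsers either.
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, values):
    """Merge user values into the form's defaults and build the request a browser would send."""
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found on the page")
    data = dict(parser.fields)  # keeps hidden fields such as CSRF tokens
    data.update(values)
    target = urljoin(page_url, parser.action)  # empty action means "same page"
    body = urlencode(data)
    if parser.method == "post":
        return Request(
            target,
            data=body.encode(),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    return Request(target + "?" + body, method="GET")
```

Sending the result with `urllib.request.urlopen(req, timeout=30)` blocks until the response arrives, so each step is synchronous by construction; that is one reason this style tends to be less fragile than driving a browser, at the cost of not running any JS.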
