Introducing WebDriver

WebDriver is a clean, fast framework for automated testing of webapps. Why is it needed? And what problems does it solve that existing frameworks don't address?

For example, Selenium, a popular and well established testing framework is a wonderful tool that provides a handy unified interface that works with a large number of browsers, and allows you to write your tests in almost every language you can imagine (from Java or C# through PHP to Erlang!). It was one of the first Open Source projects to bring browser-based testing to the masses, and because it's written in JavaScript it's possible to quickly add support for new browsers that might be released

Like every large project, it's not perfect. Selenium is written in JavaScript which causes a significant weakness: browsers impose a pretty strict security model on any JavaScript that they execute in order to protect a user from malicious scripts. Examples of where this security model makes testing harder are when trying to upload a file (IE prevents JavaScript from changing the value of an INPUT file element) and when trying to navigate between domains (because of the single host origin policy problem).

Additionally, being a mature product, the API for Selenium RC has grown over time, and as it has done so it has become harder to understand how best to use it. For example, it's not immediately obvious whether you should be using "type" instead of "typeKeys" to enter text into a form control. Although it's a question of aesthetics, some find the large API intimidating and difficult to navigate.

WebDriver takes a different approach to solve the same problem as Selenium. Rather than being a JavaScript application running within the browser, it uses whichever mechanism is most appropriate to control the browser. For Firefox, this means that WebDriver is implemented as an extension. For IE, WebDriver makes use of IE's Automation controls. By changing the mechanism used to control the browser, we can circumvent the restrictions placed on the browser by the JavaScript security model. In those cases where automation through the browser isn't enough, WebDriver can make use of facilities offered by the Operating System. For example, on Windows we simulate typing at the OS level, which means we are more closely modeling how the user interacts with the browser, and that we can type into "file" input elements.

With the benefit of hindsight, we have developed a cleaner, Object-based API for WebDriver, rather than follow Selenium's dictionary-based approach. A typical example using WebDriver in Java looks like this:

// Create an instance of WebDriver backed by Firefox
WebDriver driver = new FirefoxDriver();

// Now go to the Google home page
driver.get("http://www.google.com");

// Find the search box, and (ummm...) search for something
WebElement searchBox = driver.findElement(By.name("q"));
searchBox.sendKeys("selenium");
searchBox.submit();

// And now display the title of the page
System.out.println("Title: " + driver.getTitle());

Looking at the two frameworks side-by-side, we found that the weaknesses of one are addressed by the strengths of the other. For example, whilst WebDriver's approach to supporting browsers requires a lot of work from the framework developers, Selenium can easily be extended. Conversely, Selenium always requires a real browser, yet WebDriver can make use of an implementation based on HtmlUnit which provides lightweight, super-fast browser emulation. Selenium has good support for many of the common situations you might want to test, but WebDriver's ability to step outside the JavaScript sandbox opens up some interesting possibilities.

These complementary capabilities explain why the two projects are merging: Selenium 2.0 will offer WebDriver's API alongside the traditional Selenium API, and we shall be merging the two implementations to offer a capable, flexible testing framework. One of the benefits of this approach is that there will be an implementation of WebDriver's cleaner APIs backed by the existing Selenium implementation. Although this won't solve the underlying limitations of Selenium's current JavaScript-based approach, it does mean that it becomes easier to test against a broader range of browsers. And the reverse is true; we'll also be emulating the existing Selenium APIs with WebDriver too. This means that teams can make the move to WebDriver's API (and Selenium 2) in a managed and considered way.

If you'd like to give WebDriver a try, it's as easy as downloading the zip files, unpacking them and putting the JARs on your CLASSPATH. For the Pythonistas out there, there's also a version of WebDriver for you, and a C# version is waiting in the wings. The project is hosted at http://webdriver.googlecode.com, and, like any project on Google Code, is Open Source (we're using the Apache 2 license) If you need help getting started, the project's wiki contains useful guides, and the WebDriver group is friendly and helpful (something which makes me feel very happy).

So that's WebDriver: a clean, fast framework for automated testing of webapps. We hope you like it as much as we do!

by Simon Stewart, Engineering Productivity Team

opensource.google.com

Google Open Source Blog

Introducing WebDriver

Friday, May 8, 2009