Saturday, October 16, 2010

Displaying screenshots on acceptance test failures

When running GUI tests it can be difficult to determine what was being shown on the GUI at the point of failure, especially if the tests are running minimised or on a CI server.

We developed a screenshot extension for Concordion to include screenshots in the output whenever failures occur. We've been finding this extension incredibly useful for diagnosing intermittent failures in our automated acceptance tests.

This requires the new Concordion 1.4.1, which introduces support for extensions. Amongst other things, extensions can add new listeners to commands and make changes to the Concordion output.

The following example is running an end-to-end test using WebDriver (Selenium 2) to test a Google search through the Firefox browser.

While the results show that the test is failing because Netherlands is displayed in the Google results, it would be helpful to see the actual results page. The screenshot extension allows us to hover over the failure to see an image of the browser page at the time the failure occurred.

Clicking on the image opens it for further inspection:

In this example, we've configured the extension to use WebDriver's TakesScreenshot interface, so we see an image of just the browser page, irrespective of whether it is currently visible on the screen.
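As a rough sketch of what the extension is doing under the covers, here is how a screenshot can be taken directly through WebDriver's TakesScreenshot interface. The class name and URL are illustrative only; this needs a Selenium 2 jar and a local Firefox install, and is not the extension's actual code.

```java
import java.io.File;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class ScreenshotSketch {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://www.google.com");
            // TakesScreenshot renders the page from the browser itself,
            // so the image is captured even if the window is minimised
            // or hidden behind other windows.
            File shot = ((TakesScreenshot) driver)
                    .getScreenshotAs(OutputType.FILE);
            System.out.println("Screenshot written to " + shot.getAbsolutePath());
        } finally {
            driver.quit();
        }
    }
}
```

Because the image comes from the rendering engine rather than the desktop, this works on headless CI agents as well.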

The extension can also be used to take screenshots on success, and includes a command to explicitly take screenshots.

The screenshot extension project is on GitHub. You'll need to set the concordion.extensions system property to use it - see the README for details.
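The system property can be set on the command line with -D, or programmatically before the test runs. A minimal sketch, assuming a hypothetical extension class name - check the README for the exact value:

```java
public class ExtensionSetup {
    public static void main(String[] args) {
        // The class name below is illustrative only; the README for the
        // extension project gives the real value to use.
        System.setProperty("concordion.extensions",
                "org.concordion.ext.ScreenshotExtension");
        System.out.println(System.getProperty("concordion.extensions"));
    }
}
```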

The source code for this example is in the demo project.

This extension was partly inspired by Mark Derricutt's ScreenshotCommand, and by Adam Setch's post to the Concordion list.

UPDATES
Oct 25 2010. Added acknowledgements.
Jan 03 2011. Project moved to Google Code and docs to Concordion site.
Oct 26 2014. Updated project and demo location to GitHub projects.

Sunday, September 12, 2010

What's happening in my acceptance tests?

Agile Acceptance Testing allows us to describe desired behaviour using examples that describe the business intent. Good acceptance tests are written as plain language specifications, not scripts. Implementation details are coded in a separate test "fixture" class.

One downside of this approach is a loss of transparency about what the tests are actually doing. The fixtures are often written by developers, which can require a leap of faith from testers. On a recent project, this trust was dented when the tests didn't do what they were supposed to be doing.

With this in mind, I set out to provide more insight into what our tests are actually doing, without undermining the principles of acceptance testing.

The result is a logging extension for Concordion. This adds a "tooltip" to the Concordion output HTML that shows log output when hovered over:

This tooltip is proving useful not only to testers, but also to developers, who gain insight into what is happening in their tests and can spot performance improvements. In the example above, for instance, we were surprised to see the web page being loaded twice, and a number of element lookups being duplicated.

This approach could also be used for user documentation of the steps required to complete an action, potentially with screenshots, or even as an embedded screencast.

Implementation Details
We've added a new extension mechanism to Concordion 1.4.1 to make it easy to add features such as this.

This extension is available on GitHub. The extension captures java.util.logging output and has a number of configuration options. You'll need to set the concordion.extensions system property to use it - see the README for details.
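The core idea - attaching a Handler to java.util.logging and collecting the records so they can later be rendered as tooltip text - can be sketched in a few lines of plain JDK code. This is an illustration of the mechanism, not the extension's actual implementation, and the logger name and messages are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// A Handler that collects log records in memory instead of writing
// them out, so they can later be attached to the test output.
public class LogCapture extends Handler {
    private final List<String> messages = new ArrayList<String>();

    @Override public void publish(LogRecord record) {
        messages.add(record.getLevel() + ": " + record.getMessage());
    }
    @Override public void flush() {}
    @Override public void close() {}

    public List<String> getMessages() { return messages; }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("fixture");
        logger.setLevel(Level.FINE);        // let FINE records through
        LogCapture capture = new LogCapture();
        capture.setLevel(Level.ALL);
        logger.addHandler(capture);

        logger.fine("Navigating to search page");
        logger.fine("Entering search term: scotland");

        for (String message : capture.getMessages()) {
            System.out.println(message);
        }
    }
}
```

A real extension would then format these captured messages into the tooltip HTML alongside the relevant command in the Concordion output.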

For an example using this Concordion extension with WebDriver, see the demo project.

UPDATES
Oct  4 2010. Changed GitHub link to new concordion-extensions project.
Oct  6 2010. Source code moved to trunk of Concordion.
Oct 24 2010. Updated to reference Concordion 1.4.1 and concordion-extension-demo project.
Jan 03 2011. Project moved to Google Code and docs to Concordion site.
Oct 26 2014. Updated links to new GitHub projects.

Sunday, June 27, 2010

Triaging test failures

One of the goals of Open Spaces conferences is to turn "corridor conversations" into the focal point of the conference. This was aptly demonstrated at CITCON ANZ when Richard Vowles introduced a topic we'd been discussing over kebabs the night before.

Richard has subsequently discussed the topic on the Illegal Argument podcast. This post is an extension of the discussion.

The problem
When running integration and acceptance tests, test failures may be caused by factors other than incorrect code. This is most apparent when performing end-to-end testing against Enterprise Information Systems. A number of factors can cause a test to fail: the system may be unavailable, or the test data may not be in the required state.

It would be useful to categorise the failures by cause, for notification and reporting purposes. Developers should be notified of code-related issues, testers might be responsible for data issues, and sys ops for server errors. Over time it would also be useful to visualise how often server and data errors are occurring.

Richard provided the example:
Given a customer has been in arrears for over 90 days...
In order to run this test in an end-to-end environment, the test code has to get a customer in this state. Richard's system uses an AS/400 back-end, and it simply is not possible to automate the setup of a customer in this state. The test code may need to be configured with a specific customer id, or it may be smart enough to search for a customer in the required state.

Over time, the customer data may no longer be available. For example, periodic data refreshes may remove or update the customer details.

Richard's not the only one with this problem - I'm also seeing it on a current project.

The problem of finding adequate test data is exacerbated when the test updates the state:
Given a customer has been in arrears for over 90 days,
when her invoices are paid in full,
then her status is changed to black.
In this case, the destructive change to the customer's state means that the data is no longer suitable for running this test. The test needs to find a different customer in arrears the next time it is run. Since debtors are a finite resource, the test may be unrunnable at some stage.

Why run these fragile end-to-end tests?
With Agile Acceptance Testing (ATDD, BDD etc), the focus is on testing business examples that will prove to the customer that a feature is "done". Running the tests end-to-end provides the greatest assurance to the customer that the functional requirements are being met, and reduces the need for manual regression testing.

Depending on the project, it may be possible to implement these tests "under the covers" of the user interface, or using mocks for back-end functionality. We often use these approaches to drive the design of our code, possibly before the user interface or back-end are available. However, these approaches don't provide the full benefits that we get from end-to-end tests.

"Unrunnable" test result
In the past, I have tackled this issue by making assertions on the Given clause of the tests. If the test pre-conditions are not met, the test results in a failure.

The proposal made at CITCON is to introduce a new "Unrunnable" test result state. This state is neither success nor failure. The discussion led to introducing a new colour for this state, to differentiate it from red (failure) and green (success).

Triaging test results
Extending the idea, it would be useful to be able to triage test failures into user-defined categories. Depending on the nature of the failure (and possibly the severity) the failure would be assigned a category.

The CI server would send failure notifications to a category-specific list. For example, system failures would be notified to sys ops, data issues to testers, and code issues to developers.

Each category would be displayed with a different failure colour, allowing the causes of test failures to be tracked over time.

For some categories, for example server errors, it may not be worthwhile continuing with the test run. The test runner could potentially be configured to abort the test run depending on the category of the failure.

Comparison with Pending state
Many BDD and ATDD tools already model a separate Pending or Unimplemented state - displayed in yellow (Cucumber), or grey (Concordion). The pending state can be viewed as one of these test failure conditions ("code unavailable").

A test could be annotated as follows:
@Triage(nature="SERVER_ERROR", exception=HostUnavailableException.class)
public class ArrearsFixture {
    public Customer findCustomerInArrears(Condition condition) {
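Since no framework implements this yet, here is a hypothetical sketch of how such a @Triage annotation could be defined and consulted at runtime. Every name here (Triage, HostUnavailableException, the classify method, the "CODE_ERROR" fallback category) is invented for illustration:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class TriageSketch {

    // Declares which exception type maps to which failure category.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Triage {
        String nature();
        Class<? extends Throwable> exception();
    }

    // Stand-in for a back-end connectivity failure.
    public static class HostUnavailableException extends RuntimeException {}

    @Triage(nature = "SERVER_ERROR", exception = HostUnavailableException.class)
    public static class ArrearsFixture {}

    // A test runner could classify a failure by matching the thrown
    // exception against the fixture's @Triage declaration, falling back
    // to an ordinary code failure when nothing matches.
    public static String classify(Class<?> fixture, Throwable failure) {
        Triage triage = fixture.getAnnotation(Triage.class);
        if (triage != null && triage.exception().isInstance(failure)) {
            return triage.nature();
        }
        return "CODE_ERROR";
    }

    public static void main(String[] args) {
        System.out.println(classify(ArrearsFixture.class,
                new HostUnavailableException())); // SERVER_ERROR
        System.out.println(classify(ArrearsFixture.class,
                new AssertionError()));           // CODE_ERROR
    }
}
```

The CI server would then route notifications and choose the report colour based on the returned category.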

On Hudson, test failures might show up as:

clearly showing that there is an ongoing issue with server errors, impacting the team's ability to adequately test the system. Intermittent data-related issues are also causing some tests to be unrunnable.

I'm not aware of any test tools/frameworks currently offering this capability. Does anyone know of anything similar?

1. This topic was discussed on the Illegal Argument list. I liked Mark Derricutt's point:
It's my thought that the finer grained reporting you CAN get the better,
whether you make use of it depends on the project and problem space.

After all - Exception is good enough to be thrown for all errors right?
IllegalArgumentException, IOException, FileNotFoundException are all rather
"controversial, mean many things to many people, and cause inconsistency and
confusion" - but we need that differentiation of exceptions to separate out
a chain of responsibility.

We know this in our code, but I think we also need this for our