Efficient Web Accessibility Testing — Tester B's report

Summary

Tester B does not trust the tools but rather uses them as a supplement to her own experience. Her experience shows that real issues may go unflagged, and that reported issues may be invalid. She therefore uses multiple tools to cross-check each other, but she also always inspects the markup and DOM with a suitable debugger. Sometimes she also uses screen reader software to see whether an issue is valid.

General comments

The pages were accessed on December 04, 2014, and are likely to have been changed since. A single page was tested with a number of tools simultaneously to get an impression of similarities and differences. I figured a developer would not want to verify the passes, so I looked at fails and "to verify" instead.

I was given the task to use the following tools:

Bitv accessibility evaluation

An explanation of what the Bitv self-assessment service is
Screenshot of information on a Bitv web page

While Bitv appears to be an active project according to its website, I tried to create an account multiple times without success. Another serious limitation is that the test itself is available in German only. Moreover, inspecting a demo version, the so-called self-assessment test, led me to think that this service is in fact nothing but a form to be filled out by the developer or tester, who in turn is supposed to use other accessibility testing tools in order to answer the form questions. This renders Bitv useless in my eyes, and I made no further attempts to use it.

The eGovMon Checker

Results page with number of passes and failures
Screenshot of eGovMon results page

The eGovMon service is a bit difficult to find on Tingtun's site. The service itself makes a neat and modern impression, with dynamic page content and quick response times, typically below 3 seconds for all tested pages. On the downside, it is unclear which standard is being tested against: is it Section 508, WCAG, or another relevant test suite?

The number of applied tests depends on the page concerned and varies greatly, between roughly 200 and almost 900. The individual tests are explained in sufficient detail, with descriptions and links to relevant standards. It should be noted, though, that the number of barriers counts every single occurrence, so fixing a single error on the page might remove multiple fails. In this little investigation, the barriers found are usually real barriers, also referred to as true positives. There are a few false negatives, meaning issues which are not flagged as such (misses), like the use of tables for layout purposes, and a few false positives, like a piece of inaccessible content which has visibility:hidden set. Another drawback is that the forward/backward navigation is buggy, as it interferes with the dynamic page content. Finally, it should be mentioned that the description of an error does not necessarily match its true cause. For instance, an input button with an empty value attribute might be described as "the input button has no name".
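
To make the last two points concrete, here is a minimal sketch of how such a check could report the actual cause and skip hidden content. It is not how eGovMon works internally; the class name EmptyButtonChecker, the function check, and the message texts are hypothetical. As a deliberate simplification, it only looks at the element's own inline style and ignores other sources of a label such as aria-label or title.

```python
from html.parser import HTMLParser


class EmptyButtonChecker(HTMLParser):
    """Collects input buttons without a usable label (hypothetical check)."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (line number, message)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() != "button":
            return
        # Simplification: only the element's own inline style is inspected;
        # stylesheets and inherited visibility are not taken into account.
        style = (attr.get("style") or "").replace(" ", "").lower()
        if "visibility:hidden" in style:
            return  # hidden content is skipped to avoid a false positive
        line, _ = self.getpos()
        if "value" not in attr:
            self.findings.append((line, "input button has no value attribute"))
        elif not (attr["value"] or "").strip():
            self.findings.append((line, "input button has an empty value attribute"))


def check(markup):
    """Return the findings for a piece of markup."""
    checker = EmptyButtonChecker()
    checker.feed(markup)
    checker.close()
    return checker.findings
```

With this sketch, a button with value="" and a button without any value attribute produce two different messages, and a button carrying style="visibility: hidden" is not reported at all.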

On the upside, it is informative to view markup extracts for quick inspection. It is furthermore advantageous that eGovMon stores the results for later visits.

The Cynthia Says Checker

Results page with passes, failures, non-applicable tests, and required inspections
Screenshot of Cynthia Says results page

Cynthia Says is much slower than eGovMon; a page check can take as long as 25 seconds. The testing scope is clear, as it can be chosen as a parameter along with the address of the page to check, whereas the number of applied tests is unknown. The barrier count is often low because failures of the same category are counted as one, which makes the result not comparable to eGovMon's.
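
The difference between the two counting schemes can be sketched as follows. The findings are made-up sample data, and the function names are hypothetical; the sketch only illustrates why a per-occurrence count and a per-category count of the same page cannot be compared directly.

```python
# Hypothetical findings for one page: (category, location) pairs.
findings = [
    ("missing alt text", "img#logo"),
    ("missing alt text", "img#banner"),
    ("missing alt text", "img#footer"),
    ("empty link text", "a#more"),
]


def count_per_occurrence(findings):
    """Every single occurrence counts as one barrier."""
    return len(findings)


def count_per_category(findings):
    """All failures of the same category count as one barrier."""
    return len({category for category, _ in findings})


print(count_per_occurrence(findings))  # 4
print(count_per_category(findings))    # 2
```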

As with eGovMon, the vast majority of issues Cynthia Says reports are true positives. However, there are a few false positives, like seemingly duplicate links which in fact are not duplicates, and false negatives, like a wrong format of the ID attribute.

Generally, the results page is difficult to interpret; it is unclear, for example, what needs to be fixed and what needs to be checked or verified again. Also, the markup snippet is a little too brief to be really helpful. On the bright side, Cynthia Says detected the use of layout tables, insufficient color contrasts, and absolute element dimensions, none of which were flagged by eGovMon.

Concluding remarks

Neither tool is perfect, as too many false positives occur. Neither tool is complete, as important issues are not flagged, which can be verified by using multiple tools and by manual inspection. The results can thus not be trusted. In particular, the checkers do not seem to be ready for HTML 5, as the types of false fails found reveal. The high number of tests in eGovMon is impressive, but it is inflated by the sheer number of "to verify" tests, which can be as high as 70%, or more than 500 tests, making verification infeasible in real-use situations.

The use of a combination of tools is advisable. Multiple tools can verify each other and find issues that a single tool misses. However, this of course comes with a time penalty. Fails cannot be trusted and should always be revisited. The testing tools should be made fit for HTML 5 and, if possible, should employ smarter methods to reduce the high number of "to verify" tests.
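
As a rough sketch of what combining tools could look like in practice, the following compares two reports and separates the findings both tools agree on from those only one tool flagged. It assumes the findings have already been normalized to comparable (category, location) pairs, which neither tool provides out of the box and which is likely the hard part; the function name compare_reports and the sample data are hypothetical.

```python
def compare_reports(report_a, report_b):
    """Split two collections of (category, location) findings
    into agreement and disagreement."""
    a, b = set(report_a), set(report_b)
    return {
        "both": a & b,    # flagged by both tools, more likely true positives
        "only_a": a - b,  # flagged by tool A only, needs manual inspection
        "only_b": b - a,  # flagged by tool B only, needs manual inspection
    }


# Made-up sample data for illustration.
tool_a = [("missing alt text", "img#logo"), ("empty button value", "input#go")]
tool_b = [("missing alt text", "img#logo"), ("layout table", "table#main")]

result = compare_reports(tool_a, tool_b)
```

Findings that appear in only one report are either misses of the other tool or false positives of the reporting tool, so they are the ones worth inspecting by hand first.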