Tester C develops software on a number of different platforms and usually has many projects running simultaneously. Being short of time, he would like to rely on a single checker, but occasionally uses a second tool if something "smells fishy". Even though he is a developer through and through, he knows to appreciate the comfort of a well-designed graphical user interface. He is also constantly looking for new tools and methods, and is thus rarely satisfied with the existing ones.
My task was to take a look at three different accessibility checking tools and see what sort of issues they found among a set of popular Norwegian web sites. The tools I used were:

- GADT
- AChecker
- SortSite
Each tool has a different interface. GADT ideally should be inserted at the beginning of the page, but it can also be run from the command line. AChecker is run on a website very similar to W3C's web validators and allows you to enter a URL, upload an HTML file, or simply paste the markup. SortSite is a desktop application that takes a URL.
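Assuming the GADT referred to here is Google's Accessibility Developer Tools library, the in-page workflow can be sketched roughly as follows (the script path is an assumption; `axs.Audit.run` and `axs.Audit.createReport` are the library's documented entry points):

```html
<!-- Load the GADT audit library into the page under test. The src
     path is an assumption; point it at wherever the built file lives. -->
<script src="axs_testing.js"></script>
<script>
  // Run all audit rules against the current document and print a
  // human-readable report to the browser console.
  var results = axs.Audit.run();
  console.log(axs.Audit.createReport(results));
</script>
```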
The reports that are generated are very different. GADT generates output on the console that you can save for your own reports. AChecker provides a detailed report that can be downloaded in a variety of formats. SortSite creates a proprietary report in a file, but it is a very detailed report and also includes a capture of the entire page (pictures and all) for later reference.
Since each tool has its own way of starting the tests, I concentrated on one tool at a time: running all the pages through it, then switching to the next tool. After all of this was completed, I compared the results to see what was similar and whether there were any false positives or other items of note.
The table below lists each page and, for each tool, how many errors were found, with warnings in parentheses. Note that AChecker's "warnings" are taken from the "Likely Problems" section of its report; I'll cover AChecker's "Potential Problems" section later. SortSite doesn't have a concept of errors versus warnings, but flags "issues" in accordance with WCAG guideline levels, so I counted all of its issues as errors. However, SortSite counts categories as issues (providing specific points in the report for drilling down afterwards), while AChecker counts each error in the markup. Additionally, SortSite stops counting occurrences after finding five issues and just lists additional occurrences as non-clickable ellipses.
| Page | AChecker errors (warnings) | GADT errors (warnings) | SortSite issues |
|------|----------------------------|------------------------|-----------------|
| Finn homepage | 9 (0) | 0 (10) | 7 |
| Finn torget page | 18 (0) | 1 (15) | 6 |
| Finn search results for “iPhone” | 142 (0) | 1 (16) | 10 |
| Norwegian.com homepage (in English) | 31 (0) | 5 (17) | 8 |
| Search results for a plane from Oslo to Stockholm | 37 (8) | 5 (5) | 9 |
| Help page for hand baggage | 5 (1) | 1 (6) | 4 |
| Yr homepage (in English) | 4 (0) | 0 (11) | 5 |
| Forecast for Tromsø | 0 (0) | 0 (16) | 11 |
| Hour-by-hour forecast for Tromsø | 0 (0) | 0 (11) | 7 |
| Ruter homepage | 52 (0) | 2 (14) | 13 |
| Travel times for Oslo central station to Majorstua | 0 (0) | 0 (15) | 14 |
| Map for the area around the Majorstua stop | 45 (1) | 0 (14) | 0 |
| NRK homepage | 37 (23) | 0 (10) | 8 |
| TV page | 7 (0) | 0 (8) | 4 |
| Nytt på nytt player page | 9 (0) | 0 (10) | 6 |
A couple of pages had problems with the different tools. First, SortSite respects a site's robots.txt. Two of the pages from Finn are marked to be ignored by tools. I worked around this by downloading a copy of each page and running the checks locally. Given that SortSite is designed to crawl an entire site, it makes sense for it to respect robots.txt, but it would be nice if this behavior could be overridden when checking a single page.
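For reference, the kind of exclusion rule that trips SortSite up looks like this in a site's robots.txt (the path is invented for illustration):

```
# Ask all crawlers to skip the listed path; SortSite honors these rules.
User-agent: *
Disallow: /some-excluded-page.html
```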
Second, NRK's TV player uses Flash on desktop systems, but an HTML5-based solution on mobile devices. This check is done by inspecting the user-agent string and changing the layout code accordingly. It arguably should check for the absence of Flash when deciding what to serve.
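The feature-detection alternative could be sketched roughly like this (the function names are mine, and the browser's `navigator` object is passed in as a parameter purely to keep the sketch testable; this is not NRK's actual code):

```javascript
// Decide which video player to serve by checking for Flash support
// rather than parsing the user-agent string. `nav` stands in for the
// browser's `navigator` object; the helper names are hypothetical.
function hasFlash(nav) {
  // Flash-capable browsers register its MIME type in navigator.mimeTypes.
  return Boolean(nav.mimeTypes && nav.mimeTypes['application/x-shockwave-flash']);
}

function pickPlayer(nav) {
  // Fall back to the HTML5 player whenever Flash is absent,
  // regardless of whether the device is mobile or desktop.
  return hasFlash(nav) ? 'flash' : 'html5';
}
```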
The table shows quite a difference among the tools, though SortSite and AChecker are closer than they would appear due to the different ways that they count errors. Unfortunately, no single page made it through all the test runs without having some errors flagged. Overall, the errors reported by the tools were genuine errors (at least in the coding sense). Many common errors (for example, an image missing alt text or form items missing labels) were faithfully flagged by both AChecker and SortSite. GADT would only sometimes catch these as errors; at other times it would report them as warnings. For example, it rightly caught missing alt text on the front page of Finn.no, but missed it on the Nytt på nytt page. On the Norwegian.com homepage it decided to mark missing images only as warnings. Looking at the description for the rule, I'm not quite sure what would make the difference between an error and a warning.
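The two common error classes mentioned above look something like this in markup (constructed examples, not taken from the tested sites):

```html
<!-- Flagged: an image with no alt attribute, and an input with no label. -->
<img src="logo.png">
<input type="text" name="q">

<!-- Fixed: a text alternative for the image, and a label explicitly
     associated with the control via matching for/id attributes. -->
<img src="logo.png" alt="Company logo">
<label for="q">Search</label>
<input type="text" name="q" id="q">
```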
As mentioned, these were errors in the coding sense, but some of them were benign. For example, SortSite got confused about the accesskey attribute for some controls on NRK's site. On closer inspection, this was part of the site's responsive design, and the controls with the same accesskey would never be shown on screen at the same time. While there is no way of providing assertions in the markup, one can choose to ignore individual rules in SortSite. This can be dangerous, as it may also ignore a different, genuine accesskey error, but it would allow testers to get rid of noise on an otherwise compliant page. Thankfully, SortSite's reporting mechanism makes it straightforward to jump to the area where the code doesn't work. Neither AChecker nor GADT ran into this issue.
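A benign duplicate of the kind described above might look like this (a constructed example, not NRK's actual markup): two controls share an accesskey, but CSS media queries ensure only one is ever displayed at a time.

```html
<style>
  /* Only one of the two navigation buttons is visible at a time. */
  .desktop-only { display: none; }
  @media (min-width: 768px) {
    .desktop-only { display: block; }
    .mobile-only  { display: none; }
  }
</style>

<!-- Both buttons use accesskey="m", which a checker flags as a
     duplicate even though the two are never shown together. -->
<button class="mobile-only" accesskey="m">Menu</button>
<button class="desktop-only" accesskey="m">Menu</button>
```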
Each tool had its own way of presenting its results, and each had its own advantages and disadvantages. SortSite's was the most advanced. It provided a summary page from which one could navigate to reports for accessibility, compatibility, standards, and so on. Each report was then further divided into tests that linked to the WCAG 2.0 failure criteria and to the location of the error in the file, and provided options for enabling or disabling the rule. The link to the file location was further marked up to show the error in the context of the file and to report how different assistive technologies (AT) would behave when encountering it.
AChecker provided a nicely formatted document for its report. It includes several sections: known problems, likely problems, and potential problems. On AChecker's website, you also get the opportunity to validate the markup and navigate to the errors in it. In the PDF version of the report, only the line and column number are included, along with a section of that line. Each error is grouped under a specific criterion, then under the checks for that criterion, followed by the error itself. Overall, this works well. The known problems and likely problems are definitely worth looking at, but the section on potential problems is very daunting. It appears to apply many pedantic checks to the page (for example, if a page is encoded as UTF-8, you will see many warnings that "Unicode right-to-left marks or left-to-right marks may be required"). For none of the tested web pages would this be an issue. The problem is that these pedantic errors cover up other things that are genuinely worth checking (for example, that a table may require TH elements). Most of the pages checked with AChecker had at least 300 "potential" problems. I personally like the idea of having lots of extra checks, but it would be good to be able to tune which potential problems you can safely ignore.
The most primitive was GADT: it dumped its output to the command line, and instead of providing a line-number location, it provided a collapsed branch of the DOM tree. While this feels very primitive compared to the other methods above, it probably works better for the tool's intended audience. GADT looks at ARIA, which means a page may consist of many different bits that are generated by scripts. Providing the DOM elements (especially classes and IDs) helps the developer track an error down among the different files.
There's no final answer to this. GADT certainly looked the worst of the group, but that's because it is targeted at checking ARIA compliance. There is overlap with WCAG, but they are essentially different things. The warnings produced by GADT would be very helpful if one were creating an application instead of a classic page. It is also a much better fit for the developer, as it can be scripted into the development and production process. But if I were checking only against WCAG, I would definitely look at the other tools.
This leads me to AChecker and SortSite. Both do a decent job of flagging errors, and I think one could use either to start addressing WCAG issues on a website. SortSite provides the nicer front end and a good basis for checking accessibility (and many other things). It is also amazing to just pass it a URL and have it crawl an entire site. But it does require you to run Windows, which may be an obstacle for some of us. It would be interesting to be able to view its reports on other platforms.
This leaves a platform-agnostic developer to look closer at AChecker. It certainly works well for flagging errors. Plus, one could spend a long time on the potential problems and get the feeling of going through a long battery of tests. Amazingly, the weather forecasts on Yr passed these tests, which makes me think that the developers there might be checking their work against it.
However, if I got no errors from either AChecker or SortSite, I would immediately run the page through the other tool to see if there are further issues. While most of the errors are detected easily by both, each still produces different warnings that one can explore, and these may also be issues worth looking at. It seems we have still not found a complete checker, but the ones examined here all appear to report genuine problems.