Assessing the accessibility of your backlist with Readium CLI
Discover how the Readium CLI can help you to extract metadata and prepare to remediate your EPUB backlist.
Readium CLI is a Swiss Army knife for your publications built on top of the Go toolkit.
In this article, we'll explore how publishers, distributors and aggregators can use it to assess the accessibility of their EPUB backlist and prepare for further remediation.
Extracting metadata
Readium CLI is completely EPUB version-agnostic, which means that you'll always receive a homogenized output that you can use in your workflows.
Once installed, you can call it using the manifest command:
readium manifest publication.epub
This will output a JSON document based on the Readium Web Publication Manifest.
The manifest contains information that you would usually find in an OPF in EPUB (metadata, list of files contained in the publication) along with lists (table of contents, print page list) extracted from NCX (EPUB 2.x) or a Navigation Document (EPUB 3.x).
readium manifest --infer-a11y=split moby-dick.epub | jq .metadata
Using jq to filter the output and display the metadata of a publication
{
"conformsTo": "https://readium.org/webpub-manifest/profiles/epub",
"title": "Moby-Dick",
"author": {
"name": "Herman Melville",
"sortAs": "Melville, Herman"
},
"identifier": "code.google.com.epub-samples.moby-dick-basic",
"language": "en",
"modified": "2012-01-18T12:47:00Z"
}
Metadata for Moby Dick
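When you process a whole backlist, it can be convenient to call the CLI from a script rather than piping each file through jq by hand. The following is a minimal Python sketch, not part of the Readium tooling: it assumes the `readium` binary is on your PATH, and the helper names (`load_manifest`, `contributor_names`, `summarize`) are made up for illustration. It also assumes that a contributor such as `author` may be a plain string, an object with a `name`, or a list of either.

```python
import json
import subprocess


def load_manifest(epub_path):
    """Run `readium manifest` on one EPUB and parse the JSON it prints."""
    # Assumption: `readium` is installed and on PATH.
    # check=True raises CalledProcessError if the command fails.
    result = subprocess.run(
        ["readium", "manifest", epub_path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def contributor_names(value):
    """Normalize a contributor field to a list of names."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    names = []
    for item in items:
        if isinstance(item, str):
            names.append(item)
        elif isinstance(item, dict) and "name" in item:
            names.append(item["name"])  # kept as-is, whatever its type
    return names


def summarize(manifest):
    """Pick out the few metadata fields needed for a backlist overview."""
    metadata = manifest.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "authors": contributor_names(metadata.get("author")),
        "language": metadata.get("language"),
        "has_accessibility": "accessibility" in metadata,
    }
```

Calling `summarize(load_manifest("moby-dick.epub"))` on each file would give you one small record per publication that you can collect into a spreadsheet or database.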
If the publication contains any accessibility metadata, it will be parsed and displayed under accessibility in metadata.
{
"accessibility": {
"conformsTo": [
"http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa"
],
"certification": {
"certifiedBy": "Matt Garrish"
},
"summary": "This EPUB Publication meets the requirements of the EPUB Accessibility specification with conformance to WCAG 2.0 Level AA. The publication is screen reader friendly.",
"accessMode": [
"textual",
"visual"
],
"accessModeSufficient": [
[
"textual",
"visual"
],
[
"textual"
]
],
"feature": [
"tableOfContents",
"readingOrder",
"alternativeText"
],
"hazard": [
"none"
]
}
}
Accessibility metadata from an EPUB 3 file
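Once the accessibility object is available, a quick first pass is to check which of its fields are present at all. The sketch below is only an illustration: the list of expected fields is simply the set of keys shown in the example above, not an official checklist, and the function names are hypothetical. It also assumes that an entry in `accessModeSufficient` may be either a list or a single string.

```python
# Keys taken from the example output above; adjust to your own requirements.
EXPECTED_FIELDS = (
    "conformsTo",
    "certification",
    "summary",
    "accessMode",
    "accessModeSufficient",
    "feature",
    "hazard",
)


def missing_fields(accessibility):
    """Return the expected keys that are absent or empty."""
    return [key for key in EXPECTED_FIELDS if not accessibility.get(key)]


def is_text_sufficient(accessibility):
    """True if the publication declares that text alone is sufficient."""
    for combo in accessibility.get("accessModeSufficient", []):
        if combo == "textual" or combo == ["textual"]:
            return True
    return False
```

A publication with an empty result from `missing_fields` is not necessarily accessible, but one with many missing fields is a good candidate for a closer look.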
Inferring accessibility metadata
Many EPUB files have inherent accessibility features that are not documented in their metadata.
The Readium CLI is capable of inferring a number of them when it is called together with the --infer-a11y flag.
readium manifest --infer-a11y=split publication.epub
To avoid mixing things up, we recommend using the split option, which keeps inferred metadata separate and lists it under https://readium.org/webpub-manifest#inferredAccessibility.
This won't magically turn inaccessible EPUBs into fully accessible ones, but it is very useful for detecting baseline accessibility features in your files.
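With the split option, one useful report is the list of features that were inferred but never declared, since those are cheap wins for remediation. The sketch below makes two assumptions that you should verify against real output: that the inferred object sits inside metadata under the key mentioned above, and that it has the same shape as accessibility (in particular a `feature` list). The function name is hypothetical.

```python
INFERRED_KEY = "https://readium.org/webpub-manifest#inferredAccessibility"


def undocumented_features(metadata):
    """Features the CLI inferred that the publisher did not declare."""
    # Assumption: both objects expose a "feature" list of strings.
    declared = set(metadata.get("accessibility", {}).get("feature", []))
    inferred = set(metadata.get(INFERRED_KEY, {}).get("feature", []))
    return sorted(inferred - declared)
```

For a file that declares only `tableOfContents` but for which `readingOrder` is also inferred, this would return `["readingOrder"]`, which could then be added to the publication's own metadata.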
Preparing for image remediation
Images can be the most challenging part of remediating your content, since each one either requires alt text or needs to be properly identified as decorative.
To prepare for this task, you can extract additional info with the --inspect-images flag.
readium manifest --inspect-images publication.epub
This will automatically enhance the output with the following info:
- height and width in pixels
- size in bytes
- whether the image is animated
- and a SHA-256 hash of the file
Looking across your backlist, you might identify images that are repeated in multiple publications, such as logos.
To further help with this task, it's also possible to generate a perceptual hash using --hash=phash-dct.
While a cryptographic hash like SHA-256 is faster to compute, it will only match files that are byte-for-byte identical, whereas a perceptual hash will remain stable across variations of the same image (for example using a different format).
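To see why a cryptographic hash only catches exact copies, here is a small Python illustration. It is not how the CLI computes its hashes; it only assumes that the SHA-256 values shown in this article are standard Base64 encodings of the 32-byte digest, which is what their length suggests. The byte strings stand in for two image files.

```python
import base64
import hashlib


def sha256_base64(data):
    """SHA-256 digest of some bytes, encoded as Base64 text."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Two stand-ins for "the same logo" saved twice with a one-byte difference.
original = b"logo-bytes-v1"
resaved = b"logo-bytes-v2"

same_file_matches = sha256_base64(original) == sha256_base64(original)
variant_matches = sha256_base64(original) == sha256_base64(resaved)
```

Here `same_file_matches` is true and `variant_matches` is false: any change to the bytes, including a re-export in another format, produces an unrelated digest. That is the gap a perceptual hash is meant to cover.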
readium manifest --infer-a11y=split --inspect-images --hash=sha256,phash-dct publication.epub
Inferring metadata and inspecting images with a cryptographic and perceptual hash
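Finding repeated images across a backlist then comes down to grouping hashes by the publications they appear in. The sketch below deliberately does not assume where the hashes live in the CLI output: it takes plain `(publication, hash)` pairs that you would collect yourself from each manifest. The function name and the `min_publications` threshold are illustrative choices.

```python
from collections import defaultdict


def repeated_images(records, min_publications=2):
    """Map each hash to the publications it appears in, keeping only
    hashes seen in at least `min_publications` distinct publications."""
    publications_by_hash = defaultdict(set)
    for publication, image_hash in records:
        publications_by_hash[image_hash].add(publication)
    return {
        image_hash: sorted(publications)
        for image_hash, publications in publications_by_hash.items()
        if len(publications) >= min_publications
    }
```

Using a set per hash means an image repeated many times inside a single book is not reported; only images shared between different publications, such as a publisher logo, make it into the result.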
Once you've identified repeated images that can be safely skipped, you can start using --infer-a11y-ignore-image-hashes to provide a list of hashes to ignore.
readium manifest --infer-a11y=split --infer-a11y-ignore-image-hashes=phash-dct:YzZTDc7IMzk=,sha256:EvaoUnJkxsWkMM0NUf4CwOZMMvEpDRKk7omCBSN67Gc= publication.epub
A distributor could, for example, allow each publisher to inspect repeated images and add them to an ignore list that way.
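If the ignore list is stored per publisher, the flag value can be assembled by a script. The sketch below follows the `algorithm:value` format, separated by commas, seen in the command above. The helper names are hypothetical, and the check only accepts the two algorithm names that appear in this article, which is an assumption rather than a statement about everything the CLI supports.

```python
KNOWN_ALGORITHMS = {"sha256", "phash-dct"}  # the two names used in this article


def ignore_hashes_flag(hashes):
    """Build the --infer-a11y-ignore-image-hashes flag from (algorithm, value) pairs."""
    parts = []
    for algorithm, value in hashes:
        if algorithm not in KNOWN_ALGORITHMS:
            raise ValueError(f"unexpected hash algorithm: {algorithm}")
        parts.append(f"{algorithm}:{value}")
    if not parts:
        raise ValueError("at least one hash is required")
    return "--infer-a11y-ignore-image-hashes=" + ",".join(parts)


def manifest_command(epub_path, hashes):
    """Argument list suitable for subprocess.run()."""
    return [
        "readium", "manifest",
        "--infer-a11y=split",
        ignore_hashes_flag(hashes),
        epub_path,
    ]
```

Passing the command as a list of arguments rather than a single string avoids shell quoting issues with file names that contain spaces.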
A future update will also introduce the ability to extract even more information, such as the alt text and whether the image is properly marked as decorative using an ARIA role.
