Last week when I commented on Joe Clark's PDF accessibility article, I didn't realize he might be listening. (Small world, huh? I commented on a blog in Sweden and Joe in Toronto asked for a comment from me in Austin.) It appears, however, that he was listening.
Mr. Clark asked if I would elaborate on what guidelines I would recommend for PDF usage. In particular, I had stated that his list of fourteen conditions where PDF files would be appropriate was, in my opinion, hmm... what did I say... somewhat too lenient.
Since I need to address Mr. Clark's conditions and since he was nice enough to number them, I'll repeat them here for reference:
But if your document is one of the following, PDF may be fine:
- Footnoted, endnoted, or sidenoted, since there is no way to mark up any of those structures in HTML. (You can use a hack like sub or sup for the footnote reference, but there are no footnote, endnote, sidenote, or even note elements. That hack may be adequate for simple footnoted documents, but try rendering David Foster Wallace’s footnotes-within-footnotes in HTML 4.)
- An interactive form, since PDF interactivity can do more than HTML can. (Use with caution and only if HTML really cannot do what you want.) For examples, check Jeremy Tankard’s order forms, especially for TypeBookOne (PDF).
- A multimedia presentation, since later versions of PDF can truly embed multimedia rather than simply refer to or call multimedia, as HTML does. (Same warning as above.) PDF multimedia can include captions and/or audio descriptions.
- Combined accessible and inaccessible versions. A typical case is a scan of a historical document that also includes live text. (You really need that live text. The Smoking Gun’s scanned court documents wouldn’t pass muster here.) Another example – one that is legal in Canada under a copyright exemption – is a sign-language translation inside or alongside a written text or audio recording.
- Custom-crafted solely for printing. I really mean that, and not a document so badly designed that people have no choice but to print it out because reading onscreen is so tedious. Your service-bureau files, if they are on the web at all, can stay PDFs.
- Designed for annotation and round-trip travel: If you’re posting something to elicit comments, which are then sent back to you, PDF has useful structures that HTML doesn’t.
- A type specimen, which are all but impossible to create in HTML, unless the specimen involved is a “typeface” like Arial.
- A sample of a format that cannot be rendered in a browser (e.g, Illustrator or Photoshop documents) or can only be rendered unsatisfactorily (CAD drawings where GIF and JPEG don’t have enough resolution). (In theory you could use SVG for CAD, but SVG remains mostly theoretical, doesn’t it?) This case also includes PDF files meant as samples of PDF files.
- A record of a document’s state at a specific moment. In this context, PDF is useful as a preservation format even for HTML web pages.
- A document in a language whose script has no satisfactory support in web browsers. This example must be used with caution: In 2005, there aren’t many “minority” languages that cannot be rendered in a browser. Perhaps this case must be limited to scripts that have not been encompassed by Unicode (of which there are several). This can also be a subset of the type-sample case if your PDF is meant as an illustration or documentation of the writing system used by a language.
- Mathematical, since even MathML cannot render certain notations.
- Documents with a legally restricted format, like U.S. tax forms.
- Documents with digital rights management, which everybody hates and which has likely accessibility barriers. (The use of 128-bit encryption with PDF is compatible with screen readers.)
- Multicolumnar, particularly if figures and illustrations are included, since multicolumn web layouts are a mere hack and are unreliable as a method of reproducing print layouts. (Your multicolumn document should be HTML if it is presented that way merely to save paper and it can work as a single column. It can be difficult to distinguish that case from a document that is structurally multicolumnar, and this category is somewhat iffy.)
Four of Mr. Clark's conditions seem to be irrefutable:
- #5 a document solely for printing. Mr. Clark points out, and rightly so, that there are some things that were meant to be printed and don't translate well (or well enough) to HTML [note a].
- #12 legally restricted format. This, in my mind, is a subset of #5. The legal layouts are more than likely described by what they look like when printed.
- #8 cannot be rendered in a browser. This is a category I call Can't Do That. HTML does a lot of things, but there are many things it cannot do. In those instances, don't try to get blood from a turnip -- let an appropriate operating system application handle the job. This is not specific to PDF readers; there are myriad documents out there (spreadsheets, for instance) that HTML was never designed to handle and that must be viewed by an operating system application.
- #13 digital rights management. While technically different from #8, basically, this is something that HTML and browsers aren't currently designed to handle and also falls into my broad Can't Do That category.
Mr. Clark brings up some esoteric situations that I lump into my broader category Can't Do That. Kudos and extra creativity points to him for these:
- #3 multimedia presentation. I think this is generic-speak for "PowerPoint presentation." HTML can't do that (subset of #8) but Eric Meyer's S5 can handle many presentations within the web browser.
- #4 live text over historical document. Now that's something I've not encountered in, whew, a couple weeks at least. HTML can't do that (subset of #8).
- #7 type specimen. I would guess that only a very small percentage of the internet population is familiar with the OpenType (or other type) specification, let alone posts specimens on a website. (subset of #8)
- #9 record of a document's state. Document versioning -- now there's something else HTML can't do. (subset of #8)
- #10 A document in a language whose script has no satisfactory support in web browsers. I refrained from about three different snide remarks here. In fairness to Mr. Clark, this is something that W3C committees deal with, and if I'm not mistaken he falls into that category. But really, isn't this just another example of Can't Do That? (subset of #8)
PDF Is No Panacea
I'm in agreement with nine of fourteen consideration points. But I would like to clarify and distinguish my recommendations in this way: When browsers can't render the required document, PDF and PDF Viewers may fall short as well, despite enhancements and accessibility features.
Look, there's an endless number of things that HTML and browsers can't do. The same could be said about PDF and PDF Viewers.
Specialized applications may provide the same or better support for certain file types or document features. Document versioning, for instance, is usually associated with a small number of interested parties and if they feel more comfortable with a particular vendor's word processing program's versioning capabilities, then PDF would not be my recommended solution. Users might prefer to post and transfer AutoCAD files as AutoCAD files rather than PDF files. If the target audience is expected to already have the source application, then translation into PDF does not serve the purpose of the website.
Mr. Clark may indeed agree with me on this point since his list of guidelines begins with the statement...
But if your document is one of the following, PDF may be fine:
...(my emphasis). His article, however, is silent on other applications.
Enter the User
So far, I've addressed points based on technology limitations. HTML can't handle certain situations, so in those cases PDF (or some other file format) is appropriate. How, then, do we decide whether PDF or some other file format is better? Consider the target audience.
Usability should be at least as great a concern as accessibility, yet it seems absent from Mr. Clark's article.
Mr. Clark seems to ignore the point that PDF Viewers disrupt the website visit. Designers need to weigh the trade-off between placing content in PDF format and altering the content so that HTML provides a similar experience without the disruption.
As was pointed out by Jakob Nielsen:
Users are easily confused when websites link them to non-Web documents that offer a significantly different user experience than that of browsing Web pages.
This disruption, this movement away from the web browser, comes at a cost. Web designers -- people who make websites -- need to consider this tradeoff and make a conscious decision to impair usability for the sake of some other overriding factor. So far, the factors have been that the content was intended for printing, or that the type of content cannot be handled by HTML.
I call into question these situations where Mr. Clark recommends PDF. By and large these scenarios are situations where (1) HTML can closely approximate the PDF feature or (2) some other application may handle the format better.
- #1 footnoted, sidenoted. The presence of a footnote in a document should not force it to be presented as a PDF. HTML "hacks", as Mr. Clark calls them, do provide superscripting. And who said footnote references had to be superscripted numbers? HTML offers other unobtrusive ways to cite sources. If a document is so footnoted and annotated that HTML simply can't handle it, then it is probably too long and intensive for reading on the internet; i.e., it is intended for printing (#5). HTML is fine for moderately footnoted documents, which don't warrant PDF conversion.
- #2 interactive form. I took a look at Mr. Clark's referenced PDF of an interactive form which, although nice, is nothing special and certainly nothing that HTML and modern browsers can't handle. Input highlighting happens all the time without the need for PDF Viewers, and if Mr. Clark would like to see some fancy specialized layout, Southwest Airlines allows passengers to print their own boarding passes -- barcodes and all -- without PDF intervention. The interactive nature (if there is one that I'm missing) that Mr. Clark seems fond of is not worth the disruption in website usability that it requires.
- #3 multimedia presentation. I've already addressed and agreed that PDF is an accessible alternative to the proprietary PowerPoint file format. However, the word "multimedia" could be interpreted to mean any number of video formats. For cases when video is presented, I would suggest launching the user's media viewer rather than embedding the video in PDF and having the user launch a PDF Viewer. Users are most likely as familiar with their media players as their PDF viewers so consideration must be made as to what is best for the user.
- #6 annotated text for round-trip travel. This doesn't sound like a website situation at all. This sounds like email collaboration, in which case I would recommend letting users decide on what application they are most comfortable with.
- #11 math formulas. Since this is definitely something that HTML (or, as Mr. Clark points out, MathML) can't handle and PDF can, why would I object? Bullet point #11 seems to come from the PDF Working Group Marketing Department. "Hey, here's something only PDF can handle so anyone with math formulas should use PDF!" I disagree. I'm not sure how accessibility features in PDF make complicated math formulas -- ones so complicated that MathML can't render them -- more accessible than HTML. I may have a learning experience coming my way on this point, but it seems that a JPEG of the complicated formula along with the textual representation -- however complicated it may be -- serves accessibility needs in a manner least disruptive to the website experience.
- #14 multicolumnar output. Well, here's where typography and usability collide head-on, isn't it? Multicolumnar output -- text that automatically flows from the bottom of a column to the top of another column -- is one hundred percent print biased. The Web is not a book. I would never recommend multicolumnar layouts for web content. Even for online PDF viewing this is not user-friendly. If the content is intended for internet viewing, reformat your content to a continuous column. If it is intended purely for print, then this is a subset of #5.
Since PDF files disrupt and confuse the website visitor, they must be used only after careful consideration and as a last resort.
If HTML can provide a close approximation of the typography required, then I recommend it over PDF. When HTML cannot handle the situation and some other application is required, the web designer should choose between PDF Viewers and other applications, taking both usability and accessibility into consideration.
This may mean additional work on the part of the designer -- taking existing print-friendly documents and reformatting them to more web-friendly formats, for instance. The gains in user experience offset browser limitations or alternate viewer features.
My Content Guidelines
As a summary of my commentary, here is my stab at content guidelines:
- HTML: If content can be displayed via HTML in a web browser, then do it. If it needs some work to make it browser-friendly, do it. Avoid the next two bullet points whenever possible.
- For Print: Use PDF when the intended purpose of the document is to be read from the printed page.
- Can't Do That: When HTML does not provide the technical support for a feature (digital rights management, presentation slides) or cannot approximate the typography requirements in an acceptable manner (unsupported character set), use an alternate document format -- PDF or other format -- that provides the best website visitor experience, taking usability and accessibility into consideration.
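For the programmers in the audience, the three guidelines above can be read as a small decision procedure. What follows is only a rough sketch in Python, not a definitive rule engine. The `Content` class, its two fields, and the `recommend_format` function are all names I made up for illustration, and checking HTML first is my assumption, based on the order of the bullets.

```python
from dataclasses import dataclass


@dataclass
class Content:
    """Hypothetical description of a piece of website content.

    Both field names are illustrative assumptions, not part of any real tool.
    """
    html_can_display: bool    # HTML can show it acceptably, even if it needs rework
    intended_for_print: bool  # its purpose is to be read from the printed page


def recommend_format(content: Content) -> str:
    """Apply the three content guidelines in the order they are listed."""
    # Guideline 1 (HTML): if a browser can display it, use HTML.
    if content.html_can_display:
        return "HTML"
    # Guideline 2 (For Print): documents meant for the printed page go to PDF.
    if content.intended_for_print:
        return "PDF"
    # Guideline 3 (Can't Do That): HTML lacks the feature, so choose whichever
    # alternate format gives the visitor the best experience. That choice is a
    # judgment call about usability and accessibility, so it is not automated here.
    return "alternate format"


if __name__ == "__main__":
    print(recommend_format(Content(html_can_display=True, intended_for_print=False)))
    print(recommend_format(Content(html_can_display=False, intended_for_print=True)))
    print(recommend_format(Content(html_can_display=False, intended_for_print=False)))
```

The last branch deliberately returns a placeholder rather than a specific format. Picking between PDF and some other application depends on the target audience, and I don't think that weighing can be reduced to a flag.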
I agree with Mr. Clark on the bulk of his recommendations. I appreciate learning of the accessibility features that PDF provides. I will feel more comfortable posting PDF format files in the future when appropriate. There are just a couple instances where I believe serving up browser-friendly alternatives is worth maintaining user experience continuity.
[a] "HTML" within the context of my commentary, is to be interpreted as "HTML, XHTML, or other format natively supported by modern web browsers."