Straddling Tables

...isn't some new extreme interrogation technique at gitmo[1]; it's just the title of a blog post about the commonalities between file-format features. The goal is to have one file, a Pilferpage, that can be dynamically converted into HTML, XUL, ODF, XSL-FO, Flex, CALS[2], DocBook, and CSV. Doing this means working out what can and cannot be converted between these formats. If one language doesn't support bold text, or hierarchical table rows, then this may affect the design of the unifying Pilferpage file format. First up, tables.

At a bare minimum, Tables consist of a flat list of rows containing cells. They don't necessarily contain headers or footers, cell spans, or hierarchical rows. As it turns out, however, once you add a few more features Tables become a generalised model for both DataGrids and TreeViews, so (for the purposes of comparing tables) we'll call all three of these Tables.

Hierarchical Rows


Treeview

In HTML, DocBook, XSL-FO, CSV, and ODF table rows are a one-dimensional list, whereas in XUL rows can be hierarchical (the Subject column, above). Hierarchical rows are a simple way of supporting multi-column TreeViews. The flat formats do not inherently support row hierarchy, but you can fake it through formatting (style padding or indenting characters). In HTML you can make this faux-hierarchy faux-interactive with JavaScript. So hierarchical rows can be represented, and they seem like a useful feature for Pilferpage.
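Here's a minimal sketch of the "fake it with indenting characters" idea. The Row class and the flatten function are names I've made up for illustration (Pilferpage defines neither), and it assumes every row has at least one cell.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Row:
    """Hypothetical hierarchical row: its cells plus any child rows."""
    cells: List[str]
    children: List["Row"] = field(default_factory=list)


def flatten(rows, depth=0, indent="  "):
    """Yield flat rows, prefixing the first cell with one indent per level."""
    for row in rows:
        first, *rest = row.cells  # assumes at least one cell per row
        yield [indent * depth + first, *rest]
        yield from flatten(row.children, depth + 1, indent)


tree = [
    Row(["Inbox", "3"], [Row(["Re: tables", "1"], [Row(["Re: Re: tables", "0"])])]),
    Row(["Sent", "2"]),
]
flat = list(flatten(tree))
```

The flat list can then go straight into CSV, or into HTML with the indent swapped for style padding.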

Hierarchical Tables


All of the markup-language-based formats (HTML, XSL-FO, ODF, DocBook) support hierarchical tables, which leaves CSV as the one that doesn't. For CSV you'd have to render the different tables one after another (or, more sensibly, as separate CSV files). Provided there is a way of extracting individual tables from the page (perhaps with some URL parameters), CSV could use hierarchical tables too, so again this seems like a useful feature, particularly for pivot tables and general data drilling.
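As a rough sketch of the "separate CSV files" approach: the function below walks a table whose cells may themselves be tables, and writes each table out as its own CSV text. The split_tables name, the "name-1" numbering, and the "see name-1.csv" placeholder left in the parent cell are all my own assumptions rather than anything Pilferpage specifies.

```python
import csv
import io


def split_tables(name, rows, out=None):
    """Return {table name: CSV text}; a nested table is a cell that is a list of rows."""
    out = {} if out is None else out
    flat_rows = []
    children = 0
    for row in rows:
        flat = []
        for cell in row:
            if isinstance(cell, list):
                # Nested table: emit it separately and leave a pointer behind.
                children += 1
                child = f"{name}-{children}"
                split_tables(child, cell, out)
                flat.append(f"see {child}.csv")
            else:
                flat.append(str(cell))
        flat_rows.append(flat)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(flat_rows)
    out[name] = buf.getvalue()
    return out


files = split_tables(
    "sales",
    [["Region", "Detail"], ["North", [["Month", "Units"], ["Jan", 10]]]],
)
```

Each key in files could then map onto a URL parameter that selects which table to download.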

Subtables


Subtables, however, are not useful. They are typically an awkward way of encoding cell-spanning and row-spanning by declaring that multiple nested tables should be treated as a single unified table. I believe that Subtables will be deprecated in ODF 1.2.

Pilferpage won't support them.

Headers and Footers


As well as providing visual cues, table headers and footers make cell data accessible to disabled people. A disabled person navigating to a cell that reads "11%" may not be able to easily glance up the column in order to understand that it's about "Elbow Growth", but by explicitly encoding headers the table can be made practically usable to these people. Another benefit of clearly articulating the table header/footer is that software can reuse the data more easily. Most DataGrids expect explicit column headings, for example.

Column Headers and Footers


HTML, ODF, XSL-FO and Flex all support basic single-level column headers.

HTML, ODF, and XSL-FO support single-level column footers.

HTML and ODF support multi-level column headers and footers.

Flex does not appear to support table footers, multiple column headers, multi-level headers or footers, or cell headers/footers.

CSV doesn't support table headers or footers. Some software infers headers by assuming that the first row contains them.

HTML uses table headings and footers to allow progressive loading of table data via the <thead>, <tbody>, and <tfoot> tags. The idea is that a browser may receive the table header, then the footer, and then the table body. The browser keeps filling out the table body as more and more data is streamed in. If browsers support this reliably then it would be useful to rearrange a Pilferpage in order to support this in HTML.
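A sketch of what that rearranging might look like: a generator that emits the header, then the footer, then body rows one at a time as they arrive. The stream_table function is hypothetical. Also, the footer-before-body ordering is the HTML 4 convention as I understand it; newer HTML specifications place <tfoot> after <tbody>, so check the HTML version you target.

```python
from html import escape


def stream_table(header, footer, body_rows):
    """Yield HTML chunks in header, footer, body order (HTML 4 style ordering)."""

    def tr(cells, tag):
        inner = "".join(f"<{tag}>{escape(str(c))}</{tag}>" for c in cells)
        return f"<tr>{inner}</tr>"

    yield "<table>"
    yield "<thead>" + tr(header, "th") + "</thead>"
    yield "<tfoot>" + tr(footer, "td") + "</tfoot>"
    yield "<tbody>"
    for row in body_rows:  # may be a slow iterator; each row is sent as it arrives
        yield tr(row, "td")
    yield "</tbody></table>"


chunks = list(
    stream_table(["Item", "Cost"], ["Total", "5"], iter([["Tea", "2"], ["Milk & sugar", "3"]]))
)
```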

Cell Headers and Footers


There's a difference between column headers and cell headers,

Cross section table

In the screenshot the yellow table cells mark a cross-section from a table that only shows the top-left and bottom-right portions. This technique is already popular, and to make such a table accessible its author couldn't use column headings; they'd need cell headings. So, by cell headings we mean that each cell references the appropriate heading cells rather than headings being implied by columns or rows.

HTML and ODF support cell headings. XSL-FO, CSV, and Flex do not.
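In HTML the mechanism for this is the headers attribute on a data cell, which lists the id values of the heading cells that describe it. A tiny sketch follows; the th and td helper functions and the id values are made up for illustration.

```python
from html import escape


def th(cell_id, text):
    """Render a heading cell that data cells can point at by id."""
    return f'<th id="{escape(cell_id, quote=True)}">{escape(text)}</th>'


def td(text, header_ids):
    """Render a data cell that names its heading cells explicitly."""
    ids = " ".join(header_ids)
    return f'<td headers="{escape(ids, quote=True)}">{escape(text)}</td>'


heading = th("elbow", "Elbow Growth")
cell = td("11%", ["elbow", "y2008"])
```

A screen reader landing on the "11%" cell can then look up "Elbow Growth" without relying on the cell's column position.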

Another example is the periodic table of elements, where its groupings could be expressed by cell headings. Note the background colours:

Periodic table of elements snippet

Heading Levels


Again, this is perhaps best described by a screenshot,

Multi-level Table Headings

The 'System' heading is a grouping of the 'Metal Parts' and 'Wood Parts' headings. As well as the cells being headings the relationship between these three cells is expressed by encoding heading levels.

HTML and ODF support hierarchical cell headings by way of putting textual headings inside cell headings. XSL-FO, Flex, and CSV do not.

It seems that an ambiguity can occur when the heading levels are encoded using conventional text headings: if, in HTML, one cell heading contains H1 and H3 and another cell heading contains H2, then in which order should a screen-reader speak the headings? Because of this it seems that it would be better to encode heading levels per-cell, perhaps with a heading-level attribute.
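To show why an explicit level removes the ambiguity, here's a small sketch in which every heading carries its own level and the reading order is simply a sort on that level. The (level, text) pairs and the spoken_headings function are assumptions of mine, and the "Screws" heading is invented for the example.

```python
def spoken_headings(headings):
    """Return heading texts ordered by explicit level (stable for equal levels)."""
    return [text for level, text in sorted(headings, key=lambda h: h[0])]


# Headings gathered from two heading cells, in document order.
order = spoken_headings([(1, "System"), (3, "Screws"), (2, "Metal Parts")])
```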

Diagonal Table Headings


Diagonal table headings (or labels) are used to describe columns and rows in a more compact notation. Given a table of,

Diagonal Tables 4

One would make diagonal table heading of,

Diagonal table headings 1

They're popular in Asia, particularly in China and Japan. There seems to be some disagreement as to whether diagonal table headings are headings or labels but in both cases a user navigating cell data may want to access a description while browsing cell data. The distinction between heading or label may only be useful when browsing a table hierarchically, where headings (but not labels) would presumably group cell headings. Personally I'd say that they're headings and not labels.

These Diagonal Table Headings can also be multi-level. In the following table I've coloured the headings,

Diagonal Table headings 2

And sometimes they even cram a title in there...

Diagonal Table headings 3

While in English the letters do appear cramped it's not the same for the Chinese language,

Chinese Table Headings

(source: Diagonal Table Header Specification)

HTML, CSV, and Flex do not support diagonal table headings. ODF doesn't yet support them, although there is a proposal to support diagonal table headings in OpenOffice.org (and ODF). Unfortunately no one seems to be making much progress here. Chinese versions of Microsoft Office and OpenOffice.org appear to support them (via UOF and .doc) but these aren't part of ODF. So, how do we want diagonal headings in Pilferpage? Well, as there's no output format that supports them, it's probably not worth the bother.

But it's worth considering to ensure that it wouldn't break our assumptions about table data-structures; that it can build upon our existing table heading hierarchies. With some simple rules it looks like Pilferpage could support this,

  1. The <cell> tag has an attribute such as diagonalHeading="right down". Values are up/right/down/left.

  2. This same cell should be empty (only text nodes of whitespace in the cell).

  3. From this cell, any cells along the axes given by diagonalHeading must span in that direction to the edges of the table.


If a Pilferpage table does this then we would have enough information to generate a diagonal table header.
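Rules 1 and 2 can be checked on a single cell in isolation. The sketch below does that using Python's standard XML parser. It assumes the attribute holds exactly one horizontal and one vertical direction (my reading of "right down"), and it leaves rule 3 alone because that needs the whole table's geometry. The diagonal_directions function is hypothetical.

```python
import xml.etree.ElementTree as ET

HORIZONTAL = {"left", "right"}
VERTICAL = {"up", "down"}


def diagonal_directions(cell):
    """Return the two directions of a diagonal heading cell, or None if it isn't one."""
    value = cell.get("diagonalHeading")
    if value is None:
        return None
    tokens = value.split()
    # Rule 1 (assumed form): one horizontal plus one vertical direction.
    if len(tokens) != 2 or not (set(tokens) & HORIZONTAL and set(tokens) & VERTICAL):
        raise ValueError(f"bad diagonalHeading value: {value!r}")
    # Rule 2: no child elements and only whitespace text.
    if len(cell) or (cell.text or "").strip():
        raise ValueError("a diagonal heading cell must be empty")
    return tuple(tokens)


good = ET.fromstring('<cell diagonalHeading="right down"> </cell>')
directions = diagonal_directions(good)
```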

These diagonal table headings can appear in any cell, so you can get diagonal table footers too. Perhaps then it should be an attribute of diagonalTitle or aggregateTitle where you could specify several formatting options.

Spanning Cells


HTML, ODF, and XSL-FO cells can be spanned down or right but not left or up or in arbitrary shapes (for example, 'L' shaped spans aren't possible).

Flex and CSV do not support cell spanning.
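To make the "down or right only" point concrete, here's a simplified sketch of how spans expand into grid slots. Each cell is a (text, colspan, rowspan) tuple, which is my own made-up encoding, and the layout function ignores overlapping spans and other edge cases that real renderers handle.

```python
def layout(rows):
    """Map every covered (row, column) slot to the text of the cell covering it."""
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # A span is always a rectangle growing right and down from (r, c).
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


grid = layout([
    [("System", 2, 1), ("Qty", 1, 2)],
    [("Metal Parts", 1, 1), ("Wood Parts", 1, 1)],
])
```

Because a span is described only by a width and a height from its top-left slot, there is no way to express an 'L' shape. For Flex and CSV, one option would be to repeat the cell text in every covered slot.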

Style


For now I'm just going to declare style out of scope. It's too big a topic.

Request For Comment


I probably haven't gotten everything right so please post comments and I'll update the post accordingly. Cheers!

[1] It's an old technique there.

[2] Just kidding, no one uses CALS.