Automating Documentation Puzzles

by Wim on September 22nd, 2016
By Co-Founder and CEO at Intuillion Ltd. (DITAToo)

Suppose you are putting together a puzzle. The pieces come from different people. You don’t know who owns each piece, how many pieces exist, how to get hold of them, or where each piece belongs. But somehow you figure it out, assemble all the pieces, and get the full picture.

And then the people who own the pieces begin to change them. One person just changes the picture fragment shown on a piece. Another changes the shape of a piece, which requires changes in all the pieces it connects to. A third decides to add a brand new piece. Worse yet, you don’t always know what kind of change just happened.

At this point, you find yourself having to disassemble the whole thing (or maybe a part of it if you are lucky enough) and try to put the pieces together again. Now, imagine that you have to do this several times a week. Sounds bad, right?

This is how the documentation process often works.

Too Much Manual Work

Manufacturing companies typically use a document called a bill of materials (BOM) to assemble a physical product. A BOM lists the components and parts that a specific product consists of. Based on the BOM, technical writers can learn what information should be included in the product documentation. Getting back to the puzzle metaphor, you can think of the BOM as the picture shown on the puzzle box. This is where you need to get to in the end.

Because there is a good chance that some of the information already exists (it was written for products released earlier), you now need to find it, adjust it for the new product, and put it all together. These products are usually very complicated, so we are talking about really large amounts of information. Putting a piece in the wrong place or missing a piece can leave you with unhappy and confused customers at the very least.


Regardless of whether a company manufactures machines or develops software, complex products include multiple components that interact with each other. In big documentation teams, each component can be assigned to a certain person or even a group. Plus, there is someone who documents how all these components work together. Since a picture is worth a thousand words, a diagram that depicts the components and the interactions between them can help the customer better understand the product. The problem is that whenever a component is changed, a new component is added, or an existing component becomes obsolete, the whole diagram has to be redrawn.

To make things even more complicated, if the product has multiple variations, multiple flavors of the diagram have to be created too. When a component is changed, all these diagrams have to be redone as well.

In such complicated environments, the cost of content creation and maintenance can be too high, let alone the cost of a possible error.

Structured Content Is An Enabler For Content Automation

A solution is to automate all of the processes described above. Both documentation assembly and even the generation of diagrams can be completely automated. To make this happen, the content needs to be structured.

Here’s why.

Content automation is about manipulating different pieces of information. Information about product components can be retrieved from a BOM, matching content can be found in the content repository based on metadata, and the matching pieces can then be put together into a deliverable according to predefined logic.
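To make the idea concrete, here is a minimal sketch of BOM-driven assembly. Everything in it is an assumption made for illustration: the `Topic` class, the `component` metadata field, and the sample data are hypothetical and do not describe any particular product or repository. A real BOM and content repository would be far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """A reusable piece of content tagged with the component it documents."""
    topic_id: str
    component: str  # hypothetical metadata field matched against the BOM
    body: str


def assemble(bom, repository):
    """Return (document_text, missing_components) for the given BOM.

    Topics are emitted in BOM order. Components with no matching topic
    are reported instead of being silently skipped.
    """
    # Index the repository by the component each topic documents.
    by_component = {}
    for topic in repository:
        by_component.setdefault(topic.component, []).append(topic)

    sections, missing = [], []
    for component in bom:
        topics = by_component.get(component)
        if not topics:
            missing.append(component)
            continue
        sections.extend(t.body for t in topics)
    return "\n\n".join(sections), missing


# Made-up sample data: three topics, and a BOM that names one
# component ("sensor") for which nothing has been written yet.
repository = [
    Topic("t1", "pump", "Installing the pump."),
    Topic("t2", "valve", "Adjusting the valve."),
    Topic("t3", "pump", "Servicing the pump."),
]
document, missing = assemble(["valve", "pump", "sensor"], repository)
```

Reporting the components that have no matching content is a deliberate choice in this sketch: a missing puzzle piece is exactly the kind of error that is easy to overlook when the assembly is done by hand.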

Similarly, if information about each product component and its interaction with other components is formalized in some standard way, it can be transformed relatively easily into a visual representation. And so on.
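As a rough illustration of the diagram case, the sketch below turns a formalized description of components into Graphviz DOT text, which a tool such as Graphviz can then render as a picture. The dictionary format and the component names are assumptions invented for this example, not the format used by any specific application.

```python
def to_dot(components):
    """Render {component: [components it interacts with]} as Graphviz DOT text."""
    lines = ["digraph product {"]
    # Declare every component as a node, in a stable order.
    for name in sorted(components):
        lines.append(f'    "{name}";')
    # Add one directed edge per interaction.
    for name in sorted(components):
        for target in components[name]:
            lines.append(f'    "{name}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical formalized description of a product.
components = {
    "controller": ["pump", "sensor"],
    "pump": [],
    "sensor": ["controller"],
}
dot = to_dot(components)
```

When a component changes, only its entry in the description has to be edited. Calling `to_dot` again regenerates the whole diagram source, so nothing is redrawn by hand.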

To make individual pieces of information retrievable and recognizable, they must be addressable. Addressability of data is the key to content automation.


An example of addressable data is any Excel spreadsheet. Each cell has a unique address, like A1 or C4, which can be used to retrieve the data from that cell.

An example of non-addressable data is any plain text, for example, anything you would type in Notepad. This is monolithic text whose individual paragraphs don’t have an ID or anything similar that would let you directly address a specific paragraph, or a piece of text within a paragraph.

Semi-structured data

Finally, semi-structured data offers the best of both worlds. On one hand, it does have some structure. However, it’s much more flexible and less restrictive than data in a tabular format. After all, a narrative cannot be represented in a tabular format. On the other hand, almost every single piece of information is addressable or can be made addressable. If the structure is semantic, it’s even better because you can also know the actual meaning of each piece of information. Having such a structure lets you manipulate content as you want: you can automatically generate an entire documentation set, break the original content apart, re-aggregate it in the way required for a specific context, or dynamically change the visual representation of the content depending on the content specifics and user needs.
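Here is a small sketch of what addressability looks like in practice, using a made-up XML fragment and Python’s standard `xml.etree.ElementTree` module. The element names (`topic`, `step`, `note`) and the IDs are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up piece of semi-structured content: narrative text,
# but every element has a semantic name and most have an ID.
SOURCE = """
<topic id="pump-maintenance">
  <title>Pump maintenance</title>
  <step id="s1">Switch off the power.</step>
  <step id="s2">Drain the pump.</step>
  <note id="n1">Wear gloves.</note>
</topic>
"""

root = ET.fromstring(SOURCE.strip())

# Address one specific piece of content by its ID.
step = root.find(".//step[@id='s2']")

# Because the element names are semantic, pieces can also be
# selected by what they mean rather than by where they sit.
steps = [s.text for s in root.findall("step")]
notes = [n.text for n in root.iter("note")]
```

The same text typed into Notepad would offer no such handles. Here, a single step can be pulled out by ID, and all notes can be collected by meaning, which is what makes breaking content apart and re-aggregating it feasible.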

Content automation for our customers

I don’t want to sound salesy, but just to prove that all this is not a theory: we are doing this already. We’ve built several applications that do a very nice job with content automation for our customers. One of them, for example, parses formalized descriptions of product components and generates a diagram that depicts the whole product and shows interactions between different components. If a change is made, you just need to modify the formalized description of the component, and the diagram will be automatically re-generated. Another application automatically assembles a document (or even an entire documentation set) based on a BOM. In the next few weeks, I’m going to record a series of short videos so you can see them for yourself.

In the next post, I’ll discuss another issue related to content automation: changing the content representation depending on the customer’s needs or situational context. No, it’s not about publishing a PDF or Web Help. I’m going to take you much further and talk about how troubleshooting procedures written as text can be dynamically transformed into a flow chart, or how a textual description of business processes can be dynamically represented graphically.

Stay tuned!

