joelkuiper.eu

Knowledge documents

A brief study in knowledge documents

TL;DR: Use Graphical User Interface (GUI) components as Lisp s-expressions to generate dynamic "knowledge" documents. See video below. Experimental prototype code in ClojureScript.

I left academia, or am at least on hiatus. I still worry about its problems, though. So I have been playing with this idea, and while it might not bear any immediate practical fruits, it’s at least fun to write down.

Scientific publishing is really problematic. While the methods of science are usually structured[1], its products are less so. Take, for example, a systematic review of clinical trials. Systematic reviews are the cornerstone of evidence-based medicine. They attempt to provide the most comprehensive and unbiased view of all the evidence available on a certain intervention or condition. They do this by finding all the published (and sometimes unpublished) literature and giving an expert opinion on the combined results. Systematic reviews are created in a structured process whose steps follow guidelines and an established methodology. However, the product is still a natural language document. While natural language might be efficient for verbal communication, it is highly inefficient for disseminating structured knowledge.

Here, I will present a different way of creating and publishing structured knowledge. I do not claim any novelty; the ideas have a rich background. The example I will use is the same as above: systematic reviews. They are something I have become familiar with during my time at drugis.org, but the views expressed here are not necessarily those of drugis.org. These ideas might also have a much wider practical application than just systematic reviews.

Systematic reviews

Structure and representation

Whenever there is methodology, one can automate. The same goes for systematic reviews of clinical trials. While many thousands of person-hours are spent each year creating them, they’re essentially ripe for automation. Systematic reviews are created by combining and weighing evidence. Usually this is done using statistical techniques such as meta-analysis, which attempt to take into account the relative “weight” of each piece of evidence. This practice is called evidence synthesis. The results from these statistical analyses try to approximate an unbiased view of reality. Simply put: does a drug work? Sometimes the results are not clear-cut, and one can ask: given what I know about the drug, do the benefits outweigh the risks? There are also various techniques to quantify those questions, usually borrowed from econometrics. The answers to these questions are published in prestigious journals. Given the amount of labor the production of a single review involves, they usually get accepted.
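To make the idea of “weight” concrete, here is a minimal sketch of one common synthesis technique, a fixed-effect meta-analysis with inverse-variance weights. The choice of model is my assumption for illustration (real reviews often use more elaborate models), and the effect estimates and standard errors are made up.

```python
import math

def pool_fixed_effect(estimates):
    """Pool (effect, standard_error) pairs using inverse-variance weights."""
    # A precise study (small standard error) gets a large weight.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical (effect, standard error) pairs for three studies.
studies = [(0.30, 0.10), (0.10, 0.20), (0.25, 0.10)]
pooled, pooled_se = pool_fixed_effect(studies)
print(f"pooled effect {pooled:.3f}, standard error {pooled_se:.3f}")
```

The study with the larger standard error contributes a quarter of the weight of each of the other two, which is what “weighing evidence” amounts to in this simple model.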

However, when you repeat this process over and over, a pattern inevitably emerges. It’s a tree-like pattern. The leaves are the studies. They collapse into branches which combine the evidence, and eventually the root of the tree is the review. Those familiar with XML or HTML will recognize the following:

<review>
    <synthesis>
        <study title="a" />
        <study title="b" />
        <study title="c" />
    </synthesis>
</review>

A review consists of a synthesis (such as a meta-analysis), which in turn consists of several studies. The content of these studies could be anything, but will typically contain the results of the measured variables. This way of representing the content of the review is different from natural language. The natural language document might contain the same information, but it is serialized differently.
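As a small illustration of “the same information, serialized differently”, the sketch below reads the XML above with Python's standard library and writes the tree out again as an indented outline. The `describe` helper is a name I made up for this example.

```python
import xml.etree.ElementTree as ET

REVIEW_XML = """
<review>
    <synthesis>
        <study title="a" />
        <study title="b" />
        <study title="c" />
    </synthesis>
</review>
"""

def describe(node, depth=0):
    """Return one indented line per node: the same tree in another serialization."""
    title = node.get("title")
    label = node.tag if title is None else f"{node.tag} {title}"
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(REVIEW_XML)
outline = describe(root)
titles = [study.get("title") for study in root.iter("study")]
print("\n".join(outline))
```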

The example above was the promise of XML: to provide structured knowledge so that different views (such as natural language documents) can build upon it. XML failed in many respects, but its ideas are solid. The representation above is in fact similar to the one used by Cochrane’s RevMan, a tool from one of the leading institutes for systematic reviews. The failure of XML as the carrier of structured information has been attributed to many factors: its verbosity, the lack of tooling, lack of a clear benefit over HTML for presenting pictures of cats, etc. None of these are truly relevant. XML failed because it was just the representation of information, not the means to obtain it. Representation without context is meaningless.

The inventors of Lisp knew this, of course. Data is computation, computation is data. If the previous XML is phrased as a Lisp s-expression, we get the following:

(review (synthesize '(study-a study-b study-c)))

Or more realistically:

(review (synthesize (filter (λ (study) (eligible? study)) studies)))

Here the λ represents an anonymous function. Briefly, the expression says: review the synthesis of the list of studies, filtered by whether they are eligible.
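For readers who do not read Lisp, the same pipeline can be sketched in Python. Everything concrete here is an assumption for illustration: the eligibility criterion, the fields of a study, and the unweighted mean that stands in for a real synthesis.

```python
from statistics import mean

def eligible(study):
    # Hypothetical inclusion criterion: randomized trials with 50+ participants.
    return study["randomized"] and study["participants"] >= 50

def synthesize(studies):
    # Stand-in for a real meta-analysis: an unweighted mean of the effects.
    studies = list(studies)
    return {
        "included": [s["name"] for s in studies],
        "effect": mean(s["effect"] for s in studies),
    }

def review(synthesis):
    count = len(synthesis["included"])
    return f"{count} studies, mean effect {synthesis['effect']:.2f}"

# Made-up studies; study-b fails the eligibility criterion.
studies = [
    {"name": "study-a", "randomized": True, "participants": 120, "effect": 0.30},
    {"name": "study-b", "randomized": False, "participants": 300, "effect": 0.90},
    {"name": "study-c", "randomized": True, "participants": 80, "effect": 0.10},
]

# Mirrors the nesting of the s-expression: review(synthesize(filter(...))).
report = review(synthesize(filter(lambda study: eligible(study), studies)))
print(report)
```

The point is the shape rather than the statistics: the document is the result of evaluating a nested expression, so changing the list of studies or the criterion changes the review.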

Lisp also failed in many ways. The reason perhaps is that most people don’t really care. Especially not the kind of people who write systematic reviews. And why should they care? At the end of the day, if it’s not usable by the target audience, these things are considered secondary. However, I’d like to think it would help with disseminating knowledge in such a way that it is reusable and reproducible. So there might be a way of bringing structured dissemination to the masses.

Interface and flow

The core idea is that functions should be represented by Graphical User Interfaces (GUIs), not merely as textual symbols. Let’s take an easy example: adding the numbers 1 and 2. In Lisp this would look like (+ a b), or concretely (+ 1 2). If we represent this graphically it might look like this:

[Figure: the expression (+ a b) drawn as a gray plate with two argument sockets]

The outer gray area encapsulates the arguments a and b. The result of the outer gray box would be 3 in the case of (+ 1 2). What this illustrates is that each function can be seen as a plate with sockets for its arguments. The + here takes two arguments, so its plate has two sockets. To further illustrate this point, let’s add the results of (+ a b) and (+ b c) together:

[Figure: the results of (+ a b) and (+ b c) added together as nested plates]
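The plate-and-socket picture can be modelled in a few lines. This is only a sketch under my own assumptions: `Plate`, `evaluate` and `to_sexp` are hypothetical names, and a socket may hold a number, a variable name, or another plate.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plate:
    """A function drawn as a plate with one socket per argument."""
    symbol: str
    function: Callable
    sockets: list

def evaluate(node, env):
    """Resolve the sockets from the inside out, then apply the plate's function."""
    if isinstance(node, Plate):
        return node.function(*[evaluate(s, env) for s in node.sockets])
    if isinstance(node, str):
        return env[node]  # a named argument such as a, b or c
    return node           # a literal value

def to_sexp(node):
    """Write the plate structure back out as an s-expression."""
    if isinstance(node, Plate):
        return "(" + " ".join([node.symbol] + [to_sexp(s) for s in node.sockets]) + ")"
    return str(node)

def add(x, y):
    return x + y

# The outer plate has two sockets, each filled with another plate.
outer = Plate("+", add, [Plate("+", add, ["a", "b"]), Plate("+", add, ["b", "c"])])
text = to_sexp(outer)
total = evaluate(outer, {"a": 1, "b": 2, "c": 3})
print(text, "=", total)
```

The graphical form and the textual form are two views of the same tree, which is why one can be generated from the other.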

This mode of thinking is called “structural editing” and has been extensively studied in the context of Lisp[2]. Tree structures are uniquely suited for manipulation in terms of nesting and branching, rather than as “flat text”. Lisp’s syntax (and its homoiconicity) makes it trivial to implement. Structural editing (apart from paredit) never really took off as a general mode of programming, but again its ideas are solid.

Figure 1: An example of structural editing

What if we substitute the textual symbols with user interface components? For example, a component that inputs a study might look like this:

[Figure: a study input component with editable fields]

Each field (such as study name, population, etc.) is editable. All the fields together form the information of that study. There should be no hidden state: the information that is presented visually should be the only information within one component. The mode of presentation or input can be radically different, however.
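One way to read “no hidden state” is that every view of a component is derived from its declared fields and nothing else. The sketch below assumes three illustrative fields (only study name and population are mentioned above) and two made-up presentations of the same data.

```python
from dataclasses import dataclass, asdict

@dataclass
class StudyComponent:
    # The declared fields are the complete state of the component.
    name: str
    population: int
    intervention: str

    def as_form(self):
        """One editable line per field, as an input form might show it."""
        return [f"{key}: {value}" for key, value in asdict(self).items()]

    def as_sentence(self):
        """A radically different presentation of exactly the same fields."""
        return f"{self.name} tested {self.intervention} in {self.population} participants."

study = StudyComponent(name="study-a", population=120, intervention="drug X")
form = study.as_form()

# Editing a field is the only way to change what either view shows.
study.population = 150
sentence = study.as_sentence()
print(form, sentence, sep="\n")
```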

Now the previous structure of a systematic review,

(review (synthesize '(study-a study-b study-c)))

might concretely look like:

[Figure: a review plate containing a synthesis plate with three study components]

In practice each of the studies might be vertically aligned and optionally folded to save visual space. The results occupy another socket on the plate. Each time a component gets updated, the changes propagate downwards. If a study gets added, removed, or otherwise changed, the synthesis is recomputed. This is similar to the Explorable Explanations approach of Bret Victor.
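The propagation could be sketched with a plain callback: every change to the studies recomputes the synthesis and notifies whatever depends on it. The `Synthesis` class, its method names, and the unweighted mean are assumptions for illustration, not how the prototype works.

```python
class Synthesis:
    """Recomputes its result whenever a study is added, changed or removed."""

    def __init__(self, on_change):
        self.studies = {}           # study name -> effect estimate
        self.on_change = on_change  # whoever depends on this synthesis
        self.result = None

    def _recompute(self):
        effects = list(self.studies.values())
        # Stand-in for a real synthesis: the unweighted mean of the effects.
        self.result = sum(effects) / len(effects) if effects else None
        self.on_change(self.result)

    def put(self, name, effect):
        """Add a study, or change an existing one."""
        self.studies[name] = effect
        self._recompute()

    def remove(self, name):
        del self.studies[name]
        self._recompute()

history = []  # here the dependent simply records every new result
synthesis = Synthesis(on_change=history.append)
synthesis.put("study-a", 0.5)   # added
synthesis.put("study-b", 1.5)   # added
synthesis.put("study-b", 0.5)   # changed
synthesis.remove("study-a")     # removed
print(history)
```

In a real interface the callback would re-render the review rather than append to a list.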

One can imagine a variety of different components, the most basic one being a simple text component. Whenever a component gets added, it declares its sockets (and which types each socket accepts).
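A minimal sketch of such a declaration, assuming that a “type” is just a string naming the kind of component; `Component` and `plug` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    kind: str
    sockets: dict = field(default_factory=dict)   # socket name -> accepted kinds
    children: dict = field(default_factory=dict)  # socket name -> plugged components

    def plug(self, socket, child):
        """Place a child in a socket, refusing kinds the socket did not declare."""
        if socket not in self.sockets:
            raise KeyError(f"{self.kind} has no socket named {socket!r}")
        if child.kind not in self.sockets[socket]:
            raise TypeError(f"socket {socket!r} does not accept {child.kind}")
        self.children.setdefault(socket, []).append(child)
        return self

review = Component("review", sockets={"body": ("synthesis", "text")})
synthesis = Component("synthesis", sockets={"studies": ("study",)})

synthesis.plug("studies", Component("study"))
review.plug("body", synthesis).plug("body", Component("text"))

# A study cannot go directly into the body of a review.
try:
    review.plug("body", Component("study"))
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```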

Building a document, such as a systematic review, then simply becomes adding (nested) components to sockets.

This interface could be similar to Squarespace, where each socket is represented by a line, teardrop, or circle. If this element is clicked, a menu pops up with the potential elements that could be placed there.

[Figure: a socket menu listing the elements that can be placed there]
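The contents of that menu could be computed from the declared types. In this sketch a registry maps each available component to the type of thing it produces, and a socket's menu is every component whose type the socket accepts. The registry entries are invented for illustration.

```python
# Hypothetical registry: component name -> the type of thing it produces.
COMPONENTS = {
    "pairwise meta-analysis": "synthesis",
    "network meta-analysis": "synthesis",
    "study": "study",
    "text": "text",
}

def menu_for(accepted_types):
    """List, alphabetically, the components that fit a socket accepting these types."""
    return sorted(
        name for name, produced in COMPONENTS.items() if produced in accepted_types
    )

body_menu = menu_for({"synthesis", "text"})  # e.g. the body socket of a review
studies_menu = menu_for({"study"})           # e.g. the studies socket of a synthesis
print(body_menu, studies_menu, sep="\n")
```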

Implementation ideas

The whole nasty “configuration” problem becomes incredibly more convenient in the Lisp world. No more stanza files, apache-config, .properties files, XML configuration files, Makefiles — all those lame, crappy, half-language creatures that you wish were executable, or at least loaded directly into your program without specialized processing. I know, I know — everyone raves about the power of separating your code and your data. That’s because they’re using languages that simply can’t do a good job of representing data as code. But it’s what you really want, or all the creepy half-languages wouldn’t all evolve towards being Turing-complete, would they? — The Emacs Problem

Footnotes:

[1] For the sake of argument.

[2] For example, this implementation by Kevin Mahoney, but there are many others.