Generating Code Examples

August 28, 2016

I've been thinking about a specific problem when writing technical documentation: code examples. In most cases, code examples are written by hand in a single language. If you want to add more languages, you'll need to write more examples. As you write more examples, typos creep in. The code no longer compiles. How can you fix this problem? I have a couple of ideas.

Requirements

For my purposes, I'm looking to write a single code snippet and then generate sample code in a variety of languages. Right now I'm targeting Ruby, Python, Go, Java, Node, C#, PHP, C++, C and Swift. At a minimum, I need support for:

Importing modules
Control flow (if, else, while, for, etc.)
Lists and maps
Calling functions and methods

I don't have a functional language in my list. I'm not sure how you'd translate a code example written in an imperative language to one in a functional language.

Approaches

These are some of the approaches I thought through.

Templates

The first idea that came to mind was templates. Apparently this is how most code generators work. Note that when I say "code generator" here, I'm not talking about machine code. Instead, I'm talking about projects like Swagger or gRPC. They take some input and generate a large amount of source code.

I think templates work for these projects because the input is constrained. For example, Swagger ingests a JSON document which describes an API. gRPC and protoc ingest .proto files. These inputs aren't Turing-complete programming languages.

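Our input isn't constrained like that. We're trying to translate one arbitrary sample program into many languages, and a fixed set of templates can't anticipate every program we might write.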
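To show why templates fit constrained input, here's a minimal sketch using Go's text/template package. The Endpoint struct, the template text, and the client.request call in the generated output are all hypothetical names I made up for illustration; this is not how Swagger or gRPC are actually implemented.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Endpoint is a hypothetical, deliberately constrained input:
// plain data with no logic in it.
type Endpoint struct {
	Name   string
	Method string
	Path   string
}

// One hand-written template per target language. The generated
// client.request call is a made-up API, used only as filler.
var templates = map[string]string{
	"python": "def {{.Name}}(client):\n    return client.request(\"{{.Method}}\", \"{{.Path}}\")\n",
	"ruby":   "def {{.Name}}(client)\n  client.request(\"{{.Method}}\", \"{{.Path}}\")\nend\n",
}

// render fills the template for lang with the endpoint's fields.
func render(lang string, e Endpoint) (string, error) {
	src, ok := templates[lang]
	if !ok {
		return "", fmt.Errorf("no template for %q", lang)
	}
	t, err := template.New(lang).Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, e); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	e := Endpoint{Name: "list_users", Method: "GET", Path: "/users"}
	for _, lang := range []string{"python", "ruby"} {
		out, err := render(lang, e)
		if err != nil {
			panic(err)
		}
		fmt.Print(out)
	}
}
```

This works because every input has the same three fields. As soon as the input contains a loop or a conditional, there is no slot in the template to put it in.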

A DSL

I thought, alright, let's build a DSL. However, most DSLs I've seen are glorified configuration languages. Ansible uses YAML to encode a pseudo-programming language. There are a million gems for creating DSLs in Ruby, but I don't think those are up for the task (would love to be proven wrong).

When writing code examples, we want to be able to use the full power of a Turing-complete programming language, but one whose features map cleanly to all the languages we wish to support. This would be a "lowest common denominator" language, one that can be easily and correctly transpiled into many languages.

It wouldn't be a particularly powerful language, but it would allow you to write the basic code you'd need. You could then include macros (or some type of extensions) to generate specific code sections. An example would be initializing a gRPC client. Each language does it in a separate way, but you could abstract it behind a macro.
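Here's a rough sketch of what I have in mind, assuming the example has already been parsed into a tiny syntax tree. Everything here is hypothetical: the node types, the new_client macro, and the ExampleClient name in its expansions are placeholders, not a real gRPC API. It only handles string assignments, calls, and macros, and it only targets Python and JavaScript.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one statement in the hypothetical lowest-common-denominator language.
type Node interface{}

// Assign binds a name to a string literal (the only value type in this sketch).
type Assign struct{ Name, Value string }

// Call invokes a function or method with pre-rendered arguments.
type Call struct {
	Func string
	Args []string
}

// Macro stands in for code that each language writes differently.
type Macro struct{ Name string }

// macros maps macro name -> language -> hand-written expansion.
// ExampleClient is a made-up placeholder, not a real library.
var macros = map[string]map[string]string{
	"new_client": {
		"python":     "client = ExampleClient(address)",
		"javascript": "const client = new ExampleClient(address);",
	},
}

// emit renders prog as source text in the given language.
func emit(lang string, prog []Node) (string, error) {
	decl, end := "", ""
	switch lang {
	case "python":
	case "javascript":
		decl, end = "const ", ";"
	default:
		return "", fmt.Errorf("unsupported language %q", lang)
	}
	var b strings.Builder
	for _, n := range prog {
		switch n := n.(type) {
		case Assign:
			// %q emits a double-quoted literal; fine for plain ASCII strings.
			fmt.Fprintf(&b, "%s%s = %q%s\n", decl, n.Name, n.Value, end)
		case Call:
			fmt.Fprintf(&b, "%s(%s)%s\n", n.Func, strings.Join(n.Args, ", "), end)
		case Macro:
			text, ok := macros[n.Name][lang]
			if !ok {
				return "", fmt.Errorf("macro %q has no %s expansion", n.Name, lang)
			}
			b.WriteString(text + "\n")
		default:
			return "", fmt.Errorf("unknown node type %T", n)
		}
	}
	return b.String(), nil
}

func main() {
	prog := []Node{
		Assign{Name: "address", Value: "localhost:50051"},
		Macro{Name: "new_client"},
		Call{Func: "client.ping"},
	}
	for _, lang := range []string{"python", "javascript"} {
		out, err := emit(lang, prog)
		if err != nil {
			panic(err)
		}
		fmt.Printf("--- %s ---\n%s", lang, out)
	}
}
```

The ordinary statements go through one shared emitter, while the macro is just a lookup table of hand-written snippets. Control flow, lists, and maps would each need their own node type and per-language rendering, which is where the real work would be.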

Use a subset of an existing language

Instead of creating a new language, use a subset of an existing language. For example, Go has a parser built into the standard library (the go/parser package). I could write examples in a subset of Go (no goroutines, no type casting, etc.) and generate my code examples that way. The macros I mentioned above aren't super easy to implement, but I could just hardcode them into the "language".

Wrapping up

I'm not particularly happy with any of these ideas.

I can't be the first person to run into this issue. If you've solved this problem (no matter the context) I'd love to hear how you did it.