CodaText Format Specification

Specification of the text format for Coda docifier program.
Agnar Renolen
version 1.3
August 17th, 2001

CONTENTS

INTRODUCTION
THE HEADER
SECTIONING
PARAGRAPHS AND INDENTATION
LISTS
COMPUTER VOICE
EMPHASIZED TEXT AND LINKS
SPECIAL CHARACTERS
FIGURES AND TABLES
USING CODATEXT IN PRACTICE
CodaText comments for prefix-style comment syntax
CodaText comments for embrace-style comment syntax
Plain CodaText files.

INTRODUCTION

CodaText is a text format designed for documenting source code by writing the documentation as comments within the source files. The main purpose of designing this format is to provide a text format that is

However, it can also be used to write ordinary documents, such as this specification itself. If you are reading the HTML version of this specification, you are encouraged to look at the original CodaText source.

A CodaText document, document item, or doc-item for short, is divided into two sections: the header and the body.

THE HEADER

The header consists of a number of key-value pairs. Each key is identified by an initial "~" character, and its value comprises the rest of the line. Only one key is mandatory: the "~name" key. Each key-value pair must occupy exactly one line of text. The following example reproduces the header of this document:

~name     codatext
~type     document

~title    CodaText Format Specification

~summary  Specification of the text format for Coda docifier program.
~author   Agnar Renolen

~version  1.3
~date     August 6th, 2001

The recognized keys are:

name
This is a mandatory key, and the value must be a string containing no spaces. If you want to refer to a document from another document, this is the name you should refer to. If you are documenting a class or a procedure, the name should be equal to the name of the class or procedure.
Names can be set up hierarchically, by specifying sub-names separated by a name space separator which is either a period (like in MyClass.MyMethod) or a double colon (like in MyClass::MyMethod).
title
For a document like this, a title is appropriate. If you are documenting a procedure, it is recommended to leave the title undefined.
summary
Gives a brief one-line summary of the document. You can add several subsequent summary keys, if you need multiple lines.
author
Specifies the author of the document. This element may be repeated multiple times in the header if there are multiple authors.
version
Specifies the version of the document.
date
Specifies the date of writing, or the date of approval of the document.
mansection
If Coda is to produce output in Unix man page format (nroff/troff), this specifies the man section in which the doc-item is to be placed.
type
Specifies the type of document item. If this is not specified in the header, the value will be set to "item" as the default. If the type is set to "document", then coda will produce a standalone document for this item, rather than including it in a larger project.

The header ends at the first line that does not start with the "~" character. The text following the header is referred to as the body of the document. You may add your own header items at your own discretion, although these will be ignored by Coda.
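
The header rules above can be sketched in Python as follows. This is an illustration only, not the Coda implementation (which is written in Tcl): the function name parse_header is hypothetical, and the sketch assumes that blank lines between keys do not end the header, as the example header above suggests.

```python
def parse_header(lines):
    """Split the lines of a doc-item into (header, body_lines).

    The header maps each key (without the "~") to a list of values,
    because keys such as ~summary and ~author may be repeated.
    """
    header = {}
    body_start = len(lines)
    for index, line in enumerate(lines):
        text = line.strip()
        if not text:
            # Assumption: blank lines between keys do not end the header.
            continue
        if not text.startswith("~"):
            # First line without "~": the body starts here.
            body_start = index
            break
        parts = text[1:].split(None, 1)
        if not parts:
            continue  # a lone "~" carries no key; skipped in this sketch
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        header.setdefault(key, []).append(value)
    names = header.get("name")
    if not names or len(names[0].split()) != 1:
        raise ValueError("the header needs a ~name value without spaces")
    # The type defaults to "item" when it is not given.
    header.setdefault("type", ["item"])
    return header, lines[body_start:]
```

Unknown keys are collected like any other key here; a caller can simply ignore the ones it does not recognize.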

CodaText also allows you to provide a compact form of header containing only the name and, optionally, a summary. If the first non-empty line of the doc-item does not start with a key prefixed by the "~" character, the first word will be recognized as the name. The subsequent words (optionally separated from the name by one or more dashes) will be recognized as the summary. Subsequent lines will be recognized as part of the summary until an empty line is encountered. Hence the following syntax applies:

 <name> [[-] <summary>]

 <body text>

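The compact form could be read along these lines. This is a sketch under stated assumptions: parse_compact_header is a hypothetical helper, not a Coda function, and it assumes that the summary lines are joined with single spaces.

```python
def parse_compact_header(lines):
    """Return (name, summary, body_lines) for a compact header."""
    # Find the first non-empty line; its first word is the name.
    i = 0
    while i < len(lines) and not lines[i].strip():
        i += 1
    if i == len(lines):
        raise ValueError("empty doc-item")
    words = lines[i].split(None, 1)
    name = words[0]
    rest = words[1] if len(words) > 1 else ""
    # Drop the optional dash(es) that separate the name from the summary.
    rest = rest.lstrip("-").strip()
    summary_parts = [rest] if rest else []
    # Following lines belong to the summary until an empty line is met.
    i += 1
    while i < len(lines) and lines[i].strip():
        summary_parts.append(lines[i].strip())
        i += 1
    # Everything after the empty line is the body.
    return name, " ".join(summary_parts), lines[i + 1:]
```
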
SECTIONING

The body can be divided into sections by headings. CodaText specifies two levels of headings. The reason for this limitation is that one doc-item may be included in another document, one section per item, which effectively creates three sectioning levels. Moreover, whenever several CodaText items are generated into one source file, a heading is also needed for the name or title of each item.

Headings are identified by a prefix: \h1 for level 1 headings and \h2 for level 2 headings. If a heading requires more than one line of text, each subsequent line must be aligned with the first non-prefix character of the first line. For example:

 \h1 This is a heading that
     is written over two lines

To make the headings stand out more in the source, a paragraph containing no lowercase letters is also recognized as a level 1 heading. This style of heading is used in this document. If you need to write a normal paragraph in capital letters, you can do so by prefixing the paragraph with the \p paragraph prefix.
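
As an illustration of these rules, the following Python sketch classifies a single paragraph. The name classify_paragraph is hypothetical. The sketch also makes two assumptions that the specification does not state: a prefix must be followed by whitespace, and an all-capitals heading must contain at least one uppercase letter.

```python
import re

# A paragraph prefix: \h1, \h2 or \p, followed by whitespace or the end.
PREFIX = re.compile(r"\\(h1|h2|p)(?:\s+|$)")

def classify_paragraph(paragraph):
    """Return (kind, text), where kind is "h1", "h2" or "p"."""
    text = paragraph.strip()
    match = PREFIX.match(text)
    if match:
        # An explicit prefix always wins, so \p can force a normal
        # paragraph even when the text is written in capital letters.
        return match.group(1), " ".join(text[match.end():].split())
    joined = " ".join(text.split())
    has_upper = any(ch.isupper() for ch in joined)
    has_lower = any(ch.islower() for ch in joined)
    if has_upper and not has_lower:
        return "h1", joined
    return "p", joined
```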

PARAGRAPHS AND INDENTATION

The body of the text is, apart from sections, divided into units called paragraphs. A paragraph is identified in one of three ways:

The last way of separating paragraphs allows you to make your documentation text a little more compact. For example, you don't need to insert an empty line between a heading and the subsequent paragraph if you provide different indentation:

  THIS IS A LEVEL 1 HEADING
    This is the first paragraph 
    in this section.

Note that the indentation of the text does not reckon prefixes as being a part of the text, which is illustrated in the following example:

  \h1 This is a heading
  This is a paragraph, and not the next line of the heading

  * This is an item
  This is not a part of the item above.

  * This is an item
    that goes over two lines

Indentation can also be used to create nested lists, as the subsequent section describes.

LISTS

CodaText provides three types of lists: itemized lists, enumerated lists and description lists. The example below demonstrates itemized and enumerated lists:

 * Each list item is a paragraph marked with a prefix.  The prefix in
   this itemized list is the asterisk ("*").
 * Lists can also be nested within each other by increasing the
   indentation of the list items:
    (1) The prefix of an enumerated list is a number embedded between
        a pair of parentheses.
    (2) A list should have at least two items.
   A list item can also contain several paragraphs.
 * Since this item has the same indentation as the previous item, we are
   back in the outer list.
 
 * Blank lines between the items have no effect.

This example should produce the output:

The items of a description list contain pairs of values:

the key
Which is a single-line paragraph prefixed with a dash ("-"), and ending in a colon ":". No other text is permitted on this line.
the description
Which is the subsequent paragraph. Usually, this should be slightly indented from the key to make it readable, although this is not required.
The description can also contain several paragraphs, although you have to do that by separating them with an empty line.

The description list above is produced from the following text:

 -the key:
     Which is a single-line paragraph prefixed with a dash ("-"), and
     ending in a colon ":".  No other text is permitted on this line.
 -the description:
     Which is the subsequent paragraph.  Usually, this should be slightly
     indented from the key to make it readable, although this is not
     required. 

     The description can also contain several paragraphs, although you
     have to do that by separating them with an empty line.
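
The three list prefixes can be recognized line by line, as in the Python sketch below. The helper classify_list_line and its return shape are hypothetical. The sketch only reports the column of the prefix and the column where the item text starts; how Coda uses these columns to decide nesting is not modelled here.

```python
import re

ITEM_PATTERNS = [
    ("itemized", re.compile(r"^(\s*)\*\s+(?=\S)")),
    ("enumerated", re.compile(r"^(\s*)\(\d+\)\s+(?=\S)")),
]
# A description key: a dash, the key, and a colon, alone on the line.
KEY_PATTERN = re.compile(r"^(\s*)-(.+):\s*$")

def classify_list_line(line):
    """Return (kind, prefix_column, text_column, text), or None."""
    match = KEY_PATTERN.match(line)
    if match:
        return ("description-key", len(match.group(1)), None,
                match.group(2).strip())
    for kind, pattern in ITEM_PATTERNS:
        match = pattern.match(line)
        if match:
            return (kind, len(match.group(1)), match.end(),
                    line[match.end():].rstrip())
    return None
```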

COMPUTER VOICE

When documenting code, you often need to provide code examples, or show the input to and output from functions and procedures. For this we use computer voice, which is usually presented in a monospace typewriter-style font such as Courier. If you need to provide a code example, you should use preformatted text as demonstrated in the following example:

  [
    To provide preformatted text, start with a single line containing only
    a left square bracket "[". 
 
     A block of preformatted text is considered as one single
     paragraph, so empty lines or different indentation will be
        reproduced verbatim in the output.  The indentation of the code
     will be reckoned relative to the first "[".
  ]

In order to end the preformatted text, you must provide a single line containing only a right square bracket "]", with exactly the same indentation as the initial bracket. This allows the preformatted text to contain lines consisting of a single "]", provided they have a different indentation than the initial bracket, as was necessary to produce the example of preformatted text above.
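
The matching of the closing bracket by indentation can be sketched as follows. The function extract_preformatted is a hypothetical name, and the sketch assumes that indentation consists of spaces only.

```python
def extract_preformatted(lines, start):
    """Read a preformatted block whose "[" is on lines[start].

    Returns (block_lines, next_index), where next_index is the index
    of the first line after the closing "]".
    """
    opener = lines[start]
    if opener.strip() != "[":
        raise ValueError("lines[start] must contain only '['")
    indent = len(opener) - len(opener.lstrip())
    block = []
    for i in range(start + 1, len(lines)):
        line = lines[i]
        line_indent = len(line) - len(line.lstrip())
        # Only a "]" with the same indentation as the "[" ends the block.
        if line.strip() == "]" and line_indent == indent:
            return block, i + 1
        # Indentation is reckoned relative to the opening bracket.
        if line[:indent].strip() == "":
            block.append(line[indent:])
        else:
            block.append(line.lstrip())
    raise ValueError("unterminated preformatted block")
```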

When you need to produce computer voice in running text, embrace the text in [square brackets]. Whenever you are referring to a metasymbol (that is, an artificial term that has meaning only when it is replaced by a value or symbol), embrace it in <angular braces>. Metasymbols are usually presented in italic text, but to avoid confusion with ordinary emphasized terms, Coda also presents metasymbols in a monospace font.

CodaText allows you to nest these markup tags within each other. The following example reproduces the synopsis of the Tcl array command:

  [array <option> <arrayName> ?<arg> <arg> ...?]

which would produce the output

array option arrayName ?arg arg ...?

If your text contains a reference to a command, such as a menu command, embrace the command in a pair of |vertical bars|. This will make the command stand out in bold text in the output.

EMPHASIZED TEXT AND LINKS

Emphasized text is usually italicized. In CodaText you embrace the text to be emphasized in a pair of /forward slashes/.

You can also make a reference to the name of another named document (declared by the ~name header key) by prefixing the name with a dollar sign ("$"). This works fine as long as the referenced name is followed by a space character. If it occurs at the end of a sentence, however, the CodaText parser cannot determine whether the period is a namespace separator or the end of the sentence. Therefore, if your reference is to be followed by a period or a colon, terminate the reference with another dollar sign, like $this$.

A link to any arbitrary url can be made by using the following syntax:

  @{<url> <link text>}

If the "@" character is not followed by a left curly brace, it will be reproduced verbatim (as, for example, in an email address). The first whitespace character (which can also be a newline) inside the curly braces separates the url from the link text.

The example @{http://coda.sourceforge.net coda home page} will produce a link to the coda home page. A problem arises if such links are used in Pascal, as the right curly brace would trick the compiler into thinking that it is the end of the comment. Therefore, ordinary parentheses may also be used for the url link.
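
Both link forms can be located with a regular expression, as in this Python sketch. The helper find_links is hypothetical, and the sketch ignores backslash escapes and urls that themselves contain a closing brace or parenthesis.

```python
import re

# "@{url text}" or "@(url text)"; the first whitespace ends the url.
LINK = re.compile(r"@(?:\{([^\s}]+)\s*([^}]*)\}|\(([^\s)]+)\s*([^)]*)\))")

def find_links(text):
    """Return a list of (url, link_text) pairs found in text."""
    links = []
    for match in LINK.finditer(text):
        url = match.group(1) or match.group(3)
        label = match.group(2) or match.group(4) or ""
        # A newline inside the link text counts as ordinary whitespace.
        links.append((url, " ".join(label.split())))
    return links
```

An "@" that is not followed by a brace or parenthesis, as in an email address, is simply not matched.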

SPECIAL CHARACTERS

As you can see, the CodaText format relies on a set of special characters to provide formatting instructions to the CodaText parser. Whenever you need to reproduce these characters verbatim, such as the forward slash, you simply use the backslash as an escape character. Any character that follows a backslash will be reproduced verbatim in the output. Hence, the sequence "\n" will produce the character "n", not a new line. Use this to produce [, ], <, > and so forth.

Also bear in mind that any non-letter symbol might be utilized in future versions of CodaText, so it is good practice to 'escape' them with the backslash to avoid compatibility problems with future versions of Coda. Typical characters that might be used are %, &, * and #.
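
The escape rule on its own is small enough to show in full. The following Python sketch (unescape is a hypothetical name) handles only the backslash rule; a real parser would have to apply it together with the other markup. Keeping a trailing lone backslash verbatim is an assumption.

```python
def unescape(text):
    """Replace every backslash-character pair by the character itself."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # The character after the backslash is taken verbatim.
            out.append(text[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```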

FIGURES AND TABLES

The current version of Coda does not support tables, but limited support for figures is implemented. 'Limited' in this context means that figures are only supported for html output, and only for images accepted by the html <img> tag.

A figure is recognized as a paragraph prefixed with the sequence \fig(url). The url can be a local file name or a url to an image on the internet. The remainder of the paragraph will be inserted as the figure caption.
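
A figure paragraph could be split into url and caption as in the sketch below. The name parse_figure is hypothetical, and the sketch assumes that the url contains no right parenthesis.

```python
import re

FIGURE = re.compile(r"\\fig\(([^)]*)\)\s*(.*)", re.S)

def parse_figure(paragraph):
    """Return (url, caption) for a figure paragraph, else None."""
    match = FIGURE.match(paragraph.lstrip())
    if not match:
        return None
    url, caption = match.groups()
    # The rest of the paragraph, joined into one line, is the caption.
    return url.strip(), " ".join(caption.split())
```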

A table will be recognized as a paragraph prefixed with the sequence \table. The format of the rest of the paragraph has not yet been determined, so suggestions are welcome.

USING CODATEXT IN PRACTICE

In order to put CodaText into practice, the text must be embedded into the comments of the programming language in which it is used. In order for a CodaText parser to distinguish comments written in CodaText from other comments, the comments need to be formatted in various ways, depending on the commenting style and the commenting markup implemented by the programming language.

A typical programming language like Tcl (in which Coda is implemented) uses the pound character ("#") to prefix all comments. Programming languages that use this style of commenting are said to provide a prefix-style commenting syntax. A programming language such as Pascal, which embraces comments within curly braces, uses an embrace-style commenting syntax. Programming languages like Java and C++ provide both commenting syntaxes.

Coda uses a Comment Reader to extract comments in CodaText from the source file. The job of the Comment Reader is to extract the CodaText comments from the source and to trim away characters that belong to the comment markup (such as the comment prefix).

CodaText comments for prefix-style comment syntax

For prefix-style comments, a CodaText comment is recognized by the standard comment prefix for the language, extended with a few other predefined characters, alone on a single line. All subsequent commented lines are considered part of the CodaText comment. For languages like Tcl and Perl, the first line must contain a double pound "##", whereas all subsequent lines must be prefixed by one or more pounds. For example:

  ##
  # This is a CodaText comment.  Note that the first double pound needs to
  # be alone on the line
  #
  #   Note that the comment may contain several prefixes in sequence,
  #   they will all be stripped away by the reader, like in the last
  #   line, which has a double pound for aesthetic reasons.
  ##

The following table defines the prefixes recognized for programming languages having a prefix-style comment syntax:

                       prefix       first line
   Tcl, Perl ...         #              ##
   Lisp                  ;              ;;
   Visual Basic          '              ''
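
A Comment Reader for prefix-style languages might work roughly as follows. This Python sketch is not the Coda implementation: read_prefix_comments is a hypothetical name, and removing one space after the prefix is an assumption made so that the relative indentation of the text is preserved.

```python
def read_prefix_comments(source_lines, prefix="#"):
    """Return the CodaText comments in a source file.

    Each comment is a list of text lines with the comment prefix
    trimmed away.
    """
    first_line = prefix * 2   # for example "##" for Tcl and Perl
    comments = []
    current = None
    for line in source_lines:
        stripped = line.strip()
        if current is None:
            # A CodaText comment starts with the doubled prefix alone
            # on a line; ordinary comments are ignored.
            if stripped == first_line:
                current = []
        elif stripped.startswith(prefix):
            # Strip every leading prefix character, then one space.
            text = stripped.lstrip(prefix)
            if text.startswith(" "):
                text = text[1:]
            current.append(text)
        else:
            # The first non-comment line ends the CodaText comment.
            comments.append(current)
            current = None
    if current is not None:
        comments.append(current)
    return comments
```

For Lisp or Visual Basic, the same function would be called with prefix=";" or prefix="'".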

CodaText comments for embrace-style comment syntax

For embrace-style comments, the start sequence needs to be modified, and it must also occupy a line by itself. The end sequence is the normal one for the language. It is common, however, to prefix each line in the comment with a symbol (most often a "*") to make the comment stand out from the rest of the code. The CodaText parser also allows such prefixes for embrace-style comments, and it distinguishes them from CodaText characters (such as the bullet of list items), provided that the prefix is in the same column as the same character in the start sequence. Hence, the following C comment would come out correctly:

  /**
   *  This is a normal paragraph
   *  * This is an item in a list
   *  * This is another item
   **/

The following table defines the comment markers that are recognized for programming languages having an embrace-style comment syntax:

                        start       end     prefix
   C, C++, Java, C#      /**        */        *
   Pascal                {**        }         *
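
The column rule for line prefixes can be illustrated with the following Python sketch. The names read_embrace_comments and strip_prefix are hypothetical, and the sketch assumes that the end sequence does not occur inside the comment text itself.

```python
def strip_prefix(line, column, prefix):
    """Remove the prefix only if it sits exactly in the given column."""
    if line[:column].strip() == "" and line[column:column + len(prefix)] == prefix:
        return line[column + len(prefix):]
    return line

def read_embrace_comments(source_lines, start="/**", end="*/", prefix="*"):
    """Return the CodaText comments as lists of text lines."""
    comments = []
    current = None
    column = 0
    for line in source_lines:
        if current is None:
            if line.strip() == start:
                current = []
                # Column of the prefix character in the start sequence.
                column = line.index(start) + start.index(prefix)
            continue
        if end in line:
            # Keep any text that precedes the end sequence.
            text = strip_prefix(line[:line.index(end)], column, prefix)
            if text.strip():
                current.append(text)
            comments.append(current)
            current = None
        else:
            current.append(strip_prefix(line, column, prefix))
    return comments
```

Because only the character in the prefix column is removed, the bullet of a list item further to the right survives as CodaText markup.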

Plain CodaText files.

Coda can also read files written in raw CodaText. Files having the extension ".txt" or ".coda" will be considered to contain raw CodaText, as this document does.

In such documents, you can also write comments that will be stripped away before the text is passed to the CodaText parser. Such comments are identified by lines having the percent character ("%") as the first non-space character. Thus, a percent character within the running text will not turn the rest of the line into a comment. Note that if a comment is inserted between two non-empty lines of text, the comment will act as a paragraph separator. You must therefore never insert comments within paragraphs.
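
One way to model this behaviour is to replace each comment line by an empty line, which is what makes a comment act as a paragraph separator. The following Python sketch uses the hypothetical name strip_percent_comments and is not taken from Coda itself.

```python
def strip_percent_comments(lines):
    """Blank out lines whose first non-space character is "%"."""
    result = []
    for line in lines:
        if line.lstrip().startswith("%"):
            # The comment becomes an empty line, so it separates the
            # paragraphs before and after it.
            result.append("")
        else:
            result.append(line)
    return result
```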

