docstrip (3)
NAME
docstrip - Docstrip style source code extraction
SYNOPSIS
package require Tcl 8.4
package require docstrip ?1.2?
docstrip::extract text terminals ?option value ...?
docstrip::sourcefrom filename terminals ?option value ...?
DESCRIPTION
Docstrip is a tool created to support a brand of Literate Programming. It is most common in the (La)TeX community, where it is being used for pretty much everything from the LaTeX core and up, but there is nothing about docstrip which prevents using it for other types of software.
In short, the basic principle of literate programming is that program source should primarily be written and structured to suit the developers (and advanced users who want to peek "under the hood"), not to suit the whims of a compiler or corresponding source code consumer. This means literate sources often need some kind of "translation" to an illiterate form that dumb software can understand. The docstrip Tcl package handles this translation.
Even for those who do not wholeheartedly subscribe to the philosophy behind literate programming, docstrip can bring greater clarity to, in particular:
- programs employing non-obvious mathematics
- projects where separate pieces of code, perhaps in different languages, need to be closely coordinated.
In the first case, the benefit comes from access to much more powerful typographical features for source code comments than are possible in plain text. In the second, it comes from keeping all the separate pieces of code next to each other in the same source file.
The way it works is that the programmer edits directly only one or several "master" source code files, from which docstrip generates the more traditional "source" files compilers or the like would expect. The master sources typically contain a large amount of documentation of the code, sometimes even in places where the code consumers would not allow any comments. The etymology of "docstrip" is that this documentation was stripped away (although "code extraction" might be a better description, as it has always been a matter of copying selected pieces of the master source rather than deleting text from it). The docstrip Tcl package contains a reimplementation of the basic extraction functionality from the docstrip program, and thus makes it possible for a Tcl interpreter to read and interpret the master source files directly.
Readers who are not previously familiar with docstrip but want to know more about it may consult the following sources.
- [1] The tclldoc package and class, ctan.org/tex-archive/macros/latex/contrib/tclldoc.
- [2] The DocStrip utility, ctan.org/tex-archive/macros/latex/base/docstrip.dtx.
- [3] The doc and shortvrb Packages, ctan.org/tex-archive/macros/latex/base/doc.dtx.
- [4] Chapter 14 of The LaTeX Companion (second edition), Addison-Wesley, 2004; ISBN 0-201-36299-6.
FILE FORMAT
The basic units docstrip operates on are the lines of a master source file. Extraction consists of selecting some of these lines to be copied from input text to output text. The basic distinction is that between code lines (which are copied and do not begin with a percent character) and comment lines (which begin with a percent character and are not copied). For example,
    docstrip::extract [join {
      {% comment}
      {% more comment !"#$%&/(}
      {some command}
      { % blah $blah "Not a comment."}
      {% abc; this is comment}
      {# def; this is code}
      {ghi}
      {% jkl}
    } \n] {}
returns the same sequence of lines as
    join {
      {some command}
      { % blah $blah "Not a comment."}
      {# def; this is code}
      {ghi} ""
    } \n
Besides the basic code and comment lines, there are also guard lines, which begin with the two characters '%<', and meta-comment lines, which begin with the two characters '%%'. Within guard lines there is furthermore the distinction between verbatim guard lines, which begin with '%<<', and ordinary guard lines, where the '%<' is not followed by another '<'. The last category is by far the most common.
Ordinary guard lines condition the extraction of the code line(s) they guard on the value of a boolean expression; the guarded block of code lines will only be included if the expression evaluates to true. The syntax of an ordinary guard line is one of
    '%' '<' STARSLASH EXPRESSION '>'
    '%' '<' PLUSMINUS EXPRESSION '>' CODE
where
    STARSLASH  ::= '*' | '/'
    PLUSMINUS  ::= | '+' | '-'
    EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION
                 | SECONDARY '|' EXPRESSION
    SECONDARY  ::= PRIMARY | PRIMARY '&' SECONDARY
    PRIMARY    ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
    CODE       ::= { any character except end-of-line }
In the case of a '%<*EXPRESSION>' guard, the lines guarded are all lines up to the next '%</EXPRESSION>' guard with the same EXPRESSION (compared as strings). The blocks of code delimited by such '*' and '/' guard lines must be properly nested.
For example,
    set text [join {
      {begin}
      {%<*foo>}
      {1}
      {%<*bar>}
      {2}
      {%</bar>}
      {%<*!bar>}
      {3}
      {%</!bar>}
      {4}
      {%</foo>}
      {5}
      {%<*bar>}
      {6}
      {%</bar>}
      {end}
    } \n]
    set res [docstrip::extract $text foo]
    append res [docstrip::extract $text {foo bar}]
    append res [docstrip::extract $text bar]
sets $res to the result of
    join {
      {begin}
      {1}
      {3}
      {4}
      {5}
      {end}
      {begin}
      {1}
      {2}
      {4}
      {5}
      {6}
      {end}
      {begin}
      {5}
      {6}
      {end} ""
    } \n
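Guard expressions may also combine several terminals. The following is a minimal sketch, assuming the usual docstrip reading of the operators in the grammar above ('&' as "and", '|' and ',' as "or", '!' as "not"); the terminals a and b and the code text are made up for illustration.

```tcl
package require docstrip

# Hypothetical master text; 'a' and 'b' are made-up terminals.
set text [join {
    {%<a&b>needs both a and b}
    {%<a|b>needs a or b}
    {%<a,b>comma is another way to write or}
    {%<!a>needs a to be false}
    {%<-a&b>kept unless both a and b are true}
    {unguarded code line}
} \n]

# Only 'a' is listed as true, so the first and fourth lines are skipped.
set res [docstrip::extract $text {a}]
puts -nonewline $res
```

Listing both terminals, as in {a b}, should instead keep the first three guarded lines and drop the last two.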
Metacomment lines are "comment lines which should not be stripped away", but instead be extracted like code lines; these are sometimes used for copyright notices and similar material. The '%%' prefix is however not kept, but replaced by the current -metaprefix, which is customarily set to some "comment until end of line" character (or character sequence) of the language of the code being extracted. For example,
    set text [join {
      {begin}
      {%<foo> foo}
      {%<+foo>plusfoo}
      {%<-foo>minusfoo}
      {middle}
      {%% some metacomment}
      {%<*foo>}
      {%%another metacomment}
      {%</foo>}
      {end}
    } \n]
    set res [docstrip::extract $text foo -metaprefix {# }]
    append res [docstrip::extract $text bar -metaprefix {#}]
sets $res to the result of
    join {
      {begin}
      { foo}
      {plusfoo}
      {middle}
      {#  some metacomment}
      {# another metacomment}
      {end}
      {begin}
      {minusfoo}
      {middle}
      {# some metacomment}
      {end} ""
    } \n
A verbatim guard line starts a block of lines that are copied exactly as they are, whatever they look like to docstrip. In the following example the block opened by '%<<QQQ-98765' runs up to the line '%QQQ-98765':
    set text [join {
      {begin}
      {%<*myblock>}
      {some stupid()}
      { #computer<program>}
      {%<<QQQ-98765}
      {% These three lines are copied verbatim (including percents}
      {%% even if -metaprefix is something different than %%).}
      {%</myblock>}
      {%QQQ-98765}
      { using*strange@programming<language>}
      {%</myblock>}
      {end}
    } \n]
    set res [docstrip::extract $text myblock -metaprefix {# }]
    append res [docstrip::extract $text {}]
sets $res to the result of
    join {
      {begin}
      {some stupid()}
      { #computer<program>}
      {% These three lines are copied verbatim (including percents}
      {%% even if -metaprefix is something different than %%).}
      {%</myblock>}
      { using*strange@programming<language>}
      {end}
      {begin}
      {end} ""
    } \n
The final piece of docstrip syntax is that extraction stops at a line that is exactly "\endinput"; this is often used to avoid copying random whitespace at the end of a file. In the unlikely case that one wants such a code line, one can protect it with a verbatim guard.
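The effect of "\endinput", and of protecting it with a verbatim guard, can be sketched as follows. The line contents and the end tag END are made up for illustration.

```tcl
package require docstrip

# Everything after the exact line "\endinput" is ignored.
set text [join {
    {first code line}
    {\endinput}
    {this line is never extracted}
} \n]
set stopped [docstrip::extract $text {}]

# Inside a verbatim block the same line is copied as ordinary code.
# END is an arbitrary, made-up end tag.
set text [join {
    {%<<END}
    {\endinput}
    {%END}
    {code after the verbatim block}
} \n]
set protected [docstrip::extract $text {}]

puts -nonewline $stopped
puts -nonewline $protected
```

In the first call only "first code line" should be extracted; in the second, the "\endinput" line is treated as code and extraction continues past it.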
COMMANDS
The package defines two commands.
- docstrip::extract text terminals ?option value ...?
- The extract command docstrips the text and returns the extracted lines of code, as a string with each line terminated with a newline. The terminals is the list of those guard expression terminals which should evaluate to true. The available options are:
- -annotate lines
- Requests the specified number of lines of annotation to follow each extracted line in the result. Defaults to 0. Annotation lines are mostly useful when the extracted lines are to undergo some further transformation. A first annotation line is a list of three elements: line type, prefix removed in extraction, and prefix inserted in extraction. The line type is one of: 'V' (verbatim), 'M' (metacomment), '+' (+ or no modifier guard line), '-' (- modifier guard line), '.' (normal line). A second annotation line is the source line number. A third annotation line is the current stack of block guards. Requesting more than three lines of annotation is currently not supported.
- -metaprefix string
- The string by which the '%%' prefix of a metacomment line will be replaced. Defaults to '%%'. For Tcl code this would typically be '#'.
- -onerror keyword
- Controls what will be done when a format error in the text being processed is detected. The settings are:
- ignore
- Just ignore the error; continue as if nothing happened.
- puts
- Write an error message to stderr, then continue processing.
- throw
- Throw an error. The -errorcode is set to a list whose first element is DOCSTRIP, second element is the type of error, and third element is the line number where the error is detected. This is the default.
- -trimlines boolean
- Controls whether spaces at the end of a line should be trimmed away before the line is processed. Defaults to true.
It should be remarked that the terminals are often called "options" in the context of the docstrip program, since these specify which optional code fragments should be included.
- docstrip::sourcefrom filename terminals ?option value ...?
- The sourcefrom command is a docstripping emulation of source. It opens the file filename, reads it, closes it, docstrips the contents as specified by the terminals, and evaluates the result in the local context of the caller, during which time the info script value will be the filename. The options are passed on to fconfigure to configure the file before its contents are read. The -metaprefix is set to '#'; all other extract options have their default values.
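A minimal sketch of both commands in use follows. The master text, the terminals pkg and test, and the proc greet are all made up for illustration, and the temporary file is created with file tempfile, which assumes Tcl 8.6 (any writable path would do on older versions).

```tcl
package require docstrip

# 1. Catching a format error from extract (-onerror throw is the default).
#    The block guards below are deliberately mismatched.
set bad [join {
    {%<*foo>}
    {some code}
    {%</bar>}
    {%</foo>}
} \n]
if {[catch {docstrip::extract $bad {foo}} msg]} {
    set code $::errorCode
    puts "extract failed: $msg"
    puts "class: [lindex $code 0], type: [lindex $code 1], line: [lindex $code 2]"
}

# 2. Sourcing a master file. File contents are hypothetical.
set chan [file tempfile path]   ;# assumes Tcl 8.6
puts $chan [join {
    {% This comment line is stripped before evaluation.}
    {%<*pkg>}
    {proc greet {name} {return "Hello, $name"}}
    {%</pkg>}
    {%<*test>}
    {error "test-only code should not be evaluated"}
    {%</test>}
} \n]
close $chan

# Only the code guarded by 'pkg' is evaluated in the caller's context.
docstrip::sourcefrom $path {pkg}
puts [greet world]
file delete $path
```

Passing -onerror puts or -onerror ignore to extract would let processing continue past the mismatched guard instead of raising an error.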
DOCUMENT STRUCTURE
The file format (as described above) determines whether a master source code file can be processed correctly by docstrip, but the usefulness of the format also depends in no small part on the code and comment lines together constituting a well-formed document.
For a document format that does not require any non-Tcl software, see the ddt2man command in the docstrip::util package. It is suggested that files employing that document format are given the suffix ".ddt", to distinguish them from the more traditional LaTeX-based ".dtx" files.
Master source files with ".dtx" extension are usually set up so that they can be typeset directly by latex without any support from other files. This is achieved by beginning the file with the lines
    % \iffalse
    %<*driver>
    \documentclass{tclldoc}
    \begin{document}
    \DocInput{filename.dtx}
    \end{document}
    %</driver>
    % \fi
When latex first reads the file, the lines beginning with a percent character are ordinary TeX comments, so only the driver code is executed. The \DocInput command then reads the file a second time, now with the percent character ignored. The function of the \iffalse ... \fi is to skip lines two to seven on this second time through; this is similar to the "if 0 { ... }" idiom for block comments in Tcl code, and it is needed here because (amongst other things) the \documentclass command may only be executed once. The function of the <driver> guards is to prevent this short piece of LaTeX code from being extracted by docstrip. The total effect is that the file can function both as a LaTeX document and as a docstrip master source code file.
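The role of the <driver> guards can be checked from Tcl. The following sketch feeds the header lines shown above to docstrip::extract, once without any terminals and once with the driver terminal; filename.dtx is the placeholder name used in the header.

```tcl
package require docstrip

# The driver header; filename.dtx is a placeholder file name.
set header [join {
    {% \iffalse}
    {%<*driver>}
    {\documentclass{tclldoc}}
    {\begin{document}}
    {\DocInput{filename.dtx}}
    {\end{document}}
    {%</driver>}
    {% \fi}
} \n]

# Without the driver terminal none of these lines are extracted.
set plain [docstrip::extract $header {}]
# With it, only the four LaTeX lines inside the block come out.
set driver [docstrip::extract $header {driver}]

puts "without driver: [string length $plain] characters"
puts -nonewline $driver
```

As long as driver is not among the terminals used when extracting code, the header contributes nothing to the generated source.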
It is not necessary to use the tclldoc document class, but that does provide a number of features that are convenient for ".dtx" files containing Tcl code. More information on this matter can be found in the references above.
SEE ALSO
docstrip_util
KEYWORDS
.dtx, LaTeX, docstrip, documentation, literate programming, source
CATEGORY
Documentation tools
COPYRIGHT
Copyright (c) 2003–2010 Lars Hellström <Lars dot Hellstrom at residenset dot net>