OODoc::Parser::Markov

The Markov parser is named after its author, because the author likes to invite other people to write their own parsers as well: everyone has not only their own coding style, but also their own documentation wishes.

The task of the parser is to split Perl package files into a code part and a documentation tree. The code is written to a directory where the module distribution is built; the documentation tree is later formatted into manual pages.

See DESCRIPTION in OODoc::Parser

DETAILS

General Description

The Markov parser has much in common with the standard POD syntax. You can use the same tags as defined by POD, but those tags are "visual style", which means that OODoc cannot treat them in a smart way. The Markov parser adds many logical markups, which produce nicer pages.

Furthermore, the parser will remove the documentation from the source code, because otherwise the package installation would fail: Perl's default installation behavior will extract POD from packages, but the markup is not really POD, which will cause many complaints.

The version of the module is defined by the OODoc object which creates the manual page. Therefore, $VERSION will be added to each package automatically.

Disadvantages

The Markov parser removes all raw documentation from the package files, which means that people sending you patches will base them on the processed source: the line numbers will be wrong. Usually, it is not much of a problem to manually process the patch: you have to check the correctness anyway.

A second disadvantage is that you have to back up your sources separately: the sources differ from what is published on CPAN, so CPAN is not your backup anymore. The example scripts, contained in the distribution, show how to produce these "raw" packages.

Finally, a difference from the standard POD process: the manual page must be preceded by a package keyword.
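
As a minimal sketch of that last point, a raw source file could start as shown here. The package name My::Module and the description text are hypothetical; only the order (package keyword first, documentation after it) is taken from the text above.

 package My::Module;

 =chapter METHODS
 Descriptive text and subroutine documentation follow here.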

Structural tags

Heading

 =chapter       STRING
 =section       STRING
 =subsection    STRING
 =subsubsection STRING

These text structures are used to group descriptive text and subroutines. You can use any name for a chapter, but the formatter expects certain names to be used: if you use a name which is not expected by the formatter, that documentation will be ignored.

Subroutines

Perl has many kinds of subroutines, which are distinguished in the logical markup. The output may be different per kind.

 =i_method  NAME PARAMETERS   (instance method)
 =c_method  NAME PARAMETERS   (class method)
 =ci_method NAME PARAMETERS   (class and instance method)
 =method    NAME PARAMETERS   (short for i_method)
 =function  NAME PARAMETERS
 =tie       NAME PARAMETERS
 =overload  STRING

The NAME is the name of the subroutine, and the PARAMETERS an argument indicator.

The subroutine description follows the tag. After this general description of the subroutine, you can use the following tags:

 =option    NAME PARAMETERS
 =default   NAME VALUE
 =requires  NAME PARAMETERS

If you have defined an =option, you have to provide a =default for this option somewhere. Use of =default for an option on a higher level will overrule the one in a subclass.
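
A minimal sketch of how these tags might be combined for one subroutine. The option names name and verbose, their parameter descriptions, and the descriptive text are hypothetical. The sketch also assumes that =requires marks an option which the caller must supply, which is why it is shown without a =default; that reading is not confirmed by this page.

 =c_method new OPTIONS
 Create a new object.

 =requires name STRING
 The name of the new object.

 =option  verbose BOOLEAN
 =default verbose 0
 Print progress messages when true.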

Include examples

Examples can be added to chapters, sections, subsections, subsubsections, and subroutines. They run until the next markup line, so they can only come at the end of the documentation pieces.

 =example
 =examples

Include diagnostics

A subroutine description can also contain error or warning descriptions. These diagnostics are usually collected into a special chapter of the manual page.

 =error this is very wrong
 Of course this is not really wrong, but only as an example
 how it works.

 =warning wrong, but not sincerely
 Warning message, which means that the program can create correct output
 even though it found something wrong.

Compatibility

For convenience, all POD markups are supported as well:

 =head1 Heading Text   (same as =chapter)
 =head2 Heading Text   (same as =section)
 =head3 Heading Text   (same as =subsection)
 =head4 Heading Text   (same as =subsubsection)
 =over indentlevel
 =item stuff

 =cut
 =pod
 =begin format
 =end format
 =for format text...

Text markup

Next to the structural markup, there is textual markup. This markup is the same as POD defines in the perlpod manual page. For instance, C<some code> can be used to create visual markup as a code fragment.

One kind is added to the standard list: the M.

The M-link

The M-link can not be nested inside other text markup items. It is used to refer to manuals, subroutines, and options. You can use an L-link to manuals as well, however then the POD output filter will modify the manual page while converting it to other manual formats.

Syntax of the M-link:

 M < OODoc::Object >
 M < OODoc::Object::new() >
 M < OODoc::Object::new(verbose) >
 M < new() >
 M < new(verbose) >

The first three links refer to a manual page, a subroutine within a manual page, and an option of a subroutine, respectively. The last two are abbreviations: they refer to subroutines of the same manual page, in which case you may refer to inherited documentation as well.

The L-link

Standard POD defines an L markup tag. This can also be used with the Markov parser.

The following syntaxes are supported:

 L < manual >
 L < manual/section >
 L < manual/"section" >
 L < manual/subsection >
 L < manual/"subsection" >
 L < /section >
 L < /"section" >
 L < /subsection >
 L < /"subsection" >
 L < "section" >
 L < "subsection" >
 L < "subsubsection" >
 L < unix-manual >
 L < url >

In the above, manual is the name of a manual, section the name of any section (in that manual, by default the current manual), and subsection a subsection (in that manual, by default the current manual).

The unix-manual MUST be formatted with its chapter number, for instance cat(1), otherwise a link will be created. See the following examples in the html version of these manual pages:

 M < perldoc >              illegal: not in distribution
 L < perldoc >              manual perldoc
 L < perldoc(1perl) >       manual perldoc(1perl)
 M < OODoc::Object >        OODoc::Object
 L < OODoc::Object >        OODoc::Object
 L < OODoc::Object(3pm) >   manual OODoc::Object(3pm)

Grouping subroutines

Subroutine descriptions can be grouped in a chapter, section, subsection, or subsubsection. It is very common to have a large number of subroutines, so some structure has to be imposed here.

If you document the same routine in more than one manual page with an inheritance relationship, the documentation location shall not conflict. You do not need to give the same level of detail about the exact location of a subroutine, as long as it is not conflicting. This relative freedom is created to be able to regroup existing documentation without too much effort.

For instance, in the code of OODoc itself (which is of course documented with OODoc), the following happens:

 package OODoc::Object;
 ...
 =chapter METHODS
 =section Initiation
 =c_method new OPTIONS

 package OODoc;
 use base 'OODoc::Object';
 =chapter METHODS
 =c_method new OPTIONS

As you can see in the example, in the higher level of inheritance, the new method is not put in the Initiation section explicitly. However, it is located in the METHODS chapter, which is required to correspond to the base class. The generated documentation will show new in the Initiation section in both manual pages.

Caveats

The Markov parser does not require blank lines before or after tags, like POD does. This means that the chance of running into parsing problems has increased: lines within here documents which start with a = will cause confusion. However, in these cases, you can usually simply add a backslash in front of the printed =, which will disappear once printed.
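
A minimal sketch of the backslash workaround. The table text is hypothetical, and an interpolating here document is assumed, because only there does Perl turn \= into a plain = when the text is printed.

 # Without the backslash, the second line of the here document would
 # start with "=" and could be mistaken for a documentation tag.
 print <<__TABLE;
 name   value
 \===== =====
 __TABLE

In a non-interpolating here document (for instance <<'__TABLE') the backslash would be printed literally, so this trick does not apply there.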

Examples

You may also take a look at the raw code archive for OODoc (the text as is before it was processed for distribution).

» Example: how subroutines are documented

 =chapter FUNCTIONS

 =function countCharacters FILE|STRING, OPTIONS
 Returns the number of bytes in the FILE or STRING,
 or undef if the string is undef or the character
 set unknown.

 =option  charset CHARSET
 =default charset 'us-ascii'
 Characters in, for instance, utf-8 or unicode encoding
 require variable number of bytes per character.  The
 correct CHARSET is needed for the correct result.

 =examples

   my $count = countCharacters("monkey");
   my $count = countCharacters("monkey",
       charset => 'utf-8');

 =error unknown character set $charset

 The character set you can use is limited by the sets
 defined by manual Encode.  The characters of the input can
 not be separated from each other without this definition.

 =cut

 # now the coding starts
 sub countCharacters($@) {
    my ($input, %options) = @_;   # a plain function: no $self
    ...
 }