MANDOC(3)		 BSD Library Functions Manual		     MANDOC(3)

NAME
     mandoc, deroff, mandocmsg,	man_mparse, man_validate, mdoc_validate,
     mparse_alloc, mparse_free,	mparse_getkeep,	mparse_keep, mparse_open,
     mparse_readfd, mparse_reset, mparse_result, mparse_strerror,
     mparse_strlevel --	mandoc macro compiler library

SYNOPSIS
     #include <sys/types.h>
     #include <mandoc.h>

     #define ASCII_NBRSP
     #define ASCII_HYPH
     #define ASCII_BREAK

     struct mparse *
     mparse_alloc(int options, enum mandoclevel	wlevel,	mandocmsg mmsg,
	 char *defos);

     void
     (*mandocmsg)(enum mandocerr errtype, enum mandoclevel level,
	 const char *file, int line, int col, const char *msg);

     void
     mparse_free(struct	mparse *parse);

     const char	*
     mparse_getkeep(const struct mparse	*parse);

     void
     mparse_keep(struct	mparse *parse);

     int
     mparse_open(struct	mparse *parse, const char *fname);

     enum mandoclevel
     mparse_readfd(struct mparse *parse, int fd, const char *fname);

     void
     mparse_reset(struct mparse	*parse);

     void
     mparse_result(struct mparse *parse, struct	roff_man **man,
	 char **sodest);

     const char	*
     mparse_strerror(enum mandocerr);

     const char	*
     mparse_strlevel(enum mandoclevel);

     #include <roff.h>

     void
     deroff(char **dest, const struct roff_node	*node);

     #include <sys/types.h>
     #include <mandoc.h>
     #include <mdoc.h>

     extern const char * const * mdoc_argnames;
     extern const char * const * mdoc_macronames;

     void
     mdoc_validate(struct roff_man *mdoc);

     #include <sys/types.h>
     #include <mandoc.h>
     #include <man.h>

     extern const char * const * man_macronames;

     const struct mparse *
     man_mparse(const struct roff_man *man);

     void
     man_validate(struct roff_man *man);

DESCRIPTION
     The mandoc	library	parses a UNIX manual into an abstract syntax tree
     (AST).  UNIX manuals are composed of mdoc(7) or man(7), and may be	mixed
     with roff(7), tbl(7), and eqn(7) invocations.

     The following describes a general parse sequence:

     1.	  initiate a parsing sequence with mchars_alloc(3) and mparse_alloc();

     2.	  open a file with open(2) or mparse_open();

     3.	  parse	it with	mparse_readfd();

     4.	  close	it with	close(2);

     5.	  retrieve the syntax tree with	mparse_result();

     6.	  depending on whether the macroset member of the returned struct
	  roff_man is MACROSET_MDOC or MACROSET_MAN, validate it with
	  mdoc_validate() or man_validate(), respectively;

     7.   iterate over the parse nodes, starting from the first member of
          the returned struct roff_man;

     8.	  free all allocated memory with mparse_free() and mchars_free(3), or
	  invoke mparse_reset()	and go back to step 2 to parse new files.

REFERENCE
     This section documents the	functions, types, and variables	available via
     <mandoc.h>, with the exception of those documented	in mandoc_escape(3)
     and mchars_alloc(3).

   Types
     enum mandocerr
     An	error or warning message during	parsing.

     enum mandoclevel
     A classification of an enum mandocerr as regards system operation.	 See
     the DIAGNOSTICS section in	mandoc(1) regarding the	meanings of the	lev-
     els.

     struct mparse
     An opaque handle for a running parse sequence.  Created with
     mparse_alloc() and freed with mparse_free().  It may be reused for
     several input files if mparse_reset() is called between parses.

     mandocmsg
     A prototype for a function	to handle error	and warning messages emitted
     by	the parser.
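
     A handler of this shape mostly has to turn its arguments into one
     readable line.  The following is a minimal sketch of such a formatting
     step, not the code from main.c.  The name format_diag() and the output
     layout are assumptions made for illustration, and the level is passed as
     a plain string (for example, one returned by mparse_strlevel()) so that
     the sketch compiles without the mandoc headers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper for a message handler: render one diagnostic
 * as "file:line:col: level: msg" into buf (capacity cap).
 * The level is a plain string here; a real handler could obtain it
 * from mparse_strlevel().  Returns the value of snprintf().
 */
int
format_diag(char *buf, size_t cap, const char *file, int line, int col,
    const char *level, const char *msg)
{
	assert(buf != NULL && level != NULL && msg != NULL);

	/* The file name may be unknown, so guard against NULL. */
	return snprintf(buf, cap, "%s:%d:%d: %s: %s",
	    file != NULL ? file : "<unknown>", line, col, level, msg);
}
```

     A callback passed as mmsg could call such a helper and write the result
     to standard error.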

   Functions
     deroff()
     Obtain a text-only	representation of a struct roff_node, including	text
     contained in its child nodes.  To be used on children of the first	member
     of	struct roff_man.  When it is no	longer needed, the pointer returned
     from deroff() can be passed to free(3).

     man_mparse()
     Get the parser used for the current output.  Declared in <man.h>, imple-
     mented in man.c.

     man_validate()
     Validate the MACROSET_MAN parse tree obtained with	mparse_result().  De-
     clared in <man.h>,	implemented in man.c.

     mdoc_validate()
     Validate the MACROSET_MDOC	parse tree obtained with mparse_result().  De-
     clared in <mdoc.h>, implemented in	mdoc.c.

     mparse_alloc()
     Allocate a	parser.	 The arguments have the	following effect:

	  options  When	the MPARSE_MDOC	or MPARSE_MAN bit is set, only that
		   parser is used.  Otherwise, the document type is automati-
		   cally detected.

		   When	the MPARSE_SO bit is set, roff(7) so file inclusion
		   requests are	always honoured.  Otherwise, if	the request is
		   the only content in an input	file, only the file name is
		   remembered, to be returned in the sodest argument of
		   mparse_result().

		   When the MPARSE_QUICK bit is set, parsing is aborted after
		   the NAME section.  This is useful, for example, in
		   makewhatis(8) -Q to quickly build minimal databases.

	  wlevel   Can be set to MANDOCLEVEL_BADARG, MANDOCLEVEL_ERROR,	or
		   MANDOCLEVEL_WARNING.	 Messages below	the selected level
		   will	be suppressed.

	  mmsg	   A callback function to handle errors	and warnings.  See
		   main.c for an example.  If printing of error	messages is
		   not desired,	NULL may be passed.

	  defos	   A default string for	the mdoc(7) `Os' macro,	overriding the
		   OSNAME preprocessor definition and the results of uname(3).
		   Passing NULL	sets no	default.

     The same parser may be used for multiple files so long as mparse_reset()
     is	called between parses.	mparse_free() must be called to	free the mem-
     ory allocated by this function.  Declared in <mandoc.h>, implemented in
     read.c.

     mparse_free()
     Free all memory allocated by mparse_alloc().  Declared in <mandoc.h>, im-
     plemented in read.c.

     mparse_getkeep()
     Return the keep buffer.  Must be preceded by a call to mparse_keep().
     Declared in <mandoc.h>, implemented in read.c.

     mparse_keep()
     Instruct the parser to retain a copy of its parsed	input.	This can be
     acquired with subsequent mparse_getkeep() calls.  Declared	in <mandoc.h>,
     implemented in read.c.

     mparse_open()
     Open the file for reading.  If that fails and fname does not already end
     in `.gz', try again after appending `.gz'.  Record whether the file is
     compressed.  Return a file descriptor open for reading, or -1 on
     failure.  The descriptor can be passed to mparse_readfd() or used
     directly.  Declared in <mandoc.h>, implemented in read.c.
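
     The fallback rule can be illustrated with plain open(2).  This is a
     sketch of the described behaviour only, not the implementation in
     read.c; the helper name open_with_gz_fallback() and its gzipped output
     parameter are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the mandoc library: open fname
 * read-only; if that fails and fname does not already end in ".gz",
 * retry with ".gz" appended.  *gzipped is set to 1 if the name that
 * was finally tried ends in ".gz", and to 0 otherwise.
 * Returns a file descriptor, or -1 on failure.
 */
int
open_with_gz_fallback(const char *fname, int *gzipped)
{
	size_t len;
	char *alt;
	int fd;

	assert(fname != NULL && gzipped != NULL);

	len = strlen(fname);
	*gzipped = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gzipped)
		return -1;	/* already a .gz name: nothing else to try */

	/* Build "<fname>.gz": len bytes, 3 for the suffix, 1 for NUL. */
	if ((alt = malloc(len + 4)) == NULL)
		return -1;
	snprintf(alt, len + 4, "%s.gz", fname);
	fd = open(alt, O_RDONLY);
	free(alt);
	if (fd != -1)
		*gzipped = 1;
	return fd;
}
```

     The caller remains responsible for closing the descriptor with close(2).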

     mparse_readfd()
     Parse a file descriptor opened with open(2) or mparse_open().  Pass the
     associated	filename in fname.  This function may be called	multiple times
     with different parameters;	however, close(2) and mparse_reset() should be
     invoked between parses.  Declared in <mandoc.h>, implemented in read.c.

     mparse_reset()
     Reset a parser so that mparse_readfd() may	be used	again.	Declared in
     <mandoc.h>, implemented in	read.c.

     mparse_result()
     Obtain the	result of a parse.  One	of the two pointers will be filled in.
     Declared in <mandoc.h>, implemented in read.c.

     mparse_strerror()
     Return a statically-allocated string representation of an error code.
     Declared in <mandoc.h>, implemented in read.c.

     mparse_strlevel()
     Return a statically-allocated string representation of a level code.  De-
     clared in <mandoc.h>, implemented in read.c.

   Variables
     man_macronames
     The string	representation of a man(7) macro as indexed by enum mant.

     mdoc_argnames
     The string	representation of an mdoc(7) macro argument as indexed by enum
     mdocargt.

     mdoc_macronames
     The string	representation of an mdoc(7) macro as indexed by enum mdoct.

IMPLEMENTATION NOTES
     This section consists of structural documentation for mdoc(7) and man(7)
     syntax trees and strings.

   Man and Mdoc	Strings
     Strings may be extracted from mdoc	and man	meta-data, or from text	nodes
     (MDOC_TEXT	and MAN_TEXT, respectively).  These strings have special non-
     printing formatting cues embedded in the text itself, as well as roff(7)
     escapes preserved from input.  Implementing systems will need to handle
     both situations to	produce	human-readable text.  In general, strings may
     be	assumed	to consist of 7-bit ASCII characters.

     The following non-printing	characters may be embedded in text strings:

     ASCII_NBRSP
	     A non-breaking space character.

     ASCII_HYPH
	     A soft hyphen.

     ASCII_BREAK
	     A breakable zero-width space.

     Escape characters are also	passed verbatim	into text strings.  An escape
     character is a sequence of	characters beginning with the backslash	(`\').
     To	construct human-readable text, these should be intercepted with
     mandoc_escape(3) and converted with one of the functions described in
     mchars_alloc(3).
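
     Handling the three non-printing cues might look like the following
     sketch for plain-text output.  The helper name strip_format_cues() is
     hypothetical, the numeric fallback values are assumptions that only
     apply when <mandoc.h> is not included, and rendering the soft hyphen as
     `-' is a choice made for this example.  Backslash escapes are left
     untouched; those are the job of mandoc_escape(3).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed values: <mandoc.h> defines these three macros.  The numbers
 * below are only fallbacks so that this sketch compiles on its own.
 */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP 31	/* non-breaking space */
#endif
#ifndef ASCII_HYPH
#define ASCII_HYPH  30	/* soft hyphen */
#endif
#ifndef ASCII_BREAK
#define ASCII_BREAK 29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper: rewrite s in place for plain-text output.
 * Returns the length of the resulting string.
 */
size_t
strip_format_cues(char *s)
{
	char *src, *dst;

	assert(s != NULL);
	for (src = dst = s; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			break;	/* zero width: emit nothing */
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

     A formatter that performs its own line filling would treat the three
     cues differently instead of flattening them, for example by refusing to
     break a line at ASCII_NBRSP.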

   Man Abstract	Syntax Tree
     This AST is governed by the ontological rules dictated in man(7) and de-
     rives its terminology accordingly.

     The AST is	composed of struct roff_node nodes with	element, root and text
     types as declared by the type field.  Each	node also provides its parse
     point (the	line, pos, and sec fields), its	position in the	tree (the
     parent, child, next and prev fields) and some type-specific data.

     The tree itself is	arranged according to the following normal form, where
     capitalised non-terminals represent nodes.

     ROOT	<- mnode+
     mnode	<- ELEMENT | TEXT | BLOCK
     BLOCK	<- HEAD	BODY
     HEAD	<- mnode*
     BODY	<- mnode*
     ELEMENT	<- ELEMENT | TEXT*
     TEXT	<- [[:ascii:]]*

     The only elements capable of nesting other	elements are those with	next-
     line scope	as documented in man(7).
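
     A tree in this normal form can be walked depth first through the child
     and next fields.  The sketch below uses a stand-in structure, struct
     demo_node, that mirrors only the fields needed here; it is not the real
     struct roff_node, and collect_text() is a hypothetical name.  It gathers
     the strings of all TEXT nodes in document order, a much simplified
     analogue of what deroff() provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct roff_node, reduced to the fields used here. */
enum demo_type { DEMO_ROOT, DEMO_ELEMENT, DEMO_TEXT };

struct demo_node {
	enum demo_type	  type;
	const char	 *string;	/* text content, TEXT nodes only */
	struct demo_node *child;	/* first child, or NULL */
	struct demo_node *next;		/* next sibling, or NULL */
};

/*
 * Visit n and its following siblings depth first, in document order,
 * and append the string of every TEXT node to buf.  buf has capacity
 * cap and must already hold a NUL-terminated string.  Strings are
 * separated by one space; output that does not fit is truncated.
 */
void
collect_text(const struct demo_node *n, char *buf, size_t cap)
{
	size_t used;

	assert(buf != NULL && cap > 0);
	for (; n != NULL; n = n->next) {
		if (n->type == DEMO_TEXT && n->string != NULL) {
			used = strlen(buf);
			if (used > 0 && used + 1 < cap) {
				buf[used++] = ' ';
				buf[used] = '\0';
			}
			strncat(buf, n->string, cap - used - 1);
		}
		/* Descend before moving on to the next sibling. */
		collect_text(n->child, buf, cap);
	}
}
```

     With the real library, the walk would start at the first member of the
     struct roff_man returned by mparse_result().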

   Mdoc	Abstract Syntax	Tree
     This AST is governed by the ontological rules dictated in mdoc(7) and de-
     rives its terminology accordingly.	 "In-line" elements described in
     mdoc(7) are described simply as "elements".

     The AST is	composed of struct roff_node nodes with	block, head, body, el-
     ement, root and text types	as declared by the type	field.	Each node also
     provides its parse	point (the line, pos, and sec fields), its position in
     the tree (the parent, child, last,	next and prev fields) and some type-
     specific data, in particular, for nodes generated from macros, the	gener-
     ating macro in the	tok field.

     The tree itself is	arranged according to the following normal form, where
     capitalised non-terminals represent nodes.

     ROOT	<- mnode+
     mnode	<- BLOCK | ELEMENT | TEXT
     BLOCK	<- HEAD	[TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
     ELEMENT	<- TEXT*
     HEAD	<- mnode*
     BODY	<- mnode* [ENDBODY mnode*]
     TAIL	<- mnode*
     TEXT	<- [[:ascii:]]*

     Of	note are the TEXT nodes	following the HEAD, BODY and TAIL nodes	of the
     BLOCK production: these refer to punctuation marks.  Furthermore, al-
     though a TEXT node	will generally have a non-zero-length string, in the
     specific case of `.Bd -literal', an empty line will produce a zero-length
     string.  Multiple body parts are only found in invocations	of `Bl
     -column', where a new body	introduces a new phrase.

     The mdoc(7) syntax tree accommodates broken block structures as well.
     The ENDBODY node is available to end the formatting associated with a
     given block before	the physical end of that block.	 It has	a non-null end
     field, is of the BODY type, has the same tok as the BLOCK it is ending,
     and has a pending field pointing to that BLOCK's BODY node.  It is	an in-
     direct child of that BODY node and	has no children	of its own.

     An	ENDBODY	node is	generated when a block ends while one of its child
     blocks is still open, like	in the following example:

	   .Ao ao
	   .Bo bo ac
	   .Ac bc
	   .Bc end

     This example results in the following block structure:

	   BLOCK Ao
	       HEAD Ao
	       BODY Ao
		   TEXT	ao
		   BLOCK Bo, pending ->	Ao
		       HEAD Bo
		       BODY Bo
			   TEXT	bo
			   TEXT	ac
			   ENDBODY Ao, pending -> Ao
			   TEXT	bc
	   TEXT	end

     Here, the formatting of the `Ao' block extends from TEXT ao to TEXT ac,
     while the formatting of the `Bo' block extends from TEXT bo to TEXT bc.
     It	renders	as follows in -Tascii mode:

	   <ao [bo ac> bc] end

     Support for badly-nested blocks is	only provided for backward compatibil-
     ity with some older mdoc(7) implementations.  Using badly-nested blocks
     is	strongly discouraged; for example, the -Thtml and -Txhtml front-ends
     to	mandoc(1) are unable to	render them in any meaningful way.  Further-
     more, behaviour when encountering badly-nested blocks is not consistent
     across troff implementations, especially when using multiple levels of
     badly-nested blocks.

SEE ALSO
     mandoc(1),	man.cgi(3), mandoc_escape(3), mandoc_headers(3),
     mandoc_malloc(3), mansearch(3), mchars_alloc(3), tbl(3), eqn(7), man(7),
     mandoc_char(7), mdoc(7), roff(7), tbl(7)

AUTHORS
     The mandoc	library	was written by Kristaps	Dzonsons <kristaps@bsd.lv> and
     is	maintained by Ingo Schwarze <schwarze@openbsd.org>.

BSD				 July 7, 2016				   BSD
