Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
SGMLS(1)		    General Commands Manual		      SGMLS(1)

NAME
       sgmls - a validating SGML parser

       An  System Conforming to
       International Standard ISO 8879 --
       Standard	Generalized Markup Language

SYNOPSIS
       sgmls [ -deglprsuv ] [ -cfile ] [ -iname	] [ -mfile ] [ filename...  ]

DESCRIPTION
       Sgmls  parses  and  validates  the  document entity in filename...  and
       prints on the standard output a simple ASCII representation of its Ele-
       ment  Structure	Information Set.  (This	is the information set which a
       structure-controlled conforming	application should  act	 upon.)	  Note
       that the	document entity	may be spread amongst several files; for exam-
       ple, the	SGML declaration, document type	declaration and	 document  in-
       stance set could	each be	in a separate file.  If	no filenames are spec-
       ified, then sgmls will read the document	entity from the	 standard  in-
       put.  A filename	of - can also be used to refer to the standard input.

       The following options are available:

       -cfile Report  any capacity limits that are exceeded and	write a	report
	      of capacity usage	to file.  The report is	in  the	 format	 of  a
	      RACT  result.   RACT  is	the Reference Application for Capacity
	      Testing defined in the Proposed American National	Standard  Con-
	      formance	Testing	for Standard Generalized Markup	Language (SGL)
	      Systems (X3.190-199X), Draft July	1991.

       -d     Warn about duplicate entity declarations.

       -e     Describe open entities in	error messages.	 Error messages	always
	      include  the  position  of the most recently opened external en-
	      tity.

       -g     Show the GIs of open elements in error messages.

       -iname Pretend that

		     <!ENTITY %	name "INCLUDE">

	      occurs at	the start of the document type declaration  subset  in
	      the   document  entity.  Since repeated definitions of an	entity
	      are ignored, this	definition will	take precedence	over any other
	      definitions  of  this  entity  in	the document type declaration.
	      Multiple -i options are allowed.	If the	 declaration  replaces
	      the reserved name	INCLUDE	then the new reserved name will	be the
	      replacement text of the entity.	Typically  the	document  type
	      declaration will contain

		     <!ENTITY %	name "IGNORE">

	      and  will	 use  %name;  in the status keyword specification of a
	      marked section declaration.  In this case	the effect of the  op-
	      tion will	be to cause the	marked section not to be ignored.

       -l     Output L commands	giving the current line	number and filename.

       -mfile Map  public  identifiers	and entity names to system identifiers
	      using the	catalog	entry file file.  Multiple -m options are  al-
	      lowed.  Catalog entry files specified with the -m	option will be
	      searched before the defaults.

       -p     Parse only the prolog.  Sgmls will exit after parsing the	 docu-
	      ment type	declaration.  Implies -s.

       -r     Warn about defaulted references.

       -s     Suppress output.	Error messages will still be printed.

       -u     Warn  about undefined elements: elements used in the DTD but not
	      defined.

       -v     Print the	version	number.

   Entity Manager
       An external entity resides in one or more files.	  The  entity  manager
       component of sgmls maps a sequence of files into	an entity in three se-
       quential	stages:

       1.     each carriage return character is	turned into a non-SGML charac-
	      ter;

       2.     each  newline  character	is turned into a record	end character,
	      and at the same time a record start character is inserted	at the
	      beginning	of each	line;

       3.     the files	are concatenated.

       A  system identifier is interpreted as a	list of	filenames separated by
       colons.	A filename of -	can be used to refer to	the standard input.

       If a system identifier is not specified,	then the  entity  manager  can
       generate	 one  using  catalog  entry files in the format	defined	in the
       SGML Open Draft Technical Resolution on Entity Management.   A  catalog
       entry  file contains a sequence of entries in one of the	following four
       forms:

       PUBLIC pubid sysid
	      This specifies that sysid	should be used as the  system  identi-
	      fier  if	the the	public identifier is pubid.  Sysid is a	system
	      identifier as defined in ISO 8879	and pubid is a public  identi-
	      fier as defined in ISO 8879.

       ENTITY name sysid
	      This  specifies  that sysid should be used as the	system identi-
	      fier if the entity is a general entity whose name	is name.

       ENTITY %name sysid
	      This specifies that sysid	should be used as the  system  identi-
	      fier  if	the  entity  is	a parameter entity whose name is name.
	      Note that	there is no space between the %	and the	name.

       DOCTYPE name sysid
	      This specifies that sysid	should be used as the  system  identi-
	      fier if the entity is an entity declared in a document type dec-
	      laration whose document type name	is name.

       The last	two forms are extensions to the	SGML Open format.  The	delim-
       iters  can  be  omitted from the	sysid provided it does not contain any
       white space.  Comments are allowed between parameters delimited	by  --
       as  in  SGML.   The  environment	variable SGML_CATALOG_FILES contains a
       colon-separated list of catalog entry files.  These  will  be  searched
       after  any  catalog entry files specified using the -m option.  If this
       environment variable is not set,	then a system dependent	list of	 cata-
       log  entry  files  will be used.	 A match in a catalog entry file for a
       PUBLIC entry will take precedence over a	match in the same file for  an
       ENTITY  or DOCTYPE entry.  A filename in	a system identifier in a cata-
       log entry file is interpreted relative to the directory containing  the
       catalog entry file.

       If  no match can	be found in a catalog entry file, then the entity man-
       ager will attempt to generate a filename	using  the  public  identifier
       (if  there  is  one)  and  other	information available to it.  Notation
       identifiers are not subject to this treatment.  This  process  is  con-
       trolled	by  the	environment variable SGML_PATH;	this contains a	colon-
       separated list of filename templates.  A	filename template is  a	 file-
       name  that may contain substitution fields; a substitution field	is a %
       character followed by a single letter that indicates the	value  of  the
       substitution.  The value	of a substitution can either be	a string or it
       can be null.  The entity	manager	transforms the list of	filename  tem-
       plates  into  a list of filenames by substituting for each substitution
       field and discarding any	template that contained	a  substitution	 field
       whose  value  was null.	It then	uses the first resulting filename that
       exists and is readable.	Substitution values are	transformed before be-
       ing  used for substitution: firstly, any	names that were	subject	to up-
       per case	substitution are folded	to lower case; secondly, space charac-
       ters are	mapped to underscores and slashes are mapped to	percents.  The
       value of	the %S field is	not transformed.  The values  of  substitution
       fields are as follows:

       %%     A	single %.

       %D     The entity's data	content	notation.  This	substitution will suc-
	      ceed only	for external data entities.

       %N     The entity, notation or document type name.

       %P     The public identifier if there was a public  identifier,	other-
	      wise null.

       %S     The system identifier if there was a system identifier otherwise
	      null.

       %X     (This is provided	mainly for  compatibility  with	 ARCSGML.)   A
	      three-letter string chosen as follows:
					 |	      |
					 |	      |	With public identifier
					 |	      +-------------+-----------
					 | No public  |	  Device    |  Device
					 | identifier |	independent | dependent
	      ---------------------------+------------+-------------+-----------
	      Data or subdocument entity | nsd	      |	pns	    | vns
	      General SGML text	entity	 | gml	      |	pge	    | vge
	      Parameter	entity		 | spe	      |	ppe	    | vpe
	      Document type definition	 | dtd	      |	pdt	    | vdt
	      Link process definition	 | lpd	      |	plp	    | vlp

	      The  device  dependent  version  is  selected if the public text
	      class allows a public text display version but  no  public  text
	      display version was specified.

       %Y     The type of thing	for which the filename is being	generated:

	      SGML subdocument entity	 sgml
	      Data entity		 data
	      General text entity	 text
	      Parameter	entity		 parm
	      Document type definition	 dtd
	      Link process definition	 lpd

       The  value  of  the following substitution fields will be null unless a
       valid formal public identifier was supplied.

       %A     Null if the text identifier in the formal	public identifier con-
	      tains an unavailable text	indicator, otherwise the empty string.

       %C     The public text class, mapped to lower case.

       %E     The  public  text	 designating sequence (escape sequence)	if the
	      public text class	is CHARSET, otherwise null.

       %I     The empty	string if the owner identifier in  the	formal	public
	      identifier is an ISO owner identifier, otherwise null.

       %L     The  public text language, mapped	to lower case, unless the pub-
	      lic text class is	CHARSET, in which case null.

       %O     The owner	identifier (with the +// or -//	prefix stripped.)

       %R     The empty	string if the owner identifier in  the	formal	public
	      identifier is a registered owner identifier, otherwise null.

       %T     The public text description.

       %U     The  empty  string  if the owner identifier in the formal	public
	      identifier is an unregistered owner identifier, otherwise	null.

       %V     The public text display version.	This substitution will be null
	      if  the public text class	does not allow a display version or if
	      no version was specified.	 If an empty version was specified,  a
	      value of default will be used.

       Normally	 if  the  external  identifier for an entity includes a	system
       identifier, the entity manager will use the specified system identifier
       and  not	 attempt  to generate one.  If,	however, SGML_PATH uses	the %S
       field, then the entity manager will first search	for a  matching	 entry
       in  the	catalog	 entry	files.	If a match is found, then this will be
       used instead of the specified system  identifier.   Otherwise,  if  the
       specified  system  identifier  does  not	contain	any colons, the	entity
       manager will use	SGML_PATH to generate a	filename.  Otherwise  the  en-
       tity manager will use the specified system identifier.

   System declaration
       The system declaration for sgmls	is as follows:

			  SYSTEM "ISO 8879:1986"
				  CHARSET
       BASESET	"ISO 646-1983//CHARSET
		 International Reference Version (IRV)//ESC 2/5	4/0"
       DESCSET	0 128 0
       CAPACITY	PUBLIC	"ISO 8879:1986//CAPACITY Reference//EN"
				 FEATURES
       MINIMIZE	DATATAG	NO  OMITTAG  YES   RANK	    NO	SHORTTAG YES
       LINK	SIMPLE	NO  IMPLICIT NO	   EXPLICIT NO
       OTHER	CONCUR	NO  SUBDOC   YES 1 FORMAL   YES
       SCOPE	DOCUMENT
       SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Reference//EN"
       SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Core//EN"
				 VALIDATE
		GENERAL	YES MODEL    YES   EXCLUDE  YES	CAPACITY YES
		NONSGML	YES SGML     YES   FORMAL   YES
				   SDIF
		PACK	NO  UNPACK   NO

       Exceeding  a  capacity  limit  will  be ignored unless the -c option is
       given.

       The memory usage	of sgmls is not	a function of the capacity points used
       by  a  document;	 however,  sgmls  can  handle capacities significantly
       greater than the	reference capacity set.

       In some environments, higher values may be supported for	the SUBDOC pa-
       rameter.

       Documents  that	do  not	use optional features are also supported.  For
       example,	if FORMAL NO is	specified in the  declaration, public  identi-
       fiers will not be required to be	valid formal public identifiers.

       Certain parts of	the concrete syntax may	be changed:

	      The shunned character numbers can	be changed.

	      Eight bit	characters can be assigned to LCNMSTRT,	UCNMSTRT, LCN-
	      MCHAR and	UCNMCHAR.

	      Uppercase	substitution can be performed or  not  performed  both
	      for entity names and for other names.

	      Either  short reference delimiters assigned by the reference de-
	      limiter set or no	short reference	delimiters are supported.

	      The reserved names can be	changed.

	      The quantity set can be increased	within certain limits  subject
	      to  there	being sufficient memory	available.  The	upper limit on
	      NAMELEN is 239.  The upper limits	on ATTCNT, ATTSPLEN,  BSEQLEN,
	      ENTLVL,  LITLEN,	PILEN, TAGLEN, and TAGLVL are more than	thirty
	      times greater than the reference limits.	 The  upper  limit  on
	      GRPCNT, GRPGTCNT,	and GRPLVL is 253.  NORMSEP cannot be changed.
	      DTAGLEN are DTEMPLEN irrelevant since sgmls does not support the
	      DATATAG feature.

    declaration
       The   declaration may be	omitted, the following declaration will	be im-
       plied:

			     <!SGML "ISO 8879:1986"
				     CHARSET
       BASESET	"ISO 646-1983//CHARSET
		 International Reference Version (IRV)//ESC 2/5	4/0"
       DESCSET	  0  9 UNUSED
		  9  2	9
		 11  2 UNUSED
		 13  1 13
		 14 18 UNUSED
		 32 95 32
		127  1 UNUSED
       CAPACITY	PUBLIC	"ISO 8879:1986//CAPACITY Reference//EN"
       SCOPE	DOCUMENT
       SYNTAX	PUBLIC	"ISO 8879:1986//SYNTAX Reference//EN"
				    FEATURES
       MINIMIZE	DATATAG	NO OMITTAG  YES		 RANK	  NO  SHORTTAG YES
       LINK	SIMPLE	NO IMPLICIT NO		 EXPLICIT NO
       OTHER	CONCUR	NO SUBDOC   YES	99999999 FORMAL	  YES
				  APPINFO NONE>
       with the	exception that characters 128 through 254 will be assigned  to
       DATACHAR.

       Sgmls  identifies base character	sets using the designating sequence in
       the public identifier.  The following designating sequences are	recog-
       nized:

	 Designating	      ISO	  Minimum      Number
	    Escape	  Registration	 Character	 of		Description
	   Sequence	     Number	  Number     Characters
       ------------------------------------------------------------------------------------
       ESC 2/5 4/0	       -	     0		128	  full set of ISO 646 IRV
       ESC 2/8 4/0		2	    33		 94	  G0 set of ISO	646 IRV
       ESC 2/8 4/2		6	    33		 94	  G0 set of ASCII
       ESC 2/13	4/1	      100	    32		 96	  G1 set of ISO	8859-1
       ESC 2/1 4/0		1	     0		 32	  C0 set of ISO	646
       ESC 2/2 4/3	       77	     0		 32	  C1 set of ISO	6429
       ESC 2/5 2/15 3/0	       -	     0		256	  the system character set

       When one	of the G0 sets is used as a base set, the characters SPACE and
       DELETE are treated as occurring at positions 32 and  127	 respectively;
       although	these characters are not part of the character sets designated
       by the escape sequences,	this mimics the	behaviour of ISO 2022 with re-
       spect to	these code positions.

   Output format
       The  output is a	series of lines.  Lines	can be arbitrarily long.  Each
       line consists of	an initial command character and  one  or  more	 argu-
       ments.	Arguments  are separated by a single space, but	when a command
       takes a fixed number of arguments the last argument can contain spaces.
       There is	no space between the command character and the first argument.
       Arguments can contain the following escape sequences.

       \\     A	\.

       \n     A	record end character.

       \|     Internal SDATA entities are bracketed by these.

       \nnn   The character whose code is nnn octal.

       A record	start character	will be	represented by	\012.	Most  applica-
       tions will need to ignore \012 and translate \n into newline.

       The possible command characters and arguments are as follows:

       (gi    The start	of an element whose generic identifier is gi.  Any at-
	      tributes for this	element	will have been specified with  A  com-
	      mands.

       )gi    The end an element whose generic identifier is gi.

       -data  Data.

       &name  A	reference to an	external data entity name; name	will have been
	      defined using an E command.

       ?pi    A	processing instruction with data pi.

       Aname val
	      The next element to start	has an attribute name with  value  val
	      which takes one of the following forms:

	      IMPLIED
		     The value of the attribute	is implied.

	      CDATA data
		     The  attribute  is	 character data.  This is used for at-
		     tributes whose declared value is CDATA.

	      NOTATION nname
		     The attribute is a	notation name; nname  will  have  been
		     defined  using  a N command.  This	is used	for attributes
		     whose declared value is NOTATION.

	      ENTITY name...
		     The attribute is a	list of	general	 entity	 names.	  Each
		     entity  name  will	 have  been defined using an I,	E or S
		     command.  This is	used  for  attributes  whose  declared
		     value is ENTITY or	ENTITIES.

	      TOKEN token...
		     The  attribute is a list of tokens.  This is used for at-
		     tributes whose declared value is anything else.

       Dename name val
	      This is the same as the A	command, except	that  it  specifies  a
	      data  attribute  for an external entity named ename.  Any	D com-
	      mands will come after the	E command that defines the  entity  to
	      which  they apply, but before any	& or A commands	that reference
	      the entity.

       Nnname nname.  Define a notation	This command will be preceded by  a  p
	      command  if  the notation	was declared with a public identifier,
	      and by a s command if the	notation was declared  with  a	system
	      identifier.  A notation will only	be defined if it is to be ref-
	      erenced in an E command or in an A command for an	attribute with
	      a	declared value of NOTATION.

       Eename typ nname
	      Define an	external data entity named ename with type typ (CDATA,
	      NDATA or SDATA) and notation not.	 This command will be preceded
	      by  one or more f	commands giving	the filenames generated	by the
	      entity manager from the system and public	identifiers,  by  a  p
	      command  if a public identifier was declared for the entity, and
	      by a s command if	a system identifier was	declared for  the  en-
	      tity.   not  will	have been defined using	a N command.  Data at-
	      tributes may be specified	for the	entity using D	commands.   An
	      external	data entity will only be defined if it is to be	refer-
	      enced in a & command or in an A command for an  attribute	 whose
	      declared value is	ENTITY or ENTITIES.

       Iename typ text
	      Define  an internal data entity named ename with type typ	(CDATA
	      or SDATA)	and entity text	text.  An internal  data  entity  will
	      only  be	defined	if it is referenced in an A command for	an at-
	      tribute whose declared value is ENTITY or	ENTITIES.

       Sename Define a subdocument entity named	ename.	This command  will  be
	      preceded	by  one	or more	f commands giving the filenames	gener-
	      ated by the entity manager from the system  and  public  identi-
	      fiers,  by  a  p command if a public identifier was declared for
	      the entity, and by a s command if	a system  identifier  was  de-
	      clared  for  the	entity.	 A subdocument entity will only	be de-
	      fined if it is referenced	in a { command or in an	A command  for
	      an attribute whose declared value	is ENTITY or ENTITIES.

       ssysid This command applies to the next E, S or N command and specifies
	      the associated system identifier.

       ppubid This command applies to the next E, S or N command and specifies
	      the associated public identifier.

       ffilename
	      This command applies to the next E or S command and specifies an
	      associated filename.  There will be more than one	f command  for
	      a	single E or S command if the system identifier used a colon.

       {ename The start	of the	subdocument entity ename; ename	will have been
	      defined using a S	command.

       }ename The end of the  subdocument entity ename.

       Llineno file
       Llineno
	      Set the current line number and filename.	 The filename argument
	      will  be omitted if only the line	number has changed.  This will
	      be output	only if	the -l option has been given.

       #text  An APPINFO parameter of text was specified in the	  declaration.
	      This  is	not  strictly  part  of	the ESIS, but a	structure-con-
	      trolled application is permitted to act on  it.	No  #  command
	      will  be output if APPINFO NONE was specified.  A	# command will
	      occur at most once, and may be preceded only by a	single L  com-
	      mand.

       C      This command indicates that the document was a conforming	 docu-
	      ment.  If	this command is	output,	it will	be the	last  command.
	      An   document  is	 not conforming	if it references a subdocument
	      entity that is not conforming.

BUGS
       Some non-SGML characters	in literals are	counted	as two characters  for
       the purposes of quantity	and capacity calculations.

SEE ALSO
       The  Handbook, Charles F. Goldfarb
       ISO  8879 (Standard Generalized Markup Language), International Organi-
       zation for Standardization

ORIGIN
       ARCSGML was written by Charles F. Goldfarb.

       Sgmls was derived from ARCSGML by James Clark (jjc@jclark.com), to whom
       bugs should be reported.

								      SGMLS(1)

NAME | SYNOPSIS | DESCRIPTION | BUGS | SEE ALSO | ORIGIN

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=sgmls&sektion=1&manpath=FreeBSD+12.1-RELEASE+and+Ports>

home | help