Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
TAGS(5)				Universal Ctags			       TAGS(5)

NAME
       tags - Vi tags file format extended in ctags projects

DESCRIPTION
       The  contents  of  next	section	 is a copy of FORMAT file in Exuberant
       Ctags source code in its	subversion repository at sourceforge.net.

       Exceptions introduced in	Universal Ctags	are explained inline with "EX-
       CEPTION"	marker.

					----

PROPOSAL FOR EXTENDED VI TAGS FILE FORMAT
       Version:	0.06 DRAFT
       Date: 1998 Feb 8
       Author: Bram Moolenaar <Bram at vim.org>	and Darren Hiebert <dhiebert at	users.sourceforge.net>

   Introduction
       The  file format	for the	"tags" file, as	used by	Vi and many of its de-
       scendants, has limited capabilities.

       This additional functionality is	desired:

       1. Static or local tags.	 The scope of these tags  is  the  file	 where
	  they are defined.  The same tag can appear in	several	files, without
	  really being a duplicate.

       2. Duplicate tags.  Allow the same tag to occur more then  once.	  They
	  can be located in a different	file and/or have a different command.

       3. Support  for C++.  A tag is not only specified by its	name, but also
	  by the context (the class name).

       4. Future extension.  When even more additional	functionality  is  de-
	  sired,  it must be possible to add this later, without breaking pro-
	  grams	that don't support it.

   From	proposal to standard
       To make this proposal into a standard for tags files, it	 needs	to  be
       supported  by most people working on versions of	Vi, ctags, etc..  Cur-
       rently this standard is supported by:

       Darren Hiebert <dhiebert	at users.sourceforge.net>
	      Exuberant	Ctags

       Bram Moolenaar <Bram at vim.org>
	      Vim (Vi IMproved)

       These have been or will be asked	to support this	standard:

       Nvi    Keith Bostic <bostic at bsdi.com>

       Vile   Tom E. Dickey <dickey at clark.net>

       NEdit  Mark Edel	<edel at ltx.com>

       CRiSP  Paul Fox <fox at crisp.demon.co.uk>

       Lemmy  James Iuliano <jai at accessone.com>

       Zeus   Jussi Jumppanen <jussij at ca.com.au>

       Elvis  Steve Kirkendall <kirkenda at cs.pdx.edu>

       FTE    Marko Macek <Marko.Macek at snet.fri.uni-lj.si>

   Backwards compatibility
       A tags file that	is generated in	the new	format should still be	usable
       by Vi.  This makes it possible to distribute tags files that are	usable
       by all versions and descendants of Vi.

       This restricts the format to what Vi can	handle.	 The format is:

       1. The tags file	is a list of lines, each line in the format:

	     {tagname}<Tab>{tagfile}<Tab>{tagaddress}

	  {tagname}
		 Any identifier, not containing	white space..

		 EXCEPTION: Universal Ctags violates this  item	 of  the  pro-
		 posal;	 tagname may contain spaces. However, tabs are not al-
		 lowed.

	  <Tab>	 Exactly one TAB character (although many versions of  Vi  can
		 handle	any amount of white space).

	  {tagfile}
		 The  name of the file where {tagname} is defined, relative to
		 the current directory (or location of the tags	file?).

	  {tagaddress}
		 Any Ex	command.  When executed, it behaves like  'magic'  was
		 not set.

       2. The  tags  file  is  sorted  on {tagname}.  This allows for a	binary
	  search in the	file.

       3. Duplicate tags are allowed, but which	one is actually	used is	unpre-
	  dictable (because of the binary search).

       The  best  way to add extra text	to the line for	the new	functionality,
       without breaking	it for Vi, is to put a comment	in  the	 {tagaddress}.
       This  gives  the	freedom	to use any text, and should work in any	tradi-
       tional Vi implementation.

       For example, when the old tags file contains:

	  main	  main.c  /^main(argc, argv)$/
	  DEBUG	  defines.c	  89

       The new lines can be:

	  main	  main.c  /^main(argc, argv)$/;"any additional text
	  DEBUG	  defines.c	  89;"any additional text

       Note that the ';' is required to	put the	cursor in the right line,  and
       then the	'"' is recognized as the start of a comment.

       For  Posix  compliant Vi	versions this will NOT work, since only	a line
       number or a search command is recognized.  I  hope  Posix  can  be  ad-
       justed.	Nvi suffers from this.

   Security
       Vi  allows  the use of any Ex command in	a tags file.  This has the po-
       tential of a trojan horse security leak.

       The proposal is to allow	only Ex	commands that position the cursor in a
       single  file.   Other commands, like editing another file, quitting the
       editor, changing	a file or writing a file,  are	not  allowed.	It  is
       therefore logical to call the command a tagaddress.

       Specifically, these two Ex commands are allowed:

       o A decimal line	number:

	    89

       o A search command.  It is a regular expression pattern,	as used	by Vi,
	 enclosed in //	or ??:

	    /^int c;$/
	    ?main()?

       There are two combinations possible:

       o Concatenation of the above, with ';' in between.  The meaning is that
	 the  first line number	or search command is used, the cursor is posi-
	 tioned	in that	line, and then the second search command  is  used  (a
	 line  number  would not be useful).  This can be done multiple	times.
	 This is useful	when the information in	a single line is  not  unique,
	 and the search	needs to start in a specified line.

	    /struct xyz	{/;/int	count;/
	    389;/struct	foo/;/char *s;/

       o A  trailing comment can be added, starting with ';"' (two characters:
	 semi-colon and	double-quote).	This is	used below.

	    89;" foo bar

       This might be extended in the future.  What is currently	missing	 is  a
       way to position the cursor in a certain column.

   Goals
       Now  the	usage of the comment text has to be defined.  The following is
       aimed at:

       1. Keep the text	short, because:

	  o The	line length that Vi can	handle is limited to 512 characters.

	  o Tags files can contain thousands of	tags.  I have seen tags	 files
	    of several Mbytes.

	  o More text makes searching slower.

       2. Keep the text	readable, because:

	  o It is often	necessary to check the output of a new ctags program.

	  o Be able to edit the	file by	hand.

	  o Make it easier to write a program to produce or parse the file.

       3. Don't	use special characters,	because:

	  o It	should	be  possible to	treat a	tags file like any normal text
	    file.

   Proposal
       Use a comment after the {tagaddress} field.  The	format would be:

	  {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]

       {tagname}
	      Any identifier, not containing white space..

	      EXCEPTION: Universal Ctags violates this item of	the  proposal;
	      name may contain spaces. However,	tabs are not allowed.  Conver-
	      sion, for	some characters	including <Tab>	in  the	 "value",  ex-
	      plained in the last of this section is applied.

       <Tab>  Exactly one TAB character	(although many versions	of Vi can han-
	      dle any amount of	white space).

       {tagfile}
	      The name of the file where {tagname} is defined, relative	to the
	      current directory	(or location of	the tags file?).

       {tagaddress}
	      Any  Ex command.	When executed, it behaves like 'magic' was not
	      set.  It may be restricted to a line number or a search  pattern
	      (Posix).

       Optionally:

       ;"     semicolon	 +  doublequote: Ends the tagaddress in	way that looks
	      like the start of	a comment to Vi.

       {tagfield}
	      See below.

       A tagfield has a	name, a	colon, and a value: "name:value".

       o The name consist only out  of	alphabetical  characters.   Upper  and
	 lower	case  are  allowed.   Lower case is recommended.  Case matters
	 ("kind:" and "Kind: are different tagfields).

	 EXCEPTION: Universal Ctags allows users to use	a numerical  character
	 in the	name other than	its initial letter.

       o The value may be empty.  It cannot contain a <Tab>.

	 o When	a value	contains a \t, this stands for a <Tab>.

	 o When	a value	contains a \r, this stands for a <CR>.

	 o When	a value	contains a \n, this stands for a <NL>.

	 o When	a value	contains a \\, this stands for a single	\ character.

	 Other	use  of	 the backslash character is reserved for future	expan-
	 sion.	Warning: When a	tagfield value holds an	MS-DOS file name,  the
	 backslashes must be doubled!

	 EXCEPTION: Universal Ctags introduces more conversion rules.

	 o When	a value	contains a \a, this stands for a <BEL> (0x07).

	 o When	a value	contains a \b, this stands for a <BS> (0x08).

	 o When	a value	contains a \v, this stands for a <VT> (0x0b).

	 o When	a value	contains a \f, this stands for a <FF> (0x0c).

	 o The	characters  in	range 0x01 to 0x1F included, and 0x7F are con-
	   verted to \x	prefixed hexadecimal number if the characters are  not
	   handled in the above	"value"	rules.

	 o The leading space (0x20) and	! (0x21) in {tagname} are converted to
	   \x prefixed hexadecimal number (\x20	and \x21) if the tag is	not  a
	   pseudo-tag.	As  described later, a pseudo-tag starts with !. These
	   rules are for distinguishing	pseudo-tags and	non pseudo-tags	(regu-
	   lar tags) when tags lines in	a tag file are sorted.

       Proposed	tagfield names:

		      +-----------+----------------------------+
		      |FIELD-NAME | DESCRIPTION		       |
		      +-----------+----------------------------+
		      |arity	  | Number  of arguments for a |
		      |		  | function tag.	       |
		      +-----------+----------------------------+
		      |class	  | Name  of  the  class   for |
		      |		  | which this tag is a	member |
		      |		  | or method.		       |
		      +-----------+----------------------------+
		      |enum	  | Name of the	enumeration in |
		      |		  | which  this	tag is an enu- |
		      |		  | merator.		       |
		      +-----------+----------------------------+
		      |file	  | Static (local) tag,	with a |
		      |		  | scope   of	the  specified |
		      |		  | file.  When	the  value  is |
		      |		  | empty, {tagfile} is	used.  |
		      +-----------+----------------------------+
		      |function	  | Function in	which this tag |
		      |		  | is	defined.   Useful  for |
		      |		  | local variables (and func- |
		      |		  | tions).   When   functions |
		      |		  | nest  (e.g.,  in  Pascal), |
		      |		  | the	 function  names   are |
		      |		  | concatenated,    separated |
		      |		  | with '/', so it looks like |
		      |		  | a path.		       |
		      +-----------+----------------------------+

		      |kind	  | Kind  of  tag.   The value |
		      |		  | depends on	the  language. |
		      |		  | For	 C and C++ these kinds |
		      |		  | are	recommended:	       |
		      |		  |			       |
		      |		  |	   c	  class	name   |
		      |		  |			       |
		      |		  |	   d	  define (from |
		      |		  |		  #define XXX) |
		      |		  |			       |
		      |		  |	   e	  enumerator   |
		      |		  |			       |
		      |		  |	   f	  function  or |
		      |		  |		  method name  |
		      |		  |			       |
		      |		  |	   F	  file name    |
		      |		  |			       |
		      |		  |	   g	  enumeration  |
		      |		  |		  name	       |
		      |		  |			       |
		      |		  |	   m	  member   (of |
		      |		  |		  structure or |
		      |		  |		  class	data)  |
		      |		  |			       |
		      |		  |	   p	  function     |
		      |		  |		  prototype    |
		      |		  |			       |
		      |		  |	   s	  structure    |
		      |		  |		  name	       |
		      |		  |			       |
		      |		  |	   t	  typedef      |
		      |		  |			       |
		      |		  |	   u	  union	name   |
		      |		  |			       |
		      |		  |	   v	  variable     |
		      |		  |			       |
		      |		  |	   When	 this field is |
		      |		  |	   omitted,  the  kind |
		      |		  |	   of	tag  is	 unde- |
		      |		  |	   fined.	       |
		      +-----------+----------------------------+
		      |struct	  | Name  of  the  struct   in |
		      |		  | which  this	 tag is	a mem- |
		      |		  | ber.		       |
		      +-----------+----------------------------+
		      |union	  | Name of the	union in which |
		      |		  | this tag is	a member.      |
		      +-----------+----------------------------+

       Note that these are mostly for C	and C++.  When tags programs are writ-
       ten for other languages,	this list should be extended  to  include  the
       used  field  names.  This will help users to be independent of the tags
       program used.

       Examples:

	  asdf	  sub.cc  /^asdf()$/;"	  new_field:some\svalue	  file:
	  foo_t	  sub.h	  /^typedef foo_t$/;"	  kind:t
	  func3	  sub.p	  /^func3()$/;"	  function:/func1/func2	  file:
	  getflag sub.c	  /^getflag(arg)$/;"	  kind:f  file:
	  inc	  sub.cc  /^inc()$/;"	  file:	class:PipeBuf

       The name	of the "kind:" field can be omitted.  This is  to  reduce  the
       size  of	 the  tags file	by about 15%.  A program reading the tags file
       can recognize the "kind:" field by the missing ':'.  Examples:

	  foo_t	  sub.h	  /^typedef foo_t$/;"	  t
	  getflag sub.c	  /^getflag(arg)$/;"	  f	  file:

       Additional remarks:

       o When a	tagfield appears twice in a tag	line, only  the	 last  one  is
	 used.

       Note about line separators:

       Vi  traditionally  runs	on Unix	systems, where the line	separator is a
       single linefeed character  <NL>.	  On  MS-DOS  and  compatible  systems
       <CR><NL>	is the standard	line separator.	 To increase portability, this
       line separator is also supported.

       On the Macintosh	a single <CR> is used for line separator.   Supporting
       this  on	Unix systems causes problems, because most fgets() implementa-
       tion don't see the <CR> as a line separator.  Therefore the support for
       a <CR> as line separator	is limited to the Macintosh.

       Summary:

		+---------------+--------------+---------------------+
		|line separator	| generated on | accepted on	     |
		+---------------+--------------+---------------------+
		|<LF>		| Unix	       | Unix,	MS-DOS,	Mac- |
		|		|	       | intosh		     |
		+---------------+--------------+---------------------+
		|<CR>		| Macintosh    | Macintosh	     |
		+---------------+--------------+---------------------+
		|<CR><LF>	| MS-DOS       | Unix, MS-DOS,	Mac- |
		|		|	       | intosh		     |
		+---------------+--------------+---------------------+

       The characters <CR> and <LF> cannot be used inside a tag	line.  This is
       not mentioned elsewhere (because	it's obvious).

       Note about white	space:

       Vi allowed any white space to separate the tagname  from	 the  tagfile,
       and  the	 filename  from	the tagaddress.	 This would need to be allowed
       for backwards compatibility.  However, all known	programs that generate
       tags use	a single <Tab> to separate fields.

       There  is  a  problem for using file names with embedded	white space in
       the tagfile field.  To work around this,	the  same  special  characters
       could  be  used	as  in	the new	fields,	for example \s.	 But, unfortu-
       nately, in MS-DOS the backslash character  is  used  to	separate  file
       names.	The  file  name	 c:\vim\sap  contains  \s,  but	 this is not a
       <Space>.	 The number of backslashes could be doubled, but that will add
       a lot of	characters, and	make parsing the tags file slower and clumsy.

       To avoid	these problems,	we will	only allow a <Tab> to separate fields,
       and not support a file name or tagname that contains a <Tab> character.
       This  means  that  we are not 100% Vi compatible.  However, there is no
       known tags program that uses something else than	a  <Tab>  to  separate
       the  fields.  Only when a user typed the	tags file himself, or made his
       own program to generate a tags file, we could run  into	problems.   To
       solve  this, the	tags file should be filtered, to replace the arbitrary
       white space with	a single <Tab>.	 This Vi command can be	used:

	  :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

       (replace	^I with	a real <Tab>).

       TAG FILE	INFORMATION:

       Pseudo-tag lines	can be used to encode information into	the  tag  file
       regarding  details  about its content (e.g. have	the tags been sorted?,
       are the optional	tagfields present?), and regarding the program used to
       generate	 the  tag file.	 This information can be used both to optimize
       use of the tag file (e.g.  enable/disable binary	searching) and provide
       general information (what version of the	generator was used).

       The names of the	tags used in these lines may be	suitably chosen	to en-
       sure that when sorted, they will	always be located near the first lines
       of the tag file.	 The use of "!_TAG_" is	recommended.  Note that	a rare
       tag like	"!"  can sort to before	these lines.  The program reading  the
       tags file should	be smart enough	to skip	over these tags.

       The  lines  described  below have been chosen to	convey a select	set of
       information.

       Tag lines providing information about the content of the	tag file:

	  !_TAG_FILE_FORMAT   {version-number}	      /optional	comment/
	  !_TAG_FILE_SORTED   {0|1}		      /0=unsorted, 1=sorted/

       The {version-number} used in the	tag  file  format  line	 reserves  the
       value  of  "1"  for tag files complying with the	original UNIX vi/ctags
       format, and reserves the	value "2" for tag files	 complying  with  this
       proposal.  This value may be used to determine if the extended features
       described in this proposal are present.

       Tag lines providing information about the program used to generate  the
       tag file, and provided solely for documentation purposes:

	  !_TAG_PROGRAM_AUTHOR	      {author-name}   /{email-address}/
	  !_TAG_PROGRAM_NAME  {program-name}  /optional	comment/
	  !_TAG_PROGRAM_URL   {URL}   /optional	comment/
	  !_TAG_PROGRAM_VERSION	      {version-id}    /optional	comment/

       EXCEPTION:  Universal  Ctags introduces more kinds of pseudo-tags.  See
       ctags-client-tools(7) about them.

					----

EXCEPTIONS IN UNIVERSAL	CTAGS
       Universal Ctags supports	this proposal with some	exceptions.

   Exceptions
       1. {tagname} in tags file generated by Universal	Ctags may contain spa-
	  ces and several escape sequences. Parsers for	documents like Tex and
	  reStructuredText, or liberal languages such as JavaScript need these
	  exceptions.  See {tagname} of	Proposal section for more detail about
	  the conversion.

       2. "name" part of {tagfield} in a tag generated by Universal Ctags  may
	  contain  numeric  characters,	 but the first character of the	"name"
	  must be alphabetic.

   Compatible output and weakness
       Default behavior	(--output-format=u-ctags option) has  the  exceptions.
       In  other hand, with --output-format=e-ctags option ctags has no	excep-
       tion; Universal Ctags command may use the same file format as Exuberant
       Ctags.  However,	 --output-format=e-ctags throws	away a tag entry which
       name includes a space or	a tab  character.  TAG_OUTPUT_MODE  pseudo-tag
       tells which format is used when ctags generating	tags file.

SEE ALSO
       ctags(1),   ctags-client-tools(7),   ctags-incompatibilities(7),	 read-
       tags(1)

2+								       TAGS(5)

NAME | DESCRIPTION | PROPOSAL FOR EXTENDED VI TAGS FILE FORMAT | EXCEPTIONS IN UNIVERSAL CTAGS | SEE ALSO

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=utags&sektion=5&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help