
FreeBSD Manual Pages


Scanf(3)			   OCamldoc			      Scanf(3)

       Scanf - Formatted input functions.

       Module	Scanf

       Module Scanf
	: sig end

       Formatted input functions.

       === Introduction	===

       === Functional input with format	strings	===

       The module Scanf provides formatted input functions, or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is named a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       - the first argument is a source of characters for the input,
       - the second argument is a format string that specifies the values
         to read,
       - the third argument is a receiver function that is applied to the
         values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:

       - ic is a source of characters (typically a formatted input channel
         with type Scanf.Scanning.in_channel),
       - fmt is a format string (the same format strings as those used to
         print material with module Printf or Format),
       - f is a function that has as many arguments as the number of values
         to read in the input according to fmt.
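
       The three-argument shape described above can be sketched as follows.
       This is only an illustration: the input text, the receiver describe
       and the binding names are made up for the example, and a string-backed
       scanning buffer stands in for a real input source.

```ocaml
(* A scanning buffer built from a string; any Scanning.in_channel would do. *)
let ic = Scanf.Scanning.from_string "3 apples"

(* Hypothetical receiver: one parameter per value named in the format. *)
let describe n what = Printf.sprintf "%d x %s" n what

(* Source of characters, format string, receiver function. *)
let result = Scanf.bscanf ic "%d %s" describe

let () = print_endline result
```

       The receiver takes two arguments because the format string "%d %s"
       specifies two values to read.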

       === A simple example ===

       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,

       - if we use stdin as the source of characters (Scanf.Scanning.stdin
         is the predefined formatted input channel that reads from standard
         input),
       - if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1). Thus, if we evaluate bscanf
       Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result we
       get is 42.
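
       The same computation can be tried without a keyboard by reading from a
       string instead of standard input. The sketch below assumes the input
       "41" and uses Scanf.sscanf as a stand-in for the stdin-based call; the
       name answer is only illustrative.

```ocaml
(* The receiver from the text above. *)
let f x = x + 1

(* Reads the decimal integer 41 from the string and returns f 41. *)
let answer = Scanf.sscanf "41" "%d" f

let () = Printf.printf "%d\n" answer
```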

       === Formatted input as a	functional feature ===

       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment-based mechanism that is
       typical of formatted input in imperative languages; the OCaml format
       strings also feature useful additions to easily define complex tokens;
       as expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners. Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.

       === Formatted input channel ===

       module Scanning : sig end

       === Type	of formatted input functions ===

       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) Pervasives.format6 -> 'c

       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted input
       channel according to some format string; more precisely, if scan is
       some formatted input function, then scan ic fmt f applies f to all the
       arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from Pervasives.stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
       character, and returns f v.

       Since 3.10.0
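
       A minimal sketch of the %r mechanism described above, assuming that the
       element type t is int and that read_elem simply reads a decimal
       integer. Both choices, and the name doubled, are assumptions made for
       the example.

```ocaml
(* Hypothetical user-defined reader of type Scanning.in_channel -> int. *)
let read_elem ic = Scanf.bscanf ic " %d" (fun n -> n)

(* The reader is passed before the receiver, as required for %r. *)
let doubled = Scanf.sscanf "7;" "%r;" read_elem (fun v -> v * 2)

let () = Printf.printf "%d\n" doubled
```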

       exception Scan_failure of string

       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

       === The general formatted input function	===

       val bscanf : Scanning.in_channel	-> ('a,	'b, 'c,	'd) scanner

       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
       "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       argument corresponding to the %r conversions specified in the format
       string.
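
       The call described above can be written out as a short sketch. The
       input string "x = 1" is an assumption chosen so that the %s conversion
       stops at the first space, before the = sign is matched.

```ocaml
(* Receiver that ignores the string read by %s and increments the integer. *)
let f (_s : string) i = i + 1

(* %s reads "x", the literal "=" is matched, then %i reads 1. *)
let two = Scanf.sscanf "x = 1" "%s = %i" f

let () = Printf.printf "%d\n" two
```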

       === Format string description ===

       The format string is a character string which contains three types of
       objects:

       - plain characters, which are simply matched with the characters of
         the input (with a special case for space and line feed, see "The
         space character in format strings"),
       - conversion specifications, each of which causes reading and
         conversion of one argument for the function f (see "Conversion
         specifications in format strings"),
       - scanning indications to specify boundaries of tokens (see "Scanning
         indications in format strings").

       === The space character in format strings ===

       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10).

       A space does not match a single space character, but any amount of
       'whitespace' in the input. More precisely, a space inside the format
       string matches any number of tab, space, line feed and carriage return
       characters. Similarly, a line feed character in the format string
       matches either a single line feed or a carriage return followed by a
       line feed.

       Matching any amount of whitespace, a space in the format string also
       matches no amount of whitespace at all; hence, the call bscanf ib
       "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
       input with various whitespace in it, such as "Price = 1 $",
       "Price  =  1    $", or even "Price=1$".
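
       The whitespace rule can be checked with a short sketch that runs the
       same format over the three inputs mentioned above. The helper name
       price and the use of Scanf.sscanf (rather than bscanf on a buffer ib)
       are choices made for the example.

```ocaml
(* Each space in the format matches any amount of whitespace, including none. *)
let price input = Scanf.sscanf input "Price = %d $" (fun p -> p)

let results =
  List.map price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]

let () = List.iter (Printf.printf "%d\n") results
```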

       === Conversion specifications in	format strings ===

       Conversion specifications consist in the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters.

       The conversion characters and their meanings are:

       - d: reads an optionally signed decimal integer ([0-9]+).
       - i: reads an optionally signed integer (usual input conventions for
         decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
         (0o[0-7]+), and binary (0b[0-1]+) notations are understood).
       - u: reads an unsigned decimal integer.
       - x or X: reads an unsigned hexadecimal integer ([0-9a-fA-F]+).
       - o: reads an unsigned octal integer ([0-7]+).
       - s: reads a string argument that spreads as much as possible, until
         the following bounding condition holds:
           - a whitespace has been found (see "The space character in format
             strings"),
           - a scanning indication (see "Scanning indications in format
             strings") has been encountered,
           - the end-of-input has been reached.
         Hence, this conversion always succeeds: it returns an empty string
         if the bounding condition holds when the scan begins.
       - S: reads a delimited string argument (delimiters and special escaped
         characters follow the lexical conventions of OCaml).
       - c: reads a single character. To test the current input character
         without reading it, specify a null field width, i.e. use
         specification %0c. Raise Invalid_argument if the field width
         specification is greater than 1.
       - C: reads a single delimited character (delimiters and special
         escaped characters follow the lexical conventions of OCaml).
       - f, e, E, g, G: reads an optionally signed floating-point number in
         decimal notation, in the style dddd.ddd e/E+-dd.
       - h, H: reads an optionally signed floating-point number in
         hexadecimal notation.
       - F: reads a floating point number according to the lexical
         conventions of OCaml (hence the decimal point is mandatory if the
         exponent part is not mentioned).
       - B: reads a boolean argument (true or false).
       - b: reads a boolean argument (for backward compatibility; do not use
         in new programs).
       - ld, li, lu, lx, lX, lo: reads an int32 argument to the format
         specified by the second letter for regular integers.
       - nd, ni, nu, nx, nX, no: reads a nativeint argument to the format
         specified by the second letter for regular integers.
       - Ld, Li, Lu, Lx, LX, Lo: reads an int64 argument to the format
         specified by the second letter for regular integers.
       - [ range ]: reads characters that match one of the characters
         mentioned in the range of characters range (or not mentioned in it,
         if the range starts with ^). Reads a string that can be empty, if
         the next input character does not match the range. The set of
         characters from c1 to c2 (inclusively) is denoted by c1-c2. Hence,
         %[0-9] returns a string representing a decimal number or an empty
         string if no decimal digit is found; similarly, %[0-9a-f] returns a
         string of hexadecimal digits. If a closing bracket appears in a
         range, it must occur as the first character of the range (or just
         after the ^ in case of range negation); hence []] matches a ]
         character and [^]] matches any character that is not ]. Use %% and
         %@ to include a % or a @ in a range.
       - r: user-defined reader. Takes the next ri formatted input function
         and applies it to the scanning buffer ib to read the next argument.
         The input function ri must therefore have type Scanning.in_channel
         -> 'a and the argument read has type 'a.
       - { fmt %}: reads a format string argument. The format string read
         must have the same type as the format string specification fmt. For
         instance, "%{ %i %}" reads any format string that can read a value
         of type int; hence, if s is the string "fmt:\"number is %u\"", then
         Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
         "number is %u".
       - ( fmt %): scanning sub-format substitution. Reads a format string rf
         in the input, then goes on scanning with rf instead of scanning with
         fmt. The format string rf must have the same type as the format
         string specification fmt that it replaces. For instance, "%( %i %)"
         reads any format string that can read a value of type int. The
         conversion returns the format string read rf, and then a value read
         using rf. Hence, if s is the string "\"%4d\"1234.00", then
         Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
         ("%4d", 1234). This behaviour is not mere format substitution, since
         the conversion returns the format string read as additional
         argument. If you need pure format substitution, use special flag _
         to discard the extraneous argument: conversion %_( fmt %) reads a
         format string rf and then behaves the same as format string rf.
         Hence, if s is the string "\"%4d\"1234.00", then Scanf.sscanf s
         "%_(%i%)" is simply equivalent to Scanf.sscanf "1234.00" "%4d".
       - l: returns the number of lines read so far.
       - n: returns the number of characters read so far.
       - N or L: returns the number of tokens read so far.
       - !: matches the end of input condition.
       - %: matches one % character in the input.
       - @: matches one @ character in the input.
       - ,: does nothing.

       Following the % character that introduces a conversion, there may be
       the special flag _: the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then Scanf.sscanf s
       "%_s = %i" f returns 2.

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer, having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).

       Notes:

       - as mentioned above, a %s conversion always succeeds, even if there
         is nothing to read in the input: in this case, it simply returns "".
       - in addition to the relevant digits, '_' characters may appear inside
         numbers (this is reminiscent of the usual OCaml lexical
         conventions). If stricter scanning is desired, use the range
         conversion facility instead of the number conversions.
       - the scanf facility is not intended for heavy duty lexical analysis
         and parsing. If it appears not expressive enough for your needs,
         several alternatives exist: regular expressions (module Str), stream
         parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
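
       A few of the conversions listed above can be exercised with the sketch
       below. The input strings and binding names are invented for the
       example, and Scanf.sscanf is used so that each line stands alone; it
       illustrates the documented behaviour rather than covering every
       conversion.

```ocaml
(* Identity receiver: returns the value that was read. *)
let id x = x

let decimal   = Scanf.sscanf "-12" "%d" id          (* signed decimal *)
let hex       = Scanf.sscanf "0x1F" "%i" id         (* %i accepts a 0x prefix *)
let digits    = Scanf.sscanf "2024abc" "%[0-9]" id  (* range conversion *)
let no_digits = Scanf.sscanf "abc" "%[0-9]" id      (* a range may match nothing *)
let width     = Scanf.sscanf "1234567" "%4d" id     (* field width of 4 *)
let flag      = Scanf.sscanf "true" "%B" id         (* boolean *)
let big       = Scanf.sscanf "42" "%Ld" id          (* int64 *)

(* The _ flag reads the string as usual but discards it. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

let () =
  Printf.printf "%d %d %S %S %d %b %Ld %d\n"
    decimal hex digits no_digits width flag big skipped
```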

       === Scanning indications	in format strings ===

       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up to
       the next tab character or to the end of input. If a @ character appears
       anywhere else in the format string, it is treated as a plain character.

       Note:

       - As usual in format strings, % and @ characters must be escaped using
         %% and %@; this rule still holds within range specifications and
         scanning indications. For instance, format "%s@%%" reads a string up
         to the next % character, and format "%s@%@" reads a string up to the
         next @.
       - The scanning indications introduce slight differences in the syntax
         of Scanf format strings, compared to those used for the Printf
         module. However, the scanning indications are similar to those used
         in the Format module; hence, when producing formatted text to be
         scanned by Scanf.bscanf, it is wise to use printing functions from
         the Format module (or, if you need to use functions from Printf,
         banish or carefully double check the format strings that contain '@'
         characters).
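
       As an illustration of scanning indications, the sketch below splits a
       token at a colon and at a tab. The inputs and the names key, value and
       first are assumptions made for the example.

```ocaml
(* %s@: ends the first token just before ':' and skips the ':' itself. *)
let key, value =
  Scanf.sscanf "host:example.org" "%s@:%s" (fun k v -> (k, v))

(* %s@\t reads up to the next tab character (or the end of input). *)
let first = Scanf.sscanf "alpha\tbeta" "%s@\t" (fun s -> s)

let () = Printf.printf "%s %s %s\n" key value first
```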

       === Exceptions during scanning ===

       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:

       - Raise Scanf.Scan_failure if the input does not match the format.
       - Raise Failure if a conversion to a number is not possible.
       - Raise End_of_file if the end of input is encountered while some more
         characters are needed to read the current conversion specification.
       - Raise Invalid_argument if the format string is invalid.

       Note:

       - as a consequence, scanning a %s conversion never raises exception
         End_of_file: if the end of input is reached the conversion succeeds
         and simply returns the characters read so far, or "" if none were
         ever read.
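
       One way to deal with these exceptions is to turn them into an option
       value. The helper parse_int below is hypothetical and only sketches
       the idea; it catches the three input-related exceptions listed above
       and lets Invalid_argument propagate, since that one signals a bad
       format string rather than bad input.

```ocaml
(* Hypothetical helper: Some n if the string starts with a decimal integer. *)
let parse_int s =
  try Some (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure _ | Failure _ | End_of_file -> None

let () =
  match parse_int "42" with
  | Some n -> Printf.printf "read %d\n" n
  | None -> print_endline "no integer"
```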

       === Specialised formatted input functions ===

       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.

       val scanf : ('a,	'b, 'c,	'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to Pervasives.stdin.

       val  kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn	-> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the exception
       that aborted the scanning process as arguments.
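
       A minimal sketch of an error handler passed to kscanf, assuming that a
       default value of 0 is an acceptable result when scanning fails. The
       helper name read_int_or_default and the default are illustrative only.

```ocaml
(* Hypothetical helper: returns 0 instead of raising when scanning fails. *)
let read_int_or_default s =
  let ic = Scanf.Scanning.from_string s in
  (* The handler receives the scanning buffer and the exception raised. *)
  let on_error _ic _exn = 0 in
  Scanf.kscanf ic on_error "%d" (fun n -> n)

let () =
  Printf.printf "%d %d\n" (read_int_or_default "12") (read_int_or_default "oops")
```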

       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a,  'b,
       'c, 'd) scanner

       Same as Scanf.kscanf, but reads from the given string.

       Since 4.02.0

       === Reading format strings from input ===

       val  bscanf_format  :  Scanning.in_channel  -> ('a, 'b, 'c, 'd, 'e, 'f)
       Pervasives.format6 -> (('a, 'b, 'c, 'd, 'e, 'f)	Pervasives.format6  ->
       'g) -> 'g

       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value. Raise
       Scanf.Scan_failure if the format string value read does not have the
       same type as fmt.

       Since 3.09.0

       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f)  Pervasives.for-
       mat6 -> (('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.

       Since 3.09.0

       val  format_from_string	:  string  ->  ('a, 'b,	'c, 'd,	'e, 'f)	Perva-
       sives.format6 ->	('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6

       format_from_string s fmt converts a string argument to a format string,
       according to the given format string fmt. Raise Scanf.Scan_failure if
       s, considered as a format string, does not have the same type as fmt.

       Since 3.10.0
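
       The sketch below shows one possible use of format_from_string: a format
       held in an ordinary string is checked against "%d" and then used with
       Printf.sprintf. The helper names render and accepts, and the strings
       involved, are assumptions made for the example.

```ocaml
(* Converts a runtime string to a format of the same type as "%d". *)
let render n =
  let fmt = Scanf.format_from_string "value: %d" "%d" in
  Printf.sprintf fmt n

(* True when the string s can be used as a format of the same type as "%d". *)
let accepts s =
  try ignore (Scanf.format_from_string s "%d"); true
  with Scanf.Scan_failure _ -> false

let () = print_endline (render 7)
```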

       val unescaped : string -> string

       unescaped s returns a copy of s with escape sequences (according to the
       lexical conventions of OCaml) replaced by their corresponding special
       characters. More precisely, Scanf.unescaped has the following
       property: for all string s, Scanf.unescaped (String.escaped s) = s.

       Always return a copy of the argument, even if there is no escape
       sequence in the argument. Raise Scanf.Scan_failure if s is not properly
       escaped (i.e. s has invalid escape sequences or special characters
       that are not properly escaped). For instance, Scanf.unescaped "\"" will
       fail.

       Since 4.00.0
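
       The round-trip property stated above can be checked on a sample string.
       The sample text and the binding names are invented for the example.

```ocaml
(* A string containing a tab and double quotes. *)
let sample = "tab\there \"quoted\""

(* Escaping and then unescaping gives back the original string. *)
let round_trip = Scanf.unescaped (String.escaped sample)

(* The two-character text backslash-n becomes a single newline character. *)
let newline = Scanf.unescaped "a\\nb"

(* An unescaped double quote is rejected with Scan_failure. *)
let rejected =
  try ignore (Scanf.unescaped "\""); false
  with Scanf.Scan_failure _ -> true

let () = Printf.printf "%b %b\n" (round_trip = sample) rejected
```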

       === Deprecated ===

       val fscanf : Pervasives.in_channel -> ('a, 'b, 'c, 'd) scanner


       Scanf.fscanf is error prone and deprecated since 4.03.0.

       This function violates the following invariant of the Scanf module: To
       preserve scanning semantics, all scanning functions defined in Scanf
       must read from a user defined Scanf.Scanning.in_channel formatted input
       channel.

       If you need to read from a Pervasives.in_channel input channel ic,
       simply define a Scanf.Scanning.in_channel formatted input channel as in
       let ib = Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
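
       The recommended replacement for fscanf can be sketched as follows. The
       helper read_first_int and the temporary file used to try it out are
       assumptions made for the example; only Scanning.from_channel and
       bscanf come from the text above.

```ocaml
(* Hypothetical helper: reads the first integer found in the file at path. *)
let read_first_int path =
  let ic = open_in path in
  (* Wrap the ordinary input channel in a formatted input channel. *)
  let ib = Scanf.Scanning.from_channel ic in
  match Scanf.bscanf ib " %d" (fun n -> n) with
  | n -> close_in ic; n
  | exception e -> close_in_noerr ic; raise e

let () =
  (* Try the helper on a throwaway file. *)
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "  17 apples\n";
  close_out oc;
  Printf.printf "%d\n" (read_first_int path);
  Sys.remove path
```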

       val kfscanf : Pervasives.in_channel -> (Scanning.in_channel ->  exn  ->
       'd) -> ('a, 'b, 'c, 'd) scanner


       Scanf.kfscanf is	error prone and	deprecated since 4.03.0.

2022-03-28			    source:			      Scanf(3)
