Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
Image::OCR::Tesseract(User Contributed Perl DocumentatImage::OCR::Tesseract(3)

NAME
       Image::OCR::Tesseract - read an image with tesseract ocr	and get	output

SYNOPSIS
	       use Image::OCR::Tesseract 'get_ocr';

	       my $image = './hi.jpg';

	       my $text	= get_ocr($image);

DESCRIPTION
       This is a wrapper for tesseract.	 Tesseract expects a tiff file,
       get_ocr() will convert to a temporary tiff.  If your file is not	a tiff
       file, that way you don't	have to	worry about your image format for ocr.

       Tesseract spits out a text file-	get_ocr() will erase that and return
       you the output.

SUBS
       No subs are exported by default.

   get_ocr()
       Argument	is abs path to image file. Can be most image formats.  Second
       argument	is optional, abs path to temp dir.  Third optional argument is
       optional, it is the -l language type argument to	tesseract.

       If you don't have write access to the directory the image resides on,
       you should provide as argument a	directory you do have write access to,
       this would be the second	argument.

       Returns text content as read by tesseract.

       Does not	clean up after itself if DEBUG is on.

       Warns if	no output.

       This takes care of converting to	the right image	format,	etc.  The
       original	image is unchanged.

   tesseract()
       First argument is abs path to tif file.	Second argument	is optional,
       it is the -l language type argument to tesseract.

       Will return text	output.	 If none inside	or tesseract fails, returns
       empty string.  If tesseract fails, warns.

   convert_8bpp_tif()
       Argument	is abs path to image file.  Optional argument is abs path to
       image out.  Returns abs path of image created. Uses 'convert', from
       ImageMagick.

	  my $img_non_tif = './img.jpg';
	  my $img_out	  = './img.tif';

	  my $out = convert_8bpp_tif( $img_non_tif );
	  my $out = convert_8bpp_tif( $img_non_tif, $img_out );

TESSERACT NOTES
       Tesseract is an open source ocr engine.	For an image to	be read	by
       tesseract properly, it must be an 8 bit per pixel tif format image
       file.  What this	module does is to create a temporary file from your
       target image, which will	be an 8	bit per	pixel image, it	then reads the
       output and returns it to	you as a string.

   INSTALLING TESSERACT
       Included	in this	package	is t/tesseract_install_helper.pl which will
       check for packages needed.

       Installing tesseract can	be tricky.  You	will basically need gcc-c++
       and automake installed on your system.  After you have automake and
       gcc-c++,	you should be able to install.

       SVN

       You may be able to simply install the SVN version of Tesseract by
       using:

	svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
	./runautoconf
	mkdir build-directory
	cd build-directory
	../configure
	make
	make install

       for more	see google project on ocr, they	use tesseract

GOCR
       Another great OCR engine	is gocr, but it	is not suited for the purpose
       of reading text from images.  gocr is great if you need to tweak	what
       you are reading,	and for	other specialized purposes.

       An example using	gocr as	engine is Finance::MICR::GOCR::Check.

SEE ALSO
       tesseract on google code.  gocr convert ImageMagick.

       ocr

CAVEATS
       This module is for POSIX	systems.  It is	not intended to	run on other
       "systems" and no	support	for such will be added in the future.
       Attempting to install on	an unsupported OS will throw an	exception.

DEBUG
       Set the debug flag on:	   $Image::OCR::Tesseract::DEBUG = 1;

       A temporary file	is created, if DEBUG is	on, the	file is	not deleted,
       the file	path is	printed	to STDERR.

AUTHOR
       Leo Charre leocharre at cpan dot	org

   THANKS
       Daniel Beuchler - patches.

COPYRIGHT
       Copyright (c) 2009 Leo Charre. All rights reserved.

LICENSE
       This package is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, i.e., under	the terms of the
       "Artistic License" or the "GNU General Public License".

DISCLAIMER
       This package is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A	PARTICULAR PURPOSE.

       See the "GNU General Public License" for	more details.

perl v5.32.1			  2010-02-16	      Image::OCR::Tesseract(3)

NAME | SYNOPSIS | DESCRIPTION | SUBS | TESSERACT NOTES | GOCR | SEE ALSO | CAVEATS | DEBUG | AUTHOR | COPYRIGHT | LICENSE | DISCLAIMER

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=Image::OCR::Tesseract&sektion=3&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help