Lingua::EN::Squeeze(3)  User Contributed Perl Documentation  Lingua::EN::Squeeze(3)

NAME
       Lingua::EN::Squeeze - Shorten text to minimum syllables using hash
       table lookup and vowel deletion

SYNOPSIS
	   use Lingua::EN::Squeeze;		 # import only function
	   use Lingua::EN::Squeeze qw( :ALL );	 # import all functions	and variables
	   use English;				 # to use readable variable names

	   while (<>) {
	       chomp;
	       print "Original: $_\n";
	       print "Squeezed: ", SqueezeText(lc $_), "\n";
	   }

	   #  Or you can use the object-oriented interface

	   my $squeeze = Lingua::EN::Squeeze->new();

	   while (<>) {
	       chomp;
	       print "Original: $_\n";
	       print "Squeezed: ", $squeeze->SqueezeText(lc $_), "\n";
	   }

VERSION
       This document describes version 2016.01

DESCRIPTION
       This module squeezes English text into the most compact format
       possible, to the point where it is barely readable. For maximum
       compression, convert all text to lowercase before calling
       SqueezeText(), because the optimizations have been designed mostly for
       lowercase letters.

       Warning: each line is processed multiple times, so expect conversion
       to be slow.
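       The two techniques named in this page's title, hash table lookup and
       vowel deletion, can be illustrated with a few lines of self-contained
       Perl. This is a minimal sketch only: the word table is a tiny subset of
       the substitutions listed below, and the vowel rule (keep the first and
       last letter, drop the inner vowels of words longer than three letters)
       is a hypothetical simplification, not the module's actual algorithm.
       The names squeeze_word() and squeeze_text() are made up for the
       example.

```perl
use strict;
use warnings;

# Tiny word table; entries taken from the substitution list in this page.
my %WORDS = ( you => 'u', for => '4', with => 'w/', phone => 'P8' );

# Hypothetical helper: table lookup first, otherwise drop the inner
# vowels of words longer than three letters.
sub squeeze_word {
    my ($word) = @_;
    return $WORDS{$word} if exists $WORDS{$word};
    return $word if length $word <= 3;
    my ($first, $middle, $last) = $word =~ /^(.)(.*)(.)$/;
    $middle =~ tr/aeiou//d;          # vowel deletion
    return $first . $middle . $last;
}

# Hypothetical helper: lowercase the line (the tables assume lowercase
# input), then squeeze each whitespace-separated word.
sub squeeze_text {
    my ($text) = @_;
    $text = lc $text;
    return join ' ', map { squeeze_word($_) } split ' ', $text;
}

print squeeze_text("Call me on the phone with your message"), "\n";
# cll me on the P8 w/ yr mssge
```

       Lowercasing first matters here: "Phone" would otherwise miss the
       "phone" table entry and only lose its vowels.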

       You can use this module e.g. to preprocess text before it is sent to
       electronic media that have a maximum text size limit. For example,
       pagers have an arbitrary text size limit, typically around 200
       characters, which you want to fill as much as possible. Alternatively,
       you may have a GSM cellular phone capable of receiving Short Messages
       (SMS), whose message size limit is 160 characters. To demonstrate this
       module's SqueezeText() function, the conversion result of this
       paragraph is presented below. See for yourself whether it is readable
       (yes, it takes some time to get used to). The compression ratio is
       typically 30-40%.

	   u _n use thi mod e.g. to prprce txt bfre i_s snt to
	   elrnic mda has som max txt siz lim. f_xmple pag
	   hv  abitry txt siz lim, tpcly 200 chr, W/ u wnt
	   to fll as mch as psbleAlternatvly u may hv GSM cllar P8
	   w_s cpble of rcivng Short msg (SMS), WS/ msg siz
	   lim is 160 chr. 4 demonstrton of thi mods SquezText
	   fnc ,  dsc txt of thi prgra has ben cnvd_ blow
	   See uself if i_s redble (Yes, it tak som T to get usdto
	   compr rat is tpcly 30-40

       And if $SQZ_OPTIMIZE_LEVEL is set to a non-zero value:

	   u_nUseThiModE.g.ToPrprceTxtBfreI_sSntTo
	   elrnicMdaHasSomMaxTxtSizLim.F_xmplePag
	   hvAbitryTxtSizLim,Tpcly200Chr,W/UWnt
	   toFllAsMchAsPsbleAlternatvlyUMayHvGSMCllarP8
	   w_sCpbleOfRcivngShortMsg(SMS),WS/MsgSiz
	   limIs160Chr.4DemonstrtonOfThiModsSquezText
	   fnc,DscTxtOfThiPrgraHasBenCnvd_Blow
	   SeeUselfIfI_sRedble(Yes,ItTakSomTToGetUsdto
	   comprRatIsTpcly30-40

       A comparison of the two shows:

	   Original text : 627 characters
	   Level 0       : 433 characters   reduction 31 %
	   Level 1       : 345 characters   reduction 45 %  (+14 percentage points)
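       Each reduction figure is (original - squeezed) / original. A short
       Perl check using the character counts from the table above
       (reduction_pct() is just a helper name chosen for this example):

```perl
use strict;
use warnings;

# Character counts taken from the comparison table.
my $original = 627;
my %squeezed = ( 'Level 0' => 433, 'Level 1' => 345 );

# Reduction = (original - squeezed) / original, as a percentage.
sub reduction_pct {
    my ($orig, $new) = @_;
    return 100 * ($orig - $new) / $orig;
}

for my $level (sort keys %squeezed) {
    printf "%s: %.0f %%\n", $level, reduction_pct($original, $squeezed{$level});
}
# Level 0: 31 %
# Level 1: 45 %
```

       The exact values are 30.9 % and 45.0 %, so level 1 removes about 14
       percentage points more text than level 0 on this sample.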

       A few grammar rules are used to shorten some English tokens
       considerably:

	   A word that contains _ is usually a verb

	   A word that contains / is usually a noun, pronoun
			   or other non-verb

       Read the following substitution tokens to understand the basics of the
       converted text. Hopefully, after some practice, the text will not look
       like pure Geek code (tm) to you. Geek code (like G++L--J) needs an
       external parser to be understood; here, only some common sense and
       time are needed to get used to the compressed format. For a complete,
       up-to-date list, you are better off peeking at the source code.

	   automatically => 'acly_'

	   for		 => 4
	   for him	 => 4h
	   for her	 => 4h
	   for them	 => 4t
	   for those	 => 4t

	   can		 => _n
	   does		 => _s

	   it is	 => i_s
	   that is	 => t_s
	   which is	 => w_s
	   that are	 => t_r
	   which are	 => w_r

	   less		 => -/
	   more		 => +/
	   most		 => ++

	   however	 => h/ver
	   think	 => thk_

	   useful	 => usful

	   you		 => u
	   your		 => u/
	   you'd	 => u/d
	   you'll	 => u/l
	   they		 => t/
	   their	 => t/r

	   will		 => /w
	   would	 => /d
	   with		 => w/
	   without	 => w/o
	   which	 => W/
	   whose	 => WS/

       Time units are expressed with capital letters:

	   time		 => T
	   minute	 => MIN
	   second	 => SEC
	   hour		 => HH
	   day		 => DD
	   month	 => MM
	   year		 => YY

       Other capital-letter acronyms; think of the 8 as the speaker and the
       microphone of a phone:

	   phone	 => P8
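       The order in which the tables are applied matters: a multi-word phrase
       such as "for them" has to be replaced before the single word "for",
       otherwise the result would be "4 them" instead of "4t". The following
       self-contained sketch uses a handful of entries from the lists above
       to show multi-word lookup followed by whole-word lookup. It is not the
       module's code; squeeze_words() is a hypothetical name.

```perl
use strict;
use warnings;

# A handful of entries from the substitution lists (illustrative subset).
my %MULTI = (
    'for you'  => '4u',
    'for them' => '4t',
    'it is'    => 'i_s',
    'which is' => 'w_s',
);
my %SINGLE = (
    for     => '4',
    useful  => 'usful',
    with    => 'w/',
    without => 'w/o',
    you     => 'u',
);

# Hypothetical helper: multi-word phrases first, then single words.
sub squeeze_words {
    my ($text) = @_;

    # 1. Phrases, longest first, matched on word boundaries.
    for my $phrase (sort { length $b <=> length $a or $a cmp $b } keys %MULTI) {
        $text =~ s/\b\Q$phrase\E\b/$MULTI{$phrase}/g;
    }

    # 2. Whole words only, so "without" is not mangled by the "with" entry.
    $text =~ s/\b([a-z']+)\b/exists $SINGLE{$1} ? $SINGLE{$1} : $1/ge;

    return $text;
}

print squeeze_words("which is useful for you"), "\n";   # w_s usful 4u
print squeeze_words("for them, without you"), "\n";     # 4t, w/o u
```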

EXAMPLES
       To add new words, e.g. to the word conversion hash table, define a
       custom set and merge it with the existing tables. Do the same for
       %SQZ_WXLATE_MULTI_HASH and $SQZ_ZAP_REGEXP, and then start using the
       conversion function.

	   use English;
	   use Lingua::EN::Squeeze qw( :ALL );

	   my %myExtraWordHash =
	   (
		 'new-word1' => 'conversion1'
	       , 'new-word2' => 'conversion2'
	       , 'new-word3' => 'conversion3'
	       , 'new-word4' => 'conversion4'
	   );

	   #   First take the existing tables and merge them with the above
	   #   translation table

	   my %myCustomWordHash =
	   (
		 %SQZ_WXLATE_HASH
	       , %SQZ_WXLATE_EXTRA_HASH
	       , %myExtraWordHash
	   );

	   my $myXlat = 0;			       # state flag: 1 = custom table active

	   while (<>)
	   {
	       #   Hypothetical switches: replace these two tests with your
	       #   own conditions for entering and leaving "custom" mode.
	       my $useCustom  = /^#\s*custom/;
	       my $useDefault = /^#\s*default/;

	       if ( $useCustom and not $myXlat )
	       {
		   SqueezeHashSet \%myCustomWordHash;  # Use MY conversions
		   $myXlat = 1;
	       }
	       elsif ( $useDefault and $myXlat )
	       {
		   SqueezeHashSet "reset";	       # Back to default table
		   $myXlat = 0;
	       }

	       print SqueezeText $ARG;
	   }

       Similarly, you can redefine the multi-word translation table by
       supplying a second hash reference in the call to SqueezeHashSet(). To
       kill more text immediately in addition to the defaults, concatenate
       regexps to the variable $SQZ_ZAP_REGEXP before SqueezeText() is called
       for the first time.
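       How concatenating regexps extends the set of killed words can be shown
       without the module itself. In this sketch $zap is a made-up stand-in
       for $SQZ_ZAP_REGEXP: the module's real default value is not reproduced
       here, and holding the pattern as a plain string is an assumption. An
       extra alternative is appended, and the result is compiled once:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default $SQZ_ZAP_REGEXP value.
my $zap = '\b(?:hm+|hi|hello)\b[,.!]*\s*';

# Extend it with more words to kill by appending an alternative.
$zap .= '|\b(?:um+|uh)\b[,.!]*\s*';

# Compile once, case-insensitively.
my $zap_re = qr/$zap/i;

(my $text = "Hello, um, the meeting is at noon") =~ s/$zap_re//g;
print "$text\n";   # the meeting is at noon
```

       The \b anchors keep short words such as "hi" from matching inside
       longer words like "this".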

KNOWN BUGS
       There may be a lot of false conversions. If you think that squeezing
       some word went too far, please turn on debugging and send your example
       text together with the debug log to the maintainer. To see how the
       conversion proceeds, e.g. for the word Messages:

	   use English;
	   use Lingua::EN::Squeeze;

	   #   Activate debug when the case-insensitive word "Messages" is
	   #   found in the line.

	   SqueezeDebug( 1, '(?i)Messages' );

	   $ARG = "This line has some Messages in it";
	   print SqueezeText $ARG;

EXPORTABLE VARIABLES
       The defaults may not apply to all types of text, so you may wish to
       extend the hash tables and $SQZ_ZAP_REGEXP to cope with your typical
       text.

   $SQZ_ZAP_REGEXP
       Text to kill immediately, like "Hm, Hi, Hello...". You can set this
       only once, because the regexp is compiled when SqueezeText() is called
       for the first time.
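       Why only the first setting counts can be illustrated with a
       compile-on-first-use pattern. In this hypothetical sketch, $ZAP stands
       in for $SQZ_ZAP_REGEXP and squeeze_line() for SqueezeText(); a Perl
       state variable compiles the regexp on the first call, so a change made
       afterwards has no effect. Whether the module uses exactly this
       mechanism is an assumption.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical stand-in for $SQZ_ZAP_REGEXP.
our $ZAP = '\b(?:hm+|hi|hello)\b[,.!]*\s*';

# Hypothetical stand-in for SqueezeText(): only the zapping step is shown.
sub squeeze_line {
    my ($text) = @_;
    state $zap_re = qr/$ZAP/i;   # compiled on the first call only
    $text =~ s/$zap_re//g;
    return $text;
}

print squeeze_line("Hello, world"), "\n";   # world
$ZAP .= '|\bworld\b';                       # too late: already compiled
print squeeze_line("Hello, world"), "\n";   # still prints: world
```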

   $SQZ_OPTIMIZE_LEVEL
       This controls how much the text is optimized. Currently there are only
       level 0 (the default) and level 1. Level 1 removes all spaces; word
       boundaries stay visible through capitalization, as the level 1 sample
       in DESCRIPTION shows. This usually improves compression by 10% on
       average, but makes the text harder to read. If space is really tight,
       use this extended optimization.
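       Judging from the level 1 sample in DESCRIPTION, the spaces between
       words are dropped and the first character of every word after the
       first is upper-cased. A minimal sketch of that step, inferred from the
       sample output rather than taken from the module (optimize_level1() is
       a hypothetical name):

```perl
use strict;
use warnings;

# Join the words of a level 0 line, upper-casing the first character of
# every word except the first one (ucfirst leaves "_" and "," unchanged).
sub optimize_level1 {
    my ($line) = @_;
    my ($first, @rest) = split ' ', $line;
    return '' unless defined $first;
    return join '', $first, map { ucfirst } @rest;
}

print optimize_level1("u _n use thi mod e.g. to prprce txt bfre i_s snt to"), "\n";
# u_nUseThiModE.g.ToPrprceTxtBfreI_sSntTo
```

       Applied to each level 0 line of the DESCRIPTION sample, this
       reproduces the level 1 lines shown there.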

   %SQZ_WXLATE_MULTI_HASH
       Multi-word conversion hash table, e.g. "for you" => "4u".

   %SQZ_WXLATE_HASH
       Single-word conversion hash table: word => conversion. This table is
       applied after %SQZ_WXLATE_MULTI_HASH has been used.

   %SQZ_WXLATE_EXTRA_HASH
       Aggressive single-word conversions, such as without => w/o. These are
       applied last.

INTERFACE FUNCTIONS
   SqueezeObjectArg($)
       Description
	   Return the subroutine arguments in both the function and the object
	   case. This is a wrapper utility that lets the package work both as a
	   function library and as an OO class.

       @list
	   List of arguments. If the class interface is used, the first one is
	   usually the object.

       Return values
	   The arguments without the leading object parameter.
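       The idea behind such a wrapper can be sketched in a few lines: if the
       first argument is an object of the package (or the package name, for
       class-method calls), drop it; otherwise pass the list through. The
       package My::Squeezer and the subroutines object_args() and shout() are
       hypothetical; this is not the module's implementation.

```perl
package My::Squeezer;   # hypothetical package, not Lingua::EN::Squeeze
use strict;
use warnings;
use Scalar::Util qw(blessed);

sub new { my ($class) = @_; return bless {}, $class }

# Drop a leading object (or class name) so that the same subroutine works
# as a plain function and as a method.
sub object_args {
    my @list = @_;
    if (@list) {
        my $first = $list[0];
        if ( (blessed($first) and $first->isa(__PACKAGE__))
             or (defined $first and not ref $first and $first eq __PACKAGE__) ) {
            shift @list;
        }
    }
    return @list;
}

sub shout {
    my ($text) = object_args(@_);
    return uc $text;
}

package main;

print My::Squeezer::shout("hi"), "\n";         # HI
print My::Squeezer->new->shout("hi"), "\n";    # HI
```

       One trade-off of this approach: a plain function call whose first
       argument happens to equal the package name would be mistaken for a
       class-method call.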

   SqueezeText($)
       Description
	   Squeeze text by using vowel substitutions and deletions, and hash
	   tables that guide text substitutions. The line is parsed multiple
	   times, so this takes some time.

       arg1: $text
	   String. A line of text.

       Return values
	   String, squeezed text.

   new()
       Description
	   Return a new class object.

       Return values
	   Object.

   SqueezeHashSet($;$)
       Description
	   Set the hash tables used for converting text. Multi-word conversion
	   is done first, followed by single-word conversion.

       arg1: \%wordHashRef
	   Reference to a hash used to convert single words. If "reset", use
	   the default hash table.

       arg2: \%multiHashRef [optional]
	   Reference to a hash used to convert multiple words. If "reset", use
	   the default hash table.

       Return values
	   None.
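       The "hash reference or reset" convention can be sketched as follows.
       The default tables, the variables $words and $multi, and the
       subroutines hash_set() and pick_table() are all hypothetical stand-ins
       used only to show the argument handling; the module's internals may
       differ.

```perl
use strict;
use warnings;

# Hypothetical default tables (tiny subsets, for illustration only).
my %DEFAULT_WORDS = ( you => 'u', with => 'w/' );
my %DEFAULT_MULTI = ( 'for you' => '4u' );

# Tables currently in use.
my $words = \%DEFAULT_WORDS;
my $multi = \%DEFAULT_MULTI;

# "reset" selects the default table, a hash reference a custom one.
sub pick_table {
    my ($arg, $default) = @_;
    return $default if !ref $arg and $arg eq 'reset';
    return $arg     if ref $arg eq 'HASH';
    die "expected a hash reference or 'reset'\n";
}

# Second argument is optional: leave the multi-word table alone if omitted.
sub hash_set {
    my ($word_arg, $multi_arg) = @_;
    $words = pick_table($word_arg,  \%DEFAULT_WORDS) if defined $word_arg;
    $multi = pick_table($multi_arg, \%DEFAULT_MULTI) if defined $multi_arg;
    return;
}

hash_set({ %DEFAULT_WORDS, you => 'U' });   # custom single-word table
print $words->{you}, "\n";                  # U
hash_set('reset');                          # back to the default table
print $words->{you}, "\n";                  # u
```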

   SqueezeControl(;$)
       Description
	   Select the level of compression: no conversion, conversion on,
	   medium or maximum.

       arg1: $state
	   String. If not given, use the maximum squeeze level. The accepted
	   string values are:

	       noconv	   Turn off squeeze
	       conv	   Turn on squeeze
	       med	   Set squeezing level to medium
	       max	   Set squeezing level to maximum

       Return values
	   None.

   SqueezeDebug(;$$)
       Description
	   Activate or deactivate debugging.

       arg1: $state [optional]
	   If not given, turn debugging off. If non-zero, turn debugging on.
	   When turning debugging on, you must also supply $regexp unless you
	   have given it previously.

       arg2: $regexp [optional]
	   If given, debug output is triggered for lines matching this regexp
	   while debugging is on.

       Return values
	   None.

AVAILABILITY
       The latest version of this module can be found at
       CPAN/modules/by-module/Lingua/

AUTHOR
       Jari Aalto <jariaalto@cpan.org>

COPYRIGHT AND LICENSE
       This software is Copyright (c) 1998-2016 by Jari Aalto.

       This is free software, licensed under:

	 The GNU General Public License, Version 2, June 1991

       You can redistribute it and/or modify it under the terms of the GNU
       General Public License v2 or later.

perl v5.24.1			  2016-01-21		Lingua::EN::Squeeze(3)
