MCE::Stream(3)	      User Contributed Perl Documentation	MCE::Stream(3)

NAME
       MCE::Stream - Parallel stream model for chaining multiple maps and
       greps

VERSION
       This document describes MCE::Stream version 1.874

SYNOPSIS
	## Exports mce_stream, mce_stream_f, mce_stream_s
	use MCE::Stream;

	my (@m1, @m2, @m3);

	## Default mode is map; code blocks are processed from right to left
	@m1 = mce_stream sub { $_ * 3 }, sub { $_ * 2 }, 1..10000;
	mce_stream \@m2, sub { $_ * 3 }, sub { $_ * 2 }, 1..10000;

	## Native Perl
	@m3 = map { $_ * $_ } grep { $_	% 5 == 0 } 1..10000;

	## Streaming grep and map in parallel
	mce_stream \@m3,
	   { mode => 'map',  code => sub { $_ *	$_ } },
	   { mode => 'grep', code => sub { $_ %	5 == 0 } }, 1..10000;

	## Array or array_ref
	my @a =	mce_stream sub { $_ * $_ }, 1..10000;
	my @b =	mce_stream sub { $_ * $_ }, \@list;

	## Important; pass an array_ref for deeply nested input data
	my @c =	mce_stream sub { $_->[1] *= 2; $_ }, [ [ 0, 1 ], [ 0, 2	], ... ];
	my @d =	mce_stream sub { $_->[1] *= 2; $_ }, \@deeply_list;

	## File path, glob ref, IO::All::{ File, Pipe, STDIO } obj, or scalar ref
	## Workers read directly and do not involve the manager process
	my @e =	mce_stream_f sub { chomp; $_ },	"/path/to/file"; # efficient

	## Involves the	manager	process, therefore slower
	my @f =	mce_stream_f sub { chomp; $_ },	$file_handle;
	my @g =	mce_stream_f sub { chomp; $_ },	$io;
	my @h =	mce_stream_f sub { chomp; $_ },	\$scalar;

	## Sequence of numbers (begin, end [, step, format])
	my @i =	mce_stream_s sub { $_ *	$_ }, 1, 10000,	5;
	my @j =	mce_stream_s sub { $_ *	$_ }, [	1, 10000, 5 ];

	my @k =	mce_stream_s sub { $_ *	$_ }, {
	   begin => 1, end => 10000, step => 5,	format => undef
	};

DESCRIPTION
       This module streams multiple map and/or grep operations in parallel.
       Code blocks run simultaneously, with data flowing from right to left.
       When a reference to an array is provided, results are appended to it
       immediately rather than after the run completes.

	## Appends are serialized. Chunks may complete out of order; each
	## is appended as soon as every earlier chunk has been appended.
	## Out-of-order chunks are held temporarily until then.

	mce_stream \@a,	sub { $_ }, sub	{ $_ },	sub { $_ }, 1..10000;

	##						      input
	##					  chunk1      input
	##			      chunk3	  chunk2      input
	##		  chunk2      chunk2	  chunk3      input
	##   append1	  chunk3      chunk1	  chunk4      input
	##   append2	  chunk1      chunk5	  chunk5      input
	##   append3	  chunk5      chunk4	  chunk6      ...
	##   append4	  chunk4      chunk6	  ...
	##   append5	  chunk6      ...
	##   append6	  ...
	##   ...
	##
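
       The hold-and-append behavior shown in the diagram can be modeled in
       plain Perl. This is a minimal sketch and not MCE's implementation;
       the helper name make_ordered_appender and the chunk numbering
       starting at 1 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: returns a closure that accepts
# (chunk_id, array_ref) in any order and appends to @$out in chunk order.
sub make_ordered_appender {
    my ($out) = @_;
    my %held;            # out-of-order chunks waiting for their turn
    my $next_id = 1;     # next chunk id allowed to append

    return sub {
        my ($chunk_id, $data) = @_;
        $held{$chunk_id} = $data;

        # Drain every held chunk that is now contiguous with the output.
        while (exists $held{$next_id}) {
            push @$out, @{ delete $held{$next_id} };
            $next_id++;
        }
        return;
    };
}

my @result;
my $append = make_ordered_appender(\@result);

# Chunks arrive as 2, 3, 1; nothing is appended until chunk 1 shows up.
$append->(2, [ 3, 4 ]);
$append->(3, [ 5, 6 ]);
$append->(1, [ 1, 2 ]);

print "@result\n";   # prints "1 2 3 4 5 6"
```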

       MCE incurs a small overhead from passing data between processes. For
       a fast code block, chaining native map functions will therefore run
       faster. The relative overhead will likely diminish as the complexity
       of the code increases.

	## 0.334 secs -- baseline using	the native map function
	my @m1 = map { $_ * 4 }	map { $_ * 3 } map { $_	* 2 } 1..1000000;

	## 0.427 secs -- this is quite amazing considering data	passing
	my @m2 = mce_stream
	      sub { $_ * 4 }, sub { $_ * 3 }, sub { $_ * 2 }, 1..1000000;

	## 0.355 secs -- appends to @m3	immediately, not after running
	my @m3;	mce_stream \@m3,
	      sub { $_ * 4 }, sub { $_ * 3 }, sub { $_ * 2 }, 1..1000000;

       Even faster is mce_stream_s, which is useful when the input data is a
       range of numbers. Workers generate sequences mathematically among
       themselves without any interaction from the manager process. Two
       arguments are required for mce_stream_s (begin, end). Step defaults to
       1 if begin is smaller than end, otherwise -1.

	## 0.278 secs -- numbers are generated mathematically via sequence
	my @m4;	mce_stream_s \@m4,
	      sub { $_ * 4 }, sub { $_ * 3 }, sub { $_ * 2 }, 1, 1000000;
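
       The default-step rule described above can be illustrated in plain
       Perl. The sequence function below is a hypothetical helper written
       for this sketch; it only models the rule and does not reflect how
       MCE workers divide a sequence among themselves.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: expands (begin, end [, step]).
sub sequence {
    my ($begin, $end, $step) = @_;

    # Step defaults to 1 if begin is smaller than end, otherwise -1.
    $step = ($begin < $end) ? 1 : -1 unless defined $step;
    die "step must not be zero\n" if $step == 0;

    my @seq;
    if ($step > 0) {
        for (my $n = $begin; $n <= $end; $n += $step) { push @seq, $n }
    }
    else {
        for (my $n = $begin; $n >= $end; $n += $step) { push @seq, $n }
    }
    return @seq;
}

my @up   = sequence(1, 5);       # step defaults to 1
my @down = sequence(5, 1);       # step defaults to -1
my @by5  = sequence(1, 20, 5);   # explicit step

print "@up\n";     # prints "1 2 3 4 5"
print "@down\n";   # prints "5 4 3 2 1"
print "@by5\n";    # prints "1 6 11 16"
```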

OVERRIDING DEFAULTS
       The following lists the options that may be overridden when loading
       the module. The fast option is obsolete from 1.867 onwards and is
       ignored if specified.

	use Sereal qw( encode_sereal decode_sereal );
	use CBOR::XS qw( encode_cbor decode_cbor );
	use JSON::XS qw( encode_json decode_json );

	use MCE::Stream
	    max_workers	=> 8,		     # Default 'auto'
	    chunk_size => 500,		     # Default 'auto'
	    tmp_dir => "/path/to/app/tmp",   # $MCE::Signal::tmp_dir
	    freeze => \&encode_sereal,	     # \&Storable::freeze
	    thaw => \&decode_sereal,	     # \&Storable::thaw
	    default_mode => 'grep',	     # Default 'map'
	    fast => 1			     # Obsolete since 1.867; ignored
	;

       From MCE 1.8 onwards, Sereal 3.015+ is loaded automatically if
       available. Specify "Sereal => 0" to use Storable instead.

	use MCE::Stream	Sereal => 0;

CUSTOMIZING MCE
       MCE::Stream->init ( options )
       MCE::Stream::init { options }

       The init function accepts a hash of MCE options. The gather and
       bounds_only options, if specified, are ignored because the module
       uses them internally (not shown below).

	use MCE::Stream;

	MCE::Stream->init(
	   chunk_size => 1, max_workers	=> 4,

	   user_begin => sub {
	      print "##	", MCE->wid, " started\n";
	   },

	   user_end => sub {
	      print "##	", MCE->wid, " completed\n";
	   }
	);

	my @a =	mce_stream sub { $_ * $_ }, 1..100;

	print "\n", "@a", "\n";

	-- Output

	## 1 started
	## 2 started
	## 3 started
	## 4 started
	## 3 completed
	## 1 completed
	## 2 completed
	## 4 completed

	1 4 9 16 25 36 49 64 81	100 121	144 169	196 225	256 289	324 361
	400 441	484 529	576 625	676 729	784 841	900 961	1024 1089 1156
	1225 1296 1369 1444 1521 1600 1681 1764	1849 1936 2025 2116 2209
	2304 2401 2500 2601 2704 2809 2916 3025	3136 3249 3364 3481 3600
	3721 3844 3969 4096 4225 4356 4489 4624	4761 4900 5041 5184 5329
	5476 5625 5776 5929 6084 6241 6400 6561	6724 6889 7056 7225 7396
	7569 7744 7921 8100 8281 8464 8649 8836	9025 9216 9409 9604 9801
	10000

       As with MCE::Stream->init above, MCE options may be specified using
       an anonymous hash as the first argument. Notice how both max_workers
       and task_name can take an anonymous array to set values individually
       for each code block.

       Remember that MCE::Stream processes from right to left when setting
       the individual values.

	use MCE::Stream;

	my @a =	mce_stream {
	   task_name   => [ 'c', 'b', 'a' ],
	   max_workers => [  2,	  4,   3, ],

	   user_end => sub {
	      my ($mce,	$task_id, $task_name) =	@_;
	      print "$task_id -	$task_name completed\n";
	   },

	   task_end => sub {
	      my ($mce,	$task_id, $task_name) =	@_;
	      MCE->print("$task_id - $task_name	ended\n");
	   }
	},
	sub { $_ * 4 },		    ## 2 workers, named	c
	sub { $_ * 3 },		    ## 4 workers, named	b
	sub { $_ * 2 },	1..10000;   ## 3 workers, named	a

	-- Output

	0 - a completed
	0 - a completed
	0 - a completed
	0 - a ended
	1 - b completed
	1 - b completed
	1 - b completed
	1 - b completed
	1 - b ended
	2 - c completed
	2 - c completed
	2 - c ended
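
       The relationship between the option arrays and the task IDs in the
       output above can be worked out in plain Perl. This sketch only
       restates the example; the variable names are illustrative and no MCE
       code is involved.

```perl
use strict;
use warnings;

# Values as listed in the call above: index 0 belongs to the leftmost
# code block.
my @task_name   = ('c', 'b', 'a');
my @max_workers = ( 2,   4,   3 );

# Task IDs count from the rightmost code block, which runs first.
my @summary;
for my $task_id (0 .. $#task_name) {
    my $i = $#task_name - $task_id;
    push @summary, "$task_id - $task_name[$i] ($max_workers[$i] workers)";
}

print "$_\n" for @summary;
# prints:
#   0 - a (3 workers)
#   1 - b (4 workers)
#   2 - c (2 workers)
```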

       Note that the anonymous hash for specifying options also comes first
       when passing an array reference.

	my @a; mce_stream {
	   ...
	}, \@a,	sub { ... }, sub { ... }, 1..10000;

API DOCUMENTATION
       Scripts using MCE::Stream can be written using the long or short form.
       The long form becomes relevant when mixing modes. Again, processing
       occurs from right to left.

	my @m3 = mce_stream
	   { mode => 'map',  code => sub { $_ *	$_ } },
	   { mode => 'grep', code => sub { $_ %	5 == 0 } }, 1..10000;

	my @m4;	mce_stream \@m4,
	   { mode => 'map',  code => sub { $_ *	$_ } },
	   { mode => 'grep', code => sub { $_ %	5 == 0 } }, 1..10000;
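
       To make the right-to-left ordering of the long form concrete, the
       following sequential sketch applies a list of stages in plain Perl.
       The apply_stages function is a hypothetical helper, not part of
       MCE::Stream, and it runs the stages one after another instead of in
       parallel.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: applies long-form stages
# sequentially, starting with the rightmost stage.
sub apply_stages {
    my ($stages, @data) = @_;

    for my $stage (reverse @$stages) {
        my $code = $stage->{code};
        if ($stage->{mode} eq 'grep') {
            @data = grep { $code->() } @data;   # $_ is set by grep
        }
        else {
            @data = map { $code->() } @data;    # $_ is set by map
        }
    }
    return @data;
}

my @squares = apply_stages(
    [
        { mode => 'map',  code => sub { $_ * $_ } },
        { mode => 'grep', code => sub { $_ % 5 == 0 } },
    ],
    1 .. 20,
);

# Same result as: map { $_ * $_ } grep { $_ % 5 == 0 } 1 .. 20
print "@squares\n";   # prints "25 100 225 400"
```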

       For multiple grep blocks, the short form can be used. Simply specify
       the default mode for the module. The two valid values for default_mode
       are 'grep' and 'map'.

	use MCE::Stream	default_mode =>	'grep';

	my @f =	mce_stream_f sub { /ending$/ },	sub { /^starting/ }, $file;

       The following assumes 'map' for default_mode in order to	demonstrate
       all the possibilities for providing input data.

       MCE::Stream->run	( sub {	code },	list )
       mce_stream sub {	code },	list

       Input data may be defined using a list or an array reference. Unlike
       MCE::Loop, MCE::Flow, and MCE::Step, specifying a hash reference as
       input data isn't allowed.

	## Array or array_ref
	my @a =	mce_stream sub { $_ * 2	}, 1..1000;
	my @b =	mce_stream sub { $_ * 2	}, \@list;

	## Important; pass an array_ref for deeply nested input data
	my @c =	mce_stream sub { $_->[1] *= 2; $_ }, [ [ 0, 1 ], [ 0, 2	], ... ];
	my @d =	mce_stream sub { $_->[1] *= 2; $_ }, \@deeply_list;

	## Not supported
	my @z =	mce_stream sub { ... },	\%hash;

       MCE::Stream->run_file ( sub { code }, file )
       mce_stream_f sub	{ code }, file

       The fastest of these is the file path form ("/path/to/file"). Workers
       communicate the next offset position among themselves with zero
       interaction by the manager process.

       "IO::All" { File, Pipe, STDIO } is supported since MCE 1.845.

	my @c =	mce_stream_f sub { chomp; $_ . "\r\n" }, "/path/to/file";  # faster
	my @d =	mce_stream_f sub { chomp; $_ . "\r\n" }, $file_handle;
	my @e =	mce_stream_f sub { chomp; $_ . "\r\n" }, $io;		   # IO::All
	my @f =	mce_stream_f sub { chomp; $_ . "\r\n" }, \$scalar;

       MCE::Stream->run_seq ( sub { code }, $beg, $end [, $step, $fmt ]	)
       mce_stream_s sub	{ code }, $beg,	$end [,	$step, $fmt ]

       Sequence may be defined as a list, an array reference, or a hash
       reference. The functions require both begin and end values to run.
       Step and format are optional. The format is passed to sprintf (% may
       be omitted below).

	my ($beg, $end,	$step, $fmt) = (10, 20,	0.1, "%4.1f");

	my @f =	mce_stream_s sub { $_ }, $beg, $end, $step, $fmt;
	my @g =	mce_stream_s sub { $_ }, [ $beg, $end, $step, $fmt ];

	my @h =	mce_stream_s sub { $_ }, {
	   begin => $beg, end => $end, step => $step, format =>	$fmt
	};
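
       The effect of the format argument can be previewed with sprintf in
       plain Perl. The formatted_sequence function below is a hypothetical
       helper for illustration only; unlike mce_stream_s, it expects the
       full sprintf format including the leading %.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: formats each value of a
# positive-step sequence with sprintf.
sub formatted_sequence {
    my ($beg, $end, $step, $fmt) = @_;

    # Number of steps; the small epsilon guards against float rounding.
    my $count = int( ($end - $beg) / $step + 1e-9 );

    # Compute each value from its index to avoid accumulating error.
    return map { sprintf($fmt, $beg + $_ * $step) } 0 .. $count;
}

my @seq = formatted_sequence(10, 20, 0.1, "%4.1f");

print scalar(@seq), " values\n";       # prints "101 values"
print "$seq[0] $seq[1] $seq[-1]\n";    # prints "10.0 10.1 20.0"
```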

       MCE::Stream->run	( { input_data => iterator }, sub { code } )
       mce_stream { input_data => iterator }, sub { code }

       An iterator reference may be specified for input_data. The only other
       way is to specify input_data via MCE::Stream->init. Both forms keep
       MCE::Stream from configuring the iterator reference (itself a code
       reference) as another user task, which would not work.

       Iterators are described in the "SYNTAX for INPUT_DATA" section of
       MCE::Core.

	MCE::Stream->init(
	   input_data => iterator
	);

	my @a =	mce_stream sub { $_ * 3	}, sub { $_ * 2	};
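
       An iterator is typically a closure that hands out the next item on
       each call. The sketch below is plain Perl and assumes the convention
       that an empty return signals the end of input; make_range_iterator
       is a hypothetical name, and MCE::Core remains the reference for the
       exact calling convention. Here the closure is driven by an ordinary
       loop instead of MCE.

```perl
use strict;
use warnings;

# Hypothetical iterator factory, not part of MCE.
sub make_range_iterator {
    my ($n, $max, $step) = @_;

    return sub {
        return if $n > $max;    # empty list: no more input
        my $current = $n;
        $n += $step;
        return $current;
    };
}

my $iter = make_range_iterator(10, 30, 10);

# Drive the iterator by hand until it is exhausted.
my @seen;
while ( my ($item) = $iter->() ) {
    push @seen, $item;
}

print "@seen\n";   # prints "10 20 30"
```

       Under the same assumption, such a closure would be supplied as the
       input_data value in the two forms shown above.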

MANUAL SHUTDOWN
       MCE::Stream->finish
       MCE::Stream::finish

       Workers remain persistent as much as possible after running. Shutdown
       occurs automatically when the script terminates. Call finish when
       workers are no longer needed.

	use MCE::Stream;

	MCE::Stream->init(
	   chunk_size => 20, max_workers => 'auto'
	);

	my @a = mce_stream sub { ... }, 1..100;

	MCE::Stream->finish;

INDEX
       MCE, MCE::Core

AUTHOR
       Mario E. Roy, <marioeroy AT gmail DOT com>

perl v5.32.0			  2020-08-18			MCE::Stream(3)
