FreeBSD Manual Pages

Search::Elasticsearch::Client::5_0::Scroll(3)  User Contributed Perl Documentation  Search::Elasticsearch::Client::5_0::Scroll(3)

NAME
       Search::Elasticsearch::Client::5_0::Scroll - A helper module for
       scrolled searches

VERSION
       version 5.02

SYNOPSIS
           use Search::Elasticsearch;
           use feature 'say';    # enables say() used below

           my $es     = Search::Elasticsearch->new;

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               body  => {
                   query => {...},
                   size  => 1000,
                   sort  => '_doc'
               }
           );

           say "Total hits: " . $scroll->total;

           while (my $doc = $scroll->next) {
               # do something
           }

DESCRIPTION
       A scrolled search is a search that allows you to keep pulling results
       until there are no more matching results, much like a cursor in an SQL
       database.

       Unlike paginating through results (with the "from" parameter in
       search()), scrolled searches take a snapshot of the current state of
       the index. Even if you keep adding new documents to the index or
       updating existing documents, a scrolled search will only see the index
       as it was when the search began.

       This module is a helper utility that wraps the functionality of the
       search() and scroll() methods to make them easier to use.

       This class does Search::Elasticsearch::Client::5_0::Role::Scroll and
       Search::Elasticsearch::Role::Is_Sync.
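
       The following is a rough sketch of the search()/scroll() loop that this
       helper wraps; it is a simplified model of the pattern, not the module's
       own implementation. So that it runs without a cluster, "StubES" is a
       hypothetical in-memory stand-in: its responses only loosely imitate a
       real search response (a "_scroll_id" plus "hits"), and its methods
       ignore their arguments, so the "scroll_id" argument name shown is an
       assumption, not the real client's signature.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a client (NOT the real
# Search::Elasticsearch API): it serves a fixed list of docs in batches.
package StubES {
    sub new {
        my ($class, %args) = @_;
        return bless { docs => $args{docs}, size => $args{size}, pos => 0 }, $class;
    }

    # Build a response-like hash holding the next batch of hits.
    sub _next_batch {
        my $self = shift;
        my $docs = $self->{docs};
        my $end  = $self->{pos} + $self->{size};
        $end = @$docs if $end > @$docs;
        my @hits = @$docs[ $self->{pos} .. $end - 1 ];
        $self->{pos} = $end;
        return {
            _scroll_id => 'stub-scroll-id',
            hits       => { total => scalar @$docs, hits => \@hits },
        };
    }

    sub search { my $self = shift; $self->{pos} = 0; return $self->_next_batch }
    sub scroll { my $self = shift; return $self->_next_batch }
}

my $es = StubES->new( docs => [ map { +{ _id => $_ } } 1 .. 7 ], size => 3 );
my @seen;

# 1. The initial search opens the scroll and returns the first batch.
my $res = $es->search;

# 2. Keep asking for the next batch, passing the latest scroll ID,
#    until a batch comes back empty.
while ( @{ $res->{hits}{hits} } ) {
    push @seen, map { $_->{_id} } @{ $res->{hits}{hits} };
    $res = $es->scroll( scroll_id => $res->{_scroll_id} );
}

print "Fetched: @seen\n";
```

       The helper hides this bookkeeping (tracking the scroll ID and asking
       for the next batch) behind "next()" and the buffer methods.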

USE CASES
       There are two primary use cases:

   Pulling enough results
       Perhaps you want to group your results by some field, and you don't
       know exactly how many results you will need in order to return 10
       grouped results. With a scrolled search you can keep pulling more
       results until you have enough. For instance, you can search emails in
       a mailing list and return results grouped by "thread_id":

           my (%groups, @results);

           my $scroll = $es->scroll_helper(
               index => 'my_emails',
               type  => 'email',
               body  => { query => {... some query ... }}
           );

           my $doc;
           while (@results < 10 and $doc = $scroll->next) {

               my $thread = $doc->{_source}{thread_id};

               unless ($groups{$thread}) {
                   $groups{$thread} = [];
                   push @results, $groups{$thread};
               }
               push @{$groups{$thread}}, $doc;

           }

   Extracting all documents
       Often you will want to extract all (or a subset of) documents in an
       index. If you want to change your type mappings, you will need to
       reindex all of your data. Or perhaps you want to move a subset of the
       data in one index into a new dedicated index. In these cases, you don't
       care about sort order; you just want to retrieve all documents which
       match a query and do something with them, so sorting on "_doc" (the
       most efficient order for scrolling) is enough. For instance, to
       retrieve all the docs for a particular "client_id":

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               size  => 1000,
               body  => {
                   query => {
                       match => {
                           client_id => 123
                       }
                   },
                   sort => '_doc'
               }
           );

           while (my $doc = $scroll->next) {
               # do something
           }

       Very often, the something that you will want to do with these results
       involves bulk-indexing them into a new index. The easiest way to do
       this is to use the reindex functionality built into Elasticsearch,
       exposed as "reindex()" in Search::Elasticsearch::Client::5_0::Direct.

METHODS
   "new()"
           use Search::Elasticsearch;

           my $es = Search::Elasticsearch->new(...);
           my $scroll = $es->scroll_helper(
               scroll       => '1m',    # optional
               scroll_in_qs => 0|1,     # optional
               %search_params
           );

       The "scroll_helper()" method (see
       Search::Elasticsearch::Client::5_0::Direct) loads the
       Search::Elasticsearch::Client::5_0::Scroll class and calls its "new()"
       method, passing in any arguments.

       You can specify a "scroll" duration (which defaults to "1m") and
       "scroll_in_qs" (which defaults to "false"). Any other parameters are
       passed directly to "search()" in
       Search::Elasticsearch::Client::5_0::Direct.

       The "scroll" duration tells Elasticsearch how long it should keep the
       scroll alive. Note: this duration doesn't need to be long enough to
       process all results, just long enough to process a single batch of
       results. The expiry is renewed for another "scroll" period every time
       a new batch of results is retrieved from the cluster.

       By default, the "scroll_id" is passed as the "body" of the scroll
       request. To send it in the query string instead, set "scroll_in_qs" to
       a true value, but be aware: when querying very many indices, the scroll
       ID can become too long for intervening proxies.

       The "scroll" request uses "GET" by default. To use "POST" instead, set
       the "send_get_body_as" option to "POST" when creating the client with
       Search::Elasticsearch->new().

   "next()"
           $doc  = $scroll->next;
           @docs = $scroll->next($num);

       The "next()" method returns the next result, or the next $num results
       (pulling more results if required). If all results have been
       exhausted, it returns an empty list.
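
       Passing $num is handy when documents are handed on in fixed-size
       groups, for example to a bulk indexer. The sketch below uses a
       hypothetical "StubScroll" class that imitates only the documented
       return values of "next()" (one document, or several, and nothing once
       exhausted), so it runs without a cluster; it also assumes that the
       final call may return fewer than $num documents. With a real cluster,
       $scroll would come from "scroll_helper()" instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the scroll helper: it mimics the documented
# return values of next() but never talks to Elasticsearch.
package StubScroll {
    sub new { my ($class, @docs) = @_; return bless [@docs], $class }

    sub next {
        my ($self, $num) = @_;
        return shift @$self unless defined $num;    # one doc (undef when empty)
        return splice @$self, 0, $num;              # up to $num docs
    }
}

my $scroll = StubScroll->new( map { +{ _id => $_ } } 1 .. 10 );
my @chunk_sizes;

# Take the results four at a time; an empty list ends the loop.
while ( my @docs = $scroll->next(4) ) {
    push @chunk_sizes, scalar @docs;
    # ... hand @docs to the next processing step here ...
}

print "Chunk sizes: @chunk_sizes\n";
```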

   "drain_buffer()"
	   @docs = $scroll->drain_buffer;

       The "drain_buffer()" method returns all of the documents currently in
       the buffer, without fetching any more from the cluster.

   "refill_buffer()"
	   $total = $scroll->refill_buffer;

       The "refill_buffer()" method fetches the next batch of results from the
       cluster, stores them in the buffer, and returns the total number of
       docs currently in the buffer.

   "buffer_size()"
	   $total = $scroll->buffer_size;

       The "buffer_size()" method returns the total number of docs currently
       in the buffer.
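
       The buffer methods make it possible to work one batch at a time
       instead of one document at a time. The loop below drains whatever is
       buffered and refills only when the buffer is empty, stopping once
       "refill_buffer()" reports zero documents. "StubScroll" is a
       hypothetical stand-in that models only the documented buffer
       behaviour; it assumes the first batch is already buffered, as if
       returned by the initial search, and that each refill adds one batch.

```perl
use strict;
use warnings;

# Hypothetical stand-in modelling only the documented buffer behaviour.
# Each argument is one batch; the first batch is assumed to be buffered
# already, as if it had been returned by the initial search.
package StubScroll {
    sub new {
        my ($class, $first, @pending) = @_;
        return bless { buffer => [@$first], pending => \@pending }, $class;
    }

    sub buffer_size  { my $self = shift; return scalar @{ $self->{buffer} } }
    sub drain_buffer { my $self = shift; return splice @{ $self->{buffer} } }

    # "Fetch" the next batch, append it to the buffer, return the buffer size.
    sub refill_buffer {
        my $self  = shift;
        my $batch = shift @{ $self->{pending} };
        push @{ $self->{buffer} }, @$batch if $batch;
        return scalar @{ $self->{buffer} };
    }
}

my $scroll = StubScroll->new( [ 1, 2, 3 ], [ 4, 5, 6 ], [7] );
my @batch_sizes;

# Drain what is buffered; refill only when the buffer is empty.
while ( $scroll->buffer_size || $scroll->refill_buffer ) {
    my @batch = $scroll->drain_buffer;
    push @batch_sizes, scalar @batch;
    # ... process the whole batch here, e.g. as one bulk request ...
}

print "Batch sizes: @batch_sizes\n";
```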

   "finish()"
	   $scroll->finish;

       The "finish()" method clears out the buffer, sets "is_finished()" to
       "true" and tries to clear the "scroll_id" on Elasticsearch. The
       clear-scroll API is only supported since Elasticsearch v0.90.5, but
       the call to "clear_scroll" is wrapped in an "eval", so the "finish()"
       method can be safely called with any version of Elasticsearch.

       When the $scroll instance goes out of scope, "finish()" is called
       automatically if required.
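
       Because "finish()" tries to clear the scroll on Elasticsearch, calling
       it as soon as you stop early releases that scroll without waiting for
       $scroll to go out of scope. In the sketch below, "StubScroll" is a
       hypothetical stand-in that mimics only "next()", "finish()" and
       "is_finished()", and the "status" field is an invented example.

```perl
use strict;
use warnings;

# Hypothetical stand-in mimicking next(), finish() and is_finished().
package StubScroll {
    sub new {
        my ($class, @docs) = @_;
        return bless { docs => [@docs], finished => 0 }, $class;
    }

    sub next {
        my $self = shift;
        return if $self->{finished};
        unless ( @{ $self->{docs} } ) {    # all results processed
            $self->finish;
            return;
        }
        return shift @{ $self->{docs} };
    }

    # The real finish() also tries to clear the scroll on the cluster.
    sub finish      { my $self = shift; $self->{docs} = []; $self->{finished} = 1; return }
    sub is_finished { my $self = shift; return $self->{finished} }
}

my $scroll = StubScroll->new(
    map { +{ _id => $_, status => $_ == 3 ? 'error' : 'ok' } } 1 .. 5
);

my $first_error;
while ( my $doc = $scroll->next ) {
    next unless $doc->{status} eq 'error';
    $first_error = $doc;
    $scroll->finish;    # stop early and release the scroll right away
    last;
}

printf "First error in doc %d; finished: %s\n",
    $first_error->{_id}, $scroll->is_finished ? 'yes' : 'no';
```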

   "is_finished()"
	   $bool = $scroll->is_finished;

       A flag which returns "true" if all results have been processed or
       "finish()" has been called.

INFO ACCESSORS
       The information from the original search is returned via the following
       accessors:

   "total"
       The total number of documents that matched your query.

   "max_score"
       The maximum score of any document that matched your query.

   "aggregations"
       Any aggregations that were specified, or "undef".

   "facets"
       Any facets that were specified, or "undef".

   "suggest"
       Any suggestions that were specified, or "undef".

   "took"
       How long the original search took, in milliseconds.

   "took_total"
       How long the original search plus all subsequent batches took, in
       milliseconds.

SEE ALSO
       o   "search()" in Search::Elasticsearch::Client::5_0::Direct

       o   "scroll()" in Search::Elasticsearch::Client::5_0::Direct

       o   "reindex()" in Search::Elasticsearch::Client::5_0::Direct

AUTHOR
       Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE
       This software is Copyright (c) 2017 by Elasticsearch BV.

       This is free software, licensed under:

	 The Apache License, Version 2.0, January 2004

perl v5.24.1			 Search::Elasticsearch::Client::5_0::Scroll(3)
