Search::Elasticsearch::Client::1_0::Scroll(3)   User Contributed Perl Documentation

NAME
       Search::Elasticsearch::Client::1_0::Scroll - A helper module for
       scrolled	searches

VERSION
       version 5.02

SYNOPSIS
	   use feature 'say';
	   use Search::Elasticsearch;

	   my $es     =	Search::Elasticsearch->new;

	   my $scroll =	$es->scroll_helper(
	       index	   => 'my_index',
	       search_type => 'scan',
	       size	   => 500
	   );

	   say "Total hits: ". $scroll->total;

	   while (my $doc = $scroll->next) {
	       # do something
	   }

DESCRIPTION
       A scrolled search is a search that allows you to	keep pulling results
       until there are no more matching	results, much like a cursor in an SQL
       database.

       Unlike paginating through results (with the "from" parameter in
       search()), scrolled searches take a snapshot of the current state of
       the index. Even if you keep adding new documents	to the index or
       updating	existing documents, a scrolled search will only	see the	index
       as it was when the search began.

       This module is a	helper utility that wraps the functionality of the
       search()	and scroll() methods to	make them easier to use.

       IMPORTANT: Deep scrolling can be	expensive.  See	"DEEP SCROLLING" for
       more.

       This class does Search::Elasticsearch::Client::1_0::Role::Scroll	and
       Search::Elasticsearch::Role::Is_Sync.

USE CASES
       There are two primary use cases:

   Pulling enough results
       Perhaps you want	to group your results by some field, and you don't
       know exactly how	many results you will need in order to return 10
       grouped results.	 With a	scrolled search	you can	keep pulling more
       results until you have enough.  For instance, you can search emails in
       a mailing list, and return results grouped by "thread_id":

	   my (%groups,@results);

	   my $scroll =	$es->scroll_helper(
	       index =>	'my_emails',
	       type  =>	'email',
	       body  =>	{ query	=> {...	some query ... }}
	   );

	   my $doc;
	   while (@results < 10	and $doc = $scroll->next) {

	       my $thread = $doc->{_source}{thread_id};

	       unless ($groups{$thread}) {
		   $groups{$thread} = [];
		   push	@results, $groups{$thread};
	       }
	       push @{$groups{$thread}},$doc;

	   }

   Extracting all documents
       Often you will want to extract all (or a	subset of) documents in	an
       index.  If you want to change your type mappings, you will need to
       reindex all of your data. Or perhaps you	want to	move a subset of the
       data in one index into a	new dedicated index. In	these cases, you don't
       care about sort order, you just want to retrieve	all documents which
       match a query, and do something with them. For instance,	to retrieve
       all the docs for	a particular "client_id":

	   my $scroll =	$es->scroll_helper(
	       index	   => 'my_index',
	       search_type => 'scan',	       # important!
	       size	   => 500,
	       body	   => {
		   query => {
		       match =>	{
			   client_id =>	123
		       }
		   }
	       }
	   );

	   while (my $doc = $scroll->next) {
	       # do something
	   }

       Very often, what you will want to do with these results is bulk-index
       them into a new index. The easiest way to combine a scrolled search
       with bulk indexing is to use the "reindex()" method in
       Search::Elasticsearch::Client::1_0::Bulk.
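
       As a rough sketch of that combination, the helper below asks the
       client for a bulk helper and hands it a scrolled-search description.
       The function name "copy_matching_docs" and its argument names are
       hypothetical, and the hashref form of "source" is an assumption based
       on the Bulk helper's documentation; check that module before relying
       on it.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the docs matching $args{query} from
# $args{from_index} into $args{to_index}.
# $es is expected to be a Search::Elasticsearch client object.
sub copy_matching_docs {
    my ( $es, %args ) = @_;

    # Bulk helper whose default target index is the new index.
    my $bulk = $es->bulk_helper( index => $args{to_index} );

    # Assumption: a hashref "source" is passed on to scroll_helper(),
    # which scans the old index and feeds each hit to the bulk helper.
    $bulk->reindex(
        source => {
            index       => $args{from_index},
            search_type => 'scan',
            size        => 500,
            body        => { query => $args{query} },
        },
    );

    return $bulk;
}
```

       It might be called as "copy_matching_docs($es, from_index =>
       'my_index', to_index => 'client_123', query => { match => { client_id
       => 123 } })".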

DEEP SCROLLING
       Deep scrolling (and deep	pagination) are	very expensive in a
       distributed environment,	and the	reason they are	expensive is that
       results need to be sorted in a global order.

       For example, if we have an index with 5 shards, and we request the
       first 10 results, each shard has to return its top 10, and then the
       requesting node (the node that is handling the search request) has to
       re-sort these 50 results to return a global top 10. Now, if we request
       results 10,001 to 10,010 (i.e. page 1,001), then each shard has to
       return its top 10,010 results, and the requesting node has to sort
       through 50,050 results just to return 10 of them!

       You can see how this can get very heavy very quickly. This is the
       reason that web search engines rarely return more than about 1,000
       results.
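
       The arithmetic in the example above can be written down as a small
       function. This is only an illustration of the cost model described
       here (every shard returns its top "from + size" hits); the name
       "hits_to_sort" is made up for this sketch and is not part of the
       module.

```perl
use strict;
use warnings;

# Number of hits the requesting node has to collect and sort in order to
# return one page, assuming each shard returns its top (from + size) hits.
sub hits_to_sort {
    my (%args) = @_;
    my $per_shard = $args{from} + $args{size};
    return $args{shards} * $per_shard;
}

# First page of 10 from 5 shards:        5 * (0 + 10)      = 50
# Results 10,001 .. 10,010 from 5 shards: 5 * (10_000 + 10) = 50_050
my $shallow = hits_to_sort( shards => 5, from => 0,      size => 10 );
my $deep    = hits_to_sort( shards => 5, from => 10_000, size => 10 );
```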

   Disable sorting for efficient scrolling
       The problem with	deep scrolling is the sorting phase.  If we disable
       sorting,	then we	can happily scroll through millions of documents
       efficiently.  The way to	do this	is to set "search_type"	to "scan":

	   my $scroll =	$es->scroll_helper(
	       search_type => 'scan',
	       size	   => 500,
	   );

       Scanning	disables sorting and will just return "size" results from each
       shard until there are no	more results to	return.	Note: this means that,
       when querying an	index with 5 shards, the scrolled search will pull
       "size * 5" results at a time. If	you have large documents or are	memory
       constrained, you	will need to take this into account.
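
       To make the "size * 5" point concrete, the following sketch works out
       how many documents one batch may contain, and which "size" to request
       for a given batch budget. Both function names are hypothetical, and
       the shard count has to be supplied by the caller.

```perl
use strict;
use warnings;

# Upper bound on the documents returned per batch by a scan search:
# every shard contributes up to $size documents.
sub docs_per_batch {
    my ( $size, $shards ) = @_;
    return $size * $shards;
}

# Pick a per-shard "size" so that one batch stays at or below
# $target_docs documents (but never lower than 1).
sub size_for_target {
    my ( $target_docs, $shards ) = @_;
    my $size = int( $target_docs / $shards );
    return $size < 1 ? 1 : $size;
}
```

       For instance, "docs_per_batch(500, 5)" gives 2500, and
       "size_for_target(1000, 5)" gives 200.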

METHODS
   "new()"
	   use Search::Elasticsearch;

	   my $es = Search::Elasticsearch->new(...);
	   my $scroll =	$es->scroll_helper(
	       scroll	      => '1m',		  # optional
	       scroll_in_qs   => 0|1,		  # optional
	       %search_params
	   );

       The "scroll_helper()" method in
       Search::Elasticsearch::Client::1_0::Direct loads the
       Search::Elasticsearch::Client::1_0::Scroll class and calls "new()",
       passing in any arguments.

       You can specify a "scroll" duration (which defaults to "1m") and
       "scroll_in_qs" (which defaults to "false"). Any other parameters	are
       passed directly to "search()" in
       Search::Elasticsearch::Client::1_0::Direct.

       The "scroll" duration tells Elasticsearch how long it should keep the
       scroll alive.  Note: this duration doesn't need to be long enough to
       process all results, just long enough to process a single batch of
       results.  The expiry gets renewed for another "scroll" period every
       time a new batch of results is retrieved from the cluster.

       By default, the "scroll_id" is passed as	the "body" to the scroll
       request.	 To send it in the query string	instead, set "scroll_in_qs" to
       a true value, but be aware: when	querying very many indices, the	scroll
       ID can become too long for intervening proxies.

       The "scroll" request uses "GET" by default.  To use "POST" instead, set
       send_get_body_as	to "POST".

   "next()"
	   $doc	 = $scroll->next;
	   @docs = $scroll->next($num);

       The "next()" method returns the next result, or the next	$num results
       (pulling	more results if	required).  If all results have	been
       exhausted, it returns an	empty list.
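
       One way to use the list form is to process results in fixed-size
       chunks. The sketch below relies only on the behaviour described above
       (an empty list once the results are exhausted); the name
       "for_each_batch" is hypothetical.

```perl
use strict;
use warnings;

# Call $callback with an array ref of up to $batch_size docs at a time,
# until the scroll is exhausted.  Returns the number of docs seen.
sub for_each_batch {
    my ( $scroll, $batch_size, $callback ) = @_;
    my $seen = 0;

    # A list assignment in boolean context is false once next() returns
    # an empty list, which ends the loop.
    while ( my @docs = $scroll->next($batch_size) ) {
        $callback->( \@docs );
        $seen += @docs;
    }
    return $seen;
}
```

       Note that the final chunk may hold fewer than $batch_size documents.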

   "drain_buffer()"
	   @docs = $scroll->drain_buffer;

       The "drain_buffer()" method returns all of the documents	currently in
       the buffer, without fetching any	more from the cluster.

   "refill_buffer()"
	   $total = $scroll->refill_buffer;

       The "refill_buffer()" method fetches the	next batch of results from the
       cluster,	stores them in the buffer, and returns the total number	of
       docs currently in the buffer.
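
       The two buffer methods can be combined to handle results one
       cluster round trip at a time. This is a sketch, not the module's own
       code: "process_by_buffer" is a made-up name, and the loop assumes
       that "refill_buffer()" returns 0 once there is nothing left to fetch.

```perl
use strict;
use warnings;

# Hand the buffered docs to $handler, then fetch the next batch, and
# repeat until a refill leaves the buffer empty.
sub process_by_buffer {
    my ( $scroll, $handler ) = @_;
    my $processed = 0;

    do {
        # Take whatever is buffered without touching the cluster.
        my @docs = $scroll->drain_buffer;
        if (@docs) {
            $handler->( \@docs );
            $processed += @docs;
        }
    } while ( $scroll->refill_buffer );    # one request per iteration

    return $processed;
}
```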

   "buffer_size()"
	   $total = $scroll->buffer_size;

       The "buffer_size()" method returns the total number of docs currently
       in the buffer.

   "finish()"
	   $scroll->finish;

       The "finish()" method clears out the buffer, sets "is_finished()" to
       "true" and tries to clear the "scroll_id" on Elasticsearch.  The
       "clear_scroll" API has only been supported since Elasticsearch
       v0.90.5, but the call to it is wrapped in an "eval", so "finish()" can
       be safely called with any version of Elasticsearch.

       When the	$scroll	instance goes out of scope, "finish()" is called
       automatically if	required.
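
       Calling "finish()" explicitly is mostly useful when you stop reading
       before the results run out. A minimal sketch, with the hypothetical
       name "first_match":

```perl
use strict;
use warnings;

# Return the first doc for which $wanted->($doc) is true, and release the
# scroll immediately instead of waiting for $scroll to go out of scope.
sub first_match {
    my ( $scroll, $wanted ) = @_;

    while ( my $doc = $scroll->next ) {
        next unless $wanted->($doc);
        $scroll->finish;    # clear the buffer and the scroll_id
        return $doc;
    }
    return;                 # exhausted without a match
}
```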

   "is_finished()"
	   $bool = $scroll->is_finished;

       A flag which returns "true" if all results have been processed or
       "finish()" has been called.

INFO ACCESSORS
       The information from the	original search	is returned via	the following
       accessors:

   "total"
       The total number	of documents that matched your query.

   "max_score"
       The maximum score of any document that matched your query.

   "aggregations"
       Any aggregations that were specified, or "undef".

   "facets"
       Any facets that were specified, or "undef".

   "suggest"
       Any suggestions that were specified, or "undef".

   "took"
       How long the original search took, in milliseconds.

   "took_total"
       How long	the original search plus all subsequent	batches	took, in
       milliseconds.
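
       As an illustration, the accessors can be combined into a one-line
       summary for logging. The function name "scroll_summary" is
       hypothetical, and the fallback to "n/a" assumes that "max_score" may
       be undefined (for example when results are not scored).

```perl
use strict;
use warnings;

# Build a short human-readable summary from the info accessors.
sub scroll_summary {
    my ($scroll) = @_;
    my $max_score = $scroll->max_score;
    $max_score = 'n/a' unless defined $max_score;

    return sprintf 'total=%d max_score=%s took=%dms took_total=%dms',
        $scroll->total, $max_score, $scroll->took, $scroll->took_total;
}
```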

SEE ALSO
       o   "reindex()" in Search::Elasticsearch::Client::1_0::Bulk

       o   "search()" in Search::Elasticsearch::Client::1_0::Direct

       o   "scroll()" in Search::Elasticsearch::Client::1_0::Direct

AUTHOR
       Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE
       This software is	Copyright (c) 2017 by Elasticsearch BV.

       This is free software, licensed under:

	 The Apache License, Version 2.0, January 2004

perl v5.24.1			 Search::Elasticsearch::Client::1_0::Scroll(3)
