RRD_PDPCALC(1)			    rrdtool			RRD_PDPCALC(1)

NAME
       PDP calculation explanation - the inner logic of PDP calculation, with
       an example, by Tianpeng Xia

DESCRIPTION
       This article explains how PDPs (primary data points) are calculated, in
       a detailed yet easy-to-understand way, with an example.

Refreshing some basics about PDP
   Fundamental knowledge
       If you have not yet read the tutorials or man pages on the official
       site or elsewhere, I strongly encourage you to do so. As stated in the
       description, this article explains only how a PDP is calculated, not
       what a PDP is. Please read the following materials to get a basic
       understanding of PDPs:

       <http://rrdtool.vandenbogaerdt.nl/process.php> - By Alex van den
       Bogaerdt. This article explains PDPs in a detailed and clear way.
       However, its "Normalize interval" section does not describe the
       normalization process the way the official version performs it (I
       confirmed the official behavior with @oetiker himself). The
       discrepancy is easy to see in the bar charts and is discussed in the
       "Calculation logics" section.

       <https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html> - The manual
       page for "rrdcreate" on the official site. Its "The HEARTBEAT and the
       STEP" section reveals what happens under the hood during PDP
       calculation.

       The text graph by Don Baarda in that section vividly explains how
       UNKNOWN data are produced and how the heartbeat value influences
       sampling. Unfortunately, it does not give a clear method by which PDPs
       are calculated.

       <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html> - Another
       detailed tutorial on the official site, by Alex van den Bogaerdt. It,
       too, only provides examples in which the data arrive evenly and exactly
       on the step boundaries.

       If you don't like doing experiments or don't care much about the inner
       mechanics, you can stop here and turn to more practical topics such as
       graph exports or the command manuals. But if, like me, you care about
       the calculation logic, please read on.

Calculation logics
       This is the core of the article. This section presents two versions of
       the calculation method, one by Alex van den Bogaerdt and the other by
       @oetiker.

       To keep the explanation ASCII-friendly, both versions are explained
       with the chart below instead of a real image.

	 |
	 |    (v1)
	 | _______			  (v4)	(v5)
	 | |	 |	     (v3)	 ____________
	 | |	 |	  ______________|     ||   |
	 | |	 |	  |	       ||     ||   |
	 | |	 |	  |	       ||     ||   |
	 | |	 |   (v2) |	       ||     ||   |
	 | |	 |________|	       ||     ||   |
	--------------------------------------------->
	 0 1	 3	  7	       17     20   21

       The X axis shows time slots (one slot per second) and the Y axis shows
       the value.

       The conventions used are as follows:

       - The step is 5.

       - Each PDP is updated only when a value arrives at or after the last
       slot of that PDP. For instance, the last slot of the PDP from 16 to 20
       is 20.

       - The heartbeat is 20, so the value covering the entire 7-17 period is
       not discarded.

       - At second 3, the first value comes in as v1, and so on.

       - Second 0 is the origin, and it does not count as a sample.
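
       The conventions above can be sketched in a few lines of Python. This
       is only an illustration, not rrdtool's source: the function name
       "expand_to_slots" is made up here, and the sketch assumes that a
       sample arriving at second t covers every slot after the previous
       sample up to and including t, and that a gap longer than the
       heartbeat turns the covered slots into unknown (None).

```python
def expand_to_slots(samples, heartbeat, origin=0):
    """Map each one-second slot to the value that covers it.

    samples is a list of (time, value) pairs in increasing time order.
    Slots covered by a gap longer than the heartbeat become None.
    """
    slots = {}
    prev = origin
    for t, value in samples:
        known = (t - prev) <= heartbeat
        for slot in range(prev + 1, t + 1):
            slots[slot] = value if known else None
        prev = t
    return slots


# Arrival times read off the chart; v1..v5 are kept symbolic.
chart = [(3, "v1"), (7, "v2"), (17, "v3"), (20, "v4"), (21, "v5")]

slots = expand_to_slots(chart, heartbeat=20)
print([slots[s] for s in range(1, 11)])

# With a hypothetical heartbeat of 5, the 10-second gap between 7 and 17
# would be too long, and slots 8..17 would be unknown.
short = expand_to_slots(chart, heartbeat=5)
print([short[s] for s in range(6, 19)])
```

       With the heartbeat of 20 used in this article, every slot is known,
       which is why the long v3 bar takes part in the calculations below.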

   Bogaerdt version
       As described at <http://rrdtool.vandenbogaerdt.nl/process.php>, after
       all the primary data are transformed to rates (except for GAUGE, of
       course), they have to go through a normalization process if they do
       not arrive exactly on the step boundaries or, in the words of the
       author, "on well-defined boundaries in time".

       What does that mean? Basically, if the known (as opposed to unknown)
       data cover at least 50% of the slots in a period, then a PDP is
       calculated from them.

       This version works well until we reach the bar chart.

       Applied to the ASCII bar chart, it gives the following results.

       From second 1 on, the PDP of each period (1-5, 6-10, ...) is computed
       by averaging all the values within that period:

       - the PDP from 1 to 5 is (v1*3+v2*2)/5

       - the PDP from 6 to 10 is (v2*2+v3*3)/5

       - the PDP from 11 to 15 is (v3*5)/5, since the values in slots 11,
       12, 13, 14 and 15 are all the same, namely v3

       - ...
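
       As a minimal sketch of this per-period averaging, the Python below
       weights each value by the number of slots it occupies inside one
       period. The function name "per_step_average" and the numbers chosen
       for v1..v5 are assumptions made for illustration; the chart itself
       only gives the arrival times.

```python
def per_step_average(samples, step, origin=0):
    """Average each step on its own, as in the Bogaerdt description.

    samples is a list of (time, value) pairs; a sample at time t covers
    the slots after the previous sample up to and including t.
    """
    last = samples[-1][0]
    pdps = {}
    for end in range(origin + step, last + 1, step):
        begin = end - step          # slots begin+1 .. end form the step
        total = 0.0
        prev = origin
        for t, value in samples:
            # Number of slots of this sample that fall inside the step.
            overlap = min(t, end) - max(prev, begin)
            if overlap > 0:
                total += value * overlap
            prev = t
        pdps[end] = total / step
    return pdps


# Hypothetical values: v1=8, v2=1, v3=6, v4=7, v5=7.
chart = [(3, 8.0), (7, 1.0), (17, 6.0), (20, 7.0), (21, 7.0)]
pdps = per_step_average(chart, step=5)
for end in sorted(pdps):
    print(f"PDP({end - 4}-{end}) = {pdps[end]:.2f}")
```

       With these made-up values, PDP(6-10) and PDP(11-15) come out
       different, (1*2+6*3)/5 and (6*5)/5, which is exactly the point on
       which the two versions disagree.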

   The official version (also the @oetiker version)
       Using the same chart, this version gives the following:

       - the PDP from 1 to 5 is (v1*3+v2*2)/5

       - the PDPs from 6 to 10 and 11 to 15 are the SAME, namely
       (v2*2+v3*8)/10

       - ...

   A comparison and some explanation
       We have seen that the two versions disagree on the PDPs from 6 to 10
       and from 11 to 15.

       Why is that?

       The difference between the official version and the Bogaerdt version
       stems from how each one treats a single update that covers more than
       one step, as the update at slot 17 does for PDP(6-10) and PDP(11-15).

       Let's discuss this in more detail using the bar chart above.

       Bogaerdt's version

       PDPs are always computed individually, no matter how values arrive.

       For example, the value at slot 17 arrives after the last slot of
       PDP(11-15), and the immediately preceding value is at slot 7. All the
       slots after 7 up to and including 17 are assigned v3. Since each PDP
       is computed individually, PDP(6-10) is (v2*2+v3*3)/5 while PDP(11-15)
       is (v3*5)/5.

       The official version

       PDPs are always computed over all the steps that the next update
       spans, be it 1 step, 2 steps or n steps; in other words, several PDPs
       may be computed together.

       For example, the update at slot 17 spans PDP(6-10) and PDP(11-15)
       because the immediately preceding value is at slot 7, which lies
       between 6 and 10, and 17 is after 15. PDP(1-5) and PDP(16-20) are not
       included: the update at slot 7 has already triggered the calculation
       of PDP(1-5), and the update at slot 17 arrives before slot 20, the
       last slot of PDP(16-20).

       That is why PDP(6-10) and PDP(11-15) have the same value,
       (v2*2+v3*8)/10.
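
       The rule for which steps an update spans can be written down as a
       short Python sketch. The helper names "completed_steps" and
       "pooled_value" and the numbers used for v2 and v3 are hypothetical;
       the sketch follows the description above and is not taken from
       rrdtool's source.

```python
def completed_steps(prev_t, t, step):
    """Return the last slots of the steps finished by an update at t.

    prev_t is the time of the previous update. A step whose last slot is
    e is finished by this update when prev_t < e <= t.
    """
    first_end = (prev_t // step + 1) * step
    return list(range(first_end, t + 1, step))


def pooled_value(carried, prev_t, t, new_value, step):
    """Average all finished steps together and return the shared PDP.

    carried holds the per-slot values already collected in the open step
    (the slots up to and including prev_t). Returns None when the update
    finishes no step.
    """
    ends = completed_steps(prev_t, t, step)
    if not ends:
        return None
    span = ends[-1] - (ends[0] - step)           # slots in the whole span
    total = sum(carried) + new_value * (ends[-1] - prev_t)
    return total / span


print(completed_steps(7, 17, 5))    # [10, 15]
print(completed_steps(17, 20, 5))   # [20]
print(completed_steps(20, 21, 5))   # []

# Update at 17 after the one at 7: slots 6 and 7 already hold v2.
v2, v3 = 1.0, 6.0                   # hypothetical values
shared = pooled_value([v2, v2], prev_t=7, t=17, new_value=v3, step=5)
print(shared)                       # (v2*2 + v3*8) / 10
```

       The same number is stored for both PDP(6-10) and PDP(11-15), whereas
       averaging each step on its own would give two different results.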

An example
       If you are still confused, don't worry; an example is here to help.

       Let's get our hands dirty with some commands:

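	# Step of 5 s; GAUGE data source "mem" with heartbeat 20, range 0-100;
	# one AVERAGE RRA keeping 10 rows of 1 PDP each (xff 0.5).
	rrdtool create target.rrd --start 1000000000 --step 5 DS:mem:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:10
	# Feed twelve timestamp:value samples at irregular intervals.
	rrdtool update target.rrd 1000000003:8 1000000006:1 1000000017:6 \
	1000000020:7 1000000021:7 1000000022:4 \
	1000000023:3 1000000036:1 1000000037:2 \
	1000000038:3 1000000039:3 1000000042:5
	# Read back the stored averages.
	rrdtool fetch target.rrd AVERAGE --start 1000000000 --end 1000000045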

       The code above contains three commands: create, update and fetch.
       First we create a new RRD file, then we feed in some data, and finally
       we fetch all the PDPs from the RRD.

   Focus on single steps
       To make the explanation concrete, the calculation of each PDP is
       shown step by step.

       Below is the output of the commands above:

	1000000005: 5.2000000000e+00
	1000000010: 5.5000000000e+00
	1000000015: 5.5000000000e+00
	1000000020: 6.6000000000e+00
	1000000025: 1.7333333333e+00
	1000000030: 1.7333333333e+00
	1000000035: 1.7333333333e+00
	1000000040: 2.8000000000e+00
	1000000045: nan
	1000000050: nan

       NOTE: 1000000005 means the PDP from 1000000001 to 1000000005, and so
       on. For concision and readability, we use only the last two digits, so
       05 denotes 1000000005. We chose GAUGE as the data source type because
       GAUGE values are treated as rates as they are, so no additional
       transformation is needed; see
       <http://rrdtool.vandenbogaerdt.nl/process.php> for details.

       05: 5.2 = (8*3+1*2)/5

       10: 5.5 = (1*1+6*9)/10

       15: the same as the previous one, since both PDPs are computed
       together from the update at 17

       20: 6.6 = (6*2+7*3)/5

       25: 1.73333 = (7+4+3+1*12)/15

       ...

       45: nan, as the last value is at 42, which does not trigger the
       calculation for PDP(41-45)

       50: nan; why this unknown PDP is shown is explained in
       <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html>
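
       The whole walk-through can be replayed with a small Python sketch of
       the official rule. It is a simplified model written for this
       article, not rrdtool's code: the name "replay" is made up,
       timestamps are reduced to their last two digits, and heartbeat and
       unknown handling are left out because no gap in this data set
       exceeds the heartbeat of 20.

```python
def replay(updates, start, step):
    """Replay (time, value) updates and return {last_slot: pdp}.

    All steps finished by one update share a single average taken over
    the whole span, as described for the official version.
    """
    pdps = {}
    acc = 0.0            # sum of per-second values in the open span
    prev = start
    for t, value in updates:
        first_end = (prev // step + 1) * step
        ends = list(range(first_end, t + 1, step))
        if ends:
            last_end = ends[-1]
            acc += value * (last_end - prev)
            span = last_end - (first_end - step)
            for end in ends:
                pdps[end] = acc / span
            acc = value * (t - last_end)   # slots already in the next step
        else:
            acc += value * (t - prev)
        prev = t
    return pdps


# The samples from the "rrdtool update" command, last two digits only.
updates = [(3, 8), (6, 1), (17, 6), (20, 7), (21, 7), (22, 4),
           (23, 3), (36, 1), (37, 2), (38, 3), (39, 3), (42, 5)]

result = replay(updates, start=0, step=5)
for end in sorted(result):
    print(f"{end:02d}: {result[end]:.10f}")
```

       The sketch yields values for 05 through 40 that agree with the fetch
       output above, and it produces nothing for 45, because the last
       update at 42 does not reach slot 45.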

SUMMARY
       All that said, I hope you now have a clear understanding of the
       "magic" behind PDP calculation.

   Other References
       o   A great PowerShell script for generating ASCII bar charts:
	   <https://gallery.technet.microsoft.com/scriptcenter/Sample-Script-to-Generate-59c80d4c>

       o   <https://stackoverflow.com/questions/18924450/rrd-wrong-values>

1.7.1				  2019-02-04			RRD_PDPCALC(1)
