# FreeBSD Manual Pages

```OLS(3)		      User Contributed Perl Documentation		OLS(3)

NAME
Statistics::OLS - perform ordinary least	squares	and associated
statistics, v 0.07.

SYNOPSIS
use Statistics::OLS;

my \$ls = Statistics::OLS->new();

\$ls->setData	(\@xydataset) or die( \$ls->error() );
\$ls->setData	(\@xdataset, \@ydataset);

\$ls->regress();

my (\$intercept, \$slope) = \$ls->coefficients();
my \$R_squared = \$ls->rsq();
my (\$tstat_intercept, \$tstat_slope) = \$ls->tstats();
my \$sigma = \$ls->sigma();
my \$durbin_watson = \$ls->dw();

my \$sample_size = \$ls->size();
my (\$avX, \$avY) = \$ls->av();
my (\$varX, \$varY, \$covXY) = \$ls->var();
my (\$xmin, \$xmax, \$ymin, \$ymax) = \$ls->minMax();

# returned arrays are x-y or	y-only data
# depending on initial call to setData()
my @predictedYs = \$ls->predicted();
my @residuals = \$ls->residuals();

DESCRIPTION
I wrote Statistics::OLS to perform Ordinary Least Squares (linear curve
fitting)	on two dimensional data: y = a + bx. The other simple
statistical module I found on CPAN (Statistics::Descriptive) is
designed	for univariate analysis. It accomodates	OLS, but somewhat
inflexibly and without rich bivariate statistics. Nevertheless, it
might make sense	to fold	OLS into that module or	a supermodule someday.

Statistics::OLS computes	the estimated slope and	intercept of the
regression line,	their T-statistics, R squared, standard	error of the
regression and the Durbin-Watson	statistic. It can also return the
residuals.

It is pretty simple to do two dimensional least squares,	but much
harder to do multiple regression, so OLS	is unlikely ever to work with
multiple	independent variables.

This is a beta code and has not been extensively	tested.	It has worked
on a few	published datasets. Feedback is	welcome, particularly if you
notice an error or try it with known results that are not reproduced
correctly.

USAGE
Create a regression object: new()
use Statistics::OLS;
my \$ls = Statistics::OLS->new;

Register a dataset: setData()
\$ls->setData	(\@xydata);
\$ls->setData	(\@xdata, \@ydata);

The setData() method registers a	two-dimensional	dataset	with the
regression object.  You can pass	the dataset either as a	reference to
one flat	array containing the paired x,y	data or	as references to two
arrays, one each	for the	x and y	data. [In either case, the data	arrays
in your script are not cached (copied into the object). If you alter
your data, you may optionally call setData() again (if you want error
checking--see below) but	you should at least call the regress() method
(see below) to recompute	statistics for the new data. Or	more simply,

As a single array, in your script, construct a flat array of the	form
(x0, y0,	..., xn, yn) containing	n+1 x,y	data points.  Then pass	a
reference to the	data array to the setData() method. (If	you do not
know what a reference is, just put a backslash (\) in front of the name
of your data array when you pass	it as an argument to setData().)  Like
this:

my @xydata =	qw( -3 9   -2 4	  -1 1	 0 0   1 1  2 4	 3 9);
\$ls->setData	(\@xydata);

Or, you may find	it more	convenient to construct	two equal length
arrays, one for the horizontal and one for the corresponding vertical
data. Then pass references to both arrays (horizontal first) to
setData():

my @xdata = qw( -3  -2  -1  0  1  2	3 );
my @ydata = qw(  9	4   1  0  1  4	9 );
\$ls->setData	(\@xdata, \@ydata);

Error checking: The setData() method returns a postive integer on
success and 0 on	failure. If setData() fails, you can recover an	error
message about the failure with the error() method. The error string
returned	will either be one of

The data set	does not contain an equal number of x and y values.
The data element ...	is non-numeric.
The data set	must contain at	least three points.

In your script, you could test for errors like:

\$ls->setData	(\@data) or die( \$ls->error() );

In the current version, only numerals, decimal points (apologies	to
Europeans), scientific notation (1.7E10)	and minus signs	are permitted.
Currencies (\$), time (11:23am) or dates (23/05/98) are not yet
supported and will generate errors. I may figure	these out someday.

Perform the regression: regress()
\$ls->regress() or die ( \$ls->error()	);

This performs most of the calculations. Call this method	after setting
the data, but before asking for any regressions results.	If you change
your data, previous calculations	will generallly	be inaccurate, so you
should call this	method again. The regress() method returns 1 on
success,	The only error message is

No datset has been registered.

although	a number of undef results (due to divide by zero errors) may
be returned in specific statistics below.

Obtain regression results: coefficients(), rsq(), tstats(), etc.
my (\$intercept, \$slope) = \$ls->coefficients();
my \$R_squared = \$ls->rsq();
my (\$tstat_intercept, \$tstat_slope) = \$ls->tstats();
my \$sigma = \$ls->sigma();
my \$durbin_watson = \$ls->dw();

my \$sample_size = \$ls->size();
my (\$avX, \$avY) = \$ls->av();
my (\$varX, \$varY, \$covXY) = \$ls->var();
my (\$xmin, \$xmax, \$ymin, \$ymax) = \$ls->minMax();

Call these methods only after you have called regress().	 Most of these
should be familiar from any econometrics	text. If the slope is infinite
(variance of X is zero) it is set to undef. R-squared is	1.0 if the
sample variances	of either X or Y are zero (or the data are colinear).
If the variance of X is zero, both T statistics are set to undef. sigma
is an estimate of the homoscedastic standard deviation of the error
term, also known	as the standard	error of the estimate. The variances
use n-1.	 Durbin-Watson returns undef if	the data are colinear.

Obtain predicted or residual	data: predicted() and residuals()
my @predictedYs = \$ls->predicted();
my @residuals = \$ls->residuals();

Call these methods only after you have called regress().	 Both methods
return data arrays, in the same format you used in setData(). If	the
data was	passed to setData() as a reference to an @xydata array of the
form (x0, y0, ..., xn, yn), then	the results of these methods will be
of this same form, except that the y values will	either be the
predicted y based on the	coefficient estimates, or the residual error
of that predicted y from	the observed value of y.

If the data was passed as references to two arrays, @xdata = (x0	...
xn) and @ydata =	(y0 ...	yn), then the results of these two methods
will be a single	array of y type	data, either the predicted y or
residual	error. The original x data array will still correspond to
these result arrays.

BUGS AND TO DO
This module is beta code, so it is not guaranteed to work right.	 I
have not	exhaustively tested it.

Possible	future work includes support for other data formats, such as
date, time and currency.

Generalization to multiple regression is	probably not in	the cards,
since it	is more	than an	order of magnitude more	difficult. Better to
use something Fortran based or maybe the	Perl Data Language.

It would	make sense to fold this	into Statistics::Descriptive as	a more
comprehensive library, perhaps called "libstats". But that might	not
happen soon, since it sounds like a big project.

Comments	and bug	reports	are welcome.

AUTHOR
Copyright (c) 1998 by Sanford Morton, smorton@pobox.com.	 All rights
reserved.  This program is free software; you can redistribute it
and/or modify it	under the same terms as	Perl itself.

This work is dedicated to the memory of Dr. Andrew Morton, who
requested it.  Requiescat in pace, my friend.

The Statistics::Descriptive(1) module performs useful univariate
statistics and is well tested. The Perl Data Language (see CPAN)	may
also prove useful for statistics.

Simple linear regression	is discussed in	all econometrics and most
probablility and	statistics texts. I used E<Basic Econometrics> 2nd
ed., by Gujaratii, New York: McGraw-Hill,1988, for most of the formulas
and the test example (appendix 3A.6, page 87).

POD ERRORS
Hey! The	above document had some	coding errors, which are explained
below:

Around line 609:
Unknown E content in	E<Basic	Econometrics>

perl v5.24.1			  2000-10-13				OLS(3)
```

NAME | SYNOPSIS | DESCRIPTION | USAGE | BUGS AND TO DO | AUTHOR | SEE ALSO | POD ERRORS

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=Statistics::OLS&sektion=3&manpath=FreeBSD+12.1-RELEASE+and+Ports>