Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
OLS(3)		      User Contributed Perl Documentation		OLS(3)

       Statistics::OLS - perform ordinary least	squares	and associated
       statistics, v 0.07.

	   use Statistics::OLS;

	   my $ls = Statistics::OLS->new();

	   $ls->setData	(\@xydataset) or die( $ls->error() );
	   $ls->setData	(\@xdataset, \@ydataset);


	   my ($intercept, $slope) = $ls->coefficients();
	   my $R_squared = $ls->rsq();
	   my ($tstat_intercept, $tstat_slope) = $ls->tstats();
	   my $sigma = $ls->sigma();
	   my $durbin_watson = $ls->dw();

	   my $sample_size = $ls->size();
	   my ($avX, $avY) = $ls->av();
	   my ($varX, $varY, $covXY) = $ls->var();
	   my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax();

	   # returned arrays are x-y or	y-only data
	   # depending on initial call to setData()
	   my @predictedYs = $ls->predicted();
	   my @residuals = $ls->residuals();

       I wrote Statistics::OLS to perform Ordinary Least Squares (linear curve
       fitting)	on two dimensional data: y = a + bx. The other simple
       statistical module I found on CPAN (Statistics::Descriptive) is
       designed	for univariate analysis. It accomodates	OLS, but somewhat
       inflexibly and without rich bivariate statistics. Nevertheless, it
       might make sense	to fold	OLS into that module or	a supermodule someday.

       Statistics::OLS computes	the estimated slope and	intercept of the
       regression line,	their T-statistics, R squared, standard	error of the
       regression and the Durbin-Watson	statistic. It can also return the

       It is pretty simple to do two dimensional least squares,	but much
       harder to do multiple regression, so OLS	is unlikely ever to work with
       multiple	independent variables.

       This is a beta code and has not been extensively	tested.	It has worked
       on a few	published datasets. Feedback is	welcome, particularly if you
       notice an error or try it with known results that are not reproduced

   Create a regression object: new()
	   use Statistics::OLS;
	   my $ls = Statistics::OLS->new;

   Register a dataset: setData()
	   $ls->setData	(\@xydata);
	   $ls->setData	(\@xdata, \@ydata);

       The setData() method registers a	two-dimensional	dataset	with the
       regression object.  You can pass	the dataset either as a	reference to
       one flat	array containing the paired x,y	data or	as references to two
       arrays, one each	for the	x and y	data. [In either case, the data	arrays
       in your script are not cached (copied into the object). If you alter
       your data, you may optionally call setData() again (if you want error
       checking--see below) but	you should at least call the regress() method
       (see below) to recompute	statistics for the new data. Or	more simply,
       do not alter your data.]

       As a single array, in your script, construct a flat array of the	form
       (x0, y0,	..., xn, yn) containing	n+1 x,y	data points.  Then pass	a
       reference to the	data array to the setData() method. (If	you do not
       know what a reference is, just put a backslash (\) in front of the name
       of your data array when you pass	it as an argument to setData().)  Like

	   my @xydata =	qw( -3 9   -2 4	  -1 1	 0 0   1 1  2 4	 3 9);
	   $ls->setData	(\@xydata);

       Or, you may find	it more	convenient to construct	two equal length
       arrays, one for the horizontal and one for the corresponding vertical
       data. Then pass references to both arrays (horizontal first) to

	   my @xdata = qw( -3  -2  -1  0  1  2	3 );
	   my @ydata = qw(  9	4   1  0  1  4	9 );
	   $ls->setData	(\@xdata, \@ydata);

       Error checking: The setData() method returns a postive integer on
       success and 0 on	failure. If setData() fails, you can recover an	error
       message about the failure with the error() method. The error string
       returned	will either be one of

	   The data set	does not contain an equal number of x and y values.
	   The data element ...	is non-numeric.
	   The data set	must contain at	least three points.

       In your script, you could test for errors like:

	   $ls->setData	(\@data) or die( $ls->error() );

       In the current version, only numerals, decimal points (apologies	to
       Europeans), scientific notation (1.7E10)	and minus signs	are permitted.
       Currencies ($), time (11:23am) or dates (23/05/98) are not yet
       supported and will generate errors. I may figure	these out someday.

   Perform the regression: regress()
	   $ls->regress() or die ( $ls->error()	);

       This performs most of the calculations. Call this method	after setting
       the data, but before asking for any regressions results.	If you change
       your data, previous calculations	will generallly	be inaccurate, so you
       should call this	method again. The regress() method returns 1 on
       success,	The only error message is

	   No datset has been registered.

       although	a number of undef results (due to divide by zero errors) may
       be returned in specific statistics below.

   Obtain regression results: coefficients(), rsq(), tstats(), etc.
	   my ($intercept, $slope) = $ls->coefficients();
	   my $R_squared = $ls->rsq();
	   my ($tstat_intercept, $tstat_slope) = $ls->tstats();
	   my $sigma = $ls->sigma();
	   my $durbin_watson = $ls->dw();

	   my $sample_size = $ls->size();
	   my ($avX, $avY) = $ls->av();
	   my ($varX, $varY, $covXY) = $ls->var();
	   my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax();

       Call these methods only after you have called regress().	 Most of these
       should be familiar from any econometrics	text. If the slope is infinite
       (variance of X is zero) it is set to undef. R-squared is	1.0 if the
       sample variances	of either X or Y are zero (or the data are colinear).
       If the variance of X is zero, both T statistics are set to undef. sigma
       is an estimate of the homoscedastic standard deviation of the error
       term, also known	as the standard	error of the estimate. The variances
       use n-1.	 Durbin-Watson returns undef if	the data are colinear.

   Obtain predicted or residual	data: predicted() and residuals()
	   my @predictedYs = $ls->predicted();
	   my @residuals = $ls->residuals();

       Call these methods only after you have called regress().	 Both methods
       return data arrays, in the same format you used in setData(). If	the
       data was	passed to setData() as a reference to an @xydata array of the
       form (x0, y0, ..., xn, yn), then	the results of these methods will be
       of this same form, except that the y values will	either be the
       predicted y based on the	coefficient estimates, or the residual error
       of that predicted y from	the observed value of y.

       If the data was passed as references to two arrays, @xdata = (x0	...
       xn) and @ydata =	(y0 ...	yn), then the results of these two methods
       will be a single	array of y type	data, either the predicted y or
       residual	error. The original x data array will still correspond to
       these result arrays.

       This module is beta code, so it is not guaranteed to work right.	 I
       have not	exhaustively tested it.

       Possible	future work includes support for other data formats, such as
       date, time and currency.

       Generalization to multiple regression is	probably not in	the cards,
       since it	is more	than an	order of magnitude more	difficult. Better to
       use something Fortran based or maybe the	Perl Data Language.

       It would	make sense to fold this	into Statistics::Descriptive as	a more
       comprehensive library, perhaps called "libstats". But that might	not
       happen soon, since it sounds like a big project.

       Comments	and bug	reports	are welcome.

       Copyright (c) 1998 by Sanford Morton,	 All rights
       reserved.  This program is free software; you can redistribute it
       and/or modify it	under the same terms as	Perl itself.

       This work is dedicated to the memory of Dr. Andrew Morton, who
       requested it.  Requiescat in pace, my friend.

       The Statistics::Descriptive(1) module performs useful univariate
       statistics and is well tested. The Perl Data Language (see CPAN)	may
       also prove useful for statistics.

       Simple linear regression	is discussed in	all econometrics and most
       probablility and	statistics texts. I used E<Basic Econometrics> 2nd
       ed., by Gujaratii, New York: McGraw-Hill,1988, for most of the formulas
       and the test example (appendix 3A.6, page 87).

       Hey! The	above document had some	coding errors, which are explained

       Around line 609:
	   Unknown E content in	E<Basic	Econometrics>

perl v5.24.1			  2000-10-13				OLS(3)


Want to link to this manual page? Use this URL:

home | help