# FreeBSD Manual Pages

OLS(3) User Contributed Perl Documentation OLS(3)NAMEStatistics::OLS - perform ordinary least squares and associated statistics, v 0.07.SYNOPSISuse Statistics::OLS; my $ls = Statistics::OLS->new(); $ls->setData (\@xydataset) or die( $ls->error() ); $ls->setData (\@xdataset, \@ydataset); $ls->regress(); my ($intercept, $slope) = $ls->coefficients(); my $R_squared = $ls->rsq(); my ($tstat_intercept, $tstat_slope) = $ls->tstats(); my $sigma = $ls->sigma(); my $durbin_watson = $ls->dw(); my $sample_size = $ls->size(); my ($avX, $avY) = $ls->av(); my ($varX, $varY, $covXY) = $ls->var(); my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax(); # returned arrays are x-y or y-only data # depending on initial call to setData() my @predictedYs = $ls->predicted(); my @residuals = $ls->residuals();DESCRIPTIONI wroteStatistics::OLSto perform Ordinary Least Squares (linear curve fitting) on two dimensional data: y = a + bx. The other simple statistical module I found on CPAN (Statistics::Descriptive) is designed for univariate analysis. It accomodates OLS, but somewhat inflexibly and without rich bivariate statistics. Nevertheless, it might make sense to fold OLS into that module or a supermodule someday.Statistics::OLScomputes the estimated slope and intercept of the regression line, their T-statistics, R squared, standard error of the regression and the Durbin-Watson statistic. It can also return the residuals. It is pretty simple to do two dimensional least squares, but much harder to do multiple regression, so OLS is unlikely ever to work with multiple independent variables. This is a beta code and has not been extensively tested. It has worked on a few published datasets. Feedback is welcome, particularly if you notice an error or try it with known results that are not reproduced correctly.USAGECreatearegressionobject:new()use Statistics::OLS; my $ls = Statistics::OLS->new;Registeradataset:setData()$ls->setData (\@xydata); $ls->setData (\@xdata, \@ydata); ThesetData()method registers a two-dimensional dataset with the regression object. You can pass the dataset either as a reference to one flat array containing the paired x,y data or as references to two arrays, one each for the x and y data. [In either case, the data arrays in your script are not cached (copied into the object). If you alter your data, you may optionally callsetData()again (if you want error checking--see below) but you should at least call theregress()method (see below) to recompute statistics for the new data. Or more simply, do not alter your data.] As a single array, in your script, construct a flat array of the form (x0, y0, ..., xn, yn) containing n+1 x,y data points. Then pass a reference to the data array to thesetData()method. (If you do not know what a reference is, just put a backslash (\) in front of the name of your data array when you pass it as an argument tosetData().) Like this: my @xydata = qw( -3 9 -2 4 -1 1 0 0 1 1 2 4 3 9); $ls->setData (\@xydata); Or, you may find it more convenient to construct two equal length arrays, one for the horizontal and one for the corresponding vertical data. Then pass references to both arrays (horizontal first) tosetData(): my @xdata = qw( -3 -2 -1 0 1 2 3 ); my @ydata = qw( 9 4 1 0 1 4 9 ); $ls->setData (\@xdata, \@ydata);Errorchecking:ThesetData()method returns a postive integer on success and 0 on failure. IfsetData()fails, you can recover an error message about the failure with theerror()method. The error string returned will either be one of The data set does not contain an equal number of x and y values. The data element ... is non-numeric. The data set must contain at least three points. In your script, you could test for errors like: $ls->setData (\@data) or die( $ls->error() ); In the current version, only numerals, decimal points (apologies to Europeans), scientific notation (1.7E10) and minus signs are permitted. Currencies ($), time (11:23am) or dates (23/05/98) are not yet supported and will generate errors. I may figure these out someday.Performtheregression:regress()$ls->regress() or die ( $ls->error() ); This performs most of the calculations. Call this method after setting the data, but before asking for any regressions results. If you change your data, previous calculations will generallly be inaccurate, so you should call this method again. Theregress()method returns 1 on success, The only error message is No datset has been registered. although a number of undef results (due to divide by zero errors) may be returned in specific statistics below.Obtainregressionresults:coefficients(),rsq(),tstats(),etc.my ($intercept, $slope) = $ls->coefficients(); my $R_squared = $ls->rsq(); my ($tstat_intercept, $tstat_slope) = $ls->tstats(); my $sigma = $ls->sigma(); my $durbin_watson = $ls->dw(); my $sample_size = $ls->size(); my ($avX, $avY) = $ls->av(); my ($varX, $varY, $covXY) = $ls->var(); my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax(); Call these methods only after you have calledregress(). Most of these should be familiar from any econometrics text. If the slope is infinite (variance of X is zero) it is set to undef. R-squared is 1.0 if the sample variances of either X or Y are zero (or the data are colinear). If the variance of X is zero, both T statistics are set to undef. sigma is an estimate of the homoscedastic standard deviation of the error term, also known as the standard error of the estimate. The variances use n-1. Durbin-Watson returns undef if the data are colinear.Obtainpredictedorresidualdata:predicted()andresiduals()my @predictedYs = $ls->predicted(); my @residuals = $ls->residuals(); Call these methods only after you have calledregress(). Both methods return data arrays, in the same format you used insetData(). If the data was passed tosetData()as a reference to an @xydata array of the form (x0, y0, ..., xn, yn), then the results of these methods will be of this same form, except that the y values will either be the predicted y based on the coefficient estimates, or the residual error of that predicted y from the observed value of y. If the data was passed as references to two arrays, @xdata = (x0 ... xn) and @ydata = (y0 ... yn), then the results of these two methods will be a single array of y type data, either the predicted y or residual error. The original x data array will still correspond to these result arrays.BUGS AND TO DOThis module is beta code, so it is not guaranteed to work right. I have not exhaustively tested it. Possible future work includes support for other data formats, such as date, time and currency. Generalization to multiple regression is probably not in the cards, since it is more than an order of magnitude more difficult. Better to use something Fortran based or maybe the Perl Data Language. It would make sense to fold this into Statistics::Descriptive as a more comprehensive library, perhaps called "libstats". But that might not happen soon, since it sounds like a big project. Comments and bug reports are welcome.AUTHORCopyright (c) 1998 by Sanford Morton, smorton@pobox.com. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. This work is dedicated to the memory of Dr. Andrew Morton, who requested it.Requiescatinpace, my friend.SEE ALSOTheStatistics::Descriptive(1) module performs useful univariate statistics and is well tested. The Perl Data Language (see CPAN) may also prove useful for statistics. Simple linear regression is discussed in all econometrics and most probablility and statistics texts. I used E<Basic Econometrics> 2nd ed., by Gujaratii, New York: McGraw-Hill,1988, for most of the formulas and the test example (appendix 3A.6, page 87).POD ERRORSHey!Theabovedocumenthadsomecodingerrors,whichareexplainedbelow:Around line 609: Unknown E content in E<Basic Econometrics> perl v5.24.1 2000-10-13 OLS(3)

NAME | SYNOPSIS | DESCRIPTION | USAGE | BUGS AND TO DO | AUTHOR | SEE ALSO | POD ERRORS

Want to link to this manual page? Use this URL:

<https://www.freebsd.org/cgi/man.cgi?query=Statistics::OLS&sektion=3&manpath=FreeBSD+12.1-RELEASE+and+Ports>