Statistics::ChiSquare(3)  User Contributed Perl Documentation  Statistics::ChiSquare(3)

       "Statistics::ChiSquare" - How well-distributed is your data?

	   use Statistics::ChiSquare;

	   print chisquare(@array_of_numbers);

       Statistics::ChiSquare is	available at a CPAN site near you.

       Suppose you flip	a coin 100 times, and it turns up heads	70 times.  Is
       the coin	fair?

       Suppose you roll a die 100 times, and it shows 30 sixes.  Is the die
       fair?

       In statistics, the chi-square test calculates how well a	series of
       numbers fits a distribution.  In	this module, we	only test for whether
       results fit an even distribution.  It doesn't simply say	"yes" or "no".
       Instead,	it gives you a confidence interval, which sets upper and lower
       bounds on the likelihood	that the variation in your data	is due to
       chance.	See the	examples below.
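
       As a rough illustration of what the test measures, the sketch below
       computes the chi-square statistic for a set of observed counts against
       an even distribution.  The helper name chi_square_statistic is
       hypothetical and is not part of this module, whose internals may
       differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::ChiSquare.
# Sums (observed - expected)^2 / expected over all slots, where every
# slot is expected to hold an equal share of the total count.
sub chi_square_statistic {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;
    my $statistic = 0;
    $statistic += ($_ - $expected) ** 2 / $expected for @observed;
    return $statistic;
}

# The coin from the introduction: 70 heads and 30 tails in 100 flips.
printf "%.2f\n", chi_square_statistic(70, 30);
```

       For the 70/30 coin this prints 16.00.  A perfectly even split gives a
       statistic of 0, and larger values mean the counts are further from
       even.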

       If you've ever studied elementary genetics, you've probably heard about
       Gregor Mendel.  He was a	wacky Austrian botanist	who discovered (in
       1865) that traits could be inherited in a predictable fashion.  He did
       lots of experiments with	cross breeding peas: green peas, yellow	peas,
       smooth peas, wrinkled peas.  A veritable	Brave New World	of legumes.

       But Mendel faked	his data.  A statistician by the name of R. A. Fisher
       used the	chi-square test	to prove it.

       There's just one	function in this module: chisquare().  Instead of
       returning the bounds on the confidence interval in a tidy little	two-
       element array, it returns an English string.  This was a	deliberate
       design choice---many people misinterpret	chi-square results, and	the
       string helps clarify the	meaning.

       The string returned by chisquare() will always match one of these
       patterns:

	 "There's a >\d+% chance, and a <\d+% chance, that this data is random."

       or

	 "There's a <\d+% chance that this data is random."

       or

	 "I can't handle \d+ choices without a better table."

       That last one deserves a	bit more explanation.  The "modern" chi-square
       test uses a table of values (based on Pearson's approximation) to avoid
       expensive calculations.	Thanks to the table, the chisquare()
       calculation is very fast, but there are some collections	of data	it
       can't handle, including any collection with more	than 31	slots.	So you
       can't calculate the randomness of a 50-sided die.
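
       If you do need the numbers rather than the sentence, the three message
       forms can be picked apart with regular expressions.  The following is
       a minimal sketch; parse_chisquare_message and the hash keys it returns
       are hypothetical names chosen for illustration, not part of this
       module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::ChiSquare.
# Converts the English result string into a small hash reference.
sub parse_chisquare_message {
    my ($message) = @_;

    # Two-sided form: a lower and an upper bound.
    if ($message =~ /a >(\d+)% chance, and a <(\d+)% chance/) {
        return { lower => $1, upper => $2 };
    }
    # One-sided form: only an upper bound is given.
    if ($message =~ /a <(\d+)% chance that/) {
        return { lower => 0, upper => $1 };
    }
    # Too many slots for the built-in table.
    if ($message =~ /can't handle (\d+) choices/) {
        return { unsupported => $1 };
    }
    return;    # message not recognised
}

my $result = parse_chisquare_message(
    "There's a >1% chance, and a <5% chance, that this data is random.");
print "between $result->{lower}% and $result->{upper}%\n";
```

       Checking for the "unsupported" key first lets a caller fall back to
       another method when the data has more slots than the table covers.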

       You will	also notice that the percentage	points that have been
       tabulated for different numbers of data points -	that is, for different
       degrees of freedom - differ.  The table in Jon Orwant's original
       version has data	tabulated for 100%, 99%, 95%, 90%, 70%,	50%, 30%, 10%,
       5%, and 1% likelihoods.  Data added later by David Cantrell is
       tabulated for 100%, 99%, 95%, 90%, 75%, 50%, 25%, 10%, 5%, and 1%
       likelihoods.

       Imagine a coin flipped 1000 times.  The expected	outcome	is 500 heads
       and 500 tails:

	 @coin = (500, 500);
	 print chisquare(@coin);

       prints "There's a >90% chance, and a <100% chance, that this data is
       random."

       Imagine a die rolled 60 times that shows sixes just a wee bit too
       often:

	 @die1	= (8, 7, 9, 8, 8, 20);
	 print chisquare(@die1);

       prints "There's a >1% chance, and a <5% chance, that this data is
       random."

       Imagine a die rolled 600	times that shows sixes way too often.

	 @die2	= (80, 70, 90, 80, 80, 200);
	 print chisquare(@die2);

       prints "There's a <1% chance that this data is random."
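
       To see where those percentages come from, the sketch below works the
       two die examples by hand.  It assumes the usual upper-tail critical
       values for 5 degrees of freedom from a standard chi-square table
       (11.070 at 5% and 15.086 at 1%); these are hard-coded here rather than
       read from the module, and even_chi_square is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: chi-square statistic against an even distribution.
sub even_chi_square {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;
    my $statistic = 0;
    $statistic += ($_ - $expected) ** 2 / $expected for @observed;
    return $statistic;
}

# Assumed critical values for 5 degrees of freedom (six slots),
# taken from a standard chi-square table, not from the module.
my %critical_5df = ('5%' => 11.070, '1%' => 15.086);

for my $die ([8, 7, 9, 8, 8, 20], [80, 70, 90, 80, 80, 200]) {
    my $stat   = even_chi_square(@$die);
    my @beyond = grep { $stat > $critical_5df{$_} } sort keys %critical_5df;
    printf "statistic %.1f exceeds the critical value for: %s\n",
        $stat, join(', ', @beyond);
}
```

       The first die gives a statistic of 12.2, which is past the 5% value
       but short of the 1% value, hence "between 1% and 5%".  The second
       gives 122.0, far past both, hence "<1%".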

       How random is rand()?

	 srand(time ^ $$);
	 @rands = ();
	 for ($i = 0; $i < 60000; $i++) {
	     $slot = int(rand(6));
	     $rands[$slot]++;    # count how often each slot comes up
	 }
	 print "@rands\n";
	 print chisquare(@rands);

       prints (on my machine)

	 10156 10041 9991 9868 10034 9910
	 There's a >10%	chance,	and a <50% chance, that	this data is random.

       So much for pseudorandom	number generation.

       Jon Orwant, Readable Publications, Inc;

       Maintained and updated since October 2003 by David Cantrell,

       This software is	free-as-in-speech software, and	may be used,
       distributed, and	modified under the terms of either the GNU General
       Public Licence version 2	or the Artistic	Licence. It's up to you	which
       one you use. The	full text of the licences can be found in the files
       GPL2.txt	and ARTISTIC.txt, respectively.

perl v5.24.1			  2013-09-19	      Statistics::ChiSquare(3)

