SEEING STATISTICS
Gary McClelland
Department of Psychology
University of Colorado @ Boulder

Copyright 1996, Gary H. McClelland

IMPORTANT: Read copyright, fair use, caution, and disclaimer notice.


The Mean

There are five data values represented in the graph below. Our goal is to find one number to represent or model all the data values. Obviously, a number in the middle of the data values would be desirable. Hence, the number we select to represent all the data is called a measure of central tendency. But there are many possible choices; which one would be best? We need a criterion. One criterion, used in many statistical procedures, is to minimize the sum of squared errors. The procedure of minimizing the sum of squared errors is illustrated in the interactive graph below.
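To make the criterion concrete, here is a minimal sketch in Python of how the sum of squared errors could be computed for any candidate estimate. The five data values and the function name `sum_squared_errors` are assumptions made for illustration; the values shown in the original graph are not given in the text.

```python
def sum_squared_errors(data, estimate):
    """Add up the squared difference between each data value and the estimate."""
    return sum((y - estimate) ** 2 for y in data)


# Hypothetical data values, chosen only for illustration.
data = [2, 4, 5, 7, 12]

# Try a few candidate estimates, as you would by moving the blue bar.
for estimate in (0, 4, 6, 8):
    print("estimate =", estimate, "sum of squared errors =", sum_squared_errors(data, estimate))
```

For these assumed values, the candidate 6 gives a smaller sum of squared errors than 0, 4, or 8.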

The Graph Explained

The Goal

Find the estimate that minimizes the sum of squares.

Instructions

  • Use the mouse either to drag the blue bar or to click where you want to place it. The numerical value of the current model estimate (the location of the blue bar) is displayed above the graph.
  • Use the red error meter on the left to evaluate your new estimate. If the red bar is going down, your model estimate is getting better; if it is going up, your model estimate is getting worse. The numerical value of the sum of squared errors is displayed below the error meter.
  • When you think you have found the best estimate, click on the Display Mean? button to display the mean. The mean, or arithmetic average, of the data values is the model estimate that always gives the smallest sum of squared errors.
  • Click on the New Data button to repeat the procedure with other sets of data values. Try clicking directly where you think the mean would be.
  • Try more datasets until you are familiar with finding the model estimate which gives the smallest sum of squared errors.
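The procedure in the steps above can also be imitated without the graph. The following is a rough sketch, not the applet's actual code: it generates a few random data sets (standing in for the New Data button), tries many bar positions on a grid, and checks that none of them beats the mean. The seed, the range of the data values, and the grid spacing are all assumptions chosen for illustration.

```python
import random
from statistics import fmean


def sum_squared_errors(data, estimate):
    """Add up the squared difference between each data value and the estimate."""
    return sum((y - estimate) ** 2 for y in data)


rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
candidates = [i / 10 for i in range(0, 201)]  # bar positions 0.0, 0.1, ..., 20.0

mean_always_best = True
for _ in range(5):  # five hypothetical "New Data" clicks
    data = [rng.randint(1, 20) for _ in range(5)]  # five made-up data values
    m = fmean(data)
    sse_at_mean = sum_squared_errors(data, m)
    best_on_grid = min(sum_squared_errors(data, b) for b in candidates)
    # Allow a tiny tolerance for floating-point rounding.
    if sse_at_mean > best_on_grid + 1e-9:
        mean_always_best = False
    print(data, "mean =", m, "SSE at mean =", round(sse_at_mean, 2))

print("mean was never beaten:", mean_always_best)
```

A grid search like this only illustrates the result for the data sets tried; it is not a proof that the mean always wins.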

Notes

  • The horizontal spacing of the data values is arbitrary and was chosen simply to enhance viewing the data values and their squared errors.
  • As we often do in practice, for each new data set the model estimate starts at zero. The maximum of the error meter (when the meter is all red) is the sum of squared errors when the model estimate equals 0. If no other estimate gives a smaller sum of squared errors than zero does, then we might as well keep the estimate of zero.
  • Note that when you move the blue bar in the area near the mean, the sum of squared errors doesn't change much. This implies that estimates in the neighborhood of the mean would do just about as well as the mean itself; hence, we shouldn't put much faith in a very precise estimate of the mean.
  • Squaring the errors makes the model estimate pay a disproportionately bigger penalty for a poor estimate (i.e., for a large error). For example, one error of 4 has a square of 16. This counts more than four errors of size 1, whose squares total only 4. If we were using line lengths (absolute errors) as our measure of error, one error of 4 would count the same as four errors of size 1.
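Two of the notes above can be checked with a little arithmetic. The sketch below, which again uses hypothetical data values, first shows how slowly the sum of squared errors grows when the estimate moves a short distance from the mean, and then compares squared errors with line lengths (absolute errors) for the example of one error of 4 versus four errors of 1.

```python
from statistics import fmean


def sum_squared_errors(data, estimate):
    """Add up the squared difference between each data value and the estimate."""
    return sum((y - estimate) ** 2 for y in data)


# Hypothetical data values, chosen only for illustration.
data = [2, 4, 5, 7, 12]
m = fmean(data)
sse_at_mean = sum_squared_errors(data, m)

# Near the mean the sum of squared errors barely changes; far away it grows quickly.
for shift in (0.0, 0.1, 0.5, 1.0, 3.0):
    sse = sum_squared_errors(data, m + shift)
    print(f"estimate = mean + {shift}: SSE = {sse:.2f}, increase = {sse - sse_at_mean:.2f}")

# One large error versus four small ones.
one_big = [4]
four_small = [1, 1, 1, 1]

squared_big = sum(e ** 2 for e in one_big)       # 16
squared_small = sum(e ** 2 for e in four_small)  # 4
absolute_big = sum(abs(e) for e in one_big)      # 4
absolute_small = sum(abs(e) for e in four_small) # 4

print("squared errors: ", squared_big, "versus", squared_small)
print("absolute errors:", absolute_big, "versus", absolute_small)
```

The increase in the first loop equals the number of data values times the squared shift, which is why small moves away from the mean cost so little.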



    Comments to: Gary.McClelland@Colorado.edu

    Revised: 16 April 1996