Lesson 2
Describing Central Tendency, Variability and Skew

    Features in this lesson:
  1. Assignment 2
  2. Mean, median and mode defined
  3. Java demonstration of least-squares property of the mean
  4. Mean of combined groups
  5. Range, variance & standard deviation defined
  6. Skewness
  7. How skew affects mean and median
  8. Using Excel to calculate summary statistics

We will be using the Teacher Salary datafile again for this lesson. If you don't have a copy or have overwritten your old copy, you can download another here:

Reading Assignment

Please study carefully the following sections in the textbook: 4.1 - 4.15, 4.17; 5.1 - 5.2, 5.4 - 5.12.

Describing the Central Tendency of a Set of Scores

Two properties of distributions of measures are important to describe: their center and their spread.
Set A:

12,34,36,42,52
54,68,72,81,93
Set B:

152,154,155,155,156
158,159,161,163,163
  • Set A has greater spread (over 80 points from 12 to 93)
  • Set B has a higher center: the center of A is around 50, whereas the center of B is around 155
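
As a quick check on these two observations, here is a minimal Python sketch that computes the center and the spread of each set. It uses the mean and median for the center and the range (largest score minus smallest score) for the spread; the helper name `describe` is only an illustration and not part of the lesson materials.

```python
from statistics import mean, median

set_a = [12, 34, 36, 42, 52, 54, 68, 72, 81, 93]
set_b = [152, 154, 155, 155, 156, 158, 159, 161, 163, 163]

def describe(scores):
    """Return (mean, median, range) for a list of scores."""
    return mean(scores), median(scores), max(scores) - min(scores)

mean_a, median_a, range_a = describe(set_a)
mean_b, median_b, range_b = describe(set_b)

print(f"Set A: mean = {mean_a}, median = {median_a}, range = {range_a}")
print(f"Set B: mean = {mean_b}, median = {median_b}, range = {range_b}")
```

The range of Set A comes out several times larger than the range of Set B, while both measures of center are much higher for Set B.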

Measures of Center (Central Tendency)

        Properties of Mean and Median
The most important property of the mean and median is embodied in a simple example. Observe what happens to a set of five scores when the largest one is increased by 100 points:
Set A: 12, 13, 23, 32, 43
Mean = 24.6
Median = 23

Set A Altered: 12, 13, 23, 32, 143
Mean = 44.6
Median = 23

The Mean changes but the Median does not.
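
The same comparison can be reproduced with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names `original` and `altered` are chosen here for illustration.

```python
from statistics import mean, median

original = [12, 13, 23, 32, 43]
altered = [12, 13, 23, 32, 143]  # same scores, but the largest one is now 143

for label, scores in (("Set A", original), ("Set A Altered", altered)):
    print(f"{label}: mean = {mean(scores):.1f}, median = {median(scores)}")
```

Changing the largest score moves the sum of the scores, and therefore the mean, but it leaves the middle score where it was.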

        The Mean of Combined Groups

The following situation arises not infrequently. One knows the mean of a group of some number of scores, call it Group A with n scores in it, and the mean of another group of scores, Group B, but the original scores are not in hand and one wishes to know the mean of both groups combined. For example, Crestwood school district issues a report in which it is stated that the average salary of its 36 "probationary" teachers is $24,560, and the average salary of its 215 "tenured" teachers is $38,630. You want to know the average teacher salary in the whole district (i.e., the average of the group of probationary and tenured teachers combined).
Notice right off that the average of the combined group is NOT ($24,560 + $38,630)/2, that is, it is NOT the average of the two averages. That would only be true if the two groups being combined had equal numbers of cases in them. As a general principle, the mean of the combined group will be closer to the mean of the larger (in terms of number of cases) group. So we know without even making any exact calculations that the mean teacher salary in Crestwood district will be closer to $38,630 than it will be to $24,560.
But here is how the exact calculations are made even when the original scores, all 36 + 215 = 251 of them, are not available for analysis:

The mean of Groups A & B combined will be the sum of the scores in Group A plus the sum of the scores in Group B divided by the number of scores in the combined group; symbolically, it looks like this:
Mean(A&B) = [(Sum of A) + (Sum of B)] / (na + nb)

Since Mean(A) = Sum(A) / na, it follows that na × Mean(A) = Sum(A). Likewise, nb × Mean(B) = Sum(B).

Consequently, Mean(A&B) = [na × Mean(A) + nb × Mean(B)] / (na + nb)

That's all there is to it. Multiply the number of cases times each group mean, add those two figures together and divide by the combined number of cases and you have the combined group mean.
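
Applied to the Crestwood figures, the rule just described (multiply each group mean by its number of cases, add the two products, and divide by the combined number of cases) can be sketched in Python as follows. The function name `combined_mean` is an illustrative choice, not something from the textbook.

```python
def combined_mean(n_a, mean_a, n_b, mean_b):
    """Mean of two groups combined, using only group sizes and group means."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

# 36 probationary teachers averaging $24,560; 215 tenured teachers averaging $38,630
district_mean = combined_mean(36, 24_560, 215, 38_630)

# The tempting but wrong shortcut: the plain average of the two averages
naive_average = (24_560 + 38_630) / 2

print(f"Combined mean:               ${district_mean:,.2f}")
print(f"Average of the two averages: ${naive_average:,.2f}")
```

The combined mean works out to roughly $36,612, much closer to the tenured teachers' average than the $31,595 that the shortcut would give.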

Describing the Variability of a Set of Scores

There are a few common measures of variability of a distribution:

Modality and Skewness

Persons talking informally about distributions of scores commonly refer to two properties: modality and skewness. Modality refers to the number of modes a distribution has. If the histogram of the set of scores has "one hump," it is said to be unimodal; two humps, and it's bimodal. Truly bimodal distributions are seldom encountered.

A Bimodal Distribution

Hypothetical Data

Skewness refers to the asymmetry of a histogram. If the histogram is perfectly symmetrical around its middle, then it has "no skewness." If the histogram has a hump toward the left and the right-hand tail stretches out longer than the left-hand tail, then the distribution is said to be positively skewed. Like this one:

A Positively Skewed Distribution

The histogram above describes the 97 elementary school districts in Arizona in terms of the proportion of poor people in the school district boundaries.

Negative skewness is observed when the hump is to the right and the left tail (toward the negative numbers) is elongated.

Locus of Control measures for a sample (n=600) of the High School and Beyond Survey.

A Negatively Skewed Distribution

There exists a summary statistic that measures the degree of skewness; it is not often reported, but merely inspected as to its algebraic sign to confirm an impression of either negative or positive skew from a histogram. It is roughly equivalent to an average of (standardized) third powers (cubes) of deviations of scores from the mean. Forget about it.
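
For the curious, here is a minimal Python sketch of that statistic, taken literally as the average of the cubed standardized deviations from the mean. It assumes the standard deviation is computed by dividing by n; spreadsheet and statistics packages often apply a small-sample adjustment, so their values may differ somewhat, but the algebraic sign should agree. The function name `skewness` is illustrative.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Average of cubed standardized deviations from the mean."""
    m = mean(scores)
    sd = pstdev(scores)  # standard deviation dividing by n (an assumption of this sketch)
    return mean([((x - m) / sd) ** 3 for x in scores])

right_tail = [12, 13, 23, 32, 143]     # one extreme high score
left_tail = [-x for x in right_tail]   # mirror image: one extreme low score
symmetric = [1, 2, 3, 4, 5]

for label, scores in (("right tail", right_tail),
                      ("left tail", left_tail),
                      ("symmetric", symmetric)):
    print(f"{label}: skewness = {skewness(scores):.3f}")
```

The sign is positive when the long tail points toward the high scores, negative when it points toward the low scores, and zero for a perfectly symmetrical set.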

The Mean, the Median and Skewness

There is a relationship among the mean, the median and skewness that is important in descriptive statistics. To put it in common language, the mean is drawn in the direction of the skew more than is the median. That is, in a very positively skewed distribution, the mean will be higher than the median. In a very negatively skewed distribution, the mean will be lower than the median. The median is less affected by extreme scores in a distribution than is the mean. Recall the earlier example: when the largest of 5 scores is increased by 100 points, the mean is drawn toward the elongated tail of the distribution:
Set A: 12, 13, 23, 32, 43
Mean = 24.6
Median = 23

Set A Altered: 12, 13, 23, 32, 143
Mean = 44.6
Median = 23

The Mean changes but the Median does not.

Because of this sensitivity of the mean to extreme scores, it is sometimes not favored for describing the central tendency of very skewed distributions. Often, distributions of financial statistics (income, poverty rates, expenditures and the like) are very skewed. One will find the median preferred for describing the centers of skewed distributions.

Exercises

You can use the online stats calculator to make some very quick calculations of means, medians, variances, standard deviations, and skewness so that you get a feel for what these summary statistics mean.

Online Stats Calculator
    Exercises
  1. Enter the following numbers into the online calculator and observe the mean, median, variance and standard deviation:

    12.3 21.4 34.5 32.8 42.3 18.6 25.2 28.3 27.1 24.3 31.7

  2. Here's the same set of scores as in #1 above, except that the largest score, 42.3, has been increased to 68.2. What will happen to the mean and median of this group of scores compared to the group of scores in #1?

    12.3 21.4 34.5 32.8 68.2 18.6 25.2 28.3 27.1 24.3 31.7

Using Excel to Calculate Summary Statistics

Fortunately, it is a whole lot easier to calculate things like means, medians, variances and standard deviations in Excel than it was to construct a frequency distribution.
    Calculating the Mean in Excel
  1. Suppose that the numbers whose mean you want are in cells a3 through a156 of the spreadsheet (that is, Rows 3 through 156 of Column a).
  2. First, find an empty spot in the spreadsheet where you want the answer to appear and click on it, e.g., cell e5.
  3. Click on the Function icon (it looks like this, remember: fx).
  4. In the left box that appears, click on "Statistical." In the right hand box, click on "AVERAGE." Then click on Next at the bottom of the dialogue box.
  5. Then in the dialogue box that appears next, type this in the first window labeled number 1 fx: a3:a156 . (Obviously, if your data are in some other rows, enter the proper symbols, e.g., b1:b100). Finally, click on "Finish" at the bottom of the dialogue box.
  6. That's all; by now you should be seeing the mean of the numbers in Rows a3 through a156 in the cell at e5.

    Calculating the Standard Deviation in Excel
  • Do exactly as you did to calculate the Mean, only pick "STDEV" in the "Statistical" dialogue box instead of "AVERAGE."

And now, experiment with these other Statistical Functions that you'll find in your Excel spreadsheet:
  • COUNT
  • MAX
  • MEDIAN
  • MIN
  • MODE
  • PERCENTILE
  • SKEW
  • VAR
If the Excel program you are working on has the Tools ---> Data Analysis package installed on it, then you can get all these statistics in one fell swoop. Just select Tools ---> Data Analysis ---> Descriptive Statistics and when you reach the Descriptive Statistics dialogue box, fill in the "Input Range" with the location of your scores (e.g., a3:a391) and be sure to check Summary Statistics at the very bottom of the box. That's all there is to it.
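
If you would like to check a spreadsheet result outside of Excel, the following Python sketch computes several of the same summary statistics with the standard `statistics` module. It uses Set A from the start of this lesson as sample data. The dictionary keys simply echo the Excel function names; like Excel's VAR and STDEV, `statistics.variance` and `statistics.stdev` divide by n - 1.

```python
import statistics

scores = [12, 34, 36, 42, 52, 54, 68, 72, 81, 93]  # Set A from earlier in the lesson

summary = {
    "COUNT": len(scores),
    "MIN": min(scores),
    "MAX": max(scores),
    "AVERAGE": statistics.mean(scores),
    "MEDIAN": statistics.median(scores),
    "VAR": statistics.variance(scores),  # sample variance (divides by n - 1)
    "STDEV": statistics.stdev(scores),   # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:8s}{value:10.2f}")
```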


Assignment Two

Use this form to complete Assignment #2 and submit your work.
