Lesson 5
Proportions & Contingencies

Downloading a New Datafile

Before we start learning something about the statistical methods for dealing with counts of "nominal" data, we'll need a new set of data to practice on and use as an illustration. I have prepared an Excel worksheet that contains information on a nationally representative sample of 600 high-school seniors, gathered before and in the years after they graduated from high school. This government-funded research project was known as the "High School and Beyond" study. It was started in the 1960s and may still be going on today, for all I know. By clicking on the link immediately below, you should--assuming you are working on a properly configured machine--start a chain of events that will bring this new Excel data file from my server to your computer. If asked for a logon id, use "anonymous."
You need to SAVE a copy of the data file (its name when it arrives is something like hsb.xls for Windows and hsb.csv for Macintosh) on your diskette. The datafile has 600 persons (in rows) described by about 14 variables. The names of the variables appear in the spreadsheet. You may also see a description of the variables here.

A Note (Warning) to America Online Users

Please note: There is no textbook reading assignment for this lesson.

Proportions

Statisticians use proportions to convey information about the occurrence of "nominal" characteristics.
    The Kinds of Questions Answered by Proportions:
  • What proportion of Arizona teachers belong to the AEA?
  • What is the proportion of high school graduates who immediately enroll in a community college?
(If you have never encountered the term "nominal measurement" before, take this mini-lesson on Scales of Measurement then come back.)
Proportions are really just like percents except that they range from 0 to 1.00 instead of 0 to 100. If the proportion of high school athletes who report using anabolic steroids is .053, then 5.3% of high school athletes say they use steroids. (Multiply a proportion by 100 to get a percent.)
The statistical definition of a proportion is simple: count the number of cases in the group of n who have the characteristic you are interested in--let's say that number turns out to be f (for "frequency")--and then divide f by n and you have the proportion, symbolized by p:

p = f/n

Question: A sample of 200 eighth-grade students revealed 140 students who answered "Yes" to the questionnaire item "Is too much homework assigned to you?" and 60 who answered "No" to this question. What is the proportion of eighth-graders who responded that too much homework was being assigned?

n = 200 and f = 140 , so p = f/n = 140/200 = .70
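
If you would like to check this arithmetic outside of Excel, here is a minimal sketch in Python. The function name proportion is my own choice for illustration; it simply applies p = f/n and then multiplies by 100 to get the percent.

```python
def proportion(f, n):
    """Return p = f / n, the share of the n cases that have the characteristic."""
    if n <= 0:
        raise ValueError("n must be a positive count")
    if not 0 <= f <= n:
        raise ValueError("f must be between 0 and n")
    return f / n


# The homework question: 140 "Yes" answers in a sample of 200 students.
p = proportion(140, 200)
percent = p * 100  # multiply a proportion by 100 to get a percent

print(f"p = {p:.2f}, or {percent:.0f}%")
```

The two checks at the top of the function are only a guard against typing the numbers in the wrong order; a frequency can never be larger than the group it was counted in.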

Contingencies

Contingency table analysis is, perhaps, the most often used statistical technique in the social sciences and in researching the professions. When we studied the association between two variables that were measured on a numeric scale with ordinal or better properties, we spoke of the "correlation" between X and Y. But when we examine the association between nominally measured characteristics (e.g., gender and political affiliation), we speak of the "contingency" between the two characteristics. Contingencies are displayed in tables like the following, called, naturally enough, "contingency tables":

                    Political Affiliation
                    Republican   Democrat
Gender   Male           55          41
         Female         52          70

In this table, we see that in the sample studied, 55 persons are male and Republican, while 70 are Female and Democrat. It is generally more revealing to transform these frequencies into proportions (or percents), as in the following table:

Adults Classified by Gender & Political Affiliation
with Proportions by Rows

                    Political Affiliation
                    Republican   Democrat
Gender   Male          .57         .43
         Female        .43         .57

Now we see that .57, or 57%, of Males are Republican, and .43, or 43%, of Females are Republican. (The .57 comes from the division of 55 by 96: the number of Male Republicans divided by the total number of Males.) Thus, a woman is more likely than a man to be a Democrat. Or, to put it slightly differently, "Males prefer Republican as a political affiliation more often than females do." Or, in the jargon of statistics, "There is a slight association between Gender and Political Affiliation with men more likely to be Republican than are women." (Notice how the proportions add to 1.0 across the rows in the above table.)
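
The row proportions can be reproduced with a few lines of Python. This is only a sketch: the nested dictionary and the function name row_proportions are my own conventions for illustration, while the four counts are the ones from the frequency table above.

```python
# Frequencies from the Gender by Political Affiliation table.
counts = {
    "Male":   {"Republican": 55, "Democrat": 41},
    "Female": {"Republican": 52, "Democrat": 70},
}


def row_proportions(table):
    """Divide each cell by the total of its own row."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())  # e.g. 55 + 41 = 96 males
        result[row_label] = {col: f / row_total for col, f in cells.items()}
    return result


by_rows = row_proportions(counts)
for gender, cells in by_rows.items():
    rounded = {party: round(p, 2) for party, p in cells.items()}
    print(gender, rounded, "row sum =", round(sum(cells.values()), 2))
```

Each row is divided by its own total (96 males, 122 females), which is why the proportions in a row add to 1.0.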
Suppose we had chosen to calculate the proportions in the above example by columns instead of by rows. The following table would have resulted.

Adults Classified by Gender & Political Affiliation
with Proportions by Columns

                    Political Affiliation
                    Republican   Democrat
Gender   Male          .51         .37
         Female        .49         .63

Even though the two above tables contain different numbers, they are both true in what they report; they are simply reporting on different things. The table containing proportions by rows shows how males distribute themselves between the two major political parties, and then, in the second row, how females distribute themselves. The second table, showing proportions by columns, reports how Republicans are divided between males and females, then, in the second column, how Democrats are so divided. It is generally informative to report both sets of proportions (or percents).
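
For comparison, here is a minimal Python sketch of the proportions by columns, computed from the same four frequencies. As before, the dictionary layout and the function name column_proportions are assumptions made for illustration. The only change from proportions by rows is the divisor: each cell is divided by its column total (107 Republicans, 111 Democrats) rather than its row total.

```python
# Frequencies from the Gender by Political Affiliation table.
counts = {
    "Male":   {"Republican": 55, "Democrat": 41},
    "Female": {"Republican": 52, "Democrat": 70},
}


def column_proportions(table):
    """Divide each cell by the total of its own column."""
    col_totals = {}
    for cells in table.values():
        for col, f in cells.items():
            col_totals[col] = col_totals.get(col, 0) + f
    return {
        row_label: {col: f / col_totals[col] for col, f in cells.items()}
        for row_label, cells in table.items()
    }


by_columns = column_proportions(counts)
for gender, cells in by_columns.items():
    print(gender, {party: round(p, 2) for party, p in cells.items()})
```

Here it is the columns, not the rows, that add to 1.0.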

Calculating Contingency Tables in Excel

Excel has a weird name for contingency tables; it calls them "pivot tables." To begin the construction of a pivot table, first click in an empty cell somewhere to the right of the data in your worksheet. Next, you'll find the Pivot Table option about halfway down the list of menu items under the Data option on the top menu bar in Excel. When you take the Pivot Table option, you'll see Step 1 of 4 of the Pivot Table Wizard; just click on Next at the bottom. You next see Step 2 that looks like this:

The above is the dialogue box for entering the location of the data that you will want to tabulate in a contingency table. The easiest thing to do is to enter a description of the location of all the data in your worksheet, e.g., for the High School & Beyond worksheet that you downloaded, the data occupy a rectangular area from Row 4, Column A to Row 603, Column O. Please Note: Excel uses your variable names as well in creating a contingency table; notice that the HSB variable names are in Row 3. So you must enter the following code into the dialogue box: a3:o603. Now click on the Next button.

Now you should see the dialogue box below, called Step 3 of 4. This is the box where most of the action takes place.

The Step 3 dialogue box is where you choose which variable will constitute the rows of your contingency table and which variable will be the columns. Observe the labels for the variables off to the right side.

  • Click and drag a variable for rows--e.g., Sex. Drag the label to the left and drop it in the tall rectangle named ROW.
  • Now, click, drag and drop the label for School Ty(pe) into the horizontal rectangle named COLUMN.
  • Finally, click and drag either the Sex or the School Type label from the list of variables on the right and drop it into the center box named DATA.
  • If the label you have just dropped into the DATA box suddenly changes its name to "Count of School Type" or "Count of Sex," you are in good shape. If, however, its name becomes something like "Sum of School Type" or "Sum of Sex," you will need to double click that label and pick the Count option.
When the Step 3 dialogue box looks like the example below, you are ready to click on the Finish button:

Clicking on the Finish button in the Step 3 dialogue box will produce the following pivot table (contingency table) results:

Observe what we learn from this: 53 out of 327 females in the High School & Beyond sample attended private schools, whereas 41 out of 273 males went to private schools. This is very informative and would have taken a long time to tally by hand, but there is much more we can learn about this issue when we look at proportions or percents.
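
What the pivot table does with "Count of" is nothing more than tallying how many persons fall into each combination of row category and column category. A minimal Python sketch of that tally follows. Please note that the six records are made up for illustration and are NOT taken from the High School & Beyond file, and the variable names are my own.

```python
from collections import Counter

# Hypothetical (sex, school type) records, one per person.
# These six rows are invented for illustration, not HSB data.
records = [
    ("Female", "Private"), ("Female", "Public"), ("Male", "Public"),
    ("Male", "Private"), ("Female", "Public"), ("Male", "Public"),
]

# Tally how many persons fall in each (row, column) combination.
cell_counts = Counter(records)

row_labels = sorted({sex for sex, _ in records})
col_labels = sorted({school for _, school in records})

# Print the contingency table with a total for each row.
print(f"{'':8}" + "".join(f"{c:>9}" for c in col_labels) + f"{'Total':>9}")
for r in row_labels:
    row = [cell_counts[(r, c)] for c in col_labels]
    print(f"{r:8}" + "".join(f"{f:>9}" for f in row) + f"{sum(row):>9}")
```

A Counter returns 0 for a combination that never occurs, so an empty cell shows up as a zero rather than causing an error.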
Let's click on another empty cell in the worksheet and select the Pivot Table option again from the Data menu. Now, after arranging the Step 3 dialogue box as above and before clicking on the Finish button, let's double click the small variable label that reads "Count of School Type" (or "Count of Sex"). A dialogue like the one below will appear.

From among the five buttons on the right of the box below, click on the Options >> button. When you then click on the tiny arrow beside the box that reads "Show Data as:" you will see a list of choices. Pick the item that reads "% of column." The following contingency table will result:

This contingency table is even more easily interpreted. Among all Private school students who finished high school, 56% are female; among all Public school students who finished high school, 54% are female. Thus, the percents of Private and Public school graduates who are female are nearly the same. In statistical jargon, we would say, "There is no association between Sex and School Type." (If the results had proven to be something like 70% of Private school grads are female and only 40% of Public school grads are female, we would have concluded that Sex and School Type are associated.)
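
The "% of column" figures can be checked by hand or with a short Python sketch like the one below. The Private counts (53 females, 41 males) and the row totals (327 females, 273 males) are the ones reported earlier in this lesson; the Public counts are my own subtraction from those totals (327 - 53 and 273 - 41), and the variable names are assumptions made for illustration.

```python
# Row totals and Private counts as reported in the lesson.
females_total, males_total = 327, 273
females_private, males_private = 53, 41

# Public counts obtained by subtraction (an inference, not read from Excel).
females_public = females_total - females_private   # 274
males_public = males_total - males_private         # 232

counts = {
    "Female": {"Private": females_private, "Public": females_public},
    "Male":   {"Private": males_private,   "Public": males_public},
}

# "% of column": each cell divided by its column total, times 100.
percent_female = {}
for school in ("Private", "Public"):
    column_total = counts["Female"][school] + counts["Male"][school]
    percent_female[school] = 100 * counts["Female"][school] / column_total
    print(f"{school}: {percent_female[school]:.0f}% female (of {column_total})")

gap = percent_female["Private"] - percent_female["Public"]
print(f"Difference: {gap:.1f} percentage points")
```

A difference of only a couple of percentage points between the two columns is what lies behind the statement that Sex and School Type are not associated in this sample.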

One final word about contingency tables: we don't construct contingency tables from data that are measured like height and weight or age or the test score data in the High School & Beyond worksheet. Contingency tables describe the relationships between nominally measured variables.

Assignment 5

Use this form to complete Assignment #5 and submit your work.
