Creating Frequency Distributions and Histograms in Excel

I mentioned that there are at least three ways of constructing a frequency distribution in Excel: the "brute force" method; the Analysis Pak Addin method; and the worst method, using the Excel Statistical Functions approach. Each method is discussed below.

The "Brute Force" Method

The first thing one must do with any method of constructing a frequency distribution is to decide on the classes into which the scores will be grouped. Suppose we want to build a frequency distribution and histogram for the teacher salaries at Desertview.
  1. First, get an idea of the range of the salaries by sorting the column and inspecting the bottom and the top salaries.
  2. Pick a salary just below the lowest salary at some nice round number like 15,000; then pick another nice round salary that just exceeds the highest salary, like maybe 65,000.
  3. Now write down classes from the lowest number to the highest in steps of a nice round size like 5,000 that will give you somewhere between 10 and 20 classes from the lowest to the highest salary (this is done in the textbook for a different example on pages 10 and 11): 15000-20000, 20001-25000, 25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000, 55001-60000, 60001-65000. These classes, or "bins," as they are sometimes called, will be the groups into which the data are sorted and the numbers of cases in each class counted.
    Now the easiest way to find the frequencies in each class is simply to sort the column of data from largest to smallest or vice versa and count the cases in each class by scrolling down the column and noting the row numbers as the scores change from one class to the next. For example, if the salary 20,100 resides in Row 50 of the spreadsheet and scrolling down the column you see 24,950 in Row 75 and 25,100 in Row 76, then you know that there are 26 salaries in the bin 20,001 - 25,000. Why 26? Because 75 - 50 is 25 but the 50th row is occupied by a score in the bin so add 1: 25 + 1 = 26. (Notice that 0, 1, 2, 3, 4, 5 are six things, not 5.)
    Continue counting cases in this manner until you reach the end of the collumn. Record the frequency of cases beside the bin description in some empty columns in the spreadsheet, and you will be ready to draw the histogram.

Creating Frequency Distributions & Histograms with the
Data Analysis Tool Pak Add-in

Excel comes with added features called "Add-ins." If you were fortunate and had these Add-ins copied onto your computer when Excel was installed, statistical analysis can be much easier. Even if the Add-ins were copied onto the computer's hard rive that you use, the Data Analysis Pak may still have to be "installed" before you can use it.

Installing the Data Analysis Add-ins for Excel

Statistical analyses are a lot simpler in Excel if the Data Analysis Tool is "added in" the spreadsheet. This only has to be done once and may already exist with the version of Excel that you are using. Or, you may have to do it yourself.
Excel has a menu called "Tools". To check whether the Data Analysis toolbox has already been loaded, select this menu and look for the menu item called Data Analysis. If it is there, continue to the next section.
If the Data Analysis Tool is not in the "Tools" menu, select Tools --> Add in... , which is about half way down the list. A window will open containing a list of toolboxes that you may add. Near the top of this list will be a two boxes called Analysis ToolPak and Analysis ToolPak-VBA. (On Macs, the menu has only one item called Analysis ToolPak, I believe.) Select both with the mouse and then click OK. After a few seconds, the computer will return you to the worksheet window and a new menu item called "Data Analysis" should be available at the bottom of the "Tools" menu.

With the Data Analysis Tool Pak installed, here's how to construct the frequency distribution and histogram.
  1. Put your data in a column of the spreadsheet. Write down the first and last cell containing data in this column (e.g., a3 to a389).
  2. In a new column, enter numbers that represent the upper boundaries of classes ("bins") into which the data are to be sorted and counted for the histogram. Write down the first and last cells in this column (e.g., e1 to e13).
  3. Choose "Tools --> Data Analysis". A window will appear with a list of many statistical analysis functions.
  4. Choose Histogram from this list. In the dialogue box that appears, enter the cells for the "Input Range" (the data) and for the "Bin Range" (the classes). Use a colon between the first and last cells (e.g., a3:a389 for the input range, and e1:e13 for the bin range).
  5. Choose "Output Range" and enter the cell in which you want the output to start, e.g., g1. The output of this command includes the list of frequencies with which data appear in the various classes and the histogram.
  6. Choose the "Chart Output" option at the bottom of the window and click on the "OK" button. The histogram should appear in the spreadsheet.
Here's a picture of the Excel screen for setting up Histograms in the Data Analysis Tool:

 

Constructing Frequency Distributions & Histograms with the Excel Statistical Analysis Functions

It's not really fair that one of the first things we try to do in Excel proves to be one of the most complicated; but life is not always fair, right? Well, constructing a grouped frequency distribution and histogram with Excel is pretty tough unless your version of Excel has the "Data Analysis Tool Pak" added in (see above). The appraoch that uses the built in Statistical Analysis Function called FREQUENCY is complex and difficult. After many attempts, I still have difficulty making it work. But, here goes.
  • Let's just assume that the teacher salaries for Desertview are in Column A in the spreadsheet from row 2 through row 389.
  • Now over in Column E, you should enter from Row 1 through Row 13 the UPPER LIMIT of the classes into which we are going to group the teacher salaries in Column A; that would be 10000, 15000, 20000, 25000, 30000, etc through Row 13 which would be 70000.
  • Now go to Column G, Row 1 and click in the cell and drag down so that you select 14 (not 13) rows. Release and the rectangle from G1 to G14 should be highlighted.
  • Next, click in the box for entering data just above the spreadsheet and you should see a symbol like this appear: fx. [Please note: if this symbol does not appear in the data entry box, you can always find it on the top menu bar.] This is the symbol for a spreadsheet "function." Click on it. In the dialogue box that appears, click on "Statistical" in the Function Category (left box) and "FREQUENCY" under Function Name (right box). Then click on the OK button at the bottom of the dialogue box.

  • The dialogue box that now appears has a place to enter the location of the "data array" (the scores we are going to count) and a place below it to enter the "bins array" (the limits of our classes). In the data_array box enter this: "a2:a389" (without any quotation marks; it stands for Col. A, Row 2 thru Col. A Row 389 where the data are.
  • In the box for the bin_array, enter the location of the upper limits of the score classes, e.g., e1:e13
  • Now comes the tricky part. Do NOT click on OK in the dialogue box. Before doing anything else, press simultaneously these three keys: Control, Shift and Enter. (As odd as this seems, it is necessary to calculate what is called a "function array" which is language peculiar to Excel and best left unspoken of.)
  • The counts of frequencies in the classes will appear in Column G. The first entry is the number of scores less than the upper limit of the first class; the second entry is the number of scores between the upper limit of the first class and the upper limit of the second class, and so on. If you managed to make this work on any one of your first half dozen tries, congratulations!

Drawing the Histogram

To construct the histogram, select (by clicking and dragging) the numbers in G1 through G13 (the frequencies in the classes or "bins") and then click on the symbol for the "Chart Wizard" at the top of the spreadsheet (it's to the right of the globe and looks like a tiny histogram). The charting function is not too difficult to figure out; play with it and see what you can produce.

Last But Not Least

Conrad Carlberg has prepared a very convenient tool for graphing frequency distributions. His Histogram Excel Worksheet will work with any data, and it automatically draws a graph of an ungrouped frequency distribution for the data that you copy into the first column. You can download Carlberg's Histogram Worksheet from this point. (Note: If your broswer warns you about downloading files with "embedded macros," tell it that you are not worried, the file comes from a trustworthy individual, and you'll be happy to have it in your machine.)

home     |     online calc.     |     lesson:   one     |     two     |     three     |     four     |     five     |     six