Sampling & Data
4 Frequency Distributions
Frequency
Twenty students were asked how many hours they worked per day. Their responses, in hours, are as follows: 5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3.
Table 1.9 lists the different data values in ascending order and their frequencies.
DATA VALUE | FREQUENCY |
---|---|
2 | 3 |
3 | 5 |
4 | 3 |
5 | 6 |
6 | 2 |
7 | 1 |
A frequency is the number of times a value of the data occurs. According to Table 1.9, there are three students who work two hours, five students who work three hours, and so on. The sum of the values in the frequency column, 20, represents the total number of students included in the sample.
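As a minimal sketch, the frequency column of Table 1.9 can be reproduced by tallying the twenty responses. The variable names `hours`, `frequency`, and `total` are illustrative choices, not part of the text.

```python
from collections import Counter

# Hours worked per day, as reported by the twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Counter maps each distinct data value to the number of times it occurs.
frequency = Counter(hours)

# Print the data values in ascending order with their frequencies.
for value in sorted(frequency):
    print(value, frequency[value])

# The frequencies add up to the number of students in the sample.
total = sum(frequency.values())
print("total:", total)
```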
A frequency distribution (also known as a grouped frequency distribution table or GFDT for short) is the table that lists the class (this is the “Data Value” column in Table 1.9) and the frequency. Each class can be a single value like in Table 1.9, or a range of values like in Table 1.12 below.
A relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of students in the sample (in this case, 20). Relative frequencies can be written as fractions, percents, or decimals.
DATA VALUE | FREQUENCY | RELATIVE FREQUENCY |
---|---|---|
2 | 3 | $\frac{3}{20}$ or 0.15 |
3 | 5 | $\frac{5}{20}$ or 0.25 |
4 | 3 | $\frac{3}{20}$ or 0.15 |
5 | 6 | $\frac{6}{20}$ or 0.30 |
6 | 2 | $\frac{2}{20}$ or 0.10 |
7 | 1 | $\frac{1}{20}$ or 0.05 |
The sum of the values in the relative frequency column of Table 1.10 is $\frac{20}{20}$ , or 1.
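The division described above can be sketched as follows. Using `Fraction` keeps the arithmetic exact, so the relative frequencies sum to exactly 1. The names `frequency`, `n`, and `relative` are assumptions made for this illustration.

```python
from fractions import Fraction

# Frequencies from Table 1.9 (data value -> frequency).
frequency = {2: 3, 3: 5, 4: 3, 5: 6, 6: 2, 7: 1}

# Total number of students in the sample.
n = sum(frequency.values())

# Relative frequency = frequency / total. Fraction reduces automatically
# (for example 5/20 is stored as 1/4), so the unreduced form is printed
# from the raw counts instead.
relative = {value: Fraction(count, n) for value, count in frequency.items()}

for value, rf in relative.items():
    print(value, f"{frequency[value]}/{n}", float(rf))

print("sum:", sum(relative.values()))
```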
Cumulative frequency is the accumulation of the previous frequencies. To find the cumulative frequencies, add all the previous frequencies to the frequency for the current row, as shown in Table 1.11a.
DATA VALUE | FREQUENCY | CUMULATIVE FREQUENCY |
---|---|---|
2 | 3 | 3 |
3 | 5 | 8 |
4 | 3 | 11 |
5 | 6 | 17 |
6 | 2 | 19 |
7 | 1 | 20 |
The last entry of the cumulative frequency column is 20, the total number of students in the sample, indicating that all of the data has been accumulated.
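A running total is one way to compute cumulative frequencies. The sketch below uses `itertools.accumulate` from the Python standard library; the list names are illustrative.

```python
from itertools import accumulate

# Data values and frequencies from Table 1.9, in ascending order.
values = [2, 3, 4, 5, 6, 7]
frequencies = [3, 5, 3, 6, 2, 1]

# accumulate yields running sums: each entry is the current frequency
# plus all of the frequencies before it.
cumulative = list(accumulate(frequencies))

for value, freq, cum in zip(values, frequencies, cumulative):
    print(value, freq, cum)
```

The final running total equals the sample size, 20.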
Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row, as shown in Table 1.11b.
DATA VALUE | FREQUENCY | RELATIVE FREQUENCY | CUMULATIVE RELATIVE FREQUENCY |
---|---|---|---|
2 | 3 | $\frac{3}{20}$ or 0.15 | 0.15 |
3 | 5 | $\frac{5}{20}$ or 0.25 | 0.15 + 0.25 = 0.40 |
4 | 3 | $\frac{3}{20}$ or 0.15 | 0.40 + 0.15 = 0.55 |
5 | 6 | $\frac{6}{20}$ or 0.30 | 0.55 + 0.30 = 0.85 |
6 | 2 | $\frac{2}{20}$ or 0.10 | 0.85 + 0.10 = 0.95 |
7 | 1 | $\frac{1}{20}$ or 0.05 | 0.95 + 0.05 = 1.00 |
The last entry of the cumulative relative frequency column is one, indicating that one hundred percent of the data has been accumulated.
NOTE
Because of rounding, the relative frequency column may not always sum to one, and the last entry in the cumulative relative frequency column may not be one. However, they each should be close to one.
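The sketch below accumulates the relative frequencies exactly and then illustrates the rounding effect described in the note. The `hypothetical` counts are invented for this illustration only and do not come from the text.

```python
from fractions import Fraction
from itertools import accumulate

# Frequencies from Table 1.9.
frequencies = [3, 5, 3, 6, 2, 1]
n = sum(frequencies)

# Exact relative frequencies, then their running sums.
relative = [Fraction(f, n) for f in frequencies]
cumulative_relative = [float(c) for c in accumulate(relative)]
print(cumulative_relative)

# Hypothetical counts (NOT from the text): three equally frequent values.
# Each relative frequency is 1/3, which rounds to 0.33.
hypothetical = [1, 1, 1]
rounded = [round(f / sum(hypothetical), 2) for f in hypothetical]

# The rounded values sum to 0.99 rather than 1.
rounded_total = round(sum(rounded), 2)
print(rounded, rounded_total)
```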
Example – Professional Soccer Players
Table 1.12 represents the heights, in inches, of a sample of 100 male semiprofessional soccer players.
HEIGHTS (INCHES) | FREQUENCY | RELATIVE FREQUENCY | CUMULATIVE FREQUENCY | CUMULATIVE RELATIVE FREQUENCY |
---|---|---|---|---|
60–61 | 5 | $\frac{5}{100}$ = 0.05 | 5 | 0.05 |
62–63 | 3 | $\frac{3}{100}$ = 0.03 | 8 | 0.05 + 0.03 = 0.08 |
64–65 | 15 | $\frac{15}{100}$ = 0.15 | 23 | 0.08 + 0.15 = 0.23 |
66–67 | 40 | $\frac{40}{100}$ = 0.40 | 63 | 0.23 + 0.40 = 0.63 |
68–69 | 17 | $\frac{17}{100}$ = 0.17 | 80 | 0.63 + 0.17 = 0.80 |
70–71 | 12 | $\frac{12}{100}$ = 0.12 | 92 | 0.80 + 0.12 = 0.92 |
72–73 | 7 | $\frac{7}{100}$ = 0.07 | 99 | 0.92 + 0.07 = 0.99 |
74–75 | 1 | $\frac{1}{100}$ = 0.01 | 100 | 0.99 + 0.01 = 1.00 |
Total | 100 | 1.00 | | |
The data in this table have been grouped into the following intervals, which are the classes:
- 60 to 61 inches
- 62 to 63 inches
- 64 to 65 inches
- 66 to 67 inches
- 68 to 69 inches
- 70 to 71 inches
- 72 to 73 inches
- 74 to 75 inches
The lower class limit of a class is its starting point, and the upper class limit is the upper end of its range. For example, in Table 1.12 the lower class limits are 60, 62, 64, 66, 68, 70, 72, and 74, and the upper class limits are 61, 63, 65, 67, 69, 71, 73, and 75.
The class boundaries are the values midway between the upper class limit of one class and the lower class limit of the next class. For example, the class boundary between the first class and the second class in Table 1.12 is 61.5. We can compute it by adding the upper class limit to the next lower class limit and dividing the result by two (that is, by taking the average, or mean, of the two numbers): $\frac{61+62}{2}=61.5$. Doing this for each pair of adjacent classes gives the class boundaries 61.5, 63.5, 65.5, 67.5, 69.5, 71.5, and 73.5.
The class boundaries also include the boundary before the first class and the boundary after the last class. Continuing the pattern above in both directions gives the full list of class boundaries: 59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5, 73.5, and 75.5 (the two added values are 59.5 and 75.5).
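The boundary calculation can be sketched in a few lines. The approach of extending the outer boundaries by half the gap between adjacent classes is one reading of "continue the pattern", and the variable names are illustrative.

```python
# Class limits from Table 1.12.
lower_limits = [60, 62, 64, 66, 68, 70, 72, 74]
upper_limits = [61, 63, 65, 67, 69, 71, 73, 75]

# Interior boundaries: average each upper limit with the NEXT lower limit.
interior = [(u + l) / 2 for u, l in zip(upper_limits, lower_limits[1:])]

# Half of the gap between adjacent classes (0.5 inch here) is used to
# extend the pattern before the first class and after the last class.
half_gap = (lower_limits[1] - upper_limits[0]) / 2
boundaries = (
    [lower_limits[0] - half_gap] + interior + [upper_limits[-1] + half_gap]
)

print(boundaries)
```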
The class width is the difference between the lower class limit of one class and the lower class limit of the next class. In Table 1.12, taking the difference of the first two lower class limits gives a class width of 62 - 60 = 2. The class width is the same for all the classes.
The class midpoint is the value in the middle of the class, which can be found by averaging the class's lower class limit and its upper class limit. For example, for the frequency distribution that we have been working with here, the first class midpoint is $\frac{60+61}{2}=60.5$. Similarly, the second class midpoint is 62.5, and so on.
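As a brief sketch, both quantities follow directly from the class limits in Table 1.12. The names `class_width`, `all_widths`, and `midpoints` are illustrative.

```python
# Class limits from Table 1.12.
lower_limits = [60, 62, 64, 66, 68, 70, 72, 74]
upper_limits = [61, 63, 65, 67, 69, 71, 73, 75]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Check that every pair of consecutive classes gives the same width.
all_widths = {b - a for a, b in zip(lower_limits, lower_limits[1:])}

# Class midpoint: average of a class's lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

print(class_width, midpoints)
```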
Building a Frequency Distribution
To build a frequency distribution, you are generally given two things: 1) a set of data to group into the frequency distribution, and 2) the number of classes to use. We will illustrate the process with an example.
Suppose you are working for an advertising agency and a client wants you to summarize how long potential customers on the internet watch their commercial before hitting the skip button. You collect the data below from 30 viewers, showing the number of seconds each one watched the advertisement, and you decide that with 30 data points you should use 7 classes. Note: the number of classes will be given to you in any problems you work on in this class.
4 | 5 | 1 | 2 | 15 | 1 |
9 | 10 | 12 | 4 | 25 | 5 |
6 | 6 | 7 | 8 | 2 | 8 |
22 | 31 | 3 | 1 | 20 | 3 |
2 | 12 | 1 | 2 | 4 | 6 |
Table 1.13 Number of seconds each of 30 viewers watched the advertisement
Calculate the Class Width
The first thing to do is to calculate the class width. To do this, we simply subtract the smallest number in our data set from the largest number in our dataset, and divide that by the number of classes.
$$\frac{\text{max} - \text{min}}{\text{num. classes}} = \text{class width (rounded up)}$$
Our class width is calculated as $\frac{31-1}{7} = 4.2857$. We always round this number up, so in this example, the class width is $5$.
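The same calculation can be sketched with `math.ceil` for the rounding step. Note one assumption: `math.ceil` leaves a quotient that is already a whole number unchanged, and the text does not say how that case should be handled.

```python
import math

# Seconds watched by the 30 viewers (Table 1.13).
seconds = [4, 5, 1, 2, 15, 1,
           9, 10, 12, 4, 25, 5,
           6, 6, 7, 8, 2, 8,
           22, 31, 3, 1, 20, 3,
           2, 12, 1, 2, 4, 6]
num_classes = 7

# (max - min) / number of classes, before rounding.
raw_width = (max(seconds) - min(seconds)) / num_classes

# Round up to the next whole number to get the class width.
class_width = math.ceil(raw_width)

print(round(raw_width, 4), class_width)
```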
Set the Lower Class Limits
Now we can start building our frequency distribution. The lower class limit of the first class is always the smallest number in the original dataset, which is 1 second in this example. From there, we add our class width to get each of the next lower class limits.
Class | Frequency |
---|---|
1 – | |
6 – | |
11 – | |
16 – | |
21 – | |
26 – | |
31 – | |
Table 1.13a A frequency distribution with just the lower class limits
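A short sketch of this step: start at the smallest data value and add the class width repeatedly. The values 1, 5, and 7 come from the example; the variable names are illustrative.

```python
min_value = 1      # smallest value in the dataset
class_width = 5    # class width found in the previous step
num_classes = 7    # number of classes given for this example

# Each lower class limit is the previous one plus the class width.
lower_limits = [min_value + i * class_width for i in range(num_classes)]

print(lower_limits)
```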
Set the Upper Class Limits
Now we can determine the upper class limits. Since the original data in Table 1.13 has no decimal points, our upper class limits are simply 1 less than the next class's lower class limit. So the upper class limit of the first class is $6 - 1 = 5$. From there we can keep adding the class width of 5 to get each of the subsequent upper class limits.
Class | Frequency |
---|---|
1 – 5 | |
6 – 10 | |
11 – 15 | |
16 – 20 | |
21 – 25 | |
26 – 30 | |
31 – 35 | |
Table 1.13b A frequency distribution with just the upper and lower class limits
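For whole-number data like this, the upper limits can be sketched as each lower limit plus the class width minus 1. The step size of 1 is an assumption tied to integer data; data recorded with decimals would need a smaller step.

```python
class_width = 5
lower_limits = [1, 6, 11, 16, 21, 26, 31]

# For integer data, each upper limit is 1 less than the next lower limit,
# which equals the current lower limit plus (class width - 1).
upper_limits = [lo + class_width - 1 for lo in lower_limits]

classes = list(zip(lower_limits, upper_limits))
print(classes)
```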
Count the Frequencies
Now we simply count how many data points from the original dataset fall into each class. There are 15 values between 1 and 5, so 15 is the frequency for the first class. We can fill out the rest of the table in the same way.
Class | Frequency |
---|---|
1 – 5 | 15 |
6 – 10 | 8 |
11 – 15 | 3 |
16 – 20 | 1 |
21 – 25 | 2 |
26 – 30 | 0 |
31 – 35 | 1 |
Table 1.13c A frequency distribution, complete with the upper and lower class limits and frequencies.
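The counting step can be sketched by testing each data point against the limits of each class. The list names below are illustrative, and the class limits are the ones built in the preceding steps.

```python
# Seconds watched by the 30 viewers (Table 1.13).
seconds = [4, 5, 1, 2, 15, 1,
           9, 10, 12, 4, 25, 5,
           6, 6, 7, 8, 2, 8,
           22, 31, 3, 1, 20, 3,
           2, 12, 1, 2, 4, 6]

# (lower class limit, upper class limit) for each of the 7 classes.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]

# A data point belongs to a class when it lies between the two limits,
# inclusive. Summing the True/False results counts the matches.
frequencies = [sum(lo <= s <= hi for s in seconds) for lo, hi in classes]

for (lo, hi), freq in zip(classes, frequencies):
    print(f"{lo} - {hi}", freq)
```

As a check, the frequencies should add up to the 30 viewers in the dataset.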