Testing Multiple Proportions
[latexpage]
The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it will not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to draw a conclusion about whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as with the test for independence.
The expected value for each cell needs to be at least five in order for you to use this test.
Hypotheses
H0: The distributions of the two populations are the same.
H1: The distributions of the two populations are not the same.
Test Statistic
Use a $\chi^2$ test statistic. It is computed in the same way as the test for independence.
Degrees of Freedom (df)
$df = (\text{number of columns}) - 1$
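For reference, here is one standard way to write these quantities for a table with $r$ rows (one per population) and $c$ columns (one per category). The notation is ours, not the text's: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $n$ is the total number surveyed. With two populations, $r = 2$, so the general degrees of freedom $(r-1)(c-1)$ reduce to $c - 1$.

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i\,C_j}{n}, \qquad df = (r-1)(c-1)$$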
Requirements
All expected values in the table must be greater than or equal to five.
Common Uses
Comparing two populations, for example men vs. women, before vs. after, or east vs. west, on a categorical variable with more than two possible response values.
Suppose that 250 randomly selected male college students and 300 randomly selected female college students were asked about their living arrangements: dormitory, apartment, with parents, or other. The results are shown in Table 10.19. Do male and female college students have the same distribution of living arrangements? Use a level of significance of 0.05.
| | Dormitory | Apartment | With Parents | Other |
|---|---|---|---|---|
| Males | 72 | 84 | 49 | 45 |
| Females | 91 | 86 | 88 | 35 |
Table 10.19 Distribution of Living Arrangements for College Males and College Females
Solution 10.8
H0: The distribution of living arrangements for male college students is the same as the distribution of living arrangements for female college students.
H1: The distribution of living arrangements for male college students is not the same as the distribution of living arrangements for female college students.
Degrees of Freedom (df):
$df = (\text{number of columns}) - 1 = 4 - 1 = 3$
Distribution for the test:
$\chi_3^2$
Create the supporting tables
- Expand the given observed table to include the row totals and column totals.
| | Dormitory | Apartment | With Parents | Other | Row Totals |
|---|---|---|---|---|---|
| Males | 72 | 84 | 49 | 45 | 250 |
| Females | 91 | 86 | 88 | 35 | 300 |
| Column Totals | 163 | 170 | 137 | 80 | 550 |
- Create the table of expected values using the same formula for each value as we did with the test for independence in the previous section.
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
| | Dormitory | Apartment | With Parents | Other |
|---|---|---|---|---|
| Males | 74.091 | 77.273 | 62.273 | 36.364 |
| Females | 88.909 | 92.727 | 74.727 | 43.636 |
- Create the $O-E$ table, subtracting the values in the second table from the values in the first table.
| | Dormitory | Apartment | With Parents | Other |
|---|---|---|---|---|
| Males | -2.091 | 6.727 | -13.273 | 8.636 |
| Females | 2.091 | -6.727 | 13.273 | -8.636 |
- Create the $\frac{(O-E)^2}{E}$ table, whose entries are each cell's contribution to the test statistic, by squaring each value in the $O-E$ table and dividing it by the corresponding value in the expected value $E$ table.
| | Dormitory | Apartment | With Parents | Other |
|---|---|---|---|---|
| Males | 0.059 | 0.586 | 2.829 | 2.051 |
| Females | 0.049 | 0.488 | 2.358 | 1.709 |
Calculate the test statistic:
Add the values together in the $\frac{(O-E)^2}{E}$ table to get the test statistic.
$\chi^2 = 10.129$
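The supporting tables and the final sum can be reproduced with a short script. Below is a minimal Python sketch (the language and the helper name `homogeneity_statistic` are our own choices, not part of the text) that computes the row and column totals, the expected counts $E$, each cell's $\frac{(O-E)^2}{E}$, and their sum for the counts in Table 10.19. It also checks the requirement that every expected count is at least five.

```python
def homogeneity_statistic(observed):
    """Chi-square statistic for a test for homogeneity (hypothetical helper).

    observed: list of rows (one per population), each a list of counts
    (one per category). Returns (expected, contributions, chi_square).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total number surveyed

    # E = (row total)(column total) / (total number surveyed), cell by cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # (O - E)^2 / E for every cell
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    chi_square = sum(sum(row) for row in contributions)
    return expected, contributions, chi_square


# Table 10.19: rows are Males, Females;
# columns are Dormitory, Apartment, With Parents, Other.
living = [[72, 84, 49, 45],
          [91, 86, 88, 35]]

expected, contributions, chi_square = homogeneity_statistic(living)
print(all(e >= 5 for row in expected for e in row))  # True: requirement met
print(round(chi_square, 3))                          # 10.129
```

The script keeps full precision instead of rounding each table to three decimals, so its unrounded result (about 10.1287) may differ from a hand calculation in the last digit for other data sets.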
Probability statement:
In the same way we did for a Test for Independence, you can use the Chi-Square Distribution table to get a range for the p-value, or use the Google Sheets function CHISQ.DIST.RT, along with the test statistic $\chi^2 = 10.129$ and the degrees of freedom $df = 3$, to find the exact value.
=CHISQ.DIST.RT(10.129,3)
p-value = $P(\chi^2 > 10.129) = 0.0175$
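If a spreadsheet is not at hand, the right-tail probability can be computed directly. The following minimal Python sketch uses only the standard library; the function name `chi_square_right_tail` is hypothetical, and it assumes the degrees of freedom are a positive whole number, as they always are in this test. It relies on the fact that $P(\chi^2_{df} > x)$ equals the regularized upper incomplete gamma function $Q(df/2,\, x/2)$, which for whole-number df can be built up one step at a time from $Q(1, t) = e^{-t}$ or $Q(1/2, t) = \operatorname{erfc}(\sqrt{t})$.

```python
import math


def chi_square_right_tail(stat, df):
    """Return P(chi^2 > stat) for a chi-square distribution with integer df >= 1.

    P(chi^2 > stat) = Q(df/2, stat/2), where Q is the regularized upper
    incomplete gamma function. Starting from Q(1, t) = exp(-t) (even df) or
    Q(1/2, t) = erfc(sqrt(t)) (odd df), the recurrence
    Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1)
    climbs to a = df/2.
    """
    if stat <= 0:
        return 1.0
    t = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        # add t**a * exp(-t) / Gamma(a + 1), computed on the log scale
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Same inputs as =CHISQ.DIST.RT(10.129,3)
p_value = chi_square_right_tail(10.129, 3)
print(round(p_value, 4))  # 0.0175
```

For $df = 3$ the recurrence gives the closed form $P(\chi^2 > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which is what the function evaluates here.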
Compare α and the p-value: α = 0.05 and p-value = 0.0175, so α > p-value.
Make a decision: Since α > p-value, reject H0. The data provide evidence that the two distributions are not the same.
Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude that the distributions of living arrangements for male and female college students are not the same.
Notice that the conclusion is only that the distributions are not the same. We cannot use the test for homogeneity to draw any conclusions about how they differ.
Just like with a Test for Independence, you can find a critical value instead of a p-value. Then compare the test statistic to the critical value: reject H0 if the test statistic is greater than the critical value, and do not reject H0 otherwise.
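A critical value is the point $x$ with $P(\chi^2_{df} > x) = \alpha$. As a minimal sketch (Python, with hypothetical function names), it can be found by bisection, because the right-tail probability shrinks steadily as $x$ grows. The helper `chi_square_right_tail` computes $P(\chi^2_{df} > x)$ through the regularized upper incomplete gamma function and assumes a whole-number df.

```python
import math


def chi_square_right_tail(stat, df):
    """P(chi^2 > stat) for integer df >= 1, via the upper incomplete gamma recurrence."""
    if stat <= 0:
        return 1.0
    t = stat / 2.0
    a, q = (1.0, math.exp(-t)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(t)))
    while a < df / 2.0:
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_critical_value(alpha, df, tol=1e-9):
    """Find x with P(chi^2 > x) = alpha by bisection (hypothetical helper)."""
    lo, hi = 0.0, 1.0
    while chi_square_right_tail(hi, df) > alpha:  # widen until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_right_tail(mid, df) > alpha:
            lo = mid  # tail area still above alpha: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Living-arrangements example: alpha = 0.05, df = 3, test statistic 10.129
critical = chi_square_critical_value(0.05, 3)
print(round(critical, 3))   # 7.815
print(10.129 > critical)    # True, so reject H0, matching the p-value approach
```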
Do families and singles have the same distribution of cars? Suppose that 100 randomly selected families and 200 randomly selected singles were asked what type of car they drove: sport, sedan, hatchback, truck, or van/SUV. The results are shown in Table 10.20. Test at a level of significance of 0.05.
| | Sport | Sedan | Hatchback | Truck | Van/SUV |
|---|---|---|---|---|---|
| Family | 5 | 15 | 35 | 17 | 28 |
| Single | 45 | 65 | 37 | 46 | 7 |
Table 10.20
Both before and after a recent earthquake, surveys were conducted asking voters which of three candidates they planned to vote for in the upcoming city council election. Table 10.21 shows the results of the surveys. Has there been a change in the distribution of voter preferences since the earthquake? Use a level of significance of 0.05.
| | Perez | Chung | Stevens |
|---|---|---|---|
| Before | 167 | 128 | 135 |
| After | 214 | 197 | 225 |
Table 10.21
Solution 10.9
H0: The distribution of voter preferences was the same before and after the earthquake.
H1: The distribution of voter preferences was not the same before and after the earthquake.
Degrees of Freedom (df):
$df = (\text{number of columns}) - 1 = 3 - 1 = 2$
Distribution for the test:
$\chi_2^2$
Create the supporting tables
- Expand the given observed table to include the row totals and column totals.
| | Perez | Chung | Stevens | Row Totals |
|---|---|---|---|---|
| Before | 167 | 128 | 135 | 430 |
| After | 214 | 197 | 225 | 636 |
| Column Totals | 381 | 325 | 360 | 1066 |
- Create the table of expected values using the same formula for each value as we did with the test for independence in the previous section.
$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$
| | Perez | Chung | Stevens |
|---|---|---|---|
| Before | 153.687 | 131.098 | 145.216 |
| After | 227.313 | 193.902 | 214.784 |
- Create the $O-E$ table, subtracting the values in the second table from the values in the first table.
| | Perez | Chung | Stevens |
|---|---|---|---|
| Before | 13.313 | -3.098 | -10.216 |
| After | -13.313 | 3.098 | 10.216 |
- Create the $\frac{(O-E)^2}{E}$ table, whose entries are each cell's contribution to the test statistic, by squaring each value in the $O-E$ table and dividing it by the corresponding value in the expected value $E$ table.
| | Perez | Chung | Stevens |
|---|---|---|---|
| Before | 1.153 | 0.073 | 0.719 |
| After | 0.780 | 0.049 | 0.486 |
Calculate the test statistic:
$\chi^2 = 3.26$
Critical Value:
In the Chi-Square Distribution table, look in the α = 0.05 column and the df = 2 row.
The critical value is 5.991.
Compare the test statistic and the critical value:
3.26 < 5.991
Make a decision: Since the test statistic is less than the critical value, do not reject H0.
Conclusion: At a 5% level of significance, from the data, there is insufficient evidence to conclude that the distribution of voter preferences was not the same before and after the earthquake.
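Solution 10.9 can be checked end to end with a short Python sketch (our own illustration, not from the text). With $df = 2$ the chi-square right tail has the simple form $P(\chi^2 > x) = e^{-x/2}$, so the critical value can be computed exactly as $-2\ln\alpha$ instead of being read from the table; this shortcut works only for $df = 2$.

```python
import math

# Table 10.21: rows are Before, After; columns are Perez, Chung, Stevens.
observed = [[167, 128, 135],
            [214, 197, 225]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (O - E)^2 / E with E = (row total)(column total) / n
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

alpha = 0.05
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (2 - 1)(3 - 1) = 2
critical = -2 * math.log(alpha)                     # exact only because df == 2

print(round(chi_square, 2), round(critical, 3))  # 3.26 5.991
print("reject H0" if chi_square > critical else "do not reject H0")
```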
Ivy League schools receive many applications, but only some can be accepted. At the schools listed in Table 10.22, two types of applications are accepted: regular and early decision.
| Application Type Accepted | Brown | Columbia | Cornell | Dartmouth | Penn | Yale |
|---|---|---|---|---|---|---|
| Regular | 2,115 | 1,792 | 5,306 | 1,734 | 2,685 | 1,245 |
| Early Decision | 577 | 627 | 1,228 | 444 | 1,195 | 761 |
Table 10.22
We want to know if the number of regular applications accepted follows the same distribution as the number of early applications accepted. State the null and alternative hypotheses, the degrees of freedom and the test statistic, sketch the graph of the p-value, and draw a conclusion about the test of homogeneity.