Confidence Intervals with Two Samples

39 Confidence Intervals for the Difference Between Two Means: Independent Samples


How can we tell if exercise is actually good for you? One indicator of good health is a low (but not too low) resting heart rate, so if people who exercise regularly have a lower average resting heart rate than those who do not, this may be taken as evidence that exercise improves a person's health.

Construct Confidence Intervals for the Difference Between Two Population Means

As in the situation above, suppose we want to show evidence that running as exercise leads to improved health. We first find a large sample of individuals to take part in this study, and then divide them into two groups. The first group is made up of participants who say they run for at least 30 minutes, at least 3 days per week. The second group is made up of everyone else: those who say they do not run for at least 30 minutes, at least 3 days per week.

We take the symbol $\mu_1$ to be the mean resting heart rate of all people who run regularly (at least 30 minutes, at least 3 days per week), and we take $\mu_2$ to be the mean resting heart rate of all others who do not run regularly; these are the population means for the two populations. We are interested in creating a confidence interval to estimate $\mu_1 - \mu_2$, the difference in the mean resting heart rates of the two populations. To do this, as was the case with confidence intervals for single populations, we need a point estimate, a standard deviation, and a critical value.

The best point estimate for the difference between the two population means $\mu_1 - \mu_2$ is the difference between our sample means $\bar x_1 - \bar x_2$ from our observations.

Because the sample means $\bar x_1$ and $\bar x_2$ have variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$ respectively, where $n_1$ and $n_2$ are the sample sizes, and because the two samples are independent, the variance of the difference $\bar x_1 - \bar x_2$ is the sum of the variances $\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}$. In general, if we don’t know the population means (these are in essence what we are trying to estimate), we also don’t know the population variances $\sigma_1^2$ and $\sigma_2^2$, so we use the sample variances $s_1^2$ and $s_2^2$ as our best estimates. The standard deviation of the difference (often called the standard error) is the square root of this estimated variance, or

$\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}$
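As a rough illustration of this formula, the following Python sketch computes the standard deviation of the difference from two lists of observations. The function name `standard_error` and the heart-rate values are hypothetical, invented only for this example; they are not data from a real study.

```python
import math
import statistics

def standard_error(sample1, sample2):
    """Estimated standard deviation of (mean of sample1 - mean of sample2)."""
    # statistics.variance returns the sample variance s^2 (divides by n - 1).
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    return math.sqrt(var1 / len(sample1) + var2 / len(sample2))

# Hypothetical resting heart rates in beats per minute (illustration only).
runners = [58, 62, 60, 64, 56]
non_runners = [70, 74, 68, 72, 76, 72]

print(standard_error(runners, non_runners))
```

For these made-up values the sample variances are 10 and 8, so the result is $\sqrt{10/5 + 8/6} \approx 1.83$.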

For our critical value, we use the same approach as we did for confidence intervals estimating a single population mean. We take $1-\alpha$ to be the confidence level, and then find the critical value $t_{\alpha/2}$ using the Student’s $t$ distribution. In order to find the critical value, we need the degrees of freedom; here we use the simple, conservative rule of taking 1 less than the smaller sample size, that is, the smaller of $n_1-1$ and $n_2-1$.
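In practice the critical value is usually read from a $t$ table or obtained from statistical software. For readers who want to see where the number comes from, here is a minimal sketch that uses only the Python standard library: it integrates the Student's $t$ density numerically (Simpson's rule) and searches for the cutoff by bisection. The function names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical helpers written for this illustration, not part of any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on the interval [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the probability lies to the left of 0.
    return 0.5 + total * h / 3

def t_critical(confidence, n1, n2):
    """Critical value t_{alpha/2} with df = min(n1, n2) - 1."""
    df = min(n1, n2) - 1
    target = 1 - (1 - confidence) / 2   # area to the left of t_{alpha/2}
    lo, hi = 0.0, 1000.0
    for _ in range(60):                 # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence with sample sizes 5 and 6, so df = 4
print(round(t_critical(0.95, 5, 6), 3))  # 2.776, matching a standard t table
```

The search range and step count are arbitrary choices that are adequate for common confidence levels; a table or a statistics library is the more reliable route for real work.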

Once we have each value above (the point estimate, the standard deviation, and the critical value), we calculate the margin of error using the formula $$E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$

Finally, we can construct the confidence interval: $$(\bar{x}_1-\bar{x}_2) - E<\mu_1-\mu_2<(\bar{x}_1-\bar{x}_2) + E$$

Procedure for Constructing a Confidence Interval for $\mu_1-\mu_2$ with Independent Samples

  1. Compute the sample means $\bar x_1$ and $\bar x_2$ (unless they are given). Then compute the point estimate $\bar x_1 - \bar x_2$.
  2. Calculate the degrees of freedom, which is the smaller of $n_1-1$ and $n_2-1$, and then find the critical value $t_{\alpha/2}$.
  3. Calculate the standard deviation $\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$.
  4. Use the above values to find the margin of error $E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$.
  5. Calculate the lower bound of the confidence interval $(\bar{x}_1-\bar{x}_2) - E$.
  6. Calculate the upper bound of the confidence interval $(\bar{x}_1-\bar{x}_2) + E$.
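
The six steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `two_mean_confidence_interval` is hypothetical, the summary statistics are invented for the example, and the critical value is assumed to have been looked up in a $t$ table for the appropriate degrees of freedom.

```python
import math

def two_mean_confidence_interval(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Confidence interval for mu_1 - mu_2 from summary statistics.

    var1 and var2 are the sample variances s_1^2 and s_2^2. t_crit is the
    critical value t_{alpha/2}, looked up for df = min(n1, n2) - 1.
    """
    point_estimate = mean1 - mean2                  # step 1
    std_dev = math.sqrt(var1 / n1 + var2 / n2)      # step 3
    margin = t_crit * std_dev                       # step 4
    lower = point_estimate - margin                 # step 5
    upper = point_estimate + margin                 # step 6
    return lower, upper

# Hypothetical summary statistics (illustration only):
# runners:     mean 60, sample variance 10, n = 5
# non-runners: mean 72, sample variance 8,  n = 6
n1, n2 = 5, 6
df = min(n1 - 1, n2 - 1)   # step 2: df = 4
t_crit = 2.776             # t_{0.025} for df = 4, from a standard t table

lower, upper = two_mean_confidence_interval(60, 10, n1, 72, 8, n2, t_crit)
print(f"{lower:.2f} < mu_1 - mu_2 < {upper:.2f}")
# prints: -17.07 < mu_1 - mu_2 < -6.93
```

With these made-up numbers the whole interval lies below zero, which would suggest that the mean resting heart rate of runners is lower than that of non-runners.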

 
