{"id":111,"date":"2021-01-12T22:19:54","date_gmt":"2021-01-12T22:19:54","guid":{"rendered":"https:\/\/textbooks.jaykesler.net\/introstats\/chapter\/testing-the-significance-of-the-correlation-coefficient\/"},"modified":"2023-08-04T18:18:40","modified_gmt":"2023-08-04T18:18:40","slug":"testing-the-significance-of-the-correlation-coefficient","status":"publish","type":"chapter","link":"https:\/\/textbooks.jaykesler.net\/introstats\/chapter\/testing-the-significance-of-the-correlation-coefficient\/","title":{"rendered":"Testing the Significance of the Correlation Coefficient"},"content":{"raw":"<span style=\"display: none;\">\r\n[latexpage]\r\n<\/span>\r\n<div id=\"0d551ce0-f934-4743-b7c1-b60fa4d19cbc\" class=\"chapter-content-module\" data-type=\"page\" data-cnxml-to-html-ver=\"2.1.0\">\r\n<p id=\"eip-298\">The correlation coefficient, <em data-effect=\"italics\">r<\/em>, tells us about the strength and direction of the linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient <em data-effect=\"italics\">r<\/em> and the sample size <em data-effect=\"italics\">n<\/em>, together.<\/p>\r\n<p id=\"element-3884\">We perform a hypothesis test of the <strong> \"significance of the correlation coefficient\" <\/strong> to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.<\/p>\r\n<p id=\"eip-413\">The sample data are used to compute <em data-effect=\"italics\">r<\/em>, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. 
The sample correlation coefficient, <em data-effect=\"italics\">r<\/em>, is our estimate of the unknown population correlation coefficient.<\/p>\r\n\r\n<ul id=\"eip-560\" data-labeled-item=\"true\">\r\n \t<li>The symbol for the population correlation coefficient is <em data-effect=\"italics\">\u03c1<\/em>, the Greek letter \"rho.\"<\/li>\r\n \t<li><em data-effect=\"italics\">\u03c1<\/em> = population correlation coefficient (unknown)<\/li>\r\n \t<li><em data-effect=\"italics\">r<\/em> = sample correlation coefficient (known; calculated from sample data)<\/li>\r\n<\/ul>\r\n<p id=\"eip-137\">The hypothesis test lets us decide whether the value of the population correlation coefficient <em data-effect=\"italics\">\u03c1<\/em> is \"close to zero\" or \"significantly different from zero\". We decide this based on the sample correlation coefficient <em data-effect=\"italics\">r<\/em> and the sample size <em data-effect=\"italics\">n<\/em>.<\/p>\r\n<p id=\"fs-idm38597088\"><span data-type=\"title\">If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is \"significant.\"<\/span><\/p>\r\n\r\n<ul id=\"eip-9\">\r\n \t<li>Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> because the correlation coefficient is significantly different from zero.<\/li>\r\n \t<li>What the conclusion means: There is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. 
We can use the regression line to model the linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\r\n<\/ul>\r\n<p id=\"fs-idp40539792\"><span data-type=\"title\">If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is \"not significant.\"<\/span><\/p>\r\n\r\n<ul id=\"eip-663\">\r\n \t<li>Conclusion: There is insufficient evidence to conclude that there is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> because the correlation coefficient is not significantly different from zero.<\/li>\r\n \t<li>What the conclusion means: There is not a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. Therefore, we CANNOT use the regression line to model a linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\r\n<\/ul>\r\n<div id=\"eip-939\" class=\"ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\" data-type=\"title\"><span id=\"1\" class=\"os-title-label\" data-type=\"\">Note<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"os-note-body\">\r\n<ul id=\"eip-id1164926616686\">\r\n \t<li>If <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, the line can be used to predict the value of <em data-effect=\"italics\">y<\/em> for values of <em data-effect=\"italics\">x<\/em> that are within the domain of observed <em data-effect=\"italics\">x<\/em> values.<\/li>\r\n \t<li>If <em data-effect=\"italics\">r<\/em> is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.<\/li>\r\n \t<li>If <em data-effect=\"italics\">r<\/em> is significant 
and if the scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed <em data-effect=\"italics\">x<\/em> values in the data.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/section><\/div>\r\n<section id=\"fs-idm115676240\" data-depth=\"1\">\r\n<h3 data-type=\"title\">PERFORMING THE HYPOTHESIS TEST<\/h3>\r\n<ul id=\"eip-375\">\r\n \t<li><strong> Null Hypothesis: <em data-effect=\"italics\">H<sub>0<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> = 0 <\/strong><\/li>\r\n \t<li><strong>Alternate Hypothesis: <em data-effect=\"italics\">H<sub>1<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> \u2260 0<\/strong><\/li>\r\n<\/ul>\r\n<p id=\"eip-422\"><span data-type=\"title\">WHAT THE HYPOTHESES MEAN IN WORDS:<\/span><\/p>\r\n\r\n<ul id=\"eip-761\">\r\n \t<li><strong>Null Hypothesis <em data-effect=\"italics\">H<sub>0<\/sub><\/em>:<\/strong> The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\r\n \t<li><strong>Alternate Hypothesis <em data-effect=\"italics\">H<sub>1<\/sub><\/em>:<\/strong> The population correlation coefficient IS significantly DIFFERENT FROM zero. 
There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\r\n<\/ul>\r\n<p id=\"fs-idm162949504\"><span data-type=\"title\">DRAWING A CONCLUSION:<\/span><\/p>\r\nWe will use a <a href=\"\/introstats\/wp-content\/uploads\/sites\/2\/2021\/09\/Pearson-Correlation-Critical-Values.pdf\">table of critical values<\/a> to draw a conclusion about the test.\r\n<p id=\"eip-870\">Throughout this chapter we use a significance level of 5% (<em data-effect=\"italics\">\u03b1<\/em> = 0.05); homework problems may use a different level.<\/p>\r\n\r\n<section id=\"fs-idp18638240\" data-depth=\"2\"><\/section><section id=\"fs-idp201258800\" data-depth=\"2\">\r\n<h4 data-type=\"title\">Using a table of Critical Values to make a decision<\/h4>\r\nCompare <em data-effect=\"italics\">r<\/em> to the critical value in the table row for <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2. If <em data-effect=\"italics\">r<\/em> is not between the negative and positive critical values, then the correlation coefficient is significant. If <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, then the line may be used for prediction.\r\n<div id=\"element-684\" class=\"ui-has-child-title\" data-type=\"example\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.7<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"body\">\r\n<p id=\"element-798\">Suppose you computed <em data-effect=\"italics\">r<\/em> = 0.801 using <em data-effect=\"italics\">n<\/em> = 10 data points. <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 10 \u2013 2 = 8. The critical values associated with <em data-effect=\"italics\">df<\/em> = 8 are \u20130.632 and +0.632. If <em data-effect=\"italics\">r<\/em> &lt; negative critical value or <em data-effect=\"italics\">r<\/em> &gt; positive critical value, then <em data-effect=\"italics\">r<\/em> is significant. 
Since <em data-effect=\"italics\">r<\/em> = 0.801 and 0.801 &gt; 0.632, <em data-effect=\"italics\">r<\/em> is significant and the line may be used for prediction. It may help to view this example on a number line.<\/p>\r\n\r\n<div id=\"id1165447908809\" class=\"os-figure\">\r\n<figure data-id=\"id1165447908809\"><span id=\"id1165447908813\" data-type=\"media\" data-alt=\"Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.\"><img class=\"alignnone size-full wp-image-597\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d.jpg\" alt=\"Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.\" width=\"731\" height=\"104\" \/><\/span><\/figure>\r\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.14<\/span> <span class=\"os-caption\">Values of <em data-effect=\"italics\">r<\/em> between \u20130.632 and +0.632 are not significant. <em data-effect=\"italics\">r<\/em> = 0.801 &gt; +0.632. 
Therefore, <em data-effect=\"italics\">r<\/em> is significant.<\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<div id=\"fs-idp24557360\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.7<\/span><\/h3>\r\n<\/header><section>\r\n<div id=\"eip-97\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"eip-646\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"eip-154\">For a given line of best fit, you computed <em data-effect=\"italics\">r<\/em> = 0.6501 using <em data-effect=\"italics\">n<\/em> = 12 data points, and the critical value is 0.576. Can the line be used for prediction? Why or why not?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<div id=\"element-358\" class=\"ui-has-child-title\" data-type=\"example\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.8<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"body\">\r\n<p id=\"element-206\">Suppose you computed <em data-effect=\"italics\">r<\/em> = \u20130.624 with 14 data points. <em data-effect=\"italics\">df<\/em> = 14 \u2013 2 = 12. The critical values are \u20130.532 and 0.532. 
Since \u20130.624 &lt; \u20130.532, <em data-effect=\"italics\">r<\/em> is significant and the line can be used for prediction.<\/p>\r\n\r\n<div id=\"id1165447908929\" class=\"os-figure\">\r\n<figure data-id=\"id1165447908929\"><span id=\"id1165447908933\" data-type=\"media\" data-alt=\"Horizontal number line with values of -0.624, -0.532, and 0.532.\"><img class=\"alignnone size-full wp-image-599\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334.jpg\" alt=\"Horizontal number line with values of -0.624, -0.532, and 0.532.\" width=\"731\" height=\"58\" \/><\/span><\/figure>\r\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.15<\/span> <span class=\"os-caption\"><em data-effect=\"italics\">r<\/em> = \u20130.624 &lt; \u20130.532. Therefore, <em data-effect=\"italics\">r<\/em> is significant.<\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<div id=\"fs-idp165498288\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.8<\/span><\/h3>\r\n<\/header><section>\r\n<div id=\"eip-847\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"eip-834\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"eip-262\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = 0.5204 using <em data-effect=\"italics\">n<\/em> = 9 data points, and the critical value is 0.666. Can the line be used for prediction? 
Why or why not?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<div id=\"element-719\" class=\"ui-has-child-title\" data-type=\"example\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.9<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"body\">\r\n<p id=\"element-446\">Suppose you computed <em data-effect=\"italics\">r<\/em> = 0.776 and <em data-effect=\"italics\">n<\/em> = 6. <em data-effect=\"italics\">df<\/em> = 6 \u2013 2 = 4. The critical values are \u20130.811 and 0.811. Since \u20130.811 &lt; 0.776 &lt; 0.811, <em data-effect=\"italics\">r<\/em> is not significant, and the line should not be used for prediction.<\/p>\r\n\r\n<div id=\"linrgs_facts5\" class=\"os-figure\">\r\n<figure data-id=\"linrgs_facts5\"><span id=\"id1165447912701\" data-type=\"media\" data-alt=\"Horizontal number line with values -0.924, -0.532, and 0.532.\"><img class=\"alignnone size-full wp-image-600\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde.jpg\" alt=\"Horizontal number line with values -0.924, -0.532, and 0.532.\" width=\"731\" height=\"52\" \/><\/span><\/figure>\r\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.16<\/span> <span class=\"os-caption\">-0.811 &lt; <em data-effect=\"italics\">r<\/em> = 0.776 &lt; 0.811. 
Therefore, <em data-effect=\"italics\">r<\/em> is not significant.<\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<div id=\"fs-idp121877392\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.9<\/span><\/h3>\r\n<\/header><section>\r\n<div id=\"eip-610\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"eip-276\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"eip-396\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = \u20130.7204 using <em data-effect=\"italics\">n<\/em> = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<\/section><\/section><section id=\"fs-idm89842304\" data-depth=\"1\">\r\n<h3 data-type=\"title\">THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method<\/h3>\r\n<p id=\"eip-666\">Consider the <a href=\"12-3-the-regression-equation#element-22\">third exam\/final exam example<\/a>.\r\nThe line of best fit is <em data-effect=\"italics\">\u0177<\/em> = \u2013173.51 + 4.83<em data-effect=\"italics\">x<\/em>, with <em data-effect=\"italics\">r<\/em> = 0.6631 and <em data-effect=\"italics\">n<\/em> = 11 data points. Can the regression line be used for prediction? 
<strong>Given a third-exam score (<em data-effect=\"italics\">x<\/em> value), can we use the line to predict the final exam score (predicted <em data-effect=\"italics\">y<\/em> value)?<\/strong><\/p>\r\n\r\n<ul id=\"eip-830\" data-labeled-item=\"true\">\r\n \t<li><em data-effect=\"italics\">H<sub>0<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> = 0<\/li>\r\n \t<li><em data-effect=\"italics\">H<sub>1<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> \u2260 0<\/li>\r\n \t<li><em data-effect=\"italics\">\u03b1<\/em> = 0.05<\/li>\r\n<\/ul>\r\n<ul id=\"eip-557\" data-bullet-style=\"bullet\">\r\n \t<li>Use the \"95% Critical Value\" table for <em data-effect=\"italics\">r<\/em> with <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 11 \u2013 2 = 9.<\/li>\r\n \t<li>The critical values are \u20130.602 and +0.602.<\/li>\r\n \t<li>Since 0.6631 &gt; 0.602, <em data-effect=\"italics\">r<\/em> is significant.<\/li>\r\n \t<li>Decision: Reject the null hypothesis.<\/li>\r\n \t<li>Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (<em data-effect=\"italics\">x<\/em>) and the final exam score (<em data-effect=\"italics\">y<\/em>) because the correlation coefficient is significantly different from zero.<\/li>\r\n<\/ul>\r\n<p id=\"eip-138\"><strong>Because <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.<\/strong><\/p>\r\n\r\n<div id=\"element-433\" class=\"ui-has-child-title\" data-type=\"example\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.10<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"body\">\r\n<p id=\"element-294\">Suppose you computed the following correlation coefficients. 
Using the table at the end of the chapter, determine whether <em data-effect=\"italics\">r<\/em> is significant and whether the line of best fit associated with each <em data-effect=\"italics\">r<\/em> can be used to predict a <em data-effect=\"italics\">y<\/em> value. If it helps, draw a number line.<\/p>\r\n\r\n<ol id=\"element-467\" type=\"a\">\r\n \t<li><em data-effect=\"italics\">r<\/em> = \u20130.567 and the sample size, <em data-effect=\"italics\">n<\/em>, is 19. The <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 17. The critical value is \u20130.456. \u20130.567 &lt; \u20130.456, so <em data-effect=\"italics\">r<\/em> is significant.<\/li>\r\n \t<li><em data-effect=\"italics\">r<\/em> = 0.708 and the sample size, <em data-effect=\"italics\">n<\/em>, is nine. The <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 7. The critical value is 0.666. 0.708 &gt; 0.666, so <em data-effect=\"italics\">r<\/em> is significant.<\/li>\r\n \t<li><em data-effect=\"italics\">r<\/em> = 0.134 and the sample size, <em data-effect=\"italics\">n<\/em>, is 14. The <em data-effect=\"italics\">df<\/em> = 14 \u2013 2 = 12. The critical value is 0.532. 0.134 is between \u20130.532 and 0.532, so <em data-effect=\"italics\">r<\/em> is not significant.<\/li>\r\n \t<li><em data-effect=\"italics\">r<\/em> = 0 and the sample size, <em data-effect=\"italics\">n<\/em>, is five. 
Whatever the value of <em data-effect=\"italics\">df<\/em>, <em data-effect=\"italics\">r<\/em> = 0 lies between the two critical values, so <em data-effect=\"italics\">r<\/em> is not significant.<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/section><\/div>\r\n<div id=\"fs-idp56886288\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.10<\/span><\/h3>\r\n<\/header><section>\r\n<div id=\"eip-316\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"eip-112\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"eip-483\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = 0 using <em data-effect=\"italics\">n<\/em> = 100 data points. Can the line be used for prediction? Why or why not?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<\/section><section id=\"eip-570\" data-depth=\"1\">\r\n<h3 data-type=\"title\">Assumptions in Testing the Significance of the Correlation Coefficient<\/h3>\r\n<p id=\"eip-485\">Testing the significance of the correlation coefficient requires that certain assumptions about the data be satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. 
We are examining the sample to draw a conclusion about whether the linear relationship that we see between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the sample data provides strong enough evidence to conclude that there is a linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/p>\r\n<p id=\"eip-953\">The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatter plot and testing the significance of the correlation coefficient help us determine whether it is appropriate to do this.<\/p>\r\n\r\n<div id=\"eip-873\" data-type=\"list\">\r\n<div id=\"14\" data-type=\"title\">The assumptions underlying the test of significance are:<\/div>\r\n<ul>\r\n \t<li>There is a linear relationship in the population that models the average value of <em data-effect=\"italics\">y<\/em> for varying values of <em data-effect=\"italics\">x<\/em>. In other words, the expected value of <em data-effect=\"italics\">y<\/em> for each particular value of <em data-effect=\"italics\">x<\/em> lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)<\/li>\r\n \t<li>The <em data-effect=\"italics\">y<\/em> values for any particular <em data-effect=\"italics\">x<\/em> value are normally distributed about the line. This implies that there are more <em data-effect=\"italics\">y<\/em> values scattered closer to the line than are scattered farther away. 
The first assumption implies that these normal distributions are centered on the line: the means of these normal distributions of <em data-effect=\"italics\">y<\/em> values lie on the line.<\/li>\r\n \t<li>The standard deviations of the population <em data-effect=\"italics\">y<\/em> values about the line are equal for each value of <em data-effect=\"italics\">x<\/em>. In other words, each of these normal distributions of <em data-effect=\"italics\">y<\/em> values has the same shape and spread about the line.<\/li>\r\n \t<li>The residual errors are mutually independent (no pattern).<\/li>\r\n \t<li>The data are produced from a well-designed, random sample or randomized experiment.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<div id=\"linrgs_facts_normal\" class=\"os-figure\">\r\n<figure data-id=\"linrgs_facts_normal\"><span id=\"id9999999999999\" data-type=\"media\" data-alt=\"The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line \u2014 they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.\">\r\n<img class=\"alignnone size-full wp-image-601\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab.jpg\" alt=\"The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line \u2014 they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. 
A vertical normal curve is shown for each line.\" width=\"731\" height=\"362\" \/>\r\n<\/span><\/figure>\r\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.17<\/span> <span class=\"os-caption\">The <em data-effect=\"italics\">y<\/em> values for each <em data-effect=\"italics\">x<\/em> value are normally distributed about the line with the same standard deviation. For each <em data-effect=\"italics\">x<\/em> value, the mean of the <em data-effect=\"italics\">y<\/em> values lies on the regression line. More <em data-effect=\"italics\">y<\/em> values lie near the line than are scattered further away from the line.\r\n<\/span><\/div>\r\n<\/div>\r\n<\/section><\/div>","rendered":"<p><span style=\"display: none;\"><br \/>\n[latexpage]<br \/>\n<\/span><\/p>\n<div id=\"0d551ce0-f934-4743-b7c1-b60fa4d19cbc\" class=\"chapter-content-module\" data-type=\"page\" data-cnxml-to-html-ver=\"2.1.0\">\n<p id=\"eip-298\">The correlation coefficient, <em data-effect=\"italics\">r<\/em>, tells us about the strength and direction of the linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient <em data-effect=\"italics\">r<\/em> and the sample size <em data-effect=\"italics\">n<\/em>, together.<\/p>\n<p id=\"element-3884\">We perform a hypothesis test of the <strong> &#8220;significance of the correlation coefficient&#8221; <\/strong> to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.<\/p>\n<p id=\"eip-413\">The sample data are used to compute <em data-effect=\"italics\">r<\/em>, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. 
But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, <em data-effect=\"italics\">r<\/em>, is our estimate of the unknown population correlation coefficient.<\/p>\n<ul id=\"eip-560\" data-labeled-item=\"true\">\n<li>The symbol for the population correlation coefficient is <em data-effect=\"italics\">\u03c1<\/em>, the Greek letter &#8220;rho.&#8221;<\/li>\n<li><em data-effect=\"italics\">\u03c1<\/em> = population correlation coefficient (unknown)<\/li>\n<li><em data-effect=\"italics\">r<\/em> = sample correlation coefficient (known; calculated from sample data)<\/li>\n<\/ul>\n<p id=\"eip-137\">The hypothesis test lets us decide whether the value of the population correlation coefficient <em data-effect=\"italics\">\u03c1<\/em> is &#8220;close to zero&#8221; or &#8220;significantly different from zero&#8221;. We decide this based on the sample correlation coefficient <em data-effect=\"italics\">r<\/em> and the sample size <em data-effect=\"italics\">n<\/em>.<\/p>\n<p id=\"fs-idm38597088\"><span data-type=\"title\">If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is &#8220;significant.&#8221;<\/span><\/p>\n<ul id=\"eip-9\">\n<li>Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> because the correlation coefficient is significantly different from zero.<\/li>\n<li>What the conclusion means: There is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. 
We can use the regression line to model the linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\n<\/ul>\n<p id=\"fs-idp40539792\"><span data-type=\"title\">If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is &#8220;not significant.&#8221;<\/span><\/p>\n<ul id=\"eip-663\">\n<li>Conclusion: There is insufficient evidence to conclude that there is a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> because the correlation coefficient is not significantly different from zero.<\/li>\n<li>What the conclusion means: There is not a significant linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em>. Therefore, we CANNOT use the regression line to model a linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\n<\/ul>\n<div id=\"eip-939\" class=\"ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\" data-type=\"title\"><span id=\"1\" class=\"os-title-label\" data-type=\"\">Note<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"os-note-body\">\n<ul id=\"eip-id1164926616686\">\n<li>If <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, the line can be used to predict the value of <em data-effect=\"italics\">y<\/em> for values of <em data-effect=\"italics\">x<\/em> that are within the domain of observed <em data-effect=\"italics\">x<\/em> values.<\/li>\n<li>If <em data-effect=\"italics\">r<\/em> is not significant OR if the scatter plot does not show a linear trend, the line should not be used for prediction.<\/li>\n<li>If <em data-effect=\"italics\">r<\/em> is significant and if the 
scatter plot shows a linear trend, the line may NOT be appropriate or reliable for prediction OUTSIDE the domain of observed <em data-effect=\"italics\">x<\/em> values in the data.<\/li>\n<\/ul>\n<\/div>\n<\/section>\n<\/div>\n<section id=\"fs-idm115676240\" data-depth=\"1\">\n<h3 data-type=\"title\">PERFORMING THE HYPOTHESIS TEST<\/h3>\n<ul id=\"eip-375\">\n<li><strong> Null Hypothesis: <em data-effect=\"italics\">H<sub>0<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> = 0 <\/strong><\/li>\n<li><strong>Alternate Hypothesis: <em data-effect=\"italics\">H<sub>1<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> \u2260 0<\/strong><\/li>\n<\/ul>\n<p id=\"eip-422\"><span data-type=\"title\">WHAT THE HYPOTHESES MEAN IN WORDS:<\/span><\/p>\n<ul id=\"eip-761\">\n<li><strong>Null Hypothesis <em data-effect=\"italics\">H<sub>0<\/sub><\/em>:<\/strong> The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\n<li><strong>Alternate Hypothesis <em data-effect=\"italics\">H<sub>1<\/sub><\/em>:<\/strong> The population correlation coefficient IS significantly DIFFERENT FROM zero. 
There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/li>\n<\/ul>\n<p id=\"fs-idm162949504\"><span data-type=\"title\">DRAWING A CONCLUSION:<\/span><\/p>\n<p>We will use a <a href=\"\/introstats\/wp-content\/uploads\/sites\/2\/2021\/09\/Pearson-Correlation-Critical-Values.pdf\">table of critical values<\/a> to draw a conclusion about the test.<\/p>\n<p id=\"eip-870\">Throughout this chapter we use a significance level of 5% (<em data-effect=\"italics\">\u03b1<\/em> = 0.05); homework problems may use a different level.<\/p>\n<section id=\"fs-idp18638240\" data-depth=\"2\"><\/section>\n<section id=\"fs-idp201258800\" data-depth=\"2\">\n<h4 data-type=\"title\">Using a table of Critical Values to make a decision<\/h4>\n<p>Compare <em data-effect=\"italics\">r<\/em> to the critical value in the table row for <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> &#8211; 2. If <em data-effect=\"italics\">r<\/em> is not between the negative and positive critical values, then the correlation coefficient is significant. If <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, then the line may be used for prediction.<\/p>\n<div id=\"element-684\" class=\"ui-has-child-title\" data-type=\"example\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.7<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"body\">\n<p id=\"element-798\">Suppose you computed <em data-effect=\"italics\">r<\/em> = 0.801 using <em data-effect=\"italics\">n<\/em> = 10 data points. <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> &#8211; 2 = 10 &#8211; 2 = 8. The critical values associated with <em data-effect=\"italics\">df<\/em> = 8 are \u20130.632 and +0.632. 
If <em data-effect=\"italics\">r<\/em> &lt; negative critical value or <em data-effect=\"italics\">r<\/em> &gt; positive critical value, then <em data-effect=\"italics\">r<\/em> is significant. Since <em data-effect=\"italics\">r<\/em> = 0.801 and 0.801 &gt; 0.632, <em data-effect=\"italics\">r<\/em> is significant and the line may be used for prediction. If you view this example on a number line, it will help you.<\/p>\n<div id=\"id1165447908809\" class=\"os-figure\">\n<figure data-id=\"id1165447908809\"><span id=\"id1165447908813\" data-type=\"media\" data-alt=\"Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. A dashed line above values -0.632, 0, and 0.632 indicates not significant values.\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-597\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d.jpg\" alt=\"Horizontal number line with values of -1, -0.632, 0, 0.632, 0.801, and 1. 
A dashed line above values -0.632, 0, and 0.632 indicates not significant values.\" width=\"731\" height=\"104\" srcset=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d.jpg 731w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d-300x43.jpg 300w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d-65x9.jpg 65w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d-225x32.jpg 225w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/b13b338698cdb76c5de67a1a677fc7fd93ec542d-350x50.jpg 350w\" sizes=\"auto, (max-width: 731px) 100vw, 731px\" \/><\/span><\/figure>\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.14<\/span> <span class=\"os-caption\">Values of <em data-effect=\"italics\">r<\/em> between -0.632 and +0.632 are not significant. Since <em data-effect=\"italics\">r<\/em> = 0.801 &gt; +0.632, <em data-effect=\"italics\">r<\/em> is significant.<\/span><\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"fs-idp24557360\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.7<\/span><\/h3>\n<\/header>\n<section>\n<div id=\"eip-97\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"eip-646\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"eip-154\">For a given line of best fit, you computed that <em data-effect=\"italics\">r<\/em> = 0.6501 using <em data-effect=\"italics\">n<\/em> = 12 data points, and the critical value is 0.576. 
Can the line be used for prediction? Why or why not?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"element-358\" class=\"ui-has-child-title\" data-type=\"example\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.8<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"body\">\n<p id=\"element-206\">Suppose you computed <em data-effect=\"italics\">r<\/em> = \u20130.624 with 14 data points. <em data-effect=\"italics\">df<\/em> = 14 \u2013 2 = 12. The critical values are \u20130.532 and 0.532. Since \u20130.624 &lt; \u20130.532, <em data-effect=\"italics\">r<\/em> is significant and the line can be used for prediction.<\/p>\n<div id=\"id1165447908929\" class=\"os-figure\">\n<figure data-id=\"id1165447908929\"><span id=\"id1165447908933\" data-type=\"media\" data-alt=\"Horizontal number line with values of -0.624, -0.532, and 0.532.\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-599\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334.jpg\" alt=\"Horizontal number line with values of -0.624, -0.532, and 0.532.\" width=\"731\" height=\"58\" srcset=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334.jpg 731w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334-300x24.jpg 300w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334-65x5.jpg 65w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334-225x18.jpg 225w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/009f8216453749c0e9378d40bdedd6430ad56334-350x28.jpg 350w\" 
sizes=\"auto, (max-width: 731px) 100vw, 731px\" \/><\/span><\/figure>\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.15<\/span> <span class=\"os-caption\"><em data-effect=\"italics\">r<\/em> = \u20130.624 &lt; \u20130.532. Therefore, <em data-effect=\"italics\">r<\/em> is significant.<\/span><\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"fs-idp165498288\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.8<\/span><\/h3>\n<\/header>\n<section>\n<div id=\"eip-847\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"eip-834\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"eip-262\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = 0.5204 using <em data-effect=\"italics\">n<\/em> = 9 data points, and the critical value is 0.666. Can the line be used for prediction? Why or why not?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"element-719\" class=\"ui-has-child-title\" data-type=\"example\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.9<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"body\">\n<p id=\"element-446\">Suppose you computed <em data-effect=\"italics\">r<\/em> = 0.776 and <em data-effect=\"italics\">n<\/em> = 6. <em data-effect=\"italics\">df<\/em> = 6 \u2013 2 = 4. The critical values are \u20130.811 and 0.811. 
Since \u20130.811 &lt; 0.776 &lt; 0.811, <em data-effect=\"italics\">r<\/em> is not significant, and the line should not be used for prediction.<\/p>\n<div id=\"linrgs_facts5\" class=\"os-figure\">\n<figure data-id=\"linrgs_facts5\"><span id=\"id1165447912701\" data-type=\"media\" data-alt=\"Horizontal number line with values -0.924, -0.532, and 0.532.\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-600\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde.jpg\" alt=\"Horizontal number line with values -0.924, -0.532, and 0.532.\" width=\"731\" height=\"52\" srcset=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde.jpg 731w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde-300x21.jpg 300w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde-65x5.jpg 65w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde-225x16.jpg 225w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/1bab12a894f1088eca8b9ef9185221bf46a06dde-350x25.jpg 350w\" sizes=\"auto, (max-width: 731px) 100vw, 731px\" \/><\/span><\/figure>\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.16<\/span> <span class=\"os-caption\">-0.811 &lt; <em data-effect=\"italics\">r<\/em> = 0.776 &lt; 0.811. 
Therefore, <em data-effect=\"italics\">r<\/em> is not significant.<\/span><\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"fs-idp121877392\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.9<\/span><\/h3>\n<\/header>\n<section>\n<div id=\"eip-610\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"eip-276\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"eip-396\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = \u20130.7204 using <em data-effect=\"italics\">n<\/em> = 8 data points, and the critical value is 0.707. Can the line be used for prediction? Why or why not?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/section>\n<section id=\"fs-idm89842304\" data-depth=\"1\">\n<h3 data-type=\"title\">THIRD-EXAM vs FINAL-EXAM EXAMPLE: critical value method<\/h3>\n<p id=\"eip-666\">Consider the <a href=\"12-3-the-regression-equation#element-22\">third exam\/final exam example<\/a>.<br \/>\nThe line of best fit is <em data-effect=\"italics\">\u0177<\/em> = \u2013173.51 + 4.83<em data-effect=\"italics\">x<\/em>, with <em data-effect=\"italics\">r<\/em> = 0.6631 and <em data-effect=\"italics\">n<\/em> = 11 data points. Can the regression line be used for prediction? 
<strong>Given a third-exam score (<em data-effect=\"italics\">x<\/em> value), can we use the line to predict the final exam score (predicted <em data-effect=\"italics\">y<\/em> value)?<\/strong><\/p>\n<ul id=\"eip-830\" data-labeled-item=\"true\">\n<li><em data-effect=\"italics\">H<sub>0<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> = 0<\/li>\n<li><em data-effect=\"italics\">H<sub>1<\/sub><\/em>: <em data-effect=\"italics\">\u03c1<\/em> \u2260 0<\/li>\n<li><em data-effect=\"italics\">\u03b1<\/em> = 0.05<\/li>\n<\/ul>\n<ul id=\"eip-557\" data-bullet-style=\"bullet\">\n<li>Use the &#8220;95% Critical Value&#8221; table for <em data-effect=\"italics\">r<\/em> with <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 11 \u2013 2 = 9.<\/li>\n<li>The critical values are \u20130.602 and +0.602.<\/li>\n<li>Since 0.6631 &gt; 0.602, <em data-effect=\"italics\">r<\/em> is significant.<\/li>\n<li>Decision: Reject the null hypothesis.<\/li>\n<li>Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between the third exam score (<em data-effect=\"italics\">x<\/em>) and the final exam score (<em data-effect=\"italics\">y<\/em>) because the correlation coefficient is significantly different from zero.<\/li>\n<\/ul>\n<p id=\"eip-138\"><strong>Because <em data-effect=\"italics\">r<\/em> is significant and the scatter plot shows a linear trend, the regression line can be used to predict final exam scores.<\/strong><\/p>\n<div id=\"element-433\" class=\"ui-has-child-title\" data-type=\"example\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.10<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"body\">\n<p id=\"element-294\">Suppose you computed the following correlation coefficients. 
Using the table of critical values, determine whether <em data-effect=\"italics\">r<\/em> is significant and whether the line of best fit associated with each <em data-effect=\"italics\">r<\/em> can be used to predict a <em data-effect=\"italics\">y<\/em> value. If it helps, draw a number line.<\/p>\n<ol id=\"element-467\" type=\"a\">\n<li><em data-effect=\"italics\">r<\/em> = \u20130.567 and the sample size, <em data-effect=\"italics\">n<\/em>, is 19. The <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 17. The critical value is \u20130.456. Since \u20130.567 &lt; \u20130.456, <em data-effect=\"italics\">r<\/em> is significant.<\/li>\n<li><em data-effect=\"italics\">r<\/em> = 0.708 and the sample size, <em data-effect=\"italics\">n<\/em>, is nine. The <em data-effect=\"italics\">df<\/em> = <em data-effect=\"italics\">n<\/em> \u2013 2 = 7. The critical value is 0.666. Since 0.708 &gt; 0.666, <em data-effect=\"italics\">r<\/em> is significant.<\/li>\n<li><em data-effect=\"italics\">r<\/em> = 0.134 and the sample size, <em data-effect=\"italics\">n<\/em>, is 14. The <em data-effect=\"italics\">df<\/em> = 14 \u2013 2 = 12. The critical value is 0.532. Since 0.134 is between \u20130.532 and 0.532, <em data-effect=\"italics\">r<\/em> is not significant.<\/li>\n<li><em data-effect=\"italics\">r<\/em> = 0 and the sample size, <em data-effect=\"italics\">n<\/em>, is five. 
Whatever the value of <em data-effect=\"italics\">df<\/em>, <em data-effect=\"italics\">r<\/em> = 0 lies between the two critical values, so <em data-effect=\"italics\">r<\/em> is not significant.<\/li>\n<\/ol>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"fs-idp56886288\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.10<\/span><\/h3>\n<\/header>\n<section>\n<div id=\"eip-316\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"eip-112\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"eip-483\">For a given line of best fit, you compute that <em data-effect=\"italics\">r<\/em> = 0 using <em data-effect=\"italics\">n<\/em> = 100 data points. Can the line be used for prediction? Why or why not?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<section id=\"eip-570\" data-depth=\"1\">\n<h3 data-type=\"title\">Assumptions in Testing the Significance of the Correlation Coefficient<\/h3>\n<p id=\"eip-485\">Testing the significance of the correlation coefficient requires that certain assumptions about the data be satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or practical to do so. We examine the sample to decide whether the linear relationship that we see between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the sample data provides strong enough evidence to conclude that there is a linear relationship between <em data-effect=\"italics\">x<\/em> and <em data-effect=\"italics\">y<\/em> in the population.<\/p>\n<p id=\"eip-953\">The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. 
We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient help us determine whether it is appropriate to do this.<\/p>\n<div id=\"eip-873\" data-type=\"list\">\n<div id=\"14\" data-type=\"title\">The assumptions underlying the test of significance are:<\/div>\n<ul>\n<li>There is a linear relationship in the population that models the average value of <em data-effect=\"italics\">y<\/em> for varying values of <em data-effect=\"italics\">x<\/em>. In other words, the expected value of <em data-effect=\"italics\">y<\/em> for each particular value of <em data-effect=\"italics\">x<\/em> lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)<\/li>\n<li>The <em data-effect=\"italics\">y<\/em> values for any particular <em data-effect=\"italics\">x<\/em> value are normally distributed about the line. This implies that more <em data-effect=\"italics\">y<\/em> values are scattered close to the line than are scattered farther away. The first assumption implies that these normal distributions are centered on the line: the means of these normal distributions of <em data-effect=\"italics\">y<\/em> values lie on the line.<\/li>\n<li>The standard deviations of the population <em data-effect=\"italics\">y<\/em> values about the line are equal for each value of <em data-effect=\"italics\">x<\/em>. 
In other words, each of these normal distributions of <em data-effect=\"italics\">y<\/em> values has the same shape and spread about the line.<\/li>\n<li>The residual errors are mutually independent (no pattern).<\/li>\n<li>The data are produced from a well-designed, random sample or randomized experiment.<\/li>\n<\/ul>\n<\/div>\n<div id=\"linrgs_facts_normal\" class=\"os-figure\">\n<figure data-id=\"linrgs_facts_normal\"><span id=\"id9999999999999\" data-type=\"media\" data-alt=\"The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line \u2014 they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. A vertical normal curve is shown for each line.\"><br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-601\" src=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab.jpg\" alt=\"The left graph shows three sets of points. Each set falls in a vertical line. The points in each set are normally distributed along the line \u2014 they are densely packed in the middle and more spread out at the top and bottom. A downward sloping regression line passes through the mean of each set. The right graph shows the same regression line plotted. 
A vertical normal curve is shown for each line.\" width=\"731\" height=\"362\" srcset=\"https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab.jpg 731w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab-300x149.jpg 300w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab-65x32.jpg 65w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab-225x111.jpg 225w, https:\/\/textbooks.jaykesler.net\/introstats\/wp-content\/uploads\/sites\/2\/2021\/01\/351ef6af755502a26ca385159609cf2cad1a01ab-350x173.jpg 350w\" sizes=\"auto, (max-width: 731px) 100vw, 731px\" \/><br \/>\n<\/span><\/figure>\n<div class=\"os-caption-container\"><span class=\"os-title-label\">Figure <\/span><span class=\"os-number\">12.17<\/span> <span class=\"os-caption\">The <em data-effect=\"italics\">y<\/em> values for each <em data-effect=\"italics\">x<\/em> value are normally distributed about the line with the same standard deviation. For each <em data-effect=\"italics\">x<\/em> value, the mean of the <em data-effect=\"italics\">y<\/em> values lies on the regression line. 
More <em data-effect=\"italics\">y<\/em> values lie near the line than are scattered further away from the line.<br \/>\n<\/span><\/div>\n<\/div>\n<\/section>\n<\/div>\n","protected":false},"author":1,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-111","chapter","type-chapter","status-publish","hentry"],"part":103,"_links":{"self":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/111","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/users\/1"}],"version-history":[{"count":5,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/111\/revisions"}],"predecessor-version":[{"id":696,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/111\/revisions\/696"}],"part":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/parts\/103"}],"metadata":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/111\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/media?parent=111"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=111"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/contributor?post=111"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/license?post=111"}],"curies":
[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}