{"id":112,"date":"2021-01-12T22:19:54","date_gmt":"2021-01-12T22:19:54","guid":{"rendered":"https:\/\/textbooks.jaykesler.net\/introstats\/chapter\/prediction\/"},"modified":"2021-05-11T20:14:02","modified_gmt":"2021-05-11T20:14:02","slug":"prediction","status":"publish","type":"chapter","link":"https:\/\/textbooks.jaykesler.net\/introstats\/chapter\/prediction\/","title":{"rendered":"Prediction"},"content":{"raw":"<span style=\"display: none;\">\r\n[latexpage]\r\n<\/span>\r\n<div id=\"b5743be1-228f-419c-a947-e5737622c8aa\" class=\"chapter-content-module\" data-type=\"page\" data-cnxml-to-html-ver=\"2.1.0\">\r\n<p id=\"eip-775\">Recall the <a href=\"\/introstats\/chapter\/the-regression-equation\/#element-22\">third exam\/final exam example<\/a>.<\/p>\r\n<p id=\"eip-804\">We examined the scatterplot and showed that the correlation coefficient is significant. We found the equation of the best-fit line for the final exam grade as a function of the grade on the third exam. We can now use the least-squares regression line for prediction.<\/p>\r\n<p id=\"element-12498\">Suppose you want to estimate, or predict, the mean final exam score of statistics students who received 73 on the third exam. The third exam scores <strong>(<em data-effect=\"italics\">x<\/em>-values)<\/strong> range from 65 to 75. <strong> Since 73 is between the <em data-effect=\"italics\">x<\/em>-values 65 and 75<\/strong>, substitute <em data-effect=\"italics\">x<\/em> = 73 into the equation. 
Then:<\/p>\r\n$$\\hat y=\u2212173.51+4.83(73)=179.08$$\r\n<p id=\"fs-idm74881792\">We predict that statistics students who earn a grade of 73 on the third exam will earn a grade of 179.08 on the final exam, on average.<\/p>\r\n\r\n<div id=\"element-438\" class=\"ui-has-child-title\" data-type=\"example\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.11<\/span><\/h3>\r\n<\/header><section>\r\n<div class=\"body\">\r\n<p id=\"element-546\">Recall the <a href=\"\/introstats\/chapter\/the-regression-equation\/#element-22\">third exam\/final exam example<\/a>.<\/p>\r\n&nbsp;\r\n<div id=\"element-770\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"id1170592391444\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"element-464\">a. What would you predict the final exam score to be for a student who scored a 66 on the third exam?<\/p>\r\n\r\n<div id=\"element-770\" class=\" unnumbered\" data-type=\"exercise\"><section>\r\n<div id=\"id1170583280560\" data-type=\"solution\" aria-label=\"show solution\" aria-expanded=\"false\"><section class=\"ui-body\" role=\"alert\">\r\n<div class=\"os-solution-container\">\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<div id=\"element-844\" class=\" unnumbered\" data-type=\"exercise\"><header><\/header><section>\r\n<div id=\"id1170601334360\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"element-306\">b. 
What would you predict the final exam score to be for a student who scored a 90 on the third exam?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"id1170583280560\" data-type=\"solution\" aria-label=\"show solution\" aria-expanded=\"false\">\r\n<div class=\"ui-toggle-wrapper\"><\/div>\r\n<section class=\"ui-body\" style=\"display: block; overflow: hidden;\" role=\"alert\">\r\n<h4 data-type=\"solution-title\"><span class=\"os-title-label\">Solution <\/span><span class=\"os-number\">12.11<\/span><\/h4>\r\n<div class=\"os-solution-container\">\r\n<p id=\"element-444\">a. 145.27<\/p>\r\n\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<div id=\"element-844\" class=\" unnumbered\" data-type=\"exercise\"><section>\r\n<div id=\"id3981938\" data-type=\"solution\" data-print-placement=\"end\" aria-label=\"show solution\" aria-expanded=\"false\"><section class=\"ui-body\" style=\"display: block; overflow: hidden;\" role=\"alert\">\r\n<div class=\"os-solution-container\">\r\n<p id=\"element-380\">b. The <em data-effect=\"italics\">x<\/em> values in the data are between 65 and 75. Ninety is outside of the domain of the observed <em data-effect=\"italics\">x<\/em> values in the data (independent variable), so you cannot reliably predict the final exam score for this student. 
(Even though it is possible to enter 90 into the equation for <em data-effect=\"italics\">x<\/em> and calculate a corresponding <em data-effect=\"italics\">y<\/em> value, the <em data-effect=\"italics\">y<\/em> value that you get will not be reliable.)\r\n<span data-type=\"newline\">\r\n<\/span><span data-type=\"newline\">\r\n<\/span><\/p>\r\nTo see just how unreliable a prediction can be outside the range of <em data-effect=\"italics\">x<\/em> values observed in the data, substitute <em data-effect=\"italics\">x<\/em> = 90 into the equation.\r\n\r\n$$\\hat y=\u2212173.51+4.83(90)=261.19$$\r\n\r\nThe final-exam score is predicted to be 261.19, but the largest the final-exam score can be is 200.\r\n<span data-type=\"newline\">\r\n<\/span>\r\n<div id=\"eip-id1168998191288\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header><\/header><section>\r\n<div class=\"os-note-body\">\r\n\r\n<span data-type=\"title\">Note<\/span>\r\n<p id=\"eip-idp117960960\">Predicting within the range of <em data-effect=\"italics\">x<\/em> values observed in the data is called <span id=\"term224\" data-type=\"term\">interpolation<\/span>. 
Predicting outside the range of <em data-effect=\"italics\">x<\/em> values observed in the data is called <span id=\"term225\" data-type=\"term\">extrapolation<\/span>.<\/p>\r\n\r\n<\/div>\r\n<\/section><\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<div id=\"fs-idp125531648\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\"><header>\r\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.11<\/span><\/h3>\r\n<\/header><section>\r\n<div id=\"eip-375\" class=\" unnumbered\" data-type=\"exercise\"><section>\r\n<div id=\"eip-637\" data-type=\"problem\">\r\n<div class=\"os-problem-container \">\r\n<p id=\"eip-244\">Data are collected on the relationship between the number of hours per week practicing a musical instrument and scores on a math test. The line of best fit is as follows:<\/p>\r\n<p id=\"eip-idp180607312\"><em data-effect=\"italics\">\u0177<\/em> = 72.5 + 2.8<em data-effect=\"italics\">x<\/em>\r\n<span data-type=\"newline\">\r\n<\/span>Assuming you have confirmed that the correlation coefficient is significant, what math test score would you predict for a student who practices a musical instrument for five hours a week?<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/section><\/div>\r\n<\/section><\/div>\r\n<div class=\"textbox\">If you are trying to predict an outcome but the correlation coefficient is not significant, you can simply use the mean of the $y$-data as your predicted value, since the $x$-data and the $y$-data are not well correlated in this case.<\/div>\r\n&nbsp;\r\n\r\n<\/div>","rendered":"<p><span style=\"display: none;\"><br \/>\n[latexpage]<br \/>\n<\/span><\/p>\n<div id=\"b5743be1-228f-419c-a947-e5737622c8aa\" class=\"chapter-content-module\" data-type=\"page\" data-cnxml-to-html-ver=\"2.1.0\">\n<p id=\"eip-775\">Recall the <a 
href=\"\/introstats\/chapter\/the-regression-equation\/#element-22\">third exam\/final exam example<\/a>.<\/p>\n<p id=\"eip-804\">We examined the scatterplot and showed that the correlation coefficient is significant. We found the equation of the best-fit line for the final exam grade as a function of the grade on the third exam. We can now use the least-squares regression line for prediction.<\/p>\n<p id=\"element-12498\">Suppose you want to estimate, or predict, the mean final exam score of statistics students who received 73 on the third exam. The third exam scores <strong>(<em data-effect=\"italics\">x<\/em>-values)<\/strong> range from 65 to 75. <strong> Since 73 is between the <em data-effect=\"italics\">x<\/em>-values 65 and 75<\/strong>, substitute <em data-effect=\"italics\">x<\/em> = 73 into the equation. Then:<\/p>\n<p>$$\\hat y=\u2212173.51+4.83(73)=179.08$$<\/p>\n<p id=\"fs-idm74881792\">We predict that statistics students who earn a grade of 73 on the third exam will earn a grade of 179.08 on the final exam, on average.<\/p>\n<div id=\"element-438\" class=\"ui-has-child-title\" data-type=\"example\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Example <\/span><span class=\"os-number\">12.11<\/span><\/h3>\n<\/header>\n<section>\n<div class=\"body\">\n<p id=\"element-546\">Recall the <a href=\"\/introstats\/chapter\/the-regression-equation\/#element-22\">third exam\/final exam example<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"element-770\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"id1170592391444\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"element-464\">a. 
What would you predict the final exam score to be for a student who scored a 66 on the third exam?<\/p>\n<div class=\"unnumbered\" data-type=\"exercise\">\n<section>\n<div id=\"id1170583280560\" data-type=\"solution\" aria-label=\"show solution\" aria-expanded=\"false\">\n<section class=\"ui-body\" role=\"alert\">\n<div class=\"os-solution-container\">\n<p>&nbsp;<\/p>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"element-844\" class=\"unnumbered\" data-type=\"exercise\">\n<header><\/header>\n<section>\n<div id=\"id1170601334360\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"element-306\">b. What would you predict the final exam score to be for a student who scored a 90 on the third exam?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/div>\n<\/div>\n<div data-type=\"solution\" aria-label=\"show solution\" aria-expanded=\"false\">\n<div class=\"ui-toggle-wrapper\"><\/div>\n<section class=\"ui-body\" style=\"display: block; overflow: hidden;\" role=\"alert\">\n<h4 data-type=\"solution-title\"><span class=\"os-title-label\">Solution <\/span><span class=\"os-number\">12.11<\/span><\/h4>\n<div class=\"os-solution-container\">\n<p id=\"element-444\">a. 145.27<\/p>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<div class=\"unnumbered\" data-type=\"exercise\">\n<section>\n<div id=\"id3981938\" data-type=\"solution\" data-print-placement=\"end\" aria-label=\"show solution\" aria-expanded=\"false\">\n<section class=\"ui-body\" style=\"display: block; overflow: hidden;\" role=\"alert\">\n<div class=\"os-solution-container\">\n<p id=\"element-380\">b. The <em data-effect=\"italics\">x<\/em> values in the data are between 65 and 75. Ninety is outside of the domain of the observed <em data-effect=\"italics\">x<\/em> values in the data (independent variable), so you cannot reliably predict the final exam score for this student. 
(Even though it is possible to enter 90 into the equation for <em data-effect=\"italics\">x<\/em> and calculate a corresponding <em data-effect=\"italics\">y<\/em> value, the <em data-effect=\"italics\">y<\/em> value that you get will not be reliable.)<br \/>\n<span data-type=\"newline\"><br \/>\n<\/span><span data-type=\"newline\"><br \/>\n<\/span><\/p>\n<p>To see just how unreliable a prediction can be outside the range of <em data-effect=\"italics\">x<\/em> values observed in the data, substitute <em data-effect=\"italics\">x<\/em> = 90 into the equation.<\/p>\n<p>$$\\hat y=\u2212173.51+4.83(90)=261.19$$<\/p>\n<p>The final-exam score is predicted to be 261.19, but the largest the final-exam score can be is 200.<br \/>\n<span data-type=\"newline\"><br \/>\n<\/span><\/p>\n<div id=\"eip-id1168998191288\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header><\/header>\n<section>\n<div class=\"os-note-body\">\n<p><span data-type=\"title\">Note<\/span><\/p>\n<p id=\"eip-idp117960960\">Predicting within the range of <em data-effect=\"italics\">x<\/em> values observed in the data is called <span id=\"term224\" data-type=\"term\">interpolation<\/span>. 
Predicting outside the range of <em data-effect=\"italics\">x<\/em> values observed in the data is called <span id=\"term225\" data-type=\"term\">extrapolation<\/span>.<\/p>\n<\/div>\n<\/section>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<div id=\"fs-idp125531648\" class=\"statistics try ui-has-child-title\" data-type=\"note\" data-has-label=\"true\" data-label=\"\">\n<header>\n<h3 class=\"os-title\"><span class=\"os-title-label\">Try It <\/span><span class=\"os-number\">12.11<\/span><\/h3>\n<\/header>\n<section>\n<div id=\"eip-375\" class=\"unnumbered\" data-type=\"exercise\">\n<section>\n<div id=\"eip-637\" data-type=\"problem\">\n<div class=\"os-problem-container\">\n<p id=\"eip-244\">Data are collected on the relationship between the number of hours per week practicing a musical instrument and scores on a math test. The line of best fit is as follows:<\/p>\n<p id=\"eip-idp180607312\"><em data-effect=\"italics\">\u0177<\/em> = 72.5 + 2.8<em data-effect=\"italics\">x<\/em><br \/>\n<span data-type=\"newline\"><br \/>\n<\/span>Assuming you have confirmed that the correlation coefficient is significant, what math test score would you predict for a student who practices a musical instrument for five hours a week?<\/p>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<\/section>\n<\/div>\n<div class=\"textbox\">If you are trying to predict an outcome but the correlation coefficient is not significant, you can simply use the mean of the $y$-data as your predicted value, since the $x$-data and the $y$-data are not well correlated in this 
case.<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n","protected":false},"author":1,"menu_order":5,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-112","chapter","type-chapter","status-publish","hentry"],"part":103,"_links":{"self":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/112","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/users\/1"}],"version-history":[{"count":4,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/112\/revisions"}],"predecessor-version":[{"id":404,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/112\/revisions\/404"}],"part":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/parts\/103"}],"metadata":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapters\/112\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/media?parent=112"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/pressbooks\/v2\/chapter-type?post=112"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/contributor?post=112"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/textbooks.jaykesler.net\/introstats\/wp-json\/wp\/v2\/license?post=112"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}