Scatter Plots, Correlation, and Regression

Usually around the time that you are beginning “Algebra II”, you’ll have another lesson on Statistics that is a little more advanced than what you had earlier (in the Introduction to Statistics and Probability section). This lesson covers Scatter Plots, Correlation, and Regression, including how to use the Graphing Calculator.

Scatter Plots

In the real world, there are always sets of data that need to be interpreted. For example, we may want to see if there is some sort of connection between two sets of data, such as the number of hours studied per week versus grade point average. It seems like the two variables would be related, so suppose you survey some of your friends to see what a graph would look like:

Friend Number of hours of studying per week Grade Point Average (out of 5.0)
Allie 14 3.91
Samantha 42 4.98
Hayley 10 3.22
Jessica 32 4.81
Megan 5 2.0
Rachel 10 2.82
Briley 25 3.79
Lauren 18 3.48

To make more sense of the data, let’s first order it by the number of hours of studying:

Friend Number of hours of studying per week Grade Point Average (out of 5.0)
Megan 5 2.0
Hayley 10 3.22
Rachel 10 2.82
Allie 14 3.91
Lauren 18 3.48
Briley 25 3.79
Jessica 32 4.81
Samantha 42 4.98

Here’s what the scatter plot looks like. A scatter plot is just a graph of the $ x$-points (number of hours studying each week) and the $ y$-points (grade point average):

Correlation

Notice from the scatter plot above, generally speaking, the friends who study more per week have higher GPAs. Thus, if we were to try to fit a line through the points, which is a statistical calculation that finds the “closest” line to the points, it would have a positive slope. Since the trend is that when the $ x$-values go up, the $ y$-values also go up, we call this a positive correlation, and the correlation coefficient is positive.

Note that a positive correlation doesn’t necessarily mean that the effect of one variable causes the effect on the other variable (a causal relationship, or causation); there may be a third effect that causes both of the variables to make the same type of changes. For example, there seems to be a strong correlation between shark attacks and ice cream sales; of course shark attacks do not cause people to buy ice cream, but in hot weather, both shark attacks and people buying ice cream are more likely to occur.

Again, correlation can be thought of as the degree to which two things relate to each other, and correlation coefficients range from –1 (strongest negative correlation) to 1 (strongest positive correlation). A correlation coefficient of or near 0 means there’s little or no linear connection between the two variables.
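To make the correlation coefficient concrete, here is a minimal Python sketch that computes it for the study-hours data above. The function name `pearson_r` and the variable names are my own choices for illustration, and the formula used is the standard Pearson correlation coefficient, which I am assuming is the $ r$ this lesson refers to.

```python
import math

# Survey data from the table above: hours studied per week and GPA (out of 5.0).
hours = [5, 10, 10, 14, 18, 25, 32, 42]
gpa = [2.0, 3.22, 2.82, 3.91, 3.48, 3.79, 4.81, 4.98]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y vary together, and how much each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours, gpa)
print(f"r = {r:.3f}")  # about 0.92 for this data
```

A value this close to 1 matches what the scatter plot suggests: friends who study more tend to have higher GPAs.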

Here are some examples (“≈” symbol means approximately equal to):

Regression

Going back to our original data, we can try to fit a line through the points that we have; this is called “linear regression”, and the line itself is called a “trend line” or “line of best fit”; it’s the line that’s the “closest fit” to the points – the best trend line. The formula for getting this line is a bit complicated (the “least squares method”, if you’ve heard of it) and is learned in more advanced Statistics, but you may learn how to do it with a graphing calculator, as shown below.
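For the curious, the standard least squares formulas for a line $ y=mx+b$ through $ n$ points $ \left( {{{x}_{i}},{{y}_{i}}} \right)$ can be written as follows. The symbols $ m$, $ b$, $ \bar{x}$, and $ \bar{y}$ are my own notation here (with $ \bar{x}$ and $ \bar{y}$ the means of the $ x$- and $ y$-values); you won’t need to memorize these at this level:

$$ m=\frac{{\sum\limits_{{i=1}}^{n}{{\left( {{{x}_{i}}-\bar{x}} \right)\left( {{{y}_{i}}-\bar{y}} \right)}}}}{{\sum\limits_{{i=1}}^{n}{{{{{\left( {{{x}_{i}}-\bar{x}} \right)}}^{2}}}}}},\qquad b=\bar{y}-m\bar{x}$$

For our data, $ \bar{x}=19.5$ and $ \bar{y}\approx 3.626$, and working through the sums gives a slope of roughly $ 0.0716$ and a $ y$-intercept of roughly $ 2.23$.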

Here’s what the line looks like through our data:

Using Graphing Calculator to Get Line of Best Fit

You can put the data in the graphing calculator and have the points graphed, and also get the equation for the best fit trend line. You can then graph this line over the points like we see above. (I’m using the TI-84 Plus CE calculator.)   

To do this, first put the data points in “lists” in the calculator:


Now, let’s use the power of the graphing calculator to find the line of best fit for this set of data. Again, we could do this manually using a complicated formula in Statistics, but the calculator does it so easily! Basically, the math behind finding the best fit is finding the line that makes the sum of the squared vertical distances from the points to the line as small as possible.
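If you don’t have a graphing calculator handy, the same idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names `least_squares_line` and `sum_squared_errors` are made up for this example, and the code uses the usual least squares formulas rather than whatever the calculator does internally.

```python
# Survey data from the table above: hours studied per week and GPA (out of 5.0).
hours = [5, 10, 10, 14, 18, 25, 32, 42]
gpa = [2.0, 3.22, 2.82, 3.91, 3.48, 3.79, 4.81, 4.98]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Add up the squared vertical distances from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares_line(hours, gpa)
best = sum_squared_errors(hours, gpa, slope, intercept)

# Tilting the line a little should only make the total squared error larger.
nudged = sum_squared_errors(hours, gpa, slope + 0.01, intercept)

print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"squared error: best fit {best:.4f}, nudged line {nudged:.4f}")

# Use the line to estimate the GPA of a (hypothetical) friend who studies 20 hours.
print(f"predicted GPA at 20 hours: {slope * 20 + intercept:.2f}")
```

Comparing the best-fit line against a slightly tilted one is a quick way to see what “closest fit” means: any other line gives a larger total squared error.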

Basic Stats on Data from Calculator

Before I show you how to get the line of best fit, let’s get some simple statistics on the two sets of data, like the mean, median, quartiles, and max (which we got by hand for our Box and Whisker Plot in the Introduction to Statistics and Probability section). Since we have two sets of data in our lists, we can use 1-Var Stats to get information about just the first set of data that we put in L1, or 2-Var Stats to get information about both sets of data that we put in L1 and L2:
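As a cross-check on these one-variable statistics, here is a small Python sketch for the hours data. The function name `five_number_summary` is hypothetical, and the quartiles are computed as the medians of the lower and upper halves of the sorted data, which is the convention I’m assuming the calculator follows; other software may use a different quartile rule and give slightly different answers.

```python
from statistics import mean, median

# Hours studied per week, as entered in L1.
hours = [14, 42, 10, 32, 5, 10, 25, 18]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data (the middle value is left out of both halves when the count is odd).
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


summary = five_number_summary(hours)
print("mean:", mean(hours))
print("min, Q1, median, Q3, max:", summary)
```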

Line of Best Fit

Now let’s go back and do the regression of our data (find the line of best fit).

Note that before you do this, you should turn diagnostics on so you can see the correlation coefficient $ r$ and also $ {{r}^{2}}$ (which is the square of $ r$ and can be used to compare linear and non-linear regressions to see which fits best). You can do this by hitting “mode” and scrolling down to STAT DIAGNOSTICS and hitting ENTER if it’s not on. (You can also go to 2nd 0 (CATALOG), then move the cursor to DiagnosticOn and hit ENTER ENTER to turn this on.) You can just keep this on and not worry about it.

Note that we show a quadratic regression in the Introduction to Quadratics section, and an exponential regression in the Exponential Functions section.


Learn these rules, and practice, practice, practice.



On to Exponents and Radicals in Algebra – you are ready!
