Note that **Scatter Plots, Correlation, and Regression** are covered in their own section, including how to use the **TI graphing calculator** to obtain statistical information, such as mean and standard deviation.

Somewhere in the pre-algebra stages, you’ll get a short but fun introduction to **Statistics**, where you’ll cover such topics as finding the average, median, and mode of numbers, organizing data with different types of graphs, and performing analyses on data. **Statistics** is the study of the collection, organization, and presentation of data. You may also be introduced to **Probability**, which is the study of how likely events are to occur.

Let’s first talk about some types of measurement you’ll see in Statistics.

## Average, Mean, Median, Mode, Range and Mean Absolute Deviation (MAD)

Let’s say you ask your friends to keep track of how much time they spend each week doing their homework. Fortunately, you have a lot of friends online that you can ask quite easily. **20** of your friends reply with the following numbers:

$ \displaystyle \begin{array}{c}10,2,15,28,1,32,12,14,8,17,\\22,6,42,3,14,7,12,23,20,8\\\text{hours studying each week}\end{array}$

##### Mean (Average)

Just by glancing at the numbers, can you guess the average? Probably not. But it’s easy to get the **average**: you just add all the numbers up and divide by the total number of responses. Another word for the average is the **mean**. Average hours per week of your **20** friends =

$ \displaystyle \begin{array}{c}\displaystyle \frac{\begin{array}{l}10+2+15+28+1+32+12\\+14+8+17+22+6+42+3\\+14+7+12+23+20+8\end{array}}{20}\\\\=\displaystyle \frac{296}{20}=14.8\end{array}$
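If you’d like to check the arithmetic on a computer, here is a minimal Python sketch of the same calculation. The variable names (`hours`, `total`, `mean`) are just illustrative choices.

```python
# Weekly study hours reported by the 20 friends (from the list above).
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

total = sum(hours)           # add all the numbers up
mean = total / len(hours)    # divide by the number of responses

print(total, len(hours), mean)  # 296 20 14.8
```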

##### Median

What if someone asked you for a number that is exactly in the middle of the data; in other words, the number that has just as many answers above it as below it. Sort the data (put it in order) to get this number:

$ \displaystyle \begin{array}{l}1\,\,\,\,\,2\,\,\,\,\,3\,\,\,\,\,6\,\,\,\,\,7\,\,\,\,\,8\,\,\,\,\,\,8\,\,\,\,\,\,10\\\,\,\,\,\,12\,\,\,\,12\,\,\,\,\,14\,\,\,\,\,14\,\,\,\,\,15\,\,\,\,\,17\\\,\,\,\,\,20\,\,\,\,22\,\,\,\,\,23\,\,\,\,\,28\,\,\,\,\,32\,\,\,\,\,42\end{array}$

To get the middle number or **median**, cross out numbers from both ends until you arrive at the middle number or numbers. Since we have an even number of values (**20**), we’ll get two “middles”, so to get the median, we’ll have to take the mean or average of those two numbers.

We end up with two numbers in the middle: **12** and **14**. If we just had one number, that number would be the median, but since we have two numbers, we take the average of **12** and **14**, which is $ \displaystyle \frac{{12+14}}{2}=13$; the median is **13**. Remember that the word “median” sounds like the word “middle”.
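The crossing-out idea can be sketched in Python as follows. This is only an illustration: the slicing shown assumes an even number of values, as in our data, and the names `ordered` and `middle_pair` are made up for the example. The standard library’s `statistics.median` is shown as a cross-check.

```python
from statistics import median

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

ordered = sorted(hours)   # put the data in order
n = len(ordered)          # 20 values, an even count

# With an even count there are two middle values (the 10th and 11th here).
middle_pair = ordered[n // 2 - 1 : n // 2 + 1]
by_hand = sum(middle_pair) / 2

print(middle_pair, by_hand)  # [12, 14] 13.0
print(median(hours))         # 13.0 (the library handles odd and even counts)
```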

##### Mode

The **mode** is the number or numbers that occur most often; you can have more than one mode. In the case of our data, the modes are **8**, **12**, and **14**, all of which occur twice in the data. Remember that the word “mode” sounds like the word “most” (often).
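Here is one hedged way to find every mode with Python’s `collections.Counter`. The approach (count each value, then keep the values with the highest count) is just a sketch of the idea above.

```python
from collections import Counter

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

counts = Counter(hours)              # how many times each value occurs
highest = max(counts.values())       # the largest number of occurrences
modes = sorted(value for value, c in counts.items() if c == highest)

print(modes, highest)  # [8, 12, 14] 2
```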

##### Range

The **range**, which is the difference between the largest number and smallest number, is **41** ($ 42-1$).

##### Mean Absolute Deviation (MAD)

Let’s find one more statistic that may be helpful to know. The **mean absolute deviation** (**MAD**) of a set of numbers tells us on average, how far our numbers are from the middle (**mean**) of the numbers. This measures the variability (dispersion) of the numbers. The mean absolute deviation is also called the **average absolute deviation**. In our situation, this might give us an indication of how similar or different the study habits are with our friends; for example, the higher the mean absolute deviation, the greater the spread or variability of the students’ study habits. Note that such a statistic is probably most useful when comparing different data sets.

To get this, we get the mean of the data (which we already have: **14.8**). Then we take each value and find the distance (absolute value of the difference) from this mean, and then take the average of them. For example, from the ordered values, the first value is $ \left| {1-14.8} \right|=13.8$. Here is our MAD:

$ \displaystyle \begin{array}{c}\displaystyle \frac{\begin{array}{l}13.8+12.8+11.8+8.8+7.8+6.8+6.8\\+4.8+2.8+2.8+0.8+0.8+0.2+2.2\\+5.2+7.2+8.2+13.2+17.2+27.2\end{array}}{20}\\\\=\displaystyle \frac{161.2}{20}=8.06\end{array}$

Thus, the mean absolute deviation is **8.06 hours**. This means that, on average, a friend’s weekly homework time is about **8** hours away from the mean of **14.8** hours, so there is a lot of variability (dispersion) in homework time among the friends.
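Adding up twenty decimals by hand is error-prone, so it can help to let a computer do it. The following Python sketch follows the steps described above (mean, then distances from the mean, then the average of those distances); the variable names are illustrative only.

```python
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

mean = sum(hours) / len(hours)

# Distance of each value from the mean (absolute value of the difference).
distances = [abs(h - mean) for h in hours]

total_distance = sum(distances)
mad = total_distance / len(hours)

# Rounding hides tiny floating-point noise in the decimal sums.
print(round(total_distance, 1), round(mad, 2))
```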

Later, we’ll see how to use a similar dispersion measure, **the standard deviation** (which is more complicated to measure but more often used), and also how to get a lot of these measurements in the graphing calculator! These can be found in the **Scatter Plots, Correlation, and Regression** section.

## Box and Whisker Plot

There are a couple different ways of viewing data graphically that you’ll learn in your math classes. First, a **box and whisker plot**, or **box plot**, is a visual picture of the data that shows where the middle of the data is (the **median**), and how far away from the middle the other points lie. Here again is how we got the median; it’s $ \displaystyle \frac{{12+14}}{2}=13$:

Now get the **lower** and **upper quartiles**, which are the “medians” of the numbers to the left and right of the median, including the median. (In our case, we didn’t have a true middle number, so we include the **12** in the left half and the **14** in the right half.) Note that the word “quartile” is related to the word “quarter”; the data is divided into four quarters. To get the **lower quartile**, start crossing out again:

$ \displaystyle \xcancel{{1\,\,\,\,\,2\,\,\,\,\,3\,\,\,\,\,6\,\,\,\,\,}}\,\,\,\,\,\,\,\,\left[\!\left[ {7\,\,\,\,\,\,8} \right]\!\right]\,\,\,\,\,\,\,\xcancel{{8\,\,\,\,\,10\,\,\,\,\,12\,\,\,\,\,12}}$ We are left with **7** and **8** in the middle, and the mean is **7.5**. Thus, the lower quartile is **7.5**.

Do the same to get the **upper quartile**, but use the numbers to the right of the median:

$ \displaystyle \xcancel{{14\,\,\,\,\,14\,\,\,\,\,15\,\,\,\,\,17\,}}\,\,\,\,\,\,\,\,\left[\!\left[ {20\,\,\,\,\,\,22} \right]\!\right]\,\,\,\,\,\,\,\xcancel{{23\,\,\,\,\,28\,\,\,\,\,32\,\,\,\,\,42}}$ We are left with **20** and **22** in the middle, and the mean is **21**. Thus, the upper quartile is **21**.

Then, plot the lowest number, lower quartile, median, upper quartile, and highest number like this; see how it looks like a box with whiskers on each side?

Can you see that $ \displaystyle \frac{1}{4}$ of your **20** friends (**5** friends) study less than **7.5** hours per week, $ \displaystyle \frac{1}{2}$ (**10** friends) less than **13** hours, and $ \displaystyle \frac{3}{4}$ (**15** friends) less than **21** hours a week? You can also see that $ \displaystyle \frac{1}{2}$ of your friends (**10** friends) study between **7.5** hours (the lower quartile) and **21** hours (the upper quartile) per week. You can also see that the highest point (**42** hours) is somewhat of an **outlier**; this means that this point may not fit in with the rest of the data.
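The five numbers behind the box plot can be sketched in Python as below. This assumes an even number of values, so the sorted list splits cleanly into a lower half and an upper half, as it does here. Calculators and software sometimes use slightly different quartile conventions, so their answers may not always match this method.

```python
from statistics import median

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

ordered = sorted(hours)
half = len(ordered) // 2
lower_half = ordered[:half]   # 1 ... 12 (ten values)
upper_half = ordered[half:]   # 14 ... 42 (ten values)

# Lowest value, lower quartile, median, upper quartile, highest value.
summary = (ordered[0], median(lower_half), median(ordered),
           median(upper_half), ordered[-1])
print(summary)  # (1, 7.5, 13.0, 21.0, 42)

# Count how many friends fall below the lower quartile.
below_q1 = sum(1 for h in hours if h < summary[1])
print(below_q1)  # 5
```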

(Note that we could have used our graphing calculator to derive some of these values, such as the median and quartiles. See **Basic Stats on Data from Calculator** in the **Scatter Plots, Correlation, and Regression** section to see how to do this).

## Stem and Leaf Plot

We could also draw a stem and leaf graph with the data, which resembles plant stems and leaves:

For this plot, put the tens digit of each number on the left-hand side (use **0** for single-digit numbers), and then put the ones digits on the right-hand side, in order. Here you can see the same thing as in the box plot: since we know (from earlier) that the median is **13**, we can see that there is a larger difference between the median and the largest number (**42**) than between the median and the smallest number (**1**).
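As a rough text-only illustration, a stem and leaf plot of our data can be printed with a few lines of Python. The layout (a `|` between stem and leaves) is just one common way to write it.

```python
from collections import defaultdict

hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

stems = defaultdict(list)
for h in sorted(hours):
    stems[h // 10].append(h % 10)   # tens digit is the stem, ones digit is the leaf

for stem in range(0, max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")

# Prints:
# 0 | 1 2 3 6 7 8 8
# 1 | 0 2 2 4 4 5 7
# 2 | 0 2 3 8
# 3 | 2
# 4 | 2
```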

We can also see how the smaller data is more clumped together; we’ll see this next in Frequency Tables and Graphs.

## Frequency Tables and Graphs

We could also draw what we call a **frequency table** that shows us how many of your friends studied less than or equal to **10** hours per week, **11** to **20** hours per week, **21** to **30** hours per week, **31** to **40** hours per week, and **41** to **50** hours per week. These are called “**buckets**” or “**classes**” and each class has **10** hours in it (**0** to **10** hours, and so on). Notice that our buckets of data are a little different than the stem and leaf table above:
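One way to tally these buckets is sketched below in Python. The bucket boundaries are taken from the description above; the row of `#` marks is only a quick text stand-in for a histogram bar.

```python
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]

# (low, high) limits of each class, both ends included.
buckets = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

frequency = {}
for low, high in buckets:
    frequency[(low, high)] = sum(1 for h in hours if low <= h <= high)

for (low, high), count in frequency.items():
    print(f"{low:>2}-{high:<2} hours | {'#' * count} ({count})")
```

For our data the counts come out to 8, 7, 3, 1, and 1 friends, which add up to all 20 friends.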

Then we could draw a **histogram** from this data, as shown below. A **histogram**, or a **frequency graph** is a graph showing the distribution of the data – where it lies.

Or even better, we can draw a **relative frequency histogram**, where we divide the number of friends in each “bucket” or “class” by the total number of friends, as shown below. The reason this is a better graph is that if you add up all the amounts for each bucket, we’ll get a grand total of **1** (all the decimals on the left add up to **1**), so we can compare different sets of data together.
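Here is a small Python sketch of the relative frequency calculation, using the same buckets as the frequency table. Dividing each count by the total number of friends gives the decimals, and a quick check confirms they add up to 1.

```python
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]
buckets = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

total = len(hours)
relative = []
for low, high in buckets:
    count = sum(1 for h in hours if low <= h <= high)
    relative.append(count / total)            # fraction of all friends in this class
    print(f"{low:>2}-{high:<2} hours: {count / total:.2f}")

# The relative frequencies are 0.40, 0.35, 0.15, 0.05, and 0.05.
print(round(sum(relative), 10))  # 1.0
```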

This data is **skewed to the right**, or **positively skewed** (the **right-hand side** has a longer “**tail**”). When data is **skewed to the left** (or **negatively skewed**), the **left side** has a longer “**tail**”. When data isn’t skewed left or right, we call the data “**symmetric**”.

When the data is skewed right or left, the median is a better measure of the central tendency (average) for the data, since the mean could be misleading. The mean tends to get pulled out toward the “tail”. Here are some examples of other data that show this (means and medians may not be accurate; they are just to give an idea):

## Pie Chart

One more type of graph that’s fun to draw is a circle graph, or pie chart. We could divide up the “buckets” from our data above (numbers of hours our friends are studying per week), and compute how big to make the pieces of the pie with the use of a little bit of Geometry.

Since we know the percentages of friends who fall into each category from the relative frequency chart above, we can get the angle measurements of each piece of the pie (from the center of the pie) by using proportions and the fact that there are **360** degrees in a circle. By using proportions (for example, $ \displaystyle \frac{{40}}{{100}}=\frac{?}{{360}}$), we find that we can just multiply the relative frequency by **360** degrees to get each angle measurement. Here’s the table again, with the degrees for each bucket:
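The angle for each slice can be worked out with the short Python sketch below. It recomputes the bucket counts from the data and then applies the proportion described above (the share of friends in a bucket times 360 degrees).

```python
hours = [10, 2, 15, 28, 1, 32, 12, 14, 8, 17,
         22, 6, 42, 3, 14, 7, 12, 23, 20, 8]
buckets = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

angles = []
for low, high in buckets:
    count = sum(1 for h in hours if low <= h <= high)
    angle = count * 360 / len(hours)   # same as relative frequency times 360
    angles.append(angle)
    print(f"{low:>2}-{high:<2} hours: {angle:g} degrees")

# The angles are 144, 126, 54, 18, and 18 degrees, a full circle in total.
print(sum(angles))  # 360.0
```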

We can use a protractor to draw our pie chart:

Don’t worry if you don’t totally get all this now! Later on in more advanced Algebra we’ll learn even more ways to display and interpret data (like when we compare two sets of data), including using a graphing calculator to display/interpret data.

## Probability

Before Algebra, you may also have studied a topic called **Probability**, which is related to Statistics. Probability can get complicated in advanced courses, but we’ll just talk about a few “counting” techniques and how to compute some basic probabilities.

Basically, a probability is a number between **0** and **1** that tells us how likely something is to occur. Have you ever heard the expression “The probability of my passing this course is about **0**”? That’s not a good sign for passing that course. **Note that a lot of times, probability is given as a percent (0 to 100%) instead of a decimal.**

Here is more information on probabilities:

- A probability can be defined as a fraction with the number of times something occurs over the number of possible ways something **can** occur. The possible ways are called **outcomes**, the set of all possible outcomes is the **sample space**, and this is typically in the denominator in a probability. For example, the probability of getting a head if you flip a coin is $ \displaystyle \frac{1}{2}$, since only one thing happens (either a head or a tail), but **2** things could have happened (the head or the tail). It’s a little confusing, but you’ll get it after a while. This is an **experiment**, since it involves **chance**.
- The probability that something happens and the probability that it doesn’t happen (the **complement**) add up to **1**. Therefore, the probability of something happening is $ 1-\text{probability of the exact opposite happening}$.
- Something that has no chance of happening has a probability of **0** (like the probability of getting a **7** when you roll a die), and something that will always occur has a probability of **1** (like the probability of getting a number in between and including **1** and **6** when you roll a die).
- When events are **independent** (not related), we can actually **multiply** to get the probability of **both** happening! We do have to be careful though, since if the events are dependent on one another (like choosing again, without replacing), the formulas become more complicated. I also address this **below**.
- **Experimental probabilities** are those you get by actually doing an experiment (like flipping the coin above). You’d have to do this many, many times to get close to the **theoretical probability**, which is the probability we get through mathematics.
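The ideas in this list can be tried out with a small Python sketch. It assumes every outcome in the sample space is equally likely (a fair coin and a fair die), and the helper name `probability` is made up for this example.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favorable outcomes over all outcomes, assuming each is equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability({"H"}, coin)            # 1/2
p_not_head = 1 - p_head                      # complement: 1/2
p_seven = probability({7}, die)              # 0, since a 7 can never happen
p_one_to_six = probability(set(die), die)    # 1, since this always happens
p_two_heads = p_head * p_head                # two independent flips: 1/4

print(p_head, p_not_head, p_seven, p_one_to_six, p_two_heads)  # 1/2 1/2 0 1 1/4
```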

### Experimental Probability Example

An example of trying an experimental probability is to **flip a coin 40 times** and record whether you get a head or a tail. At each coin toss, add up the number of heads so far, and divide by the number of flips so far, to get the experimental probability each time. Notice how it gets closer to the theoretical probability **.5** (more reliably – less variability) as you get closer to **40** coin tosses.

I just did this experiment with flipping a penny and checking the experimental probability that I get **heads** at each coin toss (total number of heads so far, divided by total number of flips so far). Notice how, even though the experimental probability doesn’t end up at exactly **.5**, the trend is that it gets closer to **.5** (with less variance or deviation) the more times I flip the coin:

If we did this experiment say for **2000** times, our experimental probability each time would be reliably very, very close to the theoretical probability of **.5**. (You might try this for a science experiment!)
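If you don’t have a penny handy, you can imitate the experiment with a computer’s random number generator. The Python sketch below is a simulation, not a record of real coin flips; the function name `running_heads_fraction` and the fixed `seed` (used so the run can be repeated) are assumptions made for this example.

```python
import random

def running_heads_fraction(num_flips, seed=0):
    """Simulate fair coin flips; return heads-so-far / flips-so-far after each flip."""
    rng = random.Random(seed)
    heads = 0
    fractions = []
    for flip in range(1, num_flips + 1):
        if rng.random() < 0.5:      # treat this as "heads"
            heads += 1
        fractions.append(heads / flip)
    return fractions

for n in (40, 2000):
    final = running_heads_fraction(n)[-1]
    print(f"after {n} flips: {final:.3f}")
```

Try changing the seed or the number of flips; the early values jump around a lot, while the later ones tend to settle near **.5**.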

### Counting Principles

Probabilities usually involve some sort of **counting** to put on the top or bottom of the fraction.

#### Fundamental Counting Principle

Here’s an example of the **Fundamental Counting Principle**, which says that if you have a certain number of ways to do something and another number of ways to do something else, you can just **multiply** those numbers to get the number of ways to do **both**.

Let’s say we have **3** shirts, **2** skirts, and **2** pairs of shoes that we’ve taken on a vacation. We want to know the probability of picking our sleeveless blue shirt, with our pink skirt, with our platform sandal shoes for that day.

Do you see that the total number of things that you can get, or outcomes, is **3** times **2** times **2**, which would be **12**? Think about it – for the first shirt, you could wear one of two skirts, and one of two pairs of shoes, for the second shirt, the same thing, and so on. You could draw a “tree” diagram like this:

To get that combination (**order doesn’t matter**: sleeveless blue shirt, pink skirt, and sandals), there would be **1** way out of **12** **possible** ways, so the probability would be $ \displaystyle \frac{1}{12}$!
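The tree of outfits can also be listed with Python’s `itertools.product`, as sketched below. Only the sleeveless blue shirt, the pink skirt, and the platform sandals come from the example; the other item names are placeholders invented to fill out the lists.

```python
from fractions import Fraction
from itertools import product

shirts = ["sleeveless blue", "shirt B", "shirt C"]   # 3 shirts (two are placeholders)
skirts = ["pink", "skirt B"]                         # 2 skirts (one is a placeholder)
shoes = ["platform sandals", "shoes B"]              # 2 pairs (one is a placeholder)

# Every (shirt, skirt, shoes) choice: 3 x 2 x 2 outcomes.
outfits = list(product(shirts, skirts, shoes))

wanted = ("sleeveless blue", "pink", "platform sandals")
p_wanted = Fraction(outfits.count(wanted), len(outfits))

print(len(outfits), p_wanted)  # 12 1/12
```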

#### Combinations and Permutations

More advanced probability techniques include the concept of **combinations** and **permutations**. Combinations and permutations exist since most of the time probability concerns picking a subset (smaller set) of things from a larger set of things, and how we pick the sets is important. As we saw above, this ratio of the desired subset to the number of all possible subsets is between **0** and **1**, and this is the probability.

When the **order of the subset matters**, we have a **permutation**. An example of this is wanting the number of ways of picking a president, vice-president, and secretary from a group of people.

When **order doesn’t matter**, we have a **combination**. (I remember this since we don’t care about “order” when we have an “**o**” in the word: c**o**mbination.) An example of this is wanting the number of ways that any three people can be chosen from a group of people. We’ll actually use combinations again in the **Binomial Expansion** section.

The math is a little difficult for calculating **permutations** and **combinations**. Remember that $ n!=n\times \left( {n-1} \right)\times \left( {n-2} \right)\times \cdots \times 1$. For example, $ 4!=4\times 3\times 2\times 1=24$.
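To get a feel for the numbers, here is a hedged Python sketch using the standard `math` module (`math.perm` and `math.comb` need Python 3.8 or later). The group size of **5** people is a made-up assumption for illustration. The standard formulas are shown in the comments: permutations are $ \frac{n!}{(n-r)!}$ and combinations are $ \frac{n!}{r!\,(n-r)!}$.

```python
import math

print(math.factorial(4))   # 24, which is 4 x 3 x 2 x 1

n, r = 5, 3   # hypothetical: choose 3 people from a group of 5

# Order matters (president, vice-president, secretary): n! / (n - r)!
officers = math.perm(n, r)
# Order doesn't matter (any three people): n! / (r! (n - r)!)
committees = math.comb(n, r)

print(officers, committees)  # 60 10

# The same answers straight from the factorial formulas.
by_formula_perm = math.factorial(n) // math.factorial(n - r)
by_formula_comb = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
print(by_formula_perm, by_formula_comb)  # 60 10
```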

Here are some examples of permutation and combination problems:

#### Probability Problems

Here are examples of probability problems that contain the counting techniques we’ve looked at. **Important tip for probability problems**: Generally, you **add probabilities** when the events happen to be alternatives, like an “either/or” situation (and they are mutually exclusive). You **multiply probabilities** when you want two or more things to happen, either at the same time, or one after another (and they are independent). See **below** for the more formal equations for these concepts; I like to try to do the problems without the equations, if possible.
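As a quick illustration of the tip, the Python sketch below uses a fair die and a fair coin (both assumed for this example) with exact fractions.

```python
from fractions import Fraction

p_one = Fraction(1, 6)     # rolling a 1 on a fair die
p_two = Fraction(1, 6)     # rolling a 2 on a fair die
p_heads = Fraction(1, 2)   # flipping heads on a fair coin
p_six = Fraction(1, 6)     # rolling a 6 on a fair die

# "Either/or" with mutually exclusive events: add.
p_one_or_two = p_one + p_two

# "Both" with independent events (a coin flip and a die roll): multiply.
p_heads_and_six = p_heads * p_six

print(p_one_or_two, p_heads_and_six)  # 1/3 1/12
```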

Here are more advanced probability problems.


### Probability Formulas

As we said earlier, when events are **independent** (not related), we can actually **multiply** to get the probability of **both** happening! We do have to be careful though, since if the events are dependent on one another (like choosing again, without replacing), the formulas become more complicated. We used some of these formulas above, but I like to try to explain the problems without using formulas, if possible. Here are the formulas with examples:
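To see why replacing matters, consider a made-up bag with **3** red and **2** blue marbles (this bag is an assumption for illustration only). The Python sketch below compares drawing two reds with replacement (independent draws) and without replacement (the second draw depends on the first).

```python
from fractions import Fraction

red, blue = 3, 2          # hypothetical bag of marbles
total = red + blue

# With replacement: the bag is the same for both draws, so just multiply.
p_two_reds_with = Fraction(red, total) * Fraction(red, total)

# Without replacement: after one red is taken, one fewer red and one fewer marble remain.
p_two_reds_without = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_two_reds_with, p_two_reds_without)  # 9/25 3/10
```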

Again, don’t worry if you don’t get all this now; it’s just important to get the main concepts.

**Learn these rules, and practice, practice, practice!**


On to **Introduction to Algebra** – you’re ready!