Mean and Variance

Mean and variance are two important concepts in statistics. They are used to describe a distribution of numbers. The main point of this lecture is to tie those concepts back into sigma notation, which is one of the main uses for summation. We'll walk through this very slowly and unpack it for you.

Suppose we have a set of numbers X, with n numbers in it (x1 to xn); they're just real numbers. The mean is the average value of all the numbers, while the variance is the average squared deviation from the mean.

The mean is calculated by adding up all the values and dividing by how many there were. The variance is calculated by taking each value, subtracting the mean and squaring the result, then adding those squares together and dividing by how many there were.

The mean of that set can be expressed in summation notation as mu sub x equals 1 over n times the sum from i = 1 to n of x sub i. The variance of x, sigma squared sub x, is 1 over n times the sum from i = 1 to n of (x sub i minus mu sub x) squared. The point of this lecture is to understand what that is. And finally, by the way, the standard deviation is just plain old sigma, which is the square root of sigma squared.

The standard deviation is used for describing sets of numbers, and we can calculate it even for a set that has only three elements. In this case, the set Z has three numbers: 1, 5 and 12. The sum of these three numbers is 18; their mean can also be expressed as a weighted sum in which every number gets the same weight, 1/3: 1/3 * 1 + 1/3 * 5 + 1/3 * 12 = 6.
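
For reference, the three quantities described in words above can be typeset as follows. This is a sketch that assumes the divide-by-n convention the lecture uses when it computes variances; the symbols are the lecture's own (mu for the mean, sigma squared for the variance, sigma for the standard deviation).

```latex
\[
\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \mu_x\right)^2,
\qquad
\sigma_x = \sqrt{\sigma_x^2}
\]
```
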
If we take the mean of set Z, what that really means is that we add up all of the numbers (1 + 5 + 12) and divide by how many numbers we have (3). It's never a good idea to do arithmetic in public, so obviously we've done this in advance. It's 18 divided by 3, which is 6; that's the mean. And there's lots of notation for it. The most correct general notation might be the Greek letter mu, for mean. Sometimes you'll see it written as μ(Z) or mu sub Z, but often you'll just see μ by itself because we already know which set we're talking about.

That's a simple example with numbers. Let's do a slightly harder example with symbols instead. Suppose we have a set Y consisting of four numbers, but I don't tell you what they are: y1, y2, y3, and y4. Then the mean of Y, mu sub Y, would be 1/4 times (y1 + y2 + y3 + y4). And here comes the punchline: let's express this in sigma notation. This is 1/4 times the sum from i = 1 to 4 of yi. And remember, i is a dummy index, so we could use any letter we like. But let's not get too radical; let's use i.

Let's generalize a little bit further. In general, suppose we have a set X consisting of n arbitrary numbers, x1, x2, up to xn. The mean of X is pretty easy to guess now: mu sub x is equal to 1 over n times the summation from i = 1 to n of x sub i. That's the mean using sigma notation.

By the way, it's worth thinking a little bit about the two different philosophical functions of i and n. The variable i is called a dummy variable because it is only a placeholder that steps through the terms of the sum; it has no meaning outside the sum. The variable n, which represents the number of observations or data points you wish to examine, is a property of the set itself, and it tells you where the sum stops. For example, if n is 10, then you would stop summing at i = 10.
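
To make the two roles concrete, here is a minimal Python sketch of the mean written so that it mirrors the sigma notation. The function name `mean` and the argument name `xs` are illustrative choices, not anything from the lecture. The loop variable `i` plays the part of the dummy index, and `n` is read off from the set.

```python
def mean(xs):
    """Mean of a non-empty list, mirroring (1/n) * sum from i = 1 to n of x_i."""
    n = len(xs)                  # n is fixed by the set: how many numbers it has
    total = 0
    for i in range(1, n + 1):    # i is the dummy index, running from 1 to n
        total += xs[i - 1]       # x_i in the lecture is xs[i - 1], since Python counts from 0
    return total / n


Z = [1, 5, 12]
print(mean(Z))  # 6.0
```

Renaming `i` to `j` or `k` changes nothing about the result, which is exactly what makes it a dummy index; changing `n` changes where the sum stops.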
If n is 11, then you would stop summing at i = 11, and so on. In the previous example with Y, n was 4.

Here is an example of mean-centering data. Here is our friend Z with three elements in it: 1, 5, and 12. Previously we computed that the mean of Z equals 6. Over here we see the three elements of Z in blue at 1, 5 and 12, and there's the mean at 6 in red.
Let's form a new set; let's call it Z prime. We'll subtract the mean from every element in Z: 1 - 6, 5 - 6 and 12 - 6, which gives -5, -1 and 6. There's Z prime. If we compute the mean of Z prime, we get 0: this is (-5 + -1 + 6) divided by 3, which works out to be 0. What we're actually doing is essentially pretending that the red dot at 6 was zero, so we are moving it back over to zero and shifting everything else over with it. In other words, the red dot at 6 moves to 0, and every other dot moves 6 to the left along with it.

The mean is a value representing the average of a set of numbers. The variance measures how spread out the values are around the mean. And when you mean-center data, you produce a new data set which has the same relationships between its elements but whose mean is 0. We'll discuss more reasons for doing this later.

In statistics and data science, the mean is an important measure of central tendency for a data set. Data sets are often large. In this case, large is 30, but large could be 3 million. Statisticians and data scientists often do not like large amounts of numbers or information; they want to summarize sets by small sets of numbers. Summarizing a set by its mean is about the simplest thing you can do, but it gives some information. What we're going to see here is an example of where it obviously doesn't give complete information.

Here is a set Z, which is 1, 5, 12. We're getting bored with this; it's our friend, and we get bored with our friends. Mu sub Z is 6; fine. Here's another set W: 5, 6 and 7. If you calculate mu sub W, it turns out that's also 6, and you can check for yourself that that's true. So
obviously the mean is not a unique classifier of a set. We have two sets with the same mean.

Let's look at these sets of numbers on the number line and see that the mean is not telling the whole story. Here is 0, here's the mean at 6, and let's draw Z in blue. So here's 1, 5, and 12. And let's say we'll draw W in yellow. W actually has a dot right here at 5, a dot right there at 6, and another dot at 7.

Blue and yellow are similar in that they both have the same mean, but they are clearly different sets. As you might say, blue is more spread out than yellow. Variance is a statistical, mathematical, data science concept that measures how spread out numerical values are. If X (x1, x2, ..., xn) represents these numbers, then the variance of X, sigma squared sub x, is equal to 1 over n times the sum from i = 1 to n of (xi minus mu sub x) squared. This equation can be intimidating at first glance, but it simply tells us that the variance is the sum of all of the squared distances between each xi and the mean, divided by n.

We ask how far a value xi is from the mean, or average. We square this distance, and the variance is the average of those squares.

The term inside the square, xi - mu sub x, is referred to as the deviation. The reason we square it is that we don't really care if you're to the right of the mean or the left of the mean; what we care about is how far away from it you are. So for example, if xi was 1 and the mean was 6, the squared deviation would be 25, a pretty big number. If xi was 5 and the mean was 6, it would be 1, not a particularly big number. And then essentially what we're doing here is taking the average of those squared numbers. We're taking their mean, which is why we divide by n. That's essentially what this variance formula does.

If we take sigma of x, which is just the square root of sigma squared, this is called the standard deviation of x.
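
The deviation, squaring and averaging steps can be sketched in a few lines of Python. This is a minimal illustration under the lecture's divide-by-n convention; the function names `mean`, `variance` and `standard_deviation` are illustrative choices, and the sample data is the set Z from earlier in the lecture.

```python
import math


def mean(xs):
    """Average of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def variance(xs):
    """Mean of the squared deviations from the mean (divides by n)."""
    mu = mean(xs)
    squared_deviations = [(x - mu) ** 2 for x in xs]
    return sum(squared_deviations) / len(xs)


def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))


Z = [1, 5, 12]
mu = mean(Z)
deviations = [x - mu for x in Z]        # signed: left of the mean is negative
print(deviations)                       # [-5.0, -1.0, 6.0]
print([d ** 2 for d in deviations])     # [25.0, 1.0, 36.0]
```

Squaring removes the sign, so a point 5 to the left of the mean counts the same as a point 5 to the right. If you would rather not write this yourself, Python's standard library provides `statistics.pvariance`, which also divides by n, and `statistics.variance`, which divides by n - 1 instead.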
To understand the concept of variance, it is helpful to work through a simple example. To do this, let's take two sets whose means we already know, and then see if we can figure out which one has more dispersion. For these two examples, Z is 1, 5, 12 and W is 5, 6, 7. The mean of Z is 6, which turns out to be the mean of W too, because that's how we cooked it.

Let's start with the easy one. Let's call the elements of W w1, w2, w3, and the elements of Z z1, z2, z3. The sigma squared of W is going to be, in this case n is 3, so 1 over 3 times the sum from i = 1 to 3 of (wi minus the mean of W) squared. So now, the first one, w1, is 5, so this is one-third times ((5 - 6) squared + (6 - 6) squared + (7 - 6) squared). And if we work that out, that turns out to be one-third times ((-1) squared + 0 squared + 1 squared), which is one-third times 2.

The variance of W is two-thirds, which means that the standard deviation of W is the square root of two-thirds. If we do the same calculation for Z, we find that the variance of Z is one-third times ((1 - 6) squared + (5 - 6) squared + (12 - 6) squared), which is one-third times (25 + 1 + 36), or 62 over 3, a bit more than 20. The point is that this is much, much greater than two-thirds, which justifies our intuition that Z and W have the same mean but Z is more spread out, as measured by its variance.
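
The hand arithmetic for W and Z can be checked exactly with a short Python sketch. It uses `fractions.Fraction` from the standard library so that the results come out as exact thirds rather than rounded decimals. The function name `variance` is an illustrative choice, and it assumes the same divide-by-n convention as the worked example.

```python
from fractions import Fraction


def variance(xs):
    """Exact variance of a list of integers, dividing by n."""
    n = len(xs)
    mu = Fraction(sum(xs), n)                      # exact mean
    return sum((x - mu) ** 2 for x in xs) / n      # mean of squared deviations


W = [5, 6, 7]
Z = [1, 5, 12]
print(variance(W))                 # 2/3
print(variance(Z))                 # 62/3
print(variance(Z) / variance(W))   # 31
```

Both sets have mean 6, yet the variance of Z comes out 31 times the variance of W, which puts a number on how much more spread out the blue dots are than the yellow ones.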