Introduction to Statistics – Z-Scores

Z-Scores

So this lecture is all about how to view data with respect to
other data.
So if you’re telling me that you’re an excellent salesman
because you sold $100,000 worth of widgets,
I have no idea what that means.
You can be terrible compared to your peers, but
how would I know that based on what you told me?
If you tell me the z-score or what percentile you were,
then that’s a different story, right, that’s meaningful.
Obviously, covariance and correlation are about
the relationships between two random variables.
Let’s start.
It’s helpful to think of values relative to other values within
the same distribution, and
that’s what a z-score tells you, it tells you where a point is
relative to other points in the distribution.
It’s the number of standard deviations above or
below the mean for a particular point.
If I tell you that I just sold 1200 units,
that doesn’t mean much, because you have no idea how good
a salesman I am; you don’t have enough context.
What if I told you the mean was 1000 units?
Still, that doesn’t tell you very much.
You have no idea how unusual it is to reach 1200 units.
Is that very unusual?
Or was I just slightly above the mean?
What you need also is the standard deviation.
If I tell you it is 100 units, then you know I’m in business
because I’m selling two standard deviations above the mean.
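The arithmetic behind "two standard deviations above the mean" can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative choices, not something from the lecture; the numbers are the ones quoted above.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

units_sold = 1200   # my sales, from the example above
mean_units = 1000   # mean sales
std_units = 100     # standard deviation of sales

z = z_score(units_sold, mean_units, std_units)
print(z)  # 2.0
```

A salesperson who sold 900 units would get a z-score of -1 from the same function, that is, one standard deviation below the mean.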
Now, say here’s a histogram, and
I’m selling here at 1200 and the mean is over here.
Now the vast majority of other salespeople sell much
less than 1200, so I’m at the top of the pile if I’m up here.
Of course, this assumes that sales figures have an approximately
normal distribution, which may or may not be true for
a specific company, but we’ll let that go for now.
The z-score of a point x is the number of standard deviations
above or below the mean of X.
And an easy way to compute that is to use this formula
right here.
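The slide with the formula is not part of the transcript. It is presumably the standard definition of the z-score, written here with mu for the mean of X and sigma for its standard deviation (the symbols are an assumption based on how the lecture names them later):

```latex
z = \frac{x - \mu}{\sigma}
```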
But that looks a little complicated, so
let’s break it down a bit.
Well, let us start with the original PDF of X.
Let’s say that this is X’s PDF, where X is a random variable.
Then I’m gonna subtract the mean, so now this
thing has mean 0, because subtracting the mean shifts it over.
When I divide by the standard deviation here,
I actually squish the distribution, so
the distribution now has mean 0 and standard deviation 1.
When I think about one standard deviation above the mean of X,
it’s exactly at the point 1 of this new distribution where
z is 1.
What I did when I subtracted the mean and divided by the standard
deviation is that I shifted the distribution to have mean 0 and
I scaled it to have standard deviation 1;
that is, I standardized the distribution.
In this way, z measures how many standard deviations X is
above or below the mean.
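As a rough illustration of the shift-and-scale step, the sketch below standardizes a small list of numbers and checks that the result has mean 0 and standard deviation 1. The `sales` values are made up for the example, and `standardize` is a hypothetical helper name; the list is treated as the whole population, so the population standard deviation is used.

```python
import statistics

def standardize(values):
    """Shift values to mean 0, then scale them to standard deviation 1."""
    mu = statistics.fmean(values)       # mean of the values
    sigma = statistics.pstdev(values)   # population standard deviation
    return [(v - mu) / sigma for v in values]

# Made-up sales figures, purely for illustration.
sales = [850, 920, 1000, 1010, 1080, 1140]
z_values = standardize(sales)

# Up to floating-point rounding, these are 0 and 1 respectively.
print(statistics.fmean(z_values))
print(statistics.pstdev(z_values))
```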
Now if you’re working with data, you don’t actually have
the mean mu, you only have the sample mean X bar.
People often get confused and call these z-scores (in fact,
I do it myself), but they’re really sample z-scores.
The sample z-score of X is the number of sample
standard deviations above or below the sample mean.
Okay, so here is a histogram of my data, and
you can see that the sample mean is 1000.
And my sample z-score is about 2 because
I’m 2 sample standard deviations above the sample mean.
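A minimal sketch of a sample z-score, assuming the usual convention that the sample standard deviation divides by n - 1 (the lecture does not say which convention it uses). The three data points are invented so that the sample mean comes out at 1000 and the sample standard deviation at 100, matching the running example; `sample_z_score` is a hypothetical name.

```python
import statistics

def sample_z_score(x, sample):
    """Sample standard deviations that x lies above or below the sample mean."""
    x_bar = statistics.fmean(sample)   # sample mean
    s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
    return (x - x_bar) / s

# Invented data: sample mean 1000, sample standard deviation 100.
sample = [900, 1000, 1100]

print(sample_z_score(1200, sample))  # about 2
```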
Just to give you some perspective, let me show
you how rare that actually is.
Here is a standard normal with mean 0 and variance 1.
Now it turns out that about 68% of the time,
you’re within one standard deviation of the mean.
Now you can’t calculate this analytically, by the way;
you actually need a computer to get that 68%.
It also turns out that about 95% of the time,
you’re within two standard deviations of the mean.
And 99.7% of the time,
you’re within three standard deviations of the mean.
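One way to get these percentages from a computer is through the error function, which Python exposes as `math.erf`. For a standard normal Z, the probability of landing within k standard deviations of the mean is erf(k / sqrt(2)). The helper name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z, computed with the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The 68%, 95%, and 99.7% figures quoted above are these values rounded.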
Now you can put it into context.
So I sold 1200 units, the mean is 1000, and
the standard deviation is 100, so the z-score is 2.
So I sold two standard deviations above the mean, and
the probability of selling that much or more is only about 2.5%.
So I’m a pretty unusual sales person.
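The 2.5% figure follows from the rule of thumb: if about 95% of the distribution lies within two standard deviations, about 5% lies outside, and half of that is in the upper tail. A hedged sketch of the more exact calculation, assuming sales really are normally distributed, uses the complementary error function `math.erfc`; the name `upper_tail` is illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = upper_tail(2.0)
print(round(p, 4))  # 0.0228
```

So the exact normal tail beyond a z-score of 2 is closer to 2.3%, a little rarer than the rounded 2.5%.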