Probability and Random Variables – Introduction to Probability

Introduction to Probability

Video transcript

Hi, so we’re gonna talk about probability now.
Now probability, I believe, is the most essential skill that
you need to make your case as a data scientist.
You don’t need a ton of advanced probability theory,
you just need a really solid grasp of the basics.
And that’s what I’m gonna give you here.
So this is gonna allow you to ask really solid,
maybe even pointed questions.
Things like, in this context, probability is not precisely
defined, but it should be, or are you sure this isn’t a case
of Simpson’s paradox, questions like that.
Now, I want to start by discussing the most common
mistakes that I see among people communicating with data.
So first, they discuss the notion of probability without
actually defining what it means in that context.
You have to define what a random variable means,
in order to talk about probabilities.
Now the word probability by itself is sort of meaningless.
Second, they treat correlation as causation.
Correlation is not causation, they are quite different.
Third, when a hypothesis test fails to reject the null hypothesis,
people often assume that the null hypothesis must be true.
Now hypothesis testing should not be done by people who don’t
know what hypothesis testing means.
And we’re gonna make sure that you’re not one of those people.
And actually if you goof up any of those things,
people are not going to take you seriously.
They’re gonna think that you don’t know
what you’re talking about.
And so this happens all the time,
because data scientists, who are often computer scientists,
don’t actually have the training in statistics that they need.
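
To make the third mistake concrete, here is a minimal sketch, not part of the lecture. It assumes a hypothetical coin whose true probability of heads is 0.6, only 10 flips, and an exact two-sided binomial test of the null hypothesis "the coin is fair" at the 0.05 level. All of these numbers and function names are illustrative assumptions. It computes how often such a test would actually reject the null even though the null is false.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k, n, p0=0.5):
    """Exact two-sided p-value: total probability, under the null p0,
    of all outcomes that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p0)
    return sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-9)  # tolerance for float ties
    )


def rejection_probability(n, true_p, alpha=0.05):
    """Chance that the test rejects 'the coin is fair' when the
    true heads probability is true_p (assumed values, for illustration)."""
    return sum(
        binom_pmf(k, n, true_p)
        for k in range(n + 1)
        if two_sided_p_value(k, n) < alpha
    )


power = rejection_probability(n=10, true_p=0.6)
print(f"Chance of rejecting the fair-coin null: {power:.3f}")
print(f"Chance of failing to reject it:         {1 - power:.3f}")
```

Under these assumed numbers the test rejects only about 5% of the time, so it almost always fails to reject a null hypothesis that is in fact false. Failing to reject is not evidence that the null is true.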
So I’m gonna start this lecture by making sure that you
know the basics of probability.
I’m gonna start with random variables.
Okay, so what’s a random variable?
A random variable assigns a numerical value to each possible
outcome of a random experiment.
Again, it’s something whose value depends on chance.
It has to have a numerical value.
The color of a car chosen at random is not a random variable,
but it would be if you assigned a number to each color that
you’re talking about.
So are the following random variables?
Today’s weather.
No, it’s not.
But if you said the number of millimeters of rainfall tomorrow
in Redmond, that would be a random variable.
Right, the number of millimeters of rainfall
over a certain time and place, that is a number.
The color of a car chosen at random.
That is not a random variable, color is not a number.
But if you assigned 1 if the next car we see is blue,
2 if it’s green, 4 if it’s black, then that would be,
cuz that’s numerical.
The result of a coin flip, heads or tails.
No, that’s not a random variable, but if you assign one for
heads and two for tails, then it would be.
The price of Microsoft stock.
Not yesterday’s price, cuz that’s not random, but
tomorrow’s price.
That would be a random variable.
The number of laps between yellow flags in an F1 race.
That’s another example of a random variable,
that’s a number.
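
The examples above can be sketched in code: a random variable is a rule that turns each outcome of a random experiment into a number. This is only an illustration. The function names are made up, and the color codes simply reuse the 1, 2, 4 assignment from the car example; any fixed assignment of numbers would do.

```python
import random

# Encoding taken from the car example; the specific numbers are arbitrary.
COLOR_CODE = {"blue": 1, "green": 2, "black": 4}
COIN_CODE = {"heads": 1, "tails": 2}


def car_color_rv(outcome):
    """Random variable: maps the observed color (an outcome) to a number."""
    return COLOR_CODE[outcome]


def coin_rv(outcome):
    """Random variable: maps heads or tails to a number."""
    return COIN_CODE[outcome]


rng = random.Random(0)  # seeded only so the run is repeatable

# The random experiment produces an outcome, which is not itself a number...
color = rng.choice(list(COLOR_CODE))
flip = rng.choice(list(COIN_CODE))

# ...and the random variable assigns a numerical value to that outcome.
print(color, "->", car_color_rv(color))
print(flip, "->", coin_rv(flip))
```

The color or the coin face is the outcome; the number returned by the function is the value of the random variable.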
Now, random variables come in two flavors, discrete and
continuous.
A discrete random variable has a set of possible
outcomes that you could count.
So like the number of truffles in a box, for instance.
I assume that the machines that put the truffles in a box
are not so consistent that you always get
exactly the same number of truffles in a box.
Truffles, Cheerios, peanuts, you can count those, so that would
be a discrete random variable, the number of truffles in a box.
Now continuous random variables are different in that you
can’t count the number of possible outcomes,
like you can’t count the possible amounts of ice cream here.
Yeah, you can have one cup of ice cream, or
two cups of ice cream.
But you can have anything in between, and you can’t count
the possibilities because the amount could take on any real value.
Now, in case you were wondering,
this is actually a raisin on top of that ice cream.
I got compote on top of the ice cream.
So the number of raisins there would be discrete,
but the amount of ice cream would be continuous.
So we’re gonna talk first about discrete random variables and
then we’re gonna talk about continuous.
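
As a rough illustration of the discrete versus continuous distinction, here is a small simulation. The ranges are invented for the example (10 to 14 truffles per box, 1 to 2 cups of ice cream), and the uniform distributions are an assumption made only to keep the sketch short.

```python
import random

rng = random.Random(42)  # seeded only so the run is repeatable


def truffles_in_box(rng):
    """Discrete: a count. Assumes a box holds 10 to 14 truffles (made-up range)."""
    return rng.randint(10, 14)


def cups_of_ice_cream(rng):
    """Continuous: any real amount between 1 and 2 cups (made-up range)."""
    return rng.uniform(1.0, 2.0)


counts = [truffles_in_box(rng) for _ in range(1000)]
amounts = [cups_of_ice_cream(rng) for _ in range(1000)]

# The discrete variable can only land on a handful of countable values.
print("distinct truffle counts:", sorted(set(counts)))

# The continuous variable essentially never repeats a value.
print("distinct ice cream amounts:", len(set(amounts)), "out of", len(amounts))
```

Floating-point numbers are technically finite, so the computer only approximates a continuous variable, but the contrast with the five possible truffle counts is still visible.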
