- Data Science Essentials & Machine Learning
Probability and Random Variables – Introduction to Probability
Video transcript
- Hi, so we’re gonna talk about probability now.
- Now probability, I believe, is the most essential skill that
- you need to make your case as a data scientist.
- You don’t need a ton of advanced probability theory,
- you just need a really solid grasp of the basics.
- And that’s what I’m gonna give you here.
- So this is gonna allow you to ask really solid,
- maybe even pointed questions.
- Things like, “In this context, probability is not precisely
- defined, but it should be,” or “Are you sure this isn’t a case
- of Simpson’s paradox?” Questions like that.
- Now, I want to start by discussing the most common
- mistakes that I see among people communicating with data.
- So first, they discuss the notion of probability without
- actually defining what it means in that context.
- You have to define what a random variable means,
- in order to talk about probabilities.
- Now the word probability by itself is sort of meaningless.
- Now, correlation is not causation,
- they are quite different.
- Now, when a hypothesis test fails to reject the null hypothesis,
- people often assume that the null hypothesis must be true.
- Now hypothesis testing should not be done by people who don’t
- know what hypothesis testing means.
- And we’re gonna make sure that you’re not one of those people.
- And actually if you goof up any of those things,
- people are not going to take you seriously.
- They’re gonna think that you don’t know
- what you’re talking about.
- And so this happens all the time,
- because data scientists, who are often computer scientists,
- don’t actually have the training in statistics that they need.
- So I’m gonna start this lecture by making sure that you
- know the basics of probability.
- I’m gonna start with random variables.
- Okay, so what’s a random variable?
- A random variable assigns a numerical value to each possible
- outcome of a random experiment.
- Again, it’s something whose value depends on chance.
- It has to have a numerical value.
- The color of a car chosen at random is not a random variable,
- but it would be if you assigned a number to each color that
- you’re talking about.
- So are the following random variables?
- Today’s weather.
- No, it’s not.
- But if you said, the number of millimeters of rainfall tomorrow
- in Redmond, that would be a random variable.
- Right, the number of inches of rainfall
- over a certain time and place, that is a number.
- The color of a car chosen at random.
- That is not a random variable, color is not a number.
- But if you assigned 1, if the next car we see is blue,
- 2 if it’s green, 4 if it’s black, then that would be,
- cuz that’s numerical.
- The result of a coin flip, heads or tails.
- No, that’s not a random variable but if you assign one for
- heads and two for tails, then it would be.
- The price of Microsoft stock.
- Not yesterday’s price, cuz that’s not random, but
- tomorrow’s price.
- That would be a random variable.
- The number of laps between yellow flags in an F1 race.
- That’s another example of a random variable,
- that’s a number.
- Now, random variables come in two flavors, discrete and
- continuous.
- A discrete random variable has a number of
- outcomes that you could count.
- So, the number of truffles in a box, for instance.
- I assume that the machines that put the truffles in a box
- are not so consistent that you always get
- exactly the same number of truffles in a box.
- Truffles, Cheerios, peanuts, you can count those, so that would
- be a discrete random variable, the number of truffles in a box.
- Now continuous random variables are different in that you
- can’t count the number of possible outcomes,
- like you can’t count the amount of ice cream here.
- Yeah, you can have one cup of ice cream, or
- two cups of ice cream.
- But you can have anything in between, and you can’t count
- the possibilities because the amount could take on any real value.
- Now, in case you were wondering,
- this is actually a raisin on top of that ice cream.
- I got compote on top of the ice cream.
- So the number of raisins there would be discrete,
- but the amount of ice cream would be continuous.
- So we’re gonna talk first about discrete random variables and
- then we’re gonna talk about continuous.
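
The definitions in the transcript can be sketched in Python. This is a minimal illustration and not part of the lecture: the function names, the 10 to 14 truffle range, the 1 to 2 cup range, and the equal chance given to each car color are all assumptions made for the example. Only the color codes 1, 2, and 4 come from the transcript.

```python
import random

# Color codes from the transcript: 1 for blue, 2 for green, 4 for black.
COLOR_CODES = {"blue": 1, "green": 2, "black": 4}


def car_color_variable(rng):
    """Random variable: map the observed color (the outcome) to a number."""
    # Assumption: each color is equally likely.
    outcome = rng.choice(sorted(COLOR_CODES))
    return COLOR_CODES[outcome]


def truffles_in_box(rng):
    """Discrete random variable: a count you could list out one by one."""
    # Assumption: the machine puts between 10 and 14 truffles in a box.
    return rng.randint(10, 14)


def cups_of_ice_cream(rng):
    """Continuous random variable: any real amount in a range."""
    # Assumption: a serving is somewhere between 1 and 2 cups.
    return rng.uniform(1.0, 2.0)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
color_draws = [car_color_variable(rng) for _ in range(5)]
truffle_draws = [truffles_in_box(rng) for _ in range(5)]
ice_cream_draws = [cups_of_ice_cream(rng) for _ in range(5)]

print("car color codes:", color_draws)
print("truffles per box:", truffle_draws)
print("cups of ice cream:", ice_cream_draws)
```

The color itself is not a random variable, but the number returned by `car_color_variable` is. The truffle counts are always whole numbers from a short list, while the ice cream amounts can land anywhere between the two endpoints.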