Probability for Data Science and Machine Learning

Probability

Probability is a measure of the likelihood that a particular event will occur. It quantifies uncertainty and helps in making predictions about future events based on past data. The value of probability ranges between 0 and 1, where 0 indicates that an event will not occur and 1 indicates that an event will certainly occur.

Important Terms in Probability:

Experiment

An experiment is a process or action that results in one or more outcomes. Each repetition of an experiment is called a trial. For example, flipping a coin or rolling a dice are common experiments in probability theory.

Sample Space

The sample space, often denoted as S, is the set of all possible outcomes of an experiment. For example:

 The sample space of flipping a coin is
 S = {Heads, Tails}

 The sample space of rolling a six-sided dice is
 S = {1, 2, 3, 4, 5, 6}

Event

An event is a specific outcome or a set of outcomes of an experiment. Formally, an event is any subset of the sample space. For example:

 Getting a “Heads” when flipping a coin is an event:
 A = {Heads}

 Rolling an even number on a six-sided dice is an event:
 B = {2, 4, 6}

Probability of an Event

When all outcomes in the sample space are equally likely, the probability of an event is calculated by dividing the number of favorable outcomes by the total number of possible outcomes in the sample space. If A is an event, the probability of A, denoted as P(A), is given by:

P(A) = Number of favorable outcomes / Total number of possible outcomes

For example, the probability of getting “Heads” when flipping a coin is:

P(Heads) = 1 / 2

The probability of rolling an even number on a six-sided dice is:

P(Even) = 3 / 6 = 1 / 2
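
The counting rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes equally likely outcomes; the helper name classical_probability is hypothetical and chosen only for this example.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) when every outcome in sample_space is equally likely."""
    favorable = event & sample_space  # outcomes of the event that can actually occur
    return Fraction(len(favorable), len(sample_space))

coin = {"Heads", "Tails"}
die = {1, 2, 3, 4, 5, 6}

p_heads = classical_probability({"Heads"}, coin)
p_even = classical_probability({2, 4, 6}, die)
print(p_heads, p_even)  # 1/2 1/2
```

Exact fractions are used instead of floats so the results match the hand calculation term for term.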

These basic terms and concepts form the foundation of probability theory, which is essential for data science and machine learning. Understanding these terms will help in comprehending more advanced topics and algorithms that rely on probabilistic methods.

When to Add and When to Multiply in Probability

Understanding when to add and when to multiply probabilities is essential for solving different types of probability problems. The rules depend on whether the events are mutually exclusive or independent.

Adding Probabilities

You add probabilities when you are dealing with mutually exclusive events. Mutually exclusive events are events that cannot happen at the same time. For example, when flipping a coin, getting “Heads” and getting “Tails” are mutually exclusive events.

The formula for the probability of either event A or event B occurring (denoted as A or B) is:

P(A or B) = P(A) + P(B)

This rule applies only if A and B are mutually exclusive.

For example, if you roll a six-sided dice, the probability of rolling a 2 or a 4 is:

P(2 or 4) = P(2) + P(4) = 1/6 + 1/6 = 2/6 = 1/3

Multiplying Probabilities

You multiply probabilities when you are dealing with independent events. Independent events are events where the occurrence of one event does not affect the occurrence of the other event. For example, flipping a coin and rolling a dice are independent events.

The formula for the probability of both event A and event B occurring (denoted as A and B) is:

P(A and B) = P(A) * P(B)

For example, if you flip a coin and roll a six-sided dice, the probability of getting “Heads” and rolling a 3 is:

P(Heads and 3) = P(Heads) * P(3) = 1/2 * 1/6 = 1/12

So, in summary:

Add probabilities for mutually exclusive events: P(A or B) = P(A) + P(B)

Multiply probabilities for independent events: P(A and B) = P(A) * P(B)

These rules help in calculating the probability of combined events, whether they occur one after another or as alternatives. Understanding when to add and when to multiply probabilities is crucial for accurate probability calculations in data science and machine learning.
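
As a rough check of both rules, the sketch below enumerates the outcomes directly and compares the counts with the sum and the product. The variable names are illustrative assumptions, not part of any library.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["Heads", "Tails"]

# Mutually exclusive events on one roll: rolling a 2 or rolling a 4.
p_2_or_4 = Fraction(sum(1 for d in die if d in (2, 4)), len(die))
assert p_2_or_4 == Fraction(1, 6) + Fraction(1, 6)

# Independent events: one coin flip combined with one roll.
pairs = list(product(coin, die))  # 12 equally likely (coin, die) pairs
p_heads_and_3 = Fraction(
    sum(1 for c, d in pairs if c == "Heads" and d == 3), len(pairs)
)
assert p_heads_and_3 == Fraction(1, 2) * Fraction(1, 6)

print(p_2_or_4, p_heads_and_3)  # 1/3 1/12
```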

Conditional Probability

Conditional probability is the probability of an event occurring given that another event has already occurred. It helps in understanding the likelihood of an event under a specific condition or scenario. Conditional probability is denoted by P(A|B), which reads as “the probability of event A given event B.”

Formula

The formula for conditional probability is:

P(A|B) = P(A ∩ B) / P(B)

Where:

P(A|B) is the conditional probability of event A given event B.

P(A ∩ B) is the probability of both event A and event B occurring.

P(B) is the probability of event B occurring.

Example

Consider an example of drawing a single card from a standard deck of playing cards. Let event A be drawing a red card, and event B be drawing a heart card.

The probability of drawing a heart card (event B) from a standard deck is P(B) = 13/52 = 1/4.
The probability of drawing a card that is both red and a heart (event A and event B) is P(A ∩ B) = 13/52 = 1/4, because every heart is red.

Using the formula for conditional probability:

P(A|B) = P(A ∩ B) / P(B) = (1/4) / (1/4) = 1

So, the conditional probability of drawing a red card given that a heart card is drawn is 1, indicating that if a heart card is drawn, it must be red.
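
The same conditional probability can be obtained by counting cards. The sketch below builds a 52-card deck as (rank, suit) pairs; the deck representation and the helper names prob, is_red and is_heart are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = list("A23456789") + ["10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def prob(event):
    """Probability that a single drawn card satisfies the predicate `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_heart(card):
    return card[1] == "hearts"

p_b = prob(is_heart)                                   # P(B)
p_a_and_b = prob(lambda c: is_red(c) and is_heart(c))  # P(A ∩ B)
p_a_given_b = p_a_and_b / p_b                          # P(A|B)
print(p_b, p_a_and_b, p_a_given_b)  # 1/4 1/4 1
```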

Addition Theorem of Probability

The Addition Theorem of Probability, also known as the Addition Rule, is a fundamental concept in probability theory. It provides a method for calculating the probability of the union of two events.

Statement

The Addition Theorem states that the probability of the union of two events A and B is equal to the sum of their individual probabilities minus the probability of their intersection:

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Where:

P(A ∪ B) is the probability of either event A or event B occurring, or both.

P(A) is the probability of event A occurring.

P(B) is the probability of event B occurring.

P(A ∩ B) is the probability of both event A and event B occurring.

Example

Consider a standard deck of playing cards. Let event A be drawing a red card, and event B be drawing a heart card.

The probability of drawing a red card (event A) from a standard deck is P(A) = 26/52 = 1/2.
The probability of drawing a heart card (event B) from a standard deck is P(B) = 13/52 = 1/4.
The probability of drawing a red heart card (event A ∩ B) from a standard deck is P(A ∩ B) = 13/52 = 1/4.

Using the Addition Theorem:

 P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
 = 1/2 + 1/4 – 1/4
 = 1/2

So, the probability of drawing either a red card or a heart card (or both) from a standard deck is 1/2.
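
A short sketch can confirm the Addition Theorem on this example by treating the events as Python sets, so that union and intersection map onto the | and & operators. The deck layout is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = list("A23456789") + ["10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards

red = {card for card in deck if card[1] in ("hearts", "diamonds")}  # event A
hearts = {card for card in deck if card[1] == "hearts"}             # event B

def prob(event):
    """Probability of an event given as a set of cards."""
    return Fraction(len(event), len(deck))

p_union = prob(red | hearts)                            # counted directly
p_rule = prob(red) + prob(hearts) - prob(red & hearts)  # Addition Theorem
print(p_union, p_rule)  # 1/2 1/2
```

Subtracting P(A ∩ B) matters here because the 13 hearts would otherwise be counted twice.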

Multiplication Theorem of Probability

The Multiplication Theorem of Probability, also known as the Multiplication Rule, is another fundamental concept in probability theory. It provides a method for calculating the probability of the intersection of two events.

Statement

The Multiplication Theorem states that the probability of the intersection of two events A and B is equal to the probability of event A occurring multiplied by the conditional probability of event B occurring given that event A has already occurred:

P(A ∩ B) = P(A) * P(B|A)

Where:

P(A ∩ B) is the probability of both event A and event B occurring.

P(A) is the probability of event A occurring.

P(B|A) is the conditional probability of event B occurring given that event A has already occurred.

Example

Consider a standard deck of playing cards from which a single card is drawn. Let event A be drawing a red card, and event B be drawing a heart card.

The probability of drawing a red card (event A) from a standard deck is P(A) = 26/52 = 1/2.
If a red card is drawn (event A has occurred), the card is one of the 26 red cards, 13 of which are hearts, so the probability that it is a heart (event B) is P(B|A) = 13/26 = 1/2.

Using the Multiplication Theorem:

 P(A ∩ B) = P(A) * P(B|A)
 = (1/2) * (1/2)
 = 1/4

So, the probability of drawing a card that is both red and a heart from a standard deck is 1/4, which matches the direct count of 13 hearts out of 52 cards.

Random Variable

In probability theory and statistics, a random variable is a variable whose possible values are outcomes of a random phenomenon. It represents a numerical outcome of a random experiment, often denoted by a letter such as X, Y, or Z.

Definition

Formally, a random variable X is a function that takes an outcome of the experiment as input and returns a real number as output. It assigns a numerical value to each outcome in the sample space of the experiment. Mathematically, we can express a random variable as follows:

X: S -> Real Numbers

Where X is the random variable, S is the sample space (the set of all possible outcomes), and Real Numbers represents the set of all real numbers.

Example

Consider the experiment of tossing 3 coins. Let’s define a random variable X that represents the number of heads obtained in the experiment. We can define X as follows:

X: {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} -> {0, 1, 2, 3}

For each outcome in the sample space, the output of the random variable function X will be:

 X(HHH) = 3
 X(HHT) = 2
 X(HTH) = 2
 X(THH) = 2
 X(HTT) = 1
 X(THT) = 1
 X(TTH) = 1
 X(TTT) = 0

In this example, the random variable X takes an outcome of tossing 3 coins as input and returns a number (the number of heads) associated with that outcome. Each outcome in the sample space is associated with exactly one value of X, demonstrating that X is indeed a function mapping outcomes to real numbers.
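
Because a random variable is just a function on the sample space, it can be written as an ordinary Python function. The sketch below is one possible encoding, with outcomes represented as strings such as "HHT".

```python
from itertools import product

# All 8 outcomes of tossing 3 coins, e.g. "HHT".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

values = {outcome: X(outcome) for outcome in sample_space}
for outcome, value in values.items():
    print(f"X({outcome}) = {value}")
```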

Probability Distribution of Random Variable

In probability theory, the probability distribution of a random variable describes the likelihood of each possible outcome of the variable. It provides a mapping between the values of the random variable and their associated probabilities.

Definition

For a random variable X, its probability distribution P(X) assigns probabilities to each possible value that X can take. Mathematically, for a discrete random variable X, the probability distribution is typically represented as a probability mass function (PMF), denoted as P(X = x), where x is a specific value that X can take.

Example

Consider the experiment of tossing 3 coins. Let X be the random variable representing the number of heads obtained in the experiment. For each outcome in the sample space, the probability distribution of X is as follows:

 P(X = 0) = 1/8
 P(X = 1) = 3/8
 P(X = 2) = 3/8
 P(X = 3) = 1/8

In this example, each value of X has a probability equal to the number of outcomes that produce it divided by 8. For instance, three of the eight outcomes contain exactly one head, so P(X = 1) = 3/8. The four probabilities sum to 1. The probability distribution of X describes the likelihood of each value and forms the basis for analyzing the behavior of the random variable.
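
The PMF above can be reproduced by counting how many of the eight equally likely outcomes give each number of heads. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes

# Count how many outcomes produce each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability mass function: value of X -> P(X = value).
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}
for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```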

Mean of Random Variable

In probability theory, the mean of a random variable represents the average value of all possible outcomes weighted by their respective probabilities. It is also known as the expected value of the random variable.

Definition

The mean μ of a random variable X is calculated by summing the product of each possible value of X and its corresponding probability. Mathematically, for a discrete random variable X, the mean is given by:

Mean(X) = Σ(x * P(X = x))

Where x represents each possible value that X can take, and P(X = x) is the probability of X taking the value x.

For a continuous random variable X, the mean is given by the integral:

Mean(X) = ∫(x * f(x)) dx

Where f(x) is the probability density function of X.

Example

Consider the experiment of tossing three coins. Let X be the random variable representing the number of heads obtained in the experiment. The sample space for this experiment is {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, and the probability distribution of X is:

 P(X = 0) = 1/8
 P(X = 1) = 3/8
 P(X = 2) = 3/8
 P(X = 3) = 1/8

The mean of X can be calculated as:

 Mean(X) = (0 * 1/8) + (1 * 3/8) + (2 * 3/8) + (3 * 1/8)
 = (0/8) + (3/8) + (6/8) + (3/8)
 = 12/8
 = 1.5

So, the mean of the random variable X representing the number of heads obtained in tossing three coins is 1.5.
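
The discrete mean formula translates directly into a weighted sum. The sketch below hard-codes the PMF from the example and uses exact fractions.

```python
from fractions import Fraction

# PMF of X = number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Mean(X) = sum of x * P(X = x) over all values x.
mean = sum(x * p for x, p in pmf.items())
print(mean, float(mean))  # 3/2 1.5
```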

Variance of Random Variable

In probability theory, the variance of a random variable measures how much the values of the random variable differ from the mean. It provides a measure of the spread or dispersion of the random variable’s values around the mean.

Definition

The variance of a random variable X, denoted as Var(X), is calculated as the probability-weighted average of the squared differences between each value of X and the mean of X. Mathematically, for a discrete random variable X, the variance is given by:

Var(X) = Σ((x – μ)^2 * P(X = x))

Where x represents each possible value that X can take, μ is the mean of X, and P(X = x) is the probability of X taking the value x.

For a continuous random variable X, the variance is calculated similarly using the integral:

Var(X) = ∫((x – μ)^2 * f(x)) dx

Where f(x) is the probability density function of X.

Example

Consider the same experiment of tossing three coins, with the random variable X representing the number of heads obtained. We’ve already calculated that the mean of X is 1.5.

The variance of X can be calculated as:

 Var(X) = ((0 – 1.5)^2 * 1/8) + ((1 – 1.5)^2 * 3/8) + ((2 – 1.5)^2 * 3/8) + ((3 – 1.5)^2 * 1/8)
 = (2.25 * 1/8) + (0.25 * 3/8) + (0.25 * 3/8) + (2.25 * 1/8)
 = 0.28125 + 0.09375 + 0.09375 + 0.28125
 = 0.75

So, the variance of the random variable X representing the number of heads obtained in tossing three coins is 0.75.
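
The variance calculation can be sketched the same way. As a cross-check, the code also evaluates the standard identity Var(X) = E[X^2] – μ^2, which is not used in the text above but gives the same value.

```python
from fractions import Fraction

# PMF of X = number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())

# Definition: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Cross-check with the identity Var(X) = E[X^2] - mean^2.
second_moment = sum(x ** 2 * p for x, p in pmf.items())
variance_alt = second_moment - mean ** 2

print(variance, variance_alt)  # 3/4 3/4
```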

Joint Probability Distribution

In probability theory and statistics, a joint probability distribution represents the probability of two or more random variables taking on specific values simultaneously. It provides a comprehensive way to study the relationship between multiple random variables.

Definition

A joint probability distribution for two discrete random variables X and Y, denoted as P(X = x, Y = y), gives the probability that X takes on the value x and Y takes on the value y at the same time.

For continuous random variables, the joint probability distribution is described by a joint probability density function f(x, y). The density itself is not a probability; integrating it over a region gives the probability that X and Y fall within that region.

Joint Probability Mass Function (Discrete Case)

For discrete random variables X and Y, the joint probability mass function (pmf) is defined as:

P(X = x, Y = y) = Probability that X = x and Y = y

Joint Probability Density Function (Continuous Case)

For continuous random variables X and Y, the joint probability density function (pdf) is defined as:

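f(x, y) = Probability density at the point (x, y)

Probabilities are obtained by integrating the density over a region, for example P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫∫ f(x, y) dx dy, with x running from a to b and y from c to d.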

Example

Consider a simple example with two discrete random variables X and Y, representing the outcome of tossing three coins. Let X be the number of heads in the first two tosses and Y be the number of heads in the last two tosses. The sample space for each coin toss is {H, T}.

The joint probability distribution of X and Y can be represented as a table where each cell represents the probability P(X = x, Y = y):

        Y=0   Y=1   Y=2
 X=0    1/8   1/8   0
 X=1    1/8   2/8   1/8
 X=2    0     1/8   1/8

Here’s the breakdown of how the probabilities are calculated:

 P(X = 0, Y = 0) = P(TTT) = 1/8
 P(X = 0, Y = 1) = P(TTH) = 1/8
 P(X = 0, Y = 2) = 0 (X = 0 requires the second toss to be tails, so Y can be at most 1)
 P(X = 1, Y = 0) = P(HTT) = 1/8
 P(X = 1, Y = 1) = P(HTH) + P(THT) = 1/8 + 1/8 = 2/8
 P(X = 1, Y = 2) = P(THH) = 1/8
 P(X = 2, Y = 0) = 0 (X = 2 requires the second toss to be heads, so Y is at least 1)
 P(X = 2, Y = 1) = P(HHT) = 1/8
 P(X = 2, Y = 2) = P(HHH) = 1/8

Since the coin tosses are fair, each of the eight outcomes has probability 1/8 and contributes to exactly one cell, so the entries of the table sum to 1. X and Y are not independent here, because both depend on the second toss.

This example shows how to set up and understand a joint probability distribution for two discrete random variables using the outcomes of tossing three coins.
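
A joint table of this kind can be built by enumerating the eight outcomes and recording the pair (X, Y) for each one. The sketch below does this with exact fractions; the slicing convention (first two tosses for X, last two for Y) follows the definition in the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes

joint_counts = Counter()
for outcome in outcomes:
    x = outcome[:2].count("H")  # heads in the first two tosses
    y = outcome[1:].count("H")  # heads in the last two tosses
    joint_counts[(x, y)] += 1

# Joint PMF over all (x, y) pairs; pairs that never occur get probability 0.
joint = {
    (x, y): Fraction(joint_counts[(x, y)], len(outcomes))
    for x in range(3)
    for y in range(3)
}

for x in range(3):
    print(f"X={x}:", [str(joint[(x, y)]) for y in range(3)])
```

Summing a row of the table gives the marginal probability P(X = x), and summing a column gives P(Y = y).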

Standard Deviation of a Random Variable

In probability theory and statistics, the standard deviation of a random variable is a measure of the amount of variation or dispersion in a set of values. It indicates how much the values of the random variable deviate from the mean (expected value) of the random variable.

Definition

The standard deviation of a random variable X, denoted as sigma (σ), is the square root of the variance of X. Mathematically, for a discrete random variable X, the standard deviation is given by:

σ(X) = sqrt(Var(X))

Where Var(X) is the variance of X.

For a continuous random variable X, the standard deviation is calculated similarly using the integral:

σ(X) = sqrt( ∫((x – μ)^2 * f(x)) dx )

Where μ is the mean of X, and f(x) is the probability density function of X.

Example

Consider the same experiment of tossing three coins, with the random variable X representing the number of heads obtained. We have already calculated that the mean (μ) of X is 1.5, and the variance (Var(X)) is 0.75.

The standard deviation of X can be calculated as:

 σ(X) = sqrt(Var(X))
 = sqrt(0.75)
 = 0.866

So, the standard deviation of the random variable X representing the number of heads obtained in tossing three coins is approximately 0.866.

The standard deviation provides a useful measure of the spread of the values around the mean. In this case, it tells us how much the number of heads obtained in the experiment typically deviates from the mean number of heads (1.5).

Covariance of Random Variables

In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable correspond with the greater values of the other variable, and the same holds for lower values, the covariance is positive. If greater values of one variable correspond with lower values of the other, the covariance is negative. If the variables are independent, the covariance is zero, although a covariance of zero does not by itself imply independence.

Definition

The covariance between two random variables X and Y, denoted as Cov(X, Y), is defined as the expected value of the product of their deviations from their respective means. Mathematically, for discrete random variables X and Y, the covariance is given by:

Cov(X, Y) = Σ((x – μX) * (y – μY) * P(X = x, Y = y))

Where:

x and y are the possible values of X and Y.
μX is the mean of X.
μY is the mean of Y.
P(X = x, Y = y) is the joint probability that X takes the value x and Y takes the value y.

For continuous random variables X and Y, the covariance is calculated using the integral:

Cov(X, Y) = ∫∫((x – μX) * (y – μY) * f(x, y)) dx dy

Where f(x, y) is the joint probability density function of X and Y.

Example

Consider the experiment of tossing three coins twice. Let X be the random variable representing the number of heads obtained in the first three tosses, and Y be the random variable representing the number of heads obtained in the second three tosses.

The possible values of X and of Y are {0, 1, 2, 3}. Because the two sets of tosses are independent, P(X = x, Y = y) = P(X = x) * P(Y = y), so the joint probability distribution of X and Y is as follows:

 P(X = 0, Y = 0) = 1/64
 P(X = 0, Y = 1) = 3/64
 P(X = 0, Y = 2) = 3/64
 P(X = 0, Y = 3) = 1/64
 P(X = 1, Y = 0) = 3/64
 P(X = 1, Y = 1) = 9/64
 P(X = 1, Y = 2) = 9/64
 P(X = 1, Y = 3) = 3/64
 P(X = 2, Y = 0) = 3/64
 P(X = 2, Y = 1) = 9/64
 P(X = 2, Y = 2) = 9/64
 P(X = 2, Y = 3) = 3/64
 P(X = 3, Y = 0) = 1/64
 P(X = 3, Y = 1) = 3/64
 P(X = 3, Y = 2) = 3/64
 P(X = 3, Y = 3) = 1/64

We first need to calculate the means μX and μY:

 μX = μY = (0 * 1/8) + (1 * 3/8) + (2 * 3/8) + (3 * 1/8)
 = 1.5

Then, we calculate the covariance using the joint probability distribution:

 Cov(X, Y) = Σ((x – μX) * (y – μY) * P(X = x, Y = y))
 = (0 – 1.5)(0 – 1.5) * 1/64 + (0 – 1.5)(1 – 1.5) * 3/64 + (0 – 1.5)(2 – 1.5) * 3/64 + (0 – 1.5)(3 – 1.5) * 1/64
 + (1 – 1.5)(0 – 1.5) * 3/64 + (1 – 1.5)(1 – 1.5) * 9/64 + (1 – 1.5)(2 – 1.5) * 9/64 + (1 – 1.5)(3 – 1.5) * 3/64
 + (2 – 1.5)(0 – 1.5) * 3/64 + (2 – 1.5)(1 – 1.5) * 9/64 + (2 – 1.5)(2 – 1.5) * 9/64 + (2 – 1.5)(3 – 1.5) * 3/64
 + (3 – 1.5)(0 – 1.5) * 1/64 + (3 – 1.5)(1 – 1.5) * 3/64 + (3 – 1.5)(2 – 1.5) * 3/64 + (3 – 1.5)(3 – 1.5) * 1/64
 = 0

In this example, the covariance between X and Y is 0, indicating that there is no linear relationship between the number of heads obtained in the first three coin tosses and the number of heads obtained in the second three coin tosses.
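
The sixteen-term sum is tedious by hand, so a small sketch helps. It builds the joint distribution as the product of the two marginal distributions, which assumes the two sets of tosses are independent, and then applies the covariance formula.

```python
from fractions import Fraction

# Marginal PMF of the number of heads in three fair coin tosses.
marginal = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Independent sets of tosses: joint probability = product of marginals.
joint = {(x, y): px * py for x, px in marginal.items() for y, py in marginal.items()}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# Cov(X, Y) = sum of (x - mu_x) * (y - mu_y) * P(X = x, Y = y).
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(mu_x, mu_y, cov)  # 3/2 3/2 0
```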

Bayes’ Theorem

In probability theory and statistics, Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. It provides a way to update the probability of a hypothesis as more evidence or information becomes available.

Definition

Bayes’ Theorem for two events A and B is given by:

P(A | B) = (P(B | A) * P(A)) / P(B)

Where:

P(A | B) is the conditional probability of event A given that event B has occurred.
P(B | A) is the conditional probability of event B given that event A has occurred.
P(A) is the prior probability of event A.
P(B) is the marginal probability of event B (the overall probability of the evidence).

Example

Consider an example involving medical testing. Suppose there is a disease that affects 1% of the population. A test for the disease is 99% accurate for those who have the disease (true positive rate) and 95% accurate for those who do not have the disease (true negative rate). We want to find the probability that a person has the disease given that they tested positive.

Let:

A be the event that a person has the disease.
B be the event that a person tests positive for the disease.

Given:

P(A) = 0.01 (prior probability of having the disease).
P(B | A) = 0.99 (probability of testing positive given having the disease).
P(B | not A) = 0.05 (probability of testing positive given not having the disease).

To find P(B), the total probability of testing positive, we use the law of total probability:

 P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)
 = 0.99 * 0.01 + 0.05 * 0.99
 = 0.0099 + 0.0495
 = 0.0594

Now, applying Bayes’ Theorem:

 P(A | B) = (P(B | A) * P(A)) / P(B)
 = (0.99 * 0.01) / 0.0594
 = 0.0099 / 0.0594
 = 0.1667

So, the probability that a person has the disease given that they tested positive is approximately 16.67%. This example illustrates how Bayes’ Theorem can be used to update the probability of a hypothesis (having the disease) based on new evidence (testing positive).
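
The calculation above can be wrapped in a small function. This is a sketch rather than a library routine; the function name bayes_posterior and its parameter names are hypothetical.

```python
def bayes_posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """P(disease | positive test) via Bayes' Theorem."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    evidence = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_disease * prior / evidence

posterior = bayes_posterior(prior=0.01, p_pos_given_disease=0.99, p_pos_given_healthy=0.05)
print(round(posterior, 4))  # 0.1667
```

Even with an accurate test, the low prior (1%) keeps the posterior small, because false positives from the healthy 99% of the population outnumber the true positives.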

Bernoulli Process

A Bernoulli process is a sequence of independent and identically distributed random variables, each taking on one of two possible outcomes, typically labeled as “success” (with probability p) and “failure” (with probability 1 – p). Each trial in a Bernoulli process follows a Bernoulli distribution, where the probability of success remains constant across all trials. The Bernoulli process forms the foundation for binomial distributions, where the number of successes in a fixed number of Bernoulli trials is counted.

Binomial Distribution

In probability theory and statistics, a binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, each with the same probability of success.

Definition

A random variable X follows a binomial distribution if it represents the number of successes in n independent Bernoulli trials, each with the same probability of success p. The probability mass function (pmf) of a binomial distribution is given by:

P(X = k) = C(n, k) * p^k * (1 – p)^(n – k)

Where:

C(n, k) = n! / (k! * (n – k)!) is the binomial coefficient, representing the number of ways to choose k successes from n trials.
n is the number of trials.
k is the number of successes.
p is the probability of success on each trial.
(1 – p) is the probability of failure on each trial.

Example

Consider an example where we flip a fair coin 3 times, and we want to find the probability of getting exactly 2 heads.

Let:

n = 3 (number of trials)
k = 2 (number of successes)
p = 0.5 (probability of success, i.e., getting a head on each trial)

Using the binomial distribution formula:

 P(X = 2) = C(3, 2) * (0.5)^2 * (1 – 0.5)^(3 – 2)
 = (3! / (2! * (3 – 2)!)) * (0.5)^2 * (0.5)^1
 = 3 * 0.25 * 0.5
 = 3 * 0.125
 = 0.375

So, the probability of getting exactly 2 heads in 3 coin flips is 0.375.

Properties

Mean: The mean (expected value) of a binomial distribution is given by:

Mean(X) = n * p

Variance: The variance of a binomial distribution is given by:

Var(X) = n * p * (1 – p)

Example for Mean and Variance

Using the same coin flip example:

n = 3 (number of trials)
p = 0.5 (probability of success)

The mean of the binomial distribution is:

 Mean(X) = n * p
 = 3 * 0.5
 = 1.5

The variance of the binomial distribution is:

 Var(X) = n * p * (1 – p)
 = 3 * 0.5 * 0.5
 = 0.75

So, for 3 coin flips with a fair coin, the expected number of heads is 1.5, and the variance is 0.75.
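
The PMF and both properties can be checked numerically. The sketch below uses math.comb from the Python standard library for the binomial coefficient; the function name binomial_pmf is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, 0.5
p_two_heads = binomial_pmf(2, n, p)

# Mean and variance computed from the PMF, for comparison with n*p and n*p*(1-p).
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(p_two_heads, mean, variance)  # 0.375 1.5 0.75
```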

Negative Binomial Distribution

In probability theory and statistics, the negative binomial distribution models the number of trials needed to achieve a specified number of successes in a sequence of independent and identically distributed Bernoulli trials, each with the same probability of success.

Definition

A random variable X follows a negative binomial distribution if it represents the number of trials required to achieve a specified number of successes r, where each trial has a probability of success p. The probability mass function (pmf) of a negative binomial distribution is given by:

P(X = k) = C(k – 1, r – 1) * p^r * (1 – p)^(k – r)

Where:

C(k – 1, r – 1) = (k – 1)! / ((r – 1)! * (k – r)!) is the binomial coefficient, representing the number of ways to place the first r – 1 successes (equivalently, the k – r failures) among the first k – 1 trials. The final, k-th trial must be the r-th success.
k is the total number of trials.
r is the number of successes.
p is the probability of success on each trial.
(1 – p) is the probability of failure on each trial.

Example

Consider an example where we flip a fair coin until we get 3 heads. We want to find the probability that it takes exactly 5 flips to get those 3 heads.

Let:

r = 3 (number of successes)
k = 5 (total number of trials)
p = 0.5 (probability of success, i.e., getting a head on each trial)

Using the negative binomial distribution formula:

 P(X = 5) = C(5 – 1, 3 – 1) * (0.5)^3 * (1 – 0.5)^(5 – 3)
 = C(4, 2) * (0.5)^3 * (0.5)^2
 = 6 * 0.125 * 0.25
 = 6 * 0.03125
 = 0.1875

So, the probability that it takes exactly 5 flips to get 3 heads is 0.1875.

Properties

Mean: The mean (expected value) of a negative binomial distribution is given by:

Mean(X) = r / p

Variance: The variance of a negative binomial distribution is given by:

Var(X) = r * (1 – p) / p^2

Example for Mean and Variance

Using the same coin flip example:

r = 3 (number of successes)
p = 0.5 (probability of success)

The mean of the negative binomial distribution is:

 Mean(X) = r / p
 = 3 / 0.5
 = 6

The variance of the negative binomial distribution is:

 Var(X) = r * (1 – p) / p^2
 = 3 * (1 – 0.5) / 0.5^2
 = 3 * 0.5 / 0.25
 = 6

So, for getting 3 heads in coin flips where each flip has a 0.5 probability of success, the expected number of flips is 6, and the variance is also 6.
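
A numerical sketch can confirm these values. Because X has no upper bound, the mean and variance are approximated by truncating the sums at a large k (400 here, an arbitrary cutoff chosen so that the remaining tail is negligible for p = 0.5).

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """P(X = k): the r-th success occurs exactly on trial k."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

r, p = 3, 0.5
p_five_flips = neg_binomial_pmf(5, r, p)

# Truncated sums approximate the mean and variance.
ks = range(r, 400)
mean = sum(k * neg_binomial_pmf(k, r, p) for k in ks)
variance = sum((k - mean) ** 2 * neg_binomial_pmf(k, r, p) for k in ks)

print(p_five_flips)                       # 0.1875
print(round(mean, 6), round(variance, 6))
```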

Geometric Distribution

In probability theory and statistics, the geometric distribution models the number of trials needed to get the first success in a sequence of independent and identically distributed Bernoulli trials, each with the same probability of success.

Definition

A random variable X follows a geometric distribution if it represents the number of trials required to achieve the first success, where each trial has a probability of success p. The probability mass function (pmf) of a geometric distribution is given by:

P(X = k) = (1 – p)^(k – 1) * p

Where:

k is the number of trials.
p is the probability of success on each trial.
(1 – p) is the probability of failure on each trial.

Example

Consider an example where we flip a fair coin until we get the first head. We want to find the probability that it takes exactly 3 flips to get that first head.

Let:

k = 3 (number of trials)
p = 0.5 (probability of success, i.e., getting a head on each trial)

Using the geometric distribution formula:

 P(X = 3) = (1 – 0.5)^(3 – 1) * 0.5
 = (0.5)^2 * 0.5
 = 0.25 * 0.5
 = 0.125

So, the probability that it takes exactly 3 flips to get the first head is 0.125.

Properties

Mean: The mean (expected value) of a geometric distribution is given by:

Mean(X) = 1 / p

Variance: The variance of a geometric distribution is given by:

Var(X) = (1 – p) / p^2

Example for Mean and Variance

Using the same coin flip example:

p = 0.5 (probability of success)

The mean of the geometric distribution is:

 Mean(X) = 1 / p
 = 1 / 0.5
 = 2

The variance of the geometric distribution is:

 Var(X) = (1 – p) / p^2
 = (1 – 0.5) / 0.5^2
 = 0.5 / 0.25
 = 2

So, for getting the first head in coin flips where each flip has a 0.5 probability of success, the expected number of flips is 2, and the variance is also 2.
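
The same kind of check works for the geometric distribution. The sketch below evaluates the PMF and approximates the mean and variance with sums truncated at k = 200, an arbitrary cutoff beyond which the terms are negligible for p = 0.5.

```python
def geometric_pmf(k, p):
    """P(X = k): the first success occurs exactly on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

p = 0.5
p_three_flips = geometric_pmf(3, p)

# Truncated sums approximate the mean 1/p and the variance (1 - p)/p^2.
ks = range(1, 201)
mean = sum(k * geometric_pmf(k, p) for k in ks)
variance = sum((k - mean) ** 2 * geometric_pmf(k, p) for k in ks)

print(p_three_flips)                      # 0.125
print(round(mean, 6), round(variance, 6))
```

The geometric distribution is the special case of the negative binomial distribution with r = 1.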

Normal Distribution

In probability theory and statistics, the normal distribution, also known as the Gaussian distribution, is one of the most important probability distributions. It describes how the values of a random variable are distributed and is characterized by its bell-shaped curve.

Definition

A random variable X is said to be normally distributed if it has the probability density function (pdf) given by:

f(x) = (1 / (σ * sqrt(2 * π))) * exp(-((x – μ)^2) / (2 * σ^2))

Where:

μ (mu) is the mean of the distribution.
σ (sigma) is the standard deviation of the distribution.
π (pi) is a constant approximately equal to 3.14159.
exp is the exponential function.

Properties

Mean: The mean (μ) of the normal distribution is the central point around which the values are distributed.

Variance: The variance (σ^2) measures the spread of the distribution. The standard deviation (σ) is the square root of the variance.

Symmetry: The normal distribution is symmetric around its mean (μ).

68-95-99.7 Rule: Approximately 68% of the data falls within one standard deviation (μ ± σ), 95% within two standard deviations (μ ± 2σ), and 99.7% within three standard deviations (μ ± 3σ).

Example

Consider a simple example where the heights of a group of people are normally distributed with a mean height (μ) of 170 cm and a standard deviation (σ) of 10 cm.

The probability density function for this normal distribution is:

f(x) = (1 / (10 * sqrt(2 * π))) * exp(-((x – 170)^2) / (2 * 10^2))

This function gives the probability density at any given height x. The density is not itself a probability; probabilities correspond to areas under the curve.

Calculating Probabilities

To find the probability that a person’s height is between 160 cm and 180 cm, we would calculate the area under the normal distribution curve between these two values. This is typically done using statistical tables or software.

For our example:

P(160 ≤ X ≤ 180) ≈ 0.6827

This result indicates that there is approximately a 68.27% chance that a randomly selected person from this group will have a height between 160 cm and 180 cm, which aligns with the 68-95-99.7 rule, since the interval is exactly μ ± σ.
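
In software, the area under the curve is usually obtained from the cumulative distribution function. The sketch below writes the normal CDF in terms of math.erf from the Python standard library; the helper name normal_cdf is an illustrative choice.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 170, 10

# Area under the curve between 160 cm and 180 cm (mu ± sigma).
p_within_1sd = normal_cdf(180, mu, sigma) - normal_cdf(160, mu, sigma)
p_within_2sd = normal_cdf(190, mu, sigma) - normal_cdf(150, mu, sigma)
p_within_3sd = normal_cdf(200, mu, sigma) - normal_cdf(140, mu, sigma)

print(round(p_within_1sd, 4), round(p_within_2sd, 4), round(p_within_3sd, 4))
```

The three values correspond to the 68%, 95% and 99.7% figures of the 68-95-99.7 rule.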

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. A random variable with this distribution is usually denoted by Z, and its probability density function is:

f(z) = (1 / sqrt(2 * π)) * exp(-z^2 / 2)

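Where z is the standard score or z-score, which represents the number of standard deviations a data point is from the mean. For a value x from a normal distribution with mean μ and standard deviation σ, it is computed as z = (x – μ) / σ.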
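
As a brief sketch, a z-score converts a question about any normal distribution into a question about the standard normal distribution. The height of 185 cm below is a hypothetical value chosen for illustration, using the mean of 170 cm and standard deviation of 10 cm from the earlier example.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = z_score(185, mu=170, sigma=10)  # hypothetical height of 185 cm
p_below = standard_normal_cdf(z)    # P(X <= 185) = P(Z <= z)
print(z, round(p_below, 4))         # 1.5 0.9332
```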