1. Definition and classification of random variables
Definition: A random variable is a variable whose value is determined by the outcome of a random experiment.
Categorical random variables (nominal)
Examples: Gender (male, female), occupation (civil servant, corporate employee, student, retired, unemployed), test result (negative, positive)
Ordered categorical random variables
Examples: attitude (strongly agree, agree, neutral, disagree, strongly disagree), frequency of use (once a week, once every two weeks, once every six months, rarely, never)
Numeric random variables
Examples: age (13, 14, 15, 16, ...), income (can take any value in a range)
2. Discrete Probability Distribution
2.1 Binomial Distribution
Definition: describes the probability of exactly $k$ successes in $n$ independent trials, where the probability of success in each trial is $p$.
Formula: $X \sim \text{Bin}(n, p)$
Probability Mass Function (PMF): $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$
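As a quick sanity check, both formulas can be evaluated with Python's standard library alone (a minimal sketch; the helper names `binom_pmf` and `binom_cdf` are just illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    # F(k) = sum of the PMF from 0 through k
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Example: 10 trials with success probability 0.3
print(binom_pmf(3, 10, 0.3))  # P(X = 3)
print(binom_cdf(3, 10, 0.3))  # P(X <= 3)
```

Summing the PMF over all $k$ from 0 to $n$ should give exactly 1, which is an easy way to catch an implementation mistake.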
2.2 Bernoulli Distribution
Definition: describes the probability of success or failure in a single trial, where the probability of success is $p$.
Formula: $X \sim \text{Bern}(p)$
Probability Mass Function (PMF): $P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$
Cumulative Distribution Function (CDF): $F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - p & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$
2.3 Geometric Distribution
Definition: describes the number of failures before the first success, where the probability of success in each trial is $p$.
Formula: $X \sim \text{Geom}(p)$
Probability Mass Function (PMF): $P(X = k) = (1-p)^k p$
Cumulative Distribution Function (CDF): $F(k) = 1 - (1-p)^{k+1}$
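The closed-form CDF should agree with partial sums of the PMF, which is easy to verify numerically (a small sketch; the helper names are illustrative):

```python
def geom_pmf(k, p):
    # P(X = k): k failures, then one success
    return (1 - p)**k * p

def geom_cdf(k, p):
    # Closed form: F(k) = 1 - (1-p)^(k+1)
    return 1 - (1 - p)**(k + 1)

p = 0.25
# The closed-form CDF should equal the partial sum of the PMF
for k in range(10):
    partial = sum(geom_pmf(i, p) for i in range(k + 1))
    assert abs(partial - geom_cdf(k, p)) < 1e-12
print("closed-form CDF matches the summed PMF")
```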
2.4 Negative Binomial Distribution
Definition: describes the number of failures before the $r$-th success, where the probability of success in each trial is $p$.
Formula: $X \sim \text{NegBin}(r, p)$
Probability Mass Function (PMF): $P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k$
Cumulative Distribution Function (CDF): $F(k) = \sum_{i=0}^{k} \binom{i + r - 1}{i} p^r (1-p)^i$
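With $r = 1$ the negative binomial PMF reduces to the geometric PMF, which gives a direct consistency check (a minimal sketch; `negbin_pmf` is an illustrative name):

```python
from math import comb

def negbin_pmf(k, r, p):
    # P(X = k): k failures before the r-th success
    return comb(k + r - 1, k) * p**r * (1 - p)**k

# With r = 1 the negative binomial reduces to the geometric distribution
p = 0.3
for k in range(8):
    geometric = (1 - p)**k * p
    assert abs(negbin_pmf(k, 1, p) - geometric) < 1e-12
print("NegBin(1, p) matches Geom(p)")
```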
2.5 Hypergeometric Distribution
Definition: describes sampling without replacement from a finite population of size $N$ containing $K$ successes: the probability of drawing exactly $k$ successes in $n$ draws.
Formula: $X \sim \text{Hypergeom}(N, K, n)$
Probability Mass Function (PMF): $P(X = k) = \dfrac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
Cumulative Distribution Function (CDF): $F(k) = \sum_{i=0}^{k} \dfrac{\binom{K}{i} \binom{N-K}{n-i}}{\binom{N}{n}}$
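A classic worked example is counting hearts in a poker hand; the sketch below (with an illustrative helper name) also checks that the PMF sums to 1:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # P(X = k): k successes in n draws without replacement
    # from a population of N items, K of which are successes
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example: 5 cards drawn from a 52-card deck, counting hearts (K = 13)
N, K, n = 52, 13, 5
total = sum(hypergeom_pmf(k, N, K, n) for k in range(n + 1))
assert abs(total - 1.0) < 1e-12  # the PMF sums to 1
print(hypergeom_pmf(2, N, K, n))  # probability of exactly 2 hearts
```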
2.6 Poisson Distribution
Definition: describes the probability of observing $k$ events in a fixed interval of time (or space), when events occur independently at a constant average rate $\lambda$.
Formula: $X \sim \text{Poisson}(\lambda)$
Probability Mass Function (PMF): $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
Cumulative Distribution Function (CDF): $F(k) = e^{-\lambda} \sum_{i=0}^{k} \dfrac{\lambda^i}{i!}$
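Both Poisson formulas translate directly into standard-library Python (a minimal sketch with illustrative helper names):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(X = k) = lambda^k * e^(-lambda) / k!
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k, lam):
    # F(k) = e^(-lambda) * sum_{i=0}^{k} lambda^i / i!
    return exp(-lam) * sum(lam**i / factorial(i) for i in range(k + 1))

lam = 4.0
print(poisson_pmf(2, lam))   # P(X = 2) when events occur at an average rate of 4
print(poisson_cdf(2, lam))   # P(X <= 2)
```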
2.7 Relationship between them
The Bernoulli distribution is a special case of the binomial distribution with $n = 1$.
The geometric distribution is the distribution of the number of failures before the first success in a sequence of independent Bernoulli trials.
The negative binomial distribution generalizes the geometric distribution: it describes the number of failures before the $r$-th success.
The hypergeometric distribution is similar to the binomial distribution, but applies to sampling without replacement from a finite population.
The Poisson distribution is a limiting case of the binomial distribution: when $n$ is very large and $p$ is very small with $\lambda = np$ held constant, $\text{Bin}(n, p)$ approaches $\text{Poisson}(\lambda)$.
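The binomial-to-Poisson limit can be checked numerically: holding $\lambda = np$ fixed while $n$ grows, the two PMFs converge (a small sketch; the helper names are illustrative):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

# Hold lambda = n*p fixed at 3 while n grows; Bin(n, p) approaches Poisson(3)
lam = 3.0
for n in (10, 100, 10000):
    p = lam / n
    diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(10))
    print(f"n = {n:>6}: max PMF difference = {diff:.6f}")
```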
3. Continuous Probability Distributions
3.1 Exponential Distribution
Definition: The exponential distribution is a continuous probability distribution, often used to describe the time between independent events.
Formula: $X \sim \text{Exponential}(\lambda)$, where $\lambda > 0$ is the rate parameter.
Probability density function (PDF): $f(x; \lambda) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0$
Cumulative Distribution Function (CDF): $F(x; \lambda) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0$
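Since the CDF is the integral of the PDF, a simple midpoint Riemann sum of the PDF should reproduce the closed-form CDF (a minimal sketch; helper names are illustrative):

```python
from math import exp

def exp_pdf(x, lam):
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    return 1 - exp(-lam * x) if x >= 0 else 0.0

# Riemann-sum the PDF from 0 to x and compare with the closed-form CDF
lam, x, steps = 2.0, 1.5, 100000
dx = x / steps
integral = sum(exp_pdf((i + 0.5) * dx, lam) * dx for i in range(steps))
print(integral, exp_cdf(x, lam))  # the two values should nearly agree
```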
3.2 Gamma Distribution
Definition: describes accumulated waiting times; it is a generalization of the exponential distribution and the χ² distribution.
Formula: $X \sim \text{Gamma}(k, \theta)$, where $k > 0$ is the shape parameter and $\theta > 0$ is the scale parameter.
Probability density function (PDF): $f(x; k, \theta) = \dfrac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)} \quad \text{for } x \ge 0$, where $\Gamma(k)$ is the gamma function.
Cumulative Distribution Function (CDF): $F(x; k, \theta) = \dfrac{\gamma(k, x/\theta)}{\Gamma(k)}$, where $\gamma(k, x/\theta)$ is the lower incomplete gamma function.
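The "accumulated waiting time" interpretation can be illustrated by simulation: the sum of $k$ independent $\text{Exponential}(\lambda)$ waiting times follows $\text{Gamma}(k, \theta)$ with $\theta = 1/\lambda$, so its mean should be $k\theta$ (a sketch under that standard identity; sample sizes are arbitrary):

```python
import random

# The sum of k independent Exponential(lambda) variables follows
# Gamma(k, theta) with theta = 1/lambda; check the mean k*theta by simulation
random.seed(0)
k, lam = 3, 2.0
theta = 1 / lam
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(200000)]
mean = sum(samples) / len(samples)
print(mean, k * theta)  # the sample mean should be close to k*theta = 1.5
```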
3.3 Normal Distribution
Definition: describes the distribution of the sum of a large number of independent random variables; it is widely used in the natural and social sciences.
Formula: $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance.
Probability density function (PDF): $f(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Cumulative Distribution Function (CDF): $F(x; \mu, \sigma^2) = \dfrac{1}{2}\left[1 + \operatorname{erf}\left(\dfrac{x-\mu}{\sqrt{2\sigma^2}}\right)\right]$, where erf is the error function.
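The erf-based CDF is directly computable with `math.erf`; two familiar facts make good checks: half the mass lies below the mean, and about 95.4% lies within two standard deviations (a minimal sketch; helper names are illustrative):

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma2):
    return exp(-(x - mu)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def normal_cdf(x, mu, sigma2):
    # F(x) = (1/2) * [1 + erf((x - mu) / sqrt(2*sigma2))]
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))

mu, sigma2 = 10.0, 4.0
sigma = sqrt(sigma2)
print(normal_cdf(mu, mu, sigma2))  # 0.5: half the mass lies below the mean
print(normal_cdf(mu + 2 * sigma, mu, sigma2)
      - normal_cdf(mu - 2 * sigma, mu, sigma2))  # ~0.954, the two-sigma rule
```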
3.4 t-Distribution
Definition: used for hypothesis testing and confidence-interval estimation with small samples.
Formula: $X \sim t(\nu)$, where $\nu$ is the degrees of freedom.
Probability density function (PDF): $f(t; \nu) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
Cumulative Distribution Function (CDF): $F(t; \nu) = \dfrac{1}{2} + \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \int_0^{t} \left(1 + \dfrac{u^2}{\nu}\right)^{-\frac{\nu+1}{2}} du$
3.5 Chi-Square Distribution
Definition: commonly used in hypothesis testing and analysis of variance.
Formula: $X \sim \chi^2(k)$, where $k$ is the degrees of freedom.
Probability density function (PDF): $f(x; k) = \dfrac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x \ge 0$
Cumulative Distribution Function (CDF): $F(x; k) = \dfrac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$
3.6 F-Distribution
Definition: used to compare the variances of two samples.
Formula: $X \sim F(d_1, d_2)$, where $d_1$ and $d_2$ are the degrees of freedom.
Probability density function (PDF): $f(x; d_1, d_2) = \dfrac{\sqrt{\dfrac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \quad \text{for } x \ge 0$, where $B$ is the beta function.
Cumulative Distribution Function (CDF): $F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$, where $I$ is the regularized incomplete beta function.
3.7 Relationship between them
The chi-square distribution is a sum of squares of normal variables: the sum of squares of $k$ independent standard normal variables follows a chi-square distribution with $k$ degrees of freedom.
The t distribution is constructed from the standard normal and chi-square distributions: dividing a standard normal variable $Z$ by the square root of an independent chi-square variable over its degrees of freedom, $T = Z / \sqrt{V/\nu}$, yields a $t(\nu)$ variable.
The F distribution is the ratio of two independent chi-square variables, each divided by its own degrees of freedom; it is used to compare two variances.
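The chi-square relationship above can be verified by simulation: summing the squares of $k$ standard normals should produce samples with mean $k$ and variance $2k$, the known moments of $\chi^2(k)$ (a sketch; the seed and sample size are arbitrary):

```python
import random

# The sum of squares of k independent standard normals should follow
# chi-square(k), whose mean is k and variance is 2k
random.seed(1)
k, trials = 5, 100000
samples = [sum(random.gauss(0, 1)**2 for _ in range(k)) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean)**2 for s in samples) / trials
print(mean, var)  # should be close to k = 5 and 2k = 10
```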