
【Mathematical Statistics】2 - Random Variables and Their Probability Distributions

2024-07-12



1. Definition and classification of random variables

Definition: A random variable is a variable whose value depends on the outcome of a random experiment.

  1. Categorical (nominal) random variables

    • Examples: gender (male, female), occupation (civil servant, corporate employee, student, retired, unemployed), test result (negative, positive)
  2. Ordered categorical (ordinal) random variables

    • Examples: attitude (strongly agree, agree, neutral, disagree, strongly disagree), frequency of use (once a week, once every two weeks, once every six months, rarely, never)
  3. Numeric random variables

    • Examples: age (13, 14, 15, 16, etc.), income (can take any value within a continuous range)

2. Discrete Probability Distributions

2.1 Binomial Distribution

  • Definition: describes the probability of exactly $k$ successes in $n$ independent trials, where each trial succeeds with probability $p$.
  • Notation: $X \sim \text{Bin}(n, p)$
  • Probability Mass Function (PMF): $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, \dots, n$
  • Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$
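A minimal sketch of the binomial PMF and CDF above, using only Python's standard library. The function names `binom_pmf` and `binom_cdf` are hypothetical helpers, not from the original article; the CDF is computed by directly summing the PMF, exactly as the formula is written.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p): C(n, k) ways to place the k successes."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """F(k) = P(X <= k): sum of the PMF for i = 0..k (capped at n)."""
    return sum((binom_pmf(i, n, p) for i in range(min(k, n) + 1)), 0.0)

# Example: number of heads in 10 fair coin flips.
print(binom_pmf(5, 10, 0.5))  # 0.24609375  (= 252 / 1024)
print(binom_cdf(5, 10, 0.5))  # 0.623046875 (= 638 / 1024)
```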

2.2 Bernoulli Distribution

  • Definition: describes the outcome of a single trial that succeeds (value 1) with probability $p$ and fails (value 0) with probability $1-p$.
  • Notation: $X \sim \text{Bern}(p)$
  • Probability Mass Function (PMF): $P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$
  • Cumulative Distribution Function (CDF): $F(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - p & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$
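The piecewise Bernoulli formulas translate directly into code. This is a small sketch with hypothetical helper names (`bern_pmf`, `bern_cdf`); it makes visible that the CDF is a step function that jumps by $1-p$ at $x = 0$ and by $p$ at $x = 1$.

```python
def bern_pmf(x: int, p: float) -> float:
    """P(X = x) for X ~ Bern(p); zero outside {0, 1}."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def bern_cdf(x: float, p: float) -> float:
    """Step-function CDF defined for any real x."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

print(bern_cdf(0.5, 0.25))  # 0.75, i.e. P(X = 0) = 1 - p
```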

2.3 Geometric Distribution

  • Definition: describes the number of failures before the first success in a sequence of independent trials, each succeeding with probability $p$.
  • Notation: $X \sim \text{Geom}(p)$
  • Probability Mass Function (PMF): $P(X = k) = (1-p)^k p$, for $k = 0, 1, 2, \dots$
  • Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = 1 - (1-p)^{k+1}$
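The closed-form geometric CDF follows from the fact that $X > k$ exactly when the first $k+1$ trials all fail. A minimal sketch (hypothetical helper names, "number of failures" convention as in the definition above) checks the closed form against a running sum of the PMF.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k) = (1-p)^k p: k failures followed by the first success."""
    return (1 - p) ** k * p if k >= 0 else 0.0

def geom_cdf(k: int, p: float) -> float:
    """F(k) = 1 - (1-p)^(k+1): one minus P(the first k+1 trials all fail)."""
    return 1 - (1 - p) ** (k + 1) if k >= 0 else 0.0

# The closed-form CDF matches the running sum of the PMF.
p = 0.2
for k in range(5):
    print(k, geom_cdf(k, p), sum(geom_pmf(i, p) for i in range(k + 1)))
```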

2.4 Negative Binomial Distribution

  • Definition: describes the number of failures before the $r$-th success in a sequence of independent trials, each succeeding with probability $p$.
  • Notation: $X \sim \text{NegBin}(r, p)$
  • Probability Mass Function (PMF): $P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k$, for $k = 0, 1, 2, \dots$
  • Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = \sum_{i=0}^{k} \binom{i + r - 1}{i} p^r (1-p)^i$
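The binomial coefficient in the negative binomial PMF counts arrangements: the last trial must be the $r$-th success, so only the other $r-1$ successes and $k$ failures are arranged, in $\binom{k+r-1}{k}$ ways. A short sketch with hypothetical helper names:

```python
from math import comb

def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k failures before the r-th success.
    The final trial is the r-th success; the remaining r-1 successes and
    k failures can be ordered in C(k+r-1, k) ways."""
    if k < 0:
        return 0.0
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

def nbinom_cdf(k: int, r: int, p: float) -> float:
    """F(k) = P(X <= k): sum of the PMF for i = 0..k."""
    return sum((nbinom_pmf(i, r, p) for i in range(k + 1)), 0.0)

print(nbinom_pmf(2, 3, 0.5))  # 0.1875 (= C(4, 2) * 0.5**5)
```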

2.5 Hypergeometric Distribution

  • Definition: describes the probability of drawing exactly $k$ successes in $n$ draws without replacement from a finite population of $N$ items, $K$ of which are successes.
  • Notation: $X \sim \text{Hypergeom}(N, K, n)$
  • Probability Mass Function (PMF): $P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
  • Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = \sum_{i=0}^{k} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$
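The hypergeometric PMF is a ratio of counts: the number of ways to pick $k$ of the $K$ successes and $n-k$ of the $N-K$ failures, over the number of ways to pick any $n$ items. In this sketch the helper names and the card-deck example are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) = C(K, k) C(N-K, n-k) / C(N, n); zero for impossible k."""
    if k < 0 or k > min(n, K) or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k: int, N: int, K: int, n: int) -> float:
    """F(k) = P(X <= k): sum of the PMF for i = 0..k."""
    return sum((hypergeom_pmf(i, N, K, n) for i in range(k + 1)), 0.0)

# Example: deal 5 cards from a 52-card deck; X = number of hearts (K = 13).
print(hypergeom_pmf(2, 52, 13, 5))  # ≈ 0.2743
```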

2.6 Poisson Distribution

  • Definition: describes the probability that exactly $k$ events occur in a fixed interval (e.g., a unit of time) when events occur independently at an average rate of $\lambda$ per interval.
  • Notation: $X \sim \text{Poisson}(\lambda)$
  • Probability Mass Function (PMF): $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, for $k = 0, 1, 2, \dots$
  • Cumulative Distribution Function (CDF): $F(k) = P(X \le k) = e^{-\lambda} \sum_{i=0}^{k} \frac{\lambda^i}{i!}$
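A direct transcription of the Poisson formulas, as a sketch with hypothetical helper names. The call-center numbers are an invented illustration, and the naive `factorial` approach is only intended for moderate $k$.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k e^{-lam} / k!"""
    if k < 0:
        return 0.0
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """F(k) = e^{-lam} * sum_{i=0}^{k} lam^i / i!"""
    if k < 0:
        return 0.0
    return exp(-lam) * sum(lam**i / factorial(i) for i in range(k + 1))

# Hypothetical example: 3 calls per hour on average; P(at most 2 calls in an hour).
print(poisson_cdf(2, 3.0))  # e^{-3} * (1 + 3 + 4.5) ≈ 0.4232
```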

2.7 Relationship between them

  • The Bernoulli distribution is a special case of the binomial distribution with $n = 1$.
  • The geometric distribution counts the failures before the first success in a sequence of independent Bernoulli trials; it is built from the same trials as the binomial distribution, but fixes the number of successes (one) instead of the number of trials.
  • The negative binomial distribution generalizes the geometric distribution: it counts the failures before the $r$-th success, and $\text{NegBin}(1, p)$ is the same as $\text{Geom}(p)$.
  • The hypergeometric distribution is similar to the binomial distribution, but applies to sampling without replacement from a finite population (the binomial corresponds to independent draws, i.e., sampling with replacement).
  • The Poisson distribution is the limiting case of the binomial distribution as $n$ becomes very large and $p$ very small while $\lambda = np$ is held constant.
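Two of these relationships can be checked numerically. The sketch below redefines small PMF helpers (hypothetical names) so it runs on its own, confirms that $\text{Bin}(1, p)$ reproduces the Bernoulli probabilities, and measures how the largest PMF gap between $\text{Bin}(n, \lambda/n)$ and $\text{Poisson}(\lambda)$ shrinks as $n$ grows. Comparing only $k = 0, \dots, 9$ is an arbitrary choice for illustration.

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

# Bernoulli(p) is Bin(1, p): P(X=1) = p and P(X=0) = 1 - p.
print(binom_pmf(1, 1, 0.25), binom_pmf(0, 1, 0.25))  # 0.25 0.75

def max_gap(n: int, lam: float, kmax: int = 10) -> float:
    """Largest |Bin(n, lam/n) PMF - Poisson(lam) PMF| over k = 0..kmax-1."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax))

# Poisson limit: lam = n * p stays fixed at 3 while n grows.
for n in (10, 100, 10_000):
    print(n, max_gap(n, 3.0))
```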

3. Continuous Probability Distributions

3.1 Exponential Distribution

  • Definition: the exponential distribution is a continuous probability distribution often used to model the waiting time between independent events that occur at a constant average rate.

  • Notation: $X \sim \text{Exponential}(\lambda)$, where $\lambda > 0$ is the rate parameter

  • Probability Density Function (PDF): $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$

  • Cumulative Distribution Function (CDF): $F(x; \lambda) = 1 - e^{-\lambda x}$ for $x \ge 0$
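To connect the CDF formula with the "waiting time" interpretation, this sketch compares $F(x; \lambda)$ with the fraction of simulated waiting times that are at most $x$. It uses the standard library's `random.expovariate`, which takes the rate $\lambda$; the helper names, seed, and sample size are arbitrary choices for illustration.

```python
import random
from math import exp

def expon_pdf(x: float, lam: float) -> float:
    """f(x; lam) = lam * e^{-lam x} for x >= 0, else 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def expon_cdf(x: float, lam: float) -> float:
    """F(x; lam) = 1 - e^{-lam x} for x >= 0, else 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

# Simulate waiting times at rate lam = 2 and compare the empirical CDF at x = 0.5.
random.seed(0)
lam = 2.0
samples = [random.expovariate(lam) for _ in range(100_000)]
x = 0.5
empirical = sum(s <= x for s in samples) / len(samples)
print(empirical, expon_cdf(x, lam))  # both close to 1 - e^{-1} ≈ 0.632
```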

3.2 Gamma Distribution

  • Definition: models accumulated waiting time (for integer $k$, the total waiting time for $k$ events); it generalizes both the exponential distribution ($\text{Exponential}(\lambda) = \text{Gamma}(1, 1/\lambda)$) and the χ² distribution (χ² with $\nu$ degrees of freedom is $\text{Gamma}(\nu/2, 2)$).

  • Notation: $X \sim \text{Gamma}(k, \theta)$, where $k > 0$ is the shape parameter and $\theta > 0$ is the scale parameter

  • Probability Density Function (PDF): $f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}$ for $x \ge 0$, where $\Gamma(k)$ is the gamma function

  • Cumulative Distribution Function (CDF): $F(x; k, \theta) = \frac{\gamma(k, x/\theta)}{\Gamma(k)}$, where $\gamma(k, x/\theta)$ is the lower incomplete gamma function.
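The Python standard library has no incomplete gamma function, so this sketch evaluates the regularized lower incomplete gamma $P(a, x) = \gamma(a, x)/\Gamma(a)$ with its power series $\gamma(a, x) = x^a e^{-x} \sum_{n \ge 0} \frac{x^n}{a(a+1)\cdots(a+n)}$. That series is a swapped-in numerical technique (adequate for moderate $x$; libraries typically switch to a continued fraction for large $x$), and the helper names are hypothetical. Setting $k = 1$ recovers the exponential CDF.

```python
from math import exp, gamma, lgamma, log

def gamma_pdf(x: float, k: float, theta: float) -> float:
    """f(x; k, theta) = x^(k-1) e^(-x/theta) / (theta^k Gamma(k)).
    Returns 0 for x <= 0 (a single point does not affect probabilities)."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * exp(-x / theta) / (theta**k * gamma(k))

def reg_lower_gamma(a: float, x: float, terms: int = 500) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a) via the power series
    x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n)); fine for moderate x."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    return total * exp(a * log(x) - x - lgamma(a))

def gamma_cdf(x: float, k: float, theta: float) -> float:
    """F(x; k, theta) = gamma(k, x/theta) / Gamma(k)."""
    return reg_lower_gamma(k, x / theta)

# With k = 1, Gamma(1, theta) is Exponential with rate 1/theta.
print(gamma_cdf(2.0, 1.0, 0.5), 1 - exp(-2.0 / 0.5))
```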

3.3 Normal Distribution

  • Definition: by the central limit theorem, the normal distribution approximates (under mild conditions) the distribution of the sum or mean of a large number of independent random variables; it is widely used in the natural and social sciences.

  • Notation: $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance

  • Probability Density Function (PDF): $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

  • Cumulative Distribution Function (CDF): $F(x; \mu, \sigma^2) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sqrt{2\sigma^2}}\right)\right]$, where erf is the error function.
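The erf-based CDF can be implemented with `math.erf` and cross-checked against the standard library's `statistics.NormalDist`. Note that `NormalDist` takes the standard deviation `sigma`, while the formulas above (and these hypothetical helpers) use the variance $\sigma^2$.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """f(x; mu, sigma^2) = exp(-(x-mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)"""
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def normal_cdf(x: float, mu: float, sigma2: float) -> float:
    """F(x; mu, sigma^2) = (1 + erf((x - mu) / sqrt(2 sigma^2))) / 2"""
    return 0.5 * (1 + erf((x - mu) / sqrt(2 * sigma2)))

# Cross-check: mu = 1, sigma^2 = 4 corresponds to NormalDist(mu=1, sigma=2).
ref = NormalDist(mu=1.0, sigma=2.0)
for x in (-1.0, 1.0, 3.5):
    print(x, normal_cdf(x, 1.0, 4.0), ref.cdf(x))
```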

3.4 t-Distribution

  • Definition: used for hypothesis testing and confidence interval estimation when the sample is small and the population variance is unknown.

  • Notation: $X \sim t(\nu)$, where $\nu$ is the number of degrees of freedom

  • Probability Density Function (PDF): $f(t; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$

  • Cumulative Distribution Function (CDF): $F(t; \nu) = \frac{1}{2} + \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \int_0^{t} \left(1 + \frac{u^2}{\nu}\right)^{-\frac{\nu+1}{2}} du$
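Because the t density is symmetric about 0, $F(t) = \frac{1}{2} + \int_0^t f(u)\,du$, which is exactly the CDF formula above. This sketch evaluates that integral with composite Simpson's rule (a swapped-in numerical method, not a closed form) and uses `math.lgamma` to keep the constant stable for large $\nu$. With $\nu = 1$ the t distribution is the Cauchy distribution, whose CDF $\frac{1}{2} + \arctan(t)/\pi$ gives a check. Helper names are hypothetical.

```python
from math import atan, exp, lgamma, log, pi

def t_pdf(t: float, nu: float) -> float:
    """Student t density; the normalizing constant is computed in log space."""
    log_c = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t: float, nu: float, steps: int = 2000) -> float:
    """F(t) = 1/2 + integral of the density from 0 to t (composite Simpson's rule).
    steps must be even; a negative t gives a negative step and a negative integral."""
    h = t / steps
    s = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + s * h / 3

# nu = 1 is the Cauchy distribution: F(t) = 1/2 + arctan(t) / pi.
print(t_cdf(1.0, 1), 0.5 + atan(1.0) / pi)  # both ≈ 0.75
```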

3.5 Chi-Square Distribution

  • Definition: commonly used in hypothesis testing (e.g., goodness-of-fit and independence tests) and in inference about a population variance.

  • Notation: $X \sim \chi^2(k)$, where $k$ is the number of degrees of freedom

  • Probability Density Function (PDF): $f(x; k) = \frac{1}{2^{k/2}\Gamma(k/2)} x^{k/2-1} e^{-x/2}$ for $x \ge 0$

  • Cumulative Distribution Function (CDF): $F(x; k) = \frac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$, where $\gamma$ is the lower incomplete gamma function
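A self-contained sketch of the chi-square PDF and CDF. The CDF evaluates the regularized lower incomplete gamma function at $(k/2, x/2)$ with its power series, a swapped-in numerical method suitable for moderate $x$; helper names are hypothetical. Two checks: with $k = 2$ the CDF is $1 - e^{-x/2}$, and $3.8415$ is the familiar 95th percentile of $\chi^2(1)$.

```python
from math import exp, lgamma, log

def chi2_pdf(x: float, k: int) -> float:
    """f(x; k) = x^(k/2-1) e^(-x/2) / (2^(k/2) Gamma(k/2)), computed in log space.
    Returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return exp((k / 2 - 1) * log(x) - x / 2 - (k / 2) * log(2) - lgamma(k / 2))

def chi2_cdf(x: float, k: int, terms: int = 500) -> float:
    """F(x; k) = P(k/2, x/2), the regularized lower incomplete gamma function,
    via its power series z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    return total * exp(a * log(z) - z - lgamma(a))

print(chi2_cdf(3.0, 2), 1 - exp(-1.5))  # k = 2: F(x) = 1 - e^{-x/2}
print(chi2_cdf(3.8415, 1))              # ≈ 0.95
```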

3.6 F-Distribution

  • Definition: used to compare the variances of two samples (e.g., in the F-test and in analysis of variance).

  • Notation: $X \sim F(d_1, d_2)$, where $d_1$ and $d_2$ are the numerator and denominator degrees of freedom

  • Probability Density Function (PDF): $f(x; d_1, d_2) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x\,B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}$ for $x > 0$, where $B$ is the beta function

  • Cumulative Distribution Function (CDF): $F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2}, \frac{d_2}{2}\right)$, where $I$ is the regularized incomplete beta function
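The F density involves large powers, so this sketch (hypothetical helper `f_pdf`) evaluates it in log space, with $\log B(a, b) = \log\Gamma(a) + \log\Gamma(b) - \log\Gamma(a+b)$ from `math.lgamma`. The CDF is left out because the standard library has no incomplete beta function. A simple check: for $d_1 = d_2 = 2$ the formula reduces to $1/(1+x)^2$, consistent with $I_{x/(1+x)}(1, 1) = x/(1+x)$.

```python
from math import exp, lgamma, log

def f_pdf(x: float, d1: float, d2: float) -> float:
    """F(d1, d2) density computed in log space:
    log f = 0.5*[d1 log(d1 x) + d2 log d2 - (d1+d2) log(d1 x + d2)] - log x - log B(d1/2, d2/2)."""
    if x <= 0:
        return 0.0
    log_beta = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
    log_num = 0.5 * (d1 * log(d1 * x) + d2 * log(d2) - (d1 + d2) * log(d1 * x + d2))
    return exp(log_num - log(x) - log_beta)

# Sanity check: with d1 = d2 = 2 the density simplifies to 1 / (1 + x)^2.
for x in (0.5, 1.0, 4.0):
    print(x, f_pdf(x, 2, 2), 1 / (1 + x) ** 2)
```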

3.7 Relationship between them

  • The chi-square distribution is the distribution of a sum of squares of standard normal variables: if $Z_1, \dots, Z_k$ are independent standard normal variables, then $Z_1^2 + \dots + Z_k^2 \sim \chi^2(k)$.
  • The t distribution is constructed from the standard normal and chi-square distributions: if $Z \sim \mathcal{N}(0, 1)$ and $V \sim \chi^2(\nu)$ are independent, then $T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)$.
  • The F distribution is the ratio of two independent chi-square variables, each divided by its degrees of freedom: if $U_1 \sim \chi^2(d_1)$ and $U_2 \sim \chi^2(d_2)$ are independent, then $\frac{U_1/d_1}{U_2/d_2} \sim F(d_1, d_2)$. Because a scaled sample variance follows a chi-square distribution, this ratio is what makes the F distribution suitable for comparing two variances.
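These constructions can be illustrated by simulation. This sketch builds chi-square, t, and F draws from standard normal samples (`random.gauss`), exactly following the three definitions above, and compares sample moments with the known values $E[\chi^2(k)] = k$, $\text{Var}[t(\nu)] = \nu/(\nu-2)$, and $E[F(d_1, d_2)] = d_2/(d_2-2)$. The seed, sample size, and degrees of freedom are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random

random.seed(42)

def chi2_sample(k: int) -> float:
    """Sum of squares of k independent standard normals -> chi-square(k)."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))

def t_sample(nu: int) -> float:
    """Z / sqrt(V / nu) with Z ~ N(0, 1) independent of V ~ chi-square(nu)."""
    return random.gauss(0.0, 1.0) / (chi2_sample(nu) / nu) ** 0.5

def f_sample(d1: int, d2: int) -> float:
    """(U1 / d1) / (U2 / d2) with independent U1 ~ chi-square(d1), U2 ~ chi-square(d2)."""
    return (chi2_sample(d1) / d1) / (chi2_sample(d2) / d2)

n = 20_000
chi2_draws = [chi2_sample(5) for _ in range(n)]
t_draws = [t_sample(5) for _ in range(n)]
f_draws = [f_sample(5, 10) for _ in range(n)]

# Sample moments versus theory.
print(sum(chi2_draws) / n)              # theory: E[chi2(5)] = 5
print(sum(v * v for v in t_draws) / n)  # theory: Var[t(5)] = 5/3 (the mean is 0)
print(sum(f_draws) / n)                 # theory: E[F(5, 10)] = 10/8 = 1.25
```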