Beta Distribution — Intuition, Derivation, and Examples | Chen Xing (2024)

Motivation

Model probabilities

The Beta distribution is a probability distribution on probabilities.

The Beta distribution can be understood as representing a distribution of probabilities, that is, it represents all the possible values of a probability when we don’t know what that probability is. For example,

  • the Click-Through Rate of your advertisement

  • the conversion rate of customers actually purchasing in your store

  • how likely a customer is to become “inactive”

Because the Beta distribution models a probability, its domain is bounded between 0 and 1.

Generalization of Uniform Distribution

Name a continuous, bounded random variable other than the Uniform distribution. The Beta distribution is one answer: it is continuous and bounded between 0 and 1, but its density does not have to be flat. The Uniform(0, 1) distribution is the special case Beta(1, 1).

$$X \sim Beta(a, b), \text{ where } a>0, \ b>0.$$

$$f_X(x) = c \cdot x ^{a-1}(1-x)^{b-1}, \text{ where } 0<x<1.$$

What is $c$ ? Just a normalization constant! We’ll find the value of $c$ later.

Conjugate Prior

The Beta distribution is the conjugate prior for the Bernoulli, binomial, negative binomial and geometric distributions (seems like those are the distributions that involve success & failure) in Bayesian inference.

Computing a posterior using a conjugate prior is very convenient, because you can avoid expensive numerical computation involved in Bayesian Inference.

Conjugate prior = Convenient prior

For example, the beta distribution is a conjugate prior to the binomial. If we choose to use the beta distribution Beta(α, β) as a prior, during the modeling phase, we already know the posterior will also be a beta distribution. Therefore, after carrying out more experiments, you can compute the posterior simply by adding the number of successes (x), and failures (n-x) to the existing parameters α, β respectively, instead of multiplying the likelihood with the prior distribution. The posterior also becomes a Beta distribution with parameters (x+α, n-x+β).
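As a minimal sketch of the update rule above, the R snippet below starts from a Beta(2, 2) prior and observes 7 successes in 10 trials. The prior, the data, and the helper name `update_beta` are illustrative assumptions, not part of the original text. It also compares the shortcut with the brute-force route of multiplying the likelihood by the prior and normalizing numerically.

```r
# Conjugate update: Beta(a0, b0) prior + x successes in n trials
# gives a Beta(a0 + x, b0 + n - x) posterior.
update_beta <- function(a0, b0, x, n) {
  c(alpha = a0 + x, beta = b0 + n - x)
}

a0 <- 2; b0 <- 2        # assumed prior parameters
x <- 7; n <- 10         # assumed data: 7 successes in 10 trials
post <- update_beta(a0, b0, x, n)

# Brute-force check: likelihood * prior, normalized by numerical integration.
unnorm <- function(p) dbinom(x, n, p) * dbeta(p, a0, b0)
norm_const <- integrate(unnorm, 0, 1)$value
p_grid <- c(0.3, 0.5, 0.7)
brute  <- unnorm(p_grid) / norm_const
closed <- dbeta(p_grid, post[["alpha"]], post[["beta"]])

print(post)
print(all.equal(brute, closed, tolerance = 1e-6))
```

The two routes should agree at every grid point, which is the whole appeal of a conjugate prior: the posterior is obtained by two additions.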

What is the Intuition?

The intuition for the beta distribution comes into play when we look at it from the lens of the binomial distribution.


The difference between the binomial and the beta is that the former models the number of successes (x), while the latter models the probability (p) of success.

In other words, the probability is a fixed parameter in the binomial, whereas in the Beta the probability itself is the random variable.

Interpretation of α, β

You can think of α-1 as the number of successes and β-1 as the number of failures, just like the exponents x and n-x in the binomial.

You can choose the α and β parameters however you think they are supposed to be.

  • If you think the probability of success is very high, let’s say 90%, set 90 for α and 10 for β.
  • If you think otherwise, 90 for β and 10 for α.

As α becomes larger (more successful events), the bulk of the probability distribution will shift towards the right, whereas an increase in β moves the distribution towards the left (more failures).

Also, the distribution will narrow if both α and β increase, for we are more certain.
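To make the shifting and narrowing concrete, here is a small sketch that tabulates the mean α/(α+β) and the standard deviation of a few Beta distributions. The variance formula αβ/((α+β)²(α+β+1)) is the standard closed form rather than something derived in this post, and the parameter pairs and the helper name `beta_summary` are assumptions chosen for illustration.

```r
# Mean and standard deviation of Beta(a, b) from the standard closed forms.
beta_summary <- function(a, b) {
  m <- a / (a + b)
  s <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
  c(mean = m, sd = s)
}

# Illustrative parameter choices (assumptions, not from the text).
params <- list(c(9, 1), c(90, 10), c(10, 90), c(50, 50))
for (p in params) {
  s <- beta_summary(p[1], p[2])
  cat(sprintf("Beta(%g, %g): mean = %.3f, sd = %.3f\n",
              p[1], p[2], s[["mean"]], s[["sd"]]))
}
```

Beta(9, 1) and Beta(90, 10) share the mean 0.9, but the second has a much smaller standard deviation, which matches the "more certain" reading of larger α and β.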


Dr. Bognar at the University of Iowa built a calculator for the Beta distribution, which I found useful and beautiful. You can experiment with different values of α and β and visualize how the shape changes.

Derivation

In this section, we’ll derive the Beta distribution using the Beta-Gamma connection.

We can regard a Beta random variable as a fraction of waiting time.

Fraction of Waiting Time

Let $X$ be the waiting time at the bank,

$$X \sim Gamma(n_1, \lambda)$$

Let $Y$ be the waiting time at the post office,

$$Y \sim Gamma(n_2, \lambda)$$

Assume $X$ and $Y$ are independent. What is the distribution of the proportion $\frac{X}{X+Y}$ ?

Solution:

Let $T := X+Y$ be the total waiting time. Then $T \sim Gamma(n_1+n_2, \lambda)$, which you can prove using moment generating functions (MGFs).

Let $W := \frac{X}{X+Y}$ be the proportion of the total waiting time spent at the bank. We need to find the PDF of $W$.

The idea is to find the joint PDF $f_{T,W}(t,w)$ at first, and then get the marginal distribution.

With $x = tw$ and $y = t(1-w)$, the Jacobian determinant is $\frac{\partial(x,y)}{\partial(t,w)} = -t$, so

$$\begin{aligned}f_{T,W}(t,w) &= f_{X,Y}(x,y) \left | \frac{\partial(x,y)}{\partial(t,w)} \right|\\ &= \frac{1}{\Gamma(n_1)}\lambda^{n_1}x^{n_1 - 1}e^{-\lambda x} \cdot \frac{1}{\Gamma(n_2)}\lambda^{n_2}y^{n_2 - 1}e^{-\lambda y} \left|-t\right|\\ &= \lambda^{n_1+n_2}t^{n_1+n_2-1}e^{-\lambda t} \frac{1}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1}\\ &= \frac{\lambda^{n_1+n_2}t^{n_1+n_2-1}e^{-\lambda t}}{\Gamma(n_1+n_2)} \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1}\\ &= f_T(t) \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1}\end{aligned}$$

Integrating $t$ out to get the marginal:

$$\begin{aligned}f_W(w) &= \int_0^\infty f_{T,W}(t,w) dt \\&= \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1} \cdot\int_0^\infty f_T(t)dt \\&= \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1} \end{aligned}$$

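Here we have the Beta distribution,

$$W \sim Beta(n_1, n_2)$$

REMARK: the joint PDF factors into a function of $t$ times a function of $w$, so the above result also proves that $W$ and $T$ are independent!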
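The bank and post office story is easy to check by simulation. The sketch below draws Gamma waiting times with a common rate and compares the fraction with Beta(n1, n2) and the total with Gamma(n1 + n2, λ) using the Kolmogorov-Smirnov distance. The shapes, the rate, the seed, and the sample size are all assumptions picked for illustration. A near-zero correlation between the fraction and the total is consistent with independence, although it does not prove it on its own.

```r
set.seed(1)                       # arbitrary seed for reproducibility
n1 <- 3; n2 <- 5; lambda <- 2     # assumed shapes and common rate
m  <- 1e5                         # number of simulated pairs

x <- rgamma(m, shape = n1, rate = lambda)   # waiting time at the bank
y <- rgamma(m, shape = n2, rate = lambda)   # waiting time at the post office
w <- x / (x + y)                            # fraction spent at the bank
t_total <- x + y                            # total waiting time

# Kolmogorov-Smirnov distance between W and Beta(n1, n2); small means close.
ks_w <- unname(ks.test(w, "pbeta", n1, n2)$statistic)
# Same comparison for T against Gamma(n1 + n2, lambda).
ks_t <- unname(ks.test(t_total, "pgamma", shape = n1 + n2, rate = lambda)$statistic)
# Independence implies zero correlation between W and T (not the converse).
rho <- cor(w, t_total)

print(c(ks_w = ks_w, ks_t = ks_t, cor_w_t = rho))
```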

Beta Function as a normalizing constant

Note that $f_W(w)$ is a PDF, so it must integrate to 1:

$$\int_0^1\frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}w^{n_1 - 1}(1-w)^{n_2-1} dw \equiv 1$$

So the normalization constant should be,

$$c = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)} := \frac{1}{B(n_1, n_2)}$$
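A quick numerical sanity check of the normalizing constant, as a sketch with assumed parameters a = 2.5 and b = 4: integrate the unnormalized kernel, compare the result with the Gamma-function formula and with base R's `beta()`, and confirm that the normalized density agrees with `dbeta()`.

```r
a <- 2.5; b <- 4                  # assumed, illustrative parameters

# Unnormalized kernel x^(a-1) * (1-x)^(b-1)
kernel <- function(x) x^(a - 1) * (1 - x)^(b - 1)

B_numeric <- integrate(kernel, 0, 1)$value        # integral of the kernel
B_gamma   <- gamma(a) * gamma(b) / gamma(a + b)   # Gamma-function formula
B_builtin <- beta(a, b)                           # base R Beta function

c_const <- 1 / B_gamma            # the normalization constant c
x0 <- 0.3                         # arbitrary evaluation point
print(c(B_numeric = B_numeric, B_gamma = B_gamma, B_builtin = B_builtin))
print(c(manual = c_const * kernel(x0), dbeta = dbeta(x0, a, b)))
```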

Mean of Beta distribution

As a byproduct of the above derivation, we get the fact that $W$ and $T$ are independent. We can use this to derive the mean of the Beta distribution. Since $WT = X$,

$$E(WT) = E(W)E(T) \implies E(X) =E\left(\frac{X}{X+Y}\right)E(X+Y),$$

rearrange to get,

$$E\left(\frac{X}{X+Y}\right) = \frac{E(X)}{E(X+Y)}$$

This result is clearly not true in general, but it holds in our setting because $W$ and $T$ are independent.

Taking $X \sim Gamma(a, \lambda)$ and $Y \sim Gamma(b, \lambda)$, we can use it to find the mean of $W \sim Beta(a, b)$ without the slightest trace of calculus.

$$E(W)=E\left(\frac{X}{X+Y}\right)=\frac{E(X)}{E(X+Y)}=\frac{a / \lambda}{a / \lambda+b / \lambda}=\frac{a}{a+b}$$
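To see both halves of this claim, the sketch below first gives a tiny counterexample where the expectation of a ratio differs from the ratio of expectations, and then checks by simulation that the two do agree in the Gamma setting with a common rate. The counterexample, the parameters, and the seed are assumptions made for illustration.

```r
# Counterexample: X is 1 or 3 with equal probability, Y is always 1.
x_vals <- c(1, 3)
lhs <- mean(x_vals / (x_vals + 1))      # E[X / (X + Y)]
rhs <- mean(x_vals) / mean(x_vals + 1)  # E[X] / E[X + Y]
print(c(lhs = lhs, rhs = rhs))          # 0.625 versus 2/3: not equal

# Gamma setting with a common rate: both sides agree up to simulation noise.
set.seed(2)                             # arbitrary seed
a <- 2; b <- 6; lambda <- 1.5           # assumed parameters
x <- rgamma(1e5, shape = a, rate = lambda)
y <- rgamma(1e5, shape = b, rate = lambda)
sim_mean <- mean(x / (x + y))           # Monte Carlo estimate of E[W]
ratio    <- mean(x) / mean(x + y)       # ratio of sample means
print(c(sim_mean = sim_mean, ratio = ratio, exact = a / (a + b)))
```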

Getting Beta parameters in practice

The Beta distribution is the conjugate prior for many common distributions, so we use it a lot. But in practice, how do we choose its parameters? A quick way is to estimate them from a mean and a variance using the method of moments:

$$X \sim Beta(\alpha, \beta),$$

$$\alpha + \beta = \frac{E(X)(1-E(X))}{Var(X)} - 1,$$

$$\alpha = (\alpha + \beta)E(X),$$

$$\beta = (\alpha + \beta)(1 - E(X))$$

Here is the R code:

    # calculate beta params using method of moments
    cal_beta_params <- function(meanX, sdX) {
      varX <- sdX^2
      sum_ab <- meanX * (1 - meanX) / varX - 1
      a <- sum_ab * meanX
      b <- sum_ab * (1 - meanX)
      # return both shape parameters (the Beta has no scale parameter)
      c(shape1 = a, shape2 = b)
    }

    # example
    cal_beta_params(meanX = 0.136, sdX = 0.103)
    ##   shape1   shape2
    ## 1.370320 8.705559
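One way to sanity-check the estimator is a round trip: plug the estimated parameters back into the closed forms for the mean and standard deviation and confirm that the inputs come back. The sketch below restates the function so it runs on its own. The output names `shape1` and `shape2` follow the argument names of R's `dbeta` and are a choice made for this sketch. Note that the formulas only give positive parameters when Var(X) < E(X)(1 - E(X)).

```r
# Method-of-moments estimator, restated so the snippet runs alone.
cal_beta_params <- function(meanX, sdX) {
  sum_ab <- meanX * (1 - meanX) / sdX^2 - 1
  c(shape1 = sum_ab * meanX, shape2 = sum_ab * (1 - meanX))
}

est <- cal_beta_params(meanX = 0.136, sdX = 0.103)
a <- est[["shape1"]]; b <- est[["shape2"]]

# Recover the moments implied by Beta(a, b).
mean_back <- a / (a + b)
sd_back   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
print(c(mean = mean_back, sd = sd_back))

# Central 95% interval of the fitted distribution.
print(qbeta(c(0.025, 0.975), a, b))
```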

Summary

To summarize, the bank–post office story tells us that: when we add independent Gamma r.v.s $X$ and $Y$ with the same rate $\lambda$ ,

  • the total $X+Y$ has a Gamma distribution;

  • the fraction $\frac{X}{X+Y}$ has a Beta distribution;

  • the total is independent of the fraction.

Examples

The PDF of Beta distribution can be U-shaped with asymptotic ends, bell-shaped, strictly increasing/decreasing or even straight lines. As you change α or β, the shape of the distribution changes.

I. Bell-Shape


The PDF of a beta distribution is approximately normal if α + β is large enough and α & β are approximately equal.

Intuition behind Bell-Shape

Why would Beta(2,2) be bell-shaped?

If you think of

  • α-1 as the number of successes, and

  • β-1 as the number of failures,

then Beta(2,2) means you observed 1 success and 1 failure. So it makes sense that the density of the success probability is highest at 0.5.

Also, Beta(1,1) would mean you observed zero successes and zero failures. Then every value of the success probability in [0,1] should be equally plausible, and the horizontal straight line of the Beta(1,1) density confirms it.


II. Straight Lines


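When one parameter equals 1 and the other equals 1 or 2, the Beta PDF is a straight line: Beta(1,1) is flat, Beta(2,1) has density 2x, and Beta(1,2) has density 2(1-x).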

III. U-Shape

When $\alpha < 1$ and $\beta < 1$, the PDF of the Beta is U-shaped.
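The shapes above can be checked without plotting by evaluating `dbeta` at a few points. This is a small sketch; the evaluation grid and the parameter pairs are illustrative assumptions.

```r
x <- c(0.05, 0.25, 0.5, 0.75, 0.95)   # evaluation points in (0, 1)

shapes <- list(
  "bell: Beta(2, 2)"        = c(2, 2),
  "flat: Beta(1, 1)"        = c(1, 1),
  "line: Beta(2, 1)"        = c(2, 1),
  "U-shape: Beta(0.5, 0.5)" = c(0.5, 0.5)
)

# One row per distribution, one column per evaluation point.
dens <- t(sapply(shapes, function(p) dbeta(x, p[1], p[2])))
colnames(dens) <- paste0("x=", x)
print(round(dens, 3))
```

Reading across the rows, the Beta(2, 2) density peaks in the middle, Beta(1, 1) is constant at 1, Beta(2, 1) grows linearly as 2x, and Beta(0.5, 0.5) is larger near both ends than at 0.5.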

