Bayes’ Theorem: Probability and Statistics

Bayes’ Theorem is one of the cornerstones of probability theory and statistics, helping us understand how to update the probability of a hypothesis in light of new evidence. It plays a pivotal role in various real-world applications, from spam filtering in emails to medical diagnostics and even machine learning. This blog post explores Bayes’ theorem in detail, covering its significance, derivation, a Python implementation, and its real-world applications.

What is Bayes’ Theorem?

Bayes’ theorem is a mathematical formula that describes how to update the probability of a hypothesis when new evidence or information becomes available. It is a simple yet powerful tool that provides a framework for reasoning under uncertainty.

The theorem is named after the English mathematician Thomas Bayes, who first suggested this idea in the 18th century. His work was published posthumously in 1763 by Richard Price, a friend of Bayes. The theorem itself was presented as part of Bayes’ essay titled “An Essay towards solving a Problem in the Doctrine of Chances”.

Mathematical Formula

Bayes’ theorem is mathematically expressed as:

P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}

where,

  • P(A \mid B) is the posterior probability: the probability of event A occurring given that B is true.
  • P(B \mid A) is the likelihood: the probability of event B occurring given that A is true.
  • P(A) is the prior probability: the initial probability of event A occurring.
  • P(B) is the marginal likelihood: the total probability of event B.

Note: P(A | B) is the probability of a hypothesis A being true given that we have observed some evidence B. It is called the "posterior" because it represents the updated probability after considering new evidence, in contrast to the prior probability P(A), which is the initial belief before the evidence is taken into account.

Significance of Bayes’ Theorem

Bayes’ theorem is significant in mathematics and probability because it allows us to update the probability of a hypothesis as new evidence emerges. This is particularly useful when dealing with uncertainty or incomplete information, making Bayes’ theorem a foundational tool in areas such as:

  • Statistics: It forms the basis for Bayesian statistics, where probabilities are interpreted as degrees of belief or certainty.
  • Decision Making: In domains like finance, medicine, and artificial intelligence, Bayes’ theorem aids in making informed decisions by updating prior beliefs with new data.

Bayes’ theorem is crucial for understanding how prior knowledge or assumptions can be updated with new evidence. It helps quantify how likely something is, given both prior assumptions and new observations. Its simplicity makes it applicable across diverse fields.

Deriving the Mathematical Formula for Bayes’ Theorem Using Conditional Probability

Before we dive into the derivation of Bayes’ theorem, it’s important to understand conditional probability. Conditional probability refers to the probability of an event A occurring given that another event B has already occurred.

It is written as P(A | B), and can be calculated using the formula:

P(A \mid B) = \frac{P(A \cap B)}{P(B)}

where P(A \cap B) is the probability of both events A and B occurring together.

The logic behind this formula is that, when calculating P(A | B), you restrict the sample space to event B. The focus is now on the portion of B's outcomes that also include A. Therefore, the probability of A within B is the fraction of how often A and B occur together (the intersection P(A ∩ B)) over how often B occurs, i.e., P(B).
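
To make this concrete, here is a minimal Python sketch of conditional probability. The fair-die example below is made up for illustration and is not part of the derivation.

# Conditional probability on a fair six-sided die (illustrative example)
outcomes = {1, 2, 3, 4, 5, 6}           # sample space
A = {2, 4, 6}                           # event A: the roll is even
B = {4, 5, 6}                           # event B: the roll is greater than 3

p_B = len(B) / len(outcomes)            # P(B) = 3/6
p_A_and_B = len(A & B) / len(outcomes)  # P(A ∩ B) = 2/6 (outcomes 4 and 6)
p_A_given_B = p_A_and_B / p_B           # P(A | B) = (2/6) / (3/6) = 2/3

print(f"P(A | B) = {p_A_given_B:.4f}")  # 0.6667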

Bayes’ theorem can now be derived from the definition of conditional probability.

P(A \mid B) = \frac{P(A \cap B)}{P(B)}

Similarly,

P(B \mid A) = \frac{P(A \cap B)}{P(A)}

Rearranging the above equation, we get

P(A \cap B) = P(B \mid A) \cdot P(A)

Substituting this into the expression for P(A \mid B), we get

P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}

This is Bayes’ theorem. It shows that the posterior probability P(A | B) is proportional to the likelihood P(B | A), weighted by the prior probability P(A) and normalized by the marginal probability P(B).

Python Code Implementation

Let’s implement Bayes’ theorem in Python. The implementation will involve a function that calculates the posterior probability based on the prior probability, likelihood, and marginal likelihood.

# Function to implement Bayes' Theorem
def bayes_theorem(prior_A, likelihood_B_given_A, marginal_B):
    """
    Calculate the posterior probability using Bayes' theorem.

    :param prior_A: Prior probability of event A (P(A))
    :param likelihood_B_given_A: Likelihood of event B given A (P(B | A))
    :param marginal_B: Marginal probability of event B (P(B))
    :return: Posterior probability (P(A | B))
    """
    posterior_A_given_B = (likelihood_B_given_A * prior_A) / marginal_B
    return posterior_A_given_B

# Example usage:
# Let's say there's a 40% chance it will rain (prior),
# There's an 80% chance that if it rains, the ground will be moist (likelihood),
# And there's a 50% chance that the ground is moist (marginal likelihood).

prior_A = 0.4  # P(A)
likelihood_B_given_A = 0.8  # P(B | A)
marginal_B = 0.5  # P(B)

# Calculate the posterior probability
posterior = bayes_theorem(prior_A, likelihood_B_given_A, marginal_B)
print(f"The probability of rain given that the ground is moist is: {posterior:.2f}")

Output

The probability of rain given that the ground is moist is: 0.64

In this example, the function bayes_theorem takes the prior probability, the likelihood, and the marginal probability as input, and returns the posterior probability. The example problem calculates the probability that it will rain given that the ground is moist.

Some Real-World Examples

Bayes’ theorem finds applications in a variety of real-world scenarios where decision-making under uncertainty is crucial. From medical diagnosis to email filtering, Bayes’ theorem helps refine the probability of an event occurring based on new information. Let’s see some examples:

Medical Diagnosis

Imagine a doctor is testing for a rare disease that affects 1 out of 1,000 people (0.1% prevalence). The test for this disease is not perfect — it has the following characteristics:

  • True positive rate (Sensitivity): 99% (the test correctly identifies the disease in 99% of the cases when the patient has it).
  • False positive rate: 5% (the test incorrectly identifies the disease in 5% of cases when the patient does not have it).

Given a positive test result, how can we use Bayes’ Theorem to calculate the probability that the patient actually has the disease?

Let,

  • P(D) : Prior probability that the patient has the disease (prevalence) = 0.001 (0.1%)
  • P(\neg D) : Prior probability that the patient does not have the disease = 1 – 0.001 = 0.999 (99.9%)
  • P(+ \mid D) : Probability of a positive test result given that the patient has the disease (true positive rate) = 0.99 (99.0%)
  • P(+ \mid \neg D) : Probability of a positive test result given that the patient does not have the disease (false positive rate) = 0.05 (5.0%)

The marginal probability of a positive test result, P(+) , can be calculated using conditional probability and the law of total probability:

P(+) = P(+ \cap D) + P(+ \cap \neg D)

Using conditional probability, we get

P(+) = P(+ \mid D) \cdot P(D) + P(+ \mid \neg D) \cdot P(\neg D)

P(+) = (0.99 \cdot 0.001) + (0.05 \cdot 0.999)

P(+) = 0.00099 + 0.04995 = 0.05094

Now, using Bayes’ Theorem, we can calculate the probability that the patient has the disease given a positive test result

P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+)}

Substituting values,

P(D \mid +) = \frac{0.99 \cdot 0.001}{0.05094} = \frac{0.00099}{0.05094} \approx 0.0194

So, even with a positive test result, the probability that the patient actually has the disease is only about 1.94%.
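
For a quick numerical check, the bayes_theorem function defined earlier can reproduce this result. The variable names below are illustrative, not part of the original example.

# Medical diagnosis example, reusing the bayes_theorem function from above
p_disease = 0.001                  # P(D): prevalence of the disease
p_no_disease = 1 - p_disease       # P(not D)
p_pos_given_disease = 0.99         # P(+ | D): sensitivity (true positive rate)
p_pos_given_no_disease = 0.05      # P(+ | not D): false positive rate

# Law of total probability: P(+) = P(+ | D)*P(D) + P(+ | not D)*P(not D)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * p_no_disease)

posterior = bayes_theorem(p_disease, p_pos_given_disease, p_pos)
print(f"P(D | +) = {posterior:.4f}")  # approximately 0.0194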

Why is the probability low?

This result might seem counterintuitive because the test has a high sensitivity (99%). However, due to the rarity of the disease (0.1% prevalence) and the false positive rate (5%), most positive test results are still false positives. This highlights the importance of considering base rates (prevalence) when interpreting test results.
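
As a rough illustration of this base-rate effect, the same calculation can be repeated for a few different prevalence values (the values below are chosen arbitrarily, keeping the test characteristics fixed):

# Illustrative sweep over prevalence values to show the base-rate effect
for prevalence in (0.001, 0.01, 0.1):
    p_pos = 0.99 * prevalence + 0.05 * (1 - prevalence)   # P(+)
    posterior = bayes_theorem(prevalence, 0.99, p_pos)     # P(D | +)
    print(f"prevalence = {prevalence:.3f} -> P(D | +) = {posterior:.3f}")

The posterior rises sharply as the disease becomes more common, which is exactly the base-rate effect described above.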

Let’s consider one more example.

Spam Email Filtering

Consider a spam filter that uses Bayes’ theorem to classify emails as spam or non-spam based on certain keywords. Suppose the filter has observed the following probabilities over time:

Let’s define the probabilities

  • P(S) : Prior probability that an email is spam = 0.4 (40%)
  • P(\neg S) : Prior probability that an email is not spam = 0.6 (60%)
  • P(free \mid S) : Probability that the word “free” appears in a spam email = 0.7 (70%)
  • P(free \mid \neg S) : Probability that the word “free” appears in a non-spam email = 0.1 (10%)

Given that an email contains the word “free”, what is the probability that it is spam?

Calculate the marginal probability of the word “free” appearing in emails

The marginal probability of the word “free” appearing in any email, P(free), can be calculated using the law of total probability:

P(free) = P(free \mid S) \cdot P(S) + P(free \mid \neg S) \cdot P(\neg S)

P(free) = (0.7 \cdot 0.4) + (0.1 \cdot 0.6)

P(free) = 0.28 + 0.06 = 0.34

Now, using Bayes’ theorem, we can calculate the probability that the email is spam given that it contains the word “free”,

P(S \mid free) = \frac{P(free \mid S) \cdot P(S)}{P(free)}

Substituting values,

P(S \mid free) = \frac{0.7 \cdot 0.4}{0.34} = \frac{0.28}{0.34} \approx 0.8235
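
So an email containing the word “free” is roughly 82% likely to be spam. As a quick check, the bayes_theorem function defined earlier reproduces this result (the variable names below are illustrative):

# Spam filtering example, reusing the bayes_theorem function from above
p_spam = 0.4                  # P(S)
p_not_spam = 0.6              # P(not S)
p_free_given_spam = 0.7       # P(free | S)
p_free_given_not_spam = 0.1   # P(free | not S)

# Law of total probability: P(free)
p_free = p_free_given_spam * p_spam + p_free_given_not_spam * p_not_spam

posterior_spam = bayes_theorem(p_spam, p_free_given_spam, p_free)
print(f"P(S | free) = {posterior_spam:.4f}")  # approximately 0.8235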

Spam Filtering Insights

This method is commonly used in Naive Bayes classifiers, which are widely used in spam filters. The filter continuously updates its probability estimates based on the appearance of certain keywords in spam and non-spam emails. As more data is gathered, the filter becomes better at identifying spam emails, even when only a few keywords are available as evidence.
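
As a rough sketch of how this extends to several keywords, the snippet below combines evidence from multiple words under the conditional-independence assumption that gives Naive Bayes its name. The words and probabilities are made up for illustration.

# Illustrative Naive Bayes-style combination of keyword evidence
p_spam, p_not_spam = 0.4, 0.6

# (P(word | spam), P(word | not spam)) for a few made-up keywords
likelihoods = {
    "free":    (0.7, 0.1),
    "offer":   (0.5, 0.05),
    "meeting": (0.05, 0.3),
}

def spam_posterior(words):
    """Combine keyword evidence, assuming the words are conditionally independent."""
    score_spam, score_not_spam = p_spam, p_not_spam
    for word in words:
        if word in likelihoods:
            p_word_spam, p_word_not_spam = likelihoods[word]
            score_spam *= p_word_spam
            score_not_spam *= p_word_not_spam
    # Normalize so the two posteriors sum to 1
    return score_spam / (score_spam + score_not_spam)

print(f"P(spam | free, offer) = {spam_posterior(['free', 'offer']):.4f}")  # about 0.98
print(f"P(spam | meeting)     = {spam_posterior(['meeting']):.4f}")        # 0.10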

Some more real-world applications of Bayes’ theorem:

  • Machine Learning: Bayesian models are frequently used in machine learning, particularly in the classification of data and making predictions. For example, the Naive Bayes classifier is a simple yet effective algorithm that uses Bayes’ theorem for classification tasks.
  • Finance and Economics: Bayes’ theorem is applied in predictive modeling, stock market analysis, and even in updating forecasts based on new economic data.
  • Genetics: In genetics, Bayes’ theorem is used to calculate the probability of certain traits or genetic conditions being passed down from parents to children, given new genetic data.

Bayes’ theorem provides a powerful framework for updating our beliefs and probabilities as new evidence becomes available. Its applications span a wide variety of fields, and it offers a structured approach to decision-making under uncertainty. By understanding how prior probabilities and new data interact, we can make more informed and accurate predictions.
