Probability Distributions

By Issa Bass
 

2 - 1. Introduction

We all deal with the concept of probability on a daily basis, sometimes without even realizing it. What are the chances that we will come to work on time? What is the likelihood that the check we have just written will reach our bank before the bank receives the direct deposit from our employer? What are the chances that it will rain tonight?

So what is Probability?

It is the chance, the likelihood that something will happen.

In statistics, the words chance and likelihood are seldom used to describe the possibility that an event will take place. Instead, the word probability is used, along with some other basic concepts whose meanings differ from everyday use. Probability is the measure of the possibility that an event will take place. It is a number between 0 and 1. If it is certain that the event will take place (a 100% chance), the probability is 1; if it is impossible for it to happen, the probability is 0.

An experiment is the process by which an observation is obtained. An example of an experiment would be sorting defective parts out of a production line.

An event is an outcome of an experiment. Determining the number of employees who come late to work twice a month is an experiment with many possible events: the outcome can be anywhere between 0 and the number of employees in the company.

A sample space is the set of all possible outcomes in an experiment.

2 -2. Discrete probability distributions

A probability distribution shows the possible events and the associated probability for each of these events to occur.

A distribution is said to be discrete if it is built on discrete random variables.

The four most used discrete probability distributions in business operations are the Binomial, the Poisson, the Geometric and the Hyper-geometric distributions.

2 -2.1 Binomial distribution

The binomial distribution assumes an experiment with n identical trials, each trial having only two possible outcomes, considered as success or failure, and each trial being independent of the previous ones.

For the remainder of this section, p will be considered as the probability for a success and q as the probability for a failure.

q = 1 - p.

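The formula for a binomial distribution is as follows:

$$P(x) = {}_nC_x \, p^x q^{n-x}$$

where:

P(x) is the probability for the event x to happen. x may take any value from 0 to n and

$${}_nC_x = \frac{n!}{x!\,(n-x)!}$$

The mean, variance and standard deviation for a binomial distribution are:

$$\mu = np, \qquad \sigma^2 = npq, \qquad \sigma = \sqrt{npq}$$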
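As a minimal sketch, the standard binomial formula P(x) = nCx p^x q^(n-x) and its summary measures np, npq and sqrt(npq) can be evaluated with Python's standard library. The function names and the inspection scenario (10 parts, each defective with probability 0.1) are hypothetical and are not taken from the article.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) = (np, npq, sqrt(npq))."""
    q = 1.0 - p
    variance = n * p * q
    return n * p, variance, sqrt(variance)

# Hypothetical scenario: 10 parts inspected, each defective with probability 0.1.
n, p = 10, 0.1
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(round(binomial_pmf(2, n, p), 4))   # probability of exactly 2 defective parts
print(binomial_summary(n, p))            # mean, variance, standard deviation
```

Summing `probabilities` over x = 0..n should give 1, which is a quick sanity check on any implementation of the formula.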

2 -2.2 Poisson distribution

The Poisson distribution focuses on the probability of a number of events occurring over some interval or continuum where the average number of such events is known.

The formula for the Poisson distribution is:

$$P(x) = \frac{\mu^x e^{-\mu}}{x!}$$

Where:

P(x) is the probability for the event x to happen

$\mu$ is the arithmetic mean number of occurrences in a particular interval.

e is the constant 2.718282.

The mean and the variance of the Poisson distribution are the same and the standard deviation is the square root of the mean.

Binomial problems can be approximated by the Poisson distribution when the sample size is large [n > 20] and p is small [np ≤ 7]. In that case, the Poisson mean is taken to be $\mu = np$.
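The quality of the Poisson approximation to the binomial can be checked numerically. The sketch below uses the standard Poisson formula with mean np and compares it with the exact binomial probabilities; the values n = 100 and p = 0.03 are hypothetical, chosen only so that n > 20 and np ≤ 7.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """Poisson probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)

def binomial_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical values: n > 20 and np = 3 <= 7.
n, p = 100, 0.03
mu = n * p

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))

# Largest pointwise difference between the two distributions.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(n + 1))
print(max_gap)
```

For these values the two columns agree to about two decimal places; the gap grows as p gets larger for a fixed n.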

2 -2.3 Geometric distribution

When we studied the Binomial distribution, we were only interested in the probability for a success or a failure to happen. The geometric distribution addresses the number of trials necessary before the first success. If the trials are repeated k times until the first success, we will have had k - 1 failures. If p is the probability for a success and q the probability for a failure, the probability for the first success to occur at the kth trial will be

$$P(k) = p\,q^{k-1}$$

The probability that more than n trials are needed before the first success will be

$$P(k > n) = q^n$$

The mean and standard deviation for the geometric distribution are:

$$\mu = \frac{1}{p}, \qquad \sigma = \frac{\sqrt{q}}{p}$$

Example 1

The probability for an auditor to find an error in a production line is 0.01. What is the probability that the first error is found at the 70th part audited?

Solution

$$P(70) = p\,q^{70-1} = 0.01 \times 0.99^{69} = 0.004998$$

The probability that the first error is found at the 70th part audited will be 0.004998.

Example 2

What is the probability that more than 50 parts need to be audited before the first error is found?

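Solution

More than 50 trials are needed when the first 50 parts audited are all free of errors, so with p = 0.01 and q = 0.99:

$$P(k > 50) = q^{50} = 0.99^{50} \approx 0.605$$

The probability that more than 50 parts need to be audited before the first error is found is about 0.605.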
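The two geometric examples can be checked with a few lines of Python. This is a minimal sketch using the standard geometric formulas p q^(k-1) and q^n with p = 0.01 from Example 1; the function names are illustrative only.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs exactly on trial k."""
    return p * (1 - p) ** (k - 1)

def geometric_tail(n, p):
    """Probability that more than n trials are needed for the first success."""
    return (1 - p) ** n

p = 0.01  # probability of finding an error on any one part (Example 1)

first_at_70 = geometric_pmf(70, p)     # Example 1
more_than_50 = geometric_tail(50, p)   # Example 2

print(round(first_at_70, 6))   # 0.004998
print(round(more_than_50, 4))  # 0.605
```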

2 -2.4 Hyper-geometric distribution

One of the conditions of a binomial distribution was the independence of the trials: the probability of a success is the same for every trial.

If successive trials are done without replacement and the sample size or population is small, the probability for each observation will vary.

If a sample has 10 stones, the probability of taking a particular stone out of the ten will be 1/10. If that stone is not replaced into the sample, the probability of taking another one will be 1/9. But if the stones are replaced each time, the probability of taking a particular one will remain the same, 1/10.

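When the population is finite (relatively small and known) and the probability of a success changes from trial to trial, the Hyper-geometric distribution is used instead of the Binomial distribution.

The formula for the hyper-geometric distribution is as follows:

$$P(x) = \frac{{}_kC_x \; {}_{N-k}C_{n-x}}{{}_NC_n}$$

Where N is the size of the population, k is the number of successes in the population, n is the sample size, and x is an integer whose value is between 0 and n (and no greater than k).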
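A short sketch of the standard hyper-geometric probability, written with `math.comb`. The symbols N (population size), k (successes in the population) and n (sample size) follow the usual textbook convention, and the scenario of 10 stones with 3 marked ones is hypothetical, loosely extending the stone illustration above.

```python
from math import comb

def hypergeometric_pmf(x, N, k, n):
    """Probability of x successes in a sample of n items drawn without
    replacement from a population of N items that contains k successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Hypothetical scenario: 10 stones, 3 of them marked, 4 drawn without replacement.
N, k, n = 10, 3, 4
pmf = [hypergeometric_pmf(x, N, k, n) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
```

Because only 3 stones are marked, the probability of drawing 4 marked stones comes out as 0, which the formula handles through `comb(3, 4) == 0`.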

2 -3 Continuous distributions

Most experiments in business operations have sample spaces that do not contain a finite, countable number of simple events. A distribution is said to be continuous when it is built on continuous random variables, which are variables that can assume the infinitely many values corresponding to points on a line interval. An example of a continuous random variable would be the time it takes a production line to produce one item.

In contrast to discrete variables, whose values are counts, the values of continuous variables are measurements.

The main continuous distributions used in business operations are the Normal, the Exponential, the Lognormal and the Weibull distributions.

2 -3.1 Exponential distribution

The exponential distribution is closely related to the Poisson distribution. The Poisson distribution is built on discrete random variables and describes random occurrences over some interval, while the exponential distribution is continuous and describes the time between random occurrences. Examples of an exponential distribution are the time between machine breakdowns and the waiting time in a line at a supermarket.

The exponential distribution is determined by the following formula:

$$f(x) = \lambda e^{-\lambda x}$$

with $x \ge 0$ and $\lambda > 0$.

The mean and the standard deviation are:

$$\mu = \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda}$$

The shape of the exponential distribution is determined by only one parameter, $\lambda$. Each value of $\lambda$ determines a different shape of the curve. Figure 2 -3.1 shows the graph of the exponential distribution.

[Figure 2 -3.1: Exponential curve]

The area under the curve between two points determines the probabilities for the exponential distribution. The formula below can be used to calculate the probability that x is at least a given value a; the probability for an interval is the difference between two such values.

$$P(x \ge a) = e^{-\lambda a}$$

If the number of events taking place in a unit of time has a Poisson distribution with a mean $\lambda$, then the intervals between these events are exponentially distributed with the mean interval time equal to $1/\lambda$.

Example 1:

If the number of items arriving at inspection at the end of a production line follows a Poisson distribution with a mean of 10 an hour, then the time between arrivals follows an exponential distribution with a mean time between arrivals of $1/\lambda = 1/10 = 0.1$ hour.

Example 2:

Suppose that the time in months between line stoppages on a production line follows an exponential distribution with $\lambda = 0.5$.

a. What is the probability that the time until the line stops again will be more than 15 months?

b. What is the probability that the time until the line stops again will be less than 20 months?

c. What is the probability that the time until the line stops again will be between 10 and 15 months?

d. Find $\mu$ and $\sigma$. Find the probability that the time until the line stops will be between $(\mu - 3\sigma)$ and $(\mu + 3\sigma)$.

Solution:

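a. $P(x > 15) = e^{-\lambda \cdot 15} = e^{-0.5 \times 15} = e^{-7.5} = 0.000553$

The probability that the time until the line stops again will be more than 15 months is .000553.

b. $P(x < 20) = 1 - P(x \ge 20) = 1 - e^{-0.5 \times 20} = 1 - e^{-10} \approx 0.99995$

The probability that the time until the line stops again will be less than 20 months is about .99995.

c. We have already found that $P(x > 15) = 0.000553$.

We need to find the probability that the time until the line stops again will be more than 10 months:

$$P(x > 10) = e^{-0.5 \times 10} = e^{-5} = 0.006738$$

The probability that the time until the line stops again will be between 10 and 15 months is the difference between .006738 and .000553:

$$P(10 \le x \le 15) = 0.006738 - 0.000553 = 0.006185$$

d. The mean and the standard deviation are given by $\mu = \sigma = 1/\lambda$, therefore:

$$\mu = \sigma = \frac{1}{0.5} = 2$$

So we need to find $P(2 - 3(2) \le x \le 2 + 3(2)) = P(-4 \le x \le 8)$, which is equal to $P(0 \le x \le 8)$ because the time cannot be negative.

$P(x \ge 8) = e^{-0.5 \times 8} = e^{-4} = 0.0183$, therefore:

$$P(0 \le x \le 8) = 1 - 0.0183 = 0.9817$$

The probability that the time until the line stops again will be between $(\mu - 3\sigma)$ and $(\mu + 3\sigma)$ is .9817.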
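The four parts of this example can be recomputed from the exponential tail probability e^(-λa). The sketch below rests on two assumptions that are inferred from the printed answers rather than stated in the surviving text: that λ = 0.5 per month (which reproduces .000553 for part a) and that part d uses the interval μ ± 3σ (which reproduces .9817). The helper names are illustrative.

```python
from math import exp

def exp_tail(a, lam):
    """P(x >= a) for an exponential distribution with rate lam."""
    return exp(-lam * a)

def exp_between(a, b, lam):
    """P(a <= x <= b); a negative lower bound is clipped to 0."""
    a = max(a, 0.0)
    return exp_tail(a, lam) - exp_tail(b, lam)

lam = 0.5            # assumed rate, inferred from the answer to part a
mean = sd = 1 / lam  # for the exponential distribution, mean = sd = 1/lambda

p_a = exp_tail(15, lam)                                  # more than 15 months
p_b = 1 - exp_tail(20, lam)                              # less than 20 months
p_c = exp_between(10, 15, lam)                           # between 10 and 15 months
p_d = exp_between(mean - 3 * sd, mean + 3 * sd, lam)     # within mean +/- 3 sd (assumed)

print(round(p_a, 6), round(p_b, 5), round(p_c, 6), round(p_d, 4))
```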

2 -3.2 Normal distribution

The Normal distribution is certainly one of the most widely used probability distributions. Many natural and human characteristics are approximately normally distributed, and so are many production outputs. Six Sigma derives its statistical definition from it.

The normal probability density function is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The equation of the distribution depends on the mean $\mu$ and the standard deviation $\sigma$.

The curve associated with that function is bell-shaped and has an apex at the center. It is symmetrical about the mean; the two tails of the curve extend indefinitely without ever touching the horizontal axis, and the area between the curve and the horizontal axis is equal to 1.

The area under the curve between a and b represents the probability that a random variable assumes a value in that interval.

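The normal distribution can be converted into the simpler Z distribution:

$$Z = \frac{X - \mu}{\sigma}$$

This process leads to the Standardized Normal Distribution, which has a mean of 0 and a standard deviation of 1.

Because of the complexity of the normal density formula, the standardized normal distribution and its table of areas are used instead.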

Let's consider the following example.

The weekly profits of a large group of stores are normally distributed with a mean of $\mu = 1200$ and a standard deviation of $\sigma = 200$. What is the Z value for a profit of X = 1300? For X = 1400?

For X = 1300: $Z = (1300 - 1200)/200 = 0.5$

For X = 1400: $Z = (1400 - 1200)/200 = 1$

Now let's work on the probability for some events to take place.

Example 1

In the previous example, what is the percentage of the stores that make $1500 or more a week?

Solution:

$$Z = \frac{1500 - 1200}{200} = 1.5$$

On the Z score table, 1.5 corresponds to 0.4332.

This represents the area between $1200 and $1500. The area beyond $1500 is found by deducting 0.4332 from 0.5 (0.5 is half of the area under the curve). This area is 0.0668; in other words, 6.68% of the stores make more than $1500 a week.
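The table lookup in this example can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch that assumes a mean of 1200 and a standard deviation of 200 for the weekly profits; those two values are inferred from the Z value of 1.5 and the 1200 to 1500 interval quoted in the solution.

```python
from statistics import NormalDist

mu, sigma = 1200, 200            # assumed from the worked solution
profits = NormalDist(mu, sigma)

z = (1500 - mu) / sigma                    # standardized value for 1500
table_area = NormalDist().cdf(z) - 0.5     # area between the mean and z, as in a Z table
share_above = 1 - profits.cdf(1500)        # area beyond 1500

print(z)                       # 1.5
print(round(table_area, 4))    # 0.4332
print(round(share_above, 4))   # 0.0668
```

Working with the cumulative distribution function directly avoids the "0.5 minus table value" step, but both routes give the same area.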

Example 2

A manufacturer wants to set a minimum life expectancy on a newly manufactured light bulb. A test has revealed a mean $\mu = 250$ hours and a standard deviation $\sigma = 15$. The life of the light bulbs is normally distributed. The manufacturer wants to set the minimum life expectancy of the light bulbs so that less than 5 percent of the bulbs will have to be replaced. What minimum life expectancy should be put on the light bulb labels?

Solution:

The area under the curve between X and the end of the left tail represents the 5% (or .0500) of the light bulbs that might need to be replaced. The area to the right of X represents the 95% of good bulbs; the part of it that lies between X and $\mu$ (250) is 0.4500.

To find Z, we need to deduct 0.0500 from 0.500 (0.500 represents half of the area under the curve)

0.5 - 0.05 = 0.45

0.4500 corresponds to 1.645 on the Z table. Because the value is to the left of $\mu$,

Z = -1.645

$$X = \mu + Z\sigma = 250 - 1.645 \times 15 = 225.325$$

The minimum life expectancy for the light bulb will be 225.325 hours.
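Reading the Z table backwards, from an area to a Z value, corresponds to the inverse cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` and assumes a mean of 250 hours and a standard deviation of 15 hours; the standard deviation is inferred from the printed result 225.325 and Z = -1.645.

```python
from statistics import NormalDist

mu, sigma = 250, 15      # sigma is an assumption inferred from the printed answer

# Z value that leaves 5% of the area in the left tail.
z = NormalDist().inv_cdf(0.05)

# Convert back from the Z scale to hours: X = mu + Z * sigma.
min_life = mu + z * sigma

print(round(z, 3))         # -1.645
print(round(min_life, 2))  # 225.33
```

The small difference from 225.325 comes from the table value 1.645 being rounded to three decimals.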

Example 3

The average number of defective parts that come from a production line is 10.5 with a standard deviation of 2.5. What is the probability that the number of defective parts for a randomly selected sample will be less than 15?

Solution

$$Z = \frac{15 - 10.5}{2.5} = 1.8$$

1.8 corresponds to .4641 on the Z table.

So the probability that the defective parts will be less than 15 is 0.9641 (0.5 + 0.4641).
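The Z table entries used in these examples are areas under the standard normal curve between 0 and z. As a sketch of where such values come from, that area can be written with the error function as 0.5 * erf(z / sqrt(2)), a standard identity; the function name `z_table_area` is illustrative only.

```python
from math import erf, sqrt

def z_table_area(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * erf(z / sqrt(2))

# Example 3: mean 10.5, standard deviation 2.5, threshold 15.
z = (15 - 10.5) / 2.5
prob_less_than_15 = 0.5 + z_table_area(z)

print(z)                            # 1.8
print(round(z_table_area(z), 4))    # 0.4641
print(round(prob_less_than_15, 4))  # 0.9641
```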

2 -3.3 The Lognormal Distribution

Along with the Weibull distribution, the lognormal distribution is frequently used in risk assessment, in reliability, and in material strength and fatigue analysis. A random variable is said to be lognormally distributed if its logarithm is normally distributed. Since the lognormal distribution is derived from the normal distribution, the two share most of their properties. The density function of the lognormal distribution is given as:

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \qquad x > 0$$

Where $\mu$ represents the mean of the logarithm of the variable and $\sigma$, the scale parameter, represents the standard deviation of the logarithm.

The lognormal cumulative distribution is

$$F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

and the reliability function is

$$R(x) = 1 - \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$$

Where

$\Phi$ represents the standard normal cumulative distribution function.

The shape of the lognormal distribution depends on the scale parameter $\sigma$.
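Because the lognormal cumulative distribution is the standard normal cumulative distribution evaluated at (ln x - μ)/σ, it can be sketched directly with `statistics.NormalDist`. The parameter values μ = 5.0 and σ = 0.5 below are hypothetical and serve only to illustrate the calculation; the function names are likewise illustrative.

```python
from math import exp, log
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def lognormal_cdf(x, mu, sigma):
    """F(x) = Phi((ln x - mu) / sigma) for x > 0."""
    return STANDARD_NORMAL.cdf((log(x) - mu) / sigma)

def lognormal_reliability(x, mu, sigma):
    """R(x) = 1 - F(x): probability that the variable exceeds x."""
    return 1 - lognormal_cdf(x, mu, sigma)

# Hypothetical parameters of ln(x): mean 5.0, standard deviation 0.5.
mu, sigma = 5.0, 0.5
median = exp(mu)  # the point at which F(x) is 0.5

print(round(lognormal_cdf(median, mu, sigma), 4))
print(round(lognormal_reliability(200.0, mu, sigma), 4))
```

The reliability function decreases as x grows, so a later time always has a lower probability of survival than an earlier one.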


About the author
Issa Bass is the managing editor of SixSigmaFirst. He can be reached at issa@sixsigmafirst.com
