A Zest of Nonparametric Testing: The Chi Square Test

By Issa Bass

The Chi Square Goodness-Of-Fit test
In our hypothesis testing examples, we used means and variances to determine whether there were statistically significant differences between samples. What happens if the data we want to compare cannot be reduced to means and variances? What if the data are nominal or ordinal?
Suppose that a molding machine has historically produced metal bars with varying strength (measured in PSI) and the strengths of the bars are categorized in the following table. The ideal strength is 1998 PSI.

| Strength | Proportion |
|---|---|
| 2000 PSI | 5% |
| 1999 PSI | 9% |
| 1998 PSI | 65% |
| 1997 PSI | 10% |
| 1996 PSI | 6% |
| 1995 PSI | 5% |

After the most important parts of the machine have been changed, a shift supervisor wants to know whether the changes have made a difference to production. She takes a sample of 300 bars and finds that their strengths in PSI are as follows:

| Strength | Number of bars |
|---|---|
| 2000 PSI | 22 |
| 1999 PSI | 45 |
| 1998 PSI | 198 |
| 1997 PSI | 30 |
| 1996 PSI | 9 |
| 1995 PSI | 1 |

Based on the sample that she took, can we say that the changes made on the machine have made a difference?
In this case, we cannot use a hypothesis test based on the mean: adding the six percentages, or the six bar counts, and dividing by six does not give a mean strength.
Since the data that we have are not additive, we will use a nonparametric test called the Chi Square Goodness-Of-Fit test.

The Chi Square Goodness-Of-Fit test compares the expected frequencies (the first table) to the actual (or observed) frequencies (the second table).

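The formula for the test is:

$$\chi^2 = \sum \frac{(f_a - f_e)^2}{f_e}$$

With

$f_e$ = Expected frequency
$f_a$ = Actual frequency

The degrees of freedom will be given as

$$df = k - 1$$

where $k$ is the number of categories.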

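Chi square cannot be negative, since it is a sum of squared differences divided by positive expected frequencies. It equals zero only when the actual and expected frequencies are identical in every category. The test is therefore one-tailed: only large values of Chi square, in the right tail of the distribution, count as evidence against the null hypothesis.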

The null and alternate hypotheses will be:

$H_0$: The distribution of the quality of the products after the parts were changed is the same as before the parts were changed

$H_1$: The distribution of the quality of the products after the parts were changed is different from what it was before they were changed

We will first transform the table of percentages into the number of bars in each category that would have been expected had we taken a sample of 300 bars before the parts were changed.

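| Strength | Proportion | Expected number of bars |
|---|---|---|
| 2000 PSI | 5% * 300 | 15 |
| 1999 PSI | 9% * 300 | 27 |
| 1998 PSI | 65% * 300 | 195 |
| 1997 PSI | 10% * 300 | 30 |
| 1996 PSI | 6% * 300 | 18 |
| 1995 PSI | 5% * 300 | 15 |
| Total |  | 300 |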

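Now we can use the formula to determine the value of the calculated Chi Square:

$$\chi^2 = \frac{(22-15)^2}{15} + \frac{(45-27)^2}{27} + \frac{(198-195)^2}{195} + \frac{(30-30)^2}{30} + \frac{(9-18)^2}{18} + \frac{(1-15)^2}{15} = 32.88$$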

With a confidence level of 95% ($\alpha = 0.05$) and 5 degrees of freedom (df = 6 - 1), the critical $\chi^2$ is equal to 11.0705.

The next step will be to compare the calculated $\chi^2$ with the critical $\chi^2$ found on the table. If the critical $\chi^2$ (found on the table) is greater than the calculated $\chi^2$, we cannot reject the null hypothesis; otherwise, we reject it.

Since the calculated Chi Square (32.88) is a lot higher than the critical value (11.0705), we have to reject the null hypothesis. The changes made on the machine have indeed resulted in changes in the quality of the output.
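The goodness-of-fit arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `goodness_of_fit` and the variable names are illustrative choices, not part of any library. The counts listed in the sample table add up to 305 rather than 300, so the sketch follows the article's calculation and uses 300 to build the expected counts.

```python
# Historical proportions and post-change counts, copied from the tables above.
historical = {2000: 0.05, 1999: 0.09, 1998: 0.65, 1997: 0.10, 1996: 0.06, 1995: 0.05}
observed = {2000: 22, 1999: 45, 1998: 198, 1997: 30, 1996: 9, 1995: 1}

SAMPLE_SIZE = 300         # sample size stated in the article (assumption: used as-is)
CRITICAL_VALUE = 11.0705  # table value quoted in the article for alpha = 0.05, df = 5


def goodness_of_fit(actual, proportions, n):
    """Return (chi-square statistic, degrees of freedom) for one categorical variable."""
    chi_square = 0.0
    for category, share in proportions.items():
        expected = share * n  # expected count for this category
        chi_square += (actual[category] - expected) ** 2 / expected
    return chi_square, len(proportions) - 1


chi_square, df = goodness_of_fit(observed, historical, SAMPLE_SIZE)
reject_null = chi_square > CRITICAL_VALUE
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

The two largest contributions come from the 1999 PSI and 1995 PSI categories, which is where the sample departs most from the historical proportions.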

Contingency Analysis – Chi Square Test of Independence
In the previous example, we only had one variable, the quality level of the metal bars measured in terms of strength. If we have two variables with several levels (or categories) to test at the same time, we use the Chi Square Test of Independence.
Suppose a chemist wants to know the effect of an acidic chemical on a metal alloy. The experimenter wants to know whether the use of the chemical, called Acidic, accelerates the oxidation of the metal. Samples of the metal were taken; some were immersed in the chemical and some were not. Of the bars that were immersed, traces of oxide were found on 79 and no trace of oxide was found on 1091. Of those that were not immersed, traces of oxide were found on 48 and no oxide was found on 1492. The findings are summarized in the table below.

 

|  | Acidic | Non-acidic |
|---|---|---|
| Oxide | 79 | 48 |
| No Oxide | 1091 | 1492 |

In this case, if the acidic chemical has no impact on the oxidation level of the metal, we should expect that there would be no statistically significant difference between the proportions of the metals with oxidation and the ones without oxidation with respect to their groups.

If we call $P_1$ the proportion of the immersed bars that show oxide and $P_2$ the proportion of the non-immersed bars that show oxide, the null and alternate hypotheses will be as follows:

$$H_0: P_1 = P_2$$

$$H_1: P_1 \neq P_2$$

Let’s rewrite the table adding the totals

 

|  | Acidic | Non-acidic | Total |
|---|---|---|---|
| Oxide | 79 | 48 | 127 |
| No Oxide | 1091 | 1492 | 2583 |
| Total | 1170 | 1540 | 2710 |

The grand mean proportion for the bars with traces of oxidation is:

$$\frac{127}{2710} \approx 0.0468635$$

The grand mean proportion of the bars without traces of oxide is:

$$\frac{2583}{2710} \approx 0.9531365$$
Now we can build the table of the expected frequencies

 

|  | Acidic | Non-acidic | Total |
|---|---|---|---|
| Oxide | 0.0468635 * 1170 = 54.830295 | 0.0468635 * 1540 = 72.16979 | 127 |
| No Oxide | 0.9531365 * 1170 = 1115.169705 | 0.9531365 * 1540 = 1467.83021 | 2583 |
| Total | 1170 | 1540 | 2710 |
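The expected frequencies can also be obtained directly from the row and column totals, since multiplying a column total by the grand proportion of a row is the same as computing row total * column total / grand total. The sketch below assumes nothing beyond the observed counts given above; `expected_frequencies` is an illustrative name. Because it does not round the proportions first, its results may differ from the table in the last few decimals.

```python
# Observed counts: rows are Oxide / No Oxide, columns are Acidic / Non-acidic.
observed = [[79, 48],
            [1091, 1492]]


def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(("Oxide", "No Oxide"), expected):
    print(label, [round(value, 4) for value in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick way to check the arithmetic.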

Now that we have both the observed data and the expected data, we can use the formula to make the comparison.

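The formula that will be used in the case of a contingency table is slightly different from the one of Chi Square Goodness-Of-Fit: the sum now runs over every cell of the table, with $f_a$ the actual frequency and $f_e$ the expected frequency of the cell.

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_a - f_e)^2}{f_e}$$

With a degree of freedom

$$df = (r - 1)(c - 1)$$

c = number of columns
r = number of rows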
The degree of freedom for this instance will be (2 – 1)(2 – 1) = 1. For a significance level of 0.05, the critical $\chi^2$ found on the table would be 3.841.

We can now compute the test statistic

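| Expected frequency | Actual frequency | (Actual - Expected)² | (Actual - Expected)² / Expected |
|---|---|---|---|
| 54.830295 | 79 | 584.1746 | 10.65423 |
| 72.16979 | 48 | 584.1787 | 8.094505 |
| 1115.169705 | 1091 | 584.1746 | 0.523844 |
| 1467.83021 | 1492 | 584.1787 | 0.397988 |
| Totals |  |  | 19.67057 |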

The calculated $\chi^2$ is 19.67057, which is a lot higher than the critical $\chi^2$ of 3.841; therefore we have to reject the null hypothesis. At a significance level of 0.05, there is enough evidence to suggest that the Acidic chemical has an effect on the oxidation of the metal alloy.
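The whole test of independence can be sketched in plain Python as a cross-check on the hand calculation. The function name `chi_square_independence` is an illustrative choice, and the critical value is the table figure quoted above rather than something the code derives.

```python
# Observed counts: rows are Oxide / No Oxide, columns are Acidic / Non-acidic.
observed = [[79, 48],
            [1091, 1492]]

CRITICAL_VALUE = 3.841  # table value quoted in the article for alpha = 0.05, df = 1


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, actual in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (actual - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
reject_null = statistic > CRITICAL_VALUE
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_null}")
```

The same function works for larger tables, provided every expected count is greater than zero.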

Using SigmaXL to test our results
SigmaXL is a very powerful statistics software package that is very easy to use, at a very competitive price.

After pasting the data into a SigmaXL worksheet, click on SigmaXL in the menu bar, then on Statistical Tools, and from the submenu select Chi-square test two way table data.

When the Chi-Square Table Data box appears, select the area containing the data, then press the Next>> button.

The results appear as shown below.

For those of you who thought that I could not do it, SigmaXL has vindicated me.

The degree of freedom is 1 and the calculated Chi-square is 19.671.
The p-value, reported as zero, is far below 0.05. This indicates a statistically significant difference, and therefore we have to reject the null hypothesis.
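A p-value shown as zero is a rounded figure rather than an exact zero. For the special case of 1 degree of freedom, the right-tail probability can be sketched with the standard library alone, because a Chi square variable with 1 degree of freedom is the square of a standard normal variable. This is not a description of how SigmaXL computes its result; `chi_square_p_value_df1` is a hypothetical helper that only applies when df = 1.

```python
import math


def chi_square_p_value_df1(statistic):
    """Right-tail probability of a chi-square variable with 1 degree of freedom."""
    # For df = 1 the statistic is the square of a standard normal variable,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi_square_p_value_df1(19.67057)     # calculated statistic from the article
p_at_critical = chi_square_p_value_df1(3.841)  # sanity check: should be close to 0.05

print(f"p-value = {p_value:.2e}")
print(f"tail area at the critical value = {p_at_critical:.4f}")
```

The sanity check at 3.841 ties the p-value back to the critical value used in the hand calculation: a statistic above the critical value always corresponds to a p-value below the significance level.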

It was fun!!!!!!!!!


About the author
Issa Bass is the managing editor of SixSigmaFirst. He can be reached at issa@sixsigmafirst.com
