Understanding Risks in Testing - Part I Alpha Risk

  • Chris Butterworth
  • Jun 23, 2024
  • 5 min read

In problem solving, we often need to know if a change in the process, recipe or design made an improvement. Did the formulation change make the product stronger? Is this group of parts better than that group? Which supplier has the better product?


We are interested in knowing something about the new population, but we can only test a sample, so we make a judgement call about the population based on that sample. The expectation is that the results of the test will inform us. But be aware that the results can also mislead. There are a few details about testing and the laws of probability that you need to know. Understanding them will improve your problem-solving skills.


Alpha Risk


There are two types of errors that are possible when we make our decision. We can decide that the process is better when it really isn’t. Or we can decide the process is not better when it is. Those two error types need to be understood so you can design your test in a way that minimizes the chances of making an error.


In this post, I will describe the first error type: concluding that things have improved when they really haven’t. In the language of hypothesis testing, this is rejecting the null hypothesis when it is true. The null hypothesis is the hypothesis of no change, meaning that the process (or product) has not been improved. The other hypothesis is called the alternative hypothesis, and it is the hypothesis we are looking to prove with the test. In this case, the alternative hypothesis would be that the new process is indeed better.


Rejecting the null hypothesis when it was correct (concluding that the process was improved when it wasn’t) is a common error in testing. We are prone to this risk because we are hopeful that the alternative hypothesis is true. We want to see test results that support our work.


This risk is known as the alpha risk and it is always a part of drawing conclusions based on test results. It exists because the laws of probability provide for it. So I think it is practical to understand this risk so you can protect yourself from making such mistakes.


Industrial Example


As an example, consider a manufacturing process that is being altered. A change has been proposed that should make the process produce better product. The plot below shows the data in the order of production.

[Figure: run chart of process output in production order]

The histogram below displays the shape and spread in the data.

[Figure: histogram of the current process output]

We are really only interested in the new population. If the new population average were higher, that would prove the change is a success. But we don’t have the new population yet. We can only obtain a sample, and we are making the decision to go with the change based on the results of that sample.


The team knows that the average output is 39.70 and is looking for an increase in the average. They decide to take a sample of 5 units made with the change; if the sample average is higher than 39.70, they have a winner. Do you see a problem with that?


From a sample of five, how often would the average be higher than 39.70 if the process was not changed? The answer is, of course, about 50% of the time, and that is true for any sample size: roughly half of the samples will have a mean above the population average and half will be below (exactly half when the distribution is symmetric).
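
You can check this with a quick simulation. Below is a minimal sketch in Python, assuming the current process is roughly symmetric; the mean of 39.70 comes from the example, but the normal model and its standard deviation are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder model of the CURRENT process: normal around the known
# average of 39.70. The standard deviation here is invented.
pop_mean, pop_sd = 39.70, 0.02

# 10,000 simulated samples of size 5 from the unchanged process
means = rng.normal(pop_mean, pop_sd, size=(10_000, 5)).mean(axis=1)

# About half the sample means land above the population average
print((means > pop_mean).mean())  # ~0.50
```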


You need to decide on the amount of change you are looking to test against.


The team decides on 39.71. If the average is above 39.71, it demonstrates an improvement and they will implement the change permanently. So here’s where running simulations adds a lot to our knowledge of the sampling and decision-making processes.


If you have data on the current process, run a simulation. Take a random sample of 5 items and calculate the average. This is one outcome you would expect under the null scenario of no improvement.
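
As a sketch, one such simulated sample might look like this in Python. The `historical` array is a hypothetical stand-in; in practice you would load your own measurements from the current process.

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical stand-in for measurements from the current, unchanged
# process; replace with your own data.
historical = rng.normal(39.70, 0.02, size=500)

# One simulated test: resample 5 items and average them
sample = rng.choice(historical, size=5, replace=True)
print(sample.mean())  # one outcome expected under the null of no improvement
```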


Simulations are very useful because we can take as many as we want. It’s insightful, educational and free. We are only going to run the real test one time and make a decision based on the results of that one sample. Running a simulation on a computer provides a distribution of possible outcomes. So here’s the distribution of averages from 10,000 samples of size 5.

[Figure: histogram of 10,000 simulated sample means (n = 5), with a red line at 39.71]

We see that, even with no change to the process, we will obtain sample means above 39.71 fairly often. In this plot, that occurs 12.8% of the time. That’s about 1 in 8 and that feels too risky. Visually, it is the area in the histogram above 39.71 (red line).
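
A rough sketch of that simulation, again with placeholder data standing in for the real measurements (so the tail fraction will land near, not exactly at, 12.8%):

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder for real measurements from the current process
historical = rng.normal(39.70, 0.02, size=500)

# 10,000 simulated tests of size 5, all drawn from the UNCHANGED process
sample_means = rng.choice(historical, size=(10_000, 5), replace=True).mean(axis=1)

# Fraction of null samples whose mean clears the 39.71 cutoff
alpha_est = (sample_means > 39.71).mean()
print(f"Simulated alpha risk at n=5: {alpha_est:.1%}")
```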


This test plan will cause us to believe that we have made an improvement 12.8% of the time when we do not have an improvement. This is called the alpha risk. In our specific case here, it is the risk of making a decision in favour of an improvement when there is no improvement. By definition, alpha risk is the risk of rejecting the null hypothesis when the null was true.


In manufacturing, mistakes like that cost money. We don’t want to make those kinds of mistakes. How can we deal with this?


We can set this alpha risk ahead of time. Before committing to a sample size, compare the alpha risk across candidate sample sizes: run a few simulations, plot the results (histograms), and then choose your sample size. That may sound tedious, but the learning is valuable.


We all have computers today, as well as access to data, so running a bunch of random samples and observing the distribution of outcomes is not a complex task.


Alpha risks are often set to 5%. This is because it is set that way in most statistics textbooks, so most of us learned it that way. But in manufacturing, where we may run such tests often, 5% isn’t low enough. And since the risk affects product quality, customer satisfaction and money, I tend to go with a 1% alpha risk. No one wants to make a mistake only to find out about it months later.


Since a sample size of five came with an alpha risk of 12.8%, let’s look at sample sizes of 10, 15 and 20.


As a reminder, these histograms show the outcomes from samples taken from the current process that has not yet been improved.

[Figure: distributions of simulated sample means for n = 10, 15 and 20]

You can see that the distribution of means becomes narrower as sample size increases; larger sample sizes give smaller standard errors. This is how we can minimize the alpha risk. The areas under the curves beyond 39.71 are the risks: possible outcomes of the current sampling process that would be mistaken for improvements. The plot below shows these risks by sample size for our situation. In our case, it takes a sample size of at least 20 to put the risk of a false positive below 1%.

[Figure: simulated alpha risk versus sample size, with the decision threshold at 39.71]

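
Sweeping the sample size in the same simulation takes only a few extra lines. As before, the data here are placeholders, so the exact percentages will differ from the plot above:

```python
import numpy as np

rng = np.random.default_rng(1)
historical = rng.normal(39.70, 0.02, size=500)  # placeholder data

# Estimate the alpha risk at several candidate sample sizes
for n in (5, 10, 15, 20):
    means = rng.choice(historical, size=(50_000, n), replace=True).mean(axis=1)
    risk = (means > 39.71).mean()
    print(f"n={n:2d}: simulated alpha risk = {risk:.1%}")
```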
This is a simulation tool that you can, and should, use every time you run a test. These simulations provide a distribution of expected outcomes. This is useful because it provides you with a basis for statistical reasoning. It makes you smarter👩‍🎓.


This post discussed the topic of alpha risk, or risk of a false positive. There is also beta risk, the risk of a false negative. This occurs when your process did improve but your sample misled you into thinking it did not. I’ll address that in the next post.


Thanks for reading.


Feel free to tag your colleagues who may be interested and to re-post this to your network.


Chris Butterworth, MBB


Industrial Problem Solving Course Creator

Belfield Academy
