You remember the scientific method: ask a question, conduct preliminary research, form a hypothesis, conduct trials, analyze the data, and report the conclusions (rejecting or accepting the hypothesis in the process)? It’s taught repeatedly throughout grade school, usually accompanied by a prepackaged, stripped-down “experiment” meant to demonstrate a concept or principle where the “answer” is already known. By the time I reached an academic level at which more robust experiments would be integrated into the curriculum (with a few rare exceptions that I mostly executed on my own time), my studies were more focused on engineering. In general, engineers are less concerned with experiments than with testing (yes, there’s a difference); in astronautical engineering specifically, our hardware is both expensive and bespoke, so the kind of repeated trials usually associated with experimental methods really doesn’t map well. Most of my work since then has involved observational studies, not experiments: analyzing data collected as part of operations in order to extract insights, not data collected as part of a deliberately designed and controlled experiment.
As part of a master’s program I’m working through, I had the opportunity to take a class called “Design and Analysis of Experiments.” It was exactly what the name says it is, but the plain name disguises the enormous and fundamental utility of the course’s content. If you’re accustomed to a battery of one-off tests for bespoke spacecraft verifications, or to the kinds of “experiments” you executed in those grade school science classes, you might not even know that the deliberate design of experiments is a whole field of its own, and that there is far more to it than holding things constant and collecting a bunch of data. It is one of the most directly and broadly applicable academic courses I’ve taken, and I only wish some sliver of its content had been covered earlier in my career (both academic and professional).
Consider a simple example: developing a sourdough bread recipe. The first step, before we even get to experimental design, is to identify the responses and factors to consider. That is, which properties of the finished loaf are we going to care about and measure, and what can we control or manipulate about the recipe that might affect those responses? Maybe I want the loaf to have a crust that’s 1.5 mm thick, an oven spring of 100% from its proofed height, an average bubble size of 5 mm, and a bubble permeation of 10 per square centimeter. To achieve these results, I can manipulate the amount of sourdough starter, the time since the sourdough starter’s last feeding, the amount of salt, the amount of unbleached bread flour, the amount of whole wheat flour, the amount of water, the temperature of the ingredients, the autolyze time, the rising time, the rising temperature, the number of folds, the interval between folds, the retard time, the retard temperature, the proofing time, the proofing temperature, the oven temperature, the amount of steam, the preheating time, the shaping technique, the baking time, and the type of pan. That is 22 factors.
Under a traditional experimental approach, in which we test every combination of a low and a high setting for each factor, we would need to bake 4,194,304 loaves of bread (2 to the 22nd power), and that would only cover the extreme conditions of all our factors, not any center points or more complex interactions of factors. Now, we could hold some of these factors constant, rather than making them active factors in the experiment, and that would reduce the number of runs needed. But even if we only manipulate the amount of starter, the amount of salt, the amount of bread flour, the amount of whole wheat flour, and the amount of water, holding everything else constant, we’re looking at 32 loaves of bread (2 to the 5th power). That is 32 runs for an experiment that doesn’t even consider a large number of the factors potentially influencing the results.
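To make that run count concrete, here is a minimal Python sketch that enumerates every low/high combination of the five ingredient factors. The identifier names and the coding of low and high as -1 and +1 are my own assumptions for illustration; the count is what matters, and it doubles with every factor added.

```python
from itertools import product

# The five ingredient factors singled out above. The identifier names are
# illustrative, and coding low/high as -1/+1 is an assumed convention.
factors = ["starter", "salt", "bread_flour", "whole_wheat_flour", "water"]


def full_factorial(names):
    """Return every low/high combination of the named factors as a list of dicts."""
    return [
        dict(zip(names, levels))
        for levels in product((-1, 1), repeat=len(names))
    ]


runs = full_factorial(factors)
print(len(runs))  # 32
```

Each entry in `runs` is one loaf to bake, with every factor set to either its low or its high level.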
Design of experiments (DOE) lets us be far more efficient in our experimental approach, reducing the number of runs we need to acquire insights, while offering techniques for gaining more insights than we could from just running a bunch of trials. The entire field is about making deliberate choices throughout the experimental process: in the design phase, the execution phase, and the analysis phase. It offers standardized experimental designs which may be applicable, and it allows for customized designs which can account for disallowed combinations of factors, different types of factors (continuous, categorical, etc.), and factors that are difficult to change or control. There are techniques for iterative experimental approaches, where the first set of runs might have low statistical power but gives enough insight to inform and constrain subsequent batches of runs, thereby increasing the number of conclusions that can be drawn while reducing the number of runs which must be conducted. DOE can even account for changes or mistakes made during the testing process.
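As one hedged illustration of that efficiency, the sketch below builds a half-fraction factorial design for the five ingredient factors: four factors are varied through all of their low/high combinations, and the fifth is set to the product of the other four. This is one standard textbook design, chosen by me as an example rather than something prescribed above; the identifier names and the -1/+1 coding are assumptions.

```python
from itertools import product
from math import prod

# Four "base" factors get every low/high combination; the fifth is derived
# from them. Names and the -1/+1 coding are illustrative assumptions.
base = ["starter", "salt", "bread_flour", "whole_wheat_flour"]
generated = "water"


def half_fraction(base_names, generated_name):
    """Build a two-level half-fraction design.

    The generated factor's level in each run is the product of the base
    factors' levels in that run.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_names)):
        run = dict(zip(base_names, levels))
        run[generated_name] = prod(levels)
        runs.append(run)
    return runs


design = half_fraction(base, generated)
print(len(design))  # 16
```

That is 16 loaves instead of 32. The price is that some effects become indistinguishable from one another: in this particular design, each main effect is confounded with a four-factor interaction and each two-factor interaction with a three-factor interaction, which is often an acceptable trade when high-order interactions are expected to be small.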
There are vast potential applications of this field, as I hope you are already imagining. Yes, it’s statistics, but statistics can be useful when correctly and carefully applied. In the future, I may write posts in which we explore how to apply DOE, but this post’s purpose is simply to share that the field exists, and to offer some small sliver of insight into how useful it can be. Both personally and professionally, I can imagine manifold situations in which the principles of DOE will be useful, and I assume the same is true for you.
