Step 6. How do we actually get the data?
At the end of Section 5 we have an array of points that should, statistically, produce data adequate for generating a useful result. However, “Designing an experiment is like gambling with the Devil…” (Sir R. A. Fisher). The ever-present noise of our equipment, laboratory, building, city, planet, and cosmos will conspire to obscure the excellent information of our system, reducing it to dirty, messy… data.
The strategy to protect our precious results is four-fold: Educate; Replicate; Randomize; and Block.
“When running an experiment, the safest assumption is that unless extraordinary precautions are taken, it will be run incorrectly” (Box, Hunter, and Hunter). Whether your experiment is being done by other scientists, laboratory technicians, plant operators, or even yourself, it is amazingly easy for “simplifications” or “improvements” to leak through.
The overall methodology can easily be found under “Good Laboratory Practice”, covering things like training, safety, materials handling, and equipment. One of the most important tools for a designed experiment is a Standard Operating Procedure in which the key procedures are written down. Furthermore, a standard plan sheet should be prepared for each run of the experiment, detailing not only the conditions to be changed but also those to be held constant.
Some of the more common nuisances I have encountered as the experiment reaches the lab or plant are:
Re-organizing. A careful experimenter will set the runs in a specific order (see below). Frequently this order is inconvenient for the operators… so it will be “fixed”.
Oops… ran out. The batch of raw material didn’t last for the whole set – so a new batch will be brought in. No big deal…
Different data formats. The data should be collected in a form that is the same for all shifts and all sites. Otherwise, it will be a serious nuisance assembling the final worksheet!
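One way to keep the data format identical across shifts and sites is to fix the record layout before the first run. The following is only a minimal sketch: the `RunRecord` fields (`site`, `shift`, `operator`, `raw_material_batch`, and so on) are hypothetical names chosen for illustration, and the example values are invented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RunRecord:
    """One row of the worksheet. All field names are illustrative assumptions."""
    run_id: int
    site: str
    shift: str
    operator: str
    raw_material_batch: str  # recording the batch exposes an unplanned batch change
    response: float


def write_records(records, handle):
    """Write records as CSV with a fixed header, in the dataclass field order."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Invented example data from two sites and two shifts.
example_records = [
    RunRecord(1, "Plant A", "day", "op-1", "batch-17", 42.1),
    RunRecord(2, "Plant B", "night", "op-2", "batch-18", 39.8),
]
buffer = io.StringIO()
write_records(example_records, buffer)
header_line = buffer.getvalue().splitlines()[0]
```

Because every site writes the same columns in the same order, assembling the final worksheet becomes a simple concatenation of files.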
Replicates are independent repeats of factor combinations. These repeats are the only way to get a true estimate of the experimental error. This estimate becomes the basic unit of measurement for determining whether observed differences are statistically significant. Note that every factor setting should be reset between runs, in order to get true independence.
Replicates require more effort, time, and money. 100% replication is great – but we frequently settle for 50% or less.
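To make the idea of an error estimate concrete, here is a minimal sketch of a pooled standard deviation computed from replicated runs. It assumes each run is stored as a pair of (factor settings, response); the function name `pooled_error` and the numbers are hypothetical, not data from a real experiment. Only factor combinations that were actually replicated contribute to the estimate.

```python
from collections import defaultdict
from math import sqrt
from statistics import variance


def pooled_error(runs):
    """Return (pooled standard deviation, degrees of freedom) from replicates.

    `runs` is an iterable of (settings, response) pairs, where `settings`
    is a hashable tuple of factor levels. Unreplicated combinations carry
    no information about pure error and are skipped.
    """
    groups = defaultdict(list)
    for settings, response in runs:
        groups[settings].append(response)

    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for responses in groups.values():
        n = len(responses)
        if n > 1:
            sum_of_squares += (n - 1) * variance(responses)
            degrees_of_freedom += n - 1

    if degrees_of_freedom == 0:
        raise ValueError("no replicated factor combinations")
    return sqrt(sum_of_squares / degrees_of_freedom), degrees_of_freedom


# Invented 2x2 design in coded levels with two of four points replicated (50%).
example_runs = [
    ((-1, -1), 10.0), ((-1, -1), 12.0),
    ((+1, -1), 20.0), ((+1, -1), 22.0),
    ((-1, +1), 15.0),
    ((+1, +1), 30.0),
]
error_sd, error_df = pooled_error(example_runs)
```

The degrees of freedom show the cost of partial replication: fewer replicated points means a less reliable yardstick for judging the observed differences.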
To complete Fisher’s quotation: “Designing an experiment is like gambling with the Devil: Only a random strategy can defeat all his betting systems.” By randomizing the experiment, we are trying to “average out” all the effects of nuisance factors that might be present. Statistical methods assume that observations (or errors) are independently distributed random variables. By randomizing, we validate that assumption.
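As a rough illustration, a randomized run order can be generated in a few lines. This is a sketch under stated assumptions: the factor names, levels, and the helper `randomized_run_order` are hypothetical, and statistical software will normally do this for you. The seed is there only so that the plan sheet can be regenerated exactly.

```python
import itertools
import random


def randomized_run_order(levels, replicates=1, seed=None):
    """Build a full factorial with replicates and shuffle the run order.

    `levels` maps each factor name to its list of levels.
    """
    names = list(levels)
    runs = [
        combo
        for combo in itertools.product(*levels.values())
        for _ in range(replicates)
    ]
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(runs)
    return [dict(zip(names, combo)) for combo in runs]


# Hypothetical two-factor experiment, each combination run twice.
example_levels = {"temperature_C": [150, 180], "time_min": [30, 60]}
run_plan = randomized_run_order(example_levels, replicates=2, seed=2024)
for run_number, settings in enumerate(run_plan, start=1):
    print(run_number, settings)
```

The printed order is the order in which the runs should actually be performed, which is exactly what tends to get “fixed” on the shop floor.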
A randomized set of experiments can sometimes be difficult or slow to perform. This will force us to…
A block is a set of “relatively” homogeneous experimental conditions. Blocks commonly arise when a factor is hard to change. This could be an oven with a lot of thermal mass, or batches of raw material that cannot be blended into homogeneity. These are sometimes called “nuisance factors”. A designed experiment can be divided among the blocks in such a way that the effects of the important factors stay independent of the nuisance factors and can be analyzed correctly.
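For the raw-material case, one common arrangement is a randomized complete block design: every factor combination is run once within each batch, and the order is randomized separately inside each batch. The sketch below assumes that arrangement. The function name, factor names, and batch labels are hypothetical, and it does not cover the hard-to-change oven case, which usually calls for a different design.

```python
import itertools
import random


def randomized_complete_blocks(levels, blocks, seed=None):
    """Run every factor combination once per block, shuffled within each block.

    `levels` maps factor names to lists of levels; `blocks` lists the
    block labels (for example, raw-material batches).
    """
    rng = random.Random(seed)
    names = list(levels)
    combos = list(itertools.product(*levels.values()))
    plan = []
    for block in blocks:
        order = combos[:]   # copy so each block gets its own shuffle
        rng.shuffle(order)  # randomize only within the block
        for combo in order:
            plan.append({"block": block, **dict(zip(names, combo))})
    return plan


# Hypothetical example: two factors, two raw-material batches.
block_levels = {"temperature_C": [150, 180], "time_min": [30, 60]}
block_plan = randomized_complete_blocks(
    block_levels, blocks=["batch-A", "batch-B"], seed=7
)
for row in block_plan:
    print(row)
```

Because each batch sees every combination exactly once, a batch-to-batch difference shifts all combinations equally and can be separated from the factor effects in the analysis.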
The last three elements are routinely handled by most statistical software – as long as you recognize their presence!
You now have a DOE strategy that
- Your team understands
- Is robust to Mother Nature’s tricks
In the next section, we’ll look at analyzing some of the excellent data your team will generate.
If you want to jump right to the whole strategy, contact me at +1 413 822 5006 or firstname.lastname@example.org!