Thursday, August 30, 2012

Research 102

First of all, I would like to thank those of you who haven't bothered to cancel your subscription to this blog. Secondly, I'd like to note that if I'm only going to post every 2.5 years, I completely don't blame you for canceling your subscription to this blog.

Welcome back! Feel free to read the last post again, just in case you have forgotten all the minute details of what I wrote back in 2010. (Sigh.)

Now, shall we continue?

I will remain true to my last post, where I said, "In fairness, maybe the study is just fine, and the results are true and accurate. How can you tell that? I’ll get into more details of that next time." I'd like to cover just a little bit of experimental theory, because if you're like me, it's been a long time since that high school class where you had to keep a composition lab notebook that got turned in once a week. In real life, I actually do still have a lab notebook. However, instead of the black and white cover with wide ruled pages, it's got a brown hardback cover, a control number, and I have to make scanned copies once every six months to go into a quality assurance archive.

When was the last time you heard a friend, colleague, or TV reporter say, "studies show that..." or something similar? Probably very, very recently. But how often do you hear that person cite the actual study ("a March 2011 study in the Journal of Agricultural and Food Chemistry led by Dr. Knows Something at the Institute of Human Health...")? Probably less often. And even less often than that, you go pull up that academic journal, read the article itself, and follow up on the references. Right?

So... how do you know whether the study was even real? Or done correctly? Or had valid results? Let’s be honest: academic journals are typically written in ways that make the authors sound smart. I’ve peer-reviewed and edited some of these articles, and I can tell you from an insider perspective that the bigger the words and the longer the equations you use, the less people will question you because they don’t want to look stupid by not knowing your terminology. These things aren’t written for your average Reader’s Digest subscriber. They’re written to be convincing. But even the RD subscriber can pick out a few things that will help you determine how seriously to take a study. We’ve already talked about the basis for research itself- now let’s look at experimental aspects.

When you're looking to conduct scientific research, how you go about setting up, conducting, and interpreting your experiment makes all the difference in the world. While this isn't a comprehensive list, a few things to keep in mind are sample size, sample population and controls, scenario realism, and result framing.

First of all, let’s consider sample size and population. Whether you’re looking at rats, humans, lima bean plants, or air conditioners, the number of items in your experiment makes a difference. Conducting an experiment on a single person isn’t likely to yield results worth anything. Just because I give a person a heart medication and they die of pancreatic cancer doesn’t mean that the medicine caused that. Maybe it did. Or maybe not. Just because a mother who drank coffee in pregnancy gives birth to a child with allergies doesn’t mean that all pregnant moms who drink coffee will then have allergy-prone offspring. On the other hand, if you track 1,000 patients with heart disease being given a heart medication, and 200 of them die of pancreatic cancer, there’s a much higher chance that you’ll be able to link the two, especially if similar patients who aren’t on the medication rarely get that cancer. In short, if you’re reading up on a study and they only used two people, take the results with a grain of salt. If they used 70,000, that’s a point in the “good research” column.
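If you'd like to see the sample size effect in numbers, here is a rough Python sketch. It borrows the made-up 20% figure from the heart medication example (200 out of 1,000) and estimates how far off that percentage could plausibly be at different sample sizes. The function name `margin_of_error` is my own invention, and the calculation uses the standard normal approximation for a proportion, which is only a rough guide for very small samples.

```python
import math

def margin_of_error(observed_rate, sample_size, z=1.96):
    """Approximate 95% margin of error for an observed proportion.

    Uses the normal approximation, so treat small-sample values as rough.
    """
    standard_error = math.sqrt(observed_rate * (1 - observed_rate) / sample_size)
    return z * standard_error

# Same observed rate (20%), very different sample sizes.
for n in (10, 1000, 70000):
    moe = margin_of_error(0.20, n)
    print(f"n={n:>6}: 20% observed, give or take {moe * 100:.1f} percentage points")
```

The same 20% result is far fuzzier with 10 subjects than with 1,000 or 70,000, which is the whole reason a bigger study earns that point in the "good research" column.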

But just having an absurdly large sample size won’t get you good results. Sample population and controls are also critical. Remember how in math you had to keep solving for x? The “x” was the variable. If you got into more complicated math, you had more variables- x, y, z. In super-duper math, we ran out of regular letters and had to switch to Greek letters. At some point, it just becomes too much to keep track of and control. Well, same thing happens in experiments. The more variables you have, the less control you have over your experiment. Anything you keep entirely constant is a controlled variable. Picture this: you want to find out whether watering lima bean plants with dilute acid kills them or makes them grow stronger. (Don’t ask me why- maybe you bought stock in a vinegar production technology and like limas?) You already know that testing 2 plants probably isn’t a large enough sample size. What if one was already diseased? So you get 100 lima bean plants. You know that they are all about the same age and from the same seed producer. You put them in identical container sizes, and fill the pots with soil from the same source. You put them all in a location with the same amount of light exposure, and where the temperature is kept at a constant 80 degrees. What you’ve just done is controlled a bunch of variables. Light, nutrients, temperature, container size, soil permeability, etc. are all factors that *could* have affected these plants. But by controlling your environment, and controlling your population to plants with similar genetic and age factors, you can be more sure that what you’re watering them with is what would be causing growth differences.

And how do you water them? Ideally, you’d want your “watering” scenario to be realistic. You’d want to know how often lima bean plants should be watered, and how much water they need. It’s not realistic to water a tiny plant with a gallon of water (or acid) a day. It’s also not realistic to never water them. In real life, maybe 0.5 cups every other day is just about right. So you set up your experiment where you water 25 of the plants with pure water, 25 plants with half water, half dilute acid, 25 plants with three-fourths dilute acid to one-fourth water, and 25 with just the dilute acid. For control, you water all of them using the same batch of water and acid, and you measure out each 0.5 cup “serving” to each plant. Now you’ve got a reasonable sample size, a controlled environment and population sample, and a realistic scenario. You should be able to feel fairly confident that differences you see among the groups of 25 are a result of what you’re watering them with.
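One thing the setup leaves open is which plant lands in which group. A common approach (my assumption here, not something the experiment strictly requires) is to shuffle the plants and deal them out, so you can't accidentally put all the healthiest-looking ones in one group. Here's a minimal Python sketch of that idea. The plant numbers 1 through 100, the `assign_groups` function, and the seed value are all hypothetical.

```python
import random

TREATMENTS = [
    "pure water",
    "half water, half dilute acid",
    "three-fourths dilute acid, one-fourth water",
    "dilute acid only",
]

def assign_groups(plant_ids, treatments, seed=None):
    """Shuffle the plants, then split them into equal-sized groups, one per treatment."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    shuffled = list(plant_ids)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }

# 100 plants labeled 1 to 100, split into four groups of 25.
groups = assign_groups(range(1, 101), TREATMENTS, seed=2012)
for treatment, plants in groups.items():
    print(f"{treatment}: {len(plants)} plants, 0.5 cups every other day")
```

Every plant gets the same 0.5 cup serving on the same schedule. The only thing that differs between groups is what's in the cup.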

By the by, feel free to steal this lima bean thing as your kid’s science fair project this year. You’re welcome.

Last but not least, how you frame your results makes a big difference in the “take-away” message, or conclusions, from your experiment. There’s a book called “How to Lie with Statistics” that was written in the 1950s (still completely valid and recommended reading!) whose title is self-explanatory. Let’s say- and I’m completely making this up- that the 25 plants that got pure water grew 5 inches each, the 50 plants with mixed water grew 10 inches each, and the plants with just acid died completely. If your experimental conclusion is that “watering plants with dilute acid kills them”… yes. That’s true in this case. BUT, you’ve left out the important point that the 50 plants that received slightly acidic water grew better than the water-alone ones! This might not seem like a huge deal, but stuff like this gets used as a scare tactic. For example, we hear all.the.time. about needing to reduce the sodium in our diets, right? Because a high-sodium diet can kill you! Well, yes, but a no-sodium diet can kill you too. A lot of life- and this should not surprise you- is about middle ground. The truth is often somewhere in between extremes. Make sure that the way results are framed makes sense, considers context, and isn’t just serving an agenda.
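To see how much the framing matters, here is a small Python sketch that takes those same completely made-up lima bean numbers and summarizes them two ways. The field names are hypothetical, and I've left the dead plants' growth blank because the example only says they died.

```python
# Made-up results from the example above, one entry per watering group.
results = [
    {"watering": "pure water", "plants": 25, "got_acid": False,
     "died": False, "growth_inches": 5},
    {"watering": "water mixed with dilute acid", "plants": 50, "got_acid": True,
     "died": False, "growth_inches": 10},
    {"watering": "dilute acid only", "plants": 25, "got_acid": True,
     "died": True, "growth_inches": None},
]

# Framing 1: lump every plant that got any acid into one bucket.
acid_plants = sum(r["plants"] for r in results if r["got_acid"])
acid_deaths = sum(r["plants"] for r in results if r["got_acid"] and r["died"])
death_rate = acid_deaths / acid_plants
print(f"Scary framing: {death_rate:.0%} of plants that got any acid died.")

# Framing 2: report each group separately.
print("Fuller framing:")
for r in results:
    outcome = "died" if r["died"] else f"grew {r['growth_inches']} inches each"
    print(f"  {r['plants']} plants on {r['watering']}: {outcome}")
```

Both summaries come from identical data. The first one is technically true and still hides the fact that the mixed-water plants outgrew the pure-water ones.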

This is getting really long, so, if you’re still reading, bless you. But let me wrap up.

Have you ever heard of "anecdotal evidence"? Anecdotal evidence is someone's story, basically. Let's say that I know three people who ate the cafeteria's pasta salad. All three got sick. Anecdotally, evidence suggests that pasta salad from the cafeteria makes you sick. However, does that really prove that the pasta salad made them sick? (Well… *I* wouldn’t eat it, but that’s just me…) As you know by now, that’s not a real experiment. You don’t know that those three people didn’t also drink contaminated water. Or that they weren’t all exposed to the flu last week. The best solution here would probably be to send a sample of that pasta salad off to a lab and have it tested for bacteria. But in the absence of that, maybe just have the chicken instead, and don’t go write a book about how all pasta salad causes illness.

So next time you hear “studies show,” question it. See if sources are actually cited. See if the study meets the criteria of being independent and unbiased. Then look at how the experiment was run. Think about the sample size, how variables were controlled (and which weren’t controlled), and how the results are presented. You’ll be in a much better position to decide whether or not that study really DOES show what it claims to.

Class dismissed.

-Schientist