OR/MS Today - April 2003



ORacle


Marvelous Mary's Parable

by Douglas A. Samuelson


The OR/MS analyst was frightened and shaken. The phone call had come an hour earlier, interrupting a lively discussion of computer performance measurement. Now he was in the emergency room of a hospital near his home. His wife, pale and also frightened but now conscious and alert, had been in a car wreck on her way to work.

A police officer was standing by her bed, asking her about the accident. "I was going east on the service road for Highway 50," she explained, "and at Carlin Springs Road, this car came fast from the left and hit me. I woke up on a stretcher."

"Ma'am," the policeman asked gently, "did you see the sign at that intersection, 'All traffic must turn right, 7 to 9:30 a.m. weekdays'?"

"I'm not sure," the analyst's wife replied.

"Well," the officer told her, "accidents like this are why that's there. It's just too dangerous of an intersection when it's busy. It's hard to see what's coming from your left. Why did you keep taking that route?"

"I'd never had a problem before," she protested feebly.

Despite his preoccupation with his wife's well-being, the analyst couldn't help thinking back to his job with a sudden flash of insight. Just before the phone call came in, one of the senior computer scientists had offered the exact same explanation, word for word, for why they hadn't checked some oddities that showed up from time to time on the system logs. The analyst had pointed out, "Have you been reading the stories about the space shuttle crash? Of course they haven't made a formal finding of a cause yet, but did you see the story that the Columbia orbiter had had about twice as many 'near-miss' re-entry problems as any of the other shuttle orbiters? I'll bet the investigation will find that the NASA people overlooked some recurring problems because there was always a work-around. We had better be sure we're not doing the same thing."

The next day, back at his client's offices, he shared the story about his wife and what he had learned. "We do tend, naturally, to play down the risk of things that haven't happened," he re-emphasized. "So let's look at this whole system performance analysis as if we didn't know what hasn't been a problem."

"You know," said one of the senior computer engineers, a fellow known as the best in-the-field trouble-shooter in the company, "you raise an interesting point. We have these models, mostly from people like you, of how often, on average, certain kinds of failure will occur. If your model predicts a failure every 5,000 hours, and we've run the system for a while, and it's having that kind of failure more like every 500 hours, we know there's a problem with the system and with the model. We start looking for the cause of the system problem and you re-examine the model. Right?"

The analyst nodded.

"But what happens," the engineer continued, "if the system only fails every 50,000 hours? That's an order-of-magnitude error in the model, too! Of course we're congratulating ourselves on how well the system works, so we don't call you and tell you the model is way off. How do you even find out about modeling errors like that?"

"It's not something I think about much," the analyst admitted.

"Well, start," the engineer advised him. "Did you ever think about what happened on Jan. 1, 2000?"

"Yeah," the analyst laughed. "Not much, after a whole lot of fuss."

"Right, not much," the engineer went on, "including in places where they had done nothing to prepare! At my doctor's office, the statements didn't come out right, because the billing program had the bug and it had never been fixed. So what happened? Mary, the office manager, was used to all kinds of computer crashes and knew how to work around everything. So the problem we computer gurus had predicted really did occur, at least there, but it never got reported! Now, how many more Marvelous Marys do you think there are in the world, quietly patching over problems we should know about if we're really going to understand how systems work?"

"And if we don't," the analyst added, "what happens when a bunch of those Marvelous Marys retire, or take other jobs, if they don't leave a good record of those workarounds for someone else to take over?"

"Then you get what some big software companies already have," the engineer replied. "Remember how long it took for some of the biggest software vendors to acknowledge security holes and other major bugs in their software? If nobody had reported it, with enough documentation that their engineers could find it, they responded, truthfully enough, 'We have no evidence of a problem with this.' After a few rounds of that, people stopped reporting the problems and just developed patches and workarounds themselves. So it took a long time for the vendors to see the problems with their products. In fact, some of them still don't."

"I've learned at least one thing from this," the analyst averred. "Any time some system works much better than my model predicted, I'll be looking for why the model was wrong, and what hidden processes made the system look so good!"

"And one more thing," he added. "If I or my wife or anyone else I can influence sees a warning that something isn't safe, even if it looks safe to us, we won't push our luck!"
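
The engineer's arithmetic (a model that predicts one failure every 5,000 hours, against a system that fails every 500 hours or only every 50,000) can be turned into a two-sided check that flags a model for being too optimistic and for being too pessimistic. What follows is only a minimal sketch, not anything taken from the story: it assumes failures arrive as a Poisson process at a constant rate, and the function names, the 50,000-hour observation window and the 5 percent threshold are all hypothetical choices made for illustration.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam), summed term by term.

    Suitable for modest values of lam; exp(-lam) underflows for very large lam.
    """
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def check_model(predicted_mtbf_hours, observed_hours, observed_failures, alpha=0.05):
    """Compare an observed failure count with the count the model predicts.

    Assumption (not from the article): failures follow a Poisson process,
    so the expected count is observed_hours / predicted_mtbf_hours.
    """
    expected = observed_hours / predicted_mtbf_hours
    p_this_few = poisson_cdf(observed_failures, expected)             # P(X <= k)
    p_this_many = 1.0 - poisson_cdf(observed_failures - 1, expected)  # P(X >= k)

    if p_this_many < alpha / 2:
        return "more failures than predicted"
    if p_this_few < alpha / 2:
        return "fewer failures than predicted"
    return "consistent with model"


# The model says one failure per 5,000 hours; suppose we watched for 50,000 hours.
for failures in (100, 10, 1):
    print(failures, "failures:", check_model(5000, 50000, failures))
```

The design point is the second branch: a verdict of "fewer failures than predicted" is treated as a finding to investigate, just like the first, since it may mean the model is wrong or that someone is quietly working around failures that never reach the log.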



Doug Samuelson is president of InfoLogix, Inc., a consulting company in Annandale, Va. He is also an adjunct professor at The George Washington University and at the University of Pennsylvania, and an external research professor at the Krasnow Institute, George Mason University.







    OR/MS Today copyright © 2003 by the Institute for Operations Research and the Management Sciences. All rights reserved.

