February 2000

The Right Stuff

By M. Elisabeth Paté-Cornell and Robin L. Dillon

There are many areas in the space program where operations research and management science tools have been successfully applied, and there are still many where they could provide significant benefits. The culture of the U.S. space program, especially unmanned projects, is currently changing because of an increasing pressure to develop missions under rigid schedule and budget constraints. In this environment of faster-better-cheaper (FBC) missions and after the recent, highly publicized Mars failures, NASA needs to address the issues of risk acceptability, allocation of scarce resources, teamwork and decisions under uncertainty. The relevant analytical tools are therefore those that permit the assessment of failure probabilities and optimization of resource allocation accounting for human and management factors.

Available operations research and management science techniques include engineering probabilistic risk analysis (PRA) [1], its extensions to include human and management factors [2], decision analysis [3] and nonlinear programming [4]. When applying these tools to space programs, analysts must rely on Bayesian probability and statistics [5,6]. Classical statisticians base probabilities on past frequencies, but space missions are unique events in an unfamiliar environment, components are often improved between flights, and the sample size of similar projects or even spacecraft parts is small. Decision makers thus need to rely on all available information including test results, statistical data on past performance in flight, and expert opinions to assess Bayesian probabilities that reflect their current knowledge and beliefs.

The current state of analyses

NASA has generally relied on quantitative engineering models, qualitative management support tools and several complex quantitative risk analyses.
Prior to the Challenger accident, shuttle program managers primarily used qualitative failure modes and effects analyses (FMEAs) to identify risks and to set priorities based on a critical items list (CIL). Following the Challenger disaster, NASA and its contractors reviewed all shuttle FMEAs and updated the CIL. The result was an increase in the number of critical items from 2,369 to 4,686 [7]. With so many high-priority items, NASA needed a more quantitative approach to set priorities among upgrades. Several shuttle components were analyzed. Quantitative PRAs were performed for specific subsystems (e.g., the auxiliary power units [8] and the tiles [9]), and in 1995, the first comprehensive shuttle PRA was completed [10]. Other shuttle PRAs followed, and some are currently being performed at both Johnson Space Center and Marshall Space Flight Center, based on a software code called QRAS (Quantitative Risk Assessment System). In addition to the shuttle, the Cassini spacecraft team, for example, was required to perform a detailed quantitative risk assessment to justify the safety of launching a system with radioactive material on board.

Smaller quantitative studies have focused on specific risks or decisions in projects. For example, in the 1970s, NASA used decision analysis to analyze trajectories of the Voyager missions to the outer solar system [11] and to plan the Viking missions to Mars [12]. More recently, a high-level risk analysis was performed for the rover of the Mars Pathfinder mission.

Opportunities for Analytical Tools

What could NASA do systematically to better manage risk, allocation of scarce resources, teamwork and decision-making under uncertainty? Here are a few examples:

Optimization of spacecraft design. When spacecraft were funded with large budgets, there were few design limitations. Although failure risks were always high, especially in the launch phase, maximum flight safety and mission assurance were the primary objectives.
By comparison, the current unmanned FBC space programs require some tradeoffs and the explicit acceptance of a residual risk. Therefore, mission directors might like to know "how close to the edge of the risk cliff" they can find themselves when they agree to manage a project with limited resources. PRA is an essential tool to optimize the technical design and maximize safety within resource constraints. It is particularly useful when extended to account explicitly for the effects of human decisions and actions on the variables of the risk model, and for the effects of management decisions on the probability of human errors [2]. Based on PRA, we used Karush-Kuhn-Tucker optimization techniques to determine the minimum achievable failure probability given a system configuration and a specified budget, and to derive the optimal allocation of that budget for the reinforcement of the different components [13].

Shadow costs. At the highest management level, where they are set, constraints need to be carefully determined based on their shadow risk costs. In some cases, the constraints could perhaps be tightened. For example, at the margin, money is sometimes spent on tests that are not justified by their value of information. In other cases, additional funds would yield risk reduction benefits that would justify the costs. It seems that managers seldom have a clear idea of the shadow risk costs when they set schedules and budgets because their effects on system safety are difficult to guess. In practice, people often do satisfy these constraints in ways that upper management may not have anticipated, or would not actually like if it were aware of them [2].

Additional resources might also greatly improve the scientific benefits of a mission. The lander of Mars Pathfinder was designed to operate for one month and the Sojourner Rover for one week. Both lasted longer than anticipated.
Yet, lengthening the mission by making the components more robust might have permitted gathering data in other areas of the planet. This is a case where computing the shadow costs of the resource constraints (assuming that the system is optimized) would provide valuable insights into the cost-benefit tradeoff.

Computation of the marginal costs of constraints should not stop at the boundaries of a specific mission. If a project is critical to the success of future missions, the costs of losing it should include the delays and the loss of data that will be incurred by these future missions. For example, because of the recent loss of the Mars Climate Orbiter, the Mars Global Surveyor orbiter now provides the only communication relay for future landing missions to Mars. Therefore, that orbiter is absolutely critical to the success of the Mars exploration program.

"Warning systems" and the value of information. An efficient warning system is likely to detect signals of problems while avoiding both false alerts and missed signals. It must be sensitive enough to detect problems with sufficient lead time, but not so sensitive that it sends disruptive alerts too soon. Probabilistic modeling can be used to optimize the sensitivity of such a system, accounting for the lead time required for an appropriate response [14]. In an organization, the model of the response also needs to include the time that it takes for the signal to reach the appropriate decision-maker and for mitigation measures to be implemented [15]. In the case of NASA, these individuals may include both contractors and mission managers. The probability that problems are detected, and the time at which they are detected, are influenced by the warning structure in place.

Traditionally, a large fraction of the costs of a mission is allocated to the testing of equipment.
There is a point, however, where additional testing brings little additional information because decisions will not be affected by its results, and money might be better spent reinforcing the system. The value of information of additional tests depends on prior probabilities of problems, available alternatives, probabilities of testing errors and on risk attitudes. It can be assessed using the classical tools of decision analysis [16].

Management of uncertainties in the development of innovative technologies. Decision analysis provides useful support for technology development decisions; for example, should technologies be developed within a mission or at a higher level of the organization? In the first case, the mission may suffer delays and cost overruns, but if these developments are left to entities central to the organization, what is developed may or may not be what is most needed. The attractiveness of alternative research and development options depends on:

1. the probabilities of success of each option given the resources available,
2. the mission alternatives given success or failure of specific technology developments, and
3. the value of the mission and its consequences for the whole program for each possible outcome.

Combining these different elements in a probabilistic model provides support for several key decisions, from the scope of a project to the management of R&D problems considering implications for the overall program success.

Comparative value of scientific and technological results across alternative projects. Multi-attribute utility theory is also an important tool to assess the value of a particular mission to NASA scientists and managers, as well as to the public. Attributes of a space mission value include the novelty of potential discovery, the amount of knowledge gained, the importance of the knowledge area, and the excitement value to the public, which, in turn, may influence the future funding of the agency.
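As a minimal sketch of such a valuation, assume an additive value function: each attribute is scored on a 0-to-1 scale and the scores are combined with weights that sum to one. The attribute names follow the list above; the weights, the scores, the mission names and the additive form itself (which ignores interactions among attributes and any risk attitude) are illustrative assumptions, not values elicited from NASA.

```python
# Additive multi-attribute value sketch. All weights and scores are hypothetical.
ATTRIBUTES = ("novelty", "knowledge_gained", "area_importance", "public_excitement")


def mission_value(scores, weights):
    """Return the weighted sum of attribute scores (each score in [0, 1])."""
    if abs(sum(weights[a] for a in ATTRIBUTES) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for a in ATTRIBUTES:
        if not 0.0 <= scores[a] <= 1.0:
            raise ValueError(f"score for {a} must lie in [0, 1]")
    return sum(weights[a] * scores[a] for a in ATTRIBUTES)


# Hypothetical weights and two hypothetical mission proposals.
weights = {"novelty": 0.3, "knowledge_gained": 0.3,
           "area_importance": 0.2, "public_excitement": 0.2}
proposals = {
    "mission_a": {"novelty": 0.8, "knowledge_gained": 0.5,
                  "area_importance": 0.6, "public_excitement": 0.9},
    "mission_b": {"novelty": 0.4, "knowledge_gained": 0.9,
                  "area_importance": 0.7, "public_excitement": 0.3},
}

# Rank proposals from highest to lowest value under these assumptions.
ranking = sorted(proposals,
                 key=lambda m: mission_value(proposals[m], weights),
                 reverse=True)
for name in ranking:
    print(name, round(mission_value(proposals[name], weights), 3))
```

Changing the weights and rerunning the ranking is a quick way to see how sensitive a comparison is to the relative importance given to each attribute.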
A coherent valuation function can help support, in a consistent manner, a number of NASA decisions including project comparison, scope (where to go and what to observe) and timing. These decisions are made today based on a proposal review process that includes some quantification. A more structured quantitative process, however, might improve NASA's confidence in the quality, consistency and justification of its decisions. It might also help mission proponents to improve their proposals before submitting them to NASA.

Optimal combination of spacecraft and launch vehicles. The smaller the spacecraft, the larger the proportion of the mission cost allocated to the launch vehicle. Therefore, it may seem more advantageous to choose larger launch vehicles carrying a greater number of instruments. But the combination of small launch vehicles and small spacecraft has the advantage of not putting too many eggs in the same basket. The optimal combination depends on the probability of launch vehicle failure and the costs of the launch vehicles. It also depends on the cumulative value of multiple small missions, and the value of larger missions where more can be achieved at the margin for the same cost. There may also be a point where the total cost of the launch vehicle becomes too large in comparison with the cost of the spacecraft itself, and where it may be preferable to turn to larger launch vehicles to achieve economies of scale.

Conclusions

Good management decisions supported by quantitative analyses tailored to individual systems can provide substantial benefits. Obviously, analytical tools don't replace the common sense and the savoir-faire of a good manager. Yet, when cutting budgets and restricting mission scopes, risk estimates matter. The intention in restricting budgets is to take more risk in each individual mission, but also to gather more information globally. These "calculated risks" need to be computed because they are not intuitive.
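The launch-vehicle tradeoff described above is one such calculated risk. The sketch below compares one large launch with several small ones. It assumes independent launch failures, equal value per spacecraft, and that a failed launch loses everything on board; the function name and all costs and failure probabilities are hypothetical and do not come from NASA data.

```python
def compare_launch_options(n_spacecraft, value_each,
                           p_fail_large, cost_large,
                           p_fail_small, cost_small):
    """Compare one large launch carrying everything with n small launches.

    Assumes launch failures are independent and that a failed launch
    loses everything on board.
    """
    total_value = n_spacecraft * value_each
    one_large = {
        "expected_value": (1.0 - p_fail_large) * total_value,
        "p_total_loss": p_fail_large,          # a single failure loses it all
        "launch_cost": cost_large,
    }
    many_small = {
        "expected_value": (1.0 - p_fail_small) * total_value,
        "p_total_loss": p_fail_small ** n_spacecraft,  # every launch must fail
        "launch_cost": n_spacecraft * cost_small,
    }
    return one_large, many_small


# Hypothetical inputs: four spacecraft worth one unit of value each.
large, small = compare_launch_options(
    n_spacecraft=4, value_each=1.0,
    p_fail_large=0.05, cost_large=2.0,
    p_fail_small=0.05, cost_small=0.8,
)
for label, option in (("one large launch", large),
                      ("four small launches", small)):
    print(label, option)
```

With equal failure probabilities, the expected value delivered is the same in both cases. What changes is the chance of losing everything at once and the total launch bill, which is where the decision maker's risk attitude and the economies of scale enter.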
For example, money has to be strictly allocated among subsystems within each mission, and this allocation affects the reliability of the whole spacecraft. Furthermore, the horizon cannot stop at a single project when several missions are interdependent within a program. In that case, programmatic risk computations must be extended to future missions so that the whole program value can be optimized within the constraints as set.

Budget constraints can affect the risk of system failure if the spacecraft involves less robust components, if there are fewer redundancies, or if less testing is performed and problems are therefore less likely to be detected and corrected. Although single-string designs are simple, they imply an incremental failure probability that is quantifiable through probabilistic risk analyses. In some cases, this kind of design may be justified by the costs of redundancies both in terms of dollars and mass, and by the robustness of the single components. In other cases, given the low cost of spares and a relatively high probability of component failure, it may not be the best option. This tradeoff is worth quantifying, in particular, when extending the potential life of the mission can bring a large amount of additional data, or when part of a mission is critical to later ones. In that case, the risk should be computed within the context of the whole program. Similarly, additional tests provide additional information of quantifiable value. Assessing an optimal level of testing may point to additional test needs or to opportunities to reallocate funds for greater benefit elsewhere.

NASA's traditional qualitative risk management matrices provide fast decision rules, but they allow neither comparison of the failure risks associated with different designs and operations options, nor optimization of the probability of mission success under budget constraints. NASA should use PRA in a more systematic and coordinated way than it currently does.
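To illustrate what optimizing mission success under a budget constraint can look like, here is a minimal sketch. It assumes a series system in which the failure probability of component i falls exponentially with the money x[i] spent on it, as p0[i] * exp(-a[i] * x[i]), approximates system failure by the sum of component failure probabilities (a rare-event approximation), and solves the Karush-Kuhn-Tucker conditions by bisection on the multiplier. The functional form, the names allocate_budget and system_failure, and all numbers are assumptions chosen for illustration; this is not the model of reference [13].

```python
import math


def allocate_budget(p0, a, budget, iters=200):
    """Minimise sum_i p0[i] * exp(-a[i] * x[i]) s.t. sum(x) == budget, x >= 0.

    KKT conditions: every funded component has the same marginal risk
    reduction a[i] * p0[i] * exp(-a[i] * x[i]) == lam; unfunded components
    have a smaller marginal reduction at x[i] == 0. Requires budget > 0 and
    strictly positive p0 and a.
    """
    def spend(lam):
        # Spending implied by multiplier lam (zero where the log is negative).
        return [max(0.0, math.log(ai * pi / lam) / ai) for pi, ai in zip(p0, a)]

    hi = max(ai * pi for pi, ai in zip(p0, a))  # nothing is spent at this lam
    lo = hi
    while sum(spend(lo)) < budget:              # lower lam until budget is reached
        lo /= 2.0
    for _ in range(iters):                      # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if sum(spend(mid)) > budget:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return spend(lam), lam


def system_failure(p0, a, x):
    """Rare-event approximation: sum of component failure probabilities."""
    return sum(pi * math.exp(-ai * xi) for pi, ai, xi in zip(p0, a, x))


# Hypothetical three-component system and budget.
p0 = [0.05, 0.02, 0.01]   # failure probabilities with no reinforcement
a = [0.5, 0.3, 0.2]       # effectiveness of spending on each component
budget = 10.0

x_opt, shadow_cost = allocate_budget(p0, a, budget)
x_even = [budget / len(p0)] * len(p0)
print("optimal allocation:", [round(x, 3) for x in x_opt])
print("failure, optimal  :", system_failure(p0, a, x_opt))
print("failure, even     :", system_failure(p0, a, x_even))
print("shadow risk cost  :", shadow_cost)
```

The multiplier returned with the allocation plays the role of the shadow risk cost of the budget: under these assumptions, it is the approximate reduction in failure probability bought by one more unit of funding.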
The level of sophistication, and therefore the cost, of the PRAs should be adapted to the decisions to be made. In any case, whatever the chosen level of complexity, great attention should be given to the model assumptions, the appropriateness of the data and the corresponding uncertainties in the results. Probabilistic methods have a subjective component, and their effective use relies on competent, independent and credible analysts. In general, PRA is better used to set priorities than to assess an overall probability of mission failure. In any case, there should be no pressure on the analyst to get the "right figure," i.e., that which the institution or individual managers would like to hear; otherwise, the result may simply be wrong.

Core competence in risk analysis should exist within each project to permit the use of PRA as a management tool. As such, it would allow optimization of the allocation of scarce resources, and help managers to decide at what point constraints can no longer be tightened because doing so would compromise not only the performance of a specific mission, but also the program as a whole.

References
M. Elisabeth Paté-Cornell is the Burt and Deedee McMurtry Professor in the School of Engineering, and chair of the Department of Management Science and Engineering at Stanford University. Her field of expertise is engineering risk analysis. In 1990, she did a study of the risk of shuttle accident due to a failure of the black tile of the Thermal Protection Systems, and from 1996 to 1998, was a member of the NASA Advisory Council. Robin L. Dillon is an assistant professor in the Pamplin College of Business at Virginia Tech. Paté-Cornell and Dillon recently completed a series of case studies examining Faster-Better-Cheaper space missions for NASA at the Jet Propulsion Laboratory.

OR/MS Today copyright © 2000 by the Institute for Operations Research and the Management Sciences. All rights reserved.