OR/MS Today - February 2007
Statistical Software Analysis Survey

Finding a Path in an Uncertain World

Biennial update of statistical software survey provides current snapshot of tools for O.R. practitioners.

By James J. Swain

The modern world is full of uncertainty and variation. Last summer the cost of gasoline varied almost daily, subject to the interplay of demand, current stock levels and disruptions from weather or politics that are beyond our control. A freak storm earlier this year brought freezing temperatures to the California citrus crop and snow to Malibu, while the improbable success of the New Orleans Saints during the 2006 NFL season served as a counterpoint to the devastation wrought by Katrina and to the slow pace of recovery. In the last five years we have adjusted to the threat of terrorism, living with the increased difficulty of travel and with uncertainty about when another attack will occur and whether present countermeasures will be sufficient to stop it or to reduce its severity.

Meanwhile, mass production is increasingly giving way to mass customization, in which every unit produced is unique (or produced in a batch size of one); for home computers, the process from order specification to delivery may take only days. Such a facility must plan for variation in demand across many different configurations and options without the benefit of the buffer stocks that were once the mainstay for dealing with demand variation. In addition, manufacturers are constantly under pressure to develop innovative products and to produce them more efficiently and with higher quality. Each step is complicated by variation that must be identified and dealt with.

It may be that Einstein was correct in his complaint that "God does not play dice," but probability and statistics have proven to be good approaches to quantifying and understanding our variable world. Statistical methods and thinking are a critical part of the O.R. professional repertoire, and an appreciation of variability and uncertainty is a key part of both our training and our worldview. As professionals we are often expected to provide models that explore the implications of variability, to make choices that minimize its effects, or to mitigate various risks. Often the risks threaten only the smooth running of our operations, as when we seek to ensure on-time delivery of product at a cost we can afford, given the uncertainties of the product mix and the vagaries of distribution. Statistical software is therefore an essential tool for practitioners, whether the task is to quantify a risk, make an inference among uncertain outcomes or forecast the demand for a product.

This biennial update of the OR/MS Today statistical software survey provides a current snapshot of the tools for the O.R. practitioner. These tools cover a variety of capabilities and prices. Each survey demonstrates an increase in the power and range of existing statistical software and introduces new tools. Methodologies that until recently were available only to the specialist are often now accessible to all through commercial software.

Statistical methods have their roots in political administration. The census dates from ancient times (at least 470 B.C.) as the enumeration of population and taxable property that was essential to the administration of empires such as Rome. Using statistics for inference is a more recent phenomenon, though antecedents exist in the ancient world. For instance, in 74 B.C. the Roman general Lucullus used a sample of three to confirm his hunch that the army of Mithridates was short of supplies and that battle could be avoided to his advantage (Missiakoulis, 2006). The results of the decennial U.S. census are so important that methodological adjustments for population undercount have been challenged in the courts.
The range of matters now covered by the census includes demographics, employment, education and income. Statistical measures of the economy, such as orders for durable goods, job growth and housing prices, inform decisions about monetary policy, corporate investment and hiring, and often move the stock market when the figures differ from expectations.

Public health is another key area of government statistics. Agencies such as the Centers for Disease Control and Prevention monitor the incidence of illness and death for signs that might indicate fresh outbreaks of an epidemic. This syndromic surveillance uses some of the same quality control tools used in manufacturing, including the CUSUM (cumulative sum) and EWMA (exponentially weighted moving average) techniques, to signal that the limits of "normal variation" in disease incidence have been exceeded and that a "special cause," such as an epidemic or biological attack, may be at work. The terrorist threat, which includes the possibility of a biological attack, has increased the importance of this statistical vigilance and made it a matter of homeland security.

Statistical methods can sometimes reveal in data what is not apparent even to specialists. For instance, the statistician Ross Cunningham, while assisting biologists, was the first to observe differences within a possum species in Australia that could not be explained by environmental factors alone. This led to further studies and, using genetic data, to the recognition that there were two distinct species (Hall, 2003). The discovery is particularly remarkable because the species had been known for 170 years and the discovery of new mammalian species is extremely rare.

Economics and business are a vast area of statistical study, and multivariate methods similar to those used in the possum study are often used to divide the general population into "species" of like-minded buyers.
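The CUSUM and EWMA signals used in syndromic surveillance can be illustrated with a minimal sketch. It assumes the in-control mean and standard deviation of the daily counts are known, and it uses common textbook settings (smoothing weight 0.2 with 3-sigma limits for the EWMA; allowance k = 0.5 and decision interval h = 5, both in sigma units, for the CUSUM). The function names and the daily counts are hypothetical, chosen only for illustration.

```python
import math


def ewma_signals(counts, mu0, sigma, lam=0.2, L=3.0):
    """Return the indices at which the EWMA statistic falls outside its control limits.

    mu0 and sigma are the assumed in-control mean and standard deviation of
    the counts; lam (the smoothing weight) and L (the limit width in standard
    deviations) are illustrative defaults.
    """
    z = mu0  # the EWMA starts at the in-control mean
    signals = []
    for t, x in enumerate(counts, start=1):
        z = lam * x + (1 - lam) * z
        # Exact standard deviation of the EWMA statistic after t observations.
        sd = sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > L * sd:
            signals.append(t - 1)  # report a 0-based index
    return signals


def cusum_signals(counts, mu0, sigma, k=0.5, h=5.0):
    """Return the indices at which the upper (one-sided) tabular CUSUM signals.

    k (the allowance) and h (the decision interval) are in units of sigma;
    only increases are monitored, as when watching for an outbreak.
    """
    s = 0.0
    signals = []
    for t, x in enumerate(counts):
        # Accumulate deviations above mu0 beyond the allowance; never drop below 0.
        s = max(0.0, s + (x - mu0) - k * sigma)
        if s > h * sigma:
            signals.append(t)
            s = 0.0  # restart the sum after raising a signal
    return signals


# Hypothetical daily case counts: stable near 10, then rising steadily.
counts = [10, 9, 11, 10, 12, 9, 10, 11, 14, 15, 16, 17, 18, 19]
print(cusum_signals(counts, mu0=10, sigma=3))  # [11]
print(ewma_signals(counts, mu0=10, sigma=3))   # [11, 12, 13]
```

Both statistics stay quiet while the counts wander around 10 and flag the sustained rise a few days after it begins, even though no single day is far above normal. That sensitivity to small, persistent shifts is why these charts are preferred over simple 3-sigma limits on individual observations.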
Markets are increasingly fragmented, so advertisers attempt to target their messages at those segments of the population that are likely to be receptive to the products being offered. The population has been divided into clusters that represent homogeneous units with similar means and interests that may be matched to particular product preferences. Online merchants track consumer purchases and look for patterns, so that they can target the products most likely to be of interest. The growth of online purchasing has provided analysts with large databases of consumer data, which has led to an increasing emphasis on techniques for analyzing large data sets.

Software has reduced the computational burden of data analysis, and this has been particularly important as the traditional trickle of data has been replaced by a flood. Automated sensing and communications have made it possible to capture production, distribution and sales data instantaneously and to aggregate them by region, nationally or even worldwide. The Internet and electronic business applications have accelerated the process of data accumulation. In this data-rich environment, new tools have been developed to search for information within the data. These data mining techniques often involve classification methods, such as clustering, that search for commonalities defining groupings to which other techniques can be applied, as well as novel graphical approaches for investigating multivariate relations among variables.

Statistical add-ins for spreadsheets remain common. The spreadsheet is the primary computational tool in a wide variety of settings, familiar and accessible to all. Many procedures of data summarization, estimation, inference, basic graphics and even regression modeling can be added to spreadsheets in this way. Statistical tools of this kind include PaceXL, NAG Statistical Add-in for Excel and UNISTAT.
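The article does not say which clustering method segmentation analysts use; k-means (Lloyd's algorithm) is one common choice, and the minimal sketch below assumes it. Each customer is a point described by numeric attributes, and the algorithm alternates between assigning points to the nearest cluster center and moving each center to the mean of its members. The customer data and the attribute choice are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical customers: (visits per month, average basket in $10s).
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = kmeans(customers, k=2)
print(sorted(centroids))  # centers near (1.5, 1.5) and (8.5, 8.5)
```

In practice the attributes would be standardized first, since k-means uses Euclidean distance and a variable measured on a larger scale would otherwise dominate the grouping; the number of segments k must also be chosen by the analyst.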
The functionality of spreadsheet add-ins continues to grow to include risk analysis and Monte Carlo sampling. Products such as Decisioneering's Crystal Ball and Palisade's @Risk extend spreadsheets with modeling capabilities in addition to statistical analysis.

Dedicated general- and special-purpose statistical software generally offers a wider variety and greater depth of analysis than add-in software. For many specialized techniques, such as forecasting and design of experiments, a statistical package is appropriate. Moreover, new procedures are likely to become available first in statistical software and only later in add-ins. In general, statistical software plays a distinct role on the analyst's desktop, and provided that data can be freely exchanged among applications, each part of an analysis can be performed with the most appropriate (or convenient) software tool. An important feature of statistical programs is the ability to import data from as many sources as possible, eliminating the need for data entry when the data are already available elsewhere. Most programs can read spreadsheets and selected data storage formats.

Also highly visible in this survey is the growth of data warehousing and "data mining" capabilities, programs and training. Data mining tools attempt to integrate and analyze data from a variety of sources (collected for a variety of purposes) to look for relations that would not be apparent in the individual data sets. The survey also includes several specialized products, such as BESTFIT and STAT::FIT, which focus narrowly on distribution fitting rather than general statistics but are of particular use to developers of stochastic models and simulations.

The 2007 Statistical Analysis Software Survey
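Distribution-fitting products such as BESTFIT and STAT::FIT are not documented here, so the following is only a minimal sketch of the general idea, not their method: estimate the parameters of a candidate distribution from the data, then measure how far the fitted distribution is from the data. It assumes an exponential candidate fitted by maximum likelihood and uses the Kolmogorov-Smirnov distance as the goodness-of-fit measure; the function names and the interarrival times are hypothetical.

```python
import math


def fit_exponential(data):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical interarrival times (minutes) from a queueing study.
times = [0.5, 1.2, 0.3, 2.0, 0.8, 1.7, 0.4, 1.1]
rate = fit_exponential(times)
d = ks_statistic(times, lambda x: 1 - math.exp(-rate * x))
print(f"rate = {rate:.3f}, KS D = {d:.3f}")  # rate = 1.000, KS D = 0.259
```

A fitting tool would repeat this for several candidate families and rank them. Because the rate here is estimated from the same data being tested, the standard Kolmogorov-Smirnov critical values are conservative; a formal test needs adjusted (Lilliefors-type) critical values or simulation.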
OR/MS Today copyright © 2007 by the Institute for Operations Research and the Management Sciences. All rights reserved.
Lionheart Publishing, Inc. 506 Roswell Rd., Suite 220, Marietta, GA 30060 USA
Phone: 770-431-0867 | Fax: 770-432-6969
E-mail: lpi@lionhrtpub.com
URL: http://www.lionhrtpub.com
Web Site © Copyright 2007 by Lionheart Publishing, Inc. All rights reserved.