OR/MS Today - February 2005

2005 Statistical Software Products Survey: Essential Tools of the Trade

By James J. Swain

Statistical methods and thinking are a critical part of the OR/MS professional repertoire, and an appreciation of variability and uncertainty is a key part of both our training and our worldview. As professionals, we are often expected to provide models that explore the implications of variability, or to make choices that minimize the effects of variability or mitigate various risks. Often the risks threaten only the smooth running of our operations, as when we seek to ensure on-time delivery of product at a cost we can afford, given uncertain schedules and the reliability of distribution. Not all risks are so minor, as the recent tsunami that wreaked havoc along the coastlines of Asia illustrated. The magnitude of the disaster is compounded by its unpredictability. Statistical software is therefore an essential tool for practitioners, whether the task is to quantify a risk, make an inference among uncertain outcomes, or forecast the demand for a product.

This biennial update of the OR/MS Today statistical software survey provides a current snapshot of the tools available to the OR/MS practitioner. These tools span a variety of capabilities and prices. Each survey shows an increase in the power and range of existing statistical software and introduces new tools. Methodologies that until recently were available only to the specialist are often now accessible to all through commercial software.

Statistics has its roots in political administration. The census dates from ancient times as an enumeration of population and taxable property that was essential to the administration of empires such as Rome. The U.S. Constitution requires a decennial census to reapportion representation in Congress among the states as the distribution of population shifts.
The census now covers far more than enumeration alone, including demographics, employment, education, income and so on. Statistical measures of the economy, such as orders for durable goods, job growth and housing prices, inform decisions about monetary policy, corporate investment and hiring, and often move the stock market when they differ from expectations. The growing debate on Social Security will hinge in part on competing statistical projections of population demographics, immigration, and expected revenues and receipts to Social Security over a 75-year horizon.

Public health is another key area of government statistics, both to provide summary measures such as life expectancy and general risk factors to health, and to give early warning of new health threats. Agencies such as the Centers for Disease Control and Prevention monitor the incidence of illness and death for signs of fresh outbreaks of familiar diseases or of new threats. The terrorist threat, which includes the possibility of radiological or biological attacks, has increased the importance of this statistical vigilance and made it a matter of national security.

Economics is a massive area of statistical study. Modern economies are too large, and capital flows too massive, to be measured directly, so we rely on sampling to infer GDP, changes in consumer and producer prices, and a hundred other indicators. Sampling is applied in marketing, to determine customer needs and desires; in quality, to determine customer satisfaction and product performance; and even in accounting and auditing, to determine inventory and compliance with accounting procedures. In manufacturing, production scheduling is based on forecasts of demand, estimates of productivity, and the prices and availability of raw materials.
In recent months, oil prices have depended in part on the vagaries of the weather (for heating oil), political events in Nigeria and Iraq, and the growing demand for oil in countries such as China.

One trend in analysis has been an increasing emphasis on techniques for large data sets and on the connection between analysis and the IT infrastructure that collects the data. Until the advent of the computer, statistical methods were limited both by the computational complexity of the analysis and by the amount of data. Software has reduced the computational burden of data analysis, and the traditional trickle of data has become a flood. Automated sensing and communications have made it possible to capture production, distribution and sales data instantaneously and to aggregate them regionally, nationally or even worldwide. The Internet and electronic business applications have accelerated the accumulation of data. In this data-rich environment, new tools have been developed to search for information within the data. These data mining techniques often involve classification methods such as clustering, which search for commonalities that define groupings to which other techniques can then be applied, as well as novel graphical approaches for investigating multivariate relations among variables.

Another trend has been the penetration of statistical analysis into all fields. The general availability of statistical software has put these tools into the hands of anyone with data and a computer. Initiatives such as Six Sigma are designed to identify opportunities for improvement, collect and analyze data, and recommend action, and they are most frequently conducted by the team that initiated the study. This democratization of analysis often extends to design of experiments and other specialized techniques that might once have required a specialist.
The increasing emphasis on modeling and simulation in procurement championed by the government (for instance, simulation-based acquisition in the Department of Defense) is also being adopted by corporations, and it too will require statistical support.

Statistical add-ins for spreadsheets remain prominent. The spreadsheet is the primary computational tool in a wide variety of settings, familiar and accessible to all. Many procedures for data summarization, estimation, inference, basic graphics and even regression modeling can be added to spreadsheets in this way. Many introductory statistics texts reflect this by exploiting these tools to perform the computations needed in the course. The functionality of spreadsheet products continues to grow and now includes risk analysis and Monte Carlo sampling. Products such as Palisade's @Risk and Decisioneering's Crystal Ball extend the modeling capabilities of spreadsheets in addition to providing statistical analysis.

Dedicated general- and special-purpose statistical packages generally offer greater variety and depth of analysis than add-in software. For many specialized techniques, such as forecasting and design of experiments, a statistical package is appropriate. Moreover, new procedures are likely to appear first in statistical packages and only later in add-in software. In general, statistical software plays a distinct role on the analyst's desktop, and provided that data can be freely exchanged among applications, each part of an analysis can be performed with the most appropriate (or convenient) software tool.

One of the strongest impressions from the latest releases is the growth of visualization tools for examining multivariate data and for displaying data in time or space.
In addition, there appears to be a general shift from classical estimation and hypothesis testing toward the tools of exploratory data analysis, which aid in searching for relations, investigating anomalous cases and outliers, and examining these cases within the factor space. Almost all of the vendors' Web sites feature dramatic graphics produced with the software.

An important feature of statistical programs remains the ability to import data from as many sources as possible, eliminating the need for data entry when the data are already available from another source. Most programs can read spreadsheets and selected data storage formats.

Also highly visible in this survey is the growth of data warehousing and "data mining" capabilities, programs and training. Data mining tools attempt to integrate and analyze data from a variety of sources (collected for a variety of purposes) to look for relations that could not be found in the individual data sets.

A large number of vendors now provide families of products or modules rather than a single, omnibus statistical package, with many of the modules devoted to specialized business functions or fields. The SAS and STATISTICA programs are clearly aimed at supporting the entire corporation, with offerings that address specialized needs for sample surveys, quality control and process capability, medical data and toxicity, marketing, time series, data warehousing and data mining. Another approach is evident in the Statpoint offerings, which provide statistics via the Internet and Java Statbeans for corporate developers. The survey also includes several specialized products, such as STAT::FIT and BESTFIT, which focus more narrowly on distribution fitting than on general statistics but are of particular use to developers of stochastic models and simulations.
OR/MS Today copyright © 2005 by the Institute for Operations Research and the Management Sciences. All rights reserved.