OR/MS Today
October 1999

Desktop Statistics Software:
Serious Tools for Decision-Making


By James J. Swain


In this survey we look at statistical software for the desktop computer available to the OR/MS practitioner, updating similar OR/MS Today product surveys from 1993 and 1996. An appreciation of the stochastic nature of empirical data has been a mainstay of the profession from the very beginning, and these tools make it easier than ever to obtain, examine and analyze data. Access to these tools and facility with them will be an important professional asset as statistical analysis becomes more deeply integrated into organizations.

We live in a statistical age, where data from surveys and polls, from product registration and warranty cards, and from daily sales at corporate outlets is a torrent that floods through decision-making. This data is summarized and dissected, compared by time and by region, categorized by a dozen correlates, all with an eye to identifying product problems or attractive features that will provide us with an edge or forestall a competitor. Whether the decision concerns pricing or features, or the identification of a promising marketing strategy or niche group, quantitative, statistical analysis is at the heart of the decision-making. What is more, this data increasingly plays a feedback role in the operations of the firm, quickly identifying problems with product or manufacturing, or shifts in consumer satisfaction.

To put it another way, statistical data analysis is at the heart of a number of problem-solving situations of particular interest to the OR/MS professional, because these problems often cut across the length of the organization. Who else is so uniquely positioned not only to appreciate the stochastic and uncertain nature of the data, but also to take a systems approach that transcends departmental boundaries, or to lead in team-based solutions? Statistical analysis of the raw data provides the empirical grounding of our analytical models, such as the stochastic and the simulation models, as well as the grist for the scheduling and optimization procedures. Data analysis is also needed to validate these models and to provide continuing feedback monitoring of plans and products once implemented. Finally, data may be collected through designed experiments that test relations already observed empirically, both to confirm the relationships and to estimate their magnitudes.

It is increasingly clear that the confluence of the computer and the data network is changing not only the mechanics of data collection but also the need for timely, reliable and integrated data, so that decisions can be made accurately and quickly. Statistical analysis is only one piece of the information technology (IT) strategy that links the sources of data, their analysis and the dissemination of results throughout the enterprise.

A Statistical Age


To observe the significance of statistics in a particular application, one need look no further than Alan Greenspan and his "Inflation Hawks" at the Federal Reserve. They watch the statistical indicators of the economy — jobs created, price indices, hours worked, compensation rates — ever alert for signs of inflation and economic overheating. The decisions that they make based on these statistical summaries can move markets, influence the U.S. economy through investment decisions and strategic planning about retrenchment or expansion, and reverberate across the world. These indicators have such importance that daily market news shows now routinely broadcast which statistics will be announced in the coming day or week, knowing that the statistic may well have an immediate effect if it is unexpected in magnitude up or down, or even if there is no change (perhaps signaling that no change in economic policy will be forthcoming).

The federal government is an important source and user of statistics for everything from economic and labor statistics to environmental, health, agriculture, education and population matters. For all of their importance, these activities generally receive little media attention outside the scheduled announcements of sensitive economic indicators; yet the collection and computation of these statistics are not without controversy. For instance, the census proposal to perform a sampling correction for the population (i.e., correction for the population undercount) involved all three branches of the government and was ultimately disallowed in the Supreme Court, partly on constitutional grounds. An incredible amount of data is available from the federal government, often via computer network and CD-ROM.

As noted earlier, the use of statistics by industry has been rapidly growing in the last two decades, influenced by Japanese industrial practices and by initiatives such as the Motorola "Six Sigma" program. As noted in a recent survey article for statisticians [Hahn, et al., 1999], the penetration of statistical methods at many corporations is both broad and sustained, resulting in a more data-driven, quantitative approach to problem-solving. For instance, manufacturing and service operations have increasingly distributed the effects of product and process improvements to the operating teams themselves, whether for quality improvement, cost reduction or enhancements such as tighter specifications, improved reliability or durability. Statistical design of experiments is now increasingly prevalent at all levels of the organization and rarely relegated to statistical or other service groups, except where complexity indicates that special design expertise is required.

Echoing these observations, MacDonald [1999] highlights the role that statistical analysis plays within the corporation, particularly with the increased emphasis on shortening the design cycle. That is, once consumer research identifies that a need for a process or a product exists, the technical cycle (design to manufacturing) must be made as short as possible in order to forestall competitors and to secure market share. He identifies the importance of information technology in linking these stages together, and looks at the varied roles that statistics will play. He also notes the increasing importance of visual analysis and interpretation, and an increased emphasis on computing skills "linked to common business systems (e.g., spreadsheets) and database issues (e.g., organization and management)."

Changes in Statistical Software


Statistical software is widely used in the OR/MS profession and this survey of products is an update of the survey published in 1996. As in the previous surveys, product information was solicited from product vendors and is summarized in the following tables, to highlight general features and capabilities and to provide contact information. Many of the vendors have extensive Web sites for further, detailed information, and many provide demo programs that can be downloaded from these sites. Because of space limitations, no attempt is made to evaluate or rank the products, and the information provided comes from the vendors themselves. The survey will be available on the Lionheart Publishing Web site (www.lionhrtpub.com). Vendors that were unable to make the publishing deadline will be added to the on-line survey.

In the last several years the number of statistical procedures available in the typical spreadsheet package has continued to grow, and as in 1996 we may ask whether a statistical software product is really necessary for all OR/MS practitioners. Certainly for routine procedures and tests of hypotheses, basic graphics and even regression modeling, as well as for most introductory statistics courses, a spreadsheet is likely adequate. Spreadsheets, however, lack the level of detail and flexibility of the good statistical programs, and may have limited diagnostic options or even lack adequate Help information about what the procedures are or what they signify. For specialized roles such as forecasting, design of experiments, distribution fitting and so forth, a statistical package would be preferred. Moreover, new procedures are likely to become available first in the statistical software and only later be added to the spreadsheets. In general, statistical software plays a distinct role on the analyst's desktop and, provided that data can be freely exchanged among applications, each part of an analysis can be made with the most appropriate (or convenient) software tool.

One of the strongest impressions from the latest releases is the growth of visualization tools for examining multivariate data and for viewing data in time or space. Not only has the input shifted from the command line to the menu system (GUI), but statistical output is also increasingly graphical. In addition, there appears to be a general trend from classical estimation and hypothesis testing toward the tools of exploratory data analysis, which aid in the search for relations, the investigation of anomalous cases and outliers, and the examination of these cases within the factor space. Almost all of the Web sites feature dramatic graphics that are available with the software. I particularly enjoyed the dynamic linking feature that DataDesk demonstrated (www.datadesk.com/datadesk), so that data items identified in one graphical view can be highlighted in the others. Several programs have methods of linking data between graphics, and I expect this feature to become even more pervasive in the future.

An important feature of statistical programs (noted in 1996) was the importation of data from as many sources as possible, to eliminate the need for data entry when data is already available from another source. This includes the ability to import data from various sources and analysis programs, data transfer via the clipboard (among applications), and dynamic linking of programs (i.e., OLE). The program DBMS/COPY (www.conceptual.com) is still available for shifting data between a large number of file formats, and it includes improved data preview, editing and filtering capabilities in addition to import and export capabilities. Also highly visible in this survey is the growth of data warehousing and "data mining" capabilities, programs and training. These are tools that attempt to integrate and analyze data from a variety of sources (collected for a variety of purposes) to look for relations that could not be found in the individual data sets. Specialized methodologies for these problems are already appearing in the statistical literature.

A large number of the vendors now provide families of products or modules rather than a single, omnibus statistical package, with many of the modules for specialized business functions or fields. These include specialized needs for sample surveys, quality control or process capability, medical data and toxicity, marketing, time series and so on. Within the survey we observe several specialized products, such as STAT::FIT and BESTFIT, which are more narrowly focused on distribution fitting than general statistics, but of particular use to developers of stochastic models and simulations.
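
To make the distribution-fitting task concrete, here is a minimal sketch in Python of what such a tool does at its simplest: estimate a parameter from observed data, then measure how well the fitted distribution matches the sample. It does not reproduce the methods of STAT::FIT or BESTFIT. The exponential model, the function names and the simulated service times are all assumptions made for illustration.

```python
import math
import random


def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    if not samples or any(x < 0 for x in samples):
        raise ValueError("samples must be non-empty and non-negative")
    mean = sum(samples) / len(samples)
    if mean == 0:
        raise ValueError("sample mean must be positive")
    return 1.0 / mean


def ks_statistic(samples, rate):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    samples and the CDF of an exponential distribution with this rate."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


# Hypothetical data: 500 simulated service times with true rate 2.0.
rng = random.Random(1999)
service_times = [rng.expovariate(2.0) for _ in range(500)]

rate_hat = fit_exponential(service_times)
distance = ks_statistic(service_times, rate_hat)
print(f"estimated rate = {rate_hat:.3f}, KS distance = {distance:.3f}")
```

A small distance suggests the exponential model is a plausible input for a simulation; a dedicated fitting product would typically compare many candidate families and report formal goodness-of-fit tests rather than a single distance.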

References


  1. Hahn, G. J., W. J. Hill, R. W. Hoerl, and S. A. Zinkgraf (1999), "The Impact of Six Sigma Improvement - A Glimpse into the Future of Statistics," The American Statistician, Vol. 53, No. 3, pp. 208-215.

  2. MacDonald, G. C. (1999), "Shaping Statistics for Success in the 21st Century: The Needs of Industry," The American Statistician, Vol. 53, No. 3, pp. 203-207.




James J. Swain is an associate professor in the ISEEM Department at the University of Alabama in Huntsville. He can be reached via e-mail at swain@ise.uah.edu







    OR/MS Today copyright © 1999 by the Institute for Operations Research and the Management Sciences. All rights reserved.


    Lionheart Publishing, Inc.
    506 Roswell Street, Suite 220, Marietta, GA 30060, USA
    Phone: 770-431-0867 | Fax: 770-432-6969
    E-mail: lpi@lionhrtpub.com
    URL: http://www.lionhrtpub.com


    Web Site © Copyright 1999 by Lionheart Publishing, Inc. All rights reserved.