|
OR/MS Today - April 2008

Software Review

Statgraphics Centurion XV

Despite some learning "bumps," comprehensive stat package compares favorably with competitors.

By Jack Yurkiewicz

Statgraphics is a full-featured statistical software product designed to compete with SPSS, SAS, Minitab, Systat and other comprehensive programs, the premise being that one product should satisfy the needs of most data analysis users. Hardware requirements are very modest: Windows 98 through Vista, Pentium processor, 64 MB RAM, 65 MB storage.

Statgraphics was introduced 25 years ago as a DOS product, and its salient function then was the integration of statistical graphs with the analysis. A few years later, another product, Execustat, was launched. It lacked the advanced statistical analysis capabilities of Statgraphics; its main feature was a statistical "interpreter." The program would explain, in plain English and with minimal jargon, what the user may infer or conclude from the analysis. Neil Polhemus was instrumental in developing both products, and he has combined these selling points into the current Centurion XV incarnation.

Statgraphics comes in two flavors. The Standard version, list price $695, basically covers descriptive statistics, regression analysis and analysis of variance (one-way and multifactor). The Professional version, list price $1,295, augments the features of the Standard version by including more regression capabilities (nonlinear, ridge, logistic, Poisson and negative binomial regression, and also general linear models and partial least squares), statistical process control, time series forecasting, design of experiments, multivariate statistical analysis (factor analysis, cluster analysis, principal components, canonical correlations), neural networks and others. I would not recommend the Standard version because the statistical capabilities are a little sparse for the price.
For example, the Standard version would not be sufficient even for my introductory MBA statistics class because of its inability to do forecasting and quality control. Discounts are available for users of older versions of Statgraphics and competitive products, as well as for academics and students.

You can start Statgraphics via its optional StatWizard, a menu-driven questionnaire that will help you get to the data, methodology or analysis you want. There are four main choices in StatWizard (see Figure 1): "select analysis based on the type of data," which basically opens a new or existing data set (and takes too many steps to do the latter); "select analysis by name," which has the largest set of options; select a "SnapStat"; or select a "Quick Pick." A SnapStat gives a one-page preformatted output, but various additional options for displays or analyses are not available. A Quick Pick shows a shorter list of "commonly used analyses" (as defined by Statgraphics), and if you choose one, you will get more complete output. Of course, you can disable the StatWizard altogether and just choose what you want to do from the main menu, which is what I do.
Being familiar with other statistical software, my plan was to accomplish as much as I could without consulting the accompanying 287-page User Manual or the Help system in the program. I have a 251-row, 41-column data set culled from Consumer Reports about automobiles. To start, I wanted to find a 95 percent confidence interval for the average vehicle mileage of manufacturers from Asia, Europe and the United States. Since confidence intervals were not an available choice in any of the groups in the StatWizard or on the main menu, I presumed that a "one-variable analysis" would probably include it, along with other descriptive statistics. Choosing that procedure (found in Describe on the main menu; see Figure 2), the resulting dialog box asked me which column to analyze in one field, and then had a second field called "Select." I chose "origin" and clicked OK, and immediately got an error message telling me that Field 2 (that is, origin) must be numeric. Quickly abandoning my "real pros don't need help" stance, I clicked on Help in the dialog box. For Select it said "subset selection," which wasn't very helpful. I didn't want to transform the data, but I did deduce, from Excel experience, that if I typed in origin="USA", I would at least get the summary statistics for just the American vehicles. However, I did not get confidence intervals. Going to the Help on the main menu, clicking on Contents, and looking in the Index yielded nothing, nor did a search for "confidence intervals."
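For readers who want to check such a result outside Statgraphics, the task described above (a 95 percent confidence interval for mean mileage within each origin group) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made up rather than taken from the Consumer Reports data, the column names "origin" and "mpg" are hypothetical, and the interval uses a large-sample normal approximation instead of the t distribution a statistics package would normally use.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci_by_group(rows, group_key, value_key, confidence=0.95):
    """Return {group: (mean, lower, upper)} for each value of group_key.

    Assumption: a normal (z) critical value is used, which is reasonable
    only for fairly large groups; a t critical value would be wider.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    groups = {}
    for row in rows:
        # Subset selection: collect the values belonging to each group.
        groups.setdefault(row[group_key], []).append(row[value_key])
    result = {}
    for name, values in groups.items():
        m = mean(values)
        half_width = z * stdev(values) / sqrt(len(values))
        result[name] = (m, m - half_width, m + half_width)
    return result

# Made-up illustrative rows, not the data set used in the review.
cars = [
    {"origin": "USA", "mpg": 20.0},
    {"origin": "USA", "mpg": 22.0},
    {"origin": "USA", "mpg": 24.0},
    {"origin": "Asia", "mpg": 27.0},
    {"origin": "Asia", "mpg": 29.0},
    {"origin": "Asia", "mpg": 31.0},
    {"origin": "Europe", "mpg": 23.0},
    {"origin": "Europe", "mpg": 25.0},
    {"origin": "Europe", "mpg": 27.0},
]

intervals = mean_ci_by_group(cars, "origin", "mpg")
for origin, (m, lo, hi) in intervals.items():
    print(f"{origin}: mean {m:.1f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Grouping first and then summarizing each group does the job of the Select field without requiring a separate run per origin.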
The Help system did include many PDF "on line manuals" of the various capabilities of the program, and sure enough, on page 20 of the One Sample Analysis manual, confidence intervals for the mean are discussed. But unlike paper manuals (my paper documentation for version 5 of Statgraphics, a DOS program which I still use on my handheld HP computers, consisted of more than a thousand pages), these PDF manuals have neither tables of contents nor indices, and so finding anything specific means paging through them by hand. In Statgraphics' defense, the paper User Manual did indeed have confidence intervals in the index and discussed them as well.

The output shown is a "split screen." Statgraphics calls the left side (see Figure 3) tables (which are the numerical calculations), and the right side graphs. Each table or graph itself is inside a pane. If you right-click in any pane, you can change the specific output or the look of a graph via pane options. For example, in Figure 3, the upper left pane gives the basic one-variable descriptive statistics for the mileage of the American vehicles. Pane options give choices for additional or fewer statistics to be displayed, while pane options in the middle left pane permit you to change the number and breakpoints of the bins of the frequency distribution.
Right-clicking a graphics pane gives two choices: pane options and graphics options. Thus, for the upper right pane, choosing the former can give you a vertical or a horizontal box plot, show the median notch, etc., while choosing the latter allows you to change the values on the axes, titles, etc. Finally, you can control which tables or calculations and graphs are displayed by clicking on the Tables icon (in Figure 3, the second icon in the upper left side of the picture) or Graphs icon (the third icon, to the right of Tools). For example, by clicking on the Tables icon, a menu appears, and I can choose, as part of my one-variable summary, confidence intervals (finally!), hypothesis tests, various percentiles or a stem-and-leaf plot. Clicking on the Graphs icon gives choices for a normal plot, a density plot and others.

Every Tools pane includes the numeric analysis followed by the StatAdvisor. The StatAdvisor explains, in plain English, what the calculations say or imply. Think of it as a mini-textbook that explains what, for example, a confidence interval is or what a hypothesis test implies. This typically is generic text but, interspersed with the names of the variables that you analyzed, makes the interpretation that much clearer and gives the sense that the software is talking about your specific data and analysis. While it can never replace a solid knowledge of statistics on your part, it can tell you the correct way of saying or writing what the numbers show. However, when I did a one-sample, upper-tail t-test that the average price of a vehicle exceeds $33,000, the p-value was 0.021. Assuming a 5 percent significance level, the StatAdvisor said: "The t-test tests the null hypothesis that the mean Price equals 33000.0 versus the alternative hypothesis that the mean Price is greater than 33000.0. Since the P-value for this test is less than 0.05, we can reject the null hypothesis at the 95.0% confidence level."
I suspect that this last sentence refers to a one-sided 95 percent confidence interval, and I wonder if some users might find this a bit confusing. To quote Paul Velleman et al., "Because confidence intervals are naturally two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an alpha level of 1/2(100 - C)%" [1].

Saving the analysis output is easy. Of course, I could have copied and pasted the output into Word, but Statgraphics has StatReporter, a standalone version of WordPad, that generates a well laid-out document (without the panes) in rich text format of the complete output of my session. Thus, you can edit the file in StatReporter or open it in Word for enhanced editing. It worked well, but how to invoke it is not clear at first. Nothing on the main menu did the job; after a search in the Help system, I learned that you must right-click any pane to invoke StatReporter.

Finally, Statgraphics has a feature called StatFolio. As the User Manual describes it (page 103), "When a session is saved in a StatFolio, it is the definition of the analyses that is saved, not the output. When reopening a StatFolio, the data in the associated data sources is reread and all analyses recalculated." Thus, by opening a StatFolio, you can rerun your previous analyses at a later date, with the benefit that if the data is modified in the interim, the new data is analyzed.

It is obvious that statistical software products do not all have the same look and feel. Getting used to these may be easier for some users and harder for others, the latter especially for those who carry the mental baggage of using other products in the same arena. Thus, besides missing an easy if-then function, I found:
In contrast to these, I also believe:
In summary, I would rate Statgraphics' ease-of-learning "very good" and the ease-of-use "very good to excellent."

Statgraphics does "automatic" forecasting. By this I mean, "you enter the data and the software tells you which forecasting technique is most appropriate." When I tried various data sets, Statgraphics showed 16 models, with the optimal parameters for each. These included trend analysis curves (linear, exponential, and the simple S-curve), along with Brown's, Holt's, Winters' and several Box-Jenkins models. By default, Statgraphics tries to minimize the Akaike Information Criterion (AIC), but the user can specify another criterion. I tried an 11-year time series, consisting of monthly values. Figure 4 shows some of the output. Statgraphics recommended a Box-Jenkins procedure, model M. The StatAdvisor went on to tell me that Statgraphics "also summarizes the results of five tests run on the residuals to determine whether each model is adequate for the data. An OK means that the model passes the test. One * means that it fails at the 95 percent confidence level. Two *s means that it fails at the 99 percent confidence level. Three *s means that it fails at the 99.9 percent confidence level. Note that the currently selected model, model M, passes three tests." I found Statgraphics' time series forecasting features accurate and easy to use; they rival the capabilities of some well-regarded, stand-alone forecasting programs.
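The idea behind this kind of automatic model selection can be illustrated with a small Python sketch: fit several candidate models, score each by AIC, and keep the lowest score. This is not Statgraphics' procedure. It assumes only three simple candidates (constant mean, linear trend, exponential trend), a made-up series, and the least-squares form of AIC, n ln(RSS/n) + 2k, with residuals measured on the original scale for every model so that the scores are comparable.

```python
from math import exp, log

def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def aic(actual, fitted, n_params):
    """Gaussian least-squares AIC, up to a constant shared by all models."""
    n = len(actual)
    rss = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return n * log(rss / n) + 2 * n_params

# Made-up series: 5 percent growth per period plus a small alternating wiggle.
t = list(range(24))
y = [100 * 1.05 ** ti + 2 * (-1) ** ti for ti in t]

b0, b1 = ols(t, y)                      # linear trend on the raw scale
g0, g1 = ols(t, [log(v) for v in y])    # exponential trend, fitted on the log scale

# Each candidate: (fitted values on the original scale, number of parameters).
candidates = {
    "constant mean": ([sum(y) / len(y)] * len(y), 1),
    "linear trend": ([b0 + b1 * ti for ti in t], 2),
    "exponential trend": ([exp(g0 + g1 * ti) for ti in t], 2),
}

scores = {name: aic(y, fitted, k) for name, (fitted, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

With this made-up series the exponential trend scores lowest, as expected for data generated with constant percentage growth. A real package would add smoothing and Box-Jenkins candidates and would also check the residuals, as the StatAdvisor text quoted above describes.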
Statgraphics also has an extensive set of statistical process control features. Besides the standard variable and attribute control charts, there are time-weighted (e.g., exponentially weighted moving average), multivariate, ARIMA control charts and others. Gage studies, acceptance sampling, capability analysis, Pareto analysis and other SPC techniques are also available. Besides a large feature list, the program is intuitive and easy to use. For example, my class had to analyze hospital admission waiting times for elective procedures with the goal of finding recommended standards. Figure 5 shows the output for the standard X-bar and Range charts, with the StatAdvisor indicating the out-of-control samples and the various runs. Assuming that you can isolate the assignable causes, you can manually or automatically exclude the out-of-control samples, and the software gives the new control charts, fully connected, and the accompanying numerical results and analyses.
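As a rough illustration of what an X-bar and Range chart computes, the following Python sketch derives the control limits from subgroup data and flags subgroups whose mean falls outside the X-bar limits. The waiting times are made up, the subgroup size of 5 is an assumption, and the constants A2 = 0.577, D3 = 0 and D4 = 2.114 are the standard tabulated values for subgroups of that size. Run rules, which the StatAdvisor also reports, are not implemented here.

```python
# Standard control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return subgroup means, ranges, and (LCL, center, UCL) for both charts."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar_limits = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    range_limits = (D3 * r_bar, r_bar, D4 * r_bar)
    return means, ranges, xbar_limits, range_limits

# Made-up waiting times in days, eight samples of five patients each.
samples = [
    [29, 30, 31, 32, 33],
    [31, 29, 33, 30, 32],
    [30, 32, 29, 33, 31],
    [33, 31, 30, 29, 32],
    [32, 33, 31, 30, 29],
    [33, 34, 35, 36, 37],   # shifted sample
    [29, 31, 33, 32, 30],
    [30, 29, 32, 31, 33],
]

means, ranges, xbar_limits, range_limits = xbar_r_limits(samples)
lcl, center, ucl = xbar_limits

# Zero-based indices of samples whose mean lies outside the X-bar limits.
out_of_control = [i for i, m in enumerate(means) if not lcl <= m <= ucl]
print(xbar_limits, range_limits, out_of_control)
```

Excluding a flagged sample and calling the function again on the remaining subgroups mirrors the recalculation step described above.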
Statgraphics can easily and accurately design a single acceptance-sampling plan for attributes for a specified operating characteristic curve. You enter the standard parameters (AQL, LTPD, producer's and consumer's risks, and the lot size) and the program gives the optimal sample size and acceptance number. Since the solution is approximate (it involves the simultaneous solution of two cumulative binomial equations), I compared the sampling plans that Statgraphics found to those from several other products. They all came up with essentially the same results.

Regression analysis is, perhaps along with descriptive statistics, the mainstay for many users of statistical software. Statgraphics' output is extensive and the StatAdvisor gives succinct but easily understandable interpretations of what the numbers indicate. For example, I particularly liked its description and advice concerning influential points. Finding a prediction interval for a new observation is very easy. You just add rows to the data that specify the values for the independent variables. Statgraphics then assumes these are new observations and computes the prediction intervals. It was not obvious how to change the confidence coefficient for this interval from the default 95 percent, but it can be done (Solution: don't read the book; it's not there. You must specify that Statgraphics show the confidence intervals for the regression coefficients, change the confidence coefficient there, and this will give you the same confidence coefficient for the prediction interval).

Nonlinear regression analysis is available. You type in the nonlinear function and Statgraphics will find the optimal parameters. Some nonlinear functions are available from a menu, but these are mostly exponential and logarithmic models. I wanted to do a nonlinear trend analysis, via the Weibull, Gompertz and Pearl-Reed curves, of some sales data, and these specific curves were not available. Typing those equations using Statgraphics' syntax was very tedious.
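For reference, the three growth curves mentioned above can be written as short Python functions. The parameterizations are assumptions: the Gompertz and Pearl-Reed (logistic) forms are the common three-parameter versions with upper asymptote a, and the Weibull form shown is one of several in use. None of this is Statgraphics syntax, and the parameter values in the example are arbitrary.

```python
from math import exp

def gompertz(t, a, b, c):
    """Gompertz curve: a * exp(-b * exp(-c * t)), rising toward a."""
    return a * exp(-b * exp(-c * t))

def pearl_reed(t, a, b, c):
    """Pearl-Reed (logistic) curve: a / (1 + b * exp(-c * t))."""
    return a / (1 + b * exp(-c * t))

def weibull_growth(t, a, b, c):
    """Weibull-type growth curve: a * (1 - exp(-(t / b) ** c)).

    Assumption: other parameterizations of this curve exist.
    """
    return a * (1 - exp(-((t / b) ** c)))

# Arbitrary illustrative parameters; a = 100 is the saturation level.
for period in (0, 5, 10, 20):
    print(
        period,
        round(gompertz(period, 100, 2, 0.5), 2),
        round(pearl_reed(period, 100, 9, 0.5), 2),
        round(weibull_growth(period, 100, 5, 2), 2),
    )
```

Functions of this shape are what a nonlinear least-squares routine needs as input; the fitting step itself, which searches for the parameters that minimize the squared residuals, is not shown.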
The ANOVA features and capabilities of Statgraphics are extensive. In particular, the accompanying graphs and the StatAdvisor summaries are very illuminating. For example, the StatAdvisor gave a clear summary of Tukey's HSD procedure, and gave a reason why I should use the Bonferroni procedure instead on my particular data. After trying Statgraphics' procedures on many data sets, I would rate the program's capabilities and accompanying features "excellent."

To summarize, despite a few learning and usage "bumps," I like Statgraphics overall and believe that it competes effectively with the leading statistical software. I did call Tech Support once, about software activation, and was immediately connected to a knowledgeable and friendly person. A 30-day trial version is available, which includes the software program with no data restrictions and PDF versions of all the documentation. I recommend that you try it.
Jack Yurkiewicz is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. Besides management science, he teaches business statistics, operations management and forecasting. His current interests include developing distance-learning courses for these topics and assessing their effectiveness.

References
OR/MS Today copyright © 2008 by the Institute for Operations Research and the Management Sciences. All rights reserved. Lionheart Publishing, Inc. 506 Roswell Rd., Suite 220, Marietta, GA 30060 USA Phone: 770-431-0867 | Fax: 770-432-6969 E-mail: lpi@lionhrtpub.com URL: http://www.lionhrtpub.com Web Site © Copyright 2008 by Lionheart Publishing, Inc. All rights reserved. |