OR/MS Today - April 2008



Software Review


Statgraphics Centurion XV

Despite some learning "bumps," comprehensive stat package compares favorably with competitors.

By Jack Yurkiewicz


Statgraphics is a full-featured statistical software product designed to compete with SPSS, SAS, Minitab, Systat and other comprehensive programs, the premise being that a single product should satisfy the needs of most data analysis users. System requirements are very modest: Windows 98 through Vista, a Pentium processor, 64 MB of RAM and 65 MB of disk storage.

Statgraphics was introduced 25 years ago as a DOS product, and its salient feature then was the integration of statistical graphs into the analysis. A few years later, another product, Execustat, was launched. Execustat lacked the advanced statistical analysis capabilities of Statgraphics; its main feature was a statistical "interpreter." The program would explain, in plain English and with minimal jargon, what the user may infer or conclude from the analysis. Neil Polhemus was instrumental in developing both products, and he has combined these selling points into the current Centurion XV incarnation.

Statgraphics comes in two flavors. The Standard version, list price $695, basically covers descriptive statistics, regression analysis and analysis of variance (one-way and multifactor). The Professional version, list price $1,295, augments the features of the Standard version with more regression capabilities (nonlinear, ridge, logistic, Poisson and negative binomial regression, as well as general linear models and partial least squares), statistical process control, time series forecasting, design of experiments, multivariate statistical analysis (factor analysis, cluster analysis, principal components, canonical correlations), neural networks and others. I would not recommend the Standard version because its statistical capabilities are a little sparse for the price. For example, the Standard version would not be sufficient even for my introductory MBA statistics class because it cannot do forecasting or quality control. Discounts are available for users of older versions of Statgraphics and of competitive products, as well as for academics and students.

Ease of Learning and of Use


I tested Statgraphics on a computer with Vista Ultimate, Office 2007 and 2 GB of RAM. Installation is straightforward. Standalone copies of the product are licensed to individual users, who may install it on more than one computer but only for their personal use. Statgraphics uses Excel 2003 .xls files as the common denominator for importing data. Direct import of data from statistical competitors such as SPSS or Minitab is not available. Statgraphics seamlessly read all my Excel .xls files, but it was unable to read Excel 2007 .xlsx files.

You can start Statgraphics via its optional StatWizard, a menu-driven questionnaire that will help you get to the data, methodology or analysis you want. There are four main choices in StatWizard (see Figure 1): "select analysis based on the type of data," which basically opens a new or existing data set (and takes too many steps to do the latter); "select analysis by name," which has the largest set of options; select a "SnapStat" or select a "Quick Pick." A SnapStat gives a one-page preformatted output, but various additional options for displays or analyses are not available. A Quick Pick shows a shorter list of "commonly used analyses" (as defined by Statgraphics), and if you choose one, you will get more complete output. Of course, you can disable the StatWizard altogether and just choose what you want to do from the main menu, which is what I do.



Figure 1. StatWizard main menu.

Because I was familiar with other statistical software, my plan was to accomplish as much as I could without consulting the accompanying 287-page User Manual or the program's Help system. I have a 251-row, 41-column data set about automobiles culled from Consumer Reports. To start, I wanted to find a 95 percent confidence interval for the average mileage of vehicles from manufacturers in Asia, Europe and the United States. Since confidence intervals were not an available choice in any of the groups in the StatWizard or on the main menu, I presumed that a "one-variable analysis" would probably include them, along with other descriptive statistics.

When I chose that procedure (found under Describe on the main menu; see Figure 2), the resulting dialog box asked me which column to analyze in one field and then had a second field called "Select." I chose "origin" and clicked OK, and immediately got an error message telling me that Field 2 (that is, origin) must be numeric. Quickly abandoning my "real pros don't need help" stance, I clicked on Help in the dialog box. For Select it said "subset selection," which wasn't very helpful. I didn't want to transform the data, but I did deduce, from my Excel experience, that if I typed origin="USA", I would at least get the summary statistics for just the American vehicles. However, I did not get confidence intervals. Going to Help on the main menu, clicking on Contents and looking in the Index yielded nothing, nor did a search for "confidence intervals."
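For readers who want to check what such an interval should look like, here is a minimal sketch (not Statgraphics' code) of subset selection followed by a two-sided t-interval for the mean. The column names "origin" and "mpg" and the rows are hypothetical, and the t critical value is passed in from a t table rather than computed, to keep the example dependency-free.

```python
import math
import statistics


def mean_confidence_interval(values, t_crit):
    """Two-sided t-interval for a population mean.

    t_crit is the Student t critical value for n - 1 degrees of freedom
    (e.g. t_{0.025, n-1} for 95 percent), taken from a t table.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical rows; the column names "origin" and "mpg" are assumptions.
cars = [
    {"origin": "USA", "mpg": 18.0},
    {"origin": "USA", "mpg": 22.0},
    {"origin": "USA", "mpg": 20.0},
    {"origin": "USA", "mpg": 24.0},
    {"origin": "Europe", "mpg": 26.0},
]

# Analogue of typing origin="USA" in the Select field: keep only American vehicles.
usa_mpg = [row["mpg"] for row in cars if row["origin"] == "USA"]
# n = 4, so df = 3; t_{0.025, 3} is about 3.182 from a t table.
low, high = mean_confidence_interval(usa_mpg, t_crit=3.182)
print(f"95% CI for mean USA mpg: ({low:.2f}, {high:.2f})")
```

Repeating the last three lines for each origin gives the by-group intervals the review was after.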



Figure 2. Choosing the data to analyze mileage, broken down by another variable, origin.

The Help system did include many PDF "online manuals" covering the various capabilities of the program, and sure enough, on page 20 of the One Sample Analysis manual, confidence intervals for the mean are discussed. But unlike paper manuals (my paper documentation for version 5 of Statgraphics, a DOS program that I still use on my handheld HP computers, ran to more than a thousand pages), these PDF manuals have neither tables of contents nor indices, so finding anything specific means paging through them manually. In Statgraphics' defense, the paper User Manual did list confidence intervals in its index and discussed them as well.

The output shown is a "split screen." Statgraphics calls the left side (see Figure 3) tables (which are the numerical calculations), and the right side graphs. Each table or graph itself is inside a pane. If you right-click in any pane, you can change the specific output or the look of a graph via pane options. For example, in Figure 3, the upper left pane gives the basic one-variable descriptive statistics for the mileage of the American vehicles. Pane options give choices for additional or fewer statistics to be displayed, while pane options in the middle left pane permit you to change the number and breakpoints of the bins of the frequency distribution.



Figure 3. Statgraphics standard output layout. The left side shows the Tools Panes; the right side shows the Graphics Panes.

Right-clicking a graphics pane gives two choices: pane options and graphics options. Thus, for the upper right pane, choosing the former can give you a vertical or a horizontal box plot, show the median notch, etc., while choosing the latter allows you to change the values on the axes, titles, etc. Finally, you can control which tables or calculations and graphs are displayed by clicking on the Tables icon (in Figure 3, the second icon in the upper left side of the picture) or the Graphs icon (the third icon, to the right of Tools). For example, when I click the Tables icon, a menu appears from which I can choose, as part of my one-variable summary, confidence intervals (finally!), hypothesis tests, various percentiles or a stem-and-leaf plot. Clicking on the Graphs icon gives choices for a normal plot, a density plot and others.

Every Tools pane includes the numeric analysis followed by the StatAdvisor. The StatAdvisor explains, in plain English, what the calculations say or imply. Think of it as a mini-textbook that explains what, for example, a confidence interval is or what a hypothesis test implies. This typically is generic text but, interspersed with the names of the variables that you analyzed, makes the interpretation that much clearer and gives the sense that the software is talking about your specific data and analysis. While it can never replace a solid knowledge of statistics on your part, it can tell you the correct way of saying or writing what the numbers show.
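The "generic text interspersed with your variable names" idea can be illustrated with a toy template. This is a hypothetical sketch of the general technique, not how the StatAdvisor is actually implemented; the function name and wording are assumptions, loosely modeled on the StatAdvisor text quoted in the next paragraph.

```python
def advise_one_sample_t(variable, mu0, p_value, alpha=0.05):
    """Hypothetical StatAdvisor-style note: fixed wording with the user's variable name filled in."""
    text = (
        f"The t-test tests the null hypothesis that the mean {variable} equals {mu0:.1f} "
        f"versus the alternative hypothesis that the mean {variable} is greater than {mu0:.1f}. "
    )
    if p_value < alpha:
        text += f"Since the P-value for this test is less than {alpha}, we can reject the null hypothesis."
    else:
        text += f"Since the P-value for this test is not less than {alpha}, we cannot reject the null hypothesis."
    return text


print(advise_one_sample_t("Price", 33000, 0.021))
```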

However, when I did a one-sample, upper-tail t-test of whether the average price of a vehicle exceeds $33,000, the p-value was 0.021. Assuming a 5 percent significance level, the StatAdvisor said: "The t-test tests the null hypothesis that the mean Price equals 33000.0 versus the alternative hypothesis that the mean Price is greater than 33000.0. Since the P-value for this test is less than 0.05, we can reject the null hypothesis at the 95.0% confidence level." I suspect that this last sentence refers to a one-sided 95 percent confidence interval, and I wonder whether some users might find it a bit confusing. To quote Paul Velleman et al., "Because confidence intervals are naturally two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an alpha level of ½(100 − C)%" [1].
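The arithmetic in the quoted rule is easy to check. This small sketch simply encodes alpha = ½(100 − C) and its inverse; applied to the price test above, it shows that a one-sided test at the 5 percent level lines up with a two-sided 90 percent interval, not a 95 percent one.

```python
def one_sided_alpha(confidence_pct):
    """Alpha (percent) of the one-sided test matching a two-sided C% interval: (100 - C) / 2."""
    return (100.0 - confidence_pct) / 2.0


def matching_confidence(one_sided_alpha_pct):
    """Two-sided confidence level (percent) matching a one-sided test at the given alpha."""
    return 100.0 - 2.0 * one_sided_alpha_pct


# The upper-tail test from the review: p = 0.021, 5 percent significance level.
p_value, alpha_pct = 0.021, 5.0
reject = p_value < alpha_pct / 100.0
print(reject, matching_confidence(alpha_pct))  # the matching two-sided interval is 90%
```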

Data Manipulation


Data manipulation is an important part of statistical analysis. We frequently want to analyze some subset of the data. Statgraphics offers the usual mathematical and Boolean operators. However, a true, easy-to-use if-then-else operator was not available. For example, in my automobile data, I wanted to restrict the data set to vehicles from either America or Asia with reliability rated either "good" or "fair," keeping American vehicles with gas mileage between 15 and 25 mpg and Asian vehicles with gas mileage between 20 and 25 mpg. My goal was to analyze the vehicles in this specific subset of the data. My solution? After much reading and experimenting with the program, I went to Excel, easily obtained the necessary subset, imported it into Statgraphics and did the analyses.
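To make the missing if-then-else concrete, here is a sketch of the subset rule written as a conditional filter. The column names, category labels and the inclusive reading of "between" are all assumptions for illustration.

```python
def in_subset(row):
    """If-then-else subset rule from the review; column names and labels are hypothetical."""
    if row["reliability"] not in ("good", "fair"):
        return False
    if row["origin"] == "USA":
        return 15 <= row["mpg"] <= 25  # American vehicles: 15 to 25 mpg
    elif row["origin"] == "Asia":
        return 20 <= row["mpg"] <= 25  # Asian vehicles: 20 to 25 mpg
    else:
        return False  # European (and any other) vehicles are excluded


cars = [
    {"origin": "USA", "reliability": "good", "mpg": 18},     # kept
    {"origin": "USA", "reliability": "poor", "mpg": 20},     # dropped: reliability
    {"origin": "Asia", "reliability": "fair", "mpg": 18},    # dropped: below 20 mpg
    {"origin": "Asia", "reliability": "good", "mpg": 24},    # kept
    {"origin": "Europe", "reliability": "good", "mpg": 22},  # dropped: origin
]
subset = [row for row in cars if in_subset(row)]
print(subset)
```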

Saving the analysis output is easy. Of course, I could have copied and pasted the output into Word, but Statgraphics has StatReporter, a standalone version of WordPad that generates a well-laid-out rich text format document (without the panes) containing the complete output of my session. You can edit the file in StatReporter or open it in Word for enhanced editing. It worked well, but how to invoke it is not obvious at first. Nothing on the main menu did the job; after a search in the Help system, I learned that you must right-click any pane to invoke StatReporter.

Finally, Statgraphics has a feature called StatFolio. As the User Manual describes it (page 103), "When a session is saved in a StatFolio, it is the definition of the analyses that is saved, not the output. When reopening a StatFolio, the data in the associated data sources is reread and all analyses recalculated." Thus, by opening a StatFolio, you can rerun your previous analyses at a later date, with the benefit that if the data is modified in the interim, the new data is analyzed.
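The distinction between saving analysis definitions and saving output can be shown with a toy model. This is a hypothetical sketch of the idea only, not the StatFolio file format; the analysis names and the "mpg" column are assumptions.

```python
import statistics

# Hypothetical StatFolio-like session: store *what* to compute, not the results.
statfolio = [
    ("mean", "mpg"),
    ("stdev", "mpg"),
]

ANALYSES = {"mean": statistics.mean, "stdev": statistics.stdev}


def rerun(statfolio, data):
    """Re-read the current data and recalculate every saved analysis."""
    return {(name, column): ANALYSES[name](data[column]) for name, column in statfolio}


data = {"mpg": [18.0, 22.0, 20.0]}
print(rerun(statfolio, data))  # results for the original data
data["mpg"].append(40.0)       # the data are modified in the interim...
print(rerun(statfolio, data))  # ...and reopening recalculates on the new values
```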

Statistical software products obviously do not all share the same look and feel. Getting used to a new one may be easier for some users and harder for others, especially for those who carry the mental baggage of other products in the same arena. Thus, besides missing an easy if-then function, I found:

  • Statgraphics' Help system sometimes lacking (solution: read the book!);

  • its main menu layout (i.e., Describe, Compare, Relate instead of the old standby, Analysis) a bit over-specified (solution: read the book!);

  • its pane system initially a little daunting (solution: read the book!);

  • getting the output into Word at first frustrating because nothing in the menu system led to this (solution: read the book!); and

  • its Open icon, which I expected would open a data file but which instead opens a StatFolio, nonstandard and slightly frustrating (solution: fuhgetaboutit; the sequence is File, Open, then Open Data Source, or remember the keyboard shortcut, Control-F12).

In contrast to these, I also believe:

  • the rest of the menu system and the resulting dialog boxes are easy to understand and navigate;

  • data entry, including a "no-problem" reading of Excel 2003 files, is straightforward;

  • many specific or specialized procedures (some mentioned below) are intuitive and easy to manage; and

  • getting and working with the output is simple.

In summary, I would rate Statgraphics' ease-of-learning "very good" and the ease-of-use "very good to excellent."

Capabilities


Because most readers will want more than what is offered in the Standard version, my comments are based on the Professional version. I tried most of the procedures. Here are some thoughts on a selected few.

Statgraphics does "automatic" forecasting. By this I mean, "you enter the data and the software tells you which forecasting technique is most appropriate." Using various data sets, Statgraphics showed 16 models, with the optimal parameters for each. These included trend analysis curves (linear, exponential and the simple S-curve), along with Brown's, Holt's, Winters' and several Box-Jenkins models. By default, Statgraphics tries to minimize the Akaike Information Criterion (AIC), but the user can specify another criterion. I tried an 11-year time series of monthly values. Figure 4 shows some of the output. Statgraphics recommended a Box-Jenkins procedure, model M. The StatAdvisor went on to tell me that Statgraphics "also summarizes the results of five tests run on the residuals to determine whether each model is adequate for the data. An OK means that the model passes the test. One * means that it fails at the 95 percent confidence level. Two *s means that it fails at the 99 percent confidence level. Three *s means that it fails at the 99.9 percent confidence level. Note that the currently selected model, model M, passes three tests." I found Statgraphics' time series forecasting features accurate and easy to use; they rival the capabilities of some well-regarded stand-alone forecasting programs.
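The "automatic" choice boils down to scoring every fitted model and keeping the best score. The sketch below uses the common Gaussian form AIC = n·ln(SSE/n) + 2k, up to an additive constant; Statgraphics' exact AIC definition may differ, and the model fits and parameter counts are invented for illustration (132 = 11 years of monthly values).

```python
import math


def aic(sse, n, k):
    """Gaussian-likelihood AIC up to an additive constant: n*ln(SSE/n) + 2k."""
    return n * math.log(sse / n) + 2 * k


def select_model(candidates, n):
    """Return the name of the candidate with the smallest AIC."""
    return min(candidates, key=lambda name: aic(candidates[name][0], n, candidates[name][1]))


# Hypothetical fits: name -> (sum of squared residuals, number of estimated parameters).
n = 132
candidates = {
    "linear trend": (5200.0, 2),
    "Holt's linear exp. smoothing": (4100.0, 2),
    "Winters' exp. smoothing": (2900.0, 3),
    "ARIMA(1,0,0)x(1,1,0)12": (2700.0, 3),
}
best = select_model(candidates, n)
print(best)
```

The 2k term is what keeps a model with extra parameters from winning on fit alone.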



Figure 4. The results of a time series forecast analysis.

Statgraphics also has an extensive set of statistical process control features. Besides the standard variable and attribute control charts, there are time-weighted (e.g., exponentially weighted moving average), multivariate and ARIMA control charts, among others. Gage studies, acceptance sampling, capability analysis, Pareto analysis and other SPC techniques are also available. Beyond its long feature list, the program is intuitive and easy to use. For example, my class had to analyze hospital admission waiting times for elective procedures with the goal of finding recommended standards. Figure 5 shows the output for the standard X-bar and Range charts, with the StatAdvisor indicating the out-of-control samples and the various runs. Assuming that you can isolate the assignable causes, you can manually or automatically exclude the out-of-control samples, and the software gives the new control charts, fully connected, and the accompanying numerical results and analyses.
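For readers who want to see what "exclude and recompute" does, here is a sketch of textbook X-bar and R chart limits using the standard Shewhart constants for subgroups of 2 to 5. It checks only points beyond the limits, not the runs rules Statgraphics also reports, and the waiting-time subgroups are invented.

```python
import statistics

# Standard Shewhart constants for subgroup sizes 2-5.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}


def xbar_r_limits(subgroups):
    """Return (LCL, center, UCL) for the X-bar chart and for the R chart."""
    n = len(subgroups[0])
    xbarbar = statistics.mean(statistics.mean(g) for g in subgroups)
    rbar = statistics.mean(max(g) - min(g) for g in subgroups)
    xbar_chart = (xbarbar - A2[n] * rbar, xbarbar, xbarbar + A2[n] * rbar)
    r_chart = (D3[n] * rbar, rbar, D4[n] * rbar)
    return xbar_chart, r_chart


def out_of_control(subgroups):
    """Indices of subgroups whose mean or range falls outside the current limits."""
    (xl, _, xu), (rl, _, ru) = xbar_r_limits(subgroups)
    flagged = []
    for i, g in enumerate(subgroups):
        xbar, r = statistics.mean(g), max(g) - min(g)
        if not (xl <= xbar <= xu) or not (rl <= r <= ru):
            flagged.append(i)
    return flagged


# Hypothetical waiting times, subgroups of size 4; subgroup 7 has an assignable cause.
subgroups = [[9, 10, 11, 10], [10, 11, 9, 10], [9, 11, 10, 10], [10, 10, 11, 9],
             [11, 9, 10, 10], [10, 9, 11, 10], [9, 10, 10, 11], [20, 21, 19, 20]]
flagged = out_of_control(subgroups)
cleaned = [g for i, g in enumerate(subgroups) if i not in flagged]
xbar_chart, r_chart = xbar_r_limits(cleaned)  # revised limits after exclusion
print(flagged, xbar_chart, r_chart)
```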



Figure 5. Trying to set process mean and variability standards via X-bar and Range charts.

Statgraphics can easily and accurately design a single-sampling acceptance plan for attributes that meets a specified operating characteristic curve. You enter the standard parameters (AQL, LTPD, producer's and consumer's risks, and the lot size), and the program gives the optimal sample size and acceptance number. Since the solution is approximate (it involves simultaneously solving two cumulative binomial equations), I compared the sampling plans Statgraphics found with those from several other products. They all came up with essentially the same results.
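The two simultaneous binomial conditions can be solved by a direct search, sketched below. The approach is an assumption: it uses the binomial (large-lot) model, so unlike Statgraphics it ignores the lot size, and it returns the smallest sample size that satisfies both risks.

```python
from math import comb


def prob_accept(n, c, p):
    """P(at most c defectives in a sample of n) under the binomial model."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def design_single_plan(aql, ltpd, alpha, beta, max_n=1000):
    """Smallest (n, c) with P_a(AQL) >= 1 - alpha and P_a(LTPD) <= beta.

    Binomial (large-lot) approximation: lot size is ignored here.
    """
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            # Smallest c that protects the producer for this n.
            if prob_accept(n, c, aql) >= 1 - alpha:
                if prob_accept(n, c, ltpd) <= beta:
                    return n, c
                break  # a larger c only raises P_a(LTPD), hurting the consumer more
    return None


plan = design_single_plan(aql=0.01, ltpd=0.05, alpha=0.05, beta=0.10)
print(plan)
```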

Regression analysis is, perhaps along with descriptive statistics, the mainstay for many users of statistical software. Statgraphics' output is extensive, and the StatAdvisor gives succinct but easily understandable interpretations of what the numbers indicate. For example, I particularly liked its description and advice concerning influential points. Finding a prediction interval for a new observation is very easy. You just add rows to the data that specify the values of the independent variables. Statgraphics then treats these as new observations and computes the prediction intervals. It was not obvious how to change the confidence coefficient for this interval from the default 95 percent, but it can be done (solution: don't read the book; it's not there. You must have Statgraphics show the confidence intervals for the regression coefficients and change the confidence coefficient there; the prediction intervals then use the same confidence coefficient).
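The reason one setting governs both intervals is that each uses the same t critical value. This sketch shows the textbook prediction interval for simple linear regression (one predictor, for brevity, whereas the review's models have several). The t value is supplied from a table, and the data are made up.

```python
import math
import statistics


def prediction_interval(xs, ys, x_new, t_crit):
    """Prediction interval for one new observation at x_new (simple linear regression).

    t_crit is the t critical value with n - 2 degrees of freedom, e.g. t_{0.025, n-2}
    for 95 percent; changing it changes the confidence coefficient.
    """
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    y_hat = b0 + b1 * x_new
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - half, y_hat, y_hat + half


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
lo, mid, hi = prediction_interval(xs, ys, 6, t_crit=3.182)      # 95%, df = 3
lo99, _, hi99 = prediction_interval(xs, ys, 6, t_crit=5.841)    # 99%, df = 3
print((lo, mid, hi), (lo99, hi99))
```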

Nonlinear regression analysis is available. You type in the nonlinear function and Statgraphics will find the optimal parameters. Some nonlinear functions are available from a menu, but these are mostly exponential and logarithmic models. I wanted to do a nonlinear trend analysis, via the Weibull, Gompertz and Pearl-Reed curves, of some sales data, and these specific curves were not available. Typing those equations using Statgraphics' syntax was very tedious.
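For reference, commonly cited forms of the three growth curves are given below; parameterizations vary from text to text, so treat these as one common choice rather than the exact expressions the review used or Statgraphics' own syntax. Here $t$ is time, $a$ is the saturation level, and $b$, $c$, $d$ are shape parameters.

```latex
\begin{aligned}
\text{Pearl-Reed (logistic):}\quad & y_t = \frac{a}{1 + b\,e^{-ct}} \\
\text{Gompertz:}\quad & y_t = a\,\exp\!\left(-b\,e^{-ct}\right) \\
\text{Weibull growth:}\quad & y_t = a - b\,\exp\!\left(-c\,t^{d}\right)
\end{aligned}
```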

The ANOVA features and capabilities of Statgraphics are extensive. In particular, the accompanying graphs and the StatAdvisor summaries are very illuminating. For example, the StatAdvisor gave a clear summary of Tukey's HSD procedure and gave a reason why I should use the Bonferroni procedure instead for my particular data.
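As a reminder of what the Bonferroni procedure does, the sketch below splits the family-wise significance level evenly across all pairwise comparisons. The p-values are hypothetical; in practice each would come from a pairwise t-test using the ANOVA's pooled error variance.

```python
from itertools import combinations


def bonferroni_decisions(p_values, family_alpha=0.05):
    """Flag each pairwise comparison as significant at family_alpha / m (Bonferroni)."""
    m = len(p_values)
    per_comparison = family_alpha / m
    return {pair: p < per_comparison for pair, p in p_values.items()}


origins = ["Asia", "Europe", "USA"]
pairs = list(combinations(origins, 2))  # k = 3 groups -> 3 pairwise comparisons
# Hypothetical unadjusted p-values for the three pairwise mean differences.
p_values = dict(zip(pairs, [0.004, 0.030, 0.200]))
decisions = bonferroni_decisions(p_values)
print(decisions)  # each comparison is judged at 0.05 / 3
```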

After trying Statgraphics' procedures on many data sets, I would rate the program's capabilities and accompanying features "excellent."

To summarize, despite a few learning and using "bumps," I like Statgraphics overall and believe that it competes effectively with the leading statistical software. I did call Tech Support once, about software activation, and was immediately connected to a knowledgeable and friendly person. A 30-day trial version is available, which includes the software program with no data restrictions and PDF versions of all the documentation. I recommend that you try it.

Product Information

Statgraphics Centurion XV is published by Statpoint, Inc.
Address: 2325 Dulles Corner Blvd., Suite 500, Herndon, VA 20171
Phone: 800-232-STAT
URL: www.statgraphics.com


Pricing
One License

List Price:
Standard: $695, Professional: $1,295

Academic (faculty, staff):
Standard: $295, Professional: $495

Student:
Six-month: $29.99; 12-month: $49.99; entire college career: $99.99

Upgrade prices from previous versions and certain competitive products available.

Thirty-day trial version on the Web site.

Vendor's Comments

Editor's note: It is the policy of OR/MS Today to allow developers of reviewed software an opportunity to clarify and/or comment on the review article. Following are comments by Neil W. Polhemus, Chief Technology Officer for StatPoint Technologies Inc., who directs the development of the STATGRAPHICS product line.

Jack Yurkiewicz should be commended for his thoughtful review of Statgraphics Centurion XV, for which we thank him. As noted, every statistical package has its unique features, such as our use of panes within splitter windows to organize the tables and graphs created by an analysis. A major goal when designing Statgraphics was to avoid the "window clutter" that you find in other packages, where each individual graph appears in a separate window. We also designed the main menu to correspond with how non-statisticians usually approach data analysis: not by looking for a particular statistical procedure but in terms of what they wish to do, such as "Compare" several sets of data. Users can also switch to the Six Sigma menu, where the menu headings change to "Define," "Measure," "Analyze," "Improve," "Control" and "Forecast."

Jack's comments to "read the book" are thought-provoking. After considering his observations, we have decided to create several browser-based tutorials that will instantly introduce new users to the important features of the software (such as our extensive use of the right mouse button). Spending 10 minutes with these tutorials, which will soon be available on our Web site, will help new users avoid learning "bumps."

The StatWizard has one additional feature that Jack may have missed, probably because it is just one radio button on a crowded dialog box. It is a "Search" function that presents a list of more than 750 statistics and tests that Statgraphics calculates. If one searches for "confidence intervals," a list of 41 menu items that calculate confidence intervals of various sorts will appear. If one then clicks on "Subset Analysis," it is easy to calculate the confidence intervals for each type of automobile. Understandably, finding a particular procedure in such an extensive package can be a challenge, and we will implement his suggestion to add indexes to the 160 PDF files that describe each procedure in detail.

One advancement worth mentioning is our new Multilingual Edition, which was not in full release in time to be reviewed. This edition allows users who interact with colleagues overseas to create an analysis in one language and then, with a click of the mouse, switch instantly to another, removing language barriers. The current edition supports English, French, German and Spanish, with Italian and Korean to be released soon.





Jack Yurkiewicz is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. Besides management science, he teaches business statistics, operations management and forecasting. His current interests include developing distance-learning courses for these topics and assessing their effectiveness.

References


  1. "Stats, Data and Models, Second Edition," by De Veaux, Velleman and Bock, Pearson Education, 2008, page 511.







    OR/MS Today copyright © 2008 by the Institute for Operations Research and the Management Sciences. All rights reserved.


    Lionheart Publishing, Inc.
    506 Roswell Rd., Suite 220, Marietta, GA 30060 USA
    Phone: 770-431-0867 | Fax: 770-432-6969
    E-mail: lpi@lionhrtpub.com
    URL: http://www.lionhrtpub.com


    Web Site © Copyright 2008 by Lionheart Publishing, Inc. All rights reserved.