OR/MS Today - August 2007
Software Review
Minitab 15
One of the "big beasts" of statistical computing, capable of much more than basic use.
By Wayne Holland
I recently completed a software review of JMP 6.0.3 for OR/MS Today, so I thought it would be interesting to write a comparative review of Minitab 15: a statistical package versus a statistical package. The focus of my review of JMP was on the user-friendliness of performing basic statistical analyses. I did not investigate the more advanced features because I was interested in whether JMP was a good option for the management science/operations research professional who has some data to get a handle on or some basic statistical analysis to perform. I was mainly interested in how easy it is to get something meaningful out. This, therefore, sets the tone of my analysis of Minitab. The software review editor tells me that Minitab is almost synonymous with Six Sigma and is heavily favored by practitioners (JMP is a relative newcomer to Six Sigma), but Six Sigma is not a part of my review. I am sure readers are aware that Minitab is a long-established standard in the statistical analysis business and capable of much more than basic use.
My last encounter with Minitab was 20 years ago as an undergraduate, analyzing Box-Jenkins forecasting problems on a mainframe computer. All those DOS-type commands one had to type, such as COPY C1 C2 and ARIMA (1,1), come back to me as a ghost from a past life when I used to be good at statistics! These days, desktop packages strive for menu-driven smoothness, and I was pleased to see that the new Minitab is no exception. Minitab is one of the big three (along with SAS and SPSS) in the statistical computing business. My question is: Is Minitab best left to the heavyweight statistical user, or does it have something to offer everyone else as well?
Access to all graphical tools, such as histograms, scatterplots and 3-D surface plots, is via the single menu item "Graph." Similarly, all statistical analyses are stored under the general menu item "Stat." Clicking on this opens up a sub-menu offering the following statistical analyses: basic statistics, regression, ANOVA, DOE, control charts, quality tools, reliability/survival, multivariate, time series, tables, non-parametrics, EDA, power and sample size. It is a neat, logical arrangement that allows self-contained areas of statistics to be explored without needing to understand everything before being able to make sensible progress.
Example data. To investigate Minitab, I used exactly the same data file that I used for the JMP review. It is a data file I created for student coursework. The file is an Excel spreadsheet containing the Forbes Global 2000 companies as of Sept. 20, 2005. I took the data from www.forbes.com. Figure 2 shows the first 11 rows, showing the top 10 companies and the data collected by Forbes to produce the ranking. There are four quantitative variables (sales, assets, profit and market value) and two categorical variables (country and industry sector/category). For the JMP review, I performed some exploratory data analysis, produced scatterplots and correlations, and then performed a multiple regression with validation and a hypothesis test that required the creation of a new "flag" variable to separate out data stacked in a single column. Finally, I investigated 3-dimensional plotting facilities. The intention is to repeat these exercises here and compare the ease of production and the quality of the final result.
The Excel data file appeared to be read in very easily by Minitab. It is displayed in Figure 3. For columns read in as text, "-T" is appended to the column heading. This is a useful confirmation that the data has been read in correctly. However, when I started to attempt analyses involving the Category column, C4-T, error messages were produced saying there were unequal numbers of observations in each column. I scanned down the rows and all columns appeared to stop at row 2,000. However, it finally emerged that there was a stray entry 37 rows below the end of the data set. This was careless on my part, but I was somewhat annoyed that Minitab did not fill in rows 2,001 to 2,036 with "*" to indicate that it thought there were missing values in these rows. This is what Minitab does with missing values in a data set. The fact that Minitab did not fill in these rows indicated to me that it did not consider them part of the data set and hence there should have been no problem! Also, a column of "*" beyond row 2,000 would certainly have helped me flag up this issue in less time than I wasted on it.
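For readers who would rather catch this kind of problem before importing, the check can be scripted. The following is only a minimal sketch in Python, not a Minitab feature: the helper names `last_filled_row` and `ragged_columns` are hypothetical, the table is a made-up miniature rather than the Forbes file, and it assumes the spreadsheet has already been loaded into a dictionary of columns with `None` for empty cells.

```python
# Hypothetical helpers, not part of Minitab: report the last non-empty row
# of each column so that a stray cell far below the data stands out.

def last_filled_row(column):
    """Return the 1-based index of the last non-empty cell, or 0 if none."""
    for index in range(len(column), 0, -1):
        value = column[index - 1]
        if value is not None and str(value).strip() != "":
            return index
    return 0


def ragged_columns(table):
    """Return {column name: last filled row} when the columns disagree.

    An empty dict means every column ends on the same row.
    """
    lengths = {name: last_filled_row(cells) for name, cells in table.items()}
    if len(set(lengths.values())) <= 1:
        return {}
    return lengths


# Made-up miniature of the situation described above: three rows of data
# plus one stray entry a few rows below the end of the "Category" column.
table = {
    "Sales": [10.0, 20.0, 30.0, None, None, None],
    "Category": ["Banking", "Retail", "Banking", None, None, "x"],
}

report = ragged_columns(table)
for name, last_row in report.items():
    print(f"{name}: last filled row = {last_row}")
```

A column whose last filled row is larger than the others points directly at the stray cell.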
It is reasonably intuitive to perform basic exploratory data analysis immediately, without reference to the documentation. I created bar charts, scatterplots and summary statistics to get a feel for the data. However, one feature I didn't like was in the production of a bar chart of mean sales and mean market value categorized by industry sector. What I wanted was the sectors listed across the horizontal axis, with two bars at each category to represent the relevant mean sales and mean market value. What I got was Figure 4, which is a bar chart of mean sales by category followed by a bar chart of mean market value. This required a two-stage process displayed in Figure 5 and Figure 6. It may very well be possible to produce the result I was looking for, but it is certainly not easy to find from the options offered, nor by reference to the user guide. [Editor's note: According to Jay Aubuchon, product manager at Minitab, choosing "Graph variables displayed innermost on scale" would produce the desired result in the dialog box shown in Figure 6.]
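The numbers behind the chart I wanted are simply the mean of each variable within each industry sector, laid out so that the two means for a sector sit side by side. As an illustration only, here is a short Python sketch of that grouping; the function name `grouped_means`, the field names and the three rows of data are all invented for the example and are not taken from the Forbes file or from Minitab.

```python
from collections import defaultdict
from statistics import mean


def grouped_means(rows, group_key, value_keys):
    """Return {group: {variable: mean}} with groups in sorted order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {key: mean(r[key] for r in members) for key in value_keys}
        for group, members in sorted(groups.items())
    }


# Made-up rows standing in for the spreadsheet (values are illustrative).
rows = [
    {"category": "Banking", "sales": 10.0, "market_value": 40.0},
    {"category": "Banking", "sales": 30.0, "market_value": 60.0},
    {"category": "Retail", "sales": 50.0, "market_value": 20.0},
]

means = grouped_means(rows, "category", ["sales", "market_value"])

# One line per sector, with both means together: the text equivalent of
# two bars per category on a clustered bar chart.
for category, values in means.items():
    print(category, values)
```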
At this point, I also came across another feature I didn't like: the lack of interactivity in graph manipulation. I was expecting to be able to grab axes and elongate or shrink them. However, they were entirely fixed. I could reduce the size of the box in which the chart was presented, but I couldn't enlarge it. I could make these changes by calling up the relevant menu items for re-scaling and typing in new values, but this seems very restrictive and old-fashioned in comparison with JMP. It is obviously a relic of Minitab's heritage as a mainframe computer package, but this sort of issue should have been dealt with in the transition to a PC package. [Editor's note: According to Aubuchon, this is a consequence of Minitab's choice to edit graphs like Excel, and has nothing to do with heritage.]
The production of scatterplots and a correlation matrix for the four quantitative variables sales, assets, profits and market value (Figure 7) also surprised me. Rather than offering me a default option of correlating all variables against all others, I had to fill in the table, shown in the center of Figure 8, identifying which pairs I required to view. This seems a rather cumbersome way to proceed. [Editor's note: According to Aubuchon, Graph > Matrix Plot would produce the desired result.] The required correlation matrix, with the associated p-value below each correlation, was added to the Session window. The scatterplots were created in a separate chart window.
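The "all variables against all others" default I was hoping for amounts to computing a correlation for every pair of columns. The Python sketch below shows the idea under stated assumptions: it uses the Pearson correlation, the column values are made up for illustration, the names `pearson` and `correlation_pairs` are hypothetical, and the p-values that Minitab reports are left out.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_pairs(columns):
    """Correlate every column against every other, one entry per pair."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Made-up values; the real data set has 2,000 rows per column.
columns = {
    "sales": [1.0, 2.0, 3.0, 4.0],
    "assets": [2.0, 4.0, 6.0, 8.0],
    "profit": [1.0, 3.0, 2.0, 4.0],
}

pairs = correlation_pairs(columns)
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With four variables this produces six pairs, which is the lower triangle of the correlation matrix.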
Next I wanted to experiment with two specific statistical analyses:
Multiple regression. There are various regression options easily accessible via the Stat ... Regression menu, such as stepwise, partial least squares and various logistic regression methods. However, for illustration, I directly created the multiple regression model $\text{Market Value}_i = \alpha + \beta_1\,\text{Sales}_i + \beta_2\,\text{Assets}_i + \varepsilon_i$. This is completed very intuitively and with little effort. The result is shown in Figure 9, which gives not only the model but also all the validation information required, such as a test for normality of residuals, the Durbin-Watson statistic to test for autocorrelation and a scatterplot of residuals against fitted values to check for heteroscedasticity. This is a well-handled, strong aspect of Minitab, and better, in my view, than the two-stage process required in JMP.
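To make the pieces of this output concrete, here is a minimal Python sketch of a regression with an intercept and two predictors, followed by the Durbin-Watson statistic computed from the residuals. It is an illustration under assumptions, not Minitab's implementation: the fit solves the normal equations directly, the function names are hypothetical, and the five data points are invented rather than taken from the Forbes file.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def fit_ols(y, predictors):
    """Least-squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(y, predictors, coefs):
    """Observed minus fitted values for the fitted coefficients."""
    fitted = [
        coefs[0] + sum(b * p[i] for b, p in zip(coefs[1:], predictors))
        for i in range(len(y))
    ]
    return [yi - fi for yi, fi in zip(y, fitted)]


def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


# Made-up illustrative data (not the Forbes values).
sales = [1.0, 2.0, 3.0, 4.0, 5.0]
assets = [2.0, 1.0, 4.0, 3.0, 6.0]
market_value = [4.2, 5.3, 9.1, 10.4, 14.0]

coefs = fit_ols(market_value, [sales, assets])
resid = residuals(market_value, [sales, assets], coefs)
dw = durbin_watson(resid)
print("coefficients:", coefs)
print("Durbin-Watson:", dw)
```

The Durbin-Watson statistic lies between 0 and 4, and values near 2 suggest little first-order autocorrelation in the residuals. It is most meaningful for data with a natural ordering, such as a time series.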
Hypothesis test. I was interested to see whether there was a difference in average profitability for U.S. firms compared to non-U.S. firms in the Forbes 2000 ranking. This required setting up a new column with a "flag" variable which contains either "Y" or "N" (or any bi-value pair) to represent "U.S. company" or not. This would allow the data in the Profit column to be divided into the two relevant data sets. This was fairly intuitive to achieve (it didn't require me to look in help anyway!). Via the Editor ... Formula ... Assign Formula to Column options, Figure 10 was produced showing a form to fill in to calculate the new column. The layout of this form makes it fairly obvious how to set up the necessary IF condition. Anyone who has ever used an IF statement in Excel will have no problems with this feature. The required analysis follows easily (Figure 11).
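The same two steps, building the flag column and then comparing the two groups, can be sketched in a few lines of Python. This is an illustration under assumptions rather than a record of what Minitab ran: it assumes a two-sample t-test that does not pool the variances (Welch's form), it assumes U.S. firms are labeled "United States" in the country column, the names `flag_us` and `welch_t` are hypothetical, and the profit figures are made up.

```python
from math import sqrt
from statistics import mean, variance


def flag_us(country):
    """The IF condition: "Y" for a U.S. company, "N" otherwise."""
    return "Y" if country == "United States" else "N"


def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up (country, profit) rows standing in for the stacked Profit column.
rows = [
    ("United States", 5.0),
    ("Japan", 2.0),
    ("United States", 7.0),
    ("Germany", 4.0),
    ("United States", 6.0),
    ("France", 3.0),
]

# Split the single stacked column into two samples using the flag.
profits = {"Y": [], "N": []}
for country, profit in rows:
    profits[flag_us(country)].append(profit)

t_stat, dof = welch_t(profits["Y"], profits["N"])
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Turning the statistic into a p-value needs the t distribution, which a statistics package supplies and which is left out of this sketch.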
3-D plots. Finally, I was interested in creating a 3-dimensional plot. Maybe not the most essential example, but given the data I was working with, I decided to create a plot of market value as a function of assets and sales (Figure 12). This allows for direct comparison with JMP. The surface plotting tool is perfectly adequate, but it doesn't have the interactivity of JMP. You can right-click on an axis and get a form to adjust the scale (Figure 13), or right-click on the graph and get a form that allows control of Graph Attributes, Graph Size, Figure Location and Figure Attributes. In JMP, all this is done by pointing and clicking at the figure with the mouse. It doesn't make much material difference to the final product; it's just more fun getting there!
Advanced Features
New Features of Minitab 15
Conclusion
Wayne Holland is an associate professor (senior lecturer) in operations research at Cass Business School, City University, London, U.K. He teaches quantitative methods and management science to undergraduate, MBA and Executive MBA students. His research interests are in design and analysis of simulation models to investigate risk-related issues, particularly operational risk in banking and supply chain risk.
OR/MS Today copyright © 2007 by the Institute for Operations Research and the Management Sciences. All rights reserved.