OR/MS Today - February 2007
Software Review

JMP 6.0.3
Interactive exploratory data and statistical analysis tool meets the statistical needs of virtually any user.

By Wayne Holland

When the software review editor for OR/MS Today is looking for someone to review software, I usually lock my office door and hide under the desk until I get the all-clear signal. However, when I noticed the editor was looking for someone to review JMP 6.0.3, I volunteered without being asked. Though some of my students might call me insane, there was method in my madness. I teach basic-level courses in Quantitative Methods/Business Statistics for MBA students at Cass Business School and am constantly on the lookout for the holy grail of applied statistics teaching: statistical software that is simple to use, does all the basics with menu-driven "point-and-click" functionality, does not need hours of scarce teaching time to introduce, and doesn't overwhelm students studying statistics for the first time with too much functionality, or with advanced features that, because of the layout, they stumble on before meeting the basics.

An added bonus is if the software contains lots of useful help, not only on how to use particular tests but also on their purpose. Oh, and lots of illustrative tutorials. Oh, and if it can look and feel like a Microsoft Office-type product, that would be helpful for the first-time user. In my experience, there is a psychological benefit for the new user of exploratory data analysis and applied statistics software if the interface feels familiar. I think that's everything, but I reserve the right to add more demands later!

"JMP desktop statistical discovery software from SAS uses a structured, problem-centered approach to explore and analyze data on Windows, Macintosh and Linux. The intelligent interface guides users to the right analyses.
JMP automatically displays graphs with statistics, enabling users to visualize and uncover data patterns that impact their research, development and production activities." (JMP User Guide, 2006)

The focus of this review is on the user-friendliness of performing basic statistical analyses in JMP. I do not investigate the more advanced features, though I will summarize them for interested readers. I am approaching this with the following mind-set: Is this better than using Microsoft Excel? There may be times when you need Minitab, SPSS or SAS itself for a specialized analysis, but the majority of management science/operations research users find their more modest requirements can be met with Excel. So the question is whether there is something in JMP for those users.

That said, when rescaling the axes I very quickly ended up with my data off the scale, and it took a little while to get it back. I guess I learned a lesson about the need to be more careful with the mouse when playing about with axes in JMP. Features like this impress first-time users of statistical software and encourage a feeling of confidence and a desire to explore further.

The initial screen of JMP (Figure 1) allows the user to specify how the data to be analyzed are to be entered: existing file, keyboard entry, database link, etc. It also lists the types of statistical analyses that are available: Basic, Model, Multivariate, Survival, Graph, Surface, Measure, Control, DOE (Design of Experiments) and Tables. The basic user is not going to go much beyond Basic (for single-variable analysis, t tests and simple linear regression), Model (for analysis of variance and multiple regression), Multivariate (for correlation coefficients) and Graph (for graphs!). This is useful because it limits the areas of the software the basic user needs to investigate.
The Excel data file was read very easily and correctly by JMP. The data then appeared in the JMP spreadsheet format (Figure 3). I was able to start exploratory data analysis immediately, initially without reference to the documentation. I created bar charts, scatterplots, correlation tables, etc. to get a feel for the data. Figure 4 shows a bar chart of mean sales and mean market value categorized by industry sector. This was produced very easily from the Graph/Chart menu (Figure 5).
I chose to present the mean of sales and market value, but as can be seen, I could easily have graphed any of a range of summary statistics. One thing that slightly surprised me: initially I produced a bar chart with three bars in each category (sales, market value and profit). The graph looked a little cluttered and the profit bars were all very small in comparison to the others, so I wanted to remove them. I would have thought I ought to be able to edit the graph directly to do this; unfortunately I wasn't able to. It may be possible, but it wasn't readily apparent to me. Of course, returning to the original form and starting again is not hard work, but given the emphasis on interactivity between data and graphs, I would have expected this feature to be present. The scatterplots and correlation matrix for the four quantitative variables (sales, assets, profits and market value; Figure 6) were produced in seconds. It was similarly easy to produce all the usual summary statistics (mean, standard deviation, median, quantiles, range, etc.).
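For readers who want to see what sits behind those summary statistics and the correlation matrix, here is a minimal sketch in Python using only the standard library. It is an illustration, not how JMP computes its output: the function names are hypothetical, and the quartiles use the default ("exclusive") method of `statistics.quantiles`, which may not match the quantile definition JMP reports.

```python
import statistics
from math import sqrt


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points ('exclusive' method)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Called with one list per variable, e.g. `correlation_matrix({"sales": sales, "assets": assets, "profits": profits, "market_value": market_value})` (hypothetical variable names), it returns the full symmetric matrix keyed by pairs of column names.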
Next I wanted to experiment with two specific statistical analyses:
Multiple Regression

There is a simple stepwise regression procedure available for guidance on independent variable selection. However, for illustration, I directly specified the multiple regression model:

$$\text{Market Value}_i = \alpha + \beta_1\,\text{Sales}_i + \beta_2\,\text{Assets}_i + \varepsilon_i$$

The default output for the least squares fitting tool is the Summary of Fit, Analysis of Variance, Lack of Fit and Parameter Estimates tables shown in Figure 7. By clicking the red "hotspot" button next to Response Market Value at the top, I was able to access various additional menu options to investigate model validation. I very easily found the residual-by-predicted plot I wanted for checking heteroscedasticity and the Durbin-Watson statistic for checking autocorrelation. However, I also wanted some sort of test for normality of the residuals.
Now, I'm flexible here: I don't mind chi-squared, a Q-Q plot, Lilliefors, Anderson-Darling or any of the scores of ways of testing normality; I just want to see some sort of output. Again, there may very well be easier ways of doing this, but I resorted to requesting that a column of residuals be added to the dataset and then went into the Distributions option to create a normal quantile plot (Figure 8). Easy enough to do, but I would have thought this should be tagged on to the regression output, as it's such a common part of regression model validation.
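To make the regression checks described above concrete, here is a minimal, dependency-free Python sketch; it is not JMP's implementation. It fits the model by ordinary least squares (solving the normal equations), returns the fitted values and residuals that a residual-by-predicted plot is drawn from, computes the Durbin-Watson statistic, and produces the coordinates of a normal quantile plot of the residuals. The function names are hypothetical, and the plotting position (i - 0.5)/n is one common choice that JMP may not use. Solving the normal equations directly is adequate for a small, well-conditioned example; a production implementation would normally prefer a QR decomposition for numerical stability.

```python
from statistics import NormalDist, mean


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(y, *predictors):
    """Least squares fit of y = alpha + beta_1*x_1 + ...; returns (coefficients, fitted, residuals)."""
    rows = [[1.0] + [x[i] for x in predictors] for i in range(len(y))]  # intercept column first
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, fitted, resid


def durbin_watson(resid):
    """Sum of squared successive differences over the residual sum of squares (near 2: no lag-1 autocorrelation)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)


def normal_quantile_points(resid):
    """Pair each sorted residual with the standard normal quantile at plotting position (i - 0.5) / n."""
    n = len(resid)
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(z.inv_cdf((i - 0.5) / n), e) for i, e in enumerate(sorted(resid), start=1)]


def qq_correlation(points):
    """Correlation of theoretical vs observed quantiles; values close to 1 are consistent with normality."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5
```

With hypothetical column lists, `coef, fitted, resid = fit_ols(market_value, sales, assets)` gives the estimates of alpha, beta_1 and beta_2; plotting `resid` against `fitted` gives the residual-by-predicted plot, and `normal_quantile_points(resid)` gives the points of the normal quantile plot that the review had to build by hand.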
To do this, I had to choose Column/Create Column (Figure 10) and then create a function to determine the values for this new column. This revealed a structured box (Figure 11) into which I had to click and place various items from another list, such as "Conditional" to make an if statement appear and "country" to make country appear as the expression in the if statement. Personally, and very subjectively, I dislike this kind of operation. I'm happy with menu-based systems if the point-and-click option genuinely saves me time and makes production of output simpler.
Once the user gets down to needing to create expressions/equations, I prefer to enter them directly into a cell, in the Excel format. Indeed, in Excel, setting up and copying the relevant if statement took literally a few seconds. Doing the same thing here through the various menus took considerably longer, and even then it can't all be done by point-and-click. I still needed to edit the formula and enter '=="US"' as part of the test condition to determine whether or not the country was the United States. In my opinion, if any typing is necessary, it is better to allow free-format input at a much higher level. The resulting output for my analysis is shown in Figure 12: a good, useful format containing everything I wanted to know.
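The US/non-US column built above is an ordinary dummy (0/1) variable, the same thing an Excel formula such as `=IF(B2="US",1,0)` produces (the cell reference is hypothetical). The short Python sketch below builds such a column and shows why it is useful in a regression: regressing a response on the dummy alone gives a slope equal to the difference between the two group means. The helper names are hypothetical, and the country names and market values are invented for illustration only.

```python
from statistics import mean


def dummy(values, level):
    """Hypothetical helper: 1 where the value equals `level`, else 0 (like IF(country = "US", 1, 0))."""
    return [1 if v == level else 0 for v in values]


def simple_slope(x, y):
    """Least squares slope of y on a single predictor x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


countries = ["US", "UK", "US", "Japan", "Germany", "US"]  # illustrative values only
market_value = [120.0, 80.0, 100.0, 60.0, 70.0, 110.0]     # illustrative values only
is_us = dummy(countries, "US")
slope = simple_slope(is_us, market_value)  # equals mean(US) - mean(non-US)
```

Here the US mean is 110 and the non-US mean is 70, so the slope is 40. In a multiple regression that also contains sales and assets, the dummy's coefficient is instead the US/non-US shift in the intercept, holding the other predictors fixed.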
I dwelt here perhaps a little longer than I should have. Playing with the options felt a little like a glamour photographer coaxing the best pose out of a model: "Turn towards me and hold it. That's great, darling!" On a serious note, though, tools like this make exploratory data analysis fun, and for new students, or indeed any potential user who is perhaps feeling some trepidation about getting a handle on their data set, fun is important in coaxing them into deeper usage. Now, excuse me while I play with the "Cowboy hat plot."

I've really only scratched the surface of what JMP will do. It does much, much more with the same menu-driven, interactive approach: design of experiments, statistical process control, a vast array of 2-D and 3-D plotting tools, survival models, principal components analysis, cluster analysis, time series, neural networks, non-linear models, generalized linear models, etc. I would venture to suggest that JMP would meet the statistical analysis needs of all but the most sophisticated, specialized users, and even they would probably be able to make good use of JMP through the scripting tools available.

Besides the installation guide and the 136-page introductory guide (which in practice was all I really needed to get started at the level of my analysis), there is also a 485-page User Guide, a 906-page Statistics and Graphics tome, a 243-page manual on Design of Experiments and 576 pages on Scripting. Stacked on top of each other, the manuals are 8 inches thick. I referred to about half an inch's worth. But there's more! The user also gets handy JMP Menu Description and Quick Reference Cards: 8.5" x 5" cards containing key instructions. I found them very useful. All of this documentation is also available as PDF files installed with the software for those who like online documentation. Of course, the documentation is not only about how to use JMP; it also acts as a teaching guide to the techniques themselves.
From that perspective, I would say the material is of high quality and well thought out. In terms of functionality, its position in the market is well above Excel; it is much more a competitor of SPSS and Minitab. Personally, I found it a little easier to use than SPSS in that it matched my intuition a little more closely. But you may be different!
Wayne Holland is an associate professor (senior lecturer) in Operations Research at Cass Business School, City University London. He teaches quantitative methods and management science to undergraduate, MBA and Executive MBA students. His research interests are in the design and analysis of simulation models to investigate risk-related issues, particularly operational risk in banking and supply chain risk.

References
OR/MS Today copyright © 2007 by the Institute for Operations Research and the Management Sciences. All rights reserved.
Lionheart Publishing, Inc. 506 Roswell Rd., Suite 220, Marietta, GA 30060 USA
Phone: 770-431-0867 | Fax: 770-432-6969
E-mail: lpi@lionhrtpub.com
URL: http://www.lionhrtpub.com
Web Site © Copyright 2007 by Lionheart Publishing, Inc. All rights reserved.