OR/MS Today - February 2007



Software Review


JMP 6.0.3

Interactive exploratory data and statistical analysis tool meets the statistical needs of virtually any user.

By Wayne Holland


When the software review editor for OR/MS Today is looking for someone to review software, I usually lock my office door and hide under the desk until I get the all-clear signal. However, when I noticed the editor was looking for someone to review JMP 6.0.3, I volunteered without being asked. Though some of my students might call me insane, there was method in my madness.

I teach basic level courses in Quantitative Methods/Business Statistics for MBA students at Cass Business School and am constantly on the lookout for the holy grail of Applied Statistics teaching: statistical software that is simple to use, does all the basics with menu-driven "click and point" functionality, does not need hours of scarce teaching time to introduce and doesn't overwhelm students studying statistics for the first time with too much functionality, or rather with advanced features that, due to layout, they find before meeting the basics. An added bonus is if the software contains lots of useful help options covering not only how to use particular tests, but also their purpose. Oh, and lots of illustrative tutorials. Oh, and if it can look and feel like a Microsoft Office-type product then that would be helpful for the first-time user. In my experience, there is a psychological benefit for the new user of exploratory data analysis and applied statistics software if the interface feels familiar. I think that's everything, but I reserve the right to add more demands later!

"JMP desktop statistical discovery software from SAS uses a structured, problem-centered approach to explore and analyze data on Windows, Macintosh and Linux. The intelligent interface guides users to the right analyses. JMP automatically displays graphs with statistics, enabling users to visualize and uncover data patterns that impact their research, development and production activities." (JMP User Guide, 2006)

The focus of this review is on the user-friendliness of performing basic statistical analyses in JMP. I do not investigate the more advanced features, though I will summarize them for interested readers. I am approaching this with the following mind-set: Is this better than using Microsoft Excel? There may be times when you need Minitab, SPSS or SAS itself for a specialized analysis, but the majority of management science/operations research users find their more modest requirements can be met with Excel. So, the question is whether there is something in JMP for those users.

Installation


I have written a couple of software reviews for OR/MS Today (Holland, 2003, 2005) over the past few years and unfortunately, I have always had installation difficulties. Wondering whether it is my incompetence or the increasingly fiendish cleverness of security features of modern software, I approached the installation of JMP with some trepidation. However, I need not have worried. The installation procedure could not be more simple or helpful. I had to click an "OK" button about three times before the software was installed and ready for action. It does require registration by Internet, e-mail or phone for the process to be completed, but the user has 30 days usage of the full and unrestricted package without registration. This is a model of how software installation ought to be structured.

Basic Use of JMP


JMP opens with a "Tip of the Day" — a little how-to tutorial on some basic usage feature. The first one explains the role of Red and Blue Triangles, basically contextual menu and maximize/minimize output section buttons. On the first use of JMP, the user is encouraged to work through the Beginner's Tutorial. This takes 10 minutes and is very useful to get a feel for JMP. It provides interactive exercises demonstrating how menus are structured, what is "clickable" and how to tell it is so, and a pleasing demonstration of the interactivity of graphical display and data table. By clicking on a point (or set of points) in a graph the relevant values in the data set are highlighted. Rolling the mouse over the axes of a graph allows it to be rescaled interactively.

With the axis rescaling, though, I very quickly ended up with my data off the scale, and it took a little while to get it back. I guess I learned a lesson about the need to be more careful with the mouse when playing about with axes in JMP. Still, features like this impress first-time users of statistical software and encourage a feeling of confidence and a desire to explore further.

The initial screen of JMP (Figure 1) allows the user to specify how the data to be analyzed is to be entered: existing file, keyboard entry, database link, etc. It also lists the types of statistical analyses that are available: Basic, Model, Multivariate, Survival, Graph, Surface, Measure, Control, DOE (Design of Experiments), Tables. The basic user is not going to go much beyond Basic (for single variable analysis, t tests and simple linear regression), Model (for analysis of variance and multiple regression), Multivariate (for correlation coefficients) and Graph (for graphs!). This is useful in limiting the areas of the software the basic user needs to investigate.

Operations Research / Management Science Today

Figure 1: Initial screen of JMP.

Example Data


To investigate using JMP, I selected a data file I created for student coursework last year. The file is an Excel spreadsheet containing the Forbes Global 2000 companies as of Sept. 20, 2005. The data was collected from www.forbes.com. Figure 2 shows the first 11 rows, showing the top 10 companies and the data collected by Forbes to produce the ranking. There are four quantitative variables (sales, assets, profit and market value) and two categorical variables (country and industry sector/category). As part of the coursework assessment, I asked the students to explore the data and pick out any interesting features or relationships. This is the task I set myself, hoping that I'd be able to do at least as well as my students, who analyzed it using Excel.

Figure 2: Sample of data from the Excel worksheet imported into JMP.

The Excel data file was read very easily and correctly by JMP. The data then appeared in the JMP spreadsheet format (Figure 3). I was able to start exploratory data analysis immediately, initially without reference to the documentation. I created bar charts, scatterplots, correlation tables, etc. to get a feel for the data. Figure 4 shows a bar chart of mean sales and mean market value categorized by industry sector. This was produced very easily from the Graph/Chart menu (Figure 5).

Figure 3: Excel data imported into JMP.

Figure 4: Bar chart of mean sales and mean market value by industry sector.

Figure 5: Creation of Figure 4 in JMP. A variable is selected from "Select Columns." The required statistic for that variable is selected from "Statistics." In this case, mean sales have already been selected and mean of market values is about to be added to it.

I selected to present the mean of sales and market value, but as can be seen, I could easily have graphed any of a range of summary statistics. One aspect of this that I was slightly surprised at was that initially I produced a bar chart with three bars in each category — sales, market value and profit. The graph looked a little cluttered and the profit bars were all very small in comparison to the others, so I wanted to remove them. I would have thought I ought to be able to edit the graph directly to do this; unfortunately I wasn't able to. It may be a possible option, but it wasn't easily apparent to me. Of course, returning to the original form and starting again is not hard work, but given the emphasis on interactivity between data and graphs, I would have thought this feature would be present.

The scatterplots and correlation matrix for the four quantitative variables sales, assets, profits and market value (Figure 6) were produced in seconds. It was similarly easy to produce all the usual summary statistics (mean, standard deviation, median, quantiles, range, etc.).
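For readers who want to see what sits behind a correlation matrix like the one in Figure 6, here is a minimal Python sketch. It is not JMP's implementation: the function names `pearson` and `correlation_matrix` are hypothetical, and the numbers are made-up illustrative values, not the Forbes data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {column name: list of values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative values (NOT the Forbes Global 2000 data).
toy = {
    "sales": [10.0, 20.0, 30.0, 40.0],
    "assets": [15.0, 18.0, 35.0, 50.0],
    "profit": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Each entry lies between -1 and 1, the diagonal is 1, and the matrix is symmetric, which is why packages often show only one triangle.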

Figure 6: Scatterplots and correlation matrix for sales, profits, assets and market value.

Next I wanted to experiment with two specific statistical analyses:

  1. A multiple regression explaining market value in terms of sales and assets and testing three of the assumptions of the linear model — autocorrelation, normality of residuals and homoscedasticity of residuals.

  2. A hypothesis test to investigate whether there was a significant difference between average U.S. and non-U.S. company profits.

Multiple Regression


There is a simple stepwise regression procedure available for guidance on independent variable selection. However, for illustration, I created directly the multiple regression model:

$\text{Market Value}_i = \alpha + \beta_1\,\text{Sales}_i + \beta_2\,\text{Assets}_i + \varepsilon_i$

The default output for the least squares fitting tool is the Summary of Fit, Analysis of Variance, Lack of Fit and Parameter Estimates tables shown in Figure 7. By clicking the red "hotspot" button next to Response Market Value at the top, I was able to reach various additional menu options to investigate model validation. I very easily found the residual by predicted plot I wanted to check for heteroscedasticity and the Durbin-Watson statistic to check for autocorrelation. However, I also wanted some sort of test for normality of the residuals.
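As a reminder of what the Durbin-Watson statistic measures, the sketch below computes it from a list of residuals using the standard textbook formula: the sum of squared successive differences divided by the sum of squared residuals. This is an illustration in Python, not JMP's code, and the residual values are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals for illustration only (not from the Forbes regression).
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive autocorrelation

print(durbin_watson(alternating))
print(durbin_watson(constant))
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation. It depends on the order of the rows, so it is only meaningful when that order carries information.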

Figure 7: Multiple regression output for $\text{Market Value}_i = \alpha + \beta_1\,\text{Sales}_i + \beta_2\,\text{Assets}_i + \varepsilon_i$

Now I'm flexible here: I don't mind Chi-squared, Q-Q plot, Lilliefors test, Anderson-Darling or any of the scores of ways of testing normality; I just want to see some sort of output. Again, there may very well be easier ways of doing this, but I resorted to requesting that a column of residuals be added to the dataset and then went into the Distributions option to create a normal quantile plot (Figure 8). Easy enough to do, but I would have thought this should be tagged on to the regression output, as it's such a common part of regression model validation.
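The idea behind a normal quantile plot such as Figure 8 can be sketched in a few lines of Python. This is an illustration only: the function name `normal_quantile_points` is hypothetical, the residuals are made up, and the plotting positions (i - 0.5)/n are an assumption, since packages (JMP included) may use a different convention.

```python
from statistics import NormalDist


def normal_quantile_points(residuals):
    """Pair each sorted residual with the standard normal quantile
    at plotting position (i - 0.5) / n (an assumed convention)."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Made-up residuals for illustration only.
points = normal_quantile_points([0.3, -1.2, 0.8, -0.1, 0.2])
for quantile, residual in points:
    print(f"{quantile:7.3f}  {residual:6.2f}")
```

If the residuals are roughly normal, the points fall close to a straight line when the second column is plotted against the first; systematic curvature points to skewness or heavy tails.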

Figure 8: Normal quantile plot of residuals of model in Figure 7.

Hypothesis Test


I was interested to see whether there was a difference in average profitability for U.S. firms compared to non-U.S. firms in the Forbes 2000 ranking. Finding the Y by X analysis form in the Basic menu was easy (Figure 9). However, performing a one-way analysis of Profit as the response (Y) against Country as the factor (X) would not work because, of course, the Country variable is not set up as "US"/"Non-US," but lists more than 50 different countries. I wanted to group all the non-U.S. countries into a single non-U.S. category. I'm sure there must be a simpler menu-driven way of doing this, but I couldn't find it. Instead I resorted to what I would do if I were working in Excel: create an extra column of data headed US?, which contains either "Y" or "N" (or any bi-value pair) and then perform the analysis using this column as the factor (X).

Figure 9: Fit Y by X menu for performing a t-test comparing mean profits of U.S. and non-U.S. companies. This analysis won't work because "Country" contains more than 50 different countries and not a U.S./non-U.S. bi-value.

To do this I had to choose Column/Create Column (Figure 10) and then create a function to determine the values for this new column. This then revealed a structured box (Figure 11) into which I had to click and place various items from another list, such as "Conditional" for an if statement to appear and "country" for country to appear as the expression in the if statement. Personally, and very subjectively, I dislike this kind of operation. I'm happy with menu-based systems if the point-and-click option genuinely saves me time and makes production of output simpler.

Figure 10: First stage of creating a new column called US?, which will contain "Y" or "N" depending on whether the country is the U.S. or not.

Figure 11: Creation of a conditional statement to set up the new column in Figure 10. Note that the "if" and "country" have to be selected from menus above, but the =="US", "Y" and "N" need to be typed.

Once the user gets down to needing to create expressions/equations, I prefer to enter them directly into a cell, in the Excel format. Indeed, in Excel, to set up and copy the relevant if statement took literally a few seconds. To do the same thing here going through the various menus took considerably longer, and even then it can't all be done by a point-and-click approach. I still needed to edit the formula and enter '=="US"' as part of the test condition to determine if the country was the United States or not. In my opinion, if it is necessary to do any typing, it is better to allow free format input from a much higher level. The resulting output for my analysis is shown in Figure 12. A good, useful format containing everything I wanted to know.
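The recode-then-test workflow described above can be sketched outside JMP in a few lines. The following is a minimal Python illustration, not JMP's own procedure: the rows are made-up values rather than the Forbes data, the helper names `us_flag` and `welch_t` are hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption, since the review does not say which variant JMP reports.

```python
from math import sqrt
from statistics import mean, variance


def us_flag(country):
    """Collapse the many country values into a two-level factor."""
    return "Y" if country == "US" else "N"


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Made-up (country, profit) rows for illustration only.
rows = [
    ("US", 5.0), ("UK", 3.0), ("US", 7.0),
    ("Japan", 2.0), ("France", 4.0), ("US", 6.0),
]

# Equivalent of the extra "US?" column: group profits by the Y/N flag.
groups = {"Y": [], "N": []}
for country, profit in rows:
    groups[us_flag(country)].append(profit)

t_stat = welch_t(groups["Y"], groups["N"])
print(groups)
print(round(t_stat, 4))
```

Turning the t statistic into a p-value also needs the degrees of freedom and the t distribution, which a statistics package supplies; the sketch stops at the statistic itself.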

Figure 12: Output for the hypothesis test on whether or not company mean profits are the same for U.S. and non-U.S. companies.

3-D Plots


Finally, I was interested in creating a 3-dimensional plot. Maybe not the most essential example, but given the data I was working with, I decided to create a plot of market value as a function of assets and sales (Figure 13). This surface plotting tool is wonderful. It is simple to initiate and provides great interactive experimentation. By sliding bars on the zoom and rotate menus, the surface is interactively re-presented. The dark and light circles around the edge of the plot in Figure 13 are "lights" that can be turned on or off to give lighting and shading to particular perspectives of the surface.

Figure 13: Surface plot of market value against assets and sales.

I dwelt here perhaps a little longer than I should. Playing with the options felt a little like a glamour photographer coaxing the best position out of their model — "turn towards me and hold it. That's great, darling!" On a serious note, though, tools like this make exploratory data analysis fun — and for new students, or indeed any potential user who is perhaps feeling a little trepidation about getting a handle on their data set, fun is important to coax them into deeper usages. Now, excuse me while I play with the "Cowboy hat plot."

Advanced Features


I started this review with a vague impression that JMP was a basic tool for exploratory data analysis. A swish, upmarket version of Excel. It is much more than that and is much more fairly compared with SPSS or Minitab. Indeed, in the look of the spreadsheet into which you enter data, and the creation of output reports for requested analysis, it reminded me very much of the feel of SPSS, but perhaps a little sleeker.

I've really only scratched the surface of what JMP will do. It does much, much more with the same menu-driven, interactive approach: design of experiments, statistical process control, a vast array of 2- and 3-D plotting tools, survival models, principal components analysis, cluster analysis, time series, neural networks, non-linear models, generalized linear models, etc. I would venture to suggest that JMP would meet the statistical analytical needs of all but the most sophisticated, specialized user, and even they would probably be able to make good use of JMP using the scripting tools available.

Quality of Documentation


A lot of modern software comes with online support only. Environmentally sound, for sure, but for old-fashioned users like me, the comfort of exploring new software with a big, thick manual in hand can't be beaten. JMP certainly provides a lot of comfort from that perspective!

Besides the installation guide and the 136-page introductory guide (which in practice was all I really needed to get started at the level of my analysis), there is also a 485-page User Guide, a 906-page Statistics and Graphics tome, a 243-page manual on Design of Experiments and 576 pages on Scripting. Laid on top of each other, the manuals are 8" thick. I referred to about half an inch's worth. But there's more!

The user also gets handy JMP Menu Description and Quick Reference Cards: 8.5" x 5" cards containing key instructions. I found them very useful. All of this documentation is also available as pdf files installed with the software for those who like online documentation. Of course, the documentation is not only about how to use JMP, but also acts as a teaching guide to the techniques themselves. From that perspective, I would say the quality of the material is high and well thought out.

Comparison with JMP 5


I am a new user to JMP, so am not familiar with what has changed between versions 5 and 6. Usually upgraded software comes with some advertisement pronouncing the improvements and benefits in the software over the past version. This seems to be absent from both the documentation and JMP's Web site. There is a mention of "improved functionality," but what that is I cannot shed further light on.

Conclusion


JMP by SAS is an interactive exploratory data and statistical analysis tool. It has a smooth and professional feel and will meet the statistical needs of virtually any user. It is fairly easy to get into and produce meaningful output. However, its power probably means that the genuinely new user of statistical analytical software will need some guidance about which elements can be left unexplored for a while.

Its position in the market in terms of functionality is well above Excel. It is much more a competitor of SPSS and Minitab. Personally, I found it a little easier to use than SPSS in that it matched my intuition a little more closely. But you may be different!

Product Information

JMP is available from SAS Institute, Inc.
Address: SAS Campus Drive, Cary, NC 27513
Phone: 919-677-8000
Fax: 919-677-4444
E-mail: jmpsales@jmp.com
URL: www.jmp.com
JMP is also available worldwide through SAS country offices.
 
Pricing:*
Professional version (JMP 6.0.3):
Commercial: $1,195 perpetual (includes full documentation and 1-year SAS tech support)
Academic: $595 perpetual (includes full doc and 1-year SAS tech support)
Student: $29.95 (6 months), $49.95 (12 months) from www.e-academy.com (U.S. only)
 
Student version (JMP 6 Student Edition):
$29.95 (suggested price, 12 months; packaged with textbooks from major publishers)
* Pricing is for U.S. only. Pricing may vary; check with your local SAS office for more information.
 
JMP 6 system requirements
Windows
OS: Windows XP, Windows 2000, Windows NT 4.x with Service Pack 6
CPU: Pentium II or equivalent processor
RAM: 128 MB minimum, 256+ MB recommended
Drive Space: 110 MB minimum
Browser: Microsoft Internet Explorer 5.01 or higher
Database: UNICODE compliant ODBC 3.0 or higher (required only if connecting to database)
 
Macintosh
CPU and OS: PowerPC G3, G4 or G5 Processor with Mac OS X 10.3 or higher; Intel processors with Mac OS X 10.4.8 or higher
RAM: 128 MB minimum
Drive Space: 110 MB minimum
Database: UNICODE compliant ODBC 3.0 or higher (required only if connecting to database)
 
Linux
OS: Red Hat 9.0, Fedora Core 1 or higher; SuSE 9.0, 9.1, 9.2, 9.3; Mandrake 9.0, 9.1; Red Hat Advanced Server 3.0 or higher
Kernel: Linux kernel 2.4.20 or higher; compatible with the KDE and Gnome desktop environments
CPU: Pentium II or equivalent processor
RAM: 128 MB minimum, 256+ MB recommended
Drive Space: 110 MB minimum
Database: UNICODE compliant ODBC 3.0 or higher (required only if connecting to database)

Vendor Comments

Editor's note: It is the policy of OR/MS Today to allow developers of reviewed software an opportunity to clarify and/or comment on the review article. Following are comments by Curt Hinrichs, manager, JMP Academic Programs.

We love it when our users enjoy using our product. Professor Holland's analogy — that viewing JMP graphs is like coaxing the best position out of a model during a photo shoot — describes the beauty of the interactivity inherent in JMP very well.

Your readers may also be interested to know that we will be launching JMP 7 in 2Q 2007. If Wayne thought version 6.0.3 was impressive, his socks will be knocked off by the new and enhanced interactive graphics of JMP 7. Some highlights to look for: the new animated bubble plot that shows data in many dimensions; the new data filter for dynamic selection of data and the ability to animate graphs; and an updated scatterplot matrix and three-dimensional scatterplot. JMP 7 will also feature improved integration with SAS allowing JMP to run or create SAS programs and import data for exploration seamlessly.

Another new development at SAS is the launch of JMP 6 Student Edition in February 2007. JMP 6 Student Edition is a streamlined version for first year statistics courses in business (both undergraduate and MBA) and engineering among others. JMP 6 Student Edition is available from textbook publishers at a nominal cost when packaged with your preferred textbook. For more information, visit our Web site at www.jmp.com.

A complete series of JMP training courses is available through SAS Education. For first-time users, a free webinar is offered every Friday at 1 p.m. Eastern time. Visit our Web site for schedules and detailed information.

Whether you're considering data analysis software for a statistics course or for large-scale commercial applications, JMP was developed by SAS to provide users with both robust statistics and a powerful exploratory and data visualization tool.





Wayne Holland is an associate professor (senior lecturer) in Operations Research at Cass Business School, City University London. He teaches quantitative methods and management science to undergraduate, MBA and Executive MBA students. His research interests are in design and analysis of simulation models to investigate risk-related issues, particularly operational risk in banking and supply chain risk.

References


  1. Holland, W., 2003, "Software Review: @Risk Version 4.5 Pro," OR/MS Today, Vol. 30, No. 1, pp. 52-55.
  2. Holland, W., 2005, "Software Review: Crystal Ball v 7.0.1 Professional," OR/MS Today, Vol. 32, No. 2, pp. 54-57.
  3. "JMP 6 Introductory Guide," 2006, provided with software.







    OR/MS Today copyright © 2007 by the Institute for Operations Research and the Management Sciences. All rights reserved.


    Lionheart Publishing, Inc.
    506 Roswell Rd., Suite 220, Marietta, GA 30060 USA
    Phone: 770-431-0867 | Fax: 770-432-6969
    E-mail: lpi@lionhrtpub.com
    URL: http://www.lionhrtpub.com


    Web Site © Copyright 2007 by Lionheart Publishing, Inc. All rights reserved.