Recently, I have noticed that companies are asking about SPECjAppServer2004 results for our Enterprise Application Platform. While this isn't entirely new, it seems to be happening more often. In most cases, prospective customers don't actually understand that what they are asking for isn't necessarily relevant to their environment. So, exactly what's wrong with this benchmark?
Well, before answering that question directly, let's first talk about benchmarks in general. Industry standard benchmarks are created through a consensus process within the respective organization. That process involves vendors that all have a stake in the outcome, so each brings an agenda that plays to its products' strengths (and tries to avoid its products' weaknesses). Of course, no one vendor gets everything it wants, but the benchmarks are not created independently, without regard for any individual vendor's products. The vendors have influence, and that skews the benchmark from the beginning. Besides the vendor influence on the creation of benchmarks, the actual implementation of the benchmarks is not very realistic compared to a business application. At least not any business application that I have ever written or seen.
I spent over 21 years in IT developing custom business applications before joining JBoss (now the middleware business unit of Red Hat), and I can tell you from experience that benchmarks do not reflect real world business applications. In fact, they rarely have anything but trivial business logic in them. They also don't reflect the technology that gets used by customers. They reflect the technology that is either in a specification, or the technology that particular vendors would like to push. This is especially true when the vendors don't have other alternatives of their own.
The other problem with benchmarks, in general, is that the numbers that any vendor produces with them will not translate directly to anything meaningful in your business. Is something that holds no direct meaning to your business a criterion that you should use in making a decision?
After contemplating my last, albeit rhetorical, question, let's get back to the initial one. So, what is wrong with the SPECjAppServer2004 benchmark?
The SPECjAppServer2004 benchmark suffers from all the ills that almost all benchmarks suffer from. First, it looks a lot like TPC-C, as it appears to be modeled after it, but written in Java, using J2EE 1.4 technologies. So, it's not a realistic representation of a business application. Second, its business logic is trivial, and in no way compares to the typical complexities of business logic in real world applications.
All the real-world applications that I have worked on had millions of lines of business logic, along with millions of lines of code that were more technical in nature, for interfaces to other systems, persistence, transaction processing, etc.
Third, it is clear that the author, or authors, of this benchmark have never written a business application in their lives. Or if they have, it was a very poor one indeed. The code uses floating point numbers to represent dollars and cents! Ouch! Real world business applications written in Java would use the BigDecimal class to represent dollars and cents. Using native floating point types will certainly make the benchmark faster than using BigDecimal, which uses arbitrary-precision arithmetic, but it will produce results that aren't correct. The more complex the real world calculations are in your business application, the larger the calculation errors will be.
For example, I developed an application where there were complex discounting schemes with discount percentages carried out to four decimal places. The calculation had to be applied back to the original charges (not just a total), and in doing that you had to go back over every detail charge (there could be millions and millions per account). In order to do that you had to truncate values, roll the remainder down, and only round at the end, so that it came out correctly, and no matter what kind of slicing and dicing you did, reporting-wise, the totals would always foot. You simply cannot do those kinds of calculations with floating point numbers with any accuracy at all!
Finally, the benchmark is heavily dependent on EJB 2.x Container Managed Persistence (CMP).
This is the perfect example of a benchmark utilizing technology that is not relevant to customers. With the extreme limitations of EJB QL for EJB 2.x CMP, there aren't many real world applications that can use CMP. In our own customer base, the vast majority have turned to alternative ORM technology like Hibernate. We also know that this is true in WebSphere and WebLogic shops as well. I think it's also illustrated by the fact that BEA (years before the acquisition by Oracle) announced support for Hibernate with WebLogic! Do you think they would do that if their customers were using CMP? Just like ours, their customer base turned its back on EJB 2.x entity beans with CMP a long time ago.
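Going back to the dollars-and-cents problem for a moment, here is a minimal Java sketch of what goes wrong with floating point and of the truncate-and-roll-down idea described earlier. It is not taken from the benchmark or from any real billing system. The class name MoneyDemo, the helper allocateDiscount, and the sample charges and rate are all hypothetical, and the allocation is only one plausible reading of "truncate, roll the remainder down, and only round at the end".

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; nothing here comes from SPECjAppServer2004 itself.
class MoneyDemo {

    // Adds ten cents `times` times using binary floating point.
    static double sumAsDouble(int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) {
            total += 0.10; // 0.10 has no exact binary representation
        }
        return total;
    }

    // Same sum with BigDecimal, built from a String so the value is exact.
    static BigDecimal sumAsBigDecimal(int times) {
        BigDecimal tenCents = new BigDecimal("0.10");
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < times; i++) {
            total = total.add(tenCents);
        }
        return total;
    }

    // Assumed scheme: apply the discount rate to each detail charge, truncate
    // to whole cents, carry the sub-cent remainder down to the next charge,
    // and round only on the last one so the line items foot to the total.
    static List<BigDecimal> allocateDiscount(List<BigDecimal> charges, BigDecimal rate) {
        List<BigDecimal> discounts = new ArrayList<>();
        BigDecimal carry = BigDecimal.ZERO;
        for (int i = 0; i < charges.size(); i++) {
            BigDecimal exact = charges.get(i).multiply(rate).add(carry);
            boolean last = (i == charges.size() - 1);
            BigDecimal cents = exact.setScale(2, last ? RoundingMode.HALF_UP : RoundingMode.DOWN);
            carry = exact.subtract(cents); // remainder rolled down to the next charge
            discounts.add(cents);
        }
        return discounts;
    }

    public static void main(String[] args) {
        System.out.println("double:     " + sumAsDouble(10));     // slightly off from 1.0
        System.out.println("BigDecimal: " + sumAsBigDecimal(10)); // exactly one dollar

        List<BigDecimal> charges = Arrays.asList(
                new BigDecimal("10.00"), new BigDecimal("10.00"), new BigDecimal("10.00"));
        // A rate carried out to four decimal places, as in the discounting example.
        System.out.println("discounts:  " + allocateDiscount(charges, new BigDecimal("0.3333")));
    }
}
```

With three 10.00 charges and a rate of 0.3333, rounding each line on its own would give three discounts of 3.33, which add up to 9.99, while the discount on the 30.00 total rounds to 10.00. Carrying the remainder down keeps the line items and the total in agreement, and it relies on the exact decimal arithmetic that BigDecimal provides.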
When you wrap all these things up, what value does a SPECjAppServer2004 result actually provide? Well, I think we can safely say that it doesn't provide any value. In my experience, the best thing that any customer can do is run their own application against the various middleware platforms, and compare those results. It is what I always did when I was in IT. An industry standard benchmark might be a tempting shortcut, but in this case it really isn't going to tell you anything meaningful.
There simply is no substitute for seeing your own workload running on the potential solutions!