Blog: Dan E. Linstedt

Dan Linstedt

Bill Inmon has given me this wonderful opportunity to blog on his behalf. I like to cover everything from DW2.0 to integration to data modeling, including ETL/ELT, SOA, Master Data Management, Unstructured Data, DW and BI. Currently I am working on ways to create dynamic data warehouses, push-button architectures, and automated generation of common data models. You can find me at Denver University, where I participate on an academic advisory board for master's students in IT. I can't wait to hear from you in the comments of my blog entries. Thank you, and all the best, Dan Linstedt http://www.COBICC.com, danL@danLinstedt.com

About the author

Cofounder of Genesee Academy, RapidACE, and BetterDataModel.com, Daniel Linstedt is an internationally known expert in data warehousing, business intelligence, analytics, very large data warehousing (VLDW), OLTP, and performance and tuning. He has been the lead technical architect on enterprise-wide data warehouse projects and refinements for many Fortune 500 companies. Linstedt is an instructor at The Data Warehousing Institute and a featured speaker at industry events. He is a Certified DW2.0 Architect. He has worked with companies including IBM, Informatica, Ipedo, X-Aware, Netezza, Microsoft, Oracle, Silver Creek Systems, and Teradata. He is trained in SEI/CMMI Level 5, and is the inventor of the Matrix Methodology and the Data Vault data modeling architecture. He has built expert training courses, has trained hundreds of industry professionals, and is the voice of Bill Inmon's blog at http://www.b-eye-network.com/blogs/linstedt/.

My good friend Richard Winter just published a document about Oracle, Exadata, and scalability.  Don't take this the wrong way, but I believe the findings are lopsided at best.  I hold Richard in the highest regard for exercising VLDB systems, but this report is clearly aimed at highlighting what Oracle does best, and it is missing crucial information about very large systems performance that I've been asking about for years.

The report is here: http://www.oracle.com/corporate/analyst/reports/infrastructure/bi_dw/winter-exadata-performance.pdf

You can read it for yourself.  First, I have to give kudos and credit to Oracle for finally recognizing that Infiniband networking is needed for high bandwidth, and that high-speed disk (such as SATA or internal SCSI) is also needed for Oracle to perform.  The throughput numbers are impressive.  However, the report itself fails to test the following:

1) High-performance batch load. Where are the performance numbers for high-performance batch loads, or for parallel loads executing against the device?  How many BIG batch loads can execute in parallel before the upper limits of the machine and Oracle are reached?

2) Performance of near-real-time transaction feeds.  How many feeds can be consumed?  What's the maximum throughput rate?  What's the upper limit on the number of parallel feeds and the number of transactions per second that can be "added" to the data warehouse?

3) Mixed-workload performance tests.  What happens to query performance when one or both of the above loads take place WHILE querying?  How big is the impact on the system?  What happens to the logs and the temp space?  Do we end up with CPU-bound operations?  (A rough sketch of this kind of test follows the list.)
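
To make the mixed-workload point concrete, here is a minimal sketch of the kind of probe I would want to see run, written in Python with the cx_Oracle driver. The connection string, table names, worker counts, and SQL are hypothetical placeholders, not anything from the Winter paper; the idea is simply to drive batch-style loads and reporting queries against the same instance at the same time and see what happens to query latency.

```python
# Hypothetical mixed-workload probe (placeholders throughout): run batch-style
# loads and reporting queries concurrently and record query response times.
import threading
import time

import cx_Oracle  # Oracle driver; any DB-API 2.0 driver would work the same way

DSN = "dw_user/dw_password@exadata-host/dwh"   # placeholder connection string
stop = threading.Event()

def batch_loader(batch_size=100_000):
    """Continuously push batches of rows into a (hypothetical) staging table."""
    conn = cx_Oracle.connect(DSN)
    cur = conn.cursor()
    batch = [(i, "payload") for i in range(batch_size)]
    while not stop.is_set():
        cur.executemany("INSERT INTO fact_stage (id, payload) VALUES (:1, :2)", batch)
        conn.commit()

def query_worker(latencies):
    """Run a representative reporting query in a loop and record elapsed time."""
    conn = cx_Oracle.connect(DSN)
    cur = conn.cursor()
    while not stop.is_set():
        start = time.time()
        cur.execute("SELECT COUNT(*) FROM big_fact WHERE sale_date > SYSDATE - 30")
        cur.fetchall()
        latencies.append(time.time() - start)

latencies = []
workers = [threading.Thread(target=batch_loader) for _ in range(4)] + \
          [threading.Thread(target=query_worker, args=(latencies,)) for _ in range(8)]
for w in workers:
    w.start()
time.sleep(600)   # let the mixed workload run for ten minutes
stop.set()
for w in workers:
    w.join()
print(f"queries completed: {len(latencies)}, "
      f"avg latency: {sum(latencies) / max(len(latencies), 1):.2f}s")
```

Compare those latencies against a query-only baseline run; the difference is the cost of the concurrent load, which is exactly the number the report doesn't give us.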

These are all things that Richard is very familiar with testing.  I have a feeling that Oracle didn't sanction these tests, or that somehow they were simply "removed" from the paper.  Again, Oracle marketing has stepped forward - it shows the Exadata appliance in the right light, but it doesn't provide enough information to lead to sound decision making (in terms of: should we invest in or purchase this appliance or not?).

One more piece I can't understand is the star schema that was put forward at the end of the report.  What appear to be "dimensions" are EXTREMELY narrow; they almost look like fact tables.  This star does not look like any star I see on customer sites.  The first FACT table appears to house data that is not "fact based" and is extremely wide.  Of course Oracle will eat this up, as the dimensions can almost be "pinned in RAM".  Where is the "type 2" nature of the data in the dimensions?

Typically we at least see a customer dimension with multiple versions of the customer address, and then we apply that to millions of customer rows - but nope, here the fact table is the only one with billions of rows.
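
As a rough back-of-the-envelope illustration of why that matters (the figures below are hypothetical, not taken from the Winter paper), type 2 change tracking multiplies dimension row counts, and a realistic customer dimension quickly becomes far too large to pin in RAM:

```python
# Hypothetical type 2 customer dimension sizing; none of these figures
# come from the Winter paper.
customers = 50_000_000        # distinct customers
avg_versions = 4              # address/status changes kept as type 2 rows
bytes_per_row = 400           # a reasonably wide dimension row

dim_rows = customers * avg_versions
dim_size_gb = dim_rows * bytes_per_row / 1024**3

print(f"type 2 customer dimension: {dim_rows:,} rows, ~{dim_size_gb:.0f} GB")
# -> type 2 customer dimension: 200,000,000 rows, ~75 GB
```

A dimension in that range joins and scans very differently from the narrow, RAM-resident dimensions in the report's schema.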

Ok, maybe I'm being too harsh, and if so - my apologies. But I'm just really frustrated with the marketing of all of these companies that say "the world's fastest and largest database appliance/engine..." and then fail to tell the whole story.

What did you take away from reading the report?  Is it biased?  Is it one-sided?  Or is it spot-on, providing the full answers?

Curious,

Dan Linstedt

 


Posted March 5, 2009 4:16 AM
Permalink | 2 Comments

2 Comments

As a member of the team from the Oracle side of the testing you are blogging about, I need to point out that the primary goal of the test was to a) support our claims that Exadata breaks the disk throughput bottlenecks suffered by conventional storage, and b) show that Exadata can do so while servicing complex, concurrent queries.

We didn't propose that the schema was a state-of-the-art star schema design. In fact, we preferred to have some non-optimal qualities in the schema and the queries alike to bolster our position that Exadata is not a benchmark special. That is, it does what we say it does under non-optimal conditions (vis-à-vis schemas and queries). Humans (even in production!) don't always implement perfection. It is a good thing when storage does not exacerbate the problem.

The bane of DW/BI with Oracle has historically been the conventional storage I/O bottleneck. Since this study showed concurrent, non-trivial SQL queries driving Oracle Database 11g throughput at 14 GB/s from a single 42U rack configuration, don't you think it makes the point quite well? Remember, focus on the proof point the study was aimed at.

Let me ask, do you think that queries against some other schema would somehow not benefit from the I/O bandwidth this study proved? Especially if the plans used the same access method (full)?

I/O is I/O; storage has no idea why it is doing the kind of I/O it is doing (e.g., full scan, range scan, etc.).

There will be other proof points along the lines of those you assert *should* have been part of this study.


http://kevinclosson.wordpress.com/exadata-posts/


The views expressed in this comment are my own and do not necessarily reflect the views of Oracle. The views and opinions expressed by others on this comment thread are theirs, not mine.

@Dan

Please don't take this the wrong way, but I believe there are several points this article has missed or got wrong.

"high speed disk (such as SATA or SCSI Internal) is also needed for Oracle to perform"

I'm really confused about this statement.

How are SATA drives high speed? The SATA drives that Exadata uses are 7200 RPM SAS Midline drives (SATA drives, SAS interface). Almost any 15k RPM FCAL drive in a storage array can outperform a 7200 RPM SATA drive. Also, who uses internal SCSI drives these days - unless you are referring to internal SAS (Serial Attached SCSI), but that is unclear from your wording. But you are missing the point. Oracle's performance claims for Exadata storage are not simply about fast hard disk drives, because those HDDs are really not any faster than other HDDs on the market. It is about the entire package. The Exadata software appears to be able to push the hardware (HDDs and RAID card) pretty close to its physical capabilities while getting near theoretical maximum throughput rates.

"One more piece I can't understand is the Star Schema..."

I see the title of the Winter paper as "Measuring the Performance of the Oracle Exadata Storage Server", not "How Best To Design a Star Schema". Even so, I think you need to go back and look at the row counts and sizes of the tables and reread your comments. First, the fact tables look to be ALL_CARD_TRANS (a 172-billion-row, 13.5 TB table), PARTNER_MERCHANT_SALES, and OUR_SALES. The "extremely wide fact table" you mention looks to be CUSTOMER_FACT, but it does not appear to me to actually be a fact table at all (perhaps it could have a better name, but that seems to me to be irrelevant for the purpose of the paper).

Dan, I apologize if I'm being too harsh on the article, but I think it contains too much emotion and criticizes the Winter paper for not covering everything and the kitchen sink when that was not the intention of the paper. Perhaps you should go back and reread the "Contents Of This Report" section of the Winter paper. I think your efforts would be best spent critiquing what the paper did cover, and worrying less about what it did not cover. The latter is certainly an unbounded argument.

Personally, I found the paper to be informative and demonstrative of the performance the product is capable of. It speaks to the advantages it has over current storage architectures, which I personally feel are one of the major problems with BI/DW storage solutions today. How many solutions in place today can sustain a 10 GB/s or higher read rate, as Figure 3 on page 9 demonstrates, using a single 42U rack of hardware? Doesn't that seem quite amazing? It does to me. It surely is much more than high-speed disk drives and Infiniband.
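
To put that read rate in perspective with a rough back-of-the-envelope calculation (the per-drive figure is an assumption of mine, not something from the paper):

```python
# Rough perspective on a sustained 10 GB/s scan rate (assumed figures).
scan_rate_gb_s = 10      # sustained read rate shown in the paper's Figure 3
per_drive_mb_s = 120     # assumed sequential throughput of a conventional HDD

drives_flat_out = scan_rate_gb_s * 1024 / per_drive_mb_s
print(f"equivalent of ~{drives_flat_out:.0f} drives streaming flat out, "
      "plus a fabric wide enough to carry it to the database")
# -> equivalent of ~85 drives streaming flat out, plus a fabric wide enough ...
```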
