Why Stars Work A Technical Discussion of Star Schema Designs

Originally published June 15, 2009

 “It is the stars, the stars above us, that govern our conditions.”
~William Shakespeare

Need

Business users need to query the data warehouse and to create reports with little or no intervention from IT. Adopting a reporting and online analytical processing (OLAP) tool that engages a full semantic layer with cubing capabilities will improve the business user experience and lessen the reliance on IT. OLAP tools, while capable of connecting to a third normal form (3NF) database, prefer a star schema-designed database. The goal of any semantic layer should be to produce user-defined queries with only inner JOINs in the SQL statement.

Star Schema Data Models

Star schema data warehouse models are designed for analytics. Specifically, a star schema:
  • Is very easy to understand, even for non-technical business managers.

  • Provides better performance and shorter query times.

  • Is extensible and will handle future changes more easily than the source transactional system.

Why It Works

In the star schema design, the fact table contains only keys and additive facts, or integers and some decimal numbers. The keys are also numbers. Databases store numbers very efficiently. A star schema uses two kinds of database tables: a dimension table and a fact table. Each type of table plays a unique role in the overall design allowing an efficient inner JOIN only.

Dimensions can be thought of as categories. Dimensions usually have hierarchies within them. For example, a Campus dimension could have University (all campuses) at the top of the hierarchy. University can then be broken down into many Campuses and Colleges and can then be further broken down into many Departments and Majors.

A fact table contains the measurements or metrics or facts of the business processes, in this case student registrations. A measurement of this business process would be student FTE, revenue and headcount. The tuition revenue is also captured in the fact table.

Star SQL

Star schemas are extremely user-friendly and allow users to generate SQL statements that are efficient and can be processed easily by a database. Using a star schema and a business intelligence (BI) tool, users will generate queries like the samples provided here. This kind of query technique will allow users to run queries across different units of the business. Users may change columns needed in the report and may apply different filters in the WHERE clause noted here. From an IT perspective, users can generate a standard and yet optimized query with great flexibility and without much IT intervention.

Here are two examples of the expected SQL statement. The first is a query against one star and the second of a cross-star join.

 
alt




alt

 

  • Christina Rouse, Ph.D.Christina Rouse, Ph.D.

    Christina is the Chief Architect at  Incisive Analytics, LLC. An improvement catalyst, Chris applies business intelligence strategy for performance improvement. Leveraging two decades of data experience on a broad range of technical platforms, she developed a technology-agnostic approach to business intelligence consulting. Clients rave about Chris' unique blend of business acumen, technical architect and trainer skills. She may be contacted at  Christina.Rouse@IncisiveAnalytics.com.

Recent articles by Christina Rouse, Ph.D.

 

Comments

Want to post a comment? Login or become a member today!

Posted June 22, 2009 by Christina Rouse chris.rouse@incisiveanalytics.com

Well said Andrea and your advice about the caution in a cross star join is heeded.  For the results to be accurate the fact table grains must indeed be the same.  Some of the new OLAP packages on the market are addressing this very problem.  For example, Microsoft's Analysis Services 2005 and 2008 allows two or more different grain level fact tables to exist in the same cube (data source view).  To manage the problem, one must manually control the alignment of facts to dimensions in the dimension matrix (cube designer).  Business Objects does a nice job of allowing independent queries to be executed and then a desktop join of the results.  When all else fails, the UNION works well.  Consider UNION ALL over UNION as it does not require a disk sort.  Good advice Andrea! ~ Chris

Is this comment inappropriate? Click here to flag this comment.

Posted June 22, 2009 by ANDREA VINCENZI andrea.vincenzi@tiscali.it

Christina, unless I didn't fully understand your example, I think that any discussion of cross-star queries should include a warning of the problems that can arise from different granularities of the fact tables, otherwise called "Chasm Trap". In my experience, querying more than one star at a time shoud be considered with great caution, and the query should be structured as a union.

I'd like to hear if you have found this kind of queries to be working (i.e. delivering the correct results with reasonable response times)

Andrea

Is this comment inappropriate? Click here to flag this comment.