Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration and database technology. Currently my special interests include virtual data warehousing, mashups and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence (BI), application integration and database technology. He is managing director and founder of R20/Consultancy. He is an internationally acclaimed speaker who has lectured worldwide for the last 25 years. He is the chairman of the successful annual European Business Intelligence and Data Warehouse Conference held in London and the annual BI event in The Netherlands. Currently, he is promoting a new architecture for data warehousing called the Data Delivery Platform. He is the author of several books on computing, including the popular Introduction to SQL. Some of his books are available in English, Chinese, Dutch, Italian and German. He is also the author of the successful books SQL for MySQL Developers and The SQL Guide to Ingres. Rick may be contacted by sending an email to info@r20.nl.

Editor's Note: More articles and resources are available in Rick van der Lans' BeyeNETWORK Expert Channel. Be sure to visit today!

Many times we criticize users for having poor or no definitions at all for their concepts, and we can even get upset if different users of the same organization use different definitions for the same concept. However, can we say with certainty that we are doing a good job with respect to definitions in our own field? I am not so sure. It's more like the pot calling the kettle black. In the world of business intelligence and data warehousing, many concepts have been defined poorly or not at all, including those concepts we use daily. Obviously, this always leads to confusing discussions.

 

A good definition of a concept satisfies several requirements, one of which is reversibility. Suppose that we have the following abstract definition: "A is text". Reversibility means that everything that satisfies the text is also an A. Take for example the concept of the African elephant (Loxodonta). A possible definition would go along the lines of "a big herbivore with a trunk, tusks, big ears, and big feet". So each animal satisfying these requirements is an African elephant by definition. Only having a trunk is not sufficient; it must have tusks, big ears, and big feet as well.

 

With a decent definition we want to include the correct concepts and exclude the wrong ones. For example, from the above definition of the African elephant we can conclude that the savannah elephant is indeed an African elephant. However, by including big ears as a requirement, we exclude the Asian elephant, and rightfully so. By demanding that a concept's definition be reversible, we ensure that the wrong concepts are excluded.

 

Unfortunately, in our world not all definitions are reversible. Let's take as an example Bill Inmon's well-known and frequently used definition of a data warehouse: "A data warehouse is a subject oriented, integrated, non volatile, time variant collection of data for management's decision making". This definition is not reversible. Suppose a user creates a spreadsheet containing customer data (subject-oriented) that has been brought together from different systems (integrated), that remains unchanged the entire time (non-volatile), and that contains historical data (time variant). If, in addition, this spreadsheet has been developed to support decision making, then it satisfies all the requirements in the definition. Ergo, this spreadsheet is a data warehouse. In fact, a lot of data marts that have been created would also satisfy this definition. However, I don't think this is Inmon's intention. In short, the definition is too "wide".
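
The spreadsheet argument can be made concrete by treating the definition as a test that any collection of data either passes or fails. The following is only a minimal sketch in Python: the class `DataCollection`, the function `satisfies_inmon_definition`, and the two example collections are hypothetical names made up for this illustration, and the five flags are simply the five criteria from the quoted definition.

```python
from dataclasses import dataclass


@dataclass
class DataCollection:
    """Hypothetical description of some collection of data."""
    name: str
    subject_oriented: bool
    integrated: bool
    non_volatile: bool
    time_variant: bool
    supports_decision_making: bool


def satisfies_inmon_definition(c: DataCollection) -> bool:
    """Return True when the collection meets every criterion in the definition."""
    return (
        c.subject_oriented
        and c.integrated
        and c.non_volatile
        and c.time_variant
        and c.supports_decision_making
    )


# The customer spreadsheet from the text: it ticks every box.
spreadsheet = DataCollection("customer spreadsheet", True, True, True, True, True)

# An assumed counter-example that fails every criterion.
order_entry_db = DataCollection("order entry database", False, False, False, False, False)

for collection in (spreadsheet, order_entry_db):
    print(collection.name, "->", satisfies_inmon_definition(collection))
```

If the definition were reversible, everything for which the test returns True would be a data warehouse. The spreadsheet passes, which is exactly why the definition is too wide.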

 

Note that it's not only the definition of the concept data warehouse that is not reversible. It applies to definitions of many other popular concepts as well.

 

Isn't it about time we scrutinize all our definitions? If disciplines such as chemistry, physics, and economics are able to come up with sound definitions, we should be able to do so as well. By the way, I am not even mentioning the fact that for certain concepts we don't have a definition at all.

Posted December 15, 2010 6:25 AM

I hadn't done anything with graph theory and graph analytics for quite some time until I wrote a technical whitepaper on the graph database server InfiniteGraph. After doing some research and studying the product I came to the conclusion that I had neglected this topic. Graph analytics is a powerful form of analytics that allows us to analyze data in a way that's not possible with other tools. In fact, tools for graph analytics can be seen as complementary to all the reporting and analytical capabilities we are all so familiar with.

 

When writing the paper I talked to several people, and quite a number didn't see why graph analytics is special, nor did they think it would be relevant for many organizations. But that's not the case. All kinds of organizations can benefit from graph analytics. For example, in a government organization a graph can be created linking all private persons and organizations, and graph analytics can be used to find 'hidden' relationships between organizations. In the financial world, it can be used to 'follow' money transfers to create a trail, and in transport it can be used to find the shortest route to deliver goods to various addresses. Every organization that logs all the traffic on its website can create a graph that shows how individual visitors travel through the website. This traffic can be simulated to determine whether visitors are following the intended and most efficient path. The most obvious example is the use of graph analytics to find central members in a social network. And the list goes on.
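
To give a feel for two of these questions, here is a small sketch in plain Python. It is not the API of InfiniteGraph or of any other graph product. The network, the names in it, and the helper functions `build_graph`, `degree`, and `shortest_path` are all invented for this illustration. Centrality is reduced to a simple count of direct relationships, and "shortest route" is reduced to the fewest hops between two members, which is far simpler than a real delivery-routing problem.

```python
from collections import deque

# Hypothetical, made-up social network: each pair is an undirected relationship.
edges = [
    ("ann", "bob"), ("ann", "cas"), ("ann", "dee"),
    ("bob", "cas"), ("dee", "eve"), ("eve", "fay"),
]


def build_graph(edges):
    """Turn a list of pairs into an adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct relationships per node, a very simple centrality measure."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the path with the fewest hops, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(edges)
degrees = degree(graph)
most_central = max(degrees, key=degrees.get)

print(most_central)                        # ann
print(shortest_path(graph, "bob", "fay"))  # ['bob', 'ann', 'dee', 'eve', 'fay']
```

On six nodes this is trivial. The point of a graph database server is to answer the same kinds of questions when the graph has millions of nodes and relations.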

 

Various tools are available that can do graph analytics and that can show the results graphically. Unfortunately, these tools can't handle large graphs made up of millions of nodes and relations. This is where graph database servers come in. Today, they make online graph analytics on massive graphs possible.

 

In business intelligence architectures, graph database servers can be used for building data marts designed specifically for graph analytics. These data marts will receive their data from a central data warehouse. In a way, this is comparable to developing an MDX-based data mart for users needing more classic forms of analysis.

 

In a nutshell, if, like me, you haven't studied graph analytics and the associated tools and database servers for some time, take some time and dive into it. It's exciting technology!


Posted September 29, 2010 6:56 AM

Most analytical tools process a large portion of the analytical logic themselves. For example, the logic to perform a regression analysis, which determines the relationship between a dependent variable and one or more independent variables, is executed by the tool. The role of the database server is minimal; it's only used for retrieving all the required data from the database.

Because most of the analytic processing takes place on the machine where the tool runs, it's very likely that too much data is transmitted from the database server to the application, which is bad for performance. Additionally, the processing is not taking place on the most powerful machine.

With in-database analytics, the analytical processing is primarily done by the database server itself. The remaining task of the analytical tool is to present results on the screen and do some minimal processing. This approach has several performance advantages. First, because the database server (almost certainly) runs on a more powerful machine, the analytical logic is processed more quickly. Secondly, because most of the analytical processing is executed very close to where the data is stored, the I/O is optimal. And thirdly, because only the result set is transmitted back to the tool, minimal time is wasted on transmitting data from the database server to the tool.
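
The difference in data transmitted can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the shape of the two approaches: SQLite runs inside the application process, so nothing actually crosses a network here, and the table `sales`, its columns, and the generated data are assumptions made up for the example. Both functions compute the slope of a simple linear regression; one fetches every row and does the arithmetic in the tool, the other lets the database compute the required sums and fetches a single row.

```python
import sqlite3

# Hypothetical data set: revenue is exactly 2 * ad_spend + 1, so the slope is 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(x, 2 * x + 1) for x in range(1, 1001)],
)


def slope_from_sums(n, sx, sy, sxy, sxx):
    """Least-squares slope of y on x, computed from the usual sums."""
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def slope_in_tool(conn):
    """Fetch all rows and run the analytical logic in the application."""
    rows = conn.execute("SELECT ad_spend, revenue FROM sales").fetchall()
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return slope_from_sums(n, sx, sy, sxy, sxx), len(rows)


def slope_in_database(conn):
    """Let the database compute the sums; only one row comes back."""
    rows = conn.execute(
        "SELECT COUNT(*), SUM(ad_spend), SUM(revenue), "
        "SUM(ad_spend * revenue), SUM(ad_spend * ad_spend) FROM sales"
    ).fetchall()
    return slope_from_sums(*rows[0]), len(rows)


tool_slope, tool_rows = slope_in_tool(conn)
db_slope, db_rows = slope_in_database(conn)
print(tool_slope, tool_rows)  # same slope, 1000 rows fetched
print(db_slope, db_rows)      # same slope, 1 row fetched
```

Both approaches give the same answer. What changes is where the work is done and how many rows have to travel from the database server to the tool.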

But moving the analytical processing from the application to the database server by itself does not automatically lead to a considerable performance improvement. A serious performance improvement is realized when the analytical logic is executed in parallel by the database server.

A solution based on SQL-MapReduce makes it possible to push most of the analytical processing to the database server, where most of that processing will be executed in parallel. My technical whitepaper Using SQL-MapReduce for Advanced Analytical Queries, which describes Aster Data's implementation of SQL-MapReduce, explains in detail how this works.
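
The general idea of executing analytical logic in parallel can be sketched as follows. This is not SQL-MapReduce syntax and not Aster Data's implementation; it is a generic map-and-reduce pattern in Python, with made-up function names (`partition`, `map_partial`, `reduce_partials`) and made-up data. Each partition computes a partial result independently, and a final step combines the partial results. Python threads are used only to show the structure; they do not make this particular calculation faster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, parts):
    """Split the data into `parts` roughly equal partitions."""
    return [values[i::parts] for i in range(parts)]


def map_partial(chunk):
    """Work done independently on each partition: a (count, sum) pair."""
    return (len(chunk), sum(chunk))


def reduce_partials(partials):
    """Combine the partial results into the overall mean."""
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count


values = list(range(1, 101))  # hypothetical measurements 1..100

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_partial, partition(values, 4)))

mean = reduce_partials(partials)
print(partials)
print(mean)  # 50.5
```

Because each partition only needs its own data, the partial results can be computed on different processors or servers at the same time, which is where the serious performance improvement comes from.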

 


Posted June 21, 2010 11:42 PM

Quite a hip and new term in the world of business intelligence is self-service business intelligence. If you visit this website regularly, you must have come across it. But is the term self-service not a contradiction in terms?

 

To me, the term service means that someone or something offers me a service, and that implies that I do less and the service provider does all or most of the work. For example, if I drive my car through a car wash, my car is automatically cleaned. It's the service that's being provided. Or, if I step into a hotel, packed with luggage, a porter will probably take over my bags and bring them to my room. OK, I have carried them for hundreds of miles and he only does the last 100 yards, but it's still a service the hotel provides. That's basically the idea of service.

 

Now let's go back to the term self-service. The term self placed in front of the term service means you will do it yourself. In the context of self-service business intelligence, it means that users can develop their own reports. But doing it yourself means you're not receiving a service at all. So, self-service means that no one offers you a service; you do all the work yourself.

 

For example, if a hotel positions itself as a self-service hotel, it would offer the service that you can carry your own luggage all the way up to your room. Comparably, a self-service car wash would provide the service that you can wash your car yourself. That's not service!

 

So combining the terms self and service makes no sense, because the opposite of service is doing it yourself. Maybe we should rename self-service business intelligence to do-it-yourself BI, or no-service BI.


Posted March 30, 2010 8:10 AM

Who is interested in speaking at the Data Warehouse & Business Intelligence European Conference in London this coming November? If you are, please fill in this call for speakers.

 

Last year, this event was a big success: more than 200 delegates showed up. Evaluations showed that the attendees were very pleased with the selected speakers (Bill Inmon, Barry Devlin, Neil Raden, Frank Buytendijk, Daniel Linstedt, and many more), the topics, and the setup of the conference.

 

The 2010 edition is aimed at all aspects of data warehousing and business intelligence, including: trends, design guidelines, product overviews and comparisons, best practices, and new evolving technologies. And like last year, the conference is organized together with the highly successful Data Management and Information Quality Conference.

 

With this year's call for speakers we are trying to attract proposals for sessions on traditional and future data warehousing and business intelligence aspects. Delegates have expressed a preference for case studies rather than theoretical or abstract topics. We would particularly like practitioners in the field to respond to this call for speakers. We encourage new speakers to apply. Success stories, that is, case studies where data warehousing and business intelligence have produced real bottom-line benefits, are very much appreciated.

 

Example topics for proposals are:

 

  • Business analytics
  • BI in the cloud
  • Data modeling for data warehouses
  • The maturity of data warehouse appliances
  • Star schema, snowflake and data vault models
  • Selling business intelligence to the business
  • The relationship between master data management and data warehousing
  • Guidelines for using ETL tools
  • Developing virtual data warehouses with federation servers
  • The BI mashup
  • The need for Master Data Management in a data warehouse environment
  • BAM (Business Activity Monitoring) and KPI (Key Performance Indicators)
  • New database technology for implementing data warehouses
  • Who needs real-time data warehouses?
  • Business Optimization through BPEL, BAM and SOA
  • BI scorecarding
  • Customer analytics and insight
  • Text mining and text analytics
  • Open source BI
  • Corporate Performance Management

 

I look forward to your proposal, and hope to see you in London this coming November.

 

Rick van der Lans

Chairman of the Data Warehouse & Business Intelligence European Conference


Posted March 15, 2010 3:02 AM
