
Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful European Enterprise Data and Business Intelligence Conference, held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

Database servers come in all sizes and shapes. In fact, so many database servers have been developed in the last fifty years that it looks almost impossible to develop a revolutionary new one. But it can be done. The last few years have proven that, by thinking out of the box, promising and unique new products can still be developed.

One of these new kids on the block is the Snowflake Elastic Data Warehouse by Snowflake Computing. It's not generally available yet; we still have to wait until the first half of 2015. But information is available, and beta versions can be downloaded.

Defining and classifying Snowflake with one term is not that easy. Not even with two terms. To start, it's a SQL database server that supports a rich SQL dialect. It's not specifically designed for big data environments (the term doesn't even appear on the website), but for developing large data warehouses. In this respect, it competes with other so-called analytical SQL database servers.

But the most distinguishing factor is undoubtedly that it's architected from the ground up to fully exploit the cloud. This means two things. First, it's not an existing SQL database server that has been ported to the cloud; its internal architecture is designed specifically for the cloud. All the lines of code are new; no existing open source database server has been used and adapted. This makes Snowflake highly scalable and truly elastic, which is exactly why organizations turn to the cloud.

Second, it also means that the product can really be used as a service. It requires only a minimal amount of DBA work. So, the term service doesn't just mean that it offers a service-based API, such as REST or JDBC, but that the product has been designed to operate hassle-free. Almost all the tuning and optimization is done automatically.
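
To make that hassle-free claim a little more concrete, here is a minimal sketch of what elastic scaling could look like from a developer's perspective. Note the assumptions: the statements are based on publicly available beta material, the Python driver usage and all names (report_wh, the account details) are hypothetical, and the released product may well differ.

    # Hypothetical sketch: provisioning and resizing compute in Snowflake.
    # Driver usage, credentials, and exact DDL are assumptions, not taken
    # from the product documentation discussed in this post.
    import snowflake.connector  # Snowflake's Python driver (assumed available)

    conn = snowflake.connector.connect(
        account='example_account',   # hypothetical account
        user='example_user',
        password='...')
    cur = conn.cursor()

    # Compute ("virtual warehouse") is provisioned with a single statement;
    # there are no tablespaces, file groups, or indexes to manage.
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS report_wh "
                "WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE")

    # Elasticity: scale up for a heavy reporting run, scale back afterwards.
    cur.execute("ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'LARGE'")
    # ... run the heavy queries here ...
    cur.execute("ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'SMALL'")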

In case you want to know: no, the name has no relationship with the data modeling concept called the snowflake schema. The name was selected because many of the founders and developers have a strong affinity with skiing and snow.

Snowflake is a product to keep an eye on. I am looking forward to its general availability. Let's see if there is room for another database server. If it's sufficiently unique, there may well be.


Posted October 29, 2014 2:20 AM
Many products are easy to categorize. It's a database server, an ETL tool, a reporting tool, and so on. But not every product. One such product is Pneuron. At first you would say it's a jack of all trades, a Swiss army knife, but it isn't.

Pneuron is a platform that offers distributed data and application integration, data preparation, and analytical processing. With its workflow-like environment, a process can be defined that extracts data from databases and applications, performs analytics natively or invokes different types of analytical applications and data integration tools, and delivers the final results to any number of destinations, or simply persists the results so that other tools can easily access them.

Pneuron's secret is its ability to design and deploy distributed processing networks, which are based on (p)neurons (hence the product name). Each pneuron represents a task, such as data extraction, data preparation, or data analysis. Pneurons can run across a network of machines and are, where possible, executed in parallel. The platform reuses the investments that companies have already made in ERP applications, ETL tools, and existing BI systems: it remains agnostic to all those prior investments and coordinates their use.
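
Pneuron's own design environment is graphical, and its actual API is not shown here. Still, a toy sketch may help to illustrate the idea of a processing network: small tasks with dependencies, executed in parallel where possible. Everything below (function names, data) is invented purely for illustration.

    # Toy illustration (not Pneuron's actual API): a tiny processing network
    # in which independent tasks run in parallel and feed downstream tasks.
    from concurrent.futures import ThreadPoolExecutor

    def extract_crm():          # stand-in for a data-extraction pneuron
        return [{'customer': 'Acme', 'revenue': 120}]

    def extract_erp():          # another extraction pneuron
        return [{'customer': 'Acme', 'orders': 7}]

    def prepare(crm, erp):      # a data-preparation pneuron joining both feeds
        merged = {r['customer']: dict(r) for r in crm}
        for r in erp:
            merged.setdefault(r['customer'], {}).update(r)
        return list(merged.values())

    def analyze(rows):          # an analysis pneuron computing a derived KPI
        return [dict(r, score=r['revenue'] / r['orders']) for r in rows]

    with ThreadPoolExecutor() as pool:
        crm_future = pool.submit(extract_crm)   # the two extractions have no
        erp_future = pool.submit(extract_erp)   # dependencies, so they run in parallel
        result = analyze(prepare(crm_future.result(), erp_future.result()))

    print(result)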

Still, Pneuron remains hard to classify. It's quite unique of its kind. But whatever the category is, Pneuron is worth checking out.


Posted October 28, 2014 10:03 AM
On October 20, 2014, Teradata announced significant enhancements to QueryGrid at their Partners event in Nashville, Tennessee. QueryGrid allows developers working with the Teradata database engine to transparently access data stored in Hadoop, Oracle, and the Teradata Aster Database. Users won't really notice that the data is not stored in Teradata's own database, but in one of the other data stores.

The same applies to developers using the Teradata Aster database. With QueryGrid they can access and manipulate data stored in Hadoop and the Teradata Database.

With QueryGrid, for both of Teradata's database servers, access to big data stored in Hadoop becomes even more transparent than with its forerunner, SQL-H. QueryGrid allows Teradata and Aster developers to seamlessly work with big data stored in Hadoop without the need to learn the complex Hadoop APIs.

QueryGrid is a data federator, so data from multiple data stores can be joined together. However, it's not a traditional data federator. Most data federators sit between the applications and the data stores being federated; it's the data federator that is accessed by the applications. QueryGrid, by contrast, sits between the Teradata and Aster databases on one side, and Hadoop, Oracle, and each other on the other side. So, applications do not access QueryGrid directly.

QueryGrid supports all the standard features one expects from a data federator. What's special about QueryGrid is that it's deeply integrated with Teradata and Aster. For example, developers using Teradata can specify one of the pre-built analytical functions supported by the Aster database, such as sessionization and connection analytics. The Teradata Database recognizes the use of this special function, knows it's supported by Aster, and automatically passes the processing of the function to Aster. In addition, if the data to be processed is not stored in Aster but, for example, in Teradata, the relevant data is transported to Aster so that the function can be executed there. This means that, thanks to QueryGrid, functionality of one of the Teradata database servers becomes available to the other.
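
As a hedged illustration of how transparent this is for a Teradata developer, consider the sketch below. The @server suffix for referencing a foreign table follows Teradata's published QueryGrid examples, but the exact dialect may vary per release; the connection settings, the table names, and the hadoop_srv foreign server are all hypothetical.

    # Hypothetical sketch: joining a local Teradata table with a table on
    # Hadoop through a QueryGrid foreign server. System name, credentials,
    # tables, and the foreign server are invented for illustration.
    import teradata  # Teradata's Python DB-API module (assumed installed)

    udaExec = teradata.UdaExec(appName='querygrid_demo', version='1.0',
                               logConsole=False)
    session = udaExec.connect(method='odbc', system='tdprod',
                              username='demo', password='...')

    # The @hadoop_srv suffix tells Teradata to delegate that part of the
    # query to the remote Hadoop system; the developer just writes SQL.
    session.execute("""
        SELECT   c.customer_name, SUM(w.page_views)
        FROM     customers c
        JOIN     weblogs@hadoop_srv w   -- foreign table, resolved on Hadoop
                 ON c.customer_id = w.customer_id
        GROUP BY c.customer_name
    """)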

QueryGrid is definitely a valuable addition for organizations that want to develop big data systems by deploying the right data storage technology for the right data.




Posted October 28, 2014 9:59 AM
First, we had the data warehouse and the data mart, now we also have the data lake, the data reservoir, and the data hub. These new architectures have all been introduced recently and have a direct relationship with the ever more popular Hadoop technology. But don't the fundamentals of Hadoop clash with those of these architectures? Is the technology not inconsistent with the architectures?

Forbes magazine defined the data lake by comparing it to a data warehouse: "The difference between a data lake and a data warehouse is that the data in a data warehouse is pre-categorized at the point of entry, which can dictate how it's going to be analyzed." Data hub is a term used by various big data vendors. In a recent tweet, Clarke Patterson summarized it as follows: data lake = storage + integration; and data hub = data lake + exploration + discovery + analytics. A data reservoir is a well-managed data lake.

What all three architectures have in common is that data from many sources is copied to one centralized data storage environment. After it has been stored in the data lake (or hub, or reservoir), it's available for all business purposes, such as integration and analysis. The preferred technology for developing all three is Hadoop. Hadoop is selected, firstly, because its low price/performance ratio makes it attractive for developing large-scale environments, and secondly, because it allows raw data to be stored in its original form and not in a form that already limits how the data can be used later on.

In principle, there is nothing wrong with the idea of making all data available in a way that leaves open any form of use, but is moving all the data to one physical data storage environment implemented with Hadoop the smartest alternative?

We have to remember that Hadoop's primary strength, and the source of its processing scalability, is that it pushes processing to the data, and not vice versa as most other platforms do. Because it's not pushing the data to the processing, Hadoop is able to offer high levels of storage and processing scalability.
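
Hadoop Streaming illustrates this principle nicely: the two small scripts below are shipped to the nodes that hold the data blocks, instead of terabytes of data being shipped to the scripts. A minimal word-count sketch; the input and output paths are illustrative.

    # mapper.py -- Hadoop ships this script to the data nodes; each instance
    # reads only the blocks of the input that are stored locally.
    import sys
    for line in sys.stdin:
        for word in line.split():
            print('%s\t1' % word)

    # reducer.py -- receives the mapper output, sorted and grouped by key
    # (Hadoop guarantees this ordering between the map and reduce phases).
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit('\t', 1)
        if word != current:
            if current is not None:
                print('%s\t%d' % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print('%s\t%d' % (current, count))

    # Submitted with Hadoop Streaming (paths are illustrative):
    # hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
    #            -input /data/weblogs -output /results/wordcount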

Now, isn't this inconsistent with the data lake, data hub, and data reservoir architectures? On one hand we have, for example, the data lake in which all the data is pushed to a centralized data storage environment for processing, and on the other hand, we deploy Hadoop technology, which shines when processing is pushed to the data.

Shouldn't we come up with architectures that are more in line with the fundamental design concept of Hadoop itself, namely architectures where processing (integration and analysis) is pushed to the systems where the data originates (where the data is entered and collected)?

What is needed are systems that make applications and users think that all data is centralized and that it can all be freely analyzed. This also avoids massive and continuous data movement exercises and duplicate storage of data. In other words, we need virtual data lakes, virtual data hubs, and virtual data reservoirs. Such architectures would be much more in line with Hadoop, which is based on pushing processing to the data. Maybe we should open this discussion first, before we recommend that organizations invest heavily in massive data lakes in which they may drown.


Posted May 19, 2014 12:48 AM
We all know and understand dashboards. Dashboards consist of visual components in the form of gauges, graphs, pie charts, bar charts, heat maps, and so on, allowing business users to get a quick overview of key performance indicators (KPIs).

I thought that we had seen it all with respect to dashboards and that nothing new could be invented. Wrong! Newcomer VisualCue has surprised me completely. This tool supports a new way of presenting KPIs on dashboards, and a new way of working with them.

Besides the more traditional visual components, the key visual component in VisualCue is the tile. In a tile, various icons can be used to show multiple KPIs of a process or object. Each icon represents a KPI. A tile can represent, for example, a call center agent, a truck driver, a flight, a hotel, or a project. Heavy use is made of colors to indicate the state of each KPI, and thus of the object. The icons can be designed by the customer or selected from a long predefined list.
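
To illustrate the tile concept (this is emphatically not VisualCue's actual API, just a toy sketch): each icon on a tile is a KPI whose color is derived from thresholds, so the state of an object can be read at a glance. All names, KPIs, and threshold values below are invented.

    # Toy sketch (not VisualCue's API): a tile bundles several KPIs for one
    # object and maps each to a traffic-light color via thresholds.
    def kpi_color(value, green_max, amber_max):
        """Lower is better: green up to green_max, amber up to amber_max, else red."""
        if value <= green_max:
            return 'green'
        if value <= amber_max:
            return 'amber'
        return 'red'

    def agent_tile(agent):
        """One tile per call center agent; each entry is an icon on the tile."""
        return {
            'name':            agent['name'],
            'avg_handle_time': kpi_color(agent['aht_sec'], 240, 360),
            'abandon_rate':    kpi_color(agent['abandon_pct'], 2, 5),
            'backlog':         kpi_color(agent['open_tickets'], 5, 15),
        }

    agents = [
        {'name': 'Ann', 'aht_sec': 210, 'abandon_pct': 1.2, 'open_tickets': 3},
        {'name': 'Bob', 'aht_sec': 395, 'abandon_pct': 6.0, 'open_tickets': 18},
    ]
    for tile in map(agent_tile, agents):
        print(tile)   # Ann: all green; Bob: all red -- visible at a glance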

By showing many tiles next to one another, business users can see the states of numerous objects or processes at a glance. In other words, the tiles make it possible for business users to see and process a lot of detailed data in a single view. There is no need to aggregate the data, which always hides detail, and hidden detail can mean missed opportunities or overlooked problems. So, with tiles, users can still see the wood for the trees.

The tiles are dynamic. They can be classified and grouped based on characteristics; for example, call center operators can be grouped based on the business goals they have reached. Business users can also drill down on the tiles, revealing even more detail. And that's where this product really shines.

Again, I thought I had seen everything with respect to dashboards, but clearly I was wrong. The product is still very young, so it still has to prove itself in large, real-life projects. But I recommend checking out VisualCue anyway.




Posted May 13, 2014 2:53 AM