Blog: Rick van der Lans
http://www.b-eye-network.com/blogs/vanderlans/

Welcome to my blog, where I will talk about a variety of topics related to data warehousing, business intelligence, application integration and database technology. Currently my special interests include virtual data warehousing, mashups and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl

Copyright 2012
Mon, 17 Oct 2011 10:15:02 -0700

Data Virtualization has Reached Maturity

On October 6, 2011, Informatica organized a virtual conference on data virtualization; see http://bit.ly/puyGZ6. During the live event, attendees took part in various polls and competitions. In one case they were asked to describe what they would choose as their first project for using data virtualization in their organization. As the judge for the competitions, I was fascinated by the results that came in. In a nutshell, these attendees were spot on: they were picking the right projects, they had the right arguments for selecting those projects, and they clearly understood the full potential of data virtualization.

 

Here are some of the statements they made - paraphrased to respect sensitive customer information:

 

  • "With data virtualization, analyst teams can gain quick access to data, profile data, and develop prototypes together with the business to finalize requirements before development."
  • "We are starting to bring in a host of external data to supplement our internal data warehouse. There is often a fair amount of uncertainty of the utility of the information upfront until some amount of BI exploration is complete. To improve the time to market, I would consider tools like data virtualization to be able to provide that information quickly, and hold off on any real data warehouse integration until utility of the data is proven."
  • "We have a data warehouse that lags data by a day or more for exposing key data to our customers. By speeding up the delivery of information to our customers, they can respond more quickly..."

 

These quotes show that data virtualization is being applied for a wide range of purposes. The keywords that jump out are: agile, flexibility, prototyping, real-time data, and fast response.

 

To summarize, data virtualization has reached maturity: organizations are deploying the technology, and they know why and where they want to use it.

 

The live poll results reflect this as well. In a poll that asked the question, "What are the use cases you believe data virtualization is applicable in your environment," the majority picked enabling agile BI as their first choice, followed by single view and data services for SOA.

 

http://www.b-eye-network.com/blogs/vanderlans/archives/2011/10/data_virtualiza_1.php Mon, 17 Oct 2011 10:15:02 -0700
Is Hadoop a Relational Database Server?

More and more often, Apache's Hadoop is somehow compared to relational databases. In most of those comparisons, Hadoop is presented as a non-relational database, as something that's totally different from classic database servers, such as IBM's DB2, Microsoft SQL Server, and Oracle 11g. Comparing Hadoop this way makes no sense. Hadoop can be as relational as those classic database servers.

 

Whether a system is relational does not depend on how data is stored on disk, but fully depends on how the data is perceived by the applications. It depends on what language and/or API the applications use to insert, query, and manipulate the data.

 

In a nutshell, when the relational model was defined and introduced by Ted Codd, Chris Date and others, the rule was that if a system could present all the data as tables and columns, and if that data could be accessed through a language supporting relational operators such as join, select, and project, that system was a relational system. Ted Codd called this data independence; an application should not be concerned with how the data is physically stored.

 

What this means is that if a system offers an interface where data is presented as tables and that supports those relational operators, it offers a relational interface. For example, if a system supports a SQL interface on a dataset, that system can be classified as relational. Note: I am aware that SQL does not adhere to all the rules needed to offer a relational interface, but for the sake of simplicity I will regard SQL as relational.

 

We have to make a distinction between, on one hand, the storage model and the storage engine, and, on the other hand, the interface the applications use. Let's call the latter the interface model. Whether something can be qualified as relational depends on that interface model and not on the storage model. Whether data is stored as records, in a column-oriented fashion, in a key-value store, or, if possible, in a fish bowl, is irrelevant. The storage model does not determine whether a system is relational or not.
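To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `kv_store` dictionary stands in for a non-relational storage model, and the functions `table`, `select`, `project`, and `join` are illustrative names for a tiny interface model layered on top of it. It is not how Hadoop, Hive, or any database server is implemented; it only shows that the application sees tables and relational operators regardless of how the data is stored.

```python
# Hypothetical storage model: a flat key-value store with opaque keys.
kv_store = {
    "cust:1": {"id": 1, "name": "Ann", "city": "Boston"},
    "cust:2": {"id": 2, "name": "Bob", "city": "Utrecht"},
    "ord:10": {"order_id": 10, "cust_id": 1, "amount": 250},
    "ord:11": {"order_id": 11, "cust_id": 2, "amount": 75},
    "ord:12": {"order_id": 12, "cust_id": 1, "amount": 40},
}

# Hypothetical interface model: the application only sees tables and operators.
def table(prefix):
    """Present all entries whose key starts with the prefix as a table (list of rows)."""
    return [row for key, row in kv_store.items() if key.startswith(prefix)]

def select(rows, predicate):
    """Relational select: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Relational project: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

def join(left, right, left_col, right_col):
    """Relational equi-join on left_col = right_col (nested loop, for clarity only)."""
    return [{**l, **r} for l in left for r in right if l[left_col] == r[right_col]]

customers = table("cust:")
orders = table("ord:")
big_orders = select(join(customers, orders, "id", "cust_id"),
                    lambda row: row["amount"] > 50)
result = project(big_orders, ["name", "amount"])
print(result)
```

The application code at the bottom never touches the keys or the dictionary layout. Swapping `kv_store` for records on disk or a column store would change `table`, but not the queries written against it.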

 

Hadoop's HDFS uses a very specific storage model and a unique storage engine that are both different from what the classic database servers have implemented. And of course, if we accessed the interface of HDFS directly, we wouldn't see an interface that could be called relational, but a very technical low-level interface instead. However, if we used HiveQL to access the data stored in Hadoop, or a data virtualization server such as Composite Information Server or Informatica Data Services running on top of Hadoop, in both cases the Hadoop database would be accessed in a relational way, meaning it would become a relational system to the applications.

 

This is not very different from accessing classic relational databases. If we access data via the standard SQL interface, they are relational systems. However, if it would be possible to develop an application that accesses the data by directly accessing their internal storage engines, the same data wouldn't look that relational anymore. By the way, those database servers don't always store the data as records either. For example, in Oracle data can be presented as tables while in fact it's stored as a multi-dimensional cube. And in Sybase IQ data is presented as tables but is stored in a column-oriented fashion using pointer structures.

 

To summarize, whether a system is relational is not dependent on the storage model, but on the language and/or API used to access the data. The same data set can be presented as relational to one application and as non-relational to another. Hadoop offers a special storage model, but that doesn't mean that data cannot be presented in a relational way. In fact, the same applies to most of those new so-called NoSQL database servers.

 

To come back to the comparisons, it would make sense to compare the storage models of Hadoop with those of other database servers, and it would make sense to compare the interfaces of Hadoop with the interfaces of other database servers. But a comparison of the Hadoop storage model with the interface model of classic database servers is like comparing apples and pears.

http://www.b-eye-network.com/blogs/vanderlans/archives/2011/09/is_hadoop_a_rel.php Mon, 12 Sep 2011 09:26:26 -0700
Data Virtualization with Informatica Data Services

Maybe some have missed it, but at the end of last year Informatica entered the market of data virtualization/data federation products with Informatica Data Services (IDS). This product has been built on top of the Informatica 9 platform, from which it inherits its robustness and scalability.

 

Besides all the features you expect from a data virtualization product, it does offer some unique ones. For example, virtual tables (views) are not defined by using SQL or XQuery, but with a flow language that resembles the flow language used in PowerCenter for defining ETL scripts. The only difference is that in PowerCenter the result of a flow is stored in some table or file, while with IDS the result is "pushed" to a reporting or analytics tool. Under the hood, the flow language is transformed into SQL and pushed down to the database servers. It will try to process as much of the data access as close to the data as possible.

 

Another feature is that data profiling has been implemented as an integrated part of the product, and the profiling can be done in an on-demand style. What that means is that when a virtual table has been defined, the (virtual) contents of the virtual table are profiled by just clicking on a button. If something looks incorrect, it can be fixed by adding or changing transformations, or by fixing the source data (if allowed and possible). This becomes an iterative process that continues until the virtual table returns the right data.

 

In addition, the developer can ask a user or business analyst to look at the virtual table as well. The user can check whether he thinks the contents are OK, and if not, the user can add his own transformations by using a simple Excel-like language. Eventually, defining the right transformations becomes a collaborative process between users and developers.

 

Complex cleansing operations can also be executed on-demand. In other words, when data is retrieved by a report, IDS will access the underlying data sources and will execute all the cleansing operations.

 

To summarize, IDS shows how feature-rich and mature the data virtualization products are becoming. If you want to know more about how IDS works and what its features are, get my new technical whitepaper Developing a Data Delivery Platform with Informatica Data Services.

 

http://www.b-eye-network.com/blogs/vanderlans/archives/2011/04/data_virtualiza.php Fri, 08 Apr 2011 01:57:15 -0700
Call for Papers: Data Warehousing and Business Intelligence Conference in London

Are you interested in speaking at the Data Warehouse & Business Intelligence European Conference in London this coming November? If you are, please fill in the call for papers.

Previous editions were very successful and attracted more than 200 delegates. Evaluations showed that the attendees were very pleased with the selected speakers, the topics, and setup of the conference.

The 2011 edition is aimed at all aspects of data warehousing and business intelligence, including: trends, design guidelines, product overviews and comparisons, best practices, and new evolving technologies. And like the previous years, the conference is organized together with the highly successful European Data Management and Information Quality Conference.

With this year's call for presentations we are trying to attract proposals for sessions on traditional and future data warehousing and business intelligence aspects. Delegates have expressed a preference for the use of case studies rather than theoretical or abstract topics. We would particularly like practitioners in the field to respond to this call for papers. We encourage new speakers to apply. Success stories (case studies where data warehousing and business intelligence have produced real bottom-line benefits) are very much appreciated.

Example topics for proposals are:

  • Business and data analytics
  • BI in the cloud
  • Data modelling for data warehouses
  • NoSQL in a data warehouse environment
  • The maturity of analytical database servers
  • Star schema, snowflake and data vault models
  • Selling business intelligence to the business
  • Big data analytics
  • The relationship between master data management and data warehousing
  • Guidelines for using ETL tools
  • Data virtualization and data federation
  • The BI mashup
  • The need for Master Data Management in a data warehouse environment
  • BAM (Business Activity Monitoring) and KPI (Key Performance Indicators)
  • New database technology for implementing data warehouses, such as
  • Business intelligence as ROI for the data warehouse
  • Who needs real-time data warehouses?
  • Business Optimization through BPEL, BAM and SOA
  • BI scorecarding
  • Customer analytics and insight
  • Text mining and text analytics
  • Open source BI
  • Corporate Performance Management

I look forward to your proposal and hope to see you in London this coming November.

Rick F. van der Lans

Chairman of the Data Warehouse & Business Intelligence European Conference 2011

http://www.b-eye-network.com/blogs/vanderlans/archives/2011/03/call_for_papers.php Thu, 24 Mar 2011 18:06:12 -0700
Oco - Vendor of Analytic SAAS BI Applications

This week I had a meeting with Oco, a vendor of analytic SAAS BI Applications based in Waltham, MA. I had never heard of them, but the first versions of their current offerings have been available since 2007 and the company has been around since 1999.

 

Currently, there are many vendors on the market that deliver SAAS BI capabilities. However, most of them deliver tools and technologies. The customer BI applications have to be designed and built first. Not so with Oco. Oco's claim to fame is a set of data models for different application areas, such as buyer performance, supplier evaluation, capacity utilization, revenue trending, and account visibility. This means that the product comes with pre-defined data structures (that customers can adapt to their own needs) plus pre-built BI applications operating on those data models. The attractive part of this is that once the data structures of the customer's production data have been mapped to Oco's data models, the hard part has been done, and the business data can be analyzed very quickly.

 

Being a SAAS vendor, they host all the software. Data can be uploaded periodically to refresh the database. They have their own products for copying and transforming the data. The database server and data integration technology are all developed with Microsoft software. The front end is partly based on SAP/BO software, such as Xcelsius, Web Intelligence, and Explorer. In addition, some of the front ends, such as the ones for KPI dashboards and multi-dimensional reporting, are developed with their own tool. It's all Flex based.

 

Oco is an interesting company with an interesting offering. Although they are very much focused on SAP, their market is still very US based. Maybe that's the reason why I had never heard of them?

http://www.b-eye-network.com/blogs/vanderlans/archives/2011/01/oco_-_vendor_of.php Thu, 27 Jan 2011 00:10:40 -0700
Data Warehouses and Elephants: About Reversible Definitions

Many times we criticize users for having poor or no definitions at all for their concepts, and we can even get upset if different users of the same organization use different definitions for the same concept. However, can we say with certainty that we are doing a good job with respect to definitions in our own field? I am not so sure. It's more like the pot calling the kettle black. In the world of business intelligence and data warehousing, many concepts have been defined poorly or not at all, including those concepts we use daily. Obviously, this always leads to confusing discussions.

 

A good definition of a concept satisfies several requirements, one of which is reversibility. Suppose that we have the following abstract definition: "A is text". Reversibility means that everything that satisfies the text is also an A. Take, for example, the concept of an African elephant (Loxodonta). A possible definition of an African elephant would go along the lines of "a big herbivore with a trunk, tusks, big ears, and big feet". So each mammal satisfying these requirements is an African elephant by definition. Only having a trunk is not sufficient; you must have tusks, big ears, and big feet as well.

 

With a decent definition we want to include the correct concepts and exclude the wrong ones. For example, from the above definition of the African elephant we can conclude that the savannah elephant is indeed an African elephant. However, by including big ears as a requirement, we exclude the Asian elephant, and rightfully so. By demanding that a concept's definition is reversible, we ensure that the wrong concepts are excluded.

 

Unfortunately, in our world not all the definitions are reversible. Let's take as an example Bill Inmon's well-known and frequently used definition of a data warehouse: "A data warehouse is a subject oriented, integrated, non volatile, time variant collection of data for management's decision making". Unfortunately, this definition is not reversible. Suppose a user creates a spreadsheet containing customer data (subject-oriented) that have been brought together from different systems (integrated), that remain unchanged the entire time (non-volatile), and that contain historical data (time variant). If, in addition, this spreadsheet has been developed to support decision making, then it satisfies all the requirements specified in the definition. Ergo, this spreadsheet is a data warehouse. In fact, a lot of data marts that have been created would also satisfy this definition. However, I don't think this is Inmon's intention. In short, the definition is too "wide".
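The reversibility test can be written down as a small sketch. The class `DataCollection` and the function `satisfies_inmon_definition` below are hypothetical names introduced only for illustration; the five boolean fields are a literal reading of the criteria in the quoted definition. If the definition were reversible, everything for which the function returns true would be a data warehouse.

```python
from dataclasses import dataclass

@dataclass
class DataCollection:
    """Hypothetical description of a collection of data, one flag per criterion."""
    name: str
    subject_oriented: bool
    integrated: bool
    non_volatile: bool
    time_variant: bool
    supports_decision_making: bool

def satisfies_inmon_definition(collection):
    """True when the collection meets every criterion in the quoted definition."""
    return all([
        collection.subject_oriented,
        collection.integrated,
        collection.non_volatile,
        collection.time_variant,
        collection.supports_decision_making,
    ])

# The spreadsheet from the example above: customer data from several systems,
# never changed afterwards, containing history, built for decision making.
spreadsheet = DataCollection("customer spreadsheet", True, True, True, True, True)

# An operational order-entry database: updated constantly, so not non-volatile.
order_entry = DataCollection("order entry database", True, False, False, False, False)

print(satisfies_inmon_definition(spreadsheet))
print(satisfies_inmon_definition(order_entry))
```

The spreadsheet passes the check even though few people would call it a data warehouse, which is exactly what it means for a definition to be too wide.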

 

Note that it's not only the definition of the concept data warehouse that is not reversible. It applies to definitions of many other popular concepts as well.

 

Isn't it about time we scrutinize all our definitions? If disciplines such as chemistry, physics, and economics are able to come up with sound definitions, we should be able to do so as well. By the way, I am not even mentioning the fact that for certain concepts we don't have a definition at all.
http://www.b-eye-network.com/blogs/vanderlans/archives/2010/12/data_warehouses.php Wed, 15 Dec 2010 06:25:58 -0700
Why Graph Analytics?

I hadn't done anything with graph theory and graph analytics for quite some time until I wrote a technical whitepaper on the graph database server InfiniteGraph. After doing some research and studying the product I came to the conclusion that I had neglected this topic. Graph analytics is a powerful form of analytics that allows us to analyze data in a way that's not possible with other tools. In fact, tools for graph analytics can be seen as complementary to all the reporting and analytical capabilities we are all so familiar with.

 

When writing the paper I talked to several people, and quite a number didn't see why graph analytics is special, nor did they think it would be relevant for many organizations. But that's not the case. All kinds of organizations can benefit from graph analytics. For example, in a government organization a graph can be created linking all private persons and organizations, and graph analytics can be used to find 'hidden' relationships between organizations. In the financial world, it can be used to 'follow' money transfers to create a trail, and in transport it can be used to find the shortest route to deliver goods to various addresses. Every organization that logs all the traffic on its website can create a graph that shows how individual visitors travel through the website. This traffic can be simulated to determine whether visitors are using the correct and the most ideal path. The most obvious example is the use of graph analytics to find central members in a social network. And the list goes on.
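Two of these use cases are easy to sketch on a toy graph. The Python below is only an illustration under simple assumptions: the graph is a hypothetical five-person network held in memory, the shortest route is measured in number of hops (an unweighted breadth-first search), and "central" simply means having the most direct connections. Graph database servers use far more scalable techniques than this.

```python
from collections import deque

# Hypothetical social network as an adjacency list (undirected edges listed both ways).
graph = {
    "Ann": ["Bob", "Cid", "Dee"],
    "Bob": ["Ann"],
    "Cid": ["Ann", "Eve"],
    "Dee": ["Ann"],
    "Eve": ["Cid"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

def most_connected(graph):
    """Degree centrality in its simplest form: the node with the most neighbors."""
    return max(graph, key=lambda node: len(graph[node]))

print(shortest_path(graph, "Dee", "Eve"))
print(most_connected(graph))
```

On a graph of this size any tool will do; the point of the paragraph that follows is what happens when the same questions are asked of millions of nodes and relations.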

 

Various tools are available that can do graph analytics and that can show the results graphically. Unfortunately, these tools can't handle large graphs made up of millions of nodes and relations. This is where graph database servers come in. Today, they do make online graph analytics on massive graphs possible.

 

In business intelligence architectures, graph database servers can be used for building data marts designed specifically for graph analytics. These data marts will receive their data from a central data warehouse. In a way, this is comparable to developing an MDX-based data mart for users needing more classic forms of analysis.

 

In a nutshell, if you haven't studied graph analytics and the associated tools and database servers for some time, just like me, take some time and dive into it. It's exciting technology!

http://www.b-eye-network.com/blogs/vanderlans/archives/2010/09/why_graph_analy.php Wed, 29 Sep 2010 06:56:32 -0700
In-Database Analytics with Aster Data's SQL-MapReduce

Most analytical tools process a large portion of the analytical logic themselves. For example, the logic to perform a regression analysis, which determines the relationship between a dependent variable and one or more independent variables, is executed by the tool. The role of the database server is minimal; it's only used for retrieving all the required data from the database.

Because most of the analytic processing takes place on the machine where the tool runs, it's very likely that too much data is transmitted from the database server to the application, which is bad for performance. Additionally, the processing is not taking place on the most powerful machine.

With in-database analytics, the analytical processing is primarily done by the database server itself. The remaining task of the analytical tool is to present results on the screen and do some minimal processing. This approach has several performance advantages. First, because the database server (almost certainly) runs on a more powerful machine, the analytical logic is processed more quickly. Secondly, because most of the analytical processing is executed very close to where the data is stored, the I/O is optimal. And thirdly, because only the result set is transmitted back to the tool, minimal time is wasted on transmitting data from the database server to the tool.
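The difference in data movement can be sketched with a few lines of Python. This is only an illustration: it uses an in-memory SQLite database from Python's standard library as a stand-in for a database server (so no network is involved), the `sales` table and its contents are made up, and a simple average per region stands in for the analytical logic.

```python
import sqlite3

# Stand-in for the database server, with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 300.0), ("south", 50.0), ("south", 150.0)],
)

# Tool-side analytics: every row is shipped to the tool, which does the math.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals, counts = {}, {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount
    counts[region] = counts.get(region, 0) + 1
tool_side = {region: totals[region] / counts[region] for region in totals}

# In-database analytics: the server does the math, only the result set comes back.
result_set = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"
).fetchall()
in_database = dict(result_set)

print(len(rows), "rows transferred for tool-side processing")
print(len(result_set), "rows transferred for in-database processing")
print(tool_side == in_database)
```

Both approaches produce the same averages, but the second one transfers one row per region instead of one row per sale. With millions of sales, that gap is what the second and third advantages above are about.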

But moving the analytical processing from the application to the database server by itself does not automatically lead to a considerable performance improvement. A serious performance improvement is realized when the analytical logic is executed in parallel by the database server.

A solution based on SQL-MapReduce makes it possible to push most of the analytical processing to the database server, where most of that processing will be executed in parallel. My technical whitepaper Using SQL-MapReduce for Advanced Analytical Queries, which describes Aster Data's implementation of SQL-MapReduce, explains in detail how this works.

 

http://www.b-eye-network.com/blogs/vanderlans/archives/2010/06/in-database_ana.php Mon, 21 Jun 2010 23:42:00 -0700
Self-Service Business Intelligence: What a Strange Term!

Quite a hip and new term in the world of business intelligence is self-service business intelligence. If you visit this website regularly, you must have come across it. But isn't the term self-service a contradiction in terms?

 

To me the term service means that someone or something offers me a service, and that implies that I do less and the service provider does all or most of the work. For example, if I drive my car through a car wash, my car is automatically cleaned. It's the service that's being provided. Or, if I step into a hotel, packed with luggage, a porter will probably take over my bags and bring them to my room. OK, I have carried them for hundreds of miles and he only does the last 100 yards, but it's still a service the hotel provides. That's basically the idea of service.

 

Now let's go back to the term self-service. The term self placed in front of the term service means you will do it yourself. In the context of self-service business intelligence, it means that the user can develop his own reports. But doing it yourself means you're not receiving service, you are actually doing it yourself. So, self-service means that no one offers you a service, you do all the work yourself.

 

For example, if a hotel positions itself as a self-service hotel, they would offer the service that you can carry your own luggage all the way up to your room. Comparably, a self-service carwash would provide the service that you can wash your car yourself. That's not service!

 

So combining the terms self and service makes no sense, because the opposite of service is doing-it-yourself. Maybe we should rename self-service business intelligence to do-it-yourself BI, or no-service BI.

http://www.b-eye-network.com/blogs/vanderlans/archives/2010/03/self-service_bu.php Tue, 30 Mar 2010 08:10:30 -0700
Call for Speakers: Data Warehousing and Business Intelligence Conference in London

Who is interested in speaking at the Data Warehouse & Business Intelligence European Conference in London this coming November? If you are, please fill in this call for speakers.

 

Last year, this event was a big success: more than 200 delegates showed up. Evaluations showed that the attendees were very pleased with the selected speakers (Bill Inmon, Barry Devlin, Neil Raden, Frank Buytendijk, Daniel Linstedt, and many more), the topics, and the setup of the conference.

 

The 2010 edition is aimed at all aspects of data warehousing and business intelligence, including: trends, design guidelines, product overviews and comparisons, best practices, and new evolving technologies. And like last year, the conference is organized together with the highly successful Data Management and Information Quality Conference.

 

With this year's call for speakers we are trying to attract proposals for sessions on traditional and future data warehousing and business intelligence aspects. Delegates have expressed a preference for the use of case studies rather than theoretical or abstract topics. We would particularly like practitioners in the field to respond to this call for speakers. We encourage new speakers to apply. Success stories (case studies where data warehousing and business intelligence have produced real bottom-line benefits) are very much appreciated.

 

Example topics for proposals are:

 

  • Business analytics
  • BI in the cloud
  • Data modeling for data warehouses
  • The maturity of data warehouse appliances
  • Star schema, snowflake and data vault models
  • Selling business intelligence to the business
  • The relationship between master data management and data warehousing
  • Guidelines for using ETL tools
  • Developing virtual data warehouses with federation servers
  • The BI mashup
  • The need for Master Data Management in a data warehouse environment
  • BAM (Business Activity Monitoring) and KPI (Key Performance Indicators)
  • New database technology for implementing data warehouses
  • Who needs real-time data warehouses?
  • Business Optimization through BPEL, BAM and SOA
  • BI scorecarding
  • Customer analytics and insight
  • Text mining and text analytics
  • Open source BI
  • Corporate Performance Management

 

I look forward to your response to this call for speakers and hope to see you in London this coming November.

 

Rick van der Lans

Chairman of the Data Warehouse & Business Intelligence European Conference

http://www.b-eye-network.com/blogs/vanderlans/archives/2010/03/call_for_speake.php Mon, 15 Mar 2010 03:02:43 -0700
Visit of SQLStream

Quite recently, I visited SQLStream. Like the majority of database server vendors, SQLStream is located in California; in San Francisco, to be more precise. Of course, their primary product (also called SQLStream) supports the database language SQL, and they try to follow the SQL standard as much as possible. So far, nothing new under the sun. You would almost think that this is yet another of the many new vendors trying to dethrone Oracle, Microsoft, and IBM. However, that would be an incorrect assumption.


As the name implies, SQLStream is a so-called streaming database server, comparable to IBM InfoSphere Streams and StreamBase Server. The main difference between SQLStream on one hand and most other products on the other hand is that the former is a pure SQL-based product. The statements used to query streams follow the SQL standard. Most other streaming products use proprietary languages, such as Spade, or use extensions.

 
For those who haven't studied this topic in detail yet, a streaming database server allows us to formulate queries on streams of data. Examples of streams are log files of certain systems, messages that are entered, or web logs. Even before this data is stored in tables, we can already access and analyze it. Someone once explained streaming database servers as follows: queries executed in the context of a classic database server are like asking how many fish live in this pond, whilst queries executed in the context of a streaming database server are like asking how many fish swim by in a fast-flowing river during a certain period of time.
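The river analogy can be sketched as a sliding-window count. The Python below is not SQLStream syntax and makes no claim about how any streaming product works internally; the function `count_in_window` and the sample timestamps are hypothetical. It only shows the idea of answering a query continuously as events arrive, without first storing them in a table.

```python
from collections import deque

def count_in_window(events, window_seconds):
    """For each arriving event timestamp, yield (timestamp, number of events
    seen in the last window_seconds, including the current one)."""
    window = deque()
    for timestamp in events:
        window.append(timestamp)
        # Drop events that have flowed out of the time window.
        while window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)

# Hypothetical stream: the second at which each fish swims by.
fish_sightings = [1, 2, 4, 11, 12, 30]

counts = list(count_in_window(fish_sightings, 10))
for timestamp, count in counts:
    print(f"t={timestamp}s: {count} fish in the last 10 seconds")
```

Because `count_in_window` is a generator, it works equally well on an endless stream: it keeps only the events inside the current window in memory, never the whole history.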


SQLStream offers all the features above. In addition, views are used to define streams, and these streaming views can serve as input for other streaming views. Through join and union operators, data from different sources can be integrated. In fact, SQLStream supports many of the features normally found in an ETL tool, except that SQLStream uses streams and SQL. Data streams are integrated live the moment they arrive. The result of an integrated stream can be sent to an application or data warehouse. See the following link that contains an explanation on how SQLStream can be used together with SQL Power.

In short, SQLStream is absolutely worth studying.

Note: The owners of SQLStream are also the founders of Eigenbase.org. This organization supplies a toolset with which database servers can be developed. As can be expected, SQLStream is also developed with this toolset.

 

http://www.b-eye-network.com/blogs/vanderlans/archives/2010/03/visit_of_sqlstr.php Tue, 09 Mar 2010 05:18:34 -0700
After 40 Years of Practice, Developing IT Systems Is Still Hard

Last week, my family and I visited a basketball game in Phoenix. The Phoenix Suns were playing against the Philadelphia 76ers. It was a great game; the Suns won 106-95. However, before I was able to get into the stadium I had the following experience.

 

A few days prior to this game, I bought the tickets through Ticketmaster.com. To buy those tickets, I had to enter my credit card information. Normally, this doesn't cause any problems, because credit card companies operate internationally. They know that some of their customers are based outside the US, and that those addresses might have different formats and structures.

 

What I had to enter was, as you might expect, my name, credit card number, expiration date, and a security number. In addition, I had to enter my address information so that they could verify a few things. So, dutifully I entered my address including the zip code. Entering the address components went well until I got to the zip code. The zip code was not accepted because the system expected five digits and I tried to enter four digits and two letters, which is the format of the Dutch zip code. But it didn't accept the letters. They had probably switched on a simple check: digits only please.

 

Now, this caused a problem, because for getting through the verification process I had to enter the correct zip code, but for buying a ticket I had to enter an incorrect zip code. Eventually I made the decision to enter the zip code of the hotel I was staying at. And, to my surprise it worked. I got my tickets and printed them.

 

Unfortunately, although Ticketmaster.com had accepted my address information, the credit card company's IT system had not. I discovered that when I entered the stadium and showed them my tickets. They were not accepted. Guess why? The zip code didn't match the rest of the address, and it didn't correspond to the address the credit card company had on file.

 

How is it possible that in the year 2010 we still have problems with this simple type of data entry? Didn't they get the right definition of zip code from the credit card companies, or don't they check whether the zip code matches the rest of the address? Is their system not aware that the formats of zip codes can be different in other countries? And how is it possible that they first inform me that the credit card company has accepted the credit card information, and later on indicate that it hasn't? We have about forty years of experience in developing IT systems, and we still make errors such as this.
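
To make the point concrete, here is a minimal sketch of a country-aware format check, as opposed to a "digits only" rule. It is only an illustration: the function name and the pattern table are hypothetical, the two patterns cover just the US and Dutch formats, and nothing here reflects how Ticketmaster.com or any credit card company actually validates addresses.

```python
import re

# Hypothetical pattern table, for illustration only. A real system would
# rely on a maintained source of postal formats instead of two hand-written rules.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),        # 85004 or 85004-1234
    "NL": re.compile(r"[1-9]\d{3} ?[A-Z]{2}"),  # 1234 AB or 1234AB
}


def is_valid_postal_code(country: str, code: str) -> bool:
    """Check the format of a postal code against the rule for its country."""
    cleaned = code.strip().upper()
    pattern = POSTAL_PATTERNS.get(country.upper())
    if pattern is None:
        # Unknown country: accept any non-empty value rather than reject
        # a format this table simply does not know about.
        return bool(cleaned)
    return pattern.fullmatch(cleaned) is not None
```

With such a check, a Dutch code like "1234 AB" passes when the country is NL and fails when the country is US. Note that this only tests the format; whether the zip code matches the rest of the address is a separate check against real address data.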

 

In the end, I did get in. I just bought new tickets at the ticket office, and guess what, I got exactly the same seats. I still wonder whether it would have been accepted if I had also entered the full address of the hotel.

]]>
http://www.b-eye-network.com/blogs/vanderlans/archives/2010/03/after_40_years.php http://www.b-eye-network.com/blogs/vanderlans/archives/2010/03/after_40_years.php Tue, 02 Mar 2010 05:09:20 -0700
Ingres enters the Data Warehouse Market with VectorWise This week I had an interesting meeting with Peter Boncz, founder and director of VectorWise. VectorWise is a small startup based in The Netherlands (see www.vectorwise.com). In fact, they are a spin-off of a Dutch organization called CWI (the National Research Center for Mathematics and Computer Science). This is the organization that researched and developed MonetDB, an incredibly fast open source database server that uses state-of-the-art database technology. For MonetDB, CWI also created a spin-off company with the same name as the product; see www.monetdb.com .

 

VectorWise is not a database server in itself; it is more like a smart storage engine. Therefore, they needed another database server to complete the product. And they picked the open source database server we have all known for a long, long time: Ingres. Over the last year, both companies have worked hard to integrate the two products.

 

But what's so special about VectorWise? If we forget about the hundreds of details, they are trying to do the same thing that appliance vendors such as Netezza, Aster Data, and Greenplum have tried to do: develop a database server environment that is capable of running typical warehouse queries very fast without the need for extensive, time-consuming tuning and optimization. In other words, VectorWise is also aiming for out-of-the-box query performance. But this is where the comparison between VectorWise and the other products ends. Most of the other products need special hardware and/or clustered machines to be able to offer this level of performance. VectorWise doesn't. It can use and exploit clustered machines, but the magic is that it can even get great performance on uni-core machines. It would go too far to explain in a blog how it all works, but the product has been designed to exploit the CPUs of today, for example, by not only using the internal memory, which is what most other database servers do, but also by using the CPU cache. And that makes a serious difference. Therefore, VectorWise will speed up queries without the need for special hardware. It can even do a great job on some of your existing uni-, dual-, or quad-core low-end servers stored somewhere in the basement.

 

Most of the other vendors of database appliances and analytical databases aim at the largest warehouses and largest customers - the top 500. VectorWise, because it will be open source and because it can run on low-end machines, will also be very suitable for and attractive to the midsize market.

 

By merging VectorWise with Ingres, you get the new technology of the former and the sales and marketing channels of the latter, a company that has been around for some time, and that has a very stable and extensive installed base.

 

For current Ingres customers, switching from Ingres to Ingres-VectorWise will be a very small change, because on the outside, on the side where the queries come in, nothing changes. That also means that porting existing Business Objects or Cognos reports to this new product will be straightforward.

 

If you're interested in new database technology for data warehousing, this is the product to study. They expect Ingres-VectorWise to be released in the spring of 2010. It all looks very promising, but as we all know, the proof of the pudding is in the eating. How well will Ingres-VectorWise perform in a real-life situation? Hopefully, we can check that very soon.

]]>
http://www.b-eye-network.com/blogs/vanderlans/archives/2009/10/ingres_enters_t.php http://www.b-eye-network.com/blogs/vanderlans/archives/2009/10/ingres_enters_t.php Thu, 15 Oct 2009 05:32:33 -0700
Oracle buys Sun I just received the news that Oracle will buy Sun. The first thing that came to my mind was, what will they do with MySQL? And how will the MySQL community react to this? Oracle has always been seen as the main competitor. This must have an impact on the database market. Let's see what happens.

]]>
http://www.b-eye-network.com/blogs/vanderlans/archives/2009/04/oracle_buys_sun.php http://www.b-eye-network.com/blogs/vanderlans/archives/2009/04/oracle_buys_sun.php Mon, 20 Apr 2009 06:13:36 -0700
The Independent Analyst Platform In July last year, we organized the first edition of the Independent Analyst Platform (IAP). This event was attended by some of the most well-known independent BI analysts. Around that time, you might have seen many intriguing blog entries from various analysts who were writing about the sessions and publishing them in real time. I must admit, though, that some of the blogs focused on the weather; Phoenix, Arizona in July can be a little warm, although, as they say, it is a dry heat!

The idea of the event is to bring analysts and vendors together. Vendors get a chance to talk about their new technologies, their products, and ideas. And the analysts have the opportunity to ask questions. And trust me, 20+ analysts can ask a lot of questions, a lot of tough questions.

This coming July, we will organize the second edition; see www.independentanalystplatform.com. Many of the same analysts will be present again, plus a few new names. Various vendors have already signed up and are brave enough to present in front of this critical crowd.

I am looking forward to this event again. I don't think it will go unnoticed by regular readers of the BeyeNetwork blogs. Again, on those days, you will see a tsunami of blog entries. But, as opposed to last year, this time you have been warned. Stay connected to the BeyeNetwork on July 7, 8, and 9 for all the content that will be published.

]]>
http://www.b-eye-network.com/blogs/vanderlans/archives/2009/04/the_independent.php http://www.b-eye-network.com/blogs/vanderlans/archives/2009/04/the_independent.php Fri, 17 Apr 2009 05:07:15 -0700