Blog: Rick van der Lans Subscribe to this blog's RSS feed!

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration, and database technology. Currently my special interests include data virtualization, NoSQL technology, and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl.

About the author >

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence, database technology and data virtualization. He is managing director and founder of R20/Consultancy. An internationally acclaimed speaker who has lectured worldwide for the last 25 years, he is the chairman of the successful annual European Enterprise Data and Business Intelligence Conference held annually in London. In the summer of 2012 he published his new book Data Virtualization for Business Intelligence Systems. He is also the author of one of the most successful books on SQL, the popular Introduction to SQL, which is available in English, Chinese, Dutch, Italian and German. He has written many white papers for various software vendors. Rick can be contacted by sending an email to rick@r20.nl.

Editor's Note: Rick's blog and more articles can be accessed through his BeyeNETWORK Expert Channel.

Where a data scientist or analyst will find an answer to his quest, is not always that obvious beforehand. For example, when he is looking for the dominant factor influencing sales of particular products, when he tries to find the way to increase the customer care level, or when he tries to establish what the risk level is when car insurances are sold to young people, he may not have any idea what the answer may be. He may not know which data sets are needed to come up with an answer, or which data items he has to study upfront.

Therefore, he needs tools that allow him to freely explore and investigate data. Incorporating more data sets in the analysis should be very easy, there should be no need to specify a goal beforehand, and it must be possible to analyze data in an unguided way.
Besides all the more standard features such as displaying data as bar charts, in dashboards, on geographical maps, the perfect tool for this type of work should support at least the following characteristics:

  • No advance preparations: There should be no need to predefine data structures of the analysis work in advance. If data is available, it should be possible to load it without any preparations, even if it concerns a new type of data.
  • Unguided analysis: Analysts should be able to invoke the analysis technology without having to specify a goal in advance. The exploration technology should allow for analyzing data in an unguided style.
  • Self-service: Analysts should be able to use the analysis techniques without help from IT experts.
Connexica's analysis tool called CXAIR is such a tool. It's natural language interface, venn-diagramming techniques, and visualization features allow users to freely query and analyze data. No cubes or star schemas have to be defined on forehand (which would limit the analysis capabilities).

CXAIR internally organizes all the data using an intelligent index. In fact, internally it's based on text-search technology. This makes it possible to combine and relate data without any form of restriction, which is what analysts need.

Unlike most analysis tools, CXAIR uses search technology that speeds up data analysis. For calculations a mixture of in-memory and on-disk caching is used to analyse massive amounts of data at search engine speeds. All the loaded data resides on the server as it provides a thin client web interface. In other words, no data is loaded on the client machine. Numbers are cached on the server but not text. The fact that CXAIR doesn't cache all the data means that available memory is not a restriction. Cache is used to improve the performance, but large internal memory is not a necessity but will help performance, particularly for ad-hoc calculations.

CXAIR is clearly a representative of a new generation of reporting/analysis tools that users can deploy to freely analyze data. It's a tool for self-service discovery  and investigation of data. It's the tool that many data scientists have been waiting for and is worth checking out.


Posted December 10, 2013 6:52 AM
Permalink | 1 Comment |

1 Comment

Interesting review. Are you planning to do more tool reviews ?

Leave a comment