Blog: Barry Devlin

Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: Are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein.

About the author >

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation.

Barry's interest today covers the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT. These aims, and a growing conviction that the original data warehouse architecture struggles to meet modern business needs for near real-time business intelligence (BI) and support for big data, drove Barry’s latest book, Business unIntelligence: Insight and Innovation Beyond Analytics, now available in print and eBook editions.

Barry has worked in the IT industry for more than 30 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Editor's Note: Find more articles and resources in Barry's BeyeNETWORK Expert Channel and blog. Be sure to visit today!

Hadoop vendors Hortonworks, Cloudera and, most recently, MapR have all amassed substantial cash stashes. This has triggered much speculation about both who will win the lion's share of the big data market and how the elephant will rampage through the data warehousing landscape. Missing from such debate is an understanding of the central role of information management and its automation in the evolution and eventual success of data warehousing.

Although evolving rapidly, the Hadoop software environment is still focused on fundamental database, data manipulation and similar technologies. In data warehousing, the focus long ago shifted to ensuring data quality and consistency, from modeling business requirements all the way through to production delivery and ongoing maintenance. We see this in tools such as WhereScape and Kalido, built by teams who had to develop and support real, ongoing and changing business intelligence needs.

Read the full story at my new blog location: Now... Business unIntelligence.

Posted July 11, 2014 12:43 AM

Although the yellow elephant continues to trample all over the world of Information Management, it is becoming increasingly difficult to say where more traditional technologies end and Hadoop begins.

Actian's (@ActianCorp) presentation at the #BBBT on 24 June emphasized again that the boundaries of the Hadoop world are becoming very ill-defined indeed, as more traditional engines are adapted to run on or in the Hadoop cluster.

The Actian Analytics Platform - Hadoop SQL Edition embeds their existing X100 / Vectorwise SQL engine directly in the nodes of the Hadoop environment. The approach brings to Hadoop the full range of SQL support previously available in Vectorwise. Equally interesting from an architectural viewpoint is the creation and use of column-based, binary, compressed vector files by the X100 engine for improved performance, and the subsequent replication of these files by the Hadoop system. These files also support co-location of data for joins, giving a further performance boost.
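To make the idea of co-locating data for joins concrete, here is a minimal Python sketch. It is not Actian's implementation, and nothing in it reflects the X100 file format; the node count, the `node_for`, `partition` and `colocated_join` helpers and the sample tables are all hypothetical, chosen only for illustration. The sketch assumes that both tables are distributed by hashing the join key, so that rows with equal keys land on the same node and each node can join its own rows without moving data across the cluster.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size, chosen for illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a join key to a node; equal keys always map to the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition(rows, key_column, num_nodes=NUM_NODES):
    """Distribute rows across nodes by hashing the join key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


def colocated_join(left, right, key_column, num_nodes=NUM_NODES):
    """Join two tables node by node, never comparing rows across nodes."""
    left_parts = partition(left, key_column, num_nodes)
    right_parts = partition(right, key_column, num_nodes)
    joined = []
    for node in range(num_nodes):
        # Build a local hash index on the right-hand rows held by this node.
        index = defaultdict(list)
        for r in right_parts.get(node, []):
            index[r[key_column]].append(r)
        # Probe the index with the left-hand rows held by the same node.
        for l in left_parts.get(node, []):
            for r in index.get(l[key_column], []):
                joined.append({**l, **r})
    return joined


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"customer_id": 1, "amount": 40},
    {"customer_id": 2, "amount": 15},
    {"customer_id": 1, "amount": 25},
]

result = colocated_join(customers, orders, "customer_id")
```

Because both tables are partitioned with the same function on the same key, the per-node joins together produce the same rows as a join over the whole data set, which is the property that makes co-location worthwhile.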

This is, of course, the type of integration one would expect from seasoned database developers when they migrate to a new platform. Pivotal's HAWQ has Greenplum technology embedded. It would be surprising if IBM's on-Hadoop Big SQL offering were not based on DB2 knowledge at the very least.

The real point is that the mix and match of functionality and data seen here emphasizes the conundrum I posed at the top of the blog. Where does Hadoop end? And where does "NoHadoop" (well, if we can have NoSQL...) begin? What does this all mean for the evolution of Information Management technology over the coming few years?

Read full post.


Posted June 26, 2014 8:44 AM

In the year since Edward Snowden spoke out on governmental spying, much has been written about privacy but little enough done to protect personal information, either from governments or from big business.

It's now a year since the material gathered by Edward Snowden at the NSA was first published by the Guardian and Washington Post newspapers. In one of a number of anniversary-related items, Vodafone revealed that secret wires are mandated in "about six" of the 29 countries in which it operates. It also noted that, in addition, Albania, Egypt, Hungary, India, Malta, Qatar, Romania, South Africa and Turkey deem it unlawful to disclose any information related to wiretapping or content interception. Vodafone's move is to be welcomed. Hopefully, it will encourage further transparency from other telecommunications providers on governmental demands for information.

However, governmental big data collection and analysis is only one aspect of this issue. Personal data is also of keen interest to a range of commercial enterprises, from telcos themselves to retailers and financial institutions, not to mention the Internet giants, such as Google and Facebook, which are the most voracious consumers of such information. Many people are rightly concerned about how governments--from allegedly democratic to manifestly totalitarian--may use our personal data. To be frank, the dangers are obvious. However, commercial uses of personal data are more insidious, and potentially more dangerous and destructive to humanity. Governments at least purport to represent the people to a greater or lesser extent; commercial enterprises don't even wear that minimal fig leaf.

Take, as one example among many, indoor proximity detection systems based on Bluetooth Low Energy devices such as Apple's iBeacon and Google's rumored upcoming Nearby. The inexorable progress of communications technology--smaller, faster, cheaper, lower power--enables more and more ways of determining the location of your smartphone or tablet and, by extension, you. The operating system or app on your phone requires an opt-in to enable it to transmit your location. However, it is becoming increasingly difficult to avoid opting in, as many apps require it to work at all. More worrying are the systems that, without asking permission, record and track the MAC addresses of smartphones and tablets that poll public Wi-Fi network routers, which all such devices automatically do. (See, for example, this article, subscription required.) The only way to avoid such tracking is to turn off the device's Wi-Fi receiver. On the desktop, the situation is little better, with Facebook last week joining Google and Yahoo! in ignoring browser "do not track" settings.

It would be simple to blame the businesses involved--both the technology companies that develop the systems and the businesses that buy or use the data. They certainly must take their fair share of responsibility, together with the data scientists and other IT staff involved in building the systems. But the reality is that it is we, the general public, who hand over our personal data without a second thought about its possible uses, who must step up and demand real change in the collection and use of such data. This demands significant rethinking in at least two areas.

First is the oft-repeated marketing story that "people want more targeted advertising", reiterated last week by Facebook's Brian Boland. A more nuanced view is provided by Sara M. Watson, a Fellow at the Berkman Center for Internet and Society at Harvard University, in a recent Atlantic article, Data Doppelgängers and the Uncanny Valley of Personalization: "Data tracking and personalized advertising is often described as 'creepy.' Personalized ads and experiences are supposed to reflect individuals, so when these systems miss their mark, they can interfere with a person's sense of self. It's hard to tell whether the algorithm doesn't know us at all, or if it actually knows us better than we know ourselves. And it's disconcerting to think that there might be a glimmer of truth in what otherwise seems unfamiliar. This goes beyond creepy, and even beyond the sense of being watched."

I would suggest that, given the choice between less irrelevant advertising and, simply, less advertising on the Web, many people would opt for the latter, particularly given the increasing invasiveness of the data collection needed to drive allegedly more accurate targeting. Clearly, this latter choice would not be in the interest of the advertising industry, a position that crystallizes in the widespread resistance to limits on data gathering, especially in the United States. An obvious first step in addressing this issue is a people-driven, legally mandated move from opt-out data gathering to a formal opt-in approach. To be really useful, of course, this would need to be preceded by a mass deletion of previously gathered data.

This leads directly to the second area in need of substantial rethinking--the funding model for Internet business. Most of us accept that "there's no such thing as a free lunch". But a free email service, Cloud store or search engine, well, apparently that's eminently reasonable. Of course, it isn't. All these services cost money to build and run, costs that are covered (with significant profits in many cases) by advertising: more of it, and supposedly better targeted via big data and analytics.

There is little doubt that the majority of people using the Internet gain real, daily value from it. Today, that value is paid for through personal data. The loss of privacy seems barely noticed. People I ask are largely uninterested in any possible consequences. However, privacy is the foundation for many aspects of society, including democracy--as can be clearly seen in totalitarian states, where widespread surveillance and destruction of privacy are among the first orders of business. We, the users of the Web, must do the unthinkable: we must demand the right to pay real money for mobile access, search, email and so on in exchange for an end to the tracking of personal data.

These are but two arguably simplistic suggestions to address issues that have been made more obvious by Snowden's revelations. A more complete theoretical and legal foundation for a new approach is urgently needed. One possible starting point is The Dangers of Surveillance by Neil Richards, Professor of Law at Washington University Law, published in the Harvard Law Review a few short months before Snowden spilled at least some of the beans.

Image courtesy Marc Kjerland

Posted June 19, 2014 12:53 AM