Blog: Barry Devlin

As one of the founders of data warehousing back in the mid-1980s, a question I increasingly ask myself over 25 years later is: are our prior architectural and design decisions still relevant in the light of today's business needs and technological advances? I'll pose this and related questions in this blog as I see industry announcements and changes in the way businesses make decisions. I'd love to hear your answers and, indeed, questions in the same vein. Copyright 2018. Tue, 19 Aug 2014 03:28:03 -0700

Datameer offers Hadoop-based data mart

Datameer could be seen as the Business Objects of the Hadoop world. And it's that thought that leads me to data marts.

The data warehouse vs. data mart time-to-value debate is one of the oldest and most divisive in business intelligence, and it clearly applies to Hadoop as well. Hadoop is increasingly being used to integrate data from a wide variety of sources for analysis, raising the question: do you integrate in advance for data quality, or as part of the analysis to reduce time to value? Datameer is clearly a data mart.

And in the big data world, it's certainly not the only data mart type of offering. What's different about Datameer is that it has been around for nearly 5 years and has an impressive customer base.

At an architectural level, we should consider how the quality vs. timeliness, mart vs. warehouse trade-off applies in the world of big data. Read more on this at my new blog location.

Tue, 19 Aug 2014 03:28:03 -0700
Eating the elephant called Hadoop

Hadoop vendors Hortonworks, Cloudera and, most recently, MapR have all amassed substantial cash stashes. This has triggered much speculation about both who will win the lion's share of the big data market and how the elephant will rampage through the data warehousing landscape. Missing from such debate is an understanding of the central role of information management and its automation in the evolution and eventual success of data warehousing.

Although showing rapid evolution, the Hadoop software environment is still focused on fundamental database, data manipulation and similar technologies. In data warehousing, the focus long ago shifted to ensuring data quality and consistency, from modeling business requirements all the way through to production delivery and ongoing maintenance. We see this in tools such as Wherescape and Kalido, built by teams who had to develop and support real, ongoing and changing business intelligence needs.

Read the full story at my new blog location: Now... Business unIntelligence.

Fri, 11 Jul 2014 00:43:40 -0700
Link - Hadoop: the Third Wave Breaks

Although the yellow elephant continues to trample all over the world of Information Management, it is becoming increasingly difficult to say where more traditional technologies end and Hadoop begins.


Actian's (@ActianCorp) presentation at the #BBBT on 24 June emphasized again that the boundaries of the Hadoop world are becoming very ill-defined indeed, as more traditional engines are adapted to run on or in the Hadoop cluster.

The Actian Analytics Platform - Hadoop SQL Edition embeds their existing X100 / Vectorwise SQL engine directly in the nodes of the Hadoop environment. The approach offers the full range of SQL support previously available in Vectorwise on Hadoop. Just as interesting architecturally is the creation and use of column-based, binary, compressed vector files by the X100 engine for improved performance, and the subsequent replication of these files by the Hadoop system. These files support co-location of data for joins, providing a further performance boost.
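To make the performance argument concrete, here is a minimal, purely illustrative sketch (plain Python, nothing to do with Actian's actual X100 code) of why column-oriented vector storage pays off for analytic queries: an aggregation touches only the column it needs, rather than every field of every record.

```python
# Illustrative sketch (not Actian's implementation): row store vs column
# store for a simple analytic aggregation. The data is invented.

rows = [{"id": i, "region": "EU", "amount": float(i)} for i in range(1000)]

# Row-oriented: every whole record must be touched to sum one field.
row_sum = sum(r["amount"] for r in rows)

# Column-oriented: the same data held as parallel arrays ("vectors");
# the aggregation scans only the single column it needs, which is also
# far friendlier to compression and CPU caches.
columns = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}
col_sum = sum(columns["amount"])

assert row_sum == col_sum
```

In a real engine the column vectors would also be compressed and processed in batches, but even this toy version shows why analytic workloads favor columnar layouts.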

This is, of course, the type of integration one would expect from seasoned database developers when they migrate to a new platform. Pivotal's HAWQ has Greenplum technology embedded. It would be surprising if IBM's on-Hadoop Big SQL offering is not based on DB2 knowledge at the very least.

The real point is that the mix and match of functionality and data seen here emphasizes the conundrum I posed at the top of the blog. Where does Hadoop end? And where does "NoHadoop" (well, if we can have NoSQL...) begin? What does this all mean for the evolution of Information Management technology over the coming few years?

Read full post.

Thu, 26 Jun 2014 08:44:11 -0700
Analytics, Big Data and Protecting Privacy

In the year since Edward Snowden spoke out on governmental spying, much has been written about privacy but little enough done to protect personal information, either from governments or from big business.

It's now a year since the material gathered by Edward Snowden at the NSA was first published by the Guardian and Washington Post newspapers. In one of a number of anniversary-related items, Vodafone revealed that secret wires are mandated in "about six" of the 29 countries in which it operates. It also noted that, in addition, Albania, Egypt, Hungary, India, Malta, Qatar, Romania, South Africa and Turkey deem it unlawful to disclose any information related to wiretapping or content interception. Vodafone's move is to be welcomed. Hopefully, it will encourage further transparency from other telecommunications providers on governmental demands for information.

However, governmental big data collection and analysis is only one aspect of this issue. Personal data is also of keen interest to a range of commercial enterprises, from telcos themselves to retailers and financial institutions, not to mention the Internet giants, such as Google and Facebook, which are the most voracious consumers of such information. Many people are rightly concerned about how governments--from allegedly democratic to manifestly totalitarian--may use our personal data. To be frank, the dangers are obvious. However, commercial uses of personal data are more insidious, and potentially more dangerous and destructive to humanity. Governments at least purport to represent the people to a greater or lesser extent; commercial enterprises don't even wear that minimal fig leaf.

Take, as one example among many, indoor proximity detection systems based on Bluetooth Low Energy devices such as Apple's iBeacon and Google's rumored upcoming Nearby. The inexorable progress of communications technology--smaller, faster, cheaper, lower power--enables more and more ways of determining the location of your smartphone or tablet and, by extension, you. The operating system or app on your phone requires an opt-in to enable it to transmit your location. However, it is becoming increasingly difficult to avoid opting in, as many apps require it to work at all. More worrying are the systems that, without asking permission, record and track the MAC addresses of smartphones and tablets that poll public Wi-Fi network routers, as all such devices automatically do. (See, for example, this article, subscription required.) The only way to avoid such tracking is to turn off the device's Wi-Fi receiver. On the desktop, the situation is little better, with Facebook last week joining Google and Yahoo! in ignoring browser "do not track" settings.
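To illustrate just how little is needed to turn passive Wi-Fi polling into surveillance, here is a hypothetical sketch--the probe log, MAC addresses and venues are all invented--of how sightings of a single device across locations become a movement profile, with no opt-in anywhere.

```python
# Hypothetical sketch of passive Wi-Fi tracking. A device's MAC address
# appears in probe-request logs at different venues; linking those
# sightings reconstructs the owner's movements. All data here is invented.

probe_log = [
    ("aa:bb:cc:11:22:33", "coffee_shop",   "09:05"),
    ("aa:bb:cc:11:22:33", "mall",          "12:40"),
    ("de:ad:be:ef:00:01", "mall",          "12:41"),
    ("aa:bb:cc:11:22:33", "train_station", "17:55"),
]

def movement_profile(log):
    """Group sightings by MAC address into an ordered list of locations."""
    profile = {}
    for mac, location, _time in log:
        profile.setdefault(mac, []).append(location)
    return profile

profiles = movement_profile(probe_log)
# One persistent identifier is enough to reconstruct a day's movements:
assert profiles["aa:bb:cc:11:22:33"] == ["coffee_shop", "mall", "train_station"]
```

This is precisely why later mobile operating systems began randomizing MAC addresses in probe requests, though at the time of writing no such defense was common.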

It would be simple to blame the businesses involved--both the technology companies that develop the systems and the businesses that buy or use the data. They certainly must take their fair share of responsibility, together with the data scientists and other IT staff involved in building the systems. But the reality is that it is we, the general public, who hand over our personal data without a second thought about its possible uses, and it is we who must step up and demand real change in the collection and use of such data. This demands significant rethinking in at least two areas.

First is the oft-repeated marketing story that "people want more targeted advertising", reiterated again last week by Facebook's Brian Boland. A more nuanced view is provided by Sara M. Watson, a Fellow at the Berkman Center for Internet and Society at Harvard University, in a recent Atlantic article Data Doppelgängers and the Uncanny Valley of Personalization: "Data tracking and personalized advertising is often described as 'creepy.' Personalized ads and experiences are supposed to reflect individuals, so when these systems miss their mark, they can interfere with a person's sense of self. It's hard to tell whether the algorithm doesn't know us at all, or if it actually knows us better than we know ourselves. And it's disconcerting to think that there might be a glimmer of truth in what otherwise seems unfamiliar. This goes beyond creepy, and even beyond the sense of being watched."

I would suggest that given the choice between less irrelevant advertising or, simply, less advertising on the Web, many people would opt for the latter, particularly given the increasing invasiveness of the data collection needed to drive allegedly more accurate targeting. Clearly, this latter choice would not be in the interest of the advertising industry, a position that crystallizes in the widespread resistance to limits on data gathering, especially in the United States. An obvious first step in addressing this issue is a people-driven, legally mandated move from opt-out data gathering to a formal opt-in approach. To be really useful, of course, this would need to be preceded by a widespread mass deletion of previously gathered data.

This leads directly to the second area in need of substantial rethinking--the funding model for Internet business. Most of us accept that "there's no such thing as a free lunch". But a free email service, Cloud store or search engine--well, apparently that's eminently reasonable. Of course, it isn't. All these services cost money to build and run, costs that are covered (with significant profits in many cases) by advertising--ever more of it, supposedly better targeted via big data and analytics.

There is little doubt that the majority of people using the Internet gain real, daily value from it. Today, that value is paid for through personal data. The loss of privacy seems barely noticed. People I ask are largely uninterested in any possible consequences. However, privacy is the foundation for many aspects of society, including democracy--as can be clearly seen in totalitarian states, where widespread surveillance and destruction of privacy are among the first orders of business. We, the users of the Web, must do the unthinkable: we must demand the right to pay real money for mobile access, search, email and so on in exchange for an end to tracking personal data.

These are but two arguably simplistic suggestions to address issues that have been made more obvious by Snowden's revelations. A more complete theoretical and legal foundation for a new approach is urgently needed. One possible starting point is The Dangers of Surveillance by Neil Richards, Professor of Law at Washington University Law, published in the Harvard Law Review a few short months before Snowden spilled at least some of the beans.

Image courtesy Marc Kjerland
Business unIntelligence -- Thu, 19 Jun 2014 00:53:23 -0700
Reining in the Internet of Things

Thoughts on the societal impact of the Internet of Things inspired by a unique dashboard product.

A newcomer to the BBBT on 2nd May, Kerry Gilger, founder of VisualCue, took the members by storm with an elegant, visually intuitive and, to me at least, novel approach to delivering dashboards. VisualCue is based on the concept of a tile that represents a set of metrics as icons colored according to their state relative to defined threshold values. The main icon in the tile shown here represents the overall performance of a call center agent, with the secondary icons showing other KPIs, such as total calls answered, average handling time, sales per hour worked, customer satisfaction, etc. Tiles are assembled into mosaics, which function rather like visual bar charts that can be sorted according to the different metrics, drilled down to related items and displayed in other formats, including tabular numbers.
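The tile idea can be sketched in a few lines of code. This is my own illustrative reconstruction, not VisualCue's implementation; the KPI names and threshold values are invented.

```python
# Illustrative sketch of the "tile" concept: each KPI is mapped to a
# traffic-light color by comparing its value to defined thresholds.
# KPI names and thresholds are invented, not VisualCue's.

def cue_color(value, good, warn, higher_is_better=True):
    """Return a color for a metric given 'good' and 'warn' thresholds."""
    if not higher_is_better:
        # Flip the comparison for metrics where lower values are better.
        value, good, warn = -value, -good, -warn
    if value >= good:
        return "green"
    if value >= warn:
        return "yellow"
    return "red"

# (value, good threshold, warn threshold, higher_is_better)
agent_kpis = {
    "calls_answered":    (112, 100, 80,  True),   # more is better
    "avg_handle_time_s": (260, 240, 300, False),  # less is better
    "csat":              (4.1, 4.5, 3.5, True),
}

tile = {k: cue_color(v, g, w, hib) for k, (v, g, w, hib) in agent_kpis.items()}
```

A mosaic would then simply be a list of such tiles, sortable on any one of the colored metrics.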

The product seems particularly useful in operational BI applications, with Kerry showing examples from call centers, logistics and educational settings. The response of the BBBT members was overwhelmingly positive. @rick_vanderlans described it as "revolutionary technology", while @gildardorojas asked "why we didn't have before something as neat and logical?" @marcusborba opined "@VisualCue's capability is amazing, and the data visualization is gorgeous!"

So, am I being a Luddite, or even a curmudgeon, to have made the only negative comments of the call? My concern was not about the product at all, but rather around the power it unleashes simply by being so good at what it does. Combine this level of ease-of-use in analytics with big data and, especially, data from the Internet of Things, and we take a quantum leap from measurement to invasiveness, from management to Big-Brother-like control.

Each of the three example use cases described by Gilger provided wonderful examples of real and significant business benefit; but, taken together, they also opened up appalling possibilities of abuse of privacy, misappropriation of personal information and disempowerment of the people involved. I'll briefly explore the three examples, realizing that in the absence of the full story, I'm undoubtedly imagining some aspects. Nor is this about VisualCue (who tweeted that "Privacy is certainly a critical issue! We focus on presenting data that an organization already has--maybe we make it obvious") or the companies using it; it's meant to be a warning that we who know some of the possibilities--positive and negative--offered by big data analytics must consider in advance the unintended consequences.

Detailed monitoring of call center agents' performance is nothing new. Indeed, it is widely seen as best practice and key to improving both individual and overall call center results. VisualCue, according to Gilger, has provided outstanding performance gains, including one center where agents in competition with peers have personally sought out training to improve their own metrics, something that is apparently unheard of in the industry. Based on past best practices and detailed knowledge of where the agent is weak, VisualCue can provide individually customized advice. In a sense, this example illustrates the pinnacle of such use of monitoring data and analytics to drive personnel performance. But within it lie the seeds of its own destruction. As the agent's job is broken down further and further into repeatable tasks, each measurable by a different metric, human innovation and empathy are removed and the job is prepared for automation. In fact, a 2013 study puts at 99% the probability that certain call center jobs, particularly telemarketing, will soon be eliminated by technology.

The old adage "what you can't measure, you can't manage" is at the heart of traditional BI. In an era when data was scarce and often incoherent, this focus makes sense. However, applying it to all aspects of life today is, to me, ethically problematical. The example of monitoring the entire scope of an educational institution in a single dashboard--from financials through administration to student performance--is a case where our ability to analyze so many data points leads to the illusion that we can manage the entire process mechanically. The Latin root of "educate" means "to draw forth" from the student, the success of which simply cannot be gauged through basic numerical measures, and is certainly not correlated with the business measures of the institution.

The final example of tracking the operational performance of a waste management company's routes, trucks and drivers emphasizes our growing ability to measure and monitor the details of real life minute by minute. By continuously tracking the location and engine management signals from its trucks, the dashboard created by this company enabled it to make significant financial savings and improvements to its operational performance. However, it also enables supervisors to drill into the ongoing behavior of the company's drivers: deviations from planned routes, long stops with the engine running, extreme braking, exceeding the speed limit, etc. While presumably covered by their employment contract, such micromanagement of employees is at best disempowering and at worst open to abuse by increasingly all-seeing supervisors. Of much greater concern is the fact that these sensors are increasingly embedded in private automobiles and that such tracking capability is already being applied without owners' consent to smartphones. As far back as a year ago, Euclid Analytics had already tracked about 50 million devices in 4,000 locations according to a New York Times blog.

I'm grateful to Kerry Gilger for sharing the use cases that inspired my speculations above. Of course, my point is beyond the individual companies involved and products used. At issue is the range of social and ethical dilemmas raised by the rapid advances in sensor technology, data gathered and the power of analytic software. Our every action online is already monitored by the likes of Google and Facebook for profit and by organizations like the NSA allegedly for security and crime prevention. The level of monitoring of our physical lives is now rapidly increasing. Anonymity is rapidly disappearing, if not already extinct. Our personal privacy rights are being usurped by the data gathering and analysis programs of these commercial and governmental organizations, as eloquently described by Shoshana Zuboff of Harvard Business and Law schools in a recent article in Frankfurter Allgemeine Zeitung.

It is imperative that those of us who have grown up with and nurtured business intelligence over the past three decades--from hardware and software vendors, to consultants and analysts, to BI managers and implementers in businesses everywhere--begin to consider deeply the ethical, legal and societal issues now being raised. We must then take action to guide the industry and society appropriately, through the development of new codes of ethical behavior and use of information, and through input to national and international legislation.

Internet of Things -- Sun, 04 May 2014 06:26:47 -0700
Automating the Data Warehouse (and beyond)

In an era of "big data this" and "Internet of Things that", it's refreshing to step back to some of the basic principles of defining, building and maintaining data stores that support the process of decision making... or data warehousing, as we old-fashioned folks call it. Kalido did an excellent job last Friday of reminding the BBBT just what is needed to automate the process of data warehouse management. But, before the denizens of the data lake swim away with a bored flick of their tails, let me point out that this matters for big data too--maybe even more so. I'll return to this towards the end of this post.

In the first flush of considering a BI or analytics opportunity in the business and conceiving a solution that delivers exactly the right data needed to address that pesky problem, it's easy to forget the often rocky road of design and development ahead. More often forgotten, or sometimes ignored, is the ongoing drama of maintenance. Kalido, with their origins as an internal IT team solving a real problem for the real business of Royal Dutch Shell in the late '90s, have kept these challenges front and center.

All IT projects begin with business requirements, but data warehouses have a second, equally important, starting point: existing data sources. These twin origins typically lead to two largely disconnected processes. First, there is the requirements activity often called data modeling, but more correctly seen as the elucidation of a business model, consisting of the function required by the business and the data needed to support it. Second, there is the ETL-centric process of finding and understanding the existing sources of this data, figuring out how to prepare and condition it, and designing the physical database elements needed to support the function required.

Most data warehouse practitioners recognize that the disconnect between these two development processes is the origin of much of the cost and time expended in delivering a data warehouse. And they figure out a way through it. Unfortunately, they often fail to recognize that each time a new set of data must be added or an existing set updated, they have to work around the problem yet again. So, not only is initial development impacted, but future maintenance remains an expensive and time-consuming task. An ideal approach is to create an integrated environment that automates the entire set of tasks from business requirements documentation, through the definition and execution of data preparation, all the way to database design and tuning. Kalido is one of a small number of vendors who have taken this all-inclusive approach. They report build effort reductions of 60-85% in data warehouse development.

Conceptually, we move from focusing on the detailed steps (ETL) of preparing data to managing the metadata that relates the business model to the physical database design. The repetitive and error-prone donkey-work of ETL, job management and administration is automated. The skills required in IT change from programming-like to modeling-like. This has none of the sexiness of predictive analytics or self-service BI. Rather, it's about real IT productivity. Arguably, good IT shops always create some or all of this process- and metadata-management infrastructure themselves around their chosen modeling, ETL and database tools. Kalido is "just" a rather complete administrative environment for these processes.
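As a rough illustration of what managing the metadata rather than hand-coding the donkey-work means, consider this sketch--my own invention, not Kalido's approach--in which a tiny business model held as metadata is used to generate the repetitive DDL and load SQL. All table and column names are invented.

```python
# Illustrative sketch of metadata-driven warehouse automation: a small
# business model is kept as metadata, and the repetitive DDL and ETL SQL
# are generated from it rather than hand-coded. Names are invented.

model = {
    "target": "dw.customer",
    "source": "crm.clients",
    # (target column, target type, source column)
    "columns": [
        ("customer_id", "INTEGER", "client_no"),
        ("name", "VARCHAR(100)", "full_name"),
        ("country", "VARCHAR(2)", "country_code"),
    ],
}

def generate_ddl(m):
    """Generate the target table definition from the metadata."""
    cols = ",\n  ".join(f"{t} {typ}" for t, typ, _ in m["columns"])
    return f"CREATE TABLE {m['target']} (\n  {cols}\n);"

def generate_load(m):
    """Generate the source-to-target load statement from the metadata."""
    targets = ", ".join(t for t, _, _ in m["columns"])
    sources = ", ".join(s for _, _, s in m["columns"])
    return (f"INSERT INTO {m['target']} ({targets})\n"
            f"SELECT {sources} FROM {m['source']};")
```

When a source changes, only the metadata is edited and the SQL is regenerated--which is exactly why the maintenance savings compound over the warehouse's lifetime.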

Which brings me finally back to the shores of the data lake. As described, the data lake consists of a Hadoop-based store of all the data a business could ever need, in its original structure and form, and into which any business user can dip a bucket and retrieve the data required without IT blocking the way. However, whether IT is involved or not, the process of understanding the business need and getting the data from the lake into a form that is useful and usable for a decision-making requirement is exactly identical to that described in my third paragraph above. The same problems apply. Trust me, similar solutions will be required.


Data warehouse -- Mon, 17 Mar 2014 05:33:30 -0700
Big Data, the Internet of Things and the Death of Capitalism? Part 5

Parts 1, 2, 3, 4 and 4A of this series explored the problem as I see it. Now, finally, I consider what we might do if my titular question actually makes sense.

To start, let's review my basic thesis. Mass production and competition, facilitated by ever improving technology, have been delivering better and cheaper products and improving many people's lives (at least in the developed world) for nearly two centuries. Capital, in the form of technology, and people--labor--work together in today's system to produce goods that people purchase using earnings from their labor. As technology grows exponentially better, an ever greater range of jobs is open to displacement. When technology displaces some yet to be determined percentage of labor, this system swings out of balance; there are simply not enough people with sufficient money to buy the products made, no matter how cheaply. We have not yet reached this tipping point because, throughout most of this period, the new jobs created by technology have largely offset the losses. However, employment trends in the past 10-15 years in the Western world suggest that this effect is no longer operating to the extent that it was, if at all.

In brief, the problem is that although technology produces greater wealth (as all economists agree), without its transfer to the masses through wages paid for labor, the number of consumers becomes insufficient to justify further production. The owners of the capital assets accumulate more wealth--and we see this happening in the increasing inequality in society--but they cannot match the consumption of the masses. Capitalism, or perhaps more precisely, the free market then collapses.

Let's first look at the production side of the above equation. What can be done to prevent job losses outpacing job creation as a result of technological advances? Can we prevent or put a damper on the great hollowing out of middle-income jobs that is creating a dumbbell-shaped distribution of a few highly-paid experts at one end and a shrinking swathe of lower-paid, less-skilled workers at the other? Can (or should) we move to address the growing imbalance of power and wealth between capital (especially technologically based) and labor? Let's be clear at the start, however, turning off automation is not an option I consider.

My suggestions, emerging mainly from the thinking discussed earlier, are mainly economic and social in nature. An obvious approach is to use the levers of taxation--as is done in many other areas--to drive a desired social outcome. We could, for example, reform taxation and social charges on labor to reduce the cost difference between using people and automating a process. In a similar vein, shifting taxation from labor to capital could also be tried. I can already hear the Tea Party screaming to protect the free market from the damn socialist. But, if my analysis is correct, the free market is about to undergo, at best, a radical change, if employment drops below some critical level. Pulling these levers soon and fairly dramatically is probably necessary, although such an approach can probably only delay the inevitable. Another approach is for industry itself to take steps to protect employment. Mark Bonchek, writing in a recent Harvard Business Review blog, describes a few "job entrepreneurs" who maximize jobs instead of profits (but still make profits as well), including one in the Detroit area aimed at creating jobs for unemployed auto workers.

Moving from the producer's side to the consumer's view, profit aside, why did we set off down the road of the Industrial Revolution? To improve people's daily lives, to lessen the load of hard labor, to alleviate drudgery. The early path was not clear. Seven-day labor on the farm was replaced by seven-day labor in the factory. But, by the middle of the last century, working hours were being reduced in the workplace and in the home, food was cheaper and more plentiful; money and time were available for leisure. In theory, the result should have been an improvement in the human condition. In practice, the improvement was subverted by the mass producers. They needed to sell ever more of the goods they could produce so cheaply that profit came mainly through volume sales. Economist Victor Lebow's 1955 proclamation of "The Real Meaning of Consumer Demand" sums it up: "Our enormously productive economy demands that we make consumption our way of life... that we seek our spiritual satisfaction and our ego satisfaction in consumption... We need things consumed, burned up, worn out, replaced and discarded at an ever-increasing rate". Of course, some part of this is human nature, but it has been driven inexorably by advertising. We've ended up in the classic race to the bottom, even to the extent of products being produced with ever shorter lifespans to drive earlier replacement. Such consumption is becoming increasingly unsustainable as the world population grows, finite resources run out and the energy consumed in both production and use drives increasing climate change. As the president of Uruguay, Jose Mujica, asked of the Rio+20 Summit in 2012, "Does this planet have enough resources so seven or eight billion can have the same level of consumption and waste that today is seen in rich societies?"

My counter-intuitive suggestion here, and one I have not seen raised by economists (surprisingly?), is to ramp down consumerism, mainly through a reinvention of the purposes and practices of advertising. Reducing over-competition and over-consumption would probably drive interesting changes in the production side of the equation, including reduced demand for further automation, lower energy consumption, product quality being favored over quantity, higher savings rates by (non-)consumers, and more. Turning down the engine of consumption could also enable changes for the better in the financial markets, reducing the focus on quarterly results in favor of strategically sounder investment. Input from economists would be much appreciated.

But, let's wrap up. The title of this series asked: will automation through big data and the Internet of Things drive the death of capitalism? Although some readers may have assumed that this was my preferred outcome, I am more of the opinion that capitalism and the free market need to evolve rather quickly if they are to survive and, preferably, thrive. But, this would mean some radical changes. For example, a French think-tank, LH Forum, suggests the development of a positive economy that: "reorients capitalism towards long-term challenges. Altruism toward future generations is a much more powerful incentive than [the] selfishness which is supposed to steer the market economy". Other fundamental rethinking comes from British/Scottish historian, Niall Ferguson, who takes a wider view of "The Great Degeneration" of Western civilization. In a word, this is a topic that requires broad, deep and urgent thought.

For my more IT-oriented readers, I suspect this blog series has taken you far from basic ground. For this, I do not apologize. As described in Chapter 2 of "Business unIntelligence", I believe that the future of business and IT is to be joined at the hip. The biz-tech ecosystem declares that technology is at the heart of all business development. Business must understand IT. IT must be involved in the business. I suggest that understanding the impact of automation on business and society is a task for IT strategists and architects as much, if not more, as it is for economists and business planners.

Image from: All elephant photos in the earlier posts are my own work!

Internet of Things -- Tue, 11 Mar 2014 06:47:55 -0700
Big Data, the Internet of Things and the Death of Capitalism? Part 4

As seen in Parts 1, 2 and 3, mainstream economists completely disagree with my thinking. Am I alone? It turns out that I'm not...

Surely I'm not the first to think that today's technological advances have the potential to seriously disrupt the current market economy? I eventually found that Martin Ford, founder of an unnamed software development company in Silicon Valley, wrote a book in 2009: "The Lights in the Tunnel: Automation, Accelerating Technology and the Economy of the Future". It turns out that his analysis of the problem is exactly the same as mine... not to mention more than four years ahead of me! Ford explains it very simply and compellingly: "the free market economy, as we understand it today, simply cannot work without a viable labor market. Jobs are the primary mechanism through which income--and, therefore, purchasing power--is distributed to the people who consume everything the economy produces. If at some point, machines are likely to permanently take over a great deal of the work now performed by human beings, then that will be a threat to the very foundation of our economic system."

For the more visually minded, Ford graphs capability to perform routine jobs against historical time for both humans and computer technology. I reproduce this simple graph here. Ford posits that there was a spurt in human capability after the Industrial Revolution as people learned to operate machines, but that this has now largely leveled off. As a general principle, this seems reasonable. Computer technology, on the other hand, is widely accepted (via Moore's Law) to be on a geometric growth path in terms of its general capability. Unless one or both trends change dramatically, the cross-over of these two lines is inevitable. When technology becomes more efficient than a human at any particular job, competitive pressure in the market will ensure that the former replaces the latter. At some stage, the percentage of human jobs and their associated income that is automated away will be enough to disrupt the consumption side of the free market. Even increased uncertainty about income is often sufficient to cause consumers to increase savings and reduce discretionary spending. This behavior occurs in every recession and affects production in a predictable manner; production is cut back, often involving further job losses. A positive feedback cycle sets in: reduced jobs drive reduced spending, which drives further job losses. In today's cyclical economy, the trend is eventually reversed, sometimes through governmental action, or at times by war--World War II is credited by some for the end of the Great Depression. However, the graph above shows no such cyclical behavior: this is a one-way, one-time transition.
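Ford's graph can be captured in a toy model. The parameters below are invented purely for illustration--the point is only that, given flat human capability and a constant doubling period for technology, the crossover is a matter of arithmetic, not speculation.

```python
# Toy model of the graph described above. All numbers are invented for
# illustration: human capability on routine work is held flat, while
# machine capability doubles every 18 months (the Moore's Law cadence
# cited in the text). Whatever the starting gap, the curves must cross.

HUMAN_CAPABILITY = 100.0       # flat line (arbitrary units)
DOUBLING_PERIOD_YEARS = 1.5    # one doubling every 18 months

def machine_capability(years, start=1.0):
    """Machine capability after 'years' of geometric growth."""
    return start * 2 ** (years / DOUBLING_PERIOD_YEARS)

def crossover_year(start=1.0):
    """First whole year in which machines match the flat human line."""
    years = 0
    while machine_capability(years, start) < HUMAN_CAPABILITY:
        years += 1
    return years

# Even a 100x starting gap closes in about a decade at this doubling rate:
assert crossover_year(start=1.0) == 10
```

Changing the starting gap or the doubling period shifts the crossover date, but never removes it--which is exactly the one-way transition Ford describes.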

Of course, the 64-million-dollar questions (assuming you agree with the reasoning above) are: where are we on this timeline, and how steep is the geometric rise in technological capability? It is likely that both aspects differ depending on the job involved. For some jobs, we are far from the inflection point in the technology curve, while others are much closer. For information-based jobs, the rate of growth in capability of computers may be very close to the Moore's Law formulation: a doubling in capacity every 18 months. Physical automation may grow more slowly. But the outcome is assured: the lines will cross. Ford felt that in some areas we were getting close to the inflection point in 2009. The presumed approximate quadrupling of technological ability since then has not yet, however, tipped us over the edge of the job cliff, although few would dispute the extent of the technological advances in the interim. Of course, Ford--and I--may be wrong.

It would seem that the hypothesis put forward by Ford should be amenable to mathematical modeling, yet I have found only one attempt to do so, in an academic paper, "Economic growth given machine intelligence", published in 1999 by Robin Hanson. Given the title, I hoped that this paper might provide some usable mathematical models capable of answering my questions. Unfortunately, I was disappointed. My mathematical skills are no longer up to the equations involved! More importantly, however, Hanson's framing assumptions seem strong on historical precedent (surely favoring continuation of the current situation) and fail to address the fundamental issue (in my view) that consumption is directly tied to income and its distribution among the population. Furthermore, Hanson has taken a largely dismissive attitude to Ford's thesis, as demonstrated by his advice to Ford in an online exchange: "he needs to learn some basic economics".

So, the hypothesis that I and, previously, Ford have put forward so far remains both intuitively reasonable and formally unproven. For now, I ask: can any data scientist or economics major take on the task of producing a useful model of our basic hypothesis?
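In that spirit, here is the crudest possible starting point: a toy Python iteration of the feedback loop described above, in which automation removes a share of jobs each period and the resulting loss of consumption removes more. Every parameter is my own arbitrary assumption, offered only to show the shape such a model might take:

```python
# Minimal toy model of the jobs -> consumption -> jobs feedback loop.
# All parameters are arbitrary illustrative assumptions.

def simulate(periods=10, employment=1.0, automation_rate=0.05, feedback=0.5):
    """Each period, automation removes a share of jobs; the resulting drop
    in consumption removes a further, feedback-scaled share of jobs."""
    history = [employment]
    for _ in range(periods):
        after_automation = employment * (1 - automation_rate)
        consumption_drop = employment - after_automation
        employment = max(after_automation - feedback * consumption_drop, 0.0)
        history.append(employment)
    return history

print(f"employment after 10 periods: {simulate()[-1]:.2f}")  # → 0.46
```

A serious model would, of course, need empirically grounded rates and a mechanism for job creation; this sketch only demonstrates how quickly a modest automation rate compounds once the feedback term is included.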

In the final part of the series, I look at what a collapse of the current economic order might look like and ask what, if anything, might be done to avert it.

For a broader and deeper view of the business and technological aspects of this topic, please take a look at my new book: Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data.  

A bonus Part 4A follows!

Business unIntelligence Tue, 25 Feb 2014 03:34:43 -0700
Big Data, the Internet of Things and the Death of Capitalism? Part 3 Part 1 introduced the elephant in the room: a few of the ways that current technological advances affect jobs and society.  Part 2 explored the economics. Here I dissect some further recent economic thinking.

As we saw in my previous blog, the topic of the impact of technology on jobs has generated increased interest in the past few months. Tyler Cowen's Sept. 2013 "Average is Over" was one of the earlier books (of this current phase) that addressed this area. The pity is that, in the long run, he avoids the core of the problem as I see it: if technology is replacing current jobs and failing to create new ones in at least equal numbers, who will be the consumers of the products of the new technology?

In the early part of his book, Cowen builds the hypothesis that technology has been impacting employment for some decades now, creating a bifurcation in the job market. As the jobs that can be partially or fully automated expand in scope and number, people with strong skills that support, complement or expand technology will find relatively rich pickings. Those with more average technologically oriented skills do less well. If their skills are relatively immune to technological replacement--from building laborers to masseurs--they will continue to be in some demand, although increased competition will likely push down wages. However, the key trend is that middle-income jobs susceptible to physical or informational automation have been disappearing and continue to do so. 60% of US job losses in the 2007-2009 recession (choose your dates and name) were in the mid-income categories, while three-quarters of the jobs added since then have been low-income. The switch to low-paying jobs dates back to 1999; it's a well-established trend.

The largely unmentioned category in Cowen's book, beyond a nod to the figures on page one, is the growing numbers of young, long-term unemployed or underemployed, even among college graduates. I suggest we are actually seeing a trifurcation in the jobs market, with the percentages of this third category growing in the developed world and the actual numbers--due to population growth--exploding in the emerging economies. I don't have easy access to the statistics. However, given that the income of the first two categories, whether directly or indirectly, must support the third (as well as the young, the elderly and the ill), parts of the developed world seem pretty close already to the relative percentages at which the system could break down. Current ongoing automation of information-based jobs may well tip the balance.

Cowen's prescription for the future is profoundly depressing, even in the US context, at which the book is squarely aimed. To paraphrase, tax the wealthy and capital more. Just a little. Reduce benefits to the poor. Maybe a little more. And let both the poorly paid and unemployed move to cheaper accommodation far from expensive real estate. " a few parts of... the warmer states... [w]e would build some 'tiny homes'... about 400 square feet and cost in the range of $20,000 to $40,000. ... very modest dwellings, as we used to build in the 1920s. We also would build some makeshift structures, similar to the better dwellings you might find in a Rio de Janeiro favela. The quality of the water and electrical infrastructure might be low by American standards, though we could supplement the neighborhood with free municipal wireless (the future version of Marie Antoinette's famous alleged phrase will be 'Let them watch internet!')."

If your response is "look what happened to Marie Antoinette", Cowen declares that, given the age demographic of the US, revolution is an unlikely outcome. Consider, however, that this is a US view. In much of the developing world, the great transition from Agriculture to Manufacturing is only now underway. Some of that is supported by off-shoring from the West, some by local economic and population growth. However, robotic automation is already on the increase even in those factories. Foxconn, long infamous as Apple's iPhone hardware and assembly "sweatshop", announced as far back as 2011 that it planned on installing up to 1 million robots in its factories by this year. A year ago this week, they announced: "We have canceled hiring entry-level workers, a decision that is partly associated with our efforts in production automation."

Last week, the Wall Street Journal reported that Foxconn is working with Google with the aim of bringing Google's latest robotics acquisitions to the assembly lines to reduce labor costs. Google also benefits: it gains a test bed for its purported new robotic operating system as well as access to Foxconn's mechanical engineering skills. According to PCWorld, Foxconn's chairman, Terry Gou, told investors in June last year: "We have over 1 million workers. In the future we will add 1 million robotic workers. Our [human] workers will then become technicians and engineers." The question of how many of the million human workers would become technicians and engineers seemingly went unaddressed. Echoing comments in Part 2 of this series, Foxconn are also reported to be looking to next-shore automated manufacturing to the US. To complete the picture, Bloomberg BusinessWeek meanwhile reported that Foxconn was also off-shoring jobs from China to Indonesia. A complex story, but the underlying driver is obvious: reduce labor, and then distribution, costs by any and every means possible.

Cowen's comments above about favelas, Marie Antoinette and revolutions should thus be seen in a broader view. The Industrial era middle class boom of the West may pass quickly through the emerging economies or perhaps bypass the later arrivals entirely. Shanty towns are already widespread in emerging economies; they house the displaced agricultural workers who cannot find or have already lost employment in manufacturing.  The age demographic in these countries is highly compatible with revolution and migration / invasion. For reasons of geography, the US may be less susceptible to invasion, but Europe's borders are under increasing siege. The Roman Empire's latter-day bread and circuses were very quickly overrun by the Vandals and the Visigoths.

The numbers and percentages of new jobs vs. unemployed worldwide are still up for debate among the economists. But their belief, echoed by the larger multinationals, that the consumer boom in the West of the 20th century will be replicated in the emerging economies stands perhaps on shaky ground.

For a broader and deeper view of the business and technological aspects of this topic, please take a look at my new book: Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data

Part 4 follows.

Internet of Things Tue, 18 Feb 2014 07:40:07 -0700
Big Data, the Internet of Things and the Death of Capitalism? Part 2 Part 1 introduced the elephant in the room: a few of the ways that current technological advances affect jobs and society. Here, I probe deeper into the economics.

We in the world of IT and data tend to focus on the positive outcomes of technology. Diagnoses of illness will be improved. Wired reports that, in tests, IBM Watson's successful diagnosis rate for lung cancer is 90%, compared to 50% for human doctors according to WellPoint's Samuel Nussbaum. We celebrate our ability to avoid traffic jams based on smartphone data collection. We imagine how we can drive targeted advertising. A trial (since discontinued) in London in mid-2013 of recycling bins that monitored passing smartphone WiFi addresses to eventually push product is but one of the more amusing examples. Big data can drive sustainable business, reducing resource usage and carbon footprint both within the enterprise and to the ends of its supply chains. Using a combination of big data, robotics and the IoT, Google already has prototype driverless cars on the road. Their recent acquisitions of Boston Dynamics (robotics), Nest (IoT sensors), and DeepMind Technologies (artificial intelligence), to name but a few, indicate the direction of their thinking. The list goes on. Mostly, we see the progress but are largely blind to the downsides.

The elephant in the room seems particularly perplexing to politicians, economists and others charged with taking a macro-view of where human society is going. Perhaps they feel some immunity to the excrement pile that is surely coming. But surely an important question is what will happen in economic and societal terms when significant numbers of workers in a wide range of industries are displaced fully or partially by technology, both physical and information-based? To a non-economist like me, the equation seems obvious: fewer workers means fewer consumers means less production means fewer workers. A positive feedback loop of highly negative factors for today's business models. Sounds like the demise of capitalism. Am I being too simplistic?

A recent article in the Economist, "The Future of Jobs: The Onrushing Wave", explores the widely held belief among economists that technological advances drive higher living standards for the population at large. The belief is based on historical precedent, most particularly the rise of the middle classes in Europe and the US during the 20th century. Anyone who opposes this consensus risks being labeled a Luddite, after the craft workers in the 19th century English textile industry who attacked the machines that were destroying their livelihoods. And perhaps that is why the Economist, after exploring the historical pattern at some length and touching on many of the aspects I've mentioned earlier, concludes rather lamely, in my opinion: "[Keynes's] worry about technological unemployment was mainly a worry about a 'temporary phase of maladjustment' as society and the economy adjusted to ever greater levels of productivity. So it could well prove. However, society may find itself sorely tested if, as seems possible, growth and innovation deliver handsome gains to the skilled, while the rest cling to dwindling employment opportunities at stagnant wages."

"Sorely tested" sounds like a serious understatement to me. The Industrial Revolution saw a mass movement of jobs from Agriculture to Manufacturing; the growth of the latter largely offset the shrinkage in Agricultural employment due to mechanization. In the Western world, the trend in employment since the 1960s has been from Manufacturing to Services. Services has compensated for the loss of Manufacturing jobs to both off-shoring and automation. But, as Larry Summers, former US Treasury Secretary, mentioned in the debate on "Rethinking Technology and Employment" at the World Economic Forum in Davos in January, the percentage of 25-54 year old males not working in the US will have risen from 5% in 1965 to an estimated near 15% in 2017. This trend suggests strongly that the shrinkage in Manufacturing is not being effectively taken up elsewhere. The Davos debate itself lumbered to a soporific draw on the motion that technological innovation is driving jobless growth. Prof. Erik Brynjolfsson, speaking with Summers in favor of the motion, offered that "off-shoring is just a way-station on the road to automation", a theme echoed by the January 2014 McKinsey Quarterly "Next-shoring: A CEO's guide". Meanwhile, Brynjolfsson's latest book, with Andrew McAfee, seems to limit its focus to the quality of work in the "Second Machine Age" rather than its actual quantity.
As in the tale of the blind men and the elephant, it seems that we are individually focusing only on small parts of this beast.

For a broader and deeper view of the business and technological aspects of this topic, please take a look at my new book: Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data.  

Part 3 follows.
Internet of Things Mon, 10 Feb 2014 11:11:36 -0700
Big Data, the Internet of Things and the Death of Capitalism? Part 1 There's an elephant in the room. No, not the friendly, yellow, Hadoop variety.

The large, smelly, excrement-dumping kind that no one wants to notice, never mind talk about. Allow me to name it: the combination of big data and the Internet of Things (IoT) is reinventing the nature of work. Much of employment as we know it is disappearing. And that signals the end of the consumer society we've painstakingly built over the past half century or so.

Am I being alarmist? Let's think about where big data and the IoT are going and how they are intersecting with the world of people earning and spending money. Include all the automation, connectivity and analytics that we are doing and considering. And you'll soon see what I mean. In part 1 of this short series, I look at the current state of play for big data and the Internet of Things.

Let's start with the physical. The Things of the IoT are acquiring ever more sensing abilities, as well as an increasing level of automation capabilities. Think piece-part robots talking via the Internet and you wouldn't be far wrong. Look also at the developments in robotics in the past few years. In a recent O'Reilly Radar article, Rodney Brooks, former Panasonic Professor of Robotics at MIT and founder of Rethink Robotics, says "We're at the point with production robots where we were with mobile robots in the late 1980s... The advances are accelerating dramatically." The Rethink Robotics videos show some agonizingly slow-motion action, but it doesn't take Clayton Christensen to recognize a potential disruptive innovation here. The process about to be disrupted is the manual labor involved in a whole variety of repetitive but loosely bounded activities on assembly, packaging and similar production lines. The robots' manufacturers promote the idea that these new robots can work safely alongside humans, but the cost equation speaks more to replacement of manual labor than coexistence. Even at this early stage of development, Rethink Robotics' Baxter sells from $25,000. The impact on employment seems fairly obvious.

Brooks, quoted in the same Radar article by Glen Martin, also targets robots for health care, especially care of the elderly, suggesting that demand for such workers is outstripping supply. "Again, the basic issue is dignity. Robots can free people from the more menial and onerous aspects of elder care, and they can deliver an extremely high level of service, providing better quality of life for seniors." And, given that robots are more likely to replace staff than support them, even less of the desperately needed social interaction that alleviates loneliness and disconnection among the elderly. Although you shouldn't discount YDreams Robotics' forthcoming desk lamp, which can sense your mood and call a friend or relative via smartphone to tell them you're in need of cheering up. The scenarios get creepier, right up to and including Spike Jonze's new movie, her. But from the viewpoint of this blog, the bottom line is another set of jobs under threat.

Information-based jobs are no safer, also under threat from multiple directions. Simpler tasks are automated. Sports scores and stock market prices, once the work of cub journalists, can already be spun into passable stories by automated processes. More complex tasks are dissected into component parts, with the simpler parts automated and the human role often restricted to repetitive or supervisory activities. This is much the same process that took place in industrial factories in the 19th century. Craft workers who, skillfully and lovingly, made complete finished articles were displaced by production workers whose activities were increasingly optimized to produce simple piece parts in the cheapest and most soulless manner. Pattern recognition and machine learning have seen enormous advances as training data sets become ever larger. Human translators are being displaced by Google Translate and its ilk. According to Frey and Osborne's estimate in their 2013 The Future of Employment, "47 percent of total US employment is in the high risk category, meaning that associated occupations are potentially automatable over some unspecified number of years, perhaps a decade or two. It shall be noted that the probability axis can be seen as a rough timeline, where high probability occupations are likely to be substituted by computer capital relatively soon." The probabilities produced by their model run from 99% for telemarketers, freight agents, tax preparers and more to 0.3% for emergency management directors and first-line supervisors. Frey and Osborne cover both physical and information-based jobs, but it's interesting to note how many information-based jobs are in the high probability range. It's also worth noting how much of the "services industry", that darling of developed Western economies, falls in the higher ranges of probability.
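Frey and Osborne's figures can, as they suggest, be read as a rough timeline. A small Python sketch using only the handful of probabilities quoted above makes the banding concrete (the 0.7 high-risk threshold follows the paper's categorization; the code itself is merely illustrative):

```python
# Automation probabilities quoted above, from Frey and Osborne (2013).
probabilities = {
    "telemarketer": 0.99,
    "freight agent": 0.99,
    "tax preparer": 0.99,
    "emergency management director": 0.003,
}

# The paper labels occupations with probability above 0.7 as "high risk",
# i.e. likely to be substituted by computer capital relatively soon.
high_risk = sorted(job for job, p in probabilities.items() if p >= 0.7)
print(high_risk)  # → ['freight agent', 'tax preparer', 'telemarketer']
```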

So, there is a problem here. Current technological trends will very clearly impact employment and society. The IT industry is confident that the net effect will be positive. Many economists seem to be fence-sitting... but that's fairly normal. However, we'll take a closer look in Parts 2 and 3.

For a broader and deeper view of the business and technological aspects of this topic, please take a look at my new book: Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data. Internet of Things Tue, 04 Feb 2014 12:05:52 -0700
BI for Everyone? Business unIntelligence emerged from my questioning of the fundamental assumptions underpinning BI in all its forms, from the enterprise data warehouse to big data analytics. The belief I'm questioning in this post is that the target audience of BI is everyone in the business. This springs from the very reasonable premise that BI should be used more widely throughout the organization than it currently is. But, somewhere along the way, I don't know when, the idea emerged that BI must be used by everybody in the enterprise if we are to gain full business benefit. BI vendors thus lament the low penetration of their tools, agonizing over how to make them simpler, more appealing. Data marts, echoing the bright and breezy WalMarts and Kmarts of retail, were introduced in the mid-1990s as an alternative to the dark and dismal data warehouse. Today, self-service BI, self-service analytics, self-service everything will solve world BI hunger.

I'm sorry, for me, that dog don't hunt.

This post was triggered by Paxata's mission statement, which aims to "Empower EVERY PERSON in the enterprise to find and prepare analytical information..." In truth, Paxata is building a very impressive and powerful adaptive data preparation tool. But, their mission statement almost derailed my interest. Really, each and every person in the organization should be finding and preparing analytical information? From the janitor to the CEO? I'll return to Paxata later in this post, but first, let me check if anybody else feels the same level of discomfort with this concept as I do.

Let's start with self-service on a personal level. I'm pretty comfortable with self-service when I visit a computer retailer. Put me in a perfume store, and I need an assistant--quickly. My wife has the opposite experience. I conclude from this (and many other examples) that self-service works only if the self-server has (i) sufficient understanding of what she's trying to do and (ii) considerable knowledge of what is available, its characteristics and where it is in the store. My experience in BI is that business users typically satisfy the former condition but fail regularly on the latter. In my original 1988 data warehousing paper (or contact me if you want a copy), I distinguished between dependent and independent users. It was only the second group who satisfied both conditions above. At the simplest level, we may divide business users into two such groups, as I did back then. In reality, of course, it's more subtle. Some business users understand statistics, many don't. Some are more attuned to using information; others rely (often correctly) on their intuitions. In chapter 9 of Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data I discuss many of the ways in which decisions are influenced by many things other than information. So, it's horses for courses at a personal level: self-service BI only works for some of the people some of the time.

Organizationally, the idea that everyone is involved in decision making that demands analytical support is contrary to all concepts of division of labor and responsibility. The above-mentioned janitor has no incentive to analyze his cleaning performance. There still exist myriad tasks in every organization that are menial, performed by rote. Managers of such areas have, since the time of Frederick Winslow Taylor in the early 1900s, been analyzing such work and finding ways to streamline it. While we recognize today that production line workers do have knowledge to contribute to process improvement, such knowledge is tacit, and unlikely, in my view, to be quantified by the workers themselves. Rather, that elucidation depends on another type of skill, that of independent users, power users or, as they like to be called today, data scientists. These are people whose skills and interests bridge the business/IT divide.

Which brings me back to Paxata and the concept of the biz-tech ecosystem, the symbiotic relationship between business and IT demanded by the speed and span of modern business. Finding and preparing data has long been the principal bottleneck of all BI. It is a type of work that really requires a balanced mix of business acumen and IT skill. Its exploratory and ad hoc nature defies the old development approach of business requirements statements thrown over the fence to IT.

What's required is an exploratory environment for diverse data types that seamlessly blends business acumen and IT skills. This is precisely what Paxata does. Based more on the data content than on the metadata (field or column names and types), business analysts explore the actual contents of data sources and their inter-relationships in a highly visual manner that uses color and other cues to direct attention to aspects of interest identified by heuristics within the tool itself. Data can be simply cleansed and transformed, split and joined. The interface is deliberately spreadsheet-like, another comfort zone for the business analyst. But, unlike a spreadsheet, all actions are recorded and tracked; they can be rolled back and they can be repeated elsewhere, vital aspects of the level of data governance needed to make this solution capable of being put into production. It's hard to describe in words; you really need to try it or see the demo. See further descriptions from Joseph A. di Paolantonio and Jon Reed.

It's tools like this that make the biz-tech ecosystem real, that blend business and IT data skills and knowledge for easier application to real business needs. They enable people from the business side of the enterprise to enhance their IT abilities, and vice versa, removing the barriers to data exploration and preparation that stand between the information and full business value. They make it easier for more people to become independent or power users, data scientists, or whatever they choose to call themselves, making their jobs easier, faster and more productive. That is the vital and visionary work that Paxata (and other vendors like them) are doing in this era of exploding data varieties and information volumes.

But I still believe that this role and these tools will never be for everyone. What do you think?

News flash: Business unIntelligence is now available as an ebook on Kindle and on Safari.

Business unIntelligence Thu, 23 Jan 2014 02:29:05 -0700
Predictive Analytics - the Power and the Gory More years ago than I care to remember, I took a strong interest in predicting behavior. Besotted by a woman who blew hot and cold, I turned to astrology seeking to know the future of the relationship. The prognosis was far from positive... and ultimately correct.

Modern thinking is highly skeptical of such esoteric arts. Correctly so in the case of many practitioners. But it has far less reason, in my opinion, to dismiss much of the underlying rationale and value of these approaches. That, however, is a topic for another forum. My point here is that I did not believe the prediction; I didn't want to. And that's part of a rather gory truth of any predictive tool.

As we flip the calendar to 2014, it is clear that the audience for predicting people's future behavior has grown far beyond the ragged ranks of lonely lovers. Marketers and advertisers, once content with psychological and sociological generalizations, have become increasingly besotted by big data and the promise it offers of knowing the customer so intimately that her individual and specific future behaviors can be predicted. And, probably of more value, influenced. This, of course, is the power and the glory of predictive analytics. But, as already noted, this blog's title actually says "gory". And here, in detail, is why.

While writing in Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data about some of the uses of big data way back in February 2012, I came across Charles Duhigg's New York Times piece on Target's ability to predict not only the pregnancy of female (obviously) customers but also their likely delivery dates. Over the New Year break, I finally found time to read Duhigg's book, The Power of Habit, in which that story first appeared. My point in my own book, and in a related blog, Death by a thousand analytics, was that using big data in this manner is both a clear invasion of privacy and likely to creep people out. But Duhigg's book deals with a much broader topic--human habitual behavior and unconscious thought patterns--that has far deeper implications for predictive analytics and decision support in general.

The message of The Power of Habit is fairly easily summarized. A very significant proportion of our thinking, decision-making and action at personal, organizational and societal levels is entirely habitual. Given a particular cue (which can be anything from a thought to an action), a routine set of behaviors is immediately and automatically engaged, which provides an anticipated reward. Simply put: cue => routine => reward. This mechanism operates, both in its setup and execution, in one of the most primitive areas of the brain, the basal ganglia, a structure that derives in evolutionary terms from the earliest chordates. It is, in fact, a basic survival mechanism that enables us to undertake a wide range of necessary life-preserving activities, from preparing food to getting home, with minimal expenditure of energy and attention. This, of course, initially freed the brain to deal with novel, life-threatening situations and more recently for higher-brain functions such as empathy and reasoning. However, as anyone who has ever tried to give up biting their nails or mid-afternoon snacking can testify, changing habitual patterns can be very difficult indeed, even those which are counterproductive or even downright dangerous.

This very primitive mental operation of cue-routine-reward has at least two important consequences for predictive analytics.

For marketing, the drive will be to move increasingly to an understanding of specific, individual habits and their cues, in order to create opportunities to influence behavior. This is more difficult than it sounds, as such cues are often difficult even for the individuals themselves to identify. Similarly, rewards are often far more subtle and diverse than might be imagined. The story of how Pepsodent toothpaste was sold to an American public famed for poor oral hygiene in the early 1900s has become a classic marketing exemplar of how to create a habit and a sales success. But, according to Duhigg, the actual cue and reward involved were long mistaken. Furthermore, although most Western people are by now inured to mass manipulation by advertising, the much more personal and private knowledge used for such individual influencing may raise resistance to such approaches. From the point of view of privacy, I am of the opinion that it most certainly should.

In the case of personal and organizational decision making support, habitual and other unconscious patterns are a serious impediment to efforts to become "data-driven". My unfortunate lovelorn experience of disbelieving a predicted but undesired outcome is a basic behavior pattern well-recognized in psychology but seldom mentioned in business intelligence. Habitual beliefs and behaviors at an organizational level are so deeply embedded and invisible to participants that they stymie efforts even to recognize problems, never mind take effective action.

The bottom line is that no matter how much data you gather or how elaborate the models you generate, Business unIntelligence operates finally, in full glory or gory fullness, in the minds of your customers and business users.

Business unIntelligence Mon, 06 Jan 2014 03:40:50 -0700
Sex, Toys, and Big Data In this last blog of 2013, I want to make some serious points that should be front and center (he said coyly) for all analytic, BI and big data professionals next year. In fact, they should be already. And if not, I urge you to use some reflection time over the coming holiday to make them so.

Let's start with sex... toys. Consider the value of deep analytic big data monitoring to enforce the following two verified laws in the USA: (i) you may not have more than two dildos in the same house in Arizona and (ii) it's illegal to own more than six dildos in Texas. (I refuse to speculate on why Texans may be less susceptible to moral degeneracy than citizens of Arizona.) I suspect most of us are mildly amused by such legislation. Many have probably long taken comfort in the belief that it is unenforceable; after all, who can imagine the local cops breaking into your home on suspicion that you are hoarding sex toys? But wait. Haven't the authorities in the US and other countries been rummaging through your digital drawers for years now, and without any just cause for suspicion? Hasn't at least one supermarket chain collated and analyzed data to determine if its (female) customers had recently become pregnant? Haven't two technology companies applied for patents to spy on (sorry, gather behavioral data to enable better targeted advertising to) you via your living room TV? How far will government and business go in mining our personal lives, our sexuality, our bodies, our social circles to (allegedly) protect us, cure us or sell us more (unnecessary) stuff?

The sad truth is that we have lost most of our privacy already, having entered into a Faustian pact to share, both knowingly and unwittingly, the details of our daily lives. That knowing part--the Facebook likes and Google +1s--may be said to represent a conscious tradeoff by the person sharing between a loss of privacy and a perceived increase in social capital or useful contextual information. Even the acceptance that our smartphones report our location minute by minute is driven by a consensual belief that we may be offered a coupon for a nearby coffee shop at any moment. The payoff for ultimate traceability. Apple's iBeacon allows newer iPhones, or Android phones with Bluetooth Low Energy, to track their--and your--position in space with centimeter precision. Which aisle in the supermarket are you in? What about some very specific retail therapy recommendations? These, and other soon to emerge toys, have the addictive quality of sex to many of the current generation of CMOs and proponents of big analytics.

However, the smartphone is but the pioneer species of the internet of things, the ultimate in small toys for big data boys. As sensors become ever smaller, cheaper and ever more powerful, the utopian vision is of systems that respond instantly to our needs, that anticipate our very expectations. We are promised houses that know we're home and adjust lighting and heating accordingly. Wrist bands that sense when we awake in the morning, so that the coffee can be brewed, or know that our elderly aunt has not moved for the past hour and may have slipped and broken her hip. Software to the rescue, hardware to alleviate yet another chore. According to Computerworld, Gartner predicts that half of all BI implementations will incorporate machine data from the internet of things by 2017. Research conducted by EMA and 9sight during the summer of 2013 suggested that the future is already here; machine-generated data overtook human-sourced information as big data sources among our respondents. And the more details of our mundane activities that become available in the internet of things for analysis and correlation, the more specific and identifiable our individual patterns of behavior become and the more difficult it is to retain any degree of anonymity. IBM's 5 in 5 this year includes "a digital guardian that will protect you online" within 5 years. But it's mostly about security rather than privacy, and already years too late.

Realists may like to consider how useful such tools would have been to the Stasi in the German Democratic Republic (East Germany) or the Soviet Union's KGB in the second half of the last century. Would the world be as we know it if they had? The more dystopian among us conjure up visions of Franz Kafka's The Trial or George Orwell's 1984, arriving only 30 years late. But, it is not my intention to propose neo-Luddism. The big data jinni is already well out of the bottle. Despite the silly-bugger games we are currently playing with it, big data does hold the possibility to understand and address many of the most intractable environmental, climatic and social issues we face today. Although, as I've mentioned repeatedly in "Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data", our success in acting upon, never mind solving, them will depend far more on our very human intention towards what we desire than on all the data, software and hardware we apply.

So, as we look to 2014, I urge all of us to keep three resolutions in our minds, a subtle matrix underpinning every business need we evaluate and every design decision we make:

  1. Understand and account for the relationship between big data and the traditional core business information long created and processed in existing operational and informational systems; each complements and contextualizes the other

  2. In gathering and analyzing big data, consider how its use can impact personal privacy, especially the ways in which such data can be combined with other big data sets to compromise anonymity

  3. Perhaps most importantly, consider the universal/global impact of the project: how does it contribute to or mitigate the real-world issues of environmental degradation, over-production and consumption, economic instability, and more. In short, how does it support the best in humanity and the world?

These, especially the last, may sound like utopian dreams. But, consider the almost unimaginable power being released by the unprecedented growth and interconnection of information currently underway. We are unleashing a potential for good or ill far greater than that created by a small team of physicists in Los Alamos during the Manhattan Project.

I wish you all a Happy Christmas and a Peaceful and Prosperous New Year, within whatever tradition you choose to celebrate this time of year. Business unIntelligence Wed, 18 Dec 2013 09:15:06 -0700
SOA for Process and Data Integration Traditionally, BI has been a process-free zone. Decision makers are such free thinkers that suggesting their methods of working can be defined by some stodgy process is generally met with sneers of derision. Or worse. BI vendors and developers have largely acquiesced; the only place you see process mentioned is in data integration, where activity flow diagrams abound to define the steps needed to populate the data warehouse and marts.

I, on the other hand, have long held - since the turn of the millennium, in fact - that all decision making follows a process, albeit a very flexible and adaptive one. The early proof emerges in operational BI (or decision management, as it's also called) where decision making steps are embedded in fairly traditional operational processes. As predictive and operational analytics has become increasingly popular, this intermingling of informational and operational is such that these once distinctly different business behaviors are becoming indistinguishable. A relatively easy thought experiment then leads to the conclusion that all decision making has an underlying process.

I was also fairly sure at an early stage that only a Service Oriented Architecture (SOA) approach could provide the flexible and adaptive activities and workflows required. I further saw that SOA could (and would need to) be a foundation for data integration as the demand for near real-time decision making grew. As a result, I have been discussing all this at seminars and conferences for many years now. But every time I'd mention SOA, the sound of discontent would rumble around the room. Too complex. Tried it and failed. And, more recently, isn't that all old hat now with cloud and mobile?

All of this is by way of introduction to a very interesting briefing I received this week from Pat Pruchnickyj, Director of Product Marketing at Talend, who restored my faith in SOA as an overall approach and in its practical application! Although perhaps best known for the open source ETL (extract, transform and load) and data integration tooling it first introduced in 2006, Talend today takes a broader view and offers data-focused solutions, such as ETL and data quality, as well as open source application integration solutions, such as an enterprise service bus (ESB) and message queuing. These various approaches are united by common metadata, typically created and managed through a graphical, workflow-oriented tool, Talend Open Studio.

So, why is this important? If you follow the history of BI, you'll know that many well-established implementations are characterized by complex and often long-running batch processes that gather, consolidate and cleanse data from multiple internal operational sources into a data warehouse and then to marts. This is a model that scales poorly in an era where vast volumes of data are coming from external sources (a substantial part of big data) and analysis is increasingly demanding near real-time data. File-based data integration becomes a challenge in these circumstances. The simplest approach may be to move towards ever smaller files running in micro-batches. However, the ultimate requirement is to enable message-based communication between source and target applications/databases. This requires a fundamental change in thinking for most BI developers. So a starting point of ETL and an end point of messaging, both under a common ETL-like workflow, makes for an easier migration path. Developers can begin to see that a data transfer/cleansing service is conceptually similar to any business activity also offered as a service. And the possibility of creating workflows combining operational and informational processes emerges naturally to support operational BI.
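
The shift from file-based batches to message-based services under a common workflow can be sketched in code. This is a hedged illustration of the idea only; it assumes nothing about Talend's actual tooling, and every name below is my own invention.

```python
# Sketch: the same cleansing step exposed as a shared service, driven
# either by a whole batch of records (file-style) or by a message queue,
# one record at a time (message-style). Names are illustrative only.

import queue

def cleanse(record):
    """A toy data-quality service: trim fields and drop empty records."""
    cleaned = {k: v.strip() for k, v in record.items() if v and v.strip()}
    return cleaned or None

def run_batch(records):
    """File-style integration: process a whole extract at once."""
    return [c for c in (cleanse(r) for r in records) if c]

def run_from_queue(q):
    """Message-style integration: same service, record by record."""
    out = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        c = cleanse(msg)
        if c:
            out.append(c)
    return out

batch = [{"name": " Ann "}, {"name": "  "}]
q = queue.Queue()
for r in batch:
    q.put(r)

# Both transports yield the same cleansed output, which is the point:
# the transformation is a shared service, not tied to file or message.
print(run_batch(batch) == run_from_queue(q))  # -> True
```

The design choice this illustrates is exactly the one in the text: because `cleanse` knows nothing about its transport, the move from micro-batches to messaging changes the orchestration, not the transformation logic.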

Is this to say that ETL tools are a dying species? Certainly not. For some types and sizes of data integration, a file-based approach will continue to offer higher performance or more extensive integration and cleansing function. The key is to ensure common, shared metadata (or as I prefer to call it, context-setting information, CSI) between all the different flavors of data and application integration.

Process, including both business and IT aspects, is the subject of Chapter 7 of "Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data".

Sunset Over Architecture (SOA) image: Data integration Thu, 12 Dec 2013 03:46:17 -0700