Blog: Krish Krishnan

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein

Hello, and welcome to my blog. I would like to use this blog to have constructive communication and exchanges of ideas in the business intelligence community, on topics from data warehousing to SOA to governance and everything under the umbrella of these subjects. To maximize this blog's value, it must be an interactive venue, which means your input is vital to its success. All that I ask of this audience is to treat everybody in this blog community, and the blog itself, with respect. So let's start blogging and share our ideas, opinions and perspectives, and keep the creative juices flowing!

Copyright 2016

2014 - Applications or Data? What Drives Next?
In the next wave of innovation we will see more from the Apps or Applications segment than from the Data segment. The reason is that you can deliver value and insights to end users from all the data, using analytics on dashboards and mashups as modern apps, and the result can be pushed to the recipient platform on demand or on a schedule. Wait a minute, we can do most of this today, so what's new? That's where I would like to draw your attention to the analytics that can be delivered over the cloud to mobile platforms. I'm talking about the dawn of next-generation applications and the platform (at least the first I have played with and been impressed by is EnterpriseWeb) for creating a network application model that can be distributed and supported across the globe. This trend, along with the devices that generate data from personal lifestyle to BYOD at work, will push the drive for Apps and Applications, and the platform will be networked infrastructure and cloud-based deployments. An interesting aspect to think through here is the security that needs to be developed for this deployment; there are great opportunities here for new techniques and platforms designed for these requirements.

I will be visiting this topic periodically and invite your thoughts on the same.
Mon, 06 Jan 2014 09:34:14 -0700
A New Platform For The New World
EnterpriseWeb and its cofounders Dave Duggal and Bill Malyk. This startup has a vision of a next-generation platform in which everything that happens in an organization is an event that can be indexed and searched by any business user at any point in time, and an adaptive platform is needed to provide the scalability and flexibility this demands.

The platform, built to handle the demands of a modern enterprise-class workload, shows a best-in-class architecture combining the Zachman architecture approach with web-style REST APIs and a semantic framework that can provide access to any data within the enterprise. The architecture also boasts a highly secure approach to protecting information, which makes it robust, especially at the enterprise level.

The EnterpriseWeb platform has no compile-time versus run-time distinction, since events occur in an enterprise in real time. The underlying platform supports late-binding features that make it simple to define and design dynamic applications that can deliver value of high scale and magnitude to the end user.

Another very impressive feature of the architecture is the unified repository, which surpassed expectations when it came to performance and scalability. The concept of one architecture for everybody simplifies the many metadata management processes that would otherwise need to be implemented and followed in the enterprise. This approach, in my opinion, is the best way to manage semantic data in an organization of any size.

Enterprise data today means data from both within and outside the enterprise, including social media, web forums, 3rd-party data, competitive intelligence data and more. In this model, integrating data is relatively simple compared to traditional architecture approaches, and one can build adapters for each type of data that needs to be integrated into the enterprise.

The overall goal of this company is to provide a new platform for next-generation architecture, and after seeing the product demo and underlying architecture, it is definitely a robust and scalable architecture that can meet those goals. Feel free to visit the website and learn more. As I learn more about this architecture, I will post details in my blog or articles.
Mon, 15 Jul 2013 22:29:00 -0700
Big Data Tipping Point


Looking forward to seeing you at the event.

Wed, 03 Apr 2013 20:12:52 -0700
Monetizing on Your Data Strategy
The main idea behind applying ROI-based techniques to any in-house program is to provide management with a roadmap of the value of investments in technology and the business benefit they will bring. The one area where we have traditionally struggled to provide a clear and concise point of view is "data", ironically, even though data has been classified and touted as an enterprise asset for many years. How does one really apply ROI to data strategy?

Techniques such as data quality and integrated data architectures help in building a business case for data strategy, but they do not clearly articulate the ROI, as they do not tie business outcomes to the data strategies used in the organization. In order to measure the ROI on data strategy, we need to employ a combination of:
  • projected or predicted ROI from all data programs
  • realized increase in business initiatives
  • increase or decrease in profitability
  • measured customer sentiment

By creating a mashup of these different pieces, we can establish a correlation between data strategy initiatives and ROI, with a measurement of true impact on the business. This type of value-driven mechanism is needed to realize the true ROI on data strategy.
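As a rough illustration of the mashup idea above, the four measurements can be blended into a single composite score. This is only a sketch; the function name, the weights and the sample inputs are all hypothetical, and a real model would calibrate them against business outcomes over several iterations.

```python
# Illustrative sketch: blending the four signals listed above into one
# data-strategy ROI score. Weights and input values are hypothetical.

def data_strategy_roi(projected_roi, initiative_lift, profit_delta, sentiment,
                      weights=(0.4, 0.2, 0.3, 0.1)):
    """Weighted blend of the four measurements, each normalized to 0..1."""
    signals = (projected_roi, initiative_lift, profit_delta, sentiment)
    return sum(w * s for w, s in zip(weights, signals))

# Hypothetical quarter: strong projections, modest profit movement.
score = data_strategy_roi(projected_roi=0.8, initiative_lift=0.5,
                          profit_delta=0.3, sentiment=0.6)
print(round(score, 2))  # 0.4*0.8 + 0.2*0.5 + 0.3*0.3 + 0.1*0.6 = 0.57
```

Tracking this score per program over time is one way to surface the trends and timelines the method calls for.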

Monetization from data strategy efforts can be traced with this method and clearly documented along with trends and timelines. This is not a simple exercise; it needs to be implemented with an acceptable margin of error for the first few iterations until a maturity model can be established.

Once we have this type of model, every data strategy program can be tied to a measurable value, and you can predict the tangible ROI and the timeline for its realization with a higher degree of confidence. This type of practice exists in many organizations, albeit on a tribal scale, and it needs to be enabled and empowered to become an enterprise-level strategy.

This approach brings together all the soft costs and the tangible performance results of the business, and hence it must be treated with the utmost security and governance to protect the competitive advantage of the business. Watch for my expanded whitepaper on this topic in the next week.

Wed, 30 Jan 2013 07:56:22 -0700
Trends Driving Business
Businesses have started looking closely at customer sentiments and behaviors, identifying what drives customers to be loyal to them. Beyond customer loyalty, the trends that have started driving business are Customer Centricity and the associated business transformations. This is where Big Data plays an important role within enterprises today, along with the underlying need to integrate and explore that data and the insights it holds.

While the buzz around Big Data may persist or fade over time, the new trends and directions it has created for businesses to follow are here to stay.
Sun, 25 Nov 2012 10:42:29 -0700
Teradata Announces A New Big Data Appliance
Teradata today announced the new Teradata Aster Big Data Appliance, and yes, as the name spells out, it is the first integration product from the stable, combining Aster and its legendary platform with the bells and whistles of the Teradata platform in terms of management tools and hardware capabilities, along with Hadoop integration through Hortonworks.

The system, in my opinion, is a perfect combination of power and ruggedness with a lot of finesse, and it differs from competitive announcements on these grounds:

1. Aster's patented SQL-H and its legendary scalability architecture
2. 50-plus analytical functions designed to work on large data sets
3. SQL-MapReduce, combining SQL and MapReduce, which was an Aster core strength back in 2009
4. Teradata Viewpoint - a well-known and tested management tool for the platform - extended to include Aster and Hadoop
5. Teradata TVI - very sophisticated hardware support and failure prevention software
6. InfiniBand network interconnect - makes ultra-high-performance connectivity between Aster and Hadoop, as well as scalability, a non-issue
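To make item 3 concrete: a SQL-MapReduce table function consumes ordered rows and emits rows with computed columns, invokable from SQL. The toy Python below mimics what Aster's well-known sessionize function does for clickstream data; it is plain Python for illustration, not Aster's actual API, and the timeout and sample events are made up.

```python
# Toy rendering of the SQL-MapReduce idea: a "table function" that reads
# ordered rows and emits rows with an added column, the way a sessionize
# function tags clickstream events. Not Aster's actual API.

def sessionize(events, timeout=30):
    """events: (user, timestamp) pairs sorted by user, then time.
    Yields (user, timestamp, session_id); a new session starts when the
    gap since the user's previous event exceeds `timeout` seconds."""
    last = {}  # user -> (previous timestamp, current session id)
    for user, ts in events:
        prev_ts, sid = last.get(user, (None, 0))
        if prev_ts is not None and ts - prev_ts > timeout:
            sid += 1                      # gap too large: open a new session
        last[user] = (ts, sid)
        yield (user, ts, sid)

clicks = [("alice", 0), ("alice", 10), ("alice", 100), ("bob", 5)]
print(list(sessionize(clicks)))
# [('alice', 0, 0), ('alice', 10, 0), ('alice', 100, 1), ('bob', 5, 0)]
```

In the appliance, the equivalent function runs in parallel across the cluster and is called inline from a SQL statement, which is the core strength the list refers to.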

For those of you who have been around these platforms, this is definitely not a me-too solution, nor a re-integration or re-branding of existing solutions. Neither is the platform configuration a custom build. From a CIO's perspective, this is where the Teradata Aster Big Data Appliance makes a strong business case: it brings proven and tested technologies in an appliance footprint, with all the configurations required to handle the onslaught of Big Data.

As a Big Data practitioner and a Data Warehouse evangelist, what is truly a forward-thinking architecture from my perspective is the "unified architecture". This is where the rubber meets the road, in my opinion, and I have discussed a similar solution in many seminars and discussions that I continue to have on Big Data.

What this architecture does for you as a user is create two platforms at the same time - one for exploration and mining purposes and the other for analytics and management reporting. You can push workloads across the different architectures and leverage the power of all the pieces of the infrastructure. With the right approach and solution architectures, enterprises can take a giant leap forward on the Big Data journey with these types of platforms.

The engineering efforts of Teradata, Aster and Hortonworks speak for themselves in terms of performance tests. I look forward to more testing and benchmarking results on large data volumes.

While all the technology announcements to date have been very innovation-focused, this one makes a business case at its very introduction, and that is what gets a business executive's attention. The passion and commitment of all the teams involved shows in the appliance performance from current tests and recorded benchmarks.

This is not the last of the appliances; there are more to come, and users will have a greater choice. If I were to compare these appliances to anything in the auto industry today, the Teradata Aster Big Data Appliance is like the next generation of Prius with more bells and whistles, while IBM Pure is like a custom-built, souped-up supercar. Both have their enthusiasts and loyalists, and both address different user needs.

Whichever direction one chooses, this is a dream-come-true era for many solution architects, DBAs and CIOs.
Thu, 18 Oct 2012 21:07:43 -0700
Resurgence of Data Appliances (drop Warehouse & add prefix Big)
What interests me is the legacy being carried through as we move from the Data Warehouse Appliance to Big Data Appliances. The original promise of the Data Warehouse Appliance was a self-contained box of hardware, software and APIs built to handle the rigors of the Data Warehouse; the Big Data Appliance will be a self-contained box configured to handle the demands of Big Data. In the newer situation, the issues will be where all of this fits into the ecosystem and how one handles the workload. Well, that is for the architects to figure out over time. Of course, at the time of this writing, we still have more coming our way in all probability - never say we are done with anything in technology.

Innovation thrives and exists.
Wed, 10 Oct 2012 16:43:54 -0700
Streaming Data Processing in The DW
Let us look at the first question: today the business does not get access to data at the right time. For example, credit card fraud, high-volume trade fraud or any risk management platform can raise a fraud alert based on complex event processing algorithms, but in most cases the results are provided to the business user after the fact, when the data is at rest in the DW. This adds complexity, since someone has to trace through the data end to end to sequence the events and discover patterns; with the added delays of human intervention in the process, the decisions often have a greater financial impact. In the present infrastructure, applying a near real-time processing platform is expensive.

Let us look at the second question - why the DW? The answer lies in the fact that the DW is the most stable data repository and the most integrated analytical data source in any organization. The deeper question is whether this type of processing should be done in the DW or outside it, with just the results being integrated into the DW. The latter is ideal, as the volume of data we add is considerably lower compared to all the data being loaded and processed in the DW.

This is where the next generation of DW comes into play. In the future, the DW will be the analytical hub with an ecosystem of technologies surrounding it. The better-designed platforms for such streaming data analytics include Hadoop, NoSQL and Armanta Inc's platform, to name a few. These platforms can process and digest high volumes of data and provide alerts that can be caught while the transaction is in process. An example of such an implementation is a large credit card vendor providing an additional client verification service and rejecting the transaction if it is found or suspected to be fraudulent.
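A minimal sketch of that in-flight alerting idea follows: a velocity rule that flags a card making too many transactions inside a sliding time window, evaluated while the transaction is still in process rather than after the data lands in the DW. The rule, class name and thresholds are all hypothetical; production complex event processing engines apply many such rules at far larger scale.

```python
# Sketch of catching a suspect transaction while it is in flight.
# The velocity rule and its thresholds are hypothetical.

from collections import defaultdict, deque

class VelocityChecker:
    """Flags a card that makes too many transactions in a short window."""
    def __init__(self, max_txns=3, window=60):
        self.max_txns, self.window = max_txns, window
        self.history = defaultdict(deque)   # card -> recent timestamps

    def check(self, card, ts):
        recent = self.history[card]
        while recent and ts - recent[0] > self.window:
            recent.popleft()                # drop events outside the window
        recent.append(ts)
        return len(recent) > self.max_txns  # True -> hold for verification

checker = VelocityChecker()
stream = [("card1", t) for t in (0, 10, 20, 25)] + [("card2", 30)]
alerts = [card for card, ts in stream if checker.check(card, ts)]
print(alerts)  # ['card1'] - the fourth hit inside 60s triggers the alert
```

Only the alerts, not the full event stream, would then be integrated back into the DW, which is the lower-volume pattern argued for above.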

As businesses transform to a service-centric model, these types of data management needs will become the reality of the day. While there are several niche solutions and initial adopters of these types of platforms, the immediate future looks promising, with businesses ready to lap up the new capabilities, especially on the Big Data platform, where cost is not a barrier to entry.

Streaming Data Processing and Analytics will emerge as a top requirement, from a large market demand and adoption perspective in the world of Data Warehousing.
Tue, 28 Aug 2012 18:41:12 -0700
Kognitio Brief - In-memory Data Store Adoption
Several trends are driving the adoption of in-memory data stores:
  • Big Data
  • Commoditization of Infrastructure
  • Emergence of NoSQL databases

In a recent conversation with Roger Gaskell, the CTO of Kognitio, I had an opportunity to hear their vision for the future. It is interesting to hear that the queries and users coming to Kognitio are looking to adopt the architecture as an in-memory data store alongside their existing EDW or new Big Data environment.

It is not widely known that Kognitio was designed as an in-memory data store in its original avatar as WhiteCross (there are several mentions on the web if you are interested). In the first 20 years of its existence, the company could not bring the in-memory design to the forefront due to cost factors. The UK-based company had been accepted as a second-generation DW Appliance vendor, though going by fellow analysts' records, its customer base was not a publicly available artifact. I have mentioned them in my coverage of the DW Appliance industry since 2007, and they have continued to remain a part of that space.

Roger briefed me on Kognitio's cloud-based model and discussed their recent partnerships with Alteryx and Hortonworks from a big data extension perspective. In a typical Big Data solution architecture, we can see how Kognitio will fit into the big picture. Remember that you need to create a heterogeneous solution to become scalable and flexible.

While Kognitio may not have the deep pockets of another in-memory vendor, SAP, its technology is worth taking a look at, especially if you are in a non-SAP environment.

Roger cited some clients who were already evaluating the technology for a Big Data platform from an in-memory perspective; however, the client names are under NDA and were not disclosed. Going by the recent spate of announcements from Kognitio and its partners, the indications are that the latest trend is in Kognitio's direction. Let us see how this evolution continues.

We will look at other technologies in this series later this week into next.

Mon, 23 Jul 2012 17:31:53 -0700
Actionable Insights?
We initially discovered, on a time value curve, data acquisition, analysis and delivery latencies that were built in due to design and architecture weaknesses. With the advent of technology and tools, we have somewhat overcome those initial latencies, and that reduction has enabled the adoption of Business Intelligence. But adoption has brought a second problem; the situation is something like the sequence below:

  1. Company XYZ implements a DWBI solution
  2. Users develop reports and analytics, and discover performance needs
  3. Performance and architecture are tuned
  4. User adoption of the analytics and reports grows
  5. Executive dashboards are commissioned and provide critical metrics
  6. An executive asks a simple question like - How? or Why? or What? - and this is where we build the secondary latencies

The post-query analysis needed to provide different levels of data to the executive decision maker is where the effective use of Business Intelligence falls off the cliff. The different reports, data points and metrics quickly get transitioned from a BI tool to Excel spreadsheets; further analysis and discovery leads to chaos, and eventually a million-dollar investment loses steam.

This is where we need to turn our attention to new data visualization techniques like mashups. Powerful tools like Tableau, Spotfire and BIS2 offer a wide variety of data mashup techniques. With a mashup, you can create summary and detail views separately and present them in one single view. This is an easier technique, as you think through the scenarios together and present better analytics as an end result. With the advent of newer visualization tools, there are better techniques than drill-down and drill-across for providing actionable insights and analytics. This will reduce post-analysis latencies and drive Business Intelligence success.
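The summary-plus-detail idea can be sketched in a few lines: compute the summary from the same detail rows and hand both to the visualization layer as one payload, so the executive's "How? Why? What?" is answerable in place. This is only an illustration of the data shape behind such a mashup, not any vendor's API, and the field names and figures are made up.

```python
# Sketch of a summary-plus-detail mashup payload: one object carrying
# both levels, so drilling from totals to rows needs no second query.

from collections import defaultdict

detail = [
    {"region": "East", "rep": "Ann", "sales": 120},
    {"region": "East", "rep": "Raj", "sales": 80},
    {"region": "West", "rep": "Mei", "sales": 150},
]

def mashup(rows):
    summary = defaultdict(int)
    for r in rows:
        summary[r["region"]] += r["sales"]
    # One payload: the dashboard renders totals on top and rows beneath,
    # so post-query analysis never has to move to Excel.
    return {"summary": dict(summary), "detail": rows}

view = mashup(detail)
print(view["summary"])  # {'East': 200, 'West': 150}
```

Because the summary is derived from the detail at render time, the two views can never drift apart, which is what keeps the secondary latencies from reappearing.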

Thu, 14 Jun 2012 09:47:06 -0700
The need for Sandbox Environments
With the advent of Analytics and Big Data, the need for Sandbox environments is going to increase dramatically. If you are an SMB- or MMB-sized business, you might be tempted to go the cloud route, and that might be perfectly okay. But large enterprises cannot simply move to a cloud footprint for this exercise, and private clouds and virtualization play a vital role in these situations.

The users of this type of environment will be business SMEs and data scientists, both of whom need an environment where they can play without limitations. Unleash the creative potential and save valuable time by adopting a Sandbox practice. Many successful teams that I know are reaping rich benefits from this environment. While you may not solve a mega problem, you will definitely not create one, by keeping all your development and QA environments clean of the exercises conducted in Sandboxes.
Fri, 01 Jun 2012 09:06:15 -0700
Exploring Big Data - Taxonomies
Taxonomies have long been used as catalog or index creation mechanisms in the world of metadata-driven approaches to data management, and more so in web-driven architectures where you need linked context behind the scenes. The very same taxonomy family can be used to create what we call word clouds or tags from content within Big Data. These tags can be used to create powerful linkages that form a lineage and a graph.

What about data quality? That is the biggest advantage of using taxonomies. When you have spelling errors and language issues, the intrinsic nature of taxonomies lets you apply a margin-of-error calculation and often arrive at a close match.
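That margin-of-error matching can be demonstrated with Python's standard-library difflib standing in for a commercial taxonomy engine; the taxonomy terms and misspelled tags below are invented for illustration.

```python
# Sketch of taxonomy matching with a margin of error: misspelled tags
# pulled from raw content still land on a close taxonomy term.

from difflib import get_close_matches

taxonomy = ["laptop", "desktop", "smartphone", "tablet", "router"]

for tag in ("lapptop", "smart phone", "tablit"):
    match = get_close_matches(tag, taxonomy, n=1, cutoff=0.6)
    print(tag, "->", match[0] if match else "no match")
```

The cutoff parameter is exactly the tolerance knob described above: tighten it and fewer noisy tags are classified; loosen it and more spelling variants land on a term.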

Will this work on all types of big data? From my experiments and learnings, it has worked with almost all types of data that can be deciphered by the human mind. My next article in this channel will focus on this subject.

What can you do with the output from such a discovery? The obvious answer is that you can create a data roadmap with linkages to all data across the enterprise. This is a foundational first step in a Big Data journey.
Thu, 17 May 2012 12:51:20 -0700
Discovering Hidden Nuggets With Taxonomies
In this world of chaos is where we will see the strength of a nascent area of data architecture: taxonomies. Taxonomies themselves have been popular since the early days of Aristotle, and have even been found referenced in the writings of Chinese emperors around 3000 B.C. The word is derived from Greek and means the classification and identification of species. It has been used in biology and language for many years.

Today we have Taxonomies available for every product and subject area in the world, thanks to companies like Wand Inc, Pingar and others. These taxonomies provide a clear metadata lineage and relationship map, which can be directly used on any kind of data to navigate and classify the same. Another benefit of taxonomies is the ability to integrate different types of data about the same subject and create powerful mashups.

The biggest advantage of using taxonomies is your ability to navigate data in its native form without having to transport it to a single location; this removes latencies and requires minimal integration work. A second advantage lies in the fact that you can navigate multiple subject dimensions in one document, video or picture without reprocessing the data multiple times over.

In the land of "big data", you can discover hidden nuggets of information with this approach and then create powerful visualizations using lightweight reporting tools like Tableau or Spotfire.

To learn more on these subjects and their usage, attend EDW 2012, TDWI and Strata Conferences.
Sun, 15 Apr 2012 16:19:21 -0700
Why Analytics Matters
If you stand outside your normal periphery and look at data, Big Data in particular, then in spite of its sheer vastness in volume, velocity, variety, complexity and such, this data is very easily visualized and understood when seen through analytics. Imagine, for example, that your search returns additional recommendations that are all textual in nature; you would have little interest in going back. If instead the data is presented as a statistic with associated confidence factors, it becomes easy for you to shop there repeatedly. This is a simple example for this discussion on why analytics matters.

As you start looking at Big Data, remember to look for analytics too; without the latter, the former will never provide you useful insights.
Tue, 13 Mar 2012 13:01:48 -0700
Visualizing Big Data
While all this buzz is great and early adopters of this new and shiny object have made inroads, at a mass-adoption level Big Data has not yet been embraced by business users. The reason for this is a key aspect: visualization. One of the fundamental things to understand here is that the way we approach Big Data and its modeling and integration is very different from any other data integration exercise. Here is a simple way of looking at the difference:

  • Traditional Data Integration - Business Requirements & Analysis --> Model --> Organize --> Collect --> Integrate --> Store --> Analyze --> Visualize
  • Big Data - Collect --> Store --> Organize --> Visualize --> Analyze --> Business Requirements & Analysis --> Model --> Integrate --> Visualize
As you can see from the flows shown above, Big Data needs visualization before you can settle down to business requirements and post-integration work. You might wonder if this is really a huge problem or whether we are hyping it up; in reality it is a problem, and there are very few options available at this point to offer as solutions. I do not want to classify any "App store" downloads as a robust solution; they are all driven toward a personal, consumer market.

The reason for the current situation can be analyzed in two ways

  • Infrastructure Focus - Web 1.0 and 2.0 focused on infrastructure and the underpinnings; the OSI model and current solutions in the marketplace will definitely point to that.
  • Data Ambiguity and Complexity - Big Data by nature is complex and ambiguous; this requires additional effort, deep SMEs and sometimes Quants to think, integrate and solve. These folks need to be able to analyze the data visually rather than reading machine data or long pages of text. The tools are not there yet for this purpose.
It is simply unfair to ask anyone to look at large sets of raw data to derive value from them. We need tools that can provide an interrogation platform for that data. The tools will, and should, be very ontology- and semantic-focused, as we are not ready to model or integrate the data yet.

The journey ahead is still greenfield; there are a few vendors who are visionaries, and among them a few who are considered leaders. This year we will see a flurry of activity and investments on this side of the house. The frenzy of Big Data has not peaked yet, but that peak is not too far in the future.
Sat, 03 Mar 2012 06:37:28 -0700