Self-Service Analytics Environment for Next-Generation Insights Delivery
by Rathin Das
Originally published May 30, 2014
Consumers are socially active, mobile-enabled, information-hungry and more value conscious than ever before. We create roughly 2.5 exabytes of data (one exabyte = 1,000,000,000 GB) each day, and this volume is projected to grow at 40% every year. Every interaction at every touchpoint with your brand yields a wealth of insight and intelligence on how best to build relevant, more intimate customer relationships.
Self-Service Analytics Environment
A self-service analytics environment (SSAE) is an environment created for internal business users to perform self-service business and predictive analytics. Business users, especially the super users (data scientists), do not know all the data they will need until they start their analytical modeling, and they require far more flexibility in data processing for their iterative modeling process. Because these data scientists are much more data savvy than traditional BI users, the role of IT is transitioning to a supporting one, ceding control to the data scientists. In addition, the past approach of enterprises to internal information management has resulted in a proliferation of application-specific reporting platforms with redundant or inconsistent data and rising costs. Adopting a dynamic data mining environment resolves this to a great extent. See Figure 1.
Figure 1: Self-Service Analytics Environment
Data Integration Tier
At a leading financial services company that implemented a SSAE, the data integration tier was designed specifically to meet next-generation business performance as well as customer and sales analytics requirements, incorporating the enterprise data warehouse (EDW), marketing database, profile database, CRM data, extracts and spreadsheets from clients, and third-party data syndicators.
One of the key considerations for implementing this tier is to provide access options for business users via ad hoc query tools, data services and local files. This approach, however, means prerequisite security and privacy requirements must be met. As seen in the above example of the financial services giant, this environment leverages data discovery and analytics tools capable of multi-stage querying to perform the heavy lifting on data, applying multiple transformations and creating the analytical flat files for downstream analytics. These user-friendly tools enable the analyst community to perform these activities without depending on IT for custom extracts or other extract, transform and load (ETL) processes. Data virtualization, another key consideration, helps business users add data into their analyses without requiring it to be physically integrated or to go through the systems development life cycle (SDLC) process, thereby driving gains in time and efficiency.
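The multi-stage querying described above can be sketched in miniature. This is a hedged illustration, not the company's actual pipeline: the table and column names are hypothetical, and SQLite stands in for whatever data platform the tier actually uses. Stage one transforms raw data inside the database; stage two exports the analytical flat file for downstream modeling.

```python
import csv
import sqlite3

# Hypothetical mini-warehouse; all table and column names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE transactions (customer_id INT, amount REAL, channel TEXT);
    INSERT INTO transactions VALUES
        (1, 120.0, 'web'), (1, 80.0, 'branch'), (2, 300.0, 'web');
""")

# Stage 1: transform -- aggregate raw transactions per customer.
con.execute("""
    CREATE TEMP TABLE customer_summary AS
    SELECT customer_id,
           SUM(amount) AS total_spend,
           SUM(CASE WHEN channel = 'web' THEN amount ELSE 0 END) AS web_spend
    FROM transactions
    GROUP BY customer_id
""")

# Stage 2: export the analytical flat file for downstream analytics.
rows = con.execute(
    "SELECT * FROM customer_summary ORDER BY customer_id").fetchall()
with open("customer_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "total_spend", "web_spend"])
    writer.writerows(rows)
```

The point is that an analyst can chain transformations and produce a modeling-ready extract without a separate IT-managed ETL job.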
BI/Analytics Sandbox Tier
The analytics sandbox tier is the back-end or server-based development platform for the business users (power and super users). It enables creation of recurring reports and model refinement as well as ad hoc analytics. This development platform needs to support enormous data mashups, data exploration, staging of interim data sets and inputs for the model building in the sandbox. Raw and transformed data is extracted out of the sandbox to build the forward-looking statistical models using various statistical engines in the predictive modeling layer (mentioned below), which can be physically hosted on the development platform.
MPP (massively parallel processing) data warehouse appliances and in-memory or in-database capabilities have been key options explored by enterprises for hosting, processing and storage. While MPP data warehouse appliances help with analytical processing of voluminous or varied data and meet performance requirements, in-memory/in-database analytics capabilities and BI tools have enabled granular, faster analytics without moving large volumes of data outside the database.
In-database capabilities provide the ability to push database queries and join functions down to a target data source for faster processing, thereby reducing network traffic and speeding data access. In addition to reduced data movement, analytic models can be developed and deployed faster, turning the data into usable insights. There is also a high degree of control over data security, because the leading analytics tools honor and augment the native security of the target data source. Access to the database can be granted to as many or as few users as necessary.
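The benefit of pushing work down to the data source can be shown with a minimal sketch, assuming a local SQLite database as a stand-in for a remote analytical store. The two approaches return the same answer; the difference is how many rows cross the network.

```python
import sqlite3

# Illustrative only: SQLite stands in for a remote analytical database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10.0), ('east', 20.0), ('west', 5.0);
""")

# Pull-then-process: every raw row crosses the "network" before aggregation.
pulled = con.execute("SELECT region, amount FROM sales").fetchall()
totals_client_side = {}
for region, amount in pulled:
    totals_client_side[region] = totals_client_side.get(region, 0.0) + amount

# In-database: the aggregation executes where the data lives, and only the
# summary rows move -- less traffic, faster access on large tables.
totals_in_db = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

assert totals_client_side == totals_in_db  # same answer, far less data moved
```

On three rows the difference is invisible; on billions of rows, shipping only the summary is what makes the approach viable.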
Predictive Modeling Layer
The back-end predictive modeling layer enables analysts or data scientists to use statistical databases and predictive analytics engines/tools to develop predictive models. These tools help in understanding consumer behavior, enhancing marketing effectiveness, improving fraud detection, optimizing products and more.
It is crucial for enterprises to let the analyst group own and manage the content of these analytical engines while IT manages the infrastructure. To avoid architectural complexity, the predictive modeling layer is best implemented as a logical layer within the analytics sandbox tier (development environment) or deployed as a set of services in the cloud.
Presentation Tier – Business and Advanced Analytics Tools
The presentation tier of the SSAE includes end-user-facing self-service analytics tools. The business analytics tools provide self-service analysis capabilities for data exploration, identifying trends and patterns that can be used during the advanced statistical modeling process.
The advanced analytics component provides the end-user-facing tools for data mining, advanced visualization and predictive analytics, which the data scientists use for building their statistical models. This component acts as the presentation layer for the back-end or server-based predictive modeling layer mentioned above.
Enterprises are adopting a suite of business and predictive analytics tools based on the right mix of analytics use cases and the profiles of the analyst group. However, it is critical to understand that business analysis is typically read-only, whereas predictive analytics requires the data scientists to perform read and write operations on data hosted in the analytics sandbox. Since write operations are typically not allowed for business users, the back-end or server-based predictive modeling tier facilitates write operations, data exploration and the creation of interim data sets.
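The read-only versus read-write split can be sketched with SQLite's URI `mode=ro` flag standing in for database-level grants; the file path and table name are hypothetical. The sandbox (read-write) connection stages an interim data set, while the business-user (read-only) connection can query it but not modify it.

```python
import os
import sqlite3
import tempfile

# Hypothetical sandbox database; SQLite's mode=ro stands in for real grants.
path = os.path.join(tempfile.mkdtemp(), "sandbox.db")

# Sandbox tier: data scientists get read-write access to stage interim data.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE interim_scores (customer_id INT, score REAL)")
rw.execute("INSERT INTO interim_scores VALUES (1, 0.87)")
rw.commit()

# Business users get a read-only connection: queries succeed, writes fail.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
row_count = ro.execute("SELECT COUNT(*) FROM interim_scores").fetchone()[0]
try:
    ro.execute("INSERT INTO interim_scores VALUES (2, 0.5)")
    writable = True
except sqlite3.OperationalError:
    writable = False
```

In a production environment the same separation would come from database roles and grants rather than connection flags, but the principle is identical.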
The scope of analytic computations – as well as the volume of data and number of data sources – is growing at an unprecedented pace. Enterprises need flexibility to manage the analytical life cycle, from discovery to the execution of large numbers of new and existing analytic models that address functional and industry-specific business issues in a secure, scalable manner. Data scientists, who are highly data savvy, need self-service analytics environments rather than BI solutions to perform the predictive analytics in a flexible, effective manner.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC