Turning Algorithms into Business Advantage: Insights from Strata Data Conference
by Ron Powell
Originally published October 11, 2017
Business advantage eludes many of today’s organizations, but those who learned from the experts at the recent Strata Data Conference in New York now have a much clearer path to using their data to achieve it.
The September conference was held at the Javits Center, where attendees heard about building data-driven strategies and implementing algorithmic strategies. Case studies were presented by companies that are already gaining business advantage from strategies centered on the entire range of big data tools and technologies, AI, Hadoop, the cloud and much more.
Gathered with fellow business decision makers, strategists, architects, developers and analysts, attendees had countless opportunities for networking, training courses, best practice presentations and information-rich sessions.
Also, I noticed there was a lot of concern – among both attendees and vendors – regarding the upcoming May 2018 rollout of the General Data Protection Regulation (GDPR). GDPR compliance requires cross-departmental cooperation because it involves technology, oversight, processes and people. The regulation applies not only to organizations in the European Union, but also to organizations based outside the European Union if they collect or process personal data of EU residents. Because of the stiff non-compliance penalties – up to €20 million (about US$23 million) or 4 percent of annual global revenue, whichever is greater – organizations need to have a structured plan in place to achieve compliance. According to Kalliopi Spyridaki, Chief Privacy Strategist at SAS, “The new definition of personal data is indicative of the overall tone of the new legislation. Under the General Data Protection Regulation, personal data is considered a valuable asset. And requirements and obligations around it are tightening up considerably. Not coincidentally, this goes hand in hand with technological trends like cloud computing, big data and the Internet of Things. With each of these technologies, data gathering and adequate data analysis are becoming strategic differentiators.” Data governance is going to be mandatory because the amount of data subject to this regulation is massive. Compliance with this regulation and similar future regulations will only be achieved through effective processes, the right technology and a data protection officer – a role that the GDPR mandates for certain organizations.
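To make the penalty ceiling mentioned above concrete, here is a minimal sketch of the "whichever is greater" rule. The function name and the revenue figures are hypothetical and chosen only for illustration; the result is the upper limit described in this article, not a prediction of what a regulator would actually impose.

```python
def gdpr_max_fine_eur(annual_global_revenue_eur):
    """Ceiling of the fine described above: the greater of EUR 20 million
    or 4 percent of annual global revenue."""
    flat_cap_eur = 20_000_000
    revenue_based_cap_eur = annual_global_revenue_eur * 4 / 100
    return max(flat_cap_eur, revenue_based_cap_eur)


# Hypothetical revenue figures, for illustration only.
for revenue in (100_000_000, 500_000_000, 2_000_000_000):
    ceiling = gdpr_max_fine_eur(revenue)
    print(f"revenue EUR {revenue:,}: fine ceiling EUR {ceiling:,.0f}")
```

Under these assumptions, the flat €20 million figure sets the ceiling for any organization with annual global revenue below €500 million. Above that point, the 4 percent figure takes over.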
I believe this is one of the best conferences for exploring big data and analytics, and I commend the program chairs for creating such an information-rich event.
Of course, attendees also had the opportunity to meet with the companies that are focused on helping organizations succeed with big data. They include:
Alation demonstrated how the use of their Data Catalog accelerates analyst adoption of the data lake, which is critical for big data and analytic initiatives. The Alation Data Catalog also plays a crucial role in complying with regulations such as the GDPR by helping an organization find its data, better understand usage, and notify data consumers of corrective action. Several of their presentations focused on how organizations are transforming the structure of their teams and big data architectures to support broad-based analytic collaboration between data scientists, data engineers, analysts and business users.
Arcadia Data showcased customer examples of how big data innovators and leaders are making data work by finding insights within their data platforms with native visual analytics. Their customer Procter & Gamble (P&G) presented their use of visual analytics to deliver value for hundreds of P&G marketers, brand managers and product managers. CTO and co-founder Shant Hovsepian presented on streaming visual analytics, demonstrating how an organization can interact with and visualize data directly from streams in Apache Kafka, Apache Spark, and Apache Flink.
Ben Szekely from Cambridge Semantics shared the capabilities of their newly launched Anzo Smart Data Lake 4.0, an end-to-end platform for semantic layers based on open standards. In my conversation with John Rueter, Vice President of Marketing at Cambridge Semantics, we discussed how their Anzo Graph Query Engine shattered the previous record of loading and querying “One Trillion Triples.” The Anzo Graph Query Engine was 100 times faster than any previous solution at the same data scale.
Cloudera’s current market vision is focused on business, a dramatic change from the technical focus they had in the past. Recent announcements from Cloudera include the launch of SDX, which makes complex machine learning and analytics apps easier to develop, less expensive to deploy and more secure. Also, their Altus Data Engineering for Microsoft Azure beta offers more workload-focused, self-service data management. Cloudera also recently announced the creation of a new organization, Cloudera Foundation, that will focus on using data analytics and machine learning to improve people’s lives.
Cloudian: A partnership driving integrated, enterprise-ready AI/ML solutions caught my eye. Today’s innovation leader in enterprise object storage, Cloudian, teamed up with Skymind, the creator of the Deeplearning4j open-source deep-learning library, to create data management solutions for the hyper-scalable data sets necessary for artificial intelligence (AI) and machine learning (ML) use cases. With this collaboration, organizations are now able to extract actionable insights from vast pools of unstructured data, resulting in quicker, more effective learning and unlocking AI and ML’s business impact.
Collibra focused on the General Data Protection Regulation (GDPR) that takes effect in May of 2018. As a result of the GDPR and other similar regulations, organizations will be required to implement new business processes, enhance accountability across teams, upgrade technical systems, and create a framework that builds and maintains data compliance. Proper data management is part of the answer, but tying these efforts into a data governance framework to manage data protection is key to meeting – and sustaining – GDPR compliance. Collibra’s cross-organizational data governance platform breaks down traditional data silos and opens up organizational data so all users can find the data they need, collaborate on it, and easily understand its meaning.
Dataguise announced the latest version of the company's data-centric audit and protection (DCAP) software at Strata: DgSecure 6.2. The software has been optimized to better detect, protect, monitor, and audit sensitive data in databases, big data repositories, and file storage environments both on-premises and in the cloud, with expanded support for Google BigQuery, Spark-enabled Hadoop clusters, and Apache Tez. New machine learning and behavioral analytics technologies have been applied to identify and monitor the most difficult-to-manage data types, and detection support for European languages has been added for GDPR compliance.
H2O.ai: H2O.ai developed a way for companies to have data scientist capabilities, almost like a data scientist in a box. H2O.ai’s Driverless AI platform, fully integrated on NVIDIA DGX systems, lets business users, analysts and data scientists use an incredibly fast, intuitive computing platform. Driverless AI offers first-of-its-kind model interpretability to transparently explain model accuracy and predictions.
iguazio announced that its Unified Data Platform is now generally available. Grab, the leading on-demand transportation and mobile payments service in Southeast Asia, chose iguazio’s Unified Data Platform to accelerate innovation and boost its competitive edge in a market that serves more than 600 million consumers. While the company continues working with AWS, it uses iguazio for processing at the edge. Using the iguazio platform, Grab maximizes driver profits and reduces passenger wait times by optimizing the decision-making process using advanced real-time supply and demand heat maps.
Kinetica announced a new solution that combines the NVIDIA DGX Station and Kinetica’s GPU-accelerated database to speed up Tableau queries, providing users the ability to make faster data-driven decisions. Immediately available from NVIDIA and Kinetica business partner InterWorks, the new solution allows customers of Tableau to modernize their underlying databases to take full advantage of the insight Tableau provides. For example, customers can more quickly explore their largest data sets to immediately connect to tables, rows and columns, for fast, real-time data exploration, reporting, dashboards, and now machine learning. As a result, business analysts can make faster decisions by instantly visualizing and interacting with billions of data elements.
MapR provides the industry’s only Converged Data Platform that enables customers to harness the power of big data by combining real-time analytics with operational applications to improve business outcomes. With MapR, enterprises have a data management platform for undertaking digital transformation initiatives to achieve a competitive edge. Jack Norris emphasized that every CIO should create a data fabric that incorporates an enterprise view of data and metadata and enables seamless integration of analytics with operational processes.
Qubole presentations focused on how and why Qubole built their data platform service, Qubole Data Service (QDS), to deploy Spark, Hadoop, and other big data engines across several public IaaS cloud platforms, such as AWS, Microsoft Azure, and Oracle Public Cloud. Qubole utilizes a software-as-a-service model for big data in the cloud and is currently managing over 750 petabytes for its customers. By separating storage from compute, Qubole gives data teams greater agility, better manageability as workloads and data sets increase in scale, and a much lower TCO, as customers only pay for what they actually use in the cloud.
SAS: SAS recently conducted a global General Data Protection Regulation (GDPR) survey among 340 business executives from multiple industries and geographies. The survey results highlight the biggest challenges and opportunities organizations face on the road to GDPR compliance. Fewer than half (45 percent) of organizations surveyed have a structured plan in place for compliance, and more than half (58 percent) indicate that their organizations are not fully aware of the consequences of noncompliance.
Solix: Their Common Data Platform (CDP) enables enterprises to be data-driven. "In the era of big data, data-driven companies are the ones that will survive disruption," said Sai Gundavelli, Founder/CEO, Solix Technologies, Inc. "Specifically, those that use data to optimize their five Cs: cash, cost, compliance, cloud and customer. Artificial intelligence has the capability to match human intelligence — in healthcare, a simple $300 PC connected to a big data platform will be able to compete with a $300,000 doctor in just a few years from now. This disruption is already happening in many industries, helping drive down operating costs and improving performance."
Splice Machine is enabling the deployment of a new generation of predictive applications designed to integrate fast data streaming, transactional workloads, analytics and machine learning to improve continuous predictions and decision making. These real-time applications continuously monitor conditions to predict what could happen, and then enable you to react accordingly.
Webhose.IO is focused on turning unstructured web content into machine-readable data feeds that can be consumed on a subscription basis by individuals and enterprises on a massive scale. Webhose.IO wants to do for web data what Amazon has done for hosting.
Given the variety of challenges in big data and analytics, and the wide range of solutions available to help your enterprise compete, you'd be well advised to put future Strata Conferences on your calendar.
Copyright 2004 — 2019. Powell Media, LLC. All rights reserved.
BeyeNETWORK™ is a trademark of Powell Media, LLC