The big buzz around data science in the last couple of years has been due to a surging interest in the enterprise (see my blog about “big data hype” that looks at its gain in prominence). Most large enterprises have been working with data science for many years and have been spending millions of dollars on tools like SAS, but only for specific areas of decision support in a few pockets within the organization. However, with the advances in technologies like R (the open source option for advanced statistical analysis, an alternative to SAS) and the commercialization of the Apache Hadoop big data platform, these techniques are presenting themselves as enterprise opportunities in a much broader sense. The significant drop in hardware and licensing costs for these technologies has been a key enabler of their spread in the enterprise. However, given the current market conditions, discretionary budgets are minimal. So even though these project budgets are relatively small, the exploratory, experimental nature of initial big data and data science projects means they often face the very familiar situation of having to “prove the value” before the big funding arrives.
In the "challenges" section of my article Big Data Projects, I outlined how different companies are grappling with getting pilots/projects initiated in this space. One theme that has surfaced in several conversations over the last few weeks is the need to get a sense of what the initial “prove the value” budget size would be, which led to this article.
I will use the “outsourced” model (working with select partners to get the initial pilots done), as I outlined in the “Big Data Projects” challenges, as the basis for my thoughts in this article. Since the other two options – 1) training internal resources, and 2) hiring the needed talent – can have a very different timeline, cost and success probability associated with them, I’m not going to try to detail them here.
I will recount a recent experience to share what an initial pilot project using the "outsourced" model may look like.
This project was with an enterprise that I would classify as a small to medium business with less than $50 million in annual revenue. (Yes, small and medium businesses can also derive value from big data and data science projects.) Given their size, I wasn’t surprised that they were not even thinking of any big data or data science-related projects. However, their interest was triggered over a few conversations in which I shared some of the successes I had seen bigger companies achieve with big data and data science. Despite their interest, the key sponsor was very upfront and did not mince words in setting expectations: they would not be able to sign up for a big, expensive project. In their mind, getting approval for any project that would last more than 3-4 weeks would be a hard sell to their upper management, especially since this would not be a “budgeted” project, given that it was the middle of their financial year.
The approach we took was to extract data for the previous year (12 full months) from one of their key internal systems as the primary dataset. The sponsor also made it clear that they would not be able to give us extensive support and would need this project to be run in a hands-off manner. We got a commitment that over the project duration we would have a maximum of six one-hour interactions with internal staff before presenting the final results.
Our charter for this initiative was threefold:
- Prove that there was enough “signal” in the data. This was intentionally vague since we didn’t know what we were looking for. Such an approach, where data is analyzed without trying to answer specific questions, is referred to as a “hypothesis free” discovery-based approach.
- Establish that this signal could be leveraged and outline an approach to operationalize this. This would also have to establish ROI and make the case to promote this into a full-fledged production-grade project.
- Establish commercial terms and measurement criteria for a full-scale project. This was to accommodate the client’s request to keep the cost low upfront and payments tied to the value generated.
This was my first initiative of this type (i.e., “hypothesis free” and with such constraints). Honestly, I did have a bit of an uneasy feeling when this project was kicked off. I am used to the more classic kick-off where I / the team know exactly what questions we are trying to answer. But in this case, we had a four-week charter “to explore strange new worlds, to seek out new life and new civilizations, to boldly go where very few have gone before.” I guess, to some extent, we had signed up for a mission akin to that of the USS Enterprise crew from Star Trek.
Our clock started with the delivery of a .csv file containing the last 12 months of data. It had a total of about a million records, including a couple of “comments” and “notes” fields. This qualified as a “big data” project because of the variety aspect of the dataset we needed to analyze. The DBA who sent us this extract was kind enough to also send an accompanying text file that gave a one-line description of all the fields that had been exported. There were about 30 fields in this file. Upon receiving this data, I reached out to our internal data scientists and handed this over to them. Now the pressure was on our data scientists to work their magic and brew the value potion from this data. I will spare you the dramatic details of what transpired over that period, but to sum up: it was a laborious, iterative and cumbersome data analysis job, like it always is.
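For readers curious what a very first pass over an extract like this might look like, here is a minimal sketch in Python. It is not the process our data scientists actually followed. The function name `profile_extract` and the sample columns are hypothetical, and the sample rows are made up purely for illustration. The sketch uses only the standard library `csv` module to report, for each field, how often it is filled in and how many distinct values it holds, which is one simple way to tell structured fields apart from free-text ones such as comments and notes.

```python
import csv
import io
from collections import defaultdict


def profile_extract(lines):
    """Return per-field fill rate and distinct-value count for a CSV source.

    `lines` is any iterable of text lines (an open file object works).
    """
    reader = csv.DictReader(lines)
    fields = reader.fieldnames or []   # header row; empty list for an empty file
    filled = defaultdict(int)          # non-empty values seen per field
    distinct = defaultdict(set)        # unique values seen per field
    rows = 0
    for row in reader:
        rows += 1
        for field in fields:
            value = row.get(field)
            if value is not None and value.strip():
                filled[field] += 1
                distinct[field].add(value.strip())
    return {
        field: {
            "fill_rate": filled[field] / rows if rows else 0.0,
            "distinct": len(distinct[field]),
        }
        for field in fields
    }


# Tiny made-up sample standing in for a real extract (hypothetical columns).
sample = io.StringIO(
    "invoice_id,amount,notes\n"
    "1,100.00,called customer\n"
    "2,250.50,\n"
    "3,100.00,paid late\n"
)
profile = profile_extract(sample)
for field, stats in profile.items():
    print(field, stats)
```

For a real file you would pass an open file handle, for example `open(path, newline="")`, instead of the in-memory sample. Keeping every distinct value in memory is a simplification that is tolerable at roughly a million rows but would need rethinking for much larger extracts.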
At the end of this period, here is what we found and how we measured up to the charter:
- We showed the client that there was indeed enough signal in the million records they had given us and identified two possible value-generation opportunities.
- We established the ROI for one of these two options, where we demonstrated how a simple change to their billing process, based on one of our optimization algorithms, could increase their topline by about 7 percent. This required no other change in how they conducted their business. We also provided a minimally invasive operationalization approach to incorporate our “optimization” process to make this a production-grade initiative.
- Finally, and this was the easiest part, we were able to present commercial terms that combined a low upfront cost with a value-based payout.
- To top this all off, we only consumed 4 of the 6 hours of support from their internal team.
The final presentation to the C-level execs was a big hit. We established a high level of credibility and trust with the executive team, which led to two immediate follow-on projects:
- An assessment project to jointly work with the client’s team to look at other internal datasets, as well as external / syndicated datasets and present a broader “optimization” project proposal – to make this a funded 2014 project.
- A data science study to decipher a key collection phenomenon they had observed but couldn’t interpret. In a matter of two person-weeks, we were able to present some compelling details and explanations.
The key learnings from this exercise were as follows:
- Data science can really help enterprises of all sizes to meet their goals and objectives.
- Project initiatives that prove value and establish ROI do not always have to cost hundreds of thousands of dollars; many can come in well under that.
- Lastly, and most importantly, the success of such initiatives depends not only on the quality of the data science team, but also on the strength of the subject-matter experts (SMEs) available to the project team. One will need an SME who can look at the signals being identified by the data scientists and interpret or “make sense” of them. These SMEs are often scarce resources that are difficult to find. We were fortunate on this initiative, as we got support from an SME within the client organization as well as from an external partner of ours who had good knowledge of this space.
So as you think about your big data and data science projects, make sure you spend additional time identifying a subject-matter expert who knows the business and also understands data. This person is not just a classic business analyst, but also someone who can work with data and “make sense” of the signals that are identified. In most cases, this is a resource that enterprises already have, very likely sitting on the business side. Look internally for this resource before going to outside sources.
With this SME expertise and a good partner with the data science credentials, you can get started on your big data journey with a modest budget and be able to significantly impact your company's top and/or bottom line.
With that, I wish you the very best and hope my experience will help you in successfully driving your big data and data science endeavors.
SOURCE: Are Big Data and Data Science Initiatives Cost Prohibitive?
Recent articles by Haranath Gnana