The Emergence of Data Mining
There was a time, well within living memory, when the dream of market analysts was to have as much data about their prospective customers as they could possibly get. It seemed clear that if they had access to a mass of data about a customer or group of customers, the correct marketing approach would become evident. Now that dream has not only come true; the mass of data concerning us all has become overwhelming. What has perhaps happened is that the means of collecting data outpaced the ability to analyze it. What may have begun as an analytic backlog rapidly became a veritable flood, resulting in the present-day amorphous mass of both meaningful and meaningless facts.
It is now clear that means for sifting through this mass of data are required. It is also clear that the tools developed for gleaning information must be intelligent to some degree. The basic requirement is the discovery of useful nuggets of information in an otherwise chaotic data space. For example, a human trying to associate demographics in a database of 1 million records would quickly get lost in one or both of two ways.
First, the search would contain so many demographic profiles that the associations would rapidly grow beyond comprehension. Second, narrowing the objectives to reduce the risk of being overwhelmed would likely result in the loss of critical relationships. Fortunately, machine learning methods have been developed to find meaningful information in the data glut and present it to an analyst in support of decision making. These methods support a practice called data mining, also known as predictive modeling or predictive analytics.
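To give a rough sense of how quickly demographic associations outgrow human comprehension, the following minimal Python sketch counts the distinct profiles and the pairwise value-to-value associations for a small set of attributes. The attribute names and the number of values per attribute are hypothetical, chosen only for illustration; they do not come from any real database.

```python
from itertools import combinations
from math import prod

# Hypothetical demographic attributes and how many distinct values each can take.
# These figures are illustrative assumptions, not taken from a real data set.
attribute_levels = {
    "age_band": 7,
    "income_band": 6,
    "region": 9,
    "household_size": 5,
    "education": 5,
    "occupation": 12,
}


def profile_count(levels):
    """Number of distinct full profiles (one value chosen per attribute)."""
    return prod(levels.values())


def pairwise_association_count(levels):
    """Number of value-to-value pairings across every pair of attributes."""
    return sum(a * b for a, b in combinations(levels.values(), 2))


profiles = profile_count(attribute_levels)
pairs = pairwise_association_count(attribute_levels)
print(f"{profiles:,} possible profiles, {pairs:,} pairwise associations")
```

Even with only six coarse attributes, the number of possible profiles runs into six figures, and each added attribute multiplies it again. That growth is why manual inspection breaks down and automated methods become necessary.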
Why Most Data Mining Projects Fail
Data mining has been seeping into mainstream business applications for more than two decades. Examples are beyond the scope of this article, yet numerous cases may be quickly found via a simple Internet or publication search. Despite a considerable level of over-hype and strategic misuse, data mining has not only persevered but matured and adapted for practical use in the business world. How could a community that is so data-rich and profit-driven abandon a tool that can validate its own ability to predict customer or operational behavior? Its progress is unstoppable, propelled by sustained value justifications, yet stunted by the complexities of development, interpretation, integration and adoption.
Data mining projects do not fail because of poor or inaccurate predictive models. The most common pitfalls in data mining are a lack of training, overlooking the importance of a thorough pre-project assessment, not employing the guidance of a data mining expert and not developing a strategic project definition adapted to what is essentially a discovery process. A lack of competent assessment, environmental preparation and resulting strategy is precisely why the vast majority of data mining projects fail.
The market is saturated with highly effective tools for data mining. However, too many business professionals rush to analyze their data and build reasonably good models that answer the wrong questions. They often do not determine in advance whether they even have the ability to act on what the model suggests. Will users trust and adopt the resulting system? Can the model’s results be translated into meaningful impact for stakeholders and executive management? Does the information hidden within the data adequately support the objectives that management needs to target? If the answer to any of these questions is “no,” then the resources and objectives either need to be re-oriented or the project should be suspended. Alternatively, once a solid strategic platform has been established, the tactical model implementation is easy by comparison.
Where to Begin
The best results in data mining are achieved when a data mining expert combines experience with that of an organizational domain expert. While neither needs to be fully proficient in the other's field, it is certainly beneficial for each to have a basic grounding in the other's area of focus. Even if a data mining project is entirely outsourced, substantial advantages await the organization that is trained to recognize elusive pitfalls, speak confidently about data mining methods, appreciate trade-offs between accuracy and explainability, collaborate more effectively on data preparation and interpret the model's results succinctly. Such knowledge can also serve well in evaluating vendors, interacting with project managers and having better instincts for calling suspect results or approaches into question.
Numerous data mining conferences and public training courses exist. Many tool vendors have excellent instructors and worthwhile courses, particularly for their customers. Most often, however, courses offered by tool vendors restrict the scope of the content to accommodate the methods and features available in their solutions. Since tools should not be considered until later in the process, try to identify vendor-neutral conferences and courses to receive an unbiased and non-promotional presentation. Two providers of vendor-neutral data mining conferences and courses are The Data Warehousing Institute and The Modeling Agency.
If staff or time simply does not exist to train up internally, consider hiring an independent data mining expert who may perform as a liaison and third-party project advocate between your organization and the main project vendor. The consultant should hold three qualities in combination: 1) Most obviously the consultant should be well-steeped in the data mining process with a strong track record of application success; 2) The consultant should be “multi-lingual” – able to converse fluently across teams of analysts, IT staff, users, directors and executive management; 3) Most importantly, s/he should be a seasoned business consultant – not rushing to analyze the data, but focusing first on amassing a comprehensive assessment of all resources, applicable history, benchmarks and objectives. The assessment will then drive an adaptable overarching plan with initial stages firmly mapped. When implemented properly, the data mining project assessment is arguably the most critical component to determining the successful outcome of a data mining project.
Regrettably, most organizations embark on doomed projects by presuming that the data mining process is akin to common software engineering projects. They start by directly evaluating data mining tools without the context of environmental considerations, resource conditions and the potential range of operational goals. They then haphazardly develop models before establishing a framework for results evaluation and actionable implementation.
The ultimate cost of a failed first pass can be substantial. Not only will the organization suffer opportunity costs from value never realized, but competitors will have a greater window to capitalize on the benefits. In addition, data mining as a practice may be improperly written off as an option in the organization's BI toolbox. But it will eventually return when increasing buzz around high-ROI cases resonates and the cost of operating retrospectively is recognized.
The few who make an initial investment in formal training will establish a confident platform from which to make a structured and organized approach to data mining. Their expectations will be set properly around data mining's true risks, rewards, capabilities and limitations. These leading managers and practitioners will understand the value of seeking the mentorship of data mining experts and of conducting a thorough project assessment prior to tool selection and model development. Those who train first and consult experts in this particular field will efficiently reap the vast rewards that predictive analytics can offer, and on the first attempt.
SOURCE: How to Prepare for Data Mining
Recent articles by Ben Hitt, Ph.D., Eric King