
How to Prepare for Data Mining: A Primer for Assessing and Approaching Successful Projects in Predictive Analytics

Originally published July 23, 2009

The Emergence of Data Mining

There was a time, well within living memory, when the dream of market analysts was to have as much data about their prospective customers as they could possibly get. It seemed clear that if they had access to a mass of data about a customer or group of customers, the correct marketing approach would become evident. Now, that dream not only has come true, but the mass of data concerning us all has also become overwhelming. What perhaps has happened is that the means of collecting data outpaced the ability to analyze it. What may have begun as an analytic backlog rapidly became a veritable flood, resulting in the present-day amorphous mass of both meaningful and meaningless facts.

It is now clear that means for sifting through this mass of data are required. It is also clear that the tools developed for gleaning information must be intelligent to some degree. The basic requirement is the discovery of useful nuggets of information in an otherwise chaotic data space. For example, a human trying to associate demographics in a database of 1 million records would quickly get lost in one or both of two ways.

First, the search would contain so many demographic profiles that the associations would rapidly grow beyond comprehension. Second, narrowing the objectives to reduce the risk of being overwhelmed would likely result in the loss of critical relationships. Fortunately, machine learning methods have been developed to find meaningful information in the data glut and present it to an analyst in support of decision making. These methods support a practice called data mining – also known as predictive modeling or predictive analytics.
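To give a rough sense of how quickly demographic profiles and their associations multiply, here is a minimal arithmetic sketch. The attribute names and the number of values per attribute are purely illustrative assumptions, not figures from any real database.

```python
from math import prod

# Hypothetical demographic attributes and how many distinct values each takes.
# These counts are illustrative assumptions only.
attribute_levels = {
    "age_band": 8,
    "income_band": 10,
    "region": 50,
    "household_size": 6,
    "education": 5,
    "marital_status": 4,
}


def profile_count(levels):
    """Number of distinct profiles when every attribute is specified."""
    return prod(levels.values())


def pairwise_association_count(levels):
    """Number of value-to-value pairings between two different attributes."""
    sizes = list(levels.values())
    return sum(
        sizes[i] * sizes[j]
        for i in range(len(sizes))
        for j in range(i + 1, len(sizes))
    )


print("distinct profiles:", profile_count(attribute_levels))
print("two-attribute pairings:", pairwise_association_count(attribute_levels))
```

Even with only six modest attributes, the number of possible profiles is of the same order as the 1 million records themselves, which is why an unaided analyst either drowns in combinations or narrows the search and misses relationships.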

Why Most Data Mining Projects Fail

Data mining has been seeping into mainstream business applications for more than two decades. Examples are beyond the scope of this article, yet numerous cases may be quickly referenced via a simple Internet or publication search. Despite a considerable level of over-hype and strategic misuse, data mining has not only persevered but matured and adapted for practical use in the business world. How could a community that is so data-rich and profit-driven abandon a tool that can validate its own ability to predict customer or operational behavior? Its progress is unstoppable, propelled by sustained value justifications – yet stunted by the complexities of development, interpretation, integration and adoption.

Data mining projects do not fail because of poor or inaccurate predictive models. The most common pitfalls in data mining involve a lack of training, overlooking the importance of a thorough pre-project assessment, not employing the guidance of a data mining expert and not developing a strategic project definition adapted to what is essentially a discovery process. A lack of competent assessment, environmental preparation and resulting strategy is precisely why the vast majority of data mining projects fail.

The market is saturated with highly effective tools for data mining. However, too many business professionals rush to analyze their data and build reasonably good models that answer the wrong questions. They often do not determine in advance whether they even have the ability to act on what the model suggests. Will users trust and adopt the resulting system? Can the model’s results be translated into meaningful impact for stakeholders and executive management? Does the information hidden within the data adequately support the objectives that management needs to target? If the answer to any of these questions is “no,” then either the resources and objectives need to be re-oriented or the project should be suspended. Conversely, once a solid strategic platform has been established, the tactical model implementation is easy by comparison.

Where to Begin

The best results in data mining are achieved when a data mining expert combines experience with that of an organizational domain expert. While neither needs to be fully proficient in the other’s field, it is certainly beneficial to have a basic grounding across areas of focus. Even if a data mining project is entirely outsourced, substantial advantages await the organization that is trained to recognize elusive pitfalls, speak confidently about data mining methods, appreciate trade-offs between accuracy and explainability, collaborate more effectively on data preparation and interpret the model’s results succinctly. Such knowledge can also serve well in evaluating vendors, interacting with project managers and developing better instincts for calling suspect results or approaches into question.

Numerous data mining conferences and public training courses exist. Many tool vendors have excellent instructors and worthwhile courses, particularly for their customers. Most times, however, courses offered by tool vendors restrict the scope of the content to accommodate the methods and features available in their solutions. Since tools should not be considered until later in the process, try to identify vendor-neutral conferences and courses to receive an unbiased and non-promotional presentation. Two providers of vendor-neutral data mining conferences and courses are: The Data Warehousing Institute and The Modeling Agency.

If staff or time simply does not exist to train up internally, consider hiring an independent data mining expert who can act as a liaison and third-party project advocate between your organization and the main project vendor. The consultant should hold three qualities in combination: 1) Most obviously, the consultant should be well-steeped in the data mining process with a strong track record of application success; 2) The consultant should be “multi-lingual” – able to converse fluently across teams of analysts, IT staff, users, directors and executive management; 3) Most importantly, s/he should be a seasoned business consultant – not rushing to analyze the data, but focusing first on amassing a comprehensive assessment of all resources, applicable history, benchmarks and objectives. The assessment will then drive an adaptable overarching plan with initial stages firmly mapped. When implemented properly, the data mining project assessment is arguably the most critical component in determining the successful outcome of a data mining project.

Regrettably, most organizations embark on doomed projects by presuming that the data mining process is akin to common software engineering projects. They start by directly evaluating data mining tools without the context of environmental considerations, resource conditions and the potential range of operational goals. They then haphazardly develop models before establishing a framework for results evaluation and actionable implementation.

The ultimate cost of a failed first pass can be substantial. Not only will the organization suffer opportunity costs from value never realized, but competitors will have a greater window to capitalize on the benefits. In addition, data mining as a practice may be improperly dismissed as an option in the organization’s BI toolbox. But it will eventually return when increasing buzz around high-ROI cases resonates and the cost of operating retrospectively is recognized.

The few who make an initial investment in formal training will establish a confident platform from which to make a structured and organized approach to data mining. Their expectations will be leveled properly around data mining’s true risks, rewards, capabilities and limitations. These leading managers and practitioners will understand the value of seeking the mentorship of data mining experts and conduct a thorough project assessment prior to tool selection and model development. Those who train first and consult experts in this particular field will efficiently reap the vast rewards that predictive analytics can offer – and on the first attempt.

  • Ben Hitt, Ph.D.

    Ben A. Hitt, Ph.D., is the Director of the Schenk Center for Informatic Sciences (SCIS) at Wheeling Jesuit University. He founded the SCIS in 2006 as part of an initiative to emphasize the role of information in the conduct of business in today’s fast-paced global economy. The SCIS is engaged in research and development of information discovery and dissemination systems, including advanced concept searching, next generation instructional systems, vehicular safety systems and personal health and fitness systems. Dr. Hitt also is actively engaged in developing and teaching courses in information theory and practice.

    Dr. Hitt is a co-founder of Correlogic Systems, Inc. Along with Peter Levine, he is co-inventor of the pattern recognition approach to disease detection. He is also the inventor of Correlogic's proprietary Proteome Quest® software that is designed to analyze patterns in human blood proteins to detect disease. He is a nationally recognized expert in data mining and pattern recognition solutions. He is the inventor and patent holder of numerous algorithms and computer programs. Dr. Hitt’s works include inventions for analyzing disparate data streams, near real-time analysis of audio and text information streams, applications for the detection of credit card fraud, and optimization solutions for direct marketing problems.

    Prior to his work at Correlogic, Dr. Hitt served as Senior Principal Software Engineer for Raytheon Systems. He later held positions with NeuralWare Inc., Advanced Software Applications, and American Heuristics, which he also co-founded. During this period, he developed and expanded his concepts of employing algorithms in data mining and other complex problem-solving applications. Progressing from his initial work with neural networks, Dr. Hitt incorporated genetic algorithms and related analytical techniques into his later inventions.

  • Eric King

    After graduating from the University of Pittsburgh with a bachelor's degree in computer science in 1990, Eric King joined NeuralWare, Incorporated, a neural network tools company, as a senior account executive. In 1994, he moved to American Heuristics Corporation (AHC), an advanced software technology consulting company specializing in artificial intelligence applications. At AHC, he served as the director of business development for the commercial services division and the training division, The Gordian Institute. With the valued support of AHC, contractors, customers and family, Eric founded The Modeling Agency in the spirit of establishing long-term professional relationships. The Modeling Agency's focus is providing guidance and results for those who are data-rich, yet information-poor. Eric may be contacted by e-mail at eric@the-modeling-agency.com.

    Editor's Note: More articles and resources are available in Eric's BeyeNETWORK Expert Channel on data mining and predictive analytics. Be sure to visit today!
