It has pretty much become conventional wisdom that a significant aspect of customer data analytics involves attribution based on demographic data elements as a prelude to any kind of clustering and segmentation analysis. We naturally think of a marketing strategy in terms of the groups to whom a product is pitched, with those groups formed by their similarities. Those similarities are stated in terms of well-defined demographic dimensions: sex, age, location and income are typical.
For example, you might not be surprised to see marketing tactics focused on “Millennials,” or those people currently aged 18-34 (who presumably became “market-aware” around the turn of the 21st century). The assignment of a name to a demographic only reinforces its core set of attribution variables by essentially hiding them – we no longer think of the group based on the value of the demographic attribute (“age is between 18 and 34”), but rather as some cohesive collective sharing many other characteristics. In turn, we consider the purchasing patterns of a specific demographic cohort as if each member uses the same decision processes as every other member.
Of course, this assumption is ridiculous. While there may be some similarities, an 18-year-old male high-school dropout is not going to be making the same purchasing decisions as a 33-year-old female top-level corporate executive. There may be cultural factors that are similar (both are probably more likely to check the time using a smartphone than with a wristwatch), and these are the factors that feed into the marketing methodologies that are driven by these core demographics.
But if we think about what these demographic attributes really imply, it may raise questions about their suitability for description and, consequently, for segmentation. For example, one might segment some typical demographic variables into categories, such as these:
- Chance demographic variables, over which the individual being characterized has no control, such as age, sex, or right- or left-handedness.
- Decision-based demographic variables, which reflect a decision made by the individual being characterized, such as residential location, the type of company the individual works for, or educational attainment.
- Causality demographic variables, which may arise as a byproduct of the other types of demographic variables, such as the individual’s annual income being a function of the type of job that individual has.
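As a minimal sketch, the three categories above could be recorded as metadata alongside a customer data set, so that later analysis steps can select variables by type. The column names and the `variables_of_type` helper are hypothetical; the assignments simply follow the examples given in the list.

```python
from enum import Enum


class DemographicType(Enum):
    CHANCE = "chance"        # outside the individual's control
    DECISION = "decision"    # reflects a choice the individual made
    CAUSALITY = "causality"  # byproduct of other demographic variables


# Hypothetical column names, tagged according to the examples in the list above.
VARIABLE_TYPES = {
    "age": DemographicType.CHANCE,
    "sex": DemographicType.CHANCE,
    "handedness": DemographicType.CHANCE,
    "residential_location": DemographicType.DECISION,
    "employer_type": DemographicType.DECISION,
    "educational_attainment": DemographicType.DECISION,
    "annual_income": DemographicType.CAUSALITY,
}


def variables_of_type(kind):
    """Return the variable names tagged with the given type, sorted by name."""
    return sorted(name for name, t in VARIABLE_TYPES.items() if t is kind)


chance_vars = variables_of_type(DemographicType.CHANCE)
causality_vars = variables_of_type(DemographicType.CAUSALITY)
```

Keeping the tag separate from the data makes it easy to reclassify a variable later, since the boundary between categories is a judgment call rather than a fixed rule.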
Often all of these types of demographic variables are lumped together, but the value of the variable used for segmentation differs based on its type. The chance demographic variables provide a gross-level differentiation. For example, a female is more likely to buy a bra than a male is. Decision demographics provide a more granular level of insight. For example, grouping individuals based on their residential location implies that they (presumably) have made a decision to live in that particular place, and that characteristics of that location hold some appeal to all of the individuals that are in that cohort.
Lastly, the causality demographic variables provide another layer of dimensionality since the causal relationships can be explicit (“the 33-year-old female executive has a salary of $130,000/year”) or implicit (“the 33-year-old female executive probably has a six-figure salary”), allowing some greater flexibility in extending the use of demographics.
It is worth comparing these different types of variables. Although the chance demographics are good for dissecting larger groups into smaller sub-groups, they will probably have less influence on precision segmentation. Decision demographics are somewhat more interesting in that they reflect conscious decisions that potentially influence clustering and segmentation.
On the other hand, the causality demographics may themselves be influenced by the other two types of demographics. You could infer that these types of demographics are less useful for segmentation because they are a byproduct of other conditions. Adding them into the variable set for a clustering process might not detract from the results, but may not necessarily lead to more precise segmentation.
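One rough way to check whether a causality variable is largely a byproduct of a variable already in the set is to measure how strongly the two move together. The sketch below computes a Pearson correlation by hand; the `job_level` encoding and every value in it are made up purely for illustration, and a high correlation is only a hint of redundancy, not proof of it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real customer data.
job_level = [1, 1, 2, 2, 3, 3, 4, 4]                # ordinal code for a decision variable
annual_income = [28, 32, 45, 50, 70, 78, 120, 130]  # thousands of dollars (causality variable)

r = pearson(job_level, annual_income)
print(f"correlation between job level and income: {r:.2f}")
```

When the coefficient is close to 1, the income column carries little information that the job-level column does not already supply, which is consistent with the point that adding it may not sharpen the clusters.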
Instead, the causality demographics may be more useful for two other machine learning and analytics techniques: description and prediction. Patterns of relationships that show correlation between a combination of independent chance and decision demographics and an “as yet undetermined” (yet presumably dependent) variable can be investigated to determine if the relationship between the independent variables and the dependent one is truly causal. If so, experimental models can be developed that infer the probable value of the causal variable given the values of the independent variables.
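The kind of inference described here can be sketched with simple counting: estimate how often the dependent variable takes a given value for each combination of independent variables. The records, field names, and the `conditional_rates` helper below are hypothetical, and the resulting rates describe association only; whether the relationship is causal still has to be investigated separately.

```python
from collections import defaultdict

# Made-up records: (job_type, education, has_six_figure_income)
records = [
    ("executive", "graduate", True),
    ("executive", "graduate", True),
    ("executive", "bachelor", True),
    ("executive", "bachelor", False),
    ("clerical", "bachelor", False),
    ("clerical", "high_school", False),
    ("clerical", "high_school", False),
    ("technical", "graduate", True),
    ("technical", "bachelor", False),
]


def conditional_rates(rows):
    """Estimate P(six-figure income | job_type, education) by counting."""
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, total]
    for job, edu, six_figure in rows:
        counts[(job, edu)][0] += int(six_figure)
        counts[(job, edu)][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


rates = conditional_rates(records)
for (job, edu), rate in sorted(rates.items()):
    print(f"{job:10s} {edu:12s} {rate:.2f}")
```

With so few records per combination the estimates are unstable; a real model would need far more data and some form of smoothing or a fitted classifier.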
In other words, not all demographic variables are created equal. Understanding these subtle differences can inform the analytics processes and potentially reveal interesting methods for inference and prediction.
SOURCE: Do Demographics Matter for Customer Segmentation?