I was recently an invited speaker at a conference for university librarians, who are known for serving as guardians of information and knowledge. I mention this as context for a sidebar discussion I had with another attendee about personalization, specifically the way search engines such as Google's use personalization techniques to improve the precision of each individual's search results.
First, let's consider how personalization might work in this scenario. For each identified individual, there are at least five sources of data (obvious ones, at least to me) that can be used to build a personal profile:
- Self-reported characteristic or demographic data – Search engine companies like Google allow users to create a named identity through a number of different channels, such as email accounts, online disk storage, video channels, and blogging. In each case, the user creates an account associated with a discrete name and one or more methods of contact (an email address, telephone number, or street address, for example). These bits of information provide a starting point for integration with enhancement data sets that can link all sorts of additional demographic and psychographic information to that individual.
- Behavior data collected through "exposed" search histories – Once the account exists, any time the user logs in, all activity, including web searches, is logged and tracked. The collected search terms, coupled with the user's selections among the presented results, provide data that incrementally grows the profile, and those bits and pieces can be used to infer which results the user is more likely to select. For example, a person who searches for information about "lower back pain" along with a host of other symptoms might lead the personalization engine to tag that identity as someone suffering from lower back pain.
- Behavior data collected via affiliate networks – The proliferation of methods for tracking web activity across different parties and providers yields another rich trove of data to incorporate into a personal profile. For example, ad networks place cookies in your browser, and each time you visit a web page that carries that ad server's ads, your activity can be tracked. Also consider how many different organizations have a hand in delivering content to your browser: the content publisher, content servers, the ISP, and browsers and other portals, among many others. Each of them can see some of your HTTP requests, and they often share that information with other affiliated organizations.
- Aggregate profile information related to others with similar characteristics – This source is somewhat different: it relies on the data already collected to segment and classify individuals into groups with similar characteristics. Virtual social networks and communities of interest can be materialized to enable collaborative filtering. In other words, an individual's search results can be refined based on the results that people similar to that individual clicked on when they searched for the same thing.
- Behavior data collected via "unexposed" search histories – Even when a user is not logged in to an account, if that individual has at any point logged in from the same computer on the same network, web activity originating from the same source IP address can be attributed to the same identity.
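The collaborative-filtering idea in the fourth item can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not how Google or any other search engine actually ranks results: the tag-based profiles, the click log, the function names `jaccard` and `rerank`, and the choice of Jaccard similarity between profiles are all hypothetical. Each past click on the same query is weighted by how similar the clicking user's profile is to the current user's, and the engine's result list is reordered by those weights.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity of two tag sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(user, query, results, profiles, clicks):
    """Reorder `results` for `user` using clicks by similar users on the same query.

    Hypothetical inputs for illustration only:
      profiles: user id -> set of inferred profile tags
      clicks:   list of (user id, query, clicked result) tuples
    """
    me = profiles[user]
    score = defaultdict(float)
    for other, q, clicked in clicks:
        if other == user or q != query:
            continue
        # Weight each click by how similar the other user's profile is to ours.
        score[clicked] += jaccard(me, profiles.get(other, set()))
    # sorted() is stable, so results with equal scores keep the engine's original order.
    return sorted(results, key=lambda r: -score[r])


# Hypothetical example data.
profiles = {
    "alice": {"lower back pain", "running"},
    "bob": {"lower back pain", "gardening"},
    "carol": {"chess", "travel"},
}
clicks = [
    ("bob", "back pain relief", "physio-exercises"),
    ("carol", "back pain relief", "mattress-store"),
    ("carol", "back pain relief", "mattress-store"),
]
results = ["mattress-store", "medical-overview", "physio-exercises"]

print(rerank("alice", "back pain relief", results, profiles, clicks))
# prints ['physio-exercises', 'mattress-store', 'medical-overview']
```

Because Bob's inferred profile shares the "lower back pain" tag with Alice's, his click lifts the exercise page to the top of her results, while Carol's clicks carry no weight at all. The same mechanism that makes the results feel relevant also means that what Alice sees depends on who the engine thinks she is.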
Together, all these tidbits of data can be combined to mold the search engine company's version of a virtual persona that it associates with a real individual. Presumably this is for that individual's benefit, as the search engine will find and deliver the "best" results for that individual's personal preferences, right? Absolutely, if you look at the challenge as one that combines both the service and the business (i.e., advertising) contexts. Once you have identified someone presumed to have back pain, you can not only provide content that may help alleviate the pain but also point that user toward products and services that others like that individual have found beneficial.
So what does this have to do with my discussion with the librarian? She was speaking in her role as a guardian of information, and as such she wanted to be able to separate that professional role from her role as a private individual. Search results that are refined and targeted at a virtual persona are filtered based on a set of presumptions: the search company doesn't really know you; it has an image of what it thinks you are. And clearly, it doesn't always get it right. For example, when we ordered an item from an online company as a gift for some friends, the algorithm could not determine for whom the item was bought, so that purchase was simply added to our personalized profile.
But those refinements and filters produce a biased set of results, which fundamentally conflicts with the librarian's role. My colleague suggested that personalization prevents a fair and balanced presentation of information, since the results are crafted to appeal to the user. And while she had her own preferences and biases, she did not want them to influence the searches for knowledge she performed in her professional capacity.
That started me thinking about the degree to which we abdicate our objectivity without truly recognizing it or understanding its impact. If we are presented only with what we want to see, we lose visibility into the things we might disagree with. Yet disagreement is the trigger for intellectual discourse, which can lead to an understanding of the broader picture and, ultimately, to the resolution of differences. So here is some food for thought: to what extent is intellectual discourse eroded through inferred personalization?
SOURCE: Personal Impacts of Personalization
Recent articles by David Loshin