Extracting Business Intelligence from Social Media: Dealing with Reluctant Sources
Originally published April 22, 2014
Decision making under uncertainty has been a major topic of interest in operations research and statistics for many decades. In fact, one of the authors of this article dealt with this topic in a master’s thesis in graduate school many years ago. But these days the field takes on significant new challenges as business intelligence, which is at the heart of research, increasingly involves not just analyzing transactional data sources and records but looking at social media.
We soon discovered that although the government had immense control and reach over mass media, dissenters always found a way to voice their opinions. As a result, we had to look for websites showing the postings as they existed before the government took them down, which often happened in near real time, almost as quickly as they went up. But if we think of the Internet as an enormous sandbox, it follows that whenever anyone makes a footprint or drops a rock in that sandbox, the imprint will somehow always be retrievable. That is where social media came in so handy. While all of the above-listed resources were used to research the widespread disease and public sentiment in Oppressivestan, another important alternate resource was the use of caches.
First, a reminder of what a cache is in this context. The dictionary defines it as “a hiding place, especially one concealing and preserving provisions or implements.” But in Internet parlance, it is “a mechanism for the temporary storage of web documents, such as HTML pages and images…”1
Furthermore, Google refers to caches as “a way of retrieving information from websites that have recently gone down.”2 We like to think of a cache as “the impression of the rock that landed in our Internet sandbox.”
During our social media research on people within Oppressivestan, we came upon numerous links that did not respond when clicked, returning instead a message that blamed “unforeseen circumstances” or offered some similar excuse for the malfunction. (Pointedly, a previous article noted that Chinese censors often leave messages such as “Sorry, the host you were looking for does not exist, has been deleted or is being investigated” and even leave police cartoons on the site. See Business Intelligence from Censorship.)
But let the mouse hover over the search result, and an arrow appears to the right, pointing to the next search result and revealing a thumbnail. One click of that thumbnail image and the cached page (a snapshot of the page taken before it stopped functioning) appears. Comparing the dates on those websites to the bulleted facts provided an interesting insight into what was occurring in Oppressivestan at the time. Since the time of our research, websites like Google Cache Browser and Internet Archive have come to provide an even easier way to access cached pages that have “disappeared” from the Internet. Cached information was essential to our research, allowing us to see the planning of protests, forum discussions on dissent and how websites were altered to reflect government agendas.
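For readers who want to automate this kind of lookup, the following is a minimal sketch, not the method used in our study. It assumes the Internet Archive's public Wayback Machine availability endpoint and its JSON response layout; the function names (build_query, closest_snapshot, lookup) are our own illustrative choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability service.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the request URL for a page that may have been taken down.

    timestamp is an optional date string such as "20140101" (YYYYMMDD);
    the service is expected to return the snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Query the live service. Requires network access."""
    with urlopen(build_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup with the address of a dead link and the approximate date of an event returns the nearest archived copy, if one exists, along with its capture timestamp. That timestamp is what allows a cached page to be placed on the timeline of known facts.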
As we mentioned earlier, another important factor to consider is language. As if conducting research involving foreign intelligence and news sources weren’t challenging enough in the case of the epidemic in Oppressivestan, the country’s primary language is not English, so much of the information collected presented an additional barrier. It was important to obtain the correct translation as well as pertinent definitions to ensure a clear understanding of events. Translations of entire websites were needed, and although several websites provided utilities to do so, the Google Translate web application was very useful. To ensure we were getting the best translation possible, we compared the output of both Google Translate and Bing Translator.
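The comparison of two machine translations can be partly mechanized. The sketch below is an illustration under our own assumptions, not the procedure we followed: it takes pairs of translated sentences (one from each service, already obtained by whatever means) and flags the pairs that disagree enough to deserve a human look. The function name flag_divergent and the 0.6 cutoff are hypothetical choices.

```python
from difflib import SequenceMatcher


def flag_divergent(pairs, threshold=0.6):
    """Return the indexes of sentence pairs whose translations differ markedly.

    pairs: list of (translation_a, translation_b) strings, where both
    entries translate the same source sentence.
    threshold: illustrative similarity cutoff between 0 and 1.
    """
    flagged = []
    for i, (a, b) in enumerate(pairs):
        # Compare word sequences case-insensitively so capitalization
        # differences do not lower the similarity score.
        matcher = SequenceMatcher(None, a.lower().split(), b.lower().split())
        if matcher.ratio() < threshold:
            flagged.append(i)
    return flagged
```

A low score does not say which translation is right. It only marks the sentences where the two services part ways, which is where checking definitions or consulting a native speaker is most likely to pay off.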
Once the research was completed, the report had to be structured and a framework developed to present the collected information. With respect to our case study, we first identified the major actors at play: primarily the government of Oppressivestan, the international organizations and community, the epidemic disease and the affected public. Once these were presented and positioned, the topic was introduced in depth and the data analyzed to demonstrate the impact on all the relevant points. Then the factual dates gathered early in the research were cross-referenced with social attitudes. In the second portion of the report, key public opinion was introduced to provide a picture of the problem buttressed with original analysis that did not exist elsewhere. The result of this process was a presentation of findings that provided as realistic a picture as possible of the epidemic, its origin, its lifecycle and its impacts on the population and the economy. We were able to dissect what the main actors were saying in multiple sources and contrast the narrative with facts and dates that laid out a clearer version of events.
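The cross-referencing of factual dates with social attitudes can be pictured as a simple date-window match. The following sketch is hypothetical and assumes the events and posts have already been collected as dated records; the name posts_near_events and the three-day window are illustrative choices rather than part of our original method.

```python
from datetime import timedelta


def posts_near_events(events, posts, window_days=3):
    """Map each event label to the posts dated within window_days of it.

    events: list of (label, date) tuples for the documented facts.
    posts: list of (date, text) tuples for the collected social media posts.
    """
    window = timedelta(days=window_days)
    matches = {}
    for label, event_date in events:
        # Keep posts dated no more than window_days before or after the event.
        matches[label] = [
            text for post_date, text in posts
            if abs(post_date - event_date) <= window
        ]
    return matches
```

Grouping posts this way sets an official statement side by side with what people were saying in the days around it, which is the contrast the report was built on.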
Now, back to decision making under uncertainty. There is no one “correct answer” when drawing conclusions. One can only provide informed inferences and educated observations. Two readers could essentially arrive at two different conclusions, but the researcher’s job is to lay out enough detail about the data collected and the methodology so that any interpretation of the findings can be defended and documented. With respect to our study, the research identified years of unlawful discrimination by the government of Oppressivestan and showed that its people living with a certain disease faced censorship, sexism, abuse and an innate fear of official judgment or rejection. Yet this reality would not have been discovered had we not approached our research with tenacity and creativity.
Research shouldn’t be restricted by what we want the final outcome to be. We must always maintain flexibility and allow the data to speak for itself. It is our job as business intelligence practitioners to find the right data, no matter how challenging – and give it a voice.