Text Analytics to the Rescue: Identifying Online Bullies

Originally published February 9, 2011

In an anti-bullying campaign, Andrew Noyes, Facebook’s Communications Manager, described the two primary methods Facebook uses to help stop cyber bullies.

  1. “Neighborhood Watch” – the Facebook community reports offensive web pages for a Facebook team to review and take action;

  2. Technology to capture and flag the offending comment or message.

Mr. Noyes is somewhat coy about the second method, focusing instead on the self-policing of the community.

But as a text analytics technology supplier, we can reveal some promising approaches that should be very effective in countering cyber bullying.

Following the optimal hybrid approach—human expertise plus machine learning—we recommend:
  • Gathering information

  • Analyzing the data

  • Learning from the results

  • Using discovery and topic migration to stay on top of the problem

I’ll outline how this hybrid approach can be applied.

Gathering the Right Information

The “Neighborhood Watch” technique relies on Facebook users to report bullying and offensive material, which is then reviewed by cyber reviewers, people trained to spot all types of objectionable material. According to Mr. Noyes, it is very effective in detecting objectionable content. There are two problems with this methodology, though.

  1. The viral issue: Once bullying has gone viral, it is difficult to stop. Plus, there is the “tattle tale” problem (no one wants to be seen as one), which means bullying messages may go unreported, hindering early detection and response.

  2. Cyber reviewer fatigue: For obvious reasons, this is considered one of the worst jobs ever, as reported in the New York Times. Using software to limit the volume of material reviewers must see, and to skim off the most objectionable material, will help them do their jobs more effectively.

Computer Software Makes Quick Work of Detection

A better way is to use computer software to filter and flag potentially disturbing content. Computers are fast and efficient, and they are not affected by content fatigue or human inconsistency. Two techniques are used to determine content type: the rules approach and the statistical approach. Both require input from subject matter experts, namely the cyber reviewers. These people have seen virtually everything under the sun and are well equipped to help build the rules and train the statistical model to detect cyber bullying.

Rules Approach

A rules-based approach involves writing rules based on grammar, linguistics, patterns, and concepts to define what counts as cyber bullying. The knowledge and experience of the cyber reviewers allow them to advise on which rules are needed to identify it.

Let’s look at the phrase:
“She is”

It could mean various things: it could describe her frame of mind, it could introduce her name, or it could be part of a cyber-bully attack. Subject matter experts would help set up rules to determine which category the content belongs to. For example:
  • If “is” NEAR “” AND NEAR “POSITIVE ADV or POSITIVE ADJ or POSITIVE PICTURE” then classify as FRAME OF MIND

  • If “is” NEAR “” AND “FIRSTNAME” AND PRONOUN “I” REFERS TO “FIRSTNAME” then classify as PERSON’S NAME

  • If “is” NEAR “” AND NEAR PROFANE TERMS or OBJECTIONABLE VIDEO or OBJECTIONABLE PICTURE then classify as CYBER BULLY
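To show how rules like these might run in software, here is a minimal Python sketch. It is not a rules engine, and everything in it is an assumption: we take NEAR to mean "within a fixed window of tokens", the word lists are hypothetical placeholders a cyber reviewer would curate, pictures and videos are not modeled, and the pronoun-reference condition of the name rule is left out.

```python
import re

# Hypothetical term lists; in practice cyber reviewers would curate these.
POSITIVE_WORDS = {"happy", "great", "excited", "proud"}
PROFANE_WORDS = {"loser", "ugly", "stupid"}
FIRST_NAMES = {"sue", "anna", "maria"}
WINDOW = 3  # assumed meaning of NEAR: within 3 tokens of the anchor word


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def near(tokens, anchor, vocabulary, window=WINDOW):
    """Return True if any vocabulary word occurs within `window` tokens of anchor."""
    for i, tok in enumerate(tokens):
        if tok != anchor:
            continue
        lo, hi = max(0, i - window), i + window + 1
        if any(t in vocabulary for t in tokens[lo:hi]):
            return True
    return False


def classify(text):
    """Apply the rules from most to least severe; the first match wins."""
    tokens = tokenize(text)
    if near(tokens, "is", PROFANE_WORDS):
        return "CYBER_BULLY"
    if near(tokens, "is", FIRST_NAMES):
        return "PERSONS_NAME"
    if near(tokens, "is", POSITIVE_WORDS):
        return "FRAME_OF_MIND"
    return "UNCLASSIFIED"


print(classify("She is so happy today"))  # FRAME_OF_MIND
print(classify("She is Sue"))             # PERSONS_NAME
print(classify("She is such a loser"))    # CYBER_BULLY
```

Checking the bullying rule first is a design choice: when a message matches several rules, the most harmful reading is the one a reviewer should see.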

Statistical Approach

Text mining applies a statistical approach to a collection of documents to learn how to classify them.

Subject matter experts label a sample of documents as bullying or not bullying. This labeled sample is called a training set. Drawing on the training set, the computer develops a statistical model that it uses to classify future documents as bullying or not.
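As a concrete illustration of training on a labeled set, here is a minimal multinomial naive Bayes classifier in pure Python. Naive Bayes is our choice for the sketch; the article does not say which statistical model is used. The four training documents and their labels are made up and far too few for real use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {lab: Counter() for lab in self.labels}
        for doc, lab in zip(docs, labels):
            self.word_counts[lab].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, doc):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for lab in self.labels:
            counts = self.word_counts[lab]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[lab] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[lab] = score
        return scores

    def prob(self, doc, label):
        """Posterior probability of `label`, a score between 0 and 1."""
        scores = self.log_scores(doc)
        top = max(scores.values())
        exp = {lab: math.exp(s - top) for lab, s in scores.items()}
        return exp[label] / sum(exp.values())


# Made-up training set, as if labeled by cyber reviewers.
docs = [
    "you are such a loser nobody likes you",
    "everyone hates you loser",
    "great game last night see you soon",
    "happy birthday hope you have a great day",
]
labels = ["bully", "bully", "ok", "ok"]

model = NaiveBayes().fit(docs, labels)
print(round(model.prob("what a loser", "bully"), 2))  # about 0.78
```

The probability it returns is the kind of 0-to-1 score that the hybrid approach can combine with a rules score.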

Hybrid Approach

Optionally, you can deploy the rules model together with the statistical model in a hybrid approach. Once developed, the rules and the statistical model run automatically (by machine) in real time on new content, sorting through it and assessing the severity of each text's intent. Each model estimates the likelihood that a document is malicious rather than good, as a score between 0 and 1. The hybrid approach weights these likelihoods to derive a combined score, which determines how the document is handled. For example, if the score is above 0.90, you might remove the document automatically; if the score is between 0.75 and 0.90, you might send it for manual review.
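The weighting and routing described above might look like this in Python. This is a sketch under stated assumptions: the combined score is taken to be a weighted average, the equal 0.5/0.5 weighting is a placeholder to be tuned, and the "allow" action below 0.75 is our own assumption, since the text only specifies the two upper bands.

```python
def combined_score(rules_score, stats_score, rules_weight=0.5):
    """Weighted average of the rules score and the statistical score (each 0..1)."""
    if not (0.0 <= rules_score <= 1.0 and 0.0 <= stats_score <= 1.0):
        raise ValueError("scores must be between 0 and 1")
    return rules_weight * rules_score + (1.0 - rules_weight) * stats_score


def route(score, remove_above=0.90, review_from=0.75):
    """Map a combined score to an action using the example thresholds."""
    if score > remove_above:
        return "remove"
    if score >= review_from:
        return "manual review"
    return "allow"  # assumption: the text does not say what happens below 0.75


print(route(combined_score(1.0, 0.9)))  # remove (score 0.95)
print(route(combined_score(0.8, 0.8)))  # manual review (score 0.80)
print(route(combined_score(0.2, 0.4)))  # allow (score 0.30)
```

Keeping the thresholds as parameters lets a team tighten or relax them as reviewer capacity changes.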

A cyber reviewer then confirms whether the document is offensive. Because reviewers work through the highest-scoring documents first, their workload is prioritized. If the reviewer confirms a document as cyber bullying, they change its likelihood score to 1. This feedback provides new input to the model, improving its effectiveness over time.
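The review loop can be sketched with Python's standard `heapq` module: a priority queue hands reviewers the highest-scoring documents first, and each verdict is appended to the training data for the next retraining run. The document IDs are hypothetical, and recording a cleared document as 0 is our assumption; the text only specifies setting confirmed bullying to 1.

```python
import heapq


class ReviewQueue:
    """Hands reviewers the highest-scoring documents first."""

    def __init__(self):
        self._heap = []

    def push(self, doc_id, score):
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, doc_id))

    def pop(self):
        neg_score, doc_id = heapq.heappop(self._heap)
        return doc_id, -neg_score

    def __len__(self):
        return len(self._heap)


def record_verdict(training_set, text, is_bullying):
    """Store the reviewer's decision as a new labeled example.

    Confirmed bullying gets a score of 1; we assume a cleared document
    gets 0 so it can also serve as a negative training example.
    """
    training_set.append((text, 1.0 if is_bullying else 0.0))


queue = ReviewQueue()
queue.push("post-17", 0.81)
queue.push("post-42", 0.88)
queue.push("post-03", 0.77)

training_set = []
doc_id, score = queue.pop()  # highest score first: post-42
record_verdict(training_set, "text of " + doc_id, is_bullying=True)
```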

Discovery and Topic Migration

Information, words, and techniques change continually. This is why we need topic discovery and topic migration techniques. The process involves periodically capturing and scoring new training data. Applying text mining to the updated collection of documents provides critical insight into new patterns.

Text mining can also discover new and emerging concepts and categories. 

When text mining identifies new concepts, the rules may need to be modified and the statistical models retrained. This is how topic migration is handled.

Text mining also spots emerging topics. Because these have not been anticipated, new rules need to be created and monitored to account for them, and the statistical model will again need to be retrained and redeployed.
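One simple way to surface candidate emerging terms is to compare each term's relative frequency in a recent batch of documents with its frequency in an older baseline. The sketch below is a deliberately simple frequency-ratio technique of our choosing, not the discovery method of any particular product; the thresholds and sample posts are hypothetical.

```python
import re
from collections import Counter


def term_freqs(docs):
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


def emerging_terms(baseline_docs, recent_docs, min_count=2, min_ratio=3.0):
    """Return terms whose relative frequency rose sharply in the recent period.

    Add-one smoothing on the baseline keeps terms never seen before from
    causing a division by zero.
    """
    base, recent = term_freqs(baseline_docs), term_freqs(recent_docs)
    base_total = sum(base.values()) + len(base) + 1
    recent_total = sum(recent.values())
    flagged = []
    for term, count in recent.items():
        if count < min_count:
            continue
        recent_rate = count / recent_total
        base_rate = (base[term] + 1) / base_total
        ratio = recent_rate / base_rate
        if ratio >= min_ratio:
            flagged.append((term, round(ratio, 1)))
    return sorted(flagged, key=lambda pair: -pair[1])


baseline = ["see you at practice", "great game today", "you played great"]
recent = ["nobody likes snitch", "total snitch", "see you at the game"]
print(emerging_terms(baseline, recent))  # [('snitch', 3.8)]
```

Flagged terms are only candidates: a cyber reviewer still decides whether a term like this warrants a new rule or new training examples.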

Techniques Can Also Predict Cyber Bullying

Here are some other key technologies used in conjunction with text analytics.

Popular with marketers, customer link analytics analyzes networks of friends and other social networks to determine who is influencing whom and where to focus attention.
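Customer link analytics products do not publish their internals, so as a stand-in we use PageRank, a well-known way to rank influence in a network. In this sketch an edge from A to B means "A reshares B's posts", so B gains influence; the user names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank users by influence with the PageRank power iteration.

    `edges` maps each user to the users whose posts they reshare; an edge
    A -> B passes part of A's rank to B.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A user with no outgoing edges spreads their rank evenly.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return sorted(rank.items(), key=lambda item: -item[1])


reshares = {
    "ana": ["kim"],
    "ben": ["kim"],
    "cal": ["kim", "ben"],
    "dee": ["ben"],
}
print(pagerank(reshares)[0][0])  # kim
```

The top-ranked users are where attention pays off most: content they post or amplify reaches the most people.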

Social media analytics metrics such as virality and velocity measure how far and how fast objectionable material travels. With cyber bullying, the goal is to keep the content from going viral and to minimize its uptake.
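The article does not define these metrics, so the sketch below uses two simple assumed definitions: velocity as shares per hour within a recent window, and reach as the number of distinct users who shared the content. The share events are made up.

```python
from datetime import datetime, timedelta


def spread_metrics(share_events, now, window=timedelta(hours=1)):
    """Compute simple spread metrics for one piece of content.

    share_events is a list of (timestamp, user_id) pairs. Velocity is the
    number of shares per hour within the recent window; reach is the number
    of distinct users who shared it. Both definitions are assumptions.
    """
    recent = [ts for ts, _ in share_events if now - window <= ts <= now]
    velocity = len(recent) / (window.total_seconds() / 3600)
    reach = len({user for _, user in share_events})
    return velocity, reach


now = datetime(2011, 2, 9, 12, 0)
events = [
    (now - timedelta(minutes=m), user)
    for m, user in [(90, "ana"), (40, "ben"), (25, "cal"), (10, "dee"), (5, "ben")]
]
velocity, reach = spread_metrics(events, now)
print(velocity, reach)  # 4.0 4
```

Tracking velocity over a short window is what makes early intervention possible: a rising share rate can trigger review before the content goes viral.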

Data mining integrates text mining with relational data from customer link analytics and social media analytics to build predictive models that project when and where bullying is most likely to occur.
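As an illustration of combining these signals, the sketch below fits a logistic regression by batch gradient descent on three features per conversation: a text score, an influence score, and a normalized velocity. Logistic regression is our stand-in; the article does not name a predictive model, and the features, rows, and labels (1 meaning bullying later occurred) are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            for j, v in enumerate(x):
                grads[j] += err * v
            grads[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


def predict(weights, x):
    """Probability that bullying occurs, given features x."""
    # zip stops at len(x), so the bias (last weight) is added separately.
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])


# Hypothetical rows: [text_score, influence, velocity], each scaled to 0..1.
rows = [
    [0.9, 0.8, 0.9], [0.8, 0.6, 0.7], [0.7, 0.9, 0.8],
    [0.2, 0.1, 0.1], [0.3, 0.4, 0.2], [0.1, 0.2, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]

weights = train_logistic(rows, labels)
print(round(predict(weights, [0.85, 0.7, 0.8]), 2))
print(round(predict(weights, [0.15, 0.2, 0.1]), 2))
```

In practice the rows would come from joining text mining scores with link analytics and social media metrics for the same users and conversations.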

More Information on Cyber Bullying: You Can Help

Love our Children was started by Krystin Moore, a bullied Miss Teen NJ.

Facebook Safety explains how to prevent cyber bullying and what to do if you or someone you know is a victim of cyber bullies.

Cyber Bully Prevention discusses technology you can use to protect your kids from cyber bullies.

SOURCE: Text Analytics to the Rescue

  • Richard Foley
    Active in the online community since 1996, Richard Foley has served as SAS Product Manager since 2001. He is currently World Wide Product Manager for SAS Text Analytics, and he oversees product direction of text mining, enterprise content categorization, sentiment analysis, ontology management and enterprise search.

    Prior to SAS, Richard worked on implementing web analytics standards and KPIs for Medscape, now part of WebMD. He is a Director Emeritus of the Web Analytics Association. He served as President of the Web Analytics Association, where he was also Director of Advocacy, including Privacy and Ethics Policies.