There is something very elegant in the concept of network analysis. I am not talking about the kind of networks that connect computers, but rather those that connect entities such as people, places, things and activities. We take connectivity for granted, casually referring to the “six degrees of separation” (or Kevin Bacon) without considering its deeper meaning. Yet individuals and their related organizations are impacted by the social and organizational networks in which they participate, perhaps without even realizing it.
Social network analysis is the process of capturing, mapping and analyzing the relationships, interactions and flows between any number of different object classes (such as people, organizations, locations or things). Lots of things move through social networks – ideas, fashions, market tips, viruses, drugs or money. Many interesting applications can be built on top of social network analysis, ranging from the mundane (identifying cliques of experts within a medical field), to the business-oriented (creating a word-of-mouth marketing campaign), to the criminal (evaluating the way illegal drugs migrate through a sequence of communities), to the extremely critical (assessing the spread of the Bird Flu virus or tracking terrorist activity).
Social networks are represented in a relatively straightforward manner, using the same kinds of graph structures and algorithms discussed in college courses on discrete mathematics. Objects are represented as nodes, and a connection between any two nodes is represented as an edge (or link) between those two nodes. A sample network graph (or map), shown in Figure 1, displays a simple connection network. Each node in the map represents a person, and each link indicates that the two people know each other.
Figure 1: Social Network Graph
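As a rough illustration of the node-and-link representation, here is a minimal Python sketch. The names and links are invented for the example and are not the graph in Figure 1; `build_graph` is a hypothetical helper, and an adjacency map is only one of several reasonable ways to store such a network.

```python
from collections import defaultdict

# Hypothetical "know each other" links (NOT the data behind Figure 1).
EDGES = [
    ("Ann", "Ben"),
    ("Ann", "Cal"),
    ("Ben", "Cal"),
    ("Cal", "Dee"),
    ("Dee", "Eve"),
]


def build_graph(edges):
    """Return an undirected adjacency map: person -> set of people they know."""
    graph = defaultdict(set)
    for a, b in edges:
        # "Knowing each other" is symmetric, so record the link both ways.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_graph(EDGES)
for person in sorted(graph):
    print(person, "knows", sorted(graph[person]))
```

Storing each person's neighbors as a set keeps duplicate links from being counted twice and makes "do these two people know each other?" a quick lookup.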
The simplicity of the connection between two objects hides potentially complex associations, and the interesting part lies in what those links really mean. Consequently, what one can learn from the analysis depends on the taxonomies and semantics associated with the defined nodes and relationships. Even so, some general aspects of network maps are typically analyzed regardless of those semantics. One is the degree of a node, which is the number of connections that the node has.
In the example shown in Figure 1, Joe has the highest degree of all of the nodes, with direct connections to six others. A high degree might be indicative of popularity, since it means the person can reach many others in a single step. If you were interested in influencing a large group of people in a short time, you might contact those individuals with many direct links, as they are likely to be the fastest means of spreading your message.
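Degree is simple to compute once the network is stored as an adjacency map. The sketch below assumes a small invented graph (not the one in Figure 1), and the function names `degree` and `highest_degree` are hypothetical.

```python
# Hypothetical adjacency map: person -> set of direct connections.
graph = {
    "Ann": {"Ben", "Cal", "Dee", "Eve"},
    "Ben": {"Ann", "Cal"},
    "Cal": {"Ann", "Ben"},
    "Dee": {"Ann"},
    "Eve": {"Ann", "Fay"},
    "Fay": {"Eve"},
}


def degree(graph, node):
    """Number of direct connections a node has."""
    return len(graph[node])


def highest_degree(graph):
    """Return every node that ties for the largest degree, sorted by name."""
    top = max(len(neighbors) for neighbors in graph.values())
    return sorted(n for n, neighbors in graph.items() if len(neighbors) == top)


print(degree(graph, "Ann"))
print(highest_degree(graph))
```

Returning all tied nodes rather than a single winner avoids hiding the case where several people are equally well connected.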
Another aspect of the social network is the concept of betweenness, which characterizes the degree to which any individual node lies between other nodes (or sets of nodes) within the network. A node with high betweenness acts as a bridge: it can serve as an intermediary between groups and may therefore play a powerful role within the network. In our example, Len is the only intermediary between two groups. Any information that passes from a member of one group to a member of the other must pass through Len.
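One common way to quantify betweenness is to count, for each node, how many shortest paths between other pairs of nodes run through it. The sketch below uses Brandes' accumulation approach on an unweighted, undirected graph. The graph is invented (two tightly knit groups joined only through "Mia"), and the function name `betweenness` is an assumption made for this example, not something taken from the article.

```python
from collections import deque

# Hypothetical graph: two triangles joined only through "Mia".
graph = {
    "Ann": {"Ben", "Cal"},
    "Ben": {"Ann", "Cal"},
    "Cal": {"Ann", "Ben", "Mia"},
    "Mia": {"Cal", "Eve"},
    "Eve": {"Mia", "Fay", "Gus"},
    "Fay": {"Eve", "Gus"},
    "Gus": {"Eve", "Fay"},
}


def betweenness(graph):
    """Shortest-path betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                       # nodes in the order BFS reaches them
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        paths = {v: 0 for v in graph}    # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:        # first time we see w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting intermediaries.
        credit = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    # Each unordered pair was counted once from each end, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


scores = betweenness(graph)
for person in sorted(scores, key=scores.get, reverse=True):
    print(person, scores[person])
```

In this invented graph, every path from one group to the other runs through "Mia", so she receives the highest score, which mirrors the bridging role described above.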
A third aspect involves measuring closeness. For each node, closeness examines the distance (the number of links on the shortest path) to all the other nodes in the graph. In our example, Bob’s position is the best for closeness; he can reach any other node in at most three steps. Other aspects include evaluating an individual’s reach (characterizing who is within that individual’s network), connectivity structures (are there patterns of connectivity that repeat across different sets of nodes) or opportunities to improve the network, such as finding a hole in the network that can be filled by triggering the establishment of a link between two nodes.
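Distances of this kind can be found with a breadth-first search. The sketch below computes two related measures on an invented five-person chain: the maximum number of steps to reach anyone (the measure used in the example above) and a closeness score based on the average distance, which is another common definition. The function names are hypothetical, and the code assumes every node can reach every other node.

```python
from collections import deque

# Hypothetical chain of acquaintances: Ann - Ben - Cal - Dee - Eve.
graph = {
    "Ann": {"Ben"},
    "Ben": {"Ann", "Cal"},
    "Cal": {"Ben", "Dee"},
    "Dee": {"Cal", "Eve"},
    "Eve": {"Dee"},
}


def distances_from(graph, start):
    """Breadth-first search: number of links from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def max_steps(graph, node):
    """Largest number of steps needed to reach any other node."""
    return max(distances_from(graph, node).values())


def closeness(graph, node):
    """Reachable others divided by total distance to them (higher is closer)."""
    dist = distances_from(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


for person in sorted(graph):
    print(person, max_steps(graph, person), round(closeness(graph, person), 3))
```

The person in the middle of the chain needs the fewest steps to reach everyone and so gets the best score on both measures, while the people at the ends get the worst.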
All well and good – but what does this have to do with data management? A lot, it turns out, not because of the data structures involved (which are typically mastered during one’s sophomore year as a computer science major), but because of the taxonomies that must be applied to the nodes and links within the network, and the characterization of the attributes of those objects. In addition, the approaches to identifying and collecting linkage information build on data quality, data integration, and data and text mining techniques that are prevalent within the information management community.
The example provided is simplistic, mostly because we used a simplified definition for the meaning of a link: that the two people represented by the linked nodes “know each other.” In reality, there are many different ways that two people can know each other (casual acquaintances, work at the same company, attended the same college, are related, etc.). In a more complex graph, there might be a taxonomy associated with the linkage, with different attributes of the connection (e.g., strength, length, quality) dictated by its place in the taxonomy.
For example, there may be multiple ways that two people are related by the “work at the same company” linkage – one might be the CEO and the other an associated staff member (likely to be a weak link), or the two people may share an office (likely to be a strong link). The relationship can also be qualified (on good terms, strongly dislike each other). Additionally, we are not limited to relationships between people. We might introduce other kinds of objects (locations, tools, computer systems) into the graph and analyze their relative positions within the network to determine, for example, house purchasing patterns within connected groups, best job opportunities or sales network effectiveness.
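To make the idea of a link taxonomy concrete, here is a small Python sketch in which each link carries a kind and a qualifier, and the kind determines a default strength. The kinds, the numeric strengths and the names `Link`, `DEFAULT_STRENGTH` and `strong_links` are all invented for illustration; a real taxonomy would come from the business context.

```python
from dataclasses import dataclass

# Hypothetical taxonomy: link kind -> assumed strength between 0 and 1.
DEFAULT_STRENGTH = {
    "shares_office": 0.9,
    "related": 0.8,
    "same_college": 0.4,
    "same_company": 0.2,
}


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str                  # position in the taxonomy
    sentiment: str = "neutral" # qualifier, e.g. "good_terms" or "dislike"

    @property
    def strength(self):
        """Strength dictated by the link's place in the taxonomy."""
        return DEFAULT_STRENGTH[self.kind]


links = [
    Link("CEO", "Staffer", "same_company"),
    Link("Ann", "Ben", "shares_office", sentiment="good_terms"),
    Link("Ann", "Cal", "same_college"),
    Link("Ben", "Dee", "related", sentiment="dislike"),
]


def strong_links(links, threshold=0.5):
    """Keep only links whose taxonomy-derived strength meets the threshold."""
    return [link for link in links if link.strength >= threshold]


for link in strong_links(links):
    print(link.source, link.target, link.kind, link.strength, link.sentiment)
```

Keeping the strength in a lookup table rather than on each link means the taxonomy can be tuned in one place without touching the collected linkage data.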
This is a relatively simple model with very interesting applications. In an upcoming article, we will look more closely at some of the applications of social network analysis and at how linkage data can be derived from semi-structured and unstructured data sets.
Editor's note: More David Loshin articles, resources, news and events are available in the Business Intelligence Network's David Loshin Channel. Be sure to visit today!