What is Social Network Analysis (SNA)?
The world is more connected than ever before. Your smartphone connects to your watch, bathroom scale, TV, webcam and car. Websites like Facebook and Twitter catalogue thousands of connections between friends, while LinkedIn tracks our professional connections and past job history. Technology aims to connect just about everything with just about everyone, so how do we even begin to understand and make sense of all these connections? All of this connection information is no doubt stored in a database somewhere.
The easiest way to get a grasp on connection data is to visualize it. In its raw form, the data is not in a format that people can digest: millions of rows and columns in a database table don’t tell humans much without pre-processing. Our brains process visual information much more quickly than raw data. We can “see” how people are connected by looking at a map, and that is what Social Network Analysis (SNA) aims to provide. It uses ideas from graph theory to draw connection data as a map of connections between entities.
How SNA can work for fraud analysis
When we think about webs of connections (a.k.a. networks), the first thing that comes to mind is generally Facebook. Bob is friends with Larry, who is in turn friends with Sue. However, many other types of networks exist in the world beyond social media. You are reading this article on one right now (a computer network). Criminals also form networks, whether for collusion, organized crime or simple convenience. These networks can be discovered and mapped using social network analysis.
One of the key data science ideas of the 21st century is to find useful ideas from other disciplines and apply them to new use cases. Criminal investigators have been doing network analysis for years the “analog” way, by plotting connections between criminals on a pin board. Each criminal has a photo on the board, and bits of twine show the connections between them. The twine (connections) can also be labeled to show what the connection is. For example, Jim and Jerry share a business address.
Since we live in the digital age, we can replace pin boards and bits of string with big data. Going back to the “web” of data analogy, we use simple programming and query methods to create a list of “connections.” Known fraud perpetrators can be the center of your network, but you will also be interested in who those perpetrators are connected to and how. In the example below, a database of made-up criminals and their associates is graphed based on shared addresses, phone numbers and e-mail accounts. The number of connections each person has is used to color code the graph: warmer colors mean more connections. The layout of the graph places well-connected people near the center.
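To make the idea of a list of “connections” concrete, here is a minimal Python sketch. It is not the ACL script used for the graphs in this article. The names, addresses, phone numbers and e-mail accounts are made up for illustration, and the function names `build_edges` and `degree` are hypothetical. The sketch links any two people who share an attribute value, then counts how many connections each person has, which is the number a graphing tool could use for color coding.

```python
from collections import defaultdict
from itertools import combinations

# Made-up records for illustration: (person, attribute type, attribute value).
records = [
    ("Oscar", "address", "12 Elm St"),
    ("Jim", "address", "12 Elm St"),
    ("Jim", "phone", "555-0101"),
    ("Jerry", "phone", "555-0101"),
    ("Oscar", "email", "shared@example.com"),
    ("Jerry", "email", "shared@example.com"),
    ("Sam", "phone", "555-0199"),
    ("Oscar", "phone", "555-0199"),
]

def build_edges(records):
    """Return {(person_a, person_b): set of attribute types they share}."""
    # Group people by the exact attribute value they hold.
    holders = defaultdict(set)
    for person, kind, value in records:
        holders[(kind, value)].add(person)
    # Every pair of people holding the same value becomes a labeled connection.
    edges = defaultdict(set)
    for (kind, _value), people in holders.items():
        for a, b in combinations(sorted(people), 2):
            edges[(a, b)].add(kind)
    return dict(edges)

def degree(edges):
    """Count the number of distinct people each person is connected to."""
    counts = defaultdict(int)
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

edges = build_edges(records)
degrees = degree(edges)

# List people from most to least connected.
for person, count in sorted(degrees.items(), key=lambda item: -item[1]):
    print(person, count)
```

The labels on each edge play the same role as the labels on the twine: they record why two people are linked, not just that they are.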
You can also discover undetected frauds using SNA, as large networks of highly connected people could indicate a new network of fraudsters. The first time I used this technique professionally, I applied it to a known population of fraudsters. We expected some of the results, but plenty of unexpected ones popped out of the visualization. Fraudsters shared data with non-fraudsters (these connections gave us new leads to forward for investigation). Seemingly unconnected people were also linked to the fraudsters through multiple degrees of separation.
You can see in the map below that Jessica, Charlie, and Sam are connected to the network via the more highly connected person, Oscar. In this example, an investigator might consider looking into these three individuals, even though they sit on the outer edge of the criminal network.
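Degrees of separation can be computed as well as seen. The sketch below is one way to do it in Python, using a breadth-first search. The network is a hypothetical stand-in for the map described above: Oscar, Jessica, Charlie and Sam come from the example, while "Known fraudster" and "Jim" are placeholders added for illustration, and `degrees_of_separation` is a made-up function name.

```python
from collections import deque

# Hypothetical network: each person maps to the people they are connected to.
network = {
    "Known fraudster": ["Oscar", "Jim"],
    "Jim": ["Known fraudster", "Oscar"],
    "Oscar": ["Known fraudster", "Jim", "Jessica", "Charlie", "Sam"],
    "Jessica": ["Oscar"],
    "Charlie": ["Oscar"],
    "Sam": ["Oscar"],
}

def degrees_of_separation(network, start):
    """Return the number of hops from `start` to every reachable person."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in network.get(person, []):
            # The first time we reach someone is via a shortest path.
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return distance

hops = degrees_of_separation(network, "Known fraudster")

# People two or more hops away are indirect connections worth a second look.
indirect = sorted(name for name, d in hops.items() if d >= 2)
print(indirect)
```

In this made-up network, Jessica, Charlie and Sam are each two hops from the known fraudster, reachable only through Oscar.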
Technology and tools exist for creating and analyzing network graphs. The above examples were created using an ACL script and an Excel add-on called NodeXL. Clustering algorithms can also be used to find subgroups within networks based on how individuals are connected.
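As a rough illustration of finding subgroups, the Python sketch below splits a network into connected components, meaning groups of people who can reach each other through some chain of connections. This is the simplest form of grouping and only a starting point. It is not one of the clustering algorithms a tool like NodeXL offers, which can also split a single connected network into tighter communities. The names and the function `connected_components` are hypothetical.

```python
# Hypothetical network containing two separate groups of people.
network = {
    "Oscar": ["Jim", "Sam"],
    "Jim": ["Oscar"],
    "Sam": ["Oscar"],
    "Bob": ["Larry"],
    "Larry": ["Bob", "Sue"],
    "Sue": ["Larry"],
}

def connected_components(network):
    """Return a list of sets, one per group of mutually reachable people."""
    seen = set()
    groups = []
    for start in network:
        if start in seen:
            continue
        # Walk outward from `start`, collecting everyone reachable from it.
        group = set()
        stack = [start]
        while stack:
            person = stack.pop()
            if person in group:
                continue
            group.add(person)
            stack.extend(n for n in network.get(person, ()) if n not in group)
        seen |= group
        groups.append(group)
    return groups

groups = connected_components(network)

for group in groups:
    print(sorted(group))
```

A large component that nobody has flagged before is the kind of result that could point to a new network of fraudsters.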
This article was written by Rajdeep Dutta, Head of Redwood Knowledge Centre