Social media is a series of networks connecting individuals, companies, organizations, and groups to one another. These networks can transcend local, national, and international borders connecting people to networks far and wide. With all those connections, how can a user find the ones that they want to connect with? That’s where follow suggestions come in. These days, many successful apps on social media offer some kind of follow suggestion functionality. This provides a way to recommend people, that you’re connected to in real life (or that you may be interested in), to follow based on your career, interests, or other connections. This brings up two questions: how do you implement this, and why would you want follow recommendation functionality in your application? Let’s start with “why” first.
Successful Follow Recommendations Lead to a Dynamic, Connected Network
The more connected your network is, the more engaging it becomes. With more common connections, users are more likely to engage with that network and share its content with others. The more people, places, topics, etc. users follow, the more likely they are to revisit your app for interesting content. Not only does a highly connected network become more engaging for the end user, it provides better insights for the development team into each user’s engagement. It also creates clusters of users which can be used for targeted marketing techniques.
Implementing Follow Recommendations For Your Application
Follow suggestions can be framed as a link prediction problem in social networks. Link prediction is modeling the evolution of a network by using features intrinsic to the network topology at time ‘t’ to predict what new connections there will be at time ‘t’. By asking the question, ‘who is most likely to follow whom?’ one can optimize follow suggestions, and help grow the network as quickly as possible. Let’s take a look at 4 different approaches to link prediction.
Perhaps the most intuitive, or most similar to a real life scenario, is neighborhood-based methods which rely on friends in common. In it’s simplest form it can be looked at as Alice is friends with Bob, Bob is friends with Frank, maybe Alice should be friends with Frank.
Some common methods to compute the likelihood of new links based on this idea include the Jaccard Coefficient and the Adamic-Adar Index:
The Jaccard Coefficient
The Jaccard Coefficient calculates the probability that there is a neighbor that both u and v have for a randomly selected neighbor that either u or v have. For the above graph, the Jaccard Coefficient between Bob and David would be calculated as follows:
This states that for a randomly chosen friend of Bob, there is a 2/3 chance that David also has that friend. This has a nice feature that normalizes nodes that have many neighbors. The Jaccard Coefficient for the rest of the missing edges comes out to be:
The Adamic-Adar Index:
The Adamic-Adar predictor refines the simple counting of common features between “u” and “v” by weighing less-used features more heavily. Friends without many connections, but who bridge a gap between you and someone else, have more influence than someone who is friends with everyone. For the above graph, the Adamic-Adar between Bob and David is calculated as follows:
Find friends in common:
Sum 1/log of the degree of those friends in common:
The rest of the missing edges comes out to be:
Perhaps the most well known path-based method is the PageRank Algorithm, originally used by Google to rank webpages in order of importance. With the path-based method, a “random surfer” is modeled with limited attention and will follow links from one page to another, and then randomly close and restart on a random page. The PageRank is calculated by the proportion of number of times that surfer went to a given page. For social networks, this algorithm can be personalized for an individual node by changing the random reset location to itself. For example, to calculate the personalized PageRank for Alice, we would set the probability that the surfer resets all nodes (except Alice) equal to ‘0’, then set the probability that the surfer resets Alice’s node equal to ‘1’. Running this on the above graph gives:
A nice feature the path-based methods give is the ability to weigh edges. A simple example is to weigh each edge by how recently it was created, allowing newer friendships or follows to have more weight and higher ranks than older ones. There are, of course, other algorithms that may better suit the needs of your applications such as HITS and SALSA, which calculate 2 metrics for each node (Hubs and Authority). These metrics may explain your network better if it’s directed (follow relationship) instead of undirected (friendship)
Similar Interest-Based Methods
Stepping away from graph algorithms for a minute, another approach is similarity metrics (we wrote a blog post about matrix factorization for more information on how to calculate these). By using matrix factorization techniques one can extract user feature vectors and calculate the similarities; cosine similarity is one of the more popular methods. The thought process behind the interest-based method is simple: if there is a user with similar interests to you, there is a good chance you’d want to follow that user. This can be particularly useful in follow style platforms such as Twitter.
Machine Learning-Based Methods
This leads to a question of, ‘which method is best for my application?’ which depends on what you are trying to accomplish, though I’d be remiss if I didn’t let a computer try to figure it out. If you treat the link prediction problem as a binary classification problem (1 for connected nodes and 0 for not connected) and use a mix of the features from above as input features for our model, you can then use your favorite classification model. SVM, decision trees, or XGBoost should work fairly well here.
Other Uses for Link Prediction in Networks
Another interesting use case for link prediction in networks is in e-commerce and targeted marketing, such as Amazon’s “People who bought ‘X’, also bought ‘Y’.” By framing this as a link prediction problem it is possible to recommend the most relevant product for that user. More specifically, this is a link prediction problem in a BiPartite network where one set of nodes are users and the other set are products.
Follow Suggestions at Stream
Here at Stream we personalize all of our recommendation systems for individual clients. Once integrated with our framework, clients can connect to a personalized API endpoint to get a list of follow recommendations. Feel free to reach out regarding personalization. We can help pick out the perfect algorithm and deploy it at scale for your application.