A Latent Factor Model for Board Recommendations in Pinterest
The past two years have seen the rise of a new online social network, Pinterest, which has grown more rapidly than any other social network (now reaching 70 million users). Pinterest is organized primarily around photos (or "pins"), and users reveal their interests by organizing pins into self-assigned categorical boards. However, a key challenge for both new and existing Pinterest users is finding boards of interest among the overall collection of 750 million boards. Hence, this thesis focuses on board recommendation in Pinterest: identifying personalized, high-quality boards without requiring the user to search or browse exhaustively. Board recommendation in Pinterest is challenging for several critical reasons: (i) unlike community-oriented recommenders for movies, books, and other media, boards are highly personalized and are not viewed or rated by many other users; (ii) many pins and boards lack the descriptive text and other features typically used to power modern recommenders; and (iii) evaluating the quality of a Pinterest board recommender is difficult, since there is no baseline or ground-truth set of Pinterest recommendations to compare against.
With these challenges in mind, this thesis proposes a new latent factor model for generating Pinterest board recommendations. To address the feature-sparsity and personalized-board challenges, the overall approach first generates ratings for user-board pairs; a latent factor model then factorizes the resulting sparse user-board rating matrix to predict ratings for the unrated pairs, and the top-rated boards form the recommendation list. Two key components of the proposed model are (i) the definition of the universe of users around each target user, from which candidate boards to recommend are drawn; and (ii) the approach for assigning implicit ratings to each user-board pair within this universe, which serve as the basis of the latent factor model.
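The pipeline described above (rate user-board pairs, factorize the sparse rating matrix, then rank the unrated boards) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a plain matrix factorization trained by stochastic gradient descent with L2 regularization, and the function names `factorize`, `predict`, and `recommend`, the hyperparameters, and the toy ratings are all hypothetical.

```python
import random


def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn latent user and board vectors from observed (user, board, rating) triples.

    Hypothetical sketch: plain SGD on squared error with L2 regularization.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    boards = sorted({b for _, b, _ in ratings})
    # Small random initial factors; sorting the ids makes runs reproducible.
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {b: [rng.gauss(0, 0.1) for _ in range(n_factors)] for b in boards}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, b, r in data:
            pu, qb = P[u], Q[b]
            err = r - sum(x * y for x, y in zip(pu, qb))
            for k in range(n_factors):
                puk, qbk = pu[k], qb[k]
                # Gradient step on both vectors, using the pre-update values.
                pu[k] += lr * (err * qbk - reg * puk)
                qb[k] += lr * (err * puk - reg * qbk)
    return P, Q


def predict(P, Q, user, board):
    """Predicted rating: dot product of the user and board factor vectors."""
    return sum(x * y for x, y in zip(P[user], Q[board]))


def recommend(ratings, P, Q, user, top_k=10):
    """Rank the boards the user has not rated by predicted rating."""
    rated = {b for u, b, _ in ratings if u == user}
    unrated = [b for b in Q if b not in rated]
    return sorted(unrated, key=lambda b: predict(P, Q, user, b), reverse=True)[:top_k]


# Hypothetical toy ratings: two taste groups; user "c" has not rated "dogs" or "bikes".
ratings = [
    ("a", "cats", 5), ("a", "dogs", 5), ("a", "cars", 1), ("a", "bikes", 1),
    ("b", "cats", 5), ("b", "dogs", 5), ("b", "cars", 1), ("b", "bikes", 1),
    ("c", "cats", 5), ("c", "cars", 1),
    ("d", "cats", 1), ("d", "dogs", 1), ("d", "cars", 5), ("d", "bikes", 5),
    ("e", "cats", 1), ("e", "dogs", 1), ("e", "cars", 5), ("e", "bikes", 5),
]
P, Q = factorize(ratings)
print(recommend(ratings, P, Q, "c", top_k=1))
```

Because "c" agrees with the first group on the boards it has rated, the learned factors should rank "dogs" above "bikes" for "c". The sketch deliberately omits user and board bias terms, which many practical recommenders add.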
For the first component, we investigate three universe types: a collection of randomly selected users, a collection of users in the target user's personal Pinterest network, and a collection of users who are "similar" to the target user. For the second component, we construct ratings via three approaches: a board-count method, a category-based method, and an LDA-based method. We investigate these design choices through a comprehensive set of experiments over a dataset of around 50,000 Pinterest users, 100 million pins, and around 570,000 boards.
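The two components above can be illustrated with a small sketch. The abstract names the universe types and rating methods without defining them, so this is only one hypothetical reading: the "similar" universe is taken to be the k users whose sets of board categories have the highest Jaccard overlap with the target user's, and a category-based rating scales the share of a user's boards in a board's category into the range 1 to 5. The names `similar_universe`, `category_rating`, and `build_ratings`, the Jaccard choice, the rating formula, and the toy data are all assumptions, not the thesis's definitions. Pairs whose category the user never uses are left unrated, so the output stays sparse and could be passed to a latent factor model.

```python
from collections import Counter

# Hypothetical toy data: board id -> (owner, self-assigned category).
BOARDS = {
    "b1": ("alice", "food"),
    "b2": ("alice", "travel"),
    "b3": ("bob", "food"),
    "b4": ("bob", "travel"),
    "b5": ("bob", "diy"),
    "b6": ("carol", "cars"),
    "b7": ("dave", "food"),
}


def categories_of(user, boards):
    """Count how many boards the user owns in each category."""
    return Counter(cat for owner, cat in boards.values() if owner == user)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_universe(target, boards, k):
    """Hypothetical 'similar users' universe: top-k users by category-set overlap."""
    mine = set(categories_of(target, boards))
    others = {owner for owner, _ in boards.values()} - {target}
    # Sort by descending similarity, breaking ties by user id for determinism.
    ranked = sorted(others, key=lambda u: (-jaccard(mine, set(categories_of(u, boards))), u))
    return ranked[:k]


def category_rating(user, board, boards, scale=5):
    """Hypothetical category-based implicit rating in [1, scale]."""
    counts = categories_of(user, boards)
    total = sum(counts.values())
    share = counts[boards[board][1]] / total if total else 0.0
    return 1 + (scale - 1) * share


def build_ratings(target, boards, k):
    """Sparse (user, board, rating) triples over the target user's universe."""
    universe = [target] + similar_universe(target, boards, k)
    candidates = [b for b, (owner, _) in boards.items() if owner in universe]
    triples = []
    for u in universe:
        for b in candidates:
            # Leave pairs outside the user's own categories unrated (keeps the matrix sparse).
            if categories_of(u, boards)[boards[b][1]] > 0:
                triples.append((u, b, category_rating(u, b, boards)))
    return triples


print(similar_universe("alice", BOARDS, 2))  # → ['bob', 'dave']
```

Because this rating depends only on category, every board in one category receives the same score for a given user; the board-count and LDA-based methods named in the abstract are not sketched here.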