Meta’s ad recommendation engine is powered by deep learning recommendation models (DLRMs), which incorporate thousands of hand-engineered signals, or features.
Limitations of DLRMs on ad recommendations
Meta’s personalized ad DLRMs rely on a variety of signals to understand people’s purchase intent and preferences. DLRMs have revolutionized learning from sparse features, which capture a person’s interactions with entities such as Facebook pages and often have cardinalities in the billions. The success of DLRMs is built on their ability to learn generalizable, high-dimensional representations (i.e., embeddings) from sparse features.
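To make the idea of an embedding for a sparse feature concrete, here is a minimal sketch in plain Python. It is not Meta’s implementation: the table size, the embedding dimension, the modulo hashing of ids into buckets, and the sum pooling are all assumptions chosen for illustration, and the table rows are random placeholders rather than learned values.

```python
import random

NUM_BUCKETS = 1000   # hypothetical table size, far below real id cardinality
DIM = 4              # hypothetical embedding dimension

rng = random.Random(0)
# In a trained model these rows are learned; here they are random placeholders.
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]

def embed(ids):
    """Map each raw id to a table row via modulo hashing, then sum-pool the rows."""
    pooled = [0.0] * DIM
    for raw_id in ids:
        row = table[raw_id % NUM_BUCKETS]
        pooled = [p + r for p, r in zip(pooled, row)]
    return pooled

# Hypothetical page ids for one person; the result is one dense vector.
page_ids = [1234567890123, 42, 987654321]
vector = embed(page_ids)
print(len(vector))
```

Because the rows are summed, the pooled vector does not depend on the order in which the ids appear.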
To leverage tens of thousands of such features, Meta employs various strategies to combine features, transform the combined representations, and produce the final output.
Some examples:
Ads clicked by a person in the past N days → [Ad-id1, Ad-id2, Ad-id3, …, Ad-idN]
Facebook pages visited by a person in the past M days, with visit count scores for each page → [(Page-id1, 45), (Page-id2, 30), (Page-id3, 8), …]
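The two example features above can be sketched as simple aggregations over an event log. This is an illustrative sketch only: the `Event` record, its field names, and the sample data are hypothetical, and production feature pipelines are certainly more involved.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    day: int          # days before today; 0 means today (hypothetical field)
    kind: str         # "ad_click" or "page_visit" (hypothetical labels)
    entity_id: str

def clicked_ads(events, n_days):
    """Unique ad ids clicked within the last n_days, sorted for stable output."""
    return sorted({e.entity_id for e in events
                   if e.kind == "ad_click" and e.day < n_days})

def page_visit_counts(events, m_days):
    """(page id, visit count) pairs within the last m_days, most visited first."""
    counts = Counter(e.entity_id for e in events
                     if e.kind == "page_visit" and e.day < m_days)
    return counts.most_common()

events = [
    Event(0, "ad_click", "ad-7"),
    Event(1, "page_visit", "page-1"),
    Event(2, "page_visit", "page-1"),
    Event(3, "ad_click", "ad-3"),
    Event(5, "page_visit", "page-2"),
    Event(40, "ad_click", "ad-9"),   # outside a 7-day window
]

print(clicked_ads(events, n_days=7))
print(page_visit_counts(events, m_days=30))
```

Note that both functions discard the time and order of the individual events once the aggregate is computed.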
Limitations of this approach:
Sequential information loss: Sequential information (i.e., the order of a person’s events) can provide valuable insights for recommending ads relevant to that person’s behavior. Sparse feature aggregation discards this ordering.
Granular information loss: Because features are aggregated across events, fine-grained information such as the co-occurrence of attributes within the same event is lost.
Reliance on human intuition: Human intuition is unlikely to identify non-intuitive, complex interactions and patterns in large amounts of data.
Redundant feature space: Multiple feature variants are created using different aggregation schemes. Although each variant provides incremental value, overlapping aggregations increase computational and storage costs and make feature management cumbersome.
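The first limitation can be illustrated with a small sketch: two people with opposite event orders end up with identical count-based features. The event names and histories below are invented for the example.

```python
from collections import Counter

def aggregate(event_ids):
    """Count-based aggregation, as in a visit-count sparse feature."""
    return Counter(event_ids)

# Hypothetical category-level event histories for two people, oldest first.
person_a = ["running-shoes", "running-shoes", "marathon-signup", "energy-gels"]
person_b = ["energy-gels", "marathon-signup", "running-shoes", "running-shoes"]

same_aggregate = aggregate(person_a) == aggregate(person_b)
same_sequence = person_a == person_b

print(same_aggregate, same_sequence)
```

A model that only sees the aggregated counts cannot tell these two histories apart, even though the most recent interest differs.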
People’s interests evolve over time, and their intent is dynamic. This complexity is difficult to model with handcrafted features. Modeling these dynamics helps build a deeper understanding of a person’s behavior over time, leading to better ad recommendations.
Therefore, the aggregated sparse features of DLRMs need to be complemented with event-based features (EBFs), which are defined along the following dimensions:
Event stream: the data stream behind an EBF, such as the sequence of ads a person recently engaged with or the sequence of pages a person liked.
Sequence length: the number of recent events included from each stream, determined by the importance of each stream.
Event information: semantic and contextual information about each event in the stream, such as the category of the ad the person engaged with and the timestamp of the event.
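As a rough sketch of how these pieces might fit together, the snippet below keeps the most recent events of each stream, in order, along with per-event information. The `Event` class, the stream names, the `info` keys, and the sequence lengths are all hypothetical; the source does not specify a concrete data layout.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    timestamp: int                             # Unix seconds
    entity_id: str                             # e.g. an ad id or page id
    info: dict = field(default_factory=dict)   # event information, e.g. category

def build_ebf(streams, seq_lengths):
    """Keep the most recent seq_lengths[name] events of each stream, oldest first."""
    ebf = {}
    for name, events in streams.items():
        ordered = sorted(events, key=lambda e: e.timestamp)
        n = seq_lengths[name]
        ebf[name] = ordered[-n:] if n > 0 else []
    return ebf

streams = {
    "ad_engagement": [
        Event(1_700_000_300, "ad-3", {"category": "sports"}),
        Event(1_700_000_100, "ad-1", {"category": "travel"}),
        Event(1_700_000_200, "ad-2", {"category": "sports"}),
    ],
    "page_like": [
        Event(1_700_000_050, "page-9"),
        Event(1_700_000_250, "page-4"),
    ],
}

# A stream judged more important gets a longer sequence (values are illustrative).
ebf = build_ebf(streams, seq_lengths={"ad_engagement": 2, "page_like": 1})
print([e.entity_id for e in ebf["ad_engagement"]])
print([e.entity_id for e in ebf["page_like"]])
```

Unlike the aggregated features, each stream here preserves both the order of events and the attributes attached to each one.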
Differences between DLRMs (left) and EBFs (right):