Novelty and Diversity
Evaluation and Enhancement
in Recommender Systems


Saúl Vargas Sandoval


PhD Thesis supervised by
Prof. Pablo Castells

UAM

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Recommender Systems

Definition

Recommender Systems are tools and techniques that provide suggestions about products, video, music, or other resources in a personalized manner.

recsys definition

Examples

Spotify Netflix Amazon
Twitter Facebook Linkedin

Novelty and Diversity

An obvious, redundant recommendation

Recommending The Beatles
The Long Tail The Long Tail

The Long Tail

A few of the most popular items concentrate high sales volumes.

The rest — the Long Tail — have moderate ones.

Promoting sales in the Long Tail has benefits:

  • For the business: make the most of the catalog, niche markets ➡ Sales diversity
  • For the users: less obvious, unexpected recommendations ➡ Novelty

The Harry Potter Effect: personalized recommendations have a bias towards recommending popular items.

Different perspectives on Novelty and Diversity

  • Long Tail Novelty: avoid popular items.
  • Unexpectedness: surprise the user.
  • Temporal Novelty: the same again?
  • Intra-List Diversity: a little bit of everything.
  • Sales Diversity: make the most of the catalog.
  • Sales Novelty: differentiate yourself from competitors.

Diversity in Information Retrieval

IR Diversity

Research Goals

  • RG1: develop a clear common methodological and conceptual ground for novelty and diversity in recommendations.
    ➡ 2. A Unified Framework for Novelty and Diversity
  • RG2: explore the application of theories and methods from Information Retrieval diversity to Recommender Systems.
    ➡ 3. Intent-Aware Diversity
  • RG3: devise recommendation-specific techniques for enhancing the diversity within recommendations.
    ➡ 4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  • RG4: proposing new techniques for alleviating the popularity bias in recommendations.
    ➡ 5. Improving Sales Diversity by Recommending Users to Items

Experimental Design

Common and clear experimental design across experiments.

  • As clear as possible to facilitate reproducibility.
  • Three collaborative filtering datasets:
    • Movie recommendation: MovieLens1M and Netflix.
    • Music recommendation: Million Song dataset.
  • Varied recommendation algorithms:
    • Non personalized: random, most popular.
    • Memory-based: user and item-based.
    • Latent factors: pLSA and iMF.

Code available in http://ir-uam.github.io/RankSys/.

RankSys

RankSys
Java 8 Recommender Systems
framework for novelty, diversity
and much more

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Research Goals

  • RG1: develop a clear common methodological and conceptual ground for novelty and diversity in recommendations.
  • RG2: explore the application of theories and methods from Information Retrieval diversity to Recommender Systems.
  • RG3: devise recommendation-specific techniques for enhancing the diversity within recommendations.
  • RG4: proposing new techniques for alleviating the popularity bias in recommendations.
  • Problem: limitations of the SoA in novelty and diversity evaluation:
    • Unexplored connection between perspectives.
    • Lack of consensus on metrics.
    • Lack of position and relevance-awareness.
  • Proposal: a new unified framework.
    • Item novelty models.
    • Browsing models.

Item Novelty Models

Idea: the novelty and diversity of recommendations can be decomposed as the aggregation of the individual novelty of the items that compose them.

Item Novelty Model \[nov: \mathcal{I} \rightarrow [0, \infty)\]

An item novelty model is defined by two elements:

  • Novelty context.
  • Measurement approach.

Novelty Context

“Novel with respect to what?”

Novelty or diversity perspective ⬌ context $\theta$

  • Long Tail Novelty: what the community knows.
  • Unexpectedness: what the user is familiar with.
  • Temporal Novelty: past recommendations.
  • Intra-List Diversity: the rest of items in a recommendation.
  • Sales Diversity: what is recommended to other users.
  • Sales Novelty: what others systems recommend.

Measurement Approaches

“Estimate the degree of novelty numerically.”

\[nov: \mathcal{I} \times \Theta \rightarrow [0, \infty)\]

We propose two families of measurements:

  • Discovery-based: probability of being discovered.
  • Distance-based: dissimilarity from context.

Examples

Perspective Model Context Measurement
Long Tail Novelty Free discovery (FD) $\theta = \mathcal{R}$ $-\log_2 p(i | seen, \theta)$
Intra-List Diversity Intra-List Distance (ILD) $\theta = R_u$ \[\sum_{j \in \theta} p(j | choose) \; dist(i, j)\]
Sales Diversity Inter-User Reciprocal Discovery (IURD) $\theta = \left\lbrace R_v \right\rbrace$ $1 / p(i | seen, \theta)$

Browsing models

Idea: the effective contribution of an item depends on its probability of being sampled by the user.

Effective contribution of item $i$:
$p(choose | i, u, R) \; nov(i)$

Chosen if seen and relevant:
$p(choose | i, u, R) = p(seen | i, R) \; p(rel | i, u)$

$p(seen | i, R) = disc(k_i^R)$ rank-awareness

$p(rel | i, u)$ from user judgments relevance-awareness

Metric schemes

User-side: novelty or diversity of a single user recommendation.

\[m(R_u) = C_u \sum_{i \in R_u} disc(k_i) \; p(rel | i, u) \; nov(i)\]

Business-side: novelty or diversity of a set of recommendations.

\[m(S) = C_S \sum_{i \in R_u} \left[ \sum_{u \in \mathcal{U}} C_u disc(k_i^{R_u}) \; p(rel | i, u) \right] \; nov(i)\]

Resulting Metrics

Our framework...

  • generalizes many metrics from SoA under a common ground,
  • and incorporates rank and relevance to them.

Examples:

  • EFD: Mean Inverse User Frequency (MIUF) of Zhou et al. (2010).
  • EILD: Intra-List Similarity (ILS) of Ziegler et al. (2005).
  • EIURD: Aggregate Diversity (Aggr-div) of Adomavicius and Kwon (2012).

Enhancing novelty and diversity:
re-ranking strategies

Idea: use item novelty models to optimize novelty and diversity in recommendations.

Re-ranking of the recommendation list $R$ to optimize the relevance-novelty trade-off of items.

Two variants:

  • Direct re-scoring of the recommender's output: \[s_{nov}(i) = (1 - \lambda) \; s(u, i) + \lambda \; nov(i)\]
  • Greedy selection when $R \in \theta$: \[s_{nov}(i | S) = (1 - \lambda) \; s(u, i) + \lambda \; nov(i | S)\]

Experiments

An extensive set of experiments is carried out.

Aiming to know:

  • The behavior of recommendation algorithms w.r.t. the metrics of our framework.
  • The effects of rank and relevance-awareness.
  • The outcome of the re-ranking techniques.
  • Relationship between metrics.

Novelty and Diversity of
SoA recommendation algorithms

An updated analysis of the behavior of SoA recommendation algorithms w.r.t. the novelty and diversity metrics of our framework.

Some overall observations:

  • Random recommendations are the most novel and diverse when not considering relevance.
  • Latent factors more novel than nearest neighbors.
  • Latent factors less diverse than nearest neighbors.

Relevance-awareness

Figure 7

EIURD (generalization of Aggr-Div) in MovieLens1M

Rank-awareness effect is more subtle.

Re-ranking Strategies

Figure 8b

Re-ranking (top 100) targeting EFD and EILD in MovieLens1M

Relation between metrics and perspectives

Interplay between metrics and re-ranking strategies to unveil relations between perspectives and metrics

We find the following results:

  • Popularity bias: Long Tail Novelty, Sales Diversity and Sales Novelty are related.
  • Distance-based metrics: using profile intersection is not equal to comparing content.
  • Intra-List Diversity: distance-based metrics not related to Intent-Aware metrics.

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Research Goals

  • RG1: develop a clear common methodological and conceptual ground for novelty and diversity in recommendations.
  • RG2: explore the application of theories and methods from Information Retrieval diversity to Recommender Systems.
  • RG3: devise recommendation-specific techniques for enhancing the diversity within recommendations.
  • RG4: proposing new techniques for alleviating the popularity bias in recommendations.
  • Problem: Search/recommendation list diversification is been addressed differently in RS and IR.
  • Proposal: study the adaptation of the Intent-Aware family of metrics and diversification method in IR to RS.
  • On top of this adaptation, two novel diversification methods are proposed: explicit relevance models and user sub-profiles.

From Search to Recommendation:
User Aspects

Search Result Diversification as a means to cope with with query ambiguity and under-specification.

Recommendation as a domain where ambiguity and under-specification are also present as:

  • There is usually no explicit query, results are based on previous interactions.
  • Users have a variety of interests and tastes.

User aspects as equivalent of query interpretations and facets to represent the user's varied interest and tastes.

We define and extract user aspect spaces by means of the features of the items in the recommendation domain.

Example of user aspects
\[p(f | u) = \frac{|\{ i \in \mathcal{I}_u : f \in \mathcal{F}_i \}|}{\sum_{f' \in \mathcal{F}} |\{ i \in \mathcal{I}_u : f' \in \mathcal{F}_i \}|}\]

Adaptation of Intent-Aware
Metrics and Methods


Metrics: S-recall, $\alpha$-nDCG, ERR-IA, etc.

\begin{align} ERR\text{-}IA(R) & = \sum_{a \in \mathcal{A}} p(a | u) \sum_{i \in R} \frac{1}{k_i} \; p(r | i, u, a) \\[5pt] & \qquad \prod_{j : k_j < k_i} \left( 1 - p(r | j, u, a) \right) \end{align}

Diversification methods: IA-Select, xQuAD.

\[nov_{xQuAD}(i | S) = \sum_{a \in \mathcal{A}} p(a | u) \; p(i | u, a) \prod_{j \in S} \left( 1 - p(j | u, a) \right)\]

Explicit Relevance Models

  • Generative vs. relevance probabilistic models in Information Retrieval.
  • Adapted diversification methods (IA-Select, xQuAD) rely on the generative probability $p(i | u, a)$.
  • The generative model implies, in a way, the selection of a single document/item.
  • Idea: model our diversification strategy in terms of the probability of relevance $p(r | i, u, a)$
  • We eliminate the conceptual restriction of selecting a single item in the recommendation.
  • Advantages: competitive or better performance than generative-based methods, additional extensions to manage tolerance to redundancy.

Relevance-based xQuAD


\begin{align} nov_{RxQuAD}(i | S) &= p(r_i, \neg r_S | u) \label{eq:ia-rxquad}\\ &= \sum_{a \in \mathcal{A}} p(a | u) \; p(r_i, \neg rel_S | u, a) \nonumber \\ &= \sum_{a \in \mathcal{A}} p(a | u) \; p(r | i, u, a) \; p(\neg r | S, u, a) \nonumber \\ &= \sum_{a \in \mathcal{A}} p(a | u) \; p(r | i, u, a) \prod_{j \in S} \left( 1 - p(r | j, u, a) \right) \nonumber \end{align}

Using User Sub-Profiles

  • Recommendations based on all user's preferences may not be fit to generate good recommendations targeting specific tastes of the user.
  • Idea: combine recommendations targeting particular interests of the user into single, diversified recommendations to properly capture the heterogeneity of user tastes.
  • We define the notion of user sub-profile as a subset the the user profile that represents a unique interest or taste of the user.
  • Our method works in three steps...

Step 1: Extract User Sub-Profiles

Extract User Sub-Profiles

Step 2: Generate Recommendations for Sub-Profiles

Generate Recommendations for Sub-Profiles

Step 3: Combine Recommendations

Combine Recommendations

xQuAD $+$ sub-profiles $=$ SxQuAD

RxQuAD $+$ sub-profiles $=$ SRxQuAD

Experiments

Aiming to know:

  • Behavior of SoA recommendation algorithms w.r.t. to Intent-Aware metrics and effect of diversification techniques.
  • Performance of RxQuAD, SxQuAD and SRxQuAD compared to directly adapted diversification techniques.
  • Effect of the relevance-based redundancy management of RxQuAD.

Comparing diversification methods

ERR-IA

As measured by ERR-IA in MovieLens1M

Comparing diversification methods

alpha-nDCG

As measured by $\alpha$-nDCG in MovieLens1M

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Research Goals

  • RG1: develop a clear common methodological and conceptual ground for novelty and diversity in recommendations.
  • RG2: explore the application of theories and methods from Information Retrieval diversity to Recommender Systems.
  • RG3: devise recommendation-specific techniques for enhancing the diversity within recommendations.
  • RG4: proposing new techniques for alleviating the popularity bias in recommendations.
  • Problem: in the previous contribution we were using genres (as in music, movies or books) as proxies for user aspects.
  • Proposal: study the properties of genres as source of diversity.
  • Genres have two distinct properties: generality and coverage.
  • We identify three requirements for genre-based diversity: coverage, redundancy and size-awareness.
  • We propose a new Binomial framework to fulfill these requirements.

Properties of genres

Generality

drama
Drama (broad)
western
Western (narrow)

Overlap

Overlap between top 5 genres

SoA diversity frameworks present some flaws when modeling diversity using genres.

Which properties should a good diversity framework fulfill?
Coverage, redundancy and size-awareness
Could we define a framework to satisfy them?
Yes, the Binomial Diversity framework

Properties of
Genre Diversity

Coverage

Since most users enjoy items from a variety of genres, it is important that the recommendation list covers as many of them as possible.

The coverage should be proportional: the personalized importance of each genre is not equal.

Redundancy

It is as important to present items that cover a certain genre as to present other items that do not cover it.

Wild Wild West
Western
Comedy
Cowboys and Aliens
Western
Sci-Fi
The Good, the Bad and the Ugly
Western
Adventure

Four genres... but Western is definitely redundant

Size-awareness

Mobile

Mobile setting: limited screen real estate to show recommendations requires a careful selection of what to display in a list.

The available list size should be taken into account by the model.

Not explored in prior work on search or recommendation diversity.

The Binomial Diversity Framework

Rationale

Random recommendation as naturally diverse but inaccurate algorithm.

The selection of genres in a random recommendation can be modeled with the Binomial Distribution.

We take the random recommendation as a model for the occurrence of genres in a good genre-based diverse recommendation.

We propose the Binomial Diversity framework to assess and enhance genre diversity.

Genre selection as
Binomial Distribution

coin toss
\begin{align} P(X_g = k_g) = \binom{N}{k_g} p_g^k (1 - p_g)^{N - k_g} \end{align}

$k_g$: times a genre $g$ appears in the recommendation

$N$: recommendation list size

$p_g$: probability of selecting genre $g$

➡ Estimated as a combination of genre generality and user preference.

Binomial coverage

\begin{align} Binom\text{-}Cov(R) \end{align}

A genre is not covered, how serious is this?

Score lack of coverage by the probability of no selecting the genre in the recommendation

\[ P(X_g = 0) = (1 - p_g)^N \]

Binomial redundancy

\begin{align} Binom\text{-}Red(R) \end{align}

Two occurrences of the same genre are already redundant, but how strong is the effect?

Score redundancy by the probability of observing this or a higher number of items with the genre in the recommendation:

\[ P(X_g \geq k_g | X_g = 0) \]
patience

Binomial diversity

Combining coverage and redundancy
in a single diversity metric:

\begin{align} Binom\text{-}Div(R) = Binom\text{-}Cov(R) \cdot Binom\text{-}Red(R) \end{align}

This metric fulfills all our requirements!

Experiments

Comparing frameworks by re-ranking

Figure 15

Observations from our experiments:

  • The distance-based metrics relate to our Binomial redundancy.
  • The proportionality-based metrics optimize mainly genre coverage.
  • Intent-Aware metrics and diversification retrieve items covering many genres regardless of their redundancy.

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

Research Goals

  • RG1: develop a clear common methodological and conceptual ground for novelty and diversity in recommendations.
  • RG2: explore the application of theories and methods from Information Retrieval diversity to Recommender Systems.
  • RG3: devise recommendation-specific techniques for enhancing the diversity within recommendations.
  • RG4: proposing new techniques for alleviating the popularity bias in recommendations.

Problem: the popularity bias hurts both Long Tail Novelty and Sales Diversity.

Proposal: go beyond list re-ranking by recommending users to items:

  • transpose the recommendation task.
  • ignore initially the popularity bias and focus on selecting users.

Two approaches:

  • Inverted neighborhoods.
  • Probabilistic reformulation.

Inverted Nearest
Neighbors

User-based nearest neighbors: recommend the items the the (user) neighbors' profiles.

UB

How are neighbors selected?

  • Typically the most similar to the target user.
  • Transposing the recommendation problem leads to a new neighbor selection policy.

Standard top-$K$ neighborhood: select the $K$ most similar users to the target user.

Inverted top-$K$ neighborhood: select the users to which the target user is one of the $K$ most similar.

Inverted neighborhoods
$N_2(u)$
Standard neighborhoods
$N_2^{-1}(u)$

Properties and Consequences

Properties of top-$K$ inverted neighborhoods:

  • Variable size.
  • All users/items will appear in the same number $K$ of neighborhoods.

Consequences:

  • User-based: flattens the influence power of users.
  • Item-based: more chances for rare items to be recommended.

Inverted neighborhoods solve some causes of the popularity bias in recommendation algorithms.

Probabilistic
Reformulation Layer

Probabilistic interpretation: \[p(i\,|\,u)\]

Inverted recommendation:\[p(u\,|\,i)\]

Bayes' rule: \[p(i\,|\,u) \sim p(u\,|\,i)\;p(i)\]

Given the scoring function $s$ of any recommendation algorithm: \begin{align} p(u\,|\,i; s) \sim s(u, i)\\[10pt] p(i; s) \sim \sum_v s(v, i)\\[10pt] \Rightarrow p(i\,|\,u; s) \sim s(u, i) \end{align}

...apparently not very interesting but...

$\rho \left(p(i;s), pop(i)\right) \geq 0.7$ the item prior captures the popularity

Idea: smooth the prior by means of an entropic regularization: \[ p(i;s, \alpha) \sim \left(\sum_u s(u,i)\right)^{1 - \alpha} \]

$\alpha$ controls the smoothing applied.

\begin{align} \alpha = 0 & \Rightarrow p(i; s, \alpha) = p(i; s) \\ \alpha = 1 & \Rightarrow p(i; s, \alpha) = \frac{1}{|\mathcal{I}|} \end{align}

Experiments

Inverted Neighborhoods (ML1M)

Figure 16

Probabilistic Reformulation (ML1M)

Figure 19

Outline

  1. Introduction
  2. A Unified Framework for Novelty and Diversity
  3. Intent-Aware Diversity
  4. Coverage, Redundancy and Size-Awareness in Genre Diversity
  5. Improving Sales Diversity by Recommending Users to Items
  6. Future Work

User Studies

  • Further understanding of the perception of novelty and diversity in recommendations.
  • Two purposes: validate our proposals and then help us decide between parameters and configurations.
  • Prior work on user studies does not cover comparing between different models or alternative configurations.

Extending our Contributions

  • Binomial framework to measure Unexpectedness.
  • Further analysis of biases in neighborhood selection.
  • Adapting the probabilistic reformulation to other tasks: promoting "hidden agendas".

Application of our Contributions to other Fields

  • Testing the Binomial framework in the TREC diversity task.
  • Retrievability of documents (Azzopardi and Vinay, 2008) as analogous to Sales Diversity. Could our proposals for the latter work in the former?

FIN