HypTrails

Description

HypTrails is a coherent approach for expressing and comparing hypotheses about human trails (categorical sequences). Its underlying model is a first-order Markov chain. Hypotheses are expressed as beliefs about the Markov transitions. The main idea of this Bayesian approach is to elicit proper Dirichlet priors from the hypotheses and to compare their plausibility using marginal likelihoods and Bayes factors. For elicitation, HypTrails uses an adaptation of the so-called (trial) roulette method. The result is a ranking of the hypotheses by their plausibility given the data.
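The pipeline described above can be illustrated with a minimal Python sketch. It is not the API of the official implementations: all function names, the toy trails and the parameter `k` (an assumed belief-strength factor) are hypothetical. The elicitation step is also a simplification, spreading real-valued pseudo-counts row by row on top of a uniform prior instead of distributing integer chips as the trial roulette adaptation does. The marginal likelihood is the standard closed form for a Markov chain with independent Dirichlet priors on each row.

```python
import numpy as np
from scipy.special import gammaln


def transition_counts(trails, n_states):
    """Count first-order transitions n[i, j] over all trails."""
    counts = np.zeros((n_states, n_states))
    for trail in trails:
        for src, dst in zip(trail[:-1], trail[1:]):
            counts[src, dst] += 1
    return counts


def elicit_prior(hypothesis, k):
    """Simplified elicitation (an assumption, not the exact published method):
    each row receives k * n_states pseudo-counts, split in proportion to the
    hypothesis, on top of a uniform prior of 1 per cell. Rows without any
    belief keep the uniform prior."""
    hypothesis = np.asarray(hypothesis, dtype=float)
    n_states = hypothesis.shape[1]
    row_sums = hypothesis.sum(axis=1, keepdims=True)
    normalized = np.divide(hypothesis, row_sums,
                           out=np.zeros_like(hypothesis), where=row_sums > 0)
    return 1.0 + k * n_states * normalized


def log_evidence(counts, alpha):
    """Log marginal likelihood of the data under Dirichlet row priors alpha."""
    per_row = (gammaln(alpha.sum(axis=1))
               - gammaln((alpha + counts).sum(axis=1))
               + np.sum(gammaln(alpha + counts) - gammaln(alpha), axis=1))
    return float(per_row.sum())


# Toy trails over three states that mostly follow the cycle 0 -> 1 -> 2 -> 0.
trails = [[0, 1, 2, 0, 1, 2, 0, 1, 2], [1, 2, 0, 1, 2, 0], [0, 1, 2, 0, 2, 1]]
counts = transition_counts(trails, n_states=3)

# Hypothetical hypotheses, each a matrix of beliefs about transitions.
hypotheses = {
    "cycle": np.roll(np.eye(3), 1, axis=1),     # belief in i -> i + 1
    "reverse": np.roll(np.eye(3), -1, axis=1),  # belief in i -> i - 1
    "uniform": np.ones((3, 3)),                 # every transition equally likely
}

k = 5  # assumed belief strength; larger k means a more concentrated prior
evidence = {name: log_evidence(counts, elicit_prior(h, k))
            for name, h in hypotheses.items()}

# Rank hypotheses by evidence; a log Bayes factor is a difference of log evidences.
ranking = sorted(evidence, key=evidence.get, reverse=True)
log_bayes_factor = evidence["cycle"] - evidence["uniform"]
print(ranking, log_bayes_factor)
```

Repeating the comparison for several values of `k` shows how the ranking behaves as belief in each hypothesis grows, which is the sensitivity of Bayes factors to the prior that the approach relies on.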

Paper

The initial presentation of HypTrails is provided in the following publication:

Philipp Singer, Denis Helic, Andreas Hotho and Markus Strohmaier,
HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web,
24th International World Wide Web Conference, Florence, Italy, 2015 (Best Paper Award)  [PDF] [arXiv] [Slides] [Tutorial] [Talk] [Code]

Abstract

When users interact with the Web today, they leave sequential digital trails on a massive scale. Examples of such human trails include Web navigation, sequences of online restaurant reviews, or online music play lists. Understanding the factors that drive the production of these trails can be useful for e.g., improving underlying network structures, predicting user clicks or enhancing recommendations. In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. Our approach utilizes Markov chain models with Bayesian inference. The main idea is to incorporate hypotheses as informative Dirichlet priors and to leverage the sensitivity of Bayes factors on the prior for comparing hypotheses with each other. For eliciting Dirichlet priors from hypotheses, we present an adaption of the so-called (trial) roulette method. We demonstrate the general mechanics and applicability of HypTrails by performing experiments with (i) synthetic trails for which we control the mechanisms that have produced them and (ii) empirical trails stemming from different domains including website navigation, business reviews and online music played. Our work expands the repertoire of methods available for studying human trails on the Web.

Implementations

HypTrails is currently freely available in two implementations, one in Python and one in Java.

Tutorial

A thorough tutorial is provided as an IPython notebook. It also covers the basics of Bayesian inference, Markov chain models and Dirichlet distributions, and it uses the Python implementation.