Introduction to Bayesian Inference: A Coin Flipping Example

Recently, I have been doing more teaching, and one part of that effort has been to provide an introduction to Bayesian inference. Personally, I have the intuition that this can be best achieved by working through a very

UDF in Google's BigQuery: An example based on calculating text readability

In my data science workflow, I have recently started to make heavy use of Google's BigQuery, which allows you to store and query large datasets in a SQL style. Internally, Google uses its enormous processing power to guarantee blazing fast queries; even

Bayesian Correlation with PyMC

Recently, I have been getting more and more interested in Bayesian techniques and specifically, I have been researching how to approach classical statistical inference problems within the Bayesian framework. As a start, I have looked into calculating Pearson correlation. To that

HypTrails Tutorial

At this year's World Wide Web conference, my colleagues and I published and presented the following paper: Philipp Singer, Denis Helic, Andreas Hotho and Markus Strohmaier, HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web, 24th International World

Influential papers for my PhD studies

As I am reaching the final stage of my PhD studies, I have been reflecting on which papers have influenced my work the most and which have amazed me the most. Thus, I want to devote this short blog post

Handling huge matrices in Python

Everyone who does scientific computing in Python has to handle matrices at least sometimes. The go-to library for using matrices and performing calculations on them is NumPy. However, sometimes your matrices grow so large that you cannot store them any

Determining Power Law parameter(s) using Bayesian modeling with PyMC

In a previous post, I talked about fitting the power law function to empirical data. Recently, I became very interested in Bayesian modeling and probabilistic programming. I am currently re-reading the excellent freely available book "Probabilistic Programming and Bayesian Methods for

Statistical test for randomness in categorical data sequences

Previously, I worked a lot with sequences of categorical data, that is, sequences whose elements come from a finite set of categories. As a prerequisite for my further modeling of such data, I was interested in first applying a statistical

Statistical Significance Tests on Correlation Coefficients

Recently, I had to determine whether two calculated correlation coefficients are statistically significantly different from each other. Basically, there are two types of scenarios: (i) you want to compare two dependent correlations or (ii) you want to compare two independent

The popularity of subreddits and domains on Reddit

In a previous blog post, I introduced a Reddit dataset I crawled which includes submission data for one complete year (2012-04-24 - 2014-04-23). I showed that the number of submissions each day was steadily rising but that, interestingly, more submissions

