This Reddit dataset consists of specific metadata for all submissions posted to Reddit from the beginning of November 2007 to the end of July 2013. The metadata of each submission (e.g., score) were collected around 1-2 months after the initial submission (i.e., once voting on the submission had been closed), as the metadata have most likely settled by then. The dataset is available as zipped JSON. Concretely, the following information is available:
We have limited the metadata in the Reddit dataset to the information necessary to reproduce our scientific results. The rest of the metadata has been removed to preserve the anonymity of Reddit users.
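Because the dataset is distributed as zipped JSON, reading it might look roughly like the sketch below. This is only an illustration under stated assumptions: the archive is assumed to contain one or more text files with one JSON object per line (newline-delimited JSON), and the function name `iter_submissions` is hypothetical. The actual file layout and field names may differ, so check them against the files you receive.

```python
import io
import json
import zipfile
from typing import Any, Dict, Iterator


def iter_submissions(zip_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one submission dict per non-empty line of every file in the archive.

    Assumption: newline-delimited JSON. If a file instead holds a single
    JSON array, replace the inner loop with json.load(text).
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                # Decode the member as UTF-8 text and stream it line by line,
                # so the whole file never has to fit in memory at once.
                text = io.TextIOWrapper(raw, encoding="utf-8")
                for line in text:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Streaming the records with a generator avoids loading the full archive into memory. For example, `sum(1 for _ in iter_submissions(path))` would count the submissions, where `path` points to your local copy of the archive.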
To access the dataset, please contact Philipp Singer (philipp.singer@gesis.org). Please add a short description of the purposes for which you want to use the dataset.
Please use the dataset for scientific purposes only and follow general ethical rules. If you publish results obtained using this dataset, please cite:
Philipp Singer, Fabian Flöck, Clemens Meinhart, Elias Zeitfogel and Markus Strohmaier,
Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?,
Web-Science Track at the 23rd International World Wide Web Conference, Seoul, South Korea, 2014 [PDF]
We sincerely thank Jason Baumgartner (aka u/stuck_in_the_matrix) for conducting the data collection and providing us with initial access to the data.