The Max Planck Institute for Software Systems
Online Social Networks Research @
WWW 2009 Data Set
Data from our WWW 2009 paper is available from the links below. Each of the data sets has been anonymized to protect the privacy of the users themselves. Included is information about the (a) evolving link structure of the Flickr social network, (b) photos in the network, and (c) favorite markings of photos by users.
Note that we are unable to release any non-anonymized data.
We are aware that properly anonymizing online social network data is very challenging. Clever schemes have been found to break seemingly well-anonymized data sets (e.g., the Netflix data set). For the data we make available, we use a "best effort" anonymization. We do not offer any strong guarantees, and we suspect that our anonymization scheme can likely be broken by clever comparisons to other real-world data. We encourage people to bring problems and fixes to our notice, should they find any.
We thank Akshay Bhat for his discovery of an attack on, and a fix for, our photo timestamp anonymization scheme. In all of the data sets below, anonymized time is used. In more detail, each timestamp is replaced with a sequence number, where the order of the sequence numbers respects the order of the underlying timestamps.
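As an illustration of the sequence-number replacement described above, the following is a minimal sketch and not the code used to produce the released files. The function name and the handling of ties (equal timestamps receive the same sequence number) are assumptions made for this example.

```python
def anonymize_timestamps(timestamps):
    """Replace each timestamp with a sequence number that preserves order.

    Hypothetical helper: equal timestamps map to the same sequence number
    (an assumption; the released data may treat ties differently).
    """
    # Sort the distinct timestamps and number them 0, 1, 2, ...
    rank_of = {t: rank for rank, t in enumerate(sorted(set(timestamps)))}
    return [rank_of[t] for t in timestamps]


# Made-up Unix timestamps, for illustration only.
example = [1162443600, 1162357200, 1162443600, 1162530000]
print(anonymize_timestamps(example))  # prints [1, 0, 1, 2]
```

Only the relative order survives this transformation; the gaps between the original timestamps are not recoverable from the sequence numbers.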
Note: The anonymized Flickr timestamps originally contained an error that caused the anonymized timestamps to not respect the order of the true timestamps. Anyone who downloaded the Flickr data files (flickr-all-photos.txt.gz, flickr-all-photo-favorite-markings.txt.gz, and flickr-photo-favorite-markings.txt.gz) between 05/17/2010 and 01/02/2012 should re-download updated files below. Note that only the anonymized timestamp field is affected; all other fields are unchanged. We thank Dong Li for locating and notifying us of this error.
Complete data set
• List of links
This file contains a list of all of the user-to-user links that we crawled from the Flickr social network. All links are directed. Note that this file is the same data file from our WOSN'08 paper.
Format: Gzipped ASCII. Each line contains two user identifiers and a date, separated by tabs, indicating that a link from the first user to the second was observed on the specified date.
Data: Flickr Links (147MB)
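A minimal sketch of reading the links file in Python, assuming exactly three tab-separated fields per line as described above. The names Link, parse_link_line, and read_links are hypothetical, and the date field is kept as a string because its exact representation is not specified here.

```python
import gzip
from typing import Iterator, NamedTuple


class Link(NamedTuple):
    source: str  # user the link points from
    target: str  # user the link points to
    date: str    # date field, kept as text (its format is an assumption-free string)


def parse_link_line(line: str) -> Link:
    """Split one tab-separated line into its three fields."""
    source, target, date = line.rstrip("\n").split("\t")
    return Link(source, target, date)


def read_links(path: str) -> Iterator[Link]:
    """Stream directed links from a gzipped ASCII file without loading it all."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                yield parse_link_line(line)
```

Streaming line by line keeps memory use low, which matters for a file of this size; pass read_links the local path of the downloaded file.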
• List of photos
This file contains a list of all of the photos that were marked as a favorite by at least one user from the list above. Other metadata about the photos is also included, such as the time the photo was uploaded, the (anonymized) user who uploaded the photo, and the number of views, comments, and favorite markings the photo had. Note that the data set and paper analysis include 11,267,320 photos, while the paper incorrectly reported 11,195,144 photos.
Format: Gzipped ASCII. Each line contains information about one photo, with fields separated by a tab. The fields are, in order: (anonymized) photo identifier, (anonymized) time of photo upload, (anonymized) photo owner, number of views, number of comments, and number of favorite markings.
Data: Flickr All Photos (154MB)
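The six-field photo record above could be parsed along the following lines. This is a sketch under stated assumptions: the names Photo, parse_photo_line, and read_photos are hypothetical, identifiers are kept as strings, and the anonymized upload time and the three counts are assumed to be plain integers.

```python
import gzip
from typing import Iterator, NamedTuple


class Photo(NamedTuple):
    photo_id: str     # anonymized photo identifier
    upload_time: int  # anonymized time (sequence number, assumed integer)
    owner: str        # anonymized identifier of the uploading user
    views: int
    comments: int
    favorites: int


def parse_photo_line(line: str) -> Photo:
    """Split one tab-separated line into the six documented fields."""
    photo_id, upload_time, owner, views, comments, favorites = (
        line.rstrip("\n").split("\t")
    )
    return Photo(photo_id, int(upload_time), owner,
                 int(views), int(comments), int(favorites))


def read_photos(path: str) -> Iterator[Photo]:
    """Stream photo records from a gzipped ASCII file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                yield parse_photo_line(line)
```

For example, sum(1 for _ in read_photos("flickr-all-photos.txt.gz")) would count the records in a local copy of the file.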
• List of photo favorite markings
This file contains a list of all of the favorite markings by users from the list above. The data includes the (anonymized) user who marked the photo as a favorite, the (anonymized) photo identifier, and the (anonymized) time of the favorite marking.
Format: Gzipped ASCII. Each line contains information about one photo being marked as a favorite, with fields separated by a tab. The fields are, in order: (anonymized) user who marked the photo as a favorite, (anonymized) photo identifier, and (anonymized) time of the favorite marking.
Data: Flickr All Photo Favorite Markings (346MB)
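Because the anonymized times preserve order, the favorite markings for each photo can be arranged chronologically even though the true timestamps are hidden. The sketch below shows one way to do this; the function names are hypothetical, and the time field is assumed to be an integer sequence number.

```python
from collections import defaultdict


def parse_favorite_line(line):
    """Return (user, photo, time) from one tab-separated line."""
    user, photo, time = line.rstrip("\n").split("\t")
    return user, photo, int(time)


def favorites_by_photo(lines):
    """Group favorite markings by photo, ordered by anonymized time.

    Returns a dict mapping photo identifier -> list of (time, user) pairs.
    """
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():  # skip blank lines, if any
            continue
        user, photo, time = parse_favorite_line(line)
        grouped[photo].append((time, user))
    for events in grouped.values():
        events.sort()  # chronological order within each photo
    return dict(grouped)


# Made-up records, for illustration only.
sample = ["u1\tp1\t5\n", "u2\tp1\t3\n", "u3\tp2\t7\n"]
print(favorites_by_photo(sample))
# prints {'p1': [(3, 'u2'), (5, 'u1')], 'p2': [(7, 'u3')]}
```

To run this on the real data, pass a text-mode handle such as gzip.open("flickr-all-photo-favorite-markings.txt.gz", "rt") as the lines argument. Note that the whole grouping is held in memory.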
Data for analyses in Section 6
• List of photo favorite markings
This file contains a list of favorite markings by users from the list above. The data includes the (anonymized) user who marked the photo as a favorite, the (anonymized) photo identifier, and the (anonymized) time of the favorite marking. Note that this file is a subset of the Flickr All Photo Favorite Markings file above.
Format: Gzipped ASCII. Each line contains information about one photo being marked as a favorite, with fields separated by a tab. The fields are, in order: (anonymized) user who marked the photo as a favorite, (anonymized) photo identifier, and (anonymized) time of the favorite marking.
Data: Flickr Photo Favorite Markings (135MB)
BibTeX
Please use the following BibTeX entry if you would like to cite this dataset.
@inproceedings{social-cascade-www09,
  author = {Meeyoung Cha and Alan Mislove and Krishna P. Gummadi},
  title = {{A Measurement-driven Analysis of Information Propagation in the Flickr Social Network}},
  booktitle = {In Proceedings of the 18th International World Wide Web Conference (WWW'09)},
  month = {April},
  year = {2009},
  address = {Madrid, Spain}
}
© Copyright by Max Planck Institute for Software Systems 2018. All rights reserved.