Online Social Networks Research @ The Max Planck Institute for Software Systems

WWW 2009 Data Set


Data from our WWW 2009 paper is available from the links below. Each data set has been anonymized to protect the privacy of the users. The data sets include information about (a) the evolving link structure of the Flickr social network, (b) the photos in the network, and (c) the favorite markings of photos by users.

Note that we are unable to release any non-anonymized data.


We are aware that properly anonymizing online social network data is very challenging. Clever schemes have been found to break seemingly well-anonymized data sets (e.g., the Netflix data set). For the data we make available, we use a "best effort" anonymization. We do not offer any strong guarantees, and we suspect that our anonymization scheme could be broken by clever comparisons with other real-world data. If you find any problems or fixes, we encourage you to bring them to our notice.


We thank Akshay Bhat for discovering an attack on, and a fix for, our photo timestamp anonymization scheme. All of the data sets below use anonymized time: each timestamp is replaced with a sequence number, and the order of the sequence numbers respects the order of the underlying timestamps.
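The timestamp scheme described above (replace each timestamp with a sequence number that preserves order) can be illustrated with a simple rank mapping. This Python sketch is not the code used to produce the data sets; the function name anonymize_timestamps is hypothetical, and it assumes that equal timestamps receive the same sequence number and that numbering starts at 0, neither of which is stated on this page.

```python
from typing import Dict, Iterable, List


def anonymize_timestamps(timestamps: Iterable[int]) -> List[int]:
    """Replace each timestamp with an order-preserving sequence number.

    Assumptions (not stated by the data set authors): equal timestamps map
    to the same sequence number (dense ranking), and numbering starts at 0.
    """
    ts = list(timestamps)
    # Map each distinct timestamp to its position in sorted order.
    rank: Dict[int, int] = {t: i for i, t in enumerate(sorted(set(ts)))}
    return [rank[t] for t in ts]


if __name__ == "__main__":
    raw = [1234567890, 1200000000, 1234567890, 1300000000]
    print(anonymize_timestamps(raw))  # [1, 0, 1, 2]
```

Because only relative order survives, a mapping like this hides absolute times and the gaps between events, but it still reveals which of two events happened first.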


Note: The anonymized Flickr timestamps originally contained an error that caused them not to respect the order of the true timestamps. Anyone who downloaded the Flickr data files (flickr-all-photos.txt.gz, flickr-all-photo-favorite-markings.txt.gz, and flickr-photo-favorite-markings.txt.gz) between 05/17/2010 and 01/02/2012 should re-download the updated files below. Only the anonymized timestamp field is affected; all other fields are unchanged. We thank Dong Li for finding this error and notifying us.


Complete data set


  1. List of links

    This file contains a list of all of the user-to-user links that we crawled from the Flickr social network. All links are directed. This is the same data file used in our WOSN'08 paper.

    Format:  Gzipped ASCII.  Each line contains two user identifiers and a date, separated by tabs, indicating that a link from the first user to the second was observed on the specified date.

    Data:
    Flickr Links (147MB)

  2. List of photos

    This file contains a list of all of the photos that were marked as a favorite by at least one user from the list above. Other metadata about each photo is also included, such as the time the photo was uploaded, the (anonymized) user who uploaded it, and its number of views, comments, and favorite markings. Note that the data set and the paper's analysis include 11,267,320 photos, although the paper incorrectly reports 11,195,144 photos.

    Format:  Gzipped ASCII.  Each line contains information about one photo, with fields separated by tabs. The fields are, in order: (anonymized) photo identifier, (anonymized) time of photo upload, (anonymized) photo owner, number of views, number of comments, and number of favorite markings.

    Data:
    Flickr All Photos (154MB)

  3. List of photo favorite markings

    This file contains a list of all of the favorite markings by users from the list above. The data includes the (anonymized) user who marked the photo as a favorite, the (anonymized) photo identifier, and the (anonymized) time of the favorite marking.

    Format:  Gzipped ASCII.  Each line contains information about one photo being marked as a favorite, with fields separated by tabs. The fields are, in order: (anonymized) user who marked the photo as a favorite, (anonymized) photo identifier, and (anonymized) time of the favorite marking.

    Data:
    Flickr All Photo Favorite Markings (346MB)
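The three file formats above can be read with a short streaming parser. This Python sketch is not distributed with the data set; it assumes that fields are separated by single tab characters, keeps identifiers, dates, and (anonymized) times as raw strings because their exact encoding is not specified on this page, and parses only the three count columns as integers. The names Link, Photo, Favorite, read_links, read_photos, and read_favorites are hypothetical.

```python
import gzip
from typing import Iterator, List, NamedTuple


class Link(NamedTuple):
    src: str   # (anonymized) user the link goes from
    dst: str   # (anonymized) user the link goes to
    date: str  # date the link was observed; format left unparsed


class Photo(NamedTuple):
    photo_id: str     # (anonymized) photo identifier
    upload_time: str  # (anonymized) upload time, kept as raw text
    owner: str        # (anonymized) photo owner
    views: int
    comments: int
    favorites: int


class Favorite(NamedTuple):
    user: str      # (anonymized) user who marked the photo
    photo_id: str  # (anonymized) photo identifier
    time: str      # (anonymized) time of the marking, kept as raw text


def _rows(path: str, n_fields: int) -> Iterator[List[str]]:
    """Stream tab-separated rows from a gzipped ASCII file, checking field count."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines defensively
            fields = line.split("\t")
            if len(fields) != n_fields:
                raise ValueError(
                    f"{path}:{lineno}: expected {n_fields} fields, got {len(fields)}"
                )
            yield fields


def read_links(path: str) -> Iterator[Link]:
    for src, dst, date in _rows(path, 3):
        yield Link(src, dst, date)


def read_photos(path: str) -> Iterator[Photo]:
    for pid, t, owner, views, comments, favs in _rows(path, 6):
        yield Photo(pid, t, owner, int(views), int(comments), int(favs))


def read_favorites(path: str) -> Iterator[Favorite]:
    for user, pid, t in _rows(path, 3):
        yield Favorite(user, pid, t)
```

For example, sum(1 for _ in read_photos("flickr-all-photos.txt.gz")) counts the photo records; per the description above, this should come to 11,267,320. Generators keep memory use flat, which matters for files of several hundred megabytes.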


Data for analyses in Section 6


  1. List of photo favorite markings

    This file contains a list of favorite markings by users from the list above. The data includes the (anonymized) user who marked the photo as a favorite, the (anonymized) photo identifier, and the (anonymized) time of the favorite marking. Note that this file is a subset of the Flickr All Photo Favorite Markings file above.

    Format:  Gzipped ASCII.  Each line contains information about one photo being marked as a favorite, with fields separated by tabs. The fields are, in order: (anonymized) user who marked the photo as a favorite, (anonymized) photo identifier, and (anonymized) time of the favorite marking.

    Data:
    Flickr Photo Favorite Markings (135MB)
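Because the Section 6 file is described as a subset of the complete favorite-markings file, one way to sanity-check a download is to confirm that every record of the smaller file appears in the larger one. This Python sketch is only an illustration; the names iter_markings and missing_from_full are hypothetical, it assumes the three tab-separated fields described above, and it compares whole records including the anonymized time, so both files should come from the same (updated) release. It holds the smaller file's records in memory, which can take substantial RAM for a file of this size.

```python
import gzip
from typing import Iterator, Set, Tuple

Marking = Tuple[str, str, str]  # (user, photo identifier, anonymized time)


def iter_markings(path: str) -> Iterator[Marking]:
    """Stream (user, photo, time) records from a gzipped tab-separated file."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line:
                user, photo, t = line.split("\t")
                yield (user, photo, t)


def missing_from_full(subset_path: str, full_path: str) -> Set[Marking]:
    """Return records of the subset file that never appear in the full file.

    An empty result is consistent with the subset relationship. Duplicate
    lines are treated as a single record because a set is used.
    """
    remaining = set(iter_markings(subset_path))
    for record in iter_markings(full_path):
        remaining.discard(record)
        if not remaining:
            break  # every subset record has been matched; stop early
    return remaining
```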


BibTeX


Please use the following BibTeX entry if you would like to cite this dataset.

@inproceedings{social-cascade-www09,
  author = {Meeyoung Cha and Alan Mislove and Krishna P. Gummadi},
  title = {{A Measurement-driven Analysis of Information Propagation in the Flickr Social Network}},
  booktitle = {Proceedings of the 18th International World Wide Web Conference (WWW'09)},
  month = {April},
  year = {2009},
  address = {Madrid, Spain}
}

© Copyright by Max Planck Institute for Software Systems 2018. All rights reserved. Imprint and Data Protection. Photo of Liverpool Street Station by David Sim