The Max Planck Institute for Software Systems


Online Social Networks Research

WOSN 2008 and Alan Mislove's PhD Thesis Data Sets

Data from our WOSN 2008 paper and Alan Mislove's PhD thesis is available from the links below.  Each data set has been anonymized to protect the privacy of the users.  The data includes information about the evolving link structure of the networks.

Note that we are unable to release any non-anonymized data.

We are aware that properly anonymizing online social network data is very challenging.  Clever schemes have been found to break seemingly well-anonymized data sets (e.g., the Netflix data set).  For the data we make available, we use a "best effort" anonymization.  We do not offer any strong guarantees, and we suspect that our anonymization scheme can be broken by clever comparisons to other real-world data.  We encourage people to bring any problems and fixes they find to our attention.


  1. List of links

    These files contain a list of all of the user-to-user links which are included in our crawls.  All links are treated as directed, even in the case of undirected networks like Orkut.

    Format:  Gzipped ASCII.  Each line contains two user identifiers and a date, separated by tabs, indicating that a link from the first user to the second was observed on the specified date.

    Flickr Links (147MB)
    YouTube (directed) Links (210MB)
    YouTube (undirected) Links (920MB)
    Internet Links (546KB)
    Wikipedia Links (271MB)
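
The format described above can be read with a few lines of Python. The following is only a minimal sketch, not code from the authors: the names `Link`, `read_links`, and `out_degrees` are hypothetical, and the date field is kept as the raw string from the file because its exact representation is not specified here.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Link(NamedTuple):
    """One directed link; all names here are illustrative assumptions."""
    source: str  # identifier of the user the link points from
    target: str  # identifier of the user the link points to
    date: str    # observation date, kept as the raw string from the file


def read_links(path: str) -> Iterator[Link]:
    """Yield one Link per non-empty line of a gzipped, tab-separated file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines, e.g. a trailing one
            fields = line.split("\t")
            if len(fields) != 3:
                raise ValueError(
                    f"line {line_number}: expected 3 tab-separated fields, "
                    f"got {len(fields)}"
                )
            yield Link(*fields)


def out_degrees(links: Iterable[Link]) -> Counter:
    """Count outgoing links per source user, treating every link as directed."""
    return Counter(link.source for link in links)
```

Because `read_links` is a generator, a call such as `out_degrees(read_links(path))` streams through the file without holding all links in memory, which matters for the larger files listed above.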




Please use the following BibTeX entry if you would like to cite the Flickr data:

  author = {Alan Mislove and Hema Swetha Koppula and Krishna P. Gummadi and Peter Druschel and Bobby Bhattacharjee},
  title = {Growth of the Flickr Social Network}, 
  booktitle = {Proceedings of the 1st ACM SIGCOMM Workshop on Social Networks (WOSN'08)},
  location = {Seattle, WA},
  month = {August},
  year = {2008}

or, the following BibTeX entry to cite the other data sets:

  author = {Alan Mislove},
  title = {Online Social Networks:  Measurement, Analysis, and Applications to Distributed Information Systems},
  school = {Rice University, Department of Computer Science},
  year = {2009},
  month = {May}

© Copyright by Max Planck Institute for Software Systems 2018. All rights reserved.