The Max Planck Institute for Software Systems
Online Social Networks Research @
IMC 2007 Data Sets
Data from our IMC 2007 paper is available from the links below. Each data set has been anonymized to protect the privacy of the users. The data sets include both the link structure and the group memberships of each network.
Note that we are unable to release any non-anonymized data.
We are aware that properly anonymizing online social network data is very challenging. Clever schemes have been found to break seemingly well-anonymized data sets (e.g., the Netflix data set). For the data we make available, we use a "best effort" anonymization. We do not offer any strong guarantees, and we suspect that our anonymization scheme could be broken by clever comparisons with other real-world data. If you find any problems or have fixes, we encourage you to bring them to our attention.
•List of users
These files contain a list of all of the user identifiers which are included in our crawls.
Format: Gzipped ASCII. Each line contains a numeric user identifier.
Data: Flickr Users (3.9MB) LiveJournal Users (11.2MB) Orkut Users (6.5MB) YouTube Users (2.4MB)
•List of links
These files contain a list of all of the user-to-user links which are included in our crawls. All links are treated as directed, even in the case of undirected networks like Orkut.
Format: Gzipped ASCII. Each line contains two user identifiers separated by a tab, implying a link exists from the first to the second.
Data: Flickr Links (76MB) LiveJournal Links (298MB) Orkut Links (857MB) YouTube Links (18MB)
•List of groups
These files contain a list of all of the group identifiers which are included in our crawls.
Format: Gzipped ASCII. Each line contains a numeric group identifier.
Data: Flickr Groups (223KB) LiveJournal Groups (15.9MB) Orkut Groups (18.5MB) YouTube Groups (66KB)
•List of group members
These files contain a list of all of the user group memberships which are included in our crawls.
Format: Gzipped ASCII. Each line contains a user identifier followed by a group identifier (separated by a tab), implying that the user is a member of the group.
Data: Flickr Memberships (27.3MB) LiveJournal Memberships (400MB) Orkut Memberships (1.1GB) YouTube Memberships (1.0MB)
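The four file formats described above can be read with a few lines of Python. The following is a minimal sketch, assuming each file follows its format exactly as described (one integer per line, or two tab-separated integers per line) with no header or comment lines. The function names, including the `reciprocity` helper, are hypothetical illustrations and are not part of the released data. The sketch reads the user and group lists, builds a directed adjacency map from a links file, and maps each group to its members.

```python
import gzip
from collections import defaultdict


def read_ids(path):
    """Read a gzipped file with one numeric identifier per line
    (the users and groups files). Blank lines are skipped."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        return [int(line) for line in f if line.strip()]


def read_pairs(path):
    """Yield (first, second) integer pairs from a gzipped file whose lines
    hold two identifiers separated by a tab (the links and memberships files)."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            first, second = line.split("\t")
            yield int(first), int(second)


def load_adjacency(links_path):
    """Map each user to the set of users it links to.
    Every line is treated as a directed link from the first user to the second."""
    adj = defaultdict(set)
    for src, dst in read_pairs(links_path):
        adj[src].add(dst)
    return adj


def load_groups(memberships_path):
    """Map each group identifier to the set of user identifiers in that group."""
    members = defaultdict(set)
    for user, group in read_pairs(memberships_path):
        members[group].add(user)
    return members


def reciprocity(adj):
    """Hypothetical helper: fraction of directed links whose reverse link
    is also present in the adjacency map."""
    total = sum(len(targets) for targets in adj.values())
    if total == 0:
        return 0.0
    # adj.get() does not insert new keys into the defaultdict.
    mutual = sum(
        1 for u, targets in adj.items() for v in targets if u in adj.get(v, ())
    )
    return mutual / total
```

For example, `load_adjacency("orkut-links.txt.gz")` (a hypothetical local file name) would return the directed adjacency map for Orkut. Because the files list every link as directed, `reciprocity` reports what fraction of links also appear in the reverse direction. Holding the larger files (such as the 857MB Orkut links file) in Python sets may require a great deal of memory; if only aggregate statistics are needed, a single streaming pass over `read_pairs` avoids building the full map.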
BibTeX
Please use the following BibTeX entry if you would like to cite this dataset.
@inproceedings{mislove-2007-socialnetworks,
  author    = {Alan Mislove and Massimiliano Marcon and Krishna P. Gummadi and Peter Druschel and Bobby Bhattacharjee},
  title     = {{Measurement and Analysis of Online Social Networks}},
  booktitle = {Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC'07)},
  address   = {San Diego, CA},
  month     = {October},
  year      = {2007}
}
© Copyright by Max Planck Institute for Software Systems 2018. All rights reserved.