Facebook as Personal Data Store

With over 150 million people using Facebook Connect every month at over 1 million websites, Facebook has ushered in a new era as the world’s largest personal data store.

Personal Data Stores

Personal data stores allow individuals to share online data with service providers. Facebook Connect users can give third-party web sites like Digg, Amazon, and YouTube access to information stored at Facebook, turning Facebook into a personal data store for over 500 million people.

What makes personal data stores special is seamless sharing with websites for real-time personalization of the web. It’s more than file backup or synchronization, and more than publishing “content” to our friends or the public. Personal data stores let us bring our information to websites when we want to. It’s a way to treat the user as the point of integration.

Personal data stores can be anywhere, shared with websites whenever we want. Consider giving FedEx Kinko’s a link to a Flickr account so they can download photos to print a new calendar. Or giving a new doctor permission to access our personal health history rather than filling out a paper form while we sit in the waiting room. Or giving a website access to the Outlook contact list on our desktop computer so it can give us birthday reminders and gift suggestions. The key is user-managed access, wherever the data lives. Facebook Connect gives this kind of access control over all the data we store at Facebook, enabling web-wide personalization built around the individual.
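
User-managed access can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Facebook Connect’s API or any real product. The `PersonalDataStore` class, its data categories, and the `print-shop.example` site name are all hypothetical. The point is that the individual holds the access list, not the website, and a site can read only what it has been granted.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalDataStore:
    """Hypothetical store where the individual decides which site sees which data."""

    data: dict[str, list[str]] = field(default_factory=dict)   # category -> items
    grants: dict[str, set[str]] = field(default_factory=dict)  # site -> granted categories

    def grant(self, site: str, category: str) -> None:
        # The user, not the site, adds entries to the access list.
        self.grants.setdefault(site, set()).add(category)

    def revoke(self, site: str, category: str) -> None:
        # Stops future reads; copies the site already made are outside the store's reach.
        self.grants.get(site, set()).discard(category)

    def read(self, site: str, category: str) -> list[str]:
        # A site may fetch only the categories the user has granted to it.
        if category not in self.grants.get(site, set()):
            raise PermissionError(f"{site} may not read {category!r}")
        return self.data[category]


# Usage: share photos with a print shop, then withdraw access.
pds = PersonalDataStore(data={"photos": ["beach.jpg", "hike.jpg"], "contacts": ["Alice", "Bob"]})
pds.grant("print-shop.example", "photos")
print(pds.read("print-shop.example", "photos"))  # ['beach.jpg', 'hike.jpg']
pds.revoke("print-shop.example", "photos")
# Any further pds.read("print-shop.example", "photos") raises PermissionError.
```

Revoking a grant stops future reads. It cannot recall copies a site has already made, which is why control over downstream use matters as much as control over access.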


In recent years, mash-ups and real-time APIs have made it easier and easier for companies to combine information from different services into a single user experience. Instead of building bigger and more complicated proprietary data silos, companies take advantage of services like Google Maps and IP-address geolocation, using real-time information to enhance their websites.

Some services are even built around other companies’ data: Twitter clients like Seesmic and TweetDeck, which access our Twitter data on our behalf; Trillian, which works with various instant messaging networks; and Mint, which pulls in our financial data from hundreds of websites. The “real-time web” is constructed on the fly, using linked data and real-time APIs to dynamically customize services for each of us.

Personal data stores let us bring our own data to the mash-up party. Not only do we have better control over who sees what, we can provide more timely, higher-quality data than service providers can get from other sources. Effective integration with personal data stores means no more ads for the car we’ve already bought and no more recommendations based on false assumptions. Today, data in the wild constantly becomes outdated, miscopied, and misconstrued, because companies are stuck with the billions of dollars’ worth of proprietary data gathered about us rather than provided by us. Personal data stores let us give the most relevant, most up-to-date information to just those companies we want to do business with. That means not just better data, but more intimate relationships with our favorite companies and organizations.

Perhaps the most liberating aspect of personal data stores is that each of us gets to have as many as we want. We all have our favorite websites for different online activities. As those sites open up their data with a user-driven permissions mechanism, they become personal data stores. So, whether it’s YouTube for videos, Flickr for photos, Foursquare for location updates, TripIt for travel plans, or RunKeeper for exercise data, we get to bring our best data with us wherever we go. Savvy websites pull in this high-quality data to personalize our visits, while those with unique data open it up for use elsewhere to maximize value to their users, which is exactly what Facebook is doing with Facebook Connect.

Facebook Connect

Facebook Connect makes this kind of access simple for everyone, with industry-changing adoption rates. Over 66% of the top 100 websites and over 1 million websites in total now integrate with Facebook in some way. Nearly a third of Facebook users, over 150 million people, use Facebook Connect every month. Every time we do, we give websites access to information stored in our Facebook accounts, such as our name, gender, the names of our friends, and all the posts currently on our wall or posted by us. It’s an archetypal personal data store, with highly credible and timely data in the form of our friend list and our status updates. Sure, Facebook Connect is still far too limited in the amount of information we can store, and we lack control over how that information gets used… but architecturally, Facebook has changed the game for a vast portion of the World Wide Web.

To find out what information Facebook is sharing, I built a website called “I Shared What?!?”, an information sharing simulator for Facebook. The site uses JavaScript and Facebook Connect to display everything it can get from Facebook. Visitors see in specific detail exactly what they share when they click the “Allow” button in the Facebook Connect permissions dialog.

Facebook uses open-standard technology to bring mash-ups to a new level, built on information provided directly by the user, in real time, with minimal fuss or bother. There are shortcomings, of course, a lot of them, but I’ll save those for future posts. For now, think of Facebook as the 800-pound icebreaker of a new way for companies to connect with their customers.

To this veteran VRM evangelist, Facebook has done more in 2010 to usher in the era of the personal data store than anyone, ever. In one fell swoop, Facebook launched a World Wide Web built around the individual instead of websites, introducing the personal data store to 500 million people and over one million websites.

Unexpectedly, Facebook has moved VRM from a conversation about envisioning a future to one about deployed services with real users, being adopted by real companies, today. We still have a lot of work to do to figure out how to make this all work right (legally, financially, and technically), but it’s illuminating and inspiring to see the successes and failures of real, widely deployed services. Seeing what Amazon or Rotten Tomatoes or Pandora do with information from a real personal data store moves the conversation forward in ways no theoretical argument can.

There remain significant privacy issues and far too much proprietary lock-in, but for the first time, we can point to a mainstream service and say “Like that! That’s what we’ve been talking about. But different!”

This entry was posted in Information Sharing, Personal Data Store, ProjectVRM.

12 Responses to Facebook as Personal Data Store

  1. Joe, this is a great way to think about Facebook.

    Think about Facebook as a Social Network and it is very intimidating to compete with.

    Think of Facebook as the first Personal Data Store and the possibilities for improvement are infinite, inspiring confidence in developers to take them on.

    Synergistically, a new personal data store has a clear position with consumers: a superior alternative to Facebook.

    Finally, a new PDS player has the opportunity to capitalize on Facebook Connect to prove the benefit to consumers quickly and build momentum. (But it had better be ready with a way to fill the gap when Facebook leverages this dependency to sabotage its success.)

    We are working on community networks designed to give people a reason to share information, make sense of it, and target its release.

    So we are interested in collaborating.
    Katherine Warman Kern

  2. Joe says:


    The hard part, IMO, is getting different companies to integrate with a new PDS. Sure, they’d love additional data about their users… in fact, they are buying that from BlueKai and Acxiom right now. But with far fewer than 500 million users, it’s hard to get enough vendors to add a start-up PDS to their websites.

    With Facebook as a (flawed) proof-of-concept, we can figure out how to improve on the model and also how to standardize this sort of information sharing so websites don’t need a custom interface for every new PDS.

    As for positioning, I think every PDS is going to have to win in its niche before it can establish itself as a data store for use by others. Foursquare is a great example. They’ve nailed their niche and anchored themselves in relationships with their users. Now, they are in a position to open that data to others and become a personal data store. But if they hadn’t nailed their niche… well, there wouldn’t be enough users to care about sharing and not enough websites to bother implementing the interface for it…

  3. Salman FF says:

    Great insights.
    What is the PDS niche? (I.e., the PDS concept is very general, so is there a more specific niche or niche application you are thinking of?)

    • Joe says:

      That’s the billion dollar question. I think the PDS is as broadly applicable as the world wide web. Yet, we also need some killer apps for it.

      I think the deepest, active niche is personal health care records. But that’s also a hugely complicated problem on its own, so I wouldn’t consider it low hanging fruit.

      The first app we focused on in the VRM Standards Committee was change of address, which Mydex is currently leveraging in its community prototype. They are working with the electoral roll data as an authoritative source for address information, giving individuals the ability to re-use that verified information at other service providers.

      FWIW, I’m focused on search as the killer app (at http://switchbook.com). Our searches might start at our favorite search engine, but they travel all over the web with queries entered at shopping sites, Wikipedia, and other “search providers”. Keeping track of this pan-Internet search activity takes more than just a website; it takes a PDS integrated in the browser. At least that’s our take.

      But all of these “niches” face serious adoption issues. The great thing about Facebook is that when you activate 500 million users, a lot of websites pay attention. It’s much harder for an upstart PDS to convince third-party websites that its data is worth the effort to incorporate.

  4. Joe, you led a session about the differences between Facebook and the ideal model of a PDS. I presume that there remain a few critical gaps between the two. One that I’m wondering about lately is the consequence of Facebook’s ability to revise its terms of service and privacy policies arbitrarily at any moment.

    • Joe says:


      The biggest gaps are:
      1. the relationship between the individual and Facebook
      2. the lack of control over the data
      3. the lack of an evolving permissioning capability

      First, Facebook’s privacy controls don’t actually do ANYTHING to protect our data from Facebook. No matter how brilliantly Facebook lets us control what information we give to other providers, *everything* at Facebook is free for Facebook to do whatever it wants. A more enlightened approach would allow people to use the Facebook infrastructure with clear opt-in for those “added-value” features that require Facebook to view or analyze your data, such as targeted advertising.

      Second, there is no control over the use of our data at other websites. We do get to control access, at somewhat poor granularity; we can choose to allow access or not. But we don’t have any control over what that third-party site can do with that data. Facebook does, but we don’t. That means that while Facebook is free to limit the use of that data to protect its proprietary position as the source, we can’t place further restrictions on it, nor can we override some of Facebook’s limitations. In an ideal PDS, we would be able to control, or at least express our terms of use for, any data we might provide to anyone.

      Third, the Facebook permission model is monolithic. It’s expected to be a one-time, one-app approval process. Yet we know that trust is incremental and that as relationships develop, our trust grows. We don’t want to give our credit card number just to enter a website or a physical store, but we are happy to do so when we are actually ready to make a purchase. What we need is a permissioning architecture that understands that permissions need to evolve over time. Unfortunately, that is REALLY hard to do with Facebook, even if websites were willing to build it in.
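
The incremental model described here can be sketched as a small permission ledger. This is a hypothetical illustration, not Facebook’s permission system. `EvolvingConsent`, `require`, and the `user_prompt` callback (standing in for a consent dialog) are assumed names. A site asks for each scope only when a feature needs it, so the request for payment details arrives at checkout rather than at sign-in.

```python
from collections.abc import Callable
from datetime import datetime, timezone


class EvolvingConsent:
    """Hypothetical permission ledger for one user/site relationship.

    Scopes are requested one at a time, when a feature actually needs them,
    instead of all at once in a single up-front dialog.
    """

    def __init__(self, ask_user: Callable[[str, str], bool]) -> None:
        self.ask_user = ask_user                # consent prompt: (scope, reason) -> approved?
        self.granted: dict[str, datetime] = {}  # scope -> when the user granted it

    def require(self, scope: str, reason: str) -> bool:
        """Return True if `scope` is available, prompting only if not yet granted."""
        if scope in self.granted:
            return True
        if self.ask_user(scope, reason):
            self.granted[scope] = datetime.now(timezone.utc)
            return True
        return False  # declined for now; the site may ask again later

    def withdraw(self, scope: str) -> None:
        # The user can take a permission back at any point in the relationship.
        self.granted.pop(scope, None)


# Usage: a simulated user who approves every request, so we can see *when* each is asked.
asked: list[tuple[str, str]] = []


def user_prompt(scope: str, reason: str) -> bool:
    asked.append((scope, reason))
    return True


consent = EvolvingConsent(user_prompt)
consent.require("name", "sign in")           # prompts once
consent.require("name", "show greeting")     # already granted, no prompt
consent.require("payment_card", "checkout")  # prompts only when buying
print(asked)  # [('name', 'sign in'), ('payment_card', 'checkout')]
```

Each grant is recorded with a timestamp and can be withdrawn. This lets the relationship grow (or shrink) over time instead of being fixed by a single up-front approval.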

      http://isharedwhat.com is a case in point. Although it allows you to moderate the permissions used in the simulator, it was a nightmare to get that to work. In fact, it still has a failure mode that requires an exception-handling routine that jams recovery data into a cookie and reloads the page. It’s a brute-force hack to get around Facebook’s built-in limitations. Facebook simply isn’t plugged in to the reality of how messy and dynamic relationships can get… and, as a result, to the need for an evolving permissions framework.

      Those are three big gaps. There are a lot more, especially around Facebook’s ability to change the terms of use with impunity. Frankly, I think that kind of behavior is beyond the pale of acceptable business practice and should be illegal. Unfortunately, there just isn’t enough traction in the courts or Congress to actually hold Facebook to any given TOS. Even the few lawsuits that get through are hardly a dent in the billions of dollars in profit they are grabbing while they can get away with it. It’s more than a little frustrating, but I don’t see a way out of this situation without a serious user revolt.

      Oh… one last gap: as a general-purpose PDS, Facebook lacks any real ability to load user or third-party data, which is a shame. If Facebook were truly a PDS platform, a group of third-party sites /could/ agree on a format and start using Facebook as a user-driven central store for coordinating, for example, media ratings. But since Facebook’s “like” architecture is all that’s available, Netflix, Hulu, Amazon, etc., can’t really leverage the data stored at Facebook (they all use a 5-star rating scale).

  5. Thank you for writing up those three gaps so clearly and carefully, Joe!

    BTW, I think a fourth gap is that Facebook, while today allowing users to export some of their own profile data, could in the future (e.g., if a serious competitor arose) decide to revoke that right.

    • Joe says:

      Absolutely. That fits right in with the limited control over the relationship with Facebook. We’re digital serfs in the domain of the great Prince Facebook: digital feudalism, as Marc Davis puts it. Whatever the Prince says goes.

  6. @NZN says:

    That’s an optimistic perspective. Yes, we have a backdrop in Facebook. But it is still an extension of the old centralized command-and-control model. How is that good news?

    The data construct that is “me” has not been altered. If I dare participate as “Me”, I am a data-slave on a unique work-release program where I get to go to other data-plantations to be harvested. Most of the data-slaves I know don’t even understand how that is happening.

    Sounds liberating. If making the corporate equation more powerful is our holy grail, then we are succeeding.

    If our goal is to make individuals more empowered, and inherently connected to a new kind of exchange ecosystem, then we have not even begun to make a dent.

    VRM is a systemic response to the Napster problem… that being that only half of every exchange is actually represented as an accountable entity.

    The legal framework of VRM changes the world, because the first relationship it restructures is your participation at birth in civil society. The marketplace, in every context, happens next.

    I don’t think there is any way to get there except by starting at the beginning.

    Until you own yourself, you are owned.

    • Joe says:


      The shift that Facebook illustrates (but doesn’t embody) is that websites don’t need to own and control our data. It is paradoxical that Facebook itself is the worst offender in that vein, but they *have* convinced millions of websites to rely on user-permissioned data to drive their services.

      Now we need to move beyond Facebook as the centralized entity and transition those million-plus websites to incorporate user-permissioned data from real personal data stores.

      The good news is that now they have exposure to the value of that information. The bad news is that Facebook is still the 800 lb gorilla guarding a big chunk of our data.

  7. GuruM says:

    Came to your blog after seeing your post on switchbook in 2008 on copy-pasting/importing rich text directly from urls into tiddlywiki: http://groups.google.com/group/tiddlywiki/browse_thread/thread/a419a95a02bbc03a/b7456eb8ae7b0275?lnk=gst&q=how+do+I+move+content+from+blog+to+wiki#b7456eb8ae7b0275.
    Are you by any chance still working on this functionality?
