VRM and Personal Data Stores

In my previous post on VRM’s Personal Data Stores, I discussed how we can decentralize information services by focusing on the user as the point of integration. Not only would that give the user direct control over their personal data–to the cheers of privacy advocates everywhere–it would provide a more robust, reliable, and scalable approach for important VRM use cases, including personal health care data, media consumption histories (and licenses), personal RFPs, and more.

Three replies sum up the curious or critical responses I’ve had:

Matthias Gutfeldt: “But how do we do it?”
William Hayes: “Any idea when someone will step forward to provide the user information mgmt service?”
Dave Weinberger: “How does VRM (or Joe’s vision of it) differ from federated identity schemes in which the user has control over her personal info?”

I’ll generalize these into two focused questions:

What is different with VRM and Personal Data Stores?
What will it take to implement them?

VRM’s Personal Data Stores (PDSs) are a new inflection of the familiar paradigm shift of decentralization

The answer to the first question lies in the recent advances in user-centric identity and the upcoming access-rights infrastructure built into XRI and XDI.

Limited versions of user-centric data stores have been around for decades. The PC revolution followed the same paradigm shift: put the applications and data on the user side instead of a central mainframe. The Internet itself echoes that user-centric view of the world, especially when you consider businesses as users and online services as vendors. Any architecture that moves data control from a centralized vendor to the decentralized user resonates with the user-centrism of Personal Data Stores.

What decentralized systems don’t necessarily have–and what PDSs add–is structured third-party read access to that data store.

Internet Email as Personal Data Store

POP, IMAP, SMTP–essentially all store-and-forward email architectures–allow independent third parties to input data into a simple user data store. That’s the point. An Internet-based email service that can’t accept email from anyone on the net isn’t really Internet email. However, with email, there isn’t a way for outside third parties, such as vendors, to read that data store. The privacy reasons behind this are self-evident. Most people don’t want neighbors or “vendors” reading their email.

Blogs as Personal Data Store

Blogs, on the other hand, offer both input and output of personal data and move a little bit further along the spectrum towards a Personal Data Store. Blogs are primarily output mechanisms; users write posts and those posts are published to the world. Comments provide an input mechanism from the “cloud” of arbitrary Internet users, giving blogs a limited input and output capability for what is essentially a publicly accessible personal data store.

The access rights management on blogs, however, leaves much to be desired–and is far from enabling many core VRM scenarios.

Access Limitations

Most blogs are simply available to the public–or occasionally to a limited “internal” audience by restricting access to the web page. A VRM data store should have extremely fine-grained access privileges, including by “identity class” so that, for example, all legitimate travel agencies could access a personal RFP for travel, or certified medical doctors who have registered an emergency medical situation warrant could access a personal medical history. These sorts of restricted rights mechanisms require not only the emerging user-centric identity technology but also an institutional infrastructure capable of reliably authenticating “travel agencies” and “certified medical doctors” who have “registered a warrant”. Ultimately, a Personal Data Store must not only store the requisite data, it must provide secure and effective access to the right vendors and individuals, and refuse access to all others.
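To make “identity class” access a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names Credential, Record, and can_read, the class label "travel_agency", and the issuer domains are all hypothetical, and a real system would verify signed credentials rather than compare strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """A claim about the requester, vouched for by a third-party issuer."""
    identity_class: str  # hypothetical label, e.g. "travel_agency"
    issuer: str          # hypothetical credentialing body


@dataclass
class Record:
    """One item in the store, readable only by the listed identity classes."""
    name: str
    body: str
    readable_by: set[str] = field(default_factory=set)


def can_read(record: Record, credentials: list[Credential],
             trusted_issuers: set[str]) -> bool:
    """Grant access only if a trusted issuer vouches for a permitted class."""
    return any(
        c.issuer in trusted_issuers and c.identity_class in record.readable_by
        for c in credentials
    )


trusted = {"travel-accreditation.example"}
rfp = Record("travel-rfp", "Two tickets to Lisbon in May", {"travel_agency"})

agent = [Credential("travel_agency", "travel-accreditation.example")]
impostor = [Credential("travel_agency", "self-signed.example")]

print(can_read(rfp, agent, trusted))     # accredited travel agency
print(can_read(rfp, impostor, trusted))  # right class, untrusted issuer
print(can_read(rfp, [], trusted))        # anonymous requester
```

The point of the sketch is that the store never needs a list of individual travel agencies. It only needs to know which credentialing bodies it trusts and which classes each record is open to.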

Input Limitations

Second, the ability to “input” into a blog data store exists in the form of comments, but it is limited. Sometimes this privilege is restricted by identity (using TypeKey, OpenID, or an InfoCard for example), but not always, and access is usually restricted in a simple way: for example, any InfoCard user can post a comment on Kim Cameron’s blog on any post. This is a good start towards identity-based access rights management, but most sites have minimal distinctions between different classes of users and different data sets. Of course, blogs are for blogging, so they don’t have a need for sophisticated access functionality. However, when you treat markets as conversations, VRM needs to enable conversations between users and vendors. That implies that users can, for example:

Input data to their Data Store
- RFPs (requests for proposals)
- Customer Interaction Data
  - service calls
  - RMA requests
  - bug reports
  - reviews
- Personal Health Updates
  - Symptom Reports
  - Doctor Visits
  - Medication Log
  - Exercise Log
Amend/Revise Data
Reply to Vendors (securely, privately)
Manage data access
- publish subsets of the data to specific vendors
- publish to sets of vendors

It also implies that Vendors can

Access subsets of the data
Add new data to the Data Store
- Prescriptions
- Proposals in response to RFPs
Update subsets of data (securely and with reciprocal privacy)
- RMAs
- Customer Service History
- Revised/Updated Proposals

This also implies that data stored by the vendor should be protected from edits by other vendors and even users (although outright deletion must remain an option). It would be a mess if users could edit responses to RFPs or prescriptions directly. Rather, the integrity of the system requires a mechanism to assure that the data is what the original author intended it to be.
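One way to picture that rule is a store where every entry remembers its author. The sketch below is a toy under stated assumptions: the PersonalDataStore class and its method names are hypothetical, identities are plain strings, and deletion is reserved for the owner of the store. A real system would also need signatures or some similar mechanism to prove authorship, which is left out here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    author: str  # identity that wrote the entry (the user or a vendor)
    body: str


class PersonalDataStore:
    """Toy store: only an entry's author may edit it; the owner may delete it."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.entries: dict[str, Entry] = {}

    def add(self, key: str, author: str, body: str) -> None:
        # Refuse to overwrite, so nobody can replace another author's entry.
        if key in self.entries:
            raise ValueError(f"{key!r} already exists")
        self.entries[key] = Entry(author, body)

    def edit(self, key: str, actor: str, new_body: str) -> None:
        entry = self.entries[key]
        if actor != entry.author:
            raise PermissionError(f"{actor} is not the author of {key!r}")
        entry.body = new_body

    def delete(self, key: str, actor: str) -> None:
        # Assumption: deletion is reserved for the owner of the store.
        if actor != self.owner:
            raise PermissionError(f"{actor} does not own this store")
        del self.entries[key]


store = PersonalDataStore(owner="joe")
store.add("proposal-1", author="acme-travel", body="Lisbon package, 1,200 USD")

try:
    store.edit("proposal-1", actor="joe", new_body="Lisbon package, 1 USD")
except PermissionError as err:
    print("rejected:", err)

store.edit("proposal-1", actor="acme-travel", new_body="Lisbon package, 1,100 USD")
store.delete("proposal-1", actor="joe")
print(store.entries)
```

Here the user cannot rewrite a vendor’s proposal, the vendor can revise its own, and the user can still remove the entry outright.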

This level of access functionality is essentially non-existent in blogs. Other online markets provide some elements of these features, but none that I know of place complete control in the hands of the user, enabling any vendor (approved by the user) to participate. eBay is a good start; it definitely democratizes the vendor/buyer marketplace. However, you don’t control the types of buyers/vendors who can access your listings and you still need to use eBay to culminate the transaction. It will probably be a challenge for eBay to find ways to make money while putting the transaction context in the Personal Data Store.

However, it is worth noting that CompuServe and AOL faced the same conundrum with the Internet, struggling to learn how to profitably transition from a closed-loop system to an Internet that let anyone access and publish anything. Ultimately separate business models worked quite well with access providers (Earthlink, AT&T, etc.) and service providers (Google, Amazon, etc.). Meanwhile, AOL is still struggling to define its business.

VRM and Personal Data Stores will likely create a similar segmentation of silo & service businesses into specialized vendors of Personal Data Store Services and those focused services that leverage PDSs to deliver new value.

Blogs + ID access = Personal Data Store?

Blogs with Identity-based access privileges start looking like a workable Personal Data Store for some VRM use cases. Consider posting a Personal RFP (request for proposals) to your blog, with appropriate tags (travel, ready-to-buy, hotel, airfare, car), pinging a pingmarket service like Technorati, and receiving offers via comments to that post. If access to the post–and the ability to reply–were seamlessly moderated by a credentialing service (so only authentic “travel agents” could respond), then we start to have a system that could work. Vendors who subscribe to RSS feeds from the pingmarket see sales opportunities right on their desktop, not unlike Shopatron’s manufacturer-to-retail online distribution service.

This architecture highlights two additional requirements: first, how can we trust the claims of the user? Second, how can we (automatically) understand the requests (and claims) of the user?

Validating User Claims

VRM relies upon users making claims of various types:

intention and interest
- In the market for a new car
- Buying a plane ticket
- Looking for a home
affiliations
- AAA Member
- Retired military
- US Citizen
- Member of the California Bar
- credit card #
- employment
facts
- Address
- Age
- Gender
- Income
certificates
- Licensed to drive by California DMV
- Insured to drive by BBB
- Security clearance by US Federal Government
- Credit rating by Equifax
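
The claim types above could be modeled as simple records that note who, if anyone, vouches for them. This is a rough sketch only: the Claim fields, the verified helper, and the claim names are hypothetical, and the check stands in for whatever cryptographic validation a real identity provider would perform.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    kind: str   # "intention", "affiliation", "fact", or "certificate"
    name: str   # hypothetical claim name
    value: str
    verified_by: Optional[str] = None  # None means the user simply asserted it


def verified(claims: list[Claim], name: str,
             trusted: set[str]) -> Optional[Claim]:
    """Return the first claim called `name` vouched for by a trusted party."""
    for claim in claims:
        if claim.name == name and claim.verified_by in trusted:
            return claim
    return None


claims = [
    Claim("intention", "in_market_for", "new car"),
    Claim("affiliation", "aaa_member", "yes"),
    Claim("certificate", "licensed_to_drive", "California",
          verified_by="California DMV"),
]

dealer_trusts = {"California DMV", "Equifax"}

print(verified(claims, "licensed_to_drive", dealer_trusts))  # vouched for
print(verified(claims, "aaa_member", dealer_trusts))         # self-asserted only
```

A vendor can then decide for itself which claims it accepts on the user’s word (an intention to buy, say) and which it only accepts when a party it trusts has validated them.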

Many vendors avoid wasting time with unqualified leads, including competitors and window shoppers, as well as individuals who can’t legally purchase the product because they are underage or excluded due to export control laws. In addition, the reputational history of the user enables Vendors to focus resources on the most promising buyers. Buyers with no history or with negative history don’t deserve the same VIP treatment that proven, reliable buyers deserve. We see this on eBay, but have no clear way to leverage our eBay reputation with other vendors. A VRM system would allow reputations of this nature to arise explicitly from multiple reputation vendors, and incorporate our transaction history across multiple marketplaces.

It will be a while before our institutions implement these kinds of authentication services, but it is already happening with the earliest “adopters” (apologies to Geoffrey Moore; in his terminology, these guys are all “hobbyists” even when multinational corporations). Sun Microsystems, for example, now validates employee claims so that third party Vendors can rely on that validation for providing services. With Microsoft’s CardSpace technology built into .Net and Vista and soon Active Directory… plus OpenID and Higgins open source solutions, the Internet identity infrastructure is somewhere akin to where the World Wide Web was in 1993 or 1994, which is to say, about to seriously explode into corporate and mainstream consciousness.

So, to answer the first question, what is different with VRM’s PDs is incredibly fine-grained control over both the data and who accesses it. Today’s federation systems actually move in the opposite direction by allowing a wider and wider system of vendors to access personal data with absolutely no control by the user. The result is, frankly, a culture where people hesitate to provide full or accurate information because of fears of what vendors will do with it.

Standardized VRM Data Types and Protocols

For VRM to scale beyond human-moderated interactions–the kind enabled by Shopatron, where retailers personally check the Shopatron website to select orders to “bid” on–we need a solution for automated understanding of user data. No, I don’t mean some mammoth Artificial Intelligence, natural-language-processing, all-knowing, all-dancing automated salesman. What we need is a standardized way for people to make claims such that Vendors can understand them. This means a cross-Vendor open standard for structuring VRM data, including claims, RFPs, personal health records, etc. In some ways, this parallels the work being done by the many folks building the semantic web. If the original data is presented in a structured, commonly understandable format, then programs can have a reasonable expectation of “understanding” it in a useful way.

Currently, Vendors typically have their own internal data structures and formats. This makes it hard to move data from one system to another. Yet, that is exactly the power of the Personal Data Store, serving as the point of integration between multiple vendors, no matter who is sourcing the original data. So, if Amazon, BlockBuster, and NetFlix all want read/write access to my PD to better understand my media consumption history–and provide better recommendations based on that understanding–then all three need to be able to store the data in a mutually understandable way.

This is a huge problem. We are essentially talking about reversing the damage done at the Tower of Babel, of integrating a formal representation of all possible data.

HUGE.

PROBLEM.

Except if we look at it a bit differently. Taken at 30,000 feet, VRM’s PDs seem to offer a secure, universally accessible and universally understandable read-write data store. That sounds great. It also sounds like an insurmountable problem. However, by breaking the data types down into cohesive use cases–at 1,000 feet–we can start to package the PDs in a way that is implementable, scales with use, and provides high-quality understandable data to individuals and Vendors.

First, think of the PD as a fungible store of any kind of data. Build smart read/write access to that data using user-centric identity systems with third-party credentialing for “identity class”-based usage. VRM is about vendor-customer relationship data, but once the infrastructure is in place, truly any structured data makes sense. (Unstructured data just acts like another ftp or web repository.)

Second, take real-world integration problems and solve them with relatively small, focused data formats and get Vendors to support those formats. For example, a standard media-history record that any vendor can read and write into our PD. Or a standardized RFP format, potentially with an extensible RFP type so that custom, structured information can be embedded in Airfare RFPs, retail goods RFPs, or service RFPs. By tackling real-world problems and working with a handful of real-world vendors, shared data formats that provide immediate value can be developed in a realistic timeframe. By solving each of these in an open-standard, open community fashion, a library of VRM data formats will start to emerge hand-in-hand with the VRM protocols that manage the creation, distribution, and consumption of that data.

You might think of it like MIME for vendor-consumer interactions. MIME stands for Multipurpose Internet Mail Extensions. It was designed to allow email attachments of files like Word documents and images. It also allows webservers to specify the type of a file being downloaded by the browser. In both of these cases, the underlying access protocols of SMTP/POP and HTTP don’t need to know anything about what is in the MIME attachment. Instead, applications use the MIME type to do the right thing once the data arrives. In the same way, a PD should provide an identity-based fungible data store where rich data formats of different types can be intelligently stored, accessed, and managed.
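
The MIME analogy can be sketched in a few lines of Python. This is an illustration under loud assumptions: the type labels "vrm/media-history" and "vrm/rfp" are invented here and are not registered media types, and the envelope layout and handler names are hypothetical. The idea it shows is only that the store passes a typed envelope along untouched, while applications pick a handler by type.

```python
from __future__ import annotations

import json
from typing import Callable

# Registry of handlers keyed by a type label (labels are made up for this sketch).
HANDLERS: dict[str, Callable[[dict], str]] = {}


def handles(type_name: str):
    """Decorator that registers a handler for one data type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[type_name] = fn
        return fn
    return register


@handles("vrm/media-history")
def summarize_media(payload: dict) -> str:
    return f"{payload['title']} ({payload['medium']})"


@handles("vrm/rfp")
def summarize_rfp(payload: dict) -> str:
    return f"RFP for {payload['category']}, budget {payload['budget']}"


def open_envelope(envelope: str) -> str:
    """Dispatch on the declared type; the transport never inspects the payload."""
    record = json.loads(envelope)
    handler = HANDLERS.get(record["type"])
    if handler is None:
        return f"unknown type {record['type']}: kept as opaque data"
    return handler(record["payload"])


rfp = json.dumps({"type": "vrm/rfp",
                  "payload": {"category": "airfare", "budget": "600 USD"}})
other = json.dumps({"type": "vrm/exercise-log", "payload": {"km": 5}})

print(open_envelope(rfp))
print(open_envelope(other))
```

Supporting a new data format then means registering one more handler, without touching the store or the access protocol.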

The result is a system that scales by adding new open standard data-types to the open data store, just like email and the World Wide Web scaled to support images, Flash, audio, and movies.

Access Rights and Responsibilities

Now that we have a technological infrastructure–and conceptually an institutional infrastructure for validating Identities and claims–we are still missing perhaps the most critical piece of the puzzle: the legal infrastructure.

In today’s internetworked world, there is astonishingly little control over data other than denying access. If a Vendor knows who I am because they bought my name and information from some mailing list company, they can–and do–bombard me with junk mail. They share it with other divisions or sell it to third parties. Some Vendors do have reasonable privacy policies, and I would be remiss not to give a tip of the hat to eTrust, which has done much to advocate in this area.

However, with the Personal Data Store we are talking about a massive restructuring of the scale and type of information that will be made available to vendors, and making it available at incredibly low marginal costs. Not only will Vendors need a viable system for appropriate use of that data, users will need to be assured that the data they put in their PDs is protected in a rigorous way, minimizing user exposure to spam, unwanted solicitations, fraud, stalking, and identity theft. Having personal data–such as your address–in your PD must feel and be at least as secure as entering that same information at the culmination of an online purchase.

Interfaces and Phishing

There are two parts to this problem. The first is the user interface for how a user securely manages their private or semi-private personal data. Largely this has to do with minimizing phishing attacks while assuring the user can feel comfortable with the correct vendors. Kim Cameron often discusses this topic and it remains one of the biggest security risks for user-centric Identity systems. However, VRM and the PD don’t address this problem, nor do I see them ever doing so. As with the rest of the user-centric Identity movement, VRM will build upon the work of others.

Access Rights Management

The second problem is controlling what happens to the data after it leaves the PD. Or, to put it another way, providing restricted use licenses to Vendors who access your data.

I’ve consistently attacked the language of the AttentionTrust and others when they discuss users’ rights in regard to our “attention data”. Many people assert that we, as individuals, own the Attention Data sprinkled around the Internet as we “spend” our attention at various places. I have yet to receive a satisfactory answer to my queries about what it means to “own” that attention data, as it seems ludicrous to me to assert ownership over things like website access logs at YouTube or our transaction history at Amazon. Clearly, the Vendors own that data at least as much as we do.

However, when we store data in our Personal Data Store, we do own it. What the AttentionTrust and APML get right is that by collating our Attention Data in a data store on our computers–or on computers under our control–we are creating a data resource that we do in fact own and control. It doesn’t make sense to then give up that control just to get a better ad from Amazon, does it?

It might. If Amazon were to legally commit to using that data only for presenting that ad. In general, it isn’t usually the immediate use of personal data that we find annoying. What annoys us is the indiscriminate use, propagation, or application of that data out of context and for unexpected uses. I don’t mind telling the bartender what beer I’d like–otherwise she’d have a hard time serving me–but it would be annoying if that choice were broadcast on a loudspeaker (“Joe Andrieu orders a Guinness”) and posted to the bar’s blog the next morning. Rules of etiquette reinforce these sorts of expectations in real-world society. We call it “discretion”. But until there is something formally restricting Vendors who access a Personal Data Store, we can expect them to use all information as widely and as creatively as they can profitably do so. The consequence is that many users will refuse to expose authentic data, undermining the whole system.

At the same time, we can’t expect every vendor to read, evaluate, and agree to a custom twenty-page licensing agreement for each Personal Data Store they want to access. Instead, what we need is a handful of simple, standard access rights contracts or terms that can legally bind Vendors who access our PDs. Fortunately, XRI and XDI have this sort of access rights architecture built in. However, the actual rights contracts which would use those access protocols remain to be written.

Here are a few rights that users might want to be able to secure for their data, as well as some privileges they could provide to vendors:

Reciprocity–That vendors who access a particular type of data also agree to reciprocally provide updates to that data. For example, I might let Amazon access my media history records if they agree to update it with my past and future media purchases at Amazon.
Non-propagation–No further distribution of the data beyond the specific services authorized. No reselling to third-parties. No re-use by other divisions.
Non-persistence–No retention of the data beyond the session of the current transaction. For example, an emergency room physician can access my personal medical history while I’m under his or her care, but he or she can’t store that data on any internal systems.
Anonymous Persistence–Data can be retained, but only if it is suitably anonymized and disassociated from the individual user.
Editable Persistence–Data may be retained by the vendor, but it must be editable and deletable by the user.
Anonymized Analytic Rights–Vendor has the right to query the PD at a later point for business or operational analysis, as long as that analysis ensures anonymity after the fact.*

*One of the main reasons companies retain detailed customer data is to analyze it for business improvement. Perhaps the product is doing particularly well in certain areas or particular demographics. Perhaps certain customers are having a particularly hard time with certain product features. I expect that many of the largest Vendors will be unable to support non-persistence or anonymous persistence unless they are allowed some way to incorporate rich user data during analysis time. One benefit of the PDs as a source for Vendor analysis is that if performed with non-propagation and non-persistence, it can provide secure, private access to a much broader source of customer data than customers are willing to give and Vendors are able to capture. By shifting this data mining from Vendor silos to Personal Data Stores, not only would Vendors get richer, more timely, and more accurate information for their data mining, individuals would gain explicit control over the use of personal data which is currently entirely under Vendor control in their private silos.
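
As a rough illustration of how a handful of standard terms might be checked by software, the rights listed above can be treated as flags that a user requires and a vendor accepts. This sketch is not drawn from XRI or XDI; the Term enum, the DataPolicy record, and may_access are hypothetical names, and accepting a flag here stands in for agreeing to a real, legally binding contract.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Flag, auto


class Term(Flag):
    """The standard terms from the list above, as combinable flags."""
    RECIPROCITY = auto()
    NON_PROPAGATION = auto()
    NON_PERSISTENCE = auto()
    ANONYMOUS_PERSISTENCE = auto()
    EDITABLE_PERSISTENCE = auto()
    ANONYMIZED_ANALYTICS = auto()


@dataclass(frozen=True)
class DataPolicy:
    data_type: str  # hypothetical label for a kind of data in the store
    required: Term  # terms the user demands before releasing this data


def may_access(policy: DataPolicy, accepted: Term) -> bool:
    """Allow access only if the vendor accepted every required term."""
    return (policy.required & accepted) == policy.required


media_policy = DataPolicy(
    "media-history", Term.RECIPROCITY | Term.NON_PROPAGATION
)

full_agreement = Term.RECIPROCITY | Term.NON_PROPAGATION | Term.ANONYMIZED_ANALYTICS
partial_agreement = Term.RECIPROCITY

print(may_access(media_policy, full_agreement))     # all required terms accepted
print(may_access(media_policy, partial_agreement))  # non-propagation missing
```

Because the terms are few and standard, a vendor can agree to them once and have that agreement checked automatically for every store it touches, instead of negotiating store by store.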

We are already seeing quite a bit of activity by the large search Vendors to modify their own data retention policies to be more user friendly:

http://www.mercurynews.com/business/ci_6449050?nclick_check=1

What makes VRM and Personal Data Stores different?

In summary, VRM is applying user-centrism to vendor/customer relationships in a way not possible (or not worth the effort) before user-centric identity platforms emerged:

Fine grained user control
- the data
- who can access the data (and how)
- access rights and responsibilities for those who do access the data
Third-party validation of claims
Standardized meta-data schema for sharing data with multiple vendors

What will it take to implement VRM and Personal Data Stores?

So, knowing what is different with VRM makes it pretty clear what we’ll need to achieve it:

A service and/or software infrastructure capable of offering users control over fungible, identity-accessed data stores. On the open source software side, look to OpenID, the Higgins Project, and Sun Microsystems (amid many others). I’m not yet aware of anyone offering PDs “hosting” services, but I believe we should see PD-capable products within a year or two.
Institutional infrastructure of organizations providing claim verification services using the user-centric Identity Provider (IDP) architecture. The Department of Motor Vehicles, AAA, Equifax, eBay, National Association of Realtors. All are capable of providing added value for their licensed drivers, members, consumers, users, and Realtors, respectively, by being an IDP and enabling Relying Parties (RPs) to provide new services or capabilities based on validated user claims.
Legally binding and tractable access rights agreements. Before users are comfortable sharing personal data automagically, we’ll need to wrap clear and enforceable rights agreements around our data stores.
Open standards and protocols for the exchange of use-case specific data. Within communities of interest, we need to forge common schemas so Vendors can easily work with data provided by others. There isn’t really a general solution to this problem; however, with focused “vertical” approaches, we can define how we share specific personal information across the Internet.

Closing

This is a long post. If you’ve made it this far, thanks for staying with me. =) VRM is an incredibly rich vein for innovation and I hope I have succeeded in exploring that a bit with you. As a focal point for new services, software, and initiatives, VRM provides both a clear moral framework–it’s about empowering the user in their engagement with vendors–and clear bounds on our scope: reinvent vendor-customer relationships.

There is a lot of fertile ground for VRM… and a lot of ways that the same underlying technology can be used and applied to use cases outside the vendor-relationship context. Those applications will emerge and be delightful surprises as Doc Searls and the VRM crew rally and clarify the vision we see for a VRM future–and create the technology to bring VRM to market.

Hopefully this article has given some depth and concreteness to a few VRM concepts that have been in recent discussions, especially the Personal Data Store. The entire movement is still early in its gestation. No doubt, the ideas presented here will evolve to become something different from what we have started with, and I encourage and invite you to join us in evolving VRM. This is an open community effort and we welcome your input.