


R-cards “ah-hah!” at IIW

At last month’s Internet Identity Workshop and the subsequent DataSharing Summit, Markus S and Drummond Reed unpacked several ideas about r-cards, which, to a certain extent, are an evolution of the Information Card at the heart of CardSpace.

Going into IIW, I understood r-cards simply as a hybrid of InfoCard’s managed and personal card models. Managed cards are issued by another party–all the data associated or transmitted with that card is controlled by that managing party, while personal cards are self-asserted, allowing individuals to serve as their own card provider, controlling all of the associated data. R-cards, then, allow a managing party to co-control a card with the user–with some data controlled by the managing party and some controlled by the user.

However, during the IIW demo of r-card, I had an epiphany about how powerful the r-card is, once we actually allow the user to manage the personal claims through multiple, dereferenceable links.

One issue that came up during the demo was that if the “personal” side of the r-card is manually entered claims, such as contact information, then the user is creating a management nightmare: duplicate claims would need to be entered and maintained across many different r-cards. The more r-cards, the worse the problem.

The “obvious” solution discussed at the session was to allow the user to designate specific claims that are served by other IdPs, such as a Personal Address Manager. And for completeness’ sake, let’s note that such claims could be mashed up from multiple other IdPs, not just a single one. Thus, any number of claims from a particular IdP could act as a sort of sub-card, combining with other sub-cards at presentation time.

The net result of this is a realization that perhaps the most interesting thing about r-cards is their use as dynamic cards or aggregate cards or mashup identity cards.
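To make the mashup idea concrete, here is a minimal sketch of composing one presented card from several sub-cards. Everything in it is an assumption for illustration: the SubCard class, the compose_card function, the IdP names, and the claim values are hypothetical, and none of it reflects the actual r-card or Information Card data formats.

```python
from dataclasses import dataclass


@dataclass
class SubCard:
    """A set of claims served by one identity provider (hypothetical model)."""
    idp: str      # illustrative IdP name, not a real endpoint
    claims: dict  # claim name -> value as that IdP would serve it


def compose_card(sub_cards, requested):
    """Assemble the claims for one presentation.

    Each requested claim is taken from the first sub-card that offers it,
    and the result records which IdP the claim came from.
    """
    presented = {}
    for name in requested:
        for card in sub_cards:
            if name in card.claims:
                presented[name] = {"value": card.claims[name], "idp": card.idp}
                break
    return presented


# Made-up example data: contact details live at one IdP, the passport at another.
address_manager = SubCard("personal-address-manager",
                          {"email": "me@example.com", "city": "Anytown"})
passport_office = SubCard("passport-idp", {"passport_number": "X0000000"})

card = compose_card([address_manager, passport_office],
                    ["email", "passport_number"])
```

The contact claims are entered once, at the address manager, and every mashup card that references them picks up changes at presentation time rather than keeping its own copy.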

That’s pretty cool in itself.

However, it also struck me that this potentially fixes usability problems around authorizing a number of vendors (M) to access identity claims at a variety of different identity providers (N). Handled separately, this potentially requires N points of authorization and authentication for each of the M vendors (or relying parties). Sub-cards (or r-cards) may combine that task at the point of presentation for much greater user understanding and simplicity.

Since the Card Selector is itself a trusted point of authorization, we should be able to use the “mashup” gesture as explicit authorization for relying parties to access the claims specified in the sub-cards. That is, the UI of creating the r-card/mashup card/dynamic card also explicitly approves access to specific claims from multiple IdPs, since after all, the selector is where you select which claims to present to relying parties.
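A rough sketch of that idea, under stated assumptions: a single selection gesture in the card selector is expanded into one access grant per claim per IdP. The function name, the tuple layout, and the relying party and IdP names are all hypothetical; a real selector would issue tokens rather than tuples.

```python
def grants_from_selection(relying_party, selection):
    """Expand one card-selector gesture into per-IdP access grants.

    `selection` maps an IdP name to the claim names the user picked from it.
    Returns (relying_party, idp, claim) tuples (an illustrative format only).
    """
    return [(relying_party, idp, claim)
            for idp, claims in selection.items()
            for claim in claims]


# One gesture by the user: which claims, from which IdPs, go to this vendor.
selection = {
    "passport-idp": ["passport_number"],
    "personal-address-manager": ["email", "phone"],
}
grants = grants_from_selection("united-airlines", selection)

# The arithmetic behind the usability argument, with made-up numbers:
# M relying parties times N IdPs if each pairing is approved separately,
# versus one selector gesture per relying party.
m_relying_parties, n_idps = 5, 4
separate_approvals = m_relying_parties * n_idps
selector_gestures = m_relying_parties
```

With these illustrative numbers, twenty separate approvals collapse into five gestures, and each gesture happens in the one place where the user is already choosing what to present.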

This adjustment to the Information Card ceremony greatly simplifies the user experience, while retaining all the power of distributed claims at appropriate IdPs. For example, it would allow me to specify my Passport # to United Airlines, as a verifiable claim served by the US Secretary of State IdP (which should be trusted by UA), streamlining any international travel I might do, while retaining my contact info at my Personal Address Manager. All with the same authorization ceremony I use with any information card relying party.

This realization was, for me, the most surprising insight into the power of the r-card. In fact, I’m wondering if the name “r-card” captures it best.

Running the Numbers

Bart Stevens recently suggested a breakdown on the potential economic impact of VRM, based largely on a post by Steve Rubel arguing that $1B is wasted in online advertising today.

First, I anticipate the Personal Datastore to become a design pattern that underlies other VRM services, rather than a service by itself. In fact, a PD isn’t really a PD unless it enables VRM services explicitly… Personal Datastores aren’t just online storage like Amazon’s S3.

Second, I think the $1 Billion number is far too small. Steve is only estimating the CPM costs for display ads that are literally missed by users during eye tracking studies. That’s an intriguing number because those ads truly are wasted… there isn’t even any brand exposure because the ads are not even seen. It’s like paying for ads in a magazine that is never opened by a real reader.

On the other hand, there are still plenty of ads that are seen by the wrong people and CPC ads that are clicked on by the wrong people. Note that for the “right” people, those ads arguably generate useful brand exposure, so they aren’t wasted.

When advertising starts with the advertiser, it inherently wastes money, as it inevitably buys placement in ineffective or misaligned media. By now it is an old chestnut that advertisers waste half their budget–they just don’t know which half. Sometimes advertising is an investment in exploring potential markets… the goal is the data gained in the test marketing, which isn’t entirely a waste. Other times advertising is educational outreach where the goal isn’t so much to trigger a sale, but instead to introduce people to new products and services. Sometimes this is called demand generation. And that still leaves a vast amount of waste, buying media (offline or online) that just doesn’t perform or create any value. The potential savings in these areas is not only missing from Rubel’s analysis, I’d wager it is far more than $1 billion.

The huge potential of VRM is to turn these models inside-out, by providing a scalable pipeline directly into the product development and sales divisions of capable firms. Instead of Vendors guessing what people want, VRM services can cost-effectively tell Vendors what people truly do want. If the product is available, the sales team can enable purchase and delivery. If the product doesn’t exist, the Vendor can create it if demand is sufficient.

This new paradigm is exactly the shift from Attention to Intention that Doc and I have been advocating. The Attention game is the world of traditional advertising, where the industrial manufacturer competes in mass media to get the attention of the right consumers in order to generate demand for their products and services. Given that attention, they seduce, cajole, and entertain in hopes of winning new sales.

The Intention game, on the other hand, starts with explicit requests from the user to fulfill actual demand. Sometimes that intention will be nascent, needing further exploration and discovery. But eventually, for the segment of the population that finds something they want or need, that intention shifts from educating oneself about available options to seeking specific satisfaction, that is, buying a solution. Because intention starts with the user’s commitment to take the relationship to the next level, it immediately takes a vast amount of guesswork and wasted advertising out of the equation.

This guesswork and wasted advertising is probably closer to $100 billion/year, but that’s just my gut feeling. And that number only addresses the loss side of the equation, that is, the money we save by not wasting product development and advertising dollars. It ignores the value of products and services that today languish as innumerable missed opportunities–missed because companies have no way to efficiently gauge true market demand. There are undoubtedly services and products that exist–or could be profitably offered today–which fail to reach customers because we don’t have a suitable mechanism for connecting the right customers with the right companies. This potential to close the gap between potential sales and unmet demand is simply too large to estimate.

The Cost-Per-Action/Pay-for-Performance business model of Affiliate Marketing is likely to continue to transform the ad industry, significantly reducing billions in unnecessary expenses, including the $1B wasted on unseen display ads in Rubel’s analysis.

It won’t be until we transform explicit intent into new offerings and new sales that we unleash the vast potential that is VRM.

NewsGang talks data portability. Next up: Service Portability.

Excellent chat today by Steve Gillmor, Chris Saad, Mary Hodder, Karoli Kuns, Robert W. Anderson, Matt Terenzio, and Bruce Lerner about data portability. They get to the nitty gritty about data portability, licensing, and social networks. Perhaps the best Gang I’ve ever heard.

So, Steve, if you’re listening, take this to the next level and talk about service portability.

It’s great to be able to move my data from service to service. Data portability is a good thing–and we absolutely must address the licensing and privacy issues that cloud that horizon. We also need to be able to move our services from provider to provider.

We can do that today with domain names that we own. We can move our blog or our website or our email from one hosting provider to another. The next step is to extend that to user-controlled services that expose data on our terms, under our control.

Data portability lets everyone pass data around so different service providers can do smart things with that data. Ok. But we learned long ago that software systems are more robust, more scalable, and more maintainable when, rather than exposing the data, you expose functions that use that data.

I don’t want people who email me to have direct access to my email data file on a server somewhere. That would be insane. I want them to have a well-defined, constrained, complete service interface for sending me email, no matter which service provider I choose. An interface that lets them reach me, but keeps them from reading and deleting other email.

Similarly, we need to take user data, place it in a personal datastore (yea! portability!), then provide specific, well-defined access services to third party service providers, using that data, where the user controls those services completely: what services are available, who can access them, and even who the underlying service host is. This is how email works. How websites and blogs work. Next is to take this to user-centric services with complete, seamless data and service portability across the entire cloud.
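The email example can be sketched as a small service interface over a personal datastore. This is only an illustration under assumptions: the Mailbox class, its method names, and the caller-name check are hypothetical stand-ins for real authentication and do not describe any actual mail system.

```python
class Mailbox:
    """A personal datastore for mail that exposes functions, not the data file."""

    def __init__(self, owner):
        self._owner = owner
        self._messages = []  # raw storage is never handed to callers

    def deliver(self, sender, body):
        """Open to anyone: the one thing a correspondent is allowed to do."""
        self._messages.append((sender, body))

    def read_all(self, caller):
        """Owner-only: correspondents cannot read other people's mail."""
        if caller != self._owner:
            raise PermissionError("only the owner may read this mailbox")
        return list(self._messages)  # a copy, so storage stays private

    def delete(self, caller, index):
        """Owner-only: correspondents cannot delete mail either."""
        if caller != self._owner:
            raise PermissionError("only the owner may delete mail")
        del self._messages[index]


box = Mailbox(owner="owner")
box.deliver("alice", "Lunch on Friday?")
inbox = box.read_all("owner")
```

Because callers only ever see deliver, read_all, and delete, the storage behind them could move to a different host without any correspondent noticing, which is the service portability half of the argument.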

We know that we need to be able to move our email service from one service provider to another. We know that we need to be able to move our websites to the host of our choice. We know that we need to be able to move our cell phone number from one carrier to another. And we know that we need to be able to change our attorney of record, our doctor, our insurance provider, etc.

We also need to be able to move our MySpace profile and Facebook page anywhere, anytime, on our terms… not just the friends list, but the entire visual gestalt. We need to be able to move our IM and our Twitter services. We need to be able to move our search history from one search provider to another. Pick any service you have come to depend on and understand that dependence creates the need for liberation, the need to get that service on your terms with the provider you prefer, under your complete control.

Without complete portability–services and data portability–innovative service providers will corner markets with data silos and service lock-in. Only with transparent, seamless portability can we leverage the open market and open network to drive to the most desirable and most useful services.

The user-centric identity community is way ahead of the curve on this one, and I’m looking forward to the data portability movement re-discovering the architectural realizations learned the hard way by OpenID, CardSpace, Liberty Alliance, and Higgins, just as the identity community begins to extend from the hard-core technology built for identity and starts working towards the applications that will ultimately connect to real value for real users. And it has all been learned and continues to be built through collaborative efforts toward real portability and interoperability at the heart of the infrastructure. In particular, XDI has made great progress hashing out exactly the sort of license-based, identity-authorized data access that Steve talked about in the podcast. ProjectVRM is driving a user-centric approach to commerce in this conversation and I encourage folks to join us all at the next IIW unconference and to keep an eye open for a VRM workshop sometime later in the year.

The VRM Vector

The core of VRM, Vendor Relationship Management, is the vector of activity.

Remember vectors? Vectors are multi-dimensional, scalars one-dimensional. In high school they explained it by saying velocity is a vector: it contains both the direction of travel and the magnitude. Speed, on the other hand, is a scalar. It only has the magnitude; direction isn’t included.

VRM isn’t just about magnitude, it is also about direction.
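For anyone who wants the high school example spelled out, here is a tiny sketch with made-up numbers. The variable names and the two-component representation are illustrative assumptions, nothing more.

```python
import math

# Velocity as a vector: east and north components (made-up values, in m/s).
velocity = (3.0, 4.0)

# Speed is the scalar magnitude of that vector.
speed = math.hypot(*velocity)

# The opposite velocity has the same speed but points the other way.
opposite = (-3.0, -4.0)
same_speed = math.hypot(*opposite) == speed
same_velocity = opposite == velocity
```

Both travelers cover ground at the same rate, yet they end up in different places, which is why the direction matters as much as the magnitude.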

Bart Stevens, a new contributor to the VRM conversation, asked this in his post VRM, APML, and Semantics.

I have been doing quite a lot of reading like JP Rangaswami post on data portability among various sites.

1. My remark had to do with creating some sort of bank for your data. Maybe owned by the community themselves.

Secondly, I have been following the APML/data portability discussion of Chris Saad at Google Groups

2. My remark is that this is moving in the direction of VRM, we should become an active member in this group

Thirdly, I read this interesting post from Yihong Ding.

3. My remark, should we look into semantics as part of the VRM standardization exercise?

The short answer is clearly “Yes”. Semantics matter, as does the work of the data portability group. Having a better understanding of all the data on the Giant Global Graph, as Sir Tim humorously calls it, is A Good Thing™. It frees the user from Vendor data silos and provides a more comprehensive, understandable foundation for creating user-centric value. That is, it might let you do cool stuff for the user.

The more complex answer suggests a grain of salt is in order, but with appropriate care, all of the above can contribute to a VRM future.

Both APML and the GGG, formerly known as the semantic web, suffer from what I consider a misdirection in attention, despite creating real value in the world. That is, they are doing great work, but at a level that is necessarily abstracted away from where the user gets value.

Think of it like this. Consider all the advances that made biochemistry the amazing science it is today. Electrons. Protons. Molecules. Chemical reactions. Organic chemistry. Enzymes. DNA. Biological pathways. Literally dozens and dozens of Nobel prizes underpinning the concrete understanding of our world that lets us apply modern biochemistry. And that modern biochemistry is solving many of the world’s greatest problems. Absolutely brilliant, powerful, important work.

And yet, it won’t tell you a thing about what makes a person fall in love.

Or what color sweaters are going to sell well this season.

Or what the person entering a search query is really looking for.

The semantic web is based on a model that once all the data is properly interrelated, we can do smart stuff with it. That’s certainly true. That’s essentially what forensics departments do. They analyze all the data available to produce clues that can hopefully solve a crime and convict the guilty. Automating and extending that Giant Global Graph would allow an incredible level of forensic analysis to attempt to figure out how companies can create value. Such a graph would nicely align with the CRM and MIS systems of Fortune 500 companies and direct marketers and charity fundraising campaigns, right alongside the Department of Commerce, the IRS, and the CIA. There’s no doubt in my mind that the graph can be used to create value in new and amazing ways for those entities with the wherewithal to understand it and build systems to leverage it.

But it isn’t about helping individuals.

APML on the other hand has greater proximity to users, which is good. However, it still requires forensics to tease out the value. APML is a storage format for keeping track of clickstream, lifestream, and other attention data. This data is created on the user’s machine at the same time we leave our data trails around the web. Since it collects all of a user’s activity, no matter where they go, it has a much greater reach than even the new Google/Doubleclick database of user activity. And because the user owns this file, the user has the power to control how that data is used by vendors who might want to use it. On the whole, this is excellent. A classic Personal Datastore approach (minus the user-centric Identity access control, but that’s a different issue).

But what APML doesn’t do is explain what real-world value is being created for the user. Like the GGG, somebody somewhere has to do the forensics to make sense of it. Is APML just a more thorough version of the same data that Google/Doubleclick already tracks? If so, what good is that to the user? Will it mean they get more appropriate spam? Will it improve search results? Will it improve the ad banners that show up in the Doubleclick ad network? In other words, while APML certainly starts near the user, it isn’t clear if the direction of value is truly towards the user. I can clearly see how it helps advertisers and investigators, but I have yet to see a credible, compelling case for user value.

VRM, in contrast, is about starting with the user and creating value on their behalf, first. We do that specifically by focusing on commercial transactions and by enabling mutually beneficial relationships. It isn’t about moving the power from Vendors to Individuals, it is about creating new efficiencies and new value points across the ecosystem and marketplace that improve the situation for everyone.

With VRM, the value begins with the individual. The rest is implementation.

By focusing directly on the point of value for the user, I believe we can create more value, more quickly than trying a forensics approach on deeper, larger data sets. The user is the natural point of integration for any number of services. Even many in the data portability group have shifted their language in this area. Initially Brad Fitzpatrick catalyzed the Social Network Portability movement by imagining a Global Social Graph. But many have come to realize that it isn’t the abstract, six-degrees, linking-everyone Global Graph that matters, it’s the slice of that graph that defines our own, individual social connections. What I care about is my social graph, my friends, my coworkers. That’s where the value is created.

Similarly, consider the user-centric answers to the problems above. Instead of looking at the biochemistry or forensic data,

  • Try looking at people falling in love if you want to learn about what makes a person fall in love.
  • Try looking at what color sweaters sold well last year and how other color trends are changing if you want to predict what color sweaters are going to sell well this season.
  • Try looking where other people with similar searches actually visited if you want to find out what the person making that search query is really looking for.

Start with the user. Identify the real value being created and build out from there.

The net results of the GGG and APML are definitely useful in realizing VRM. In fact, the data portability movement is huge. There are sea changes that must occur to fully realize the power of VRM, and I think the Scoble Facebook fiasco and the subsequent joining of the Data Portability movement by representatives from Google, Facebook, and Plaxo make January 2008 a watershed month for opening up the web and giving users more control over their own data.

What this means for VRM standards is that there is a lot of work going on in the real world that is all headed in the same direction, and everyone can leverage the accomplishments of the other teams. VRM isn’t going to build everything, we’re just going to put a stake in the ground on behalf of the user and start figuring out how best to create value, for users, in today’s zero-distance world.

As Doc said at Le Web3, the user is the platform of the future. VRM is about figuring out what that means, not just conceptually, but in concrete pragmatic terms so that real companies can build real technologies and services that enable new, more efficient, more flexible, and more powerful relationships between buyers and sellers. And all of that starts with the user.

The VRM Vector starts with the user, straight towards real value.

The user is the platform of the future… Doc Searls @ LeWeb3

I love Doc Searls. Few people inspire the future as well as Doc, especially when he is on a tear. Here’s a delightful short (<5 min) romp in an interview at LeWeb3 in Paris about the future of the web and the critical importance of making user-centric open systems the core of a ubiquitously connected future. (Think VRM and The User As the Point of Integration)

A few gems:

What is meta about life transcends what is meta about electronics.

We have to look to solve problems for ourselves.

What really matters is our independence, our freedom, our ability to act on our own.

Enjoy!

Credit Industry needs new integration paradigm… think VRM and Personal Datastores

Slashdot brings us this article highlighting yet another picture-perfect case for the VRM Personal Datastore:

Technical Writing Geek writes with the news that the retail industry is getting mighty fed up over credit card company policies requiring them to store payment data. The National Retail Federation (NRF) has gone to bat for store owners, asking the credit industry to change their policies. The frustration stems from payment card industry (PCI) standards and new security measures going into place across the retail experience. Retailers are now trying to point out that many of the elements of the standard would not be a requirement if they didn’t have to store so much payment data.

“Even if the NRF’s demands were immediately met, it would take several years before retailers could purge their systems and applications of credit card data, he said. Over the years, retailers have collected and stored credit card data in myriad systems and places — including relatively old legacy environments — and they are just now realizing the data can be a challenge, he said. Purging it can be a bigger headache because the data is often inextricably linked to and used by a variety of customer and marketing applications; simply removing it could cause huge disruptions.”

This is another excellent example where the Personal Datastores of the Vendor Relationship Management initiative would profoundly simplify integration challenges. The current situation has each retailer acting as an unwitting data silo, storing sensitive information just waiting for hackers to bust it open. The PCI standards try to address this problem by hardening the silos, making the myriad of retailer data systems a sort of armored field of honeypots–and making the retailers liable for breaches. Understandably, retailers are a bit frustrated by the additional demands. However, if the data stores were completely distributed based on the user, rather than the retailer, we could not only remove the liability from the retailer, we could turn the field of honeypots (each with data on potentially hundreds or thousands of users) into endless fields of pollen-bearing flowers, each with just the data for a single individual.

Ultimately, each Personal Datastore–indeed any data store–is a potential target for hackers. However, PDs turn the retailer’s problems upside down in two ways. First, to the extent that PDs are distributed down to individuals’ own computers, the potential identity theft is reduced from a sweet haul of data for potentially tens of thousands (or more) individuals stored at a single retailer down to a single, isolated identity at one individual’s computer. That is, the honey is disaggregated back into the pollen, making it much less attractive to potential hackers… a much lower payoff for the same hard work.

Second, generally speaking, retailers aren’t well-equipped to handle secure IT issues. That’s not their business, even if a few do it well. That means most retailers are much better off placing the security risk in the hands of a service provider who is a specialist in maintaining a secure data store. That’s precisely what they are asking the credit card industry to let them do, even if they aren’t quite thinking of it that way. By moving the at-risk information into Personal Datastores run by companies whose entire business is in protecting and maintaining those Datastores, the risk can be managed by trained professionals whose sole goal in life is protecting that data. This would seem much better than leaving it in the hands of retailers whose business focus is, appropriately, on innovative ways to make money selling products to customers.

Reintegrating with the user as the focal point would turn this problem inside out and give retailers, credit card companies, and credit card users a more robust, reliable, and secure solution with less risk and reduced liabilities. I’m not sure who the right entities are to build out this solution, but I’m betting that XRI/XDI, Higgins, and VRM are all enablers.

Microsoft & Personal Health Records, Take 1

Microsoft launched its Personal Health Record initiative yesterday, according to the New York Times:

The company’s consumer health offering includes a personal health record, as well as Internet search tailored for health queries, under the name Microsoft HealthVault (www.healthvault.com).

The personal information, Microsoft said, will be stored in a secure, encrypted database. Its privacy controls, the company said, are set entirely by the individual, including what information goes in and who gets to see it. The HealthVault searches are conducted anonymously, Microsoft said, and will not be linked to any personal information in a HealthVault personal health record.

This is definitely a step in the right direction, using Personal Datastores for managing health records, with fine-grained access rights management so users can set privileges for multiple health vendors. It’s a classic VRM use case, presumably implemented with full HIPAA compliance.

For those willing to trust Microsoft, their privacy assurances seem reasonable (full policy):

  1. The Microsoft HealthVault record you create is controlled by you.
  2. You decide what goes into your HealthVault record.
  3. You decide who can see and use your information on a case-by-case basis.
  4. We do not use your health information for commercial purposes unless we ask and you clearly tell us we may.

Unfortunately, it doesn’t look like Microsoft is promoting any open standards (no surprise there), nor allowing users a way to download what is stored in their health record. Does that mean if we want that data out, we can only go through a Microsoft-approved medical partner? If so, does that mean that Microsoft actually owns the data… and not the patient? If so, that’s disturbing.

The full text of the HealthVault privacy statement makes this sound like a feature, using full FUD mode to scare users into thinking Microsoft control is a good thing:

To help provide better protection of your information, the information transfer from your computer to the Service is one way; the Service does not transfer your Health Record information back to your computer.

So, minor points for Microsoft. Kudos for pointing toward a smarter way of managing Personal Health Records, and shame on them for not doing it in a way that is completely transparent and open for all users.

I’ve sent the folks at HealthVault an email asking about export and ownership. I’ll let you know what I hear back, if anything.

Why Search needs VRM

Much laughter and thanks to Cory Doctorow for a send-up of the Googlefuture, Scroogled. I’ll add a subtitle: “Learning to love Google nation.”

Cory’s tale noir makes it crystal clear why Search needs VRM-style solutions to deal with user control of query and clickstream data. Google isn’t about to let you fully edit or delete your unsavory history any time soon (what a boon that they now promise to anonymize after 18 months). Other efforts, like APML, mostly seem to be beautiful ways to aggregate personal data from everywhere you go and everything you do online… with minimal talk about how you control access to it. John Battelle’s Data Bill of Rights and similar efforts show promise, but none specifically address how we resolve Search as a digital trail of inherently privacy-busting data. Even within the Identity and VRM communities, there has been precious little talk about how to put users in control of their relationships with Search providers, which is to say VRM for Search.

SwitchBook is still essentially in stealth mode, which means I won’t yet say much, except that we think our approach to Search addresses some of these problems, offering a privacy-savvy framework for user-centric Search. We can’t make Google give up your data, but we can create new ways to Search the web that fundamentally reshape how your Search history and results are managed. There’s a reason I’m a big supporter of VRM and it has everything to do with putting the user in control of where and how they Search, while leveraging an incredibly rich personal datastore as they do so.

A world of claims, not facts

On the Social Network Interoperability list, Danny Ayers recently pointed to a great post, “The World is Now Closed” by Dan Brickley, with the following quote:

[[from Dan Brickley:] So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is - “Sally (now) loves John” - and instead become reflectors for the things people are saying, “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006”; “Stanford University says Sally was awarded a PhD in 2008”. Today’s young internet users are growing up fast, and the Web around them needs also to mature.]

This is fascinating. It betrays an underlying hubris of much thinking in both AI and the semantic web. We often imagine that it is somehow possible to map out, understand, or process some sort of “objective” set of facts. Computer Science practically conspires to force this world view on its practitioners. When programming, we not only start with assumptions about data, we must concretize those assumptions so our algorithms have something to transform from input to output. “Fuzzy logic” and neural nets embrace ambiguity, but computer science on the whole lives in a world of clearly defined inputs and outputs. It literally forces one to think in terms of objective data.

But in the real world, nothing is that simple. Was Princess Diana murdered? Is OJ guilty? Is DNA evidence conclusive? These are legal examples, where ambiguity is argued to death in court so contestants can eventually move on with the rest of their lives, but what about love, betrayal, politics, or discrimination? Does he really love her? Did your business partner always plan to stab you in the back or is he actually still acting in what he believes is in the best interest of the company? Were there weapons of mass destruction? Did race or gender influence your hiring decision?

Answers to these kinds of questions can’t be reduced to facts. They can only be reduced to “good enough” approximations of facts.

This is particularly apparent, for example, in Freebase, a socially maintained structured “factual” semantic database which came out of Applied Minds and at least in part from the brilliant mind of Danny Hillis. Freebase is like Wikipedia on crack. Delightfully ambitious, it has set out to leverage the social editing power of wikis to construct a semantically and computationally accessible knowledgebase of everything worth talking about.

If we ignore for a minute that Wikipedia–and all similar social constructs–can never be perfectly accurate and instead accept that they can be exceptionally useful, then we can begin to see the allure of a socially edited and maintained database of facts such that a computer could query or reason over embedded topics. It’s a great idea and hopefully will create enough value by solving enough of the problem.

And yet, one can see in its “factual” hubris, the beginning of its fundamental limitations. Take for example the “type” associated with living people. There is a distinct type for deceased people. There was a fair amount of discussion about this, but apparently rather than allow “people” to be either living or dead, it made more sense to separate the two types. Ok. It’s often easy to tell if people are really dead. But what if it isn’t? What if someone, like Steve Fossett, is lost and presumed dead? (That’s my presumption, anyway.) What about Amelia Earhart? What if an individual is brain-dead but still breathing? Do you wait for a definitive statement from a coroner? What if there is no body? The “factual” paradigm requires someone–or the collective someone of social editing–to make the call about whether or not someone is categorized as a living person or a deceased one.

And I have barely scratched the surface on religious “facts”. Both Freebase and Wikipedia (which is often the source for Freebase) address this in part by shifting from “fact” mode into contextualized statements or claims. See Jesus and Mohammad entries in Freebase. Coincidentally, at the time of this writing the Wikipedia entry on Mohammad is locked to editing because of disputes. It is the nature of the most interesting topics to generate disputes, and yet these same disputes prevent us from asserting any sort of singular claim with any honesty.

The solution used in Wikipedia is to state that so-and-so religion claims certain things, for instance, about Jesus or Mohammad, and cite a source for those claims (implicitly listing the editor who entered those claims). It is not clear yet how much these semantics will be captured in the underlying data structure at Freebase.

Generally, these factual databases and modeling systems (such as certain unified schema proposed by some proponents of the semantic web) implicitly require someone to distinguish what is fact from what is not, and often do so without clarifying that the asserted “fact” is really a “claim”, although the editing history at least allows you to know who made the claim. The systemic requirement that somebody decides what is “true” is patriarchal, Apollonian, and unrealistic. It enforces a top-down view on the world, even though we know as a matter of practical experience that there are many, many viable and interesting and rewarding competing world views. And yet, the architectural assumptions of Wikipedia are clearly making it difficult to come to terms with appropriate language to present “facts” about Mohammad.

Whether or not there is a classic objective reality in the Ayn Rand sense is irrelevant from a systems development perspective. What’s important is that there are numberless different and competing views of the world, stored in people’s heads, in corporate data silos, and soon coordinated in individual personal data stores. No one system can ever assimilate, aggregate, and accommodate all of those distinct datasets into a unified whole. Trying to do so is a fool’s errand, and designing your systems to count on it is a recipe for an unscalable system.

Instead, what is important, in my not so humble opinion, is that the interfaces between as many sources as possible allow for fluid, low-transaction-cost, accurate engagement across the network, no matter who you are or who they are, moderated by appropriate rights management and identity access control, so each of us can seamlessly access the datasphere as broadly as we have the right to, as easily as if each data store were our own. Consider how most web browsers can access (mostly) all web pages. That ubiquitous access to different data fuels Wikipedia’s editorial preference for citing accessible web pages whenever making claims. That’s a profoundly simple and powerful model for engaging the world’s diverse data and communications needs. We just need to upgrade to sharable semantic interfaces and proper access mechanisms. Brickley’s comment on claims versus facts highlights a critical system requirement: the acceptance of ambiguity.

Clearly this is the kind of thinking that fuels much of my interest in VRM. Vendor Relationship Management still requires much gestation and care before it can truly be judged as a widely useful effort. But what it does in this crazy world where each data silo has divergent data and every vendor wants to own it all, is redefine the working context so that we can focus on what each individual actually knows and needs, which at least for that individual, for that customer, for that “monetizable opportunity,” is actually quite likely to be “right.” And since it is “right” for that closest dataset to that individual, it is likely to be right in a way that might create value for someone who can respond to those needs and for the person whose needs get addressed. We are working by focusing on the interface between these distributed systems, on the protocols that make networked semi-automated vendor-customer relationships work, not on any presumptions of fact or a globally rigorous index or model of all the world’s information.

Hence the incredible resonance of Dan Brickley’s observation about the relative value of “claims” versus “facts”. We can’t really know if a fact is true, generally, but we can convince ourselves that a given person or company or entity has asserted a claim. And by connecting the claim to a particular person or company, anyone relying on that claim can decide on their own whether or not to trust that entity or keep checking the facts. For most of us, most of the time, a handle for consistent claims is enough to weave together a shared set of expectations and understandings, which we can use in the face of a philosophically intractable inability to discern the “objective” truth.

Some of this is, of course, old hat to those folks coming from the Identity world, where they already speak of “claims” and “assertions” rather than facts. And as such, VRM gladly claims that heritage and common sensibility. If you think about it, it makes sense in a vendor relationship. Who really cares what the “factual” price of an item is when you can find a credible vendor willing to offer that same item at a better price? That’s all about claims at the interface between the buyer and seller and all about how we, as individuals, relate with vendors.

The upshot: systems that represent claims of fact made by specific entities will be more robust and more useful than systems that simply represent claims of fact. And that you can design on.
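
To make that design point a bit more concrete, here is a minimal Python sketch of what storing attributed claims, rather than bare facts, might look like. Everything in it is an assumption for illustration: the `Claim` and `ClaimStore` names, the field names, and the conflicting sample claim are hypothetical and are not drawn from Freebase, Wikipedia, or any identity standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One statement, always tied to the entity that asserted it."""
    claimant: str                 # who is making the claim
    subject: str                  # who or what the claim is about
    predicate: str                # the property being claimed
    value: str                    # the claimed value
    source: Optional[str] = None  # optional citation for the claim


class ClaimStore:
    """Hypothetical store that keeps competing claims side by side."""

    def __init__(self):
        self._claims = []

    def assert_claim(self, claim):
        # Never overwrite: a second claimant adds a claim, it does not replace one.
        self._claims.append(claim)

    def claims_about(self, subject, predicate):
        return [c for c in self._claims
                if c.subject == subject and c.predicate == predicate]

    def values_trusted_by(self, subject, predicate, trusted_claimants):
        # The relying party, not the store, decides whose claims count.
        return {c.value for c in self.claims_about(subject, predicate)
                if c.claimant in trusted_claimants}


claims = ClaimStore()
claims.assert_claim(Claim("Stanford University", "Sally", "phd_awarded", "2008"))
claims.assert_claim(Claim("John", "John", "former_employer", "Microsoft"))
claims.assert_claim(Claim("Sally", "Sally", "in_love_with", "John"))
# A made-up dissenting claim, to show that disagreement is representable.
claims.assert_claim(Claim("A skeptical friend", "Sally", "in_love_with", "nobody"))

all_views = claims.claims_about("Sally", "in_love_with")
sallys_view = claims.values_trusted_by("Sally", "in_love_with", {"Sally"})
```

The store never has to decide who is right about Sally. It only records who said what, and each reader picks the claimants they are willing to trust.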

Netflix going VRM

Blogging from GnomeDex, Dave Winer says Netflix is looking to offer VRM-style data portability:

I had some interesting hallway talks, but none more interesting than the one with Kevin McEntee of Netflix about providing a way for users to take their movie ratings from Netflix to other services. This could turn Netflix into the hub for movie ratings (the first place that exports becomes the default UI), and could enable all kinds of interesting combos, such as checking a box on Match.com to be introduced to dates who like the same kinds of movies.

Turning Netflix into a hub for movie ratings doesn’t sound like much of an improvement to me, but creating a way for any authorized service to access all of my movie ratings is music to my ears.

Although Personal Datastores are “owned” by the individual, there is no reason they can’t be implemented in a completely distributed way. I imagine we’ll have a VRM world where every individual has numerous Personal Datastore services providing identity-based access to their personal data, across Vendors.

XRI and XDI enable this sort of service discovery, although I’m only just beginning to get a glimpse of how it works. I believe the Netflix use case can be addressed through service discovery provided by the user’s Identity Provider (which need not be Netflix). So, for Netflix, the win would be to become my “movie ratings datastore” service. Seems reasonable to me, as long as I can actually control how that data is propagated and used by Netflix and others.

In the near term, I expect Netflix to implement their own semi-open data silo, retaining both data ownership and control over identity. Not because they don’t get it, but because it will be the easiest and fastest way to offer an API for users to use Netflix as their movie ratings platform. But will Amazon and Blockbuster want to play in Netflix’ datastore? Hard to say.

However, once the XDI/XRI protocols are in widespread use, the “third party” architecture makes it a straightforward proposition for any movie provider (or any service for that matter) to access the user’s datastore. Standard protocols and access rights will isolate the vagaries of independent providers, making it possible for vendors to trust the data outside their own silos.

Consider this scenario, which starts with the assumption that the user has a suitable Identity Provider (IDP) to resolve service discovery requests and authentication for their i-name:

First, creating the datastore.

  1. User signs up at a movie-ratings datastore, registering his or her i-name. For this scenario, let’s use Netflix as the datastore service.
  2. User confirms/registers Netflix as their movie-ratings datastore service with their IDP
  3. (Optional) User uploads or inputs initial ratings into the datastore. As a datastore service, Netflix would start with the ratings already stored in their system.

Second, accessing that datastore.

  1. User registers i-name with movie provider service, such as Amazon or Blockbuster (let’s pick Blockbuster for this example). Eventually, this will be an integral part of registration for most web services, replacing usernames and email addresses.
  2. Using the IDP responsible for that i-name, the user authorizes Blockbuster to access his or her movie ratings datastore, specifying whatever access rights are appropriate. Again, this will eventually be a standard part of registration, where users authorize access privileges to their Personal Datastores.
  3. Blockbuster queries the IDP for the movie ratings datastore, confirms access rights terms, and is directed to Netflix. (Note that the ordering of 2 and 3 is implementation dependent; the authorization could be triggered by Blockbuster’s query.)
  4. Blockbuster queries Netflix for movie ratings using the VRM standard protocol for movie ratings data sharing.
  5. Netflix authenticates Blockbuster via IDP, verifying that the user authorized access to the datastore.
  6. Netflix opens communication channel to Blockbuster for appropriate read/write access to the movie ratings database, based on IDP authentication.
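
The two phases above can be sketched as a toy, in-memory Python model. This is only an illustration under loud assumptions: the class and method names (`IdentityProvider`, `RatingsDatastore`, `discover`, and so on), the `"movie-ratings"` service label, the `=example.user` i-name, and the sample rating are all hypothetical. It does not use or imitate any real XRI/XDI library or any Netflix or Blockbuster API, and it collapses authentication into a simple lookup.

```python
class AccessDenied(Exception):
    """Raised when a requester has no grant from the user."""


class IdentityProvider:
    """Hypothetical IDP: resolves service discovery and records the user's grants."""

    def __init__(self):
        self._datastores = {}  # (i_name, kind) -> datastore service
        self._grants = {}      # (i_name, kind, requester) -> set of rights

    def register_datastore(self, i_name, kind, datastore):
        # Creating the datastore, step 2: user registers the service with the IDP.
        self._datastores[(i_name, kind)] = datastore

    def authorize(self, i_name, kind, requester, rights):
        # Accessing the datastore, step 2: user grants a provider specific rights.
        self._grants[(i_name, kind, requester)] = set(rights)

    def is_authorized(self, i_name, kind, requester, right):
        return right in self._grants.get((i_name, kind, requester), set())

    def discover(self, i_name, kind, requester):
        # Step 3: provider asks where the datastore lives; IDP checks grants first.
        if not self._grants.get((i_name, kind, requester)):
            raise AccessDenied(f"{requester} has no grant for {i_name}")
        return self._datastores[(i_name, kind)]


class RatingsDatastore:
    """Hypothetical movie-ratings datastore service (Netflix in the scenario)."""
    KIND = "movie-ratings"

    def __init__(self, name, idp):
        self.name = name
        self._idp = idp
        self._ratings = {}  # i_name -> {title: stars}

    def sign_up(self, i_name, initial_ratings=None):
        # Creating the datastore, steps 1 and 3: sign up, optionally seed ratings.
        self._ratings[i_name] = dict(initial_ratings or {})

    def _check(self, requester, i_name, right):
        # Step 5: datastore verifies the grant with the IDP on every request.
        if not self._idp.is_authorized(i_name, self.KIND, requester, right):
            raise AccessDenied(f"{requester} may not {right} ratings of {i_name}")

    def read(self, requester, i_name):
        self._check(requester, i_name, "read")
        return dict(self._ratings[i_name])

    def write(self, requester, i_name, title, stars):
        self._check(requester, i_name, "write")
        self._ratings[i_name][title] = stars


idp = IdentityProvider()
netflix = RatingsDatastore("Netflix", idp)
netflix.sign_up("=example.user", {"Example Movie A": 5})
idp.register_datastore("=example.user", RatingsDatastore.KIND, netflix)

# Blockbuster is granted read-only access, then discovers and queries the datastore.
idp.authorize("=example.user", RatingsDatastore.KIND, "Blockbuster", {"read"})
store = idp.discover("=example.user", RatingsDatastore.KIND, "Blockbuster")
ratings = store.read("Blockbuster", "=example.user")
```

Note that the datastore holds no access list of its own in this sketch. It defers every decision to the IDP, which is one way of keeping the three roles (identity, storage, service) separable.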

The point with this architecture is that individuals can use any datastore provider, any identity provider, and any service provider. Today, all three of these functions are bundled into monolithic proprietary services. You log into Netflix with your Netflix ID, they keep track of your ratings, and only they can provide recommendations or services based on those ratings.

Most limiting for Netflix, they can only “see” the ratings you enter on their system, with no way to know what you have at home or have entered at Amazon or Blockbuster. With reciprocity-based access rights, we should be able to get all of our service providers to both store and access our data from a shared Personal Datastore, seamlessly automating the integration of data across multiple vendors.
And for the first time, services can be built that integrate data outside the “specialty” of the offering service, such as Dave suggested with Match.com using movie ratings for romantic matching. For users, that’s more useful, easier, and delightfully empowering…

Clearly, Netflix sees the benefit of opening up the silo. Here’s to hoping they will join the VRM movement and go all the way to full VRM Personal Datastore interoperability with other vendors.