On the Social Network Interoperability list, Danny Ayers recently pointed to a great post, “The World is Now Closed” by Dan Brickley, with the following quote:
[from Dan Brickley:] So what am I getting at here? I guess it’s just that we need these big social sites to move away from making teen-talk claims about how the world is – “Sally (now) loves John” – and instead become reflectors for the things people are saying, “Sally announces that she’s in love with John”; “John says that he used to work for Microsoft” versus “John worked for Microsoft 2004-2006”; “Stanford University says Sally was awarded a PhD in 2008”. Today’s young internet users are growing up fast, and the Web around them needs also to mature.]
This is fascinating. It lays bare an underlying hubris in much thinking in both AI and the semantic web. We often imagine that it is somehow possible to map out, understand, or process some sort of “objective” set of facts. Computer Science practically conspires to force this world view on its practitioners. When programming, we not only start with assumptions about data, we must concretize those assumptions so our algorithms have something to transform from input to output. “Fuzzy logic” and neural nets embrace ambiguity, but computer science on the whole lives in a world of clearly defined inputs and outputs. It literally forces one to think in terms of objective data.
But in the real world, nothing is that simple. Was Princess Diana murdered? Is OJ guilty? Is DNA evidence conclusive? These are legal examples, where ambiguity is argued to death in court so contestants can eventually move on with the rest of their lives, but what about love, betrayal, politics, or discrimination? Does he really love her? Did your business partner always plan to stab you in the back, or is he actually still acting in what he believes is the best interest of the company? Were there weapons of mass destruction? Did race or gender influence your hiring decision?
Answers to these kinds of questions can’t be reduced to facts. They can only be reduced to “good enough” approximations of facts.
This is particularly apparent, for example, in Freebase, a socially maintained, structured, “factual” semantic database which came out of Applied Minds and, at least in part, from the brilliant mind of Danny Hillis. Freebase is like Wikipedia on crack. Delightfully ambitious, it has set out to leverage the social editing power of wikis to construct a semantically and computationally accessible knowledge base of everything worth talking about.
If we ignore for a minute that Wikipedia, and all similar social constructs, can never be perfectly accurate and instead accept that they can be exceptionally useful, then we can begin to see the allure of a socially edited and maintained database of facts such that a computer could query or reason over embedded topics. It’s a great idea, and hopefully it will solve enough of the problem to create real value.
And yet, one can see in its “factual” hubris the beginning of its fundamental limitations. Take, for example, the “type” associated with living people. There is a separate, distinct type for deceased people. There was a fair amount of discussion about this, but apparently, rather than allow “people” to be either living or dead, it made more sense to separate the two types. OK. It’s often easy to tell if people are really dead. But what if it isn’t? What if someone, like Steve Fossett, is lost and presumed dead? (That’s my presumption, anyway.) What about Amelia Earhart? What if an individual is brain-dead but still breathing? Do you wait for a definitive statement from a coroner? What if there is no body? The “factual” paradigm requires someone, or the collective someone of social editing, to make the call about whether someone is categorized as a living person or a deceased one.
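To make that constraint concrete, here is a tiny sketch in Python, with made-up names rather than Freebase’s actual schema, of what a binary living/deceased typing forces an editor to do:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical illustration (not Freebase's real schema): a "factual" model
# that forces every person into exactly one of two types.
class PersonType(Enum):
    LIVING_PERSON = "living_person"
    DECEASED_PERSON = "deceased_person"

@dataclass
class Person:
    name: str
    person_type: PersonType   # someone has to make this call

# Easy cases fit fine...
alice = Person("Alice", PersonType.LIVING_PERSON)

# ...but a lost-and-presumed-dead person fits neither cleanly. The schema
# offers no way to say "presumed dead" or "status disputed"; an editor
# simply has to pick one value and assert it as fact.
earhart = Person("Amelia Earhart", PersonType.DECEASED_PERSON)
```

The point is not that the enum is badly chosen; it is that any schema built only of “facts” forces somebody to resolve the ambiguity before the data can exist at all.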
And I have barely scraped the surface of religious “facts”. Both Freebase and Wikipedia (which is often the source Freebase draws on) address this in part by shifting from “fact” mode into contextualized statements or claims. See the Jesus and Mohammad entries in Freebase. Coincidentally, at the time of this writing, the Wikipedia entry on Mohammad is locked to editing because of disputes. It is the nature of the most interesting topics to generate disputes, and yet these same disputes prevent us from asserting any sort of singular claim with any honesty.
The solution used in Wikipedia is to state that such-and-such religion claims certain things, for instance, about Jesus or Mohammad, and to cite a source for those claims (implicitly recording the editor who entered them). It is not yet clear how much of these semantics will be captured in the underlying data structure at Freebase.
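Here is a rough sketch, again with illustrative field names rather than anything Wikipedia or Freebase actually uses, of what capturing those semantics in the data itself might look like:

```python
from dataclasses import dataclass

# A minimal sketch of one way to record "X claims Y, per this source"
# instead of asserting Y as a bare, context-free fact.
@dataclass
class Claim:
    subject: str      # topic the claim is about
    statement: str    # what is being claimed
    asserted_by: str  # whose claim this is
    source: str       # citation backing the claim
    entered_by: str   # editor who recorded the claim

# The claim, its asserter, and its citation all live in the data,
# so downstream consumers can see who is saying what.
example = Claim(
    subject="Jesus",
    statement="was resurrected",
    asserted_by="Christian tradition",
    source="https://example.org/cited-reference",  # placeholder citation
    entered_by="editor:example",
)
```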
Generally, these factual databases and modeling systems (such as certain unified schemas proposed by some proponents of the semantic web) implicitly require someone to distinguish what is fact from what is not, and often do so without clarifying that the asserted “fact” is really a “claim”, although the editing history at least lets you know who made the claim. The systemic requirement that somebody decides what is “true” is patriarchal, Apollonian, and unrealistic. It enforces a top-down view of the world, even though we know as a matter of practical experience that there are many, many viable and interesting and rewarding competing world views. And yet, the architectural assumptions of Wikipedia are clearly making it difficult to come to terms with appropriate language for presenting “facts” about Mohammad.
Whether or not there is a classic objective reality in the Ayn Rand sense is irrelevant from a systems-development perspective. What’s important is that there are numberless different and competing views of the world, stored in people’s heads, in corporate data silos, and soon coordinated in individual personal data stores. No one system can ever assimilate, aggregate, and accommodate all of those distinct datasets into a unified whole. Trying to do so is a fool’s errand, and designing your systems to count on it is a recipe for an unscalable system.
Instead, what is important, in my not so humble opinion, is that the interfaces between as many sources as possible allow for fluid, low-transaction-cost, accurate engagement across the network, no matter who you are or who they are, moderated by appropriate rights management and identity-based access control, so each of us can seamlessly access the datasphere as broadly as we have the right to, as easily as if each data store were our own. Consider how most web browsers can access (mostly) all web pages. That ubiquitous access to data fuels Wikipedia’s editorial preference for citing accessible web pages whenever claims are made. That’s a profoundly simple and powerful model for engaging the world’s diverse data and communication needs. We just need to upgrade to sharable semantic interfaces and proper access mechanisms. Brickley’s comment on claims versus facts highlights a critical system requirement: the acceptance of ambiguity.
Clearly this is the kind of thinking that fuels much of my interest in VRM. Vendor Relationship Management still requires much gestation and care before it can fairly be judged as a widely useful effort. But what it does, in this crazy world where each data silo has divergent data and every vendor wants to own it all, is redefine the working context so that we can focus on what each individual actually knows and needs, which, at least for that individual, for that customer, for that “monetizable opportunity,” is quite likely to be “right.” And since that dataset, the one closest to the individual, is “right,” it is likely to be right in a way that can create value both for someone who can respond to those needs and for the person whose needs get addressed. We work by focusing on the interface between these distributed systems, on the protocols that make networked, semi-automated vendor-customer relationships work, not on any presumption of fact or a globally rigorous index or model of all the world’s information.
Hence the incredible resonance of Dan Brickley’s observation about the relative value of “claims” versus “facts”. We can’t really know if a fact is true, generally, but we can convince ourselves that a given person or company or entity has asserted a claim. And by connecting the claim to a particular person or company, anyone relying on that claim can decide on their own whether to trust that entity or keep checking the facts. For most of us, most of the time, a handle for consistent claims is enough to weave together a shared set of expectations and understandings, which we can use in the face of a philosophically intractable inability to discern the “objective” truth.
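From the consumer’s side, that looks something like the following sketch (illustrative code only, with a slimmed-down claim record for self-containment): because every claim carries its asserter, each of us can apply our own trust list rather than having the database decide what is true.

```python
from dataclasses import dataclass

# Illustrative names only; nothing here is a real API.
@dataclass
class Claim:
    statement: str
    asserted_by: str

def accepted_claims(claims: list[Claim], trusted: set[str]) -> list[Claim]:
    """Keep only the claims made by asserters this particular consumer trusts."""
    return [c for c in claims if c.asserted_by in trusted]

claims = [
    Claim("Sally was awarded a PhD in 2008", asserted_by="Stanford University"),
    Claim("Sally is in love with John", asserted_by="Sally"),
]

# Different consumers hold different trust lists over the same claim store,
# and legitimately come away with different working "facts".
print(accepted_claims(claims, trusted={"Stanford University"}))
```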
Some of this is, of course, old hat to folks coming from the Identity world, where they already speak of “claims” and “assertions” rather than facts. And as such, VRM gladly claims that heritage and common sensibility. If you think about it, it makes sense in a vendor relationship. Who really cares what the “factual” price of an item is when you can find a credible vendor willing to offer that same item at a better price? That’s all about claims at the interface between buyer and seller, and all about how we, as individuals, relate to vendors.
The upshot: systems that represent claims of fact made by specific entities will be more robust and more useful than systems that simply represent facts. And that you can design on.