April 8, 2006
Reply to Craig Hubley (8 and 9)
Taking these points in reverse order, in (9) Craig refers to interesting resources about which I know little or nothing, so I’ll just quote him without comment:
Finally are sociosemantic web considerations. For that I’d suggest the book “Ambient Findability” or (much better) the UN 1993 State of the Future Report from the American Committee for the United Nations University, which covers this in great depth and deals with the various ways Delphi methods, semantic webs, sociological and psychographic categories (e.g. “Islamist”, “socialist”, “feminist”) tend to determine aggregation, order, credential, etc. It’s a surprisingly operational description.
Or read this.
Craig’s point (8) raises a number of issues on which I can comment briefly:
Usage varies more by geography than anything else, and the need for discernment of a geographic kind tends to vary not by who we are, but where we are. I need much more detail about a place in California, if I’m actually in California looking for it, than if I’m just asking about it from London, England. I think there must be room here for aggregations to vary and be finer or looser grained based on how “far” we are (in space or in time) from making a decision. Trying to pretend knowledge is spaceless and timeless has a name: scholasticism. And in a word, it’s crap. Temporal and spatial database considerations need to be there from the beginning, or this will not be useful to guide situated effective action in the real world. In other words, no one will want to read it on a BlackBerry, and that’s where most of the interesting “driving problems” are (note the syzygy, that’s “driving” as per Fred Brooks’ “so hard you solve other problems by solving it”, and also as per “moving around in the real world”). If space and time tags are wrong on the first cut, you simply cannot fix them at all later - they’ll lack integrity. As a simple example of the problems, “Toronto” was a very different place in 1998 than in 1999 (the entire 416 area code merged into a new “megacity” whereas it’d only been one of five cities and one borough beforehand). If you have a tag that says “Toronto” on something about North York prior to 1998, it’s about a neighbouring city. If it’s from 1999, it’s about the same city. This matters much more if you are heading into that city than if you are considering just visiting it next year. And, if time has to be on the tagging to determine how our tagging habits/style change, then we’re very exposed to semantic errors, e.g. failing to put time zone on date, leaving something which can’t be resolved to an actual time span, other than stochastically.
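To make Craig's two examples concrete, here is a minimal Python sketch, not anything our system actually implements. It assumes a place tag is stored with validity intervals, so that "Toronto" resolves differently depending on the date attached to the tagged item, and it shows how wide the window of possible instants is when a date carries no time zone. The names `PlaceVersion`, `resolve`, `covers`, and `utc_span` are hypothetical, the locality list is abbreviated to the places named above, and the cutover date of January 1, 1998 is the amalgamation date as I understand it.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlaceVersion:
    """One meaning of a place name, valid over [valid_from, valid_to)."""
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    localities: frozenset


# Illustrative, abbreviated data: only the localities named in the text.
VERSIONS = [
    PlaceVersion("Toronto", date.min, date(1998, 1, 1),
                 frozenset({"Toronto"})),
    PlaceVersion("Toronto", date(1998, 1, 1), None,
                 frozenset({"Toronto", "North York"})),
]


def resolve(tag: str, on: date) -> PlaceVersion:
    """Return the version of `tag` that was in force on the given date."""
    for v in VERSIONS:
        if v.name == tag and v.valid_from <= on and (
                v.valid_to is None or on < v.valid_to):
            return v
    raise LookupError(f"no version of {tag!r} covers {on}")


def covers(tag: str, locality: str, on: date) -> bool:
    """Did `tag` include `locality` on that date?"""
    return locality in resolve(tag, on).localities


def utc_span(d: date) -> Tuple[datetime, datetime]:
    """Earliest and latest UTC instants a zone-less calendar date could mean.

    Assumes civil offsets range from UTC+14 (day starts first) to
    UTC-12 (day ends last).
    """
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    earliest = midnight - timedelta(hours=14)
    latest = midnight + timedelta(days=1, hours=12)
    return earliest, latest
```

With this data, `covers("Toronto", "North York", date(1997, 6, 1))` is false and the same call with `date(1999, 6, 1)` is true. A bare date such as April 8, 2006 could fall anywhere in a 50-hour window of UTC time, which is the kind of ambiguity that can only be resolved "stochastically" once the zone is lost.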
I don’t think we have a general way to solve the problems Craig refers to, and I’m not sure we can really address them at all. However, in one respect we may be able to offer help. I would tag items “Bangkok” as a potential tourist, and our system, if it works well enough, will reflect that in building my classifier. Someone else, living in Thailand, might tag items “Bangkok” because they are analyzing real estate deals in several cities. Then, in building a social landscape, my tag should be grouped with those of other tourists to Southeast Asia, whereas theirs should be grouped with those of other Thai real estate investors.
This is an ambitious goal, and it may well be beyond our grasp. But it indicates our approach toward problems like the one that Craig brings up. Our users are ultimately the ones who decide what distinctions are important to them. They know more about the subtle structure of the world than we can ever hope to capture in a system of categories or attributes. Our goal is to create an environment in which they can make their most important knowledge and values available to the system, and through it to other users, with as little overhead and distraction as possible. Much of that knowledge and many of those values are tacit, and often could not be made explicit with any amount of effort. So to minimize the burden on users, and to engage these tacit sources of knowledge and values, we have to learn from each user’s actions without requiring them to explain themselves.
The example of Bangkok illustrates this tacit component of tagging. Neither I as a tourist, nor the Thai resident as a real estate investor, would be able to explain how we use the term “Bangkok”, but our actual use reflects our interests and knowledge. If the system can learn from that, it will help us find the material we want to see, and potentially help us understand where we fit in a larger social context, and find others whose interests mesh with ours.
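As a rough illustration of what "learning from actual use" could mean for the Bangkok example, the Python sketch below groups users who share a tag by comparing the other tags they use. This is only a toy under stated assumptions, not a description of our classifier: the user names, tag sets, the `group_users_of_tag` function, and the 0.25 threshold are all invented for the example, and Jaccard similarity with greedy grouping stands in for whatever method a real system would use.

```python
from typing import Dict, List, Set

# Hypothetical tag profiles: every user has tagged something "Bangkok".
PROFILES: Dict[str, Set[str]] = {
    "tourist_a": {"Bangkok", "temples", "hostels", "street food"},
    "tourist_b": {"Bangkok", "hostels", "street food", "Angkor Wat"},
    "investor_a": {"Bangkok", "condos", "rental yield", "Chiang Mai"},
    "investor_b": {"Bangkok", "condos", "rental yield", "Phuket"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_users_of_tag(tag: str,
                       profiles: Dict[str, Set[str]],
                       threshold: float = 0.25) -> List[List[str]]:
    """Greedily group users of `tag` by the similarity of their other tags.

    A user joins the first existing group containing a member whose
    remaining tags are at least `threshold` similar; otherwise the user
    starts a new group. Nobody is asked to explain what the tag means.
    """
    contexts = {user: tags - {tag}
                for user, tags in profiles.items() if tag in tags}
    groups: List[List[str]] = []
    for user in sorted(contexts):
        for group in groups:
            if any(jaccard(contexts[user], contexts[member]) >= threshold
                   for member in group):
                group.append(user)
                break
        else:
            groups.append([user])
    return groups
```

Calling `group_users_of_tag("Bangkok", PROFILES)` puts the two investors in one group and the two tourists in another, even though all four used the identical tag. The distinction comes entirely from the company the tag keeps in each user's behavior.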
To the extent that distinctions of era, locale, etc. form a vital background to the users’ way of understanding some topic, the system should capture that, and ultimately help the community reflect on this initially tacit aspect of their diverse perspectives.
Filed by Jed Harris at 7:21 pm under Status & plans
While dialogue is valuable, I’m not sure it serves the purpose of discovering sources of distinction, and uses of distinction, to simply mark these as part of a “reply to” myself. There’s a whole field of study here, and as I noted, it goes back to 1993, at least, when the UN first expressed interest in the problem of overlaying social-context-sensitive semantics on the emerging hypertext of the WWW. The material they drew on is now 15 years old. It may not be that hard to find, though I’m not sure it was ever published online (ironically).
“Our users are ultimately the ones who decide what distinctions are important to them. They know more about the subtle structure of the world than we can ever hope to capture in a system of categories or attributes.”
This is true. So what’s required is a system of categories or attributes that adapts itself based on statistical norms, which reflects correlations between “people like you” in some sense - not a non-system that assumes only that each user works in some unique and inscrutable way. If that were true, there’d be no such thing as language. If it’s population thinking we want to track, then it’s the methods of demographics we’d have to copy.
“To the extent that distinctions of era, locale, etc. form a vital background to the users’ way of understanding some topic, the system should capture that, and ultimately help the community reflect on this initially tacit aspect of their diverse perspectives.” Agreed, though there are reasons to be quite conservative about accepting the tacit as definitive. The questions of where and when someone is making distinctions (as opposed to the much more problematic question of who) seem solvable. We may know what body, what geographic location (in the sense of latitude and longitude) and many other things they tell us, or allow us to track. That too is subject to fog: some people would share a lot more than others, and some take care not to tell even systems they nominally control all that much about themselves. To the goal statement I add only a couple of key words:
“Our goal is to create an environment in which they can make their most important knowledge and values available to the system” they control, “and through it to other users” they trust (only) “with as little overhead and distraction as possible.” Yielding something like the very personal computer. Which is the model that led to the 1993 UN work.