April 8, 2006
Reply to Craig Hubley (3)
Continuing with my response to Craig’s comment, his third point:
Because they’re political, the influence of our politics on our tags probably has to be (if not explicit) easy to determine by correlation. Someone who’s using tags like “Peak Oil” or “culture of life” at all, is clearly part of some group or movement. Mention of “seal hunt” as a specific or distinct topic suggests concern with it as an issue. Varying terms like “monetary reform”, “capital base”, “reserve rules”, “Bretton Woods”, “dollar hegemony”, all suggest different angles on the same problem, some of which (“reform”) suggest action should be taken. The people tagging may NOT all want to find each other, but the reader DOES want to find all the angles on these issues and therefore would PREFER that tags aggregate in certain ways. For someone “on the left”, perhaps “abortion” aggregates with “women’s rights”, while “on the right” it aggregates with “culture of life”. There must be respect for these choices, and there must be ways to keep these aggregations (“redirects” in wiki-speak) under the control of the user, or a user-chosen, user-trusted, agent. At the highest level of abstraction, I’d simply choose metaphors I wished to reinforce or move towards, and those I wished to abandon, and let aggregation occur as a function of those choices. At least, it might decide which of a long list of hits to drop off, or set some ordering choices. That would be no more insidious than what google is now doing, for its own reasons (not mine).
This point raises a number of technical issues about our approach. They are worth discussing, but realistically, we don’t yet know how we’re going to handle them, so any response at this point is speculative. But hey, speculation is fun!
First, we definitely plan to give each user the ability to make their own individual decisions about how to aggregate issues. Note, though, that this does not require them to make all the decisions themselves; we certainly plan to let them use the aggregation done by others.
Second, as far as possible we’d like the system to implicitly acquire each user’s current preferences, rather than making users “explain” what they want or why they want it. The approach we’ve adopted is to let users attach tags, and then try to learn the attributes of the content that are statistically common to the way that user uses a given tag. In some sense this should let us describe the user’s current “rule” for applying that tag. The “rule”, of course, can change gradually or abruptly over time, and we should be able to track those changes.
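To make the idea of a learned “rule” concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of the actual system: the attributes are assumed to be plain lowercase words, the statistic is assumed to be the fraction of a user’s items under a tag that contain each word, and the names `content_attributes` and `learn_rules` are hypothetical.

```python
from collections import Counter, defaultdict


def content_attributes(text):
    """Hypothetical attribute extractor: lowercase word counts.

    A real system would likely use richer attributes; this is an assumption.
    """
    return Counter(text.lower().split())


def learn_rules(tagged_items):
    """Build one profile ("rule") per (user, tag) pair.

    tagged_items is an iterable of (user, tag, text) triples.
    Each profile maps an attribute to the fraction of that user's
    items under that tag in which the attribute appears.
    """
    doc_counts = defaultdict(Counter)  # (user, tag) -> attribute -> item count
    totals = Counter()                 # (user, tag) -> number of tagged items
    for user, tag, text in tagged_items:
        key = (user, tag)
        totals[key] += 1
        # Count each attribute at most once per item.
        doc_counts[key].update(set(content_attributes(text)))
    return {
        key: {attr: n / totals[key] for attr, n in counts.items()}
        for key, counts in doc_counts.items()
    }


items = [
    ("alice", "Peak oil", "oil supply decline"),
    ("alice", "Peak oil", "oil prices rising"),
]
rules = learn_rules(items)
```

Because the profile is recomputed from whatever items are passed in, rebuilding it over a recent window of a user’s tagging is one simple way a changing “rule” could be tracked.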
Now let’s consider how to achieve Craig’s design goals within this framework. First, we very likely can find the collection of people who have similar “rules”. For example, if one person uses the tag “Peak oil” and another uses the tag “Energy crisis”, and they have tagged different collections of articles for some reason, but their implicit “rules” are very similar, we can recognize that. Our ability to group people together doesn’t depend on how they spell their tags, or which specific items they have tagged, but on the statistical similarity of their tag use.
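One way to picture “statistical similarity of tag use” is as a similarity score between two such profiles. The sketch below uses cosine similarity, which is my choice for illustration and not necessarily the measure the system will use; the profiles and their numbers are made up, and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(rule_a, rule_b):
    """Cosine similarity between two attribute-weight profiles (dicts)."""
    shared = set(rule_a) & set(rule_b)
    dot = sum(rule_a[k] * rule_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in rule_a.values()))
    norm_b = math.sqrt(sum(v * v for v in rule_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Invented profiles for three (user, tag) pairs; the tag names are never compared.
peak_oil = {"oil": 1.0, "supply": 0.5, "decline": 0.5}
energy_crisis = {"oil": 1.0, "supply": 0.5, "prices": 0.5}
seal_hunt = {"seal": 1.0, "hunt": 1.0}

close = cosine_similarity(peak_oil, energy_crisis)
far = cosine_similarity(peak_oil, seal_hunt)
```

Here `close` comes out much higher than `far`, even though “Peak oil” and “Energy crisis” are spelled differently and no tagged item is shared between the two users.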
On the other hand, we don’t currently have any plans to analyze the terms used in the tags themselves. So if one person used the tag “women’s choice” and another used the tag “baby killers” but they had very similar patterns in using these tags, we couldn’t detect that they had opposite feelings about the material. We would just see them as having similar “rules”. I think current computational linguistics doesn’t give us any way to analyze tag names accurately enough to avoid this limitation.
Because we start with the individual user, I think we can have some confidence that how things are aggregated will remain under their control. Each user determines the interpretation of their own tags. If others use tags that are spelled the same, that won’t change how your tags are interpreted. (Note that this is not at all true of existing shared tagging, which may lead to some confusion.)
The harder question in our approach is how to let users group together, share tags, and influence each other’s tagging “rules”. Because we can find users with similar tagging, we can help them to group together if they want. At this point it is less clear how to show users the “landscape” of other users with similar tagging patterns, or how best to give them control over their connections to other users in that landscape. I think once we have the user base, tagging data, and technology to work on those questions, the really interesting part will begin.
Filed by Jed Harris at 1:47 pm under Status & plans
“At this point it is less clear how to show users the “landscape” of other users with similar tagging patterns, or how best to give them control over their connections to other users in that landscape.” Someone ultimately has to recommend or recognize that “women’s choice” and “baby killers” are just two phrases representative of two divergent positions on one issue or small range of issues. The language required is what dkosopedia.com explains as the issue/position/argument structure. Positions say what “should” be, while issue statements try to avoid it. Some tags imply positions, which also imply issues but indirectly. To sort this out, dkosopedia invents a mechanism called the FrameShop which identifies “defective frames” (from the user’s or users’ group perspective) and “recalls” them to be “reframed” other ways. So seeing “baby killers”, you’d take that one to the FrameShop and emerge with “women’s choice” (if you want to fight fire with fire) or “abortion” (if you want to fight fire with water). Wikipedia wants all these things piled under “abortion”, as it gets its whole energy out of both sides being forced to edit the same article, and would fail if they could write their own.
Yes, “once we have the user base, tagging data, and technology to work on those questions, the really interesting part will begin”, but it’s already out there, and it’s already begun. You just have to look hard at what’s already out there, and what open content projects are doing with tags and categories.
The open politics (Canadian) list of all issues took a couple of years to get right, and it’s only about 2/3 elaborated even for all of the positions known and taken in Canadian politics. That’s the depth you have to get into. And you do have to consult many sources in a variety of formats and pull them all together into a common form. Just using the data “your own” users generate is going to lead you down a garden path of systemic bias from day one. Requiring specific technical expertise, i.e. having the users create this structure themselves, introduces a systematic bias of its own (note: systemic = who’s doing the job, systematic = how the job is done). To end up with a database worth reviewing, you have to start by pulling together all the categories and tags used by people you hate.
As Wikipedia calls it, “writing for the enemy”. When you are “writing AS the enemy”, that’s when you understand the user(s). Not before.