April 8, 2006
Reply to Craig Hubley (1)
Craig Hubley wrote a long comment on our initial “Brief status” post, and I’m going to write several posts responding to his specific points. His first point was:
Any means to “learn individual tagging style” implies use of past tagging decisions, and respect for those - which suggests that the algorithm’s recommendations (similar to Amazon’s, I’d presume) constantly pull the user towards their past style, and could inhibit them learning new styles or even adopting new tags. Treating tagging patterns as a deliberate choice, or “style”, suggests too strongly that they are informed decisions as opposed to evolving habits. So I’d change this language and “track individual tagging habits” rather than “learn” a “style” - avoid implying that something’s worth learning, or was chosen deliberately, or has anything to do with user individuality. I think tagging is new enough and hard enough to master, that any user who wants to make good use of new tools, will be learning new habits, discarding old ones, and will rarely want simply to continue their current “style” and have software “learn” it. I suspect it’s far more likely that they’ll want to pick one or more “masters” doing exemplary tagging and prefer to follow the tag styles they recommend.
This raises a number of intertwined issues.
First, Craig is absolutely correct that each user’s tagging will change over time. In addition, the stream of content the user is tagging will change over time. We have to be careful not to nail down any decisions based on a user’s prior actions. (I’m avoiding the word “choices” here, since we don’t really know how much the user made a “choice”.) This is a crucial difference between rules and statistical learning — rules require prior thought and tend to be hard to change. Classifiers created by statistical learning are softer and track current actions, so as the user and/or the content changes, the classifier will change.
Second, many of the questions Craig is raising are ones we want to explore empirically. How stable a user’s preferences are, how happy users are with the classifier judgements, etc. are very much questions we want to test. Within broad limits we can tune the learning to match user preferences (how fast it forgets, how sharp the thresholds are, etc.)
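To make the two tuning knobs concrete, here is a minimal sketch, not the classifier described in this post. It assumes a much simpler model in which each tag a user applies gets a score that fades as newer tagging actions arrive. The class name `DecayingTagModel` and the parameters `decay` and `threshold` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class DecayingTagModel:
    """Toy per-user model: tag scores fade as newer tagging actions arrive."""

    def __init__(self, decay=0.9, threshold=0.5):
        self.decay = decay          # hypothetical knob: closer to 1.0 forgets more slowly
        self.threshold = threshold  # hypothetical knob: minimum score needed to suggest a tag
        self.scores = defaultdict(float)

    def observe(self, tag):
        # Fade every existing score, then reinforce the tag just applied.
        for known in self.scores:
            self.scores[known] *= self.decay
        self.scores[tag] += 1.0 - self.decay

    def suggestions(self):
        # Tags whose current score clears the threshold, in alphabetical order.
        return sorted(t for t, s in self.scores.items() if s >= self.threshold)
```

With `decay=0.5`, a user who applies one tag three times and then switches to another tag three times ends up with only the newer tag suggested. Raising `decay` toward 1.0 keeps the older habit around for longer, and raising `threshold` makes the model more reluctant to suggest anything at all.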
Finally, Craig’s point about taking advantage of other users’ exemplary tagging is potentially very interesting. We have to get basic tagging working, but our next step after that is to let users benefit from tagging decisions by others. There are many ways to do this. The easiest (for us) is to let users see the content from another’s point of view, presumably that of the sort of “master” Craig mentions. We can do this as soon as tagging is working. A more complex option is to figure out which users are “near” each other based on similar tagging patterns. This could automatically let new users get the benefit of other users’ more experienced tagging.
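One simple way to picture “nearness” is set overlap. The sketch below assumes each user is represented only by the set of tags they have used and measures overlap with Jaccard similarity. Both the representation and the function names `jaccard` and `nearest_users` are assumptions made for illustration. A real system might compare (item, tag) pairs or classifier outputs instead.

```python
def jaccard(a, b):
    """Share of tags two users have in common, from 0.0 (none) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_users(target, tags_by_user, k=3):
    """Return up to k other users whose tag sets overlap most with the target's."""
    scored = [
        (jaccard(tags_by_user[target], tags), user)
        for user, tags in tags_by_user.items()
        if user != target
    ]
    # Highest similarity first; ties are broken alphabetically by user name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for score, user in scored[:k] if score > 0]
```

For example, given a new user with tags `{"python", "web"}`, a user with `{"python", "web", "css"}` ranks ahead of one with only `{"python"}`, and a user with no tags in common is left out entirely.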
Again, all these more advanced options need empirical testing and user feedback. I’m sure in the process we’ll learn a lot about both what is technically feasible (and cost effective) and, more important, how users experience the system and what they find appealing.
Filed by Jed Harris at 12:21 pm under Status & plans
“(I’m avoiding the word “choices” here, since we don’t really know how much the user made a “choice”.)” See prior posting about ‘don’t know’ and ‘don’t care’ states, which are poorly handled in all ‘standard’ database technologies I know of.
“This is a crucial difference between rules and statistical learning — rules require prior thought and tend to be hard to change. Classifiers created by statistical learning are softer and track current actions, so as the user and/or the content changes, the classifier will change.” However the classifier is using some technique, even if a very general one (singular value decomposition for instance, which goes out of patent I think very soon if it isn’t already - look up Bellcore stuff).
That technique itself has abstractions defined within it, and it’s those I’m concerned with on the API and protocol level, which ultimately affects the data storage. It gains nothing to pretend that things that aren’t flexible, are, or not to meet constraints that the system itself is already constrained to meet. For instance, Standard Time is an abstraction one can abhor, but it exists and is ubiquitous in the data you’ll be reading and tagging. Toronto is a different entity for the purposes of tourism (in which case it might include Niagara Falls for instance) and has changed through time (as noted in a prior post), but Toronto Ward 3 is a specific place at any given point in time, with a specific list of Canadian postal codes and addresses in it, and one Councillor to represent it. When tagging items from Toronto City Hall, you gain nothing by allowing flexibility in references to these things, you simply introduce unnecessary errors and reduce overall trust in tags and categories. So in addition to the technique’s abstractions one also needs to respect the abstractions in the data that is tagged, at least where those are ubiquitous and perhaps defined in some contexts absolutely (such as political boundaries in legal contexts).
It’s fine “to explore empirically” but if you want to explore user error empirically, you’re doing basic cognitive science. To explore technical design parameters, like “how stable a user’s preferences are, how happy users are with the classifier judgements, etc.” requires ruling out all those more basic and perhaps more interesting problems of perceptions of things that are already absolutely defined in the data that one is tagging.
“Within broad limits we can tune the learning to match user preferences (how fast it forgets, how sharp the thresholds are, etc.)” but this requires another abstraction, that of the task. For political purposes I’d like to forget what I know about a government the day it’s voted out of office, for business purposes I have to forget what I know about a business when it’s acquired or merged or bankrupted, but that doesn’t mean I want the same kind of sharp forgetting applying in every context… this is perhaps the most difficult place where absolute pre-defined conceptions have to compete with user-defined ones. Certainly countries sometimes take pains to reassure neighbours that a regime change doesn’t mean they’re going to behave any differently, i.e. more erratically, and businesses take pains to reassure customers of an acquired business that they will be equally well served by the new regime, but that doesn’t mean a user should not be able to make their own judgements about this or be reminded to make them, or notified of reasons to obsolete a whole pile of tags. Or trust relationships:
Categories themselves are just ways of “taking advantage of other users’ exemplary tagging”. To “let users benefit from tagging decisions by others” is the main reason to want such a thing as a tagging system, else we’d all just use bookmarks.
Just “to let users see the content from another’s point of view — presumably the sort of “master” Craig mentions” and accept all the tags of that master as true, would be to use a single category scheme, that applied by that person. One might for instance pay someone to go tag up a big chunk of the net, for a period of time, and might simply choose to mirror the tags that the expert placed on things during that period of time for that purpose (which again speaks to the need to keep time periods and tag purposes/tasks clear in the API and protocol and the data).
Trying “to figure out what users are “near” each other based on similar tagging patterns… could automatically let new users get the benefit of other users’ more experienced tagging” would also allow for emergent leadership, all of which is desirable… but it assumes that you’ve actually got multiple people tagging. Amazon.com does this very well right now with its lists, recommendations and reviews: you can determine rather quickly whose reading interests are a lot like yours, and quickly find books they’ve put on lists that they have named in ways that appeal to you. That’s probably the best model to look at, and to try to generalize, for tag purposes.
An implementation note that arises from all this is that having just the order in which tags are applied, not the absolute time, may be enough. If tags are numbered per user, monotonically increasing, so I have tag 1, 2, 3, … 34343 etc., then several benefits accrue:
1. a 24 or 32 bit integer is more than enough to store a whole life of tagging decisions, even 8 bits would be enough for most people (look at their bookmarks - do they have more than 256?)
2. algorithms can easily tell what percentage of the user’s tagging behaviour they are observing, simply by seeing which numbers are not in the view of the database that they can see
3. log data from servers or reflectors or kept separately on the user’s terminal can be used to match up the tag number with the actual time, and the rest of the context including perhaps other things in their web browser history at the same time, out-links they followed, etc - giving a view much like spyware if the user agrees to share that ordinal
4. if the user does agree, it’s extremely easy to tell active from inactive taggers, or those who use only one server to tag things… you’ll see fewer dropouts. Servers would know if 30% or 40% of the user’s tagging is being done via that interface.
5. nabbing temporal frames of user tagging behaviour is dirt simple, and discourages arbitrary use of dates as the markers
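Points 2 and 5 can be sketched in a few lines. The sketch assumes ordinals are positive integers starting at 1 and that a server sees only some of them. The function names `coverage` and `frame` and the `(ordinal, tag)` event shape are hypothetical. Note that the highest visible ordinal is only a lower bound on how many tags the user has applied, so the fraction returned is an upper-bound estimate.

```python
def coverage(visible_ordinals):
    """Estimate what share of a user's tagging this view contains.

    Returns (fraction, missing), where missing lists the ordinals
    below the highest visible one that this view cannot see.
    """
    seen = set(visible_ordinals)
    if not seen:
        return 0.0, []
    highest = max(seen)
    missing = [n for n in range(1, highest + 1) if n not in seen]
    return len(seen) / highest, missing


def frame(events, first, last):
    """Select the (ordinal, tag) events whose ordinal lies in [first, last]."""
    return [event for event in events if first <= event[0] <= last]
```

A server that sees ordinals 1, 2, 5, 6 and 10 would conclude it is observing at most half of that user’s tagging, with 3, 4, 7, 8 and 9 done elsewhere. A temporal frame is just a range of ordinals, with no dates involved.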