On Taxonomy …

The following dates from 2002, when I was working with SharePoint Portal Server 2001. We used to talk a lot about meta-data (data about the data - file properties) and Taxonomy - the method we use to classify documents. I came up with a Taxonomy Ten Commandments to make SharePoint document stores work. Looking at the way Vista uses meta-data in its search, I've been thinking about this again - and I've mentioned it in passing in a couple of presentations. A customer at last night's presentation asked if I would share the full thing.

Some of the points don't carry forward to Vista (like the question of making authors choose a profile for their documents), but the key points still matter: gather meta-data, and break the dependency on folders and file names as the way to find things.

I’ve distilled what I’ve learnt about this subject in these basic rules:

  1. Gather useful meta-data. The meta-rule is "It’s all about gathering and using meta-data".
  2. Do not add any field to a profile unless you are sure it will be useful. Authors want to write their documents, save them and get on with the next job: generally they do not want to spend time entering lots of profile fields. Remember that you need their co-operation.
  3. It is better to search for unique information in the document body than in a meta-data field. For example, getting the author to enter a serial number field allows readers to search for documents with that serial number. But in reality they will probably do a free text search for it, not a property search. The free text search has the side effect of cross-referencing documents: a search for DOC71077345 also turns up a document which contains "This document supersedes DOC71077345".
  4. Store data in the data, not in the field name. Do not create a long list of properties with yes/no answers. Not only is this awkward for users, but the sequence “Relates to product A, Relates to product B” stores only Yes and No as the data, leaving the product names in the field names where a search cannot reach them. A multi-select box “Relates to products...” stores the information where you can search it.
  5. The most important meta-data items are Title, Categories and Description: put these first in your profiles; other fields are a bonus. From a pure SPS point of view: if readers won't use it in a property search, then don't ask authors to enter it. SPS‑based applications can be exceptions to this, but generally see rule 1 and rule 2: don't burden authors with requests for information that readers won't use.
  6. Authors should ALWAYS set a meaningful title: if a document has a short abstract at the start, then authors can (and should) copy and paste it into the description. Remember that the results of a search or category browse show the document title and description. Readers won't open a document to find out if it is useful.
  7. Ensure that authors understand the importance of title and description. Yes, we know people don't want to fill in properties in Word, but remember that SharePoint gives extra weight to words it finds in the title and description – a document with these filled in will come nearer the top in searches where it is relevant.
  8. Where authors have to make choices make them easy:
    a. Don't make lots of similar profiles. For example, if you deal with software specifications, don’t create profiles called User Interface Spec, Database Spec, Search Spec, etc.; instead use "Specification" with an "area" field offering these choices.
    b. Don't make very fine-grained categories. For example, I use "Windows" and not Windows 95, Windows 98, Windows NT Server, Windows NT Workstation, Windows 2000 Pro... etc. A document for Windows 2000 might well apply to XP, and to Pro, Server etc. Authors would be unlikely to file their documents correctly and the categories would go out of date. I use a “product version” field to qualify them.
  9. Remember folders are for the benefit of authors and administrators, not for readers. You only need to create new folders for different security, different approval, and different document profiles. You may want a few other folders for authors' convenience. If you find you've got lots of folders, you're probably using the folder path to imply something about the content of the document which should be in the meta-data.
  10. Readers will find documents by browsing categories or by searching. If readers are exposed to your folder hierarchy you are doing it wrong!
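
To make rules 3 and 4 concrete, here is a minimal sketch in Python. It is not SharePoint code: the `Document` class, the two search functions and the sample documents are all made up for illustration, and the "search" is a plain substring match. It only shows why yes/no fields hide the product name from a property search, and how a free text search picks up cross-references.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a stored document and its profile."""
    title: str
    body: str
    # Rule 4, awkward form: one yes/no property per product.
    flags: dict = field(default_factory=dict)
    # Rule 4, preferred form: one multi-select property holding product names.
    products: set = field(default_factory=set)


def search_property(docs, term):
    """Return titles of documents whose stored property VALUES contain term."""
    term = term.lower()
    hits = []
    for doc in docs:
        # Only the values are searched; field names are not data.
        values = [str(v) for v in doc.flags.values()] + sorted(doc.products)
        if any(term in value.lower() for value in values):
            hits.append(doc.title)
    return hits


def search_body(docs, term):
    """Return titles of documents whose body text contains term (rule 3)."""
    term = term.lower()
    return [doc.title for doc in docs if term in doc.body.lower()]


docs = [
    Document(
        title="Install guide (yes/no fields)",
        body="Document DOC71077345. How to install product A.",
        flags={"Relates to product A": True, "Relates to product B": False},
    ),
    Document(
        title="Install guide v2 (multi-select field)",
        body="This document supersedes DOC71077345.",
        products={"Product A"},
    ),
]

# The yes/no profile stores only True/False, so only the second document matches.
property_hits = search_property(docs, "product a")

# The free text search finds the original and the document that supersedes it.
body_hits = search_body(docs, "DOC71077345")

print(property_hits)
print(body_hits)
```

In this toy setup the property search for "product a" finds only the document with the multi-select field, because the other one has nothing but True and False stored as values. The body search for the serial number returns both documents, which is the cross-referencing side effect described in rule 3.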