User talk:Brent Gulanowski/Categorization

OK, so, here are some specific questions:
I'm sorry if this sounds hostile or dismissive, but I've been reading categorization proposals till my face turns blue. Most propose the following features:
Some also have these features:
I guess I'm wondering if you could pull this together into a meta:Categorization requirements article or something. I think we could use that to determine if concrete implementation proposals actually meet the requirements. --ESP 21:49, 17 Dec 2003 (UTC)
Here are a few more thoughts I had as I read your good proposal:
1. Any existing drill-down searching does not work well, if at all. That method is an exceptionally powerful tool when it exists. This is a needed focus.
2. Portals and category lists are difficult to find.
3. Using simpler terms for categories will help less knowledgeable readers find what they need.
4. The problem with combining category and content information is that it tends to add confusion to the whole process. If the category name is not clear, that is exactly the place for a disambiguation link.
It appears to me that the majority of your thoughts are right on point. I very much appreciate improvements in this area, as they are sorely needed. In fact, the home page should be where all the portals and categorization begin, as the starting point for drill-down searching. - KitchM (talk) 16:50, 28 July 2011 (UTC)

Addenda to the Proposal

Category:All
All articles belong to the "all" category by definition. The main page, if it is the "category:all" page (which I implicitly assume in the article), is the summary article for the "all" category, as opposed to the member list for the "all" category. A third aspect of a category could be its meta-data, which is variable and not related to the categorization scheme. The purpose of meta-data is ergonomic. The meta-data is everything else you'd want on a page for a category (I don't know how else to define it): the sidebar, the control panel (search, user links, etc.). The main page is possibly unique in light of its additional content (including what could be called meta-meta-data, if that term were not so awful and nearly meaningless), so I deliberately did not dwell on it.

Separation of Concept from Implementation
How pages are tagged should be irrelevant at the conceptual and design stages. I personally have no stake in that.
What is more important is the logical process which leads to the definition of a category, which is why I am emphasizing an algorithm, and why I cannot over-emphasize it (from my position). Whether the wiki has two articles or 200K should not matter to the algorithm, although I accept that it matters to a real implementation. It is possible to add constraints which make the algorithm more complicated but also more flexible, without changing its essential nature.

Default Categories
For example, one could define a set of default categories that are easy to generate automatically. The most obvious one is alphabetical ordering, but there could be hundreds of similar ones based on sorting (by date created, length, whatever). You don't have to use sorting and partitioning, either; that is just the easiest to implement or think about. (Partitioning, by the way, leads to a nice uniform tree of categories.)

(It would also be possible to define a set of default categories using links, simplified by discarding any link which creates a loop. Either mark each page as categorized (if you want a tree), or not (if you want a DAG). A category could be defined as the set of pages reached from the first page in n links, or it could be the first n pages (assuming some kind of ordering of links). Link (or page) n+1 would be taken as the first in a new category. This is a much more complicated way to auto-generate a default set of categories. It would be interesting to try. Chances are it would not be much more useful than an alphabetical sorting for human use.)

Creation of New Categories
Whether one starts with all articles uncategorized or with all articles in system-generated default categories, the next step is to start producing useful categories. This happens one page at a time. Ideally you only allow one page at a time, although letting users loose on the system might prove problematic; I leave that to the synchronization specialists.
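Both ways of auto-generating default categories described above can be sketched in a few lines. This is a hypothetical illustration, not part of the proposal: the function names, the `links` dictionary (page title mapped to an ordered list of linked titles) and the choice of breadth-first order as "some kind of ordering of links" are all assumptions. The link-based version is the tree variant, where each page is categorized once.

```python
from collections import deque


def partition_by_sort(titles, size, key=str.casefold):
    """Default categories by sorting and partitioning (hypothetical helper):
    sort the titles, then cut the sorted list into consecutive chunks of
    `size` pages each; each chunk is one default category."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ordered = sorted(titles, key=key)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def partition_by_links(links, start, size):
    """Default categories from links, tree variant (hypothetical helper).

    Walks breadth-first from `start` and marks each page as categorized the
    first time it is reached, so links to already-categorized pages (which
    includes every link that would close a loop) are discarded. Each run of
    `size` visited pages becomes one category; page size+1 starts the next.
    Pages not reachable from `start` are left uncategorized."""
    if size < 1:
        raise ValueError("size must be at least 1")
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # not yet categorized: keep this link
                seen.add(target)
                queue.append(target)
    return [order[i:i + size] for i in range(0, len(order), size)]
```

With ten pages per category, 200K titles passed to `partition_by_sort` come back as 20K chunks, matching the estimate made later on this page. Keeping the DAG variant instead would mean not marking pages as seen, which then needs a separate guard against loops.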
But say you have no default categories and all 200K articles are just sitting in "all" or even "none". By the algorithm as it stands, we immediately have to define a special "category" category. Whether this is safe under axiomatic set theory is something we might need a specialist to help us with. I don't know the implications of self-reference in set theory, i.e., can a set contain itself? Probably unimportant here. It might, however, be important to decide whether a category is a purely logical entity or merely a page in the "category" category. I'd say the former, since pages are in some sense generated as needed and are distinct from the articles themselves, correct?

Example of Creating a New Category
Given the "all" category, the "category" category, and two articles, one about each of these categories (named, for now, "category:all" and "category:category"), both articles are immediately members of both categories. Thus, a member list page generated for either category will include links to both. Category member list pages are not ur-entities (that is, they are not eligible for category membership themselves), only snapshots of the system at a moment in time. (Set theorists might take issue with this.) We are finally ready to add a new page to the category system. Let's say we have an article entitled "set theory" (or "backgammon" if you prefer). The article "set theory" ("backgammon"), which in set-theoretic terms is an ur-element, is added first to the category "all". At this point the algorithm provides two options:
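The starting state in the example above (two categories, two summary articles, then "set theory" added to "all") can be modelled as a small sketch. The `CategorySystem` class is hypothetical, and so is the rule that a title starting with "category:" marks a summary article that also belongs to the "category" category; the page does not say how that membership is decided.

```python
class CategorySystem:
    """Hypothetical model of the initial state: category name -> member titles."""

    def __init__(self):
        self.members = {"all": set(), "category": set()}
        # One summary article per category, named "category:<name>".
        for name in list(self.members):
            self.add_article(f"category:{name}")

    def add_article(self, title):
        # Every article is in "all" by definition.
        self.members["all"].add(title)
        # Assumption: titles starting with "category:" are category summary
        # articles and therefore also belong to the "category" category.
        if title.startswith("category:"):
            self.members["category"].add(title)

    def member_list(self, name):
        # A member list is a snapshot (an immutable tuple), not an entity
        # that could itself be added to a category.
        return tuple(sorted(self.members[name]))


system = CategorySystem()
system.add_article("set theory")
print(system.member_list("all"))
print(system.member_list("category"))
```

Returning a tuple rather than a live object is one way to reflect the point that member list pages are snapshots, not ur-entities.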
The algorithm only lets you move one article at a time into the new sub-category. You could choose to define a language for category management tasks which allowed the movement of lists of pages, and a user interface which allowed the selection of pages in order to construct a list interactively. Whatever; that's the implementation. Regardless, the article(s) so moved are still logically part of the parent category, although it is better to call the new sub-category a partition and to represent it with a token in any list of the members of the parent category. (This is distinct from saying that the sub-category is itself a member of the parent category; we're not allowing sets of sets here, for the moment.)

Implications of Default Categories (Autogenerated)
The system should probably auto-generate an empty category summary article, or at least a token or reference. This might be different from how uncreated non-category articles are created. If the articles are initially placed in auto-generated categories, it is immaterial whether those categories are "special" (say, permanent where others are not, or temporary where others are permanent). If there are ten pages per category, then 200K articles yield 20K default categories, so it's clear that the system had better be able to handle them efficiently, but I don't see why that would be a problem. A more annoying problem with combining default category generation with category summary articles is that you end up with countless stub pages that will never be filled. It would make sense for auto-generated categories not to have auto-generated stub pages.

Prevention of Orphan Categories
What is a problem, in the case of multiple contributors defining categories, is the creation of bogus categories, but that's a management issue. The algorithm takes it for granted that contributors only define meaningful categories.
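The "partition plus token" idea above can be sketched as follows. The `Category` class and its method names are hypothetical: an article moved into a sub-category leaves the parent's direct list, the parent's listing shows a bracketed token for the sub-category instead, and logical membership is computed as the union of direct members and all partitions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Category:
    """Hypothetical category: direct members plus sub-category partitions."""
    name: str
    parent: "Category | None" = field(default=None, repr=False)
    direct: set = field(default_factory=set)      # articles listed directly here
    children: list = field(default_factory=list)  # sub-categories (partitions)

    def new_subcategory(self, name):
        child = Category(name, parent=self)
        self.children.append(child)
        return child

    def move_into(self, article, child):
        """Move exactly one article from this category into a child partition."""
        if child.parent is not self:
            raise ValueError(f"{child.name!r} is not a sub-category of {self.name!r}")
        if article not in self.direct:
            raise KeyError(article)
        self.direct.remove(article)
        child.direct.add(article)

    def all_members(self):
        """Logical membership: direct members plus every partition's members."""
        result = set(self.direct)
        for child in self.children:
            result |= child.all_members()
        return result

    def listing(self):
        """Member-list snapshot: direct articles, then one token per partition."""
        return sorted(self.direct) + [f"[{c.name}]" for c in self.children]


root = Category("all", direct={"set theory", "backgammon", "chess"})
games = root.new_subcategory("games")
root.move_into("backgammon", games)
print(root.listing())
```

The token in `listing()` stands for the partition's members; it is not a member of the parent in its own right, which is why `all_members()` never contains category names.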
What would be required is that an article which begins life in a default category has to be put into an existing real category before it can be moved into a sub-category. So, the first article has to be added to the "all" category (nothing can be added to the "category" category). A stricter variation on the algorithm would force a new article to be added to sub-categories in sequence, starting at the "all" category, to ensure no orphan categories are created. Alternatively, it is enough to require that every new category has a parent category, which can be proved (by induction) to amount to the same thing. One trusts that contributors will not simply make every new category an immediate child of "all".

Viewing Membership of a Category
Exactly how a list of members is maintained in the implementation is not especially relevant. Likewise, it is not important what the maximum number of categories per article is set at. What is important is that an article cannot be listed as a member of both a category and its sub-category: membership in any category implies membership in its parent. If an article somehow made it into the lists of both the parent and the child, it would in fact be in the parent twice, which violates the nature of a set. Keep in mind that we are using a token of some kind to represent the sub-category as a partition of the parent; that token (a pointer) literally is those members. It is important to distinguish this state of affairs from the idea of a record or structure in a programming language like C: if a struct contains a struct as a member, the members of the second struct are not members of the first struct. -- Brent Gulanowski 07:45, 18 Dec 2003 (UTC)
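The orphan rule and the "never listed in both parent and child" invariant described above can be sketched together. Everything here is hypothetical (the `CategoryTree` class, its methods, and the use of a plain dictionary for parent links). `create_category` refuses a missing parent, so by induction on creation order every category's ancestor chain ends at "all". `move_down` implements the stricter variant: an article moves one level at a time, never skipping a level.

```python
class CategoryTree:
    """Hypothetical sketch: categories form a tree rooted at "all"."""

    ROOT = "all"

    def __init__(self):
        self.parent = {self.ROOT: None}   # category -> parent category
        self.direct = {self.ROOT: set()}  # category -> articles listed directly

    def create_category(self, name, parent):
        # Refuse orphans: the parent must already exist. Since "all" exists
        # from the start and nothing is ever removed here, every category's
        # chain of parents ends at "all" (induction on creation order).
        if name in self.parent:
            raise ValueError(f"category {name!r} already exists")
        if parent not in self.parent:
            raise ValueError(f"parent {parent!r} does not exist")
        self.parent[name] = parent
        self.direct[name] = set()

    def ancestors(self, name):
        """The chain from `name` up to and including the root."""
        chain = []
        while name is not None:
            chain.append(name)
            name = self.parent[name]
        return chain

    def add_article(self, article):
        # New articles always enter at the root.
        self.direct[self.ROOT].add(article)

    def move_down(self, article, child):
        # Stricter variant: move one level at a time, from the child's
        # parent into the child.
        source = self.parent[child]
        if source is None:
            raise ValueError("cannot move an article into the root")
        if article not in self.direct[source]:
            raise ValueError(f"{article!r} is not listed directly in {source!r}")
        self.direct[source].remove(article)
        self.direct[child].add(article)

    def members(self, name):
        """Logical members: articles listed in `name` or in any descendant."""
        return {a for cat, arts in self.direct.items()
                if name in self.ancestors(cat) for a in arts}

    def double_listings(self):
        """Articles listed directly in a category and also in an ancestor."""
        bad = set()
        for cat, arts in self.direct.items():
            for anc in self.ancestors(cat)[1:]:
                bad |= arts & self.direct[anc]
        return bad
```

In a consistent tree `double_listings()` is always empty; a non-empty result would mean some article is counted twice in a parent, which is the situation the text rules out.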