Wikipedia:Wikipedia Signpost/2010-09-13/Public Policy Initiative
Experiments with article assessment
User:Sross (Public Policy) is Sage Ross, the Online Facilitator for the Wikimedia Foundation's Public Policy Initiative. As a volunteer, he edits as User:Ragesoss.
I've been working on Wikimedia's Public Policy Initiative team for a little over three months. The level of interest and enthusiasm we've seen from university professors and volunteers interested in the Wikipedia Ambassador Program has been gratifying, but we still have a long way to go before coming anywhere close to realizing the full potential of all the goodwill and interest among experts who don't (yet) contribute.
One of the great challenges of this project is assessment: how can we measure the degree to which the project is improving Wikipedia? We're working on three assessment projects within WikiProject United States Public Policy, each of which is relevant to the broader issue of content assessment in general on Wikipedia.
An optional new assessment system
The first is our quality assessment system (WP:USPP/ASSESS). Like many other WikiProjects, the U.S. Public Policy project has implemented its own variation on the standard Wikipedia 1.0 assessment system (in which articles are rated as Stub, Start, C, B, GA, A, or FA-class). The basic idea of the new system is to use weighted numerical ratings for six different aspects of article quality: comprehensiveness, sourcing, neutrality, readability, formatting, and illustrations. The system's rubric defines the different scores and how they translate into the standard Wikipedia 1.0 classes. There are several advantages: (1) it contains a specific weighted rubric, (2) it offers more detail on the areas that need work, (3) it provides numerical data for quantitative analysis, and (4) it is backward-compatible with the standard system. We hope it will also prove easier to learn and produce more consistent ratings. The downside is that it's more complicated, and we have yet to reach a critical mass of active reviewers trialing it.
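The rubric itself, with the real weights and score definitions, lives at WP:USPP/ASSESS. Purely as an illustration of the mechanics described above, here is a minimal sketch in Python; the weights, per-aspect score range, and class cut-offs below are all invented for the example, not the project's actual values:

    # Sketch of a weighted rubric score. All numbers here are assumptions
    # made up for illustration; the real ones are defined at WP:USPP/ASSESS.
    WEIGHTS = {
        "comprehensiveness": 3,
        "sourcing": 3,
        "neutrality": 2,
        "readability": 2,
        "formatting": 1,
        "illustrations": 1,
    }

    # Hypothetical cut-offs mapping a weighted total onto 1.0 classes.
    # (In practice GA, A, and FA also involve community review processes.)
    THRESHOLDS = [(30, "FA"), (24, "A"), (18, "B"), (12, "C"), (6, "Start"), (0, "Stub")]

    def rubric_class(scores):
        """Combine per-aspect scores (assumed 0-3 each) into a weighted
        total, then map the total onto a standard 1.0 class."""
        total = sum(WEIGHTS[aspect] * value for aspect, value in scores.items())
        for cutoff, cls in THRESHOLDS:
            if total >= cutoff:
                return total, cls
        return total, "Stub"

    sample = {"comprehensiveness": 2, "sourcing": 2, "neutrality": 3,
              "readability": 2, "formatting": 3, "illustrations": 1}
    print(rubric_class(sample))  # (26, 'A') under these assumed numbers

The point of the numbers for quantitative analysis is that the per-aspect scores survive: where the 1.0 scale records only "B-class", a rubric like this records why an article is B-class and which aspects are holding it back.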
The Wikipedia 1.0 scheme, which was originally pioneered by WikiProject Chemistry, succeeds to a large degree because of its simplicity. Experienced Wikipedians develop a good feel for the stages of improvement articles typically go through, and the 1.0 scale codifies those stages. It provides a quick way to mark the quality of individual articles and a blunt measurement of how quality is changing over large groups of articles, and even across the whole of Wikipedia. However, the system is not easy or intuitive for newcomers to pick up. Although simple from an experienced editor's perspective, the system has nuanced definitions of what, for example, makes a B-class article different from a C-class article or a Good Article; these definitions can be bewildering for those who haven't absorbed Wikipedia's norms. Like our core policies and guidelines, the 1.0 assessment system squeezes a lot of Wikipedia culture into a small package. The goal of the public policy system is to unpack that culture, making more explicit what Wikipedians expect from high-quality articles. We believe this explicitness may reduce some of the inconsistency in the 1.0 system, as well.
Rating the ratings
A second and closely related effort is the plan by our research analyst, Amy Roth, to test how consistent Wikipedia's article ratings are. We are assembling a small team—a mixture of Wikipedians and non-Wikipedian public policy experts—to periodically rate and re-rate a random sample of public policy articles. Amy will measure how closely results from our system match the standard ratings, how much ratings vary from person to person, how well the ratings can account for changes in article quality, and whether outside experts' assessments differ significantly from those of Wikipedians. Amy's test may shed light on the inconsistency of assessments in the middle ranges of the standard scale, particularly Start, C, and B-class.
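As a concrete (and deliberately simplified) illustration of the kind of question this test asks, here is a sketch of one basic consistency check: how widely do several raters' scores for the same article spread? The data and the choice of statistic are assumptions for the example, not Amy's actual methodology:

    # Hypothetical data: numerical rubric scores that three raters gave
    # the same articles. The real study design is Amy's, not shown here.
    from statistics import mean, stdev

    ratings = {
        "Article A": [18, 20, 17],
        "Article B": [25, 31, 22],   # raters disagree noticeably here
        "Article C": [12, 12, 14],
    }

    for title, scores in ratings.items():
        # A large standard deviation relative to the scale suggests the
        # rating instrument is being read differently by different raters.
        print(f"{title}: mean={mean(scores):.1f}, spread={stdev(scores):.1f}")

If articles in the middle of the scale show systematically larger spreads than those near the ends, that would support the suspicion about inconsistency in the Start-to-B range.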
Recruiting for the assessment team has gone poorly so far, but we plan to run a watchlist notice to draw more attention to the assessment efforts (and potentially to enlarge the group of Online Ambassadors, keeping pace with the expanding number of students who will be participating in Wikipedia assignments).
Input from readers
The Public Policy Initiative will test a new Article Feedback Tool. Beginning 22 September, the feature will be enabled for most of the articles within WikiProject United States Public Policy (it will not be enabled on the most trafficked articles to avoid overtaxing the servers). Editors interested in seeing the extension in action on particular U.S. public-policy-related articles should ensure the articles are tagged with the project banner, {{WikiProject United States Public Policy}}, and assessed with the WikiProject's numerical system.
This pilot is also part of the Wikimedia Foundation's longer-term strategy to explore different mechanisms of quality assessment. The potential upside of reader ratings is straightforward: we may be able to gather a large number of ratings, with a largely external audience judging quality rather than Wikipedians judging their own work. The potential downside is also clear: non-experts may submit low-quality ratings, or there may be attempts to game the system. The rating tool includes a small survey that will complement the collected rating data.
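To make the gaming concern concrete, here is one textbook countermeasure a reader-rating system could use: a trimmed mean, which discards the most extreme submissions before averaging. This is an assumed illustration, not the Article Feedback Tool's documented behaviour:

    # A trimmed mean as a simple defence against a burst of bad-faith
    # ratings. Illustrative only; not the Article Feedback Tool's method.
    def trimmed_mean(ratings, trim_fraction=0.2):
        """Average ratings after discarding the top and bottom
        trim_fraction of submissions."""
        ordered = sorted(ratings)
        k = int(len(ordered) * trim_fraction)
        kept = ordered[k:len(ordered) - k] if k else ordered
        return sum(kept) / len(kept)

    # 1-5 star reader ratings, including a suspicious run of 1s.
    sample = [4, 5, 4, 3, 4, 5, 1, 1, 1, 4]
    print(round(sum(sample) / len(sample), 2))   # 3.2  (raw mean)
    print(round(trimmed_mean(sample), 2))        # 3.33 (extremes trimmed)

The survey bundled with the tool could, in principle, supply extra context for weighting ratings more intelligently than a blind trim like this.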
Together with the technology team, we will test the tool, analyze the data, and continue discussions about how a reader-focused rating and comment system might be used in the Public Policy Initiative during the next academic term, as well as on Wikipedia more broadly. I'm personally very excited about the possibility of creating a robust system for reader feedback, and I hope this test sparks serious discussion about what such a system should look like. A set of Questions and Answers regarding the feedback tool, as well as a general discussion page about it, will be available soon.
If you're interested in any of these assessment experiments, please join WikiProject United States Public Policy, or sign up for Amy's assessment testing team.
Discuss this story
I don't think it makes sense to have a complicated system of assessment. An editor who is experienced in an area can eyeball an article and tell you whether it is Start, or C or B. As someone said above, every article is a moving target anyway, so why worry so much about assessments? So what if a C-class article gets grade-inflated to B: it still needs careful work to be ready for GA. IMO, editors spending all this time on assessment ought to be researching and writing instead. -- Ssilvers (talk) 21:15, 14 September 2010 (UTC)