The SelectionSifter helps anyone choose which wiki articles to collect into a selection. (This was a GSoC 2011 project.)
How to interpret estimates
The given values are lower bounds: multiply by 2 for an average estimate, or by 3 for an upper bound. Overshooting timelines is to be expected; estimates will be adjusted as the project moves along.
- Is a batch-processing-style bot
- Written in PHP
- Backwards compatible with the assessment templates currently in use
- Should be 'good enough' to be deployed on enwiki
- Feature Parity with WP1.0 Bot
- Assessment Data Collector
- Update Assessment Data whenever it is changed
- Log changes to assessments
- Import initial data from current Bot
- Querying interface (Assessment Statistics + Articles List)
- Arbitrary Querying of assessment data
- Embedding of arbitrary query results in different forms inside wiki articles (Statistical Table embedding)
- Creating, managing and exporting 'interim collections'
- Usage of Extension:Collections TBD
Component 1: Assessment Data Collector
Machine Readable Assessments
After talking with User:CBM and User:Awjrichards, we hit on a much better way of doing assessments. Since most assessment templates use the w:en:Template:WPBannerMeta, we can modify that template to provide machine readable assessment data that can be then read by the assessment parser. This eliminates the need to maintain wikiprojects separately.
Representing Assessment Info
The approach I favor is to modify the template to insert extra attributes (data-* attributes) on the link pointing to the WikiProject home page. A data-wp-quality attribute placed on that link would denote the importance and quality assessment for that particular article from that particular WikiProject, and a class would be added to the a tag to denote that it represents a WikiProject assessment. This is in line with the POSH principle from Microformats, but without too much abuse of class. It puts the assessments in machine-readable form right in the HTML.
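As a sketch, the rendered banner link might look like the following. Only the data-wp-quality attribute comes from the proposal above; the class name "wp-assessment" and the exact attribute value format are illustrative assumptions, not settled names.

```html
<!-- Hypothetical output of a modified assessment banner template.
     "wp-assessment" is a placeholder class name. -->
<a href="/wiki/Wikipedia:WikiProject_Chemistry"
   class="wp-assessment"
   data-wp-quality="B">WikiProject Chemistry</a>
```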
Parsing out Assessment Info into Database
After each edit, we could either:
- Parse out the HTML (after it's generated) and pick out the assessment data
- Put an entry into the job queue, which executes code to pick out the assessment data
We then update the database if the info has changed, and record pertinent information in the log (user, timestamp, revision, etc.).
- Is it okay to use data-* attributes on WMF properties? There are no browser-compatibility issues, but I would still like this clarified.
- Is the metatemplate good enough to actually insert these data-* attributes properly? I tried reading it (three times!) and got a headache. Need to contact User:MSGJ.
- We'll be parsing HTML to get data out. Is this considered dirty and sinful? Will I be punished by the WMF cabal? This is perhaps the most important issue.
- Parse out right after edit, or put in queue? Needs performance testing.
- How do I parse the HTML? OutputPage doesn't build a DOM AFAIK, and I'd like to avoid reparsing if possible. External library?
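The "pick out the assessment data" step could be sketched as follows, assuming the data-wp-quality attribute described above and an external DOM parse via PHP's built-in DOMDocument. The function name is illustrative, not part of any existing codebase.

```php
<?php
// Sketch: extract WikiProject assessment data from rendered HTML,
// assuming the data-wp-quality attribute proposed above.
function extractAssessments( $html ) {
	$doc = new DOMDocument();
	// Suppress libxml warnings about loose, HTML5-ish markup.
	@$doc->loadHTML( $html );
	$assessments = array();
	foreach ( $doc->getElementsByTagName( 'a' ) as $link ) {
		if ( $link->hasAttribute( 'data-wp-quality' ) ) {
			$assessments[] = array(
				'project' => $link->getAttribute( 'href' ),
				'quality' => $link->getAttribute( 'data-wp-quality' ),
			);
		}
	}
	return $assessments;
}

$html = '<p><a href="/wiki/Wikipedia:WikiProject_Chemistry"'
	. ' data-wp-quality="B">WikiProject Chemistry</a></p>';
print_r( extractAssessments( $html ) );
```

Whether this runs on the freshly generated OutputPage HTML or on a job-queue item is exactly the open performance question above; the extraction logic itself is the same either way.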
Logs of assessment changes, recorded every time an assessment is changed.
- Develop logging model, with DA code (2 hours)
- Write a Special Page extension to view/filter the log. Filter By: (14 hours)
- Time of Change
- Type of Change (Importance/Quality/Other)
- User making change
- Direction of Change (Improve/Deteriorate)
- Category/Project of article change is made to
- Article name
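To make the filter list above concrete, here is a minimal sketch of a log-entry shape and a matching filter. All field names are assumptions pending the actual logging model from the first task.

```php
<?php
// Hypothetical log-entry shape for assessment changes; every field
// name here is a placeholder for the real logging model.
$log = array(
	array(
		'timestamp' => '2011-06-01 12:00:00',
		'type'      => 'quality',             // quality / importance / other
		'user'      => 'ExampleUser',
		'direction' => 'improve',             // improve / deteriorate
		'project'   => 'WikiProject Chemistry',
		'article'   => 'Benzene',
	),
);

// Keep only entries matching every given field => value criterion,
// mirroring the filters the Special Page would expose.
function filterLog( $log, $criteria ) {
	return array_values( array_filter( $log, function ( $entry ) use ( $criteria ) {
		foreach ( $criteria as $field => $value ) {
			if ( !isset( $entry[$field] ) || $entry[$field] !== $value ) {
				return false;
			}
		}
		return true;
	} ) );
}

print_r( filterLog( $log, array( 'type' => 'quality' ) ) );
```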
Component 2: Querying and Embedding
A set of core components that can execute arbitrary queries, producing both statistics and article lists.
- Build a basic querying engine that can be extended in the future over other assessment backends (not just WikiProject-based assessments), with abstract, well-defined interfaces. The list of supported query operations would closely mirror that of LINQ. (est: 12 hours)
- Implement the querying engine for the WikiProject based assessments (Component #1) (est: 12 hours)
- Implement a specific statistical engine for WikiProject-based assessments. Support for overall and per-project tables (est: 12 hours)
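A rough sketch of what the backend-agnostic querying interface might look like, with a trivial array-backed implementation standing in for the WikiProject backend. All names are placeholders, and the real LINQ-like operation set would be larger than the two shown here.

```php
<?php
// Placeholder interface for the backend-agnostic query engine.
interface AssessmentQuery {
	public function where( $predicate );   // filter rows
	public function select( $projection ); // project each row
	public function toArray();             // materialize results
}

// Trivial in-memory implementation, for illustration only.
class ArrayAssessmentQuery implements AssessmentQuery {
	private $rows;

	public function __construct( $rows ) {
		$this->rows = $rows;
	}

	public function where( $predicate ) {
		return new self( array_values( array_filter( $this->rows, $predicate ) ) );
	}

	public function select( $projection ) {
		return new self( array_map( $projection, $this->rows ) );
	}

	public function toArray() {
		return $this->rows;
	}
}
```

A real backend would implement the same interface but translate the chained operations into database queries instead of filtering in memory.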
User Interface to interactively query the assessments - both overall statistics and article lists.
- Expose the query engine via a Special Page (est: 12 hours design + 12 hours implementation)
- Expose the statistical engine via a Special Page (est: 12 hours design + 12 hours implementation).
Magic words (or similar) that let you embed customizable statistical tables inside wiki pages.
- Build magic words to embed statistical tables/results in wikipages (est: 6 hours)
- Build magic words to embed query results in wikipages (est: 8 hours)
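The wiring for such a magic word could follow the standard MediaWiki parser-function convention, roughly as below. This is extension glue, runnable only inside MediaWiki; the magic word name "assessmentstats" and both function names are placeholder assumptions, not an existing implementation.

```php
<?php
// Sketch of registering a parser function so that
// {{#assessmentstats:...}} embeds a statistics table.
$wgHooks['ParserFirstCallInit'][] = 'efAssessmentSetupParserFunction';

function efAssessmentSetupParserFunction( $parser ) {
	$parser->setFunctionHook( 'assessmentstats', 'efRenderAssessmentStats' );
	return true;
}

function efRenderAssessmentStats( $parser, $project = '' ) {
	// Real code would run the statistical engine for $project and
	// build a wikitable; this placeholder just returns wikitext.
	return "Assessment statistics for $project would be rendered here.";
}
```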