Core Platform Team/Initiative/Image Suggestion API/Open Questions
This page is obsolete and is retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. The Core Platform Team and its initiatives no longer exist. Since 2023, see the MediaWiki Engineering Group instead.
Project Organisation
- What are our definitions of done and success criteria? How are these broken down per component and aligned across teams?
- Are we one project team, or two (backend/API) that work together?
- 1 team with 2 concerns: Image Suggestion API & Data Pipeline
- How would we like to communicate with other teams?
- Do we have points of contact for the support teams?
- Do we need a RACI for the project?
- Are we missing any resources?
- No
Timelines and Scope
- Are there critical intermediate deadlines for other teams that we should be aware of?
- What is the timeline for the various parts of the project?
- Android: MVP Release by March 3
- Are there any teams whose dependencies we can decouple from?
- What can we, the platform team, stop caring about? (out of scope)
- Are the expectations clear and realistic?
- Can we deliver within the timeline?
- How do we bound this project if it is also going to be iterative?
- What are the risks?
- What constitutes scope creep?
- What internal deadlines can we set for ourselves?
- Proof of Concept target delivery date is March 3
Requirements
- Are there any eventual requirements whose deferral jeopardizes the architecture?
- What prerequisites must we satisfy before we can start a POC Task API implementation?
- Who approves the API spec?
- The Client Team(s)
- How often do we expect to re-train the model? Currently, the best we can do is once a month.
- What system / team will be responsible for tracking recommendations state?
- Can we alter the Image Rec. Algorithm so that it performs better?
- Is it proven that the image rec. algo provides "better" results than MediaSearch?
- Does the ranking system need to be part of the first iteration? (Where does it fall if the SD is no longer going to use the Task API?)
- Confidence Rating will be included as part of the Image Recommendation API proof of concept
API Service
- What language or framework should we build the API in?
- The proof of concept will be built with Node.js.
- Is the API going to be an extension or a service?
- The API will be a service.
- Is the Task API storing the data from the image rec algo + MediaSearch somewhere, or querying both in real time and then smashing the results together?
- The API will "smash" together the results of the image rec algo and MediaSearch when the results from the image rec algo are not "sufficient", for example when there are too few to satisfy the number of requested results. The API will likely query MediaSearch in real time and use intermediate storage between the image rec algo's Hadoop cluster and the Task API.
- How do we update tasks to reflect a user's actions (accepting or rejecting a task)?
- What’s meant by the Image Recommendation bot as an end user? I was under the impression the API would be used through human interaction only.
- The API will serve both end users (e.g. Android app users) and MediaWiki bots that will automatically select images (with a high enough confidence score) to be added to articles.
- What happens if a user rejects images for not being relevant? Do we update the options for the next user or remove the recommendation for improvement? Also, how are we capturing this information for the algorithm so that it doesn’t offer the same image recommendations the following month (assuming an image hasn’t been added to the page in the last month)?
- Does the POC include the requirement of "List Image Recommendations for a Given Article"?
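The answers above describe two behaviours: topping up image rec algo results with MediaSearch results when there are too few, and letting bots act only on suggestions with a high enough confidence score. A minimal sketch of both follows, in TypeScript since the proof of concept targets Node.js. Everything named here is an assumption for illustration and not part of any agreed API spec: the `Suggestion` shape, the 0 to 1 confidence scale, the function names, and the choice to treat "not sufficient" as "fewer results than requested".

```typescript
// Hypothetical result shape; the page does not define a response format.
interface Suggestion {
  image: string;                        // file name of the suggested image
  confidence: number;                   // assumed to be a rating between 0 and 1
  source: "algorithm" | "mediasearch";  // which backend produced the suggestion
}

// Fill up to `limit` results: image rec algo results first, then MediaSearch
// results whose image is not already in the list.
function mergeSuggestions(
  algorithm: Suggestion[],
  mediaSearch: Suggestion[],
  limit: number,
): Suggestion[] {
  const merged = algorithm.slice(0, limit);
  const seen = new Set(merged.map((s) => s.image));
  for (const s of mediaSearch) {
    if (merged.length >= limit) break;
    if (!seen.has(s.image)) {
      seen.add(s.image);
      merged.push(s);
    }
  }
  return merged;
}

// A bot would only act on suggestions at or above a chosen threshold.
function botEligible(suggestions: Suggestion[], minConfidence: number): Suggestion[] {
  return suggestions.filter((s) => s.confidence >= minConfidence);
}

// Example data (invented for illustration).
const fromAlgorithm: Suggestion[] = [
  { image: "A.jpg", confidence: 0.9, source: "algorithm" },
  { image: "B.jpg", confidence: 0.4, source: "algorithm" },
];
const fromMediaSearch: Suggestion[] = [
  { image: "B.jpg", confidence: 0.5, source: "mediasearch" },
  { image: "C.jpg", confidence: 0.6, source: "mediasearch" },
  { image: "D.jpg", confidence: 0.7, source: "mediasearch" },
];

const merged = mergeSuggestions(fromAlgorithm, fromMediaSearch, 3);
const forBots = botEligible(merged, 0.8);
console.log(merged.map((s) => s.image), forBots.map((s) => s.image));
```

In a real service the MediaSearch query would presumably be issued only when the image rec algo returns fewer than `limit` results, so the fallback cost is paid only when needed. The sketch takes both lists as arguments to keep the merge logic easy to test in isolation.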
Storage
- Will the Task API use Elasticsearch as a backend, or other storage (MySQL, Cassandra, etc.)?
- What storage are we using for the ETL pipeline?
- What are the performance requirements?