Talk:Content Transform Team


Mobile Content Service Decommission

Thank you for the early announcement that the Mobile Content Service (MCS) is going away in July 2023. Currently I am using this API on an educational mobile app that allows users to learn and research history events linking to more detailed Wikipedia articles. To better understand what the alternatives to continue this service are, I would need your help and I will start with a couple of questions.


- Will the Page Content Service (PCS) be supported in the long run? The current MCS returns the page divided into sections, in JSON format, and I was wondering if such a feature exists or could be added to the PCS.

- The Parsoid API page (Parsoid/API) has a notice that the data can only be accessed through the RESTBase's REST API. Is this the PCS data-parsoid endpoint?

- I tried to access the data-parsoid data to see the format but I only get a 404 "Not found." error. Is there a way to get some help on this topic?


Thank you and looking forward to any update on the subject! Alexandru Ene (talk) 10:15, 24 March 2023 (UTC)

Thank you @Alexandru Ene for the questions and exploration. And thank you for helping people learn and research history events linking to more detailed Wikipedia articles!
We do anticipate continued support of PCS; it may be subject to fairly rapid changes, though, and it presumes its clients are native Wikipedia for Android and Wikipedia for iOS. Although we try to compose the software in a way that decouples presentation from logic, the level of coupling in practice for the user experience can result in glitchiness if, for example, there are assumptions about client-side operations on the content (e.g., native webview-mediated JavaScript execution on the content that is material to the fundamental presentation and interaction with the content).
We're not presently planning to replicate the JSON-based section content approach into PCS. However, there are <section> tags in the PCS /page/mobile-html/{title} route's output (as well as Parsoid upon which PCS depends, cf. /page/html/{title} route).
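As a rough illustration of the <section> approach, here is a minimal Python sketch that splits already-fetched HTML into top-level sections using only the standard library. It assumes each section is wrapped in a <section> element carrying a data-mw-section-id attribute; the names SectionSplitter and split_sections are hypothetical, and nested subsections are simply folded into their parent's text.

```python
from html.parser import HTMLParser


class SectionSplitter(HTMLParser):
    """Collect the text of each top-level <section> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sections = []       # one {"id": ..., "text": ...} dict per top-level section
        self._depth = 0          # current <section> nesting depth
        self._chunks = []        # text fragments of the section being read
        self._current_id = None  # data-mw-section-id of that section (assumed attribute)

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            if self._depth == 0:
                self._current_id = dict(attrs).get("data-mw-section-id")
                self._chunks = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "section" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace so the text is easy to reformat.
                text = " ".join("".join(self._chunks).split())
                self.sections.append({"id": self._current_id, "text": text})

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def split_sections(html):
    """Return a list of {"id", "text"} dicts, one per top-level <section>."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return parser.sections
```

A client would pass the body of a /page/mobile-html/{title} or /page/html/{title} response to split_sections and pick the entry it needs by id. A real client would probably keep the inner HTML rather than plain text; this sketch only shows that the section boundaries are recoverable without a JSON endpoint.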
It is possible, although not in the immediate plans, that we may desire to consolidate the approach on base HTML for the apps that's closer to the web platform on mobile web or desktop web - for example, via Parsoid HTML directly, as there has been some technical exploration of this. What this would mean is that we would move some pieces from PCS to Parsoid and other pieces strictly to the native app clients. But as I say, it's only a possibility, and not in the immediate plans, as it entails a number of complicated changes to do such a thing with a bunch of tradeoffs for software delivery across teams. We would communicate any such plan of a migration similarly to how we're communicating the plan with MCS if we were going to go with such an approach.
I believe the data-parsoid thing you're looking for is something like this if you were using curl. This is part of the mainline Parsoid implementation, not PCS, and it's considered purely internal (so prone to breakage) - so, if it's just section data you were looking for the <section> tags may be simpler and not require an extra request, just a DOM query selector.
curl 'https://en.wikipedia.org/api/rest_v1/page/data-parsoid/Coffee/1146044596/589a30a0-c8b1-11ed-b921-03c7a1ebbabc' -H 'accept: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/data-parsoid/2.1.0"'
Notice how the URL contains a revision ID and a timestamping ID separated by a slash. As an example of getting this info (I'm using a HEAD request via the -I flag here, not a GET with an omitted -I, just for illustrative purposes) - see this:
$ curl -s -I https://en.wikipedia.org/api/rest_v1/page/html/Coffee | grep etag:
etag: W/"1146044596/589a30a0-c8b1-11ed-b921-03c7a1ebbabc"
ABaso (WMF) (talk) 11:17, 24 March 2023 (UTC)
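The relationship between the etag header and the data-parsoid URL in the curl examples above can be sketched in Python. This is only an illustration under stated assumptions: it assumes the etag value has the shape shown above (an optional W/ prefix, then a quoted "revision/tid" pair), and the function names parse_etag and data_parsoid_url are hypothetical. Since data-parsoid is described as purely internal, the URL layout may change without notice.

```python
from urllib.parse import quote


def parse_etag(etag):
    """Split an etag value such as W/"<revision>/<tid>" into (revision, tid).

    Assumes the format seen in the curl example; anything after the
    second slash-separated part is ignored.
    """
    value = etag.strip()
    if value.startswith("W/"):   # drop the weak-validator prefix
        value = value[2:]
    value = value.strip('"')     # drop the surrounding quotes
    parts = value.split("/")
    if len(parts) < 2 or not parts[0].isdigit() or not parts[1]:
        raise ValueError(f"unexpected etag format: {etag!r}")
    return parts[0], parts[1]


def data_parsoid_url(title, etag, base="https://en.wikipedia.org/api/rest_v1"):
    """Build the data-parsoid URL for a title from its /page/html etag."""
    revision, tid = parse_etag(etag)
    return f"{base}/page/data-parsoid/{quote(title, safe='')}/{revision}/{tid}"
```

For example, data_parsoid_url("Coffee", 'W/"1146044596/589a30a0-c8b1-11ed-b921-03c7a1ebbabc"') reproduces the URL used in the first curl command.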
Thank you @ABaso (WMF) for your detailed response. I only got to work on this change recently, but you helped a lot by pointing me in the right technical direction. I will transition to the PCS endpoint and I appreciate the information you offered also about possible future changes. Alexandru Ene (talk) 09:33, 24 June 2023 (UTC)
Hello @ABaso (WMF)
I have had a problem since today: the mobile-html endpoints are no longer working, even though they were not supposed to be decommissioned.
https://en.wikipedia.org/api/rest_v1/page/mobile-html/July_6 should return a valid response, only the /page/mobile-sections pattern was announced to go away mid July.
Please revert the change until a solution to migrate users is possible. Currently, all users are affected by this change and they cannot use the educational application. Alexandru Ene (talk) 17:16, 6 July 2023 (UTC)
Thanks @Alexandru Ene, the problem has been fixed and is documented here: https://phabricator.wikimedia.org/T341248#8995142. Please let me know if you have any other questions or concerns. MSantos (WMF) (talk) 17:29, 6 July 2023 (UTC)
Thank you MSantos (WMF) for the quick reply and even faster fix! I am happy to hear that it was a known issue and it was only something temporary! Alexandru Ene (talk) 19:36, 6 July 2023 (UTC)

JSON Endpoint

My site used the mobile-sections endpoint for providing popup/popin links of WP articles; the JSON was invaluable because it is so painful & inefficient to parse the HTML version in mobile-html, whereas with the JSON we could just call the relevant section and reformat trivially - which worked much better than our prior popup approach using mobile-html. Today we realized that the 403 errors that had been happening were not a transient error (as the 403 error code & error page indicated) but actually were the API endpoint being removed entirely. We can't seem to figure out where we are supposed to get JSON from now, and the documentation is extremely confusing. (I really struggle to parse all the unfamiliar jargon in what seem to be the relevant pages like the wikitech-l announcement.)

You say that

We're not presently planning to replicate the JSON-based section content approach into PCS.

Does this mean that there are no JSON endpoints anymore, and never will be? If there are not, what's the best way? --Gwern (talk) 00:26, 8 July 2023 (UTC)

Hello @Gwern, thanks for reaching out and sorry for the late response, it has been a very hectic couple of months.
Although you already reached out to the Phab task and got some responses too, I'll answer it here for posterity.
We are in the middle of a big effort to sunset RESTBase, the underlying proxy between all services. It's a multi-year effort with multiple teams and work-streams. I'm saying that because this project was the final push to deprecate MCS and remove it from our infrastructure, a plan we had before this project started.
Moving forward this committed work will be prioritised and the Content Transform Team won't plan any new APIs or endpoints (JSON or not) unless it overlaps with the work for the Parser Unification (not related with the scope of interest of this topic).
That being said, Yiannis Giannelos made a comment on possible replacements you could look into at https://phabricator.wikimedia.org/T328036#9000177
I hope that helps and thanks for your patience. Let me know if you have any other questions. MSantos (WMF) (talk) 14:23, 21 August 2023 (UTC)