Topic on User talk:Daniel Kinzler (WMDE)/MCR-PageUpdater

Breakdown

7 comments • 14:59, 8 June 2018 6 years ago

7

Anomie (talkcontribs)

saveRevision(RevisionSlotsUpdate) aka doEditContent() to create update a page by creating a new revision

The method would need more than RevisionSlotsUpdate, at least as it was last I looked. User, edit summary, etc. It might make most sense to put that data into the same object, in which case "PageEditData" might make more sense as a name than "RevisionSlotsUpdate".

(or not, in case of a null-edit).

That has a separate entry point.

we may also want a version of saveRevision that directly takes a RevisionRecord, leaving PST to the caller.

That would follow the same path as the "Null rev" dark-green arrow. It could potentially be exactly the same as that path, depending on the exposed interface.

saveDummyRevision() aka null revision: like saveRevision(), but with no (changed) content.

Not really "like saveRevision()", but as noted above might be the same as the version of saveRevision() that takes a RevisionRecord.

renderRevision() takes a (saved?) RevisionRecord and returns output for it. Similar to preview(). Used for plain views, history views, diff views, etc.

I see no reason to require it be saved. Maybe we'll discover one later, but until then there seems to be no point in restricting it.

Either way it might also be used with a RevisionArchiveRecord.

purge() re-generate any derived data. Re-parsing is optional.

If we go by the diagram, you'd need to get a ParserOutput. That might be able to be taken from ParserCache, I suppose, although the common on-wiki use case I'm aware of is usually specifically refreshing the ParserCache data.

check user permissions, tokens, limits, etc

Tokens are a very UI-level thing, often handled by HtmlForm or ApiMain. Permissions and (probably) limits are more directly related and reusable.

construct an unsaved RevisionRecord from RevisionSlotsUpdate, while apply PST and slot inheritance (and making use of cached data). This component should also provide access to rendered output for the full revision (as well as individual slots). Such output should be created on-demand, and should be cached aggressively.

No, providing the rendered output should be a separate component. In the diagram, "PST, make rev" is the first sentence, and "Render Revision" is the second sentence.

This separation is mainly because various entry points need the latter without the former.

write secondary data to the database.

That's not really a "functionality". The functionality is taking the rendering (i.e. the ParserOutputs) to collect the updates/jobs and scheduling them, while the actual writing of the data is the responsibility of each individual update/job.

Reply 17:04, 31 May 2018 6 years ago

Duesentrieb (talkcontribs)

It might make most sense to put that data into the same object, in which case "PageEditData" might make more sense as a name than "RevisionSlotsUpdate".

It's a possibility, but gluing these things together feels bad to me. While I see your point of having an object that basically represents the data sent by the user when performing an edit, I'd still want a value object that just represents the changed slots.

(or not, in case of a null-edit). That has a separate entry point.

Null revisions have their own entry point, null edits don't. And I'd like to handle null revisions in the same way, by introducing EDIT_FORCE_SAVE or something.

[saveRevision that directly takes a RevisionRecord] would follow the same path as the "Null rev" dark-green arrow. It could potentially be exactly the same as that path, depending on the exposed interface.

Yes, that's the idea. And the path for creating a revision based on a RevisionSlotsUpdate would feed into that path., as would saveDummyRevision().

No, providing the rendered output should be a separate component. In the diagram, "PST, make rev" is the first sentence, and "Render Revision" is the second sentence.

The actual rendering will be done elsewhere, via RevisionRenderer. Constructing a RenderedRevision is pretty light weight. It's needed as the same time as creating PST content (for feeding edit filter hooks), it needs the PST content as input, and it can be taken from the stashed edit, along with the PST output.

So, for practical reasons, I'd keep them together for now. Though I do agree with you that conceptually, there is no need to do that. I just see no practical way to disentangle them without looking up the stash twice, of changing how stashed data is fed to the factory thingy.

This separation is mainly because various entry points need the latter without the former.

Entry points that start with a RevisionRecord would just use a RevisionRenderer directly. That doesn't mean the revision factory can't also use a RevisionRenderer.

That's not really a "functionality". The functionality is taking the rendering (i.e. the ParserOutputs) to collect the updates/jobs and scheduling them, while the actual writing of the data is the responsibility of each individual update/job.

Well, the actual' writing of the data is done by the HDD controller on the DB server. The functionality is whatever the interface contract tells the calelr will happen. If it's implemented directly or passed through five layers of abstraction isn't really relevant. Anyway, I have changed the wording a bit.

Reply 16:21, 4 June 2018 6 years ago

Anomie (talkcontribs)

Null revisions have their own entry point, null edits don't. And I'd like to handle null revisions in the same way, by introducing EDIT_FORCE_SAVE or something.

The effect of a null edit is what's labeled in the diagram as the light green Purge path. Although actual processing is more like following Black/Grey or Red and finding out at the "Revision" oval that the new revision's content is identical to the current revision's, so instead of processing that new revision further it submits the current revision via light-green and returns.

Null revisions shouldn't go in via the Black/Grey or Red paths at all. Although once you get to the "Revision" oval on the Black or Red paths the subsequent processing is identical to the Dark Green path. As you noted just after the bit I quoted here. ;)

Constructing a RenderedRevision is pretty light weight.

I suppose that's true if you make the combined ParserOutput lazy-loaded too. In that case we'd need to be sure that gets constructed before getting to the "Save to Stash" step, because the whole point of the stash is to pre-cache the expensive parsing needed to get the combined ParserOutput for the secondary updates.

It's needed as the same time as creating PST content (for feeding edit filter hooks)

It may be needed for some edit filter hooks, sure. Those don't necessarily need to run from the same "module" that creates the PST content though.

The diagram branches, but code flow will be linear. Code flow for the Black and Red paths might be something like → PST → Make RevisionRecord → Make RenderedRevision → Call edit filters → Save revision → Update RenderedRevision → Schedule secondary updates. For the Grey path we'd have → Get RevisionRecord and RenderedRevision from stash → Call edit filters → Save revision → Update RenderedRevision → Schedule secondary updates. For Dark-Green we get the RevisionRecord as input → Make RenderedRevision → Call edit filters → Save revision → Update RenderedRevision → Schedule secondary updates.

Then, too, code flows for the Yellow and Purple (Magenta) paths probably skip the "Call edit filters" bit. Or maybe we have more than one edit filter hook, some that get called always and some that only get called before saving. Currently we have parser hooks for the former, but MCR is making "render" different from "parse".

And don't forget the null edit flow: → PST → Make RevisionRecord → Find out it's the same content as the current revision → Throw away the new revision and send the current revision via the light-green path instead.

Reply 19:00, 6 June 2018 6 years ago

Duesentrieb (talkcontribs)

Null revisions shouldn't go in via the Black/Grey or Red paths at all. Although once you get to the "Revision" oval on the Black or Red paths the subsequent processing is identical to the Dark Green path. As you noted just after the bit I quoted here. ;)

I'd like to make null (dummy) Rev a special case of the red path. What's unclear to me right now is whether the creation of a dummy revision should trigger a re-parse and other updates. In most cases, that is redundant. Unless the content depends on the revision ID.

I suppose [Constructing a RenderedRevision is pretty light weight] is true if you make the combined ParserOutput lazy-loaded too.

That is indeed the idea.

In that case we'd need to be sure that gets constructed before getting to the "Save to Stash" step,

Sure, the stash code would just call getRevisionParserOutput(), which creates it if it's not yet there.

Then, too, code flows for the Yellow and Purple (Magenta) paths probably skip the "Call edit filters" bit.

I suppose we can figure that out when factoring EditControlelr out of EditPage.

Currently we have parser hooks for the former, but MCR is making "render" different from "parse".

Well, "parse" has been misleading sine the introduction of ContentHandler six years ago...

Reply 15:28, 7 June 2018 6 years ago

Anomie (talkcontribs)

I'd like to make null (dummy) Rev a special case of the red path.

Although the Red path begins with data to be passed to EditRevisionFactory, while currently null revisions begin with a RevisionRecord from RevisionStore->newNullRevision().

What's unclear to me right now is whether the creation of a dummy revision should trigger a re-parse and other updates. In most cases, that is redundant. Unless the content depends on the revision ID.

Or the revision timestamp, or the revision user. Possibly other things depending on the reason for the dummy revision, e.g. the title if it was a move or the protection settings (e.g. via {{PROTECTIONLEVEL}}) if it was a protection or unprotection.

I suspect that if making the dummy revision doesn't trigger the re-parse and other updates then most of the calling code will have to do it anyway.

Reply 14:59, 8 June 2018 6 years ago

Tgr (WMF) (talkcontribs)

Purge doesn't actually regenerate most derived data, it just updates the HTML caches (assuming we are talking about action=purge and not e.g. null edits). The current implementation doesn't really use any of the code paths discussed here, it just clears caches and leaves it to the next view to re-render.

Reply 00:19, 1 June 2018 6 years ago

Anomie (talkcontribs)

API action=purge can reparse. Null edits are a bit of a strange case, since they go through the whole edit pathway and effectively convert to a purge in the middle.

Reply 19:34, 1 June 2018 6 years ago

Reply to "Breakdown"