User:Daniel Kinzler (WMDE)/MCR slot inherited

Phab ticket: https://phabricator.wikimedia.org/T189004

(converted from a mail thread)

DK: Before I respond to you below, I'd like to point out that my original schema draft included the option to have slot_origin instead of slot_inherited. slot_origin would record the ID of the revision that updated the slot.

BJ: On https://www.mediawiki.org/w/index.php?oldid=2558527 you had content_origin, not slot_origin, although I see the latter was mentioned in an aside. I can't find anyplace where slot_origin was actually discussed, e.g. https://www.mediawiki.org/wiki/Topic:Txykwf641o3wysgi only discussed content_origin versus slot_inherited.
DK: Ah yes, you are right - since the revision of origin would be the same for all slots that re-use the content, I saw reason to repeat this value in the slots table for each revision.
We could perhaps have slot_inherited *and* content_origin.

DK: From your comments I'm getting the impression that it would be wise to drop slot_inherited, and introduce slot_origin instead. The equivalent slot_inherited would be ( slot_origin != slot_revision ). I liked this option better than slot_inherited myself, but let myself be dissuaded over the course of the schema discussion. We can still do this schema change easily now - if we want it, we should do it asap.

BJ: That would make the semantics clearer. I support this idea, where I didn't like content_origin.
DK: Is that because conceptually, content rows could be shared across pages, so nothing page-specific should be in that table? I can see that, though I think pragmatically, we won't ever do that. If we want de-duplication of data, we'll do it on the blob store level.
BJ: Not entirely. For the only use case I can think of where content_origin would actually be useful, we'd have to be sure to reuse the row even for manual reverts.
DK: At the moment, I favor slot_inherited + content_origin, simply because it's the easiest schema change we could make at this point. But we could also drop slot_inherited in favor of content_origin, or slot_origin. Or slot_origin + slot_inherited? No, that seems redundant.
How do we make the call, then?
BJ: Any schema change we make at this point is easy. And turning slot_inherited into slot_origin is probably easier.
content_origin can't replace slot_inherited. "show revisions that changed this slot" would ignore the undo, or else we'd never be able to reuse content rows for undos. As I said in https://www.mediawiki.org/wiki/Topic:Txykwf641o3wysgi the first time we discussed it.
BJ: Although, "slot_origin" might be slightly confusing in the case of a revert. If someone makes revision 1, then revision 2, then reverts revision 2 to create revision 3, the "slot_origin" could be thought to be revision 1.
DK: content_origin should be 1, in that case. No confusion there. A manual revert or undo would create a new content row with content_origin = 3. No harm done there either. Same as now.
BJ: A content_origin field doesn't solve the problems in this email thread, except maybe by getting rid of the use case of slot_inherited entirely.
BJ: Maybe "slot_inherited_from" would be a clearer name? Although that would imply that it should be null when there's no inheriting, rather than being the same as slot_revision. I don't know.

BJ: Particularly in light of T185167 <https://phabricator.wikimedia.org/T185167>, where it turns out some things get confused if rev_parent_id doesn't point to the directly previous revision due to deletions and undeletions.

The intended semantics is "when this revision was created, the user changed this slot".
The intended use case is filtering the page history: "show only revisions that touched the foo slot".
If a revision that touched foo was deleted or suppressed, the foo slot may changed between two consecutive revisions that both did not touch the foo slot. That's a bit unfortunate, but I don't think it's a major problem. the fact that the slot changed is still easy to detect, since the content id is different.
BJ: Except that it won't be detected by the straightforward "show only revisions that touched the foo slot" filter. Wiki users may consider that a bug, once things are rolled out.
DK: The UI needs to make this clear: the filter shows edits that touched the slot.
Having a change attributed to an edit that did not actually make that change is bad. We have this problem already with deleted and suppressed revisions. Ideally, we'd show "X intermediate reivions not shown" in the diff view.
BJ: Actual deleted revisions, yes. The revision-deletion feature doesn't have that issue. (I'm not sure whether you were including revision-deletion in "deleted and suppressed revisions" or not)
DK: Ah, right, be cause the revisions are still visible in the history. Just their content (and/or other bits) are not.
Yet another reason to get rid of the "traditional" way to delete revisions.

BJ: Say you have revision 1, then revision 2 that changes all slots, then you make revision 3 that inherits slots. Then you delete revision 2 (== delete the page then undelete 1 and 3 but not 2).[1] Does it make sense that revision 3 has slot_inherited true when the revision it inherited from has disappeared and the previous visible revision has different content for those slots?

DK: Yes. The user who made the revision did not touch the slot, so it's inherited.
But you have a point - the problem here is that revision 3 will still *show* content from the deleted revision. It's not only marked as inheriting the content, it actually still does have that content.
This may be a problem... on the other hand, we'd generally expect a revision deletion to be preceded by a rollback or undo, which would cover all slots and solve the problem.
BJ: That's not any more of a problem than it is in the current, non-MCR case.
DK: It's just more of the same problem. I'd prefer to work on getting rid of the problem, not adding to it.
BJ: Note that deleting a revision is not the same thing as "revision deletion", which would perhaps be better termed "revision info hiding".
DK: I to me, deleting a revision is "revision deletion", which is different from "suppression", which is the same as "oversight". Revision deletion uses the archive table, suppression uses the rev_deleted (aka visibility). Is this not how these terms are used on enwiki? Which terms shall we use to avoid confusion?
BJ: I have no idea what good terms might be. I tend to use "deleting a revision" for actually deleting a revision, and "RevDel" for using rev_deleted.
"oversight" may be confused with the old Extension:Oversight, which had a "hidden" table that worked like the archive table. "suppression" may specifically refer to using RevDel with the DELETED_RESTRICTED flag, rather than any use of RevDel.
DK: Yea, me neither. I just find it very confusing to use "deleted revision" and "revision deletion" for two different things.

BJ: Or delete both 1 and 2, so the only revision left is 3. Does it make sense that revision 3 has slot_inherited true when there is no previous revision visible?

DK: Yes, for the same reason: slot_inherited is basically "slot not edited by this revision". It's really a statement about the user's actions.

BJ: Or say you have revision 10 live and revision 11 has been deleted. Then you make revision 12 that inherits slots from revision 10. And then you undelete 11, does it become confusing that revision 12 says "inherited" when the previous revision has different content for the slot?

DK: The semantics of slot_inherited are still clear, as per the above.
But you raise a good point - what's the expected behavior? When you undelete 11 which changed slot foo, and 12 inherits slot foo - would the user expect revision 11 to now have 11's content in slot foo? That seems intuitive, but it would mean undeleting a single old revision may change the visible content of all subsequent revisions. Which could also be considered surprising.
BJ: The user might at least expect revision 12 to now show up in "show only revisions that touched the foo slot", since the content of the slot now differs from that of the previous revision.
DK: I think this behavior creates more problems than it solves. It's certainly not the intended semantics. And it's highly problematic for legal reasons (attribution).

BJ: [1]: And if you really want to break them apart, move the page with revisions 1 and 3 to a different title, which leaves revision 2 behind at the old title. Then undelete revision 2, it'll be assigned a different page ID.

DK: The inherited slot will still know the original content ID, and use that content.
BJ: Yes, obviously. You missed the point, that we could have revision 3 "inheriting" a slot with no indication as to where exactly it inherited it from.
DK: Oh, because rev_parent will be "repaired"? That's indeed a problem. The only way to fix it would be to use slot_origin.