Flow/Architecture/Internals

data models

  • Flow data models manage their own state; there are no setter methods. For example, to generate a new revision that is a reply to a specific post, you call $post->reply( ... ) rather than instantiating your own object and setting its state externally.
  • They model the problem domain and do not interact directly with the database, beyond being convertible between database rows and domain objects.
  • This simplifies tests: domain state can be created in a test without touching any global state. For example:
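    // Context objects: the page the board lives on and the acting user.
    $title = Title::newFromText( 'Talk:Some Page' );
    $user = User::newFromName( '127.0.0.1', false );
    // A workflow of type 'topic', with the topic title as its root post.
    $workflow = Flow\Model\Workflow::create( 'topic', $user, $title );
    $rootPost = Flow\Model\PostRevision::create( $workflow, 'content of topic title' );
    // Each call returns a new revision object; no state is set from outside.
    $firstReply = $rootPost->reply( $user, 'content of reply' );
    $editedTitle = $rootPost->newNextRevision( $user, 'replacement topic title', 'edit-title' );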

code resources

  • container.php provides resources.
    • container.php is the configuration; includes/Container.php is the implementation class
    • centralizes all access to global variables in one file instead of spreading it across the application
      • Global variables are provided to objects via constructor or setter injection
    • centralizes instantiation of service objects (as opposed to domain models), providing one place to look while refactoring
  • uses third-party Pimple to provide lazily-evaluated closures
  • The container should not be accessed statically, but is in a variety of places within Flow. We need to evaluate and remove these where possible.
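The idea of lazily evaluated closures plus constructor injection can be sketched as follows. This is a minimal illustration written in Python for brevity, not Flow's container.php or Pimple itself; the names Container, PostStorage and build_container, and the "dsn" setting, are hypothetical.

```python
from typing import Any, Callable, Dict


class Container:
    """Tiny Pimple-like container: each service is a closure, run at most once."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[["Container"], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[["Container"], Any]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Build the service on first access, then reuse the same instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


class PostStorage:
    """Hypothetical service object; it never reads global state itself."""

    def __init__(self, db: dict, cache: dict) -> None:
        self.db = db
        self.cache = cache


def build_container(global_config: dict) -> Container:
    # The only place that touches "global" configuration.
    c = Container()
    c.register("db", lambda c: {"dsn": global_config["dsn"]})
    c.register("cache", lambda c: {})
    # Dependencies are handed over through the constructor.
    c.register("storage", lambda c: PostStorage(c.get("db"), c.get("cache")))
    return c


container = build_container({"dsn": "sqlite://example"})
storage = container.get("storage")
```

Nothing is constructed until get() is called, and repeated calls return the same object, so code that is handed its dependencies this way has no need to reach for the container statically.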

actions

Likewise, the top-level FlowActions.php is the configuration, and includes/Actions.php is the implementation.

This configures how different actions work, e.g.

  • moderation actions get logged in Special:Log (log_type)
  • nearly everything gets written to RecentChanges (rc_insert)

permissions: if nothing is configured for an action, it is blocked altogether

  • currently no support for moderation of header

Links (for queries) and actions (for writes) are outputs. The configuration says: "If the change type of a revision is this, then the API output will include URLs for the specified actions." For example, the response to create-header outputs these links. Even though e.g. 'board-history' isn't used, the page chrome adds the ?action
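As a rough illustration of the shape of such a configuration, here is a sketch in Python rather than Flow's PHP. The key names log_type, rc_insert, permissions, links and actions come from the description above, and the link and action names match the create-header response shown later on this page; the concrete values, the "edit" right, and the helpers is_allowed and outputs_for are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for the per-change-type configuration.
ACTIONS: Dict[str, dict] = {
    "create-header": {
        "log_type": False,          # assumed: not logged in Special:Log
        "rc_insert": True,          # assumed: written to RecentChanges
        "permissions": ["edit"],    # assumed right name
        "links": ["board-history", "workflow", "header-revision"],
        "actions": ["edit"],
    },
}


def is_allowed(change_type: str, user_rights: List[str]) -> bool:
    """If nothing is configured for the action, it is blocked altogether."""
    required = ACTIONS.get(change_type, {}).get("permissions")
    if not required:
        return False
    return any(right in user_rights for right in required)


def outputs_for(change_type: str) -> Dict[str, List[str]]:
    """Names of the links and actions the API output would carry URLs for."""
    entry = ACTIONS.get(change_type, {})
    return {
        "links": list(entry.get("links", [])),
        "actions": list(entry.get("actions", [])),
    }


header_outputs = outputs_for("create-header")
```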

example: create-header

API post parameters

I edited the header of a new board (not on www.mediawiki.org), adding some links. After I solved the captcha, the POST was:

action=flow
ehcontent=Second try: Creating a header just to see what the API hands back. Links to [[User:spage]], {{Bug|NNNN}}
format=json
page=User_talk:JunkTest2
submodule=edit-header
token=34d7a48e72f868bcd5a6f98372796be6+\
wpCaptchaId=NNN
wpCaptchaWord=SEKRIT

API response

The response body, reformatted, for this create-header action is below. Note how it has the URLs for links and actions that FlowActions.php specifies. The front-end uses (some of) these to create links and buttons in the UI.

{
    "warnings": {
        "main": {
            "*": "Unrecognized parameters: 'wpCaptchaWord', 'wpCaptchaId'"
        }
    },
    "flow": {
        "edit-header": {
            "result": {
                "header": {
                    "type": "header",
                    "editToken": "34d7a48e72f868bcd5a6f98372796be6+\\",
                    "revision": {
                        "workflowId": "s3s2cl1o4eivrhxc",
                        "revisionId": "s3s2cl29y4lyi3y8",
                        "timestamp": "20141008024510",
                        "timestamp_readable": "02:45, 8 October 2014",
                        "changeType": "create-header",
                        "dateFormats": [],
                        "properties": [],
                        "isModerated": false,
                        "links": {
                            "board-history": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&action=history",
                                "title": "hist"
                            },
                            "workflow": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&workflow=s3s2cl1o4eivrhxc",
                                "title": "workflow"
                            },
                            "header-revision": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&header_revId=s3s2cl29y4lyi3y8&action=view-header",
                                "title": "header revision"
                            }
                        },
                        "actions": {
                            "edit": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&action=edit-header",
                                "title": "Edit header"
                            }
                        },
                        "size": {
                            "old": 0,
                            "new": 1104
                        },
                        "author": {
                            "name": "Spage2",
                            "wiki": "testwiki",
                            "gender": "unknown",
                            "links": {
                                "contribs": {
                                    "url": "/wiki/Special:Contributions/Spage2",
                                    "title": "Contributions/Spage2",
                                    "exists": true
                                },
                                "userpage": {
                                    "url": "/wiki/User:Spage2",
                                    "title": "Spage2",
                                    "exists": true
                                },
                                "talk": {
                                    "url": "/wiki/User_talk:Spage2",
                                    "title": "User talk:Spage2",
                                    "exists": true
                                }
                            },
                            "id": 202
                        },
                        "previousRevisionId": null,
                        "content": {
                            "content": "HTML, SEE BELOW",
                            "format": "html"
                        }
                    },
                    "submitted": {
                        "prev_revision": null,
                        "content": "Second try: Creating a header just to see what the API hands back. Links to [[User:spage]], {{Bug|NNNN}}",
                        "token": "34d7a48e72f868bcd5a6f98372796be6+\\"
                    },
                    "errors": []
                }
            },
            "status": "ok",
            "workflow": "s3s2cl1o4eivrhxc"
        }
    }
}
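How a client might walk this response to collect the URLs for links and buttons can be sketched as below. This is an illustrative Python helper, not Flow's front-end code; the function name ui_targets is hypothetical, and the sample dictionary is a trimmed copy of the response above.

```python
from typing import List, Tuple


def ui_targets(response: dict, submodule: str, block: str) -> List[Tuple[str, str, str, str]]:
    """Return (kind, name, url, title) for every link and action of a revision."""
    revision = response["flow"][submodule]["result"][block]["revision"]
    targets = []
    for kind in ("links", "actions"):
        # Empty collections appear as [] in the response above ("properties": []),
        # so fall back to an empty dict before iterating.
        entries = revision.get(kind) or {}
        for name, info in entries.items():
            targets.append((kind, name, info["url"], info["title"]))
    return targets


# Trimmed copy of the response shown above.
sample = {
    "flow": {
        "edit-header": {
            "result": {
                "header": {
                    "revision": {
                        "changeType": "create-header",
                        "links": {
                            "board-history": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&action=history",
                                "title": "hist",
                            },
                        },
                        "actions": {
                            "edit": {
                                "url": "/w/index.php?title=User_talk:JunkTest2&action=edit-header",
                                "title": "Edit header",
                            },
                        },
                    }
                }
            },
            "status": "ok",
        }
    }
}

targets = ui_targets(sample, "edit-header", "header")
```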

The content key's HTML, reformatted, is:

<p data-parsoid='{"dsr":[0,104,0,0]}'>Second try: Creating a header just to see what the API hands back. Links to 
    <a href="/wiki/User:Spage" title="User:Spage" rel="mw:WikiLink"
        data-parsoid='{"stx":"simple","a":{"href":"./User:Spage"},"sa":{"href":"User:spage"},
        "dsr":[76,90,2,2]}'>User:spage</a>, 
    <a rel="mw:ExtLink" href="https://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN"
       about="#mwt1" typeof="mw:Transclusion"
       data-mw='{"parts":[{"template":{"target":{"wt":"Bug","href":"./Template:Bug"},"params":{"1":{"wt":"NNNN"}},"i":0}}]}'
       data-parsoid='{"stx":"piped","a":{"href":"https://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN"},"sa":{"href":"bugzilla:NNNN"},"isIW":true,"dsr":[92,104,null,null],"pi":[[{"k":"1","spc":["","","",""]}]]}'>
        bug NNNN
    </a>
    <span style="display:none" about="#mwt1" data-parsoid='{"stx":"html"}'>
        <a rel="mw:ExtLink" href="http://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN"
           data-parsoid='{"stx":"url","a":{"href":"http://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN"},"sa":{"href":"API hands back. Links to [[User:spage]], {{Bug|NNN"}}'>
        http://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN
    </a>
    </span>
</p>

This is the HTML that the front-end code inserts into the page. Note that the data-parsoid attribute from Parsoid can change at any time, while data-mw contains stable, repeatable data. The data-parsoid attribute should generally be ignored.
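A consumer that follows this advice reads data-mw and never looks at data-parsoid. The following is a minimal sketch using Python's standard html.parser; the class name DataMwCollector is hypothetical, and the input is a shortened version of the HTML above.

```python
import json
from html.parser import HTMLParser


class DataMwCollector(HTMLParser):
    """Collect the decoded data-mw attributes; data-parsoid is never read."""

    def __init__(self) -> None:
        super().__init__()
        self.data_mw = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-mw" and value:
                self.data_mw.append(json.loads(value))


# Shortened version of the header HTML shown above.
html = """<p data-parsoid='{"dsr":[0,104,0,0]}'>Links to
<a rel="mw:ExtLink" about="#mwt1" typeof="mw:Transclusion"
   data-mw='{"parts":[{"template":{"target":{"wt":"Bug","href":"./Template:Bug"},"params":{"1":{"wt":"NNNN"}},"i":0}}]}'
   data-parsoid='{"stx":"piped"}'>bug NNNN</a>
</p>"""

collector = DataMwCollector()
collector.feed(html)

# Names of the templates that produced transcluded content.
templates = [
    part["template"]["target"]["wt"]
    for data in collector.data_mw
    for part in data["parts"]
    if isinstance(part, dict) and "template" in part
]
```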

Caching

There are multiple layers of caching:

  • Varnish caches full HTML pages of boards and topics for anonymous users. These are purged whenever an action is performed against a related board or topic.
  • Flow extends a BagOStuff implementation to remember the keys that have previously been requested, so requesting the same cache key a second time does not incur a network round trip.
  • Flow additionally extends that BagOStuff implementation to offer transaction-like writes to the cache. Specifically:
    • Inside a transaction, all write commands to the BagOStuff are deferred by appending them to an array.
    • When the transaction completes successfully, all deferred write commands are flushed to memcache.
    • If any of those write commands fails, all cache keys that have already been written out are deleted so that they repopulate properly on read.
    • If the transaction does not complete, the deferred commands are discarded.
  • Flow has a concept of Indexes stored within a BagOStuff. This is all driven by the code in the Flow\Data namespace.
    • Indexes receive individual database rows; each Index instance buckets those rows based on a set of column names provided in its constructor.
    • Individual buckets are maintained through BagOStuff::merge operations, which retrieve the current cache value of the matching bucket and insert or remove the provided row before writing the data back out.
    • Indexes are updated in-process after write operations to keep them in a consistent state.
  • There are two primary kinds of Index, UniqueIndex and TopKIndex:
    • There is one UniqueIndex per domain model. It provides a direct lookup from the domain model's primary key to the database row representing that model.
    • TopKIndex buckets database rows by a set of columns provided in the constructor. It is typically used to answer queries like 'First 100 posts on board X'.
      • TopKIndex utilizes the Flow\Data\Compactor interface. Using this interface, a TopKIndex can be backed by a UniqueIndex, so that a single bucket of the TopKIndex holds only a list of primary keys. At query time that list of primary keys is resolved into the original database rows by querying the related UniqueIndex.
      • This is done to ensure consistency: each domain model has a single representation within the cache, and other parts of the cache just point to that single representation.
  • There are additionally some special-case Indexes extending TopKIndex.
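The transaction-like write behaviour described in the list above can be sketched as follows. This is a minimal illustration in Python, not Flow's BagOStuff subclass; the class names BufferedCache and DictBackend are hypothetical, and stopping at the first failed write is an assumption of the sketch.

```python
class DictBackend:
    """Stand-in for memcache; writes to keys listed in fail_on report failure."""

    def __init__(self, fail_on=()):
        self.data = {}
        self.fail_on = set(fail_on)

    def set(self, key, value):
        if key in self.fail_on:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


class BufferedCache:
    def __init__(self, backend):
        self.backend = backend
        self._pending = None  # list of deferred writes while in a transaction

    def begin(self):
        self._pending = []

    def set(self, key, value):
        if self._pending is None:
            return self.backend.set(key, value)
        # Inside a transaction: defer the write by appending it to a list.
        self._pending.append((key, value))
        return True

    def commit(self):
        pending, self._pending = self._pending or [], None
        written = []
        for key, value in pending:
            if not self.backend.set(key, value):
                # A write failed: delete what was already written so those
                # keys repopulate on the next read.
                for done in written:
                    self.backend.delete(done)
                return False
            written.append(key)
        return True

    def rollback(self):
        # The transaction did not complete: discard the deferred writes.
        self._pending = None


backend = DictBackend()
cache = BufferedCache(backend)
cache.begin()
cache.set("topic:1", "row")
before_commit = dict(backend.data)  # nothing has reached the backend yet
committed = cache.commit()
```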

All of this is in includes/Data/. ObjectManager drives it.
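The relationship between the two kinds of Index described above, where a TopKIndex bucket holds only primary keys and resolves them through the UniqueIndex, can be sketched as below. This is a simplified Python illustration, not the Flow\Data classes: a plain dict stands in for the BagOStuff, the method names are hypothetical, and primary keys are assumed to sort in creation order.

```python
class UniqueIndex:
    """Primary key -> the full database row for one domain model."""

    def __init__(self, cache, primary_key):
        self.cache = cache
        self.primary_key = primary_key

    def on_insert(self, row):
        self.cache[("unique", row[self.primary_key])] = dict(row)

    def find_multi(self, ids):
        return [self.cache[("unique", i)] for i in ids if ("unique", i) in self.cache]


class TopKIndex:
    """Buckets rows by some columns; each bucket stores primary keys only."""

    def __init__(self, cache, unique, bucket_columns, limit):
        self.cache = cache
        self.unique = unique
        self.bucket_columns = bucket_columns
        self.limit = limit

    def _key(self, attributes):
        return ("topk",) + tuple(attributes[c] for c in self.bucket_columns)

    def on_insert(self, row):
        key = self._key(row)
        # Merge step: read the current bucket, add the row's key, write it back.
        bucket = list(self.cache.get(key, []))
        bucket.append(row[self.unique.primary_key])
        self.cache[key] = sorted(bucket)[: self.limit]

    def find(self, attributes):
        # Resolve the stored primary keys into rows through the UniqueIndex.
        return self.unique.find_multi(self.cache.get(self._key(attributes), []))


cache = {}
unique = UniqueIndex(cache, "post_id")
topk = TopKIndex(cache, unique, ["board"], limit=2)
rows = [
    {"post_id": 3, "board": "X", "text": "c"},
    {"post_id": 1, "board": "X", "text": "a"},
    {"post_id": 2, "board": "Y", "text": "b"},
    {"post_id": 4, "board": "X", "text": "d"},
]
for row in rows:
    unique.on_insert(row)
    topk.on_insert(row)

first_on_x = topk.find({"board": "X"})
```

Because the bucket stores keys rather than rows, updating a row in the UniqueIndex is enough for every bucket that points at it to see the new version.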

Given a list of queries, it:

  • determines the cache keys
  • returns immediately if it finds all the keys in the cache
  • otherwise goes to the "backing store", the database.
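The three steps above amount to a read-through lookup. The following is a minimal sketch in Python, not ObjectManager's actual code; find_multi, key_for and the backing_store callable are hypothetical names, and writing the fetched rows back to the cache is an assumption of the sketch.

```python
def find_multi(queries, cache, backing_store, key_for):
    """Answer each query from the cache, falling back to the backing store."""
    # 1. Determine the cache key for every query.
    keys = {key_for(q): q for q in queries}
    # 2. If every key is cached, return without touching the database.
    found = {k: cache[k] for k in keys if k in cache}
    if len(found) == len(keys):
        return found
    # 3. Otherwise ask the backing store for the misses only.
    missing = [keys[k] for k in keys if k not in found]
    for query, rows in zip(missing, backing_store(missing)):
        key = key_for(query)
        cache[key] = rows  # assumed: repopulate so the next read is a hit
        found[key] = rows
    return found


db_calls = []


def fake_database(queries):
    # Records what it was asked, returning one list of rows per query.
    db_calls.append(list(queries))
    return [[{"id": q["id"]}] for q in queries]


def key_for(query):
    return "post:%s" % query["id"]


cache = {"post:1": [{"id": 1}]}
result = find_multi([{"id": 1}, {"id": 2}], cache, fake_database, key_for)
```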

The posts that are output map to memcached keys according to index classes, e.g. includes/Data/Index/FeatureIndex.php

ObjectManager objects implement either Find or FindMulti

rowCompactor removes keys you don't need.

The design tries to reduce the number of joins so that we can eventually shard the data.

Example

A topic list query

 includes/Block/TopicList.php

In general, this is split into two parts, a query and a formatter. TopicList also has a paginator that decides what is in the list.

  • One memcached key holds the topics on the page
  • This is trimmed to a slice of 10 of them

Talk:Flow has a workflow identifier, and each topic has a workflow identifier.

  • Does a MultiGet to get those 10 topics
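The slice-then-MultiGet flow can be sketched as follows. This is an illustration in Python with a dict standing in for memcached; the key formats "board:…:topics" and "topic:…" and the function names are hypothetical, not Flow's real cache keys.

```python
def multi_get(cache, keys):
    """Fetch several keys in one call; absent keys are simply left out."""
    return {k: cache[k] for k in keys if k in cache}


def topic_page(cache, board_workflow_id, offset=0, limit=10):
    # One key holds the identifiers of the topics on the board.
    topic_ids = cache.get("board:%s:topics" % board_workflow_id, [])
    # Trim to the slice the paginator asked for.
    page_ids = topic_ids[offset:offset + limit]
    # A single multi-get resolves the slice into topic data.
    keys = ["topic:%s" % t for t in page_ids]
    found = multi_get(cache, keys)
    return [found[k] for k in keys if k in found]


cache = {"board:talkflow:topics": ["t%02d" % i for i in range(12)]}
for topic_id in cache["board:talkflow:topics"]:
    cache["topic:%s" % topic_id] = {"workflow": topic_id}

first_page = topic_page(cache, "talkflow")
second_page = topic_page(cache, "talkflow", offset=10)
```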

FormatterRow holds all the information. It may turn out that some of it can't be viewed by the user; RevisionFormatter will remove those parts. Then the pager will

To debug what is cached

The purge.php script runs the query but swaps the HashBagOStuff for the memcached BagOStuff, which forces a repopulation; you can then examine the keys.


Future

Store entire topics, but then we would have to filter out moderated content that the current user isn't supposed to see.

Listeners

Updates of links, IRC updates, etc. are handled by listeners on

Parsoid directory

Extractors find references in the Parsoid output.

Core doesn't know about Parsoid output, so we have to get things (links, etc.) out of the Parsoid output and hand them to core. Eventually our code should be useful to the Parsoid extension.
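What an extractor does can be sketched as follows. This is a minimal Python illustration using the standard html.parser, not the code in Flow's Parsoid directory; the class name LinkExtractor is hypothetical, and the rel values mw:WikiLink and mw:ExtLink are the ones visible in the header HTML earlier on this page.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect link targets from Parsoid-style HTML, grouped by link type."""

    def __init__(self) -> None:
        super().__init__()
        self.wiki_links = []
        self.external_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = (attributes.get("rel") or "").split()
        if not href:
            return
        if "mw:WikiLink" in rel:
            self.wiki_links.append(href)
        elif "mw:ExtLink" in rel:
            self.external_links.append(href)


# Shortened version of the header HTML shown earlier.
html = (
    '<p>Links to <a href="/wiki/User:Spage" title="User:Spage" '
    'rel="mw:WikiLink">User:spage</a>, '
    '<a rel="mw:ExtLink" '
    'href="https://bugzilla.wikimedia.org/show_bug.cgi?id=NNNN">bug NNNN</a></p>'
)

extractor = LinkExtractor()
extractor.feed(html)
```

The two lists are the kind of reference data that would then be handed to core.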