Topic on Talk:EventStreams/Flow

Server Side Filtering

Ottomata

At launch, EventStreams does not support any server-side filtering. RCStream supported server-side filtering by wiki project, so you could consume RecentChanges from only the specific wikis you were interested in.

Is server side filtering, by wiki or arbitrary event field, useful to you?

See also: T152731

Ricordisamoa

Yes. If you plan to deprecate RCStream, you should at least offer server-side filtering by wiki (single, multiple, wildcard, etc.).

Ottomata

Ricordisamoa, we could do this, but it is a little more difficult in EventStreams than RCStream, because the service (intentionally) doesn't know anything about the structure of the events it is serving. So, if we were to offer server side filtering, we'd need to do it in a generic way, so that pretty much any event field could be used for filtering.

We'd like to hear about your (and others') use cases where a simple client-side filter isn't good enough. E.g.

eventSource.onmessage = function(event) {
    // event.data will be a JSON string containing the message event.
    // Parse it into a separate variable so the handler argument is not shadowed.
    var change = JSON.parse(event.data);
    // only print eswiki events
    if (change.wiki == 'eswiki') {
        console.log(change);
    }
};
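
The same client-side approach can probably be stretched to cover the "single, multiple, wildcard" cases requested above. What follows is only a rough sketch, not something EventStreams provides: `makeWikiFilter` and `onFilteredMessage` are hypothetical helper names, and the `*` wildcard syntax is an assumption made for illustration. The only event field it relies on is `wiki`, as in the snippet above.

```javascript
// Hypothetical helper: build a predicate from wiki name patterns such as
// 'eswiki' or '*wiktionary'. The '*' wildcard syntax is an assumption.
function makeWikiFilter(patterns) {
    var regexes = patterns.map(function (pattern) {
        // Escape regex metacharacters, then let '*' match any run of characters.
        var escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        return new RegExp('^' + escaped.replace(/\\\*/g, '.*') + '$');
    });
    return function (change) {
        return typeof change.wiki === 'string' && regexes.some(function (re) {
            return re.test(change.wiki);
        });
    };
}

// Hypothetical helper: parse each message and pass it to `handler`
// only when the predicate `keep` accepts it.
function onFilteredMessage(eventSource, keep, handler) {
    eventSource.onmessage = function (event) {
        var change = JSON.parse(event.data);
        if (keep(change)) {
            handler(change);
        }
    };
}
```

With an existing `eventSource`, something like `onFilteredMessage(eventSource, makeWikiFilter(['eswiki', '*wiktionary']), console.log)` would print only the matching events. The client still receives and parses every event, so this does nothing for bandwidth.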

Before we embark on implementing arbitrary event field server side filtering (which will open us up to API bikeshedding, feature set bikeshedding, and possible server side scaling implications), we'd like to know for sure if server side filtering is really needed. Maybe there are tools out there that really really only want to know about very small wikis. In that case, without server side filtering, they'll have to consume way more data than they use. If something like that is really common, we'll prioritize all those bikeshed discussions.

Ricordisamoa

Of course there is nothing a server-side filter can do that can't be done via a client-side filter. However, since most edits are made by Wikidata bots, even a client analyzing human edits in not-so-small wikis is going to consume a lot of useless data. You may want to cap filters to prevent abuse (e.g. give me all non-bot edits in odd namespaces on English sites whose summary contains "new section" etc.)
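
For illustration, the example filter in parentheses might look roughly like this when written as a client-side predicate. It is a sketch under assumptions: the field names `bot`, `namespace` and `comment` are assumed to exist on the events alongside `wiki` and should be checked against the actual event schema, `isInterestingEdit` is a made-up name, and treating "English sites" as wikis whose database name starts with `en` is a simplification.

```javascript
// Hypothetical predicate: non-bot edits in odd namespaces on English
// sites whose summary contains "new section".
// Assumed event fields: bot, namespace, wiki, comment.
function isInterestingEdit(change) {
    return change.bot === false &&
        // Odd, positive namespace numbers; negative virtual namespaces never match.
        change.namespace % 2 === 1 &&
        // Simplification: any wiki whose name starts with "en".
        /^en/.test(change.wiki) &&
        typeof change.comment === 'string' &&
        change.comment.indexOf('new section') !== -1;
}
```

A server-side filter language able to express this would need boolean, arithmetic, prefix and substring tests on arbitrary fields, which is where capping filters comes in.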

Xqt

The main problem with client-side filtering is the large number of events the client has to handle. I tested this in the past with RCStream: the client scripts quickly built up a growing backlog of unprocessed events, and I had to give up on the idea of catching them all. I am now running a long-term test with EventStreams, and it seems to work without the event buffer overflowing. Great work, and thanks for it.
