Architecture meetings/RFC review 2013-12-04

Wednesday, December 4, 2013 at 10:00 PM UTC in #wikimedia-meetbot
Requests for Comment to review

Propose your own RFCs:
Requests for comment/Simplify thumbnail cache
Requests for comment/Structured logging
Requests for comment/Json Config pages in wiki (if it's in a stable enough state for discussion)
Summary and logs

Meeting summary

Meeting started by MaxSem at 22:01:28 UTC (full logs).
1. https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04 (TimStarling, 22:02:40)
RFC: Simplify thumbnail cache (TimStarling, 22:05:57)
1. https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache (TimStarling, 22:06:04)
2. https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes (paravoid, 22:19:35)
3. ACTION: AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage (TimStarling, 22:41:57)
4. option 5 generally favoured, possibly with modifications, we will proceed with design work on it (TimStarling, 22:43:57)
5. ACTION: bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (TimStarling, 22:44:18)
RFC: Structured logging (TimStarling, 22:45:45)
1. https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging (TimStarling, 22:45:58)
2. ACTION: ori-l to expand RFC (TimStarling, 22:59:49)
3. https://github.com/mhart/gelf-stream (gwicke, 23:00:41)
4. JSON generally favoured as long as a plain text format can be also made available (TimStarling, 23:00:50)
5. transport selection based on URI-style destination string (TimStarling, 23:01:21)
Meeting ended at 23:05:10 UTC (full logs).
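
As a rough illustration of item 5 under "RFC: Structured logging" (transport selection based on a URI-style destination string), the Python sketch below parses strings such as the `zmq://dest.eqiad.wmnet?loglevel=warn` example given in the log. The function names, the level table and the returned dictionary layout are assumptions made for illustration. They are not MediaWiki code, and the meeting did not settle on any of them.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical numeric ordering of log levels (an assumption, not from the RFC).
LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def parse_destination(dest):
    """Split a destination string into transport, target and minimum level."""
    parts = urlsplit(dest)
    if not parts.scheme:
        # No scheme: treat the string as a plain file path.
        return {"transport": "file", "target": dest, "min_level": LEVELS["debug"]}
    query = parse_qs(parts.query)
    # Default to the most verbose level when no loglevel parameter is given.
    level_name = query.get("loglevel", ["debug"])[0]
    return {
        "transport": parts.scheme,
        "target": parts.netloc + parts.path,
        "min_level": LEVELS[level_name],
    }


def accepts(destination, level_name):
    """Return True if a message at level_name should go to this destination."""
    return LEVELS[level_name] >= destination["min_level"]
```

With this shape, a destination like `zmq://foo/topic` yields the transport `zmq` and the target `foo/topic`, and a per-destination minimum level answers the question of how a URI could restrict a logger to warnings or errors.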
Action items

AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (Done)
ori-l to expand RFC
Action items, by person

AaronSchulz
1. AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
2. bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (Done)
bd808
1. bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (Done)
ori-l
1. ori-l to expand RFC
People present (lines said)

TimStarling (90)
paravoid (74)
gwicke (44)
AaronSchulz (44)
bd808 (42)
ori-l (36)
parent5446 (24)
aude (15)
RoanKattouw (8)
bawolff (6)
MaxSem (5)
subbu (3)
meetbot-wm (3)
Krinkle (2)
Generated by MeetBot 0.1.4.
Full log

Meeting logs
22:01:28 <MaxSem> #startmeeting
22:01:28 <meetbot-wm> Meeting started Wed Dec  4 22:01:28 2013 UTC.  The chair is MaxSem. Information about MeetBot at https://bugzilla.wikimedia.org/46377.
22:01:28 <meetbot-wm> Useful Commands: #action #agreed #help #info #idea #link #topic.
22:01:37 <MaxSem> #chair TimStarling
22:01:37 <meetbot-wm> Current chairs: MaxSem TimStarling
22:01:45 <parent5446> Ah there we go
22:02:02 <MaxSem> yay, I hacked a bot!:P
22:02:34 <TimStarling> ok, so there are 3 RFCs on the wiki page
22:02:40 <TimStarling> #link https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04
22:03:11 <TimStarling> do we have people here who want to talk about them, and are there any others that those present want to add?
22:03:29 <bd808> Ori would like to request that the logging rfc be "not first" as he is AFK until 22:30Z
22:04:03 * aude waves :)
22:04:21 <TimStarling> well, we have you and paravoid, we could talk about "Simplify thumbnail cache"
22:04:30 <paravoid> indeed
22:04:34 <paravoid> that's why I'm here :)
22:04:48 <TimStarling> ah, and there's the third author
22:04:49 <paravoid> and now AaronSchulz too :)
22:05:33 <bd808> Sounds good to me
22:05:56 <paravoid> so, bd808 since you proposed this for discussion (and wrote all the text :)), do you want to take point?
22:05:57 <TimStarling> #topic RFC: Simplify thumbnail cache
22:06:04 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
22:06:32 <bd808> I mostly just collected notes from paravoid and AaronSchulz :)
22:06:46 <bd808> But sure.
22:06:55 <TimStarling> bd808: which is your preferred option?
22:07:03 <paravoid> is the problem statement clear enough to everyone?
22:07:14 <gwicke> pretty clear to me
22:07:15 <parent5446> So basically we want to get thumbnails off of Swift.
22:07:29 <bd808> And make purges easier
22:07:41 <aude> as long as we can still generate thumbnails of arbitrary size ( on cache miss), it seems fine
22:07:50 <aude> they don't have to be stored forever
22:08:23 <TimStarling> well, I don't think bd808 does want thumbnails off of swift, based on his talk page comments
22:08:39 <gwicke> is there an implementation of the purge pattern match already?
22:09:01 <RoanKattouw> The RFC text suggests that thumbs would move off of Swith
22:09:03 <RoanKattouw> *swift
22:09:09 <RoanKattouw> "3. Configure MediaWiki imagescalers to stop storing generated thumbnails in Swift"
22:09:21 <AaronSchulz> right
22:09:22 <aude> what exactly are the imagescalers (excuse my ignorance)
22:09:34 <bd808> gwicke: There is not an implementation yet, but AaronSchulz has been dying to start working on that
22:09:35 <AaronSchulz> RoanKattouw: I think some would stay for a while though
22:09:41 <aude> in this context*
22:09:51 <AaronSchulz> like media that supports pages and can have many thumbs for one file version
22:09:56 <bd808> I think that "most" would move off of swift.
22:09:59 <gwicke> I would imagine the idea is something like hashing different thumbs to the same cache entry, and then vary on the size?
22:10:03 <RoanKattouw> aude: They are Apache machines dedicated to image scaling
22:10:05 <paravoid> aude: mediawiki application servers that scale uploaded content to thumbnails per request
22:10:06 <AaronSchulz> if we don't use vcl_hash tricks on those, they will have to work the old fashioned way
22:10:11 <aude> RoanKattouw: paravoid thanks
22:10:15 <MaxSem> what about thumbs that are extremely slow to render?
22:10:16 <AaronSchulz> until they get refactored somehow or something
22:10:22 <parent5446> On the note of the imagescalers, do we know if they can handle that 5x increase in utilization?
22:10:25 <TimStarling> the problem is infinite growth of thumbnail storage
22:10:35 <RoanKattouw> HTTP request for nonexistent thumb comes in, thumb is generated locally, stored, HTTP response with thumb goes out
22:10:37 <TimStarling> MaxSem: the thing that is slow is the fetch of the original
22:10:44 <bd808> AaronSchulz has pointed out that some media types are very time consuming to extract thumbs from and should probably be retained in durable storage.
22:10:47 <AaronSchulz> MaxSem: we use ssds in varnish (not just memory cache)
22:10:51 <TimStarling> the actual image scaling part is pretty fast, and can easily be scaled up
22:11:17 <TimStarling> so that should answer parent5446's question also
22:11:23 <parent5446> Yep, thanks.
22:11:27 <AaronSchulz> we also have some simple "ping limiting" in place for thumb.php
22:11:33 <TimStarling> yes, we can absolutely scale 5x as many images, but we can't fetch the originals that fast
22:11:38 <aude> bd808: what about having some fixed size thumbnails for some stuff?
22:11:46 * aude thinking of gigantic tiffs and videos
22:11:48 <AaronSchulz> to avoid too much LRU churn or wasted I/O and CPU from trolling a bit
22:11:57 <bd808> The new MediaViewer feature has shown that generating everything on the fly may be slower than people are used to.
22:11:58 <aude> then stuff can be scaled from those?
22:11:58 <paravoid> TimStarling: why do you think so?
22:12:13 <AaronSchulz> bd808: new thumbnail sizes?
22:12:40 <paravoid> bd808: could you talk a little about that? I haven't heard anything and this sounds interesting
22:12:40 <AaronSchulz> we are also still replicated writes across a DC in a synchronous manner that I can't stand
22:12:43 <bd808> aude: That would be a possibility and actually the subject of https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:12:43 <TimStarling> paravoid: why do I think we can't fetch originals that fast?
22:12:46 <AaronSchulz> *replicating
22:12:52 <aude> bd808: i know
22:12:56 <AaronSchulz> bd808: that doesn't help ;)
22:12:58 <TimStarling> paravoid: I thought there was a comment from you to that effect
22:13:16 <AaronSchulz> aude: we do something like that with TMH
22:13:32 <AaronSchulz> if two different sized thumbs are requested for the same time position of a video
22:13:32 <bd808> MediaViewer asks for new thumb sizes
22:13:40 <AaronSchulz> a reference frame is used for scaling
22:13:40 <aude> it's contrary to allowing arbitrary sized, but maybe certain cases it makes sense to have special handling for some types of files
22:14:09 <aude> AaronSchulz: makes sense
22:14:11 <bd808> Performance is getting better with some changes made by the team, but initial testing was found to be 2-5 seconds for many thumbs to generate
22:14:29 <TimStarling> we can start the image scaling at parse time
22:14:39 <bd808> The changes they have made are basically to "bucket" thumb sizes
22:15:37 <bd808> I really like TimStarling's idea of adding a 3rd layer of varnish
22:15:45 <paravoid> I didn't understand the bucket thumb sizes part
22:16:17 <gwicke> we could also consider generating small thumbs from a smaller standard thumb size
22:16:32 <parent5446> "4. Store "standard" thumbnails permanently and others with TTL (and possibly last use updating)"
22:16:36 <parent5446> Also something worth considering
22:16:46 <gwicke> that would also help IO
22:17:01 <gwicke> and should be faster for video thumbs too
22:17:10 <paravoid> videoscaling is not part of this discussion
22:17:12 <aude> gwicke: essentially what i tried to say
22:17:22 <paravoid> or video thumbs
22:17:25 <bd808> paravoid: The original extension used the screen width of the browser to call for a thumb. Now they are using a series of sizes (histogram basically) and calling for the size closest to the screen width
22:17:25 <bawolff> I assume if we do the three layers of varnish thing, we would increase the max cache time?
22:17:29 <AaronSchulz> video thumbs already do that in TMH and indeed are not part of the rfc
22:17:45 <gwicke> k
22:18:15 <bd808> bawolff: I would guess that the 3rd layer would be backed by spinning disk and use LRU eviction based on sapce
22:18:22 <bd808> *space
22:18:37 <paravoid> and TTLs, and manual PURGEs
22:18:55 <paravoid> it does add some complexity, though.
22:18:56 <AaronSchulz> if you store standard sizes the URLs to purge are known (thus don't need swift)
22:19:11 <AaronSchulz> though changing the standard sizes would require running a one-off script
22:19:20 <paravoid> so, the standard sizes is a separate discussion
22:19:25 <paravoid> there is a separate RFC
22:19:29 <paravoid> it's very much related, though.
22:19:35 <paravoid> https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:19:52 <AaronSchulz> and may be useful for the file types exempt from the bucketing
22:20:00 <TimStarling> bd808: it sounds like this MediaViewer extension needs some wider review
22:20:02 <gwicke> I was just considering it as an option for speeding up generation of non-standard thumbs
22:20:14 <gwicke> especially for speeding up the IO part of that
22:20:34 <bd808> TimStarling: I'm sure they would welcome feedback. Brion has been giving them some attention.
22:20:52 <paravoid> about the IO: storing millions of tiny files in spindles is very inefficient
22:20:56 <TimStarling> where will MediaViewer be used?
22:21:16 <bd808> It is currently deployed to all wikis I believe.
22:21:24 <paravoid> I don't expect Varnish to change that much, although it would change the fact that we are not going to store 3 copies
22:21:26 <TimStarling> on what pages is it activated?
22:21:36 <paravoid> TimStarling: it's part of the new "beta features" thing
22:21:37 <TimStarling> page views or image description pages?
22:21:41 <AaronSchulz> The main thing with this rfc is about having run-of-the-mill jpgs/pngs stored only in varnish and totally LRU and I wouldn't see much benefit to use reference thumbnails for that
22:21:46 <bawolff> It's hidden behind a preference
22:21:54 <paravoid> TimStarling: so you have to explicitly enable it as an experimental feature
22:22:02 <Krinkle> So it's deployed everywhere, but opt-in via beta preferences. It is exposed by clicking on an image thumb after enabling it.
22:22:04 <TimStarling> yes, and after you enable it, where does it appear?
22:22:08 <AaronSchulz> paravoid: that's how it always starts :)
22:22:10 <bd808> TimStarling: It's in the "beta features" program
22:22:18 <TimStarling> thanks Krinkle
22:22:22 <gwicke> AaronSchulz: if the IO portion of scaling can handle that, then that would certainly be simpler
22:22:29 <Krinkle> TimStarling: I had trouble discovering it as well, because we're trained to think that clicking a thumb opens the file page :)
22:22:56 <paravoid> gwicke: handle what?
22:23:02 <gwicke> AaronSchulz: but if IO becomes a bottleneck then reference thumbnails (even a single 1024x1024 bounding box one) could help a lot
22:23:07 <paravoid> sorry, lost in the subthreads of this discussion :)
22:23:29 <gwicke> paravoid: handle potential spikes in miss rates
22:23:57 <gwicke> in case varnish machines go down, there is a deploy issue or the like
22:24:29 <paravoid> so your preference seems to be alternative strategy (5), correct?
22:24:48 <bd808> Failure tolerance and (ab)use of vcl_hash I think are the big open questions with any of the schemes
22:25:03 <bd808> paravoid: personally I like (5) the best
22:25:18 <TimStarling> well, regarding vcl_hash, there is the secondary key feature mentioned
22:25:22 <paravoid> bd808: except the "implementing LRU in a Swift middleware" schemes
22:25:24 <gwicke> paravoid: a single thumb could also live in swift
22:25:32 <TimStarling> which might be "months" away, which doesn't sound so long to wait really
22:25:48 <paravoid> varnish 4.0 technology preview 1 got released... today
22:25:49 <gwicke> not sure that it would need to be LRUed
22:25:51 * AaronSchulz doesn't really get 5
22:25:57 <bawolff> Having 1 htcp packet purge everything sounds really nice
22:26:15 <paravoid> I haven't checked if it includes surrogate keys, though, and a deployment within the WMF is many months away indeed.
22:26:29 <gwicke> so close to 3) combined with the vcl_hash proposal
22:26:44 <TimStarling> AaronSchulz: 5 was my suggestion on the talk page
22:26:52 <bd808> LRU in swift is good but there was some question as to the performance of swift in deleting files
22:26:52 <bd808> I think you actually raised that paravoid ?
22:27:00 <TimStarling> AaronSchulz: follow the ref link
22:27:27 <paravoid> yes, as an open question, not as a known issue
22:27:57 <TimStarling> also, I am not sure if list traversal in a vcl_hash scheme is really worth worrying about
22:28:15 <TimStarling> there are two ways to look at the performance of it: throughput and latency
22:28:21 <paravoid> to be clear, we are excluding TIFF/PDF/Djvu from this discussion, correct?
22:29:07 <TimStarling> throughput: multiply the *mean* number of thumbnails per source by the time per link traversal
22:29:21 <AaronSchulz> paravoid: pretty much
22:29:23 <TimStarling> now, the mean is not large, you don't need to worry about djvu/pdf for that
22:29:37 <TimStarling> maybe for a djvu with 1000 pages it might take 1ms to traverse
22:29:45 <TimStarling> but that doesn't impact the throughput very much
22:29:53 <TimStarling> the other way to look at it is latenc
22:29:54 <TimStarling> y
22:30:03 <bawolff> Why exclude tiff. Tiff with many pahes are very rare
22:30:10 <paravoid> we have tons of pdf/djvus with hundreds of pages * 5 thumbnails per page
22:30:10 <bawolff> *pages
22:30:18 <TimStarling> then you ask: what is the largest possible number of thumbnails on a given image and will that add user-visible latency to requests for that image?
22:30:34 <AaronSchulz> I think they could be added if it's fine on average, but they were to be excluded in first phases
22:30:46 <TimStarling> the limit there would be say 50ms of latency
22:30:54 <paravoid> that's an interesting approach, TimStarling
22:31:46 <TimStarling> I would expect linked list traversal in phk's style of C to be pretty fast
22:31:50 <gwicke> is there a need to have all entries end up on a single backend varnish?
22:31:56 <TimStarling> like, a lot less than a microsecond
22:31:59 <bd808> With the current application logic the upper bound is something like the width of the original image.
22:32:08 <gwicke> the purge requests are going to all varnishes I guess
22:32:12 <paravoid> I think it was mark who was mostly concerned about that, I don't have counterarguments.
22:32:47 <bd808> Actually width * number of pages I suppose. Do we vary on other dimensions?
22:32:49 <AaronSchulz> TimStarling: so 5 is just vcl_hash+second cache layer to deal with those eviction issues, OK
22:33:26 <TimStarling> AaronSchulz: yes
22:33:41 <paravoid> "second", but yes :)
22:33:50 <AaronSchulz> I was confused at first since I thought it was a complete alternative
22:33:55 <bawolff> Bd808: not normally. Svg has language, tiff has lossless/lossy
22:33:57 <paravoid> I'd say "additional, spindle-backed"
22:34:13 <AaronSchulz> miser! :p
22:34:39 <TimStarling> AaronSchulz: third cache layer, really
22:34:44 <gwicke> so can't the variants for a single thumb be distributed across several backends to limit request latency?
22:35:01 <paravoid> TimStarling: or fourth, for esams/ulsfo
22:35:08 <bd808> memory -> ssd -> disk -> scaler
22:35:09 <paravoid> let's stop counting cache layers, though :)
22:35:10 <TimStarling> yeah
22:35:30 <paravoid> gwicke: we'd have to write a custom director for this
22:35:33 <AaronSchulz> are you counting frontend+backend varnish (e.g. CARP)?
22:35:40 <AaronSchulz> I assume swift would not be part of this
22:35:45 <paravoid> the current ones are "random", "wrr" and "chash" (which mark wrote)
22:35:48 <paravoid> it's not rocket science
22:35:50 <gwicke> paravoid, would that be difficult?
22:36:00 <bd808> Swift would only be used in (5) to fetch originals
22:36:10 <AaronSchulz> right, but not a cache layer
22:36:10 <TimStarling> wouldn't chash do it already?
22:36:22 <gwicke> probably depends on what you feed into the hash
22:36:45 <paravoid> well, yeah, I guess you could hack it up by appending a random replica number to the URL in vcl_hash
22:36:47 <gwicke> if that can be manipulated in VCL, then it might be relatively simple
22:37:06 <paravoid> it's a bit ugly, though, but either way, possible
22:37:07 <AaronSchulz> so we are over halfway through this meeting just to note
22:37:18 <TimStarling> paravoid: by variants, gwicke means thumbnail sizes, right?
22:37:24 <TimStarling> which are already in the URL
22:37:32 <gwicke> TimStarling, yes
22:37:43 <TimStarling> AaronSchulz: well, people seemed to take a while to warm up
22:37:50 <gwicke> they'd map to the same linked variant chain though
22:37:56 <gwicke> in storage
22:37:57 <TimStarling> it seems like the longer we run with it, the faster we make progress
22:38:10 <AaronSchulz> not saying we need to stop, just noting the time
22:38:19 <gwicke> but that's in the backend
22:38:49 <gwicke> if chash is purely url-based (which it is afaik), then we should already get a quasi-random distribution across backends
22:39:01 <paravoid> correct
22:39:05 <gwicke> so latency might not be that bad
22:39:06 <paravoid> I understood something different, I'm sorry.
22:39:12 <TimStarling> ok, so paravoid, what do you think of option 5?
22:39:23 <TimStarling> we need some conclusions and action items now
22:39:24 <AaronSchulz> vcl_hash + extra cache layer, starting with png/jpg and doing others later sounds reasonable?
22:39:52 <TimStarling> is anyone against option 5?
22:39:59 <gwicke> fine with me
22:40:02 <paravoid> I'm okay with option 5
22:40:03 <paravoid> but
22:40:14 <paravoid> I think we might need to consider just expanding the existing cache layer
22:40:14 <gwicke> although I could also live with storing a handful standard sizes in swift
22:40:24 <gwicke> at least one 'large screen size' thumb
22:40:31 <AaronSchulz> paravoid: right
22:40:40 <TimStarling> ok, well either way, we need the same MW support
22:40:44 <paravoid> SSDs are getting cheaper these days, it might not be worth it
22:40:51 <paravoid> nod, either way it doesn't matter much
22:40:59 <AaronSchulz> do we care if vcl_hash puts more hot thumbnails on single boxes?
22:41:02 <TimStarling> MW needs to be adapted to stop storing thumbnails, to just stream them out instead
22:41:26 <TimStarling> who will plan that? AaronSchulz?
22:41:29 * AaronSchulz is open to playing around with the hash since the htcp stream hits everything anyway, they'd still get the purges (as noted)
22:41:57 <TimStarling> #action AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
22:42:07 <AaronSchulz> TimStarling: it would be a config switch I always assumed
22:42:14 <bd808> Will this need to be a feature flag option or can core change unilaterally?
22:42:17 <TimStarling> easy action for you then
22:42:23 <paravoid> mediawiki streams out thumbnails now anyway
22:42:28 <AaronSchulz> I also want it to send a header for vcl to use to determine the hash
22:42:37 <paravoid> it just stores them too
22:42:39 <AaronSchulz> I don't want some ugly regexes in vcl trying to look for thumbs
22:42:49 <AaronSchulz> it would be cleaner for the vcls to look for a custom header IMO
22:43:00 <paravoid> that's not possible I'm afraid
22:43:08 <paravoid> vcl_hash is called on the request path, not the response path
22:43:18 <bd808> AaronSchulz: It has to match the request URL right?
22:43:33 * bd808 doesn't type as fast as paravoid
22:43:42 <AaronSchulz> paravoid: hmm, right
22:43:57 <TimStarling> #info option 5 generally favoured, possibly with modifications, we will proceed with design work on it
22:44:09 <paravoid> thank you TimStarling
22:44:18 <TimStarling> #action bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer
22:44:35 * bd808 nods
22:44:53 <paravoid> do we need an action item for mediawiki to treat PDF/Djvu in a different way?
22:45:03 <paravoid> or is this part of the previous "stream out" action?
22:45:10 <TimStarling> paravoid: you can put notes on the talk page about that
22:45:20 <paravoid> okay
22:45:22 <TimStarling> we have time for a very quick look at one other RFC
22:45:32 <paravoid> ori-l just joined :)
22:45:36 <paravoid> right on time
22:45:37 <parent5446> Ori's here so we can briefly look at logging.
22:45:45 <TimStarling> #topic RFC: Structured logging
22:45:58 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging
22:46:04 <AaronSchulz> csteipp, ori-l: http://pastebin.com/phDgyNHi
22:46:16 <gwicke> +1 on using JSON
22:46:27 <ori-l> AaronSchulz: thanks
22:47:00 <bd808> gwicke: I looked at other alternatives but json seemed the clear winner
22:47:05 <parent5446> OK, so my one question with this is why we need to specify our own serialization format for logs. Maybe it'd be nice to have a "MediaWiki serialization format", but at the same time our logging system should be open to whatever format the sysadmin wants to output into.
22:47:24 <RoanKattouw> ori-l: This looks sweet
22:47:32 <ori-l> RoanKattouw: it's bd808's!
22:47:35 <gwicke> bd808, https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Structured_logging#We_are_considering_a_similar_approach_for_Parsoid_36348
22:47:48 <ori-l> parent5446: I mostly agree, but JSON also constrains the type of data you can emit
22:47:49 <RoanKattouw> Would the recorded fields like vhost and ip be extensible? On the WMF cluster I'd like to add XFF, for instance
22:47:58 <TimStarling> would this have multiple backends? structured and plain text?
22:48:02 <RoanKattouw> (Had to hack that up manually not too long ago to debug 127.0.0.1 problems)
22:48:09 <paravoid> +1 to a modular approach
22:48:13 <parent5446> That's why I proposed we use something like monolog.
22:48:15 <bd808> RoanKattouw: yes. It should be extensible
22:48:27 <parent5446> It allows us to add our JSON format, while also supporting literally everything else.
22:48:34 <bd808> I would actually support monolog as well
22:48:40 <MaxSem> <3 the greppability of plaintext
22:48:40 <ori-l> TimStarling: you could have a PlainTextLogEmitter that munges the array into something human-readable
22:48:47 <TimStarling> ori-l: yeah
22:49:01 <ori-l> a la getTraceAsString
22:49:18 <parent5446> Actually, monolog already has a JsonFormatter.
22:49:27 <TimStarling> regarding "Live exception object to be stringified by the log event emitter"
22:49:30 <parent5446> We'd just need to use a Processor to put in the data we want
22:49:31 <gwicke> it is pretty simple to write a json grepper I guess
22:49:33 <bd808> The important part is keeping the log records structured internally until the emitter is reached
22:49:46 <paravoid> gwicke: jq
22:49:47 <TimStarling> do you mean Exception::__toString() or something else?
22:50:01 <paravoid> gwicke: https://github.com/stedolan/jq
22:50:10 <ori-l> presumably the exception object itself
22:50:11 <gwicke> paravoid, oh, nice
22:50:14 <paravoid> (sorry, not relevant to this discussion)
22:50:33 <bd808> TimStarling: It's an implementation detail. Ideally the formatting of the exception would be left up to the output formatter
22:50:58 <ori-l> the thing that I wanted to flag actually is that we have two subsystems that half-implement the concept of pluggable logging backends
22:51:17 <TimStarling> you mean json_encode($exception)?
22:51:22 <TimStarling> I'm not sure that would work
22:51:31 <ori-l> TimStarling: we already have json-encoded exceptions in core
22:51:35 <TimStarling> some exception objects will have references to massive parents
22:51:41 <paravoid> forgive me for the naive question, is this the discussion for the logging format (plain, json, ...), the transport (udp2log, syslog, gelf, ...), or both?
22:51:56 <ori-l> TimStarling: see exception-json.log on fluorine :P
22:52:06 <TimStarling> I'll file a bug
22:52:15 <ori-l> TimStarling: we redact those from the JSON log
22:52:17 <parent5446> paravoid: the RFC focuses on format, but ideally we should replace our entire logging system
22:52:31 <ori-l> I'm not sure a bug is warranted
22:52:40 <ori-l> but anyways, to finish my point: there's wfDebugLog & co., which recognize udp://, tcp://, and file paths
22:52:40 <bd808> Here's an example of monolog logging an exception: http://pastebin.de/37759
22:52:45 <TimStarling> I assumed it would be 0mq
22:52:49 <TimStarling> since it is ori-l writing it
22:52:52 <parent5446> (Also, I know I'm evangelizing monolog here, but it also cooperates with exception workflow.)
22:52:56 <parent5446> :P
22:53:05 <ori-l> and there's the recent change stream implementation that vvv wrote
22:53:37 <ori-l> the latter lets you specify an emitter class
22:53:38 <gwicke> we should also consider logs from non-PHP services
22:53:45 <ori-l> i wrote a redis one as a way of trying out the API, it's in core too
22:53:52 <ori-l> we should consolidate all of these, obviously
22:53:58 <TimStarling> UDP is sucky lazy rubbish
22:54:05 <AaronSchulz> heh
22:54:10 <ori-l> and make recent changes be a special case of logging
22:54:16 <TimStarling> asynchronous messaging on the cheap
22:54:18 <bd808> gwicke: Unifying across languages would be nice.
22:54:20 <gwicke> if we can agree on a standard set of keys for stuff like host name etc, then those can directly tie into the same infrastructure
22:54:31 <TimStarling> if you have an asynchronous messaging system that isn't prone to losing its messages, why not use it?
22:54:31 <ori-l> cf rcfeeds/RedisPubSubFeedEngine.php for an example
22:54:42 <TimStarling> syslog is ridiculously old and crusty and limited
22:54:57 <TimStarling> like 1024 byte packet limit, and integer facility fields
22:55:01 <parent5446> ori-l: monolog also has a Redis handler
22:55:03 <ori-l> TimStarling: I agree, but I think this is the uninteresting part of the problem
22:55:14 <paravoid> I think we need to split those two discussions
22:55:18 <ori-l> if you have pluggable backends people who love UDP can use UDP
22:55:22 <paravoid> it can be the same RFC
22:55:38 <paravoid> but split the parts of "which format" from "which transport"
22:55:40 <TimStarling> you know that I have mostly driven the adoption of UDP at WMF
22:55:47 <TimStarling> that is because I am lazy and cheap
22:56:17 <TimStarling> and because the queueing options at the time I started were not as good as they are now
22:56:28 <ori-l> we won't use UDP
22:56:49 <paravoid> we could use AMQP, or 0mq, or even Kafka.
22:57:03 <ori-l> can we rely on URL prefixes for dispatcher configuration?
22:57:13 <paravoid> but first agree on the format? :)
22:57:17 <ori-l> this would be consistent with wfDebugLog, the PHP stream API
22:57:29 <ori-l> and partly with the existing RC implementation
22:57:49 <TimStarling> ori-l: yeah, should work
22:57:51 <ori-l> i.e.: $wgLogHandlers[] = "zmq://foo/topic"
22:58:02 <gwicke> is everybody on board with the choice of JSON?
22:58:05 <parent5446> Rather than discussing WMF-specific logging implementations, we should first establish how we'd incorporate a structured logging system.
22:58:12 <parent5446> Where would the loggers go?
22:58:14 <parent5446> In ContextSource?
22:58:29 <TimStarling> gwicke: no, I am in favour of dual logging of JSON and plain text
22:58:31 <parent5446> Whether it's JSON or whatever comes after we have the modular system in place.
22:58:47 <paravoid> unstructured json?
22:58:52 <gwicke> TimStarling: it seems to be easy enough to convert JSON to plain
22:59:00 <ori-l> I propose we limit ourselves to the set of types available in JSON
22:59:03 <paravoid> or an existing structure, like gelf?
22:59:09 <ori-l> but that we make the actual serialization format configurable
22:59:12 <bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
22:59:19 <parent5446> ori-l: Agreed with this idea. It makes modular dispatching easier.
22:59:30 <ori-l> actually, maybe that's not a good idea
22:59:44 <ori-l> maybe you should just pass off to the serializer the richest possible objects you have
22:59:49 <TimStarling> #action ori-l to expand RFC
22:59:50 <gwicke> I'd be in favor of standardizing on something
22:59:53 <paravoid> hehe
22:59:56 <paravoid> gwicke: http://www.graylog2.org/gelf#specs ?
22:59:58 <AaronSchulz> ;)
23:00:07 <paravoid> gwicke: and https://github.com/robertkowalski/gelf-node I guess ;)
23:00:18 <gwicke> paravoid, we are considering https://github.com/trentm/node-bunyan
23:00:23 <gwicke> has a gelf backend too it seems
23:00:41 <gwicke> https://github.com/mhart/gelf-stream
23:00:43 <ori-l> if you have the proper abstractions in place implementing backends is trivial, right?
23:00:50 <TimStarling> #info JSON generally favoured as long as a plain text format can be also made available
23:01:03 <ori-l> i mean, log messages are messages, and message queues tend to provide good APIs for queueing messages
23:01:21 <TimStarling> #info transport selection based on URI-style destination string
23:01:29 <ori-l> weeee
23:02:18 <parent5446> URI-based selection might not be the best idea. What if you want a logger to only log certain levels, i.e., warnings or errors?
23:02:23 <ori-l> TimStarling: maybe as a final action-item, agree to the consolidation of RC logging with logging in general?
23:02:33 <TimStarling> <parent5446> Where would the loggers go?
23:02:33 <TimStarling> <parent5446> In ContextSource?
23:02:41 <TimStarling> parent5446: I suggest you comment on the RFC talk page
23:02:59 <parent5446> OK, will do that now.
23:02:59 <ori-l> parent5446: zmq://dest.eqiad.wmnet?loglevel=warn
23:03:03 <bd808> <bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
23:03:12 <parent5446> ori-l: Ah that works I guess
23:03:19 <TimStarling> ori-l: put it on the RFC
23:03:23 <aude> ori-l: i think RC is a separate consideration, maybe worth own rfc
23:03:38 * aude at least needs more details
23:03:42 <subbu> i've used log4j in other contexts which has notions of formatter, target (file, socket, console, etc.) and log-level (warn, info, debug, etc.) which can all be configured. this proposal seems similar by separating those concerns.
23:03:59 <gwicke> parent5446, re levels: I think that is both a source and consumer concern; the source selects the min level to send, while the consumer can further filter based on the level in the message
23:04:04 <TimStarling> ok, we are out of time now, please put your ideas on the RFC talk page if possible
23:04:10 * ori-l nods
23:04:28 <paravoid> thank you TimStarling for chairing.
23:04:30 <bd808> Thanks for all the great feedback
23:04:36 * subbu paged into the window rather late ..
23:04:45 * subbu will read scrollback and post on talk page
23:04:48 <ori-l> TimStarling: what bug were you going to file?
23:05:10 <TimStarling> #endmeeting
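
The logging discussion above converged on keeping log records structured internally until an emitter is reached, with JSON output plus a plain text format for grepping. A minimal Python sketch of that separation follows. It is only an illustration under assumptions: the record fields (`channel`, `level`, `message`, `context`) and the function names are hypothetical and are not taken from the RFC, monolog or MediaWiki.

```python
import json


def make_record(channel, level, message, **context):
    """Build a structured log record; no string formatting happens here."""
    return {"channel": channel, "level": level, "message": message, "context": context}


def emit_json(record):
    """Serialize the full record as a single-line JSON object."""
    return json.dumps(record, sort_keys=True)


def emit_plain(record):
    """Flatten the same record into one grep-friendly line of text."""
    pairs = " ".join(f"{key}={value}" for key, value in sorted(record["context"].items()))
    line = f"[{record['channel']}] {record['level'].upper()}: {record['message']}"
    return f"{line} {pairs}" if pairs else line


# Extra fields such as XFF can be added per call without changing the emitters.
record = make_record("thumbnail", "warn", "scaler timeout",
                     vhost="commons.wikimedia.org", ip="127.0.0.1")
print(emit_json(record))
print(emit_plain(record))
```

Because both emitters consume the same dictionary, choosing JSON, plain text or both becomes a configuration decision rather than something each call site has to know about.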