Architecture meetings/RFC review 2013-12-04
Wednesday, December 4, 2013 at 10:00 PM UTC in #wikimedia-meetbot
Requests for Comment to review
Propose your own RFCs:
- Requests for comment/Simplify thumbnail cache
- Requests for comment/Structured logging
- Requests for comment/Json Config pages in wiki (if it's in a stable enough state for discussion)
Summary and logs
Meeting summary
Meeting started by MaxSem at 22:01:28 UTC (full logs).
- https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04 (TimStarling, 22:02:40)
- RFC: Simplify thumbnail cache (TimStarling, 22:05:57)
- https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache (TimStarling, 22:06:04)
- https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes (paravoid, 22:19:35)
- ACTION: AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage (TimStarling, 22:41:57)
- option 5 generally favoured, possibly with modifications, we will proceed with design work on it (TimStarling, 22:43:57)
- ACTION: bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (TimStarling, 22:44:18)
- RFC: Structured logging (TimStarling, 22:45:45)
- https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging (TimStarling, 22:45:58)
- ACTION: ori-l to expand RFC (TimStarling, 22:59:49)
- https://github.com/mhart/gelf-stream (gwicke, 23:00:41)
- JSON generally favoured as long as a plain text format can be also made available (TimStarling, 23:00:50)
- transport selection based on URI-style destination string (TimStarling, 23:01:21)
Meeting ended at 23:05:10 UTC (full logs).
Action items
- AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
- bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (done)
- ori-l to expand RFC
Action items, by person
- AaronSchulz
- AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
- bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (done)
- bd808
- bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer (done)
- ori-l
- ori-l to expand RFC
People present (lines said)
- TimStarling (90)
- paravoid (74)
- gwicke (44)
- AaronSchulz (44)
- bd808 (42)
- ori-l (36)
- parent5446 (24)
- aude (15)
- RoanKattouw (8)
- bawolff (6)
- MaxSem (5)
- subbu (3)
- meetbot-wm (3)
- Krinkle (2)
Generated by MeetBot 0.1.4.
Full log
22:01:28 <MaxSem> #startmeeting
22:01:28 <meetbot-wm> Meeting started Wed Dec 4 22:01:28 2013 UTC. The chair is MaxSem. Information about MeetBot at https://bugzilla.wikimedia.org/46377.
22:01:28 <meetbot-wm> Useful Commands: #action #agreed #help #info #idea #link #topic.
22:01:37 <MaxSem> #chair TimStarling
22:01:37 <meetbot-wm> Current chairs: MaxSem TimStarling
22:01:45 <parent5446> Ah there we go
22:02:02 <MaxSem> yay, I hacked a bot!:P
22:02:34 <TimStarling> ok, so there are 3 RFCs on the wiki page
22:02:40 <TimStarling> #link https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2013-12-04
22:03:11 <TimStarling> do we have people here who want to talk about them, and are there any others that those present want to add?
22:03:29 <bd808> Ori would like to request that the logging rfc be "not first" as he is AFK until 22:30Z
22:04:03 * aude waves :)
22:04:21 <TimStarling> well, we have you and paravoid, we could talk about "Simplify thumbnail cache"
22:04:30 <paravoid> indeed
22:04:34 <paravoid> that's why I'm here :)
22:04:48 <TimStarling> ah, and there's the third author
22:04:49 <paravoid> and now AaronSchulz too :)
22:05:33 <bd808> Sounds good to me
22:05:56 <paravoid> so, bd808 since you proposed this for discussion (and wrote all the text :)), do you want to take point?
22:05:57 <TimStarling> #topic RFC: Simplify thumbnail cache
22:06:04 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Simplify_thumbnail_cache
22:06:32 <bd808> I mostly just collected notes from paravoid and AaronSchulz :)
22:06:46 <bd808> But sure.
22:06:55 <TimStarling> bd808: which is your preferred option?
22:07:03 <paravoid> is the problem statement clear enough to everyone?
22:07:14 <gwicke> pretty clear to me
22:07:15 <parent5446> So basically we want to get thumbnails off of Swift.
22:07:29 <bd808> And make purges easier
22:07:41 <aude> as long as we can still generate thumbnails of arbitrary size ( on cache miss), it seems fine
22:07:50 <aude> they don't have to be stored forever
22:08:23 <TimStarling> well, I don't think bd808 does want thumbnails off of swift, based on his talk page comments
22:08:39 <gwicke> is there an implementation of the purge pattern match already?
22:09:01 <RoanKattouw> The RFC text suggests that thumbs would move off of Swith
22:09:03 <RoanKattouw> *swift
22:09:09 <RoanKattouw> "3. Configure MediaWiki imagescalers to stop storing generated thumbnails in Swift"
22:09:21 <AaronSchulz> right
22:09:22 <aude> what exactly are the imagescalers (excuse my ignorance)
22:09:34 <bd808> gwicke: There is not an implementation yet, but AaronSchulz has been dying to start working on that
22:09:35 <AaronSchulz> RoanKattouw: I think some would stay for a while though
22:09:41 <aude> in this context*
22:09:51 <AaronSchulz> like media that supports pages and can have many thumbs for one file version
22:09:56 <bd808> I think that "most" would move off of swift.
22:09:59 <gwicke> I would imagine the idea is something like hashing different thumbs to the same cache entry, and then vary on the size?
22:10:03 <RoanKattouw> aude: They are Apache machines dedicated to image scaling
22:10:05 <paravoid> aude: mediawiki application servers that scale uploaded content to thumbnails per request
22:10:06 <AaronSchulz> if we don't use vcl_hash tricks on those, they will have to work the old fashioned way
22:10:11 <aude> RoanKattouw: paravoid thanks
22:10:15 <MaxSem> what about thumbs that are extremely slow to render?
22:10:16 <AaronSchulz> until they get refactored somehow or something
22:10:22 <parent5446> On the note of the imagescalers, do we know if they can handle that 5x increase in utilization?
22:10:25 <TimStarling> the problem is infinite growth of thumbnail storage
22:10:35 <RoanKattouw> HTTP request for nonexistent thumb comes in, thumb is generated locally, stored, HTTP response with thumb goes out
22:10:37 <TimStarling> MaxSem: the thing that is slow is the fetch of the original
22:10:44 <bd808> AaronSchulz has pointed out that some media types are very time consuming to extract thumbs from and should probably be retained in durable storage.
22:10:47 <AaronSchulz> MaxSem: we use ssds in varnish (not just memory cache)
22:10:51 <TimStarling> the actual image scaling part is pretty fast, and can easily be scaled up
22:11:17 <TimStarling> so that should answer parent5446's question also
22:11:23 <parent5446> Yep, thanks.
22:11:27 <AaronSchulz> we also have some simple "ping limiting" in place for thumb.php
22:11:33 <TimStarling> yes, we can absolutely scale 5x as many images, but we can't fetch the originals that fast
22:11:38 <aude> bd808: what about having some fixed size thumbnails for some stuff?
22:11:46 * aude thinking of gigantic tiffs and videos
22:11:48 <AaronSchulz> to avoid too much LRU churn or wasted I/O and CPU from trolling a bit
22:11:57 <bd808> The new MediaViewer feature has shown that generating everything on the fly may be slower than people are used to.
22:11:58 <aude> then stuff can be scaled from those?
22:11:58 <paravoid> TimStarling: why do you think so?
22:12:13 <AaronSchulz> bd808: new thumbnail sizes?
22:12:40 <paravoid> bd808: could you talk a little about that? I haven't heard anything and this sounds interesting
22:12:40 <AaronSchulz> we are also still replicated writes across a DC in a synchronous manner that I can't stand
22:12:43 <bd808> aude: That would be a possibility and actually the subject of https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:12:43 <TimStarling> paravoid: why do I think we can't fetch originals that fast?
22:12:46 <AaronSchulz> *replicating
22:12:52 <aude> bd808: i know
22:12:56 <AaronSchulz> bd808: that doesn't help ;)
22:12:58 <TimStarling> paravoid: I thought there was a comment from you to that effect
22:13:16 <AaronSchulz> aude: we do something like that with TMH
22:13:32 <AaronSchulz> if two different sized thumbs are requested for the same time position of a video
22:13:32 <bd808> MediaViewer asks for new thumb sizes
22:13:40 <AaronSchulz> a reference frame is used for scaling
22:13:40 <aude> it's contrary to allowing arbitrary sized, but maybe certain cases it makes sens to have special handling for some types of files
22:14:09 <aude> AaronSchulz: makes sense
22:14:11 <bd808> Perfomance is getting better with some changes made by the team, but initial testing was found to be 2-5 seconds for many thumbs to generate
22:14:29 <TimStarling> we can start the image scaling at parse time
22:14:39 <bd808> The changes they have made are basically to "bucket" thumb sizes
22:15:37 <bd808> I really like TimStarling's idea of adding a 3rd layer of varnish
22:15:45 <paravoid> I didn't understand the bucket thumb sizes part
22:16:17 <gwicke> we could also consider generating small thumbs from a smaller standard thumb size
22:16:32 <parent5446> "4. Store "standard" thumbnails permanently and others with TTL (and possibly last use updating)"
22:16:36 <parent5446> Also something worth considering
22:16:46 <gwicke> that would also help IO
22:17:01 <gwicke> and should be faster for video thumbs too
22:17:10 <paravoid> videoscaling is not part of this discussion
22:17:12 <aude> gwicke: essentially what i tried to say
22:17:22 <paravoid> or video thumbs
22:17:25 <bd808> paravoid: The original extension used the screen width of the browser to call for a thumb. Now they are using a series of sizes (histogram basically) and calling for the size closest to the screen width
22:17:25 <bawolff> I assume if we do the three layers of varnish thing, we would increase the max cache time?
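bd808 describes MediaViewer moving from requesting a thumbnail at the exact browser width to requesting the closest size from a fixed series, which keeps the number of distinct thumbnail URLs small. A minimal Python sketch of that selection, assuming a hypothetical list of bucket widths (the actual series is not given in this log) and breaking ties toward the larger bucket:

```python
# Hypothetical bucket widths in pixels; MediaViewer's real series is not stated in the log.
BUCKETS = [320, 640, 800, 1024, 1280, 1920, 2560]


def bucket_width(screen_width: int, buckets=BUCKETS) -> int:
    """Return the bucket closest to screen_width; on a tie, prefer the larger bucket."""
    # Sort key: distance first, then negative width so the larger bucket wins a tie.
    return min(buckets, key=lambda b: (abs(b - screen_width), -b))


print(bucket_width(700))  # 640
print(bucket_width(720))  # 800 (equidistant from 640 and 800)
```

Every client then requests one of only seven widths, so thumbnails generated for one user are far more likely to be cache hits for the next.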
22:17:29 <AaronSchulz> video thumbs already do that in TMH and indeed are not part of the rfc
22:17:45 <gwicke> k
22:18:15 <bd808> bawolff: I would guess that the 3rd layer would be backed by spinning disk and use LRU eviction based on sapce
22:18:22 <bd808> *space
22:18:37 <paravoid> and TTLs, and manual PURGEs
22:18:55 <paravoid> it does add some complexity, though.
22:18:56 <AaronSchulz> if you store standard sizes the URLs to purge are known (thus don't need swift)
22:19:11 <AaronSchulz> though changing the standard sizes would require running a one-off script
22:19:20 <paravoid> so, the standard sizes is a separate discussion
22:19:25 <paravoid> there is a separate RFC
22:19:29 <paravoid> it's very much related, though.
22:19:35 <paravoid> https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
22:19:52 <AaronSchulz> and may be useful for the file types exempt from the bucketing
22:20:00 <TimStarling> bd808: it sounds like this MediaViewer extension needs some wider review
22:20:02 <gwicke> I was just considering it as an option for speeding up generation of non-standard thumbs
22:20:14 <gwicke> especially for speeding up the IO part of that
22:20:34 <bd808> TimStarling: I'm sure they would welcome feedback. Brion has been giving them some attention.
22:20:52 <paravoid> about the IO: storing millions of tiny files in spindles is very inefficient
22:20:56 <TimStarling> where will MediaViewer be used?
22:21:16 <bd808> It is currently deployed to all wikis I believe.
22:21:24 <paravoid> I don't expect Varnish to change that much, although it would change the fact that we are not going to store 3 copies
22:21:26 <TimStarling> on what pages is it activated?
22:21:36 <paravoid> TimStarling: it's part of the new "beta features" thing
22:21:37 <TimStarling> page views or image description pages?
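bd808 expects the extra spinning-disk layer to evict by LRU once it runs out of space, and paravoid adds TTLs and manual PURGEs. A minimal Python sketch of the space-bounded LRU and PURGE parts (TTL handling omitted), assuming an object's size is simply its body length in bytes; the class name is hypothetical:

```python
from collections import OrderedDict


class SizeBoundedLRU:
    """Evicts least-recently-used objects once total stored bytes exceed capacity."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> body, oldest first

    def get(self, key):
        body = self._items.get(key)
        if body is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return body

    def put(self, key, body: bytes):
        if key in self._items:
            self.used -= len(self._items.pop(key))
        if len(body) > self.capacity:
            return  # larger than the whole cache: do not store it
        self._items[key] = body
        self.used += len(body)
        while self.used > self.capacity:
            _, evicted = self._items.popitem(last=False)  # drop the oldest entry
            self.used -= len(evicted)

    def purge(self, key):
        """Manual PURGE: remove one object regardless of its recency."""
        body = self._items.pop(key, None)
        if body is not None:
            self.used -= len(body)
```

Eviction here is driven only by space, so unpopular sizes age out on their own; the explicit `purge` is what still matters when a file is re-uploaded or deleted.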
22:21:41 <AaronSchulz> The main thing with this rfc is about having run-of-the-mill jpgs/pngs stored only in varnish and totally LRU and I wouldn't see much benefit to use reference thumbnails for that
22:21:46 <bawolff> Its hidden behind a preference
22:21:54 <paravoid> TimStarling: so you have to explicitly enable it as an experimental feature
22:22:02 <Krinkle> So it's deployed everywhere, but opt-in via beta preferences. It is exposed by clicking on an image thumb after enabling it.
22:22:04 <TimStarling> yes, and after you enable it, where does it appear?
22:22:08 <AaronSchulz> paravoid: that's how it always starts :)
22:22:10 <bd808> TimStarling: It's in the "beta features" program
22:22:18 <TimStarling> thanks Krinkle
22:22:22 <gwicke> AaronSchulz: if the IO portion of scaling can handle that, then that would certainly be simpler
22:22:29 <Krinkle> TimStarling: I had trouble discovering it as well, because we're trained to think that clicking a thumb opens the file page :)
22:22:56 <paravoid> gwicke: handle what?
22:23:02 <gwicke> AaronSchulz: but if IO becomes a bottleneck then reference thumbnails (even a single 1024x1024 bounding box one) could help a lot
22:23:07 <paravoid> sorry, lost in the subthreads of this discussion :)
22:23:29 <gwicke> paravoid: handle potential spikes in miss rates
22:23:57 <gwicke> in case varnish machines go down, there is a deploy issue or the like
22:24:29 <paravoid> so your preference seems to be alternative strategy (5), correct?
22:24:48 <bd808> Failure tolerance and (ab)use of vcl_hash I think are the big open questions with any of the schemes
22:25:03 <bd808> paravoid: personal I like (5) the best
22:25:18 <TimStarling> well, regarding vcl_hash, there is the secondary key feature mentioned
22:25:22 <paravoid> bd808: except the "implementing LRU in a Swift middleware" schemes
22:25:24 <gwicke> paravoid: a single thumb could also live in swift
22:25:32 <TimStarling> which might be "months" away, which doesn't sound so long to wait really
22:25:48 <paravoid> varnish 4.0 technology preview 1 got released... today
22:25:49 <gwicke> not sure that it would need to be LRUed
22:25:51 * AaronSchulz doesn't really get 5
22:25:57 <bawolff> Having 1 htcp packet purge everything sounds really nice
22:26:15 <paravoid> I haven't checked if it includes surrogate keys, though, and a deployment within the WMF is many months away indeed.
22:26:29 <gwicke> so close to 3) combined with the vcl_hash proposal
22:26:44 <TimStarling> AaronSchulz: 5 was my suggestion on the talk page
22:26:52 <bd808> LRU in swift is good but there was some question as to the performance of swift in deleting files
22:26:52 <bd808> I think you actually raised that paravoid ?
22:27:00 <TimStarling> AaronSchulz: follow the ref link
22:27:27 <paravoid> yes, as an open question, not as a known issue
22:27:57 <TimStarling> also, I am not sure if list traversal in a vcl_hash scheme is really worth worrying about
22:28:15 <TimStarling> there are two ways to look at the performance of it: throughput and latency
22:28:21 <paravoid> to be clear, we are excluding TIFF/PDF/Djvu from this discussion, correct?
22:29:07 <TimStarling> throughput: multiply the *mean* number of thumbnails per source by the time per link traversal
22:29:21 <AaronSchulz> paravoid: pretty much
22:29:23 <TimStarling> now, the mean is not large, you don't need to worry about djvu/pdf for that
22:29:37 <TimStarling> maybe for a djvu with 1000 pages it might take 1ms to traverse
22:29:45 <TimStarling> but that doesn't impact the throughput very much
22:29:53 <TimStarling> the other way to look at it is latenc
22:29:54 <TimStarling> y
22:30:03 <bawolff> Why exclude tiff. Tiff with many pahes are very rare
22:30:10 <paravoid> we have tons of of pdf/djvus with hundreds of pages * 5 thumbnails per page
22:30:10 <bawolff> *pages
22:30:18 <TimStarling> then you ask: what is the largest possible number of thumbnails on a given image and will that add user-visible latency to requests for that image?
22:30:34 <AaronSchulz> I think they could be added if it's fine on average, but they were to be excluded in first phases
22:30:46 <TimStarling> the limit there would be say 50ms of latency
22:30:54 <paravoid> that's an interesting approach, TimStarling
22:31:46 <TimStarling> I would expect linked list traversal in phk's style of C to be pretty fast
22:31:50 <gwicke> is there a need to have all entries end up on a single backend varnish?
22:31:56 <TimStarling> like, a lot less than a microsecond
22:31:59 <bd808> With the current application logic the upper bound is something like the width of the original image.
22:32:08 <gwicke> the purge requests are going to all varnishes I guess
22:32:12 <paravoid> I think it was mark who was mostly concerned about that, I don't have counterarguments.
22:32:47 <bd808> Actually width * number of pages I suppose. Do we vary on other dimensions?
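Tim's two ways of looking at the cost of walking the per-object variant list can be written as a rough bound. Here $\bar{n}$ is the mean and $n_{\max}$ the largest number of thumbnails per source image, and $t$ is the time per list entry. The $t < 1\ \mu\text{s}$ figure and the 50 ms budget are the estimates given in the log; the 1000-page example combines Tim's page count with paravoid's "5 thumbnails per page" and is only illustrative:

$$
\begin{aligned}
\text{throughput cost per request} &\approx \bar{n}\,t \\
\text{worst-case added latency} &\approx n_{\max}\,t \le 50\ \text{ms} \\
t < 1\ \mu\text{s},\ n_{\max} \le 5\times10^{4} &\;\Rightarrow\; n_{\max}\,t < 50\ \text{ms} \\
n = 1000 \times 5 = 5000 &\;\Rightarrow\; n\,t < 5\ \text{ms}
\end{aligned}
$$

Under these assumptions even a large DjVu stays an order of magnitude inside the latency budget, which is why Tim does not expect list traversal to matter much.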
22:32:49 <AaronSchulz> TimStarling: so 5 is just vcl_hash+second cache layer to deal with those eviction issues, OK
22:33:26 <TimStarling> AaronSchulz: yes
22:33:41 <paravoid> "second", but yes :)
22:33:50 <AaronSchulz> I was confused at first since I thought it was a complete alternative
22:33:55 <bawolff> Bd808: not normally. Svg has language, tiff has lossless/lossy
22:33:57 <paravoid> I'd say "additional, spindle-backed"
22:34:13 <AaronSchulz> miser! :p
22:34:39 <TimStarling> AaronSchulz: third cache layer, really
22:34:44 <gwicke> so can't the variants for a single thumb be distributed across several backends to limit request latency?
22:35:01 <paravoid> TimStarling: or fourth, for esams/ulsfo
22:35:08 <bd808> memory -> ssd -> disk -> scaler
22:35:09 <paravoid> let's stop counting cache layers, though :)
22:35:10 <TimStarling> yeah
22:35:30 <paravoid> gwicke: we'd have to write an custom director for this
22:35:33 <AaronSchulz> are you counting frontend+backend varnish (e.g. CARP)?
22:35:40 <AaronSchulz> I assume swift would not be part of this
22:35:45 <paravoid> the current ones are "random", "wrr" and "chash" (which mark wrote)
22:35:48 <paravoid> it's not rocket science
22:35:50 <gwicke> paravoid, would that be difficult?
22:36:00 <bd808> Swift would only be used in (5) to fetch originals
22:36:10 <AaronSchulz> right, but not a cache layer
22:36:10 <TimStarling> wouldn't chash do it already?
22:36:22 <gwicke> probably depends on what you feed into the hash
22:36:45 <paravoid> well, yeah, I guess you could hack it up by appending a random replica number to the URL in vcl_hash
22:36:47 <gwicke> if that can be manipulated in VCL, then it might be relatively simple
22:37:06 <paravoid> it's a bit ugly, though, but either way, possible
22:37:07 <AaronSchulz> so we are over halfway though this meeting just to note
22:37:18 <TimStarling> paravoid: by variants, gwicke means thumbnail sizes, right?
22:37:24 <TimStarling> which are already in the URL
22:37:32 <gwicke> TimStarling, yes
22:37:43 <TimStarling> AaronSchulz: well, people seemed to take a while to warm up
22:37:50 <gwicke> they'd map to the same linked variant chain though
22:37:56 <gwicke> in storage
22:37:57 <TimStarling> it seems like the longer we run with it, the faster we make progress
22:38:10 <AaronSchulz> not saying we need to stop, just noting the time
22:38:19 <gwicke> but that's in the backend
22:38:49 <gwicke> if chash is purely url-based (which it is afaik), then we should already get a quasi-random distribution across backends
22:39:01 <paravoid> correct
22:39:05 <gwicke> so latency might not be that bad
22:39:06 <paravoid> I understood something different, I'm sorry.
22:39:12 <TimStarling> ok, so paravoid, what do you think of option 5?
22:39:23 <TimStarling> we need some conclusions and action items now
22:39:24 <AaronSchulz> vcl_hash + extra cache layer, starting with png/jpg and doing others later sounds reasonable?
22:39:52 <TimStarling> is anyone against option 5?
22:39:59 <gwicke> fine with me
22:40:02 <paravoid> I'm okay with option 5
22:40:03 <paravoid> but
22:40:14 <paravoid> I think we might need to consider just expanding the existing cache layer
22:40:14 <gwicke> although I could also live with storing a handful standard sizes in swift
22:40:24 <gwicke> at least one 'large screen size' thumb
22:40:31 <AaronSchulz> paravoid: right
22:40:40 <TimStarling> ok, well either way, we need the same MW support
22:40:44 <paravoid> SSDs are getting cheaper these days, it might not be worth it
22:40:51 <paravoid> nod, either way it doesn't matter much
22:40:59 <AaronSchulz> do we care if vcl_hash puts more hot thumbnails on single boxes?
22:41:02 <TimStarling> MW needs to be adapted to stop storing thumbnails, to just stream them out instead
22:41:26 <TimStarling> who will plan that? AaronSchulz?
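gwicke's point is that if the backend is chosen by hashing the full request URL, the different sizes of one file land on different backends, so no single backend accumulates the whole variant list. The sketch below is a generic consistent-hash ring with virtual nodes in Python, not the code of the Varnish chash director mentioned in the log; the backend names and the example URL are hypothetical:

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 64-bit hash taken from MD5 (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring; each backend owns many points ("virtual nodes")."""

    def __init__(self, backends, vnodes: int = 100):
        self._ring = sorted((_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def backend_for(self, url: str) -> str:
        # First ring point clockwise from the URL's hash, wrapping around at the end.
        i = bisect.bisect(self._keys, _h(url)) % len(self._keys)
        return self._ring[i][1]


ring = HashRing(["cp1", "cp2", "cp3", "cp4"])
for width in (120, 220, 320, 640):
    url = f"/wikipedia/commons/thumb/a/ab/Example.jpg/{width}px-Example.jpg"
    print(width, ring.backend_for(url))
```

Because each backend owns many small arcs of the ring, removing one backend only remaps the URLs it owned; all other URLs keep their backend, which is what makes this kind of director suitable for a cache tier.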
22:41:29 * AaronSchulz is open to playing around with the hash since the htcp stream hits everything anyway, they'd still get the purges (as noted)
22:41:57 <TimStarling> #action AaronSchulz to plan MW modification to stream out thumbnails with FileBackend storage
22:42:07 <AaronSchulz> TimStarling: it would be a config switch I always assumed
22:42:14 <bd808> Will this need to be a feature flag option or can core change unilaterally?
22:42:17 <TimStarling> easy action for you then
22:42:23 <paravoid> mediawiki streams out thumbnails now anyway
22:42:28 <AaronSchulz> I also want it to send a header for vcl to use to determine the hash
22:42:37 <paravoid> it just stores them too
22:42:39 <AaronSchulz> I don't want some ugly regexes in vcl trying to look for thumbs
22:42:49 <AaronSchulz> it would be cleaner for the vcls to look for a custom header IMO
22:43:00 <paravoid> that's not possible I'm afraid
22:43:08 <paravoid> vcl_hash is called on the request path, not the response path
22:43:18 <bd808> AaronSchulz: It has to match the request URL right?
22:43:33 * bd808 doesn't type as fast as paravoid
22:43:42 <AaronSchulz> paravoid: hmm, right
22:43:57 <TimStarling> #info option 5 generally favoured, possibly with modifications, we will proceed with design work on it
22:44:09 <paravoid> thank you TimStarling
22:44:18 <TimStarling> #action bd808 to remove options other than 5 from the RFC and include AaronSchulz's variant proposal with expanded existing SSD layer
22:44:35 * bd808 nods
22:44:53 <paravoid> do we need an action item for mediawiki to treat PDF/Djvu in a different way?
22:45:03 <paravoid> or is this part of the previous "stream out" action?
22:45:10 <TimStarling> paravoid: you can put notes on the talk page about that
22:45:20 <paravoid> okay
22:45:22 <TimStarling> we have time for a very quick look at one other RFC
22:45:32 <paravoid> ori-l just joined :)
22:45:36 <paravoid> right on time
22:45:37 <parent5446> Ori's here so we can briefly look at logging.
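paravoid notes that vcl_hash runs on the request path, so the key that groups all thumbnails of one original must be derived from the request URL rather than from a response header. This Python model shows that derivation and the resulting one-shot purge of every size (bawolff's "1 htcp packet purge everything"). It assumes the upload URL layout `/wikipedia/commons/thumb/a/ab/Example.jpg/120px-Example.jpg`; the function and class names are hypothetical, and in Varnish the per-key variant list would be kept by the cache itself, not by a dict:

```python
import re
from collections import defaultdict

# Assumed thumbnail layout: <prefix>/thumb/<h>/<hh>/<File>/<variant>
THUMB_RE = re.compile(r"^(?P<prefix>/.+?)/thumb/(?P<orig>[0-9a-f]/[0-9a-f]{2}/[^/]+)/[^/]+$")


def hash_key(url: str) -> str:
    """Map a thumbnail URL to its original file's URL; other URLs key on themselves."""
    m = THUMB_RE.match(url)
    if m is None:
        return url
    return f"{m['prefix']}/{m['orig']}"


class GroupedCache:
    """All size variants of one original share a single hash key."""

    def __init__(self):
        self._groups = defaultdict(dict)  # hash key -> {variant URL: body}

    def put(self, url: str, body: bytes):
        self._groups[hash_key(url)][url] = body

    def get(self, url: str):
        return self._groups.get(hash_key(url), {}).get(url)

    def purge(self, original_url: str) -> int:
        """One purge of the original drops every variant; returns how many were removed."""
        return len(self._groups.pop(hash_key(original_url), {}))
```

The regex is exactly the kind of URL matching AaronSchulz wanted to keep out of VCL; the exchange above suggests it cannot be avoided if grouping happens in vcl_hash.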
22:45:45 <TimStarling> #topic RFC: Structured logging
22:45:58 <TimStarling> #link https://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging
22:46:04 <AaronSchulz> csteipp, ori-l: http://pastebin.com/phDgyNHi
22:46:16 <gwicke> +1 on using JSON
22:46:27 <ori-l> AaronSchulz: thanks
22:47:00 <bd808> gwicke: I looked at other alternatives but json seemed the clear winner
22:47:05 <parent5446> OK, so my one question with this is why we need to specify our own serialization format for logs. Maybe it'd be nice to have a "MediaWiki serialization format", but at the same time our logging system should be open to whatever format the sysadmin wants to output into.
22:47:24 <RoanKattouw> ori-l: This looks sweet
22:47:32 <ori-l> RoanKattouw: it's bd808's!
22:47:35 <gwicke> bd808, https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Structured_logging#We_are_considering_a_similar_approach_for_Parsoid_36348
22:47:48 <ori-l> parent5446: I mostly agree, but JSON also constrains the type of data you can emit
22:47:49 <RoanKattouw> Would the recorded fields like vhost and ip be extensible? On the WMF cluster I'd like to add XFF, for instance
22:47:58 <TimStarling> would this have multiple backends? structured and plain text?
22:48:02 <RoanKattouw> (Had to hack that up manually not to long ago to debug 127.0.0.1 problems)
22:48:09 <paravoid> +1 to a modular approach
22:48:13 <parent5446> That's why I proposed we use something like monolog.
22:48:15 <bd808> RoanKattouw: yes. It should be extensible
22:48:27 <parent5446> It allows us to add our JSON format, while also supporting literally everything else.
22:48:34 <bd808> I would actually support monolog as well
22:48:40 <MaxSem> <3 the greppability of plaintext
22:48:40 <ori-l> TimStarling: you could have a PlainTextLogEmitter that munges the array into something human-readable
22:48:47 <TimStarling> ori-l: yeah
22:49:01 <ori-l> a la getTraceAsString
22:49:18 <parent5446> Actually, monolog already has a JsonFormatter.
22:49:27 <TimStarling> regarding "Live exception object to be stringified by the log event emitter"
22:49:30 <parent5446> We'd just need to use a Processor to put in the data we want
22:49:31 <gwicke> it is pretty simple to write a json grepper I guess
22:49:33 <bd808> The important part is keeping the log records structured internally until the emitter is reached
22:49:46 <paravoid> gwicke: jq
22:49:47 <TimStarling> do you mean Exception::__toString() or something else?
22:50:01 <paravoid> gwicke: https://github.com/stedolan/jq
22:50:10 <ori-l> presumably the exception object itself
22:50:11 <gwicke> paravoid, oh, nice
22:50:14 <paravoid> (sorry, not relevant to this discussion)
22:50:33 <bd808> TimStarling: It's an implementation detail. Ideally the formatting of the exception would be left up to the output formatter
22:50:58 <ori-l> the thing that I wanted to flag actually is that we have two subsystems that half-implement the concept of pluggable logging backends
22:51:17 <TimStarling> you mean json_encode($exception)?
22:51:22 <TimStarling> I'm not sure that would work
22:51:31 <ori-l> TimStarling: we already have json-encoded exceptions in core
22:51:35 <TimStarling> some exception objects will have references to massive parents
22:51:41 <paravoid> forgive me for the naive question, is this the discussion for the logging format (plain, json, ...), the transport (udp2log, syslog, gelf, ...), or both?
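bd808's requirement is that log records stay structured until they reach the emitter, with formatting (including of exceptions) left to the output formatter, while Tim wants plain text available alongside JSON. A Python sketch of one record rendered by two formatters. The record fields and function names are hypothetical, not the RFC's schema; the JSON formatter deliberately reduces the exception to its class and message instead of serializing the whole object, in line with Tim's concern about exceptions that reference massive parents:

```python
import json
import traceback
from datetime import datetime, timezone


def make_record(channel, level, message, context=None, exception=None):
    """Build a structured record; nothing is stringified yet."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "channel": channel,
        "level": level,
        "message": message,
        "context": dict(context or {}),  # values limited to JSON-compatible types
        "exception": exception,  # live exception object; each formatter decides how to render it
    }


def _exc_summary(exc):
    return None if exc is None else {"class": type(exc).__name__, "message": str(exc)}


def format_json(record) -> str:
    return json.dumps(dict(record, exception=_exc_summary(record["exception"])), sort_keys=True)


def format_plain(record) -> str:
    ctx = " ".join(f"{k}={v}" for k, v in sorted(record["context"].items()))
    line = f"{record['timestamp']} {record['channel']} {record['level'].upper()}: {record['message']}"
    if ctx:
        line += f" [{ctx}]"
    exc = record["exception"]
    if exc is not None:  # plain text gets the full traceback, a la getTraceAsString
        line += "\n" + "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return line


rec = make_record("thumbnail", "warning", "scaler slow", {"host": "mw1001", "ms": 2500})
print(format_json(rec))
print(format_plain(rec))
```

Both outputs come from the same record, so adding a new output format means adding a formatter, not changing any call site.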
22:51:56 <ori-l> TimStarling: see exception-json.log on fluorine :P
22:52:06 <TimStarling> I'll file a bug
22:52:15 <ori-l> TimStarling: we redact those from the JSON log
22:52:17 <parent5446> paravoid: the RFC focuses on format, but ideally we should replace our entire logging system
22:52:31 <ori-l> I'm not sure a bug is warranted
22:52:40 <ori-l> but anyways, to finish my point: there's wfDebugLog & co., which recognize udp://, tcp://, and file paths
22:52:40 <bd808> Here's an example of monolog logging an exception: http://pastebin.de/37759
22:52:45 <TimStarling> I assumed it would be 0mq
22:52:49 <TimStarling> since it is ori-l writing it
22:52:52 <parent5446> (Also, I know I'm evangelizing monolog here, but it also cooperates with exception workflow.)
22:52:56 <parent5446> :P
22:53:05 <ori-l> and there's the recent change stream implementation that vvv wrote
22:53:37 <ori-l> the latter lets you specify an emitter class
22:53:38 <gwicke> we should also consider logs from non-PHP services
22:53:45 <ori-l> i wrote a redis one as a way of trying out the API, it's in core too
22:53:52 <ori-l> we should consolidate all of these, obviously
22:53:58 <TimStarling> UDP is sucky lazy rubbish
22:54:05 <AaronSchulz> heh
22:54:10 <ori-l> and make recent changes be a special case of logging
22:54:16 <TimStarling> asynchronous messaging on the cheap
22:54:18 <bd808> gwicke: Unifying across languages would be nice.
22:54:20 <gwicke> if we can agree on a standard set of keys for stuff like host name etc, then those can directly tie into the same infrastructure
22:54:31 <TimStarling> if you have an asynchronous messaging system that isn't prone to losing its messages, why not use it?
22:54:31 <ori-l> cf rcfeeds/RedisPubSubFeedEngine.php for an example
22:54:42 <TimStarling> syslog is ridiculously old and crusty and limited
22:54:57 <TimStarling> like 1024 byte packet limit, and integer facility fields
22:55:01 <parent5446> ori-l: monolog also has a Redis handler
22:55:03 <ori-l> TimStarling: I agree, but I think this is the uninteresting part of the problem
22:55:14 <paravoid> I think we need to split those two discussions
22:55:18 <ori-l> if you have pluggable backends people who love UDP can use UDP
22:55:22 <paravoid> it can be the same RFC
22:55:38 <paravoid> but split the parts of "which format" from "which transport"
22:55:40 <TimStarling> you know that I have mostly driven the adoption of UDP at WMF
22:55:47 <TimStarling> that is because I am lazy and cheap
22:56:17 <TimStarling> and because the queueing options at the time I started were not as good as they are now
22:56:28 <ori-l> we won't use UDP
22:56:49 <paravoid> we could use AMQP, or 0mq, or even Kafka.
22:57:03 <ori-l> can we rely on URL prefixes for dispatcher configuration?
22:57:13 <paravoid> but first agree on the format? :)
22:57:17 <ori-l> this would be consistent with wfDebugLog, the PHP stream API
22:57:29 <ori-l> and partly with the existing RC implementation
22:57:49 <TimStarling> ori-l: yeah, should work
22:57:51 <ori-l> i.e.: $wgLogHandlers[] = "zmq://foo/topic"
22:58:02 <gwicke> is everybody on board with the choice of JSON?
22:58:05 <parent5446> Rather than discussing WMF-specific logging implementations, we should first establish how we'd incorporate a structured logging system.
22:58:12 <parent5446> Where would the loggers go?
22:58:14 <parent5446> In ContextSource?
22:58:29 <TimStarling> gwicke: no, I am in favour of dual logging of JSON and plain text
22:58:31 <parent5446> Whether it's JSON or whatever comes after we have the modular system in place.
22:58:47 <paravoid> unstructured json?
22:58:52 <gwicke> TimStarling: it seems to be easy enough to convert JSON to plain
22:59:00 <ori-l> I propose we limit ourselves to the set of types available in JSON
22:59:03 <paravoid> or an existing structure, like gelf?
22:59:09 <ori-l> but that we make the actual serialization format configurable
22:59:12 <bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
22:59:19 <parent5446> ori-l: Agreed with this idea. It makes modular dispatching easier.
22:59:30 <ori-l> actually, maybe that's not a good idea
22:59:44 <ori-l> maybe you should just pass off to the serializer the richest possible objects you have
22:59:49 <TimStarling> #action ori-l to expand RFC
22:59:50 <gwicke> I'd be in favor of standardizing on something
22:59:53 <paravoid> hehe
22:59:56 <paravoid> gwicke: http://www.graylog2.org/gelf#specs ?
22:59:58 <AaronSchulz> ;)
23:00:07 <paravoid> gwicke: and https://github.com/robertkowalski/gelf-node I guess ;)
23:00:18 <gwicke> paravoid, we are considering https://github.com/trentm/node-bunyan
23:00:23 <gwicke> has a gelf backend too it seems
23:00:41 <gwicke> https://github.com/mhart/gelf-stream
23:00:43 <ori-l> if you have the proper abstractions in place implementing backends is trivial, right?
23:00:50 <TimStarling> #info JSON generally favoured as long as a plain text format can be also made available
23:01:03 <ori-l> i mean, log messages are messages, and message queues tend to provide good APIs for queueing messages
23:01:21 <TimStarling> #info transport selection based on URI-style destination string
23:01:29 <ori-l> weeee
23:02:18 <parent5446> URI-based selection might not be the best idea. What if you want a logger to only log certain levels, i.e., warnings or errors?
23:02:23 <ori-l> TimStarling: maybe as a final action-item, agree to the consolidation of RC logging with logging in general?
23:02:33 <TimStarling> <parent5446> Where would the loggers go?
23:02:33 <TimStarling> <parent5446> In ContextSource?
23:02:41 <TimStarling> parent5446: I suggest you comment on the RFC talk page
23:02:59 <parent5446> OK, will do that now.
23:02:59 <ori-l> parent5446: zmq://dest.eqiad.wmnet?loglevel=warn
23:03:03 <bd808> < bd808> I would suggest a global logger factory. It could be a singleton or accessed via some convenient god object
23:03:12 <parent5446> ori-l: Ah that works I guess
23:03:19 <TimStarling> ori-l: put it on the RFC
23:03:23 <aude> ori-l: i think RC is a separate consideration, maybe worth own rfc
23:03:38 * aude at least needs more details
23:03:42 <subbu> i've used log4j in other contexts which has notions of formatter, target (file, socket, console, etc.) and log-level (warn, info, debug, etc.) which can all be configured. this proposal seems similar by separating those concerns.
23:03:59 <gwicke> parent5446, re levels: I think that is both a source and consumer concern; the source selects the min level to send, while the consumer can further filter based on the level in the message
23:04:04 <TimStarling> ok, we are out of time now, please put your ideas on the RFC talk page if possible
23:04:10 * ori-l nods
23:04:28 <paravoid> thank you TimStarling for chairing.
23:04:30 <bd808> Thanks for all the great feedback
23:04:36 * subbu paged into the window rather late ..
23:04:45 * subbu will read scrollback and post on talk page
23:04:48 <ori-l> TimStarling: what bug were you going to file?
23:05:10 <TimStarling> #endmeeting
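The meeting settled on selecting the transport from a URI-style destination string, and ori-l answers parent5446's question about levels by putting a minimum level in the query string (`zmq://dest.eqiad.wmnet?loglevel=warn`). A Python sketch of parsing such destinations, assuming hypothetical level names and function names, and treating a bare path as a file destination (wfDebugLog already accepts file paths alongside udp:// and tcp://):

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical level names and ordering; the RFC does not fix these.
LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def parse_destination(dest: str) -> dict:
    """Turn a destination string into a handler description."""
    parts = urlsplit(dest)
    if not parts.scheme:  # bare path: write to a file, log everything
        return {"transport": "file", "path": dest, "min_level": "debug"}
    min_level = parse_qs(parts.query).get("loglevel", ["debug"])[0]
    if min_level not in LEVELS:
        raise ValueError(f"unknown loglevel: {min_level}")
    return {
        "transport": parts.scheme,  # e.g. "zmq", "udp", "tcp"
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "min_level": min_level,
    }


def accepts(handler: dict, level: str) -> bool:
    """Source-side filter: only send records at or above the handler's minimum level."""
    return LEVELS[level] >= LEVELS[handler["min_level"]]


h = parse_destination("zmq://dest.eqiad.wmnet?loglevel=warn")
print(h["transport"], h["host"], accepts(h, "error"), accepts(h, "info"))
```

This covers the source side of gwicke's remark: the destination decides the minimum level sent, and a consumer can still filter further on the level carried in each message.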