
Uh, yet another collector/grapher. That's nice, but...

We have tons of collectors. And tons of graphers. What we don't have is a little bit of smarts in those tools: the ability to predict and the ability to react.

Predict. We have the Holt-Winters forecasting algorithm, implemented in RRDtool since 2005, and a couple of papers.

React. I'm not talking about 'fix it automagically'. But everyone wants to know 'wtf was that peak on this graph last night?'. Usually you never find out, except in the simplest cases, because you cannot collect everything about everything all the time. But a monitoring system could enable 'collect everything we can' for a short period of time when it detects something: something wrong or something strange, something out of the pattern. Has anybody heard of a system that does something like that?
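
To make the 'predict' half concrete, here is a minimal Python sketch of flagging samples that fall outside a forecast band. It is an assumption-laden simplification: plain exponential smoothing with a smoothed absolute deviation, not the full Holt-Winters method (no trend or seasonal terms) and not RRDtool's implementation. The function name and the `alpha`, `band`, and `warmup` parameters are made up for illustration.

```python
def find_aberrations(values, alpha=0.3, band=3.0, warmup=5):
    """Return the indexes of samples that fall outside the forecast band.

    Simplified stand-in for Holt-Winters: one smoothed forecast plus a
    smoothed absolute deviation, with no trend or seasonal component.
    """
    forecast = values[0]   # start from the first sample
    deviation = 0.0        # smoothed absolute forecast error
    flagged = []
    for i, value in enumerate(values[1:], start=1):
        error = abs(value - forecast)
        if i > warmup and error > band * deviation:
            # Out of pattern: report it and do not learn from it.
            flagged.append(i)
        else:
            forecast = alpha * value + (1 - alpha) * forecast
            deviation = alpha * error + (1 - alpha) * deviation
    return flagged


samples = [10, 11, 10, 9, 10, 11, 10, 50, 10, 10]
print(find_aberrations(samples))
```

Skipping the update for flagged samples keeps one spike from widening the band, at the cost of adapting slowly if the metric genuinely shifts to a new level.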



We're working on it here at Etsy :)

It'll be released in a week or two. In the meantime, I've been speaking about it: http://devslovebacon.com/conferences/bacon-2013/talks/bring-...


Is what you're referring to something like Growl that pops up and says 'Hey, metrics.sessions.active just dropped by 70%', or something pre-configured to spin up additional VMs/instances/dynos when some metrics misbehave? TL;DR: How autonomous is it?


That's interesting. Can I subscribe to be notified when it's announced?


It's certainly not available out of the box, and it would take a bit of work, but it should be possible to build something like this with Heka. You could write a dynamic filter plugin that watches for peaks in a specific graph (or any other arbitrary trigger) and, when it finds one, generates a message that activates a bunch of new dynamic filters, quickly turning on a more comprehensive set of data analyzers.


It's definitely possible, at least with RRDtool and 30 lines of Python (only 20 lines if you react to a measured peak rather than a forecast one).
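
For illustration, reacting to a measured peak could look roughly like the sketch below. It is a hypothetical example in plain Python, not the script referred to above: it leaves out RRDtool entirely, and the `PeakReactor` class and its parameters are invented names. It compares each sample with the rolling mean and standard deviation and, on a peak, asks for detailed collection for a fixed number of ticks.

```python
from collections import deque
from statistics import mean, pstdev


class PeakReactor:
    """Signal a short burst of detailed collection after a measured peak."""

    def __init__(self, window=30, sigmas=3.0, burst_ticks=10):
        self.history = deque(maxlen=window)  # recent samples only
        self.sigmas = sigmas
        self.burst_ticks = burst_ticks
        self.burst_left = 0

    def observe(self, value):
        """Feed one sample; return True while detailed collection should run."""
        if len(self.history) >= 5:  # wait for a little history first
            mu = mean(self.history)
            sd = max(pstdev(self.history), 1e-9)  # avoid a zero-width band
            if abs(value - mu) > self.sigmas * sd:
                self.burst_left = self.burst_ticks
        self.history.append(value)
        active = self.burst_left > 0
        if active:
            self.burst_left -= 1
        return active


reactor = PeakReactor(window=10, sigmas=3.0, burst_ticks=3)
samples = [10, 11, 10, 9, 10, 11, 10, 50, 10, 10, 10, 10]
states = [reactor.observe(s) for s in samples]
print(states)
```

The caller would switch the extra collectors on while `observe` returns True. Note that the peak itself enters the history and inflates the band for a while, so a second peak right after the first is harder to detect.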


The question is whether reacting to peaks identified this way can be helpful or not. If you have hundreds of thousands of metrics, how many peaks get detected per minute and how many of those indicate something that actually requires attention?


There are many ways to sort peaks out. For instance, there is a script for RRDtool that removes obviously irrelevant spikes [1].

Actually, the vast majority of those hundreds of thousands of metrics are common to any node/server, so predefined recipes/settings can be used for them. Only the application-level metrics of in-house applications have to be tuned manually.

[1] http://oss.oetiker.ch/rrdtool/pub/contrib/removespikes-20080...
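
As a rough sketch of the 'predefined recipes plus manual tuning' idea, the lookup below picks detection settings by metric name. Every metric name, pattern, and value in it is a made-up example, and `settings_for` is a hypothetical helper, not part of RRDtool or any other tool mentioned here.

```python
from fnmatch import fnmatchcase

# Predefined recipes for metrics every node/server has (example values).
DEFAULT_RECIPES = [
    ("*.cpu.*", {"sigmas": 4.0}),
    ("*.disk.*", {"sigmas": 3.0}),
]

# Hand-tuned settings for in-house application metrics (example values).
OVERRIDES = {
    "shop.checkout.latency": {"sigmas": 2.0},
}

FALLBACK = {"sigmas": 3.0}


def settings_for(metric):
    """Return detection settings: manual override, then recipe, then fallback."""
    if metric in OVERRIDES:
        return OVERRIDES[metric]
    for pattern, settings in DEFAULT_RECIPES:
        if fnmatchcase(metric, pattern):
            return settings
    return FALLBACK


print(settings_for("web01.cpu.user"))
print(settings_for("shop.checkout.latency"))
```

Checking overrides first means a hand-tuned application metric always wins over a generic recipe that happens to match its name.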


We're working on algorithms to improve the 'predict' part (way beyond Holt-Winters!): ow.ly/jRKrT

If you're interested in seeing our algorithms in action, email me at jennyinc at gmail.com



