Marta Musik

SignalFx, a Splunk company | Poland

BIO: I work at SignalFx, a Splunk company, as a Software Engineer, where I implement monitoring features for various cloud services. Prior to that, I worked at Motorola Solutions, contributing to mission-critical public safety software. For the last 8 years, I’ve been writing mostly in Java and JavaScript (Node.js) and automating various processes around CI/CD. I have struggled with monitoring cloud-hosted microservices myself, so now I write tools to lighten this burden for other developers. In my role as an engineer dealing mainly with cloud integrations, I consider keeping up with changing tools and shifting paradigms (and helping other developers do so) to be one of the biggest challenges of my job.

TALK: 5 reasons why teams don’t buy into the observability hype

Session level: intermediate

Many articles on observability express the opinion that observability is yet another buzzword, concocted only because people in software companies got bored with monitoring and logs. Guess what: calling it “observability” didn’t help. Now we are bored and fatigued with observability requirements, despite having observability-related tasks all over our backlogs: adding new metrics, trying to understand what just happened in a system, cleaning up logs, debugging issues in production, fixing dashboards (isn’t that the third time this month?). Does the work around observability really need to be tedious, boring and repetitive?

In this talk, I will discuss a typical observability stack for microservices, built around an example Dockerized application in Java (running on Google Kubernetes Engine). I will demonstrate monitoring challenges such as time-series churn, high-cardinality labeling, sparse metrics and the downsides of sampling. I will share my struggles trying to fulfil observability requirements, both from the point of view of a developer who tries to keep their code uncluttered, and of an architect who tries to design a stack that provides the required level of observability without having half of the team constantly fighting the tooling. I will propose some solutions and discuss whether “observability as code” is here to rescue us.

Takeaways

  • You won’t get observability by accident or as a byproduct. In a highly dynamic and complex environment, achieving observability can be difficult.
  • Many teams deal with similar problems implementing monitoring and tracing. Even simply knowing the concepts and the vocabulary to describe common challenges can help you find and evaluate solutions.
  • Observability tooling supports automation. As your systems scale, there comes a time to use it.