Sebastian Miałkowski, Software Engineer, Spartez

BIO: I completed my computer science studies at the Technical University in Gdańsk with a Bachelor of Engineering degree, and I am currently writing my master's thesis. I have been working at Spartez for 2 years. I started as a Java developer on the JIRA Cloud performance team. After 6 months I joined the JIRA Site Reliability Engineering team, where I am responsible for improving JIRA monitoring, reacting to problems in production, and fixing them before customers notice.

Presentation: How to find issues before customers do [POL]

Running a big product as a service, like JIRA Cloud, is a challenge. JIRA Site Reliability Engineers (or JSRE for short) deal with a lot of things on a daily basis, and probably the least pleasant one is reacting to issues introduced by the most recent version update. Such a situation is stressful because you are expected to address a problem before your customers notice it, so you are under pressure from both time and responsibility. On top of that, the code base is huge, so figuring out what caused an issue may take a while.

That’s why JSRE developed a tool to fix JIRA faster. Meet Instablame. It combines the power of logging as a service (Splunk), source code repository management (Bitbucket), and JIRA. Instablame allows us to:

  1. Detect new errors as they appear in real time
  2. Correctly prioritise those errors based on historical data, frequency of occurrence, number of affected customers, etc.
  3. Build a special "class index" of the needed version of JIRA, across the multiple repositories in use
  4. Analyze an error and detect which changes caused it, line by line, repository by repository. This is similar to the Git "blame" command, hence the name Instablame
  5. Based on the analysis, decide how to proceed: roll back the release, implement a fix, or contact the responsible team
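
As a rough illustration of steps 3 and 4, and not Instablame's actual code, the Python sketch below shows one way a stack trace could be matched against recent changes. It assumes a "class index" that maps a fully qualified class name to a repository and file path, and a table of lines changed since the previous release. The names `parse_frames`, `blame`, `class_index`, `changed_lines`, and all sample data are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches Java stack trace frames such as:
#   at com.example.issue.IssueService.load(IssueService.java:42)
FRAME_RE = re.compile(r"\bat\s+([\w.$]+)\.[\w$<>]+\(([\w$]+\.java):(\d+)\)")


@dataclass(frozen=True)
class Change:
    repo: str
    path: str
    line: int
    commit: str
    author: str


def parse_frames(stack_trace):
    """Return (fully qualified class name, line number) for each frame."""
    return [(m.group(1), int(m.group(3))) for m in FRAME_RE.finditer(stack_trace)]


def blame(stack_trace, class_index, changed_lines):
    """Return changes from the new release that touch lines in the trace.

    class_index:   class name -> (repo, path); a hypothetical "class index"
    changed_lines: (repo, path, line) -> (commit, author) for lines changed
                   since the previous release; a hypothetical input
    """
    suspects = []
    for class_name, line in parse_frames(stack_trace):
        # Inner classes (Outer$Inner) live in the outer class's source file.
        location = class_index.get(class_name.split("$")[0])
        if location is None:
            continue  # class not in any indexed repository, e.g. a library
        repo, path = location
        hit = changed_lines.get((repo, path, line))
        if hit is not None:
            commit, author = hit
            suspects.append(Change(repo, path, line, commit, author))
    return suspects


# Made-up sample data, for illustration only.
trace = """java.lang.NullPointerException
    at com.example.issue.IssueService.load(IssueService.java:42)
    at com.example.web.IssueResource.get(IssueResource.java:17)
    at org.thirdparty.Dispatcher.run(Dispatcher.java:99)"""

class_index = {
    "com.example.issue.IssueService": (
        "core-repo", "src/main/java/com/example/issue/IssueService.java"),
    "com.example.web.IssueResource": (
        "web-repo", "src/main/java/com/example/web/IssueResource.java"),
}
changed_lines = {
    ("core-repo", "src/main/java/com/example/issue/IssueService.java", 42):
        ("abc123", "dev-a"),
}

suspects = blame(trace, class_index, changed_lines)
for change in suspects:
    print(change)
```

In this sketch only the first frame is reported, because it is the only one whose line changed in the new release. A real tool would presumably also have to handle lines that shifted between versions, which this example ignores.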

This presentation will focus on how Instablame operates, how we use it, and our plans for its further development.