BIO: I graduated in computer science from the Technical University in Gdańsk with a Bachelor of Engineering degree, and I am currently writing my master's thesis. I have been working at Spartez for 2 years. I started as a Java developer in the JIRA Cloud performance team. After 6 months I joined the JIRA Site Reliability Engineering team, where I am responsible for improving JIRA monitoring, reacting to problems in production, and fixing them before customers notice.
Presentation: How to find issues before customers do [POL]
Running a big product as a service, like JIRA Cloud, is a challenge. JIRA Site Reliability Engineers (JSRE for short) deal with many things on a daily basis, and probably the least pleasant is reacting to issues introduced by the most recent version update. Such a situation is stressful because you are expected to address a problem before your customers notice it, so you are under pressure of both time and responsibility. And with a huge code base, figuring out what caused an issue may take a while.
That’s why JSRE developed a tool to fix JIRA faster. Please meet Instablame. It combines the power of logging as a service (Splunk), source code repository management (Bitbucket), and JIRA. Instablame lets us:
- Detect new errors as they appear in real time
- Prioritise those errors correctly based on historical data, frequency of occurrence, number of affected customers, etc.
- Build a special "class index" for the required version of JIRA, across the multiple repositories in use
- Analyze an error and detect which changes caused it, line by line and repository by repository. This is similar to the Git "blame" command, hence the name Instablame
- Based on the analysis, decide how to proceed: roll back the release, implement a fix, or contact the responsible team
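To illustrate the idea behind the "class index" and the blame-style analysis, here is a minimal sketch in Python. It is not Instablame's actual code: every name in it (`Frame`, `BlameEntry`, `suspects`, the repositories, classes, and commits) is hypothetical, and the index and blame data are assumed to be plain in-memory dictionaries rather than anything fetched from Splunk or Bitbucket. The sketch resolves each stack-trace frame to a file through the index, then keeps only the lines last changed in the newly deployed release.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    class_name: str  # fully qualified class name taken from a stack trace
    line: int        # line number reported for that frame


@dataclass(frozen=True)
class BlameEntry:
    commit: str      # commit that last changed the line
    author: str
    release: str     # first release that contains the commit (assumed known)


# Hypothetical "class index": class name -> (repository, file path)
ClassIndex = Dict[str, Tuple[str, str]]
# Hypothetical blame data: (repository, file path, line) -> BlameEntry
BlameData = Dict[Tuple[str, str, int], BlameEntry]


def suspects(trace: List[Frame], index: ClassIndex,
             blame: BlameData, new_release: str) -> List[BlameEntry]:
    """Return blame entries for trace lines last changed in new_release."""
    found: List[BlameEntry] = []
    for frame in trace:
        location = index.get(frame.class_name)
        if location is None:
            continue  # class is not in any indexed repository
        repo, path = location
        entry = blame.get((repo, path, frame.line))
        if entry and entry.release == new_release and entry not in found:
            found.append(entry)
    return found


# Made-up example data, for illustration only.
index = {
    "com.example.issue.IssueService": ("core-repo", "src/IssueService.java"),
    "com.example.plugin.BoardView": ("plugin-repo", "src/BoardView.java"),
}
blame = {
    ("core-repo", "src/IssueService.java", 42):
        BlameEntry("abc123", "alice", "7.1.0"),
    ("plugin-repo", "src/BoardView.java", 10):
        BlameEntry("def456", "bob", "7.0.0"),
}
trace = [
    Frame("com.example.issue.IssueService", 42),
    Frame("com.example.plugin.BoardView", 10),
    Frame("java.util.HashMap", 100),
]

result = suspects(trace, index, blame, "7.1.0")
print([entry.commit for entry in result])
```

With this made-up data only the change to `IssueService.java` belongs to release 7.1.0, so it is the single suspect; the older plugin change and the unindexed library frame are skipped.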
This presentation will focus on how Instablame operates, how we use it, and our plans for its further development.