Dmitry Lyubarskiy, Facebook

BIO: Software Engineer at Facebook UK who has worked on testing infrastructure for the past two years, focusing on running tests reliably at scale. Passionate about developer experience, mathematics, algorithms, and scaling systems.

Presentation: Scaling Testing @ Facebook
Session level: intermediate

At Facebook, thousands of developers commit multiple changes every day. At that scale, we use a mono-repository approach with very lightweight “feature branches”: developers typically commit changes within several hours of opening a pull request. Committed changes are normally pushed to production automatically within a few hours of the commit.

Given the number of people using Facebook, this makes verifying and testing changes before they are committed paramount. Our data show that moving the testing signal upstream increases the bug fix rate and contributes positively to developer efficiency.
At the same time, we have hundreds of thousands of tests of various kinds, including many resource-heavy ones. Running a resource-heavy test can require access to a browser, writing data to test DBs, etc. This makes the brute-force approach of running all of them on every pull request impossible.
This talk is dedicated to the measures we have been taking to move the test signal to pre-commit while using a feasible amount of resources:

  • Tackling the resource problem by automatically grouping pull requests together.
  • Selecting the right tests to run on pull requests.
  • Analysing test flakiness and preventing false positives.

Take-aways: 

  • It is a good idea to move everything to pre-commit, i.e., to test while the developer is still thinking about the changes.
  • Combining pull requests is a tradeoff between time to signal and resources spent.
  • How combining pull requests relates to the Poisoned Wine Problem.
  • Optimistic and pessimistic strategies for combining pull requests, and how they relate to the cost of blame.
  • Different approaches to fighting flakiness.
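
To make the link between combining pull requests and the Poisoned Wine Problem concrete, here is a minimal sketch of one textbook strategy: test a whole batch at once and, only if it fails, split it in half and recurse to assign blame. It is an illustration, not a description of Facebook's actual system. The names `find_breaking_changes` and `make_simulated_runner` are hypothetical, and the sketch assumes that a batch fails exactly when it contains at least one breaking change (no flakiness, no interactions between changes).

```python
from typing import Callable, List, Sequence, Set, Tuple

BatchRunner = Callable[[Sequence[str]], bool]


def find_breaking_changes(changes: Sequence[str], batch_passes: BatchRunner) -> List[str]:
    """Return the breaking changes in `changes` using recursive halving.

    Assumption: `batch_passes(batch)` is True exactly when no change in the
    batch is breaking.
    """
    if not changes:
        return []
    if batch_passes(changes):
        # One test run clears every change in the batch.
        return []
    if len(changes) == 1:
        # A failing batch of size one identifies the culprit.
        return [changes[0]]
    mid = len(changes) // 2
    return (find_breaking_changes(changes[:mid], batch_passes)
            + find_breaking_changes(changes[mid:], batch_passes))


def make_simulated_runner(broken: Set[str]) -> Tuple[BatchRunner, List[List[str]]]:
    """Build a fake test runner (hypothetical helper) that records each batch it runs."""
    calls: List[List[str]] = []

    def batch_passes(batch: Sequence[str]) -> bool:
        calls.append(list(batch))
        return not any(change in broken for change in batch)

    return batch_passes, calls


if __name__ == "__main__":
    pull_requests = [f"PR{i}" for i in range(8)]
    runner, calls = make_simulated_runner(broken={"PR5"})
    print(find_breaking_changes(pull_requests, runner))
    print(f"test runs used: {len(calls)}")
```

When the batch is clean, a single run clears every pull request in it. When it is not, the extra runs spent on halving are the cost of blame, which is why the choice of batch size trades resources against time to signal.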