Are Benchmarks From Cloud CI Services Reliable?
After I released the first version of Criterion.rs (a statistics-driven benchmarking tool for Rust), I was asked about using it to detect performance regressions as part of a cloud-based continuous integration (CI) pipeline such as Travis CI or AppVeyor. That got me wondering: does it even make sense to do that? Cloud CI pipelines have a lot of potential to introduce noise into the benchmarking process - unpredictable load on the physical hosts that run the build VMs, or even unpredictable migrations of VMs between physical hosts.
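
For readers unfamiliar with Criterion.rs, here is a minimal sketch of the kind of benchmark in question, loosely modeled on the crate's getting-started example (the `fibonacci` function and the benchmark name are just placeholders):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// A deliberately slow toy function to benchmark.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

// Criterion.rs runs this closure many times, collects timing samples,
// and compares the results against the previous run to flag regressions.
fn bench_fib(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, bench_fib);
criterion_main!(benches);
```

The question is whether that regression comparison means anything when the "previous run" and the current run may have executed on differently loaded (or entirely different) physical machines.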