
Photo by Christina Morillo on Pexels
The UI Testing Bottleneck
Every Android developer has been there: you push your code, kick off your CI pipeline, and then… wait. And wait. And wait some more. Just to find out if your UI tests pass. In my case, I was waiting 40 minutes.
Last month, I found myself staring at this exact problem. Our team’s UI tests had grown to a point where they were taking nearly an hour to complete on our CI server. Worse, they’d occasionally time out completely, leaving us with no feedback at all. The development cycle was grinding to a halt.
That’s when I discovered the power of test sharding.
What is Test Sharding and Why Should You Care?
Test sharding is a simple concept with powerful implications: instead of running your UI tests sequentially on a single device, you split (or “shard”) them across multiple virtual devices running in parallel.
Think of it like the difference between having one person wash all your dishes versus having five people each washing a portion of them. The work gets done much faster, and you get your feedback in a fraction of the time.
For Android developers, this means:
- Faster CI builds
- Quicker feedback loops
- More testing without lengthy timeouts
- Better developer productivity
Firebase Test Lab: Built-in Parallelization
The good news is that Firebase Test Lab (FTL) already supports test sharding out of the box. You don’t need complex infrastructure or custom solutions — just a simple command line modification to specify how many shards you want.
Here’s how you might run a basic sharded test with Firebase Test Lab:
gcloud beta firebase test android run \
  --app=app-debug.apk \
  --test=app-debug-androidTest.apk \
  --device=model=Pixel3,version=30 \
  --num-uniform-shards=5
This command tells FTL to run your tests across 5 shards in parallel. Under the hood, FTL distributes your test cases evenly across these shards.
If you want to try sharding before changing your gcloud command, you can also enable test shards in the Firebase Test Lab console.

The Challenge: Balancing Your Shards
While implementing this solution, I ran into an interesting constraint: the number of shards must be less than or equal to the number of test cases. This makes sense when you think about it — you can’t have five people washing dishes if you only have three dishes.
But there’s another, more subtle problem. If you have 20 test cases and 5 shards, each shard gets 4 tests. But what if some tests take 10 seconds while others take 2 minutes? You’ll end up with unbalanced shards, and your test run will still be bottlenecked by the longest-running shard.
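To make the bottleneck concrete, here is a minimal Python sketch using the numbers above: 20 tests, 5 shards, some tests at 10 seconds and some at 2 minutes. The durations and their ordering are made up for illustration, and `uniform_shards` is a hypothetical helper that splits by count only; it is not how FTL is implemented internally.

```python
from typing import List


def uniform_shards(durations: List[int], num_shards: int) -> List[List[int]]:
    """Split tests into equally sized groups by count, in their listed order."""
    if not 1 <= num_shards <= len(durations):
        raise ValueError("shard count must be between 1 and the number of tests")
    size, extra = divmod(len(durations), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        # The first `extra` shards take one additional test when the split is uneven.
        end = start + size + (1 if i < extra else 0)
        shards.append(durations[start:end])
        start = end
    return shards


# Hypothetical durations in seconds: 4 slow tests (120 s) followed by 16 fast ones (10 s).
durations = [120] * 4 + [10] * 16

shards = uniform_shards(durations, 5)
shard_times = [sum(s) for s in shards]
wall_time = max(shard_times)      # the run ends when the slowest shard ends
ideal_time = sum(durations) / 5   # what a perfectly even split of the work would take

print(shard_times)            # [480, 40, 40, 40, 40]
print(wall_time, ideal_time)  # 480 128.0
```

Every shard holds 4 tests, yet in this made-up ordering one shard runs for 8 minutes while the other four finish in 40 seconds.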
This is where Flank comes in.
Introducing Flank: Smart Test Sharding
Flank is an open-source tool that works on top of Firebase Test Lab, optimizing how tests are distributed across shards. Instead of naively dividing tests equally, Flank analyzes previous test runs to make smart decisions about which tests should run on which shards.
Setting up Flank requires a configuration file like this:
gcloud:
  app: app-debug.apk
  test: app-debug-androidTest.apk
  device:
    - model: Pixel3
      version: 30
  use-orchestrator: true
  timeout: 30m
flank:
  max-test-shards: 5
  shard-time: 120
  smart-flank-gcs-path: gs://your-bucket-path
The magic happens with the smart-flank-gcs-path parameter, which tells Flank where to store and retrieve historical test execution times. With this data, Flank can distribute tests to create balanced shards, ensuring that no single shard becomes a bottleneck.
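To show why historical timings help, here is a rough Python sketch of duration-aware sharding. It uses a simple greedy heuristic (longest test first, always onto the currently lightest shard). This is my own illustration, not Flank's actual algorithm, and the `history` dictionary with its test names and timings is invented for the example.

```python
import heapq
from typing import Dict, List, Tuple


def balance_by_duration(history: Dict[str, float],
                        num_shards: int) -> List[Tuple[float, List[str]]]:
    """Assign the longest tests first, each to the currently lightest shard."""
    # Heap entries are (total seconds so far, shard index, test names).
    # The unique shard index breaks ties, so the lists are never compared.
    heap = [(0.0, i, []) for i in range(num_shards)]
    heapq.heapify(heap)
    for name, seconds in sorted(history.items(), key=lambda kv: kv[1], reverse=True):
        total, idx, tests = heapq.heappop(heap)
        tests.append(name)
        heapq.heappush(heap, (total + seconds, idx, tests))
    # Return shards in index order as (total seconds, test names).
    return [(total, tests) for total, _, tests in sorted(heap, key=lambda e: e[1])]


# Hypothetical timings from an earlier run: 4 slow tests and 16 fast ones.
history = {f"slowTest{i}": 120.0 for i in range(4)}
history.update({f"fastTest{i}": 10.0 for i in range(16)})

shards = balance_by_duration(history, 5)
print([total for total, _ in shards])    # [130.0, 130.0, 130.0, 130.0, 120.0]
print(max(total for total, _ in shards)) # 130.0
```

With the same 640 seconds of total work, the slowest shard now takes 130 seconds, close to the 128-second average per shard.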
The Results: 40% Faster Testing
After implementing Flank in our CI pipeline, the results were impressive. Our UI test execution time dropped by about 40% — from 40+ minutes down to around 24 minutes.
What’s happening behind the scenes is fascinating. When Flank runs with 5 shards, Firebase Test Lab spins up 5 virtual devices to run tests in parallel. However, in the Firebase console, you still see just one test run — FTL abstracts away the parallelization details.
Flank decides up front which tests go to which shard and records that mapping in a JSON file it generates. Each virtual device then runs only the tests assigned to its shard, so well-balanced shards are what keep every device busy until the end of the run.
Beyond Time Savings: Additional Benefits
Beyond just speed, I discovered several other advantages to this approach:
- Improved test stability: By having shorter test runs, we reduced the chance of timeouts and network-related failures.
- Better resource utilization: Our CI resources were used more efficiently, allowing for more parallel jobs.
- Test flakiness detection: Flank can automatically rerun flaky tests, improving reliability.
- Detailed reporting: We got improved insights into which tests were taking the longest time.
Implementing This In Your CI Pipeline
If you’re facing similar UI testing delays, here’s how to implement this solution:
- Install Flank: Add it to your project or CI server.
- Create a configuration file: Set up your flank.yml with the appropriate parameters.
- Update your CI workflow: Modify your CI configuration to use Flank instead of direct FTL calls.
- Monitor and adjust: Start with a conservative number of shards and adjust based on results.
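For the "monitor and adjust" step, a back-of-the-envelope helper like the one below can give you a starting shard count. It is only a sketch of how I reason about the max-test-shards and shard-time values from the config above together with the "shards must not exceed test cases" constraint. The function name and parameters are hypothetical, and this is not Flank's own calculation.

```python
import math


def suggest_shard_count(total_test_seconds: float,
                        target_shard_seconds: float,
                        max_shards: int,
                        test_count: int) -> int:
    """Estimate a shard count from total runtime and a target time per shard."""
    if target_shard_seconds <= 0 or max_shards < 1 or test_count < 1:
        raise ValueError("target time, max shards and test count must be positive")
    # Enough shards that each one stays at or under the target time...
    wanted = math.ceil(total_test_seconds / target_shard_seconds)
    # ...but never more than the configured maximum or the number of tests.
    return max(1, min(wanted, max_shards, test_count))


# Hypothetical suite: 640 s of tests, 120 s target per shard, at most 5 shards.
print(suggest_shard_count(640, 120, 5, 20))  # 5
print(suggest_shard_count(200, 120, 5, 20))  # 2
print(suggest_shard_count(640, 120, 5, 3))   # 3
```

Treat the result as a first guess, then compare it against the real shard durations from your CI runs.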
For teams using Bitrise, you can add a custom script step like this:
curl -sL https://github.com/Flank/flank/releases/download/v23.10.1/flank.jar -o flank.jar
java -jar ./flank.jar firebase test android run
Conclusion
Implementing test sharding with Flank and Firebase Test Lab transformed our development process. The 40% reduction in UI test execution time meant faster feedback loops, happier developers, and ultimately better quality software.
If you’re wrestling with long-running UI tests in your Android projects, I highly recommend giving this approach a try. The setup is straightforward, and the benefits are immediate and substantial.
Have you tried test sharding for your Android UI tests? What strategies have worked well for your team? Share your experiences in the comments!
Note: The example in this article shows a configuration for Pixel 3 devices with Android 11 (API 30). You can adjust these parameters to match your target devices and Android versions.
This article was previously published on proandroiddev.com.