Thursday, May 1, 2025

How Dropbox leverages testing to maintain a high level of trust at scale | by Jose Alcérreca | Android Developers | Apr, 2025


This is part 2 of the Testing at scale series of articles, where we asked industry experts to share their testing strategies. In this article, Ryan Harter, Staff Engineer at Dropbox, shares how the shape of Dropbox’s testing pyramid changed over time, and what tools they use to get timely feedback.

With more than a billion downloads, the Dropbox app for Android has to maintain a high quality bar for a diverse set of use cases and users. With fewer than 30 Android engineers, manual testing and #yolo isn’t enough to maintain confidence in our codebase, so we employ a variety of different testing strategies to ensure we can continually serve our users’ needs.

Since Dropbox makes it easy to access your files across all of your devices, the Android app has to support viewing as many of those files as possible, including media files, documents, photos, and all of the variations within those categories. Additionally, features like Camera Uploads, which automatically backs up all of your most important photos, require deep integration with the Android OS in ways that have changed significantly over the years and across Android versions. All of this needs to work continually for our users, without them having to worry about the complexity, because the last thing anyone wants is to worry that they might lose their data.

While the size and distribution of the Android team at Dropbox has changed throughout the years, it’s imperative that we’re able to consistently build and refine features within the app while maintaining the level of trust from our users that we’ve become known for. To help underscore how Dropbox has been able to foster that trust, I’d like to share some ways that our testing strategies have changed over the years.

While automated testing has always been an important part of engineering culture at Dropbox, it hasn’t always been easy on Android. Years ago Dropbox invested in testing infrastructure that leaned heavily on End-to-End (E2E) testing. Building on Android’s instrumentation tests, we developed test helpers for features in the app following the test robot pattern. This enabled us to create a large suite of tests that could simulate a user moving throughout the app, but it came with its own significant costs.
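For readers unfamiliar with the test robot pattern, the following is a minimal sketch in plain Java. It is not Dropbox’s code: FakeFilesScreen, FilesRobot, and all of their methods are hypothetical names invented for illustration, and the fake screen stands in for UI that a real E2E test would drive through Android instrumentation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the test robot pattern; every name here is hypothetical.
final class RobotPatternSketch {

    // Stand-in for a screen of the app. A real E2E test would drive the actual UI
    // through Android instrumentation instead of this in-memory fake.
    static final class FakeFilesScreen {
        private final List<String> files = new ArrayList<>();
        private String openedFile;

        void upload(String name) { files.add(name); }

        void tap(String name) {
            if (!files.contains(name)) {
                throw new IllegalStateException("No such file on screen: " + name);
            }
            openedFile = name;
        }

        int fileCount() { return files.size(); }
        String openedFile() { return openedFile; }
    }

    // The robot exposes user-level actions and checks, so a test reads like a
    // scenario and does not depend on how the screen is actually driven.
    static final class FilesRobot {
        private final FakeFilesScreen screen;

        FilesRobot(FakeFilesScreen screen) { this.screen = screen; }

        FilesRobot uploadFile(String name) { screen.upload(name); return this; }
        FilesRobot openFile(String name) { screen.tap(name); return this; }

        FilesRobot assertFileCount(int expected) {
            if (screen.fileCount() != expected) {
                throw new AssertionError(
                    "Expected " + expected + " files, found " + screen.fileCount());
            }
            return this;
        }

        FilesRobot assertPreviewShows(String name) {
            if (!name.equals(screen.openedFile())) {
                throw new AssertionError(
                    "Expected preview of " + name + ", found " + screen.openedFile());
            }
            return this;
        }
    }

    public static void main(String[] args) {
        // The test body is a chain of user-level steps and checks.
        new FilesRobot(new FakeFilesScreen())
            .uploadFile("notes.txt")
            .uploadFile("photo.jpg")
            .assertFileCount(2)
            .openFile("photo.jpg")
            .assertPreviewShows("photo.jpg");
        System.out.println("scenario passed");
    }
}
```

The point of the pattern is that only the robot knows how the screen is driven, so a UI change should require updating the robot rather than every test that uses it.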

Like many Android projects at the time, the Dropbox app started out as a monolithic app module, but that wasn’t sustainable in the long run. Work was done to decompose the monolith into a more modular architecture, but the E2E test suite wasn’t prioritized in this effort due to the complex interplay of dependencies. This left our E2E test suite as a monolith of its own, resulting in test code that didn’t live alongside the feature code it exercised, so tests were easily missed and became outdated.

Additionally, the long build times that come with monolithic modules with many dependencies, combined with the tests being executed on emulators in our custom continuous integration (CI) environment, meant that the feedback cycle for these E2E tests was slow. This resulted in engineers feeling incentivized to remove failing tests instead of updating them.

As the Android ecosystem embraced automated testing more and more, with the introduction of helpful libraries like Espresso and Robolectric and support for unit testing built directly into Gradle, Dropbox kept up with these changes by moving from the heavy reliance on E2E tests towards more and more unit tests, filling out the bottom layer of the previously inverted testing pyramid. This was a big win for test coverage within the app, and allowed us to roll out quality assurance practices like code coverage baselines, to ensure that we continually improved the reliability of the product as it moved forward.

Over time, as unit testing became easier and easier and engineers became more and more frustrated with the slow feedback cycles of E2E tests, our testing pyramid became lopsided in the other direction. We had confidence in our unit tests and the infrastructure supporting them, but our E2E tests aged without much support, becoming more and more unreliable, to the point that we mostly ignored their failures. Tests that can’t be trusted end up becoming a maintenance burden and provide little value, so we recognized that something needed to change.

Over the past year we’ve doubled down on our focus on reliability. We’ve invested in our test infrastructure to ensure that engineers are not only able to, but incentivized to, write valuable tests across all layers of the testing pyramid. In addition to technical investment in code and tooling, that has also required that we take the time to evaluate the things we test, and how we test them, and make sure the entire team has a better understanding of which tools to use when.

Unit testing

We continue to spend most of our efforts writing unit tests. These are fast, focused tests that provide quick feedback, and serve as our first line of defense against regressions. We write JUnit tests whenever we can, and fall back to instrumentation tests when we need to. Robolectric’s interoperability with AndroidX Test has allowed us to move many of our instrumentation tests to JVM-based unit tests, making it even easier to meet our test coverage goals.

Speaking of test coverage goals, the unit testing layer is the only layer that we use to determine our code coverage. By default we target 80% test coverage, though we have a process to override this target for cases in which unit testing is either not valuable or infeasible.

  • Note: While we use standard JaCoCo tooling to evaluate our test coverage, its lack of deep understanding of Kotlin presents some challenges. For instance, we haven’t yet found a way to tell JaCoCo that the generated accessors, toString and hashCode of behaviorless data classes don’t require test coverage. We’ve been experimenting and considering alternatives to ensure that we’re not writing brittle tests that don’t provide value, but for now we’re stuck with issuing coverage overrides for these cases.
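To make the idea of a default target with per-case overrides concrete, here is a small sketch in plain Java. It is an assumption-laden illustration, not Dropbox’s tooling and not a JaCoCo API: CoverageGate, the module names, and the 50% override value are all hypothetical, and only the 80% default comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical coverage gate: an 80% default target with per-module overrides.
final class CoverageGate {
    static final double DEFAULT_TARGET = 0.80;

    // Module name -> agreed override target (hypothetical data).
    private final Map<String, Double> overrides;

    CoverageGate(Map<String, Double> overrides) {
        this.overrides = new HashMap<>(overrides);
    }

    // Returns the override if one was granted, otherwise the default target.
    double targetFor(String module) {
        return overrides.getOrDefault(module, DEFAULT_TARGET);
    }

    // A module passes when its line coverage ratio meets its target.
    boolean passes(String module, int coveredLines, int totalLines) {
        if (totalLines <= 0) {
            return true; // nothing to cover
        }
        double ratio = (double) coveredLines / totalLines;
        return ratio >= targetFor(module);
    }

    public static void main(String[] args) {
        Map<String, Double> overrides = new HashMap<>();
        // Example: a module made mostly of behaviorless data classes.
        overrides.put(":models", 0.50);
        CoverageGate gate = new CoverageGate(overrides);

        System.out.println(":feature at 79/100 passes: " + gate.passes(":feature", 79, 100));
        System.out.println(":models at 60/100 passes: " + gate.passes(":models", 60, 100));
    }
}
```

Keeping overrides in one explicit map makes each exception visible and reviewable, which fits the idea of an override process rather than silently lowering the bar.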

E2E testing

Over the past several months we’ve been renewing investment in our automated E2E test suite. This test suite is able to alert us to extremely important issues that unit tests simply can’t identify, like OS integration issues or unexpected API responses. Therefore we’ve worked hard to improve our infrastructure to make tests easier for engineers to run locally, we’ve audited and removed flaky or invalid tests, and we’ve worked on documentation and training to ensure that we support our engineers in the creation and maintenance of our E2E test suite.

Change in E2E test counts before and after test suite improvement effort.

As I mentioned above, our E2E tests simulate a user moving throughout the app. This means that the task of defining our E2E test cases is more than simply an engineering problem. Therefore, we developed guidance to help engineers work with product and design partners to define test cases that represent true use cases.

We recently introduced a practice of using a proper Definition of Done for development work. This amounts to a checklist of items that must be completed in order for a project to be considered “done”, which is defined and agreed upon at the start of the project. Our standard checklist includes the declaration of E2E test cases for the project, which ensures that we’re adding test cases in a thoughtful way, taking into account the value and purpose of those tests, instead of targeting arbitrary coverage numbers.

Screenshot testing

Another dimension of our tests that we’ve ramped up in recent years is screenshot testing. Screenshot tests allow us to validate against visual regressions, ensuring that views render properly in light and dark mode, different orientations, and different form factors.

In unit tests we leverage Paparazzi for screenshot testing. This allows us to write fast, isolated tests, and we find it’s best suited for testing individual view or composable layouts, including our design system components.

We also find value in executing screenshot tests in more full-featured instrumentation tests. For this, we use our own Dropshots library, which supports screenshot testing on devices and emulators. Since Dropshots executes screenshot tests on real (or emulated) devices, it’s a great way to validate system integrations like edge-to-edge display, the default window mode on Android 15 devices.
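The general idea behind any screenshot test is to compare a freshly rendered image against a stored reference and fail when they differ by more than an allowed amount. The sketch below shows that comparison in plain Java over ARGB pixel arrays. It is only an illustration under stated assumptions: ScreenshotDiff and its methods are hypothetical, and this is not the comparison algorithm or API of Paparazzi or Dropshots.

```java
// Hypothetical pixel-by-pixel comparison of a rendered image against a reference.
final class ScreenshotDiff {

    // Returns the fraction of pixels that differ between the two images.
    // Images are given as rows of packed ARGB ints and must have equal dimensions.
    static double diffRatio(int[][] reference, int[][] actual) {
        if (reference.length != actual.length) {
            throw new IllegalArgumentException("Image heights differ");
        }
        long total = 0;
        long different = 0;
        for (int y = 0; y < reference.length; y++) {
            if (reference[y].length != actual[y].length) {
                throw new IllegalArgumentException("Image widths differ in row " + y);
            }
            for (int x = 0; x < reference[y].length; x++) {
                total++;
                if (reference[y][x] != actual[y][x]) {
                    different++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) different / total;
    }

    // The test passes when the differing fraction stays within the tolerance.
    static boolean matches(int[][] reference, int[][] actual, double maxDiffRatio) {
        return diffRatio(reference, actual) <= maxDiffRatio;
    }

    public static void main(String[] args) {
        int white = 0xFFFFFFFF;
        int black = 0xFF000000;
        int[][] reference = { { white, white }, { white, white } };
        int[][] actual = { { white, white }, { white, black } };

        System.out.println("diff ratio: " + diffRatio(reference, actual));
        System.out.println("matches with zero tolerance: " + matches(reference, actual, 0.0));
    }
}
```

A non-zero tolerance is one way to absorb small rendering differences between machines, at the cost of possibly missing very small regressions.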

Manual testing

With all of the investment we’ve made in automated testing, you’d be forgiven for thinking that we do no manual testing, but even today that’s simply not feasible. There are many workflows for which automated tests would either be too hard to write or too hard to validate. For example, we have both unit and E2E tests to validate that the app behaves correctly when rendering file content, but it can be hard to programmatically validate file content, and screenshot tests can often prove too flaky.

For these cases, we use a web-based test case management tool to maintain a full set of manual test cases, and a third-party testing service to execute the tests prior to each release. This allows us to catch issues for which we haven’t yet written tests, or which require human judgement.

Testing has proven invaluable in identifying quality issues before they make it to users, allowing us to earn our customers’ trust. Given that value, we intend to continue investing in testing to ensure that we can continue to maintain high quality and reliability. There are a few things that we’re looking forward to in the future.

I’m currently in the process of expanding the functionality of Dropshots to support multiple device configurations, which will allow us to perform screenshot tests across a broad range of devices with a single set of tests. Since the Dropbox app works across many different form factors, it will be valuable for us to simultaneously run our screenshot test suite on a variety of devices or emulators to prevent regressions on less common form factors.

Additionally, we’re beginning to experiment with Compose Preview Screenshot Testing, which allows our Compose Preview functions to serve double duty by speeding up development cycles while also being used to protect against regressions.

Finally, we intend to continue ensuring that we have a good balance of the right kinds of tests, balancing our testing pyramid so that our investment in testing serves our reliability goals instead of chasing arbitrary coverage targets. We’ve already seen the value that a healthy test suite can provide, and we’ll continue investing in this area to ensure that we continue to be worthy of trust.


