Tales of a Mobile DevOps
By
Nicola Corti
droidcon APAC 2020
If you're developing mobile apps, you probably know one of your most important KPIs: store rating.
In such an environment, speed is key. You want to be able to iterate fast and ship beautiful apps to your users frequently. But with growing mobile teams, this is becoming more and more challenging.
Mobile DevOps imposes a shift in perspective on "classical" DevOps. Tools and processes should be adapted to the mobile release flow. You need tools to monitor your application, roll out a feature safely, and react to incidents and 1-star reviews.
Mobile DevOps engineers to the rescue. They play a fundamental role in adapting your development flow to deliver mobile apps to your end-users.
In this talk, I will share my experience as a Mobile DevOps engineer, some of my preferred tools, and some of the lessons learned while building mobile infrastructures.
Transcript
00:10
droidcon APAC with me today. My name is Nicola Corti, and my session is called "Tales of a Mobile DevOps". So, a little bit about myself: I am a Google Developer Expert for Kotlin and I work as an Android infrastructure engineer at Spotify in Stockholm, Sweden. That means I basically develop tools and libraries for other Android engineers at my company. You can find me online on Twitter as @cortinico, and also on GitHub; I personally do a lot of open source. I also want to leave you a little note: I recently started a podcast called The Developers' Bakery. You can find it at thebakery.dev, and I talk about developer tools and a lot of other things that might be interesting for you if you're interested in this session.
But then, let's deep dive into mobile DevOps. If you go online and search for the definition of DevOps, you will not find a clear one. There are a variety of different definitions; every company and every tool tries to give its own. I picked one: this is the definition that Google Cloud gives to DevOps. They call it "an organizational and cultural movement that aims to increase software delivery velocity, improve service reliability, and build shared ownership among software stakeholders". Whoa, full of buzzwords. But if you pick any other definition of DevOps, you will see that all of them have one thing in common: they all have culture. DevOps is all about the culture, and mobile DevOps is no different.
So I want to talk to you today about mobile DevOps starting exactly from there, the culture, because I believe that culture is the foundation stone that needs to be there in order to create a mobile DevOps environment in your company. One direct byproduct of the culture is the development: if you have a strong culture, your developers will actually write code, and you will also be able to ship product and give value to your end users. Obviously, development and product are strictly correlated to
each other, but I still strongly believe that they depend upon having a strong culture behind them. So today we're going to walk you through these three steps: the who, so who is driving this whole movement, the people behind our organization; the what, so the code that we're actually writing and how to do it better; and the why, so how to actually deliver value to our end users.
Let's start from the culture. Again, I want to stress that I believe this is the most important part: you need to understand that DevOps is all about the culture and all about the people that are building your product. I want to point out that DevOps is not just the responsibility of a group of people who happen to have DevOps in their job title. DevOps is built by everyone. It's a company process that everyone should participate in, so don't delegate it to a restricted group of people.
It should also be based on shared responsibility. One of the crucial points of a DevOps organization is how we split ownership: who owns what, which part of the app is owned by which team, and so on. You also need to create a sense of shared responsibility: people should feel connected to the processes, the code base, and the tools they are developing.
Your culture should necessarily be blameless. You should not blame people if things are going wrong. Things will go wrong, as we will also see later: incidents will happen, things will go on fire. The culture should be built in a way that makes it possible to point out what the root cause was without pointing the finger at other people.
00:51
And I said at the beginning that the culture is built by everyone, but there is also a group of people driving this growth. From my point of view, this is the so-called platform team, and my motto when I think about the platform team is "be a shepherd, not a gatekeeper". What does this mean exactly? Let me talk a little bit about the platform team.
The platform team is a group of people that work mostly on DevOps and infrastructure-specific tasks, and it can take different configurations based on the size of your company. When you're a small company, like a startup, you maybe have three or four mobile engineers and you don't have a specific platform team, because there are too few engineers. So there will be one specific mobile developer who is responsible for dealing with the CI and with the test infra, maybe just because that person is the one most interested in working in that field.
But then your company grows and you start having multiple teams, and every team has multiple mobile engineers, possibly with one infra- or DevOps-interested developer per team. That could be the case; it could also not be the case. Generally, DevOps and infra tasks at this level are handled with a shared rotation: you have a common backlog, and every week or every other week each team picks some tasks from it. Maybe there is someone who, again, prefers to work with the CI, so that person is a point of contact, but there is still no specific team responsible for infra-specific tasks.
The next natural evolution of this is the so-called platform team: a team that is responsible for all the infra-related tasks. In this setup you will have a lot of other teams around this core team, generally called feature teams. This is the classical setup of a single platform team, and generally small to mid-sized companies
achieve this level. This means that the group of people in the center is responsible for all the infra and DevOps tasks, like setting up the CI, releasing to production, et cetera. It's easy to spot this team because it's generally called Android Core, Android Platform, Android Infra, Foundation, Mobile Foundation, Mobile Infra, Mobile Core, whatever. As soon as you join a company, you will identify which team is the core platform team.
But then your company grows, and you start having more and more mobile engineers, so you end up with a setup with so-called functional platform teams. That means you have multiple core teams and multiple feature teams, and every core team handles a specific area of your DevOps infra: there will be one team responsible for the CI, another team responsible for the testing infra, and so on. The feature teams will act in a client-server fashion with those teams: whenever a feature team needs a change in the publishing system, they will reach out to the publishing team and ask them for support.
I'm telling you this because I believe the platform team is crucial to building a strong mobile DevOps culture in your company, and I believe that the mobile DevOps culture is built when certain questions arise in your company, like these ones. "How about we introduce this new library that I read about on Android Weekly yesterday?" Or "I want to use this new framework; I want to use Jetpack Compose, it's so cool." Or "Let's write my feature in another language, because I found this esoteric language and I want to try it." Or "I want to bump this library to the next major version, because it contains so many new features." In my personal experience, I have seen a lot of responses to those requests like this one, like Gandalf saying,
"You shall not pass!" Like, no, no, no, we are overwhelmed with stuff and this is not allowed; it's yet another framework, we're not doing this. So my motto here, as I said before, is "be a shepherd, not a gatekeeper". Don't act like Gandalf and push back on every request like this. Those kinds of questions are crucial. You need to understand what the underlying request was. You need to shepherd those people, you need to act as a mentor, you need to create RFCs, create prototypes, and understand what the underlying needs are, in order to build a DevOps culture that doesn't rest only on your platform team. Your platform team is responsible for driving this forward, but you are not solely responsible, and you should not be the gatekeeper in this context.
Now let's move over to something a little more concrete and see how we can actually write the code: the development phase. For the development phase I identified five steps that I think are generally the five major steps when you write some code and want to introduce a new feature. I'm going to walk you through all of those steps with some practical tips and opportunities on top of which you can build your mobile DevOps culture.
Let's start from the first: the code. The first thing your developers need to do is actually write the source code for the feature, and I like to think of the code as a form of ownership. As I said before, in a bigger mobile DevOps culture one of the crucial points is understanding ownership: I should be able to say, "Hey, this part of the app is owned by team X, and this other part of the app is owned by team Y." If you don't have a strong sense of ownership in your app, you're going to have problems, because there will be features that are contested between different teams and features that are left behind and no one cares about. So the
first question you're going to ask yourself is: how do you share the code? What happens if multiple teams want to access a shared feature, a shared functionality, a shared library? How do you share the code between Android and iOS? Those are crucial questions, and a lot of things in your organization will take shape based on how you answer them. Specifically, do you want to have a single repository, a monorepo with all of your code, or do you want to have multiple repositories? Those choices will shape your DevOps infrastructure a lot. There are tools in the industry right now, like Gradle, Bazel, and Buck, that are great at supporting certain types of repositories, but it all depends on the choices you make.
Also, do you want to commit to multi-platform development? Do you want to use Flutter, or do you prefer React Native or Kotlin Multiplatform? Which pattern are you going to follow when you want to share code between different platforms? Or do you not want to share code between platforms at all? That might also be a choice, but it's up to you to make a decision. It's also up to the platform team: the platform team generally has the final call on those kinds of decisions, and it's crucial that they are discussed internally in the company.
Another fundamental part is taking care of your build performance, using tools like Gradle Enterprise. (Little advertisement: in the latest episode of my podcast I talk about another tool called Gradle Doctor, which is also useful to identify problems in your build, so make sure you check it out.) The same goes for IDE optimizations. If you're part of a platform team, you are responsible for taking care of the build performance and of your developers' environment. You want to make sure they're happy and efficient. So if your developers are using Android Studio or IntelliJ, you can create custom plugins to automate
certain tasks, like using templates or providing localization support, whatever you need. There will definitely be tasks that your developers do every day or every other day that you can automate if you develop a plugin for them, so make sure you spend some time investigating on that front.
The next step is the merge. It's when my code meets other people's code and becomes a single thing in the code base, and I like to think of the merge phase as a form of collaboration. Here I think the crucial process is the code review, and this is where you can build a lot of DevOps culture. You need to make sure that the code review process is healthy and that people are welcome to deliver feedback to others. Again, it should be completely blameless: people should not feel bad if I'm letting them know that their code is broken. They just wrote the code; I might be the reviewer, so I'm a different pair of eyes and can provide different feedback. Everyone should accept feedback that comes from outside. It's complicated, especially if you don't have a code review process at all. It might sound obvious to have a code review process, but I spoke with companies in the past that were not doing code reviews and were just pushing to master like crazy. That doesn't work, so make sure you also spend time on your code review process if you don't have one.
Obviously, you need to set up a continuous integration system. We could do another talk only on this, but your code should not be tested on a developer's machine. It should be tested in a separate environment that is used to run tests, to run static analyzers, and also to ship to production. Make sure you spend time on that section of your infra, because it's also crucial in terms of timing and performance for your developers.
When your company grows, you might end up in a situation where you need
a merge queue. A merge queue helps you prevent scenarios like this: we have two pull requests, my pull request was green, the other person's pull request was also green, and they both get merged at the same time. Although they were both green, the sum of the changes leaves the code base in a broken state, so all the people who come after us can't merge anymore and can't test anymore, because everything is broken. You need a tool like mergequeue.com, or whatever custom solution, that helps you merge changes one after the other and makes sure that they are tested in isolation. As soon as the number of developers grows, you will see that people start pushing code like crazy and it's really hard to handle. This will become a bottleneck for you, and you need to make sure it is efficient and works fine.
Also, we are in mobile, so we care a lot about the artifact that we ship to our end users, and we want to make sure it's healthy and working fine. There are tools like artifact analyzers that help you track key metrics, for example the APK size; you might be interested in not overgrowing the size of your APK. This means running analyzers at PR time: as soon as a developer pushes new things, you want to run an analyzer to understand whether everything is fine or whether that specific change is going to impact the final application in some way. You don't want to run it too late. When you're preparing a release, it's already too late: you have hundreds of commits, and what do you do? So think about all the aspects of the final APK's health.
Strictly related to this is localization. You want to make sure that when a feature hits master, that feature is fully translated. If it's not, I can tell you that it's really unpleasant to open an app in English and see that everything works fine, but then you change the language to, I don't know, Italian, and half of the strings are not translated
and it's like, whoa, I am a second-class citizen. That's not cool. So you can check at merge time that the feature is fully translated, and not allow the merge if things are not translated yet. This is another health metric for your APK that you need to take care of.
The next step is the analyze step, and this is actually one of my favorites, because here is where linters and static analyzers play a fundamental role. I like to think of this phase as a form of education; I believe we can push a lot of learning to our developers here. That means, as I said, using static analyzers. Personally, I'm a big fan of detekt for Kotlin, so if you write Kotlin code, make sure you use it. There are a lot of other tools out there, but again, make sure there is an automatic system checking the health of your code.
And it's not just about default checks, because you can go one step further and implement custom checks. Custom checks are rules you write that are specific to your code base, and they can help you with two kinds of tasks. The first is running insights on your code base: understanding whether there are areas that are older, more legacy, or areas that are touched more often, or whether certain patterns occur. For example, how many occurrences of RxJava versus RxJava 2 do I have in my code base, or which RxJava functions are my developers using the most? The second is enforcing patterns that are specific to your company. For example, let's say you have a design system and you want to make sure your developers use a certain button that you provide and not the Android default button: you can write a custom lint rule to provide this kind of check. So make sure you customize those tools, because they are automatic, they save you a lot of time, and they are really helpful for enforcing patterns and catching anti-patterns across basically all of your
developers.
And also dependencies. In mobile, the vast majority of the code, I like to say, is not the code we wrote but the code others wrote. Obviously the code you write is fundamental for the app, but you also pull in a lot of third-party dependencies, like OkHttp, Retrofit, all the dependencies you need, and you want to make sure those are healthy and that you're not pulling in broken dependencies. I gave a talk about this two years ago, but there is a tool called dependency-check-gradle from OWASP that checks your dependencies against a list of known vulnerabilities and makes sure you're not pulling in anything that is known to be broken. So pay attention to your third-party dependencies.
The next step is the test. I really like tests, and I like to think of them as a form of trust: I want to go to bed and sleep well, because I know that my code was tested, so it will not be broken in production. You will probably have a lot of unit tests, but definitely also a lot of UI tests, like Espresso tests and automated tests. This is another point where the health of your infrastructure is crucial, so make sure you pick the right platform, like Firebase Test Lab or Genymotion Cloud, or whatever infrastructure allows you to host emulators or physical devices. The problem is that UI tests take time and are also costly, so act wisely here. You don't need to run the entire UI test suite every time. You can do some test filtering and run certain tests on every PR, and run another, bigger set of tests only when you're preparing a new release.
Also, talking about UI tests, one thing that will definitely be a problem is flaky tests. Flaky tests are those tests that you run ten times, and nine times out of ten they are green, and then once they
fail because of external reasons: timeouts, timing, whatever. Those can be a pain, because they are random failures and you can't really tell whether they are valid failures or not. So it's crucial that you have a mechanism in place to identify them and a mechanism to react. That can be a simple threshold and rerun: I identified a flaky test, I set a threshold of 80 percent, I run it, say, 30 times, and if it passes more than 80 percent of the time, I consider it green and I will not block the merge or the release process because of it. It's crucial that you act in some way, because otherwise a random failure might block your entire pipeline, and developers will be annoyed and will just put an ignore on the test. And the only thing worse than an unwritten test is an ignored test that never runs.
The next and last step is the measure. I like to think of measuring, and analytics specifically, as a form of awareness: you want your developers to know what your users are doing. You developed a feature, but are the users actually using it? How are they using it? Are they using it the way you coded it, or in a completely different way? That's why you need behavioral analytics, like Google Analytics, Firebase Analytics, or flurry.com, which help you track how your users are clicking around in your app. But there might be other types of analytics that are not necessarily behavioral, so not "the user clicked on button X or Y", but that are still health metrics for you, like timing. You want to know how fast your app is, how many seconds are wasted before the user sees the first valid screen, or how slow your network requests are. Those are things you need to track somehow and report somewhere, to be able to understand whether your app is doing well or not.
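As a rough illustration of the threshold-and-rerun idea for flaky tests mentioned earlier in this part of the talk, here is a minimal Python sketch. The 30 runs and the 80 percent threshold are the numbers from the talk; the function names (`pass_rate`, `is_green`, `make_flaky`) and the fake test are hypothetical and only stand in for whatever your test runner actually provides.

```python
from typing import Callable


def pass_rate(test: Callable[[], bool], runs: int = 30) -> float:
    """Run `test` `runs` times and return the fraction of passing runs."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if test())
    return passed / runs


def is_green(test: Callable[[], bool], runs: int = 30, threshold: float = 0.8) -> bool:
    """Treat a known-flaky test as green if it passes more than `threshold` of its runs."""
    return pass_rate(test, runs) > threshold


def make_flaky(fail_every: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a flaky test: fails on every `fail_every`-th call."""
    calls = {"count": 0}

    def test() -> bool:
        calls["count"] += 1
        return calls["count"] % fail_every != 0

    return test


if __name__ == "__main__":
    mostly_passing = make_flaky(fail_every=10)  # fails 3 runs out of 30
    mostly_failing = make_flaky(fail_every=2)   # fails 15 runs out of 30
    print("mostly passing is green:", is_green(mostly_passing))
    print("mostly failing is green:", is_green(mostly_failing))
```

In a real pipeline the callable would wrap a rerun of the identified test on your test infrastructure, and the verdict would decide whether the merge or release is blocked.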
Once you have everything set up, it's time to hit the store: it's time to go to production and release a new update, or a brand new APK for our product. On this front there would be a lot to say, but I picked two techniques that I found really valuable, and we're going to briefly touch on those two.
The first one is release trains. Release trains means releasing often and regularly. Let's say I have my application in production at version 140 and I'm releasing it today. Doing release trains means that I release every week or every other week with a regular cadence, so I release 138, 139, 140, 141, 142 regularly. This allows me to have multiple lines in parallel: for example, when 141 was in production I had 142 in beta, and when 140 was in production I had 142 in alpha and 141 in beta. That allows me to orchestrate my different distribution groups, knowing that at a certain time I will have version 1 in alpha, version 2 in beta, and version 3 in production.
This has a lot of benefits. It might sound overcomplicated if you're not doing it, but first, it makes sure that what you ship is small. If you ship every week or every other week, you know that developers push, say, 50 commits per week on average, so if something goes wrong you can bisect between those 50 commits and understand where the problem is. If instead you say, "let's wait till we finalize this amazing feature to release a new update", you might end up with 500 commits, and it's broken, and oh my god, releasing becomes a pain. That's why releasing often, and in this case using feature flags to enable certain parts of your app, allows you to ship smaller updates that are easier to debug and easier to handle.
Also, think about version numbers. Version numbers are free: use them as you think
they're better for you, as you find them most useful. You're not releasing a library. You don't have a public API that your users need to respect, where if you remove a feature you need to bump the major version. That's not the case: your users will most likely not even see the version number. So use a version number that is useful to you. For example, in this case, 140, 141, 142 might be the number of the week of the year when the app was created. Or you can add a timestamp to know when that specific release was created, or the SHA of the commit that created that release. All of those are pieces of information that might be useful for you, so it depends on how your infra is set up.
The last thing I want to talk about is the release cycle, and the tools that I used in the past and found useful to manage the full release cycle. I envision the release cycle as divided into four steps.
The first one is distribute. You need to push your artifact to your users, and your users here might be either consumers or your internal testers, like your QA team or your alpha or beta group. Ideally you will use tools like Bitrise or Google Play (Google Play has the internal distribution track as well as alpha, beta, and production), or tools like Firebase App Distribution.
Once you manage to reach your users, it's time to remotely control your app. As I said before, you do release trains, so you release every week, and at a certain point in time, say the first of the month or the first of January, you want to release a new feature. So you want a remote way to configure your app and enable certain parts of it. For this you need tools like Firebase Remote Config, Mixpanel, or rollout.io, which help you remotely control the app. And it's not just about releasing features; it's not just that I want to launch this amazing new
profile page on January 1st. No, you also want a way to control things when they are on fire. If things break really badly, you want to have the opportunity to flip a switch and turn certain parts of your app off. This is really the power of those kinds of tools. We are in mobile: if things are on fire and you don't have such capabilities, you will have to release a new version of your app, and that will take days. That's something you cannot afford.
The next step is the detect phase. You need to know when things are going badly, because you might just ship to production, and you don't want to read on Reddit or on The Verge that your application is going wild. You want to know before that. Tools like Firebase Crashlytics, Instabug, or Bugsnag are great for tracking your number of crashes. The number of crashes is the first health metric: first of all, you want to make sure your app doesn't crash. But as I said before, you might have other health metrics, like network timing or startup timing, so you need tools like Splunk or any other logging tool that gives you real-time streams of logs so you can check how things are behaving. Maybe there is a banner on the home screen that should be shown and it's not shown. Why? Maybe the app is not rendering this banner, and I'm losing money because of it.
And then, once you realize that things are on fire (because things will be on fire, get ready), it's time to respond. This is the phase I really like, because here I say it's time to wake people up. It's time to open an incident, and you need tools like PagerDuty or VictorOps to set up an on-call rotation, making sure there is always someone available at every point in time, day or night, who can be woken up and who should report and respond in some way to the problem. If you don't have such tools, sending a custom message on Slack in the team channel could be enough,
but at least you need to have an automated way to be notified when alerts are firing and your attention is needed.
And then there is what I think is the most crucial part of the whole infra, and where you can build a lot of mobile DevOps culture: the post-mortem. Things will go on fire, things will break, and then hopefully they will also recover. But then it's up to you to write a report and tell people in your company, or sometimes also outside your company, how things went. This is again where your culture needs to be blameless. It should be possible to point out that, I don't know, maybe an intern changed a configuration file and removed a line, and things went on fire. Stuff happens; it's normal, it's IT. But at the same time this should be blameless: you should not blame the intern because a line was removed. You should understand why this was even allowed, and why there was no tool preventing a broken configuration from being pushed out. There you can build a lot of your mobile DevOps culture, so make sure you have a strong post-mortem process.
So, to wrap up: we talked about a lot of things today. This was the diagram we saw at the beginning: from the culture you shape the development and the product. But I hope I convinced you today that there are a variety of tools, from development, like code reviews, linters, and tests, and from product, like release trains and post-mortems, that help you reshape the culture. From the culture you're able to develop great products and write good software, but those processes also reinforce the culture, if you have tools like post-mortems, code reviews, and so on.
That was all from me. Thank you very much for being with us. Again, you can find me on Twitter as @cortinico, and make sure you follow my podcast, The Developers' Bakery; you will find it on all the major podcasting platforms. And now I'm happy to answer your
questions, if there are any.
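To make the earlier point about version numbers concrete (week of the year plus the SHA of the commit that produced the release), here is a small Python sketch. The `release_version` name and the `YEAR.WEEK-SHA` format are assumptions chosen for illustration; the talk only says that such information can be useful, not how to lay it out.

```python
from datetime import date


def release_version(build_date: date, commit_sha: str) -> str:
    """Build a version name from the ISO week of the build and a short commit SHA.

    The "<year>.<week>-<short sha>" layout is a hypothetical format, not one
    prescribed in the talk.
    """
    if len(commit_sha) < 7:
        raise ValueError("expected a commit SHA with at least 7 characters")
    iso_year, iso_week, _weekday = build_date.isocalendar()
    return f"{iso_year}.{iso_week:02d}-{commit_sha[:7]}"


if __name__ == "__main__":
    # Example values only; a CI job would pass the real build date and commit SHA.
    print(release_version(date(2020, 12, 14), "a1b2c3d4e5f6a7b8"))
```

A scheme like this lets anyone reading a crash report map a version straight back to a release train and a commit.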
01:32
Okay, now, yeah, they are on my second screen. So the first question is: "PMs will disagree with release trains. If I implement their features today, they will want them live in production tomorrow." Well, here there is no question, it's just a statement. I would say that, yeah, that can happen: it can happen that PMs want your feature to be out tomorrow. But we are in mobile; it's not web. That's my main argument when those kinds of things come up: we are in mobile, things need days to reach our end users, and they can't just be shipped randomly. So you need to educate your PMs to understand the processes of mobile, if they're not used to them.
The next question is: "How do you balance number of checks versus build time?"
02:13
I would say that, I don't know, you're anonymous, but if you ask this question you should probably look into parallelizing. Don't restrict yourself out of fear of having too many checks. Checks are never too many, especially if you run them on CI and in parallel, and obviously if they give you value. If they're useless, you don't need to run them, but if they're valuable, make sure you run them in parallel.
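As a minimal sketch of what running checks in parallel can look like, the Python snippet below fans a set of independent checks out to a thread pool and collects a pass/fail result per check. The check names and the lambdas are placeholders; on a real CI system each check would more likely be a separate job or a subprocess call to your linter, test runner, or APK analyzer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]], max_workers: int = 4) -> Dict[str, bool]:
    """Run independent checks concurrently and return a pass/fail result per check."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the checks overlap, then wait for the results.
        futures = {name: pool.submit(check) for name, check in checks.items()}
        # future.result() re-raises any exception thrown inside a check.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Placeholder checks; replace with calls into your real tooling.
    results = run_checks({
        "lint": lambda: True,
        "unit_tests": lambda: True,
        "apk_size": lambda: False,
    })
    failed = [name for name, ok in results.items() if not ok]
    print("failed checks:", failed)
```

With this shape, the wall-clock cost of adding one more valuable check is bounded by the slowest check rather than by the sum of all of them.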
02:54
The next question is: "Just curious to know what mechanism you use to reduce build time in the app." Well, that's a really broad question, and there are a lot of answers to it. It depends on which build system you're using. If you're using Gradle, there are a lot of tools, like remote caches; you can use Gradle Enterprise to do that. As I said, take a look at Gradle Doctor. I was talking with Nelson Osacky from Gradle, and he has a lot of great tips on how you can improve your build time. Online you will find a lot of material on how to improve build times, parallel builds... really, there are a lot of options that you can enable. Just make sure you keep a close look at your builds, because it's not just that you enable features. You can't just enable the remote cache and pretend that things should be faster. Maybe they will be faster, but then you need to understand if you're actually faster or not. So there is a little bit of education that needs to be done around the build, and you need a little bit of observability: you need to constantly look at your builds and make sure they're running fine.

So the next question is: "Any good resources to refer to for writing custom lint rules?" Sure. If you are writing Kotlin code, I really invite you to take a look at detekt. As I said before, just to advertise, I'm one of the maintainers of detekt. Our community is pretty strong, and if you are writing a custom rule, we do have a documentation page about that, and it's pretty easy, I would say. You can also check out the rules that are in the repository; some of those are really easy to understand. And if you get stuck, just open an issue. I helped a lot of people in the past who came up with "Hey, I want to write a custom rule that checks if users are using Dagger in the wrong way; this doesn't belong in detekt, but I would love to have it, so how do I do it?", and we helped people over there. So with detekt you can do a lot. There is also material online on writing lint rules, another type of rules, and I'm pretty sure you can find it online.

Question from A: "How do you expose metrics for a code base for everyone to see, like readability, test scores, etc.?" On this I want to mention there is a tool from Spotify called Backstage; backstage.io I think is the website. It's a tool that we use for creating developer portals, and that's a great tool to show metrics to everyone in your company. Every developer can go there and see all the features in the app, all the backend services, and have statistics for them, like how many tests, the build speed, etc. So check out that tool. If you don't use that kind of tool, you can also create your own, but you need to have some sort of web UI to distribute metrics inside your company.

So the next question is CI/CD related: "Why should we not release a feature if it's done: implemented, merged, tested? Being agile means releasing every time there is something which brings value to the user." Why should we not release a feature if it's done? Yeah, you should release it if it's done. I'm not sure I fully get the question here, but I would say just always release. Even if the feature isn't finished, you can release it and put it behind a feature flag. The only reason not to release a feature, I would say, is if you want to make sure that the feature doesn't leak to the market. If you're developing something really confidential and it's halfway done, you don't want people to decompile your application and understand what your app is doing, with strings and so on. For that case you will need to use build flags: you will need to put everything behind, like, a debug build flag and make sure that R8 or ProGuard strips out the features that you really don't want to end up in your final APK.

So the next question is: "Any tools to measure APK size in the build pipeline for PR validations?" Nothing that comes to the top of my mind, honestly. But I remember... oh yeah, there is one for GitHub Actions, if I'm not mistaken. I will not bet on this, but I believe there is an action for GitHub Actions where you create the APK, you pass it to them, and they analyze it. But I mean, you can just create a script that has a golden APK, like the one from the previous PR build, and then your next one, and just compares the size. If the size is too big, you just report "Hey, this is not allowed, the delta is too high".

That being said, I think we are done with questions, people from droidcon. Because I see one, but...
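(Editor's sketch) The golden-APK comparison described in the answer above could look roughly like the following Python script. It is only an illustration under stated assumptions, not the speaker's tooling: the function names, the file paths in the usage note and the 100 KiB threshold are all hypothetical, and a real pipeline would decide for itself where the golden APK comes from.

```python
import os
from typing import Tuple

# Hypothetical threshold (an assumption, not from the talk):
# fail the check if the APK grows by more than 100 KiB.
MAX_DELTA_BYTES = 100 * 1024


def size_delta(golden_size: int, candidate_size: int) -> int:
    """Return how many bytes the candidate grew compared to the golden APK."""
    return candidate_size - golden_size


def check_apk_size(golden_size: int, candidate_size: int,
                   max_delta: int = MAX_DELTA_BYTES) -> Tuple[bool, str]:
    """Compare two APK sizes and return (passed, human-readable message)."""
    delta = size_delta(golden_size, candidate_size)
    if delta > max_delta:
        return False, f"APK grew by {delta} bytes, limit is {max_delta} bytes"
    return True, f"APK size delta is {delta} bytes, within the {max_delta} byte limit"


def check_apk_files(golden_path: str, candidate_path: str) -> int:
    """Read both file sizes from disk and return a CI-style exit code (0 = ok)."""
    passed, message = check_apk_size(os.path.getsize(golden_path),
                                     os.path.getsize(candidate_path))
    print(message)
    return 0 if passed else 1
```

In a PR validation job, something like `sys.exit(check_apk_files("golden.apk", "app-release.apk"))` would turn an oversized delta into a failed check; both file names here are placeholders.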
03:35
So, "How do teams do feature flags?", question from Tuna. Thanks for asking this. There is a great blog post from Jeroen Mols, who is another GDE for Android, and he talks specifically about this. He believes in not using feature branches: always deliver your feature to master or to main, and just use feature flags. So you create your own feature flag, you put your feature behind that flag, and then you use tools like Firebase Remote Config to control that feature. You don't really need to use a feature branch, because feature branches, especially if your feature is really big, will become a pain to merge: the branch will diverge so much that you will have to keep those branches in sync. It's not ideal. So use feature flags, remote config flags, and you will be fine in the vast majority of cases.

And with this we are done with questions. Again, it was a pleasure for me to be here. Feel free to reach out on Twitter as cortinico; I'm more than happy to answer all the remaining questions.
04:16
bye
Droidcon is a registered trademark of Mobile Seasons GmbH Copyright © 2020. All rights reserved.