A few useful tips
Before continuing with our story, I would like to give you a few tips that will help you investigate production issues:
- Determine the severity of the issue:
  - What is the scale of the issue? Does it happen to a lot of users or just a few?
  - Does it have the potential to escalate and become bigger?
  - Does it affect the performance of the app?
  - Does it affect battery consumption or available storage?
  - Does it affect the business?
  - Do the users see the issue, or does it happen in the background?
- You can implement Thread.UncaughtExceptionHandler to handle uncaught exceptions after they are thrown, right before the thread terminates. This must be done carefully, otherwise it can lead to memory leaks and other maladies: https://chaosinmotion.com/2011/05/15/on-memory-leaks-in-java-and-in-android/
- If you cannot reproduce the issue, try to look at the wider picture and think about how users actually use your app. Maybe they get a phone call or send a message while using your app.
- To track users more easily in your crash monitoring system, you can add a custom key to your crash logs that identifies each user uniquely. For example, we use Firebase, so we add a user ID to the logs that is then associated with the crash.
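To make the Thread.UncaughtExceptionHandler tip more concrete, here is a minimal sketch in plain Java. The names CrashReportingHandler and CrashSink are made up for illustration and are not part of any library. The sketch assumes you want to record the crash somewhere and then pass it on to whatever handler was installed before yours, so the platform's normal crash behavior still happens.

```java
// A minimal sketch, not a definitive implementation.
// CrashReportingHandler and CrashSink are hypothetical names used for illustration.
final class CrashReportingHandler implements Thread.UncaughtExceptionHandler {

    // Hypothetical destination for crash data; a real app would forward
    // this to its crash monitoring system.
    interface CrashSink {
        void record(String threadName, Throwable error);
    }

    // The handler that was installed before ours (may be null).
    private final Thread.UncaughtExceptionHandler previous;
    private final CrashSink sink;

    CrashReportingHandler(Thread.UncaughtExceptionHandler previous, CrashSink sink) {
        this.previous = previous;
        this.sink = sink;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        try {
            // Called after the exception was thrown and nobody caught it.
            sink.record(thread.getName(), error);
        } finally {
            // Always delegate, even if recording fails, so the previous
            // handler can do its job.
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        }
    }

    // Installs the handler process-wide while keeping the previous one in the chain.
    static void install(CrashSink sink) {
        Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler(
                new CrashReportingHandler(previous, sink));
    }
}
```

You would typically call CrashReportingHandler.install(...) once at startup, for example from Application.onCreate(). The default handler is held in a static field for the lifetime of the process, so anything it references lives just as long. That is why the sink should not hold on to an Activity, a View, or any other short-lived object.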
Story continues…
The first thing that you need to do is to determine the severity of the issue. You see in your analytics system that this issue happens to many users, and because it affects stock trading, it also affects the business.
Let’s say that you’ve sent the user’s id as a custom key to your crash monitoring system, and you found that one of these users’ IDs was also found in a crash related to the info screen.
But when you try to reproduce it with the same flow that the users followed, the crash doesn’t happen, and you wonder how that is possible.
You start to think: when users want to see info about the stocks, they probably want a wider view, because charts are easier to read horizontally. So they will probably rotate the screen.
You start checking the flow again, but this time you rotate the screen right after you go to the info screen, and… it crashes!
Suddenly you realize that you forgot to restore the state after screen rotation. When the users rotate the screen, there is no data to show and you get a NullPointerException (NPE).
Now you handle saving and restoring the state, fix the bug, run all the tests again, and… Congratulations! The crash doesn’t happen again.
Summary
Production issues, as hard as they might seem, usually have a lot in common, and following the tips I gave you can really help in investigating most of them.
I hope you enjoyed reading my article, and that the tips I gave and the mindset I introduced will really help you the next time a production issue comes your way! 😊