droidcon San Francisco 2019
We gave a Mouse an NDK: Non Android Developers' Experience with NDK
By
Armin Ronacher, Bruno Garcia
A story of a rabbit hole full of unexpected puzzles when tasked with debugging native code on Android in production builds, through the lens of developers who were previously not exposed to the world of Android.
Transcript
English
00:00
[Music]
00:11
so my name is Armin
00:13
I work for a company called Sentry and
00:16
we are a crash reporting company and
00:19
this is going to be our experience in
00:22
building something for NDK and
00:26
NDK was a bit of a rabbit hole
00:28
experience because we are not really
00:33
from an Android background and a lot of
00:35
what we built we thought would be a
00:37
little bit easier so this is really our
00:39
our experience in actually doing a crash
00:43
reporting framework for NDK so we have
00:48
handled a lot of platforms for crash
00:50
reporting already in particular we have
00:52
done C and C++ but we haven't done it on
00:56
Android so this is this is sort of new
00:59
that we do it on Android and we want to
01:02
go a little bit into why it's kind of
01:03
tricky and what some interesting
01:05
learnings from this are so who are we
01:10
it's a little bit of a joke but we're
01:12
kind of a stack trace company so a lot
01:13
of what we do is stack traces we can
01:16
tell you why your app is misbehaving and
01:18
primarily we do this because when your
01:20
app crashes we show you stack traces so
01:24
this is what it looks like
01:26
what's interesting here and it's a
01:29
little bit relevant to the talk is that
01:30
we want to show you a file name we want
01:33
to show you a line number we want to
01:34
show you the function name and if we can
01:37
accomplish it we want to show you the
01:38
source code that goes around it so this
01:41
is this is sort of what the end result
01:42
is supposed to be and as we mentioned we
01:45
already had that just not for NDK so my
01:48
name is Armin Ronacher and my background
01:50
is actually - more than it is anything
01:54
else
01:54
I love rust we do a lot of rust at
01:57
Sentry but not really
02:03
Android right so I work in Vienna for
02:07
the client infra team and there we are
02:09
responsible for writing SDKs for Sentry
02:11
as well as the events
02:13
so when the SDKs send some stuff
02:15
in there's a lot of stuff that works on
02:16
the server that's also under our
02:17
responsibility hi my name is Bruno
02:21
Garcia I'm a software engineer in
02:23
Sentry in the same team as Armin and I have
02:25
a background in .NET so we're part of
02:29
the client infra it's based in Vienna
02:30
and over there we have to touch
02:33
all sorts of programming languages right
02:35
so although my background is .NET I
02:36
was exposed to Android and Kotlin and
02:39
this time NDK so why are we here to
02:43
talk to you about NDK like
02:45
considering you probably know more about
02:46
Android than we do there's one thing
02:49
that we know pretty well it's crash
02:51
reporting right we're a crash reporting
02:52
company a lot of what we do touches all
02:56
different platforms so we wanted to do
02:58
something we're good at in C C++ Java in
02:60
Kotlin also for Android so Android
03:02
developers would be able to have good
03:04
quality stack traces also for NDK so
03:08
let's start to talk about why is
03:10
NDK special so on NDK you have basically
03:20
a whole bunch of stuff that looks a
03:21
little bit like Linux but it's not
03:23
necessarily so NDK gives a developer the
03:26
ability to run C C++ and a bunch of
03:28
other compiled languages on an Android
03:30
system even though like Android itself
03:33
runs on top of something that more or less
03:35
resembles a JVM and a lot of NDK code
03:39
actually interacts with the Java code
03:42
through a system called JNI the Java
03:44
native interface and so it allows you to
03:46
call from C or C++ or some other
03:49
language that compiles natively down
03:50
into basically the sort of the Java
03:54
ecosystem and the other way around
03:57
but as far as the the interactions
04:00
actually go it looks very very
04:05
much like its own operating system in
04:07
that sense because the paradigms that
04:09
you have are very different so on on the
04:11
base layer of what NDK actually does
04:13
there's a there's an implementation of
04:15
libc that's very Google specific
04:17
called Bionic and it's not entirely like
04:19
a normal libc implementation would be
04:21
so it misses a bunch of things
04:23
that we would really like to have it's
04:25
just not
04:26
there because Bionic covers a whole lot of what
04:29
a lot of people need from a JVM or from
04:31
a libc but not necessarily
04:33
everything that you need and NDK
04:36
it comes with a bunch of hand-picked
04:37
libraries like zlib but it doesn't
04:42
come necessarily with some of the other
04:43
libraries that you would expect so for
04:44
instance I would love to have
04:46
libuuid on there it doesn't exist there
04:48
is no libunwind which is for us very
04:50
disappointing so there's a lot of stuff
04:53
that's missing in NDK that since we
04:56
already had a lot of code that runs on
04:57
Linux we were expecting to be there but
04:59
really wasn't so as we mentioned we
05:03
already did Java right we also already
05:04
had an Android library and we already
05:07
did C++ but just NDK wasn't really
05:10
wasn't really working so our goal as
05:14
mentioned is production crash reporting
05:15
and production crash reporting means
05:18
that you have a stack trace in a release
05:20
build and not just in a debug build so
05:23
everybody knows that you can get a lot
05:25
of useful information out of a debug
05:26
build but the moment you go into the
05:28
release environment some other
05:30
constraints come in so in particular
05:32
production crash reporting or release
05:34
crash reporting is fighting a few
05:36
paradigms the idea is that there is a
05:39
debug build that gives you a lot of
05:40
debug information and there's a separate
05:42
release build it just doesn't have a lot
05:44
of that and that's not really true
05:48
especially not on mobile because when
05:50
you ship an application to a
05:54
customer it's very likely that their
05:56
problem will only occur on a customer's
05:58
device and not actually on your
05:60
development environment so you really
06:01
want to be able to capture debug
06:04
quality stack traces on a production
06:06
build so we need a whole bunch of stuff
06:08
to support that but there are some
06:12
constraints that make it hard to do this
06:14
in a release environment debuggability
06:17
and performance off methods so the lower
06:21
level the language the harder it is
06:23
often to get some of the performance
06:26
benefits the lower level language has
06:27
and that's why you kind of write a
06:29
lot of code in a lower level language
06:32
but you lose the debuggability so in
06:35
particular stack traces and and all that
06:38
sort of debug functionality
06:40
is often unavailable in C++ so if you for
06:42
instance throw a C++ exception the stack
06:44
trace is typically not even captured so
06:46
you would have to find the facility to
06:47
capture the stack trace when throwing
06:49
the exception and so this is this is one
06:53
of the core problems that code like
06:55
a crash reporting library is fighting is
06:57
that what you have in a release
06:59
environment is very different from what
07:01
you would otherwise be
07:03
having in a debug environment and to
07:06
understand a little bit the context of
07:08
this talk we don't really care about the
07:10
debug builds all that matters for us is
07:12
the release build so we want to get
07:15
stack traces only running in a release
07:16
build the debug build would also be nice
07:19
but it's not really something that we
07:20
care about so let's talk a little bit
07:23
about what production builds look like
07:24
on Android so you have two runtimes to
07:28
work with one of which is the Java
07:30
runtime or it's actually more
07:31
appropriately the Android runtime and
07:33
that's responsible for running Java code
07:36
Kotlin code a bunch of stuff like that
07:38
and then you have sort of the C runtime
07:39
which a lot of people say it's not even
07:41
a runtime it's just a bunch of low-level
07:43
support stuff like memory allocations
07:46
and the C runtime is sort of the minimum
07:51
set of stuff that you need to run any
07:55
code at all so it gives you stuff like
07:58
memory allocations but it also in normal
08:02
cases gives you the ability for instance
08:04
to do what's called stack unwinding and
08:06
that's the part that's actually
08:07
surprisingly a little bit absent on on
08:10
Android and we're going to go into this a little
08:11
bit so the Java Runtime more or less
08:16
gives you the ability to run Java code
08:18
on Android is a little bit more
08:20
complicated because it's actually not a
08:21
JVM that runs there it's on most modern
08:23
Android phones the Android runtime but
08:27
it sort of resembles what a JVM
08:29
does and in particular it gives you a
08:31
function that gives you a stack trace
08:32
which is pretty powerful if you have an
08:35
exception object there will already be
08:36
a stack trace on it you can do
08:39
something really useful with that on a C
08:41
runtime it's very different it's very
08:45
low-level and it doesn't have anything
08:49
there really to give you a stack trace
08:50
and even if you managed to get a stack
08:53
trace
08:53
the only thing that you get is just some
08:56
numbers the instruction addresses where
08:59
the function would return to and
09:01
unfortunately on Android in particular
09:04
there's actually no built-in support to
09:05
get the stack trace so you need to write
09:07
your own you bring your own stack
09:09
unwinder with the application to get this
09:10
facility there are some theoretical
09:13
ones which are available but they are
09:15
not really that good
09:17
so yeah we care about stack traces readable
09:23
stack traces on Java are not that hard
09:27
but if you use obfuscation tools like
09:30
ProGuard or R8 they will not be
09:33
readable they will not tell you where
09:35
the function actually was it will be
09:37
whatever was obfuscated so there's some
09:39
stuff that's necessary on the server
09:40
side
09:41
where the crash report is being sent to
09:42
to actually map it back to what you
09:44
expect with C it's a lot more
09:48
complicated because the stack trace even
09:49
if you get it it's just a bunch of
09:50
numbers and we're going to go a little
09:52
bit into why on NDK it's particularly hard to
09:54
get a stack trace to get the numbers on
09:57
the C side into function names line
09:59
numbers file names and so forth you need
10:01
what's called DWARF debug information
10:03
and DWARF debug information can get huge
10:06
and you definitely don't want to ship it
10:08
with your app it's way too big so we get
10:11
this debug info onto our server side
10:13
where we do the processing
10:15
but there will be a whole separate
10:19
section in this talk that actually goes
10:21
into why this debug information is so
10:23
hard to work with yeah so our goal is
10:26
turning the funny numbers or human
10:31
unreadable function names into something
10:33
that we can work with and that users can
10:35
comprehend so let's go with Java yes I
10:39
mean I already mentioned a little bit in
10:41
Java it's easy right so Throwable has a
10:43
method getStackTrace and you know you
10:45
have already an object you can query the
10:48
class name package name and line number
10:49
it's all good except all we care about is
10:53
production and in production you're not
10:55
gonna get the the real the real names
10:57
you will minify your application for
10:59
sure you're gonna use something like
10:60
ProGuard or R8 so your production
11:03
build is like faster and smaller but
11:05
also means that when you call getStackTrace
11:07
you're gonna get gibberish out of it
11:09
right so during your release build
11:12
ProGuard outputs something like this
11:15
ProGuard mapping file so on the right hand
11:17
side you have the code you wrote
11:19
in Java or in this case for example Kotlin
11:21
and on the left-hand side is what
11:23
ProGuard would rewrite your code to look
11:25
like so here's a simple example is just
11:28
mapping from one thing to the other you
11:31
can get a lot more complicated with line
11:33
number optimization and function
11:35
inlining and this can cause some problems
11:39
right like if you use Java with
11:42
reflection and use ProGuard you already
11:44
know that you have to create some rules
11:45
and that means that you have to
11:47
prevent minification from happening
11:49
altogether so that would break in
11:51
runtime even worse so here we have an
11:55
example of a ProGuard rule it just says
11:57
hey don't touch my exception types maybe
11:59
you access them via reflection but more
12:02
importantly and related to this talk is
12:04
if you do have a JNI bridge if
12:07
you're calling from C++ code in
12:10
Java you definitely don't want ProGuard
12:12
to touch that because again it would
12:13
fail at runtime just like if it was in
12:15
reflection so how do we get readable
12:21
stack traces in C it's much trickier and
12:26
the reason for that is that we need to
12:28
do what's called stack unwinding so this
12:31
is to explain a little bit sort of the
12:33
workflow that you would sort of expect
12:37
the application crashes right and so the
12:39
first thing you want to do is get a
12:40
stack trace and we're going to go into
12:42
like different ways in which you can do
12:44
this on NDK but what's most important in
12:47
order to get a stack trace you need
12:49
what's called unwind information and
12:50
unwind information is necessary because
12:53
the way the compiler optimizes your code
12:56
means that depending on how the compiler
12:58
wrote the code what's actually necessary to
13:01
go from the current instruction to the
13:02
previous one or more accurately to the
13:04
one that it would actually return to can
13:07
be very complex so we need this unwind
13:09
info to unwind the next step after you
13:13
already have a stack trace is the
13:14
stack trace at this point is just a bunch
13:16
of numbers and we need to symbolicate it
13:18
and symbolicate means we
13:20
take a number and a bunch of other
13:21
information that's necessary to turn it
13:23
into a human readable string and we're
13:25
going to use that debug information to
13:27
turn it into function names and
13:29
locations in files and at the end you
13:33
can just render it store it whatever you
13:34
want to do with it and this is something
13:36
that doesn't just apply for for debug
13:39
tools that make stack traces and crash
13:41
reports it also applies if
13:43
you do performance profiling because a
13:45
very common thing that you do when you
13:46
performance profile something is you
13:48
want to sample how many times you end up
13:51
somewhere in a stack trace so I don't know
13:52
if you're familiar with DTrace but and
13:54
the way these tools work is that they
13:56
every n milliseconds take a
13:58
snapshot of your stack and they will say
14:00
like oh we have been in this function
14:02
this function we have been in this
14:03
function and so based on which branches
14:06
you take you can see like I'm always
14:07
spending time in like this very slow
14:09
function so this is really important
14:11
information and depending on where you
14:16
want to draw this line you can run some
14:18
of it more on the server or some of it
14:19
more on the device so we built a tool
14:22
which is open source which is called
14:23
Symbolicator and it can make a stack
14:26
trace out of what we call memory dumps
14:27
on the server side or it can also just
14:31
symbolicate individual memory
14:33
addresses so if you know if I'm in
14:35
function X and you want to know like
14:36
what is that you can just throw it into
14:39
this tool it's going to symbolicate it
14:40
for you this tool is also used in the
14:42
back end behind the scenes at Sentry
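Symbolicating a single address, as described here, comes down to two steps: subtract the address at which the module was loaded, then find the function whose start offset is the closest one at or below the result. The following is a minimal sketch of that lookup only. The `symbol_t` table, `symbolicate` and the sample names are hypothetical and are not how Symbolicator is implemented; a real tool reads the table from DWARF debug information and also resolves file names, line numbers and inlined calls.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical symbol table, sorted by ascending start offset. */
typedef struct {
    uint64_t start;    /* offset of the function from the module's load address */
    const char *name;  /* function name, as recovered from debug information */
} symbol_t;

/*
 * Returns the name of the function that contains `addr`, or NULL if the
 * address lies below the module or below its first symbol.
 * `load_base` is where the module was mapped in the crashed process.
 * Simplification: entries carry no size, so any address past the start of
 * the last symbol is attributed to that symbol.
 */
const char *symbolicate(const symbol_t *syms, size_t count,
                        uint64_t load_base, uint64_t addr)
{
    if (count == 0 || addr < load_base)
        return NULL;

    uint64_t rel = addr - load_base;
    if (rel < syms[0].start)
        return NULL;

    /* Binary search; invariant: syms[lo].start <= rel and the answer is in [lo, hi). */
    size_t lo = 0, hi = count;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= rel)
            lo = mid;
        else
            hi = mid;
    }
    return syms[lo].name;
}
```

With a table containing `main` at 0x1000 and `parse_input` at 0x1040, and a module loaded at 0x70000000, the address 0x70001050 resolves to `parse_input`.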
14:45
we mentioned there are different ways
14:48
you can decide to do it more on the
14:49
server or do it more on the client if
14:52
you want to do it on the server what you
14:53
would do is you would dump out the stack
14:55
memory and so we say like this thread has
14:57
this much memory for a stack you take this
14:59
throw it into what's called a mini
15:01
dump and you can do the processing on
15:03
the server so there will be a little bit
15:05
of stack data in it which is maybe
15:06
irrelevant but dumping out stack memory
15:10
is a comparatively easy thing all you
15:12
need to know is where is my stack how
15:14
large is it and dump it into a file if I
15:18
don't want to dump the stack memory I
15:20
could also do what's called the stack
15:22
walking on a device and then I get out
15:24
individual addresses and I can throw them
15:26
into Symbolicator so how do I decide
15:28
which one I want do I want to stack
15:30
walk on the device or do I want to do the
15:32
memory dump
15:33
and as we mentioned earlier we need the
15:36
unwind info to do the actual unwinding
15:38
and so where's the unwind info so the
15:41
unwind info is typically stored in the
15:43
executables so it can be in a shared
15:46
library that you load or it can be in
15:47
the code that you compiled and this unwind
15:50
info fulfills the purpose primarily for
15:53
C++ to be able to invoke the right
15:56
destructors for exceptions so if
15:58
you throw an exception and it bubbles up a
15:60
bunch of stack frames it needs to
16:02
deallocate the memory or it needs to
16:04
clean up some resources on the way out
16:05
the stack and so this information needs
16:08
to be compiled into the binary most of
16:09
the time this is what a stack looks like
16:13
so you start at the top somewhere
16:16
there is your data some local variables
16:19
and then underneath that so growing down
16:23
to a lower memory address you have more
16:25
and more functions that are being called
16:26
the calling on top and in the stack
16:29
memory somewhere there's an interesting
16:31
part in it called the return address
16:32
and this is what it wants to return to
16:33
so if the function returns it wants to
16:36
jump there but how do you get to that
16:39
data right so the unwind information is
16:41
what ultimately tells the stack unwinder
16:44
where to return to so you need to know
16:46
what's the current address you need to
16:47
know the register state of everything
16:49
else and then we can take that
16:50
information to pop one frame off to what
16:52
was the previous one and since as we
16:56
mentioned earlier on production builds
16:58
you care about performance more than
16:59
anything else there can be a lot of
17:01
optimizations applied so let's
17:06
say like long time ago there was always
17:09
a register which would point back to the
17:11
previous frame it was the frame register
17:13
but since we're short on registers a
17:16
very common optimization the compiler
17:18
would do is it would just omit this
17:19
frame pointer register and so once this
17:21
register is omitted you need this unwind
17:24
info because there is no heuristic you
17:25
could fall back to which is reliable
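To make the frame pointer idea concrete: when the compiler keeps a frame pointer, every frame stores a link to its caller's frame next to the return address, and unwinding is just following a linked list. The sketch below models that chain with an ordinary struct instead of reading real stack memory; `frame_record` and `walk_frames` are hypothetical names chosen for illustration, and the exact record layout differs per architecture and ABI. Once the frame pointer is omitted this chain does not exist, which is why the unwind info is needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the record a function leaves on the stack when the
 * compiler keeps a frame pointer: a link to the caller's record plus the
 * address execution returns to.
 */
typedef struct frame_record {
    const struct frame_record *caller;  /* NULL at the outermost frame */
    uintptr_t return_address;
} frame_record;

/*
 * Follows the chain starting at `fp`, stores up to `max_frames` return
 * addresses in `out` (innermost first) and returns how many were stored.
 */
size_t walk_frames(const frame_record *fp, uintptr_t *out, size_t max_frames)
{
    size_t n = 0;
    while (fp != NULL && n < max_frames) {
        out[n++] = fp->return_address;
        fp = fp->caller;
    }
    return n;
}
```

The addresses collected this way are the "bunch of numbers" that still have to be symbolicated afterwards.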
17:30
as we mentioned we need this unwind info
17:31
and this is sort of what the
17:34
workflow typically looks like for people
17:35
that use NDK or any other C++
17:39
workflow so you start with the source
17:42
code you compile it and from there it
17:44
sort of takes a branch the executable
17:46
makes its way
17:47
into the application onto your
17:49
device so the executable gets
17:50
distributed and it crashes on a customer
17:52
site but when compiling that executable
17:56
you typically also get debug info so in
17:58
Windows that would be in a PDB file on
17:59
Mac that would be a DWARF file same
18:02
really on Android because it's a Linux
18:05
system you also get DWARF data out and
18:07
this debug file typically gets split off
18:09
and you can retain it then you can do
18:11
some stuff with it
18:12
and in this case you can upload it to
18:14
Sentry and we could do some stuff with
18:15
it
18:16
the problem with that is if you only
18:18
give us the the debug data there's some
18:22
crucial information missing which is the
18:23
unwind information because the unwind
18:25
information is only contained in
18:26
the executable so if we were to stack
18:29
unwind out of memory dumps on the
18:31
Sentry server side you would also have
18:33
to give us the executable because
18:35
otherwise we can't do it so this is sort
18:38
of the workflow that you use with a system
18:40
like Sentry
18:41
but even if you if you want to debug it
18:44
yourself so for instance if you if you
18:46
have a crash on an Android device you
18:48
get actually a stack trace dumped to a log
18:50
and you can throw it into a tool like
18:54
NDK crash or ndk-stack which
18:57
would parse this and it also needs both
18:59
of these pieces of information to do anything
18:60
useful and if you want to throw it
19:03
into a local debugger like gdb you also
19:06
need both things so let's just say we
19:09
would like to unwind out of memory dumps
19:11
for a lot of people that's sort of the
19:13
ideal case and we'll go into detail of
19:16
why you want that but that would be
19:19
great
19:19
the problem with that is that really
19:23
only works on iOS and this is sort of
19:25
the mode in which we went into this
19:27
as an expectation because on iOS the
19:29
total amount of devices you will ever
19:31
observe is very limited there's a fixed
19:34
number of iPhones a fixed number of iOS
19:35
releases you can or we can just get all
19:39
the data store it on our site and then
19:41
identify it so the total amount of
19:43
observable things is very limited on
19:46
Android that's a little bit different
19:48
on Android it is much harder and it's
19:50
harder because the ecosystem is
19:52
fragmented with Android being open source
19:54
you can pull the code create your custom
19:56
build all these different vendors create
19:59
their own like Samsung will have their
20:00
own
20:00
builds and their own phones this
20:03
makes it much harder for us to collect
20:04
symbols right so this is not a problem
20:09
that affects only a company like us
20:10
doing crash reporting but anyone who has
20:12
who wants to debug crashes in someone
20:15
else's phone like in the clients so what
20:18
Facebook decided to do is they decided
20:20
to upload system images from their apps
20:23
like so if you have a an Android device
20:25
in your pocket with a Samsung
20:26
sorry with a Facebook app this is
20:28
happening in the background for you so
20:31
we could do something like that
20:32
although we don't have an app as popular
20:35
as Facebook to do that without your consent
20:38
so we have to also consider a different
20:41
option of what else can we do if we
20:42
can't do that so one thing we can do is
20:46
we can stack walk on the device that
20:48
comes with a few constraints Armin was
20:49
already mentioning so how do we stack
20:53
walk on a device we need a stack walker so
20:57
we can stack walk on device
20:58
conceptually because the unwind
20:60
information is on there this is the part
21:03
where NDK gets really confusing because
21:05
we all know that Android can stack walk
21:07
it has to and if it crashes on your
21:11
device you can actually get a stack out
21:12
of it so it's technically able to do
21:14
that in the Android source tree
21:19
there's a whole bunch of stuff in there
21:20
that can stack walk there is libcorkscrew
21:22
which stopped being maintained many
21:25
years ago and only supports 32-bit
21:26
ARM CPUs there is libunwind which is an
21:31
open source stack unwinder which is
21:33
very popular there are some patches on
21:35
it that Google wrote Google itself
21:38
actually stopped using libunwind
21:39
a while back and instead uses libunwindstack
21:41
which is the Google custom also
21:43
somewhat open source
21:45
version of a stack unwinder it's written
21:47
in C++ it's huge in comparison to
21:49
everything else and this is sort of the
21:51
only one that's actively being
21:52
maintained for Android and it's the one
21:55
that Google itself uses the downside of
21:58
this is that it's actually not being
21:60
exposed to you as a customer and so to
22:02
understand how to even get that going
22:04
and a bunch of people have been doing
22:05
that there are open source forks of that
22:07
where someone took the stack unwinder
22:10
from the Android source tree copied it
22:12
into a GitHub repository made it
22:14
work to compile against the actual
22:16
current version of NDK took out a bunch
22:18
of stuff that's unrelated but these
22:22
open-source sort of versions of that
22:25
which were taken out of the source
22:26
tree are kind of unmaintained so they
22:28
are much older than the most
22:30
recent commits are that google has on
22:32
their side since it's written in c++ and
22:35
since there's a lot of complexity going
22:37
on it requires a very large what's
22:39
called the sigaltstack to actually do
22:40
unwinding so if you crash the operating
22:44
system gives you a separate stack to to
22:48
do some stuff in your signal handler and
22:50
this comes out of the box is a very
22:52
small stack and if you want to do a
22:54
stack unwinding in the signal handler
22:55
with libunwind or libunwindstack
22:57
you need to enlarge this
22:60
otherwise it's not going to work at all
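Enlarging the signal stack is done through the POSIX `sigaltstack` call, combined with the `SA_ONSTACK` flag when the handler is registered. The following is a rough sketch under a few assumptions: the 64 KiB size is only an illustrative value, `install_crash_handler` and `on_crash` are hypothetical names, and the handler body is a placeholder for wherever unwinding or dumping would happen. Note that on Linux the alternate stack is a per-thread setting, so this covers the calling thread only.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Illustrative size (an assumption); the default signal stack is far smaller. */
#define ALT_STACK_SIZE (64 * 1024)

/* Static storage so the stack exists for the lifetime of the process. */
static unsigned char alt_stack_memory[ALT_STACK_SIZE];

/* Placeholder handler: a real one would unwind or dump the stack here. */
static void on_crash(int signo)
{
    /* Restore the default action and re-raise so the process still dies. */
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Registers the larger alternate stack and a SIGSEGV handler that runs on it.
 * Returns 0 on success, -1 on failure. */
int install_crash_handler(void)
{
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack_memory;
    ss.ss_size = sizeof alt_stack_memory;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_crash;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;  /* run the handler on the alternate stack */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `install_crash_handler()` once at startup is enough for the thread that calls it; other threads would need their own alternate stack.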
23:01
and since the the general version you
23:05
can get on the internet for open source
23:07
use compared to the one that Google has
23:08
have deviated so much it's in a pretty
23:11
disappointing state at the moment so we
23:13
would really like to have a stack walker
23:15
that's exposed from Google it's
23:17
available on all the other platforms but
23:19
it's not there so the second thing we
23:22
need is once we have a stack trace we
23:25
need to figure out how do we get the
23:26
function names from it right how do we
23:28
get the addresses how do we get the line
23:29
numbers and this information is in the
23:33
debug information file so we need to
23:34
match them up with the executables on
23:36
your device and the way you do this or
23:39
the way ideally you do this is you get
23:40
what's called the build IDs from that
23:42
so the build IDs are unique
23:44
identifiers that match exactly the
23:46
binary with the debug info this
23:50
information is thankfully contained in
23:52
what's called the ELF header you can use
23:55
the dl_iterate_phdr function to
23:57
iterate over all the loaded ELF files
23:59
and if you know where in the memory to
24:01
poke around you can read it from there
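"Knowing where in the memory to poke around" in practice means finding the GNU build ID note. `dl_iterate_phdr` reports the program headers of every loaded module; the build ID sits in a segment of type `PT_NOTE`, as a note named "GNU" with type 3 (`NT_GNU_BUILD_ID` in `<elf.h>`). The sketch below covers only the last step, scanning the bytes of one note segment. `find_build_id` is a hypothetical helper, and locating the segment itself (via `dl_iterate_phdr` or another route) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_TYPE_GNU_BUILD_ID 3u  /* value of NT_GNU_BUILD_ID in <elf.h> */

/* Note names and descriptors are padded to 4-byte boundaries. */
static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Scans the raw bytes of one PT_NOTE segment for the GNU build ID.
 * On success stores a pointer to the ID and its length and returns 1;
 * returns 0 if no build ID is found or the data is truncated.
 * The three header fields are read in host byte order, which matches
 * modules loaded into the current process.
 */
int find_build_id(const uint8_t *notes, size_t size,
                  const uint8_t **id, size_t *id_len)
{
    size_t off = 0;
    while (off + 12 <= size) {
        uint32_t namesz, descsz, type;
        memcpy(&namesz, notes + off, 4);
        memcpy(&descsz, notes + off + 4, 4);
        memcpy(&type, notes + off + 8, 4);

        size_t name_off = off + 12;
        size_t desc_off = name_off + align4(namesz);
        if (desc_off > size || descsz > size - desc_off)
            return 0;  /* note runs past the end of the segment */

        if (type == NOTE_TYPE_GNU_BUILD_ID && namesz == 4 &&
            memcmp(notes + name_off, "GNU", 4) == 0) {
            *id = notes + desc_off;
            *id_len = descsz;
            return 1;
        }
        off = desc_off + align4(descsz);
    }
    return 0;
}
```

The returned bytes, printed as hex, are the identifier that has to match between the binary on the device and the debug file on the server.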
24:03
one of the problems with that is that
24:05
older NDK versions are actually not having
24:07
that function reason being that google
24:09
has their own C library the Bionic one
24:14
which just didn't implement this this
24:16
part so an alternative and there's also
24:19
what Google does internally they are
24:21
parsing the proc maps so when the kernel
24:25
gives a process memory
24:27
it has it has to establish a memory
24:30
mapping somewhere so each process has
24:32
its own memory mapping and the kernel
24:34
provides a special file in /proc/<pid>/maps
24:36
where you can see that so this is an
24:38
example from what this memory mapping
24:40
looks like so it gives you a memory
24:42
range from where the mapping starts
24:44
where it ends what the executable bits
24:46
are so it can be executable can be read
24:48
it can be write there's a bunch of other flags
24:51
that can be set on it and it can map
24:54
to a file so in this case you can see
24:55
that this is the proc map for the cat
24:57
command that's what I used as an example
24:59
and you can see like from memory address
25:01
0x4 whatever up to some relatively
25:06
large number this is where the
25:08
executable is and then there are bunch
25:10
of libraries linked into it like for
25:11
instance you can see that libc is
25:12
linked in and it appears somewhere in
25:14
memory the reason there are so many mappings for
25:17
each one of those files is because some
25:19
of these are sections that are read-only
25:21
so for instance global variables are typically
25:23
loaded into space somewhere they might
25:27
actually be writable in this case the executable
25:29
itself is the only one that has the executable
25:31
bits because you don't want accidentally
25:33
to be able to execute stack memory so
25:36
you can see that a whole bunch of these
25:37
things are mapped in different areas and
25:39
so if you have ever used Google's
25:42
Breakpad Google's Breakpad will actually on
25:43
Android read this file because they
25:45
already knew that they didn't have a
25:46
different API to work with so that's one
25:49
workaround that you can use on NDK to
25:51
actually get to these mappings if you
25:53
can't use dl_iterate_phdr
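Each line of the maps file follows the same layout: address range, permission bits, file offset, device, inode and an optional path. A small parser for one such line could look like the sketch below. `map_entry` and `parse_maps_line` are hypothetical names, the sample line in the comment is made up for illustration, and paths containing spaces are cut at the first space.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of one /proc/<pid>/maps line that matter for crash reporting. */
typedef struct {
    uint64_t start;   /* first address of the mapping */
    uint64_t end;     /* one past the last address */
    uint64_t offset;  /* offset into the mapped file */
    char perms[5];    /* e.g. "r-xp": read, write, execute, private/shared */
    char path[256];   /* empty for anonymous mappings */
} map_entry;

/*
 * Parses a line such as
 *   00400000-0040c000 r-xp 00000000 08:02 173521 /bin/cat
 * Returns 0 on success and -1 if the line does not match the layout.
 * The device and inode columns are skipped.
 */
int parse_maps_line(const char *line, map_entry *out)
{
    out->path[0] = '\0';
    int fields = sscanf(line,
                        "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %255s",
                        &out->start, &out->end, out->perms, &out->offset,
                        out->path);
    /* The path is optional, so four converted fields are enough. */
    return fields >= 4 ? 0 : -1;
}
```

Reading the file line by line and keeping the entries whose `perms` contain `x` gives the list of executable mappings, which is what is needed to attribute an instruction address to a module.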
25:56
here's the problem with stack walking on
25:58
device and that's basically that the way
26:00
this would work is that you're in a
26:02
segfault right so basically there are two
26:04
ways in which it can go one of them is
26:06
typically a segfault
26:07
so you touch some memory you're really
26:09
not supposed to touch and the operating
26:11
system punishes you for it and it
26:13
wants to tear you down
26:14
shortly before it does that it emits a
26:16
signal called SIGSEGV and you can
26:18
intercept that for that it will invoke a
26:22
signal handler and signal handlers are
26:24
very painful in POSIX and have all
26:26
kinds of restrictions the second option
26:29
that you will typically find is that you have
26:32
a recursive function that calls itself
26:33
and eventually runs out of stack memory
26:35
so it runs through the top of
26:38
the stack and it touches some memory it's
26:40
not supposed to touch
26:41
now the problem with that is if you run
26:43
out of stack memory you basically the
26:46
signal handler would run somewhere where
26:49
there's no stack space anymore
26:50
so it can't run realistically so the way
26:54
Linux and particularly Android solve that
26:56
is what's called the sigaltstack so
26:58
sigaltstack is separately prepared
27:00
memory somewhere where your
27:02
signal handler will run so you can
27:04
imagine this like you run out of stack
27:07
space the the operating system says like
27:10
you can't do that but let me before I
27:12
abort you call your signal handler so that
27:14
you can do something and that signal handler
27:16
is running on the sigaltstack that
27:20
is very small it's so small that they
27:22
typically can't do stack unwinding the
27:24
second problem with that is that even if
27:26
you can do stack unwinding because you
27:27
made it larger there is a concept called
27:30
async safety so the signal handler gets
27:32
invoked at the point wherever that thread
27:36
was at the time so you could imagine
27:38
maybe it tried to allocate some memory
27:40
and that failed and for whatever reason
27:44
while you're in this memory allocation
27:45
routine which typically has to be thread
27:48
safe there will be a lock now in a
27:50
signal handler if you try to allocate
27:51
more memory you are going to deadlock
27:53
because you're trying to fetch the lock
27:54
that you were just failing on so this
27:57
concept is called async safety
27:59
and if you look at all the functions
28:01
you're allowed to call in a signal
28:02
handler
28:02
it's a tiny set of allowable functions
28:06
in particular most stack unwinders are
28:09
not supposed to be called in that so
28:12
you're about to crash you're about to do a
28:14
whole bunch of stuff which you're really
28:16
not supposed to do so in the ideal case
28:17
we wouldn't do that the first part we
28:20
can solve because we can just allocate
28:23
more stack space for the sigaltstack and
28:26
tell it now please use that so we would
28:29
do that on startup so we can give it 65
28:31
K instead of 8 K which is more room for stack
28:33
walking and then every future signal
28:36
invoked from this point will have a
28:37
larger stack space to work with
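A minimal sketch of the startup step described here, assuming a POSIX system. It allocates a larger alternate signal stack, registers a SIGSEGV handler that runs on it, and keeps the handler to async-signal-safe calls only (write and _exit). The names install_crash_handler, crash_handler, format_hex, crash_fd and CRASH_STACK_SIZE are hypothetical and not taken from the talk, and the 64 KiB size only approximates the figure the speaker mentions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical size, roughly the figure mentioned in the talk. */
#define CRASH_STACK_SIZE (64 * 1024)

/* Assumption: a real SDK would open its own file at startup; stderr stands in here. */
static int crash_fd = STDERR_FILENO;

/* Write v as 16 hex digits into out, without malloc or printf. */
void format_hex(uint64_t v, char out[16]) {
    static const char digits[] = "0123456789abcdef";
    for (int i = 15; i >= 0; i--) {
        out[i] = digits[v & 0xf];
        v >>= 4;
    }
}

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void crash_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    char line[17];
    format_hex((uint64_t)(uintptr_t)info->si_addr, line);
    line[16] = '\n';
    if (write(crash_fd, line, sizeof line) < 0) {
        /* nothing safe to do about a failed write here */
    }
    _exit(1);
}

/* Call once at startup. Returns 0 on success, -1 on failure. */
int install_crash_handler(size_t stack_size) {
    assert(stack_size > 0);

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = malloc(stack_size); /* allocated up front, never inside the handler */
    if (ss.ss_sp == NULL) {
        return -1;
    }
    ss.ss_size = stack_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* SA_ONSTACK selects the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The alternate stack set by sigaltstack applies to the calling thread only, so a real implementation would have to repeat the sigaltstack call on every thread it wants covered.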
28:39
second problem however is a bigger one
28:40
how do we prevent
28:43
stack walking on a device at all because
28:45
if we have async safety as a
28:47
requirement
28:48
we're really very limited in what we can
28:50
do so if we don't have the ability to
28:53
effectively
28:54
allocate memory we can write our own
28:56
memory allocator right but there's a
28:58
whole bunch of other stuff that we would
28:59
have to do maybe you can work around
29:01
some of them but at one point you're out
29:04
of options and it would be easier for us
29:06
to just dump the stack memory and then
29:08
do everything on the server for that we
29:13
need a symbol server so the thing that
29:15
Facebook was doing under the hood
29:17
they're building a symbol server right
29:18
something that their symbolication
29:20
service can query to get the symbols
29:22
that they need to symbolicate ideally
29:26
independent hardware providers would do
29:27
that so Samsung would provide their
29:29
symbol server and so would Google
29:32
for the Android bits and so forth
29:35
something that we plan to do is to
29:37
actually collect symbols ourselves like
29:40
maybe you can go on device farms in the
29:42
cloud to run tests in the app and it's
29:44
an app that only collects these symbols
29:45
and that should be good enough to allow
29:48
us to build our own symbol
29:52
server so how do we put all this SDK
29:56
together right with the stack walking on
29:58
the client first on the NDK side what we
30:01
what we do is we hook to the signal
30:03
handlers
30:06
we also have to collect the loaded images
30:07
so we know what was loaded in the
30:09
process and lastly when the
30:12
signal handlers trigger we do the stack
30:14
unwinding and we dump to disk so with the
30:19
event on disk
30:20
either if the app restarts or
30:22
if the app is still running
30:24
before it crashes the Java layer can
30:26
detect that a file was written it loads
30:28
this file up you can add more context
30:30
for example things the user was doing
30:31
right before the application crashed and
30:35
then you can open a connection to
30:37
Sentry in the back end and send this
30:38
event to Sentry it's on the server side
30:41
then that having the
30:43
stack trace which will be just numbers at
30:44
that point we can look at symbols debug
30:47
symbols and convert them into something
30:50
you can read and understand so your
30:51
function names and line numbers etc you
30:54
can in the case of Java also do the
30:56
ProGuard mapping so it's all done on the
30:58
back end finally we just store it so
31:01
that you know the UI can render and
31:03
show you a nice view of what happened
31:04
when the application crashed
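A minimal sketch of the server-side lookup described here, under stated assumptions: the event carries raw addresses plus the list of loaded images, and the debug symbols for each image are available as a table sorted by offset. The types image_t and symbol_t and the functions find_image and find_symbol are hypothetical names for illustration and not Sentry's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One loaded image as recorded on the device (hypothetical layout). */
typedef struct {
    const char *name; /* library file name */
    uint64_t base;    /* load address in the crashed process */
    uint64_t size;    /* mapped size in bytes */
} image_t;

/* One function symbol taken from the debug information. */
typedef struct {
    uint64_t offset;  /* start of the function, relative to the image base */
    const char *name;
} symbol_t;

/* Return the image whose address range contains addr, or NULL. */
const image_t *find_image(const image_t *images, size_t n, uint64_t addr) {
    assert(images != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= images[i].base && addr - images[i].base < images[i].size) {
            return &images[i];
        }
    }
    return NULL;
}

/* Binary search a table sorted by offset for the last symbol that
   starts at or before rel. Returns NULL if rel precedes every symbol. */
const symbol_t *find_symbol(const symbol_t *syms, size_t n, uint64_t rel) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].offset <= rel) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo == 0 ? NULL : &syms[lo - 1];
}
```

An absolute address is first mapped to its image, then the image base is subtracted so the same symbol table works no matter where the library was loaded. Resolving line numbers would need the full debug information and is left out of this sketch.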
31:08
it's bringing us to the last part so
31:10
when we had to put all this together we
31:12
want to allow you as a developer to use
31:14
it you add our SDK we ship it to you
31:18
and it's easy enough for you to
31:19
call all the Java code it's immediately
31:21
available for you to call but to call
31:23
the C and C++ code from your C code it
31:27
just doesn't work out of the box and
31:28
this is just not something that you know
31:32
it's a lack of support from the
31:35
tooling the Android Gradle plugin
31:37
and all this stuff so when we built
31:39
our NDK support this is what comes out
31:40
like all the ABIs we support we
31:42
have the different native libraries
31:44
there and when you add this to your app
31:48
Android Studio can't see our libraries
31:51
automatically there are no header files
31:52
there for you to refer to and it's
31:55
not possible to link against
31:57
these things so apparently the suggested
31:59
solution so far is for us to build a
32:01
Gradle plugin that will copy the native
32:03
libraries that we bundled in our AAR into
32:06
your app and only then you can change
32:08
your CMakeLists.txt and link against it and
32:11
then call from your C++ code into our
32:14
C++ code you don't need any
32:17
of this to capture the stack to
32:19
capture crashes from C++ but if you want
32:22
to call our code to do different things
32:23
then you will have to do this ugly dance
32:26
here that we have how could this be
32:30
better right so as we mentioned
32:32
earlier we have some experience in other
32:34
platforms and we went into this project
32:37
with some sort of expectations about how
32:39
it would be but that turned out to
32:42
be just very different the amount of
32:45
extra code that Android actually
32:46
required
32:47
it was unexpectedly large so if we could
32:51
have some wish list for Santa to bring
32:54
us better things it would be awesome if
32:57
Android could include a stack walker
32:59
every other C and C++ platform we
33:01
have worked with so far has one Android
33:05
really doesn't because I guess Google
33:07
doesn't consider that to be something
33:08
that you want in production on a device
33:11
the second thing is that the bionic libc
33:13
it just lacks a whole bunch of stuff one
33:16
of the things most C libraries have and
33:18
it's actually required by POSIX to
33:20
some degree is
33:21
called the getcontext function and the
33:23
ucontext_t struct which lets you if you
33:26
call it it grabs all the register
33:27
contents and you can see what was there
33:29
at that point in time you can't write
33:31
that function in C you have to write it
33:33
in assembly and it's just not there on
33:36
Android so you would have to write some
33:37
custom assembly code to get that the
33:40
libunwindstack unwinder that Google
33:42
has actually runs into this issue and so
33:44
they have some custom assembly code to
33:46
read just the most necessary registers
33:48
like the instruction pointer register
33:49
and the frame base pointer but you
33:52
can't get any of the other ones with
33:54
the built-in stuff that's sort of available
33:59
if you actually do manage to compile all
34:01
of this then there is no way to distribute
34:03
it right you need to do some custom
34:06
code to get your native libraries out of
34:08
an AAR there's no support for headers so
34:11
that would be really nice to have it
34:15
doesn't really seem like C and C++ is
34:17
sort of the primary focus on Android and
34:21
it would just be nice if the tooling
34:22
supports that out of the box it's not
34:24
too uncommon that you want a C companion
34:26
library that's also available to other
34:28
things it would be very nice if you
34:30
could just download stuff from a
34:33
central place as binary dependencies we
34:35
can compile against it's not a
34:37
lot of fun to custom compile C code all
34:39
the time if I could just get this
34:41
ready-made from the internet it would be
34:42
nice right so I could just have a Gradle
34:44
dependency and it will just give me this
34:46
and I could link my C++ code against it
34:48
and lastly it would be awesome if OEMs
34:51
and Google could actually provide symbol
34:53
servers the Microsoft ecosystem which is
34:56
very similar to the Android ecosystem
34:58
where you have maybe you only have some
35:01
Windows versions right but you have a
35:03
ton of OEMs that write custom driver code
35:06
that adds all kinds of stuff that
35:07
appears in your stack traces on
35:09
Windows for probably 20 years there's
35:12
the concept of a symbol server for any
35:15
stack trace that you get out of a
35:16
Windows system you can go to the
35:17
Microsoft symbol server and you get all
35:19
of this stuff even for your graphics
35:21
card if you have an Nvidia driver that
35:23
crashed somewhere that's there Unity has
35:26
that so there's a concept and
35:28
a history of making this data available
35:30
so it would be great if you could get
35:33
this it would make everybody's life easier
35:35
it looks a little bit like okay this is
35:37
a crash reporting company of course
35:39
they're going to have this problem but
35:41
you can see on the internet there are
35:42
lots of people in the open-source community
35:44
who would love to have this data available
35:45
it would unlock a whole bunch of
35:47
possibilities that are currently just
35:49
not there and it would make the life of
35:51
C and C++ developers much greater and
35:54
with that if you have any questions
35:55
we're happy to answer
Droidcon is a registered trademark of Mobile Seasons GmbH Copyright © 2020. All rights reserved.