February 24, 2021

Using Frida to find hooks in Android applications

One technique that Android applications sometimes use to obfuscate how they work is self-hooking. This, specifically, is utilized by Android packers as a way to protect the contents of the underlying application. Other examples can be seen in the wild in security products, malware, or even games deploying anti-cheat software. As a simplistic example, we will walk through how a security product can provide an "encryption" layer at rest for Android applications, without needing to be invoked directly by the application it's protecting.

The general idea is that, before the application makes any calls into Bionic (Android's libc), the security instrumentation places a hook on the functions it wishes to intercept. For this example, we will study a write-to-disk encryptor. At the system level, any Linux-based application follows the same general flow when writing to disk:

  • Open a file descriptor
  • Write a buffer to that file descriptor
  • Close that file descriptor
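
As a rough illustration of these three steps, the sketch below performs them with Node.js's synchronous `fs` calls, which wrap the same open/write/close sequence. Node is only a stand-in here so the example runs on a desktop, and the scratch file name is a made-up placeholder.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, used only for this illustration.
const target = path.join(os.tmpdir(), 'frida-hooks-example.txt');

// 1. Open a file descriptor
const fd = fs.openSync(target, 'w');
// 2. Write a buffer to that file descriptor
const written = fs.writeSync(fd, Buffer.from('hello disk\n'));
// 3. Close that file descriptor
fs.closeSync(fd);

const contents = fs.readFileSync(target, 'utf8');
console.log('wrote', written, 'bytes:', JSON.stringify(contents));
```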

Easy, right? At a very high level, when an Android application calls a higher-level system API to write a file to disk, this is still what happens under the hood. The developer doesn't need to know this; it just happens. So when writing security tooling around this, we can work at this very low level and let the system do the rest of the work for us. To do this, we would follow these steps:

  • The application attempts to open a file descriptor
  • Is this file descriptor inside the directory we want to encrypt?
    e.g. - /data/data/com.our.package.name/?
    If not, don't bother doing anything. Otherwise continue.
  • Mark this file descriptor as something we want to intercept on write()
  • Wait for a write() call that receives the above file descriptor as an argument. When one occurs, "encrypt"/xor/scramble the buffer as needed before passing it to the real write() function.
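
The steps above can be sketched in plain JavaScript without touching a device. Everything below is a simulation under stated assumptions: `realOpen`, `realWrite`, and `realClose` are stand-ins for the libc functions, the XOR key is a placeholder rather than real cryptography, and the protected directory is the example path from the list.

```javascript
// Hypothetical directory to protect; mirrors the example path in the list above.
const PROTECTED_DIR = '/data/data/com.our.package.name/';
const XOR_KEY = 0x5a; // placeholder "encryption", not real cryptography

// Stand-ins for the real libc functions, so the sketch runs without a device.
let nextFd = 3;
const disk = new Map(); // fd -> list of chunks that reached the "disk"
function realOpen(path) { const fd = nextFd++; disk.set(fd, []); return fd; }
function realWrite(fd, bytes) { disk.get(fd).push(Uint8Array.from(bytes)); return bytes.length; }
function realClose(fd) { return 0; }

// File descriptors we decided to intercept on write()
const trackedFds = new Set();

function hookedOpen(path) {
  const fd = realOpen(path);
  // Only remember descriptors that live inside the protected directory
  if (fd >= 0 && path.startsWith(PROTECTED_DIR)) trackedFds.add(fd);
  return fd;
}

function hookedWrite(fd, bytes) {
  if (!trackedFds.has(fd)) return realWrite(fd, bytes); // not ours, pass through
  const scrambled = Uint8Array.from(bytes, (b) => b ^ XOR_KEY);
  return realWrite(fd, scrambled); // hand the scrambled buffer to the real write()
}

function hookedClose(fd) {
  trackedFds.delete(fd); // fd numbers get reused, so forget it on close
  return realClose(fd);
}

// Usage: the same buffer is written to a protected and an unprotected path.
const plain = new TextEncoder().encode('secret');
const fdInside = hookedOpen(PROTECTED_DIR + 'files/notes.txt');
const fdOutside = hookedOpen('/sdcard/notes.txt');
hookedWrite(fdInside, plain);
hookedWrite(fdOutside, plain);
hookedClose(fdInside);
hookedClose(fdOutside);
console.log('inside :', disk.get(fdInside)[0]);
console.log('outside:', disk.get(fdOutside)[0]);
```

Forgetting the descriptor on close matters in a design like this, since the kernel reuses descriptor numbers and a stale entry would scramble an unrelated file.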

So how would this actually be accomplished? While there are a few different ways, a common one is to patch the function you want to intercept, in memory, so that it points to your own function. You keep the address of the original function and call it at the end of your own. At a high level, this can be replicated in Frida like the following:

var open = new NativeFunction(Module.findExportByName(null, 'open'), 'int', ['pointer', 'int', 'int']);
Interceptor.replace(open, new NativeCallback(function(path, flags, mode) {
    console.log('open( path="' + Memory.readUtf8String(path) + '", flags=', flags, ' mode=', mode, ')');

    // Check if this is a fd to encrypt contents of, save for later

    return open(path, flags, mode);
}, 'int', ['pointer', 'int', 'int']));

Note: To actually do this inline within an Android application, you would need to write some native code to perform the actual hooking, though hopefully this is illustrative enough for the example. More information on this can be found in a few different places, such as the Android GOT Hook article.

This can be pretty useful for folks who want to obscure how things are being done from both the system and the application developer. As reverse engineers, though, we want to be able to find these things quickly. This is often the case when working on a bug bounty or pentest of an application that uses some type of protection like this. If the protection is bought and not developed in-house, it is merely a hindrance to get past and is rarely part of the assigned scope of the test. This is where looking for hooked functions can be a nice way to speed up analysis and avoid spending time unwinding obfuscation or reversing a protector.

Utilizing a Corellium Android device has become my go-to for speeding up analysis of such applications, and the addition of the Frida tab made my workflow faster still. Previously, when writing unpackers for certain Android targets, I had modified some scripts I found online which did patch checking. Having extended this to fit my needs and kept it up to date with different Frida and Android releases, I'll use it as an example to show how to use the embedded Frida on Corellium devices. The basic idea of the script is to read the contents of loaded shared libraries from disk and compare them to the copies loaded in memory. If we find differences in the code segments, something has likely been changed and is worth investigating.
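
The core of that comparison can be sketched in plain JavaScript. This is an illustrative helper rather than the script's own code: `findVariances` and its `gap` parameter are hypothetical names, and the two arrays stand in for a section as read from disk and as read from process memory.

```javascript
// Compare two equally sized byte arrays and return the ranges that differ.
// Ranges are half-open: [start, end). Mismatches separated by at most `gap`
// identical bytes are merged, so one patched instruction sequence is
// reported once instead of byte by byte.
function findVariances(fileBytes, memoryBytes, gap = 4) {
  if (fileBytes.length !== memoryBytes.length) {
    throw new RangeError('both views must cover the same number of bytes');
  }
  const ranges = [];
  let current = null;
  for (let i = 0; i < fileBytes.length; i++) {
    if (fileBytes[i] === memoryBytes[i]) continue;
    if (current !== null && i - current.end <= gap) {
      current.end = i + 1; // close enough to the previous mismatch, extend it
    } else {
      current = { start: i, end: i + 1 };
      ranges.push(current);
    }
  }
  return ranges;
}

// Usage with made-up data: 16 bytes "on disk", two patched spots "in memory".
const onDisk = Uint8Array.from({ length: 16 }, (_, i) => i);
const inMemory = Uint8Array.from(onDisk);
inMemory.set([0x50, 0x00, 0x00, 0x58], 4); // a four-byte patch at offset 4
inMemory[14] = 0xff;                       // a single changed byte at offset 14
const variances = findVariances(onDisk, inMemory);
console.log(variances);
```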

To start, we simply create a new Android device on our account. I've brought up an Android 11 VM and allowed it to finish booting. After this, we install the target application from the Apps tab, following the knowledge base article for doing so. My target for this blog will be the latest DJI Fly application, as I know it uses the SecNeo packer, which uses this method under the hood. After the application is installed, I launch it from the Apps tab by clicking `launch` and let it start up. From here we navigate to the Frida tab, pictured below:

We can click the + Select a Process button here, and will be presented with a list of running processes.

From here we will choose the second dji.go.v5 process. If we attempted to attach to the first one, we would get an error. This is because SecNeo attaches to itself with a child process so that you cannot attach directly to the main one as a form of anti-debugging. We will tackle a target like this in a later blog, though for now, we can utilize the fact that the memory between the two processes is cloned - which will let us find the patched functions in the main process via the child one. Once the correct process is chosen, click the attach button.

As we see above, we now have a Frida REPL console attached to the process in the VM. Here we can copy and paste small scripts or test out functionality. I've found this extremely useful for quickly testing Frida code while developing larger scripts. Next, we click the scripts tab so we can upload the provided hook_finder.js script (see References below). On this tab, we click the upload button, locate the script on the local machine, and continue. From there you should see a screen like the one below.

At this point, there are two ways to execute the script. We can go back to the REPL console and manually enter %load /data/corellium/frida/scripts/hook_finder.js, or simply click the execute button, which does this automatically for us. This is a custom command that is not normally found in the Frida REPL environment, so keep this in mind if you're attempting this elsewhere. Once we navigate back to the console tab, we should see the output of the script as it works. Notably, we see hooks being identified in /apex/com.android.runtime/lib64/bionic/libc.so which are similar to the example given at the beginning of this blog. The output from running it on the DJI target is shown below:

We can also see more hooks targeting /apex/com.android.art/lib64/libart.so and /apex/com.android.art/lib64/libdexfile.so. These hooks revolve more around the inner workings of how SecNeo performs "stolen bytecode replacement" as one of its primary security mechanisms, and are out of scope for this blog.

So what exactly are we seeing here? Essentially, if we dive into the hook_finder.js code, we have automated the process of reading a subsection of the modules loaded by the application from disk, parsing their headers to identify specific segments of the ELF file, and then comparing the contents of the .text and .rodata sections to those that are found in memory. While this won't detect all forms of hooking, it does detect a fair amount of those used by Android packers.
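
To make the header-parsing step concrete, here is a minimal sketch that walks the section headers of a 64-bit little-endian ELF image held in a `Uint8Array`. It uses the same header offsets as hook_finder.js but runs in plain JavaScript with `DataView` instead of Frida's memory API. `listSections64` and `buildTinyElf` are hypothetical names, and the tiny image is synthetic, built only so the parser has something to read.

```javascript
// List the sections of an ELF64 little-endian image.
function listSections64(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magicOk = bytes[0] === 0x7f && bytes[1] === 0x45 && bytes[2] === 0x4c && bytes[3] === 0x46;
  if (!magicOk) throw new Error('not an ELF file');
  if (bytes[4] !== 2 || bytes[5] !== 1) throw new Error('sketch only handles ELF64 little-endian');

  const u64 = (offset) => Number(view.getBigUint64(offset, true));
  const shoff = u64(40);                       // e_shoff
  const shentsize = view.getUint16(58, true);  // e_shentsize
  const shnum = view.getUint16(60, true);      // e_shnum
  const shstrndx = view.getUint16(62, true);   // e_shstrndx

  // File offset of the section-name string table (sh_offset of that section)
  const strtabOffset = u64(shoff + shstrndx * shentsize + 24);
  const readName = (nameOffset) => {
    const start = strtabOffset + nameOffset;
    let end = start;
    while (end < bytes.length && bytes[end] !== 0) end++;
    return String.fromCharCode(...bytes.subarray(start, end));
  };

  const sections = [];
  for (let i = 0; i < shnum; i++) {
    const base = shoff + i * shentsize;
    sections.push({
      name: readName(view.getUint32(base, true)), // sh_name
      address: u64(base + 16),                    // sh_addr
      offset: u64(base + 24),                     // sh_offset
      size: u64(base + 32),                       // sh_size
    });
  }
  return sections;
}

// Synthetic image: ELF header, a string table, and three section headers.
function buildTinyElf() {
  const strtab = '\0.text\0.shstrtab\0'; // name offsets: .text = 1, .shstrtab = 7
  const shoff = 64 + 32;                 // string table at 64, padded to 32 bytes
  const bytes = new Uint8Array(shoff + 3 * 64);
  const view = new DataView(bytes.buffer);
  bytes.set([0x7f, 0x45, 0x4c, 0x46, 2, 1, 1]); // magic, ELFCLASS64, little-endian
  view.setBigUint64(40, BigInt(shoff), true);
  view.setUint16(58, 64, true);
  view.setUint16(60, 3, true);
  view.setUint16(62, 2, true);
  for (let i = 0; i < strtab.length; i++) bytes[64 + i] = strtab.charCodeAt(i);
  // Section 0 stays all zeros (the null section).
  const text = shoff + 64; // section 1: .text with made-up address and size
  view.setUint32(text, 1, true);
  view.setBigUint64(text + 16, 0x1000n, true);
  view.setBigUint64(text + 24, 0x1000n, true);
  view.setBigUint64(text + 32, 0x200n, true);
  const shstr = shoff + 128; // section 2: .shstrtab
  view.setUint32(shstr, 7, true);
  view.setBigUint64(shstr + 24, 64n, true);
  view.setBigUint64(shstr + 32, BigInt(strtab.length), true);
  return bytes;
}

const sections = listSections64(buildTinyElf());
console.log(sections);
```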

At this point, one should be able to take the code as it currently is and extend it to analyze what has changed and try to understand why. Many of the hooked functions relate to the example at the start of this blog, while others are likely used for other packer features. We could expect to find that the injected code causes the caller to jump to a memory location within the packer's library. What this library might do with each function call from there is an exercise left to the reader.
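
As one possible starting point for that extension, the sketch below checks whether the bytes at a reported variance look like a common AArch64 absolute-jump trampoline (`LDR Xn, #8`, then `BR Xn`, then an 8-byte destination) and extracts the destination. Whether any given packer uses this exact sequence is an assumption; `decodeAbsoluteJump` is a hypothetical helper and the sample address is made up.

```javascript
// Recognise `LDR Xn, #8 ; BR Xn ; <8-byte address>` in little-endian AArch64 code.
// Returns the scratch register and jump target, or null if the pattern is absent.
function decodeAbsoluteJump(bytes, offset = 0) {
  if (offset + 16 > bytes.length) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const ldr = view.getUint32(offset, true);
  const br = view.getUint32(offset + 4, true);

  // LDR (literal, 64-bit) is 0x58000000 | imm19 << 5 | Rt; "#8" means imm19 = 2
  const isLdrPlus8 = ((ldr & 0xffffffe0) >>> 0) === 0x58000040;
  // BR Xn is 0xD61F0000 | Rn << 5
  const isBr = ((br & 0xfffffc1f) >>> 0) === 0xd61f0000;
  if (!isLdrPlus8 || !isBr) return null;

  const loadedRegister = ldr & 0x1f;
  const branchRegister = (br >>> 5) & 0x1f;
  if (loadedRegister !== branchRegister) return null;

  return {
    register: 'x' + loadedRegister,
    target: view.getBigUint64(offset + 8, true), // BigInt, addresses exceed 2^53 safety only rarely but stay exact this way
  };
}

// Usage: ldr x17, #8 ; br x17 ; .quad 0x7a12345678 (made-up destination)
const patched = Uint8Array.from([
  0x51, 0x00, 0x00, 0x58,
  0x20, 0x02, 0x1f, 0xd6,
  0x78, 0x56, 0x34, 0x12, 0x7a, 0x00, 0x00, 0x00,
]);
const jump = decodeAbsoluteJump(patched);
console.log(jump.register, '0x' + jump.target.toString(16));
```

In a Frida script, the decoded target could then be checked against the address ranges of the loaded modules to see which library receives the redirected call.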

Feel free to join our Slack to discuss this post or ask any questions you may have about it.

References

hook_finder.js

// Script to gather the shared library from disk and also
// from memory utilizing Frida. After reading the file from
// disk, it will then compare some sections of the file in
// order to hunt and identify potentially modified and hooked
// functions.
//
// Re-written over the ages for usage while
// unpacking Android applications by
// Tim 'diff' Strazzere, <tim -at- corellium.com> <diff -at- protonmail.com>
// Based off older code and concepts from lich4/lichao890427
//
// Corresponding blog https://corellium.com/blog/android-frida-finding-hooks

// Helper function for creating a native function for usage
function getNativeFunction(name, ret, args) {
    var mod = Module.findExportByName(null, name);
    if (mod === null) {
        return null;
    }

    var func = new NativeFunction(mod, ret, args);
    if (typeof func === 'undefined') {
        return null;
    }

    return func;
}

var open_ptr = getNativeFunction('open', 'int', ['pointer', 'int', 'int']);
var read_ptr = getNativeFunction('read', 'int', ['int', 'pointer', 'int']);
var close_ptr = getNativeFunction('close', 'int', ['int']);
var lseek_ptr = getNativeFunction('lseek', 'int', ['int', 'int', 'int']);

// Reads the ELF file backing `module` from disk and caches the sections we
// care about (plus symbol and relocation info) on the module object.
function getElfData(module) {
    console.log('Processing ', module.path);
    if (module.sections) {
        // Already parsed on an earlier call
        return true;
    }

    var fd = open_ptr(Memory.allocUtf8String(module.path), 0 /* O_RDONLY */, 0);
    if (fd == -1) {
        return false;
    }

    // Get elf header
    var header = Memory.alloc(64);
    lseek_ptr(fd, 0, 0 /* SEEK_SET */);
    read_ptr(fd, header, 64);

    // Allow for both 32bit and 64bit binaries (EI_CLASS == ELFCLASS32)
    var is32 = Memory.readU8(header.add(4)) === 1;
    module.is32 = is32;

    // Parse section headers
    // readU64 returns a UInt64 object, so convert it to a plain number
    var sectionHeaderOffset = is32 ? Memory.readU32(header.add(32)) : Memory.readU64(header.add(40)).toNumber();
    var sectionHeaderSize = is32 ? Memory.readU16(header.add(46)) : Memory.readU16(header.add(58));
    var sectionHeaderCount = is32 ? Memory.readU16(header.add(48)) : Memory.readU16(header.add(60));
    var sectionHeaderStringTableIndex = is32 ? Memory.readU16(header.add(50)) : Memory.readU16(header.add(62));

    var sectionHeaders = Memory.alloc(sectionHeaderSize * sectionHeaderCount);

    lseek_ptr(fd, sectionHeaderOffset, 0 /* SEEK_SET */);
    read_ptr(fd, sectionHeaders, sectionHeaderSize * sectionHeaderCount);

    var stringTableOffset = is32 ? Memory.readU32(sectionHeaders.add(sectionHeaderSize * sectionHeaderStringTableIndex + 16)) : Memory.readU64(sectionHeaders.add(sectionHeaderSize * sectionHeaderStringTableIndex + 24)).toNumber();
    var stringTableSize = is32 ? Memory.readU32(sectionHeaders.add(sectionHeaderSize * sectionHeaderStringTableIndex + 20)) : Memory.readU64(sectionHeaders.add(sectionHeaderSize * sectionHeaderStringTableIndex + 32)).toNumber();

    var stringTable = Memory.alloc(stringTableSize);
    lseek_ptr(fd, stringTableOffset, 0 /* SEEK_SET */);
    read_ptr(fd, stringTable, stringTableSize);
    var sections = [];

    var dynsym = undefined;
    var dynstr = undefined;
    var relplt = undefined;
    var reldyn = undefined;

    for (var i = 0; i < sectionHeaderCount; i++) {
        var sectionName = Memory.readUtf8String(stringTable.add(Memory.readU32(sectionHeaders.add(i * sectionHeaderSize))));
        var sectionAddress = is32 ? Memory.readU32(sectionHeaders.add(i * sectionHeaderSize + 12)) : Memory.readU64(sectionHeaders.add(i * sectionHeaderSize + 16)).toNumber();
        var sectionOffset = is32 ? Memory.readU32(sectionHeaders.add(i * sectionHeaderSize + 16)) : Memory.readU64(sectionHeaders.add(i * sectionHeaderSize + 24)).toNumber();
        var sectionSize = is32 ? Memory.readU32(sectionHeaders.add(i * sectionHeaderSize + 20)) : Memory.readU64(sectionHeaders.add(i * sectionHeaderSize + 32)).toNumber();

        if (['.text', '.rodata', '.got', '.got.plt'].includes(sectionName)) {
            var section = {};
            section.name = sectionName;
            section.memoryOffset = sectionAddress;
            section.fileOffset = sectionOffset;
            section.size = sectionSize;
            if (sectionSize > 0) {
                section.data = Memory.alloc(sectionSize);
                lseek_ptr(fd, sectionOffset, 0 /* SEEK_SET */);
                read_ptr(fd, section.data, sectionSize);
            } else {
                section.data = undefined;
            }
            sections.push(section);
        } else if (['.dynsym', '.dynstr', '.rel.dyn', '.rel.plt'].includes(sectionName)) {
            var section = {};
            section.name = sectionName;
            section.memoryOffset = sectionAddress;
            section.fileOffset = sectionOffset;
            section.size = sectionSize;
            if (sectionSize > 0) {
                section.data = Memory.alloc(sectionSize);
                lseek_ptr(fd, sectionOffset, 0 /* SEEK_SET */);
                read_ptr(fd, section.data, sectionSize);
            } else {
                console.log('No data section for', section.name);
                section.data = undefined;
            }

            if (section.name === '.dynsym') {
                dynsym = section;
            }
            if (section.name === '.dynstr') {
                dynstr = section;
            }
            if (section.name === '.rel.dyn') {
                reldyn = section;
            }
            if (section.name === '.rel.plt') {
                relplt = section;
            }
            sections.push(section);
        }
    }

    // Everything needed from the file has been read, so release the descriptor
    close_ptr(fd);

    if (!!dynsym && !!dynstr) {
        var symbols = [];
        var stringTable = module.base.add(dynstr.memoryOffset);
        var structSize = is32 ? 16 : 24;
        for (var i = 0; i < dynsym.size / structSize; i++) {
            var symbolOffset = Memory.readU32(module.base.add(dynsym.memoryOffset).add(structSize * i));
            symbols.push(Memory.readUtf8String(stringTable.add(symbolOffset)));
        }

        module.symbols = symbols;
    }

    // Map of relocation offset -> symbol index. Each .rel.* entry is 8 bytes:
    // a 32-bit offset followed by a 32-bit info word whose upper 24 bits hold
    // the symbol index.
    var relmap = new Map();
    if (!!reldyn) {
        for (var i = 0; i < reldyn.size / 8; i++) {
            var entry = module.base.add(reldyn.memoryOffset).add(i * 8);
            var relOffset = Memory.readU32(entry);
            var symbolIndex = Memory.readU32(entry.add(4)) >>> 8;
            if (relOffset != 0 && symbolIndex != 0) {
                relmap.set(relOffset, symbolIndex);
            }
        }
    }

    if (!!relplt) {
        for (var i = 0; i < relplt.size / 8; i++) {
            var entry = module.base.add(relplt.memoryOffset).add(i * 8);
            var relOffset = Memory.readU32(entry);
            var symbolIndex = Memory.readU32(entry.add(4)) >>> 8;
            if (relOffset != 0 && symbolIndex != 0) {
                relmap.set(relOffset, symbolIndex);
            }
        }
    }
    module.relmap = relmap;

    module.sections = sections;
    return true;
}

// Compares the on-disk copy of each cached section against process memory
// and logs every place where they differ.
function findHooks(module) {
    if (module.sections === undefined) {
        if (!getElfData(module)) {
            return undefined;
        }
    }

    module.sections.forEach((section) => {
        if (section.size === 0) {
            return;
        }

        // It's important to wrap the ArrayBuffer returned by `readByteArray` in a
        // Uint8Array, as an ArrayBuffer cannot be indexed byte by byte directly
        var file = new Uint8Array(Memory.readByteArray(section.data, section.size));
        var memory = new Uint8Array(Memory.readByteArray(module.base.add(section.memoryOffset), section.size));
        for (var i = 0; i < section.size;) {
            if (['.rodata', '.text'].includes(section.name)) {
                if (file[i] != memory[i]) {
                    console.log('*** Potential variance found at ', DebugSymbol.fromAddress(module.base.add(section.memoryOffset).add(i)));
                    // Skip ahead so a single patched instruction is reported once
                    i += 4;
                }
                i++;
            } else if (['.got'].includes(section.name)) {
                // The GOT isn't initialized until execution, so it is not expected
                // to match the file. TODO: compare each entry's symbol name against
                // what it actually resolves to.
                break;
            } else {
                // Unscanned sections, to be added as needed
                break;
            }
        }
    });
}

// Quick and simple way to get the package name, assumes that the script
// was injected into an APK otherwise it won't work.
function getPackageName() {
    var fd = open_ptr(Memory.allocUtf8String('/proc/self/cmdline'), 0 /* O_RDONLY */, 0);
    if (fd == -1) {
        return 'null';
    }

    // Leave room for longer package names; the first NUL-terminated
    // string in cmdline is the process name
    var buffer = Memory.alloc(256);
    read_ptr(fd, buffer, 255);
    close_ptr(fd);

    return Memory.readUtf8String(buffer);
}

// Adjust this as needed, often I don't need to scan anything outside of the
// included shared libraries and a few which are almost always in an apex folder.
// This logic will need to be changed if you're using a pre-apex version of Android
// to ensure it picks up the proper libraries for hunting
//
// While it doesn't hurt to scan everything, it's almost never needed and will just slow
// down the process at a linear scale.
//
// If you already know what you're hunting for, feel free to just return or look for
// libart, libdvm, etc, etc
function getRelevantModules() {
    var modules = [];
    var packagename = getPackageName();

    Process.enumerateModules().forEach((module) => {
        if (module.path.includes(packagename)) {
            modules.push(module);
            console.log('Adding ', module.path);
        } else if (module.path.includes('/apex')) {
            modules.push(module);
            console.log('Adding ', module.path);
        } else {
            console.log('Skipping ', module.path);
        }
    });

    return modules;
}

var modules = getRelevantModules();

modules.forEach((module) => {
    getElfData(module);
    findHooks(module);
});

Contributors

Amanda Gorton