Abstract

The authorization mechanism of a smart device is mainly implemented in firmware, yet the firmware of many smart devices has security issues. Little research has focused on securing smart device firmware, although smart devices increasingly handle users’ highly sensitive applications, activities, and data. Research on smart device firmware security is therefore of growing importance. Disassembly is a common method for evaluating the security of authorization mechanisms. Before firmware can be disassembled, the processor type of the running environment and the image base of the firmware must be determined. In general, the processor type can be obtained by tearing down the device or consulting the product manual. The image base, however, is not easy to determine. Since many smart devices use ARM processors, in this paper we focus on ARM firmware and propose an automated method for determining the image base. By studying how jump tables are stored in the firmware of ARM-based smart devices, we propose an algorithm, named determining the image base by searching jump tables (DBJT), to determine the image base. The experimental results indicate that the proposed method can successfully determine the image base of firmware that stores absolute addresses in its jump tables.

1. Introduction

Wireless technologies for smart devices are developing rapidly and are widely used. Smart devices have been deployed in many forms, such as smartphones, wearable devices, and vehicles. A recent market research report forecast that the number of smart devices worldwide will grow to approximately 10 billion by 2025 [1].

There have been a number of authorization security incidents caused by defects in firmware in recent years. For example, by disassembling the firmware, researchers found that several D-Link routers contain authentication backdoors. If the attacker’s browser user agent string is xmlset_roodkcableoj28840ybtide, then the attacker can access the web interface of the device, bypass the authentication procedure, and view or change the device settings [2]. A similar incident occurred with the Tenda router, in which an authentication backdoor was found by disassembling the firmware. The backdoor allows commands to be executed remotely by sending specific strings and commands [3].

Compared with traditional embedded devices, smart devices are more vulnerable to attack. Some incidents [4–8] indicate that the security situation of smart devices is becoming increasingly serious, which has a profound impact on a country’s economic and social development. Therefore, the security evaluation and vulnerability assessment of smart devices are primary considerations at present.

However, few papers focus on securing the firmware of smart devices, although the firmware running on these devices is vulnerable to attack. Firmware provides the instructions that determine how a smart device functions and communicates with other devices. The firmware can be obtained by downloading it from the vendor’s website or extracting it from the flash storage of the device hardware. Any firmware used in smart devices should be assumed to be insecure, as it may contain security vulnerabilities.

To evaluate and improve the security of firmware, disassembly is a necessary step [9, 10]. A disassembler, such as IDA Pro, needs to know the processor type and image base of the firmware [11]. In general, the processor type can be discerned by consulting the product manual or physically examining the hardware [12, 13]. However, the image base cannot be obtained directly. Without the image base, the disassembler is unable to create cross-references based on absolute addresses [14]. When these cross-references are lacking, it is difficult to navigate efficiently in the disassembly listing, and analysts often lose their way when looking for the assembly code in which they are most interested. Knowledge of the correct image base is therefore critical to understanding the firmware as a whole [12].

Firmware images target heterogeneous hardware architectures; however, many smart devices are based on the ARM architecture [15–17]. Therefore, this work mainly focuses on ARM-based firmware. Figure 1(a) shows the disassembly code with the wrong image base, and Figure 1(b) shows the disassembly code with the correct image base. IDA Pro cannot establish cross-references when the wrong image base is set, and the absolute addresses are marked in red. When the correct image base is set, IDA Pro establishes cross-references to these absolute addresses, which are important for reverse engineers to understand the intention of the assembly code.

To determine the image base of firmware, many researchers have put in a great deal of effort, and several manual solutions have been proposed.

Skochinsky [18] proposed a general principle for determining the image base of a file with an unknown format. He suggested that some kinds of hints, such as self-relocating code and initialization code, can be used.

Basnight et al. [12, 19] presented two methods for inferring the image base. The first method uses immediate values in instructions to infer a reasonable image base. The second method uses a hardware debugger to halt a programmable logic controller and obtain a memory dump. Then, the image base can be found by manually analyzing common instruction patterns in the memory dump.

Dacosta et al. [20] noted that when the case values in a switch-case statement of a C program are sequential and dense, the memory addresses of the case blocks are usually stored in a jump table; this fact can be used to infer the memory address of the nearby code and eventually obtain the image base. Dacosta’s approach first manually analyzed the instruction that jumps to the default statement block (in this case, the BHI instruction) to obtain the offset of the default statement block and then analyzed the memory address of the default statement block to calculate the image base.

None of the above methods is automated, and all rely heavily on reverse engineers’ experience and intuition. We have previously proposed three methods for automatically determining the image base [21–23]. These automated methods are applicable to different types of ARM firmware, but none of them can determine the image base of all types of firmware.

In this paper, we propose a method for determining the image base of firmware that uses jump tables to store absolute addresses. The source code of firmware usually contains switch-case statements, and the compiler may generate jump tables for such code. By searching for a characteristic sequence of instructions, the jump table can be located. Then, according to the absolute addresses in the jump table and the offset of the case statement block, we can obtain the image base. The experimental results indicate that the proposed method can effectively determine the image base of firmware that uses jump tables to store absolute addresses.

2. Jump Table in Firmware

The switch-case statement often appears in the source code of firmware and may generate a jump table after being compiled. After the code in Listing 1 is compiled into a binary file, IDA Pro can be used to disassemble the binary file, and the disassembly results are shown in Figure 2.

switch(n)
{
case 0:
  printf("n =0\n");
  break;
case 1:
  printf("n =1\n");
  break;
case 2:
  printf("n =2\n");
  break;
case 3:
  printf("n =3\n");
  break;
case 4:
  printf("n =4\n");
  break;
default:
  printf("default.\n");
}

It can be seen that when there is a switch-case statement in the code, the compiler may generate a jump table. The entries of the jump table are the addresses of the case statement blocks; for example, 0x8268 in the jump table is the address of the first case statement block.

Next, we analyze how the jump table is used in two cases.

(1) Suppose that the variable n in the code of Listing 1 is less than or equal to 4 (e.g., 3). Then register R3 in the instruction at memory address 0x8248 in Figure 2 is 3. After the instruction “CMP R3, #4” at offset 0x00008248 is executed, the LDRLS instruction is executed. According to the ARM manual [24], the PC register reads as the address of the current instruction plus 8 in ARM state, so the memory address accessed by LDRLS is

$$\text{address} = PC + R3 \times 4 = \text{0x8254} + 3 \times 4 = \text{0x8260}.$$

As shown in Figure 2, the word at address 0x8260 is 0x828C. This means that the PC register will be assigned a value of 0x828C, and the program will jump to 0x828C to continue execution.

(2) When the value of the variable n is greater than 4, i.e., the value of R3 is greater than 4, the instruction “B loc_82A4” at offset 0x00008250 is executed, and the program jumps to location loc_82A4 to continue execution.
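As a rough illustration of the address calculation in case (1), the following C sketch computes which jump-table slot the LDRLS instruction reads. The function name `ldrls_slot_address` is hypothetical. The sketch assumes ARM (not Thumb) state, an instruction of the form `LDRLS PC, [PC, Rn, LSL #2]`, and that the LDRLS instruction sits at 0x824C, directly after the CMP at 0x8248.

```c
#include <stdint.h>

/* Hypothetical helper (not from the paper): address of the jump-table
 * word loaded by an instruction of the assumed form
 * "LDRLS PC, [PC, Rn, LSL #2]".
 * Assumes ARM state, where reading PC yields the address of the
 * executing instruction plus 8. */
uint32_t ldrls_slot_address(uint32_t ldrls_addr, uint32_t index)
{
    uint32_t pc = ldrls_addr + 8;   /* PC value as seen by the instruction */
    return pc + index * 4;          /* each table entry is one 32-bit word */
}
```

With the assumed LDRLS address 0x824C and an index of 3, the function returns 0x8260, the slot that holds 0x828C in Figure 2.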

The above analysis explains how the jump table is used. Take the firmware of ABB NETA-21 as an example, as shown in Figure 3. The CMP instruction at offset 0x000AB124 is followed by the LDRLS instruction, the B instruction, and a jump table. The jump table begins at offset 0x000AB130 with four addresses, as shown by the red background in Figure 3, which are 0xC00B326C, 0xC00B3160, 0xC00B3150, and 0xC00B3140. In general, the minimum memory address in the jump table points to the first case statement block, and the first case statement block is usually next to the jump table. The minimum memory address in the jump table is 0xC00B3140, and the first case statement block starts at offset 0x000AB140. That is, the case statement block with offset 0x000AB140 is mapped to the memory address 0xC00B3140, and the image base can then be calculated as 0xC00B3140 - 0x000AB140 = 0xC0008000.
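The NETA-21 calculation can be sketched in C as follows. This is a minimal illustration, not the authors’ implementation: the function name `candidate_image_base` and its parameters are hypothetical, and the sketch assumes that the table has at least one entry and that the first case block starts immediately after the last table entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: derive a candidate image base from one jump table.
 * jt           - the n absolute addresses read from the table (n >= 1)
 * table_offset - file offset of the first table entry
 * Assumes the first case block directly follows the table. */
uint32_t candidate_image_base(const uint32_t *jt, size_t n,
                              uint32_t table_offset)
{
    uint32_t min_addr = jt[0];
    for (size_t i = 1; i < n; i++) {
        if (jt[i] < min_addr)
            min_addr = jt[i];       /* smallest address in the table */
    }
    /* The first case block begins right after the n 4-byte entries. */
    uint32_t offset_case1 = table_offset + (uint32_t)(4 * n);
    return min_addr - offset_case1;
}
```

For the four addresses listed above and a table offset of 0x000AB130, the first case block offset is 0x000AB140 and the function returns 0xC0008000.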

3. DBJT Algorithm

According to the above analysis, when compiling the switch-case statement, the compiler usually generates the CMP instruction, LDRLS instruction, B instruction, and jump table in turn. The program jumps according to the addresses in the jump table. The model is shown in Figure 4.

In general, the minimum memory address in the jump table points to the first case statement block. A jump table can be used to deduce the memory address of the first case statement block; thus, the difference between the memory address and offset of the first case statement block can be used to obtain the candidate image base.

Figure 5 shows how firmware that contains a case block at offset offset_case1 is mapped to memory. The image base of the firmware is denoted as base, and the minimum memory address in the jump table is denoted as min_addr. According to the analysis in Section 2, the first case block with offset offset_case1 is mapped to memory location min_addr, i.e., $\mathit{min\_addr} = \mathit{base} + \mathit{offset\_case1}$, and then, we can obtain the image base as $\mathit{base} = \mathit{min\_addr} - \mathit{offset\_case1}$.

Based on the model of the switch-case statement, we can scan from the starting position of the firmware to locate switch-case statements. If, at some location, the current instruction is CMP, the second instruction is LDRLS, and the third instruction is B, then we consider it to be a switch-case statement, and the B instruction is followed by the jump table. We then read all the entries of the jump table, obtain the minimum element, and subtract the offset of the first case block from the minimum element to obtain a candidate image base. Each jump table yields one candidate image base, so all candidate image bases can be calculated from all the jump tables of the firmware. Then, we count the frequency of each candidate image base. If the frequency of a particular candidate image base is much larger than those of the others, then we consider this candidate to be the actual image base. Based on the above analysis, we propose the determining the image base by searching jump tables (DBJT) algorithm to determine the image base. The pseudocode of the algorithm is shown in Listing 2.

Input: firmwareFile
Output: A sorted result of the elements and their occurrence in multiset M
function DBJT(firmwareFile)
fileSize ⟵ Obtain the size of firmwareFile
offset ⟵ 0
M ⟵ ∅
while (0 ≤ offset < fileSize) do
  CMP_FLAG ⟵ FALSE
  LDRLS_FLAG ⟵ FALSE
  B_FLAG ⟵ FALSE
  if Current instruction is CMP instruction, then
    CMP_FLAG ⟵ TRUE
  else
    offset ⟵ offset + 4
    continue
  end if
  if The second instruction is LDRLS instruction, then
    LDRLS_FLAG ⟵ TRUE
  else
    offset ⟵ offset + 4
    continue
  end if
  if The third instruction is B instruction, then
    B_FLAG ⟵ TRUE
  else
    offset ⟵ offset + 4
    continue
  end if
  if CMP_FLAG == TRUE && LDRLS_FLAG == TRUE && B_FLAG == TRUE then
    jt[n] ⟵ Read the jump table
    min_addr ⟵ Obtain the minimum element of the array jt[n]
    offset_case1 ⟵ Obtain offset of the first case block
    base ⟵ min_addr - offset_case1
    if base % 4 == 0 then
      M ⟵ M ∪ {base}
    end if
    offset ⟵ offset_case1
  end if
  offset ⟵ offset + 4
end while
Count the number of occurrences of each element in the multiset M
Sort the elements and their occurrence in descending order by number of occurrences
Output: Sorted elements and their occurrences
end function
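The scanning loop of Listing 2 might look roughly like the following C sketch. It is an illustration under several assumptions that the paper does not state: the firmware is little-endian, only the unconditional encodings `CMP Rn, #imm8` (0xE35n00ii) and `B` (0xEAxxxxxx) and the form `LDRLS PC, [PC, Rm, LSL #2]` (0x979FF10m) are matched, and the number of table entries is taken to be the CMP immediate plus one. The names `read_le32` and `dbjt_candidates` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word (assumes little-endian ARM firmware). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical sketch of the DBJT scan: writes up to max_out candidate
 * image bases into out and returns how many were stored. */
size_t dbjt_candidates(const uint8_t *fw, size_t size,
                       uint32_t *out, size_t max_out)
{
    size_t count = 0;
    size_t off = 0;

    while (off + 12 <= size) {
        uint32_t w0 = read_le32(fw + off);
        uint32_t w1 = read_le32(fw + off + 4);
        uint32_t w2 = read_le32(fw + off + 8);

        /* Assumed patterns: CMP Rn, #imm8 / LDRLS PC, [PC, Rm, LSL #2] / B */
        int is_cmp   = (w0 & 0xFFF0FF00u) == 0xE3500000u;
        int is_ldrls = (w1 & 0xFFFFFFF0u) == 0x979FF100u;
        int is_b     = (w2 & 0xFF000000u) == 0xEA000000u;
        if (!(is_cmp && is_ldrls && is_b)) {
            off += 4;
            continue;
        }

        size_t n = (size_t)(w0 & 0xFFu) + 1;  /* assumed: cases 0..imm8 */
        size_t table = off + 12;              /* table follows the B */
        size_t offset_case1 = table + 4 * n;  /* first block after table */
        if (offset_case1 > size) {            /* table would run past EOF */
            off += 4;
            continue;
        }

        uint32_t min_addr = UINT32_MAX;
        for (size_t i = 0; i < n; i++) {
            uint32_t addr = read_le32(fw + table + 4 * i);
            if (addr < min_addr)
                min_addr = addr;
        }

        if (min_addr >= offset_case1) {       /* avoid wrapping below zero */
            uint32_t base = min_addr - (uint32_t)offset_case1;
            if (base % 4 == 0 && count < max_out)
                out[count++] = base;
        }
        off = offset_case1;                   /* resume after the table */
    }
    return count;
}
```

Unlike Listing 2, this sketch resumes scanning exactly at the first case block rather than one instruction after it; a fuller implementation would also need to handle other compilers’ instruction forms.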

The time complexity of the DBJT algorithm is O(fileSize), where fileSize is the size of the firmware file. The algorithm first locates a jump table according to three consecutive instructions (the CMP instruction, LDRLS instruction, and B instruction) and then finds the minimum memory address among the addresses in the jump table. A candidate image base is obtained as the difference between the minimum memory address and the offset of the first case statement block, and the candidate image base is added to multiset M. Finally, the algorithm counts the number of occurrences of each candidate image base in the multiset M and sorts the candidates in descending order by number of occurrences. If a candidate image base appears much more frequently than the other elements, then it is considered the correct image base. Otherwise, the output does not contain the correct image base because the DBJT algorithm cannot be applied successfully to this firmware.
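The final counting step can be sketched in C as below. This is one possible approach, not necessarily the authors’: it sorts the candidates with the standard `qsort` and then counts runs of equal values. The names `cmp_u32` and `most_frequent_base` are hypothetical, and deciding whether the winner appears “much more frequently” than the rest is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort over uint32_t values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts cands in place, returns the most frequent
 * candidate image base, and stores its number of occurrences in *votes.
 * For an empty input it returns 0 with *votes set to 0. */
uint32_t most_frequent_base(uint32_t *cands, size_t n, size_t *votes)
{
    uint32_t best = 0;
    size_t best_count = 0;

    if (n > 0)
        qsort(cands, n, sizeof cands[0], cmp_u32);

    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && cands[j] == cands[i])
            j++;                    /* j - i equal values form one run */
        if (j - i > best_count) {
            best_count = j - i;
            best = cands[i];
        }
        i = j;
    }
    *votes = best_count;
    return best;
}
```

A caller could compare the returned vote count with the total number of candidates to judge whether one image base clearly dominates.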

4. Experimental Results and Analysis

To test the proposed algorithm, we collected 10 firmware images from well-known vendors’ official websites. The DBJT algorithm was implemented in the C language and compiled with Visual C++ 6.0. The experiments were performed on a personal computer with an Intel i7-2600 3.4 GHz processor and 18 GB memory, running Microsoft Windows 7 SP1.

4.1. Experimental Results

In the experiment, the DBJT algorithm proposed in this paper is used to identify the jump table in the firmware and calculate the image base. The experimental results are shown in Table 1. The column “Jump table” lists the number of jump tables identified by the DBJT algorithm in each firmware file. The column “Correct” lists the frequency of the correct image base identified by the DBJT algorithm, and the column “Base” lists the correct image bases of the corresponding firmware. The column “Time” lists the execution time of the proposed algorithm. The symbol N/A means that the method is not applicable to the corresponding firmware; the reasons for this are discussed in Section 4.2. The manual validation results are shown in the “Validated” column of Table 1.

We take the firmware uImage of ABB NETA-21 as an example to analyze the experimental results. As shown in Table 1, 261 jump tables are identified by the DBJT algorithm, 108 of which point to the same candidate image base 0xC0008000. Figure 6(a) shows the candidate image bases and their corresponding occurrence frequencies. It can be seen that the candidate image base 0xC0008000 appears 108 times, which is much higher than the frequency of the other candidate image bases. In other words, 108 jump tables yield the same candidate image base 0xC0008000. Therefore, we consider 0xC0008000 to be the correct image base of the firmware.

To verify whether the experimental results are correct, we load the firmware file uImage using IDA Pro and set the processor type to “ARM little-endian” and the image base to 0xC0008000. Then, we can see that the cross-references for absolute addresses in the disassembly code are correct, as shown in Figure 7(a). This indicates that the memory address 0xC0008000 is the correct image base. In comparison, the same file loaded by IDA Pro without setting the correct image base is shown in Figure 7(b).

As shown in Table 1, the execution time of the proposed algorithm for uImage is 250 ms. Compared to the time of reverse engineering, the time to determine the image base is insignificant.

Figure 6(b) shows the experimental results obtained for the firmware sample 3551.bin from the Advantech EKI-2748FI managed Ethernet switch. The image base determined by the algorithm is 0x00400000, which was manually verified to be correct.

In Figure 6, we can see that there are some other points near the image base. These points are caused by inaccuracies in the algorithm. If the default statement block is in the first position in the switch-case statement, then the default statement block is next to the jump table, and the minimum memory address in the jump table no longer points to the first case statement block. This style of C code is shown in Listing 3, and its corresponding assembly code is shown in Figure 8. Although this style is legitimate C, programmers rarely write switch-case statements this way. This type of switch-case statement makes the DBJT algorithm inaccurate: the resulting candidate image base differs from the correct image base by a few bytes.

switch(n)
{
default:
  printf("default.\n");
  break;
case 0:
  printf("n =0\n");
  break;
case 1:
  printf("n =1\n");
  break;
case 2:
  printf("n =2\n");
  break;
case 3:
  printf("n =3\n");
  break;
case 4:
  printf("n =4\n");
  break;
}

4.2. Possible Reasons for Determination Failure

In Table 1, the number of recognized jump tables in some firmware is 0, and the image base is not determined successfully, indicating that the DBJT algorithm is not suitable for this firmware. The possible reasons for this are as follows.

(1) The compiler generates a jump table only when the case values in the switch-case statement are sequential and dense. Otherwise, the compiler generates no jump table. For example, the case values in Listing 4 are not sequential, and no jump table is generated, as shown in Figure 9.

(2) In some firmware, the jump table contains no absolute addresses, so the DBJT algorithm cannot be used to determine the image base. Examples include firmware es-03001-1.ffd of Emerson ES-03001, firmware v1.23.nb0 of Phoenix OT 4 M Terminal, and firmware pn-82672.bin of Rockwell DriveLogix 5730. Figure 10 shows the assembly code of firmware es-03001-1.ffd.

switch(n)
{
case 1:
  printf("n =1\n");
  break;
case 100:
  printf("n =100\n");
  break;
default:
  printf("default.\n");
}

In Figure 10, the BHI instruction at address 0x00004E00 is the “Branch if Higher” instruction. Combined with the previous instruction, “CMP R1, #6,” it jumps to the label def_4E0C if R1 is greater than 6. If R1 is less than or equal to 6 (e.g., 2), then the ADR instruction is executed. The ADR instruction at address 0x00004E04 assigns 0x00004E10 to register R2. The LDRB instruction then loads a byte from memory at address $R2 + R1 = \text{0x4E10} + 2 = \text{0x4E12}$, so the 0x01 at address 0x00004E12 is loaded into register R2. The ADD instruction at address 0x00004E0C then modifies the value of the PC register. Since the PC register reads as the address of the current instruction plus 8, the calculation is as follows:

$$PC = PC + R2 \times 4 = \text{0x4E14} + \text{0x01} \times 4 = \text{0x4E18}.$$

That is, the PC register will be assigned the value 0x4E18. From the above calculation, it can be seen that there is no absolute address stored in the jump table, so the algorithm proposed in this paper cannot be used for this firmware.
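To make the contrast with absolute jump tables concrete, the following C sketch reproduces the PC-relative calculation. The function name `add_pc_target` is hypothetical, and the sketch assumes ARM state and an instruction of the form `ADD PC, PC, R2, LSL #2`, which is consistent with the result 0x4E18 but is not spelled out in the text.

```c
#include <stdint.h>

/* Hypothetical helper: branch target of an instruction of the assumed
 * form "ADD PC, PC, Rm, LSL #2", where Rm holds a one-byte table entry.
 * Assumes ARM state, so PC reads as the instruction address plus 8. */
uint32_t add_pc_target(uint32_t add_addr, uint8_t table_entry)
{
    uint32_t pc = add_addr + 8;             /* PC as seen by the ADD */
    return pc + (uint32_t)table_entry * 4;  /* entry counts 4-byte words */
}
```

The target depends only on the address of the ADD instruction and the small table entry, so the same bytes yield a correct branch at any load address, which is why such a table carries no information about the image base.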

5. Conclusions

The disassembly of firmware is a necessary step in the security assessment of authentication mechanisms. However, for the firmware of most smart devices, the image base cannot be obtained directly, which is a major obstacle to disassembly. In this paper, we study how jump tables are stored in the ARM firmware of smart devices and propose a method for determining the firmware image base by using jump tables. The experimental results show that the proposed method is effective for firmware that stores absolute addresses in its jump tables. Automatically determining the image base of other types of firmware, such as firmware that contains no jump table, remains a challenge. In future work, we will continue to research new methods for other kinds of firmware in smart devices. We believe that these automated approaches can effectively reduce the difficulty of reverse analysis.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 61802439) and Beijing Youth Backbone Personal Project (Grant No. 201800002685XG357).