2021-08-21
I was shown that apparently GDB fails to properly decode NASM generated STABS debug sections, or NASM can’t produce correct STABS information. So what does a reasonable person with many other more important things on their todo-list do? Sit down and figure out what the issue is. Right??
The STABS format is a properly ancient format to encode additional debugging information in binaries, traditionally a.out binaries. Since then we entered the realm of Middle-Earth and started producing ELF binaries instead, using DWARF as the debugging format. However, ELF can still use STABS, and especially on 32-bit architectures it is still relatively widespread, despite the Sun having finally set on the company that invented the format 1 (sobs).
Debugging formats usually encode the name of the file corresponding to the generated object code, plus the specific line for one or more instructions. You can either use the GNU tool addr2line to view this manually, or use a debugger or the like for more bells and whistles.
But, for some reason, the combination of NASM+STABS results in cruel death, total destruction and absolute oblivion. Let’s dive in and save Middle-Earth.
We use the AMD64 GNU/Linux platform, producing ELF64 binaries with the minimal program using _start as entry point and executing sys_exit(0) immediately. The object code is generated using NASM and GAS respectively, and linked with the same flags.
They produce identical object code. Reproduced here in both common x86 syntaxes 2:
$ objdump -d -M att [gas/nasm]
0000000000401000 <_start>:
401000: b8 3c 00 00 00 mov $0x3c,%eax
401005: bf 00 00 00 00 mov $0x0,%edi
40100a: 0f 05 syscall
$ objdump -d -M intel [gas/nasm]
0000000000401000 <_start>:
401000: b8 3c 00 00 00 mov eax,0x3c
401005: bf 00 00 00 00 mov edi,0x0
40100a: 0f 05 syscall
If you want to follow along what I’m gonna be doing, you can find the source and binaries I used here. I’ve taken care to list every precise command I used.
If we ask GDB to print the source code listing of the GAS produced file, everything works fine:
$ gdb gas
Reading symbols from gas...
(gdb) list _start
1 .globl _start
2
3 .section .text
4
5 _start: movl $60, %eax
6 movl $0, %edi
7 syscall
(gdb) quit
With the NASM produced file we don’t get any source code listing though.
$ gdb nasm
Reading symbols from nasm...
(gdb) list _start
(gdb) quit
Note, we don’t get any error either!
GDB uses the same tooling as the GNU/ADDR2LINE program [citation needed], so let’s check how this program behaves 3:
$ addr2line -e nasm 0x401000
nasm.nasm:5
$ addr2line -e gas 0x401000
gas.s:5
Seems to work? But wait… what about the next instruction at address 0x401005?
$ addr2line -e nasm 0x401005
nasm.nasm:5
$ addr2line -e gas 0x401005
gas.s:6
They seem to be in disagreement here. With the last instruction as well:
$ addr2line -e nasm 0x40100a
nasm.nasm:6
$ addr2line -e gas 0x40100a
gas.s:7
To conclude, while addr2line at least seems to work for the first source code line (or first assembly instruction), unlike GDB which doesn’t work at all, for the other lines/instructions we are presented with wrong information.
GNU/OBJDUMP can use STABS information as well to annotate the disassembly, let’s try this 4:
$ objdump -dl nasm
0000000000401000 <_start>:
nasm.nasm:5
401000: b8 3c 00 00 00 mov $0x3c,%eax
401005: bf 00 00 00 00 mov $0x0,%edi
nasm.nasm:6
40100a: 0f 05 syscall
At least objdump and addr2line are consistent in what they are displaying. Let’s do the same with the gas produced file:
$ objdump -dl gas
0000000000401000 <_start>:
gas.s:5
401000: b8 3c 00 00 00 mov $0x3c,%eax
gas.s:6
401005: bf 00 00 00 00 mov $0x0,%edi
gas.s:7
40100a: 0f 05 syscall
As expected, with gas the line numbers are displayed as they should be.
Let’s use objdump to print the contents of .stab directly 5:
$ objdump -G gas
Symnum n_type n_othr n_desc n_value n_strx String
-1 HdrSym 0 4 0000000000000007 1
0 SO 0 0 0000000000401000 1 gas.s
1 SLINE 0 5 0000000000401000 0
2 SLINE 0 6 0000000000401005 0
3 SLINE 0 7 000000000040100a 0
We are presented with a table of seven columns with five entries. With almost no knowledge of .stab, my educated guess is that SLINE in the n_type column refers to a source code line located at the address in n_value – so far, so good. Now let’s check the NASM generated binary:
$ objdump -G nasm
Symnum n_type n_othr n_desc n_value n_strx String
-1 HdrSym 0 5 000000000000000b 1
0 SO 0 0 0000000000401000 1 nasm.nasm
1 SLINE 0 5 0000000000401000 0
2 SLINE 0 6 000000000040100a 0
3 SLINE 0 7 0000000000401014 0
4 SO 0 0 0000000000000000 0
This… doesn’t look right. The first address does match, but the second source code line is mapped to address 0x40100a (the address of the last instruction) instead of 0x401005, and 0x401014 isn’t part of our program at all. Also, we have an additional SO entry with n_value 0x0, but this may be something that’s allowed by the (nonexistent) specification.
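As a quick cross-check of the two tables, the expected SLINE addresses can be derived from the instruction lengths shown in the disassembly. The Python sketch below does only that arithmetic. It assumes one instruction per source line, the names are made up for illustration, and all values are copied by hand from the objdump output above.

```python
# Instruction lengths in bytes, read off the disassembly of _start:
# mov eax (5 bytes), mov edi (5 bytes), syscall (2 bytes).
TEXT_START = 0x401000
INSN_LENGTHS = [5, 5, 2]
FIRST_LINE = 5  # source line of the first instruction

# n_value column of the SLINE rows in `objdump -G nasm`, keyed by n_desc.
NASM_SLINE = {5: 0x401000, 6: 0x40100A, 7: 0x401014}


def expected_sline(start, lengths, first_line):
    """Map each source line to the address of its instruction."""
    table = {}
    addr = start
    for i, length in enumerate(lengths):
        table[first_line + i] = addr
        addr += length
    return table


expected = expected_sline(TEXT_START, INSN_LENGTHS, FIRST_LINE)
for line in sorted(expected):
    want, got = expected[line], NASM_SLINE[line]
    status = "ok" if want == got else "MISMATCH"
    print(f"line {line}: expected {want:#x}, NASM says {got:#x} ({status})")
```

The expected values are the ones GAS emits; only line 5 agrees in the NASM table.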
We can also look at the binary representation of the files. First we need to figure out the starting offset of the .stab section, and ideally its length as well, so we ask objdump to dump the headers of our executables 6:
$ objdump -h nasm
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000000c 0000000000401000 0000000000401000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .stab 00000048 0000000000000000 0000000000000000 0000100c 2**2
CONTENTS, READONLY, DEBUGGING
2 .stabstr 0000000b 0000000000000000 0000000000000000 00001054 2**0
CONTENTS, READONLY, DEBUGGING
$ objdump -h gas
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.property 00000030 0000000000400120 0000000000400120 00000120 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 0000000c 0000000000401000 0000000000401000 00001000 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .stab 0000003c 0000000000000000 0000000000000000 0000100c 2**2
CONTENTS, READONLY, DEBUGGING
3 .stabstr 00000007 0000000000000000 0000000000000000 00001048 2**0
CONTENTS, READONLY, DEBUGGING
So the NASM generated file has a .stab section of 0x48 bytes starting at 0x100c, whereas with GAS it’s only 0x3c bytes, with the same starting point. We can now use od(1) to dump this precise data in one-byte hex units 7:
$ od -t x1 -Ax -j 0x100c -N 0x48 nasm
00100c 01 00 00 00 00 00 05 00 0b 00 00 00 01 00 00 00
00101c 64 00 00 00 00 10 40 00 00 00 00 00 44 00 05 00
00102c 00 10 40 00 00 00 00 00 44 00 06 00 0a 10 40 00
00103c 00 00 00 00 44 00 07 00 14 10 40 00 00 00 00 00
00104c 64 00 00 00 00 00 00 00
001054
$ od -t x1 -Ax -j 0x100c -N 0x3c gas
00100c 01 00 00 00 00 00 04 00 07 00 00 00 01 00 00 00
00101c 64 00 00 00 00 10 40 00 00 00 00 00 44 00 05 00
00102c 00 10 40 00 00 00 00 00 44 00 06 00 05 10 40 00
00103c 00 00 00 00 44 00 07 00 0a 10 40 00
001048
Alternatively, we could use objdump again, dumping the full contents of the .stab section with objdump -s -j .stab, yielding almost analogous output (try it yourself!).
Comparing this binary dump with the -G interpreted dump of the .stab section from before, we can guess that the byte at 0x1012 (0x05 in NASM, 0x04 in GAS) refers to the n_desc column in the table. The byte at 0x1014 encodes the lowest byte of n_value. Searching for the other table entries with their respective addresses 0x401000, 0x401005, 0x40100a and 0x401014 from the GAS/NASM files, we deduce that the four bytes starting two bytes after the n_desc byte hold the little-endian encoded n_value.
This gives us the following format (on the example of the GAS dump), where T refers to the bytes (likely) identifying n_type, O n_othr, D n_desc, V n_value, and S n_strx:
00100c SS SS 00 00 TT TT DD OO VV VV VV VV|SS SS 00 00
00101c TT TT DD OO VV VV VV VV|SS SS 00 00 TT TT DD OO
00102c VV VV VV VV|SS SS 00 00 TT TT DD OO VV VV VV VV|
00103c SS SS 00 00 TT TT DD OO VV VV VV VV
001048
The String field is stored in the .stabstr section, which we aren’t interested in. Please note that this mapping is no more than an educated guess.
Anyhow, applied to the NASM binary dump we can recognize the same pattern:
00100c SS SS 00 00 TT TT DD OO VV VV VV VV|SS SS 00 00
00101c TT TT DD OO VV VV VV VV|SS SS 00 00 TT TT DD OO
00102c VV VV VV VV|SS SS 00 00 TT TT DD OO VV VV VV VV|
00103c SS SS 00 00 TT TT DD OO VV VV VV VV|SS SS 00 00
00104c TT TT DD OO VV VV VV VV
001054
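To test the guess, the dump can be decoded mechanically. The Python sketch below assumes the conventional 12-byte stab entry layout (4-byte n_strx, 1-byte n_type, 1-byte n_othr, 2-byte n_desc, 4-byte n_value, all little-endian), which splits the TT and DD OO bytes slightly differently from the diagram above. The helper names are invented for this example, and the bytes are the GAS dump from the od output.

```python
import struct

# One stab entry: n_strx (u32), n_type (u8), n_othr (u8), n_desc (u16),
# n_value (u32). This layout is an assumption of the sketch.
STAB_ENTRY = struct.Struct("<IBBHI")

# n_type values seen in the dumps. The first entry is a header whose
# type byte is 0; objdump -G labels it HdrSym.
TYPE_NAMES = {0x00: "HdrSym", 0x44: "SLINE", 0x64: "SO"}

# .stab section of the GAS binary (0x3c bytes), copied from the od output.
GAS_STAB = bytes.fromhex(
    "01 00 00 00 00 00 04 00 07 00 00 00 01 00 00 00"
    "64 00 00 00 00 10 40 00 00 00 00 00 44 00 05 00"
    "00 10 40 00 00 00 00 00 44 00 06 00 05 10 40 00"
    "00 00 00 00 44 00 07 00 0a 10 40 00"
)


def decode_stab(data):
    """Return (type name, n_othr, n_desc, n_value, n_strx) per entry."""
    rows = []
    for n_strx, n_type, n_othr, n_desc, n_value in STAB_ENTRY.iter_unpack(data):
        name = TYPE_NAMES.get(n_type, hex(n_type))
        rows.append((name, n_othr, n_desc, n_value, n_strx))
    return rows


for name, othr, desc, value, strx in decode_stab(GAS_STAB):
    print(f"{name:<6} {othr} {desc} {value:016x} {strx}")
```

If the assumed layout is right, the printed rows carry the same values as the objdump -G table for the GAS binary.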
You can use a hex editor such as radare2/rizin and fire it up as writable (-w) in raw mode (-n):
$ r2 -w -n nasm
The goal is to edit the SLINE entries with the n_desc values of 6 and 7 to point to the correct n_value. They are stored at the addresses 0x1038–0x103b and 0x1044–0x1047. We will modify the bytes [0x0a,0x10,0x40,0x00] (little-endian encoded 0x40100a) and [0x14,0x10,0x40,0x00] (little-endian encoded 0x401014) to be 0x401005 and 0x40100a respectively.
[0x00000000]> s 0x1038 # seek to specified address
[0x00001038]> wx 0x05 # write one byte with given hex value
[0x00001038]> s 0x1044
[0x00001044]> wx 0x0a
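The same two patches can be sketched without a hex editor. The Python snippet below works on an in-memory copy of the NASM .stab bytes from the od dump rather than on the file itself; the function name is hypothetical, and the offsets are the ones derived above.

```python
import struct

STAB_FILE_OFFSET = 0x100C  # file offset of .stab, from `objdump -h nasm`

# .stab section of the NASM binary (0x48 bytes), copied from the od output.
NASM_STAB = bytearray.fromhex(
    "01 00 00 00 00 00 05 00 0b 00 00 00 01 00 00 00"
    "64 00 00 00 00 10 40 00 00 00 00 00 44 00 05 00"
    "00 10 40 00 00 00 00 00 44 00 06 00 0a 10 40 00"
    "00 00 00 00 44 00 07 00 14 10 40 00 00 00 00 00"
    "64 00 00 00 00 00 00 00"
)


def patch_n_value(section, file_offset, new_value):
    """Overwrite the 4-byte little-endian n_value stored at file_offset."""
    struct.pack_into("<I", section, file_offset - STAB_FILE_OFFSET, new_value)


patch_n_value(NASM_STAB, 0x1038, 0x401005)  # line 6
patch_n_value(NASM_STAB, 0x1044, 0x40100A)  # line 7

start = 0x1038 - STAB_FILE_OFFSET
print(NASM_STAB[start:start + 4].hex())
```

On the real binary, the same packed bytes would be written at the same file offsets after opening the file in binary read/write mode.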
After saving, a quick objdump -G confirms our changes were successful, and indeed addr2line now seems to work:
$ addr2line -e nasm 0x401005
nasm.nasm:6
$ addr2line -e nasm 0x40100a
nasm.nasm:7
Sadly, GDB still won’t print our source code listing :|
But we still have the confusing additional STABS entry left in our .stab section:
Symnum n_type n_othr n_desc n_value n_strx String
4 SO 0 0 0000000000000000 0
What if we could resize the section and thus ignore this bogus entry? That is, instead of having a 0x48 byte sized .stab section, reduce it by 12 bytes to get a 0x3c sized section (just as with the GAS produced file)?
We can invoke either on the system shell
$ rabin2 -O 'r/.stab/0x3c' nasm
or run
[0x00000000]> iO r/.stab/0x3c
from the radare2 shell 8. For me, this didn’t alter the file though, so we need to do some more magic, doing it all manually.
Wikipedia has a nice overview of the offsets in the ELF header. The ELF header itself points to the section header table (which may reside almost anywhere in the binary). Specifically, the field e_shoff at offset 0x28 of the 64-bit ELF header contains the file offset at which the section header table starts.
We will now print the 8 Byte address as quad-word 9:
[0x00000000]> s 0x28
[0x00000028]> pxq 8
0x00000028 0x0000000000001148 H.......
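For readers without radare2 at hand, the lookup amounts to reading one little-endian 64-bit integer. The Python sketch below shows this on a stand-in header; the function name is made up, and only the e_shoff field of the fake header is filled in, with the value printed above.

```python
import struct

E_SHOFF_OFFSET = 0x28  # position of e_shoff in a 64-bit ELF header


def read_e_shoff(elf_header):
    """Return e_shoff, the file offset of the section header table."""
    (e_shoff,) = struct.unpack_from("<Q", elf_header, E_SHOFF_OFFSET)
    return e_shoff


# Stand-in for the first 64 bytes of the file. A real header has the
# magic number and many other fields set; they are irrelevant here.
fake_header = bytearray(64)
struct.pack_into("<Q", fake_header, E_SHOFF_OFFSET, 0x1148)

print(hex(read_e_shoff(fake_header)))
```

With the real binary, the first 64 bytes read from the file would take the place of fake_header.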
We then seek to this address and can print a small hexdump again. It is the binary representation of the section header table which we pretty-printed earlier using objdump -h.
Again, from Wikipedia we can gather that each entry in the section header table is 64 (0x40) bytes in size. With the default settings of our hex editor, this amounts to four lines for each entry in the table.
[0x00000028]> s 0x1148
[0x00001148]> px
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00001148 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00001158 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00001168 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00001178 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x00001188 1b00 0000 0100 0000 0600 0000 0000 0000 ................
0x00001198 0010 4000 0000 0000 0010 0000 0000 0000 ..@.............
0x000011a8 0c00 0000 0000 0000 0000 0000 0000 0000 ................
0x000011b8 1000 0000 0000 0000 0000 0000 0000 0000 ................
0x000011c8 2100 0000 0100 0000 0000 0000 0000 0000 !...............
0x000011d8 0000 0000 0000 0000 0c10 0000 0000 0000 ................
0x000011e8 4800 0000 0000 0000 0300 0000 0000 0000 H...............
0x000011f8 0400 0000 0000 0000 0c00 0000 0000 0000 ................
0x00001208 2700 0000 0300 0000 0000 0000 0000 0000 '...............
0x00001218 0000 0000 0000 0000 5410 0000 0000 0000 ........T.......
0x00001228 0b00 0000 0000 0000 0000 0000 0000 0000 ................
0x00001238 0100 0000 0000 0000 0000 0000 0000 0000 ................
The first four lines containing zeroes signify an empty entry. The second four lines are of no interest to us, but the third group contains the byte 0x48 (the “wrong” size of our .stab section!). So the third entry, starting at 0x11c8, is likely to be our .stab entry. Indeed, Wikipedia documents that sh_size is at offset 0x20 of an entry, which means that the 8 bytes starting at 0x11e8 really encode our size!
[0x00001148]> s 0x11e8
[0x000011e8]> wx 0x3c
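To double-check which field is being changed, the 64-byte entry can be unpacked field by field. The Python sketch below uses the Elf64_Shdr field order as documented on Wikipedia and the four hexdump lines starting at 0x11c8; the helper name is invented, and the patch is applied to an in-memory copy only.

```python
import struct

# Elf64_Shdr: sh_name, sh_type (u32), sh_flags, sh_addr, sh_offset,
# sh_size (u64), sh_link, sh_info (u32), sh_addralign, sh_entsize (u64).
SHDR = struct.Struct("<IIQQQQIIQQ")
SH_SIZE_OFFSET = 0x20

# The .stab entry of the NASM binary, copied from the hexdump above.
STAB_SHDR = bytearray.fromhex(
    "2100000001000000" "0000000000000000"
    "0000000000000000" "0c10000000000000"
    "4800000000000000" "0300000000000000"
    "0400000000000000" "0c00000000000000"
)


def stab_entry_count(shdr):
    """Number of stab entries, header included: sh_size / sh_entsize."""
    fields = SHDR.unpack(shdr)
    sh_size, sh_entsize = fields[5], fields[9]
    return sh_size // sh_entsize


print(stab_entry_count(STAB_SHDR))  # before the patch

# Same effect as the radare2 write: the upper seven bytes are already zero.
struct.pack_into("<Q", STAB_SHDR, SH_SIZE_OFFSET, 0x3C)
print(stab_entry_count(STAB_SHDR))  # after the patch
```

The entry size of 0xc found in sh_entsize matches the 12-byte records seen in the .stab dump.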
Checking with objdump -h and objdump -G confirms our change to have been successful. Now firing up GDB:
$ gdb nasm
Reading symbols from nasm...
(gdb) list _start
1 global _start
2
3 section .text
4
5 _start: mov eax, 60
6 mov edi, 0
7 syscall
Success! It does take almost three times as much “user” time, measured with time(1), though, so it probably still chokes a bit on the additional data in the section. Things do look much better now though.
In the quest of further understanding the weird additional entry, let’s compare the NASM output to the one created by yet another assembler (mostly compatible with NASM input):
$ yasm -f elf64 -g -F stabs -o yasm.o nasm.nasm
$ ld -o yasm yasm.o
$ objdump -G yasm
Symnum n_type n_othr n_desc n_value n_strx String
-1 HdrSym 0 6 0000000000000015 1
0 SO 0 0 0000000000401000 1 nasm.nasm
1 FUN 0 0 0000000000401000 11 _start:F1
2 SLINE 0 5 0000000000000000 0
3 SLINE 0 6 0000000000000005 0
4 SLINE 0 7 000000000000000a 0
5 SO 0 0 000000000040100c 0
We do recognize the final line – in this case, however, it does contain a more meaningful value than 0x0.
So we undo any modifications we did before and instead of deleting the last entry in the STABS table or resizing the STABS section, we simply modify it to also contain this 0x40100c address, as YASM does:
[0x00000000]> s 0x1050 # address of the last n_value in .stab
[0x00001050]> wx 0x0c1040 # write little-endian encoded address.
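The address YASM stores in its closing SO entry coincides with the first byte after .text, that is, VMA plus size as reported by objdump -h. Whether that is the intended meaning of the field is an assumption here; the Python sketch below, with a made-up function name, only shows how the value and its little-endian bytes come about.

```python
import struct

TEXT_VMA = 0x401000  # VMA of .text, from `objdump -h nasm`
TEXT_SIZE = 0x0C     # size of .text, from the same output


def closing_so_value(vma, size):
    """Address just past the section, plus its 4 little-endian bytes."""
    end = vma + size
    return end, struct.pack("<I", end)


end, raw = closing_so_value(TEXT_VMA, TEXT_SIZE)
print(hex(end), raw.hex())
```

The first three of these bytes are the ones written by the wx command above; the fourth byte in the file is already zero.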
And oh, indeed now gdb is able to list our code, even with correct line numbers! Except…
(gdb) break nasm.nasm:5
Breakpoint 1 at 0x401000: file nasm.nasm, line 5.
(gdb) break nasm.nasm:6
Breakpoint 2 at 0x40100a: file nasm.nasm, line 6.
(gdb) break nasm.nasm:7
Breakpoint 3 at 0x401014: file nasm.nasm, line 7.
Well, while in the source code listing GDB did somehow figure out that the line information provided in the STABS couldn’t be right, when setting breakpoints at specific line numbers it still gets confused. So we do need both edits: first correcting the addresses of the line entries, and second, either removing the additional SO entry or putting a more meaningful value there.
Without deeper knowledge it’s hard to judge whether GDB is at fault for not ignoring the last zero entry or whether such a thing is indeed illegal. However, the line number information that NASM computes seems to be wrong in any case, and YASM simply behaves better. Time to switch?
https://en.wikipedia.org/wiki/Sun_Microsystems#Acquisition_by_Oracle↩︎
The Intel syntax was invented by Intel specifically for the x86 microprocessor series, while the AT&T syntax was used by the company of the same name for other architectures as well. They differ in mnemonics (opcode names), use of prefixes and suffixes, and even operand order! The GNU objdump program can disassemble (-d) a binary and display the result in either syntax.↩︎
For those not acquainted with addr2line, we can specify an executable using the -e option, followed by an address in the binary, to find out which line in the original source code that address maps to – provided debug information is present.↩︎
In addition to the -d disassembly option, we use the -l option to list the corresponding source code lines.↩︎
A helpful look into the manual of objdump tells us that -G can be used to display the table of the STABS section.↩︎
Using the -h option to print all headers.↩︎
od(1) is the POSIX specified octal dump program. This is why we need so many additional options to force the output to actually be in hex. You can use the hexdump tool instead, if you feel fancy.↩︎
For more details on what binary operations are, in theory, possible, run rabin2 -O help.↩︎
radare2 has a built-in help mechanism, use p? in the shell to get more information on the p command.↩︎