HowToFindStackBugs


Here's our example program that we'll debug.

$ cat memover.c
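#include <stdio.h>

/* Deliberately buggy: the formatted string is far longer than buf. */
void a(void) {
  char buf[10];
  sprintf(buf, "$N gives %s a %s", "Dorkus", "big puking shiny sward o duum.");
}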

int main(int argc, char** argv) {
  char c[10];
  a();
  return 0;
}

$ gcc -g -o memover memover.c

When we run it, it crashes and dumps core. The bug is obvious from the code: the string that sprintf writes into buf is much larger than the 10 bytes buf can hold.

$ ./memover
Segmentation fault (core dumped)
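
For contrast, here is a minimal sketch of the same formatting done with a bounded write. The helper name format_gift is hypothetical and not part of memover.c; it only assumes the standard snprintf, which reports how long the full string would have been so the caller can detect truncation instead of overrunning the buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of memover.c: formats the same message
 * but never writes more than size bytes into dst.
 * Returns 1 if the whole message fit, 0 if it was truncated or failed. */
int format_gift(char *dst, size_t size, const char *who, const char *what)
{
    /* snprintf returns the length the full string would have needed,
     * not counting the terminating NUL. */
    int needed = snprintf(dst, size, "$N gives %s a %s", who, what);
    return needed >= 0 && (size_t)needed < size;
}
```

Called with a 10 byte buffer and the strings from memover.c, this returns 0 and leaves the stack intact. The full message needs 48 characters plus the terminating NUL.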

$ gdb -c core
Core was generated by `./memover'.
Program terminated with signal 11, Segmentation fault.
#0  0x69622061 in ?? ()

(gdb) file memover
Reading symbols from memover...done.

The problem with memory overruns on the stack is that we can't rely on the frame pointers in our core dump, as the overrun commonly trashes them:

(gdb) bt
#0  0x69622061 in ?? ()
Cannot access memory at address 0x2073756b

(gdb) i locals
No symbol table info available.

Looks hopeless, doesn't it, now that gdb has lost its way? The error above is obvious when you're looking right at it, but how do we find the offending source code in a large program when our core dump is screwed up? Well, first let's find out where our program code is.

(gdb) x _start
0x8048320 <_start>:     0x895eed31
(gdb) x _fini
0x804843c <_fini>:      0x53e58955

Our program code is found within the range 0x8048320-0x804843c.

How did I know to look for those symbol names? Well, you find that out by running the objdump utility on your executable.

$ objdump -t memover
memover:     file format elf32-i386

SYMBOL TABLE:
080480f4 l    d  .interp        00000000
08048108 l    d  .note.ABI-tag  00000000
08048128 l    d  .hash  00000000
08048158 l    d  .dynsym        00000000
080481c8 l    d  .dynstr        00000000
08048244 l    d  .gnu.version   00000000
08048254 l    d  .gnu.version_r 00000000
08048274 l    d  .rel.got       00000000
0804827c l    d  .rel.plt       00000000
0804829c l    d  .init  00000000
080482cc l    d  .plt   00000000
08048320 l    d  .text  00000000               ----------- our code is called .text 
0804843c l    d  .fini  00000000               --------- end of code here 
08048460 l    d  .rodata        00000000
...blah blah removed ....
08048320 g       .text  00000000              _start          --------- here's the symbol
08049598 g     O *ABS*  00000000              __bss_start
080483f4 g     F .text  00000011              main            ---------- our main
080482fc       F *UND*  00000105              __libc_start_main@@GLIBC_2.0
080494b8  w      .data  00000000              data_start
0804843c g     F .fini  00000000              _fini           -------- end of code
08049598 g     O *ABS*  00000000              _edata
080494d8 g     O .got   00000000              _GLOBAL_OFFSET_TABLE_
080495b0 g     O *ABS*  00000000              _end
080483d0 g     F .text  00000023              a                -------- ours too
08048464 g     O .rodata        00000004              _IO_stdin_used
0804830c       F *UND*  00000024              sprintf@@GLIBC_2.0  ------ library calls we called 
...more blah blah removed ....

Why do we need to know that? Well, we can look at the registers in our core and pick out a few meaningful ones. Knowing the range our program is loaded at and where the stack usually lives, we can also get a sense of which register values are valid.

(gdb) i registers
eax            0x30     48
ecx            0x40071d14       1074208020
edx            0xbffffbf8       -1073742856
ebx            0x4010b1ec       1074835948
esp            0xbffffcdc       -1073742628   ------ stack pointer is probably valid!  
                                              ------ This should be a high address on gcc 2.95+ on linux that is...  
                                              ------ systems that use alloca like old bsd and cygwin will often
                                              ------ have a low stack address below the program code
ebp            0x2073756b       544437611     ------ ebp the base frame pointer (this is trashed)
                                              ------ we know this because backtrace is fubared
                                              ------ and because it should be pointing at the stack higher than ESP
                                              ------- (greater than 0xbffffcdc or thereabouts)
esi            0x4000ae60       1073786464
edi            0xbffffd34       -1073742540
eip            0x69622061       1768038497    ------ eip is where we are executing (this is trashed too - ascii)
                                              ------- unscramble it right to left and it reads "a bi"  hmmm...
eflags         0x10282  66178
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x2b     43
gs             0x2b     43
cwd            0x0      0
swd            0x0      0
twd            0x0      0
fip            0x0      0
fcs            0x0      0
fopo           0x0      0
fos            0x0      0
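
The sanity check applied to the registers above can be written down as a small sketch. The code bounds are the _start and _fini addresses from this session. The stack bounds are only an assumption about where a 32-bit Linux stack of this era tends to live, and classify_addr is a hypothetical name used for illustration.

```c
#include <stdint.h>

/* Bounds taken from the gdb session: _start and _fini. */
#define TEXT_START 0x08048320u
#define TEXT_END   0x0804843cu
/* Assumption: on this 32-bit Linux box the stack sits just below
 * 0xc0000000; the lower bound is a loose guess, not a kernel constant. */
#define STACK_LOW  0xbf000000u
#define STACK_HIGH 0xc0000000u

enum addr_kind { ADDR_CODE, ADDR_STACK, ADDR_OTHER };

/* Classifies a register value as pointing into our code, into the
 * assumed stack region, or somewhere else. */
enum addr_kind classify_addr(uint32_t addr)
{
    if (addr >= TEXT_START && addr < TEXT_END)
        return ADDR_CODE;
    if (addr >= STACK_LOW && addr < STACK_HIGH)
        return ADDR_STACK;
    return ADDR_OTHER;
}
```

With the values from this core, esp (0xbffffcdc) classifies as stack, while ebp (0x2073756b) and eip (0x69622061) land in neither region, which is what marks them as trashed.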

Let's see what's on the stack where ESP points. Display the memory at ESP as strings.

(gdb) x/10s 0xbffffcdc
0xbffffcdc:      "g puking shiny sward o duum."  ----------- wow a clue.  grep for it in the code.  
0xbffffcf9:      "² +h8\001@\001"
0xbffffd02:      ""
0xbffffd03:      ""
0xbffffd04:      " \203\004\b"
0xbffffd09:      ""
0xbffffd0a:      ""
0xbffffd0b:      ""
0xbffffd0c:      "A\203\004\b(\203\004\b\001"
0xbffffd16:      ""
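
The trick of reading a trashed register as ASCII, as done for EIP in the register dump, can be sketched as follows. The function name word_to_ascii is hypothetical. It assumes a little-endian machine such as x86, where the lowest byte of a word comes first in memory.

```c
#include <stdint.h>

/* Writes the four bytes of word, lowest byte first (the order they sit in
 * memory on little-endian x86), into out as a NUL-terminated string.
 * Non-printable bytes are shown as '.'. out must hold at least 5 chars. */
void word_to_ascii(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((word >> (8 * i)) & 0xffu);
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

Fed the eip value 0x69622061 it yields "a bi", and the ebp value 0x2073756b yields "kus ". Both are fragments of the string that sprintf wrote past the end of buf.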

Okay, assuming we still didn't have a clue, let's display the stack in words.

(gdb) x /20w 0xbffffcdc
0xbffffcdc:     0x75702067      0x676e696b      0x69687320      0x7320796e
0xbffffcec:     0x64726177      0x64206f20      0x2e6d7575      0xbffffd00* --- looks good here (PUSH of prior EBP?)  
0xbffffcfc:     0x40013868      0x00000001      0x08048320**    0x00000000
0xbffffd0c:     0x08048341      0x080483f4      0x00000001      0xbffffd34
0xbffffd1c:     0x0804829c      0x0804843c      0x4000ae60      0xbffffd2c

As we go up the stack we're hoping to find some data that looks valid. Bingo: 0x08048320 falls inside our code range, so it looks like the last valid EIP of our program that is on the stack.
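
Scanning by eye works for 20 words, but on a bigger dump the same search could be automated. The following is a rough sketch, not a gdb feature: find_code_words is a hypothetical helper that takes stack words copied out of the dump and reports which ones fall inside a given code range.

```c
#include <stddef.h>
#include <stdint.h>

/* Scans n stack words and records the index of each word whose value lies
 * in [lo, hi). Stores at most max_hits indexes in hits and returns the
 * total number of matches found. */
size_t find_code_words(const uint32_t *words, size_t n,
                       uint32_t lo, uint32_t hi,
                       size_t *hits, size_t max_hits)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (words[i] >= lo && words[i] < hi) {
            if (count < max_hits)
                hits[count] = i;
            count++;
        }
    }
    return count;
}
```

Run over the 20 words shown above with the range 0x8048320-0x804843c, it picks out the words at offsets 10, 12 and 13: 0x08048320, 0x08048341 and 0x080483f4. The last of these is the address objdump lists for main.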

(gdb) disass 0x08048320
Dump of assembler code for function _start:
0x8048320 <_start>:     xor    %ebp,%ebp
0x8048322 <_start+2>:   pop    %esi
0x8048323 <_start+3>:   mov    %esp,%ecx
0x8048325 <_start+5>:   and    $0xfffffff8,%esp

It's a shame, as that happens to be _start, which doesn't narrow things down. Had we had a much bigger program with many more functions nested a mite deeper, that might have been helpful. Merc muds don't nest too deeply anyway, so depending on how bad your overflow was, that might not help.

Learn to use gdb: explore, play around, and learn how the real code works under the covers of the high level language. I'm certain there are quicker approaches to solving this problem.


Last edited September 13, 2005 9:32 pm by JonLambert
All material on this Wiki is the property of the contributing authors.
©2004-2006 by the contributing authors.