
Learning C with GDB

A translation of Alan O'Donnell's article "Learning C with GDB".

Coming from higher-level languages like Ruby, Scheme, or Haskell, learning C can be a daunting task. In addition to wrestling with C's low-level features, such as manual memory management and pointers, you also have to do without a REPL. Once you get used to exploratory programming in a REPL, going back to the write-compile-run cycle is a bit of a letdown.

Recently it occurred to me that I could use GDB as a pseudo-REPL for C. I experimented using GDB as a tool for learning the language, not just for debugging, and it turned out that it was a lot of fun.

The purpose of this post is to show you that GDB is a great tool for learning C. I will introduce you to a few of my favorite GDB commands and demonstrate how you can use GDB to understand one of the trickier parts of C: the difference between arrays and pointers.

Introduction to GDB


Let's start by creating the following small C program - minimal.c :

    int main()
    {
        int i = 1337;
        return 0;
    }

Note that the program does absolutely nothing and does not contain a single printf call. Now let's dive into the new world of learning C with GDB.

Compile the program with the -g flag so that GDB has debug information to work with, then load the resulting executable into GDB:

    $ gcc -g minimal.c -o minimal
    $ gdb minimal

You should now find yourself at the GDB prompt. I promised you a REPL, so here it is:

    (gdb) print 1 + 2
    $1 = 3

Amazing! print is a built-in GDB command that evaluates a C expression. If you are not sure what a particular GDB command does, ask for help: type help name-of-the-command at the GDB prompt.

Here is a more interesting example:

    (gdb) print (int) 2147483648
    $2 = -2147483648

I will skip explaining why 2147483648 == -2147483648. The point is that even arithmetic can be tricky in C, and GDB understands C arithmetic.

Now let's set a breakpoint in the main function and run the program:

    (gdb) break main
    (gdb) run

The program is now paused on line 3, just before the variable i is initialized. Interestingly, even though i has not been initialized yet, we can already look at its value with the print command:

    (gdb) print i
    $3 = 32767

In C, the value of a local uninitialized variable is undefined, so the result you received may differ.

We can execute the current line of code using the next command:

    (gdb) next
    (gdb) print i
    $4 = 1337

Examining memory with the x command


Variables in C label contiguous blocks of memory. Each variable's block is characterized by two numbers:

1. The numeric address of the first byte in the block.
2. Block size in bytes. This size is determined by the type of the variable.

One of the distinguishing features of the C language is that you have direct access to the variable memory block. The & operator gives us the address of a variable in memory, and sizeof calculates the size occupied by a variable in memory.

You can play with both possibilities in GDB:

    (gdb) print &i
    $5 = (int *) 0x7fff5fbff584
    (gdb) print sizeof(i)
    $6 = 4

In plain English, this means that the variable i is located at address 0x7fff5fbff584 and occupies 4 bytes of memory.

I mentioned above that the size of a variable in memory depends on its type, and indeed the sizeof operator can also be applied directly to types:

    (gdb) print sizeof(int)
    $7 = 4
    (gdb) print sizeof(double)
    $8 = 8

This means that at least on my machine, variables of type int occupy four bytes, and of type double - eight bytes.

GDB has a powerful tool for exploring memory directly: the x command. It examines memory starting at a given address, and it accepts a number of formatting options that give you precise control over how many bytes to examine and how to display them. When in doubt, type help x at the GDB prompt.

As you already know, the & operator yields the address of a variable, which means that you can pass &i to the x command and look at the individual bytes behind the variable i:

    (gdb) x/4xb &i
    0x7fff5fbff584: 0x39 0x05 0x00 0x00

The format flags say that I want four (4) values, displayed in hexadecimal (x), one byte (b) at a time. I asked for exactly four bytes because that is how much memory i occupies. The output shows the raw byte-by-byte representation of the variable in memory.

There is one subtlety to keep in mind when reading raw bytes: on Intel machines, bytes are stored in little-endian order, with the least significant byte first. This is the opposite of how people usually write numbers, where the least significant digit comes last. Here, 1337 is 0x00000539, so the low byte 0x39 is printed first.

One way to clarify this question is to assign a more interesting value to the variable i and check this memory area again:

    (gdb) set var i = 0x12345678
    (gdb) x/4xb &i
    0x7fff5fbff584: 0x78 0x56 0x34 0x12

Examining types with ptype


The ptype command is probably one of my favorites. It tells you the type of a C expression:

    (gdb) ptype i
    type = int
    (gdb) ptype &i
    type = int *
    (gdb) ptype main
    type = int (void)

Types in C can get complicated, but ptype lets you explore them interactively.

Pointers and arrays


Arrays are a surprisingly subtle concept in C. The plan for this section is to write a simple program and then poke at it in GDB until arrays start to make sense.

So, we need a program with an array, arrays.c:

    int main()
    {
        int a[] = {1, 2, 3};
        return 0;
    }

Compile it with the -g flag, run it in GDB, and use next to step past the initialization line:

    $ gcc -g arrays.c -o arrays
    $ gdb arrays
    (gdb) break main
    (gdb) run
    (gdb) next

At this stage, you can display the contents of the variable and find out its type:

    (gdb) print a
    $1 = {1, 2, 3}
    (gdb) ptype a
    type = int [3]

Now that our program is set up in GDB, the first thing to do is to use the x command to see what the variable a looks like under the hood:

    (gdb) x/12xb &a
    0x7fff5fbff56c: 0x01 0x00 0x00 0x00 0x02 0x00 0x00 0x00
    0x7fff5fbff574: 0x03 0x00 0x00 0x00

This means that the memory for the array a starts at address 0x7fff5fbff56c. The first four bytes hold a[0], the next four hold a[1], and the last four hold a[2]. Indeed, you can check that sizeof knows that a occupies exactly twelve bytes of memory:

    (gdb) print sizeof(a)
    $2 = 12

Up to this point, arrays look just like what you would expect. They have array types, and they store all of their values in adjacent memory locations. However, in certain situations arrays behave very much like pointers! For example, we can do pointer arithmetic on a:

    (gdb) print a + 1
    $3 = (int *) 0x7fff5fbff570

In plain English, this means that a + 1 is a pointer to an int and holds the address 0x7fff5fbff570. By now you should be reflexively passing pointers to the x command, so let's see what is there:

    (gdb) x/4xb a + 1
    0x7fff5fbff570: 0x02 0x00 0x00 0x00


Note that the address 0x7fff5fbff570 is exactly four more than 0x7fff5fbff56c, the address of the first byte of the array a. Given that an int occupies four bytes in memory, we can conclude that a + 1 points to a[1].

In fact, array indexing in C is syntactic sugar for pointer arithmetic: a[i] is equivalent to *(a + i). You can check this in GDB:

    (gdb) print a[0]
    $4 = 1
    (gdb) print *(a + 0)
    $5 = 1
    (gdb) print a[1]
    $6 = 2
    (gdb) print *(a + 1)
    $7 = 2
    (gdb) print a[2]
    $8 = 3
    (gdb) print *(a + 2)
    $9 = 3

So, we have seen that in some situations a behaves like an array, and in others like a pointer to its first element. What is going on?

The answer is this: when the name of an array is used in a C expression, it “decays” into a pointer to the array's first element. There are only two exceptions to this rule: when the array name is the operand of sizeof and when it is the operand of the address-of operator &.

The fact that the name a does not decay into a pointer to its first element when used with the & operator raises an interesting question: what is the difference between the pointer that a decays to and &a?

Numerically, they both represent the same address:

    (gdb) x/4xb a
    0x7fff5fbff56c: 0x01 0x00 0x00 0x00
    (gdb) x/4xb &a
    0x7fff5fbff56c: 0x01 0x00 0x00 0x00

However, their types are different. As we have already seen, the array name decays into a pointer to its first element and therefore must have type int *. As for the type of &a, we can ask GDB:

    (gdb) ptype &a
    type = int (*)[3]

Simply put, &a is a pointer to an array of three integers. This makes sense: a does not decay when it is the operand of &, and a has type int [3].

You can observe the difference between the pointer that a decays to and &a by looking at how they behave under pointer arithmetic:

    (gdb) print a + 1
    $10 = (int *) 0x7fff5fbff570
    (gdb) print &a + 1
    $11 = (int (*)[3]) 0x7fff5fbff578

Notice that adding 1 to a advances the address by four, while adding 1 to &a advances it by twelve.

The pointer that a actually decays to is &a[0]:

    (gdb) print &a[0]
    $12 = (int *) 0x7fff5fbff56c

Conclusion


I hope I have convinced you that GDB is a neat exploratory environment for learning C. It lets you print the value of expressions with the print command, examine raw memory byte by byte with the x command, and explore types with the ptype command.

If you plan to continue experimenting with learning C with GDB, then I have some suggestions:

1. Use GDB to work through The Ksplice Pointer Challenge.
2. Understand how structures are stored in memory. How do they compare to arrays?
3. Use GDB disassembler commands to better understand assembly programming. It is especially fun to explore how the function call stack works.
4. Check out GDB's “TUI” mode, which provides an ncurses-based interface on top of regular GDB. On OS X, you will probably have to build GDB from source.

Translator's note: as usual, please report any errors by private message. Constructive criticism is welcome.

Source: https://habr.com/ru/post/181738/

