PCAP programming

This text is a translation of Tim Carstens's 2002 article "Programming with pcap". There is not much information about PCAP on the Russian-language Internet. The translation is intended primarily for people who are interested in traffic capture but do not read English well. The translation itself follows.

Introduction

Let's start by defining who this article is written for. Some basic knowledge of C is necessary to understand the code in the article (unless, of course, you only want to understand the theory), but you don't need to be a programming ninja: where something is likely to be clear only to more experienced programmers, I will try to explain the concepts in detail. Some basic knowledge of how networks work will also help, given that PCAP is a library for implementing sniffing (translator's note: sniffing is the process of capturing network traffic, your own or someone else's). All the code examples presented here were tested on FreeBSD 4.3 with the default kernel.

Getting Started: The General Structure of a PCAP Application

The first thing you need to understand is the overall structure of the PCAP sniffer. It may look like this:

1. We begin by determining which interface we want to sniff on. On Linux this may be something like eth0, on BSD it may be xl1, and so on. We can either specify the device name in a string ourselves or ask PCAP to provide one.
2. Next, we initialize PCAP. This is where we tell PCAP which device we will work with. If necessary, we can sniff on multiple devices. To tell them apart, we use session descriptors. Just as when working with files, we need to name our capture session so that we can distinguish it from other such sessions.
3. If we want to capture only specific traffic (for example, only TCP/IP packets, or only packets going to port 23), we must create a rule set, "compile" it, and apply it to a specific session. This is a three-phase, closely related process. The rule set starts out as a string and is then compiled into a format that PCAP can read. The compilation is done by calling a function inside our program; it does not involve any external application. We then tell PCAP to apply the filter to whichever session we want.
4. Finally, we tell PCAP to start capturing traffic. When pcap_loop() is used, PCAP keeps working until it has received as many packets as we asked for. Every time it gets a new packet, it calls a function that we have defined. That function can do anything we want: it can dissect the packet and print information for the user, save the packet to a file, or do nothing at all.
5. Once we have finished capturing, we close the session.

This is actually a very simple process: five steps in total, one of which (step 3) is optional. Let's look at each step and how to implement it.

Device definition

This is terribly easy. There are two ways to specify the device we want to sniff on.

The first is to simply allow the user to tell the program the name of the device from which he wants to capture traffic. Consider this code:

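    #include <stdio.h>
    #include <pcap.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s <device>\n", argv[0]);
            return(2);
        }
        char *dev = argv[1];    /* device name supplied by the user */

        printf("Device: %s\n", dev);
        return(0);
    }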

The user specifies the device by passing its name as the first argument to the program. The string dev now holds the name of the interface we will sniff on, in a format that PCAP understands (assuming, of course, that the user gave us a real interface name).

The second method is also very simple. Let's take a look at the program:

    #include <stdio.h>
    #include <pcap.h>

    int main(int argc, char *argv[])
    {
        char *dev, errbuf[PCAP_ERRBUF_SIZE];

        dev = pcap_lookupdev(errbuf);
        if (dev == NULL) {
            fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
            return(2);
        }
        printf("Device: %s\n", dev);
        return(0);
    }

In this case, PCAP simply chooses the device on its own. "But wait, Tim," you say. "What is the errbuf string for?" Most PCAP functions let us pass them a string as one of their arguments. For what purpose? If the function fails, PCAP writes a description of the error into that string. Here, if pcap_lookupdev() fails, the error message is placed in errbuf. Nifty, isn't it? And that is how we choose the device to capture traffic on.
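The errbuf convention is easy to mimic in plain C. The sketch below is not libpcap code: find_device_sketch() and ERRBUF_SIZE_SKETCH are hypothetical names invented for illustration, and the single hard-coded device is an assumption. It shows only the pattern described above, in which the caller owns the buffer and the callee fills it in when something goes wrong.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PCAP_ERRBUF_SIZE; not a libpcap constant. */
#define ERRBUF_SIZE_SKETCH 256

/*
 * Hypothetical lookup function (not part of libpcap). It returns the
 * device name on success. On failure it returns NULL and writes a
 * description of the problem into the caller's buffer.
 */
const char *find_device_sketch(const char *wanted, char *errbuf)
{
    /* Pretend that this is the only interface on the machine. */
    static const char only_device[] = "eth0";

    if (wanted != NULL && strcmp(wanted, only_device) == 0)
        return only_device;

    snprintf(errbuf, ERRBUF_SIZE_SKETCH, "no such device: %s",
             wanted != NULL ? wanted : "(none)");
    return NULL;
}
```

A caller declares char errbuf[ERRBUF_SIZE_SKETCH], passes it in, and reads it only when the return value is NULL, just as the pcap_lookupdev() example does.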

Device setup for sniffing

The task of creating a traffic capture session is also very simple. For this we will use the pcap_open_live() function. The prototype of this function:

 pcap_t *pcap_open_live(char *device, int snaplen, int promisc, int to_ms, char *ebuf)

The first argument is the device name that we determined in the previous section. snaplen is an integer that specifies the maximum number of bytes PCAP captures from each packet. promisc, when set to true, puts the interface into promiscuous mode (even when it is set to false, the interface may still be in promiscuous mode in certain cases). to_ms is the read timeout in milliseconds. A value of 0 means no timeout; on at least some platforms this means you may wait until a sufficient number of packets have arrived before seeing any of them, so you should use a non-zero timeout. Finally, ebuf is a string in which error messages can be stored (just as we did with errbuf). The function returns the session handle.

For demonstration, consider this code snippet:

    #include <pcap.h>
    ...
    pcap_t *handle;

    handle = pcap_open_live(dev, BUFSIZ, 1, 1000, errbuf);
    if (handle == NULL) {
        fprintf(stderr, "Couldn't open device %s: %s\n", dev, errbuf);
        return(2);
    }

This code opens the device named in the dev variable and tells PCAP to read up to BUFSIZ bytes of each packet (BUFSIZ is a constant from stdio.h, which becomes available through pcap.h). We ask it to put the device into promiscuous mode and to use a read timeout of 1000 milliseconds. If there is an error, its description is stored in errbuf, and we use that string to print a message explaining what went wrong.

A note about promiscuous versus non-promiscuous sniffing: the two techniques are very different in style. Normally an interface is in non-promiscuous mode, and a sniffer captures only traffic that is directly related to the host: traffic sent from it, to it, or routed through it. Promiscuous mode, by contrast, captures all traffic on the wire. In a non-switched environment, this can be all network traffic. The obvious advantage is that more packets are available for capture, which may or may not be useful depending on why you are sniffing. However, there are disadvantages. Promiscuous sniffing is detectable: one host can test, with considerable reliability, whether another host is sniffing promiscuously. It also works only in a non-switched environment (for example, on a hub, or on a switch that is being ARP-flooded). Finally, on high-traffic networks the host may not have enough system resources to capture and analyze every packet.

Not all devices provide the same link-layer headers in the packets you read. Ethernet devices, and some non-Ethernet devices, may provide Ethernet headers, but other device types, such as loopback devices on BSD and OS X, PPP interfaces, and Wi-Fi interfaces in monitor mode, do not.

You need to determine the type of link-layer headers that the device provides and use that type when analyzing the contents of the packets. pcap_datalink() returns the link-layer header type. (See the list of link-layer header type values; the values returned are the DLT_ values in that list.)

If your program does not support the link-layer header type provided by the device, it has to give up, with code such as this:

    if (pcap_datalink(handle) != DLT_EN10MB) {
        fprintf(stderr, "Device %s doesn't provide Ethernet headers - not supported\n", dev);
        return(2);
    }

which fails if the device does not supply Ethernet headers. This is appropriate for the code below, which assumes Ethernet headers.
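One way to act on the value returned by pcap_datalink() is a small lookup from link-layer type to header length. The sketch below is an illustration, not libpcap code: link_header_len_sketch() is a hypothetical helper, and the three SK_DLT_ constants are local copies of the corresponding DLT_ values so that the block builds without pcap.h. The lengths are the fixed header sizes for BSD loopback (4 bytes), Ethernet (14 bytes), and Linux cooked capture (16 bytes).

```c
/* Local copies of three DLT_ values so the sketch builds without pcap.h.
 * In a real program, use the DLT_ constants from pcap.h instead. */
enum {
    SK_DLT_NULL      = 0,    /* BSD loopback */
    SK_DLT_EN10MB    = 1,    /* Ethernet */
    SK_DLT_LINUX_SLL = 113   /* Linux "cooked" capture */
};

/* Returns the link-layer header length in bytes, or -1 if the type
 * is not one this sketch knows how to handle. */
int link_header_len_sketch(int dlt)
{
    switch (dlt) {
    case SK_DLT_NULL:      return 4;
    case SK_DLT_EN10MB:    return 14;
    case SK_DLT_LINUX_SLL: return 16;
    default:               return -1;
    }
}
```

A program that supports several link types could use the returned length in place of a fixed Ethernet header size, and refuse to continue when it gets -1.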

Traffic filtering

Often we are interested in capturing only a certain type of traffic. For example, sometimes all we want is to sniff port 23 (telnet) in search of passwords. Perhaps we want to intercept a file being sent over port 21 (FTP). Maybe we only want DNS traffic (UDP port 53). Whatever the case, we rarely want to blindly capture all network traffic. This is where pcap_compile() and pcap_setfilter() come in.

The process is quite simple. After we have called pcap_open_live() and have a working sniffing session, we can apply our filter. Why not just use ordinary if / else if statements? Two reasons. First, a PCAP filter is far more efficient, because the filtering is done directly by the BPF driver and so needs far fewer resources. Second, PCAP filters are simply easier to write.
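For comparison, here is a rough sketch of what the hand-written alternative has to do for the TCP half of "port 23". It assumes the caller has already located the TCP header inside the captured bytes (locating it is covered later in the article). The function names are hypothetical and are not part of libpcap.

```c
#include <stddef.h>

/* Read a 16-bit big-endian (network order) value from two bytes. */
unsigned read_be16_sketch(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/*
 * Hypothetical user-space check: does this TCP header use the given
 * port as source or destination? tcp points at the first byte of the
 * TCP header and avail is the number of captured bytes from there on.
 */
int tcp_port_matches_sketch(const unsigned char *tcp, size_t avail,
                            unsigned port)
{
    if (avail < 4)          /* need both 16-bit port fields */
        return 0;
    return read_be16_sketch(tcp) == port ||
           read_be16_sketch(tcp + 2) == port;
}
```

Even this small check has to worry about byte order and truncated captures, and it runs only after every packet has already been copied to the program. The one-line filter string avoids both problems.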

Before applying our filter, we must compile it. The filter expression is kept in a regular string (a char array). The syntax is documented quite well in the tcpdump man page; I leave it to you to read on your own. We will use simple test expressions, so perhaps you are sharp enough to figure out the syntax from the examples.

To compile the filter, we call the pcap_compile() function. The prototype defines this function as:

 int pcap_compile(pcap_t *p, struct bpf_program *fp, char *str, int optimize, bpf_u_int32 netmask)

The first argument is our session descriptor (pcap_t *handle in the previous example). The next is a pointer to the place where the compiled version of the filter will be stored. Then comes the expression itself, as an ordinary string. After that is an integer that decides whether the expression should be optimized (0 means no, 1 means yes). Finally, we must specify the network mask of the network the filter applies to. The function returns -1 on failure; all other values mean success.

After the filter has been compiled, it is time to apply it by calling pcap_setfilter(). Following our usual format for explaining PCAP, here is its prototype:

 int pcap_setfilter(pcap_t *p, struct bpf_program *fp)

It is very straightforward and simple. The first argument is our session descriptor, the second is a pointer to the compiled version of our filter (it should be the same variable as in the previous pcap_compile() function).

Perhaps this example will help you understand better:

Example: defining, compiling, and applying a PCAP filter

    #include <pcap.h>
    ...
    pcap_t *handle;                 /* Session handle */
    char dev[] = "rl0";             /* Device to sniff on */
    char errbuf[PCAP_ERRBUF_SIZE];  /* Error string */
    struct bpf_program fp;          /* The compiled filter expression */
    char filter_exp[] = "port 23";  /* The filter expression */
    bpf_u_int32 mask;               /* The netmask of our sniffing device */
    bpf_u_int32 net;                /* The IPv4 network number of our sniffing device */

    if (pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
        fprintf(stderr, "Can't get netmask for device %s\n", dev);
        net = 0;
        mask = 0;
    }
    handle = pcap_open_live(dev, BUFSIZ, 1, 1000, errbuf);
    if (handle == NULL) {
        fprintf(stderr, "Couldn't open device %s: %s\n", dev, errbuf);
        return(2);
    }
    /* The last argument is the netmask, so pass mask rather than net */
    if (pcap_compile(handle, &fp, filter_exp, 0, mask) == -1) {
        fprintf(stderr, "Couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(handle));
        return(2);
    }
    if (pcap_setfilter(handle, &fp) == -1) {
        fprintf(stderr, "Couldn't install filter %s: %s\n", filter_exp, pcap_geterr(handle));
        return(2);
    }

This program is configured to sniff traffic that passes through port 23, in promiscuous mode, on the rl0 device.

You may notice that the previous example contains a function we have not yet discussed. pcap_lookupnet() is a function that, given a device name, returns the IPv4 network number and the corresponding network mask (the network number is the IPv4 address ANDed with the network mask, so it contains only the network part of the address). This matters because we need to know the network mask in order to apply the filter.
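The AND relationship between address, mask, and network number can be checked with a few lines of arithmetic. This is a minimal sketch using plain host-order integers for illustration only; ipv4_sketch() and network_number_sketch() are hypothetical helpers, not libpcap functions.

```c
#include <stdint.h>

/* Pack four dotted-quad octets into one 32-bit value, a.b.c.d with a
 * in the most significant byte. Host-order arithmetic, for illustration. */
uint32_t ipv4_sketch(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* The network number keeps only the bits that the mask selects. */
uint32_t network_number_sketch(uint32_t addr, uint32_t mask)
{
    return addr & mask;
}
```

For example, the address 192.168.1.37 with the mask 255.255.255.0 gives the network number 192.168.1.0: the host part (37) is masked away.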

In my experience, this filter does not work on every operating system. In my test environment, I found that OpenBSD 2.9 with a default kernel supports this type of filter, but FreeBSD 4.3 with a default kernel does not. Your mileage may vary.

Real sniffing

At this point we have learned how to choose a device, prepare it for sniffing, and apply filters. Now it is time to capture some packets. There are two main techniques: we can capture a single packet at a time, or we can enter a loop that runs until n packets have been captured. We begin with capturing a single packet and then look at the loop-based methods. Here is the prototype for pcap_next():

 u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h)

The first argument is the session descriptor. The second is a pointer to a structure that holds general information about the packet: the time at which it was captured, the length of the packet, and the length of the portion that was actually captured (which can be shorter than the packet itself, for example when snaplen cuts it off). pcap_next() returns a u_char pointer to the packet described by this structure. We will talk about reading packets later.

This is a demonstration of using pcap_next() to capture packets:

Capture one packet

    #include <pcap.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        pcap_t *handle;                 /* Session handle */
        char *dev;                      /* The device to sniff on */
        char errbuf[PCAP_ERRBUF_SIZE];  /* Error string */
        struct bpf_program fp;          /* The compiled filter */
        char filter_exp[] = "port 23";  /* The filter expression */
        bpf_u_int32 mask;               /* Our netmask */
        bpf_u_int32 net;                /* Our IPv4 network number */
        struct pcap_pkthdr header;      /* The header that PCAP gives us */
        const u_char *packet;           /* The actual packet */

        /* Define the device */
        dev = pcap_lookupdev(errbuf);
        if (dev == NULL) {
            fprintf(stderr, "Couldn't find default device: %s\n", errbuf);
            return(2);
        }
        /* Find the properties of the device */
        if (pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
            fprintf(stderr, "Couldn't get netmask for device %s: %s\n", dev, errbuf);
            net = 0;
            mask = 0;
        }
        /* Open the session in promiscuous mode */
        handle = pcap_open_live(dev, BUFSIZ, 1, 1000, errbuf);
        if (handle == NULL) {
            fprintf(stderr, "Couldn't open device %s: %s\n", dev, errbuf);
            return(2);
        }
        /* Compile and apply the filter (the last argument is the netmask) */
        if (pcap_compile(handle, &fp, filter_exp, 0, mask) == -1) {
            fprintf(stderr, "Couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(handle));
            return(2);
        }
        if (pcap_setfilter(handle, &fp) == -1) {
            fprintf(stderr, "Couldn't install filter %s: %s\n", filter_exp, pcap_geterr(handle));
            return(2);
        }
        /* Grab a packet; pcap_next() returns NULL if none arrived in time */
        packet = pcap_next(handle, &header);
        if (packet == NULL) {
            fprintf(stderr, "No packet captured\n");
            pcap_close(handle);
            return(2);
        }
        /* Print its length */
        printf("Jacked a packet with length of [%u]\n", header.len);
        /* And close the session */
        pcap_close(handle);
        return(0);
    }

This application sniffs on whatever device pcap_lookupdev() returns, putting it into promiscuous mode. It captures the first packet that crosses port 23 (telnet) and tells the user the size of the packet in bytes. The program also includes a new call, pcap_close(), which we will discuss later (although it is fairly self-explanatory).

The second way to capture traffic is to use pcap_loop() or pcap_dispatch(). To understand how these two functions are used, we first need to understand the idea of a callback function.

The callback function is nothing new; callbacks are common in many APIs. The concept is very simple. Suppose a program is waiting for an event of some kind, say a keypress. Every time the user presses a key, the program calls a function to handle that keystroke. That function is a callback. PCAP uses callbacks too, but instead of being called when a key is pressed, they are called when PCAP captures a packet. Callbacks can be used only with pcap_loop() and pcap_dispatch(), which are very similar in this respect: each calls the callback every time a packet arrives that passes the filter (if there is a filter, of course; if not, every captured packet triggers the callback).

The pcap_loop() prototype is shown below:

 int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user)

The first argument is the session descriptor. Next comes an integer that tells pcap_loop() how many packets to capture before returning (a negative value means that it should keep going until an error occurs). The third argument is the name of the callback function (just its identifier, no parentheses). The last argument is useful in some applications, but in many cases it is simply set to NULL. Suppose we have arguments of our own that we want to pass to the callback function, in addition to those that pcap_loop() passes to it. The last argument is where we do that. Obviously, you must cast them to u_char * to make sure the results arrive correctly. As we will see later, PCAP makes use of some interesting ways of passing information in the form of a u_char *. Once we have shown an example of how PCAP does this, it should be obvious how to do it here. If not, consult a C reference text, since an explanation of pointers is beyond the scope of this document.

pcap_dispatch() is almost identical in usage. The only difference between pcap_dispatch() and pcap_loop() is that pcap_dispatch() processes only the first batch of packets it receives from the system, while pcap_loop() keeps processing packets or batches of packets until the count runs out. For a more in-depth discussion of the differences, see the official PCAP documentation.
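The role of the last argument can be shown without a network. In the sketch below, fake_loop() is a hypothetical stand-in for pcap_loop() that "captures" a fixed 4-byte fake packet; handler_sketch, capture_stats, and count_packet() are likewise invented for illustration, and the handler type leaves out the header argument to keep things short. The point is only the cast: the caller passes a pointer to its own state as an unsigned char pointer, and the callback casts it back.

```c
#include <stddef.h>

/* Hypothetical handler type, shaped like pcap_handler but with the
 * header argument left out to keep the sketch short. */
typedef void (*handler_sketch)(unsigned char *user,
                               const unsigned char *packet, size_t len);

/* Hypothetical stand-in for pcap_loop(): "captures" cnt fake packets
 * and hands each one to the callback together with the user pointer. */
int fake_loop(int cnt, handler_sketch callback, unsigned char *user)
{
    static const unsigned char fake_packet[] = { 0xde, 0xad, 0xbe, 0xef };
    int i;

    for (i = 0; i < cnt; i++)
        callback(user, fake_packet, sizeof fake_packet);
    return 0;
}

/* State that we want to reach from inside the callback. */
struct capture_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* The callback casts the opaque user pointer back to its real type. */
void count_packet(unsigned char *user, const unsigned char *packet, size_t len)
{
    struct capture_stats *stats = (struct capture_stats *)user;

    (void)packet;           /* contents are not needed for counting */
    stats->packets += 1;
    stats->bytes += len;
}
```

A caller would declare a struct capture_stats, zero it, and call fake_loop(3, count_packet, (unsigned char *)&stats). With the real library the same cast is applied to the last argument of pcap_loop().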

Before we give an example of using pcap_loop(), we need to look at the format of our callback function. We cannot define the callback's prototype arbitrarily; otherwise pcap_loop() would not know how to use it. So we use this format as the prototype for our callback function:

 void got_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *packet);

Let's examine this in more detail. First, the function has a void return type. This is logical, because pcap_loop() would not know what to do with a return value anyway. The first argument corresponds to the last argument of pcap_loop(): whatever value is passed as the last argument to pcap_loop() is passed as the first argument to our callback each time it is called. The second argument is the PCAP header, which contains information about when the packet was captured, how large it is, and so on. The pcap_pkthdr structure is defined in pcap.h as:

    struct pcap_pkthdr {
        struct timeval ts;   /* time stamp */
        bpf_u_int32 caplen;  /* length of the portion that was captured */
        bpf_u_int32 len;     /* length of the packet on the wire */
    };

These values should be fairly self-explanatory. The last argument is the most interesting of all, and the most confusing for a novice programmer. It is another pointer to u_char, and it points to the first byte of a chunk of data containing the entire packet as captured by pcap_loop().
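The two length fields matter as soon as you read through the packet pointer, because only the captured bytes are actually there. The sketch below uses a hypothetical lengths_sketch structure that mirrors just the caplen and len fields, so that it builds without pcap.h; the helper names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical mirror of the two length fields in struct pcap_pkthdr. */
struct lengths_sketch {
    uint32_t caplen;   /* bytes actually stored in the capture buffer */
    uint32_t len;      /* bytes the packet occupied on the wire */
};

/* A packet is truncated when fewer bytes were captured than were sent. */
int is_truncated(const struct lengths_sketch *h)
{
    return h->caplen < h->len;
}

/* Only the first caplen bytes behind the packet pointer may be read. */
int offset_is_readable(const struct lengths_sketch *h, uint32_t offset)
{
    return offset < h->caplen;
}
```

With a snaplen of 96, for instance, a 1514-byte frame would arrive with caplen 96 and len 1514, and any code that reads past byte 95 would be reading outside the captured data.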

But how do you make use of this variable (named packet in our prototype)? A packet contains many attributes, so, as you can imagine, it is not really a string but a collection of structures (for example, a TCP/IP packet has an Ethernet header, an IP header, a TCP header, and finally the payload). The u_char pointer points to the serialized version of these structures. To make any use of it, we must do some interesting typecasting.

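Before we can do any typecasting, we need to define the structures themselves. Below are the structure definitions that describe a TCP/IP packet over Ethernet.

Ethernet, IP, and TCP header structures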

    /* Ethernet addresses are 6 bytes */
    #define ETHER_ADDR_LEN 6

    /* Ethernet header */
    struct sniff_ethernet {
        u_char ether_dhost[ETHER_ADDR_LEN]; /* Destination host address */
        u_char ether_shost[ETHER_ADDR_LEN]; /* Source host address */
        u_short ether_type;                 /* IP? ARP? RARP? etc. */
    };

    /* IP header */
    struct sniff_ip {
        u_char ip_vhl;      /* version << 4 | header length >> 2 */
        u_char ip_tos;      /* type of service */
        u_short ip_len;     /* total length */
        u_short ip_id;      /* identification */
        u_short ip_off;     /* fragment offset field */
    #define IP_RF 0x8000        /* reserved fragment flag */
    #define IP_DF 0x4000        /* don't fragment flag */
    #define IP_MF 0x2000        /* more fragments flag */
    #define IP_OFFMASK 0x1fff   /* mask for fragmenting bits */
        u_char ip_ttl;      /* time to live */
        u_char ip_p;        /* protocol */
        u_short ip_sum;     /* checksum */
        struct in_addr ip_src,ip_dst; /* source and destination address */
    };
    #define IP_HL(ip)   (((ip)->ip_vhl) & 0x0f)
    #define IP_V(ip)    (((ip)->ip_vhl) >> 4)

    /* TCP header */
    typedef u_int tcp_seq;

    struct sniff_tcp {
        u_short th_sport;   /* source port */
        u_short th_dport;   /* destination port */
        tcp_seq th_seq;     /* sequence number */
        tcp_seq th_ack;     /* acknowledgement number */
        u_char th_offx2;    /* data offset, rsvd */
    #define TH_OFF(th)  (((th)->th_offx2 & 0xf0) >> 4)
        u_char th_flags;
    #define TH_FIN 0x01
    #define TH_SYN 0x02
    #define TH_RST 0x04
    #define TH_PUSH 0x08
    #define TH_ACK 0x10
    #define TH_URG 0x20
    #define TH_ECE 0x40
    #define TH_CWR 0x80
    #define TH_FLAGS (TH_FIN|TH_SYN|TH_RST|TH_ACK|TH_URG|TH_ECE|TH_CWR)
        u_short th_win;     /* window */
        u_short th_sum;     /* checksum */
        u_short th_urp;     /* urgent pointer */
    };

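So how does all of this relate to PCAP and our mysterious u_char pointer? Those structures describe the headers that appear in the packet data. So how do we break the packet apart? Get ready for one of the most practical uses of pointers.

Again, we assume that we are dealing with a TCP/IP packet over Ethernet. The same technique applies to any packet; the only difference is the structure types you actually use. So let's begin by defining the variables and compile-time constants we need to take the packet data apart.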

    /* Ethernet headers are always exactly 14 bytes */
    #define SIZE_ETHERNET 14

    const struct sniff_ethernet *ethernet; /* The Ethernet header */
    const struct sniff_ip *ip;             /* The IP header */
    const struct sniff_tcp *tcp;           /* The TCP header */
    const char *payload;                   /* Packet payload */

    u_int size_ip;
    u_int size_tcp;

    ethernet = (const struct sniff_ethernet *)(packet);
    ip = (const struct sniff_ip *)(packet + SIZE_ETHERNET);
    size_ip = IP_HL(ip)*4;
    if (size_ip < 20) {
        printf(" * Invalid IP header length: %u bytes\n", size_ip);
        return;
    }
    tcp = (const struct sniff_tcp *)(packet + SIZE_ETHERNET + size_ip);
    size_tcp = TH_OFF(tcp)*4;
    if (size_tcp < 20) {
        printf(" * Invalid TCP header length: %u bytes\n", size_tcp);
        return;
    }
    payload = (const char *)(packet + SIZE_ETHERNET + size_ip + size_tcp);

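How does this work? Consider the layout of the packet data in memory. A u_char pointer is really just a variable that holds an address in memory.

For the sake of simplicity, let's say that the address this pointer holds is X. If our three structures simply sit one after another, and the first of them, sniff_ethernet, is located at address X, then we can easily find the address of the structure that follows it. That address is X plus the length of the Ethernet header, which is 14, or SIZE_ETHERNET.

Similarly, once we have the address of that header, the address of the next structure is the address of the header plus its length. The IP header, unlike the Ethernet header, does not have a fixed length; its length is given by the header length field of the IP header as a count of 4-byte words. Because it is a count of 4-byte words, it must be multiplied by 4 to give the size in bytes. The minimum length of this header is 20 bytes.

The TCP header also has a variable length; it is given as a number of 4-byte words by the "data offset" field of the TCP header, and its minimum length is also 20 bytes.

So, let's make a chart: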

VARIABLE	LOCATION(in bytes)
sniff_ethernet	X
sniff_ip	X + SIZE_ETHERNET
sniff_tcp	X + SIZE_ETHERNET + {IP header length}
payload	X + SIZE_ETHERNET + {IP header length} + {TCP header length}

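The sniff_ethernet structure, being first in line, is simply at location X. sniff_ip, which follows directly after sniff_ethernet, is at X plus however much space the Ethernet header takes up (14 bytes, or SIZE_ETHERNET). sniff_tcp comes after both, so it is at X plus the sizes of the Ethernet and IP headers (14 bytes and 4 times the IP header length field, respectively). Finally, the payload (which has no single structure corresponding to it, since its contents depend on the protocol used on top of TCP) is located after all of them.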
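The chart translates directly into offset arithmetic. The sketch below indexes the bytes directly instead of casting to the sniff_ structures, which is a different technique from the one used in the article but computes the same offsets. payload_offset_sketch() is a hypothetical name, and the bounds checks against caplen are an addition that the article's fragment does not make.

```c
#include <stddef.h>

#define SIZE_ETHERNET 14   /* same constant as in the article */

/*
 * Returns the offset of the TCP payload inside an Ethernet frame that
 * carries IPv4 and TCP, or -1 if the headers are malformed or do not
 * fit into the caplen bytes that were captured.
 */
long payload_offset_sketch(const unsigned char *packet, size_t caplen)
{
    size_t size_ip, size_tcp;

    /* The IP header starts at X + SIZE_ETHERNET. */
    if (caplen < SIZE_ETHERNET + 20)
        return -1;
    size_ip = (size_t)(packet[SIZE_ETHERNET] & 0x0f) * 4;   /* IP_HL * 4 */
    if (size_ip < 20 || caplen < SIZE_ETHERNET + size_ip + 20)
        return -1;

    /* The data offset is the high nibble of byte 12 of the TCP header. */
    size_tcp = (size_t)(packet[SIZE_ETHERNET + size_ip + 12] >> 4) * 4;
    if (size_tcp < 20 || caplen < SIZE_ETHERNET + size_ip + size_tcp)
        return -1;

    return (long)(SIZE_ETHERNET + size_ip + size_tcp);
}
```

For a frame with minimal headers (IP header length field 5, TCP data offset 5) the payload starts at 14 + 20 + 20 = 54 bytes from X.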

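So at this point we know how to set up our callback function, have it called, and find out the attributes of the packet that was captured. Now comes the moment you have been waiting for: writing a useful packet sniffer. Because of the length of the source code, it is not included in the body of this document. Simply download sniffer.c and try it out.

Wrapping Up

At this point you should be able to write a sniffer using PCAP. You have learned the basic concepts behind opening a PCAP session, finding out general attributes about it, capturing packets, applying filters, and using callbacks. Now it is time to get out there and sniff those wires!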

This document is Copyright 2002 Tim Carstens. All rights reserved. Redistribution and use, with or without modification, are permitted provided that the following conditions are met:
Redistribution must retain the above copyright notice and this list of conditions.
The name of Tim Carstens may not be used to endorse or promote products derived from this document without specific prior written permission.
/ Insert 'wh00t' for the BSD license here /

Source: https://habr.com/ru/post/337840/
