Virtual network interface

It is well known that Linux drivers are kernel modules. All drivers are modules, but not all modules are drivers. An example of one such group of modules that are not drivers and that appear much less frequently in discussions is network filters at various levels of the Linux network stack.

Quite often it would be useful to have a network interface that works with the traffic of some other interface but additionally “colors” that traffic in some way. This may be needed for additional analysis, traffic control, encryption, and so on.

The idea is extremely simple: to channel the traffic of an already existing network interface to a newly created interface with completely different characteristics (name, IP, mask, subnet, ...). We will discuss one of the ways to perform such actions in the form of a Linux kernel module (it is not the only one, but we will discuss other methods separately another time).

About interfaces in a couple of words

It is clear from the start that we intend to attach the new interface, still to be created, to an already existing one. So let us briefly recall how interfaces are created (which is what, for example, the driver of any network adapter does), because several nuances are important for our purposes.
The network interface is where:

- For each packet received from the interface, an instance of the socket buffer structure (struct sk_buff) is created; this instance then travels up the protocol stack to its recipient in user space, where it is destroyed;
- Outgoing instances of struct sk_buff, created in the upper layers of the protocol stack on behalf of user space, must be sent out, after which the instances themselves are destroyed (or returned to a pool).

For at least the last 5-6 years, the network interfaces have been invariably created by the macro:

#define alloc_netdev( sizeof_priv, name, setup )

Here (the details will be clear from the example module):
- sizeof_priv - the size of the private data area of the interface (struct net_device), which the kernel creates without our direct participation;
- name - a character string, the interface name pattern;
- setup - the address of the interface initialization function.

In this almost unchanged form, the creation of an interface is described in publications and mentioned in discussions everywhere. But starting with kernel 3.17, the prototype of the interface creation macro is different (<linux/netdevice.h>):

 #define alloc_netdev( sizeof_priv, name, name_assign_type, setup )

As is easy to see, there are now 4 parameters instead of 3. The 3rd is a constant that records how the interface name was assigned (relevant when the name is generated from a pattern); the possible values are defined in the same header file:

 /* interface name assignment types (sysfs name_assign_type attribute) */
 #define NET_NAME_UNKNOWN      0  /* unknown origin (not exposed to userspace) */
 #define NET_NAME_ENUM         1  /* enumerated by kernel */
 #define NET_NAME_PREDICTABLE  2  /* predictably named by the kernel */
 #define NET_NAME_USER         3  /* provided by user-space */
 #define NET_NAME_RENAMED      4  /* renamed by user-space */

This is the first subtlety to pay attention to. We will not go into further detail here; it was only important to note it.
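The name pattern itself is an ordinary string containing a single %d, which the kernel later replaces with a free index. As a minimal user-space sketch (not kernel code), the pattern can be built with a bounds check. The helper make_ifname_pattern() and the constant IFNAMSIZ_MODEL are hypothetical names introduced for this illustration; the value 16 mirrors the kernel's IFNAMSIZ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the kernel's IFNAMSIZ (16 bytes, including the trailing NUL). */
#define IFNAMSIZ_MODEL 16

/*
 * Hypothetical helper: build a name pattern such as "virt%d" from a prefix.
 * Returns 0 on success, -1 if the prefix is empty, contains '%', or leaves
 * no room for at least one digit inside an interface name.
 */
int make_ifname_pattern(char *dst, size_t dst_len, const char *prefix)
{
    size_t n;

    assert(dst != NULL && prefix != NULL);
    n = strlen(prefix);

    if (n == 0 || strchr(prefix, '%') != NULL)
        return -1;
    /* prefix + one digit + NUL must fit into an interface name */
    if (n + 2 > IFNAMSIZ_MODEL)
        return -1;
    /* the pattern needs room for prefix + "%d" + NUL */
    if (n + 3 > dst_len)
        return -1;

    snprintf(dst, dst_len, "%s%%d", prefix);
    return 0;
}
```

Rejecting a '%' in the prefix keeps exactly one conversion in the resulting pattern, and the length check leaves space for at least a one-digit index.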

But the interface created in this way is not yet operational, it does not perform any actions. In order to “give life” to the created network interface, you need to implement an appropriate set of operations for it. All communication between the network interface and the operations performed on it is carried out through the network interface operation table:

 struct net_device_ops {
    int (*ndo_init)(struct net_device *dev);
    void (*ndo_uninit)(struct net_device *dev);
    int (*ndo_open)(struct net_device *dev);
    int (*ndo_stop)(struct net_device *dev);
    netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev);
    ...
    struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
    ...
 };

In kernel 3.09, for example, struct net_device_ops defines 39 operations, and in kernel 3.14 about 50, but real modules implement only a small part of them.

Characteristically, the interface operations table contains an operation for handing a socket buffer over to the physical medium, ndo_start_xmit(), but no operation at all for receiving packets (socket buffers). This is quite natural, as we will see shortly: received packets (for example, in the hardware interrupt handler) are placed in the kernel queue of received packets immediately after reception by calling netif_rx() (or netif_receive_skb()), and are then processed sequentially by the network stack. The ndo_start_xmit() function, however, must be implemented, at the very least to call the dev_kfree_skb() kernel API, which frees the socket buffer after a successful (or unsuccessful) transmission. If this is not done, the system leaks a little memory with every packet, which sooner or later leads to a system crash. This is another subtlety to keep in mind.
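The ownership rule behind this subtlety can be illustrated with a small user-space model (a sketch only, not kernel code). Here struct buf stands in for struct sk_buff and buf_free() plays the role of dev_kfree_skb(); all names in the block are hypothetical. A counter of live buffers makes the leak visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct sk_buff: only what the sketch needs. */
struct buf {
    size_t len;
};

int live_buffers;   /* buffers allocated and not yet freed */

struct buf *buf_alloc(size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (b) {
        b->len = len;
        live_buffers++;
    }
    return b;
}

/* Plays the role of dev_kfree_skb() in this model. */
void buf_free(struct buf *b)
{
    if (b) {
        assert(live_buffers > 0);
        free(b);
        live_buffers--;
    }
}

/* Correct transmit: takes ownership and frees the buffer on every path. */
int xmit_owning(struct buf *b, int link_up)
{
    int sent = link_up ? 1 : 0;   /* pretend the hardware sent it */

    buf_free(b);
    return sent;
}

/* Buggy transmit: forgets the buffer when the link is down. */
int xmit_leaky(struct buf *b, int link_up)
{
    if (!link_up)
        return 0;                 /* BUG: neither sent nor freed */
    buf_free(b);
    return 1;
}
```

With xmit_owning() the counter returns to zero whether or not the packet was sent; with xmit_leaky() every failed transmission leaves one buffer behind, which is the per-packet leak described above.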

The last element we need is the struct net_device structure (declared in <linux/netdevice.h>), which describes the network interface. This is a large structure containing not only the hardware description but also the configuration parameters of the network interface in relation to the underlying protocols (the example is taken from kernel 3.09):

 struct net_device {
    char name[ IFNAMSIZ ];
    ...
    unsigned long base_addr;   /* device I/O address */
    unsigned int irq;          /* device IRQ number */
    ...
    unsigned mtu;              /* interface MTU value */
    unsigned short type;       /* interface hardware type */
    ...
    struct net_device_stats stats;
    struct list_head dev_list;
    ...
    /* Interface address info. */
    unsigned char perm_addr[ MAX_ADDR_LEN ];   /* permanent hw address */
    unsigned char addr_len;                    /* hardware address length */
    ...
 };

Here, the type field, for example, defines the type of the hardware adapter in terms of the ARP mechanism for resolving MAC addresses (<linux/if_arp.h>):

 ...
 #define ARPHRD_ETHER      1    /* Ethernet 10Mbps */
 ...
 #define ARPHRD_ARCNET     7    /* ARCnet */
 ...
 #define ARPHRD_IEEE1394   24   /* IEEE 1394 IPv4 - RFC 2734 */
 ...
 #define ARPHRD_IEEE80211  801  /* IEEE 802.11 */

A private data structure (mentioned earlier) is usually created and associated with the network interface structure; in it the developer can place arbitrary data of any complexity related to the interface. This is especially useful if the driver is expected to create several network interfaces of the same type. Access to the private data structure should go exclusively through the specially defined netdev_priv() function. One possible form of this function is shown below: this definition is from kernel 3.09, and there is no guarantee that it will not change radically in another kernel:

 /* netdev_priv - access network device private data
  * Get network device private data
  */
 static inline void *netdev_priv( const struct net_device *dev )
 {
    return (char *)dev + ALIGN( sizeof( struct net_device ), NETDEV_ALIGN );
 }

As is easy to see from the definition, the private data structure is placed directly after the end of struct net_device. Appending extra data to a fixed header in a single allocation is a common C practice for variable-size structures that works already in C89 (C99 additionally standardized flexible array members for the same purpose).
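The same layout can be reproduced in user space as a minimal sketch: one allocation holds a fixed header, padding up to an alignment boundary, and then the private area. The names model_dev, model_alloc(), model_priv() and MODEL_ALIGN are hypothetical; MODEL_ALIGN merely plays the role of NETDEV_ALIGN, and its value of 32 is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_ALIGN 32   /* assumption: stands in for NETDEV_ALIGN */

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Toy stand-in for struct net_device. */
struct model_dev {
    char name[16];
    unsigned mtu;
};

/* One allocation: header, padding up to MODEL_ALIGN, then the private area. */
struct model_dev *model_alloc(size_t sizeof_priv)
{
    size_t total = ALIGN_UP(sizeof(struct model_dev), MODEL_ALIGN) + sizeof_priv;

    return calloc(1, total);   /* zero-filled for convenience */
}

/* Same pointer arithmetic as the netdev_priv() definition shown above. */
void *model_priv(const struct model_dev *dev)
{
    assert(dev != NULL);
    return (char *)dev + ALIGN_UP(sizeof(struct model_dev), MODEL_ALIGN);
}
```

Because the private area lives in the same allocation, a single free() of the header releases both, which matches how one free_netdev() call releases the interface together with its private data.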

This building material will be enough for us to build a virtual network interface module.

Virtual Interface Module

Let us create a module that intercepts the incoming and outgoing network traffic of another, already existing interface (physical or logical) and handles these streams (file virt.c).

module code

 #include <linux/module.h>
 #include <linux/version.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/moduleparam.h>
 #include <net/arp.h>

 #define ERR(...) printk( KERN_ERR "! "__VA_ARGS__ )
 #define LOG(...) printk( KERN_INFO "! "__VA_ARGS__ )

 static char* link = "eth0";       // name of the parent interface
 module_param( link, charp, 0 );

 static char* ifname = "virt";     // name prefix of the virtual interface
 module_param( ifname, charp, 0 );

 static struct net_device *child = NULL;

 struct priv {
    struct net_device_stats stats;
    struct net_device *parent;
 };

 static rx_handler_result_t handle_frame( struct sk_buff **pskb ) {
    struct sk_buff *skb = *pskb;
    if( child ) {
       struct priv *priv = netdev_priv( child );
       priv->stats.rx_packets++;
       priv->stats.rx_bytes += skb->len;
       LOG( "rx: injecting frame from %s to %s", skb->dev->name, child->name );
       skb->dev = child;
       return RX_HANDLER_ANOTHER;
    }
    return RX_HANDLER_PASS;
 }

 static int open( struct net_device *dev ) {
    netif_start_queue( dev );
    LOG( "%s: device opened", dev->name );
    return 0;
 }

 static int stop( struct net_device *dev ) {
    netif_stop_queue( dev );
    LOG( "%s: device closed", dev->name );
    return 0;
 }

 static netdev_tx_t start_xmit( struct sk_buff *skb, struct net_device *dev ) {
    struct priv *priv = netdev_priv( dev );
    priv->stats.tx_packets++;
    priv->stats.tx_bytes += skb->len;
    if( priv->parent ) {
       // log before queuing: the skb must not be touched after dev_queue_xmit()
       LOG( "tx: injecting frame from %s to %s", dev->name, priv->parent->name );
       skb->dev = priv->parent;
       skb->priority = 1;
       dev_queue_xmit( skb );
       return NETDEV_TX_OK;
    }
    dev_kfree_skb( skb );          // no parent to forward to: free the buffer
    return NETDEV_TX_OK;
 }

 static struct net_device_stats *get_stats( struct net_device *dev ) {
    return &( (struct priv*)netdev_priv( dev ) )->stats;
 }

 static struct net_device_ops crypto_net_device_ops = {
    .ndo_open = open,
    .ndo_stop = stop,
    .ndo_get_stats = get_stats,
    .ndo_start_xmit = start_xmit,
 };

 static void setup( struct net_device *dev ) {
    int j;
    ether_setup( dev );
    memset( netdev_priv( dev ), 0, sizeof( struct priv ) );
    dev->netdev_ops = &crypto_net_device_ops;
    for( j = 0; j < ETH_ALEN; ++j )   // fill in the MAC address with a phoney
       dev->dev_addr[ j ] = (char)j;
 }

 int __init init( void ) {
    int err = 0;
    struct priv *priv;
    char ifstr[ 40 ];
    sprintf( ifstr, "%s%s", ifname, "%d" );
 #if (LINUX_VERSION_CODE < KERNEL_VERSION(3, 17, 0))
    child = alloc_netdev( sizeof( struct priv ), ifstr, setup );
 #else
    child = alloc_netdev( sizeof( struct priv ), ifstr, NET_NAME_UNKNOWN, setup );
 #endif
    if( child == NULL ) {
       ERR( "%s: allocate error", THIS_MODULE->name );
       return -ENOMEM;
    }
    priv = netdev_priv( child );
    priv->parent = __dev_get_by_name( &init_net, link );   // parent interface
    if( !priv->parent ) {
       ERR( "%s: no such net: %s", THIS_MODULE->name, link );
       err = -ENODEV;
       goto err;
    }
    if( priv->parent->type != ARPHRD_ETHER && priv->parent->type != ARPHRD_LOOPBACK ) {
       ERR( "%s: illegal net type", THIS_MODULE->name );
       err = -EINVAL;
       goto err;
    }
    /* also, and clone its IP, MAC and other information */
    memcpy( child->dev_addr, priv->parent->dev_addr, ETH_ALEN );
    memcpy( child->broadcast, priv->parent->broadcast, ETH_ALEN );
    // dev_alloc_name() returns the assigned unit number (>= 0) or a negative errno
    if( ( err = dev_alloc_name( child, child->name ) ) < 0 ) {
       ERR( "%s: allocate name, error %i", THIS_MODULE->name, err );
       err = -EIO;
       goto err;
    }
    if( ( err = register_netdev( child ) ) ) {
       ERR( "%s: register device, error %i", THIS_MODULE->name, err );
       goto err;
    }
    rtnl_lock();
    err = netdev_rx_handler_register( priv->parent, &handle_frame, NULL );
    rtnl_unlock();
    if( err ) {
       ERR( "%s: register rx handler, error %i", THIS_MODULE->name, err );
       unregister_netdev( child );
       goto err;
    }
    LOG( "module %s loaded", THIS_MODULE->name );
    LOG( "%s: create link %s", THIS_MODULE->name, child->name );
    LOG( "%s: registered rx handler for %s", THIS_MODULE->name, priv->parent->name );
    return 0;
 err:
    free_netdev( child );
    return err;
 }

 void __exit exit( void ) {
    struct priv *priv = netdev_priv( child );
    if( priv->parent ) {
       rtnl_lock();
       netdev_rx_handler_unregister( priv->parent );
       rtnl_unlock();
       LOG( "unregister rx handler for %s\n", priv->parent->name );
    }
    unregister_netdev( child );
    free_netdev( child );
    LOG( "module %s unloaded", THIS_MODULE->name );
 }

 module_init( init );
 module_exit( exit );

 MODULE_AUTHOR( "Oleg Tsiliuric" );
 MODULE_AUTHOR( "Nikita Dorokhin" );
 MODULE_LICENSE( "GPL v2" );
 MODULE_VERSION( "2.1" );

Everything is simple enough here, but the following points deserve some individual comments:

- When the interface is created by alloc_netdev(), we bind its operations through the crypto_net_device_ops table. The table defines .ndo_open and .ndo_stop (called when the interface is brought up or down with the command ifconfig up / down), .ndo_get_stats (a request for interface statistics), and .ndo_start_xmit (packet transmission).
- The private data area, our own struct priv structure, keeps a link to the parent interface (the example archive shows several other ways to use the private area for this binding).
- The table of operations has no function for receiving socket buffers (and logically cannot have one). But by calling netdev_rx_handler_register() (which appeared only in kernel 2.6.36) we can add our own filter function handle_frame() to the processing of packets received on the parent interface; it will be called for every packet arriving from this interface.
- While the filter is being added, access has to be locked for a short time (otherwise a crash is to be expected). This is achieved by calling rtnl_lock() and rtnl_unlock().
- When an outgoing socket buffer is handed over for transmission (the start_xmit() function), we simply replace, in the socket buffer structure, the interface through which the data should be physically sent.
- On reception, conversely, the parent interface recorded in the received socket buffer is replaced by the virtual one.
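The effect of the last two points on the receive side can be shown with a simplified user-space model (a sketch under stated assumptions, not the kernel's receive path). A handler that changes the device of a packet and returns RX_ANOTHER, modeling RX_HANDLER_ANOTHER, makes the dispatcher start over for the new device; RX_PASS, modeling RX_HANDLER_PASS, lets processing continue on the current one. All type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Models RX_HANDLER_PASS and RX_HANDLER_ANOTHER. */
enum rx_result { RX_PASS, RX_ANOTHER };

struct dev;
struct pkt {
    struct dev *dev;                 /* device the packet is attributed to */
};

typedef enum rx_result (*rx_handler_t)(struct pkt *p);

struct dev {
    const char *name;
    rx_handler_t rx_handler;         /* NULL when nothing is registered */
    unsigned long rx_packets;        /* packets delivered on this device */
};

/*
 * Simplified receive path: when a handler returns RX_ANOTHER it has
 * changed p->dev, so processing restarts for the new device.
 * Returns the device that finally received the packet.
 */
struct dev *deliver(struct pkt *p)
{
    assert(p != NULL && p->dev != NULL);
    for (;;) {
        struct dev *d = p->dev;

        if (d->rx_handler && d->rx_handler(p) == RX_ANOTHER)
            continue;
        d->rx_packets++;
        return d;
    }
}

struct dev child_dev = { "virt0", NULL, 0 };

/* Counterpart of handle_frame(): redirect every packet to the child. */
enum rx_result steal_frame(struct pkt *p)
{
    p->dev = &child_dev;
    return RX_ANOTHER;
}

struct dev parent_dev = { "p7p1", steal_frame, 0 };
```

In this model, while steal_frame() is registered on the parent, every packet ends up on the child and the parent's own counter stays at zero; once the handler is removed, packets are delivered to the parent again.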

How does it work?

Choose any existing and operational network interface (in Fedora 16, one of the Ethernet interfaces was called p7p1 - this is a good illustration of the fact that interfaces can have very different names):

 $ ip addr show dev p7p1
 3: p7p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 08:00:27:9e:02:02 brd ff:ff:ff:ff:ff:ff
     inet 192.168.56.101/24 brd 192.168.56.255 scope global p7p1
     inet6 fe80::a00:27ff:fe9e:202/64 scope link
        valid_lft forever preferred_lft forever

We will install our new virtual interface on it and configure it on the IP subnet (192.168.50.0/24), which is different from the original subnet of the p7p1 interface:

 $ sudo insmod virt2.ko link=p7p1
 $ sudo ifconfig virt0 192.168.50.2
 $ ifconfig virt0
 virt0     Link encap:Ethernet  HWaddr 08:00:27:9E:02:02
           inet addr:192.168.50.2  Bcast:192.168.50.255  Mask:255.255.255.0
           inet6 addr: fe80::a00:27ff:fe9e:202/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:27 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:0 (0.0 b)  TX bytes:5027 (4.9 KiB)

To test our work we need a peer at the other end of the connection. The easiest and fastest way to get one for the new subnet (192.168.50.0/24) on another LAN host is to create an alias IP address on the network interface of that remote host, like this:

 $ sudo ifconfig vboxnet0:1 192.168.50.1
 $ ifconfig
 ...
 vboxnet0  Link encap:Ethernet  HWaddr 0A:00:27:00:00:00
           inet addr:192.168.56.1  Bcast:192.168.56.255  Mask:255.255.255.0
           inet6 addr: fe80::800:27ff:fe00:0/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:223 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:0 (0.0 b)  TX bytes:36730 (35.8 KiB)
 vboxnet0:1 Link encap:Ethernet  HWaddr 0A:00:27:00:00:00
           inet addr:192.168.50.1  Bcast:192.168.50.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

(The hypervisor network interface of the VirtualBox virtual machines is shown here, but it doesn’t matter, and you can do exactly the same with the interface of any physical device).

Now from the newly created virtual interface we can check the transparency of the network by sending ICMP:

 $ ping 192.168.50.1
 PING 192.168.50.1 (192.168.50.1) 56(84) bytes of data.
 64 bytes from 192.168.50.1: icmp_req=1 ttl=64 time=0.371 ms
 64 bytes from 192.168.50.1: icmp_req=2 ttl=64 time=0.210 ms
 64 bytes from 192.168.50.1: icmp_req=3 ttl=64 time=0.184 ms
 64 bytes from 192.168.50.1: icmp_req=4 ttl=64 time=0.242 ms
 ^C
 --- 192.168.50.1 ping statistics ---
 4 packets transmitted, 4 received, 0% packet loss, time 3001ms
 rtt min/avg/max/mdev = 0.184/0.251/0.371/0.074 ms

 $ sudo tcpdump -i virt0
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on virt0, link-type EN10MB (Ethernet), capture size 65535 bytes
 00:13:02.228615 IP 192.168.50.1 > 192.168.50.2: ICMP echo request, id 5609, seq 1, length 64
 00:13:02.228716 ARP, Request who-has 192.168.50.1 tell 192.168.50.2, length 28
 00:13:02.228786 ARP, Reply 192.168.50.1 is-at 0a:00:27:00:00:00 (oui Unknown), length 46
 00:13:02.228803 IP 192.168.50.2 > 192.168.50.1: ICMP echo reply, id 5609, seq 1, length 64
 00:13:03.227996 IP 192.168.50.1 > 192.168.50.2: ICMP echo request, id 5609, seq 2, length 64
 00:13:03.228059 IP 192.168.50.2 > 192.168.50.1: ICMP echo reply, id 5609, seq 2, length 64
 00:13:04.228016 IP 192.168.50.1 > 192.168.50.2: ICMP echo request, id 5609, seq 3, length 64
 ...
 00:14:09.236014 ARP, Request who-has 192.168.50.2 tell 192.168.50.1, length 46
 00:14:09.236052 ARP, Reply 192.168.50.2 is-at 08:00:27:9e:02:02 (oui Unknown), length 28
 tcpdump: pcap_loop: The interface went down
 16 packets captured
 16 packets received by filter
 0 packets dropped by kernel

And then, in the opposite direction, open a full-fledged SSH session from the remote host to the new virtual interface:

 $ ssh 192.168.50.2
 Nasty PTR record "192.168.50.2" is set up for 192.168.50.2, ignoring
 olej@192.168.50.2's password:
 Last login: Tue Apr 3 10:21:28 2012 from 192.168.1.5
 $ uname -a
 Linux fedora16vm.localdomain 3.3.0-8.fc16.i686 #1 SMP Thu Mar 29 18:33:55 UTC 2012 i686 i686 i386 GNU/Linux
 $ exit
 logout
 Connection to 192.168.50.2 closed.
 $

With this newly created virtual interface, you can do a lot of exciting experiments in a wide variety of network configurations!

What's next?

An attentive reader may well exclaim at this point: “But your virtual interface does not complement the parent interface, it replaces it!” Yes, in the variant shown this is exactly so: loading the module blocks traffic on the parent interface, and unloading the module restores it.

This is easy to fix. For the newly created virtual network interface to work independently alongside the parent one, two things are necessary:

- In the filters (both reception and transmission), analyze the IP address field of the packet in the socket buffer and replace the interface only for IP addresses belonging to the virtual interface.
- On reception, process socket buffers of the IP and ARP protocols separately, because the data structures of these protocols naturally differ (the protocol is given by the field struct sk_buff * -> protocol).

It may sound complicated in words, but in the module code everything is quite simple and adds no more than 25 lines. This variant is included in the archive of examples (the virt-full subdirectory; the code is not shown here so as not to overload the text):

 $ ping 192.168.50.2
 PING 192.168.50.2 (192.168.50.2) 56(84) bytes of data.
 64 bytes from 192.168.50.2: icmp_req=1 ttl=64 time=0.473 ms
 64 bytes from 192.168.50.2: icmp_req=2 ttl=64 time=0.256 ms
 64 bytes from 192.168.50.2: icmp_req=3 ttl=64 time=0.281 ms
 ^C
 --- 192.168.50.2 ping statistics ---
 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
 rtt min/avg/max/mdev = 0.256/0.336/0.473/0.099 ms

 $ ping 192.168.56.101
 PING 192.168.56.101 (192.168.56.101) 56(84) bytes of data.
 64 bytes from 192.168.56.101: icmp_req=1 ttl=64 time=2.63 ms
 64 bytes from 192.168.56.101: icmp_req=2 ttl=64 time=0.306 ms
 64 bytes from 192.168.56.101: icmp_req=3 ttl=64 time=0.225 ms
 ^C
 --- 192.168.56.101 ping statistics ---
 3 packets transmitted, 3 received, 0% packet loss, time 2002ms
 rtt min/avg/max/mdev = 0.225/1.053/2.630/1.115 ms

 $ dmesg | tail -n19
 [58382.498200] virt0: no IPv6 routers present
 [58391.368273] device virt0 entered promiscuous mode
 [58409.904046] ! rx: IP4 to IP=192.168.50.2
 [58409.904050] ! rx: injecting frame from p7p1 to virt0
 [58409.904197] ! tx: injecting frame from virt0 to p7p1
 [58409.904212] ! rx: ARP for 192.168.50.2
 [58409.904214] ! rx: injecting frame from p7p1 to virt0
 [58409.904262] ! tx: injecting frame from virt0 to p7p1
 [58410.903427] ! rx: IP4 to IP=192.168.50.2
 [58410.903431] ! rx: injecting frame from p7p1 to virt0
 [58410.903531] ! tx: injecting frame from virt0 to p7p1
 [58411.903447] ! rx: IP4 to IP=192.168.50.2
 [58411.903451] ! rx: injecting frame from p7p1 to virt0
 [58411.903547] ! tx: injecting frame from virt0 to p7p1
 [58414.694485] ! rx: ARP for 192.168.56.101
 [58414.696846] ! rx: IP4 to IP=192.168.56.101
 [58415.696508] ! rx: IP4 to IP=192.168.56.101
 [58416.696572] ! rx: IP4 to IP=192.168.56.101
 [58419.712245] ! rx: ARP for 192.168.56.101
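The per-address decision described above (redirect only IPv4 packets and ARP requests aimed at the virtual interface's address) can be sketched in user space on a raw Ethernet frame. This is not the code from the virt-full archive, only an illustration under stated assumptions: untagged Ethernet frames, ARP for Ethernet/IPv4, and hypothetical names frame_is_for_ip(), FRAME_TYPE_IP4 and FRAME_TYPE_ARP. Inside the kernel one would more likely rely on skb->protocol and header accessors such as ip_hdr() and arp_hdr() than on raw offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN     14       /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define FRAME_TYPE_IP4  0x0800   /* EtherType of IPv4 */
#define FRAME_TYPE_ARP  0x0806   /* EtherType of ARP */

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Returns 1 if the frame is addressed to the IPv4 address `ip`
 * (4 bytes in network order), 0 otherwise.
 *  - IPv4: the destination address sits at offset 16 of the IP header;
 *  - ARP (Ethernet/IPv4): the target protocol address sits at
 *    offset 24 of the 28-byte ARP payload.
 */
int frame_is_for_ip(const uint8_t *frame, size_t len, const uint8_t ip[4])
{
    assert(frame != NULL && ip != NULL);

    if (len < ETH_HDR_LEN)
        return 0;

    switch (be16(frame + 12)) {
    case FRAME_TYPE_IP4:
        return len >= ETH_HDR_LEN + 20 &&
               memcmp(frame + ETH_HDR_LEN + 16, ip, 4) == 0;
    case FRAME_TYPE_ARP:
        return len >= ETH_HDR_LEN + 28 &&
               memcmp(frame + ETH_HDR_LEN + 24, ip, 4) == 0;
    default:
        return 0;   /* any other protocol stays with the parent */
    }
}
```

A receive filter built on such a check would redirect the buffer to the virtual interface only when the function returns 1 and pass everything else through unchanged, which corresponds to the log above: frames for 192.168.50.2 are injected into virt0, while frames for 192.168.56.101 are not.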

The archive of code for further experiments can be downloaded here or here.

Source: https://habr.com/ru/post/269987/
