Another virtual interface

The previous article showed a sketch of a Linux kernel module that creates an additional virtual network interface. It was a simplified fragment of a real project that had run for several years without failures or complaints, so it can serve well as a template for further improvement, correction and development.

But that approach is, first, not the only one and, second, may be unacceptable in some situations (for example, in an embedded system with a kernel older than 2.6.36, where the netdev_rx_handler_register() call does not exist yet). Below we consider an alternative with the same functionality, implemented on a completely different layer of the TCP/IP network stack.

Network layer protocols

Much has been written, and there is no need to repeat it, about how the layers of the TCP/IP network stack do not map cleanly onto the 7 layers of the OSI/ISO open systems interconnection model (or, to be honest, about how the OSI model, so dear to academia, does not adequately describe a real TCP/IP network). In the previous implementation, the virtual interface was created at the interface level (L2, Layer 2, which corresponds very roughly to the OSI data link layer). The current implementation uses the facilities of the network layer (L3).

It is worth reviewing a certain minimum about the network layer facilities, somewhat more than the current task strictly needs, to leave room for later extension. The network layer of the protocol stack (TCP/IP, but not only: all other protocol families are supported here too, although today they seem to be of little relevance) handles such protocols as IP (IPv4/IPv6), IPX, IGMP, RIP, OSPF and ARP, and custom user protocols can be added as well. Network layer handlers are installed through the network layer API (<linux/netdevice.h>):

    struct packet_type {
       __be16 type;                   /* This is really htons(ether_type). */
       struct net_device *dev;        /* NULL is wildcarded here */
       int (*func) (struct sk_buff*, struct net_device*,
                    struct packet_type*, struct net_device*);
       ...
       struct list_head list;
    };

    extern void dev_add_pack( struct packet_type *pt );
    extern void dev_remove_pack( struct packet_type *pt );

In effect, we need to add to the kernel protocol handling a filter through which the socket buffers of the interface's incoming stream pass (the outgoing stream is simpler, as the previous implementation showed). The dev_add_pack() function adds one more handler, implemented by the function func(), for packets of the given type. The handler is added alongside the existing ones (including the default handler of the Linux network stack) and does not replace them. Socket buffers that match the criteria set in struct packet_type (the protocol type and the network interface dev) are passed to this function for processing.
Note: New protocols are added in the same way (by installing a filter function) at the higher, transport layer of the network stack (where, for example, UDP, TCP and SCTP are handled). All layers above that (which correspond more or less to the OSI layers) are not represented in the kernel and are served in user space by BSD socket programming. Those higher layers are not considered further in this text.

If we wanted to add a new (proprietary) protocol, we would have to define its type:

    #define PROTO_ID 0x1234
    static struct packet_type test_proto = {
       __constant_htons( PROTO_ID ),
       ...
    };

The problem with this is that the standard IP stack knows nothing about such a protocol, so we would have to take over all of its processing ourselves. Our goal, however, is only to override the processing of some packets, so we use the constant ETH_P_ALL instead, which means that packets of all protocols must pass through the filter (and, if the dev field is NULL, packets from all network interfaces).
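The selection rule described above can be modelled in user space. The following is only a sketch, not kernel code: struct toy_filter and toy_filter_matches() are hypothetical names invented for illustration, and an interface index of 0 stands in for dev == NULL. It assumes a Linux system with the <linux/if_ether.h> user-space header.

```c
#include <assert.h>
#include <arpa/inet.h>      /* htons() */
#include <linux/if_ether.h> /* ETH_P_ALL, ETH_P_IP, ETH_P_ARP */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct packet_type:
 * only the two selection fields are modelled. */
struct toy_filter {
    uint16_t type;    /* EtherType in network byte order, like packet_type.type */
    int      ifindex; /* 0 plays the role of dev == NULL (any interface) */
};

/* Returns 1 if a frame with EtherType frame_proto (network byte order)
 * received on interface ifindex would be passed to the filter, else 0. */
static int toy_filter_matches(const struct toy_filter *f,
                              uint16_t frame_proto, int ifindex)
{
    assert(f != NULL);
    if (f->ifindex != 0 && f->ifindex != ifindex)
        return 0;                          /* bound to another interface */
    return f->type == htons(ETH_P_ALL)     /* wildcard: every protocol */
        || f->type == frame_proto;         /* exact protocol match */
}
```

A filter declared as { htons(ETH_P_ALL), 0 } matches every frame on every interface, while { htons(ETH_P_ARP), 3 } matches only ARP frames on the interface with index 3.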

For comparison and to be concrete: a large number of protocol identifiers (Ethernet Protocol IDs) are defined in <linux/if_ether.h>. Here are some of them:

    #define ETH_P_LOOP      0x0060          /* Ethernet Loopback packet */
    #define ETH_P_IP        0x0800          /* Internet Protocol packet */
    #define ETH_P_ARP       0x0806          /* Address Resolution packet */
    #define ETH_P_PAE       0x888E          /* Port Access Entity (IEEE 802.1X) */
    #define ETH_P_ALL       0x0003          /* Every packet (be careful!!!) */
    ...

Here the type field is not an abstract numeric value in the program code: this value is written in binary form into the Ethernet header of the frame that is physically sent onto the transmission medium:

    struct ethhdr {
       unsigned char h_dest[ETH_ALEN];      /* destination eth addr */
       unsigned char h_source[ETH_ALEN];    /* source ether addr */
       __be16        h_proto;               /* packet type ID field */
    } __attribute__((packed));

(We will need this same definition in the module code when filling in struct packet_type.)
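To see that the protocol ID really travels in the frame in network byte order, here is a small user-space sketch that reads it from a raw frame buffer. The function name frame_ethertype() is hypothetical; the sketch assumes a Linux system where <linux/if_ether.h> provides struct ethhdr.

```c
#include <arpa/inet.h>      /* ntohs() */
#include <linux/if_ether.h> /* struct ethhdr, ETH_P_* */
#include <stdint.h>
#include <string.h>

/* Returns the EtherType of a raw Ethernet frame in host byte order,
 * or 0 if the buffer is too short to hold an Ethernet header. */
static uint16_t frame_ethertype(const unsigned char *frame, size_t len)
{
    struct ethhdr hdr;

    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, frame, sizeof hdr);  /* copy to avoid unaligned access */
    return ntohs(hdr.h_proto);        /* on the wire: big-endian */
}
```

For a frame whose bytes 12 and 13 are 0x08 0x06, the function returns 0x0806, that is, ETH_P_ARP.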

The filter function itself (the func field), which we still have to write, looks something like this in perhaps its simplest form:

    int test_pack_rcv( struct sk_buff *skb, struct net_device *dev,
                       struct packet_type *pt, struct net_device *odev ) {
       unsigned int len = skb->len;   /* read the length before releasing the buffer */
       LOG( "packet received with length: %u\n", len );
       kfree_skb( skb );
       return len;
    }

The function is shown here mainly because of the mandatory call to kfree_skb(). Unlike what one might expect from the similar-looking dev_kfree_skb() in the transmit path, this call does not necessarily destroy the socket buffer: it decrements its usage counter (the users field) and frees the buffer only when the last reference is dropped. For every additional protocol filter installed with dev_add_pack(), this counter is incremented for each socket buffer delivered. You can install several network layer filters (in the same or in several loadable modules), and they will all be called, in the reverse order of their installation, but each of them must call kfree_skb(). Otherwise you get a slow but steady memory leak in the network stack, whose consequence, such as a system crash, shows up only after several hours of continuous operation.

This is a rather interesting and non-obvious point, so much so that it is worth digressing to look at the source code of kfree_skb() (file net/core/skbuff.c):

    void kfree_skb(struct sk_buff *skb)
    {
            if (unlikely(!skb))
                    return;
            if (likely(atomic_read(&skb->users) == 1))
                    smp_rmb();
            else if (likely(!atomic_dec_and_test(&skb->users)))
                    return;
            trace_kfree_skb(skb, __builtin_return_address(0));
            __kfree_skb(skb);
    }

Calling kfree_skb() actually frees the socket buffer only when skb->users == 1; for any other value it merely decrements skb->users (the usage counter).
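The counting rule can be tried out in a user-space model. This is a sketch under simplifying assumptions, not the kernel implementation: struct toy_skb and the toy_* functions are hypothetical names, the counter is a plain int rather than an atomic, and toy_freed exists only so that the destruction can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of struct sk_buff: a usage counter and a length. */
struct toy_skb {
    int          users;
    unsigned int len;
};

static int toy_freed; /* number of buffers actually destroyed (demo only) */

static struct toy_skb *toy_alloc(unsigned int len)
{
    struct toy_skb *skb = malloc(sizeof *skb);
    assert(skb != NULL);
    skb->users = 1;   /* the allocator holds the first reference */
    skb->len = len;
    return skb;
}

/* Models the extra reference taken before a buffer is handed to a filter. */
static void toy_hold(struct toy_skb *skb)
{
    skb->users++;
}

/* Models kfree_skb(): drop one reference, destroy on the last one. */
static void toy_kfree_skb(struct toy_skb *skb)
{
    if (!skb)
        return;
    if (--skb->users > 0)
        return;       /* somebody else still uses the buffer */
    free(skb);
    toy_freed++;
}

/* Models a well-behaved filter: it reads what it needs, then releases. */
static unsigned int toy_handler(struct toy_skb *skb)
{
    unsigned int len = skb->len;
    toy_kfree_skb(skb);
    return len;
}
```

With two filters installed, the buffer starts with one reference plus two more; after both handlers have run, exactly one reference is left and the buffer is destroyed only by the final release. A handler that forgot its toy_kfree_skb() call would leave the counter above zero forever, which is the leak described above.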

We now have enough detail to build the virtual interface, this time using the network layer of the IP stack.

Virtual Interface Module

We proceed as before and create two variants of the module: a simplified version, virtl.ko, whose network interface (virt0) replaces the parent network interface, and the full version, virt.ko, which analyzes network frames (ARP and IPv4) and affects only the traffic addressed to its own interface. The difference is that while the simplified module is loaded, the parent interface stops working (until virtl.ko is unloaded), whereas with the full version both interfaces can work in parallel and independently. The code of the full module is noticeably more cumbersome and adds nothing to an understanding of the principles. Below, the simplified version, which shows the principles, is examined in detail, and only afterwards do we touch briefly on the full version (its code and test log are given in the archive of examples):


    #include <linux/module.h>
    #include <linux/version.h>
    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>
    #include <linux/inetdevice.h>
    #include <linux/moduleparam.h>
    #include <net/arp.h>
    #include <linux/ip.h>

    #define ERR(...) printk( KERN_ERR "! "__VA_ARGS__ )
    #define LOG(...) printk( KERN_INFO "! "__VA_ARGS__ )
    #define DBG(...) if( debug != 0 ) printk( KERN_INFO "! "__VA_ARGS__ )

    static char* link = "eth0";              // name of the parent interface
    module_param( link, charp, 0 );

    static char* ifname = "virt";            // name prefix of the new interface
    module_param( ifname, charp, 0 );

    static int debug = 0;
    module_param( debug, int, 0 );

    static struct net_device *child = NULL;
    static struct net_device_stats stats;    // statistics of the virtual interface
    static u32 child_ip;

    struct priv {
       struct net_device *parent;
    };

    static char* strIP( u32 addr ) {         // IP address in dotted-decimal notation
       static char saddr[ MAX_ADDR_LEN ];
       sprintf( saddr, "%d.%d.%d.%d",
                ( addr ) & 0xFF, ( addr >> 8 ) & 0xFF,
                ( addr >> 16 ) & 0xFF, ( addr >> 24 ) & 0xFF );
       return saddr;
    }

    static int open( struct net_device *dev ) {
       struct in_device *in_dev = dev->ip_ptr;
       struct in_ifaddr *ifa = in_dev->ifa_list;      /* IP ifaddr chain */
       LOG( "%s: device opened", dev->name );
       child_ip = ifa->ifa_address;
       netif_start_queue( dev );
       if( debug != 0 ) {
          char sdebg[ 40 ] = "";
          sprintf( sdebg, "%s:", strIP( ifa->ifa_address ) );
          strcat( sdebg, strIP( ifa->ifa_mask ) );
          DBG( "%s: %s", dev->name, sdebg );
       }
       return 0;
    }

    static int stop( struct net_device *dev ) {
       LOG( "%s: device closed", dev->name );
       netif_stop_queue( dev );
       return 0;
    }

    static struct net_device_stats *get_stats( struct net_device *dev ) {
       return &stats;
    }

    // frame transmission
    static netdev_tx_t start_xmit( struct sk_buff *skb, struct net_device *dev ) {
       struct priv *priv = netdev_priv( dev );
       stats.tx_packets++;
       stats.tx_bytes += skb->len;
       skb->dev = priv->parent;              // hand the frame to the parent (physical) interface
       skb->priority = 1;
       dev_queue_xmit( skb );
       DBG( "tx: injecting frame from %s to %s with length: %u",
            dev->name, skb->dev->name, skb->len );
       return NETDEV_TX_OK;
    }

    static struct net_device_ops net_device_ops = {
       .ndo_open = open,
       .ndo_stop = stop,
       .ndo_get_stats = get_stats,
       .ndo_start_xmit = start_xmit,
    };

    // frame reception
    int pack_parent( struct sk_buff *skb, struct net_device *dev,
                     struct packet_type *pt, struct net_device *odev ) {
       unsigned int len = skb->len;          // read the length before releasing the buffer
       skb->dev = child;                     // hand the frame to the virtual interface
       stats.rx_packets++;
       stats.rx_bytes += len;
       DBG( "rx: injecting frame from %s to %s with length: %u",
            dev->name, skb->dev->name, len );
       kfree_skb( skb );
       return len;
    }

    static struct packet_type proto_parent = {
       __constant_htons( ETH_P_ALL ),        // all protocols; could be narrowed to ETH_P_ARP and ETH_P_IP
       NULL,
       pack_parent,
       (void*)1,
       NULL
    };

    int __init init( void ) {
       void setup( struct net_device *dev ) {   // nested function (GCC extension)
          int j;
          ether_setup( dev );
          memset( netdev_priv( dev ), 0, sizeof( struct priv ) );
          dev->netdev_ops = &net_device_ops;
          for( j = 0; j < ETH_ALEN; ++j )       // fill the MAC with a dummy address
             dev->dev_addr[ j ] = (char)j;
       }
       int err = 0;
       struct priv *priv;
       char ifstr[ 40 ];
       sprintf( ifstr, "%s%s", ifname, "%d" );
    #if (LINUX_VERSION_CODE < KERNEL_VERSION(3, 17, 0))
       child = alloc_netdev( sizeof( struct priv ), ifstr, setup );
    #else
       child = alloc_netdev( sizeof( struct priv ), ifstr, NET_NAME_UNKNOWN, setup );
    #endif
       if( child == NULL ) {
          ERR( "%s: allocate error", THIS_MODULE->name );
          return -ENOMEM;
       }
       priv = netdev_priv( child );
       priv->parent = dev_get_by_name( &init_net, link );   // parent interface
       if( !priv->parent ) {
          ERR( "%s: no such net: %s", THIS_MODULE->name, link );
          err = -ENODEV;
          goto err;
       }
       if( priv->parent->type != ARPHRD_ETHER && priv->parent->type != ARPHRD_LOOPBACK ) {
          ERR( "%s: illegal net type", THIS_MODULE->name );
          err = -EINVAL;
          goto err;
       }
       memcpy( child->dev_addr, priv->parent->dev_addr, ETH_ALEN );
       memcpy( child->broadcast, priv->parent->broadcast, ETH_ALEN );
       if( ( err = dev_alloc_name( child, child->name ) ) ) {
          ERR( "%s: allocate name, error %i", THIS_MODULE->name, err );
          err = -EIO;
          goto err;
       }
       register_netdev( child );             // register the new interface
       proto_parent.dev = priv->parent;
       dev_add_pack( &proto_parent );        // install the receive filter on the parent interface
       LOG( "module %s loaded", THIS_MODULE->name );
       LOG( "%s: create link %s", THIS_MODULE->name, child->name );
       return 0;
    err:
       if( priv->parent )
          dev_put( priv->parent );           // drop the reference taken by dev_get_by_name()
       free_netdev( child );
       return err;
    }

    void __exit virt_exit( void ) {
       struct priv *priv = netdev_priv( child );
       dev_remove_pack( &proto_parent );     // remove the protocol handler
       unregister_netdev( child );
       dev_put( priv->parent );
       free_netdev( child );
       LOG( "module %s unloaded", THIS_MODULE->name );
       LOG( "=============================================" );
    }

    module_init( init );
    module_exit( virt_exit );

    MODULE_AUTHOR( "Oleg Tsiliuric" );
    MODULE_LICENSE( "GPL v2" );
    MODULE_VERSION( "3.7" );

Everything is quite transparent:

After registering the new network interface (virt0), the module calls dev_add_pack(), which installs a filter for packets received on the parent interface;
The dev field of the packet_type structure is set beforehand to point to the parent interface, so only traffic arriving on that interface is intercepted by the pack_parent() function named in the structure;
This function updates the interface statistics and, most importantly, replaces the parent interface pointer in the socket buffer with a pointer to the virtual interface;
The reverse substitution (virtual to physical) takes place in the frame transmit function start_xmit().
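The two substitutions can be reduced to a small user-space model. This is only an illustration: struct toy_dev, struct toy_skb, toy_rx() and toy_tx() are hypothetical names, and nothing here touches a real network stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct toy_dev {
    const char   *name;
    unsigned long rx_packets, rx_bytes, tx_packets, tx_bytes;
};

struct toy_skb {
    struct toy_dev *dev; /* interface the buffer currently belongs to */
    unsigned int    len;
};

/* Receive path: a frame that arrived on the parent interface is
 * counted for the child and re-labelled as belonging to it. */
static void toy_rx(struct toy_skb *skb, struct toy_dev *child)
{
    assert(skb != NULL && child != NULL);
    child->rx_packets++;
    child->rx_bytes += skb->len;
    skb->dev = child;
}

/* Transmit path: a frame sent through the child is counted for the
 * child and re-labelled for the parent, which puts it on the wire. */
static void toy_tx(struct toy_skb *skb, struct toy_dev *child,
                   struct toy_dev *parent)
{
    assert(skb != NULL && child != NULL && parent != NULL);
    child->tx_packets++;
    child->tx_bytes += skb->len;
    skb->dev = parent;
}
```

In both directions only the dev pointer of the buffer changes; the frame contents are untouched, which is why the virtual interface reports the parent's MAC address.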

Here's how it works:

On the computer under test, we load the module and configure its interface on a separate, new subnet:

    $ ip address
    ...
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 08:00:27:52:b9:e0 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.21/24 brd 192.168.1.255 scope global eth0
        inet6 fe80::a00:27ff:fe52:b9e0/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 08:00:27:0f:13:6d brd ff:ff:ff:ff:ff:ff
        inet 192.168.56.102/24 brd 192.168.56.255 scope global eth1
        inet6 fe80::a00:27ff:fe0f:136d/64 scope link
           valid_lft forever preferred_lft forever

    $ sudo insmod virt.ko link=eth1 debug=1

    $ sudo ifconfig virt0 192.168.50.19

    $ sudo ifconfig virt0
    virt0     Link encap:Ethernet  HWaddr 08:00:27:0f:13:6d
              inet addr:192.168.50.19  Bcast:192.168.50.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe0f:136d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 B)  TX bytes:8373 (8.1 KiB)

(The statistics shown here report zero bytes received on the interface so far.)

On the computer we are testing from, we create an IP alias in the new subnet (192.168.50.0/24) and can then send traffic to the newly created interface:

    $ sudo ifconfig vboxnet0:1 192.168.50.1

    $ ping 192.168.50.19
    PING 192.168.50.19 (192.168.50.19) 56(84) bytes of data.
    64 bytes from 192.168.50.19: icmp_req=1 ttl=64 time=0.627 ms
    64 bytes from 192.168.50.19: icmp_req=2 ttl=64 time=0.305 ms
    64 bytes from 192.168.50.19: icmp_req=3 ttl=64 time=0.326 ms
    ^C
    --- 192.168.50.19 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2000ms
    rtt min/avg/max/mdev = 0.305/0.419/0.627/0.148 ms

On the same testing computer (the counterpart), it is very instructive to watch, in a separate terminal, the traffic captured by tcpdump:

    $ sudo tcpdump -i vboxnet0
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on vboxnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
    ...
    18:41:01.740607 ARP, Request who-has 192.168.50.19 tell 192.168.50.1, length 28
    18:41:01.741104 ARP, Reply 192.168.50.19 is-at 08:00:27:0f:13:6d (oui Unknown), length 28
    18:41:01.741116 IP 192.168.50.1 > 192.168.50.19: ICMP echo request, id 8402, seq 1, length 64
    18:41:01.741211 IP 192.168.50.19 > 192.168.50.1: ICMP echo reply, id 8402, seq 1, length 64
    18:41:02.741164 IP 192.168.50.1 > 192.168.50.19: ICMP echo request, id 8402, seq 2, length 64
    18:41:02.741451 IP 192.168.50.19 > 192.168.50.1: ICMP echo reply, id 8402, seq 2, length 64
    18:41:03.741163 IP 192.168.50.1 > 192.168.50.19: ICMP echo request, id 8402, seq 3, length 64
    18:41:03.741471 IP 192.168.50.19 > 192.168.50.1: ICMP echo reply, id 8402, seq 3, length 64
    18:41:06.747701 ARP, Request who-has 192.168.50.1 tell 192.168.50.19, length 28
    18:41:06.747715 ARP, Reply 192.168.50.1 is-at 0a:00:27:00:00:00 (oui Unknown), length 28

Expanding opportunities

Now, briefly, on how to build a full-fledged virtual interface that handles only its own traffic and does not disrupt the operation of the parent interface (this is what the full version of the module in the archive does). This requires the following:

Declare two separate protocol handlers (one for the ARP address resolution protocol and one for IP itself):

    // handler for ETH_P_ARP frames
    int arp_pack_rcv( struct sk_buff *skb, struct net_device *dev,
                      struct packet_type *pt, struct net_device *odev ) {
       ...
       return skb->len;
    }

    static struct packet_type arp_proto = {
       __constant_htons( ETH_P_ARP ),
       NULL,
       arp_pack_rcv,                  // handler for ETH_P_ARP
       (void*)1,
       NULL
    };

    // handler for ETH_P_IP frames
    int ip4_pack_rcv( struct sk_buff *skb, struct net_device *dev,
                      struct packet_type *pt, struct net_device *odev ) {
       ...
       return skb->len;
    }

    static struct packet_type ip4_proto = {
       __constant_htons( ETH_P_IP ),
       NULL,
       ip4_pack_rcv,                  // handler for ETH_P_IP
       (void*)1,
       NULL
    };

Register both of them, one after the other, in the module initialization function:

    arp_proto.dev = ip4_proto.dev = priv->parent;   // intercept traffic of the parent interface only
    dev_add_pack( &arp_proto );
    dev_add_pack( &ip4_proto );

Each of the installed filters should substitute the interface only for those frames whose destination IP matches the IP of the virtual interface ...
Two separate handlers are convenient because the headers of ARP and IP frames have completely different formats, so the destination IP has to be extracted from them in different ways (the complete code is given in the example archive).
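To illustrate why the two frame types need different treatment, here is a user-space sketch that extracts the destination IPv4 address from a raw Ethernet frame. It is not the code from the archive: frame_dest_ip() and the offset macros are hypothetical names, and the offsets assume ARP for Ethernet/IPv4 (28-byte packet, target protocol address at offset 24) and an IPv4 header (destination address at offset 16). In a kernel handler the buffer is already positioned past the Ethernet header, and helpers such as arp_hdr() and ip_hdr() would normally be used instead of raw offsets.

```c
#include <arpa/inet.h>      /* ntohs() */
#include <linux/if_ether.h> /* ETH_HLEN, ETH_P_ARP, ETH_P_IP */
#include <stdint.h>
#include <string.h>

#define ARP_PKT_LEN     28  /* ARP packet for Ethernet + IPv4 */
#define ARP_TPA_OFFSET  24  /* target protocol (IP) address in the ARP packet */
#define IP_MIN_HLEN     20  /* IPv4 header without options */
#define IP_DADDR_OFFSET 16  /* destination address in the IPv4 header */

/* Copies the destination IPv4 address of the frame (network byte order)
 * into *addr. Returns 1 on success, 0 if the frame is neither ARP nor
 * IPv4 or is too short. */
static int frame_dest_ip(const unsigned char *frame, size_t len, uint32_t *addr)
{
    uint16_t proto;
    size_t off;

    if (len < ETH_HLEN)
        return 0;
    memcpy(&proto, frame + 12, sizeof proto);   /* h_proto field */

    switch (ntohs(proto)) {
    case ETH_P_ARP:
        if (len < ETH_HLEN + ARP_PKT_LEN)
            return 0;
        off = ETH_HLEN + ARP_TPA_OFFSET;
        break;
    case ETH_P_IP:
        if (len < ETH_HLEN + IP_MIN_HLEN)
            return 0;
        off = ETH_HLEN + IP_DADDR_OFFSET;
        break;
    default:
        return 0;
    }
    memcpy(addr, frame + off, sizeof *addr);
    return 1;
}
```

A filter would compare the returned address with the address of the virtual interface (child_ip in the simplified module) and substitute skb->dev only on a match.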

With such a full-fledged module you can, for example, open two parallel SSH sessions to the host over different interfaces (using different IP addresses) that in fact share a single physical interface:

    $ ssh olej@192.168.50.17
    olej@192.168.50.17's password:
    Last login: Mon Jul 16 15:52:16 2012 from 192.168.1.9
    ...

    $ ssh olej@192.168.56.101
    olej@192.168.56.101's password:
    Last login: Mon Jul 16 17:29:57 2012 from 192.168.50.1
    ...

    $ who
    olej     tty1         2012-07-16 09:29 (:0)
    olej     pts/0        2012-07-16 09:33 (:0.0)
    ...
    olej     pts/6        2012-07-16 17:29 (192.168.50.1)
    olej     pts/7        2012-07-16 17:31 (192.168.56.1)

The last command shown (who) is already executed inside an SSH session, that is, on the remote host itself. Its output records two independent connections from two different subnets (the last two lines), which in reality come from one and the same host, seen through its different network interfaces.

Further clarification

While preparing and debugging the example modules, this (fairly recent) book was used extensively for the details: Rami Rosen, “Linux Kernel Networking: Implementation and Theory”, Apress, 650 pages, 2014, ISBN-13: 978-1-4302-6196-4.

The author kindly made it available for free download even before the book went on sale (2013-12-22). You can download it on this page.

Anyone interested in questions like those discussed in this article will find in this book plenty of ideas for developing their own uses of network interfaces further.

The archive of code mentioned in the text, for experiments and further development, can be found here or here.

Source: https://habr.com/ru/post/270517/
