
Making code cleaner: when does using the devres API do harm?

Managed resources in the Linux kernel (also known as Device Resource Management, or the devres API), which I covered in a short note earlier, are extremely useful, but you should not treat this helper API as a silver bullet when writing new drivers or modifying existing ones. Let's look at cases where these functions must be applied with care.

Registering an interrupt handler with tasklets


Interrupts and tasklets are clearly described in an article on the EmBox blog, so I assume the reader is already familiar with that or similar material.

Take for example the following pseudocode:
    struct my_struct {
        struct tasklet_struct tasklet;  /* embedded, so &ms->tasklet is valid */
        int irq;
    };

    void tasklet_handler(…)
    {
        do_the_things_right(…);
    }

    irqreturn_t irq_handler(int irq, void *param)
    {
        struct my_struct *ms = param;
        …
        tasklet_schedule(&ms->tasklet);
        return IRQ_HANDLED;
    }

    int probe(…)
    {
        struct my_struct *ms;
        int err;

        ms = devm_kzalloc(…);
        …
        tasklet_init(&ms->tasklet, tasklet_handler, (unsigned long)ms);
        …
        /* pseudocode: the real call also takes the struct device */
        err = devm_request_irq(ms->irq, irq_handler, …, ms);
        if (err)
            return err;
        return 0;
    }

    int remove(…)
    {
        struct my_struct *ms = …;
        …
        tasklet_kill(&ms->tasklet);
        return 0;
    }

The attentive reader will immediately exclaim: “But there is a race condition here that leads to an endless loop!” And they will be right.
Let's see why. Tasklets run in soft interrupt (softirq) context, so there can be a delay between scheduling a tasklet with tasklet_schedule() and its actual execution. During that window the driver can be removed from memory, for example because the user ran rmmod my_module. Of course, we explicitly kill the tasklet with tasklet_kill(), but the interrupt handler is still active at that point: we registered it through the devres API, so it is released only afterwards, in queue order, once ->remove() has returned. An interrupt that arrives after tasklet_kill() can therefore schedule the tasklet again.
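The ordering described above can be modeled in plain user-space C. This is only a toy sketch, not kernel code: struct toy_device, toy_devm_add(), toy_probe(), toy_remove() and toy_unbind() are hypothetical names invented for the illustration, and the "resources" are just strings appended to a log. It assumes what the devres documentation describes: ->remove() runs first, and managed resources are released afterwards in reverse order of acquisition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8
#define LOG_SIZE 128

/* Toy stand-in for a device: a stack of release callbacks plus an event log. */
struct toy_device {
    void (*release[MAX_ACTIONS])(struct toy_device *dev);
    size_t nr_actions;
    char log[LOG_SIZE];
};

/* Append "event;" to the log if it still fits. */
static void toy_log(struct toy_device *dev, const char *event)
{
    size_t used = strlen(dev->log);
    size_t need = strlen(event) + 2; /* separator and terminating NUL */

    if (used + need <= LOG_SIZE) {
        strcat(dev->log, event);
        strcat(dev->log, ";");
    }
}

/* Model of a devm_*() call: remember how to undo the acquisition. */
static int toy_devm_add(struct toy_device *dev,
                        void (*release)(struct toy_device *dev))
{
    if (dev->nr_actions == MAX_ACTIONS)
        return -1;
    dev->release[dev->nr_actions++] = release;
    return 0;
}

static void release_memory(struct toy_device *dev) { toy_log(dev, "kfree"); }
static void release_irq(struct toy_device *dev) { toy_log(dev, "free_irq"); }

int toy_probe(struct toy_device *dev)
{
    memset(dev, 0, sizeof(*dev));
    if (toy_devm_add(dev, release_memory))  /* models devm_kzalloc() */
        return -1;
    toy_log(dev, "tasklet_init");
    if (toy_devm_add(dev, release_irq))     /* models devm_request_irq() */
        return -1;
    return 0;
}

static void toy_remove(struct toy_device *dev)
{
    /* The IRQ is still registered while this runs. */
    toy_log(dev, "tasklet_kill");
}

/* Model of unbind: ->remove() first, then managed resources, newest first. */
void toy_unbind(struct toy_device *dev)
{
    toy_remove(dev);
    while (dev->nr_actions > 0)
        dev->release[--dev->nr_actions](dev);
}
```

After toy_probe() and toy_unbind(), the log reads tasklet_init;tasklet_kill;free_irq;kfree; which makes the problem visible: the tasklet is killed before the interrupt is released, so in a real driver the handler could still schedule it in between.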

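How do we cure this? Very simply, watch closely:

    int remove(…)
    {
        struct my_struct *ms = …;
        …
        /* release the IRQ first, so the handler cannot reschedule the tasklet */
        /* pseudocode: the real call also takes the struct device */
        devm_free_irq(ms->irq, ms);
        tasklet_kill(&ms->tasklet);
        return 0;
    }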

This is one of those rare cases where we call the devres API explicitly at removal time.

What does a driver, say a character device driver, conceal?


Consider now the following pseudocode:
    int closecb(…)
    {
        struct my_struct *ms = …;

        do_something_on_close(ms, …);
        return 0;
    }

    struct file_ops fops = {
        .close = closecb,
        …
    };

    int probe(…)
    {
        struct my_struct *ms;
        int err;

        ms = devm_kzalloc(…);
        …
        err = register_char_device(ms, "node_served_by_driver", &fops, …);
        if (err)
            return err;
        return 0;
    }

    int remove(…)
    {
        struct my_struct *ms = …;
        …
        return 0;
    }

Now imagine the following scenario:
  1. Make sure the driver is loaded and bound to the device.
  2. Open /dev/node_served_by_driver and keep it open.
  3. Unbind the driver from the device, for example by running the command:
     echo our_device_name > /sys/bus/platform/drivers/our_driver_name/unbind
    or simply by detaching the device from the bus if that is possible, for example by unplugging a USB drive.
  4. Now close the device.
  5. Enjoy the kernel crash.

Why does this happen? Because the memory allocated in ->probe() is released when the driver is unbound from the device, and we are still using that memory! Meanwhile the driver module itself is not unloaded and cannot be unloaded, because it is held by the program that opened the device; it stays in memory until the device is explicitly closed and the module is removed.

How do we treat this? That is easy too. Do not use memory allocated with devm_kzalloc() in the driver's file operations, and keep a careful eye on object lifetimes. According to the author of the devres API, the dev prefix is there for a reason: it signals that the resources are tied directly to the hardware, not to handling events from user space.
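One common way to watch object lifetimes is reference counting, which is an assumption on my part and not something the article prescribes. The user-space sketch below models the idea: the state is allocated by hand, the bound driver holds one reference, every open file holds another, and the memory goes away only when the last holder lets go. All names (struct chr_state, chr_open(), chr_unbind(), chr_close(), live_objects) are hypothetical, and the plain int counter is not thread-safe; in the kernel this role is usually played by struct kref.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of state objects still allocated, kept so the sketch can be checked. */
static int live_objects;

struct chr_state {
    int refcount;    /* stands in for struct kref in this toy model */
    int hw_present;  /* cleared once the driver is unbound */
};

/* Allocate the state by hand, not through a managed (devm) allocation. */
struct chr_state *chr_state_alloc(void)
{
    struct chr_state *st = calloc(1, sizeof(*st));

    if (!st)
        return NULL;
    st->refcount = 1;   /* reference held by the bound driver */
    st->hw_present = 1;
    live_objects++;
    return st;
}

static void chr_state_get(struct chr_state *st)
{
    st->refcount++;
}

/* Drop one reference; the last holder frees the memory. */
static void chr_state_put(struct chr_state *st)
{
    assert(st->refcount > 0);
    if (--st->refcount == 0) {
        live_objects--;
        free(st);
    }
}

/* Model of open(): the file takes its own reference to the state. */
struct chr_state *chr_open(struct chr_state *st)
{
    chr_state_get(st);
    return st;
}

/* Model of unbind: the hardware is gone, the driver drops its reference. */
void chr_unbind(struct chr_state *st)
{
    st->hw_present = 0;
    chr_state_put(st);
}

/* Model of close(): safe after unbind; returns -1 if the device was gone. */
int chr_close(struct chr_state *st)
{
    int ret = st->hw_present ? 0 : -1;

    chr_state_put(st);
    return ret;
}
```

Replaying the scenario above (allocate, open, unbind, close) leaves the state alive after the unbind and frees it only in chr_close(), which can also see that the hardware has disappeared and skip any access to it.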

P.S. In fact, the problem is broader, and it has been put forward for discussion at the upcoming Kernel Summit 2015.

Good luck with debugging!

Source: https://habr.com/ru/post/265111/

