
Finding the code of the mkdir system call

Back in my first year with Ubuntu, I wondered how the kernel of this OS works. I set out in all seriousness to figure it out, downloaded the 80-megabyte source archive, and ... that was it! I did not know where to start or where to finish. I opened random files one by one and was immediately lost. I think this has happened to every experienced Linux user. Now that I have gained some experience, I would like to share it.

In this article I will explain how I searched for the mkdir system call code.

To begin with, the mkdir function is declared in sys/stat.h. The prototype is as follows:

    /* Create a new directory named PATH, with permission bits MODE.  */
    extern int mkdir (__const char *__path, __mode_t __mode)
         __THROW __nonnull ((1));


This header file is part of the POSIX standard. Linux is almost completely POSIX-compatible, which means it must implement mkdir with exactly this signature.

But even knowing the signature, it’s not easy to find the actual system call code ...

Indeed, running ack "int mkdir" in the kernel source tree returns:

    security/inode.c
    103: static int mkdir(struct inode *dir, struct dentry *dentry, int mode)

    tools/perf/util/util.c
    4: int mkdir_p(char *path, mode_t mode)

    tools/perf/util/util.h
    259: int mkdir_p(char *path, mode_t mode);


It is clear that no signature completely matches. Where is the implementation of the mkdir function? What is the algorithm for finding implementations of system calls in the Linux kernel?

System calls do not work like ordinary functions; strictly speaking, they are not functions at all. Making one takes a little assembly code. On 32-bit x86, the system call number is placed in the EAX register (system calls are addressed by number, not by address), and the arguments go into other registers: the first in EBX, the second in ECX, the third in EDX, the fourth in ESI, the fifth in EDI. This is why the number of arguments a system call can take is small and fixed: it is bounded by the registers available (a sixth argument, when needed, is passed in EBP). Once all the values are in place, the program that wants to make the system call raises interrupt 128 (in assembly: int 0x80). The interrupt switches the processor to kernel mode and transfers control to an address that the kernel registered in advance. As you can see, system calls operate at a lower level than C functions.

The number of any system call can be found in /usr/include/asm*/unistd.h:

    #define __NR_mkdir 83
    __SYSCALL(__NR_mkdir, sys_mkdir)


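That is, the mkdir system call has number 83. (This is the x86-64 value; the numbering is per-architecture, and on 32-bit x86 mkdir is number 39.)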

If you have written user-space programs for Linux, you most likely know that system calls are normally made through C functions. Where do those come from? They are just wrappers provided by the GNU C library (glibc). Each system call has a corresponding wrapper function, and under the hood the wrapper performs the same trap into the kernel described above.

So where should we look for mkdir now? In theory, a system call could be implemented purely in assembly language, in which case no corresponding C function would exist, but that is not how Linux does it. In Linux, every system call is declared in include/linux/syscalls.h:

    asmlinkage long sys_mkdir(const char __user *pathname, int mode);


The implementation lives in the relevant part of the kernel. Here you simply need to know that mkdir belongs to the VFS (virtual file system) layer and is defined in fs/namei.c:

    SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
    {
        return sys_mkdirat(AT_FDCWD, pathname, mode);
    }


SYSCALL_DEFINE2 is one of the SYSCALL_DEFINEx family of macros, where x is the number of arguments the system call takes. The code above calls another system call, sys_mkdirat, which is also located in fs/namei.c. Note that here the system call is invoked as an ordinary function call, because the calling code is already running in kernel mode.

Here is mkdirat itself:

    SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
    {
        struct dentry *dentry;
        struct path path;
        int error;

        dentry = user_path_create(dfd, pathname, &path, 1);
        if (IS_ERR(dentry))
            return PTR_ERR(dentry);

        if (!IS_POSIXACL(path.dentry->d_inode))
            mode &= ~current_umask();
        error = security_path_mkdir(&path, dentry, mode);
        if (!error)
            error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
        done_path_create(&path, dentry);
        return error;
    }


And here it gets interesting! We meet the first checks and, once again, a handover of control to another function, vfs_mkdir, which is defined in the same fs/namei.c:

    int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
    {
        int error = may_create(dir, dentry);
        unsigned max_links = dir->i_sb->s_max_links;

        if (error)
            return error;

        if (!dir->i_op->mkdir)
            return -EPERM;

        mode &= (S_IRWXUGO|S_ISVTX);
        error = security_inode_mkdir(dir, dentry, mode);
        if (error)
            return error;

        if (max_links && dir->i_nlink >= max_links)
            return -EMLINK;

        error = dir->i_op->mkdir(dir, dentry, mode);
        if (!error)
            fsnotify_mkdir(dir, dentry);
        return error;
    }


More checks, more handovers of control. Linux is a heavily layered system in which responsibilities are spread across different parts, so it is not surprising that the code above keeps delegating. The last piece of code calls dir->i_op->mkdir(dir, dentry, mode). Follow the trail! dir is of type inode *. From the definition of the inode structure we learn that the i_op pointer is of type inode_operations *. That structure holds pointers to the functions implementing the operations that can be performed on the inode, and the implementations differ between file systems. In other words, depending on which file system dir belongs to, its inode_operations structure will point to that file system's implementations.

For ext4, for example, we find the mkdir implementation in fs/ext4/namei.c.

And here, at last, is the code we were looking for!
    static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
    {
        handle_t *handle;
        struct inode *inode;
        struct buffer_head *dir_block = NULL;
        struct ext4_dir_entry_2 *de;
        struct ext4_dir_entry_tail *t;
        unsigned int blocksize = dir->i_sb->s_blocksize;
        int csum_size = 0;
        int err, retries = 0;

        if (EXT4_HAS_RO_COMPAT_FEATURE(dir->i_sb,
                                       EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
            csum_size = sizeof(struct ext4_dir_entry_tail);

        if (EXT4_DIR_LINK_MAX(dir))
            return -EMLINK;

        dquot_initialize(dir);

    retry:
        handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
                                    EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
                                    EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
        if (IS_ERR(handle))
            return PTR_ERR(handle);

        if (IS_DIRSYNC(dir))
            ext4_handle_sync(handle);

        inode = ext4_new_inode(handle, dir, S_IFDIR | mode,
                               &dentry->d_name, 0, NULL);
        err = PTR_ERR(inode);
        if (IS_ERR(inode))
            goto out_stop;

        inode->i_op = &ext4_dir_inode_operations;
        inode->i_fop = &ext4_dir_operations;
        inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
        if (!(dir_block = ext4_bread(handle, inode, 0, 1, &err))) {
            if (!err) {
                err = -EIO;
                ext4_error(inode->i_sb,
                           "Directory hole detected on inode %lu\n",
                           inode->i_ino);
            }
            goto out_clear_inode;
        }
        BUFFER_TRACE(dir_block, "get_write_access");
        err = ext4_journal_get_write_access(handle, dir_block);
        if (err)
            goto out_clear_inode;
        de = (struct ext4_dir_entry_2 *) dir_block->b_data;
        de->inode = cpu_to_le32(inode->i_ino);
        de->name_len = 1;
        de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len),
                                           blocksize);
        strcpy(de->name, ".");
        ext4_set_de_type(dir->i_sb, de, S_IFDIR);
        de = ext4_next_entry(de, blocksize);
        de->inode = cpu_to_le32(dir->i_ino);
        de->rec_len = ext4_rec_len_to_disk(blocksize -
                                           (csum_size + EXT4_DIR_REC_LEN(1)),
                                           blocksize);
        de->name_len = 2;
        strcpy(de->name, "..");
        ext4_set_de_type(dir->i_sb, de, S_IFDIR);
        set_nlink(inode, 2);

        if (csum_size) {
            t = EXT4_DIRENT_TAIL(dir_block->b_data, blocksize);
            initialize_dirent_tail(t, blocksize);
        }

        BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata");
        err = ext4_handle_dirty_dirent_node(handle, inode, dir_block);
        if (err)
            goto out_clear_inode;
        set_buffer_verified(dir_block);
        err = ext4_mark_inode_dirty(handle, inode);
        if (!err)
            err = ext4_add_entry(handle, dentry, inode);
        if (err) {
    out_clear_inode:
            clear_nlink(inode);
            unlock_new_inode(inode);
            ext4_mark_inode_dirty(handle, inode);
            iput(inode);
            goto out_stop;
        }
        ext4_inc_count(handle, dir);
        ext4_update_dx_flag(dir);
        err = ext4_mark_inode_dirty(handle, dir);
        if (err)
            goto out_clear_inode;
        unlock_new_inode(inode);
        d_instantiate(dentry, inode);
    out_stop:
        brelse(dir_block);
        ext4_journal_stop(handle);
        if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
            goto retry;
        return err;
    }


Understanding the work of the kernel is a useful skill in the arsenal of any Linux user. I hope this article was also helpful!

Resources:
Linux system call table
Linux sources 3.6
Discussion on unix.stackexchange.com

Source: https://habr.com/ru/post/155645/

