sys/linux: enhanced descs for io_uring #1926
The code does not compile due to this, which will be addressed shortly.
I missed that some offsets are dynamic and depend on size. For check_add_overflow/struct_size/etc., I think we should ignore the possibility of overflows and just open-code the operations they do. First, we are a fuzzer, and if we do more nonsense, nothing bad happens. Maybe the opposite: the more nonsense, the better! :) For the static offsets, I was thinking about calling io_uring_create inside of syz_io_uring_submit and using the values it returns rather than hardcoding them in the executor. These offsets may change across kernel versions, and this way there is less logic in the executor.
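A minimal sketch of what open-coding those helpers could look like. The function names are hypothetical and are not taken from the executor; the arithmetic deliberately has no overflow checks, so an overflowing size simply wraps around.

```c
#include <stddef.h>

// Hypothetical helper: unchecked equivalent of the kernel's struct_size(),
// i.e. the header bytes plus n trailing elements. An overflow wraps around,
// which is acceptable for a fuzzer.
static size_t open_coded_struct_size(size_t header_size, size_t elem_size, size_t n)
{
	return header_size + elem_size * n;
}

// Hypothetical helper: unchecked equivalent of check_add_overflow().
static size_t open_coded_add(size_t a, size_t b)
{
	return a + b;
}
```

For example, a 64-byte header followed by four 16-byte entries gives 128 bytes.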
FTR, the array offset calculation in the kernel seems wrong to me:
Yes, and for SMP and cache-line alignment, I think we should assume the kernel is SMP, maybe with a comment about this assumption. First, non-SMP kernels are quite rare today (maybe only on the smallest ARM boards) and are definitely not something we care about right now. Second, figuring out whether the kernel was built as SMP or not may be tricky and can only be done dynamically.
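As an illustration of the SMP assumption, the cache-line rounding could be sketched as follows. The 64-byte cache line and both function names are assumptions made for this sketch; the real value depends on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

// Assumption: an SMP kernel with 64-byte cache lines. Other architectures
// may use a different SMP_CACHE_BYTES value.
#define ASSUMED_SMP_CACHE_BYTES 64

// Round off up to the next multiple of align (align must be a power of two).
static size_t align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

// Hypothetical: start of an index array placed after a header and
// cq_entries completion entries, rounded up to a cache line.
static size_t sq_array_offset(size_t header_size, size_t cqe_size, size_t cq_entries)
{
	return align_up(header_size + cqe_size * cq_entries, ASSUMED_SMP_CACHE_BYTES);
}
```

A comment next to the constant is the natural place to record that the executor assumes an SMP kernel.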
The offset calculation does not look too complex. I used a test program for experimentation:
I see 2 options:
Which one is better?... Hard to say. Option 2 has fewer hard-coded things that duplicate kernel logic and can break in the future (but still some), while option 1 is simpler and faster...
There are a few remaining TODOs. Please finalize them, and we will do the final review passes.
One additional thing we need, since this is a tricky subsystem and we use custom pseudo-syscalls, is a test program.
OK, so we have syz_io_uring_complete return an fd in some cases.
The test for the new descriptions and the pseudo-calls is now added. It indeed helped, as I found a bug in the implementation of one of the pseudo-calls.
While writing the mmap calls' arguments in the tests, something drew my attention. In the current state of the descriptions, the length arg for the On the other hand, if we want the fuzzer to be able to utilize the ring in full, we may want to compute the size for the memory region we want to map (as in here) and call the mmap with that size (as in here). I cannot think of a way to achieve the exact computation without another pseudo-call. One approach might be to have Another approach might be to set lower and upper bounds for
With the TODO re detecting supported syscalls this looks good to me.
Introduced the pseudo-call "syz_io_uring_put_sqes_on_ring()" for writing submission queue entries (sqes) on sq_ring, which is obtained by mmap'ing with the offsets returned from io_uring_setup(). Added descriptions for the io_uring_register operations that were missing earlier. Made miscellaneous changes to adapt the descriptions to updates in the io_uring subsystem.
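The idea of placing an sqe on the ring can be sketched with a simplified, self-contained model. The struct and function below are hypothetical stand-ins and not the executor's code: the real regions are mmap'ed and located through the offsets from io_uring_setup(), and the sqe size of 64 bytes is assumed here. `__atomic_store_n` is a GCC/Clang builtin.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SQE_SIZE 64
#define TOY_ENTRIES 4 // must be a power of two

// Hypothetical, simplified stand-in for the mmap'ed submission ring.
struct toy_sq {
	uint32_t tail; // next free slot, monotonically increasing
	uint32_t ring_mask; // TOY_ENTRIES - 1
	uint32_t array[TOY_ENTRIES]; // indices into sqes[]
	uint8_t sqes[TOY_ENTRIES][TOY_SQE_SIZE];
};

// Copy one sqe into the slot selected by the masked tail, publish its
// index in the array, then advance the tail with release ordering so a
// consumer observes the entry before it observes the new tail.
static uint32_t toy_put_sqe(struct toy_sq* sq, const void* sqe)
{
	uint32_t idx = sq->tail & sq->ring_mask;
	memcpy(sq->sqes[idx], sqe, TOY_SQE_SIZE);
	sq->array[idx] = idx;
	__atomic_store_n(&sq->tail, sq->tail + 1, __ATOMIC_RELEASE);
	return idx;
}
```

Masking the tail rather than resetting it keeps the counter monotonic while the slot index wraps around the ring.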
addr and len are for the buffer located at buf_index
As a result, IOSQE_BUFFER_SELECT_BIT is included in the iosqe_flags.
This is required with the fix in io_uring kernel code. https://lore.kernel.org/io-uring/CACT4Y+bgTCMXi3eU7xV+W0ZZNceZFUWRTkngojdr0G_yuY8w9w@mail.gmail.com/T/#t
The usage of cq_ring->flags is only for manipulating IORING_CQ_EVENTFD_DISABLED bit. This is achieved by a pseudo-syscall, which toggles the bit.
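A rough sketch of such a toggle, assuming the flags word is a 32-bit value located at a known byte offset inside the mapped ring. The helper name and its parameters are hypothetical; the bit value mirrors IORING_CQ_EVENTFD_DISABLED from the kernel's uapi header and is redefined so the sketch builds without kernel headers.

```c
#include <stdint.h>

// Same value as IORING_CQ_EVENTFD_DISABLED in include/uapi/linux/io_uring.h.
#define TOY_IORING_CQ_EVENTFD_DISABLED (1U << 0)

// Hypothetical helper: flip the bit in the 32-bit flags word found
// flags_off bytes into the mapped ring, and return the new flags value.
static uint32_t toy_cq_eventfd_toggle(void* ring_ptr, uint32_t flags_off)
{
	uint32_t* flags = (uint32_t*)((char*)ring_ptr + flags_off);
	*flags ^= TOY_IORING_CQ_EVENTFD_DISABLED;
	return *flags;
}
```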
Removed syz_io_uring_cq_eventfd_toggle() and introduced syz_io_uring_put_ring_metadata() instead. We are given the offsets of many pieces of metadata for both sq_ring and cq_ring, some of which are not supposed to be manipulated by the application. Among them, both the sq and cq flags can be changed. Both valid and invalid values might cause interesting outcomes. Use the newly introduced pseudo-syscall to manipulate the metadata randomly while also setting the flags to their special values.
Removed syz_io_uring_put_ring_metadata() and instead added a much more generic pseudo-syscall to achieve the task. This should benefit other subsystems as well.
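The commit message does not name the generic pseudo-syscall, so the following is only a guess at its shape: a copy of a few bytes to an arbitrary offset inside a memory region. The name `toy_memcpy_off` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical generic helper: write n bytes from src at byte offset off
// inside the region starting at dst. No bounds checks are done; the caller
// (here, the fuzzer) picks the offset and length.
static void* toy_memcpy_off(void* dst, size_t off, const void* src, size_t n)
{
	return memcpy((char*)dst + off, src, n);
}
```

Because it takes only a region, an offset, and a value, a helper of this shape is not tied to io_uring and could serve any subsystem that exposes shared memory.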
syz_io_uring_submit() is called with a union of sqes to reduce duplication of other parameters of the function. io_uring_sqe is templated with io_uring_sqe_t, and this template type is used to describe sqes for different ops. The organization of io_uring.txt is changed.
The files are registered using io_uring_register$IORING_REGISTER_FILES(). When IOSQE_FIXED_FILE_BIT is enabled in iosqe_flags in sqe, a variety of operations can use those registered files using the index of the file instead of fd. Changed the sqe descriptions for the eligible operations to utilize this.
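To illustrate the fd-versus-index distinction, here is a sketch using a cut-down, hypothetical sqe struct that holds only the two fields involved. The flag value corresponds to `1U << IOSQE_FIXED_FILE_BIT` in the kernel's uapi header.

```c
#include <stdint.h>

// 1U << IOSQE_FIXED_FILE_BIT in the uapi header; redefined so the sketch
// builds without kernel headers.
#define TOY_IOSQE_FIXED_FILE (1U << 0)

// Hypothetical cut-down sqe with only the fields this sketch needs.
struct toy_sqe {
	uint8_t flags;
	int32_t fd; // a real fd, or an index into the registered file table
};

// Store either a plain fd or a registered-file index, and set or clear
// the fixed-file flag so the two interpretations are not mixed up.
static void toy_sqe_set_file(struct toy_sqe* sqe, int32_t fd_or_index, int use_registered)
{
	sqe->fd = fd_or_index;
	if (use_registered)
		sqe->flags |= TOY_IOSQE_FIXED_FILE;
	else
		sqe->flags &= (uint8_t)~TOY_IOSQE_FIXED_FILE;
}
```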
…sqes A personality_id can be registered for an io_uring fd using io_uring_register$IORING_REGISTER_PERSONALITY(). This id can be utilized within sqes. This commit improves the descs for io_uring to utilize it. In addition, the descriptions for the misc field in io_uring_sqe_t are refactored, as most are shared among sqes.
io_uring_cqe.res carries the return value of operations performed through io_uring. The only operations with meaningful return values (in terms of their possible usage) are openat and openat2. The pseudo-syscall syz_io_uring_complete() is modified to account for this and return those fds. The description for sqe_user_data is split into two to distinguish openat from non-openat io_uring ops. IORING_OP_IOCTL was suggested but never supported in io_uring. Thus, the note on this is removed from the descriptions. tee() expects pipefds, and thus so does IORING_OP_TEE. The descriptions for the pipe r/w fds are written as an ordinary fd. Thus, in the description for IORING_OP_TEE, which is io_uring_sqe_tee, fd is used in the place where pipefds are expected. The note on this is removed from the descriptions.
This is not tested yet.
The changes successfully pass the sys/linux/test/io_uring test. sys/linux/io_uring.txt: sq_ring_ptr and cq_ring_ptr are really the same. Thus, they are replaced with ring_ptr. executor/common_linux.h: thanks to the io_uring test, a bug was found in the computation of the sq_array's address in syz_io_uring_submit(). Fixed. In addition, as in the descriptions, the ring pointer is renamed from {sq,cq}_ring_ptr to ring_ptr.
Used a smaller range to ease the collisions. Used comparatively unique magic numbers for openat user_data, so that a cqe is not mistaken for an openat completion when its user_data comes from some random location.
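The user_data check can be sketched roughly as below. The marker values and the helper name are invented for illustration, since the commit messages do not state the actual magic numbers; the struct mirrors the field layout of io_uring_cqe.

```c
#include <stdint.h>

// Hypothetical marker values; the real magic numbers are not given here.
#define TOY_USER_DATA_OPENAT 0x12345ULL
#define TOY_USER_DATA_OPENAT2 0x23456ULL

// Same field layout as struct io_uring_cqe.
struct toy_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

// Return the new fd when the completion belongs to an openat-style request
// that succeeded, and -1 otherwise (other ops, or a failed open).
static int toy_complete_fd(const struct toy_cqe* cqe)
{
	int is_open = cqe->user_data == TOY_USER_DATA_OPENAT ||
		      cqe->user_data == TOY_USER_DATA_OPENAT2;
	if (is_open && cqe->res >= 0)
		return cqe->res;
	return -1;
}
```

Matching on specific magic values rather than a wide range makes it less likely that a random user_data value causes res to be treated as an fd.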
Minus the linter warnings.
Thanks for your valuable reviews! The last commit fixes the io_uring test and the linter warnings. This should be ready for the final review passes @xairy
Update #533
@dvyukov @xairy