Age | Commit message (Collapse) | Author |
---|
| |
| Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| We are resetting the actual lock so we should reset this information at the same time. Previously this caused an assertion to fail in debug mode. Notes: Merged: https://github.com/ruby/ruby/pull/12981 |
| Notes: Merged: https://github.com/ruby/ruby/pull/12740 |
| When a dedicated native thread fails to create (OS can't create more threads), don't add the ruby-level thread to the thread scheduler. If it's added, then next time the thread is scheduled to run it will sleep forever, waiting for a native thread that doesn't exist to pick it up. Notes: Merged: https://github.com/ruby/ruby/pull/12441 |
| This commit adds an environment variable `RUBY_THREAD_TIMESLICE` for specifying the default thread quantum in milliseconds. You can adjust this variable to tune throughput, which is especially useful on multithreaded systems that are mixing CPU bound work and IO bound work. The default quantum remains 100ms. [Feature #20861] Co-Authored-By: John Hawthorn <john@hawthorn.email> Notes: Merged: https://github.com/ruby/ruby/pull/11981 |
| Many libraries should be loaded on the main ractor because of setting constants with unshareable objects and so on. This patch allows to call `requore` on non-main Ractors by asking the main ractor to call `require` on it. The calling ractor waits for the result of `require` from the main ractor. If the `require` call failed with some reasons, an exception objects will be deliverred from the main ractor to the calling ractor if it is copy-able. Same on `require_relative` and `require` by `autoload`. Now `Ractor.new{pp obj}` works well (the first call of `pp` requires `pp` library implicitly). [Feature #20627] Notes: Merged: https://github.com/ruby/ruby/pull/11142 |
| This code was working around a bug in the Linux kernel. It was previously possible for the kernel to place heap pages in a region where the stack was allowed to grow into, and then therefore run out of usable stack memory before RLIMIT_STACK was reached. This bug was fixed in Linux commit https://github.com/torvalds/linux/commit/c204d21f2232d875e36b8774c36ffd027dc1d606 for kernel 4.13 in 2017. Therefore, in 2024, we should be safe to delete this workaround. [Bug #20804] Notes: Merged: https://github.com/ruby/ruby/pull/11927 |
| Notes: Merged: https://github.com/ruby/ruby/pull/11835 |
| Notes: Merged: https://github.com/ruby/ruby/pull/11720 |
| |
| [Feature #20590] For better of for worse, fork(2) remain the primary provider of parallelism in Ruby programs. Even though it's frowned uppon in many circles, and a lot of literature will simply state that only async-signal safe APIs are safe to use after `fork()`, in practice most APIs work well as long as you are careful about not forking while another thread is holding a pthread mutex. One of the APIs that is known cause fork safety issues is `getaddrinfo`. If you fork while another thread is inside `getaddrinfo`, a mutex may be left locked in the child, with no way to unlock it. I think we could reduce the impact of these problem by preventing in for the most notorious and common cases, by locking around `fork(2)` and known unsafe APIs with a read-write lock. Notes: Merged: https://github.com/ruby/ruby/pull/10864 |
| Previously under certain conditions it was possible to encounter a deadlock in the forked child process if ractor.sched.lock was held. Co-authored-by: Nathan Froyd <froydnj@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/11356 |
| `th` is gone. |
| Introduce `struct rb_thread_sched_waiting` and `timer_th.waiting` can contain other than `rb_thread_t`. |
| On the VM barrier waiting, it needs to store machine context as a GC root. Also it needs to wait for barrier synchronization correctly by `while` (for continuous barrier sync by another ractor). This is why GC marking misses are occerred on ARM machines. like: https://rubyci.s3.amazonaws.com/fedora40-arm/ruby-master/log/20240702T063002Z.fail.html.gz |
| After an upgrade to Ruby 3.3.0, I experienced reproducible production crashes of the form: [BUG] pthread_kill: No such process (ESRCH) This is the only pthread_kill call in Ruby. The result of pthread_kill was previously ignored in Ruby 3.2 and below. Checking the result was added in be1bbd5b7d40ad863ab35097765d3754726bbd54 (MaNy). I have not yet been able to create a minimal self-contained example, but it should be safe to remove the checks. |
| |
| It's really a property of the EC; each fiber (which has its own EC) also has its own asan_fake_stack_handle. [Bug #20310] |
| Introduce `rb_thread_lock_native_thread()` to allocate dedicated native thread to the current Ruby thread for M:N threads. This C API is similar to Go's `runtime.LockOSThread()`. Accepted at https://github.com/ruby/dev-meeting-log/blob/master/2023/DevMeeting-2023-08-24.md (and missed to implement on Ruby 3.3.0) |
| In a similar way to how we do it with fibers in cont.c, we need to call __sanitize_start_switch_fiber and __sanitize_finish_switch_fiber around the call to coroutine_transfer to let ASAN save & restore the fake stack pointer. When a M:N thread is exiting, we pass `to_dead` to the new coroutine_transfer0 function, so that we can pass NULL for saving the stack pointer. This signals to ASAN that the fake stack can be freed (otherwise it would be leaked) [Bug #20220] |
| |
| xfree can handle null values, so we don't need to check it. |
| Where a local variable is used as part of the stack bounds detection, it has to actually be on the stack. ASAN can put local variable on "fake stacks", however, with addresses in different memory mappings. This completely destroys the stack bounds calculation, and can lead to e.g. things not getting GC marked on the machine stack or stackoverflow checks that always fail. The __asan_addr_is_in_fake_stack helper can be used to get the _real_ stack address of such variables, and thus perform the stack size calculation properly [Bug #20001] |
| This commit changes how stack extents are calculated for both the main thread and other threads. Ruby uses the address of a local variable as part of the calculation for machine stack extents: * pthreads uses it as a lower-bound on the start of the stack, because glibc (and maybe other libcs) can store its own data on the stack before calling into user code on thread creation. * win32 uses it as an argument to VirtualQuery, which gets the extent of the memory mapping which contains the variable However, the local being used for this is actually too low (too close to the leaf function call) in both the main thread case and the new thread case. In the main thread case, we have the `INIT_STACK` macro, which is used for pthreads to set the `native_main_thread->stack_start` value. This value is correctly captured at the very top level of the program (in main.c). However, this is _not_ what's used to set the execution context machine stack (`th->ec->machine_stack.stack_start`); that gets set as part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the address of a local variable allocated _inside_ `Init_BareVM`. This is too low; we need to use a local allocated closer to the top of the program. In the new thread case, the lolcal is allocated inside `native_thread_init_stack`, which is, again, too low. In both cases, this means that we might have VALUEs lying outside the bounds of `th->ec->machine.stack_{start,end}`, which won't be marked correctly by the GC machinery. To fix this, * In the main thread case: We already have `INIT_STACK` at the right level, so just pass that local var to `ruby_thread_init_stack`. * In the new thread case: Allocate the local one level above the call to `native_thread_init_stack` in `call_thread_start_func2`. [Bug #20001] fix |
| This reverts commit 4ba8f0dc993953d3ddda6328e3ef17a2fc2cbde5. |
| This reverts commit 6185cfdf38e26026c6d38220eeca48689e54cdcf. |
| Where a local variable is used as part of the stack bounds detection, it has to actually be on the stack. ASAN can put local variable on "fake stacks", however, with addresses in different memory mappings. This completely destroys the stack bounds calculation, and can lead to e.g. things not getting GC marked on the machine stack or stackoverflow checks that always fail. The __asan_addr_is_in_fake_stack helper can be used to get the _real_ stack address of such variables, and thus perform the stack size calculation properly [Bug #20001] |
| The implementation of `native_thread_init_stack` for the various threading models can use the address of a local variable as part of the calculation of the machine stack extents: * pthreads uses it as a lower-bound on the start of the stack, because glibc (and maybe other libcs) can store its own data on the stack before calling into user code on thread creation. * win32 uses it as an argument to VirtualQuery, which gets the extent of the memory mapping which contains the variable However, the local being used for this is actually allocated _inside_ the `native_thread_init_stack` frame; that means the caller might allocate a VALUE on the stack that actually lies outside the bounds stored in machine.stack_{start,end}. A local variable from one level above the topmost frame that stores VALUEs on the stack must be drilled down into the call to `native_thread_init_stack` to be used in the calculation. This probably doesn't _really_ matter for the win32 case (they'll be in the same memory mapping so VirtualQuery should return the same thing), but definitely could matter for the pthreads case. [Bug #20001] |
| |
| |
| `hrrel / RB_HRTIME_PER_MSEC` floor the timeout value and it can return wrong value by `Mutex#sleep` (return Integer even if it should return nil (timeout'ed)). This patch ceil the value and the issue was solved. |
| * Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD) * Include sys/event.h header check for macros, and include sys/event.h when present * Rename epoll_fd to more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue * MAP_STACK is not available on macOS so conditionall apply it to mmap flags * Set fd to close on exec * Log debug messages specific to kqueue and epoll on creation * close_invalidate raises an error for the kqueue fd on child process fork. It's unclear rn if that's a bug, or if it's kqueue specific behavior Use kq with rb_thread_wait_for_single_fd * Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` so that kqueue could be properly applied * Clean up kqueue specific code and make sure only flags that were actually set are removed (or an error is raised) * Also handle kevent specific errnos, since most don't apply from epoll to kqueue * Use the more platform standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use * We're making changes in this file anyways - may as well fix a couple spelling mistakes while here Make sure FD_CLOEXEC carries over in dup * Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't and fails in assert_close_on_exec |
| `sched->lock_owner` can be non-NULL at fork because the timer thread can acquire the lock while forking. `lock_owner` information is for debugging, so we only need to clear it at fork. I hope this patch fixes the following assertion failure: ``` thread_pthread.c:354:thread_sched_lock_:sched->lock_owner == NULL ``` |
| |
| This reverts commit ad54fbf281ca1935e79f4df1460b0106ba76761e. |
| [Bug #20019] This fixes GVL instrumentation in three locations it was missing: - Suspending when blocking on a Ractor - Suspending when doing a coroutine transfer from an M:N thread - Resuming after an M:N thread starts Co-authored-by: Matthew Draper <matthew@trebex.net> |
| Followup: https://github.com/ruby/ruby/pull/9029 [Bug #20019] Some events still weren't triggered from the right place. The test suite was also improved a bit more. |
| This entirely changes how it is tested. Rather than to use counters we now record the timeline of events with associated threads which makes it much easier to assert that certains events are only preceded by a specific event, and makes it much easier to debug unexpected timelines. Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com> Co-Authored-By: JP Camara <jp@jpcamara.com> Co-Authored-By: John Hawthorn <john@hawthorn.email> |
| Context: https://github.com/ivoanjo/gvl-tracing/pull/4 Some hooks may want to collect data on a per thread basis. Right now the only way to identify the concerned thread is to use `rb_nativethread_self()` or similar, but even then because of the thread cache or MaNy, two distinct Ruby threads may report the same native thread id. By passing `thread->self`, hooks can use it as a key to store the metadata. NB: Most hooks are executed outside the GVL, so such data collection need to use a thread-safe data-structure, and shouldn't use the reference in other ways from inside the hook. They must also either pin that value or handle compaction. |
| |
| If `RUBY_MN_THREADS=1` is given, this patch shows `+MN` in `RUBY_DESCRIPTION` like: ``` $ RUBY_MN_THREADS=1 ./miniruby --yjit -v ruby 3.3.0dev (2023-10-17T04:10:14Z master 908f8fffa2) +YJIT +MN [x86_64-linux] ``` Before this patch, a warning is displayed if `$VERBOSE` is given. However it can make troubles with tests (with `$VERBOSE`), do not show any warning with a MN threads configuration. |
| |
| to avoid deadlock ```ruby r = Ractor.new do obj = Thread.new{} Ractor.yield obj rescue => e e.message end p r.take ``` ``` (lldb) bt * thread #1, name = 'miniruby', stop reason = signal SIGSTOP * frame #0: 0x0000ffff44881410 libpthread.so.0`__lll_lock_wait + 88 frame #1: 0x0000ffff4487a078 libpthread.so.0`__pthread_mutex_lock + 232 frame #2: 0x0000aaab617c0980 miniruby`rb_native_mutex_lock(lock=<unavailable>) at thread_pthread.c:109:14 frame #3: 0x0000aaab617c1d58 miniruby`ubf_event_waiting [inlined] thread_sched_lock_(th=0x0000aaab9df82980, file=<unavailable>, line=46, sched=0x0000aaab9dec79b8) at thread_pthread.c:351:5 frame #4: 0x0000aaab617c1d50 miniruby`ubf_event_waiting(ptr=0x0000aaab9df82980) at thread_pthread_mn.c:46:5 frame #5: 0x0000aaab617c6020 miniruby`rb_threadptr_interrupt [inlined] rb_threadptr_interrupt_common(trap=0, th=0x0000aaab9df82980) at thread.c:352:25 frame #6: 0x0000aaab617c5fec miniruby`rb_threadptr_interrupt(th=0x0000aaab9df82980) at thread.c:365:5 frame #7: 0x0000aaab617379b0 miniruby`rb_ractor_terminate_all at ractor.c:2364:13 frame #8: 0x0000aaab6173797c miniruby`rb_ractor_terminate_all at ractor.c:2383:17 frame #9: 0x0000aaab61737958 miniruby`rb_ractor_terminate_all [inlined] ractor_terminal_interrupt_all(vm=0x0000aaab9dea3320) at ractor.c:2375:1 frame #10: 0x0000aaab61737950 miniruby`rb_ractor_terminate_all at ractor.c:2424:13 frame #11: 0x0000aaab6164f108 miniruby`rb_ec_cleanup(ec=0x0000aaab9dea5900, ex=RUBY_TAG_NONE) at eval.c:239:9 frame #12: 0x0000aaab6164fa3c miniruby`ruby_run_node(n=0x0000ffff417ed178) at eval.c:328:12 frame #13: 0x0000aaab615a5ab0 miniruby`main at main.c:39:12 frame #14: 0x0000aaab615a5a98 miniruby`main(argc=<unavailable>, argv=<unavailable>) at main.c:58:12 frame #15: 0x0000ffff44714b2c libc.so.6`__libc_start_main + 228 frame #16: 0x0000aaab615a5b0c miniruby`_start + 52 (lldb) thread select 3 * thread #3, name = 'bootstraptest.*', stop reason = signal SIGSTOP frame #0: 0x0000ffff448813ec libpthread.so.0`__lll_lock_wait + 52 libpthread.so.0`__lll_lock_wait: -> 0xffff448813ec <+52>: svc #0 0xffff448813f0 <+56>: eor w20, w20, #0x80 0xffff448813f4 <+60>: sxtw x20, w20 0xffff448813f8 <+64>: b 0xffff44881414 ; <+92> (lldb) bt * thread #3, name = 'bootstraptest.*', stop reason = signal SIGSTOP * frame #0: 0x0000ffff448813ec libpthread.so.0`__lll_lock_wait + 52 frame #1: 0x0000ffff4487a078 libpthread.so.0`__pthread_mutex_lock + 232 frame #2: 0x0000aaab617c0980 miniruby`rb_native_mutex_lock(lock=<unavailable>) at thread_pthread.c:109:14 frame #3: 0x0000aaab61823d68 miniruby`rb_vm_lock_enter_body [inlined] vm_lock_enter(no_barrier=false, lev=0x0000ffff215bfbe4, locked=false, vm=0x0000aaab9dea3320, cr=0x0000aaab9dec7890) at vm_sync.c:57:9 frame #4: 0x0000aaab61823d60 miniruby`rb_vm_lock_enter_body(lev=0x0000ffff215bfbe4) at vm_sync.c:119:9 frame #5: 0x0000aaab617c1b30 miniruby`thread_sched_setup_running_threads [inlined] rb_vm_lock_enter(file=<unavailable>, line=597, lev=0x0000ffff215bfbe4) at vm_sync.h:75:9 frame #6: 0x0000aaab617c1b14 miniruby`thread_sched_setup_running_threads(vm=0x0000aaab9dea3320, add_th=0x0000aaab9df82980, del_th=<unavailable>, add_timeslice_th=0x0000000000000000, cr=<unavailable>, sched=<unavailable>, sched=<unavailable>) at thread_pthread.c:597:9 frame #7: 0x0000aaab617c29b4 miniruby`thread_sched_wait_running_turn at thread_pthread.c:614:5 frame #8: 0x0000aaab617c298c miniruby`thread_sched_wait_running_turn(sched=0x0000aaab9dec79b8, th=0x0000aaab9df82980, can_direct_transfer=true) at thread_pthread.c:868:9 frame #9: 0x0000aaab617c6f0c miniruby`thread_sched_wait_events(sched=0x0000aaab9dec79b8, th=0x0000aaab9df82980, fd=<unavailable>, events=<unavailable>, rel=<unavailable>) at thread_pthread_mn.c:90:17 frame #10: 0x0000aaab617c7354 miniruby`rb_thread_terminate_all at thread_pthread.c:3248:13 frame #11: 0x0000aaab617c733c miniruby`rb_thread_terminate_all(th=0x0000aaab9df82980) at thread.c:466:13 frame #12: 0x0000aaab617c7a64 miniruby`thread_start_func_2(th=0x0000aaab9df82980, stack_start=<unavailable>) at thread.c:713:9 frame #13: 0x0000aaab617c7d1c miniruby`co_start [inlined] call_thread_start_func_2(th=0x0000aaab9df82980) at thread_pthread.c:2165:5 frame #14: 0x0000aaab617c7cd0 miniruby`co_start(from=<unavailable>, self=0x0000aaab9df0f760) at thread_pthread_mn.c:421:9 ``` |
| s390x (Ubuntu) still fails tests with 62dfaeec2c. |
| * on `__EMSCRIPTEN__` provides epoll* declarations, but no implementations. * on `NON_SCALAR_THREAD_ID`, now we can not debug issues on x390s/Ubuntu so skip it. x390s/RHEL works fine, so I think we can remove second limitation but I could not login to it so it seems hard to debug now. |
| With M:N thread scheduler, the native thread (NT) related resources should be freed when the NT is no longer needed. So the calling `native_thread_destroy()` at the end of `is will be freed when `thread_cleanup_func()` (at the end of Ruby thread) is not correct timing. Call it when the corresponding Ruby thread is collected. |
| |
| |
| This patch introduce M:N thread scheduler for Ractor system. In general, M:N thread scheduler employs N native threads (OS threads) to manage M user-level threads (Ruby threads in this case). On the Ruby interpreter, 1 native thread is provided for 1 Ractor and all Ruby threads are managed by the native thread. From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means 1 Ruby thread has 1 native thread. M:N scheduler change this strategy. Because of compatibility issue (and stableness issue of the implementation) main Ractor doesn't use M:N scheduler on default. On the other words, threads on the main Ractor will be managed with 1:1 thread scheduler. There are additional settings by environment variables: `RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor. Note that non-main ractors use the M:N scheduler without this configuration. With this configuration, single ractor applications run threads on M:1 thread scheduler (green threads, user-level threads). `RUBY_MAX_CPU=n` specifies maximum number of native threads for M:N scheduler (default: 8). This patch will be reverted soon if non-easy issues are found. [Bug #19842] |