Age | Commit message (Collapse) | Author |
---|
| |
| Before this patch, the MN scheduler waits for the IO with the following steps: 1. `poll(fd, timeout=0)` to check fd is ready or not. 2. if fd is not ready, waits with MN thread scheduler 3. call `func` to issue the blocking I/O call The advantage of advanced `poll()` is we can wait for the IO ready for any fds. However `poll()` becomes overhead for already ready fds. This patch changes the steps like: 1. call `func` to issue the blocking I/O call 2. if the `func` returns `EWOULDBLOCK` the fd is `O_NONBLOCK` and we need to wait for fd is ready so that waits with MN thread scheduler. In this case, we can wait only for `O_NONBLOCK` fds. Otherwise it waits with blocking operations such as `read()` system call. However we don't need to call `poll()` to check fd is ready in advance. With this patch we can observe performance improvement on microbenchmark which repeats blocking I/O (not `O_NONBLOCK` fd) with and without MN thread scheduler. ```ruby require 'benchmark' f = open('/dev/null', 'w') f.sync = true TN = 1 N = 1_000_000 / TN Benchmark.bm{|x| x.report{ TN.times.map{ Thread.new{ N.times{f.print '.'} } }.each(&:join) } } __END__ TN = 1 user system total real ruby32 0.393966 0.101122 0.495088 ( 0.495235) ruby33 0.493963 0.089521 0.583484 ( 0.584091) ruby33+MN 0.639333 0.200843 0.840176 ( 0.840291) <- Slow this+MN 0.512231 0.099091 0.611322 ( 0.611074) <- Good ``` |
| Assrsion was `events == RB_WAITFD_IN || events == RB_WAITFD_OUT` but it should accept `RB_WAITFD_IN | RB_WAITFD_OUT`. |
| If the Fiber is nonblocking mode, fiber scheduler needs to handle IO events. |
| Introduce `thread_io_wait_events()` to make 1 function to call `thread_sched_wait_events()`. In ``thread_io_wait_events()`, manipulate `waiting_fd` to raise an exception when closing the IO correctly. |
| * When we have the thread already, it saves a lookup * `event_wait`, not `kq` Clean up the `thread_sched_wait_events_timeval` calls * By handling the PTHREAD check inside the function, all the other code can become much simpler and just call the function directly without additional checks |
| * Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD) * Include sys/event.h header check for macros, and include sys/event.h when present * Rename epoll_fd to more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue * MAP_STACK is not available on macOS so conditionall apply it to mmap flags * Set fd to close on exec * Log debug messages specific to kqueue and epoll on creation * close_invalidate raises an error for the kqueue fd on child process fork. It's unclear rn if that's a bug, or if it's kqueue specific behavior Use kq with rb_thread_wait_for_single_fd * Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` so that kqueue could be properly applied * Clean up kqueue specific code and make sure only flags that were actually set are removed (or an error is raised) * Also handle kevent specific errnos, since most don't apply from epoll to kqueue * Use the more platform standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use * We're making changes in this file anyways - may as well fix a couple spelling mistakes while here Make sure FD_CLOEXEC carries over in dup * Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't and fails in assert_close_on_exec |
| `thread_sched_wait_events()` suspend the thread until the target fd is ready. Howver, other threads can close the target fd and suspended thread should be awake. To support it, setup `waiting_fd` before `thread_sched_wait_events()`. `rb_thread_io_wake_pending_closer()` should be called before `RUBY_VM_CHECK_INTS_BLOCKING()` because it can return this function. This patch introduces additional overhead (setup/cleanup `waiting_fd`) and maybe we can reduce the overhead. |
| Our current implementation of rb_postponed_job_register suffers from some safety issues that can lead to interpreter crashes (see bug #1991). Essentially, the issue is that jobs can be called with the wrong arguments. We made two attempts to fix this whilst keeping the promised semantics, but: * The first one involved masking/unmasking when flushing jobs, which was believed to be too expensive * The second one involved a lock-free, multi-producer, single-consumer ringbuffer, which was too complex The critical insight behind this third solution is that essentially the only user of these APIs are a) internal, or b) profiling gems. For a), none of the usages actually require variable data; they will work just fine with the preregistration interface. For b), generally profiling gems only call a single callback with a single piece of data (which is actually usually just zero) for the life of the program. The ringbuffer is complex because it needs to support multi-word inserts of job & data (which can't be atomic); but nobody actually even needs that functionality, really. So, this comit: * Introduces a pre-registration API for jobs, with a GVL-requiring rb_postponed_job_prereigster, which returns a handle which can be used with an async-signal-safe rb_postponed_job_trigger. * Deprecates rb_postponed_job_register (and re-implements it on top of the preregister function for compatability) * Moves all the internal usages of postponed job register pre-registration |
| This patch introduces thread specific storage APIs for tools which use `rb_internal_thread_event_hook` APIs. * `rb_internal_thread_specific_key_create()` to create a tool specific thread local storage key and allocate the storage if not available. * `rb_internal_thread_specific_set()` sets a data to thread and tool specific storage. * `rb_internal_thread_specific_get()` gets a data in thread and tool specific storage. Note that `rb_internal_thread_specific_get|set(thread_val, key)` can be called without GVL and safe for async signal and safe for multi-threading (native threads). So you can call it in any internal thread event hooks. Further more you can call it from other native threads. Of course `thread_val` should be living while accessing the data from this function. Note that you should not forget to clean up the set data. |
| This entirely changes how it is tested. Rather than to use counters we now record the timeline of events with associated threads which makes it much easier to assert that certains events are only preceded by a specific event, and makes it much easier to debug unexpected timelines. Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com> Co-Authored-By: JP Camara <jp@jpcamara.com> Co-Authored-By: John Hawthorn <john@hawthorn.email> |
| These are rare but embedding them is trivial and make them fit in a 40B slot. |
| Context: https://github.com/ivoanjo/gvl-tracing/pull/4 Some hooks may want to collect data on a per thread basis. Right now the only way to identify the concerned thread is to use `rb_nativethread_self()` or similar, but even then because of the thread cache or MaNy, two distinct Ruby threads may report the same native thread id. By passing `thread->self`, hooks can use it as a key to store the metadata. NB: Most hooks are executed outside the GVL, so such data collection need to use a thread-safe data-structure, and shouldn't use the reference in other ways from inside the hook. They must also either pin that value or handle compaction. |
| With M:N thread scheduler, the native thread (NT) related resources should be freed when the NT is no longer needed. So the calling `native_thread_destroy()` at the end of `is will be freed when `thread_cleanup_func()` (at the end of Ruby thread) is not correct timing. Call it when the corresponding Ruby thread is collected. |
| This patch introduce M:N thread scheduler for Ractor system. In general, M:N thread scheduler employs N native threads (OS threads) to manage M user-level threads (Ruby threads in this case). On the Ruby interpreter, 1 native thread is provided for 1 Ractor and all Ruby threads are managed by the native thread. From Ruby 1.9, the interpreter uses 1:1 thread scheduler which means 1 Ruby thread has 1 native thread. M:N scheduler change this strategy. Because of compatibility issue (and stableness issue of the implementation) main Ractor doesn't use M:N scheduler on default. On the other words, threads on the main Ractor will be managed with 1:1 thread scheduler. There are additional settings by environment variables: `RUBY_MN_THREADS=1` enables M:N thread scheduler on the main ractor. Note that non-main ractors use the M:N scheduler without this configuration. With this configuration, single ractor applications run threads on M:1 thread scheduler (green threads, user-level threads). `RUBY_MAX_CPU=n` specifies maximum number of native threads for M:N scheduler (default: 8). This patch will be reverted soon if non-easy issues are found. [Bug #19842] |
| When interrupt behavior is configured for all possible exceptions using 'Exception', there's no need to iterate the pending exception's ancestors for hash lookups. More significantly, by storing the catch-all timing symbol directly in the mask stack, we can skip allocating the hash we would otherwise need. Notes: Merged: https://github.com/ruby/ruby/pull/8278 |
| If the supplied hash is already frozen and compare-by-identity, we can use it directly (still checking its contents are valid symbols), without making a new copy. Notes: Merged: https://github.com/ruby/ruby/pull/8278 |
| RARRAY_CONST_PTR now does the same things as RARRAY_CONST_PTR_TRANSIENT. Notes: Merged: https://github.com/ruby/ruby/pull/8071 |
| |
| According to the C99 specification section 7.20.3.2 paragraph 2: > If ptr is a null pointer, no action occurs. So we do not need to check that the pointer is a null pointer. Notes: Merged: https://github.com/ruby/ruby/pull/8004 |
| Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| Because a thread calling IO#close now blocks in a native condvar wait, it's possible for there to be _no_ threads left to actually handle incoming signals/ubf calls/etc. This manifested as failing tests on Solaris 10 (SPARC), because: * One thread called IO#close, which sent a SIGVTALRM to the other thread to interrupt it, and then waited on the condvar to be notified that the reading thread was done. * One thread was calling IO#read, but it hadn't yet reached the actual call to select(2) when the SIGVTALRM arrived, so it never unblocked itself. This results in a deadlock. The fix is to use a real Ruby mutex for the close lock; that way, the closing thread goes into sigwait-sleep and can keep trying to interrupt the select(2) thread. See the discussion in: https://github.com/ruby/ruby/pull/7865/ Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| Please consider using misc/expand_tabs.rb as a pre-commit hook. |
| When one thread is closing a file descriptor whilst another thread is concurrently reading it, we need to wait for the reading thread to be done with it to prevent a potential EBADF (or, worse, file descriptor reuse). At the moment, that is done by keeping a list of threads still using the file descriptor in io_close_fptr. It then continually calls rb_thread_schedule() in fptr_finalize_flush until said list is empty. That busy-looping seems to behave rather poorly on some OS's, particulary FreeBSD. It can cause the TestIO#test_race_gets_and_close test to fail (even with its very long 200 second timeout) because the closing thread starves out the using thread. To fix that, I introduce the concept of struct rb_io_close_wait_list; a list of threads still using a file descriptor that we want to close. We call `rb_notify_fd_close` to let the thread scheduler know we're closing a FD, which fills the list with threads. Then, we call rb_notify_fd_close_wait which will block the thread until all of the still-using threads are done. This is implemented with a condition variable sleep, so no busy-looping is required. Notes: Merged: https://github.com/ruby/ruby/pull/7865 |
| This patch fixes a potential busy-loop in the thread scheduler. If there are two threads, the main thread (where Ruby signal handlers must run) and a sleeping thread, it is possible for the following sequence of events to occur: * The sleeping thread is in native_sleep -> sigwait_sleep A signal * arives, kicking this thread out of rb_sigwait_sleep The sleeping * thread calls THREAD_BLOCKING_END and eventually thread_sched_to_running_common * the sleeping thread writes into the sigwait_fd pipe by calling rb_thread_wakeup_timer_thread * the sleeping thread re-loops around in native_sleep() because the desired sleep time has not actually yet expired * that calls rb_sigwait_sleep again the ppoll() in rb_sigwait_sleep * immediately returns because of the byte written into the sigwait_fd by rb_thread_wakeup_timer_thread * that wakes the thread up again and kicks the whole cycle off again. Such a loop can only be broken by the main thread waking up and handling the signal, such that ubf_threads_empty() below becomes true again; however this loop can actually keep things so busy (and cause so much contention on the main thread's interrupt_lock) that the main thread doesn't deal with the signal for many seconds. This seems particuarly likely on FreeBSD 13. (the cycle can also be broken by the sleeping thread finally elapsing its desired sleep time). The fix for _this_ loop is to only wakeup the timer thrad in thread_sched_to_running_common if the current thread is not itself the sigwait thread. An almost identical loop also happens in the same circumstances because the call to check_signals_nogvl (through sigwait_timeout) in rb_sigwait_sleep returns true if there is any pending signal for the main thread to handle. That then causes rb_sigwait_sleep to skip over sleeping entirely. This is unnescessary and counterproductive, I believe; if the main thread needs to be woken up that is done inline in check_signals_nogvl anyway. See https://bugs.ruby-lang.org/issues/19680 Notes: Merged: https://github.com/ruby/ruby/pull/7864 |
| Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| * Remove unused SIGCHLD handling. * Remove unused `init_sigchld`. * Remove unnecessary `#define RUBY_SIGCHLD (0)`. * Remove unused `SIGCHLD_LOSSY`. Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| because of 9720f5ac894566ade2aabcf9adea0a3235de1353 http://rubyci.s3.amazonaws.com/solaris11-sunc/ruby-master/log/20230403T130011Z.fail.html.gz ``` 1) Failure: TestThread#test_signal_at_join [/export/home/chkbuild/chkbuild-sunc/tmp/build/20230403T130011Z/ruby/test/ruby/test_thread.rb:1488]: Exception raised: <#<fatal:"No live threads left. Deadlock?\n1 threads, 1 sleeps current:0x00891288 main thread:0x00891288\n* #<Thread:0xfef89a18 sleep_forever>\n rb_thread_t:0x00891288 native:0x00000001 int:0\n \n">> Backtrace: -:30:in `join' -:30:in `block (3 levels) in <main>' -:21:in `times' -:21:in `block (2 levels) in <main>'. ``` The mechanism: * Main thread (M) calls `Thread#join` * M: calls `sleep_forever()` * M: set `th->status = THREAD_STOPPED_FOREVER` * M: do `checkints` * M: handle a trap handler with `th->status = THREAD_RUNNABLE` * M: thread switch at the end of the trap handler * Another thread (T) will process `Thread#kill` by M. * T: `rb_threadptr_join_list_wakeup()` at the end of T tris to wakeup M, but M's state is runnable because M is handling trap handler and just ignore the waking up and terminate T$a * T: switch to M. * M: after the trap handler, reset `th->status = THREAD_STOPPED_FOREVER` and check deadlock -> Deadlock because only M is living. To avoid such situation, add new sleep flags `SLEEP_ALLOW_SPURIOUS` and `SLEEP_NO_CHECKINTS` to skip any check ints. BTW this is instentional to leave second `vm_check_ints_blocking()` without checking `SLEEP_NO_CHECKINTS` because `SLEEP_ALLOW_SPURIOUS` should be specified with `SLEEP_NO_CHECKINTS` and skipping this checkints can skip any interrupts. Notes: Merged: https://github.com/ruby/ruby/pull/7647 |
| because it does same thing. Notes: Merged: https://github.com/ruby/ruby/pull/7642 |
| reorder `sleep_forever()` and so on. |
| for future extension Notes: Merged: https://github.com/ruby/ruby/pull/7639 |
| because `RUBY_DEBUG_LOG()` add "\n" at the end of message. |
| so that no need to lock the ractor. Notes: Merged: https://github.com/ruby/ruby/pull/7616 |
| Notes: Merged: https://github.com/ruby/ruby/pull/7618 |
| Notes: Merged: https://github.com/ruby/ruby/pull/7465 |
| * Remove `waitpid_lock` and related code. * Remove un-necessary test. * Remove `rb_thread_sleep_interruptible` dead code. Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| * Revert "Remove special handling of `SIGCHLD`. (#7482)" This reverts commit 44a0711eab7fbc71ac2c8ff489d8c53e97a8fe75. * Revert "Remove prototypes for functions that are no longer used. (#7497)" This reverts commit 4dce12bead3bfd91fd80b5e7195f7f540ffffacb. * Revert "Remove SIGCHLD `waidpid`. (#7476)" This reverts commit 1658e7d96696a656d9bd0a0c84c82cde86914ba2. * Fix change to rjit variable name. Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| * Remove `waitpid_lock` and related code. * Remove un-necessary test. * Remove `rb_thread_sleep_interruptible` dead code. Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| |
| |
| Notes: Merged: https://github.com/ruby/ruby/pull/7462 |
| Notes: Merged: https://github.com/ruby/ruby/pull/7462 |
| It's possible (but very rare) to have a race condition between setting `mutex->fiber = NULL` and `thread_mutex_remove(th, mutex)` which results in the following bug: ``` [BUG] invalid keeping_mutexes: Attempt to unlock a mutex which is not locked ``` Fixes <https://bugs.ruby-lang.org/issues/19480>. Notes: Merged-By: ioquatix <samuel@codeotaku.com> |
| Notes: Merged: https://github.com/ruby/ruby/pull/7459 |
| ``` 1) Failure: TestThreadInstrumentation#test_thread_instrumentation [/tmp/ruby/src/trunk-repeat20-asserts/test/-ext-/thread/test_instrumentation_api.rb:33]: Call counters[4]: [3, 4, 4, 4, 0]. Expected 0 to be > 0. ``` We fire the EXIT hook after the call to `thread_sched_to_dead` which mean another thread might be running before the `EXIT` hook have been executed. Notes: Merged: https://github.com/ruby/ruby/pull/7249 |
| [Feature #19425] Notes: Merged: https://github.com/ruby/ruby/pull/7273 |
| [Bug #19415] If multiple threads attemps to load the same file concurrently it's not a circular dependency issue. So we check that the existing ThreadShield is owner by the current fiber before warning about circular dependencies. Notes: Merged: https://github.com/ruby/ruby/pull/7257 |
| This reverts commit fa49651e05a06512e18ccb2f54a7198c9ff579de. Notes: Merged: https://github.com/ruby/ruby/pull/7256 |
| [Bug #19415] If multiple threads attemps to load the same file concurrently it's not a circular dependency issue. So we check that the existing ThreadShield is owner by the current fiber before warning about circular dependencies. Notes: Merged: https://github.com/ruby/ruby/pull/7252 |
| Notes: Merged: https://github.com/ruby/ruby/pull/7162 |