From: Thomas Gleixner <tglx@linutronix.de> Date: Tue, 25 Apr 2023 20:48:56 +0200 Subject: [PATCH 1/2] posix-timers: Prevent RT livelock in itimer_delete() itimer_delete() has a retry loop when the timer is concurrently expired. On non-RT kernels this just spin-waits until the timer callback has completed. On RT kernels this is a potential livelock when the exiting task preempted the hrtimer soft interrupt. This only affects hrtimer based timers as Posix CPU timers cannot be concurrently expired. For CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y this is obviously impossible as the task cannot run task work and exit at the same time. The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n (only non-RT) is prevented because interrupts are disabled. Replace spin_unlock() with an invocation of timer_wait_running() to handle it the same way as the other retry loops in the posix timer code. Fixes: ec8f954a40da ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/all/20230425183312.862346341@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- kernel/time/posix-timers.c | 50 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 42 insertions(+), 8 deletions(-) Index: linux-6.3.0-rt11/kernel/time/posix-timers.c =================================================================== @ linux-6.3.0-rt11/kernel/time/posix-timers.c:1036 @ retry_delete: } /* - * return timer owned by the process, used by exit_itimers + * Delete a timer if it is armed, remove it from the hash and schedule it + * for RCU freeing. */ static void itimer_delete(struct k_itimer *timer) { -retry_delete: - spin_lock_irq(&timer->it_lock); + unsigned long flags; +retry_delete: + /* + * irqsave is required to make timer_wait_running() work. + */ + spin_lock_irqsave(&timer->it_lock, flags); + + /* + * Even if the timer is not longer accessible from other tasks + * it still might be armed and queued in the underlying timer + * mechanism. Worse, that timer mechanism might run the expiry + * function concurrently. + */ if (timer_delete_hook(timer) == TIMER_RETRY) { - spin_unlock_irq(&timer->it_lock); + /* + * Timer is expired concurrently, prevent livelocks + * and pointless spinning on RT. + * + * The CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y case is + * irrelevant here because obviously the exiting task + * cannot be expiring timer in task work concurrently. + * Ditto for CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n as the + * tick interrupt cannot run on this CPU because the above + * spin_lock disabled interrupts. + * + * timer_wait_running() drops timer::it_lock, which opens + * the possibility for another task to delete the timer. + * + * That's not possible here because this is invoked from + * do_exit() only for the last thread of the thread group. + * So no other task can access that timer. + */ + if (WARN_ON_ONCE(timer_wait_running(timer, &flags) != timer)) + return; + goto retry_delete; } list_del(&timer->list); - spin_unlock_irq(&timer->it_lock); + spin_unlock_irqrestore(&timer->it_lock, flags); release_posix_timer(timer, IT_ID_SET); } /* - * This is called by do_exit or de_thread, only when nobody else can - * modify the signal->posix_timers list. Yet we need sighand->siglock - * to prevent the race with /proc/pid/timers. + * Invoked from do_exit() when the last thread of a thread group exits. + * At that point no other task can access the timers of the dying + * task anymore. */ void exit_itimers(struct task_struct *tsk) { @ linux-6.3.0-rt11/kernel/time/posix-timers.c:1098 @ void exit_itimers(struct task_struct *ts if (list_empty(&tsk->signal->posix_timers)) return; + /* Protect against concurrent read via /proc/$PID/timers */ spin_lock_irq(&tsk->sighand->siglock); list_replace_init(&tsk->signal->posix_timers, &timers); spin_unlock_irq(&tsk->sighand->siglock); + /* The timers are not longer accessible via tsk::signal */ while (!list_empty(&timers)) { tmr = list_first_entry(&timers, struct k_itimer, list); itimer_delete(tmr);