Subject: fs/dcache: disable preemption on i_dir_seq's write side
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Fri Oct 20 11:29:53 2017 +0200

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

i_dir_seq is a sequence counter with a lock which is represented by the
lowest bit. The writer atomically updates the counter, which ensures that
it can be modified by only one writer at a time. The commit introducing
this change claims that the lock has been integrated into the counter for
space reasons within the inode struct.

The i_dir_seq member is within a union which it shares with a pointer.
That means that by using a seqlock_t we would have a sequence counter and
a lock without increasing the size of the data structure on 64bit, while
on 32bit it would grow by 4 bytes. With lockdep enabled the size would
grow, and on PREEMPT_RT the spinlock_t is also larger.

In order to keep this construct working on PREEMPT_RT, the writer needs to
disable preemption while obtaining the lock on the sequence counter, i.e.
while starting the write critical section. On !PREEMPT_RT the writer
already acquires an otherwise unrelated spinlock_t (dentry::d_lock) which
serves the same purpose. With preemption enabled, a high priority reader
could preempt the writer and live lock the system while waiting for the
locked bit to disappear.

Another solution would be to have a global spinlock_t which is always
acquired by the writer. The reader would then acquire the lock if the
sequence count is odd and, by doing so, force the writer out of the
critical section. The global spinlock_t could be replaced by a hashed lock
based on the address of the inode to lower the lock contention.

For now, manually disable preemption on PREEMPT_RT to avoid live locks.

Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/dcache.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
---
@ fs/dcache.c:2540 @ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-
+	/*
+	 * The caller has a spinlock_t (dentry::d_lock) acquired which disables
+	 * preemption on !PREEMPT_RT. On PREEMPT_RT the lock does not disable
+	 * preemption and it has to be done explicitly.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@ fs/dcache.c:2558 @ static inline unsigned start_dir_add(str
 static inline void end_dir_add(struct inode *dir, unsigned n)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static void d_wait_lookup(struct dentry *dentry)
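
For illustration only (not part of the patch), the lock-in-lowest-bit scheme
and the live-lock window described above can be sketched outside the kernel
with GCC atomic builtins. This is a minimal stand-alone model under assumed
names: seq, begin_write(), end_write(), read_begin() and read_retry() are
made up for the example, and the kernel's cmpxchg()/smp_store_release() are
replaced by __atomic builtins.

/*
 * Stand-alone sketch of a sequence counter whose lowest bit acts as the
 * write lock. Even value: no writer; odd value: write in progress.
 */
static unsigned int seq;

/* Writer side: wait for an even value, then move it to odd (locked). */
static unsigned int begin_write(void)
{
	for (;;) {
		unsigned int n = __atomic_load_n(&seq, __ATOMIC_RELAXED);

		if (!(n & 1) &&
		    __atomic_compare_exchange_n(&seq, &n, n + 1, 0,
						__ATOMIC_ACQUIRE,
						__ATOMIC_RELAXED))
			return n;
	}
}

/* Writer side: release by advancing to the next even value. */
static void end_write(unsigned int n)
{
	__atomic_store_n(&seq, n + 2, __ATOMIC_RELEASE);
}

/* Reader side: spin until no write is in progress, return the snapshot. */
static unsigned int read_begin(void)
{
	unsigned int n;

	do {
		n = __atomic_load_n(&seq, __ATOMIC_ACQUIRE);
	} while (n & 1);
	return n;
}

/* Reader side: nonzero if the snapshot is stale and must be retried. */
static int read_retry(unsigned int n)
{
	return __atomic_load_n(&seq, __ATOMIC_ACQUIRE) != n;
}

If the task between begin_write() and end_write() can be preempted, a higher
priority task spinning in read_begin() never sees an even value again and the
system live locks. That is the window which the preempt_disable() /
preempt_enable() pair in the hunks above closes on PREEMPT_RT.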