Index: linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html =================================================================== @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:59 @ sections. RCU-preempt Expedited Grace Periods</a></h2> <p> -<tt>CONFIG_PREEMPT=y</tt> kernels implement RCU-preempt. -The overall flow of the handling of a given CPU by an RCU-preempt +<tt>CONFIG_PREEMPT=y</tt> and <tt>CONFIG_PREEMPT_RT=y</tt> kernels implement +RCU-preempt. The overall flow of the handling of a given CPU by an RCU-preempt expedited grace period is shown in the following diagram: <p><img src="ExpRCUFlow.svg" alt="ExpRCUFlow.svg" width="55%"> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:143 @ or offline, among other things. RCU-sched Expedited Grace Periods</a></h2> <p> -<tt>CONFIG_PREEMPT=n</tt> kernels implement RCU-sched. -The overall flow of the handling of a given CPU by an RCU-sched +<tt>CONFIG_PREEMPT=n</tt> and <tt>CONFIG_PREEMPT_RT=n</tt> kernels implement +RCU-sched. The overall flow of the handling of a given CPU by an RCU-sched expedited grace period is shown in the following diagram: <p><img src="ExpSchedFlow.svg" alt="ExpSchedFlow.svg" width="55%"> Index: linux-5.4.5-rt3/Documentation/RCU/Design/Requirements/Requirements.html =================================================================== --- linux-5.4.5-rt3.orig/Documentation/RCU/Design/Requirements/Requirements.html +++ linux-5.4.5-rt3/Documentation/RCU/Design/Requirements/Requirements.html @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:109 @ big RCU read-side critical section. Production-quality implementations of <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt> are extremely lightweight, and in fact have exactly zero overhead in Linux kernels built for production -use with <tt>CONFIG_PREEMPT=n</tt>. +use with <tt>CONFIG_PREEMPTION=n</tt>. <p> This guarantee allows ordering to be enforced with extremely low @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1502 @ costs have plummeted. However, as I learned from Matt Mackall's <a href="http://elinux.org/Linux_Tiny-FAQ">bloatwatch</a> efforts, memory footprint is critically important on single-CPU systems with -non-preemptible (<tt>CONFIG_PREEMPT=n</tt>) kernels, and thus +non-preemptible (<tt>CONFIG_PREEMPTION=n</tt>) kernels, and thus <a href="https://lkml.kernel.org/g/20090113221724.GA15307@linux.vnet.ibm.com">tiny RCU</a> was born. Josh Triplett has since taken over the small-memory banner with his @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1890 @ constructs, there are limitations. <p> Implementations of RCU for which <tt>rcu_read_lock()</tt> and <tt>rcu_read_unlock()</tt> generate no code, such as -Linux-kernel RCU when <tt>CONFIG_PREEMPT=n</tt>, can be +Linux-kernel RCU when <tt>CONFIG_PREEMPTION=n</tt>, can be nested arbitrarily deeply. After all, there is no overhead. Except that if all these instances of <tt>rcu_read_lock()</tt> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2232 @ be a no-op. 
<p> However, once the scheduler has spawned its first kthread, this early boot trick fails for <tt>synchronize_rcu()</tt> (as well as for -<tt>synchronize_rcu_expedited()</tt>) in <tt>CONFIG_PREEMPT=y</tt> +<tt>synchronize_rcu_expedited()</tt>) in <tt>CONFIG_PREEMPTION=y</tt> kernels. The reason is that an RCU read-side critical section might be preempted, which means that a subsequent <tt>synchronize_rcu()</tt> really does have @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2571 @ the following: <p> If the compiler did make this transformation in a -<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did +<tt>CONFIG_PREEMPTION=n</tt> kernel build, and if <tt>get_user()</tt> did page fault, the result would be a quiescent state in the middle of an RCU read-side critical section. This misplaced quiescent state could result in line 4 being @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2909 @ in conjunction with the The real-time-latency response requirements are such that the traditional approach of disabling preemption across RCU read-side critical sections is inappropriate. -Kernels built with <tt>CONFIG_PREEMPT=y</tt> therefore +Kernels built with <tt>CONFIG_PREEMPTION=y</tt> therefore use an RCU implementation that allows RCU read-side critical sections to be preempted. This requirement made its presence known after users made it @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3067 @ includes <tt>rcu_barrier_bh()</tt>, and <tt>rcu_read_lock_bh_held()</tt>. However, the update-side APIs are now simple wrappers for other RCU -flavors, namely RCU-sched in CONFIG_PREEMPT=n kernels and RCU-preempt +flavors, namely RCU-sched in CONFIG_PREEMPTION=n kernels and RCU-preempt otherwise. <h3><a name="Sched Flavor">Sched Flavor (Historical)</a></h3> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3091 @ of an RCU read-side critical section can Therefore, <i>RCU-sched</i> was created, which follows “classic” RCU in that an RCU-sched grace period waits for for pre-existing interrupt and NMI handlers. -In kernels built with <tt>CONFIG_PREEMPT=n</tt>, the RCU and RCU-sched +In kernels built with <tt>CONFIG_PREEMPTION=n</tt>, the RCU and RCU-sched APIs have identical implementations, while kernels built with -<tt>CONFIG_PREEMPT=y</tt> provide a separate implementation for each. +<tt>CONFIG_PREEMPTION=y</tt> provide a separate implementation for each. <p> -Note well that in <tt>CONFIG_PREEMPT=y</tt> kernels, +Note well that in <tt>CONFIG_PREEMPTION=y</tt> kernels, <tt>rcu_read_lock_sched()</tt> and <tt>rcu_read_unlock_sched()</tt> disable and re-enable preemption, respectively. This means that if there was a preemption attempt during the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3305 @ The tasks-RCU API is quite compact, cons <tt>call_rcu_tasks()</tt>, <tt>synchronize_rcu_tasks()</tt>, and <tt>rcu_barrier_tasks()</tt>. -In <tt>CONFIG_PREEMPT=n</tt> kernels, trampolines cannot be preempted, +In <tt>CONFIG_PREEMPTION=n</tt> kernels, trampolines cannot be preempted, so these APIs map to <tt>call_rcu()</tt>, <tt>synchronize_rcu()</tt>, and <tt>rcu_barrier()</tt>, respectively. 
-In <tt>CONFIG_PREEMPT=y</tt> kernels, trampolines can be preempted, +In <tt>CONFIG_PREEMPTION=y</tt> kernels, trampolines can be preempted, and these three APIs are therefore implemented by separate functions that check for voluntary context switches. Index: linux-5.4.5-rt3/Documentation/RCU/checklist.txt =================================================================== --- linux-5.4.5-rt3.orig/Documentation/RCU/checklist.txt +++ linux-5.4.5-rt3/Documentation/RCU/checklist.txt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:213 @ over a rather long period of time, but i the rest of the system. 7. As of v4.20, a given kernel implements only one RCU flavor, - which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y. - If the updater uses call_rcu() or synchronize_rcu(), + which is RCU-sched for PREEMPTION=n and RCU-preempt for + PREEMPTION=y. If the updater uses call_rcu() or synchronize_rcu(), then the corresponding readers my use rcu_read_lock() and rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(), or any pair of primitives that disables and re-enables preemption, Index: linux-5.4.5-rt3/Documentation/RCU/rcubarrier.txt =================================================================== --- linux-5.4.5-rt3.orig/Documentation/RCU/rcubarrier.txt +++ linux-5.4.5-rt3/Documentation/RCU/rcubarrier.txt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:9 @ RCU (read-copy update) is a synchronizat of as a replacement for read-writer locking (among other things), but with very low-overhead readers that are immune to deadlock, priority inversion, and unbounded latency. RCU read-side critical sections are delimited -by rcu_read_lock() and rcu_read_unlock(), which, in non-CONFIG_PREEMPT -kernels, generate no code whatsoever. +by rcu_read_lock() and rcu_read_unlock(), which, in +non-CONFIG_PREEMPTION kernels, generate no code whatsoever. This means that RCU writers are unaware of the presence of concurrent readers, so that RCU updates to shared data must be undertaken quite @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:306 @ Answer: This cannot happen. The reason i to smp_call_function() and further to smp_call_function_on_cpu(), causing this latter to spin until the cross-CPU invocation of rcu_barrier_func() has completed. This by itself would prevent - a grace period from completing on non-CONFIG_PREEMPT kernels, + a grace period from completing on non-CONFIG_PREEMPTION kernels, since each CPU must undergo a context switch (or other quiescent state) before the grace period can complete. However, this is - of no use in CONFIG_PREEMPT kernels. + of no use in CONFIG_PREEMPTION kernels. Therefore, on_each_cpu() disables preemption across its call to smp_call_function() and also across the local call to Index: linux-5.4.5-rt3/Documentation/RCU/stallwarn.txt =================================================================== --- linux-5.4.5-rt3.orig/Documentation/RCU/stallwarn.txt +++ linux-5.4.5-rt3/Documentation/RCU/stallwarn.txt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:23 @ o A CPU looping with preemption disabled o A CPU looping with bottom halves disabled. -o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel +o For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel without invoking schedule(). 
If the looping in the kernel is really expected and desirable behavior, you might need to add some calls to cond_resched(). @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:42 @ o Anything that prevents RCU's grace-per result in the "rcu_.*kthread starved for" console-log message, which will include additional debugging information. -o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might +o A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might happen to preempt a low-priority task in the middle of an RCU read-side critical section. This is especially damaging if that low-priority task is not permitted to run on any other CPU, Index: linux-5.4.5-rt3/Documentation/RCU/whatisRCU.txt =================================================================== --- linux-5.4.5-rt3.orig/Documentation/RCU/whatisRCU.txt +++ linux-5.4.5-rt3/Documentation/RCU/whatisRCU.txt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:651 @ Quick Quiz #1: Why is this argument naiv This section presents a "toy" RCU implementation that is based on "classic RCU". It is also short on performance (but only for updates) and -on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT -kernels. The definitions of rcu_dereference() and rcu_assign_pointer() -are the same as those shown in the preceding section, so they are omitted. +on features such as hotplug CPU and the ability to run in +CONFIG_PREEMPTION kernels. The definitions of rcu_dereference() and +rcu_assign_pointer() are the same as those shown in the preceding +section, so they are omitted. void rcu_read_lock(void) { } Index: linux-5.4.5-rt3/Documentation/printk-ringbuffer.txt =================================================================== --- /dev/null +++ linux-5.4.5-rt3/Documentation/printk-ringbuffer.txt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +struct printk_ringbuffer +------------------------ +John Ogness <john.ogness@linutronix.de> + +Overview +~~~~~~~~ +As the name suggests, this ring buffer was implemented specifically to serve +the needs of the printk() infrastructure. The ring buffer itself is not +specific to printk and could be used for other purposes. _However_, the +requirements and semantics of printk are rather unique. If you intend to use +this ring buffer for anything other than printk, you need to be very clear on +its features, behavior, and pitfalls. + +Features +^^^^^^^^ +The printk ring buffer has the following features: + +- single global buffer +- resides in initialized data section (available at early boot) +- lockless readers +- supports multiple writers +- supports multiple non-consuming readers +- safe from any context (including NMI) +- groups bytes into variable length blocks (referenced by entries) +- entries tagged with sequence numbers + +Behavior +^^^^^^^^ +Since the printk ring buffer readers are lockless, there exists no +synchronization between readers and writers. Basically writers are the tasks +in control and may overwrite any and all committed data at any time and from +any context. For this reason readers can miss entries if they are overwritten +before the reader was able to access the data. The reader API implementation +is such that reader access to entries is atomic, so there is no risk of +readers having to deal with partial or corrupt data. 
Also, entries are +tagged with sequence numbers so readers can recognize if entries were missed. + +Writing to the ring buffer consists of 2 steps. First a writer must reserve +an entry of desired size. After this step the writer has exclusive access +to the memory region. Once the data has been written to memory, it needs to +be committed to the ring buffer. After this step the entry has been inserted +into the ring buffer and assigned an appropriate sequence number. + +Once committed, a writer must no longer access the data directly. This is +because the data may have been overwritten and no longer exists. If a +writer must access the data, it should either keep a private copy before +committing the entry or use the reader API to gain access to the data. + +Because of how the data backend is implemented, entries that have been +reserved but not yet committed act as barriers, preventing future writers +from filling the ring buffer beyond the location of the reserved but not +yet committed entry region. For this reason it is *important* that writers +perform both reserve and commit as quickly as possible. Also, be aware that +preemption and local interrupts are disabled and writing to the ring buffer +is processor-reentrant locked during the reserve/commit window. Writers in +NMI contexts can still preempt any other writers, but as long as these +writers do not write a large amount of data with respect to the ring buffer +size, this should not become an issue. + +API +~~~ + +Declaration +^^^^^^^^^^^ +The printk ring buffer can be instantiated as a static structure: + + /* declare a static struct printk_ringbuffer */ + #define DECLARE_STATIC_PRINTKRB(name, szbits, cpulockptr) + +The value of szbits specifies the size of the ring buffer in bits. The +cpulockptr field is a pointer to a prb_cpulock struct that is used to +perform processor-reentrant spin locking for the writers. It is specified +externally because it may be used for multiple ring buffers (or other +code) to synchronize writers without risk of deadlock. + +Here is an example of a declaration of a printk ring buffer specifying a +32KB (2^15) ring buffer: + +.... +DECLARE_STATIC_PRINTKRB_CPULOCK(rb_cpulock); +DECLARE_STATIC_PRINTKRB(rb, 15, &rb_cpulock); +.... + +If writers will be using multiple ring buffers and the ordering of that usage +is not clear, the same prb_cpulock should be used for both ring buffers. + +Writer API +^^^^^^^^^^ +The writer API consists of 2 functions. The first is to reserve an entry in +the ring buffer, the second is to commit that data to the ring buffer. The +reserved entry information is stored within a provided `struct prb_handle`. + + /* reserve an entry */ + char *prb_reserve(struct prb_handle *h, struct printk_ringbuffer *rb, + unsigned int size); + + /* commit a reserved entry to the ring buffer */ + void prb_commit(struct prb_handle *h); + +Here is an example of a function to write data to a ring buffer: + +.... +int write_data(struct printk_ringbuffer *rb, char *data, int size) +{ + struct prb_handle h; + char *buf; + + buf = prb_reserve(&h, rb, size); + if (!buf) + return -1; + memcpy(buf, data, size); + prb_commit(&h); + + return 0; +} +.... + +Pitfalls +++++++++ +Be aware that prb_reserve() can fail. A retry might be successful, but it +depends entirely on whether or not the next part of the ring buffer to +overwrite belongs to reserved but not yet committed entries of other writers. +Writers can use the prb_inc_lost() function to allow readers to notice that a +message was lost. 
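To illustrate this pitfall, the following sketch retries a failed reserve a
small number of times and then accounts for the dropped record via
prb_inc_lost() (described under Utility API below). It uses only the writer
and utility calls documented in this file; the name write_data_retry() and
the retry bound are illustrative assumptions, not part of the API.

....
/*
 * Illustrative sketch only: like write_data() above, but retry the
 * reserve a few times before giving up.  If the record still cannot
 * be reserved, bump the lost counter so readers can notice the gap
 * in sequence numbers.
 */
int write_data_retry(struct printk_ringbuffer *rb, char *data, int size)
{
	struct prb_handle h;
	int tries = 3;	/* arbitrary bound, chosen for the example */
	char *buf;

	while (tries--) {
		buf = prb_reserve(&h, rb, size);
		if (buf) {
			memcpy(buf, data, size);
			prb_commit(&h);
			return 0;
		}
	}

	prb_inc_lost(rb);
	return -1;
}
....

Whether retrying makes sense depends on the calling context; a writer in
atomic context may prefer to give up immediately, as write_data() above does.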
+ +Reader API +^^^^^^^^^^ +The reader API utilizes a `struct prb_iterator` to track the reader's +position in the ring buffer. + + /* declare a pre-initialized static iterator for a ring buffer */ + #define DECLARE_STATIC_PRINTKRB_ITER(name, rbaddr) + + /* initialize iterator for a ring buffer (if static macro NOT used) */ + void prb_iter_init(struct prb_iterator *iter, + struct printk_ringbuffer *rb, u64 *seq); + + /* make a deep copy of an iterator */ + void prb_iter_copy(struct prb_iterator *dest, + struct prb_iterator *src); + + /* non-blocking, advance to next entry (and read the data) */ + int prb_iter_next(struct prb_iterator *iter, char *buf, + int size, u64 *seq); + + /* blocking, advance to next entry (and read the data) */ + int prb_iter_wait_next(struct prb_iterator *iter, char *buf, + int size, u64 *seq); + + /* position iterator at the entry seq */ + int prb_iter_seek(struct prb_iterator *iter, u64 seq); + + /* read data at current position */ + int prb_iter_data(struct prb_iterator *iter, char *buf, + int size, u64 *seq); + +Typically prb_iter_data() is not needed because the data can be retrieved +directly with prb_iter_next(). + +Here is an example of a non-blocking function that will read all the data in +a ring buffer: + +.... +void read_all_data(struct printk_ringbuffer *rb, char *buf, int size) +{ + struct prb_iterator iter; + u64 prev_seq = 0; + u64 seq; + int ret; + + prb_iter_init(&iter, rb, NULL); + + for (;;) { + ret = prb_iter_next(&iter, buf, size, &seq); + if (ret > 0) { + if (seq != ++prev_seq) { + /* "seq - prev_seq" entries missed */ + prev_seq = seq; + } + /* process buf here */ + } else if (ret == 0) { + /* hit the end, done */ + break; + } else if (ret < 0) { + /* + * iterator is invalid, a writer overtook us, reset the + * iterator and keep going, entries were missed + */ + prb_iter_init(&iter, rb, NULL); + } + } +} +.... + +Pitfalls +++++++++ +The reader's iterator can become invalid at any time because the reader was +overtaken by a writer. Typically the reader should reset the iterator back +to the current oldest entry (which will be newer than the entry the reader +was at) and continue, noting the number of entries that were missed. + +Utility API +^^^^^^^^^^^ +Several functions are available as convenience for external code. + + /* query the size of the data buffer */ + int prb_buffer_size(struct printk_ringbuffer *rb); + + /* skip a seq number to signify a lost record */ + void prb_inc_lost(struct printk_ringbuffer *rb); + + /* processor-reentrant spin lock */ + void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store); + + /* processor-reentrant spin unlock */ + void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store); + +Pitfalls +++++++++ +Although the value returned by prb_buffer_size() does represent an absolute +upper bound, the amount of data that can be stored within the ring buffer +is actually less because of the additional storage space of a header for each +entry. + +The prb_lock() and prb_unlock() functions can be used to synchronize between +ring buffer writers and other external activities. The function of a +processor-reentrant spin lock is to disable preemption and local interrupts +and synchronize against other processors. It does *not* protect against +multiple contexts of a single processor, i.e NMI. + +Implementation +~~~~~~~~~~~~~~ +This section describes several of the implementation concepts and details to +help developers better understand the code. 
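As a conceptual sketch of the logical-position (lpos) scheme described in the
Entries subsection below: assuming the 2^szbits sizing established by
DECLARE_STATIC_PRINTKRB(), an lpos might map to a byte-array offset as shown
here. The helper lpos_to_offset() is hypothetical and only illustrates the
aliasing behavior; it is not part of the ring buffer code.

....
/*
 * Conceptual sketch: map a logical position (lpos) onto an offset in a
 * byte array of size 2^szbits.  Many lpos values alias to the same
 * offset; only the most recently written one refers to a valid entry.
 */
static unsigned int lpos_to_offset(unsigned long lpos, unsigned int szbits)
{
	return lpos & ((1UL << szbits) - 1);
}
....

Because this mapping is many-to-one, the tail, head, and reserve positions can
grow monotonically while still indexing into the fixed-size byte array.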
+
+Entries
+^^^^^^^
+All ring buffer data is stored within a single static byte array. The reason
+for this is to ensure that any pointers to the data (past and present) will
+always point to valid memory. This is important because the lockless readers
+may be preempted for long periods of time and when they resume may be working
+with expired pointers.
+
+Entries are identified by start index and size. (The start index plus size
+is the start index of the next entry.) The start index is not simply an
+offset into the byte array, but rather a logical position (lpos) that maps
+directly to byte array offsets.
+
+For example, for a byte array of 1000, an entry may have a start index
+of 100. Another entry may have a start index of 1100. And yet another 2100.
+All of these entries are pointing to the same memory region, but only the most
+recent entry is valid. The other entries are pointing to valid memory, but
+represent entries that have been overwritten.
+
+Note that due to overflowing, the most recent entry is not necessarily the one
+with the highest lpos value. Indeed, the printk ring buffer initializes its
+data such that an overflow happens relatively quickly in order to validate the
+handling of this situation. The implementation assumes that an lpos (unsigned
+long) will never completely wrap while a reader is preempted. If this were to
+become an issue, the seq number (which never wraps) could be used to increase
+the robustness of handling this situation.
+
+Buffer Wrapping
+^^^^^^^^^^^^^^^
+If an entry starts near the end of the byte array but would extend beyond it,
+a special terminating entry (size = -1) is inserted into the byte array and
+the real entry is placed at the beginning of the byte array. This can waste
+space at the end of the byte array, but simplifies the implementation by
+allowing writers to always work with contiguous buffers.
+
+Note that the size field is the first 4 bytes of the entry header. Also note
+that calc_next() always ensures that there are at least 4 bytes left at the
+end of the byte array to allow room for a terminating entry.
+
+Ring Buffer Pointers
+^^^^^^^^^^^^^^^^^^^^
+Three pointers (lpos values) are used to manage the ring buffer:
+
+ - _tail_: points to the oldest entry
+ - _head_: points to where the next new committed entry will be
+ - _reserve_: points to where the next new reserved entry will be
+
+These pointers always maintain a logical ordering:
+
+ tail <= head <= reserve
+
+The reserve pointer moves forward when a writer reserves a new entry. The
+head pointer moves forward when a writer commits a new entry.
+
+The reserve pointer cannot overwrite the tail pointer in a wrap situation. In
+such a situation, the tail pointer must be "pushed forward", thus
+invalidating that oldest entry. Readers identify if they are accessing a
+valid entry by ensuring their entry pointer is `>= tail && < head`.
+
+If the tail pointer is equal to the head pointer, it cannot be pushed and any
+reserve operation will fail. The only resolution is for writers to commit
+their reserved entries.
+
+Processor-Reentrant Locking
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The purpose of the processor-reentrant locking is to limit the interruption
+scenarios of writers to 2 contexts. This allows for a simplified
+implementation where:
+
+- The reserve/commit window only exists on 1 processor at a time. A reserve
+  can never fail due to uncommitted entries of other processors.
+ +- When committing entries, it is trivial to handle the situation when + subsequent entries have already been committed, i.e. managing the head + pointer. + +Performance +~~~~~~~~~~~ +Some basic tests were performed on a quad Intel(R) Xeon(R) CPU E5-2697 v4 at +2.30GHz (36 cores / 72 threads). All tests involved writing a total of +32,000,000 records at an average of 33 bytes each. Each writer was pinned to +its own CPU and would write as fast as it could until a total of 32,000,000 +records were written. All tests involved 2 readers that were both pinned +together to another CPU. Each reader would read as fast as it could and track +how many of the 32,000,000 records it could read. All tests used a ring buffer +of 16KB in size, which holds around 350 records (header + data for each +entry). + +The only difference between the tests is the number of writers (and thus also +the number of records per writer). As more writers are added, the time to +write a record increases. This is because data pointers, modified via cmpxchg, +and global data access in general become more contended. + +1 writer +^^^^^^^^ + runtime: 0m 18s + reader1: 16219900/32000000 (50%) records + reader2: 16141582/32000000 (50%) records + +2 writers +^^^^^^^^^ + runtime: 0m 32s + reader1: 16327957/32000000 (51%) records + reader2: 16313988/32000000 (50%) records + +4 writers +^^^^^^^^^ + runtime: 0m 42s + reader1: 16421642/32000000 (51%) records + reader2: 16417224/32000000 (51%) records + +8 writers +^^^^^^^^^ + runtime: 0m 43s + reader1: 16418300/32000000 (51%) records + reader2: 16432222/32000000 (51%) records + +16 writers +^^^^^^^^^^ + runtime: 0m 54s + reader1: 16539189/32000000 (51%) records + reader2: 16542711/32000000 (51%) records + +32 writers +^^^^^^^^^^ + runtime: 1m 13s + reader1: 16731808/32000000 (52%) records + reader2: 16735119/32000000 (52%) records + +Comments +^^^^^^^^ +It is particularly interesting to compare/contrast the 1-writer and 32-writer +tests. Despite the writing of the 32,000,000 records taking over 4 times +longer, the readers (which perform no cmpxchg) were still unable to keep up. +This shows that the memory contention between the increasing number of CPUs +also has a dramatic effect on readers. + +It should also be noted that in all cases each reader was able to read >=50% +of the records. This means that a single reader would have been able to keep +up with the writer(s) in all cases, becoming slightly easier as more writers +are added. This was the purpose of pinning 2 readers to 1 CPU: to observe how +maximum reader performance changes. Index: linux-5.4.5-rt3/Documentation/trace/ftrace-uses.rst =================================================================== --- linux-5.4.5-rt3.orig/Documentation/trace/ftrace-uses.rst +++ linux-5.4.5-rt3/Documentation/trace/ftrace-uses.rst @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:149 @ FTRACE_OPS_FL_RECURSION_SAFE itself or any nested functions that those functions call. If this flag is set, it is possible that the callback will also - be called with preemption enabled (when CONFIG_PREEMPT is set), + be called with preemption enabled (when CONFIG_PREEMPTION is set), but this is not guaranteed. 
FTRACE_OPS_FL_IPMODIFY Index: linux-5.4.5-rt3/arch/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/Kconfig +++ linux-5.4.5-rt3/arch/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:34 @ config OPROFILE tristate "OProfile system profiling" depends on PROFILING depends on HAVE_OPROFILE + depends on !PREEMPT_RT select RING_BUFFER select RING_BUFFER_ALLOW_SWAP help Index: linux-5.4.5-rt3/arch/alpha/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/alpha/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/alpha/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef _ALPHA_SPINLOCK_TYPES_H #define _ALPHA_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { volatile unsigned int lock; } arch_spinlock_t; Index: linux-5.4.5-rt3/arch/arc/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/arc/kernel/entry.S +++ linux-5.4.5-rt3/arch/arc/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:340 @ resume_user_mode_begin: resume_kernel_mode: ; Disable Interrupts from this point on - ; CONFIG_PREEMPT: This is a must for preempt_schedule_irq() - ; !CONFIG_PREEMPT: To ensure restore_regs is intr safe + ; CONFIG_PREEMPTION: This is a must for preempt_schedule_irq() + ; !CONFIG_PREEMPTION: To ensure restore_regs is intr safe IRQ_DISABLE r9 -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION ; Can't preempt if preemption disabled GET_CURR_THR_INFO_FROM_SP r10 Index: linux-5.4.5-rt3/arch/arm/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/Kconfig +++ linux-5.4.5-rt3/arch/arm/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:35 @ config ARM select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT if CPU_V7 select ARCH_SUPPORTS_ATOMIC_RMW + select ARCH_SUPPORTS_RT select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:68 @ config ARM select HARDIRQS_SW_RESEND select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6 - select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU + select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU && !PREEMPT_RT select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:107 @ config ARM select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_PREEMPT_LAZY select HAVE_RCU_TABLE_FREE if SMP && ARM_LPAE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RSEQ Index: linux-5.4.5-rt3/arch/arm/include/asm/irq.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/include/asm/irq.h +++ linux-5.4.5-rt3/arch/arm/include/asm/irq.h @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:26 @ #endif #ifndef __ASSEMBLY__ +#include <linux/cpumask.h> + struct irqaction; struct pt_regs; Index: linux-5.4.5-rt3/arch/arm/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/arm/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef __ASM_SPINLOCK_TYPES_H #define __ASM_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - #define TICKET_SHIFT 16 typedef struct { Index: linux-5.4.5-rt3/arch/arm/include/asm/switch_to.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/include/asm/switch_to.h +++ linux-5.4.5-rt3/arch/arm/include/asm/switch_to.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:7 @ #include <linux/thread_info.h> +#if defined CONFIG_PREEMPT_RT && defined CONFIG_HIGHMEM +void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p); +#else +static inline void +switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { } +#endif + /* * For v7 SMP cores running a preemptible kernel we may be pre-empted * during a TLB maintenance operation, so execute an inner-shareable dsb * to ensure that the maintenance completes in case we migrate to another * CPU. */ -#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7) +#if defined(CONFIG_PREEMPTION) && defined(CONFIG_SMP) && defined(CONFIG_CPU_V7) #define __complete_pending_tlbi() dsb(ish) #else #define __complete_pending_tlbi() @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:36 @ extern struct task_struct *__switch_to(s #define switch_to(prev,next,last) \ do { \ __complete_pending_tlbi(); \ + switch_kmaps(prev, next); \ last = __switch_to(prev,task_thread_info(prev), task_thread_info(next)); \ } while (0) Index: linux-5.4.5-rt3/arch/arm/include/asm/thread_info.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/include/asm/thread_info.h +++ linux-5.4.5-rt3/arch/arm/include/asm/thread_info.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:49 @ struct cpu_context_save { struct thread_info { unsigned long flags; /* low level flags */ int preempt_count; /* 0 => preemptable, <0 => bug */ + int preempt_lazy_count; /* 0 => preemptable, <0 => bug */ mm_segment_t addr_limit; /* address limit */ struct task_struct *task; /* main task structure */ __u32 cpu; /* cpu */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:143 @ extern int vfp_restore_user_hwstate(stru #define TIF_SYSCALL_TRACE 4 /* syscall trace active */ #define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ #define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ +#define TIF_SECCOMP 8 /* seccomp syscall filtering active */ +#define TIF_NEED_RESCHED_LAZY 7 #define TIF_NOHZ 12 /* in adaptive nohz mode */ #define TIF_USING_IWMMXT 17 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:154 @ extern int vfp_restore_user_hwstate(stru #define _TIF_SIGPENDING (1 << 
TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) +#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY) #define _TIF_UPROBE (1 << TIF_UPROBE) #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:170 @ extern int vfp_restore_user_hwstate(stru * Change these and you break ASM code in entry-common.S */ #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ - _TIF_NOTIFY_RESUME | _TIF_UPROBE) + _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ + _TIF_NEED_RESCHED_LAZY) #endif /* __KERNEL__ */ #endif /* __ASM_ARM_THREAD_INFO_H */ Index: linux-5.4.5-rt3/arch/arm/kernel/asm-offsets.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/asm-offsets.c +++ linux-5.4.5-rt3/arch/arm/kernel/asm-offsets.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:56 @ int main(void) BLANK(); DEFINE(TI_FLAGS, offsetof(struct thread_info, flags)); DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count)); + DEFINE(TI_PREEMPT_LAZY, offsetof(struct thread_info, preempt_lazy_count)); DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit)); DEFINE(TI_TASK, offsetof(struct thread_info, task)); DEFINE(TI_CPU, offsetof(struct thread_info, cpu)); Index: linux-5.4.5-rt3/arch/arm/kernel/entry-armv.S =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/entry-armv.S +++ linux-5.4.5-rt3/arch/arm/kernel/entry-armv.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:214 @ __irq_svc: svc_entry irq_handler -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION ldr r8, [tsk, #TI_PREEMPT] @ get preempt count - ldr r0, [tsk, #TI_FLAGS] @ get flags teq r8, #0 @ if preempt count != 0 + bne 1f @ return from exeption + ldr r0, [tsk, #TI_FLAGS] @ get flags + tst r0, #_TIF_NEED_RESCHED @ if NEED_RESCHED is set + blne svc_preempt @ preempt! 
+ + ldr r8, [tsk, #TI_PREEMPT_LAZY] @ get preempt lazy count + teq r8, #0 @ if preempt lazy count != 0 movne r0, #0 @ force flags to 0 - tst r0, #_TIF_NEED_RESCHED + tst r0, #_TIF_NEED_RESCHED_LAZY blne svc_preempt +1: #endif svc_exit r5, irq = 1 @ return from exception @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:236 @ ENDPROC(__irq_svc) .ltorg -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION svc_preempt: mov r8, lr 1: bl preempt_schedule_irq @ irq en/disable is done inside ldr r0, [tsk, #TI_FLAGS] @ get new tasks TI_FLAGS tst r0, #_TIF_NEED_RESCHED + bne 1b + tst r0, #_TIF_NEED_RESCHED_LAZY reteq r8 @ go again - b 1b + ldr r0, [tsk, #TI_PREEMPT_LAZY] @ get preempt lazy count + teq r0, #0 @ if preempt lazy count != 0 + beq 1b + ret r8 @ go again + #endif __und_fault: Index: linux-5.4.5-rt3/arch/arm/kernel/entry-common.S =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/entry-common.S +++ linux-5.4.5-rt3/arch/arm/kernel/entry-common.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:56 @ __ret_fast_syscall: cmp r2, #TASK_SIZE blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #((_TIF_SYSCALL_WORK | _TIF_WORK_MASK) & ~_TIF_SECCOMP) + bne fast_work_pending + tst r1, #_TIF_SECCOMP bne fast_work_pending @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:95 @ __ret_fast_syscall: cmp r2, #TASK_SIZE blne addr_limit_check_failed ldr r1, [tsk, #TI_FLAGS] @ re-check for syscall tracing - tst r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK + tst r1, #((_TIF_SYSCALL_WORK | _TIF_WORK_MASK) & ~_TIF_SECCOMP) + bne do_slower_path + tst r1, #_TIF_SECCOMP beq no_work_pending +do_slower_path: UNWIND(.fnend ) ENDPROC(ret_fast_syscall) Index: linux-5.4.5-rt3/arch/arm/kernel/signal.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/signal.c +++ linux-5.4.5-rt3/arch/arm/kernel/signal.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:652 @ do_work_pending(struct pt_regs *regs, un */ trace_hardirqs_off(); do { - if (likely(thread_flags & _TIF_NEED_RESCHED)) { + if (likely(thread_flags & (_TIF_NEED_RESCHED | + _TIF_NEED_RESCHED_LAZY))) { schedule(); } else { if (unlikely(!user_mode(regs))) Index: linux-5.4.5-rt3/arch/arm/kernel/smp.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/smp.c +++ linux-5.4.5-rt3/arch/arm/kernel/smp.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:685 @ void handle_IPI(int ipinr, struct pt_reg break; case IPI_CPU_BACKTRACE: - printk_nmi_enter(); irq_enter(); nmi_cpu_backtrace(regs); irq_exit(); - printk_nmi_exit(); break; default: Index: linux-5.4.5-rt3/arch/arm/kernel/traps.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/kernel/traps.c +++ linux-5.4.5-rt3/arch/arm/kernel/traps.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:251 @ void show_stack(struct task_struct *tsk, #ifdef CONFIG_PREEMPT #define S_PREEMPT " PREEMPT" +#elif defined(CONFIG_PREEMPT_RT) +#define S_PREEMPT " PREEMPT_RT" #else #define S_PREEMPT "" #endif Index: linux-5.4.5-rt3/arch/arm/mm/cache-v7.S 
=================================================================== --- linux-5.4.5-rt3.orig/arch/arm/mm/cache-v7.S +++ linux-5.4.5-rt3/arch/arm/mm/cache-v7.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:138 @ flush_levels: and r1, r1, #7 @ mask of the bits for current cache only cmp r1, #2 @ see what cache we have at this level blt skip @ skip if no cache, or just i-cache -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION save_and_disable_irqs_notrace r9 @ make cssr&csidr read atomic #endif mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr isb @ isb to sych the new cssr&csidr mrc p15, 1, r1, c0, c0, 0 @ read the new csidr -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION restore_irqs_notrace r9 #endif and r2, r1, #7 @ extract the length of the cache lines Index: linux-5.4.5-rt3/arch/arm/mm/cache-v7m.S =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/mm/cache-v7m.S +++ linux-5.4.5-rt3/arch/arm/mm/cache-v7m.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:186 @ flush_levels: and r1, r1, #7 @ mask of the bits for current cache only cmp r1, #2 @ see what cache we have at this level blt skip @ skip if no cache, or just i-cache -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION save_and_disable_irqs_notrace r9 @ make cssr&csidr read atomic #endif write_csselr r10, r1 @ set current cache level isb @ isb to sych the new cssr&csidr read_ccsidr r1 @ read the new csidr -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION restore_irqs_notrace r9 #endif and r2, r1, #7 @ extract the length of the cache lines Index: linux-5.4.5-rt3/arch/arm/mm/fault.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/mm/fault.c +++ linux-5.4.5-rt3/arch/arm/mm/fault.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:417 @ do_translation_fault(unsigned long addr, if (addr < TASK_SIZE) return do_page_fault(addr, fsr, regs); + if (interrupts_enabled(regs)) + local_irq_enable(); + if (user_mode(regs)) goto bad_area; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:487 @ do_translation_fault(unsigned long addr, static int do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) { + if (interrupts_enabled(regs)) + local_irq_enable(); + do_bad_area(addr, fsr, regs); return 0; } Index: linux-5.4.5-rt3/arch/arm/mm/highmem.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm/mm/highmem.c +++ linux-5.4.5-rt3/arch/arm/mm/highmem.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:34 @ static inline pte_t get_fixmap_pte(unsig return *ptep; } +static unsigned int fixmap_idx(int type) +{ + return FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); +} + void *kmap(struct page *page) { might_sleep(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:59 @ EXPORT_SYMBOL(kunmap); void *kmap_atomic(struct page *page) { + pte_t pte = mk_pte(page, kmap_prot); unsigned int idx; unsigned long vaddr; void *kmap; int type; - preempt_disable(); + preempt_disable_nort(); pagefault_disable(); if (!PageHighMem(page)) return page_address(page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:85 @ void *kmap_atomic(struct page *page) type = 
kmap_atomic_idx_push(); - idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); + idx = fixmap_idx(type); vaddr = __fix_to_virt(idx); #ifdef CONFIG_DEBUG_HIGHMEM /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:99 @ void *kmap_atomic(struct page *page) * in place, so the contained TLB flush ensures the TLB is updated * with the new mapping. */ - set_fixmap_pte(idx, mk_pte(page, kmap_prot)); +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = pte; +#endif + set_fixmap_pte(idx, pte); return (void *)vaddr; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:115 @ void __kunmap_atomic(void *kvaddr) if (kvaddr >= (void *)FIXADDR_START) { type = kmap_atomic_idx(); - idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); + idx = fixmap_idx(type); if (cache_is_vivt()) __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE); +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = __pte(0); +#endif #ifdef CONFIG_DEBUG_HIGHMEM BUG_ON(vaddr != __fix_to_virt(idx)); - set_fixmap_pte(idx, __pte(0)); #else (void) idx; /* to kill a warning */ #endif + set_fixmap_pte(idx, __pte(0)); kmap_atomic_idx_pop(); } else if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) { /* this address was obtained through kmap_high_get() */ kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)])); } pagefault_enable(); - preempt_enable(); + preempt_enable_nort(); } EXPORT_SYMBOL(__kunmap_atomic); void *kmap_atomic_pfn(unsigned long pfn) { + pte_t pte = pfn_pte(pfn, kmap_prot); unsigned long vaddr; int idx, type; struct page *page = pfn_to_page(pfn); - preempt_disable(); + preempt_disable_nort(); pagefault_disable(); if (!PageHighMem(page)) return page_address(page); type = kmap_atomic_idx_push(); - idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id(); + idx = fixmap_idx(type); vaddr = __fix_to_virt(idx); #ifdef CONFIG_DEBUG_HIGHMEM BUG_ON(!pte_none(get_fixmap_pte(vaddr))); #endif - set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot)); +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = pte; +#endif + set_fixmap_pte(idx, pte); return (void *)vaddr; } +#if defined CONFIG_PREEMPT_RT +void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) +{ + int i; + + /* + * Clear @prev's kmap_atomic mappings + */ + for (i = 0; i < prev_p->kmap_idx; i++) { + int idx = fixmap_idx(i); + + set_fixmap_pte(idx, __pte(0)); + } + /* + * Restore @next_p's kmap_atomic mappings + */ + for (i = 0; i < next_p->kmap_idx; i++) { + int idx = fixmap_idx(i); + + if (!pte_none(next_p->kmap_pte[i])) + set_fixmap_pte(idx, next_p->kmap_pte[i]); + } +} +#endif Index: linux-5.4.5-rt3/arch/arm64/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/Kconfig +++ linux-5.4.5-rt3/arch/arm64/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:38 @ config ARM64 select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_NMI_SAFE_CMPXCHG - select ARCH_INLINE_READ_LOCK if !PREEMPT - select ARCH_INLINE_READ_LOCK_BH if !PREEMPT - select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT - select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPT - select ARCH_INLINE_READ_UNLOCK if !PREEMPT - select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPT - select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPT - select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPT - select ARCH_INLINE_WRITE_LOCK if !PREEMPT - 
select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPT - select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPT - select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPT - select ARCH_INLINE_WRITE_UNLOCK if !PREEMPT - select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT - select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT - select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT - select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPT - select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPT - select ARCH_INLINE_SPIN_LOCK if !PREEMPT - select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPT - select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPT - select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPT - select ARCH_INLINE_SPIN_UNLOCK if !PREEMPT - select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPT - select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPT - select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPT + select ARCH_INLINE_READ_LOCK if !PREEMPTION + select ARCH_INLINE_READ_LOCK_BH if !PREEMPTION + select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPTION + select ARCH_INLINE_SPIN_TRYLOCK if !PREEMPTION + select ARCH_INLINE_SPIN_TRYLOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_IRQ if !PREEMPTION + select ARCH_INLINE_SPIN_LOCK_IRQSAVE if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_BH if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_IRQ if !PREEMPTION + select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE if !PREEMPTION select ARCH_KEEP_MEMBLOCK select ARCH_USE_CMPXCHG_LOCKREF select ARCH_USE_QUEUED_RWLOCKS @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:72 @ config ARM64 select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 50000 || CC_IS_CLANG select ARCH_SUPPORTS_NUMA_BALANCING + select ARCH_SUPPORTS_RT select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT select ARCH_WANT_FRAME_POINTERS @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:163 @ config ARM64 select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_PREEMPT_LAZY select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_FUNCTION_ARG_ACCESS_API select HAVE_RCU_TABLE_FREE Index: linux-5.4.5-rt3/arch/arm64/crypto/sha256-glue.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/crypto/sha256-glue.c +++ linux-5.4.5-rt3/arch/arm64/crypto/sha256-glue.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:100 @ static int sha256_update_neon(struct sha * input when running on a preemptible kernel, but process the * data block by block instead. 
*/ - if (IS_ENABLED(CONFIG_PREEMPT) && + if (IS_ENABLED(CONFIG_PREEMPTION) && chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) chunk = SHA256_BLOCK_SIZE - sctx->count % SHA256_BLOCK_SIZE; Index: linux-5.4.5-rt3/arch/arm64/include/asm/assembler.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/include/asm/assembler.h +++ linux-5.4.5-rt3/arch/arm64/include/asm/assembler.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:702 @ USER(\label, ic ivau, \tmp2) // invali * where <label> is optional, and marks the point where execution will resume * after a yield has been performed. If omitted, execution resumes right after * the endif_yield_neon invocation. Note that the entire sequence, including - * the provided patchup code, will be omitted from the image if CONFIG_PREEMPT - * is not defined. + * the provided patchup code, will be omitted from the image if + * CONFIG_PREEMPTION is not defined. * * As a convenience, in the case where no patchup code is required, the above * sequence may be abbreviated to @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:731 @ USER(\label, ic ivau, \tmp2) // invali .endm .macro if_will_cond_yield_neon -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION get_current_task x0 ldr x0, [x0, #TSK_TI_PREEMPT] sub x0, x0, #PREEMPT_DISABLE_OFFSET Index: linux-5.4.5-rt3/arch/arm64/include/asm/kvm_mmu.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/include/asm/kvm_mmu.h +++ linux-5.4.5-rt3/arch/arm64/include/asm/kvm_mmu.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:94 @ alternative_cb_end void kvm_update_va_mask(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst); +void kvm_compute_layout(void); static inline unsigned long __kern_hyp_va(unsigned long v) { Index: linux-5.4.5-rt3/arch/arm64/include/asm/preempt.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/include/asm/preempt.h +++ linux-5.4.5-rt3/arch/arm64/include/asm/preempt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:73 @ static inline bool __preempt_count_dec_a * interrupt occurring between the non-atomic READ_ONCE/WRITE_ONCE * pair. 
*/ - return !pc || !READ_ONCE(ti->preempt_count); + if (!pc || !READ_ONCE(ti->preempt_count)) + return true; +#ifdef CONFIG_PREEMPT_LAZY + if ((pc & ~PREEMPT_NEED_RESCHED)) + return false; + if (current_thread_info()->preempt_lazy_count) + return false; + return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else + return false; +#endif } static inline bool should_resched(int preempt_offset) { +#ifdef CONFIG_PREEMPT_LAZY + u64 pc = READ_ONCE(current_thread_info()->preempt_count); + if (pc == preempt_offset) + return true; + + if ((pc & ~PREEMPT_NEED_RESCHED) != preempt_offset) + return false; + + if (current_thread_info()->preempt_lazy_count) + return false; + return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else u64 pc = READ_ONCE(current_thread_info()->preempt_count); return pc == preempt_offset; +#endif } -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION void preempt_schedule(void); #define __preempt_schedule() preempt_schedule() void preempt_schedule_notrace(void); #define __preempt_schedule_notrace() preempt_schedule_notrace() -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ #endif /* __ASM_PREEMPT_H */ Index: linux-5.4.5-rt3/arch/arm64/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/arm64/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:8 @ #ifndef __ASM_SPINLOCK_TYPES_H #define __ASM_SPINLOCK_TYPES_H -#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H) -# error "please don't include this file directly" -#endif - #include <asm-generic/qspinlock_types.h> #include <asm-generic/qrwlock_types.h> Index: linux-5.4.5-rt3/arch/arm64/include/asm/thread_info.h =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/include/asm/thread_info.h +++ linux-5.4.5-rt3/arch/arm64/include/asm/thread_info.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:32 @ struct thread_info { #ifdef CONFIG_ARM64_SW_TTBR0_PAN u64 ttbr0; /* saved TTBR0_EL1 */ #endif + int preempt_lazy_count; /* 0 => preemptable, <0 => bug */ union { u64 preempt_count; /* 0 => preemptible, <0 => bug */ struct { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:67 @ void arch_release_task_struct(struct tas #define TIF_FOREIGN_FPSTATE 3 /* CPU's FP state is not current's */ #define TIF_UPROBE 4 /* uprobe breakpoint or singlestep */ #define TIF_FSCHECK 5 /* Check FS is USER_DS on return */ +#define TIF_NEED_RESCHED_LAZY 6 #define TIF_NOHZ 7 #define TIF_SYSCALL_TRACE 8 /* syscall trace active */ #define TIF_SYSCALL_AUDIT 9 /* syscall auditing */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:88 @ void arch_release_task_struct(struct tas #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_FOREIGN_FPSTATE (1 << TIF_FOREIGN_FPSTATE) +#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY) #define _TIF_NOHZ (1 << TIF_NOHZ) #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:102 @ void arch_release_task_struct(struct tas #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ _TIF_NOTIFY_RESUME | 
_TIF_FOREIGN_FPSTATE | \ - _TIF_UPROBE | _TIF_FSCHECK) + _TIF_UPROBE | _TIF_FSCHECK | _TIF_NEED_RESCHED_LAZY) +#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY) #define _TIF_SYSCALL_WORK (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \ _TIF_NOHZ | _TIF_SYSCALL_EMU) Index: linux-5.4.5-rt3/arch/arm64/kernel/asm-offsets.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/asm-offsets.c +++ linux-5.4.5-rt3/arch/arm64/kernel/asm-offsets.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ int main(void) BLANK(); DEFINE(TSK_TI_FLAGS, offsetof(struct task_struct, thread_info.flags)); DEFINE(TSK_TI_PREEMPT, offsetof(struct task_struct, thread_info.preempt_count)); + DEFINE(TSK_TI_PREEMPT_LAZY, offsetof(struct task_struct, thread_info.preempt_lazy_count)); DEFINE(TSK_TI_ADDR_LIMIT, offsetof(struct task_struct, thread_info.addr_limit)); #ifdef CONFIG_ARM64_SW_TTBR0_PAN DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0)); Index: linux-5.4.5-rt3/arch/arm64/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/entry.S +++ linux-5.4.5-rt3/arch/arm64/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:672 @ el1_irq: irq_handler -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count alternative_if ARM64_HAS_IRQ_PRIO_MASKING /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:682 @ alternative_if ARM64_HAS_IRQ_PRIO_MASKIN mrs x0, daif orr x24, x24, x0 alternative_else_nop_endif - cbnz x24, 1f // preempt count != 0 || NMI return path - bl arm64_preempt_schedule_irq // irq en/disable is done inside + + cbz x24, 1f // (need_resched + count) == 0 + cbnz w24, 2f // count != 0 + + ldr w24, [tsk, #TSK_TI_PREEMPT_LAZY] // get preempt lazy count + cbnz w24, 2f // preempt lazy count != 0 + + ldr x0, [tsk, #TSK_TI_FLAGS] // get flags + tbz x0, #TIF_NEED_RESCHED_LAZY, 2f // needs rescheduling? 
1: + bl arm64_preempt_schedule_irq // irq en/disable is done inside +2: #endif #ifdef CONFIG_ARM64_PSEUDO_NMI Index: linux-5.4.5-rt3/arch/arm64/kernel/fpsimd.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/fpsimd.c +++ linux-5.4.5-rt3/arch/arm64/kernel/fpsimd.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:216 @ static void sve_free(struct task_struct __sve_free(task); } +static void *sve_free_atomic(struct task_struct *task) +{ + void *sve_state = task->thread.sve_state; + + WARN_ON(test_tsk_thread_flag(task, TIF_SVE)); + + task->thread.sve_state = NULL; + return sve_state; +} + /* * TIF_SVE controls whether a task can use SVE without trapping while * in userspace, and also the way a task's FPSIMD/SVE state is stored @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1021 @ void fpsimd_thread_switch(struct task_st void fpsimd_flush_thread(void) { int vl, supported_vl; + void *mem = NULL; if (!system_supports_fpsimd()) return; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1034 @ void fpsimd_flush_thread(void) if (system_supports_sve()) { clear_thread_flag(TIF_SVE); - sve_free(current); + mem = sve_free_atomic(current); /* * Reset the task vector length as required. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1068 @ void fpsimd_flush_thread(void) } put_cpu_fpsimd_context(); + kfree(mem); } /* Index: linux-5.4.5-rt3/arch/arm64/kernel/signal.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/signal.c +++ linux-5.4.5-rt3/arch/arm64/kernel/signal.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:913 @ asmlinkage void do_notify_resume(struct /* Check valid user FS if needed */ addr_limit_user_check(); - if (thread_flags & _TIF_NEED_RESCHED) { + if (thread_flags & _TIF_NEED_RESCHED_MASK) { /* Unmask Debug and SError for the next task */ local_daif_restore(DAIF_PROCCTX_NOIRQ); Index: linux-5.4.5-rt3/arch/arm64/kernel/smp.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/smp.c +++ linux-5.4.5-rt3/arch/arm64/kernel/smp.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:34 @ #include <linux/of.h> #include <linux/irq_work.h> #include <linux/kexec.h> +#include <linux/kvm_host.h> #include <asm/alternative.h> #include <asm/atomic.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:43 @ #include <asm/cputype.h> #include <asm/cpu_ops.h> #include <asm/daifflags.h> +#include <asm/kvm_mmu.h> #include <asm/mmu_context.h> #include <asm/numa.h> #include <asm/pgtable.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:413 @ static void __init hyp_mode_check(void) "CPU: CPUs started in inconsistent modes"); else pr_info("CPU: All CPU(s) started at EL1\n"); + if (IS_ENABLED(CONFIG_KVM_ARM_HOST)) + kvm_compute_layout(); } void __init smp_cpus_done(unsigned int max_cpus) Index: linux-5.4.5-rt3/arch/arm64/kernel/traps.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kernel/traps.c +++ linux-5.4.5-rt3/arch/arm64/kernel/traps.c @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:146 @ void show_stack(struct task_struct *tsk, #ifdef CONFIG_PREEMPT #define S_PREEMPT " PREEMPT" +#elif defined(CONFIG_PREEMPT_RT) +#define S_PREEMPT " PREEMPT_RT" #else #define S_PREEMPT "" #endif + #define S_SMP " SMP" static int __die(const char *str, int err, struct pt_regs *regs) Index: linux-5.4.5-rt3/arch/arm64/kvm/va_layout.c =================================================================== --- linux-5.4.5-rt3.orig/arch/arm64/kvm/va_layout.c +++ linux-5.4.5-rt3/arch/arm64/kvm/va_layout.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:25 @ static u8 tag_lsb; static u64 tag_val; static u64 va_mask; -static void compute_layout(void) +__init void kvm_compute_layout(void) { phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start); u64 hyp_va_msb; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:113 @ void __init kvm_update_va_mask(struct al BUG_ON(nr_inst != 5); - if (!has_vhe() && !va_mask) - compute_layout(); - for (i = 0; i < nr_inst; i++) { u32 rd, rn, insn, oinsn; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:156 @ void kvm_patch_vector_branch(struct alt_ return; } - if (!va_mask) - compute_layout(); - /* * Compute HYP VA by using the same computation as kern_hyp_va() */ Index: linux-5.4.5-rt3/arch/c6x/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/c6x/kernel/entry.S +++ linux-5.4.5-rt3/arch/c6x/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #define DP B14 #define SP B15 -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION #define resume_kernel restore_all #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:290 @ work_notifysig: ;; is a little bit different ;; ENTRY(ret_from_exception) -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION MASK_INT B2 #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:560 @ ENDPROC(_nmi_handler) ;; ;; Jump to schedule() then return to ret_from_isr ;; -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION resume_kernel: GET_THREAD_INFO A12 LDW .D1T1 *+A12(THREAD_INFO_PREEMPT_COUNT),A1 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:585 @ preempt_schedule: B .S2 preempt_schedule_irq #endif ADDKPC .S2 preempt_schedule,B3,4 -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ ENTRY(enable_exception) DINT Index: linux-5.4.5-rt3/arch/csky/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/csky/kernel/entry.S +++ linux-5.4.5-rt3/arch/csky/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:280 @ ENTRY(csky_irq) zero_fp psrset ee -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION mov r9, sp /* Get current stack pointer */ bmaski r10, THREAD_SHIFT andn r9, r10 /* Get thread_info */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:297 @ ENTRY(csky_irq) mov a0, sp jbsr csky_do_IRQ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION subi r12, 1 stw r12, (r9, TINFO_PREEMPT) cmpnei r12, 0 Index: linux-5.4.5-rt3/arch/h8300/kernel/entry.S 
=================================================================== --- linux-5.4.5-rt3.orig/arch/h8300/kernel/entry.S +++ linux-5.4.5-rt3/arch/h8300/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:287 @ badsys: mov.l er0,@(LER0:16,sp) bra resume_userspace -#if !defined(CONFIG_PREEMPT) +#if !defined(CONFIG_PREEMPTION) #define resume_kernel restore_all #endif ret_from_exception: -#if defined(CONFIG_PREEMPT) +#if defined(CONFIG_PREEMPTION) orc #0xc0,ccr #endif ret_from_interrupt: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:322 @ work_resched: restore_all: RESTORE_ALL /* Does RTE */ -#if defined(CONFIG_PREEMPT) +#if defined(CONFIG_PREEMPTION) resume_kernel: mov.l @(TI_PRE_COUNT:16,er4),er0 bne restore_all:8 Index: linux-5.4.5-rt3/arch/hexagon/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/hexagon/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/hexagon/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ #ifndef _ASM_SPINLOCK_TYPES_H #define _ASM_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { volatile unsigned int lock; } arch_spinlock_t; Index: linux-5.4.5-rt3/arch/hexagon/kernel/vm_entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/hexagon/kernel/vm_entry.S +++ linux-5.4.5-rt3/arch/hexagon/kernel/vm_entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:268 @ event_dispatch: * should be in the designated register (usually R19) * * If we were in kernel mode, we don't need to check scheduler - * or signals if CONFIG_PREEMPT is not set. If set, then it has + * or signals if CONFIG_PREEMPTION is not set. If set, then it has * to jump to a need_resched kind of block. - * BTW, CONFIG_PREEMPT is not supported yet. + * BTW, CONFIG_PREEMPTION is not supported yet. */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION R0 = #VM_INT_DISABLE trap1(#HVM_TRAP1_VMSETIE) #endif Index: linux-5.4.5-rt3/arch/ia64/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/ia64/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/ia64/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef _ASM_IA64_SPINLOCK_TYPES_H #define _ASM_IA64_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { volatile unsigned int lock; } arch_spinlock_t; Index: linux-5.4.5-rt3/arch/ia64/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/ia64/kernel/entry.S +++ linux-5.4.5-rt3/arch/ia64/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:673 @ GLOBAL_ENTRY(ia64_leave_syscall) * * p6 controls whether current_thread_info()->flags needs to be check for * extra work. We always check for extra work when returning to user-level. - * With CONFIG_PREEMPT, we also check for extra work when the preempt_count + * With CONFIG_PREEMPTION, we also check for extra work when the preempt_count * is 0. 
After extra work processing has been completed, execution * resumes at ia64_work_processed_syscall with p6 set to 1 if the extra-work-check * needs to be redone. */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION RSM_PSR_I(p0, r2, r18) // disable interrupts cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:688 @ GLOBAL_ENTRY(ia64_leave_syscall) (pUStk) mov r21=0 // r21 <- 0 ;; cmp.eq p6,p0=r21,r0 // p6 <- pUStk || (preempt_count == 0) -#else /* !CONFIG_PREEMPT */ +#else /* !CONFIG_PREEMPTION */ RSM_PSR_I(pUStk, r2, r18) cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:817 @ GLOBAL_ENTRY(ia64_leave_kernel) * * p6 controls whether current_thread_info()->flags needs to be check for * extra work. We always check for extra work when returning to user-level. - * With CONFIG_PREEMPT, we also check for extra work when the preempt_count + * With CONFIG_PREEMPTION, we also check for extra work when the preempt_count * is 0. After extra work processing has been completed, execution * resumes at .work_processed_syscall with p6 set to 1 if the extra-work-check * needs to be redone. */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION RSM_PSR_I(p0, r17, r31) // disable interrupts cmp.eq p0,pLvSys=r0,r0 // pLvSys=0: leave from kernel (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1123 @ skip_rbs_switch: /* * On entry: - * r20 = &current->thread_info->pre_count (if CONFIG_PREEMPT) + * r20 = &current->thread_info->pre_count (if CONFIG_PREEMPTION) * r31 = current->thread_info->flags * On exit: * p6 = TRUE if work-pending-check needs to be redone Index: linux-5.4.5-rt3/arch/ia64/kernel/kprobes.c =================================================================== --- linux-5.4.5-rt3.orig/arch/ia64/kernel/kprobes.c +++ linux-5.4.5-rt3/arch/ia64/kernel/kprobes.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:844 @ static int __kprobes pre_kprobes_handler return 1; } -#if !defined(CONFIG_PREEMPT) +#if !defined(CONFIG_PREEMPTION) if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) { /* Boost up -- we can execute copied instructions directly */ ia64_psr(regs)->ri = p->ainsn.slot; Index: linux-5.4.5-rt3/arch/m68k/coldfire/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/m68k/coldfire/entry.S +++ linux-5.4.5-rt3/arch/m68k/coldfire/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:111 @ ret_from_exception: btst #5,%sp@(PT_OFF_SR) /* check if returning to kernel */ jeq Luser_return /* if so, skip resched, signals */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION movel %sp,%d1 /* get thread_info pointer */ andl #-THREAD_SIZE,%d1 /* at base of kernel stack */ movel %d1,%a0 Index: linux-5.4.5-rt3/arch/microblaze/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/microblaze/kernel/entry.S +++ linux-5.4.5-rt3/arch/microblaze/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:731 @ no_intr_resched: bri 6f; /* MS: Return to kernel state.
*/ 2: -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION lwi r11, CURRENT_TASK, TS_THREAD_INFO; /* MS: get preempt_count from thread info */ lwi r5, r11, TI_PREEMPT_COUNT; Index: linux-5.4.5-rt3/arch/mips/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/mips/Kconfig +++ linux-5.4.5-rt3/arch/mips/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2589 @ config MIPS_CRC_SUPPORT # config HIGHMEM bool "High Memory Support" - depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA + depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA && !PREEMPT_RT config CPU_SUPPORTS_HIGHMEM bool Index: linux-5.4.5-rt3/arch/mips/include/asm/asmmacro.h =================================================================== --- linux-5.4.5-rt3.orig/arch/mips/include/asm/asmmacro.h +++ linux-5.4.5-rt3/arch/mips/include/asm/asmmacro.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:66 @ .endm .macro local_irq_disable reg=t0 -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION lw \reg, TI_PRE_COUNT($28) addi \reg, \reg, 1 sw \reg, TI_PRE_COUNT($28) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:76 @ xori \reg, \reg, 1 mtc0 \reg, CP0_STATUS irq_disable_hazard -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION lw \reg, TI_PRE_COUNT($28) addi \reg, \reg, -1 sw \reg, TI_PRE_COUNT($28) Index: linux-5.4.5-rt3/arch/mips/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/mips/kernel/entry.S +++ linux-5.4.5-rt3/arch/mips/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:22 @ #include <asm/thread_info.h> #include <asm/war.h> -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION #define resume_kernel restore_all #else #define __ret_from_irq ret_from_exception @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:30 @ .text .align 5 -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION FEXPORT(ret_from_exception) local_irq_disable # preempt stop b __ret_from_irq @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:56 @ resume_userspace: bnez t0, work_pending j restore_all -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION resume_kernel: local_irq_disable lw t0, TI_PRE_COUNT($28) Index: linux-5.4.5-rt3/arch/nds32/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/nds32/Kconfig +++ linux-5.4.5-rt3/arch/nds32/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ config GENERIC_HWEIGHT config GENERIC_LOCKBREAK def_bool y - depends on PREEMPT + depends on PREEMPTION config TRACE_IRQFLAGS_SUPPORT def_bool y Index: linux-5.4.5-rt3/arch/nds32/kernel/ex-exit.S =================================================================== --- linux-5.4.5-rt3.orig/arch/nds32/kernel/ex-exit.S +++ linux-5.4.5-rt3/arch/nds32/kernel/ex-exit.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:75 @ restore_user_regs_last .endm -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION .macro preempt_stop .endm #else @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:161 @ no_work_pending: /* * preemptive kernel 
*/ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION resume_kernel: gie_disable lwi $t0, [tsk+#TSK_TI_PREEMPT] Index: linux-5.4.5-rt3/arch/nios2/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/nios2/kernel/entry.S +++ linux-5.4.5-rt3/arch/nios2/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:368 @ ENTRY(ret_from_interrupt) ldw r1, PT_ESTATUS(sp) /* check if returning to kernel */ TSTBNZ r1, r1, ESTATUS_EU, Luser_return -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION GET_THREAD_INFO r1 ldw r4, TI_PREEMPT_COUNT(r1) bne r4, r0, restore_all Index: linux-5.4.5-rt3/arch/parisc/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/parisc/Kconfig +++ linux-5.4.5-rt3/arch/parisc/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:85 @ config STACK_GROWSUP config GENERIC_LOCKBREAK bool default y - depends on SMP && PREEMPT + depends on SMP && PREEMPTION config ARCH_HAS_ILOG2_U32 bool Index: linux-5.4.5-rt3/arch/parisc/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/parisc/kernel/entry.S +++ linux-5.4.5-rt3/arch/parisc/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:943 @ intr_restore: rfi nop -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION # define intr_do_preempt intr_restore -#endif /* !CONFIG_PREEMPT */ +#endif /* !CONFIG_PREEMPTION */ .import schedule,code intr_do_resched: /* Only call schedule on return to userspace. If we're returning - * to kernel space, we may schedule if CONFIG_PREEMPT, otherwise + * to kernel space, we may schedule if CONFIG_PREEMPTION, otherwise * we jump back to intr_restore. */ LDREG PT_IASQ0(%r16), %r20 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:982 @ intr_do_resched: * and preempt_count is 0. otherwise, we continue on * our merry way back to the current running task. */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION .import preempt_schedule_irq,code intr_do_preempt: rsm PSW_SM_I, %r0 /* disable interrupts */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1002 @ intr_do_preempt: nop b,n intr_restore /* ssm PSW_SM_I done by intr_restore */ -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ /* * External interrupts. 
Index: linux-5.4.5-rt3/arch/powerpc/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/Kconfig +++ linux-5.4.5-rt3/arch/powerpc/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:109 @ config LOCKDEP_SUPPORT config GENERIC_LOCKBREAK bool default y - depends on SMP && PREEMPT + depends on SMP && PREEMPTION config GENERIC_HWEIGHT bool @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:147 @ config PPC select ARCH_MIGHT_HAVE_PC_SERIO select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_SUPPORTS_ATOMIC_RMW + select ARCH_SUPPORTS_RT select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_CMPXCHG_LOCKREF if PPC64 select ARCH_WANT_IPC_PARSE_VERSION @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:225 @ config PPC select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_PREEMPT_LAZY select HAVE_RCU_TABLE_FREE if SMP select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_MMU_GATHER_PAGE_SIZE @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:403 @ menu "Kernel options" config HIGHMEM bool "High memory support" - depends on PPC32 + depends on PPC32 && !PREEMPT_RT source "kernel/Kconfig.hz" Index: linux-5.4.5-rt3/arch/powerpc/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/powerpc/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef _ASM_POWERPC_SPINLOCK_TYPES_H #define _ASM_POWERPC_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { volatile unsigned int slock; } arch_spinlock_t; Index: linux-5.4.5-rt3/arch/powerpc/include/asm/stackprotector.h =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/include/asm/stackprotector.h +++ linux-5.4.5-rt3/arch/powerpc/include/asm/stackprotector.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:27 @ static __always_inline void boot_init_st unsigned long canary; /* Try to get a semi random initial value. 
*/ +#ifdef CONFIG_PREEMPT_RT + canary = (unsigned long)&canary; +#else canary = get_random_canary(); +#endif canary ^= mftb(); canary ^= LINUX_VERSION_CODE; canary &= CANARY_MASK; Index: linux-5.4.5-rt3/arch/powerpc/include/asm/thread_info.h =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/include/asm/thread_info.h +++ linux-5.4.5-rt3/arch/powerpc/include/asm/thread_info.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ struct thread_info { int preempt_count; /* 0 => preemptable, <0 => BUG */ + int preempt_lazy_count; /* 0 => preemptable, + <0 => BUG */ unsigned long local_flags; /* private flags for thread */ #ifdef CONFIG_LIVEPATCH unsigned long *livepatch_sp; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:85 @ void arch_setup_new_exec(void); #define TIF_SINGLESTEP 8 /* singlestepping active */ #define TIF_NOHZ 9 /* in adaptive nohz mode */ #define TIF_SECCOMP 10 /* secure computing */ -#define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */ -#define TIF_NOERROR 12 /* Force successful syscall return */ + +#define TIF_NEED_RESCHED_LAZY 11 /* lazy rescheduling necessary */ +#define TIF_SYSCALL_TRACEPOINT 12 /* syscall tracepoint instrumentation */ + #define TIF_NOTIFY_RESUME 13 /* callback before returning to user */ #define TIF_UPROBE 14 /* breakpointed or single-stepping */ -#define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */ #define TIF_EMULATE_STACK_STORE 16 /* Is an instruction emulation for stack store? */ #define TIF_MEMDIE 17 /* is terminating due to OOM killer */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:99 @ void arch_setup_new_exec(void); #endif #define TIF_POLLING_NRFLAG 19 /* true if poll_idle() is polling TIF_NEED_RESCHED */ #define TIF_32BIT 20 /* 32 bit binary */ +#define TIF_RESTOREALL 21 /* Restore all regs (implies NOERROR) */ +#define TIF_NOERROR 22 /* Force successful syscall return */ + /* as above, but as bit values */ #define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:121 @ void arch_setup_new_exec(void); #define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT) #define _TIF_EMULATE_STACK_STORE (1<<TIF_EMULATE_STACK_STORE) #define _TIF_NOHZ (1<<TIF_NOHZ) +#define _TIF_NEED_RESCHED_LAZY (1<<TIF_NEED_RESCHED_LAZY) #define _TIF_FSCHECK (1<<TIF_FSCHECK) #define _TIF_SYSCALL_EMU (1<<TIF_SYSCALL_EMU) #define _TIF_SYSCALL_DOTRACE (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:131 @ void arch_setup_new_exec(void); #define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \ _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \ - _TIF_FSCHECK) + _TIF_FSCHECK | _TIF_NEED_RESCHED_LAZY) #define _TIF_PERSYSCALL_MASK (_TIF_RESTOREALL|_TIF_NOERROR) +#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY) /* Bits in local_flags */ /* Don't move TLF_NAPPING without adjusting the code in entry_32.S */ Index: linux-5.4.5-rt3/arch/powerpc/kernel/asm-offsets.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/asm-offsets.c +++ linux-5.4.5-rt3/arch/powerpc/kernel/asm-offsets.c @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:170 @ int main(void) OFFSET(TI_FLAGS, thread_info, flags); OFFSET(TI_LOCAL_FLAGS, thread_info, local_flags); OFFSET(TI_PREEMPT, thread_info, preempt_count); + OFFSET(TI_PREEMPT_LAZY, thread_info, preempt_lazy_count); #ifdef CONFIG_PPC64 OFFSET(DCACHEL1BLOCKSIZE, ppc64_caches, l1d.block_size); Index: linux-5.4.5-rt3/arch/powerpc/kernel/entry_32.S =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/entry_32.S +++ linux-5.4.5-rt3/arch/powerpc/kernel/entry_32.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:403 @ ret_from_syscall: MTMSRD(r10) lwz r9,TI_FLAGS(r2) li r8,-MAX_ERRNO - andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) + lis r0,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@h + ori r0,r0, (_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@l + and. r0,r9,r0 bne- syscall_exit_work cmplw 0,r3,r8 blt+ syscall_exit_cont @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:520 @ syscall_dotrace: b syscall_dotrace_cont syscall_exit_work: - andi. r0,r9,_TIF_RESTOREALL + andis. r0,r9,_TIF_RESTOREALL@h beq+ 0f REST_NVGPRS(r1) b 2f 0: cmplw 0,r3,r8 blt+ 1f - andi. r0,r9,_TIF_NOERROR + andis. r0,r9,_TIF_NOERROR@h bne- 1f lwz r11,_CCR(r1) /* Load CR */ neg r3,r3 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:535 @ syscall_exit_work: 1: stw r6,RESULT(r1) /* Save result */ stw r3,GPR3(r1) /* Update return value */ -2: andi. r0,r9,(_TIF_PERSYSCALL_MASK) +2: andis. r0,r9,(_TIF_PERSYSCALL_MASK)@h beq 4f /* Clear per-syscall TIF flags if any are set. */ - li r11,_TIF_PERSYSCALL_MASK + lis r11,(_TIF_PERSYSCALL_MASK)@h addi r12,r2,TI_FLAGS 3: lwarx r8,0,r12 andc r8,r8,r11 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:902 @ resume_kernel: bne- 0b 1: -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION /* check current_thread_info->preempt_count */ lwz r0,TI_PREEMPT(r2) cmpwi 0,r0,0 /* if non-zero, just restore regs and return */ bne restore_kuap andi. r8,r8,_TIF_NEED_RESCHED + bne+ 1f + lwz r0,TI_PREEMPT_LAZY(r2) + cmpwi 0,r0,0 /* if non-zero, just restore regs and return */ + bne restore_kuap + lwz r0,TI_FLAGS(r2) + andi. r0,r0,_TIF_NEED_RESCHED_LAZY beq+ restore_kuap +1: lwz r3,_MSR(r1) andi. r0,r3,MSR_EE /* interrupts off? */ beq restore_kuap /* don't schedule if so */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:933 @ resume_kernel: */ bl trace_hardirqs_on #endif -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ restore_kuap: kuap_restore r1, r2, r9, r10, r0 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1236 @ global_dbcr0: #endif /* !(CONFIG_4xx || CONFIG_BOOKE) */ do_work: /* r10 contains MSR_KERNEL here */ - andi. r0,r9,_TIF_NEED_RESCHED + andi. r0,r9,_TIF_NEED_RESCHED_MASK beq do_user_signal do_resched: /* r10 contains MSR_KERNEL here */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1257 @ recheck: SYNC MTMSRD(r10) /* disable interrupts */ lwz r9,TI_FLAGS(r2) - andi. r0,r9,_TIF_NEED_RESCHED + andi. r0,r9,_TIF_NEED_RESCHED_MASK bne- do_resched andi.
r0,r9,_TIF_USER_WORK_MASK beq restore_user Index: linux-5.4.5-rt3/arch/powerpc/kernel/entry_64.S =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/entry_64.S +++ linux-5.4.5-rt3/arch/powerpc/kernel/entry_64.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:243 @ system_call_exit: ld r9,TI_FLAGS(r12) li r11,-MAX_ERRNO - andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK) + lis r0,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@h + ori r0,r0,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)@l + and. r0,r9,r0 bne- .Lsyscall_exit_work andi. r0,r8,MSR_FP @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:368 @ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* If TIF_RESTOREALL is set, don't scribble on either r3 or ccr. If TIF_NOERROR is set, just save r3 as it is. */ - andi. r0,r9,_TIF_RESTOREALL + andis. r0,r9,_TIF_RESTOREALL@h beq+ 0f REST_NVGPRS(r1) b 2f 0: cmpld r3,r11 /* r11 is -MAX_ERRNO */ blt+ 1f - andi. r0,r9,_TIF_NOERROR + andis. r0,r9,_TIF_NOERROR@h bne- 1f ld r5,_CCR(r1) neg r3,r3 oris r5,r5,0x1000 /* Set SO bit in CR */ std r5,_CCR(r1) 1: std r3,GPR3(r1) -2: andi. r0,r9,(_TIF_PERSYSCALL_MASK) +2: andis. r0,r9,(_TIF_PERSYSCALL_MASK)@h beq 4f /* Clear per-syscall TIF flags if any are set. */ - li r11,_TIF_PERSYSCALL_MASK + lis r11,(_TIF_PERSYSCALL_MASK)@h addi r12,r12,TI_FLAGS 3: ldarx r10,0,r12 andc r10,r10,r11 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:791 @ _GLOBAL(ret_from_except_lite) bl restore_math b restore #endif -1: andi. r0,r4,_TIF_NEED_RESCHED +1: andi. r0,r4,_TIF_NEED_RESCHED_MASK beq 2f bl restore_interrupts SCHEDULE_USER @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:851 @ resume_kernel: bne- 0b 1: -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION /* Check if we need to preempt */ + lwz r8,TI_PREEMPT(r9) + cmpwi 0,r8,0 /* if non-zero, just restore regs and return */ + bne restore andi. r0,r4,_TIF_NEED_RESCHED + bne+ check_count + + andi. 
r0,r4,_TIF_NEED_RESCHED_LAZY beq+ restore + lwz r8,TI_PREEMPT_LAZY(r9) + /* Check that preempt_count() == 0 and interrupts are enabled */ - lwz r8,TI_PREEMPT(r9) +check_count: cmpwi cr0,r8,0 bne restore ld r0,SOFTE(r1) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:890 @ resume_kernel: li r10,MSR_RI mtmsrd r10,1 /* Update machine state */ #endif /* CONFIG_PPC_BOOK3E */ -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ .globl fast_exc_return_irq fast_exc_return_irq: Index: linux-5.4.5-rt3/arch/powerpc/kernel/irq.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/irq.c +++ linux-5.4.5-rt3/arch/powerpc/kernel/irq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:682 @ void *mcheckirq_ctx[NR_CPUS] __read_most void *softirq_ctx[NR_CPUS] __read_mostly; void *hardirq_ctx[NR_CPUS] __read_mostly; +#ifndef CONFIG_PREEMPT_RT void do_softirq_own_stack(void) { call_do_softirq(softirq_ctx[smp_processor_id()]); } +#endif irq_hw_number_t virq_to_hw(unsigned int virq) { Index: linux-5.4.5-rt3/arch/powerpc/kernel/misc_32.S =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/misc_32.S +++ linux-5.4.5-rt3/arch/powerpc/kernel/misc_32.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:40 @ * We store the saved ksp_limit in the unused part * of the STACK_FRAME_OVERHEAD */ +#ifndef CONFIG_PREEMPT_RT _GLOBAL(call_do_softirq) mflr r0 stw r0,4(r1) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:56 @ _GLOBAL(call_do_softirq) stw r10,THREAD+KSP_LIMIT(r2) mtlr r0 blr +#endif /* * void call_do_irq(struct pt_regs *regs, void *sp); Index: linux-5.4.5-rt3/arch/powerpc/kernel/misc_64.S =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/misc_64.S +++ linux-5.4.5-rt3/arch/powerpc/kernel/misc_64.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:30 @ .text +#ifndef CONFIG_PREEMPT_RT _GLOBAL(call_do_softirq) mflr r0 std r0,16(r1) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:41 @ _GLOBAL(call_do_softirq) ld r0,16(r1) mtlr r0 blr +#endif _GLOBAL(call_do_irq) mflr r0 Index: linux-5.4.5-rt3/arch/powerpc/kernel/traps.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/traps.c +++ linux-5.4.5-rt3/arch/powerpc/kernel/traps.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:174 @ extern void panic_flush_kmsg_start(void) extern void panic_flush_kmsg_end(void) { - printk_safe_flush_on_panic(); kmsg_dump(KMSG_DUMP_PANIC); bust_spinlocks(0); debug_locks_off(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:263 @ static char *get_mmu_str(void) static int __die(const char *str, struct pt_regs *regs, long err) { + const char *pr = ""; + printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter); + if (IS_ENABLED(CONFIG_PREEMPTION)) + pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT"; + printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s %s\n", IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE", PAGE_SIZE / 1024, get_mmu_str(), - IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "", + pr, IS_ENABLED(CONFIG_SMP) ? 
" SMP" : "", IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "", debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "", Index: linux-5.4.5-rt3/arch/powerpc/kernel/watchdog.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kernel/watchdog.c +++ linux-5.4.5-rt3/arch/powerpc/kernel/watchdog.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:184 @ static void watchdog_smp_panic(int cpu, wd_smp_unlock(&flags); - printk_safe_flush(); - /* - * printk_safe_flush() seems to require another print - * before anything actually goes out to console. - */ if (sysctl_hardlockup_all_cpu_backtrace) trigger_allbutself_cpu_backtrace(); Index: linux-5.4.5-rt3/arch/powerpc/kvm/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/kvm/Kconfig +++ linux-5.4.5-rt3/arch/powerpc/kvm/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:181 @ config KVM_E500MC config KVM_MPIC bool "KVM in-kernel MPIC emulation" depends on KVM && E500 + depends on !PREEMPT_RT select HAVE_KVM_IRQCHIP select HAVE_KVM_IRQFD select HAVE_KVM_IRQ_ROUTING Index: linux-5.4.5-rt3/arch/powerpc/platforms/ps3/device-init.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/platforms/ps3/device-init.c +++ linux-5.4.5-rt3/arch/powerpc/platforms/ps3/device-init.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:741 @ static int ps3_notification_read_write(s } pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op); - res = wait_event_interruptible(dev->done.wait, - dev->done.done || kthread_should_stop()); + res = swait_event_interruptible_exclusive(dev->done.wait, + dev->done.done || kthread_should_stop()); if (kthread_should_stop()) res = -EINTR; if (res) { Index: linux-5.4.5-rt3/arch/powerpc/platforms/pseries/iommu.c =================================================================== --- linux-5.4.5-rt3.orig/arch/powerpc/platforms/pseries/iommu.c +++ linux-5.4.5-rt3/arch/powerpc/platforms/pseries/iommu.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:27 @ #include <linux/of.h> #include <linux/iommu.h> #include <linux/rculist.h> +#include <linux/locallock.h> #include <asm/io.h> #include <asm/prom.h> #include <asm/rtas.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:182 @ static int tce_build_pSeriesLP(struct io } static DEFINE_PER_CPU(__be64 *, tce_page); +static DEFINE_LOCAL_IRQ_LOCK(tcp_page_lock); static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:203 @ static int tce_buildmulti_pSeriesLP(stru direction, attrs); } - local_irq_save(flags); /* to protect tcep and the page behind it */ + /* to protect tcep and the page behind it */ + local_lock_irqsave(tcp_page_lock, flags); tcep = __this_cpu_read(tce_page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:215 @ static int tce_buildmulti_pSeriesLP(stru tcep = (__be64 *)__get_free_page(GFP_ATOMIC); /* If allocation fails, fall back to the loop implementation */ if (!tcep) { - local_irq_restore(flags); + local_unlock_irqrestore(tcp_page_lock, flags); return 
tce_build_pSeriesLP(tbl, tcenum, npages, uaddr, direction, attrs); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:249 @ static int tce_buildmulti_pSeriesLP(stru tcenum += limit; } while (npages > 0 && !rc); - local_irq_restore(flags); + local_unlock_irqrestore(tcp_page_lock, flags); if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) { ret = (int)rc; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:407 @ static int tce_setrange_multi_pSeriesLP( u64 rc = 0; long l, limit; - local_irq_disable(); /* to protect tcep and the page behind it */ + /* to protect tcep and the page behind it */ + local_lock_irq(tcp_page_lock); tcep = __this_cpu_read(tce_page); if (!tcep) { tcep = (__be64 *)__get_free_page(GFP_ATOMIC); if (!tcep) { - local_irq_enable(); + local_unlock_irq(tcp_page_lock); return -ENOMEM; } __this_cpu_write(tce_page, tcep); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:460 @ static int tce_setrange_multi_pSeriesLP( /* error cleanup: caller will clear whole range */ - local_irq_enable(); + local_unlock_irq(tcp_page_lock); return rc; } Index: linux-5.4.5-rt3/arch/riscv/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/riscv/kernel/entry.S +++ linux-5.4.5-rt3/arch/riscv/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:158 @ _save_context: REG_L x2, PT_SP(sp) .endm -#if !IS_ENABLED(CONFIG_PREEMPT) +#if !IS_ENABLED(CONFIG_PREEMPTION) .set resume_kernel, restore_all #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:272 @ restore_all: RESTORE_ALL sret -#if IS_ENABLED(CONFIG_PREEMPT) +#if IS_ENABLED(CONFIG_PREEMPTION) resume_kernel: REG_L s0, TASK_TI_PREEMPT_COUNT(tp) bnez s0, restore_all Index: linux-5.4.5-rt3/arch/s390/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/s390/Kconfig +++ linux-5.4.5-rt3/arch/s390/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ config GENERIC_BUG_RELATIVE_POINTERS def_bool y config GENERIC_LOCKBREAK - def_bool y if PREEMPT + def_bool y if PREEMPTION config PGSTE def_bool y if KVM Index: linux-5.4.5-rt3/arch/s390/include/asm/preempt.h =================================================================== --- linux-5.4.5-rt3.orig/arch/s390/include/asm/preempt.h +++ linux-5.4.5-rt3/arch/s390/include/asm/preempt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:133 @ static inline bool should_resched(int pr #endif /* CONFIG_HAVE_MARCH_Z196_FEATURES */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION extern asmlinkage void preempt_schedule(void); #define __preempt_schedule() preempt_schedule() extern asmlinkage void preempt_schedule_notrace(void); #define __preempt_schedule_notrace() preempt_schedule_notrace() -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ #endif /* __ASM_PREEMPT_H */ Index: linux-5.4.5-rt3/arch/s390/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/s390/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/s390/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef __ASM_SPINLOCK_TYPES_H #define
__ASM_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { int lock; } __attribute__ ((aligned (4))) arch_spinlock_t; Index: linux-5.4.5-rt3/arch/s390/kernel/dumpstack.c =================================================================== --- linux-5.4.5-rt3.orig/arch/s390/kernel/dumpstack.c +++ linux-5.4.5-rt3/arch/s390/kernel/dumpstack.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:197 @ void die(struct pt_regs *regs, const cha regs->int_code >> 17, ++die_counter); #ifdef CONFIG_PREEMPT pr_cont("PREEMPT "); +#elif defined(CONFIG_PREEMPT_RT) + pr_cont("PREEMPT_RT "); #endif pr_cont("SMP "); if (debug_pagealloc_enabled()) Index: linux-5.4.5-rt3/arch/s390/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/s390/kernel/entry.S +++ linux-5.4.5-rt3/arch/s390/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:793 @ ENTRY(io_int_handler) .Lio_work: tm __PT_PSW+1(%r11),0x01 # returning to user ? jo .Lio_work_user # yes -> do resched & signal -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION # check for preemptive scheduling icm %r0,15,__LC_PREEMPT_COUNT jnz .Lio_restore # preemption is disabled Index: linux-5.4.5-rt3/arch/sh/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/sh/Kconfig +++ linux-5.4.5-rt3/arch/sh/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:111 @ config GENERIC_CALIBRATE_DELAY config GENERIC_LOCKBREAK def_bool y - depends on SMP && PREEMPT + depends on SMP && PREEMPTION config ARCH_SUSPEND_POSSIBLE def_bool n Index: linux-5.4.5-rt3/arch/sh/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/sh/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/sh/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef __ASM_SH_SPINLOCK_TYPES_H #define __ASM_SH_SPINLOCK_TYPES_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - typedef struct { volatile unsigned int lock; } arch_spinlock_t; Index: linux-5.4.5-rt3/arch/sh/kernel/cpu/sh5/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/sh/kernel/cpu/sh5/entry.S +++ linux-5.4.5-rt3/arch/sh/kernel/cpu/sh5/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:89 @ andi r6, ~0xf0, r6; \ putcon r6, SR; -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION # define preempt_stop() CLI() #else # define preempt_stop() @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:887 @ ret_from_exception: /* Check softirqs */ -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION pta ret_from_syscall, tr0 blink tr0, ZERO Index: linux-5.4.5-rt3/arch/sh/kernel/entry-common.S =================================================================== --- linux-5.4.5-rt3.orig/arch/sh/kernel/entry-common.S +++ linux-5.4.5-rt3/arch/sh/kernel/entry-common.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:44 @ */ #include <asm/dwarf.h> -#if defined(CONFIG_PREEMPT) +#if defined(CONFIG_PREEMPTION) # define preempt_stop() cli ; TRACE_IRQS_OFF #else # 
define preempt_stop() @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:87 @ ENTRY(ret_from_irq) get_current_thread_info r8, r0 bt resume_kernel ! Yes, it's from kernel, go back soon -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION bra resume_userspace nop ENTRY(resume_kernel) Index: linux-5.4.5-rt3/arch/sh/kernel/irq.c =================================================================== --- linux-5.4.5-rt3.orig/arch/sh/kernel/irq.c +++ linux-5.4.5-rt3/arch/sh/kernel/irq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:151 @ void irq_ctx_exit(int cpu) hardirq_ctx[cpu] = NULL; } +#ifndef CONFIG_PREEMPT_RT void do_softirq_own_stack(void) { struct thread_info *curctx; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:179 @ void do_softirq_own_stack(void) "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr" ); } +#endif #else static inline void handle_one_irq(unsigned int irq) { Index: linux-5.4.5-rt3/arch/sparc/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/sparc/Kconfig +++ linux-5.4.5-rt3/arch/sparc/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:280 @ config US3_MC config GENERIC_LOCKBREAK bool default y - depends on SPARC64 && SMP && PREEMPT + depends on SPARC64 && SMP && PREEMPTION config NUMA bool "NUMA support" Index: linux-5.4.5-rt3/arch/sparc/kernel/irq_64.c =================================================================== --- linux-5.4.5-rt3.orig/arch/sparc/kernel/irq_64.c +++ linux-5.4.5-rt3/arch/sparc/kernel/irq_64.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:857 @ void __irq_entry handler_irq(int pil, st set_irq_regs(old_regs); } +#ifndef CONFIG_PREEMPT_RT void do_softirq_own_stack(void) { void *orig_sp, *sp = softirq_stack[smp_processor_id()]; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:872 @ void do_softirq_own_stack(void) __asm__ __volatile__("mov %0, %%sp" : : "r" (orig_sp)); } +#endif #ifdef CONFIG_HOTPLUG_CPU void fixup_irqs(void) Index: linux-5.4.5-rt3/arch/sparc/kernel/rtrap_64.S =================================================================== --- linux-5.4.5-rt3.orig/arch/sparc/kernel/rtrap_64.S +++ linux-5.4.5-rt3/arch/sparc/kernel/rtrap_64.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:313 @ kern_rtt_restore: retry to_kernel: -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION ldsw [%g6 + TI_PRE_COUNT], %l5 brnz %l5, kern_fpucheck ldx [%g6 + TI_FLAGS], %l5 Index: linux-5.4.5-rt3/arch/x86/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/Kconfig +++ linux-5.4.5-rt3/arch/x86/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:93 @ config X86 select ARCH_SUPPORTS_ACPI select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 + select ARCH_SUPPORTS_RT select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:136 @ config X86 select HAVE_ALIGNED_STRUCT_PAGE if SLUB select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE - select HAVE_ARCH_JUMP_LABEL - select HAVE_ARCH_JUMP_LABEL_RELATIVE + 
select HAVE_ARCH_JUMP_LABEL if !PREEMPT_RT + select HAVE_ARCH_JUMP_LABEL_RELATIVE if !PREEMPT_RT select HAVE_ARCH_KASAN if X86_64 select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS if MMU @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:203 @ config X86 select HAVE_PCI select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_PREEMPT_LAZY select HAVE_RCU_TABLE_FREE if PARAVIRT select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION Index: linux-5.4.5-rt3/arch/x86/crypto/aesni-intel_glue.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/crypto/aesni-intel_glue.c +++ linux-5.4.5-rt3/arch/x86/crypto/aesni-intel_glue.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:390 @ static int ecb_encrypt(struct skcipher_r err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); while ((nbytes = walk.nbytes)) { + kernel_fpu_begin(); aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK); + kernel_fpu_end(); nbytes &= AES_BLOCK_SIZE - 1; err = skcipher_walk_done(&walk, nbytes); } - kernel_fpu_end(); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:412 @ static int ecb_decrypt(struct skcipher_r err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); while ((nbytes = walk.nbytes)) { + kernel_fpu_begin(); aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK); + kernel_fpu_end(); nbytes &= AES_BLOCK_SIZE - 1; err = skcipher_walk_done(&walk, nbytes); } - kernel_fpu_end(); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:434 @ static int cbc_encrypt(struct skcipher_r err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); while ((nbytes = walk.nbytes)) { + kernel_fpu_begin(); aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK, walk.iv); + kernel_fpu_end(); nbytes &= AES_BLOCK_SIZE - 1; err = skcipher_walk_done(&walk, nbytes); } - kernel_fpu_end(); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:456 @ static int cbc_decrypt(struct skcipher_r err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); while ((nbytes = walk.nbytes)) { + kernel_fpu_begin(); aesni_cbc_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK, walk.iv); + kernel_fpu_end(); nbytes &= AES_BLOCK_SIZE - 1; err = skcipher_walk_done(&walk, nbytes); } - kernel_fpu_end(); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:513 @ static int ctr_crypt(struct skcipher_req err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) { + kernel_fpu_begin(); aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes & AES_BLOCK_MASK, walk.iv); + kernel_fpu_end(); nbytes &= AES_BLOCK_SIZE - 1; err = skcipher_walk_done(&walk, nbytes); } if (walk.nbytes) { + kernel_fpu_begin(); ctr_crypt_final(ctx, &walk); + kernel_fpu_end(); err = skcipher_walk_done(&walk, 0); } - kernel_fpu_end(); return err; } Index: linux-5.4.5-rt3/arch/x86/crypto/cast5_avx_glue.c =================================================================== --- 
linux-5.4.5-rt3.orig/arch/x86/crypto/cast5_avx_glue.c +++ linux-5.4.5-rt3/arch/x86/crypto/cast5_avx_glue.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:49 @ static inline void cast5_fpu_end(bool fp static int ecb_crypt(struct skcipher_request *req, bool enc) { - bool fpu_enabled = false; + bool fpu_enabled; struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ static int ecb_crypt(struct skcipher_req u8 *wsrc = walk.src.virt.addr; u8 *wdst = walk.dst.virt.addr; - fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes); + fpu_enabled = cast5_fpu_begin(false, &walk, nbytes); /* Process multi-block batch */ if (nbytes >= bsize * CAST5_PARALLEL_BLOCKS) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:93 @ static int ecb_crypt(struct skcipher_req } while (nbytes >= bsize); done: + cast5_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - - cast5_fpu_end(fpu_enabled); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:199 @ static int cbc_decrypt(struct skcipher_r { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm); - bool fpu_enabled = false; + bool fpu_enabled; struct skcipher_walk walk; unsigned int nbytes; int err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:207 @ static int cbc_decrypt(struct skcipher_r err = skcipher_walk_virt(&walk, req, false); while ((nbytes = walk.nbytes)) { - fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes); + fpu_enabled = cast5_fpu_begin(false, &walk, nbytes); nbytes = __cbc_decrypt(ctx, &walk); + cast5_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - - cast5_fpu_end(fpu_enabled); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:278 @ static int ctr_crypt(struct skcipher_req { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct cast5_ctx *ctx = crypto_skcipher_ctx(tfm); - bool fpu_enabled = false; + bool fpu_enabled; struct skcipher_walk walk; unsigned int nbytes; int err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:286 @ static int ctr_crypt(struct skcipher_req err = skcipher_walk_virt(&walk, req, false); while ((nbytes = walk.nbytes) >= CAST5_BLOCK_SIZE) { - fpu_enabled = cast5_fpu_begin(fpu_enabled, &walk, nbytes); + fpu_enabled = cast5_fpu_begin(false, &walk, nbytes); nbytes = __ctr_crypt(&walk, ctx); + cast5_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - cast5_fpu_end(fpu_enabled); - if (walk.nbytes) { ctr_crypt_final(&walk, ctx); err = skcipher_walk_done(&walk, 0); Index: linux-5.4.5-rt3/arch/x86/crypto/chacha_glue.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/crypto/chacha_glue.c +++ linux-5.4.5-rt3/arch/x86/crypto/chacha_glue.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:130 @ static int chacha_simd_stream_xor(struct const struct chacha_ctx *ctx, const u8 *iv) { u32 *state, state_buf[16 + 2] __aligned(8); - int next_yield = 4096; /* bytes until next FPU yield */ int err = 0; BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16); @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:142 @ static int chacha_simd_stream_xor(struct if (nbytes < walk->total) { nbytes = round_down(nbytes, walk->stride); - next_yield -= nbytes; } chacha_dosimd(state, walk->dst.virt.addr, walk->src.virt.addr, nbytes, ctx->nrounds); - if (next_yield <= 0) { - /* temporarily allow preemption */ - kernel_fpu_end(); - kernel_fpu_begin(); - next_yield = 4096; - } - + kernel_fpu_end(); err = skcipher_walk_done(walk, walk->nbytes - nbytes); + kernel_fpu_begin(); } return err; Index: linux-5.4.5-rt3/arch/x86/crypto/glue_helper.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/crypto/glue_helper.c +++ linux-5.4.5-rt3/arch/x86/crypto/glue_helper.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:27 @ int glue_ecb_req_128bit(const struct com void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req)); const unsigned int bsize = 128 / 8; struct skcipher_walk walk; - bool fpu_enabled = false; + bool fpu_enabled; unsigned int nbytes; int err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:40 @ int glue_ecb_req_128bit(const struct com unsigned int i; fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit, - &walk, fpu_enabled, nbytes); + &walk, false, nbytes); for (i = 0; i < gctx->num_funcs; i++) { func_bytes = bsize * gctx->funcs[i].num_blocks; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:58 @ int glue_ecb_req_128bit(const struct com if (nbytes < bsize) break; } + glue_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - - glue_fpu_end(fpu_enabled); return err; } EXPORT_SYMBOL_GPL(glue_ecb_req_128bit); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:103 @ int glue_cbc_decrypt_req_128bit(const st void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req)); const unsigned int bsize = 128 / 8; struct skcipher_walk walk; - bool fpu_enabled = false; + bool fpu_enabled; unsigned int nbytes; int err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:117 @ int glue_cbc_decrypt_req_128bit(const st u128 last_iv; fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit, - &walk, fpu_enabled, nbytes); + &walk, false, nbytes); /* Start of the last block. 
*/ src += nbytes / bsize - 1; dst += nbytes / bsize - 1; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:149 @ int glue_cbc_decrypt_req_128bit(const st done: u128_xor(dst, dst, (u128 *)walk.iv); *(u128 *)walk.iv = last_iv; + glue_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - glue_fpu_end(fpu_enabled); return err; } EXPORT_SYMBOL_GPL(glue_cbc_decrypt_req_128bit); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:163 @ int glue_ctr_req_128bit(const struct com void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req)); const unsigned int bsize = 128 / 8; struct skcipher_walk walk; - bool fpu_enabled = false; + bool fpu_enabled; unsigned int nbytes; int err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:177 @ int glue_ctr_req_128bit(const struct com le128 ctrblk; fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit, - &walk, fpu_enabled, nbytes); + &walk, false, nbytes); be128_to_le128(&ctrblk, (be128 *)walk.iv); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:201 @ int glue_ctr_req_128bit(const struct com } le128_to_be128((be128 *)walk.iv, &ctrblk); + glue_fpu_end(fpu_enabled); err = skcipher_walk_done(&walk, nbytes); } - glue_fpu_end(fpu_enabled); - if (nbytes) { le128 ctrblk; u128 tmp; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:302 @ int glue_xts_req_128bit(const struct com tweak_fn(tweak_ctx, walk.iv, walk.iv); while (nbytes) { + fpu_enabled = glue_fpu_begin(bsize, gctx->fpu_blocks_limit, + &walk, fpu_enabled, + nbytes < bsize ? bsize : nbytes); nbytes = __glue_xts_req_128bit(gctx, crypt_ctx, &walk); + glue_fpu_end(fpu_enabled); + fpu_enabled = false; + err = skcipher_walk_done(&walk, nbytes); nbytes = walk.nbytes; } Index: linux-5.4.5-rt3/arch/x86/entry/common.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/entry/common.c +++ linux-5.4.5-rt3/arch/x86/entry/common.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:133 @ static long syscall_trace_enter(struct p #define EXIT_TO_USERMODE_LOOP_FLAGS \ (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ - _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) + _TIF_NEED_RESCHED_MASK | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:148 @ static void exit_to_usermode_loop(struct /* We have work to do. 
*/ local_irq_enable(); - if (cached_flags & _TIF_NEED_RESCHED) + if (cached_flags & _TIF_NEED_RESCHED_MASK) schedule(); +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND + if (unlikely(current->forced_info.si_signo)) { + struct task_struct *t = current; + force_sig_info(&t->forced_info); + t->forced_info.si_signo = 0; + } +#endif if (cached_flags & _TIF_UPROBE) uprobe_notify_resume(regs); Index: linux-5.4.5-rt3/arch/x86/entry/entry_32.S =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/entry/entry_32.S +++ linux-5.4.5-rt3/arch/x86/entry/entry_32.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1109 @ restore_all: restore_all_kernel: #ifdef CONFIG_PREEMPTION DISABLE_INTERRUPTS(CLBR_ANY) + # preempt count == 0 + NEED_RS set? cmpl $0, PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY jnz .Lno_preempt +#else + jz test_int_off + + # atleast preempt count == 0 ? + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) + jne .Lno_preempt + + movl PER_CPU_VAR(current_task), %ebp + cmpl $0,TASK_TI_preempt_lazy_count(%ebp) # non-zero preempt_lazy_count ? + jnz .Lno_preempt + + testl $_TIF_NEED_RESCHED_LAZY, TASK_TI_flags(%ebp) + jz .Lno_preempt + +test_int_off: +#endif testl $X86_EFLAGS_IF, PT_EFLAGS(%esp) # interrupts off (exception path) ? jz .Lno_preempt call preempt_schedule_irq Index: linux-5.4.5-rt3/arch/x86/entry/entry_64.S =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/entry/entry_64.S +++ linux-5.4.5-rt3/arch/x86/entry/entry_64.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:673 @ retint_kernel: btl $9, EFLAGS(%rsp) /* were interrupts off? */ jnc 1f cmpl $0, PER_CPU_VAR(__preempt_count) +#ifndef CONFIG_PREEMPT_LAZY jnz 1f +#else + jz do_preempt_schedule_irq + + # atleast preempt count == 0 ? + cmpl $_PREEMPT_ENABLED,PER_CPU_VAR(__preempt_count) + jnz 1f + + movq PER_CPU_VAR(current_task), %rcx + cmpl $0, TASK_TI_preempt_lazy_count(%rcx) + jnz 1f + + btl $TIF_NEED_RESCHED_LAZY,TASK_TI_flags(%rcx) + jnc 1f +do_preempt_schedule_irq: +#endif call preempt_schedule_irq 1: #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1093 @ EXPORT_SYMBOL(native_load_gs_index) jmp 2b .previous +#ifndef CONFIG_PREEMPT_RT /* Call softirq on interrupt stack. Interrupts are off. */ ENTRY(do_softirq_own_stack) pushq %rbp @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1104 @ ENTRY(do_softirq_own_stack) leaveq ret ENDPROC(do_softirq_own_stack) +#endif #ifdef CONFIG_XEN_PV idtentry hypervisor_callback xen_do_hypervisor_callback has_error_code=0 Index: linux-5.4.5-rt3/arch/x86/include/asm/fpu/api.h =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/include/asm/fpu/api.h +++ linux-5.4.5-rt3/arch/x86/include/asm/fpu/api.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:26 @ extern void kernel_fpu_begin(void); extern void kernel_fpu_end(void); extern bool irq_fpu_usable(void); extern void fpregs_mark_activate(void); +extern void kernel_fpu_resched(void); /* * Use fpregs_lock() while editing CPU's FPU registers or fpu->state. 
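Note on the kernel_fpu_resched() declaration added to arch/x86/include/asm/fpu/api.h above: its body is added to arch/x86/kernel/fpu/core.c further down in this patch, and it is meant to be called between kernel_fpu_begin() and kernel_fpu_end() (it warns otherwise). The sketch below is illustrative only and is not part of the patch; it shows how a long-running SIMD loop might use the helper to keep one FPU section open while still offering a preemption point between chunks. The function name example_xor_stream(), the 4096-byte chunk size and the byte-wise XOR standing in for real SIMD work are assumptions made for this example.

#include <linux/types.h>
#include <linux/kernel.h>
#include <asm/fpu/api.h>

/* Illustrative sketch, not part of the patch above. */
static void example_xor_stream(u8 *dst, const u8 *src, size_t len)
{
	kernel_fpu_begin();
	while (len) {
		size_t n = min_t(size_t, len, 4096);
		size_t i;

		/* Placeholder for real SIMD work on one bounded chunk. */
		for (i = 0; i < n; i++)
			dst[i] ^= src[i];
		dst += n;
		src += n;
		len -= n;

		/*
		 * kernel_fpu_resched() only drops and re-acquires the FPU
		 * when should_resched(PREEMPT_OFFSET) reports a pending
		 * reschedule (see the fpu/core.c hunk further down), so the
		 * common case stays a single per-cpu check.
		 */
		kernel_fpu_resched();
	}
	kernel_fpu_end();
}

Bounding FPU sections this way matches the direction of the crypto glue changes earlier in this patch, which now end the FPU section before calling skcipher_walk_done() instead of holding it across the whole request.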
Index: linux-5.4.5-rt3/arch/x86/include/asm/preempt.h =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/include/asm/preempt.h +++ linux-5.4.5-rt3/arch/x86/include/asm/preempt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:92 @ static __always_inline void __preempt_co * a decrement which hits zero means we have no preempt_count and should * reschedule. */ -static __always_inline bool __preempt_count_dec_and_test(void) +static __always_inline bool ____preempt_count_dec_and_test(void) { return GEN_UNARY_RMWcc("decl", __preempt_count, e, __percpu_arg([var])); } +static __always_inline bool __preempt_count_dec_and_test(void) +{ + if (____preempt_count_dec_and_test()) + return true; +#ifdef CONFIG_PREEMPT_LAZY + if (preempt_count()) + return false; + if (current_thread_info()->preempt_lazy_count) + return false; + return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else + return false; +#endif +} + /* * Returns true when we need to resched and can (barring IRQ state). */ static __always_inline bool should_resched(int preempt_offset) { +#ifdef CONFIG_PREEMPT_LAZY + u32 tmp; + tmp = raw_cpu_read_4(__preempt_count); + if (tmp == preempt_offset) + return true; + + /* preempt count == 0 ? */ + tmp &= ~PREEMPT_NEED_RESCHED; + if (tmp != preempt_offset) + return false; + /* XXX PREEMPT_LOCK_OFFSET */ + if (current_thread_info()->preempt_lazy_count) + return false; + return test_thread_flag(TIF_NEED_RESCHED_LAZY); +#else return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset); +#endif } #ifdef CONFIG_PREEMPTION Index: linux-5.4.5-rt3/arch/x86/include/asm/signal.h =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/include/asm/signal.h +++ linux-5.4.5-rt3/arch/x86/include/asm/signal.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:31 @ typedef struct { #define SA_IA32_ABI 0x02000000u #define SA_X32_ABI 0x01000000u +/* + * Because some traps use the IST stack, we must keep preemption + * disabled while calling do_trap(), but do_trap() may call + * force_sig_info() which will grab the signal spin_locks for the + * task, which in PREEMPT_RT are mutexes. By defining + * ARCH_RT_DELAYS_SIGNAL_SEND the force_sig_info() will set + * TIF_NOTIFY_RESUME and set up the signal to be sent on exit of the + * trap. + */ +#if defined(CONFIG_PREEMPT_RT) +#define ARCH_RT_DELAYS_SIGNAL_SEND +#endif + #ifndef CONFIG_COMPAT typedef sigset_t compat_sigset_t; #endif Index: linux-5.4.5-rt3/arch/x86/include/asm/stackprotector.h =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/include/asm/stackprotector.h +++ linux-5.4.5-rt3/arch/x86/include/asm/stackprotector.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:63 @ */ static __always_inline void boot_init_stack_canary(void) { - u64 canary; + u64 uninitialized_var(canary); u64 tsc; #ifdef CONFIG_X86_64 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:74 @ static __always_inline void boot_init_st * of randomness. The TSC only matters for very early init, * there it already has some randomness on most systems. Later * on during the bootup the random pool has true entropy too. + * For preempt-rt we need to weaken the randomness a bit, as + * we can't call into the random generator from atomic context + * due to locking constraints. 
We just leave canary + * uninitialized and use the TSC based randomness on top of it. */ +#ifndef CONFIG_PREEMPT_RT get_random_bytes(&canary, sizeof(canary)); +#endif tsc = rdtsc(); canary += tsc + (tsc << 32UL); canary &= CANARY_MASK; Index: linux-5.4.5-rt3/arch/x86/include/asm/thread_info.h =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/include/asm/thread_info.h +++ linux-5.4.5-rt3/arch/x86/include/asm/thread_info.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:59 @ struct task_struct; struct thread_info { unsigned long flags; /* low level flags */ u32 status; /* thread synchronous flags */ + int preempt_lazy_count; /* 0 => lazy preemptable + <0 => BUG */ }; #define INIT_THREAD_INFO(tsk) \ { \ .flags = 0, \ + .preempt_lazy_count = 0, \ } #else /* !__ASSEMBLY__ */ #include <asm/asm-offsets.h> +#define GET_THREAD_INFO(reg) \ + _ASM_MOV PER_CPU_VAR(cpu_current_top_of_stack),reg ; \ + _ASM_SUB $(THREAD_SIZE),reg ; + #endif /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:102 @ struct thread_info { #define TIF_NOCPUID 15 /* CPUID is not accessible in userland */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* IA32 compatibility process */ +#define TIF_NEED_RESCHED_LAZY 18 /* lazy rescheduling necessary */ #define TIF_NOHZ 19 /* in adaptive nohz mode */ #define TIF_MEMDIE 20 /* is terminating due to OOM killer */ #define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:133 @ struct thread_info { #define _TIF_NOCPUID (1 << TIF_NOCPUID) #define _TIF_NOTSC (1 << TIF_NOTSC) #define _TIF_IA32 (1 << TIF_IA32) +#define _TIF_NEED_RESCHED_LAZY (1 << TIF_NEED_RESCHED_LAZY) #define _TIF_NOHZ (1 << TIF_NOHZ) #define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG) #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:171 @ struct thread_info { #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) +#define _TIF_NEED_RESCHED_MASK (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY) + #define STACK_WARN (THREAD_SIZE/8) /* Index: linux-5.4.5-rt3/arch/x86/kernel/apic/io_apic.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/apic/io_apic.c +++ linux-5.4.5-rt3/arch/x86/kernel/apic/io_apic.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1728 @ static bool io_apic_level_ack_pending(st return false; } -static inline bool ioapic_irqd_mask(struct irq_data *data) +static inline bool ioapic_prepare_move(struct irq_data *data) { /* If we are moving the IRQ we need to mask it */ if (unlikely(irqd_is_setaffinity_pending(data))) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1739 @ static inline bool ioapic_irqd_mask(stru return false; } -static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked) +static inline void ioapic_finish_move(struct irq_data *data, bool moveit) { - if (unlikely(masked)) { + if (unlikely(moveit)) { /* Only migrate the irq if the ack has been received. 
* * On rare occasions the broadcast level triggered ack gets @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1776 @ static inline void ioapic_irqd_unmask(st } } #else -static inline bool ioapic_irqd_mask(struct irq_data *data) +static inline bool ioapic_prepare_move(struct irq_data *data) { return false; } -static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked) +static inline void ioapic_finish_move(struct irq_data *data, bool moveit) { } #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1789 @ static void ioapic_ack_level(struct irq_ { struct irq_cfg *cfg = irqd_cfg(irq_data); unsigned long v; - bool masked; + bool moveit; int i; irq_complete_move(cfg); - masked = ioapic_irqd_mask(irq_data); + moveit = ioapic_prepare_move(irq_data); /* * It appears there is an erratum which affects at least version 0x11 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1848 @ static void ioapic_ack_level(struct irq_ eoi_ioapic_pin(cfg->vector, irq_data->chip_data); } - ioapic_irqd_unmask(irq_data, masked); + ioapic_finish_move(irq_data, moveit); } static void ioapic_ir_ack_level(struct irq_data *irq_data) Index: linux-5.4.5-rt3/arch/x86/kernel/asm-offsets.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/asm-offsets.c +++ linux-5.4.5-rt3/arch/x86/kernel/asm-offsets.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:41 @ static void __used common(void) #endif BLANK(); +#ifdef CONFIG_PREEMPT_LAZY + OFFSET(TASK_TI_flags, task_struct, thread_info.flags); + OFFSET(TASK_TI_preempt_lazy_count, task_struct, thread_info.preempt_lazy_count); +#endif OFFSET(TASK_addr_limit, task_struct, thread.addr_limit); BLANK(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:99 @ static void __used common(void) BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); + DEFINE(_PREEMPT_ENABLED, PREEMPT_ENABLED); /* TLB state for the entry code */ OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask); Index: linux-5.4.5-rt3/arch/x86/kernel/cpu/mshyperv.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/cpu/mshyperv.c +++ linux-5.4.5-rt3/arch/x86/kernel/cpu/mshyperv.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:80 @ EXPORT_SYMBOL_GPL(hv_remove_vmbus_irq); __visible void __irq_entry hv_stimer0_vector_handler(struct pt_regs *regs) { struct pt_regs *old_regs = set_irq_regs(regs); + u64 ip = regs ? 
instruction_pointer(regs) : 0; entering_irq(); inc_irq_stat(hyperv_stimer0_count); if (hv_stimer0_handler) hv_stimer0_handler(); - add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0); + add_interrupt_randomness(HYPERV_STIMER0_VECTOR, 0, ip); ack_APIC_irq(); exiting_irq(); Index: linux-5.4.5-rt3/arch/x86/kernel/fpu/core.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/fpu/core.c +++ linux-5.4.5-rt3/arch/x86/kernel/fpu/core.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:116 @ void kernel_fpu_end(void) } EXPORT_SYMBOL_GPL(kernel_fpu_end); +void kernel_fpu_resched(void) +{ + WARN_ON_FPU(!this_cpu_read(in_kernel_fpu)); + + if (should_resched(PREEMPT_OFFSET)) { + kernel_fpu_end(); + cond_resched(); + kernel_fpu_begin(); + } +} +EXPORT_SYMBOL_GPL(kernel_fpu_resched); + /* * Save the FPU state (mark it for reload if necessary): * Index: linux-5.4.5-rt3/arch/x86/kernel/irq_32.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/irq_32.c +++ linux-5.4.5-rt3/arch/x86/kernel/irq_32.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:134 @ int irq_init_percpu_irqstack(unsigned in return 0; } +#ifndef CONFIG_PREEMPT_RT void do_softirq_own_stack(void) { struct irq_stack *irqstk; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:151 @ void do_softirq_own_stack(void) call_on_stack(__do_softirq, isp); } +#endif void handle_irq(struct irq_desc *desc, struct pt_regs *regs) { Index: linux-5.4.5-rt3/arch/x86/kernel/process_32.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kernel/process_32.c +++ linux-5.4.5-rt3/arch/x86/kernel/process_32.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:41 @ #include <linux/io.h> #include <linux/kdebug.h> #include <linux/syscalls.h> +#include <linux/highmem.h> #include <asm/pgtable.h> #include <asm/ldt.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:200 @ start_thread(struct pt_regs *regs, unsig } EXPORT_SYMBOL_GPL(start_thread); +#ifdef CONFIG_PREEMPT_RT +static void switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) +{ + int i; + + /* + * Clear @prev's kmap_atomic mappings + */ + for (i = 0; i < prev_p->kmap_idx; i++) { + int idx = i + KM_TYPE_NR * smp_processor_id(); + pte_t *ptep = kmap_pte - idx; + + kpte_clear_flush(ptep, __fix_to_virt(FIX_KMAP_BEGIN + idx)); + } + /* + * Restore @next_p's kmap_atomic mappings + */ + for (i = 0; i < next_p->kmap_idx; i++) { + int idx = i + KM_TYPE_NR * smp_processor_id(); + + if (!pte_none(next_p->kmap_pte[i])) + set_pte(kmap_pte - idx, next_p->kmap_pte[i]); + } +} +#else +static inline void +switch_kmaps(struct task_struct *prev_p, struct task_struct *next_p) { } +#endif + /* * switch_to(x,y) should switch tasks from x to y. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:299 @ __switch_to(struct task_struct *prev_p, switch_to_extra(prev_p, next_p); + switch_kmaps(prev_p, next_p); + /* * Leave lazy mode, flushing any hypercalls made here. 
* This must be done before restoring TLS segments so Index: linux-5.4.5-rt3/arch/x86/kvm/x86.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/kvm/x86.c +++ linux-5.4.5-rt3/arch/x86/kvm/x86.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:7210 @ int kvm_arch_init(void *opaque) goto out; } +#ifdef CONFIG_PREEMPT_RT + if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) { + pr_err("RT requires X86_FEATURE_CONSTANT_TSC\n"); + r = -EOPNOTSUPP; + goto out; + } +#endif + r = -ENOMEM; x86_fpu_cache = kmem_cache_create("x86_fpu", sizeof(struct fpu), __alignof__(struct fpu), SLAB_ACCOUNT, Index: linux-5.4.5-rt3/arch/x86/mm/highmem_32.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/mm/highmem_32.c +++ linux-5.4.5-rt3/arch/x86/mm/highmem_32.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:36 @ EXPORT_SYMBOL(kunmap); */ void *kmap_atomic_prot(struct page *page, pgprot_t prot) { + pte_t pte = mk_pte(page, prot); unsigned long vaddr; int idx, type; - preempt_disable(); + preempt_disable_nort(); pagefault_disable(); if (!PageHighMem(page)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:50 @ void *kmap_atomic_prot(struct page *page idx = type + KM_TYPE_NR*smp_processor_id(); vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); BUG_ON(!pte_none(*(kmap_pte-idx))); - set_pte(kmap_pte-idx, mk_pte(page, prot)); +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = pte; +#endif + set_pte(kmap_pte-idx, pte); arch_flush_lazy_mmu_mode(); return (void *)vaddr; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:96 @ void __kunmap_atomic(void *kvaddr) * is a bad idea also, in case the page changes cacheability * attributes or becomes a protected page in a hypervisor. 
*/ +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = __pte(0); +#endif kpte_clear_flush(kmap_pte-idx, vaddr); kmap_atomic_idx_pop(); arch_flush_lazy_mmu_mode(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:111 @ void __kunmap_atomic(void *kvaddr) #endif pagefault_enable(); - preempt_enable(); + preempt_enable_nort(); } EXPORT_SYMBOL(__kunmap_atomic); Index: linux-5.4.5-rt3/arch/x86/mm/iomap_32.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/mm/iomap_32.c +++ linux-5.4.5-rt3/arch/x86/mm/iomap_32.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:49 @ EXPORT_SYMBOL_GPL(iomap_free); void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot) { + pte_t pte = pfn_pte(pfn, prot); unsigned long vaddr; int idx, type; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:59 @ void *kmap_atomic_prot_pfn(unsigned long type = kmap_atomic_idx_push(); idx = type + KM_TYPE_NR * smp_processor_id(); vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); - set_pte(kmap_pte - idx, pfn_pte(pfn, prot)); + WARN_ON(!pte_none(*(kmap_pte - idx))); + +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = pte; +#endif + set_pte(kmap_pte - idx, pte); arch_flush_lazy_mmu_mode(); return (void *)vaddr; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:115 @ iounmap_atomic(void __iomem *kvaddr) * is a bad idea also, in case the page changes cacheability * attributes or becomes a protected page in a hypervisor. */ +#ifdef CONFIG_PREEMPT_RT + current->kmap_pte[type] = __pte(0); +#endif kpte_clear_flush(kmap_pte-idx, vaddr); kmap_atomic_idx_pop(); } Index: linux-5.4.5-rt3/arch/x86/mm/tlb.c =================================================================== --- linux-5.4.5-rt3.orig/arch/x86/mm/tlb.c +++ linux-5.4.5-rt3/arch/x86/mm/tlb.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:711 @ void native_flush_tlb_others(const struc (void *)info, 1); else on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func_remote, - (void *)info, 1, GFP_ATOMIC, cpumask); + (void *)info, 1, cpumask); } /* Index: linux-5.4.5-rt3/arch/xtensa/include/asm/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/arch/xtensa/include/asm/spinlock_types.h +++ linux-5.4.5-rt3/arch/xtensa/include/asm/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #ifndef __ASM_SPINLOCK_TYPES_H #define __ASM_SPINLOCK_TYPES_H -#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H) -# error "please don't include this file directly" -#endif - #include <asm-generic/qspinlock_types.h> #include <asm-generic/qrwlock_types.h> Index: linux-5.4.5-rt3/arch/xtensa/kernel/entry.S =================================================================== --- linux-5.4.5-rt3.orig/arch/xtensa/kernel/entry.S +++ linux-5.4.5-rt3/arch/xtensa/kernel/entry.S @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:523 @ common_exception_return: call4 schedule # void schedule (void) j 1b -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION 6: _bbci.l a4, TIF_NEED_RESCHED, 4f Index: linux-5.4.5-rt3/arch/xtensa/kernel/traps.c =================================================================== --- linux-5.4.5-rt3.orig/arch/xtensa/kernel/traps.c +++ 
linux-5.4.5-rt3/arch/xtensa/kernel/traps.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:527 @ DEFINE_SPINLOCK(die_lock); void die(const char * str, struct pt_regs * regs, long err) { static int die_counter; + const char *pr = ""; + + if (IS_ENABLED(CONFIG_PREEMPTION)) + pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT"; console_verbose(); spin_lock_irq(&die_lock); - pr_info("%s: sig: %ld [#%d]%s\n", str, err, ++die_counter, - IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : ""); + pr_info("%s: sig: %ld [#%d]%s\n", str, err, ++die_counter, pr); show_regs(regs); if (!user_mode(regs)) show_stack(NULL, (unsigned long*)regs->areg[1]); Index: linux-5.4.5-rt3/block/blk-ioc.c =================================================================== --- linux-5.4.5-rt3.orig/block/blk-ioc.c +++ linux-5.4.5-rt3/block/blk-ioc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:12 @ #include <linux/blkdev.h> #include <linux/slab.h> #include <linux/sched/task.h> +#include <linux/delay.h> #include "blk.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:119 @ static void ioc_release_fn(struct work_s spin_unlock(&q->queue_lock); } else { spin_unlock_irqrestore(&ioc->lock, flags); - cpu_relax(); + cpu_chill(); spin_lock_irqsave_nested(&ioc->lock, flags, 1); } } Index: linux-5.4.5-rt3/block/blk-mq.c =================================================================== --- linux-5.4.5-rt3.orig/block/blk-mq.c +++ linux-5.4.5-rt3/block/blk-mq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:614 @ static void __blk_mq_complete_request(st return; } - cpu = get_cpu(); + cpu = get_cpu_light(); + /* + * Avoid SMP function calls for completions because they acquire + * sleeping spinlocks on RT. 
+ */ +#ifdef CONFIG_PREEMPT_RT + shared = true; +#else if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags)) shared = cpus_share_cache(cpu, ctx->cpu); +#endif if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) { rq->csd.func = __blk_mq_complete_request_remote; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:634 @ static void __blk_mq_complete_request(st } else { q->mq_ops->complete(rq); } - put_cpu(); + put_cpu_light(); } static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1477 @ static void __blk_mq_delay_run_hw_queue( return; if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) { - int cpu = get_cpu(); + int cpu = get_cpu_light(); if (cpumask_test_cpu(cpu, hctx->cpumask)) { __blk_mq_run_hw_queue(hctx); - put_cpu(); + put_cpu_light(); return; } - put_cpu(); + put_cpu_light(); } kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, Index: linux-5.4.5-rt3/block/blk-softirq.c =================================================================== --- linux-5.4.5-rt3.orig/block/blk-softirq.c +++ linux-5.4.5-rt3/block/blk-softirq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:45 @ static __latent_entropy void blk_done_so static void trigger_softirq(void *data) { struct request *rq = data; - unsigned long flags; struct list_head *list; - local_irq_save(flags); list = this_cpu_ptr(&blk_cpu_done); list_add_tail(&rq->ipi_list, list); if (list->next == &rq->ipi_list) raise_softirq_irqoff(BLOCK_SOFTIRQ); - - local_irq_restore(flags); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:90 @ static int blk_softirq_cpu_dead(unsigned this_cpu_ptr(&blk_cpu_done)); raise_softirq_irqoff(BLOCK_SOFTIRQ); local_irq_enable(); + preempt_check_resched_rt(); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:142 @ do_local: goto do_local; local_irq_restore(flags); + preempt_check_resched_rt(); } static __init int blk_softirq_init(void) Index: linux-5.4.5-rt3/crypto/cryptd.c =================================================================== --- linux-5.4.5-rt3.orig/crypto/cryptd.c +++ linux-5.4.5-rt3/crypto/cryptd.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:39 @ static struct workqueue_struct *cryptd_w struct cryptd_cpu_queue { struct crypto_queue queue; struct work_struct work; + spinlock_t qlock; }; struct cryptd_queue { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:109 @ static int cryptd_init_queue(struct cryp cpu_queue = per_cpu_ptr(queue->cpu_queue, cpu); crypto_init_queue(&cpu_queue->queue, max_cpu_qlen); INIT_WORK(&cpu_queue->work, cryptd_queue_worker); + spin_lock_init(&cpu_queue->qlock); } pr_info("cryptd: max_cpu_qlen set to %d\n", max_cpu_qlen); return 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:134 @ static int cryptd_enqueue_request(struct struct cryptd_cpu_queue *cpu_queue; refcount_t *refcnt; - cpu = get_cpu(); - cpu_queue = this_cpu_ptr(queue->cpu_queue); + cpu_queue = raw_cpu_ptr(queue->cpu_queue); + spin_lock_bh(&cpu_queue->qlock); + cpu = smp_processor_id(); + err = crypto_enqueue_request(&cpu_queue->queue, request); refcnt = crypto_tfm_ctx(request->tfm); @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:153 @ static int cryptd_enqueue_request(struct refcount_inc(refcnt); out_put_cpu: - put_cpu(); + spin_unlock_bh(&cpu_queue->qlock); return err; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:169 @ static void cryptd_queue_worker(struct w cpu_queue = container_of(work, struct cryptd_cpu_queue, work); /* * Only handle one request at a time to avoid hogging crypto workqueue. - * preempt_disable/enable is used to prevent being preempted by - * cryptd_enqueue_request(). local_bh_disable/enable is used to prevent - * cryptd_enqueue_request() being accessed from software interrupts. */ - local_bh_disable(); - preempt_disable(); + spin_lock_bh(&cpu_queue->qlock); backlog = crypto_get_backlog(&cpu_queue->queue); req = crypto_dequeue_request(&cpu_queue->queue); - preempt_enable(); - local_bh_enable(); + spin_unlock_bh(&cpu_queue->qlock); if (!req) return; Index: linux-5.4.5-rt3/drivers/block/zram/zcomp.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/block/zram/zcomp.c +++ linux-5.4.5-rt3/drivers/block/zram/zcomp.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:116 @ ssize_t zcomp_available_show(const char struct zcomp_strm *zcomp_stream_get(struct zcomp *comp) { - return *get_cpu_ptr(comp->stream); + struct zcomp_strm *zstrm; + + zstrm = *get_local_ptr(comp->stream); + spin_lock(&zstrm->zcomp_lock); + return zstrm; } void zcomp_stream_put(struct zcomp *comp) { - put_cpu_ptr(comp->stream); + struct zcomp_strm *zstrm; + + zstrm = *this_cpu_ptr(comp->stream); + spin_unlock(&zstrm->zcomp_lock); + put_local_ptr(zstrm); } int zcomp_compress(struct zcomp_strm *zstrm, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:179 @ int zcomp_cpu_up_prepare(unsigned int cp pr_err("Can't allocate a compression stream\n"); return -ENOMEM; } + spin_lock_init(&zstrm->zcomp_lock); *per_cpu_ptr(comp->stream, cpu) = zstrm; return 0; } Index: linux-5.4.5-rt3/drivers/block/zram/zcomp.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/block/zram/zcomp.h +++ linux-5.4.5-rt3/drivers/block/zram/zcomp.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:13 @ struct zcomp_strm { /* compression/decompression buffer */ void *buffer; struct crypto_comp *tfm; + spinlock_t zcomp_lock; }; /* dynamic per-device compression frontend */ Index: linux-5.4.5-rt3/drivers/block/zram/zram_drv.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/block/zram/zram_drv.c +++ linux-5.4.5-rt3/drivers/block/zram/zram_drv.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:58 @ static void zram_free_page(struct zram * static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec, u32 index, int offset, struct bio *bio); +#ifdef CONFIG_PREEMPT_RT +static void zram_meta_init_table_locks(struct zram *zram, size_t num_pages) +{ + size_t index; + + for (index = 0; index < num_pages; index++) + spin_lock_init(&zram->table[index].lock); +} + +static int zram_slot_trylock(struct zram *zram, u32 index) +{ + int ret; + + ret = spin_trylock(&zram->table[index].lock); + if (ret) + __set_bit(ZRAM_LOCK, &zram->table[index].flags); + return ret; +} + +static void zram_slot_lock(struct 
zram *zram, u32 index) +{ + spin_lock(&zram->table[index].lock); + __set_bit(ZRAM_LOCK, &zram->table[index].flags); +} + +static void zram_slot_unlock(struct zram *zram, u32 index) +{ + __clear_bit(ZRAM_LOCK, &zram->table[index].flags); + spin_unlock(&zram->table[index].lock); +} + +#else + +static void zram_meta_init_table_locks(struct zram *zram, size_t num_pages) { } static int zram_slot_trylock(struct zram *zram, u32 index) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:107 @ static void zram_slot_unlock(struct zram { bit_spin_unlock(ZRAM_LOCK, &zram->table[index].flags); } +#endif static inline bool init_done(struct zram *zram) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1193 @ static bool zram_meta_alloc(struct zram if (!huge_class_size) huge_class_size = zs_huge_class_size(zram->mem_pool); + zram_meta_init_table_locks(zram, num_pages); return true; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1256 @ static int __zram_bvec_read(struct zram unsigned long handle; unsigned int size; void *src, *dst; + struct zcomp_strm *zstrm; zram_slot_lock(zram, index); if (zram_test_flag(zram, index, ZRAM_WB)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1287 @ static int __zram_bvec_read(struct zram size = zram_get_obj_size(zram, index); + zstrm = zcomp_stream_get(zram->comp); src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO); if (size == PAGE_SIZE) { dst = kmap_atomic(page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1295 @ static int __zram_bvec_read(struct zram kunmap_atomic(dst); ret = 0; } else { - struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp); dst = kmap_atomic(page); ret = zcomp_decompress(zstrm, src, size, dst); kunmap_atomic(dst); - zcomp_stream_put(zram->comp); } zs_unmap_object(zram->mem_pool, handle); + zcomp_stream_put(zram->comp); zram_slot_unlock(zram, index); /* Should NEVER happen. Return bio error if it does. */ Index: linux-5.4.5-rt3/drivers/block/zram/zram_drv.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/block/zram/zram_drv.h +++ linux-5.4.5-rt3/drivers/block/zram/zram_drv.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:66 @ struct zram_table_entry { unsigned long element; }; unsigned long flags; + spinlock_t lock; #ifdef CONFIG_ZRAM_MEMORY_TRACKING ktime_t ac_time; #endif Index: linux-5.4.5-rt3/drivers/char/random.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/char/random.c +++ linux-5.4.5-rt3/drivers/char/random.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1308 @ static __u32 get_reg(struct fast_pool *f return *ptr; } -void add_interrupt_randomness(int irq, int irq_flags) +void add_interrupt_randomness(int irq, int irq_flags, __u64 ip) { struct entropy_store *r; struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness); - struct pt_regs *regs = get_irq_regs(); unsigned long now = jiffies; cycles_t cycles = random_get_entropy(); __u32 c_high, j_high; - __u64 ip; unsigned long seed; int credit = 0; if (cycles == 0) - cycles = get_reg(fast_pool, regs); + cycles = get_reg(fast_pool, NULL); c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0; j_high = (sizeof(now) > 4) ? 
now >> 32 : 0; fast_pool->pool[0] ^= cycles ^ j_high ^ irq; fast_pool->pool[1] ^= now ^ c_high; - ip = regs ? instruction_pointer(regs) : _RET_IP_; + if (!ip) + ip = _RET_IP_; fast_pool->pool[2] ^= ip; fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 : - get_reg(fast_pool, regs); + get_reg(fast_pool, NULL); fast_mix(fast_pool); add_interrupt_bench(cycles); Index: linux-5.4.5-rt3/drivers/char/tpm/tpm-dev-common.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/char/tpm/tpm-dev-common.c +++ linux-5.4.5-rt3/drivers/char/tpm/tpm-dev-common.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:23 @ #include "tpm-dev.h" static struct workqueue_struct *tpm_dev_wq; -static DEFINE_MUTEX(tpm_dev_wq_lock); static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space, u8 *buf, size_t bufsiz) Index: linux-5.4.5-rt3/drivers/char/tpm/tpm_tis.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/char/tpm/tpm_tis.c +++ linux-5.4.5-rt3/drivers/char/tpm/tpm_tis.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:52 @ static inline struct tpm_tis_tcg_phy *to return container_of(data, struct tpm_tis_tcg_phy, priv); } +#ifdef CONFIG_PREEMPT_RT +/* + * Flushes previous write operations to chip so that a subsequent + * ioread*()s won't stall a cpu. + */ +static inline void tpm_tis_flush(void __iomem *iobase) +{ + ioread8(iobase + TPM_ACCESS(0)); +} +#else +#define tpm_tis_flush(iobase) do { } while (0) +#endif + +static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr) +{ + iowrite8(b, iobase + addr); + tpm_tis_flush(iobase); +} + +static inline void tpm_tis_iowrite32(u32 b, void __iomem *iobase, u32 addr) +{ + iowrite32(b, iobase + addr); + tpm_tis_flush(iobase); +} + static bool interrupts = true; module_param(interrupts, bool, 0444); MODULE_PARM_DESC(interrupts, "Enable interrupts"); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:174 @ static int tpm_tcg_write_bytes(struct tp struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data); while (len--) - iowrite8(*value++, phy->iobase + addr); + tpm_tis_iowrite8(*value++, phy->iobase, addr); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:201 @ static int tpm_tcg_write32(struct tpm_ti { struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data); - iowrite32(value, phy->iobase + addr); + tpm_tis_iowrite32(value, phy->iobase, addr); return 0; } Index: linux-5.4.5-rt3/drivers/clocksource/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/clocksource/Kconfig +++ linux-5.4.5-rt3/drivers/clocksource/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:437 @ config ATMEL_TCB_CLKSRC help Support for Timer Counter Blocks on Atmel SoCs. +config ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK + bool "TC Block use 32 KiHz clock" + depends on ATMEL_TCB_CLKSRC + default y + help + Select this to use 32 KiHz base clock rate as TC block clock. 
+ config CLKSRC_EXYNOS_MCT bool "Exynos multi core timer driver" if COMPILE_TEST depends on ARM || ARM64 Index: linux-5.4.5-rt3/drivers/clocksource/timer-atmel-tcb.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/clocksource/timer-atmel-tcb.c +++ linux-5.4.5-rt3/drivers/clocksource/timer-atmel-tcb.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:31 @ * this 32 bit free-running counter. the second channel is not used. * * - The third channel may be used to provide a 16-bit clockevent - * source, used in either periodic or oneshot mode. This runs - * at 32 KiHZ, and can handle delays of up to two seconds. + * source, used in either periodic or oneshot mode. * * REVISIT behavior during system suspend states... we should disable * all clocks and save the power. Easily done for clockevent devices, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:145 @ static unsigned long notrace tc_delay_ti struct tc_clkevt_device { struct clock_event_device clkevt; struct clk *clk; + bool clk_enabled; + u32 freq; void __iomem *regs; }; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:155 @ static struct tc_clkevt_device *to_tc_cl return container_of(clkevt, struct tc_clkevt_device, clkevt); } -/* For now, we always use the 32K clock ... this optimizes for NO_HZ, - * because using one of the divided clocks would usually mean the - * tick rate can never be less than several dozen Hz (vs 0.5 Hz). - * - * A divided clock could be good for high resolution timers, since - * 30.5 usec resolution can seem "low". - */ static u32 timer_clock; +static void tc_clk_disable(struct clock_event_device *d) +{ + struct tc_clkevt_device *tcd = to_tc_clkevt(d); + + clk_disable(tcd->clk); + tcd->clk_enabled = false; +} + +static void tc_clk_enable(struct clock_event_device *d) +{ + struct tc_clkevt_device *tcd = to_tc_clkevt(d); + + if (tcd->clk_enabled) + return; + clk_enable(tcd->clk); + tcd->clk_enabled = true; +} + static int tc_shutdown(struct clock_event_device *d) { struct tc_clkevt_device *tcd = to_tc_clkevt(d); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:182 @ static int tc_shutdown(struct clock_even writel(0xff, regs + ATMEL_TC_REG(2, IDR)); writel(ATMEL_TC_CLKDIS, regs + ATMEL_TC_REG(2, CCR)); + return 0; +} + +static int tc_shutdown_clk_off(struct clock_event_device *d) +{ + tc_shutdown(d); if (!clockevent_state_detached(d)) - clk_disable(tcd->clk); + tc_clk_disable(d); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:202 @ static int tc_set_oneshot(struct clock_e if (clockevent_state_oneshot(d) || clockevent_state_periodic(d)) tc_shutdown(d); - clk_enable(tcd->clk); + tc_clk_enable(d); - /* slow clock, count up to RC, then irq and stop */ + /* count up to RC, then irq and stop */ writel(timer_clock | ATMEL_TC_CPCSTOP | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP_AUTO, regs + ATMEL_TC_REG(2, CMR)); writel(ATMEL_TC_CPCS, regs + ATMEL_TC_REG(2, IER)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:224 @ static int tc_set_periodic(struct clock_ /* By not making the gentime core emulate periodic mode on top * of oneshot, we get lower overhead and improved accuracy. 
*/ - clk_enable(tcd->clk); + tc_clk_enable(d); - /* slow clock, count up to RC, then irq and restart */ + /* count up to RC, then irq and restart */ writel(timer_clock | ATMEL_TC_WAVE | ATMEL_TC_WAVESEL_UP_AUTO, regs + ATMEL_TC_REG(2, CMR)); - writel((32768 + HZ / 2) / HZ, tcaddr + ATMEL_TC_REG(2, RC)); + writel((tcd->freq + HZ / 2) / HZ, tcaddr + ATMEL_TC_REG(2, RC)); /* Enable clock and interrupts on RC compare */ writel(ATMEL_TC_CPCS, regs + ATMEL_TC_REG(2, IER)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:255 @ static struct tc_clkevt_device clkevt = .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT, /* Should be lower than at91rm9200's system timer */ +#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK .rating = 125, +#else + .rating = 200, +#endif .set_next_event = tc_next_event, - .set_state_shutdown = tc_shutdown, + .set_state_shutdown = tc_shutdown_clk_off, .set_state_periodic = tc_set_periodic, .set_state_oneshot = tc_set_oneshot, }, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:281 @ static irqreturn_t ch2_irq(int irq, void return IRQ_NONE; } -static int __init setup_clkevents(struct atmel_tc *tc, int clk32k_divisor_idx) +static const u8 atmel_tcb_divisors[5] = { 2, 8, 32, 128, 0, }; + +static int __init setup_clkevents(struct atmel_tc *tc, int divisor_idx) { + unsigned divisor = atmel_tcb_divisors[divisor_idx]; int ret; struct clk *t2_clk = tc->clk[2]; int irq = tc->irq[2]; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:306 @ static int __init setup_clkevents(struct clkevt.regs = tc->regs; clkevt.clk = t2_clk; - timer_clock = clk32k_divisor_idx; + timer_clock = divisor_idx; + if (!divisor) + clkevt.freq = 32768; + else + clkevt.freq = clk_get_rate(t2_clk) / divisor; clkevt.clkevt.cpumask = cpumask_of(0); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:321 @ static int __init setup_clkevents(struct return ret; } - clockevents_config_and_register(&clkevt.clkevt, 32768, 1, 0xffff); + clockevents_config_and_register(&clkevt.clkevt, clkevt.freq, 1, 0xffff); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:378 @ static void __init tcb_setup_single_chan writel(ATMEL_TC_SYNC, tcaddr + ATMEL_TC_BCR); } -static const u8 atmel_tcb_divisors[5] = { 2, 8, 32, 128, 0, }; - static const struct of_device_id atmel_tcb_of_match[] = { { .compatible = "atmel,at91rm9200-tcb", .data = (void *)16, }, { .compatible = "atmel,at91sam9x5-tcb", .data = (void *)32, }, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:497 @ static int __init tcb_clksrc_init(struct goto err_disable_t1; /* channel 2: periodic and oneshot timer support */ +#ifdef CONFIG_ATMEL_TCB_CLKSRC_USE_SLOW_CLOCK ret = setup_clkevents(&tc, clk32k_divisor_idx); +#else + ret = setup_clkevents(&tc, best_divisor_idx); +#endif if (ret) goto err_unregister_clksrc; Index: linux-5.4.5-rt3/drivers/connector/cn_proc.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/connector/cn_proc.c +++ linux-5.4.5-rt3/drivers/connector/cn_proc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #include <linux/pid_namespace.h> #include <linux/cn_proc.h> +#include <linux/locallock.h> /* * Size of a cn_msg followed by a proc_event structure. 
Since the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:44 @ static struct cb_id cn_proc_event_id = { /* proc_event_counts is used as the sequence number of the netlink message */ static DEFINE_PER_CPU(__u32, proc_event_counts) = { 0 }; +static DEFINE_LOCAL_IRQ_LOCK(send_msg_lock); static inline void send_msg(struct cn_msg *msg) { - preempt_disable(); + local_lock(send_msg_lock); msg->seq = __this_cpu_inc_return(proc_event_counts) - 1; ((struct proc_event *)msg->data)->cpu = smp_processor_id(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ static inline void send_msg(struct cn_ms */ cn_netlink_send(msg, 0, CN_IDX_PROC, GFP_NOWAIT); - preempt_enable(); + local_unlock(send_msg_lock); } void proc_fork_connector(struct task_struct *task) Index: linux-5.4.5-rt3/drivers/dma-buf/dma-buf.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/dma-buf/dma-buf.c +++ linux-5.4.5-rt3/drivers/dma-buf/dma-buf.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:217 @ static __poll_t dma_buf_poll(struct file return 0; retry: - seq = read_seqcount_begin(&resv->seq); + seq = read_seqbegin(&resv->seq); rcu_read_lock(); fobj = rcu_dereference(resv->fence); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:226 @ retry: else shared_count = 0; fence_excl = rcu_dereference(resv->fence_excl); - if (read_seqcount_retry(&resv->seq, seq)) { + if (read_seqretry(&resv->seq, seq)) { rcu_read_unlock(); goto retry; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1192 @ static int dma_buf_debug_show(struct seq robj = buf_obj->resv; while (true) { - seq = read_seqcount_begin(&robj->seq); + seq = read_seqbegin(&robj->seq); rcu_read_lock(); fobj = rcu_dereference(robj->fence); shared_count = fobj ? 
fobj->shared_count : 0; fence = rcu_dereference(robj->fence_excl); - if (!read_seqcount_retry(&robj->seq, seq)) + if (!read_seqretry(&robj->seq, seq)) break; rcu_read_unlock(); } Index: linux-5.4.5-rt3/drivers/dma-buf/dma-resv.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/dma-buf/dma-resv.c +++ linux-5.4.5-rt3/drivers/dma-buf/dma-resv.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:52 @ DEFINE_WD_CLASS(reservation_ww_class); EXPORT_SYMBOL(reservation_ww_class); -struct lock_class_key reservation_seqcount_class; -EXPORT_SYMBOL(reservation_seqcount_class); - -const char reservation_seqcount_string[] = "reservation_seqcount"; -EXPORT_SYMBOL(reservation_seqcount_string); - /** * dma_resv_list_alloc - allocate fence list * @shared_max: number of fences we need space for @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:100 @ void dma_resv_init(struct dma_resv *obj) { ww_mutex_init(&obj->lock, &reservation_ww_class); - __seqcount_init(&obj->seq, reservation_seqcount_string, - &reservation_seqcount_class); + seqlock_init(&obj->seq); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:230 @ void dma_resv_add_shared_fence(struct dm fobj = dma_resv_get_list(obj); count = fobj->shared_count; - preempt_disable(); - write_seqcount_begin(&obj->seq); + write_seqlock(&obj->seq); for (i = 0; i < count; ++i) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:250 @ replace: /* pointer update must be visible before we extend the shared_count */ smp_store_mb(fobj->shared_count, count); - write_seqcount_end(&obj->seq); - preempt_enable(); + write_sequnlock(&obj->seq); dma_fence_put(old); } EXPORT_SYMBOL(dma_resv_add_shared_fence); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:277 @ void dma_resv_add_excl_fence(struct dma_ if (fence) dma_fence_get(fence); - preempt_disable(); - write_seqcount_begin(&obj->seq); - /* write_seqcount_begin provides the necessary memory barrier */ + write_seqlock(&obj->seq); + /* write_seqlock provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence); if (old) old->shared_count = 0; - write_seqcount_end(&obj->seq); - preempt_enable(); + write_sequnlock(&obj->seq); /* inplace update, no shared fences */ while (i--) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:360 @ retry: src_list = dma_resv_get_list(dst); old = dma_resv_get_excl(dst); - preempt_disable(); - write_seqcount_begin(&dst->seq); - /* write_seqcount_begin provides the necessary memory barrier */ + write_seqlock(&dst->seq); + /* write_seqlock provides the necessary memory barrier */ RCU_INIT_POINTER(dst->fence_excl, new); RCU_INIT_POINTER(dst->fence, dst_list); - write_seqcount_end(&dst->seq); - preempt_enable(); + write_sequnlock(&dst->seq); dma_resv_list_free(src_list); dma_fence_put(old); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:404 @ int dma_resv_get_fences_rcu(struct dma_r shared_count = i = 0; rcu_read_lock(); - seq = read_seqcount_begin(&obj->seq); + seq = read_seqbegin(&obj->seq); fence_excl = rcu_dereference(obj->fence_excl); if (fence_excl && !dma_fence_get_rcu(fence_excl)) @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:446 @ int dma_resv_get_fences_rcu(struct dma_r } } - if (i != shared_count || read_seqcount_retry(&obj->seq, seq)) { + if (i != shared_count || read_seqretry(&obj->seq, seq)) { while (i--) dma_fence_put(shared[i]); dma_fence_put(fence_excl); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:497 @ long dma_resv_wait_timeout_rcu(struct dm retry: shared_count = 0; - seq = read_seqcount_begin(&obj->seq); + seq = read_seqbegin(&obj->seq); rcu_read_lock(); i = -1; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:543 @ retry: rcu_read_unlock(); if (fence) { - if (read_seqcount_retry(&obj->seq, seq)) { + if (read_seqretry(&obj->seq, seq)) { dma_fence_put(fence); goto retry; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:597 @ bool dma_resv_test_signaled_rcu(struct d retry: ret = true; shared_count = 0; - seq = read_seqcount_begin(&obj->seq); + seq = read_seqbegin(&obj->seq); if (test_all) { unsigned i; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:617 @ retry: break; } - if (read_seqcount_retry(&obj->seq, seq)) + if (read_seqretry(&obj->seq, seq)) goto retry; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:629 @ retry: if (ret < 0) goto retry; - if (read_seqcount_retry(&obj->seq, seq)) + if (read_seqretry(&obj->seq, seq)) goto retry; } } Index: linux-5.4.5-rt3/drivers/firmware/efi/efi.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/firmware/efi/efi.c +++ linux-5.4.5-rt3/drivers/firmware/efi/efi.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:71 @ struct mm_struct efi_mm = { struct workqueue_struct *efi_rts_wq; -static bool disable_runtime; +static bool disable_runtime = IS_ENABLED(CONFIG_PREEMPT_RT); static int __init setup_noefi(char *arg) { disable_runtime = true; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:97 @ static int __init parse_efi_cmdline(char if (parse_option_str(str, "noruntime")) disable_runtime = true; + if (parse_option_str(str, "runtime")) + disable_runtime = false; + return 0; } early_param("efi", parse_efi_cmdline); Index: linux-5.4.5-rt3/drivers/gpu/drm/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/Kconfig +++ linux-5.4.5-rt3/drivers/gpu/drm/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:400 @ config DRM_R128 config DRM_I810 tristate "Intel I810" - # !PREEMPT because of missing ioctl locking + # !PREEMPTION because of missing ioctl locking depends on DRM && AGP && AGP_INTEL && (!PREEMPTION || BROKEN) help Choose this option if you have an Intel I810 graphics card. 
If M is Index: linux-5.4.5-rt3/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ linux-5.4.5-rt3/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:255 @ static int amdgpu_amdkfd_remove_eviction new->shared_count = k; /* Install the new fence list, seqcount provides the barriers */ - preempt_disable(); - write_seqcount_begin(&resv->seq); + write_seqlock(&resv->seq); RCU_INIT_POINTER(resv->fence, new); - write_seqcount_end(&resv->seq); - preempt_enable(); + write_sequnlock(&resv->seq); /* Drop the references to the removed fences or move them to ef_list */ for (i = j, k = 0; i < old->shared_count; ++i) { Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/display/intel_sprite.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/display/intel_sprite.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/display/intel_sprite.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:41 @ #include <drm/drm_plane_helper.h> #include <drm/drm_rect.h> #include <drm/i915_drm.h> +#include <linux/locallock.h> #include "i915_drv.h" #include "i915_trace.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:84 @ int intel_usecs_to_scanlines(const struc #define VBLANK_EVASION_TIME_US 100 #endif +static DEFINE_LOCAL_IRQ_LOCK(pipe_update_lock); + /** * intel_pipe_update_start() - start update of a set of display registers * @new_crtc_state: the new crtc state @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:135 @ void intel_pipe_update_start(const struc DRM_ERROR("PSR idle timed out 0x%x, atomic update may fail\n", psr_status); - local_irq_disable(); + local_lock_irq(pipe_update_lock); crtc->debug.min_vbl = min; crtc->debug.max_vbl = max; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:159 @ void intel_pipe_update_start(const struc break; } - local_irq_enable(); + local_unlock_irq(pipe_update_lock); timeout = schedule_timeout(timeout); - local_irq_disable(); + local_lock_irq(pipe_update_lock); } finish_wait(wq, &wait); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:196 @ void intel_pipe_update_start(const struc return; irq_disable: - local_irq_disable(); + local_lock_irq(pipe_update_lock); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:232 @ void intel_pipe_update_end(struct intel_ new_crtc_state->base.event = NULL; } - local_irq_enable(); + local_unlock_irq(pipe_update_lock); if (intel_vgpu_active(dev_priv)) return; Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/gem/i915_gem_busy.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/gem/i915_gem_busy.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/gem/i915_gem_busy.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:78 @ busy_check_writer(const struct dma_fence return __busy_set_if_active(fence, __busy_write_id); } - int i915_gem_busy_ioctl(struct drm_device *dev, void *data, struct drm_file *file) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:112 @ 
i915_gem_busy_ioctl(struct drm_device *d
 	 *
 	 */
 retry:
-	seq = raw_read_seqcount(&obj->base.resv->seq);
+	/* XXX raw_read_seqcount() does not wait for the WRITE to finish */
+	seq = read_seqbegin(&obj->base.resv->seq);
 
 	/* Translate the exclusive fence to the READ *and* WRITE engine */
 	args->busy =
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:132 @ retry:
 		}
 	}
 
-	if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq))
+	if (args->busy && read_seqretry(&obj->base.resv->seq, seq))
 		goto retry;
 
 	err = 0;
Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:123 @ __dma_fence_signal__notify(struct dma_fe
 	struct dma_fence_cb *cur, *tmp;
 
 	lockdep_assert_held(fence->lock);
-	lockdep_assert_irqs_disabled();
 
 	list_for_each_entry_safe(cur, tmp, list, node) {
 		INIT_LIST_HEAD(&cur->node);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:136 @ void intel_engine_breadcrumbs_irq(struct
 	const ktime_t timestamp = ktime_get();
 	struct intel_context *ce, *cn;
 	struct list_head *pos, *next;
+	unsigned long flags;
 	LIST_HEAD(signal);
 
-	spin_lock(&b->irq_lock);
+	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:185 @ void intel_engine_breadcrumbs_irq(struct
 		}
 	}
 
-	spin_unlock(&b->irq_lock);
+	spin_unlock_irqrestore(&b->irq_lock, flags);
 
 	list_for_each_safe(pos, next, &signal) {
 		struct i915_request *rq =
 			list_entry(pos, typeof(*rq), signal_link);
 		struct list_head cb_list;
 
-		spin_lock(&rq->lock);
+		spin_lock_irqsave(&rq->lock, flags);
 		list_replace(&rq->fence.cb_list, &cb_list);
 		__dma_fence_signal__timestamp(&rq->fence, timestamp);
 		__dma_fence_signal__notify(&rq->fence, &cb_list);
-		spin_unlock(&rq->lock);
+		spin_unlock_irqrestore(&rq->lock, flags);
 
 		i915_request_put(rq);
 	}
 }
 
-void intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
-{
-	local_irq_disable();
-	intel_engine_breadcrumbs_irq(engine);
-	local_irq_enable();
-}
-
 static void signal_irq_work(struct irq_work *work)
 {
 	struct intel_engine_cs *engine =
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:271 @ void intel_engine_fini_breadcrumbs(struc
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
 	lockdep_assert_held(&rq->lock);
-	lockdep_assert_irqs_disabled();
 
 	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
 		struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:320 @ void i915_request_cancel_breadcrumb(stru
 	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
 	lockdep_assert_held(&rq->lock);
-	lockdep_assert_irqs_disabled();
 
 	/*
 	 * We must wait for b->irq_lock so that we know the interrupt handler
Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_engine.h
===================================================================
--- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/gt/intel_engine.h
+++ linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_engine.h
@ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:352 @ void intel_engine_init_execlists(struct void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine); void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine); -void intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine); void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine); static inline void Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_hangcheck.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/gt/intel_hangcheck.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_hangcheck.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:286 @ static void hangcheck_elapsed(struct wor for_each_engine(engine, gt->i915, id) { struct hangcheck hc; - intel_engine_signal_breadcrumbs(engine); + intel_engine_breadcrumbs_irq(engine); hangcheck_load_sample(engine, &hc); hangcheck_accumulate_sample(engine, &hc); Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_reset.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/gt/intel_reset.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/gt/intel_reset.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:698 @ static void reset_finish_engine(struct i engine->reset.finish(engine); intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL); - intel_engine_signal_breadcrumbs(engine); + intel_engine_breadcrumbs_irq(engine); } static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake) Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_irq.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/i915_irq.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_irq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:986 @ bool i915_get_crtc_scanoutpos(struct drm spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ + preempt_disable_rt(); /* Get optional system timestamp before query. */ if (stime) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1038 @ bool i915_get_crtc_scanoutpos(struct drm *etime = ktime_get(); /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ + preempt_enable_rt(); spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_request.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/i915_request.c +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_request.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:208 @ static void remove_from_engine(struct i9 * check that the rq still belongs to the newly locked engine. 
*/ locked = READ_ONCE(rq->engine); - spin_lock(&locked->active.lock); + spin_lock_irq(&locked->active.lock); while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { spin_unlock(&locked->active.lock); spin_lock(&engine->active.lock); locked = engine; } list_del(&rq->sched.link); - spin_unlock(&locked->active.lock); + spin_unlock_irq(&locked->active.lock); } static bool i915_request_retire(struct i915_request *rq) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:275 @ static bool i915_request_retire(struct i active->retire(active, rq); } - local_irq_disable(); - /* * We only loosely track inflight requests across preemption, * and so we may find ourselves attempting to retire a _completed_ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:283 @ static bool i915_request_retire(struct i */ remove_from_engine(rq); - spin_lock(&rq->lock); + spin_lock_irq(&rq->lock); i915_request_mark_complete(rq); if (!i915_request_signaled(rq)) dma_fence_signal_locked(&rq->fence); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:298 @ static bool i915_request_retire(struct i __notify_execute_cb(rq); } GEM_BUG_ON(!list_empty(&rq->execute_cb)); - spin_unlock(&rq->lock); - - local_irq_enable(); + spin_unlock_irq(&rq->lock); remove_from_client(rq); list_del(&rq->link); Index: linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_trace.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/i915/i915_trace.h +++ linux-5.4.5-rt3/drivers/gpu/drm/i915/i915_trace.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ) #define _I915_TRACE_H_ +#ifdef CONFIG_PREEMPT_RT +#define NOTRACE +#endif + #include <linux/stringify.h> #include <linux/types.h> #include <linux/tracepoint.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:728 @ DEFINE_EVENT(i915_request, i915_request_ TP_ARGS(rq) ); -#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) && !defined(NOTRACE) DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq) Index: linux-5.4.5-rt3/drivers/gpu/drm/radeon/radeon_display.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/gpu/drm/radeon/radeon_display.c +++ linux-5.4.5-rt3/drivers/gpu/drm/radeon/radeon_display.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1822 @ int radeon_get_crtc_scanoutpos(struct dr struct radeon_device *rdev = dev->dev_private; /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ + preempt_disable_rt(); /* Get optional system timestamp before query. */ if (stime) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1915 @ int radeon_get_crtc_scanoutpos(struct dr *etime = ktime_get(); /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ + preempt_enable_rt(); /* Decode into vertical and horizontal scanout position. 
*/ *vpos = position & 0x1fff; Index: linux-5.4.5-rt3/drivers/hv/hyperv_vmbus.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/hv/hyperv_vmbus.h +++ linux-5.4.5-rt3/drivers/hv/hyperv_vmbus.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #include <linux/atomic.h> #include <linux/hyperv.h> #include <linux/interrupt.h> +#include <linux/irq.h> #include "hv_trace.h" Index: linux-5.4.5-rt3/drivers/hv/vmbus_drv.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/hv/vmbus_drv.c +++ linux-5.4.5-rt3/drivers/hv/vmbus_drv.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:25 @ #include <linux/clockchips.h> #include <linux/cpu.h> #include <linux/sched/task_stack.h> +#include <linux/irq.h> #include <asm/mshyperv.h> #include <linux/delay.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1203 @ static void vmbus_isr(void) void *page_addr = hv_cpu->synic_event_page; struct hv_message *msg; union hv_synic_event_flags *event; + struct pt_regs *regs = get_irq_regs(); + u64 ip = regs ? instruction_pointer(regs) : 0; bool handled = false; if (unlikely(page_addr == NULL)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1249 @ static void vmbus_isr(void) tasklet_schedule(&hv_cpu->msg_dpc); } - add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0); + add_interrupt_randomness(HYPERVISOR_CALLBACK_VECTOR, 0, ip); } /* Index: linux-5.4.5-rt3/drivers/leds/trigger/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/leds/trigger/Kconfig +++ linux-5.4.5-rt3/drivers/leds/trigger/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:67 @ config LEDS_TRIGGER_BACKLIGHT config LEDS_TRIGGER_CPU bool "LED CPU Trigger" + depends on !PREEMPT_RT help This allows LEDs to be controlled by active CPUs. 
This shows the active CPUs across an array of LEDs so you can see which Index: linux-5.4.5-rt3/drivers/md/bcache/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/md/bcache/Kconfig +++ linux-5.4.5-rt3/drivers/md/bcache/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5 @ config BCACHE tristate "Block device as cache" + depends on !PREEMPT_RT select CRC64 help Allows a block device to be used as cache for other devices; uses Index: linux-5.4.5-rt3/drivers/md/raid5.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/md/raid5.c +++ linux-5.4.5-rt3/drivers/md/raid5.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2061 @ static void raid_run_ops(struct stripe_h struct raid5_percpu *percpu; unsigned long cpu; - cpu = get_cpu(); + cpu = get_cpu_light(); percpu = per_cpu_ptr(conf->percpu, cpu); + spin_lock(&percpu->lock); if (test_bit(STRIPE_OP_BIOFILL, &ops_request)) { ops_run_biofill(sh); overlap_clear++; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2122 @ static void raid_run_ops(struct stripe_h if (test_and_clear_bit(R5_Overlap, &dev->flags)) wake_up(&sh->raid_conf->wait_for_overlap); } - put_cpu(); + spin_unlock(&percpu->lock); + put_cpu_light(); } static void free_stripe(struct kmem_cache *sc, struct stripe_head *sh) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6826 @ static int raid456_cpu_up_prepare(unsign __func__, cpu); return -ENOMEM; } + spin_lock_init(&per_cpu_ptr(conf->percpu, cpu)->lock); return 0; } Index: linux-5.4.5-rt3/drivers/md/raid5.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/md/raid5.h +++ linux-5.4.5-rt3/drivers/md/raid5.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:637 @ struct r5conf { int recovery_disabled; /* per cpu variables */ struct raid5_percpu { + spinlock_t lock; /* Protection for -RT */ struct page *spare_page; /* Used when checking P/Q in raid6 */ void *scribble; /* space for constructing buffer * lists and performing address Index: linux-5.4.5-rt3/drivers/media/platform/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/media/platform/Kconfig +++ linux-5.4.5-rt3/drivers/media/platform/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:588 @ config VIDEO_MESON_G12A_AO_CEC config CEC_GPIO tristate "Generic GPIO-based CEC driver" - depends on PREEMPT || COMPILE_TEST + depends on PREEMPTION || COMPILE_TEST select CEC_CORE select CEC_PIN select GPIOLIB Index: linux-5.4.5-rt3/drivers/net/wireless/intersil/orinoco/orinoco_usb.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/net/wireless/intersil/orinoco/orinoco_usb.c +++ linux-5.4.5-rt3/drivers/net/wireless/intersil/orinoco/orinoco_usb.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:696 @ static void ezusb_req_ctx_wait(struct ez while (!ctx->done.done && msecs--) udelay(1000); } else { - wait_event_interruptible(ctx->done.wait, - ctx->done.done); + swait_event_interruptible_exclusive(ctx->done.wait, + ctx->done.done); } break; default: Index: linux-5.4.5-rt3/drivers/of/base.c 
=================================================================== --- linux-5.4.5-rt3.orig/drivers/of/base.c +++ linux-5.4.5-rt3/drivers/of/base.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:126 @ int __weak of_node_to_nid(struct device_ } #endif -/* - * Assumptions behind phandle_cache implementation: - * - phandle property values are in a contiguous range of 1..n - * - * If the assumptions do not hold, then - * - the phandle lookup overhead reduction provided by the cache - * will likely be less - */ +#define OF_PHANDLE_CACHE_BITS 7 +#define OF_PHANDLE_CACHE_SZ BIT(OF_PHANDLE_CACHE_BITS) -static struct device_node **phandle_cache; -static u32 phandle_cache_mask; +static struct device_node *phandle_cache[OF_PHANDLE_CACHE_SZ]; -/* - * Caller must hold devtree_lock. - */ -static void __of_free_phandle_cache(void) +static u32 of_phandle_cache_hash(phandle handle) { - u32 cache_entries = phandle_cache_mask + 1; - u32 k; - - if (!phandle_cache) - return; - - for (k = 0; k < cache_entries; k++) - of_node_put(phandle_cache[k]); - - kfree(phandle_cache); - phandle_cache = NULL; + return hash_32(handle, OF_PHANDLE_CACHE_BITS); } -int of_free_phandle_cache(void) -{ - unsigned long flags; - - raw_spin_lock_irqsave(&devtree_lock, flags); - - __of_free_phandle_cache(); - - raw_spin_unlock_irqrestore(&devtree_lock, flags); - - return 0; -} -#if !defined(CONFIG_MODULES) -late_initcall_sync(of_free_phandle_cache); -#endif - /* * Caller must hold devtree_lock. */ -void __of_free_phandle_cache_entry(phandle handle) +void __of_phandle_cache_inv_entry(phandle handle) { - phandle masked_handle; + u32 handle_hash; struct device_node *np; if (!handle) return; - masked_handle = handle & phandle_cache_mask; - - if (phandle_cache) { - np = phandle_cache[masked_handle]; - if (np && handle == np->phandle) { - of_node_put(np); - phandle_cache[masked_handle] = NULL; - } - } -} - -void of_populate_phandle_cache(void) -{ - unsigned long flags; - u32 cache_entries; - struct device_node *np; - u32 phandles = 0; - - raw_spin_lock_irqsave(&devtree_lock, flags); - - __of_free_phandle_cache(); + handle_hash = of_phandle_cache_hash(handle); - for_each_of_allnodes(np) - if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) - phandles++; - - if (!phandles) - goto out; - - cache_entries = roundup_pow_of_two(phandles); - phandle_cache_mask = cache_entries - 1; - - phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache), - GFP_ATOMIC); - if (!phandle_cache) - goto out; - - for_each_of_allnodes(np) - if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) { - of_node_get(np); - phandle_cache[np->phandle & phandle_cache_mask] = np; - } - -out: - raw_spin_unlock_irqrestore(&devtree_lock, flags); + np = phandle_cache[handle_hash]; + if (np && handle == np->phandle) + phandle_cache[handle_hash] = NULL; } void __init of_core_init(void) { struct device_node *np; - of_populate_phandle_cache(); /* Create the kset, and register existing nodes */ mutex_lock(&of_mutex); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:167 @ void __init of_core_init(void) pr_err("failed to register existing nodes\n"); return; } - for_each_of_allnodes(np) + for_each_of_allnodes(np) { __of_attach_node_sysfs(np); + if (np->phandle && !phandle_cache[of_phandle_cache_hash(np->phandle)]) + phandle_cache[of_phandle_cache_hash(np->phandle)] = np; + } mutex_unlock(&of_mutex); /* Symlink in /proc as required by userspace ABI */ @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1152 @ struct device_node *of_find_node_by_phan { struct device_node *np = NULL; unsigned long flags; - phandle masked_handle; + u32 handle_hash; if (!handle) return NULL; - raw_spin_lock_irqsave(&devtree_lock, flags); + handle_hash = of_phandle_cache_hash(handle); - masked_handle = handle & phandle_cache_mask; + raw_spin_lock_irqsave(&devtree_lock, flags); - if (phandle_cache) { - if (phandle_cache[masked_handle] && - handle == phandle_cache[masked_handle]->phandle) - np = phandle_cache[masked_handle]; - if (np && of_node_check_flag(np, OF_DETACHED)) { - WARN_ON(1); /* did not uncache np on node removal */ - of_node_put(np); - phandle_cache[masked_handle] = NULL; - np = NULL; - } + if (phandle_cache[handle_hash] && + handle == phandle_cache[handle_hash]->phandle) + np = phandle_cache[handle_hash]; + if (np && of_node_check_flag(np, OF_DETACHED)) { + WARN_ON(1); /* did not uncache np on node removal */ + phandle_cache[handle_hash] = NULL; + np = NULL; } if (!np) { for_each_of_allnodes(np) if (np->phandle == handle && !of_node_check_flag(np, OF_DETACHED)) { - if (phandle_cache) { - /* will put when removed from cache */ - of_node_get(np); - phandle_cache[masked_handle] = np; - } + phandle_cache[handle_hash] = np; break; } } Index: linux-5.4.5-rt3/drivers/of/dynamic.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/of/dynamic.c +++ linux-5.4.5-rt3/drivers/of/dynamic.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:279 @ void __of_detach_node(struct device_node of_node_set_flag(np, OF_DETACHED); /* race with of_find_node_by_phandle() prevented by devtree_lock */ - __of_free_phandle_cache_entry(np->phandle); + __of_phandle_cache_inv_entry(np->phandle); } /** Index: linux-5.4.5-rt3/drivers/of/of_private.h =================================================================== --- linux-5.4.5-rt3.orig/drivers/of/of_private.h +++ linux-5.4.5-rt3/drivers/of/of_private.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:88 @ int of_resolve_phandles(struct device_no #endif #if defined(CONFIG_OF_DYNAMIC) -void __of_free_phandle_cache_entry(phandle handle); +void __of_phandle_cache_inv_entry(phandle handle); #endif #if defined(CONFIG_OF_OVERLAY) void of_overlay_mutex_lock(void); void of_overlay_mutex_unlock(void); -int of_free_phandle_cache(void); -void of_populate_phandle_cache(void); #else static inline void of_overlay_mutex_lock(void) {}; static inline void of_overlay_mutex_unlock(void) {}; Index: linux-5.4.5-rt3/drivers/of/overlay.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/of/overlay.c +++ linux-5.4.5-rt3/drivers/of/overlay.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:977 @ static int of_overlay_apply(const void * goto err_free_overlay_changeset; } - of_populate_phandle_cache(); - ret = __of_changeset_apply_notify(&ovcs->cset); if (ret) pr_err("overlay apply changeset entry notify error %d\n", ret); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1219 @ int of_overlay_remove(int *ovcs_id) list_del(&ovcs->ovcs_list); - /* - * Disable phandle cache. Avoids race condition that would arise - * from removing cache entry when the associated node is deleted. 
- */ - of_free_phandle_cache(); - ret_apply = 0; ret = __of_changeset_revert_entries(&ovcs->cset, &ret_apply); - of_populate_phandle_cache(); - if (ret) { if (ret_apply) devicetree_state_flags |= DTSF_REVERT_FAIL; Index: linux-5.4.5-rt3/drivers/pci/switch/switchtec.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/pci/switch/switchtec.c +++ linux-5.4.5-rt3/drivers/pci/switch/switchtec.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:55 @ struct switchtec_user { enum mrpc_state state; - struct completion comp; + wait_queue_head_t cmd_comp; struct kref kref; struct list_head list; + bool cmd_done; u32 cmd; u32 status; u32 return_code; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:81 @ static struct switchtec_user *stuser_cre stuser->stdev = stdev; kref_init(&stuser->kref); INIT_LIST_HEAD(&stuser->list); - init_completion(&stuser->comp); + init_waitqueue_head(&stuser->cmd_comp); stuser->event_cnt = atomic_read(&stdev->event_cnt); dev_dbg(&stdev->dev, "%s: %p\n", __func__, stuser); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:179 @ static int mrpc_queue_cmd(struct switcht kref_get(&stuser->kref); stuser->read_len = sizeof(stuser->data); stuser_set_state(stuser, MRPC_QUEUED); - init_completion(&stuser->comp); + stuser->cmd_done = false; list_add_tail(&stuser->list, &stdev->mrpc_queue); mrpc_cmd_submit(stdev); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:226 @ static void mrpc_complete_cmd(struct swi memcpy_fromio(stuser->data, &stdev->mmio_mrpc->output_data, stuser->read_len); out: - complete_all(&stuser->comp); + stuser->cmd_done = true; + wake_up_interruptible(&stuser->cmd_comp); list_del_init(&stuser->list); stuser_put(stuser); stdev->mrpc_busy = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:499 @ static ssize_t switchtec_dev_read(struct mutex_unlock(&stdev->mrpc_mutex); if (filp->f_flags & O_NONBLOCK) { - if (!try_wait_for_completion(&stuser->comp)) + if (!READ_ONCE(stuser->cmd_done)) return -EAGAIN; } else { - rc = wait_for_completion_interruptible(&stuser->comp); + rc = wait_event_interruptible(stuser->cmd_comp, + stuser->cmd_done); if (rc < 0) return rc; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:551 @ static __poll_t switchtec_dev_poll(struc struct switchtec_dev *stdev = stuser->stdev; __poll_t ret = 0; - poll_wait(filp, &stuser->comp.wait, wait); + poll_wait(filp, &stuser->cmd_comp, wait); poll_wait(filp, &stdev->event_wq, wait); if (lock_mutex_and_test_alive(stdev)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:559 @ static __poll_t switchtec_dev_poll(struc mutex_unlock(&stdev->mrpc_mutex); - if (try_wait_for_completion(&stuser->comp)) + if (READ_ONCE(stuser->cmd_done)) ret |= EPOLLIN | EPOLLRDNORM; if (stuser->event_cnt != atomic_read(&stdev->event_cnt)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1112 @ static void stdev_kill(struct switchtec_ /* Wake up and kill any users waiting on an MRPC request */ list_for_each_entry_safe(stuser, tmpuser, &stdev->mrpc_queue, list) { - complete_all(&stuser->comp); + stuser->cmd_done = true; + wake_up_interruptible(&stuser->cmd_comp); list_del_init(&stuser->list); stuser_put(stuser); } 
Index: linux-5.4.5-rt3/drivers/scsi/fcoe/fcoe.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/scsi/fcoe/fcoe.c +++ linux-5.4.5-rt3/drivers/scsi/fcoe/fcoe.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1455 @ err2: static int fcoe_alloc_paged_crc_eof(struct sk_buff *skb, int tlen) { struct fcoe_percpu_s *fps; - int rc; + int rc, cpu = get_cpu_light(); - fps = &get_cpu_var(fcoe_percpu); + fps = &per_cpu(fcoe_percpu, cpu); rc = fcoe_get_paged_crc_eof(skb, tlen, fps); - put_cpu_var(fcoe_percpu); + put_cpu_light(); return rc; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1644 @ static inline int fcoe_filter_frames(str return 0; } - stats = per_cpu_ptr(lport->stats, get_cpu()); + stats = per_cpu_ptr(lport->stats, get_cpu_light()); stats->InvalidCRCCount++; if (stats->InvalidCRCCount < 5) printk(KERN_WARNING "fcoe: dropping frame with CRC error\n"); - put_cpu(); + put_cpu_light(); return -EINVAL; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1689 @ static void fcoe_recv_frame(struct sk_bu */ hp = (struct fcoe_hdr *) skb_network_header(skb); - stats = per_cpu_ptr(lport->stats, get_cpu()); + stats = per_cpu_ptr(lport->stats, get_cpu_light()); if (unlikely(FC_FCOE_DECAPS_VER(hp) != FC_FCOE_VER)) { if (stats->ErrorFrames < 5) printk(KERN_WARNING "fcoe: FCoE version " @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1721 @ static void fcoe_recv_frame(struct sk_bu goto drop; if (!fcoe_filter_frames(lport, fp)) { - put_cpu(); + put_cpu_light(); fc_exch_recv(lport, fp); return; } drop: stats->ErrorFrames++; - put_cpu(); + put_cpu_light(); kfree_skb(skb); } Index: linux-5.4.5-rt3/drivers/scsi/fcoe/fcoe_ctlr.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/scsi/fcoe/fcoe_ctlr.c +++ linux-5.4.5-rt3/drivers/scsi/fcoe/fcoe_ctlr.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:829 @ static unsigned long fcoe_ctlr_age_fcfs( INIT_LIST_HEAD(&del_list); - stats = per_cpu_ptr(fip->lp->stats, get_cpu()); + stats = per_cpu_ptr(fip->lp->stats, get_cpu_light()); list_for_each_entry_safe(fcf, next, &fip->fcfs, list) { deadline = fcf->time + fcf->fka_period + fcf->fka_period / 2; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:865 @ static unsigned long fcoe_ctlr_age_fcfs( sel_time = fcf->time; } } - put_cpu(); + put_cpu_light(); list_for_each_entry_safe(fcf, next, &del_list, list) { /* Removes fcf from current list */ Index: linux-5.4.5-rt3/drivers/scsi/libfc/fc_exch.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/scsi/libfc/fc_exch.c +++ linux-5.4.5-rt3/drivers/scsi/libfc/fc_exch.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:824 @ static struct fc_exch *fc_exch_em_alloc( } memset(ep, 0, sizeof(*ep)); - cpu = get_cpu(); + cpu = get_cpu_light(); pool = per_cpu_ptr(mp->pool, cpu); spin_lock_bh(&pool->lock); - put_cpu(); + put_cpu_light(); /* peek cache of free slot */ if (pool->left != FC_XID_UNKNOWN) { Index: linux-5.4.5-rt3/drivers/thermal/intel/x86_pkg_temp_thermal.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/thermal/intel/x86_pkg_temp_thermal.c +++ 
linux-5.4.5-rt3/drivers/thermal/intel/x86_pkg_temp_thermal.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:66 @ static int max_id __read_mostly; /* Array of zone pointers */ static struct zone_device **zones; /* Serializes interrupt notification, work and hotplug */ -static DEFINE_SPINLOCK(pkg_temp_lock); +static DEFINE_RAW_SPINLOCK(pkg_temp_lock); /* Protects zone operation in the work function against hotplug removal */ static DEFINE_MUTEX(thermal_zone_mutex); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:269 @ static void pkg_temp_thermal_threshold_w u64 msr_val, wr_val; mutex_lock(&thermal_zone_mutex); - spin_lock_irq(&pkg_temp_lock); + raw_spin_lock_irq(&pkg_temp_lock); ++pkg_work_cnt; zonedev = pkg_temp_thermal_get_dev(cpu); if (!zonedev) { - spin_unlock_irq(&pkg_temp_lock); + raw_spin_unlock_irq(&pkg_temp_lock); mutex_unlock(&thermal_zone_mutex); return; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:288 @ static void pkg_temp_thermal_threshold_w } enable_pkg_thres_interrupt(); - spin_unlock_irq(&pkg_temp_lock); + raw_spin_unlock_irq(&pkg_temp_lock); /* * If tzone is not NULL, then thermal_zone_mutex will prevent the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:313 @ static int pkg_thermal_notify(u64 msr_va struct zone_device *zonedev; unsigned long flags; - spin_lock_irqsave(&pkg_temp_lock, flags); + raw_spin_lock_irqsave(&pkg_temp_lock, flags); ++pkg_interrupt_cnt; disable_pkg_thres_interrupt(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:325 @ static int pkg_thermal_notify(u64 msr_va pkg_thermal_schedule_work(zonedev->cpu, &zonedev->work); } - spin_unlock_irqrestore(&pkg_temp_lock, flags); + raw_spin_unlock_irqrestore(&pkg_temp_lock, flags); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:371 @ static int pkg_temp_thermal_device_add(u zonedev->msr_pkg_therm_high); cpumask_set_cpu(cpu, &zonedev->cpumask); - spin_lock_irq(&pkg_temp_lock); + raw_spin_lock_irq(&pkg_temp_lock); zones[id] = zonedev; - spin_unlock_irq(&pkg_temp_lock); + raw_spin_unlock_irq(&pkg_temp_lock); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:410 @ static int pkg_thermal_cpu_offline(unsig } /* Protect against work and interrupts */ - spin_lock_irq(&pkg_temp_lock); + raw_spin_lock_irq(&pkg_temp_lock); /* * Check whether this cpu was the current target and store the new @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:442 @ static int pkg_thermal_cpu_offline(unsig * To cancel the work we need to drop the lock, otherwise * we might deadlock if the work needs to be flushed. 
*/
-	spin_unlock_irq(&pkg_temp_lock);
+	raw_spin_unlock_irq(&pkg_temp_lock);
 	cancel_delayed_work_sync(&zonedev->work);
-	spin_lock_irq(&pkg_temp_lock);
+	raw_spin_lock_irq(&pkg_temp_lock);
 	/*
 	 * If this is not the last cpu in the package and the work
 	 * did not run after we dropped the lock above, then we
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:455 @ static int pkg_thermal_cpu_offline(unsig
 		pkg_thermal_schedule_work(target, &zonedev->work);
 	}
 
-	spin_unlock_irq(&pkg_temp_lock);
+	raw_spin_unlock_irq(&pkg_temp_lock);
 
 	/* Final cleanup if this is the last cpu */
 	if (lastcpu)
Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250.h
===================================================================
--- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250.h
+++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:133 @ static inline void serial_dl_write(struc
 	up->dl_write(up, value);
 }
 
+static inline void serial8250_set_IER(struct uart_8250_port *up,
+				      unsigned char ier)
+{
+	struct uart_port *port = &up->port;
+	unsigned int flags;
+	bool is_console;
+
+	is_console = uart_console(port);
+
+	if (is_console)
+		console_atomic_lock(&flags);
+
+	serial_out(up, UART_IER, ier);
+
+	if (is_console)
+		console_atomic_unlock(flags);
+}
+
+static inline unsigned char serial8250_clear_IER(struct uart_8250_port *up)
+{
+	struct uart_port *port = &up->port;
+	unsigned int clearval = 0;
+	unsigned int prior;
+	unsigned int flags;
+	bool is_console;
+
+	is_console = uart_console(port);
+
+	if (up->capabilities & UART_CAP_UUE)
+		clearval = UART_IER_UUE;
+
+	if (is_console)
+		console_atomic_lock(&flags);
+
+	prior = serial_port_in(port, UART_IER);
+	serial_port_out(port, UART_IER, clearval);
+
+	if (is_console)
+		console_atomic_unlock(flags);
+
+	return prior;
+}
+
 static inline bool serial8250_set_THRI(struct uart_8250_port *up)
 {
 	if (up->ier & UART_IER_THRI)
 		return false;
 	up->ier |= UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:190 @ static inline bool serial8250_clear_THRI
 	if (!(up->ier & UART_IER_THRI))
 		return false;
 	up->ier &= ~UART_IER_THRI;
-	serial_out(up, UART_IER, up->ier);
+	serial8250_set_IER(up, up->ier);
 	return true;
 }
 
Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250_core.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250_core.c
+++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250_core.c
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:58 @ static struct uart_driver serial8250_reg
 
 static unsigned int skip_txen_test; /* force skip of txen test at init time */
 
-#define PASS_LIMIT	512
+/*
+ * On -rt we can have more delays, and legitimately
+ * so - so don't drop work spuriously and spam the
+ * syslog:
+ */
+#ifdef CONFIG_PREEMPT_RT
+# define PASS_LIMIT	1000000
+#else
+# define PASS_LIMIT	512
+#endif
 
 #include <asm/serial.h>
 /*
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:287 @ static void serial8250_backup_timeout(st
 	 * Must disable interrupts or else we risk racing with the interrupt
 	 * based handler.
 	 
*/ - if (up->port.irq) { - ier = serial_in(up, UART_IER); - serial_out(up, UART_IER, 0); - } + if (up->port.irq) + ier = serial8250_clear_IER(up); iir = serial_in(up, UART_IIR); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:311 @ static void serial8250_backup_timeout(st serial8250_tx_chars(up); if (up->port.irq) - serial_out(up, UART_IER, ier); + serial8250_set_IER(up, ier); spin_unlock_irqrestore(&up->port.lock, flags); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:589 @ serial8250_register_ports(struct uart_dr #ifdef CONFIG_SERIAL_8250_CONSOLE +static void univ8250_console_write_atomic(struct console *co, const char *s, + unsigned int count) +{ + struct uart_8250_port *up = &serial8250_ports[co->index]; + + serial8250_console_write_atomic(up, s, count); +} + static void univ8250_console_write(struct console *co, const char *s, unsigned int count) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:682 @ static int univ8250_console_match(struct static struct console univ8250_console = { .name = "ttyS", + .write_atomic = univ8250_console_write_atomic, .write = univ8250_console_write, .device = uart_console_device, .setup = univ8250_console_setup, Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250_fsl.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250_fsl.c +++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250_fsl.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:60 @ int fsl8250_handle_irq(struct uart_port /* Stop processing interrupts on input overrun */ if ((orig_lsr & UART_LSR_OE) && (up->overrun_backoff_time_ms > 0)) { + unsigned int ca_flags; unsigned long delay; + bool is_console; + is_console = uart_console(port); + + if (is_console) + console_atomic_lock(&ca_flags); up->ier = port->serial_in(port, UART_IER); + if (is_console) + console_atomic_unlock(ca_flags); + if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) { port->ops->stop_rx(port); } else { Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250_ingenic.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250_ingenic.c +++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250_ingenic.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:149 @ OF_EARLYCON_DECLARE(x1000_uart, "ingenic static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value) { + unsigned int flags; + bool is_console; int ier; switch (offset) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:172 @ static void ingenic_uart_serial_out(stru * If we have enabled modem status IRQs we should enable * modem mode. 
*/ + is_console = uart_console(p); + if (is_console) + console_atomic_lock(&flags); ier = p->serial_in(p, UART_IER); + if (is_console) + console_atomic_unlock(flags); if (ier & UART_IER_MSI) value |= UART_MCR_MDCE | UART_MCR_FCM; Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250_mtk.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250_mtk.c +++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250_mtk.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:215 @ static void mtk8250_shutdown(struct uart static void mtk8250_disable_intrs(struct uart_8250_port *up, int mask) { - serial_out(up, UART_IER, serial_in(up, UART_IER) & (~mask)); + struct uart_port *port = &up->port; + unsigned int flags; + unsigned int ier; + bool is_console; + + is_console = uart_console(port); + + if (is_console) + console_atomic_lock(&flags); + + ier = serial_in(up, UART_IER); + serial_out(up, UART_IER, ier & (~mask)); + + if (is_console) + console_atomic_unlock(flags); } static void mtk8250_enable_intrs(struct uart_8250_port *up, int mask) { - serial_out(up, UART_IER, serial_in(up, UART_IER) | mask); + struct uart_port *port = &up->port; + unsigned int flags; + unsigned int ier; + + if (uart_console(port)) + console_atomic_lock(&flags); + + ier = serial_in(up, UART_IER); + serial_out(up, UART_IER, ier | mask); + + if (uart_console(port)) + console_atomic_unlock(flags); } static void mtk8250_set_flow_ctrl(struct uart_8250_port *up, int mode) Index: linux-5.4.5-rt3/drivers/tty/serial/8250/8250_port.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/tty/serial/8250/8250_port.c +++ linux-5.4.5-rt3/drivers/tty/serial/8250/8250_port.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:724 @ static void serial8250_set_sleep(struct serial_out(p, UART_EFR, UART_EFR_ECB); serial_out(p, UART_LCR, 0); } - serial_out(p, UART_IER, sleep ? UART_IERX_SLEEP : 0); + serial8250_set_IER(p, sleep ? 
UART_IERX_SLEEP : 0); if (p->capabilities & UART_CAP_EFR) { serial_out(p, UART_LCR, UART_LCR_CONF_MODE_B); serial_out(p, UART_EFR, efr); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1393 @ static void serial8250_stop_rx(struct ua up->ier &= ~(UART_IER_RLSI | UART_IER_RDI); up->port.read_status_mask &= ~UART_LSR_DR; - serial_port_out(port, UART_IER, up->ier); + serial8250_set_IER(up, up->ier); serial8250_rpm_put(up); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1411 @ static void __do_stop_tx_rs485(struct ua serial8250_clear_and_reinit_fifos(p); p->ier |= UART_IER_RLSI | UART_IER_RDI; - serial_port_out(&p->port, UART_IER, p->ier); + serial8250_set_IER(p, p->ier); } } static enum hrtimer_restart serial8250_em485_handle_stop_tx(struct hrtimer *t) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1619 @ static void serial8250_disable_ms(struct mctrl_gpio_disable_ms(up->gpios); up->ier &= ~UART_IER_MSI; - serial_port_out(port, UART_IER, up->ier); + serial8250_set_IER(up, up->ier); } static void serial8250_enable_ms(struct uart_port *port) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1635 @ static void serial8250_enable_ms(struct up->ier |= UART_IER_MSI; serial8250_rpm_get(up); - serial_port_out(port, UART_IER, up->ier); + serial8250_set_IER(up, up->ier); serial8250_rpm_put(up); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2029 @ static void serial8250_put_poll_char(str struct uart_8250_port *up = up_to_u8250p(port); serial8250_rpm_get(up); - /* - * First save the IER then disable the interrupts - */ - ier = serial_port_in(port, UART_IER); - if (up->capabilities & UART_CAP_UUE) - serial_port_out(port, UART_IER, UART_IER_UUE); - else - serial_port_out(port, UART_IER, 0); + ier = serial8250_clear_IER(up); wait_for_xmitr(up, BOTH_EMPTY); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2042 @ static void serial8250_put_poll_char(str * and restore the IER */ wait_for_xmitr(up, BOTH_EMPTY); - serial_port_out(port, UART_IER, ier); + serial8250_set_IER(up, ier); serial8250_rpm_put(up); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2350 @ void serial8250_do_shutdown(struct uart_ */ spin_lock_irqsave(&port->lock, flags); up->ier = 0; - serial_port_out(port, UART_IER, 0); + serial8250_set_IER(up, 0); spin_unlock_irqrestore(&port->lock, flags); synchronize_irq(port->irq); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2635 @ serial8250_do_set_termios(struct uart_po if (up->capabilities & UART_CAP_RTOIE) up->ier |= UART_IER_RTOIE; - serial_port_out(port, UART_IER, up->ier); + serial8250_set_IER(up, up->ier); if (up->capabilities & UART_CAP_EFR) { unsigned char efr = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3099 @ EXPORT_SYMBOL_GPL(serial8250_set_default #ifdef CONFIG_SERIAL_8250_CONSOLE -static void serial8250_console_putchar(struct uart_port *port, int ch) +static void serial8250_console_putchar_locked(struct uart_port *port, int ch) { struct uart_8250_port *up = up_to_u8250p(port); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3107 @ static void serial8250_console_putchar(s serial_port_out(port, 
UART_TX, ch); } +static void serial8250_console_putchar(struct uart_port *port, int ch) +{ + struct uart_8250_port *up = up_to_u8250p(port); + unsigned int flags; + + wait_for_xmitr(up, UART_LSR_THRE); + + console_atomic_lock(&flags); + serial8250_console_putchar_locked(port, ch); + console_atomic_unlock(flags); +} + /* * Restore serial console when h/w power-off detected */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3140 @ static void serial8250_console_restore(s serial8250_out_MCR(up, UART_MCR_DTR | UART_MCR_RTS); } +void serial8250_console_write_atomic(struct uart_8250_port *up, + const char *s, unsigned int count) +{ + struct uart_port *port = &up->port; + unsigned int flags; + unsigned int ier; + + console_atomic_lock(&flags); + + touch_nmi_watchdog(); + + ier = serial8250_clear_IER(up); + + if (atomic_fetch_inc(&up->console_printing)) { + uart_console_write(port, "\n", 1, + serial8250_console_putchar_locked); + } + uart_console_write(port, s, count, serial8250_console_putchar_locked); + atomic_dec(&up->console_printing); + + wait_for_xmitr(up, BOTH_EMPTY); + serial8250_set_IER(up, ier); + + console_atomic_unlock(flags); +} + /* * Print a string to the serial port trying not to disturb * any possible real use of the port... @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3178 @ void serial8250_console_write(struct uar struct uart_port *port = &up->port; unsigned long flags; unsigned int ier; - int locked = 1; touch_nmi_watchdog(); serial8250_rpm_get(up); + spin_lock_irqsave(&port->lock, flags); - if (oops_in_progress) - locked = spin_trylock_irqsave(&port->lock, flags); - else - spin_lock_irqsave(&port->lock, flags); - - /* - * First save the IER then disable the interrupts - */ - ier = serial_port_in(port, UART_IER); - - if (up->capabilities & UART_CAP_UUE) - serial_port_out(port, UART_IER, UART_IER_UUE); - else - serial_port_out(port, UART_IER, 0); + ier = serial8250_clear_IER(up); /* check scratch reg to see if port powered off during system sleep */ if (up->canary && (up->canary != serial_port_in(port, UART_SCR))) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3192 @ void serial8250_console_write(struct uar up->canary = 0; } + atomic_inc(&up->console_printing); uart_console_write(port, s, count, serial8250_console_putchar); + atomic_dec(&up->console_printing); /* * Finally, wait for transmitter to become empty * and restore the IER */ wait_for_xmitr(up, BOTH_EMPTY); - serial_port_out(port, UART_IER, ier); + serial8250_set_IER(up, ier); /* * The receive handling will happen properly because the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3213 @ void serial8250_console_write(struct uar if (up->msr_saved_flags) serial8250_modem_status(up); - if (locked) - spin_unlock_irqrestore(&port->lock, flags); + spin_unlock_irqrestore(&port->lock, flags); serial8250_rpm_put(up); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3234 @ static unsigned int probe_baud(struct ua int serial8250_console_setup(struct uart_port *port, char *options, bool probe) { + struct uart_8250_port *up = up_to_u8250p(port); int baud = 9600; int bits = 8; int parity = 'n'; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3243 @ int serial8250_console_setup(struct uart if (!port->iobase && !port->membase) return -ENODEV; + 
atomic_set(&up->console_printing, 0);
+
 	if (options)
 		uart_parse_options(options, &baud, &parity, &bits, &flow);
 	else if (probe)
Index: linux-5.4.5-rt3/drivers/tty/serial/amba-pl011.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/tty/serial/amba-pl011.c
+++ linux-5.4.5-rt3/drivers/tty/serial/amba-pl011.c
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2212 @ pl011_console_write(struct console *co,
 {
 	struct uart_amba_port *uap = amba_ports[co->index];
 	unsigned int old_cr = 0, new_cr;
-	unsigned long flags;
+	unsigned long flags = 0;
 	int locked = 1;
 
 	clk_enable(uap->clk);
 
-	local_irq_save(flags);
+	/*
+	 * local_irq_save(flags);
+	 *
+	 * This local_irq_save() is nonsense. If we come in via sysrq
+	 * handling then interrupts are already disabled. Aside from
+	 * that the port.sysrq check is racy on SMP regardless.
+	 */
 	if (uap->port.sysrq)
 		locked = 0;
 	else if (oops_in_progress)
-		locked = spin_trylock(&uap->port.lock);
+		locked = spin_trylock_irqsave(&uap->port.lock, flags);
 	else
-		spin_lock(&uap->port.lock);
+		spin_lock_irqsave(&uap->port.lock, flags);
 
 	/*
 	 * First save the CR then disable the interrupts
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2255 @ pl011_console_write(struct console *co,
 	pl011_write(old_cr, uap, REG_CR);
 
 	if (locked)
-		spin_unlock(&uap->port.lock);
-	local_irq_restore(flags);
+		spin_unlock_irqrestore(&uap->port.lock, flags);
 
 	clk_disable(uap->clk);
 }
Index: linux-5.4.5-rt3/drivers/tty/serial/omap-serial.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/tty/serial/omap-serial.c
+++ linux-5.4.5-rt3/drivers/tty/serial/omap-serial.c
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1310 @ serial_omap_console_write(struct console
 
 	pm_runtime_get_sync(up->dev);
 
-	local_irq_save(flags);
-	if (up->port.sysrq)
-		locked = 0;
-	else if (oops_in_progress)
-		locked = spin_trylock(&up->port.lock);
+	if (up->port.sysrq || oops_in_progress)
+		locked = spin_trylock_irqsave(&up->port.lock, flags);
 	else
-		spin_lock(&up->port.lock);
+		spin_lock_irqsave(&up->port.lock, flags);
 
 	/*
 	 * First save the IER then disable the interrupts
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1342 @ serial_omap_console_write(struct console
 	pm_runtime_mark_last_busy(up->dev);
 	pm_runtime_put_autosuspend(up->dev);
 	if (locked)
-		spin_unlock(&up->port.lock);
-	local_irq_restore(flags);
+		spin_unlock_irqrestore(&up->port.lock, flags);
 }
 
 static int __init
Index: linux-5.4.5-rt3/drivers/usb/gadget/function/f_fs.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/usb/gadget/function/f_fs.c
+++ linux-5.4.5-rt3/drivers/usb/gadget/function/f_fs.c
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1718 @ static void ffs_data_put(struct ffs_data
 		pr_info("%s(): freeing\n", __func__);
 		ffs_data_clear(ffs);
 		BUG_ON(waitqueue_active(&ffs->ev.waitq) ||
-		       waitqueue_active(&ffs->ep0req_completion.wait) ||
+		       swait_active(&ffs->ep0req_completion.wait) ||
 		       waitqueue_active(&ffs->wait));
 		destroy_workqueue(ffs->io_completion_wq);
 		kfree(ffs->dev_name);
Index: linux-5.4.5-rt3/drivers/usb/gadget/legacy/inode.c
===================================================================
--- linux-5.4.5-rt3.orig/drivers/usb/gadget/legacy/inode.c
+++ 
linux-5.4.5-rt3/drivers/usb/gadget/legacy/inode.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:347 @ ep_io (struct ep_data *epdata, void *buf spin_unlock_irq (&epdata->dev->lock); if (likely (value == 0)) { - value = wait_event_interruptible (done.wait, done.done); + value = swait_event_interruptible_exclusive(done.wait, done.done); if (value != 0) { spin_lock_irq (&epdata->dev->lock); if (likely (epdata->ep != NULL)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:356 @ ep_io (struct ep_data *epdata, void *buf usb_ep_dequeue (epdata->ep, epdata->req); spin_unlock_irq (&epdata->dev->lock); - wait_event (done.wait, done.done); + swait_event_exclusive(done.wait, done.done); if (epdata->status == -ECONNRESET) epdata->status = -EINTR; } else { Index: linux-5.4.5-rt3/drivers/video/backlight/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/drivers/video/backlight/Kconfig +++ linux-5.4.5-rt3/drivers/video/backlight/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:102 @ config LCD_TOSA config LCD_HP700 tristate "HP Jornada 700 series LCD Driver" - depends on SA1100_JORNADA720_SSP && !PREEMPT + depends on SA1100_JORNADA720_SSP && !PREEMPTION default y help If you have an HP Jornada 700 series handheld (710/720/728) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:231 @ config BACKLIGHT_HP680 config BACKLIGHT_HP700 tristate "HP Jornada 700 series Backlight Driver" - depends on SA1100_JORNADA720_SSP && !PREEMPT + depends on SA1100_JORNADA720_SSP && !PREEMPTION default y help If you have an HP Jornada 700 series, Index: linux-5.4.5-rt3/drivers/xen/preempt.c =================================================================== --- linux-5.4.5-rt3.orig/drivers/xen/preempt.c +++ linux-5.4.5-rt3/drivers/xen/preempt.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ #include <linux/sched.h> #include <xen/xen-ops.h> -#ifndef CONFIG_PREEMPT +#ifndef CONFIG_PREEMPTION /* * Some hypercalls issued by the toolstack can take many 10s of @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:40 @ asmlinkage __visible void xen_maybe_pree __this_cpu_write(xen_in_preemptible_hcall, true); } } -#endif /* CONFIG_PREEMPT */ +#endif /* CONFIG_PREEMPTION */ Index: linux-5.4.5-rt3/fs/afs/dir_silly.c =================================================================== --- linux-5.4.5-rt3.orig/fs/afs/dir_silly.c +++ linux-5.4.5-rt3/fs/afs/dir_silly.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:205 @ int afs_silly_iput(struct dentry *dentry struct dentry *alias; int ret; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); _enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode); Index: linux-5.4.5-rt3/fs/btrfs/volumes.h =================================================================== --- linux-5.4.5-rt3.orig/fs/btrfs/volumes.h +++ linux-5.4.5-rt3/fs/btrfs/volumes.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:182 @ btrfs_device_set_##name(struct btrfs_dev write_seqcount_end(&dev->data_seqcount); \ preempt_enable(); \ } -#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) +#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION) #define 
BTRFS_DEVICE_GETSET_FUNCS(name) \ static inline u64 \ btrfs_device_get_##name(const struct btrfs_device *dev) \ Index: linux-5.4.5-rt3/fs/buffer.c =================================================================== --- linux-5.4.5-rt3.orig/fs/buffer.c +++ linux-5.4.5-rt3/fs/buffer.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:278 @ static void end_buffer_async_read(struct * decide that the page is now completely done. */ first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + spin_lock_irqsave(&first->b_uptodate_lock, flags); clear_buffer_async_read(bh); unlock_buffer(bh); tmp = bh; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:291 @ static void end_buffer_async_read(struct } tmp = tmp->b_this_page; } while (tmp != bh); - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); /* * If none of the buffers had errors and they are all @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:303 @ static void end_buffer_async_read(struct return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); return; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:331 @ void end_buffer_async_write(struct buffe } first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + spin_lock_irqsave(&first->b_uptodate_lock, flags); clear_buffer_async_write(bh); unlock_buffer(bh); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:343 @ void end_buffer_async_write(struct buffe } tmp = tmp->b_this_page; } - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); end_page_writeback(page); return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); return; } EXPORT_SYMBOL(end_buffer_async_write); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1390 @ static bool has_bh_in_lru(int cpu, void void invalidate_bh_lrus(void) { - on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1, GFP_KERNEL); + on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1); } EXPORT_SYMBOL_GPL(invalidate_bh_lrus); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3342 @ struct buffer_head *alloc_buffer_head(gf struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags); if (ret) { INIT_LIST_HEAD(&ret->b_assoc_buffers); + spin_lock_init(&ret->b_uptodate_lock); preempt_disable(); __this_cpu_inc(bh_accounting.nr); recalc_bh_state(); Index: linux-5.4.5-rt3/fs/cifs/readdir.c =================================================================== --- linux-5.4.5-rt3.orig/fs/cifs/readdir.c +++ linux-5.4.5-rt3/fs/cifs/readdir.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:83 @ cifs_prime_dcache(struct dentry *parent, struct inode *inode; struct super_block *sb = parent->d_sb; struct cifs_sb_info *cifs_sb = CIFS_SB(sb); - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); cifs_dbg(FYI, "%s: 
for %s\n", __func__, name->name); Index: linux-5.4.5-rt3/fs/dcache.c =================================================================== --- linux-5.4.5-rt3.orig/fs/dcache.c +++ linux-5.4.5-rt3/fs/dcache.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2485 @ EXPORT_SYMBOL(d_rehash); static inline unsigned start_dir_add(struct inode *dir) { + preempt_disable_rt(); for (;;) { - unsigned n = dir->i_dir_seq; - if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n) + unsigned n = dir->__i_dir_seq; + if (!(n & 1) && cmpxchg(&dir->__i_dir_seq, n, n + 1) == n) return n; cpu_relax(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2496 @ static inline unsigned start_dir_add(str static inline void end_dir_add(struct inode *dir, unsigned n) { - smp_store_release(&dir->i_dir_seq, n + 2); + smp_store_release(&dir->__i_dir_seq, n + 2); + preempt_enable_rt(); } static void d_wait_lookup(struct dentry *dentry) { - if (d_in_lookup(dentry)) { - DECLARE_WAITQUEUE(wait, current); - add_wait_queue(dentry->d_wait, &wait); - do { - set_current_state(TASK_UNINTERRUPTIBLE); - spin_unlock(&dentry->d_lock); - schedule(); - spin_lock(&dentry->d_lock); - } while (d_in_lookup(dentry)); - } + struct swait_queue __wait; + + if (!d_in_lookup(dentry)) + return; + + INIT_LIST_HEAD(&__wait.task_list); + do { + prepare_to_swait_exclusive(dentry->d_wait, &__wait, TASK_UNINTERRUPTIBLE); + spin_unlock(&dentry->d_lock); + schedule(); + spin_lock(&dentry->d_lock); + } while (d_in_lookup(dentry)); + finish_swait(dentry->d_wait, &__wait); } struct dentry *d_alloc_parallel(struct dentry *parent, const struct qstr *name, - wait_queue_head_t *wq) + struct swait_queue_head *wq) { unsigned int hash = name->hash; struct hlist_bl_head *b = in_lookup_hash(parent, hash); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2533 @ struct dentry *d_alloc_parallel(struct d retry: rcu_read_lock(); - seq = smp_load_acquire(&parent->d_inode->i_dir_seq); + seq = smp_load_acquire(&parent->d_inode->__i_dir_seq); r_seq = read_seqbegin(&rename_lock); dentry = __d_lookup_rcu(parent, name, &d_seq); if (unlikely(dentry)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2561 @ retry: } hlist_bl_lock(b); - if (unlikely(READ_ONCE(parent->d_inode->i_dir_seq) != seq)) { + if (unlikely(READ_ONCE(parent->d_inode->__i_dir_seq) != seq)) { hlist_bl_unlock(b); rcu_read_unlock(); goto retry; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2634 @ void __d_lookup_done(struct dentry *dent hlist_bl_lock(b); dentry->d_flags &= ~DCACHE_PAR_LOOKUP; __hlist_bl_del(&dentry->d_u.d_in_lookup_hash); - wake_up_all(dentry->d_wait); + swake_up_all(dentry->d_wait); dentry->d_wait = NULL; hlist_bl_unlock(b); INIT_HLIST_NODE(&dentry->d_u.d_alias); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3147 @ __setup("dhash_entries=", set_dhash_entr static void __init dcache_init_early(void) { + unsigned int loop; + /* If hashes are distributed across NUMA nodes, defer * hash allocation until vmalloc space is available. 
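The wait-queue users converted above (the USB gadget ep_io() waits, the AFS/CIFS on-stack queues handed to d_alloc_parallel(), and fs/dcache.c's d_wait_lookup()) all move from wait_queue_head_t to simple wait queues (swait). A swait head is guarded by a raw spinlock and supports no custom wake callbacks, so it can still be woken from contexts that must not take sleeping locks once PREEMPT_RT turns ordinary spinlocks into sleeping locks. A minimal sketch of the wait/wake pattern, using a hypothetical flag and queue that are not part of the patch:

#include <linux/swait.h>

/* Hypothetical example objects, not taken from the patch. */
static DECLARE_SWAIT_QUEUE_HEAD(demo_wq);
static bool demo_ready;

/* Waiter: the _exclusive variants queue the task as an exclusive waiter,
 * matching the swake_up_one()/swake_up_all() wake-ups used above. */
static void demo_wait_for_ready(void)
{
	swait_event_exclusive(demo_wq, READ_ONCE(demo_ready));
}

/* Waker: swait heads use a raw spinlock internally, so this stays usable
 * from contexts that cannot block on PREEMPT_RT. */
static void demo_mark_ready(void)
{
	WRITE_ONCE(demo_ready, true);
	swake_up_all(&demo_wq);
}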
*/ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3165 @ static void __init dcache_init_early(voi NULL, 0, 0); + + for (loop = 0; loop < (1U << d_hash_shift); loop++) + INIT_HLIST_BL_HEAD(dentry_hashtable + loop); + d_hash_shift = 32 - d_hash_shift; } static void __init dcache_init(void) { + unsigned int loop; /* * A constructor could be added for stable state like the lists, * but it is probably not worth it because of the cache nature @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3198 @ static void __init dcache_init(void) NULL, 0, 0); + + for (loop = 0; loop < (1U << d_hash_shift); loop++) + INIT_HLIST_BL_HEAD(dentry_hashtable + loop); + d_hash_shift = 32 - d_hash_shift; } Index: linux-5.4.5-rt3/fs/eventpoll.c =================================================================== --- linux-5.4.5-rt3.orig/fs/eventpoll.c +++ linux-5.4.5-rt3/fs/eventpoll.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:570 @ static int ep_poll_wakeup_proc(void *pri static void ep_poll_safewake(wait_queue_head_t *wq) { - int this_cpu = get_cpu(); + int this_cpu = get_cpu_light(); ep_call_nested(&poll_safewake_ncalls, ep_poll_wakeup_proc, NULL, wq, (void *) (long) this_cpu); - put_cpu(); + put_cpu_light(); } #else Index: linux-5.4.5-rt3/fs/ext4/page-io.c =================================================================== --- linux-5.4.5-rt3.orig/fs/ext4/page-io.c +++ linux-5.4.5-rt3/fs/ext4/page-io.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:90 @ static void ext4_finish_bio(struct bio * } bh = head = page_buffers(page); /* - * We check all buffers in the page under BH_Uptodate_Lock + * We check all buffers in the page under b_uptodate_lock * to avoid races with other end io clearing async_write flags */ - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &head->b_state); + spin_lock_irqsave(&head->b_uptodate_lock, flags); do { if (bh_offset(bh) < bio_start || bh_offset(bh) + bh->b_size > bio_end) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:105 @ static void ext4_finish_bio(struct bio * if (bio->bi_status) buffer_io_error(bh); } while ((bh = bh->b_this_page) != head); - bit_spin_unlock(BH_Uptodate_Lock, &head->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&head->b_uptodate_lock, flags); if (!under_io) { fscrypt_free_bounce_page(bounce_page); end_page_writeback(page); Index: linux-5.4.5-rt3/fs/fscache/cookie.c =================================================================== --- linux-5.4.5-rt3.orig/fs/fscache/cookie.c +++ linux-5.4.5-rt3/fs/fscache/cookie.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:961 @ inconsistent: return -ESTALE; } EXPORT_SYMBOL(__fscache_check_consistency); + +void __init fscache_cookie_init(void) +{ + int i; + + for (i = 0; i < (1 << fscache_cookie_hash_shift) - 1; i++) + INIT_HLIST_BL_HEAD(&fscache_cookie_hash[i]); +} Index: linux-5.4.5-rt3/fs/fscache/main.c =================================================================== --- linux-5.4.5-rt3.orig/fs/fscache/main.c +++ linux-5.4.5-rt3/fs/fscache/main.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:148 @ static int __init fscache_init(void) ret = -ENOMEM; goto error_cookie_jar; } + fscache_cookie_init(); fscache_root = kobject_create_and_add("fscache", 
kernel_kobj); if (!fscache_root) Index: linux-5.4.5-rt3/fs/fuse/readdir.c =================================================================== --- linux-5.4.5-rt3.orig/fs/fuse/readdir.c +++ linux-5.4.5-rt3/fs/fuse/readdir.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:161 @ static int fuse_direntplus_link(struct f struct inode *dir = d_inode(parent); struct fuse_conn *fc; struct inode *inode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); if (!o->nodeid) { /* Index: linux-5.4.5-rt3/fs/inode.c =================================================================== --- linux-5.4.5-rt3.orig/fs/inode.c +++ linux-5.4.5-rt3/fs/inode.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:159 @ int inode_init_always(struct super_block inode->i_bdev = NULL; inode->i_cdev = NULL; inode->i_link = NULL; - inode->i_dir_seq = 0; + inode->__i_dir_seq = 0; inode->i_rdev = 0; inode->dirtied_when = 0; Index: linux-5.4.5-rt3/fs/jbd2/commit.c =================================================================== --- linux-5.4.5-rt3.orig/fs/jbd2/commit.c +++ linux-5.4.5-rt3/fs/jbd2/commit.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:485 @ void jbd2_journal_commit_transaction(jou if (jh->b_committed_data) { struct buffer_head *bh = jh2bh(jh); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); jbd2_free(jh->b_committed_data, bh->b_size); jh->b_committed_data = NULL; - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); } jbd2_journal_refile_buffer(journal, jh); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:923 @ restart_loop: transaction_t *cp_transaction; struct buffer_head *bh; int try_to_free = 0; + bool drop_ref; jh = commit_transaction->t_forget; spin_unlock(&journal->j_list_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:933 @ restart_loop: * done with it. 
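The fs/buffer.c and fs/ext4/page-io.c hunks above retire the BH_Uptodate_Lock bit spinlock, which had to be taken under local_irq_save() and is always a busy-wait, in favour of a per-buffer_head spinlock_t b_uptodate_lock (initialised in alloc_buffer_head() above; the struct buffer_head change itself appears later in this patch). spin_lock_irqsave() on a real spinlock keeps the !RT behaviour while letting PREEMPT_RT substitute a sleeping lock. A sketch of the resulting end-I/O locking shape, with the per-page buffer walk elided and a hypothetical helper name:

#include <linux/buffer_head.h>
#include <linux/spinlock.h>

/* Sketch of the locking pattern only; the real buffer walk lives in
 * end_buffer_async_read()/end_buffer_async_write(). */
static void demo_end_io_locked(struct buffer_head *first, struct buffer_head *bh)
{
	unsigned long flags;

	spin_lock_irqsave(&first->b_uptodate_lock, flags);
	clear_buffer_async_read(bh);		/* async/uptodate state changes */
	unlock_buffer(bh);
	/* ... check the page's other buffers here ... */
	spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
}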
*/ get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); J_ASSERT_JH(jh, jh->b_transaction == commit_transaction); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1028 @ restart_loop: try_to_free = 1; } JBUFFER_TRACE(jh, "refile or unfile buffer"); - __jbd2_journal_refile_buffer(jh); - jbd_unlock_bh_state(bh); + drop_ref = __jbd2_journal_refile_buffer(jh); + spin_unlock(&jh->b_state_lock); + if (drop_ref) + jbd2_journal_put_journal_head(jh); if (try_to_free) release_buffer_page(bh); /* Drops bh reference */ else Index: linux-5.4.5-rt3/fs/jbd2/journal.c =================================================================== --- linux-5.4.5-rt3.orig/fs/jbd2/journal.c +++ linux-5.4.5-rt3/fs/jbd2/journal.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:366 @ int jbd2_journal_write_metadata_buffer(t /* keep subsequent assertions sane */ atomic_set(&new_bh->b_count, 1); - jbd_lock_bh_state(bh_in); + spin_lock(&jh_in->b_state_lock); repeat: /* * If a new transaction has already done a buffer copy-out, then @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:408 @ repeat: if (need_copy_out && !done_copy_out) { char *tmp; - jbd_unlock_bh_state(bh_in); + spin_unlock(&jh_in->b_state_lock); tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS); if (!tmp) { brelse(new_bh); return -ENOMEM; } - jbd_lock_bh_state(bh_in); + spin_lock(&jh_in->b_state_lock); if (jh_in->b_frozen_data) { jbd2_free(tmp, bh_in->b_size); goto repeat; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:467 @ repeat: __jbd2_journal_file_buffer(jh_in, transaction, BJ_Shadow); spin_unlock(&journal->j_list_lock); set_buffer_shadow(bh_in); - jbd_unlock_bh_state(bh_in); + spin_unlock(&jh_in->b_state_lock); return do_escape | (done_copy_out << 1); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2413 @ static struct journal_head *journal_allo ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS | __GFP_NOFAIL); } + if (ret) + spin_lock_init(&ret->b_state_lock); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2534 @ static void __journal_remove_journal_hea J_ASSERT_BH(bh, buffer_jbd(bh)); J_ASSERT_BH(bh, jh2bh(jh) == bh); BUFFER_TRACE(bh, "remove journal_head"); + + /* Unlink before dropping the lock */ + bh->b_private = NULL; + jh->b_bh = NULL; /* debug, really */ + clear_buffer_jbd(bh); +} + +static void journal_release_journal_head(struct journal_head *jh, size_t b_size) +{ if (jh->b_frozen_data) { printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__); - jbd2_free(jh->b_frozen_data, bh->b_size); + jbd2_free(jh->b_frozen_data, b_size); } if (jh->b_committed_data) { printk(KERN_WARNING "%s: freeing b_committed_data\n", __func__); - jbd2_free(jh->b_committed_data, bh->b_size); + jbd2_free(jh->b_committed_data, b_size); } - bh->b_private = NULL; - jh->b_bh = NULL; /* debug, really */ - clear_buffer_jbd(bh); journal_free_journal_head(jh); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2568 @ void jbd2_journal_put_journal_head(struc if (!jh->b_jcount) { __journal_remove_journal_head(bh); jbd_unlock_bh_journal_head(bh); + journal_release_journal_head(jh, bh->b_size); __brelse(bh); - } else + } else { jbd_unlock_bh_journal_head(bh); + } } /* Index: linux-5.4.5-rt3/fs/jbd2/transaction.c 
=================================================================== --- linux-5.4.5-rt3.orig/fs/jbd2/transaction.c +++ linux-5.4.5-rt3/fs/jbd2/transaction.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:882 @ repeat: start_lock = jiffies; lock_buffer(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); /* If it takes too long to lock the buffer, trace it */ time_lock = jbd2_time_diff(start_lock, jiffies); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:932 @ repeat: error = -EROFS; if (is_handle_aborted(handle)) { - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); goto out; } error = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:996 @ repeat: */ if (buffer_shadow(bh)) { JBUFFER_TRACE(jh, "on shadow: sleep"); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE); goto repeat; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1017 @ repeat: JBUFFER_TRACE(jh, "generate frozen data"); if (!frozen_buffer) { JBUFFER_TRACE(jh, "allocate memory for buffer"); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); frozen_buffer = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS | __GFP_NOFAIL); goto repeat; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1036 @ attach_next: jh->b_next_transaction = transaction; done: - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); /* * If we are about to journal a buffer, then any revoke pending on it is @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1175 @ int jbd2_journal_get_create_access(handl * that case: the transaction must have deleted the buffer for it to be * reused here. */ - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); J_ASSERT_JH(jh, (jh->b_transaction == transaction || jh->b_transaction == NULL || (jh->b_transaction == journal->j_committing_transaction && @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1210 @ int jbd2_journal_get_create_access(handl jh->b_next_transaction = transaction; spin_unlock(&journal->j_list_lock); } - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); /* * akpm: I added this. ext3_alloc_branch can pick up new indirect @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1278 @ repeat: committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS|__GFP_NOFAIL); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); if (!jh->b_committed_data) { /* Copy out the current buffer contents into the * preserved, committed copy. 
*/ JBUFFER_TRACE(jh, "generate b_committed data"); if (!committed_data) { - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); goto repeat; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1292 @ repeat: committed_data = NULL; memcpy(jh->b_committed_data, bh->b_data, bh->b_size); } - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); out: jbd2_journal_put_journal_head(jh); if (unlikely(committed_data)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1393 @ int jbd2_journal_dirty_metadata(handle_t */ if (jh->b_transaction != transaction && jh->b_next_transaction != transaction) { - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); J_ASSERT_JH(jh, jh->b_transaction == transaction || jh->b_next_transaction == transaction); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); } if (jh->b_modified == 1) { /* If it's in our transaction it must be in BJ_Metadata list. */ if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) { - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) pr_err("JBD2: assertion failure: h_type=%u " @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1412 @ int jbd2_journal_dirty_metadata(handle_t jh->b_jlist); J_ASSERT_JH(jh, jh->b_transaction != transaction || jh->b_jlist == BJ_Metadata); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); } goto out; } journal = transaction->t_journal; - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); if (jh->b_modified == 0) { /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1504 @ int jbd2_journal_dirty_metadata(handle_t __jbd2_journal_file_buffer(jh, transaction, BJ_Metadata); spin_unlock(&journal->j_list_lock); out_unlock_bh: - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); out: JBUFFER_TRACE(jh, "exit"); return ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1542 @ int jbd2_journal_forget (handle_t *handl BUFFER_TRACE(bh, "entry"); - jbd_lock_bh_state(bh); + jh = jbd2_journal_grab_journal_head(bh); + if (!jh) { + __bforget(bh); + return 0; + } - if (!buffer_jbd(bh)) - goto not_jbd; - jh = bh2jh(bh); + spin_lock(&jh->b_state_lock); /* Critical error: attempting to delete a bitmap buffer, maybe? * Don't do any jbd operations, and return an error. 
*/ if (!J_EXPECT_JH(jh, !jh->b_committed_data, "inconsistent data on disk")) { err = -EIO; - goto not_jbd; + goto drop; } /* keep track of whether or not this transaction modified us */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1603 @ int jbd2_journal_forget (handle_t *handl __jbd2_journal_file_buffer(jh, transaction, BJ_Forget); } else { __jbd2_journal_unfile_buffer(jh); - if (!buffer_jbd(bh)) { - spin_unlock(&journal->j_list_lock); - goto not_jbd; - } + jbd2_journal_put_journal_head(jh); } spin_unlock(&journal->j_list_lock); } else if (jh->b_transaction) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1645 @ int jbd2_journal_forget (handle_t *handl if (!jh->b_cp_transaction) { JBUFFER_TRACE(jh, "belongs to none transaction"); spin_unlock(&journal->j_list_lock); - goto not_jbd; + goto drop; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1655 @ int jbd2_journal_forget (handle_t *handl if (!buffer_dirty(bh)) { __jbd2_journal_remove_checkpoint(jh); spin_unlock(&journal->j_list_lock); - goto not_jbd; + goto drop; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1668 @ int jbd2_journal_forget (handle_t *handl __jbd2_journal_file_buffer(jh, transaction, BJ_Forget); spin_unlock(&journal->j_list_lock); } - - jbd_unlock_bh_state(bh); - __brelse(bh); drop: + __brelse(bh); + spin_unlock(&jh->b_state_lock); + jbd2_journal_put_journal_head(jh); if (drop_reserve) { /* no need to reserve log space for this block -bzzz */ handle->h_buffer_credits++; } return err; - -not_jbd: - jbd_unlock_bh_state(bh); - __bforget(bh); - goto drop; } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1875 @ free_and_exit: * * j_list_lock is held. * - * jbd_lock_bh_state(jh2bh(jh)) is held. + * jh->b_state_lock is held. */ static inline void @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1899 @ __blist_add_buffer(struct journal_head * * * Called with j_list_lock held, and the journal may not be locked. * - * jbd_lock_bh_state(jh2bh(jh)) is held. + * jh->b_state_lock is held. */ static inline void @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1931 @ static void __jbd2_journal_temp_unlink_b transaction_t *transaction; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + lockdep_assert_held(&jh->b_state_lock); transaction = jh->b_transaction; if (transaction) assert_spin_locked(&transaction->t_journal->j_list_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1968 @ static void __jbd2_journal_temp_unlink_b } /* - * Remove buffer from all transactions. + * Remove buffer from all transactions. The caller is responsible for dropping + * the jh reference that belonged to the transaction. * * Called with bh_state lock and j_list_lock - * - * jh and bh may be already freed when this function returns. 
*/ static void __jbd2_journal_unfile_buffer(struct journal_head *jh) { __jbd2_journal_temp_unlink_buffer(jh); jh->b_transaction = NULL; - jbd2_journal_put_journal_head(jh); } void jbd2_journal_unfile_buffer(journal_t *journal, struct journal_head *jh) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1985 @ void jbd2_journal_unfile_buffer(journal_ /* Get reference so that buffer cannot be freed before we unlock it */ get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); spin_lock(&journal->j_list_lock); __jbd2_journal_unfile_buffer(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); + jbd2_journal_put_journal_head(jh); __brelse(bh); } /* * Called from jbd2_journal_try_to_free_buffers(). * - * Called under jbd_lock_bh_state(bh) + * Called under jh->b_state_lock */ static void __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2084 @ int jbd2_journal_try_to_free_buffers(jou if (!jh) continue; - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); __journal_try_to_free_buffer(journal, bh); + spin_unlock(&jh->b_state_lock); jbd2_journal_put_journal_head(jh); - jbd_unlock_bh_state(bh); if (buffer_jbd(bh)) goto busy; } while ((bh = bh->b_this_page) != head); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2108 @ busy: * * Called under j_list_lock. * - * Called under jbd_lock_bh_state(bh). + * Called under jh->b_state_lock. */ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2129 @ static int __dispose_buffer(struct journ } else { JBUFFER_TRACE(jh, "on running transaction"); __jbd2_journal_unfile_buffer(jh); + jbd2_journal_put_journal_head(jh); } return may_free; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2196 @ static int journal_unmap_buffer(journal_ * holding the page lock. --sct */ - if (!buffer_jbd(bh)) + jh = jbd2_journal_grab_journal_head(bh); + if (!jh) goto zap_buffer_unlocked; /* OK, we have data buffer in journaled mode */ write_lock(&journal->j_state_lock); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); spin_lock(&journal->j_list_lock); - jh = jbd2_journal_grab_journal_head(bh); - if (!jh) - goto zap_buffer_no_jh; - /* * We cannot remove the buffer from checkpoint lists until the * transaction adding inode to orphan list (let's call it T) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2283 @ static int journal_unmap_buffer(journal_ * for commit and try again. 
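Across jbd2 the per-buffer state lock moves from jbd_lock_bh_state(), a bit spinlock embedded in bh->b_state, to a spinlock_t b_state_lock carried in the journal_head (initialised in journal_alloc_journal_head() above), again so that PREEMPT_RT can use a sleeping lock. A consequence is that the journal head must not be freed while that lock is held, so __jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() no longer drop the jh reference themselves; the caller drops it after unlocking, as the commit-path hunk above already does. Roughly the caller-side ordering, assuming the patched jbd2 interfaces (not a drop-in helper):

#include <linux/jbd2.h>

/* Sketch of the refile, unlock, then drop-the-reference ordering. */
static void demo_refile_and_put(journal_t *journal, struct journal_head *jh)
{
	bool drop;

	spin_lock(&jh->b_state_lock);
	spin_lock(&journal->j_list_lock);
	drop = __jbd2_journal_refile_buffer(jh);	/* true: nothing left to refile to */
	spin_unlock(&journal->j_list_lock);
	spin_unlock(&jh->b_state_lock);
	if (drop)
		jbd2_journal_put_journal_head(jh);	/* safe only after unlocking */
}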
*/ if (partial_page) { - jbd2_journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); write_unlock(&journal->j_state_lock); + jbd2_journal_put_journal_head(jh); return -EBUSY; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2298 @ static int journal_unmap_buffer(journal_ set_buffer_freed(bh); if (journal->j_running_transaction && buffer_jbddirty(bh)) jh->b_next_transaction = journal->j_running_transaction; - jbd2_journal_put_journal_head(jh); spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); write_unlock(&journal->j_state_lock); + jbd2_journal_put_journal_head(jh); return 0; } else { /* Good, the buffer belongs to the running transaction. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2325 @ zap_buffer: * here. */ jh->b_modified = 0; - jbd2_journal_put_journal_head(jh); -zap_buffer_no_jh: spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); + spin_unlock(&jh->b_state_lock); write_unlock(&journal->j_state_lock); + jbd2_journal_put_journal_head(jh); zap_buffer_unlocked: clear_buffer_dirty(bh); J_ASSERT_BH(bh, !buffer_jbddirty(bh)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2415 @ void __jbd2_journal_file_buffer(struct j int was_dirty = 0; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + lockdep_assert_held(&jh->b_state_lock); assert_spin_locked(&transaction->t_journal->j_list_lock); J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2477 @ void __jbd2_journal_file_buffer(struct j void jbd2_journal_file_buffer(struct journal_head *jh, transaction_t *transaction, int jlist) { - jbd_lock_bh_state(jh2bh(jh)); + spin_lock(&jh->b_state_lock); spin_lock(&transaction->t_journal->j_list_lock); __jbd2_journal_file_buffer(jh, transaction, jlist); spin_unlock(&transaction->t_journal->j_list_lock); - jbd_unlock_bh_state(jh2bh(jh)); + spin_unlock(&jh->b_state_lock); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2491 @ void jbd2_journal_file_buffer(struct jou * buffer on that transaction's metadata list. * * Called under j_list_lock - * Called under jbd_lock_bh_state(jh2bh(jh)) + * Called under jh->b_state_lock * - * jh and bh may be already free when this function returns + * When this function returns true, there's no next transaction to refile to + * and the caller has to drop jh reference through + * jbd2_journal_put_journal_head(). */ -void __jbd2_journal_refile_buffer(struct journal_head *jh) +bool __jbd2_journal_refile_buffer(struct journal_head *jh) { int was_dirty, jlist; struct buffer_head *bh = jh2bh(jh); - J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh)); + lockdep_assert_held(&jh->b_state_lock); if (jh->b_transaction) assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock); /* If the buffer is now unused, just drop it. 
*/ if (jh->b_next_transaction == NULL) { __jbd2_journal_unfile_buffer(jh); - return; + return true; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2537 @ void __jbd2_journal_refile_buffer(struct if (was_dirty) set_buffer_jbddirty(bh); + return false; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2548 @ void __jbd2_journal_refile_buffer(struct */ void jbd2_journal_refile_buffer(journal_t *journal, struct journal_head *jh) { - struct buffer_head *bh = jh2bh(jh); + bool drop; - /* Get reference so that buffer cannot be freed before we unlock it */ - get_bh(bh); - jbd_lock_bh_state(bh); + spin_lock(&jh->b_state_lock); spin_lock(&journal->j_list_lock); - __jbd2_journal_refile_buffer(jh); - jbd_unlock_bh_state(bh); + drop = __jbd2_journal_refile_buffer(jh); + spin_unlock(&jh->b_state_lock); spin_unlock(&journal->j_list_lock); - __brelse(bh); + if (drop) + jbd2_journal_put_journal_head(jh); } /* Index: linux-5.4.5-rt3/fs/namei.c =================================================================== --- linux-5.4.5-rt3.orig/fs/namei.c +++ linux-5.4.5-rt3/fs/namei.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1641 @ static struct dentry *__lookup_slow(cons { struct dentry *dentry, *old; struct inode *inode = dir->d_inode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); /* Don't go there if it's already dead */ if (unlikely(IS_DEADDIR(inode))) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3129 @ static int lookup_open(struct nameidata struct dentry *dentry; int error, create_error = 0; umode_t mode = op->mode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); if (unlikely(IS_DEADDIR(dir_inode))) return -ENOENT; Index: linux-5.4.5-rt3/fs/namespace.c =================================================================== --- linux-5.4.5-rt3.orig/fs/namespace.c +++ linux-5.4.5-rt3/fs/namespace.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:17 @ #include <linux/mnt_namespace.h> #include <linux/user_namespace.h> #include <linux/namei.h> +#include <linux/delay.h> #include <linux/security.h> #include <linux/cred.h> #include <linux/idr.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:325 @ int __mnt_want_write(struct vfsmount *m) * incremented count after it has set MNT_WRITE_HOLD. */ smp_mb(); - while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) - cpu_relax(); + while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) { + preempt_enable(); + cpu_chill(); + preempt_disable(); + } /* * After the slowpath clears MNT_WRITE_HOLD, mnt_is_readonly will * be set to match its requirements. 
So we must not load that until Index: linux-5.4.5-rt3/fs/nfs/delegation.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/delegation.c +++ linux-5.4.5-rt3/fs/nfs/delegation.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:165 @ again: sp = state->owner; /* Block nfs4_proc_unlck */ mutex_lock(&sp->so_delegreturn_mutex); - seq = raw_seqcount_begin(&sp->so_reclaim_seqcount); + seq = read_seqbegin(&sp->so_reclaim_seqlock); err = nfs4_open_delegation_recall(ctx, state, stateid); if (!err) err = nfs_delegation_claim_locks(state, stateid); - if (!err && read_seqcount_retry(&sp->so_reclaim_seqcount, seq)) + if (!err && read_seqretry(&sp->so_reclaim_seqlock, seq)) err = -EAGAIN; mutex_unlock(&sp->so_delegreturn_mutex); put_nfs_open_context(ctx); Index: linux-5.4.5-rt3/fs/nfs/dir.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/dir.c +++ linux-5.4.5-rt3/fs/nfs/dir.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:448 @ static void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry) { struct qstr filename = QSTR_INIT(entry->name, entry->len); - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); struct dentry *dentry; struct dentry *alias; struct inode *dir = d_inode(parent); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1499 @ int nfs_atomic_open(struct inode *dir, s struct file *file, unsigned open_flags, umode_t mode) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); struct nfs_open_context *ctx; struct dentry *res; struct iattr attr = { .ia_valid = ATTR_OPEN }; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1828 @ int nfs_rmdir(struct inode *dir, struct trace_nfs_rmdir_enter(dir, dentry); if (d_really_is_positive(dentry)) { +#ifdef CONFIG_PREEMPT_RT + down(&NFS_I(d_inode(dentry))->rmdir_sem); +#else down_write(&NFS_I(d_inode(dentry))->rmdir_sem); +#endif error = NFS_PROTO(dir)->rmdir(dir, &dentry->d_name); /* Ensure the VFS deletes this inode */ switch (error) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1842 @ int nfs_rmdir(struct inode *dir, struct case -ENOENT: nfs_dentry_handle_enoent(dentry); } +#ifdef CONFIG_PREEMPT_RT + up(&NFS_I(d_inode(dentry))->rmdir_sem); +#else up_write(&NFS_I(d_inode(dentry))->rmdir_sem); +#endif } else error = NFS_PROTO(dir)->rmdir(dir, &dentry->d_name); trace_nfs_rmdir_exit(dir, dentry, error); Index: linux-5.4.5-rt3/fs/nfs/inode.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/inode.c +++ linux-5.4.5-rt3/fs/nfs/inode.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2108 @ static void init_once(void *foo) atomic_long_set(&nfsi->nrequests, 0); atomic_long_set(&nfsi->commit_info.ncommit, 0); atomic_set(&nfsi->commit_info.rpcs_out, 0); +#ifdef CONFIG_PREEMPT_RT + sema_init(&nfsi->rmdir_sem, 1); +#else init_rwsem(&nfsi->rmdir_sem); +#endif mutex_init(&nfsi->commit_mutex); nfs4_init_once(nfsi); } Index: linux-5.4.5-rt3/fs/nfs/nfs4_fs.h =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/nfs4_fs.h +++ linux-5.4.5-rt3/fs/nfs/nfs4_fs.h @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:118 @ struct nfs4_state_owner { unsigned long so_flags; struct list_head so_states; struct nfs_seqid_counter so_seqid; - seqcount_t so_reclaim_seqcount; + seqlock_t so_reclaim_seqlock; struct mutex so_delegreturn_mutex; }; Index: linux-5.4.5-rt3/fs/nfs/nfs4proc.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/nfs4proc.c +++ linux-5.4.5-rt3/fs/nfs/nfs4proc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2968 @ static int _nfs4_open_and_get_state(stru unsigned int seq; int ret; - seq = raw_seqcount_begin(&sp->so_reclaim_seqcount); + seq = raw_seqcount_begin(&sp->so_reclaim_seqlock.seqcount); ret = _nfs4_proc_open(opendata, ctx); if (ret != 0) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3010 @ static int _nfs4_open_and_get_state(stru if (d_inode(dentry) == state->inode) { nfs_inode_attach_open_context(ctx); - if (read_seqcount_retry(&sp->so_reclaim_seqcount, seq)) + if (read_seqretry(&sp->so_reclaim_seqlock, seq)) nfs4_schedule_stateid_recovery(server, state); } Index: linux-5.4.5-rt3/fs/nfs/nfs4state.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/nfs4state.c +++ linux-5.4.5-rt3/fs/nfs/nfs4state.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:513 @ nfs4_alloc_state_owner(struct nfs_server nfs4_init_seqid_counter(&sp->so_seqid); atomic_set(&sp->so_count, 1); INIT_LIST_HEAD(&sp->so_lru); - seqcount_init(&sp->so_reclaim_seqcount); + seqlock_init(&sp->so_reclaim_seqlock); mutex_init(&sp->so_delegreturn_mutex); return sp; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1621 @ static int nfs4_reclaim_open_state(struc * recovering after a network partition or a reboot from a * server that doesn't support a grace period. 
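Here the NFSv4 state owner's so_reclaim_seqcount becomes a full seqlock_t. A bare seqcount assumes the writer cannot be preempted mid-update, otherwise readers may spin on an odd sequence indefinitely; that assumption breaks once the surrounding locks become preemptible on PREEMPT_RT, so the RT side of the #ifdef just below takes the seqlock's internal lock with write_seqlock(), while !RT keeps the lockless write on the embedded seqcount. The readers in delegation.c and nfs4proc.c above switch to read_seqbegin()/read_seqretry() either way; userfaultfd's refile_seq and dma_resv's seq get the same treatment later in this patch. A generic sketch of the seqlock_t pattern, guarding a hypothetical value:

#include <linux/seqlock.h>

/* Hypothetical data; only the locking pattern matters here. */
static DEFINE_SEQLOCK(demo_seqlock);
static u64 demo_value;

static void demo_write(u64 v)
{
	write_seqlock(&demo_seqlock);	/* serialises writers via the embedded spinlock */
	demo_value = v;
	write_sequnlock(&demo_seqlock);
}

static u64 demo_read(void)
{
	unsigned int seq;
	u64 v;

	do {
		seq = read_seqbegin(&demo_seqlock);
		v = demo_value;
	} while (read_seqretry(&demo_seqlock, seq));

	return v;
}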
*/ +#ifdef CONFIG_PREEMPT_RT + write_seqlock(&sp->so_reclaim_seqlock); +#else + write_seqcount_begin(&sp->so_reclaim_seqlock.seqcount); +#endif spin_lock(&sp->so_lock); - raw_write_seqcount_begin(&sp->so_reclaim_seqcount); restart: list_for_each_entry(state, &sp->so_states, open_states) { if (!test_and_clear_bit(ops->state_flag_bit, &state->flags)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1687 @ restart: spin_lock(&sp->so_lock); goto restart; } - raw_write_seqcount_end(&sp->so_reclaim_seqcount); spin_unlock(&sp->so_lock); +#ifdef CONFIG_PREEMPT_RT + write_sequnlock(&sp->so_reclaim_seqlock); +#else + write_seqcount_end(&sp->so_reclaim_seqlock.seqcount); +#endif return 0; out_err: nfs4_put_open_state(state); - spin_lock(&sp->so_lock); - raw_write_seqcount_end(&sp->so_reclaim_seqcount); - spin_unlock(&sp->so_lock); +#ifdef CONFIG_PREEMPT_RT + write_sequnlock(&sp->so_reclaim_seqlock); +#else + write_seqcount_end(&sp->so_reclaim_seqlock.seqcount); +#endif return status; } Index: linux-5.4.5-rt3/fs/nfs/unlink.c =================================================================== --- linux-5.4.5-rt3.orig/fs/nfs/unlink.c +++ linux-5.4.5-rt3/fs/nfs/unlink.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:16 @ #include <linux/sunrpc/clnt.h> #include <linux/nfs_fs.h> #include <linux/sched.h> -#include <linux/wait.h> +#include <linux/swait.h> #include <linux/namei.h> #include <linux/fsnotify.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:56 @ static void nfs_async_unlink_done(struct rpc_restart_call_prepare(task); } +#ifdef CONFIG_PREEMPT_RT +static void nfs_down_anon(struct semaphore *sema) +{ + down(sema); +} + +static void nfs_up_anon(struct semaphore *sema) +{ + up(sema); +} + +#else +static void nfs_down_anon(struct rw_semaphore *rwsem) +{ + down_read_non_owner(rwsem); +} + +static void nfs_up_anon(struct rw_semaphore *rwsem) +{ + up_read_non_owner(rwsem); +} +#endif + /** * nfs_async_unlink_release - Release the sillydelete data. * @calldata: struct nfs_unlinkdata to release @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:92 @ static void nfs_async_unlink_release(voi struct dentry *dentry = data->dentry; struct super_block *sb = dentry->d_sb; - up_read_non_owner(&NFS_I(d_inode(dentry->d_parent))->rmdir_sem); + nfs_up_anon(&NFS_I(d_inode(dentry->d_parent))->rmdir_sem); d_lookup_done(dentry); nfs_free_unlinkdata(data); dput(dentry); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:145 @ static int nfs_call_unlink(struct dentry struct inode *dir = d_inode(dentry->d_parent); struct dentry *alias; - down_read_non_owner(&NFS_I(dir)->rmdir_sem); + nfs_down_anon(&NFS_I(dir)->rmdir_sem); alias = d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq); if (IS_ERR(alias)) { - up_read_non_owner(&NFS_I(dir)->rmdir_sem); + nfs_up_anon(&NFS_I(dir)->rmdir_sem); return 0; } if (!d_in_lookup(alias)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:170 @ static int nfs_call_unlink(struct dentry ret = 0; spin_unlock(&alias->d_lock); dput(alias); - up_read_non_owner(&NFS_I(dir)->rmdir_sem); + nfs_up_anon(&NFS_I(dir)->rmdir_sem); /* * If we'd displaced old cached devname, free it. 
At that * point dentry is definitely not a root, so we won't need @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:206 @ nfs_async_unlink(struct dentry *dentry, data->cred = get_current_cred(); data->res.dir_attr = &data->dir_attr; - init_waitqueue_head(&data->wq); + init_swait_queue_head(&data->wq); status = -EBUSY; spin_lock(&dentry->d_lock); Index: linux-5.4.5-rt3/fs/ntfs/aops.c =================================================================== --- linux-5.4.5-rt3.orig/fs/ntfs/aops.c +++ linux-5.4.5-rt3/fs/ntfs/aops.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:95 @ static void ntfs_end_buffer_async_read(s "0x%llx.", (unsigned long long)bh->b_blocknr); } first = page_buffers(page); - local_irq_save(flags); - bit_spin_lock(BH_Uptodate_Lock, &first->b_state); + spin_lock_irqsave(&first->b_uptodate_lock, flags); clear_buffer_async_read(bh); unlock_buffer(bh); tmp = bh; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:110 @ static void ntfs_end_buffer_async_read(s } tmp = tmp->b_this_page; } while (tmp != bh); - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); /* * If none of the buffers had errors then we can set the page uptodate, * but we first have to perform the post read mst fixups, if the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:143 @ static void ntfs_end_buffer_async_read(s unlock_page(page); return; still_busy: - bit_spin_unlock(BH_Uptodate_Lock, &first->b_state); - local_irq_restore(flags); + spin_unlock_irqrestore(&first->b_uptodate_lock, flags); return; } Index: linux-5.4.5-rt3/fs/ocfs2/suballoc.c =================================================================== --- linux-5.4.5-rt3.orig/fs/ocfs2/suballoc.c +++ linux-5.4.5-rt3/fs/ocfs2/suballoc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1255 @ static int ocfs2_test_bg_bit_allocatable int nr) { struct ocfs2_group_desc *bg = (struct ocfs2_group_desc *) bg_bh->b_data; + struct journal_head *jh; int ret; if (ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1264 @ static int ocfs2_test_bg_bit_allocatable if (!buffer_jbd(bg_bh)) return 1; - jbd_lock_bh_state(bg_bh); - bg = (struct ocfs2_group_desc *) bh2jh(bg_bh)->b_committed_data; + jh = bh2jh(bg_bh); + spin_lock(&jh->b_state_lock); + bg = (struct ocfs2_group_desc *) jh->b_committed_data; if (bg) ret = !ocfs2_test_bit(nr, (unsigned long *)bg->bg_bitmap); else ret = 1; - jbd_unlock_bh_state(bg_bh); + spin_unlock(&jh->b_state_lock); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2392 @ static int ocfs2_block_group_clear_bits( int status; unsigned int tmp; struct ocfs2_group_desc *undo_bg = NULL; + struct journal_head *jh; /* The caller got this descriptor from * ocfs2_read_group_descriptor(). Any corruption is a code bug. 
*/ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2411 @ static int ocfs2_block_group_clear_bits( goto bail; } + jh = bh2jh(group_bh); if (undo_fn) { - jbd_lock_bh_state(group_bh); - undo_bg = (struct ocfs2_group_desc *) - bh2jh(group_bh)->b_committed_data; + spin_lock(&jh->b_state_lock); + undo_bg = (struct ocfs2_group_desc *) jh->b_committed_data; BUG_ON(!undo_bg); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2429 @ static int ocfs2_block_group_clear_bits( le16_add_cpu(&bg->bg_free_bits_count, num_bits); if (le16_to_cpu(bg->bg_free_bits_count) > le16_to_cpu(bg->bg_bits)) { if (undo_fn) - jbd_unlock_bh_state(group_bh); + spin_unlock(&jh->b_state_lock); return ocfs2_error(alloc_inode->i_sb, "Group descriptor # %llu has bit count %u but claims %u are freed. num_bits %d\n", (unsigned long long)le64_to_cpu(bg->bg_blkno), le16_to_cpu(bg->bg_bits), @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2438 @ static int ocfs2_block_group_clear_bits( } if (undo_fn) - jbd_unlock_bh_state(group_bh); + spin_unlock(&jh->b_state_lock); ocfs2_journal_dirty(handle, group_bh); bail: Index: linux-5.4.5-rt3/fs/proc/base.c =================================================================== --- linux-5.4.5-rt3.orig/fs/proc/base.c +++ linux-5.4.5-rt3/fs/proc/base.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1894 @ bool proc_fill_cache(struct file *file, child = d_hash_and_lookup(dir, &qname); if (!child) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); child = d_alloc_parallel(dir, &qname, &wq); if (IS_ERR(child)) goto end_instantiate; Index: linux-5.4.5-rt3/fs/proc/kmsg.c =================================================================== --- linux-5.4.5-rt3.orig/fs/proc/kmsg.c +++ linux-5.4.5-rt3/fs/proc/kmsg.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #include <linux/uaccess.h> #include <asm/io.h> -extern wait_queue_head_t log_wait; - static int kmsg_open(struct inode * inode, struct file * file) { return do_syslog(SYSLOG_ACTION_OPEN, NULL, 0, SYSLOG_FROM_PROC); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:43 @ static ssize_t kmsg_read(struct file *fi static __poll_t kmsg_poll(struct file *file, poll_table *wait) { - poll_wait(file, &log_wait, wait); + poll_wait(file, printk_wait_queue(), wait); if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_PROC)) return EPOLLIN | EPOLLRDNORM; return 0; Index: linux-5.4.5-rt3/fs/proc/proc_sysctl.c =================================================================== --- linux-5.4.5-rt3.orig/fs/proc/proc_sysctl.c +++ linux-5.4.5-rt3/fs/proc/proc_sysctl.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:705 @ static bool proc_sys_fill_cache(struct f child = d_lookup(dir, &qname); if (!child) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + DECLARE_SWAIT_QUEUE_HEAD_ONSTACK(wq); child = d_alloc_parallel(dir, &qname, &wq); if (IS_ERR(child)) return false; Index: linux-5.4.5-rt3/fs/squashfs/decompressor_multi_percpu.c =================================================================== --- linux-5.4.5-rt3.orig/fs/squashfs/decompressor_multi_percpu.c +++ linux-5.4.5-rt3/fs/squashfs/decompressor_multi_percpu.c @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ #include <linux/slab.h> #include <linux/percpu.h> #include <linux/buffer_head.h> +#include <linux/locallock.h> #include "squashfs_fs.h" #include "squashfs_fs_sb.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:27 @ struct squashfs_stream { void *stream; }; +static DEFINE_LOCAL_IRQ_LOCK(stream_lock); + void *squashfs_decompressor_create(struct squashfs_sb_info *msblk, void *comp_opts) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:83 @ int squashfs_decompress(struct squashfs_ { struct squashfs_stream __percpu *percpu = (struct squashfs_stream __percpu *) msblk->stream; - struct squashfs_stream *stream = get_cpu_ptr(percpu); - int res = msblk->decompressor->decompress(msblk, stream->stream, bh, b, - offset, length, output); - put_cpu_ptr(stream); + struct squashfs_stream *stream; + int res; + + stream = get_locked_ptr(stream_lock, percpu); + + res = msblk->decompressor->decompress(msblk, stream->stream, bh, b, + offset, length, output); + + put_locked_ptr(stream_lock, stream); if (res < 0) ERROR("%s decompression failed, data probably corrupt\n", Index: linux-5.4.5-rt3/fs/stack.c =================================================================== --- linux-5.4.5-rt3.orig/fs/stack.c +++ linux-5.4.5-rt3/fs/stack.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:26 @ void fsstack_copy_inode_size(struct inod /* * But on 32-bit, we ought to make an effort to keep the two halves of - * i_blocks in sync despite SMP or PREEMPT - though stat's + * i_blocks in sync despite SMP or PREEMPTION - though stat's * generic_fillattr() doesn't bother, and we won't be applying quotas * (where i_blocks does become important) at the upper level. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:41 @ void fsstack_copy_inode_size(struct inod spin_unlock(&src->i_lock); /* - * If CONFIG_SMP or CONFIG_PREEMPT on 32-bit, it's vital for + * If CONFIG_SMP or CONFIG_PREEMPTION on 32-bit, it's vital for * fsstack_copy_inode_size() to hold some lock around * i_size_write(), otherwise i_size_read() may spin forever (see * include/linux/fs.h). We don't necessarily hold i_mutex when this * is called, so take i_lock for that case. * * And if on 32-bit, continue our effort to keep the two halves of - * i_blocks in sync despite SMP or PREEMPT: use i_lock for that case + * i_blocks in sync despite SMP or PREEMPTION: use i_lock for that case * too, and do both at once by combining the tests. * * There is none of this locking overhead in the 64-bit case. Index: linux-5.4.5-rt3/fs/userfaultfd.c =================================================================== --- linux-5.4.5-rt3.orig/fs/userfaultfd.c +++ linux-5.4.5-rt3/fs/userfaultfd.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ struct userfaultfd_ctx { /* waitqueue head for events */ wait_queue_head_t event_wqh; /* a refile sequence protected by fault_pending_wqh lock */ - struct seqcount refile_seq; + seqlock_t refile_seq; /* pseudo fd refcounting */ refcount_t refcount; /* userfaultfd syscall flags */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1066 @ static ssize_t userfaultfd_ctx_read(stru * waitqueue could become empty if this is the * only userfault. 
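The squashfs hunk above stops pinning the CPU with get_cpu_ptr()/put_cpu_ptr() around the whole decompression and instead serialises access to the per-CPU stream with a local lock; DEFINE_LOCAL_IRQ_LOCK() and get_locked_ptr()/put_locked_ptr() come from this series' linux/locallock.h, not from mainline. On !RT kernels the local lock still just disables preemption, while on PREEMPT_RT it is a per-CPU sleeping lock, so the potentially long decompression remains preemptible. A sketch of the same pattern for a hypothetical per-CPU scratch buffer:

#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/locallock.h>	/* provided by this patch series */

struct demo_scratch {
	char buf[256];
};

static DEFINE_LOCAL_IRQ_LOCK(demo_lock);

static void demo_use_scratch(struct demo_scratch __percpu *scratch)
{
	struct demo_scratch *s;

	/* Like get_cpu_ptr(), but preemptible on PREEMPT_RT. */
	s = get_locked_ptr(demo_lock, scratch);
	memset(s->buf, 0, sizeof(s->buf));
	put_locked_ptr(demo_lock, s);
}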
*/ - write_seqcount_begin(&ctx->refile_seq); + write_seqlock(&ctx->refile_seq); /* * The fault_pending_wqh.lock prevents the uwq @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1092 @ static ssize_t userfaultfd_ctx_read(stru list_del(&uwq->wq.entry); add_wait_queue(&ctx->fault_wqh, &uwq->wq); - write_seqcount_end(&ctx->refile_seq); + write_sequnlock(&ctx->refile_seq); /* careful to always initialize msg if ret == 0 */ *msg = uwq->msg; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1265 @ static __always_inline void wake_userfau * sure we've userfaults to wake. */ do { - seq = read_seqcount_begin(&ctx->refile_seq); + seq = read_seqbegin(&ctx->refile_seq); need_wakeup = waitqueue_active(&ctx->fault_pending_wqh) || waitqueue_active(&ctx->fault_wqh); cond_resched(); - } while (read_seqcount_retry(&ctx->refile_seq, seq)); + } while (read_seqretry(&ctx->refile_seq, seq)); if (need_wakeup) __wake_userfault(ctx, range); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1942 @ static void init_once_userfaultfd_ctx(vo init_waitqueue_head(&ctx->fault_wqh); init_waitqueue_head(&ctx->event_wqh); init_waitqueue_head(&ctx->fd_wqh); - seqcount_init(&ctx->refile_seq); + seqlock_init(&ctx->refile_seq); } SYSCALL_DEFINE1(userfaultfd, int, flags) Index: linux-5.4.5-rt3/include/Kbuild =================================================================== --- linux-5.4.5-rt3.orig/include/Kbuild +++ linux-5.4.5-rt3/include/Kbuild @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1161 @ header-test- += xen/xenbus.h # Do not include directly header-test- += linux/compiler-clang.h header-test- += linux/compiler-gcc.h +header-test- += linux/mutex_rt.h header-test- += linux/patchkey.h header-test- += linux/rwlock_api_smp.h +header-test- += linux/rwlock_rt.h +header-test- += linux/rwlock_types_rt.h +header-test- += linux/rwsem-rt.h +header-test- += linux/spinlock_rt.h +header-test- += linux/spinlock_types_nort.h +header-test- += linux/spinlock_types_rt.h header-test- += linux/spinlock_types_up.h header-test- += linux/spinlock_up.h header-test- += linux/wimax/debug.h Index: linux-5.4.5-rt3/include/linux/bottom_half.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/bottom_half.h +++ linux-5.4.5-rt3/include/linux/bottom_half.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:7 @ #include <linux/preempt.h> +#ifdef CONFIG_PREEMPT_RT +extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt); +#else + #ifdef CONFIG_TRACE_IRQFLAGS extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt); #else @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:20 @ static __always_inline void __local_bh_d barrier(); } #endif +#endif static inline void local_bh_disable(void) { Index: linux-5.4.5-rt3/include/linux/buffer_head.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/buffer_head.h +++ linux-5.4.5-rt3/include/linux/buffer_head.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:25 @ enum bh_state_bits { BH_Dirty, /* Is dirty */ BH_Lock, /* Is locked */ BH_Req, /* Has been submitted for I/O */ - BH_Uptodate_Lock,/* Used by the first bh in a page, to serialise - * IO completion 
of other buffers in the page - */ BH_Mapped, /* Has a disk mapping */ BH_New, /* Disk mapping was newly created by get_block */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:76 @ struct buffer_head { struct address_space *b_assoc_map; /* mapping this buffer is associated with */ atomic_t b_count; /* users using this buffer_head */ + spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to + * serialise IO completion of other + * buffers in the page */ }; /* Index: linux-5.4.5-rt3/include/linux/cgroup-defs.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/cgroup-defs.h +++ linux-5.4.5-rt3/include/linux/cgroup-defs.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:147 @ struct cgroup_subsys_state { struct list_head sibling; struct list_head children; - /* flush target list anchored at cgrp->rstat_css_list */ - struct list_head rstat_css_node; - /* * PI: Subsys-unique ID. 0 is unused and root is always 1. The * matching css can be looked up using css_from_id(). @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:455 @ struct cgroup { /* per-cpu recursive resource statistics */ struct cgroup_rstat_cpu __percpu *rstat_cpu; - struct list_head rstat_css_list; /* cgroup basic resource statistics */ struct cgroup_base_stat pending_bstat; /* pending from children */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:632 @ struct cgroup_subsys { void (*css_released)(struct cgroup_subsys_state *css); void (*css_free)(struct cgroup_subsys_state *css); void (*css_reset)(struct cgroup_subsys_state *css); - void (*css_rstat_flush)(struct cgroup_subsys_state *css, int cpu); int (*css_extra_stat_show)(struct seq_file *seq, struct cgroup_subsys_state *css); Index: linux-5.4.5-rt3/include/linux/cgroup.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/cgroup.h +++ linux-5.4.5-rt3/include/linux/cgroup.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:753 @ static inline void cgroup_path_from_kern */ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu); void cgroup_rstat_flush(struct cgroup *cgrp); -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp); -void cgroup_rstat_flush_hold(struct cgroup *cgrp); -void cgroup_rstat_flush_release(void); /* * Basic resource stats. Index: linux-5.4.5-rt3/include/linux/completion.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/completion.h +++ linux-5.4.5-rt3/include/linux/completion.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:12 @ * See kernel/sched/completion.c for details. 
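The buffer_head.h hunk just above completes the earlier fs/buffer.c change: the BH_Uptodate_Lock flag bit goes away and b_uptodate_lock becomes a proper spinlock_t member of struct buffer_head. The completion.h hunk beginning here makes an analogous switch for completions: the embedded wait_queue_head_t becomes a swait_queue_head, so the completion paths stop relying on the regular wait-queue lock, which turns into a sleeping lock on PREEMPT_RT. The caller-visible API does not change; typical usage such as the following keeps working unmodified (hypothetical names):

#include <linux/completion.h>

/* Hypothetical completion handed from a producer back to a waiter. */
static DECLARE_COMPLETION(demo_done);

static void demo_producer(void)
{
	/* ... produce the result ... */
	complete(&demo_done);		/* wakes a waiter on the embedded swait queue */
}

static void demo_consumer(void)
{
	wait_for_completion(&demo_done);	/* sleeps until demo_producer() runs */
}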
*/ -#include <linux/wait.h> +#include <linux/swait.h> /* * struct completion - structure used to maintain state for a "completion" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:28 @ */ struct completion { unsigned int done; - wait_queue_head_t wait; + struct swait_queue_head wait; }; #define init_completion_map(x, m) __init_completion(x) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:37 @ static inline void complete_acquire(stru static inline void complete_release(struct completion *x) {} #define COMPLETION_INITIALIZER(work) \ - { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } + { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) } #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \ (*({ init_completion_map(&(work), &(map)); &(work); })) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:88 @ static inline void complete_release(stru static inline void __init_completion(struct completion *x) { x->done = 0; - init_waitqueue_head(&x->wait); + init_swait_queue_head(&x->wait); } /** Index: linux-5.4.5-rt3/include/linux/console.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/console.h +++ linux-5.4.5-rt3/include/linux/console.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:148 @ static inline int con_debug_leave(void) struct console { char name[16]; void (*write)(struct console *, const char *, unsigned); + void (*write_atomic)(struct console *, const char *, unsigned); int (*read)(struct console *, char *, unsigned); struct tty_driver *(*device)(struct console *, int *); void (*unblank)(void); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:157 @ struct console { short flags; short index; int cflag; + unsigned long printk_seq; + int wrote_history; void *data; struct console *next; }; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:240 @ extern void console_init(void); void dummycon_register_output_notifier(struct notifier_block *nb); void dummycon_unregister_output_notifier(struct notifier_block *nb); +extern void console_atomic_lock(unsigned int *flags); +extern void console_atomic_unlock(unsigned int flags); + #endif /* _LINUX_CONSOLE_H */ Index: linux-5.4.5-rt3/include/linux/dcache.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/dcache.h +++ linux-5.4.5-rt3/include/linux/dcache.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:109 @ struct dentry { union { struct list_head d_lru; /* LRU list */ - wait_queue_head_t *d_wait; /* in-lookup ones only */ + struct swait_queue_head *d_wait; /* in-lookup ones only */ }; struct list_head d_child; /* child of parent list */ struct list_head d_subdirs; /* our children */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:239 @ extern void d_set_d_op(struct dentry *de extern struct dentry * d_alloc(struct dentry *, const struct qstr *); extern struct dentry * d_alloc_anon(struct super_block *); extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *, - wait_queue_head_t *); + struct swait_queue_head *); extern struct dentry * d_splice_alias(struct inode *, struct dentry *); extern struct dentry * d_add_ci(struct dentry *, struct inode 
*, struct qstr *); extern struct dentry * d_exact_alias(struct dentry *, struct inode *); Index: linux-5.4.5-rt3/include/linux/delay.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/delay.h +++ linux-5.4.5-rt3/include/linux/delay.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:68 @ static inline void ssleep(unsigned int s msleep(seconds * 1000); } +#ifdef CONFIG_PREEMPT_RT +extern void cpu_chill(void); +#else +# define cpu_chill() cpu_relax() +#endif + #endif /* defined(_LINUX_DELAY_H) */ Index: linux-5.4.5-rt3/include/linux/dma-resv.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/dma-resv.h +++ linux-5.4.5-rt3/include/linux/dma-resv.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:68 @ struct dma_resv_list { /** * struct dma_resv - a reservation object manages fences for a buffer * @lock: update side lock - * @seq: sequence count for managing RCU read-side synchronization + * @seq: sequence lock for managing RCU read-side synchronization * @fence_excl: the exclusive fence, if there is one currently * @fence: list of current shared fences */ struct dma_resv { struct ww_mutex lock; - seqcount_t seq; + seqlock_t seq; struct dma_fence __rcu *fence_excl; struct dma_resv_list __rcu *fence; Index: linux-5.4.5-rt3/include/linux/fs.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/fs.h +++ linux-5.4.5-rt3/include/linux/fs.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:719 @ struct inode { struct block_device *i_bdev; struct cdev *i_cdev; char *i_link; - unsigned i_dir_seq; + unsigned __i_dir_seq; }; __u32 i_generation; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:858 @ static inline loff_t i_size_read(const s i_size = inode->i_size; } while (read_seqcount_retry(&inode->i_size_seqcount, seq)); return i_size; -#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) +#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION) loff_t i_size; preempt_disable(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:883 @ static inline void i_size_write(struct i inode->i_size = i_size; write_seqcount_end(&inode->i_size_seqcount); preempt_enable(); -#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) +#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION) preempt_disable(); inode->i_size = i_size; preempt_enable(); Index: linux-5.4.5-rt3/include/linux/fscache.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/fscache.h +++ linux-5.4.5-rt3/include/linux/fscache.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:229 @ extern void __fscache_readpages_cancel(s extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool); extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t, bool (*)(void *), void *); +extern void fscache_cookie_init(void); /** * fscache_register_netfs - Register a filesystem as desiring caching services Index: linux-5.4.5-rt3/include/linux/genhd.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/genhd.h +++ linux-5.4.5-rt3/include/linux/genhd.h @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:720 @ static inline void hd_free_part(struct h * accessor function. * * Code written along the lines of i_size_read() and i_size_write(). - * CONFIG_PREEMPT case optimizes the case of UP kernel with preemption + * CONFIG_PREEMPTION case optimizes the case of UP kernel with preemption * on. */ static inline sector_t part_nr_sects_read(struct hd_struct *part) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:733 @ static inline sector_t part_nr_sects_rea nr_sects = part->nr_sects; } while (read_seqcount_retry(&part->nr_sects_seq, seq)); return nr_sects; -#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) +#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION) sector_t nr_sects; preempt_disable(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:756 @ static inline void part_nr_sects_write(s write_seqcount_begin(&part->nr_sects_seq); part->nr_sects = size; write_seqcount_end(&part->nr_sects_seq); -#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT) +#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPTION) preempt_disable(); part->nr_sects = size; preempt_enable(); Index: linux-5.4.5-rt3/include/linux/gfp.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/gfp.h +++ linux-5.4.5-rt3/include/linux/gfp.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:583 @ extern void page_frag_free(void *addr); void page_alloc_init(void); void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); void drain_all_pages(struct zone *zone); +void drain_cpu_pages(unsigned int cpu, struct zone *zone); void drain_local_pages(struct zone *zone); void page_alloc_init_late(void); Index: linux-5.4.5-rt3/include/linux/hardirq.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/hardirq.h +++ linux-5.4.5-rt3/include/linux/hardirq.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:71 @ extern void irq_exit(void); #define nmi_enter() \ do { \ arch_nmi_enter(); \ - printk_nmi_enter(); \ lockdep_off(); \ ftrace_nmi_enter(); \ BUG_ON(in_nmi()); \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:87 @ extern void irq_exit(void); preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \ ftrace_nmi_exit(); \ lockdep_on(); \ - printk_nmi_exit(); \ arch_nmi_exit(); \ } while (0) Index: linux-5.4.5-rt3/include/linux/highmem.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/highmem.h +++ linux-5.4.5-rt3/include/linux/highmem.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ #include <linux/mm.h> #include <linux/uaccess.h> #include <linux/hardirq.h> +#include <linux/sched.h> #include <asm/cacheflush.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:94 @ static inline void kunmap(struct page *p static inline void *kmap_atomic(struct page *page) { - preempt_disable(); + preempt_disable_nort(); pagefault_disable(); return page_address(page); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:103 @ static inline void *kmap_atomic(struct p static inline void __kunmap_atomic(void *addr) { pagefault_enable(); 
- preempt_enable(); + preempt_enable_nort(); } #define kmap_atomic_pfn(pfn) kmap_atomic(pfn_to_page(pfn)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:115 @ static inline void __kunmap_atomic(void #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32) +#ifndef CONFIG_PREEMPT_RT DECLARE_PER_CPU(int, __kmap_atomic_idx); +#endif static inline int kmap_atomic_idx_push(void) { +#ifndef CONFIG_PREEMPT_RT int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1; -#ifdef CONFIG_DEBUG_HIGHMEM +# ifdef CONFIG_DEBUG_HIGHMEM WARN_ON_ONCE(in_irq() && !irqs_disabled()); BUG_ON(idx >= KM_TYPE_NR); -#endif +# endif return idx; +#else + current->kmap_idx++; + BUG_ON(current->kmap_idx > KM_TYPE_NR); + return current->kmap_idx - 1; +#endif } static inline int kmap_atomic_idx(void) { +#ifndef CONFIG_PREEMPT_RT return __this_cpu_read(__kmap_atomic_idx) - 1; +#else + return current->kmap_idx - 1; +#endif } static inline void kmap_atomic_idx_pop(void) { -#ifdef CONFIG_DEBUG_HIGHMEM +#ifndef CONFIG_PREEMPT_RT +# ifdef CONFIG_DEBUG_HIGHMEM int idx = __this_cpu_dec_return(__kmap_atomic_idx); BUG_ON(idx < 0); -#else +# else __this_cpu_dec(__kmap_atomic_idx); +# endif +#else + current->kmap_idx--; +# ifdef CONFIG_DEBUG_HIGHMEM + BUG_ON(current->kmap_idx < 0); +# endif #endif } Index: linux-5.4.5-rt3/include/linux/idr.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/idr.h +++ linux-5.4.5-rt3/include/linux/idr.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:172 @ static inline bool idr_is_empty(const st * Each idr_preload() should be matched with an invocation of this * function. See idr_preload() for details. */ -static inline void idr_preload_end(void) -{ - preempt_enable(); -} +void idr_preload_end(void); /** * idr_for_each_entry() - Iterate over an IDR's elements of a given type. 
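
The idr.h hunk above turns idr_preload_end() from an inline preempt_enable() into an out-of-line function, presumably so that its implementation can pair with the local-lock based preload protection introduced elsewhere in this series (see the locallock.h and radix-tree.h changes below) rather than a bare preempt_enable(). Callers are unchanged and keep the usual preload pattern. The following minimal sketch (not part of the patch) illustrates that pattern; example_idr, example_lock, example_store and obj are made-up names used only for illustration:

	#include <linux/idr.h>
	#include <linux/spinlock.h>
	#include <linux/gfp.h>

	static DEFINE_IDR(example_idr);
	static DEFINE_SPINLOCK(example_lock);

	/* Allocate an ID for @obj; returns the new ID or a negative errno. */
	static int example_store(void *obj)
	{
		int id;

		idr_preload(GFP_KERNEL);	/* may sleep: preallocate tree nodes */
		spin_lock(&example_lock);
		/* GFP_NOWAIT: the allocation must not sleep under the spinlock */
		id = idr_alloc(&example_idr, obj, 0, 0, GFP_NOWAIT);
		spin_unlock(&example_lock);
		idr_preload_end();		/* close the preload section */

		return id;
	}
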
Index: linux-5.4.5-rt3/include/linux/interrupt.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/interrupt.h +++ linux-5.4.5-rt3/include/linux/interrupt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:549 @ struct softirq_action asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); -#ifdef __ARCH_HAS_DO_SOFTIRQ +#if defined(__ARCH_HAS_DO_SOFTIRQ) && !defined(CONFIG_PREEMPT_RT) void do_softirq_own_stack(void); #else static inline void do_softirq_own_stack(void) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:564 @ extern void __raise_softirq_irqoff(unsig extern void raise_softirq_irqoff(unsigned int nr); extern void raise_softirq(unsigned int nr); +extern void softirq_check_pending_idle(void); DECLARE_PER_CPU(struct task_struct *, ksoftirqd); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:629 @ static inline void tasklet_unlock(struct static inline void tasklet_unlock_wait(struct tasklet_struct *t) { - while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); } + while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { + local_bh_disable(); + local_bh_enable(); + } } #else #define tasklet_trylock(t) 1 Index: linux-5.4.5-rt3/include/linux/irq_work.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/irq_work.h +++ linux-5.4.5-rt3/include/linux/irq_work.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ /* Doesn't want IPI, wait for tick: */ #define IRQ_WORK_LAZY BIT(2) +/* Run hard IRQ context, even on RT */ +#define IRQ_WORK_HARD_IRQ BIT(3) #define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:57 @ static inline bool irq_work_needs_cpu(vo static inline void irq_work_run(void) { } #endif +#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT) +void irq_work_tick_soft(void); +#else +static inline void irq_work_tick_soft(void) { } +#endif + #endif /* _LINUX_IRQ_WORK_H */ Index: linux-5.4.5-rt3/include/linux/irqdesc.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/irqdesc.h +++ linux-5.4.5-rt3/include/linux/irqdesc.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:75 @ struct irq_desc { unsigned int irqs_unhandled; atomic_t threads_handled; int threads_handled_last; + u64 random_ip; raw_spinlock_t lock; struct cpumask *percpu_enabled; const struct cpumask *percpu_affinity; Index: linux-5.4.5-rt3/include/linux/irqflags.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/irqflags.h +++ linux-5.4.5-rt3/include/linux/irqflags.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:46 @ do { \ do { \ current->hardirq_context--; \ } while (0) -# define lockdep_softirq_enter() \ -do { \ - current->softirq_context++; \ -} while (0) -# define lockdep_softirq_exit() \ -do { \ - current->softirq_context--; \ -} while (0) #else # define trace_hardirqs_on() do { } while (0) # define trace_hardirqs_off() do { } while (0) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:58 @ do { \ # define lockdep_softirq_enter() do { } while (0) # 
define lockdep_softirq_exit() do { } while (0) #endif + +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PREEMPT_RT) +# define lockdep_softirq_enter() \ +do { \ + current->softirq_context++; \ +} while (0) +# define lockdep_softirq_exit() \ +do { \ + current->softirq_context--; \ +} while (0) + +#else +# define lockdep_softirq_enter() do { } while (0) +# define lockdep_softirq_exit() do { } while (0) +#endif #if defined(CONFIG_IRQSOFF_TRACER) || \ defined(CONFIG_PREEMPT_TRACER) Index: linux-5.4.5-rt3/include/linux/jbd2.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/jbd2.h +++ linux-5.4.5-rt3/include/linux/jbd2.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:316 @ enum jbd_state_bits { BH_Revoked, /* Has been revoked from the log */ BH_RevokeValid, /* Revoked flag is valid */ BH_JBDDirty, /* Is dirty but journaled */ - BH_State, /* Pins most journal_head state */ BH_JournalHead, /* Pins bh->b_private and jh->b_bh */ BH_Shadow, /* IO on shadow buffer is running */ BH_Verified, /* Metadata block has been verified ok */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:344 @ static inline struct journal_head *bh2jh return bh->b_private; } -static inline void jbd_lock_bh_state(struct buffer_head *bh) -{ - bit_spin_lock(BH_State, &bh->b_state); -} - -static inline int jbd_trylock_bh_state(struct buffer_head *bh) -{ - return bit_spin_trylock(BH_State, &bh->b_state); -} - -static inline int jbd_is_locked_bh_state(struct buffer_head *bh) -{ - return bit_spin_is_locked(BH_State, &bh->b_state); -} - -static inline void jbd_unlock_bh_state(struct buffer_head *bh) -{ - bit_spin_unlock(BH_State, &bh->b_state); -} - static inline void jbd_lock_bh_journal_head(struct buffer_head *bh) { bit_spin_lock(BH_JournalHead, &bh->b_state); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:538 @ struct transaction_chp_stats_s { * ->jbd_lock_bh_journal_head() (This is "innermost") * * j_state_lock - * ->jbd_lock_bh_state() + * ->b_state_lock * - * jbd_lock_bh_state() + * b_state_lock * ->j_list_lock * * j_state_lock @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1239 @ JBD2_FEATURE_INCOMPAT_FUNCS(csum3, CSUM /* Filing buffers */ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *); -extern void __jbd2_journal_refile_buffer(struct journal_head *); +extern bool __jbd2_journal_refile_buffer(struct journal_head *); extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *); extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int); extern void __journal_free_buffer(struct journal_head *bh); Index: linux-5.4.5-rt3/include/linux/journal-head.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/journal-head.h +++ linux-5.4.5-rt3/include/linux/journal-head.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:14 @ #ifndef JOURNAL_HEAD_H_INCLUDED #define JOURNAL_HEAD_H_INCLUDED +#include <linux/spinlock.h> + typedef unsigned int tid_t; /* Unique transaction ID */ typedef struct transaction_s transaction_t; /* Compound transaction type */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:29 @ struct journal_head { struct buffer_head *b_bh; /* + * Protect the buffer 
head state + */ + spinlock_t b_state_lock; + + /* * Reference count - see description in journal.c * [jbd_lock_bh_journal_head()] */ int b_jcount; /* - * Journalling list for this buffer [jbd_lock_bh_state()] + * Journalling list for this buffer [b_state_lock] * NOTE: We *cannot* combine this with b_modified into a bitfield * as gcc would then (which the C standard allows but which is * very unuseful) make 64-bit accesses to the bitfield and clobber @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:51 @ struct journal_head { /* * This flag signals the buffer has been modified by * the currently running transaction - * [jbd_lock_bh_state()] + * [b_state_lock] */ unsigned b_modified; /* * Copy of the buffer data frozen for writing to the log. - * [jbd_lock_bh_state()] + * [b_state_lock] */ char *b_frozen_data; /* * Pointer to a saved copy of the buffer containing no uncommitted * deallocation references, so that allocations can avoid overwriting - * uncommitted deletes. [jbd_lock_bh_state()] + * uncommitted deletes. [b_state_lock] */ char *b_committed_data; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:73 @ struct journal_head { * metadata: either the running transaction or the committing * transaction (if there is one). Only applies to buffers on a * transaction's data or metadata journaling list. - * [j_list_lock] [jbd_lock_bh_state()] + * [j_list_lock] [b_state_lock] * Either of these locks is enough for reading, both are needed for * changes. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:83 @ struct journal_head { * Pointer to the running compound transaction which is currently * modifying the buffer's metadata, if there was already a transaction * committing it when the new transaction touched it. - * [t_list_lock] [jbd_lock_bh_state()] + * [t_list_lock] [b_state_lock] */ transaction_t *b_next_transaction; /* * Doubly-linked list of buffers on a transaction's data, metadata or - * forget queue. [t_list_lock] [jbd_lock_bh_state()] + * forget queue. 
[t_list_lock] [b_state_lock] */ struct journal_head *b_tnext, *b_tprev; Index: linux-5.4.5-rt3/include/linux/kernel.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/kernel.h +++ linux-5.4.5-rt3/include/linux/kernel.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:230 @ extern void __cant_sleep(const char *fil */ # define might_sleep() \ do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0) + +# define might_sleep_no_state_check() \ + do { ___might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0) + /** * cant_sleep - annotation for functions that cannot sleep * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:265 @ extern void __cant_sleep(const char *fil static inline void __might_sleep(const char *file, int line, int preempt_offset) { } # define might_sleep() do { might_resched(); } while (0) +# define might_sleep_no_state_check() do { might_resched(); } while (0) # define cant_sleep() do { } while (0) # define sched_annotate_sleep() do { } while (0) # define non_block_start() do { } while (0) Index: linux-5.4.5-rt3/include/linux/kmsg_dump.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/kmsg_dump.h +++ linux-5.4.5-rt3/include/linux/kmsg_dump.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:49 @ struct kmsg_dumper { bool registered; /* private state of the kmsg iterator */ - u32 cur_idx; - u32 next_idx; - u64 cur_seq; - u64 next_seq; + u64 line_seq; + u64 buffer_end_seq; }; #ifdef CONFIG_PRINTK Index: linux-5.4.5-rt3/include/linux/list_bl.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/list_bl.h +++ linux-5.4.5-rt3/include/linux/list_bl.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ #define _LINUX_LIST_BL_H #include <linux/list.h> +#include <linux/spinlock.h> #include <linux/bit_spinlock.h> /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:37 @ struct hlist_bl_head { struct hlist_bl_node *first; +#ifdef CONFIG_PREEMPT_RT + raw_spinlock_t lock; +#endif }; struct hlist_bl_node { struct hlist_bl_node *next, **pprev; }; -#define INIT_HLIST_BL_HEAD(ptr) \ - ((ptr)->first = NULL) + +#ifdef CONFIG_PREEMPT_RT +#define INIT_HLIST_BL_HEAD(h) \ +do { \ + (h)->first = NULL; \ + raw_spin_lock_init(&(h)->lock); \ +} while (0) +#else +#define INIT_HLIST_BL_HEAD(h) (h)->first = NULL +#endif static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:160 @ static inline void hlist_bl_del_init(str static inline void hlist_bl_lock(struct hlist_bl_head *b) { +#ifndef CONFIG_PREEMPT_RT bit_spin_lock(0, (unsigned long *)b); +#else + raw_spin_lock(&b->lock); +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + __set_bit(0, (unsigned long *)b); +#endif +#endif } static inline void hlist_bl_unlock(struct hlist_bl_head *b) { +#ifndef CONFIG_PREEMPT_RT __bit_spin_unlock(0, (unsigned long *)b); +#else +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) + __clear_bit(0, (unsigned long *)b); +#endif + raw_spin_unlock(&b->lock); +#endif } static inline bool hlist_bl_is_locked(struct hlist_bl_head *b) Index: linux-5.4.5-rt3/include/linux/locallock.h 
=================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/locallock.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef _LINUX_LOCALLOCK_H +#define _LINUX_LOCALLOCK_H + +#include <linux/percpu.h> +#include <linux/spinlock.h> +#include <asm/current.h> + +#ifdef CONFIG_PREEMPT_RT + +#ifdef CONFIG_DEBUG_SPINLOCK +# define LL_WARN(cond) WARN_ON(cond) +#else +# define LL_WARN(cond) do { } while (0) +#endif + +/* + * per cpu lock based substitute for local_irq_*() + */ +struct local_irq_lock { + spinlock_t lock; + struct task_struct *owner; + int nestcnt; + unsigned long flags; +}; + +#define DEFINE_LOCAL_IRQ_LOCK(lvar) \ + DEFINE_PER_CPU(struct local_irq_lock, lvar) = { \ + .lock = __SPIN_LOCK_UNLOCKED((lvar).lock) } + +#define DECLARE_LOCAL_IRQ_LOCK(lvar) \ + DECLARE_PER_CPU(struct local_irq_lock, lvar) + +#define local_irq_lock_init(lvar) \ + do { \ + int __cpu; \ + for_each_possible_cpu(__cpu) \ + spin_lock_init(&per_cpu(lvar, __cpu).lock); \ + } while (0) + +static inline void __local_lock(struct local_irq_lock *lv) +{ + if (lv->owner != current) { + spin_lock(&lv->lock); + LL_WARN(lv->owner); + LL_WARN(lv->nestcnt); + lv->owner = current; + } + lv->nestcnt++; +} + +#define local_lock(lvar) \ + do { __local_lock(&get_local_var(lvar)); } while (0) + +#define local_lock_on(lvar, cpu) \ + do { __local_lock(&per_cpu(lvar, cpu)); } while (0) + +static inline int __local_trylock(struct local_irq_lock *lv) +{ + if (lv->owner != current && spin_trylock(&lv->lock)) { + LL_WARN(lv->owner); + LL_WARN(lv->nestcnt); + lv->owner = current; + lv->nestcnt = 1; + return 1; + } else if (lv->owner == current) { + lv->nestcnt++; + return 1; + } + return 0; +} + +#define local_trylock(lvar) \ + ({ \ + int __locked; \ + __locked = __local_trylock(&get_local_var(lvar)); \ + if (!__locked) \ + put_local_var(lvar); \ + __locked; \ + }) + +static inline void __local_unlock(struct local_irq_lock *lv) +{ + LL_WARN(lv->nestcnt == 0); + LL_WARN(lv->owner != current); + if (--lv->nestcnt) + return; + + lv->owner = NULL; + spin_unlock(&lv->lock); +} + +#define local_unlock(lvar) \ + do { \ + __local_unlock(this_cpu_ptr(&lvar)); \ + put_local_var(lvar); \ + } while (0) + +#define local_unlock_on(lvar, cpu) \ + do { __local_unlock(&per_cpu(lvar, cpu)); } while (0) + +static inline void __local_lock_irq(struct local_irq_lock *lv) +{ + spin_lock_irqsave(&lv->lock, lv->flags); + LL_WARN(lv->owner); + LL_WARN(lv->nestcnt); + lv->owner = current; + lv->nestcnt = 1; +} + +#define local_lock_irq(lvar) \ + do { __local_lock_irq(&get_local_var(lvar)); } while (0) + +#define local_lock_irq_on(lvar, cpu) \ + do { __local_lock_irq(&per_cpu(lvar, cpu)); } while (0) + +static inline void __local_unlock_irq(struct local_irq_lock *lv) +{ + LL_WARN(!lv->nestcnt); + LL_WARN(lv->owner != current); + lv->owner = NULL; + lv->nestcnt = 0; + spin_unlock_irq(&lv->lock); +} + +#define local_unlock_irq(lvar) \ + do { \ + __local_unlock_irq(this_cpu_ptr(&lvar)); \ + put_local_var(lvar); \ + } while (0) + +#define local_unlock_irq_on(lvar, cpu) \ + do { \ + __local_unlock_irq(&per_cpu(lvar, cpu)); \ + } while (0) + +static inline int __local_lock_irqsave(struct local_irq_lock *lv) +{ + if (lv->owner != current) { + __local_lock_irq(lv); + return 0; + } else { + lv->nestcnt++; + return 1; + } +} + +#define local_lock_irqsave(lvar, _flags) \ + do { \ + if (__local_lock_irqsave(&get_local_var(lvar))) \ + put_local_var(lvar); \ + _flags = 
__this_cpu_read(lvar.flags); \ + } while (0) + +#define local_lock_irqsave_on(lvar, _flags, cpu) \ + do { \ + __local_lock_irqsave(&per_cpu(lvar, cpu)); \ + _flags = per_cpu(lvar, cpu).flags; \ + } while (0) + +static inline int __local_unlock_irqrestore(struct local_irq_lock *lv, + unsigned long flags) +{ + LL_WARN(!lv->nestcnt); + LL_WARN(lv->owner != current); + if (--lv->nestcnt) + return 0; + + lv->owner = NULL; + spin_unlock_irqrestore(&lv->lock, lv->flags); + return 1; +} + +#define local_unlock_irqrestore(lvar, flags) \ + do { \ + if (__local_unlock_irqrestore(this_cpu_ptr(&lvar), flags)) \ + put_local_var(lvar); \ + } while (0) + +#define local_unlock_irqrestore_on(lvar, flags, cpu) \ + do { \ + __local_unlock_irqrestore(&per_cpu(lvar, cpu), flags); \ + } while (0) + +#define local_spin_trylock_irq(lvar, lock) \ + ({ \ + int __locked; \ + local_lock_irq(lvar); \ + __locked = spin_trylock(lock); \ + if (!__locked) \ + local_unlock_irq(lvar); \ + __locked; \ + }) + +#define local_spin_lock_irq(lvar, lock) \ + do { \ + local_lock_irq(lvar); \ + spin_lock(lock); \ + } while (0) + +#define local_spin_unlock_irq(lvar, lock) \ + do { \ + spin_unlock(lock); \ + local_unlock_irq(lvar); \ + } while (0) + +#define local_spin_lock_irqsave(lvar, lock, flags) \ + do { \ + local_lock_irqsave(lvar, flags); \ + spin_lock(lock); \ + } while (0) + +#define local_spin_unlock_irqrestore(lvar, lock, flags) \ + do { \ + spin_unlock(lock); \ + local_unlock_irqrestore(lvar, flags); \ + } while (0) + +#define get_locked_var(lvar, var) \ + (*({ \ + local_lock(lvar); \ + this_cpu_ptr(&var); \ + })) + +#define put_locked_var(lvar, var) local_unlock(lvar); + +#define get_locked_ptr(lvar, var) \ + ({ \ + local_lock(lvar); \ + this_cpu_ptr(var); \ + }) + +#define put_locked_ptr(lvar, var) local_unlock(lvar); + +#define local_lock_cpu(lvar) \ + ({ \ + local_lock(lvar); \ + smp_processor_id(); \ + }) + +#define local_unlock_cpu(lvar) local_unlock(lvar) + +#else /* PREEMPT_RT */ + +#define DEFINE_LOCAL_IRQ_LOCK(lvar) __typeof__(const int) lvar +#define DECLARE_LOCAL_IRQ_LOCK(lvar) extern __typeof__(const int) lvar + +static inline void local_irq_lock_init(int lvar) { } + +#define local_trylock(lvar) \ + ({ \ + preempt_disable(); \ + 1; \ + }) + +#define local_lock(lvar) preempt_disable() +#define local_unlock(lvar) preempt_enable() +#define local_lock_irq(lvar) local_irq_disable() +#define local_lock_irq_on(lvar, cpu) local_irq_disable() +#define local_unlock_irq(lvar) local_irq_enable() +#define local_unlock_irq_on(lvar, cpu) local_irq_enable() +#define local_lock_irqsave(lvar, flags) local_irq_save(flags) +#define local_unlock_irqrestore(lvar, flags) local_irq_restore(flags) + +#define local_spin_trylock_irq(lvar, lock) spin_trylock_irq(lock) +#define local_spin_lock_irq(lvar, lock) spin_lock_irq(lock) +#define local_spin_unlock_irq(lvar, lock) spin_unlock_irq(lock) +#define local_spin_lock_irqsave(lvar, lock, flags) \ + spin_lock_irqsave(lock, flags) +#define local_spin_unlock_irqrestore(lvar, lock, flags) \ + spin_unlock_irqrestore(lock, flags) + +#define get_locked_var(lvar, var) get_cpu_var(var) +#define put_locked_var(lvar, var) put_cpu_var(var) +#define get_locked_ptr(lvar, var) get_cpu_ptr(var) +#define put_locked_ptr(lvar, var) put_cpu_ptr(var) + +#define local_lock_cpu(lvar) get_cpu() +#define local_unlock_cpu(lvar) put_cpu() + +#endif + +#endif Index: linux-5.4.5-rt3/include/linux/mm_types.h =================================================================== --- 
linux-5.4.5-rt3.orig/include/linux/mm_types.h +++ linux-5.4.5-rt3/include/linux/mm_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:15 @ #include <linux/completion.h> #include <linux/cpumask.h> #include <linux/uprobes.h> +#include <linux/rcupdate.h> #include <linux/page-flags-layout.h> #include <linux/workqueue.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:524 @ struct mm_struct { bool tlb_flush_batched; #endif struct uprobes_state uprobes_state; +#ifdef CONFIG_PREEMPT_RT + struct rcu_head delayed_drop; +#endif #ifdef CONFIG_HUGETLB_PAGE atomic_long_t hugetlb_usage; #endif Index: linux-5.4.5-rt3/include/linux/mutex.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/mutex.h +++ linux-5.4.5-rt3/include/linux/mutex.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:25 @ struct ww_acquire_ctx; +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \ + , .dep_map = { .name = #lockname } +#else +# define __DEP_MAP_MUTEX_INITIALIZER(lockname) +#endif + +#ifdef CONFIG_PREEMPT_RT +# include <linux/mutex_rt.h> +#else + /* * Simple, straightforward mutexes with strict semantics: * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:122 @ do { \ __mutex_init((mutex), #mutex, &__key); \ } while (0) -#ifdef CONFIG_DEBUG_LOCK_ALLOC -# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \ - , .dep_map = { .name = #lockname } -#else -# define __DEP_MAP_MUTEX_INITIALIZER(lockname) -#endif - #define __MUTEX_INITIALIZER(lockname) \ { .owner = ATOMIC_LONG_INIT(0) \ , .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:217 @ enum mutex_trylock_recursive_enum { extern /* __deprecated */ __must_check enum mutex_trylock_recursive_enum mutex_trylock_recursive(struct mutex *lock); +#endif /* !PREEMPT_RT */ + #endif /* __LINUX_MUTEX_H */ Index: linux-5.4.5-rt3/include/linux/mutex_rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/mutex_rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_MUTEX_RT_H +#define __LINUX_MUTEX_RT_H + +#ifndef __LINUX_MUTEX_H +#error "Please include mutex.h" +#endif + +#include <linux/rtmutex.h> + +/* FIXME: Just for __lockfunc */ +#include <linux/spinlock.h> + +struct mutex { + struct rt_mutex lock; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +}; + +#define __MUTEX_INITIALIZER(mutexname) \ + { \ + .lock = __RT_MUTEX_INITIALIZER(mutexname.lock) \ + __DEP_MAP_MUTEX_INITIALIZER(mutexname) \ + } + +#define DEFINE_MUTEX(mutexname) \ + struct mutex mutexname = __MUTEX_INITIALIZER(mutexname) + +extern void __mutex_do_init(struct mutex *lock, const char *name, struct lock_class_key *key); +extern void __lockfunc _mutex_lock(struct mutex *lock); +extern void __lockfunc _mutex_lock_io(struct mutex *lock); +extern void __lockfunc _mutex_lock_io_nested(struct mutex *lock, int subclass); +extern int __lockfunc _mutex_lock_interruptible(struct mutex *lock); +extern int __lockfunc _mutex_lock_killable(struct mutex *lock); +extern void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass); +extern void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct 
lockdep_map *nest_lock); +extern int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass); +extern int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass); +extern int __lockfunc _mutex_trylock(struct mutex *lock); +extern void __lockfunc _mutex_unlock(struct mutex *lock); + +#define mutex_is_locked(l) rt_mutex_is_locked(&(l)->lock) +#define mutex_lock(l) _mutex_lock(l) +#define mutex_lock_interruptible(l) _mutex_lock_interruptible(l) +#define mutex_lock_killable(l) _mutex_lock_killable(l) +#define mutex_trylock(l) _mutex_trylock(l) +#define mutex_unlock(l) _mutex_unlock(l) +#define mutex_lock_io(l) _mutex_lock_io(l); + +#define __mutex_owner(l) ((l)->lock.owner) + +#ifdef CONFIG_DEBUG_MUTEXES +#define mutex_destroy(l) rt_mutex_destroy(&(l)->lock) +#else +static inline void mutex_destroy(struct mutex *lock) {} +#endif + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define mutex_lock_nested(l, s) _mutex_lock_nested(l, s) +# define mutex_lock_interruptible_nested(l, s) \ + _mutex_lock_interruptible_nested(l, s) +# define mutex_lock_killable_nested(l, s) \ + _mutex_lock_killable_nested(l, s) +# define mutex_lock_io_nested(l, s) _mutex_lock_io_nested(l, s) + +# define mutex_lock_nest_lock(lock, nest_lock) \ +do { \ + typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \ + _mutex_lock_nest_lock(lock, &(nest_lock)->dep_map); \ +} while (0) + +#else +# define mutex_lock_nested(l, s) _mutex_lock(l) +# define mutex_lock_interruptible_nested(l, s) \ + _mutex_lock_interruptible(l) +# define mutex_lock_killable_nested(l, s) \ + _mutex_lock_killable(l) +# define mutex_lock_nest_lock(lock, nest_lock) mutex_lock(lock) +# define mutex_lock_io_nested(l, s) _mutex_lock_io(l) +#endif + +# define mutex_init(mutex) \ +do { \ + static struct lock_class_key __key; \ + \ + rt_mutex_init(&(mutex)->lock); \ + __mutex_do_init((mutex), #mutex, &__key); \ +} while (0) + +# define __mutex_init(mutex, name, key) \ +do { \ + rt_mutex_init(&(mutex)->lock); \ + __mutex_do_init((mutex), name, key); \ +} while (0) + +/** + * These values are chosen such that FAIL and SUCCESS match the + * values of the regular mutex_trylock(). + */ +enum mutex_trylock_recursive_enum { + MUTEX_TRYLOCK_FAILED = 0, + MUTEX_TRYLOCK_SUCCESS = 1, + MUTEX_TRYLOCK_RECURSIVE, +}; +/** + * mutex_trylock_recursive - trylock variant that allows recursive locking + * @lock: mutex to be locked + * + * This function should not be used, _ever_. It is purely for hysterical GEM + * raisins, and once those are gone this will be removed. + * + * Returns: + * MUTEX_TRYLOCK_FAILED - trylock failed, + * MUTEX_TRYLOCK_SUCCESS - lock acquired, + * MUTEX_TRYLOCK_RECURSIVE - we already owned the lock. 
+ */ +int __rt_mutex_owner_current(struct rt_mutex *lock); + +static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum +mutex_trylock_recursive(struct mutex *lock) +{ + if (unlikely(__rt_mutex_owner_current(&lock->lock))) + return MUTEX_TRYLOCK_RECURSIVE; + + return mutex_trylock(lock); +} + +extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); + +#endif Index: linux-5.4.5-rt3/include/linux/netdevice.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/netdevice.h +++ linux-5.4.5-rt3/include/linux/netdevice.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3019 @ struct softnet_data { unsigned int dropped; struct sk_buff_head input_pkt_queue; struct napi_struct backlog; + struct sk_buff_head tofree_queue; }; Index: linux-5.4.5-rt3/include/linux/nfs_fs.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/nfs_fs.h +++ linux-5.4.5-rt3/include/linux/nfs_fs.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:168 @ struct nfs_inode { /* Readers: in-flight sillydelete RPC calls */ /* Writers: rmdir */ +#ifdef CONFIG_PREEMPT_RT + struct semaphore rmdir_sem; +#else struct rw_semaphore rmdir_sem; +#endif struct mutex commit_mutex; #if IS_ENABLED(CONFIG_NFS_V4) Index: linux-5.4.5-rt3/include/linux/nfs_xdr.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/nfs_xdr.h +++ linux-5.4.5-rt3/include/linux/nfs_xdr.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1597 @ struct nfs_unlinkdata { struct nfs_removeargs args; struct nfs_removeres res; struct dentry *dentry; - wait_queue_head_t wq; + struct swait_queue_head wq; const struct cred *cred; struct nfs_fattr dir_attr; long timeout; Index: linux-5.4.5-rt3/include/linux/percpu-refcount.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/percpu-refcount.h +++ linux-5.4.5-rt3/include/linux/percpu-refcount.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:189 @ static inline void percpu_ref_get_many(s { unsigned long __percpu *percpu_count; - rcu_read_lock_sched(); + rcu_read_lock(); if (__ref_is_percpu(ref, &percpu_count)) this_cpu_add(*percpu_count, nr); else atomic_long_add(nr, &ref->count); - rcu_read_unlock_sched(); + rcu_read_unlock(); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:226 @ static inline bool percpu_ref_tryget(str unsigned long __percpu *percpu_count; bool ret; - rcu_read_lock_sched(); + rcu_read_lock(); if (__ref_is_percpu(ref, &percpu_count)) { this_cpu_inc(*percpu_count); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:235 @ static inline bool percpu_ref_tryget(str ret = atomic_long_inc_not_zero(&ref->count); } - rcu_read_unlock_sched(); + rcu_read_unlock(); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:260 @ static inline bool percpu_ref_tryget_liv unsigned long __percpu *percpu_count; bool ret = false; - rcu_read_lock_sched(); + rcu_read_lock(); if (__ref_is_percpu(ref, &percpu_count)) { this_cpu_inc(*percpu_count); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:269 @ 
static inline bool percpu_ref_tryget_liv ret = atomic_long_inc_not_zero(&ref->count); } - rcu_read_unlock_sched(); + rcu_read_unlock(); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:288 @ static inline void percpu_ref_put_many(s { unsigned long __percpu *percpu_count; - rcu_read_lock_sched(); + rcu_read_lock(); if (__ref_is_percpu(ref, &percpu_count)) this_cpu_sub(*percpu_count, nr); else if (unlikely(atomic_long_sub_and_test(nr, &ref->count))) ref->release(ref); - rcu_read_unlock_sched(); + rcu_read_unlock(); } /** Index: linux-5.4.5-rt3/include/linux/percpu-rwsem.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/percpu-rwsem.h +++ linux-5.4.5-rt3/include/linux/percpu-rwsem.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ #define _LINUX_PERCPU_RWSEM_H #include <linux/atomic.h> -#include <linux/rwsem.h> #include <linux/percpu.h> #include <linux/rcuwait.h> +#include <linux/wait.h> #include <linux/rcu_sync.h> #include <linux/lockdep.h> struct percpu_rw_semaphore { struct rcu_sync rss; unsigned int __percpu *read_count; - struct rw_semaphore rw_sem; /* slowpath */ - struct rcuwait writer; /* blocked writer */ - int readers_block; + struct rcuwait writer; + wait_queue_head_t waiters; + atomic_t block; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif }; +#ifdef CONFIG_DEBUG_LOCK_ALLOC +#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }, +#else +#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) +#endif + #define __DEFINE_PERCPU_RWSEM(name, is_static) \ static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name); \ is_static struct percpu_rw_semaphore name = { \ .rss = __RCU_SYNC_INITIALIZER(name.rss), \ .read_count = &__percpu_rwsem_rc_##name, \ - .rw_sem = __RWSEM_INITIALIZER(name.rw_sem), \ .writer = __RCUWAIT_INITIALIZER(name.writer), \ + .waiters = __WAIT_QUEUE_HEAD_INITIALIZER(name.waiters), \ + .block = ATOMIC_INIT(0), \ + __PERCPU_RWSEM_DEP_MAP_INIT(name) \ } + #define DEFINE_PERCPU_RWSEM(name) \ __DEFINE_PERCPU_RWSEM(name, /* not static */) #define DEFINE_STATIC_PERCPU_RWSEM(name) \ __DEFINE_PERCPU_RWSEM(name, static) -extern int __percpu_down_read(struct percpu_rw_semaphore *, int); -extern void __percpu_up_read(struct percpu_rw_semaphore *); +extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool); static inline void percpu_down_read(struct percpu_rw_semaphore *sem) { might_sleep(); - rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_); + rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_); preempt_disable(); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:62 @ static inline void percpu_down_read(stru * and that once the synchronize_rcu() is done, the writer will see * anything we did within this RCU-sched read-size critical section. 
*/ - __this_cpu_inc(*sem->read_count); - if (unlikely(!rcu_sync_is_idle(&sem->rss))) + if (likely(rcu_sync_is_idle(&sem->rss))) + __this_cpu_inc(*sem->read_count); + else __percpu_down_read(sem, false); /* Unconditional memory barrier */ /* * The preempt_enable() prevents the compiler from @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:73 @ static inline void percpu_down_read(stru preempt_enable(); } -static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem) +static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem) { - int ret = 1; + bool ret = true; preempt_disable(); /* * Same as in percpu_down_read(). */ - __this_cpu_inc(*sem->read_count); - if (unlikely(!rcu_sync_is_idle(&sem->rss))) + if (likely(rcu_sync_is_idle(&sem->rss))) + __this_cpu_inc(*sem->read_count); + else ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */ preempt_enable(); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:92 @ static inline int percpu_down_read_trylo */ if (ret) - rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_); + rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_); return ret; } static inline void percpu_up_read(struct percpu_rw_semaphore *sem) { + rwsem_release(&sem->dep_map, 1, _RET_IP_); + preempt_disable(); /* * Same as in percpu_down_read(). */ - if (likely(rcu_sync_is_idle(&sem->rss))) + if (likely(rcu_sync_is_idle(&sem->rss))) { __this_cpu_dec(*sem->read_count); - else - __percpu_up_read(sem); /* Unconditional memory barrier */ + } else { + /* + * slowpath; reader will only ever wake a single blocked + * writer. + */ + smp_mb(); /* B matches C */ + /* + * In other words, if they see our decrement (presumably to + * aggregate zero, as that is the only time it matters) they + * will also see our critical section. 
+ */ + __this_cpu_dec(*sem->read_count); + rcuwait_wake_up(&sem->writer); + } preempt_enable(); - - rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_); } extern void percpu_down_write(struct percpu_rw_semaphore *); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:138 @ extern void percpu_free_rwsem(struct per __percpu_init_rwsem(sem, #sem, &rwsem_key); \ }) -#define percpu_rwsem_is_held(sem) lockdep_is_held(&(sem)->rw_sem) - -#define percpu_rwsem_assert_held(sem) \ - lockdep_assert_held(&(sem)->rw_sem) +#define percpu_rwsem_is_held(sem) lockdep_is_held(sem) +#define percpu_rwsem_assert_held(sem) lockdep_assert_held(sem) static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem, bool read, unsigned long ip) { - lock_release(&sem->rw_sem.dep_map, 1, ip); -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER - if (!read) - atomic_long_set(&sem->rw_sem.owner, RWSEM_OWNER_UNKNOWN); -#endif + lock_release(&sem->dep_map, 1, ip); } static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem, bool read, unsigned long ip) { - lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip); -#ifdef CONFIG_RWSEM_SPIN_ON_OWNER - if (!read) - atomic_long_set(&sem->rw_sem.owner, (long)current); -#endif + lock_acquire(&sem->dep_map, 0, 1, read, 1, NULL, ip); } #endif Index: linux-5.4.5-rt3/include/linux/percpu.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/percpu.h +++ linux-5.4.5-rt3/include/linux/percpu.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:22 @ #define PERCPU_MODULE_RESERVE 0 #endif +#ifdef CONFIG_PREEMPT_RT + +#define get_local_var(var) (*({ \ + migrate_disable(); \ + this_cpu_ptr(&var); })) + +#define put_local_var(var) do { \ + (void)&(var); \ + migrate_enable(); \ +} while (0) + +# define get_local_ptr(var) ({ \ + migrate_disable(); \ + this_cpu_ptr(var); }) + +# define put_local_ptr(var) do { \ + (void)(var); \ + migrate_enable(); \ +} while (0) + +#else + +#define get_local_var(var) get_cpu_var(var) +#define put_local_var(var) put_cpu_var(var) +#define get_local_ptr(var) get_cpu_ptr(var) +#define put_local_ptr(var) put_cpu_ptr(var) + +#endif + /* minimum unit size, also is the maximum supported allocation size */ #define PCPU_MIN_UNIT_SIZE PFN_ALIGN(32 << 10) Index: linux-5.4.5-rt3/include/linux/pid.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/pid.h +++ linux-5.4.5-rt3/include/linux/pid.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ #define _LINUX_PID_H #include <linux/rculist.h> +#include <linux/atomic.h> #include <linux/wait.h> #include <linux/refcount.h> Index: linux-5.4.5-rt3/include/linux/posix-timers.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/posix-timers.h +++ linux-5.4.5-rt3/include/linux/posix-timers.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:75 @ struct cpu_timer { struct task_struct *task; struct list_head elist; int firing; + int firing_cpu; }; static inline bool cpu_timer_enqueue(struct timerqueue_head *head, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:127 @ struct posix_cputimers { struct posix_cputimer_base bases[CPUCLOCK_MAX]; unsigned int timers_active; unsigned int expiry_active; +#ifdef CONFIG_PREEMPT_RT + struct task_struct 
*posix_timer_list; +#endif }; static inline void posix_cputimers_init(struct posix_cputimers *pct) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:159 @ static inline void posix_cputimers_rt_wa INIT_CPU_TIMERBASE(b[2]), \ } +#ifdef CONFIG_PREEMPT_RT +# define INIT_TIMER_LIST .posix_timer_list = NULL, +#else +# define INIT_TIMER_LIST +#endif + #define INIT_CPU_TIMERS(s) \ .posix_cputimers = { \ .bases = INIT_CPU_TIMERBASES(s.posix_cputimers.bases), \ + INIT_TIMER_LIST \ }, #else struct posix_cputimers { }; Index: linux-5.4.5-rt3/include/linux/preempt.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/preempt.h +++ linux-5.4.5-rt3/include/linux/preempt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:81 @ #include <asm/preempt.h> #define hardirq_count() (preempt_count() & HARDIRQ_MASK) -#define softirq_count() (preempt_count() & SOFTIRQ_MASK) #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \ | NMI_MASK)) - /* * Are we doing bottom half or hardware interrupt processing? * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:97 @ * should not be used in new code. */ #define in_irq() (hardirq_count()) -#define in_softirq() (softirq_count()) #define in_interrupt() (irq_count()) -#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET) #define in_nmi() (preempt_count() & NMI_MASK) #define in_task() (!(preempt_count() & \ (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))) +#ifdef CONFIG_PREEMPT_RT + +#define softirq_count() ((long)current->softirq_count) +#define in_softirq() (softirq_count()) +#define in_serving_softirq() (current->softirq_count & SOFTIRQ_OFFSET) + +#else + +#define softirq_count() (preempt_count() & SOFTIRQ_MASK) +#define in_softirq() (softirq_count()) +#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET) + +#endif /* * The preempt_count offset after preempt_disable(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:127 @ /* * The preempt_count offset after spin_lock() */ +#if !defined(CONFIG_PREEMPT_RT) #define PREEMPT_LOCK_OFFSET PREEMPT_DISABLE_OFFSET +#else +#define PREEMPT_LOCK_OFFSET 0 +#endif /* * The preempt_count offset needed for things like: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:180 @ extern void preempt_count_sub(int val); #define preempt_count_inc() preempt_count_add(1) #define preempt_count_dec() preempt_count_sub(1) +#ifdef CONFIG_PREEMPT_LAZY +#define add_preempt_lazy_count(val) do { preempt_lazy_count() += (val); } while (0) +#define sub_preempt_lazy_count(val) do { preempt_lazy_count() -= (val); } while (0) +#define inc_preempt_lazy_count() add_preempt_lazy_count(1) +#define dec_preempt_lazy_count() sub_preempt_lazy_count(1) +#define preempt_lazy_count() (current_thread_info()->preempt_lazy_count) +#else +#define add_preempt_lazy_count(val) do { } while (0) +#define sub_preempt_lazy_count(val) do { } while (0) +#define inc_preempt_lazy_count() do { } while (0) +#define dec_preempt_lazy_count() do { } while (0) +#define preempt_lazy_count() (0) +#endif + #ifdef CONFIG_PREEMPT_COUNT #define preempt_disable() \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:202 @ do { \ barrier(); \ } while (0) +#define preempt_lazy_disable() \ +do { \ + inc_preempt_lazy_count(); \ + barrier(); \ +} 
while (0) + #define sched_preempt_enable_no_resched() \ do { \ barrier(); \ preempt_count_dec(); \ } while (0) -#define preempt_enable_no_resched() sched_preempt_enable_no_resched() +#ifdef CONFIG_PREEMPT_RT +# define preempt_enable_no_resched() sched_preempt_enable_no_resched() +# define preempt_check_resched_rt() preempt_check_resched() +#else +# define preempt_enable_no_resched() preempt_enable() +# define preempt_check_resched_rt() barrier(); +#endif #define preemptible() (preempt_count() == 0 && !irqs_disabled()) +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) + +extern void migrate_disable(void); +extern void migrate_enable(void); + +int __migrate_disabled(struct task_struct *p); + +#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) + +extern void migrate_disable(void); +extern void migrate_enable(void); +static inline int __migrate_disabled(struct task_struct *p) +{ + return 0; +} + +#else +#define migrate_disable() preempt_disable() +#define migrate_enable() preempt_enable() +static inline int __migrate_disabled(struct task_struct *p) +{ + return 0; +} +#endif + #ifdef CONFIG_PREEMPTION #define preempt_enable() \ do { \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:270 @ do { \ __preempt_schedule(); \ } while (0) +#define preempt_lazy_enable() \ +do { \ + dec_preempt_lazy_count(); \ + barrier(); \ + preempt_check_resched(); \ +} while (0) + #else /* !CONFIG_PREEMPTION */ #define preempt_enable() \ do { \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:284 @ do { \ preempt_count_dec(); \ } while (0) +#define preempt_lazy_enable() \ +do { \ + dec_preempt_lazy_count(); \ + barrier(); \ +} while (0) + #define preempt_enable_notrace() \ do { \ barrier(); \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:328 @ do { \ #define preempt_disable_notrace() barrier() #define preempt_enable_no_resched_notrace() barrier() #define preempt_enable_notrace() barrier() +#define preempt_check_resched_rt() barrier() #define preemptible() 0 +#define migrate_disable() barrier() +#define migrate_enable() barrier() + +static inline int __migrate_disabled(struct task_struct *p) +{ + return 0; +} #endif /* CONFIG_PREEMPT_COUNT */ #ifdef MODULE @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:356 @ do { \ } while (0) #define preempt_fold_need_resched() \ do { \ - if (tif_need_resched()) \ + if (tif_need_resched_now()) \ set_preempt_need_resched(); \ } while (0) +#ifdef CONFIG_PREEMPT_RT +# define preempt_disable_rt() preempt_disable() +# define preempt_enable_rt() preempt_enable() +# define preempt_disable_nort() barrier() +# define preempt_enable_nort() barrier() +#else +# define preempt_disable_rt() barrier() +# define preempt_enable_rt() barrier() +# define preempt_disable_nort() preempt_disable() +# define preempt_enable_nort() preempt_enable() +#endif + #ifdef CONFIG_PREEMPT_NOTIFIERS struct preempt_notifier; Index: linux-5.4.5-rt3/include/linux/printk.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/printk.h +++ linux-5.4.5-rt3/include/linux/printk.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ static inline const char *printk_skip_he */ #define CONSOLE_LOGLEVEL_DEFAULT CONFIG_CONSOLE_LOGLEVEL_DEFAULT #define CONSOLE_LOGLEVEL_QUIET CONFIG_CONSOLE_LOGLEVEL_QUIET +#define 
CONSOLE_LOGLEVEL_EMERGENCY CONFIG_CONSOLE_LOGLEVEL_EMERGENCY extern int console_printk[]; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:69 @ extern int console_printk[]; #define default_message_loglevel (console_printk[1]) #define minimum_console_loglevel (console_printk[2]) #define default_console_loglevel (console_printk[3]) +#define emergency_console_loglevel (console_printk[4]) static inline void console_silent(void) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:151 @ static inline __printf(1, 2) __cold void early_printk(const char *s, ...) { } #endif -#ifdef CONFIG_PRINTK_NMI -extern void printk_nmi_enter(void); -extern void printk_nmi_exit(void); -extern void printk_nmi_direct_enter(void); -extern void printk_nmi_direct_exit(void); -#else -static inline void printk_nmi_enter(void) { } -static inline void printk_nmi_exit(void) { } -static inline void printk_nmi_direct_enter(void) { } -static inline void printk_nmi_direct_exit(void) { } -#endif /* PRINTK_NMI */ - #ifdef CONFIG_PRINTK asmlinkage __printf(5, 0) int vprintk_emit(int facility, int level, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:195 @ __printf(1, 2) void dump_stack_set_arch_ void dump_stack_print_info(const char *log_lvl); void show_regs_print_info(const char *log_lvl); extern asmlinkage void dump_stack(void) __cold; -extern void printk_safe_init(void); -extern void printk_safe_flush(void); -extern void printk_safe_flush_on_panic(void); +struct wait_queue_head *printk_wait_queue(void); #else static inline __printf(1, 0) int vprintk(const char *s, va_list args) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:259 @ static inline void show_regs_print_info( static inline void dump_stack(void) { } - -static inline void printk_safe_init(void) -{ -} - -static inline void printk_safe_flush(void) -{ -} - -static inline void printk_safe_flush_on_panic(void) -{ -} #endif extern int kptr_restrict; Index: linux-5.4.5-rt3/include/linux/printk_ringbuffer.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/printk_ringbuffer.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PRINTK_RINGBUFFER_H +#define _LINUX_PRINTK_RINGBUFFER_H + +#include <linux/irq_work.h> +#include <linux/atomic.h> +#include <linux/percpu.h> +#include <linux/wait.h> + +struct prb_cpulock { + atomic_t owner; + unsigned long __percpu *irqflags; +}; + +struct printk_ringbuffer { + void *buffer; + unsigned int size_bits; + + u64 seq; + atomic_long_t lost; + + atomic_long_t tail; + atomic_long_t head; + atomic_long_t reserve; + + struct prb_cpulock *cpulock; + atomic_t ctx; + + struct wait_queue_head *wq; + atomic_long_t wq_counter; + struct irq_work *wq_work; +}; + +struct prb_entry { + unsigned int size; + u64 seq; + char data[0]; +}; + +struct prb_handle { + struct printk_ringbuffer *rb; + unsigned int cpu; + struct prb_entry *entry; +}; + +#define DECLARE_STATIC_PRINTKRB_CPULOCK(name) \ +static DEFINE_PER_CPU(unsigned long, _##name##_percpu_irqflags); \ +static struct prb_cpulock name = { \ + .owner = ATOMIC_INIT(-1), \ + .irqflags = &_##name##_percpu_irqflags, \ +} + +#define PRB_INIT ((unsigned long)-1) + +#define DECLARE_STATIC_PRINTKRB_ITER(name, rbaddr) \ +static struct prb_iterator name = { \ 
+ .rb = rbaddr, \ + .lpos = PRB_INIT, \ +} + +struct prb_iterator { + struct printk_ringbuffer *rb; + unsigned long lpos; +}; + +#define DECLARE_STATIC_PRINTKRB(name, szbits, cpulockptr) \ +static char _##name##_buffer[1 << (szbits)] \ + __aligned(__alignof__(long)); \ +static DECLARE_WAIT_QUEUE_HEAD(_##name##_wait); \ +static void _##name##_wake_work_func(struct irq_work *irq_work) \ +{ \ + wake_up_interruptible_all(&_##name##_wait); \ +} \ +static struct irq_work _##name##_wake_work = { \ + .func = _##name##_wake_work_func, \ + .flags = IRQ_WORK_LAZY, \ +}; \ +static struct printk_ringbuffer name = { \ + .buffer = &_##name##_buffer[0], \ + .size_bits = szbits, \ + .seq = 0, \ + .lost = ATOMIC_LONG_INIT(0), \ + .tail = ATOMIC_LONG_INIT(-111 * sizeof(long)), \ + .head = ATOMIC_LONG_INIT(-111 * sizeof(long)), \ + .reserve = ATOMIC_LONG_INIT(-111 * sizeof(long)), \ + .cpulock = cpulockptr, \ + .ctx = ATOMIC_INIT(0), \ + .wq = &_##name##_wait, \ + .wq_counter = ATOMIC_LONG_INIT(0), \ + .wq_work = &_##name##_wake_work, \ +} + +/* writer interface */ +char *prb_reserve(struct prb_handle *h, struct printk_ringbuffer *rb, + unsigned int size); +void prb_commit(struct prb_handle *h); + +/* reader interface */ +void prb_iter_init(struct prb_iterator *iter, struct printk_ringbuffer *rb, + u64 *seq); +void prb_iter_copy(struct prb_iterator *dest, struct prb_iterator *src); +int prb_iter_next(struct prb_iterator *iter, char *buf, int size, u64 *seq); +int prb_iter_wait_next(struct prb_iterator *iter, char *buf, int size, + u64 *seq); +int prb_iter_seek(struct prb_iterator *iter, u64 seq); +int prb_iter_data(struct prb_iterator *iter, char *buf, int size, u64 *seq); + +/* utility functions */ +int prb_buffer_size(struct printk_ringbuffer *rb); +void prb_inc_lost(struct printk_ringbuffer *rb); +void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store); +void prb_unlock(struct prb_cpulock *cpu_lock, unsigned int cpu_store); + +#endif /*_LINUX_PRINTK_RINGBUFFER_H */ Index: linux-5.4.5-rt3/include/linux/radix-tree.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/radix-tree.h +++ linux-5.4.5-rt3/include/linux/radix-tree.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:229 @ unsigned int radix_tree_gang_lookup(cons unsigned int max_items); int radix_tree_preload(gfp_t gfp_mask); int radix_tree_maybe_preload(gfp_t gfp_mask); +void radix_tree_preload_end(void); void radix_tree_init(void); void *radix_tree_tag_set(struct radix_tree_root *, unsigned long index, unsigned int tag); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:247 @ unsigned int radix_tree_gang_lookup_tag_ unsigned int max_items, unsigned int tag); int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag); -static inline void radix_tree_preload_end(void) -{ - preempt_enable(); -} - void __rcu **idr_get_free(struct radix_tree_root *root, struct radix_tree_iter *iter, gfp_t gfp, unsigned long max); Index: linux-5.4.5-rt3/include/linux/random.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/random.h +++ linux-5.4.5-rt3/include/linux/random.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:36 @ static inline void add_latent_entropy(vo extern void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; -extern void 
add_interrupt_randomness(int irq, int irq_flags) __latent_entropy; +extern void add_interrupt_randomness(int irq, int irq_flags, __u64 ip) __latent_entropy; extern void get_random_bytes(void *buf, int nbytes); extern int wait_for_random_bytes(void); Index: linux-5.4.5-rt3/include/linux/ratelimit.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/ratelimit.h +++ linux-5.4.5-rt3/include/linux/ratelimit.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:62 @ static inline void ratelimit_state_exit( return; if (rs->missed) { - pr_warn("%s: %d output lines suppressed due to ratelimiting\n", + pr_info("%s: %d output lines suppressed due to ratelimiting\n", current->comm, rs->missed); rs->missed = 0; } Index: linux-5.4.5-rt3/include/linux/rbtree.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/rbtree.h +++ linux-5.4.5-rt3/include/linux/rbtree.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:22 @ #include <linux/kernel.h> #include <linux/stddef.h> -#include <linux/rcupdate.h> +#include <linux/rcu_assign_pointer.h> struct rb_node { unsigned long __rb_parent_color; Index: linux-5.4.5-rt3/include/linux/rcu_assign_pointer.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/rcu_assign_pointer.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +/* SPDX-License-Identifier: GPL-2.0+ */ +#ifndef __LINUX_RCU_ASSIGN_POINTER_H__ +#define __LINUX_RCU_ASSIGN_POINTER_H__ +#include <linux/compiler.h> +#include <asm/barrier.h> + +#ifdef __CHECKER__ +#define rcu_check_sparse(p, space) \ + ((void)(((typeof(*p) space *)p) == p)) +#else /* #ifdef __CHECKER__ */ +#define rcu_check_sparse(p, space) +#endif /* #else #ifdef __CHECKER__ */ + +/** + * RCU_INITIALIZER() - statically initialize an RCU-protected global variable + * @v: The value to statically initialize with. + */ +#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v) + +/** + * rcu_assign_pointer() - assign to RCU-protected pointer + * @p: pointer to assign to + * @v: value to assign (publish) + * + * Assigns the specified value to the specified RCU-protected + * pointer, ensuring that any concurrent RCU readers will see + * any prior initialization. + * + * Inserts memory barriers on architectures that require them + * (which is most of them), and also prevents the compiler from + * reordering the code that initializes the structure after the pointer + * assignment. More importantly, this call documents which pointers + * will be dereferenced by RCU read-side code. + * + * In some special cases, you may use RCU_INIT_POINTER() instead + * of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due + * to the fact that it does not constrain either the CPU or the compiler. + * That said, using RCU_INIT_POINTER() when you should have used + * rcu_assign_pointer() is a very bad thing that results in + * impossible-to-diagnose memory corruption. So please be careful. + * See the RCU_INIT_POINTER() comment header for details. + * + * Note that rcu_assign_pointer() evaluates each of its arguments only + * once, appearances notwithstanding. One of the "extra" evaluations + * is in typeof() and the other visible only to sparse (__CHECKER__), + * neither of which actually execute the argument. 
As with most cpp + * macros, this execute-arguments-only-once property is important, so + * please be careful when making changes to rcu_assign_pointer() and the + * other macros that it invokes. + */ +#define rcu_assign_pointer(p, v) \ +do { \ + uintptr_t _r_a_p__v = (uintptr_t)(v); \ + rcu_check_sparse(p, __rcu); \ + \ + if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \ + WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \ + else \ + smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \ +} while (0) + +#endif Index: linux-5.4.5-rt3/include/linux/rcupdate.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/rcupdate.h +++ linux-5.4.5-rt3/include/linux/rcupdate.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:32 @ #include <linux/lockdep.h> #include <asm/processor.h> #include <linux/cpumask.h> +#include <linux/rcu_assign_pointer.h> #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b)) #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:55 @ void __rcu_read_unlock(void); * types of kernel builds, the rcu_read_lock() nesting depth is unknowable. */ #define rcu_preempt_depth() (current->rcu_read_lock_nesting) +#ifndef CONFIG_PREEMPT_RT +#define sched_rcu_preempt_depth() rcu_preempt_depth() +#else +static inline int sched_rcu_preempt_depth(void) { return 0; } +#endif #else /* #ifdef CONFIG_PREEMPT_RCU */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:78 @ static inline int rcu_preempt_depth(void return 0; } +#define sched_rcu_preempt_depth() rcu_preempt_depth() + #endif /* #else #ifdef CONFIG_PREEMPT_RCU */ /* Internal to kernel */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:165 @ static inline void exit_tasks_rcu_finish * * This macro resembles cond_resched(), except that it is defined to * report potential quiescent states to RCU-tasks even if the cond_resched() - * machinery were to be shut off, as some advocate for PREEMPT kernels. + * machinery were to be shut off, as some advocate for PREEMPTION kernels. */ #define cond_resched_tasks_rcu_qs() \ do { \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:290 @ static inline void rcu_preempt_sleep_che #define rcu_sleep_check() \ do { \ rcu_preempt_sleep_check(); \ - RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \ + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \ + RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map), \ "Illegal context switch in RCU-bh read-side critical section"); \ RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map), \ "Illegal context switch in RCU-sched read-side critical section"); \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:312 @ static inline void rcu_preempt_sleep_che * (e.g., __srcu), should this make sense in the future. 
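
/*
 * Illustrative sketch (not part of the patch): the publish side that the
 * relocated rcu_assign_pointer() (new <linux/rcu_assign_pointer.h> above)
 * exists for.  "struct foo" and "gp" are hypothetical; readers would pair
 * this with rcu_dereference() under rcu_read_lock(), which remain in
 * <linux/rcupdate.h>.
 */
#include <linux/rcu_assign_pointer.h>

struct foo {
	int a;
};
static struct foo __rcu *gp;

static void publish_foo(struct foo *newp)
{
	newp->a = 42;			/* initialise before publication */
	rcu_assign_pointer(gp, newp);	/* smp_store_release() orders the init */
}
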
*/ -#ifdef __CHECKER__ -#define rcu_check_sparse(p, space) \ - ((void)(((typeof(*p) space *)p) == p)) -#else /* #ifdef __CHECKER__ */ -#define rcu_check_sparse(p, space) -#endif /* #else #ifdef __CHECKER__ */ - #define __rcu_access_pointer(p, space) \ ({ \ typeof(*p) *_________p1 = (typeof(*p) *__force)READ_ONCE(p); \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:340 @ static inline void rcu_preempt_sleep_che }) /** - * RCU_INITIALIZER() - statically initialize an RCU-protected global variable - * @v: The value to statically initialize with. - */ -#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v) - -/** - * rcu_assign_pointer() - assign to RCU-protected pointer - * @p: pointer to assign to - * @v: value to assign (publish) - * - * Assigns the specified value to the specified RCU-protected - * pointer, ensuring that any concurrent RCU readers will see - * any prior initialization. - * - * Inserts memory barriers on architectures that require them - * (which is most of them), and also prevents the compiler from - * reordering the code that initializes the structure after the pointer - * assignment. More importantly, this call documents which pointers - * will be dereferenced by RCU read-side code. - * - * In some special cases, you may use RCU_INIT_POINTER() instead - * of rcu_assign_pointer(). RCU_INIT_POINTER() is a bit faster due - * to the fact that it does not constrain either the CPU or the compiler. - * That said, using RCU_INIT_POINTER() when you should have used - * rcu_assign_pointer() is a very bad thing that results in - * impossible-to-diagnose memory corruption. So please be careful. - * See the RCU_INIT_POINTER() comment header for details. - * - * Note that rcu_assign_pointer() evaluates each of its arguments only - * once, appearances notwithstanding. One of the "extra" evaluations - * is in typeof() and the other visible only to sparse (__CHECKER__), - * neither of which actually execute the argument. As with most cpp - * macros, this execute-arguments-only-once property is important, so - * please be careful when making changes to rcu_assign_pointer() and the - * other macros that it invokes. - */ -#define rcu_assign_pointer(p, v) \ -do { \ - uintptr_t _r_a_p__v = (uintptr_t)(v); \ - rcu_check_sparse(p, __rcu); \ - \ - if (__builtin_constant_p(v) && (_r_a_p__v) == (uintptr_t)NULL) \ - WRITE_ONCE((p), (typeof(p))(_r_a_p__v)); \ - else \ - smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \ -} while (0) - -/** * rcu_swap_protected() - swap an RCU and a regular pointer * @rcu_ptr: RCU pointer * @ptr: regular pointer @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:537 @ do { \ * * You can avoid reading and understanding the next paragraph by * following this rule: don't put anything in an rcu_read_lock() RCU - * read-side critical section that would block in a !PREEMPT kernel. + * read-side critical section that would block in a !PREEMPTION kernel. * But if you want the full story, read on! 
* * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), Index: linux-5.4.5-rt3/include/linux/rtmutex.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/rtmutex.h +++ linux-5.4.5-rt3/include/linux/rtmutex.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:17 @ #define __LINUX_RT_MUTEX_H #include <linux/linkage.h> +#include <linux/spinlock_types_raw.h> #include <linux/rbtree.h> -#include <linux/spinlock_types.h> extern int max_lock_depth; /* for sysctl */ +#ifdef CONFIG_DEBUG_MUTEXES +#include <linux/debug_locks.h> +#endif + /** * The rt_mutex structure * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:38 @ struct rt_mutex { raw_spinlock_t wait_lock; struct rb_root_cached waiters; struct task_struct *owner; -#ifdef CONFIG_DEBUG_RT_MUTEXES int save_state; +#ifdef CONFIG_DEBUG_RT_MUTEXES const char *name, *file; int line; void *magic; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:89 @ do { \ #define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname) #endif -#define __RT_MUTEX_INITIALIZER(mutexname) \ - { .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \ +#define __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \ + .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \ , .waiters = RB_ROOT_CACHED \ , .owner = NULL \ __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \ - __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)} + __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname) + +#define __RT_MUTEX_INITIALIZER(mutexname) \ + { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) } #define DEFINE_RT_MUTEX(mutexname) \ struct rt_mutex mutexname = __RT_MUTEX_INITIALIZER(mutexname) +#define __RT_MUTEX_INITIALIZER_SAVE_STATE(mutexname) \ + { __RT_MUTEX_INITIALIZER_PLAIN(mutexname) \ + , .save_state = 1 } + /** * rt_mutex_is_locked - is the mutex locked * @lock: the mutex to be queried @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:129 @ extern void rt_mutex_lock(struct rt_mute #endif extern int rt_mutex_lock_interruptible(struct rt_mutex *lock); +extern int rt_mutex_lock_killable(struct rt_mutex *lock); extern int rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout); Index: linux-5.4.5-rt3/include/linux/rwlock_rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/rwlock_rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_RWLOCK_RT_H +#define __LINUX_RWLOCK_RT_H + +#ifndef __LINUX_SPINLOCK_H +#error Do not include directly. 
Use spinlock.h +#endif + +extern void __lockfunc rt_write_lock(rwlock_t *rwlock); +extern void __lockfunc rt_read_lock(rwlock_t *rwlock); +extern int __lockfunc rt_write_trylock(rwlock_t *rwlock); +extern int __lockfunc rt_read_trylock(rwlock_t *rwlock); +extern void __lockfunc rt_write_unlock(rwlock_t *rwlock); +extern void __lockfunc rt_read_unlock(rwlock_t *rwlock); +extern int __lockfunc rt_read_can_lock(rwlock_t *rwlock); +extern int __lockfunc rt_write_can_lock(rwlock_t *rwlock); +extern void __rt_rwlock_init(rwlock_t *rwlock, char *name, struct lock_class_key *key); + +#define read_can_lock(rwlock) rt_read_can_lock(rwlock) +#define write_can_lock(rwlock) rt_write_can_lock(rwlock) + +#define read_trylock(lock) __cond_lock(lock, rt_read_trylock(lock)) +#define write_trylock(lock) __cond_lock(lock, rt_write_trylock(lock)) + +static inline int __write_trylock_rt_irqsave(rwlock_t *lock, unsigned long *flags) +{ + /* XXX ARCH_IRQ_ENABLED */ + *flags = 0; + return rt_write_trylock(lock); +} + +#define write_trylock_irqsave(lock, flags) \ + __cond_lock(lock, __write_trylock_rt_irqsave(lock, &(flags))) + +#define read_lock_irqsave(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + rt_read_lock(lock); \ + flags = 0; \ + } while (0) + +#define write_lock_irqsave(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + rt_write_lock(lock); \ + flags = 0; \ + } while (0) + +#define read_lock(lock) rt_read_lock(lock) + +#define read_lock_bh(lock) \ + do { \ + local_bh_disable(); \ + rt_read_lock(lock); \ + } while (0) + +#define read_lock_irq(lock) read_lock(lock) + +#define write_lock(lock) rt_write_lock(lock) + +#define write_lock_bh(lock) \ + do { \ + local_bh_disable(); \ + rt_write_lock(lock); \ + } while (0) + +#define write_lock_irq(lock) write_lock(lock) + +#define read_unlock(lock) rt_read_unlock(lock) + +#define read_unlock_bh(lock) \ + do { \ + rt_read_unlock(lock); \ + local_bh_enable(); \ + } while (0) + +#define read_unlock_irq(lock) read_unlock(lock) + +#define write_unlock(lock) rt_write_unlock(lock) + +#define write_unlock_bh(lock) \ + do { \ + rt_write_unlock(lock); \ + local_bh_enable(); \ + } while (0) + +#define write_unlock_irq(lock) write_unlock(lock) + +#define read_unlock_irqrestore(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + (void) flags; \ + rt_read_unlock(lock); \ + } while (0) + +#define write_unlock_irqrestore(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + (void) flags; \ + rt_write_unlock(lock); \ + } while (0) + +#define rwlock_init(rwl) \ +do { \ + static struct lock_class_key __key; \ + \ + __rt_rwlock_init(rwl, #rwl, &__key); \ +} while (0) + +/* + * Internal functions made global for CPU pinning + */ +void __read_rt_lock(struct rt_rw_lock *lock); +int __read_rt_trylock(struct rt_rw_lock *lock); +void __write_rt_lock(struct rt_rw_lock *lock); +int __write_rt_trylock(struct rt_rw_lock *lock); +void __read_rt_unlock(struct rt_rw_lock *lock); +void __write_rt_unlock(struct rt_rw_lock *lock); + +#endif Index: linux-5.4.5-rt3/include/linux/rwlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/rwlock_types.h +++ linux-5.4.5-rt3/include/linux/rwlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ #ifndef __LINUX_RWLOCK_TYPES_H #define __LINUX_RWLOCK_TYPES_H +#if !defined(__LINUX_SPINLOCK_TYPES_H) +# error "Do not include directly, include spinlock_types.h" +#endif + /* * 
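
/*
 * Illustrative sketch (not part of the patch): with the RT rwlock mappings
 * above, existing users keep compiling unchanged.  On PREEMPT_RT the lock
 * is a sleeping, reader-biased rt_rw_lock, interrupts are not disabled,
 * and "flags" is only type-checked.  "example_lock" is hypothetical;
 * rwlock_rt.h itself is pulled in via <linux/spinlock.h>.
 */
#include <linux/spinlock.h>

static DEFINE_RWLOCK(example_lock);

static void example_reader(void)
{
	unsigned long flags;

	read_lock_irqsave(&example_lock, flags);
	/* read-side section; may be preempted on PREEMPT_RT */
	read_unlock_irqrestore(&example_lock, flags);
}
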
include/linux/rwlock_types.h - generic rwlock type definitions * and initializers Index: linux-5.4.5-rt3/include/linux/rwlock_types_rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/rwlock_types_rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_RWLOCK_TYPES_RT_H +#define __LINUX_RWLOCK_TYPES_RT_H + +#ifndef __LINUX_SPINLOCK_TYPES_H +#error "Do not include directly. Include spinlock_types.h instead" +#endif + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define RW_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } +#else +# define RW_DEP_MAP_INIT(lockname) +#endif + +typedef struct rt_rw_lock rwlock_t; + +#define __RW_LOCK_UNLOCKED(name) __RWLOCK_RT_INITIALIZER(name) + +#define DEFINE_RWLOCK(name) \ + rwlock_t name = __RW_LOCK_UNLOCKED(name) + +/* + * A reader biased implementation primarily for CPU pinning. + * + * Can be selected as general replacement for the single reader RT rwlock + * variant + */ +struct rt_rw_lock { + struct rt_mutex rtmutex; + atomic_t readers; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +}; + +#define READER_BIAS (1U << 31) +#define WRITER_BIAS (1U << 30) + +#define __RWLOCK_RT_INITIALIZER(name) \ +{ \ + .readers = ATOMIC_INIT(READER_BIAS), \ + .rtmutex = __RT_MUTEX_INITIALIZER_SAVE_STATE(name.rtmutex), \ + RW_DEP_MAP_INIT(name) \ +} + +void __rwlock_biased_rt_init(struct rt_rw_lock *lock, const char *name, + struct lock_class_key *key); + +#define rwlock_biased_rt_init(rwlock) \ + do { \ + static struct lock_class_key __key; \ + \ + __rwlock_biased_rt_init((rwlock), #rwlock, &__key); \ + } while (0) + +#endif Index: linux-5.4.5-rt3/include/linux/rwsem-rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/rwsem-rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef _LINUX_RWSEM_RT_H +#define _LINUX_RWSEM_RT_H + +#ifndef _LINUX_RWSEM_H +#error "Include rwsem.h" +#endif + +#include <linux/rtmutex.h> +#include <linux/swait.h> + +#define READER_BIAS (1U << 31) +#define WRITER_BIAS (1U << 30) + +struct rw_semaphore { + atomic_t readers; + struct rt_mutex rtmutex; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +}; + +#define __RWSEM_INITIALIZER(name) \ +{ \ + .readers = ATOMIC_INIT(READER_BIAS), \ + .rtmutex = __RT_MUTEX_INITIALIZER(name.rtmutex), \ + RW_DEP_MAP_INIT(name) \ +} + +#define DECLARE_RWSEM(lockname) \ + struct rw_semaphore lockname = __RWSEM_INITIALIZER(lockname) + +extern void __rwsem_init(struct rw_semaphore *rwsem, const char *name, + struct lock_class_key *key); + +#define __init_rwsem(sem, name, key) \ +do { \ + rt_mutex_init(&(sem)->rtmutex); \ + __rwsem_init((sem), (name), (key)); \ +} while (0) + +#define init_rwsem(sem) \ +do { \ + static struct lock_class_key __key; \ + \ + __init_rwsem((sem), #sem, &__key); \ +} while (0) + +static inline int rwsem_is_locked(struct rw_semaphore *sem) +{ + return atomic_read(&sem->readers) != READER_BIAS; +} + +static inline int rwsem_is_contended(struct rw_semaphore *sem) +{ + return atomic_read(&sem->readers) > 0; +} + +extern void __down_read(struct rw_semaphore *sem); +extern int __down_read_killable(struct rw_semaphore *sem); +extern int __down_read_trylock(struct rw_semaphore *sem); +extern void __down_write(struct rw_semaphore *sem); +extern int __must_check __down_write_killable(struct rw_semaphore 
*sem); +extern int __down_write_trylock(struct rw_semaphore *sem); +extern void __up_read(struct rw_semaphore *sem); +extern void __up_write(struct rw_semaphore *sem); +extern void __downgrade_write(struct rw_semaphore *sem); + +#endif Index: linux-5.4.5-rt3/include/linux/rwsem.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/rwsem.h +++ linux-5.4.5-rt3/include/linux/rwsem.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:19 @ #include <linux/spinlock.h> #include <linux/atomic.h> #include <linux/err.h> + +#ifdef CONFIG_PREEMPT_RT +#include <linux/rwsem-rt.h> +#else /* PREEMPT_RT */ + #ifdef CONFIG_RWSEM_SPIN_ON_OWNER #include <linux/osq_lock.h> #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ struct rw_semaphore { #endif }; -/* - * Setting all bits of the owner field except bit 0 will indicate - * that the rwsem is writer-owned with an unknown owner. - */ -#define RWSEM_OWNER_UNKNOWN (-2L) - /* In all implementations count != 0 means locked */ static inline int rwsem_is_locked(struct rw_semaphore *sem) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:123 @ static inline int rwsem_is_contended(str return !list_empty(&sem->wait_list); } +#endif /* !PREEMPT_RT */ + +/* + * The functions below are the same for all rwsem implementations including + * the RT specific variant. + */ + /* * lock for reading */ Index: linux-5.4.5-rt3/include/linux/sched.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/sched.h +++ linux-5.4.5-rt3/include/linux/sched.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:34 @ #include <linux/task_io_accounting.h> #include <linux/posix-timers.h> #include <linux/rseq.h> +#include <asm/kmap_types.h> /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:111 @ struct task_group; __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \ TASK_PARKED) -#define task_is_traced(task) ((task->state & __TASK_TRACED) != 0) - #define task_is_stopped(task) ((task->state & __TASK_STOPPED) != 0) -#define task_is_stopped_or_traced(task) ((task->state & (__TASK_STOPPED | __TASK_TRACED)) != 0) - #define task_contributes_to_load(task) ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ (task->flags & PF_FROZEN) == 0 && \ (task->state & TASK_NOLOAD) == 0) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:140 @ struct task_group; smp_store_mb(current->state, (state_value)); \ } while (0) +#define __set_current_state_no_track(state_value) \ + current->state = (state_value); + #define set_special_state(state_value) \ do { \ unsigned long flags; /* may shadow */ \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:152 @ struct task_group; current->state = (state_value); \ raw_spin_unlock_irqrestore(¤t->pi_lock, flags); \ } while (0) + #else /* * set_current_state() includes a barrier so that the write of current->state @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:197 @ struct task_group; #define set_current_state(state_value) \ smp_store_mb(current->state, (state_value)) +#define __set_current_state_no_track(state_value) \ + 
__set_current_state(state_value) + /* * set_special_state() should be used for those states when the blocking task * can not use the regular condition based wait-loop. In that case we must @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:237 @ extern void io_schedule_finish(int token extern long io_schedule_timeout(long timeout); extern void io_schedule(void); +int cpu_nr_pinned(int cpu); + /** * struct prev_cputime - snapshot of system and user cputime * @utime: time spent in user mode @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:640 @ struct task_struct { #endif /* -1 unrunnable, 0 runnable, >0 stopped: */ volatile long state; + /* saved state for "spinlock sleepers" */ + volatile long saved_state; /* * This begins the randomizable portion of task_struct. Only @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:711 @ struct task_struct { int nr_cpus_allowed; const cpumask_t *cpus_ptr; cpumask_t cpus_mask; +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) + int migrate_disable; + bool migrate_disable_scheduled; +# ifdef CONFIG_SCHED_DEBUG + int pinned_on_cpu; +# endif +#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +# ifdef CONFIG_SCHED_DEBUG + int migrate_disable; +# endif +#endif +#ifdef CONFIG_PREEMPT_RT + int sleeping_lock; +#endif #ifdef CONFIG_PREEMPT_RCU int rcu_read_lock_nesting; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:938 @ struct task_struct { /* Signal handlers: */ struct signal_struct *signal; struct sighand_struct *sighand; + struct sigqueue *sigqueue_cache; + sigset_t blocked; sigset_t real_blocked; /* Restored if set_restore_sigmask() was used: */ sigset_t saved_sigmask; struct sigpending pending; +#ifdef CONFIG_PREEMPT_RT + /* TODO: move me into ->restart_block ? 
*/ + struct kernel_siginfo forced_info; +#endif unsigned long sas_ss_sp; size_t sas_ss_size; unsigned int sas_ss_flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:975 @ struct task_struct { raw_spinlock_t pi_lock; struct wake_q_node wake_q; + struct wake_q_node wake_q_sleeper; #ifdef CONFIG_RT_MUTEXES /* PI waiters blocked on a rt_mutex held by this task: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1010 @ struct task_struct { int softirqs_enabled; int softirq_context; #endif +#ifdef CONFIG_PREEMPT_RT + int softirq_count; +#endif #ifdef CONFIG_LOCKDEP # define MAX_LOCK_DEPTH 48UL @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1276 @ struct task_struct { unsigned int sequential_io; unsigned int sequential_io_avg; #endif +#ifdef CONFIG_PREEMPT_RT +# if defined CONFIG_HIGHMEM || defined CONFIG_X86_32 + int kmap_idx; + pte_t kmap_pte[KM_TYPE_NR]; +# endif +#endif #ifdef CONFIG_DEBUG_ATOMIC_SLEEP unsigned long task_state_change; #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1713 @ extern struct task_struct *find_get_task extern int wake_up_state(struct task_struct *tsk, unsigned int state); extern int wake_up_process(struct task_struct *tsk); +extern int wake_up_lock_sleeper(struct task_struct *tsk); extern void wake_up_new_task(struct task_struct *tsk); #ifdef CONFIG_SMP @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1796 @ static inline int test_tsk_need_resched( return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED)); } +#ifdef CONFIG_PREEMPT_LAZY +static inline void set_tsk_need_resched_lazy(struct task_struct *tsk) +{ + set_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY); +} + +static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk) +{ + clear_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY); +} + +static inline int test_tsk_need_resched_lazy(struct task_struct *tsk) +{ + return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED_LAZY)); +} + +static inline int need_resched_lazy(void) +{ + return test_thread_flag(TIF_NEED_RESCHED_LAZY); +} + +static inline int need_resched_now(void) +{ + return test_thread_flag(TIF_NEED_RESCHED); +} + +#else +static inline void clear_tsk_need_resched_lazy(struct task_struct *tsk) { } +static inline int need_resched_lazy(void) { return 0; } + +static inline int need_resched_now(void) +{ + return test_thread_flag(TIF_NEED_RESCHED); +} + +#endif + + +static inline bool __task_is_stopped_or_traced(struct task_struct *task) +{ + if (task->state & (__TASK_STOPPED | __TASK_TRACED)) + return true; +#ifdef CONFIG_PREEMPT_RT + if (task->saved_state & (__TASK_STOPPED | __TASK_TRACED)) + return true; +#endif + return false; +} + +static inline bool task_is_stopped_or_traced(struct task_struct *task) +{ + bool traced_stopped; + +#ifdef CONFIG_PREEMPT_RT + unsigned long flags; + + raw_spin_lock_irqsave(&task->pi_lock, flags); + traced_stopped = __task_is_stopped_or_traced(task); + raw_spin_unlock_irqrestore(&task->pi_lock, flags); +#else + traced_stopped = __task_is_stopped_or_traced(task); +#endif + return traced_stopped; +} + +static inline bool task_is_traced(struct task_struct *task) +{ + bool traced = false; + + if (task->state & __TASK_TRACED) + return true; +#ifdef CONFIG_PREEMPT_RT + /* in case the task is sleeping on tasklist_lock */ + raw_spin_lock_irq(&task->pi_lock); + if (task->state & 
__TASK_TRACED) + traced = true; + else if (task->saved_state & __TASK_TRACED) + traced = true; + raw_spin_unlock_irq(&task->pi_lock); +#endif + return traced; +} + /* * cond_resched() and cond_resched_lock(): latency reduction via * explicit rescheduling in places that are safe. The return @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1931 @ static __always_inline bool need_resched return unlikely(tif_need_resched()); } +#ifdef CONFIG_PREEMPT_RT +static inline void sleeping_lock_inc(void) +{ + current->sleeping_lock++; +} + +static inline void sleeping_lock_dec(void) +{ + current->sleeping_lock--; +} + +#else + +static inline void sleeping_lock_inc(void) { } +static inline void sleeping_lock_dec(void) { } +#endif + /* * Wrappers for p->thread_info->cpu access. No-op on UP. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2139 @ int sched_trace_rq_cpu(struct rq *rq); const struct cpumask *sched_trace_rd_span(struct root_domain *rd); +extern struct task_struct *takedown_cpu_task; + #endif Index: linux-5.4.5-rt3/include/linux/sched/mm.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/sched/mm.h +++ linux-5.4.5-rt3/include/linux/sched/mm.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:52 @ static inline void mmdrop(struct mm_stru __mmdrop(mm); } +#ifdef CONFIG_PREEMPT_RT +extern void __mmdrop_delayed(struct rcu_head *rhp); +static inline void mmdrop_delayed(struct mm_struct *mm) +{ + if (atomic_dec_and_test(&mm->mm_count)) + call_rcu(&mm->delayed_drop, __mmdrop_delayed); +} +#else +# define mmdrop_delayed(mm) mmdrop(mm) +#endif + /* * This has to be called after a get_task_mm()/mmget_not_zero() * followed by taking the mmap_sem for writing before modifying the Index: linux-5.4.5-rt3/include/linux/sched/wake_q.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/sched/wake_q.h +++ linux-5.4.5-rt3/include/linux/sched/wake_q.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ static inline bool wake_q_empty(struct w extern void wake_q_add(struct wake_q_head *head, struct task_struct *task); extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task); -extern void wake_up_q(struct wake_q_head *head); +extern void wake_q_add_sleeper(struct wake_q_head *head, struct task_struct *task); +extern void __wake_up_q(struct wake_q_head *head, bool sleeper); + +static inline void wake_up_q(struct wake_q_head *head) +{ + __wake_up_q(head, false); +} + +static inline void wake_up_q_sleeper(struct wake_q_head *head) +{ + __wake_up_q(head, true); +} #endif /* _LINUX_SCHED_WAKE_Q_H */ Index: linux-5.4.5-rt3/include/linux/seqlock.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/seqlock.h +++ linux-5.4.5-rt3/include/linux/seqlock.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:224 @ static inline int read_seqcount_retry(co return __read_seqcount_retry(s, start); } - - -static inline void raw_write_seqcount_begin(seqcount_t *s) +static inline void __raw_write_seqcount_begin(seqcount_t *s) { s->sequence++; smp_wmb(); } -static inline void raw_write_seqcount_end(seqcount_t *s) +static inline void raw_write_seqcount_begin(seqcount_t *s) +{ + preempt_disable_rt(); + 
__raw_write_seqcount_begin(s); +} + +static inline void __raw_write_seqcount_end(seqcount_t *s) { smp_wmb(); s->sequence++; } +static inline void raw_write_seqcount_end(seqcount_t *s) +{ + __raw_write_seqcount_end(s); + preempt_enable_rt(); +} + /** * raw_write_seqcount_barrier - do a seq write barrier * @s: pointer to seqcount_t @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:441 @ typedef struct { /* * Read side functions for starting and finalizing a read side section. */ +#ifndef CONFIG_PREEMPT_RT static inline unsigned read_seqbegin(const seqlock_t *sl) { return read_seqcount_begin(&sl->seqcount); } +#else +/* + * Starvation safe read side for RT + */ +static inline unsigned read_seqbegin(seqlock_t *sl) +{ + unsigned ret; + +repeat: + ret = READ_ONCE(sl->seqcount.sequence); + if (unlikely(ret & 1)) { + /* + * Take the lock and let the writer proceed (i.e. evtl + * boost it), otherwise we could loop here forever. + */ + spin_unlock_wait(&sl->lock); + goto repeat; + } + smp_rmb(); + return ret; +} +#endif static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:482 @ static inline unsigned read_seqretry(con static inline void write_seqlock(seqlock_t *sl) { spin_lock(&sl->lock); - write_seqcount_begin(&sl->seqcount); + __raw_write_seqcount_begin(&sl->seqcount); +} + +static inline int try_write_seqlock(seqlock_t *sl) +{ + if (spin_trylock(&sl->lock)) { + __raw_write_seqcount_begin(&sl->seqcount); + return 1; + } + return 0; } static inline void write_sequnlock(seqlock_t *sl) { - write_seqcount_end(&sl->seqcount); + __raw_write_seqcount_end(&sl->seqcount); spin_unlock(&sl->lock); } static inline void write_seqlock_bh(seqlock_t *sl) { spin_lock_bh(&sl->lock); - write_seqcount_begin(&sl->seqcount); + __raw_write_seqcount_begin(&sl->seqcount); } static inline void write_sequnlock_bh(seqlock_t *sl) { - write_seqcount_end(&sl->seqcount); + __raw_write_seqcount_end(&sl->seqcount); spin_unlock_bh(&sl->lock); } static inline void write_seqlock_irq(seqlock_t *sl) { spin_lock_irq(&sl->lock); - write_seqcount_begin(&sl->seqcount); + __raw_write_seqcount_begin(&sl->seqcount); } static inline void write_sequnlock_irq(seqlock_t *sl) { - write_seqcount_end(&sl->seqcount); + __raw_write_seqcount_end(&sl->seqcount); spin_unlock_irq(&sl->lock); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:529 @ static inline unsigned long __write_seql unsigned long flags; spin_lock_irqsave(&sl->lock, flags); - write_seqcount_begin(&sl->seqcount); + __raw_write_seqcount_begin(&sl->seqcount); return flags; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:539 @ static inline unsigned long __write_seql static inline void write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags) { - write_seqcount_end(&sl->seqcount); + __raw_write_seqcount_end(&sl->seqcount); spin_unlock_irqrestore(&sl->lock, flags); } Index: linux-5.4.5-rt3/include/linux/serial_8250.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/serial_8250.h +++ linux-5.4.5-rt3/include/linux/serial_8250.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:10 @ #ifndef _LINUX_SERIAL_8250_H #define _LINUX_SERIAL_8250_H +#include <linux/atomic.h> #include <linux/serial_core.h> #include <linux/serial_reg.h> 
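
/*
 * Illustrative sketch (not part of the patch): the usual seqlock read/write
 * pattern is unchanged by the seqlock.h rework above; only the
 * implementation of read_seqbegin()/write_seqlock() differs on PREEMPT_RT.
 * The lock and the two integers are hypothetical.
 */
#include <linux/seqlock.h>

static DEFINE_SEQLOCK(example_sl);
static int example_a, example_b;

static void example_read(int *a, int *b)
{
	unsigned int seq;

	do {
		seq = read_seqbegin(&example_sl);
		*a = example_a;
		*b = example_b;
	} while (read_seqretry(&example_sl, seq));
}

static void example_write(int a, int b)
{
	write_seqlock(&example_sl);
	example_a = a;
	example_b = b;
	write_sequnlock(&example_sl);
}
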
#include <linux/platform_device.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:127 @ struct uart_8250_port { #define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA unsigned char msr_saved_flags; + atomic_t console_printing; + struct uart_8250_dma *dma; const struct uart_8250_ops *ops; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:180 @ void serial8250_init_port(struct uart_82 void serial8250_set_defaults(struct uart_8250_port *up); void serial8250_console_write(struct uart_8250_port *up, const char *s, unsigned int count); +void serial8250_console_write_atomic(struct uart_8250_port *up, const char *s, + unsigned int count); int serial8250_console_setup(struct uart_port *port, char *options, bool probe); extern void serial8250_set_isa_configurator(void (*v) Index: linux-5.4.5-rt3/include/linux/signal.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/signal.h +++ linux-5.4.5-rt3/include/linux/signal.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:258 @ static inline void init_sigpending(struc } extern void flush_sigqueue(struct sigpending *queue); +extern void flush_task_sigqueue(struct task_struct *tsk); /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */ static inline int valid_signal(unsigned long sig) Index: linux-5.4.5-rt3/include/linux/skbuff.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/skbuff.h +++ linux-5.4.5-rt3/include/linux/skbuff.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:296 @ struct sk_buff_head { __u32 qlen; spinlock_t lock; + raw_spinlock_t raw_lock; }; struct sk_buff; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1848 @ static inline void skb_queue_head_init(s __skb_queue_head_init(list); } +static inline void skb_queue_head_init_raw(struct sk_buff_head *list) +{ + raw_spin_lock_init(&list->raw_lock); + __skb_queue_head_init(list); +} + static inline void skb_queue_head_init_class(struct sk_buff_head *list, struct lock_class_key *class) { Index: linux-5.4.5-rt3/include/linux/smp.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/smp.h +++ linux-5.4.5-rt3/include/linux/smp.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:18 @ #include <linux/llist.h> typedef void (*smp_call_func_t)(void *info); +typedef bool (*smp_cond_func_t)(int cpu, void *info); struct __call_single_data { struct llist_node llist; smp_call_func_t func; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:53 @ void on_each_cpu_mask(const struct cpuma * cond_func returns a positive value. This may include the local * processor. 
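
/*
 * Illustrative sketch (not part of the patch): a caller of the reworked
 * on_each_cpu_cond() declared just below, which no longer takes a
 * gfp_flags argument.  Both callbacks and their bodies are hypothetical.
 */
#include <linux/smp.h>

static bool example_cond(int cpu, void *info)
{
	return true;		/* run example_func() on every online CPU */
}

static void example_func(void *info)
{
	/* per-CPU work; typically runs in IPI context */
}

static void example_kick_cpus(void)
{
	on_each_cpu_cond(example_cond, example_func, NULL, true);
}
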
*/ -void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags); - -void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags, const struct cpumask *mask); +void on_each_cpu_cond(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait); + +void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait, const struct cpumask *mask); int smp_call_function_single_async(int cpu, call_single_data_t *csd); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:224 @ static inline int get_boot_cpu_id(void) #define get_cpu() ({ preempt_disable(); __smp_processor_id(); }) #define put_cpu() preempt_enable() +#define get_cpu_light() ({ migrate_disable(); __smp_processor_id(); }) +#define put_cpu_light() migrate_enable() + /* * Callback to arch code if there's nosmp or maxcpus=0 on the * boot command line: Index: linux-5.4.5-rt3/include/linux/spinlock.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/spinlock.h +++ linux-5.4.5-rt3/include/linux/spinlock.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:310 @ static inline void do_raw_spin_unlock(ra }) /* Include rwlock functions */ -#include <linux/rwlock.h> +#ifdef CONFIG_PREEMPT_RT +# include <linux/rwlock_rt.h> +#else +# include <linux/rwlock.h> +#endif /* * Pull the _spin_*()/_read_*()/_write_*() functions/declarations: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:325 @ static inline void do_raw_spin_unlock(ra # include <linux/spinlock_api_up.h> #endif +#ifdef CONFIG_PREEMPT_RT +# include <linux/spinlock_rt.h> +#else /* PREEMPT_RT */ + /* * Map the spin_lock functions to the raw variants for PREEMPT_RT=n */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:449 @ static __always_inline int spin_is_conte #define assert_spin_locked(lock) assert_raw_spin_locked(&(lock)->rlock) +#endif /* !PREEMPT_RT */ + /* * Pull the atomic_t declaration: * (asm-mips/atomic.h needs above definitions) Index: linux-5.4.5-rt3/include/linux/spinlock_api_smp.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/spinlock_api_smp.h +++ linux-5.4.5-rt3/include/linux/spinlock_api_smp.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:190 @ static inline int __raw_spin_trylock_bh( return 0; } -#include <linux/rwlock_api_smp.h> +#ifndef CONFIG_PREEMPT_RT +# include <linux/rwlock_api_smp.h> +#endif #endif /* __LINUX_SPINLOCK_API_SMP_H */ Index: linux-5.4.5-rt3/include/linux/spinlock_rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/spinlock_rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_SPINLOCK_RT_H +#define __LINUX_SPINLOCK_RT_H + +#ifndef __LINUX_SPINLOCK_H +#error Do not include directly. 
Use spinlock.h +#endif + +#include <linux/bug.h> + +extern void +__rt_spin_lock_init(spinlock_t *lock, const char *name, struct lock_class_key *key); + +#define spin_lock_init(slock) \ +do { \ + static struct lock_class_key __key; \ + \ + rt_mutex_init(&(slock)->lock); \ + __rt_spin_lock_init(slock, #slock, &__key); \ +} while (0) + +extern void __lockfunc rt_spin_lock(spinlock_t *lock); +extern unsigned long __lockfunc rt_spin_lock_trace_flags(spinlock_t *lock); +extern void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass); +extern void __lockfunc rt_spin_unlock(spinlock_t *lock); +extern void __lockfunc rt_spin_unlock_wait(spinlock_t *lock); +extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags); +extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock); +extern int __lockfunc rt_spin_trylock(spinlock_t *lock); +extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock); + +/* + * lockdep-less calls, for derived types like rwlock: + * (for trylock they can use rt_mutex_trylock() directly. + * Migrate disable handling must be done at the call site. + */ +extern void __lockfunc __rt_spin_lock(struct rt_mutex *lock); +extern void __lockfunc __rt_spin_trylock(struct rt_mutex *lock); +extern void __lockfunc __rt_spin_unlock(struct rt_mutex *lock); + +#define spin_lock(lock) rt_spin_lock(lock) + +#define spin_lock_bh(lock) \ + do { \ + local_bh_disable(); \ + rt_spin_lock(lock); \ + } while (0) + +#define spin_lock_irq(lock) spin_lock(lock) + +#define spin_do_trylock(lock) __cond_lock(lock, rt_spin_trylock(lock)) + +#define spin_trylock(lock) \ +({ \ + int __locked; \ + __locked = spin_do_trylock(lock); \ + __locked; \ +}) + +#ifdef CONFIG_LOCKDEP +# define spin_lock_nested(lock, subclass) \ + do { \ + rt_spin_lock_nested(lock, subclass); \ + } while (0) + +#define spin_lock_bh_nested(lock, subclass) \ + do { \ + local_bh_disable(); \ + rt_spin_lock_nested(lock, subclass); \ + } while (0) + +# define spin_lock_irqsave_nested(lock, flags, subclass) \ + do { \ + typecheck(unsigned long, flags); \ + flags = 0; \ + rt_spin_lock_nested(lock, subclass); \ + } while (0) +#else +# define spin_lock_nested(lock, subclass) spin_lock(lock) +# define spin_lock_bh_nested(lock, subclass) spin_lock_bh(lock) + +# define spin_lock_irqsave_nested(lock, flags, subclass) \ + do { \ + typecheck(unsigned long, flags); \ + flags = 0; \ + spin_lock(lock); \ + } while (0) +#endif + +#define spin_lock_irqsave(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + flags = 0; \ + spin_lock(lock); \ + } while (0) + +static inline unsigned long spin_lock_trace_flags(spinlock_t *lock) +{ + unsigned long flags = 0; +#ifdef CONFIG_TRACE_IRQFLAGS + flags = rt_spin_lock_trace_flags(lock); +#else + spin_lock(lock); /* lock_local */ +#endif + return flags; +} + +/* FIXME: we need rt_spin_lock_nest_lock */ +#define spin_lock_nest_lock(lock, nest_lock) spin_lock_nested(lock, 0) + +#define spin_unlock(lock) rt_spin_unlock(lock) + +#define spin_unlock_bh(lock) \ + do { \ + rt_spin_unlock(lock); \ + local_bh_enable(); \ + } while (0) + +#define spin_unlock_irq(lock) spin_unlock(lock) + +#define spin_unlock_irqrestore(lock, flags) \ + do { \ + typecheck(unsigned long, flags); \ + (void) flags; \ + spin_unlock(lock); \ + } while (0) + +#define spin_trylock_bh(lock) __cond_lock(lock, rt_spin_trylock_bh(lock)) +#define spin_trylock_irq(lock) spin_trylock(lock) + +#define spin_trylock_irqsave(lock, flags) \ + rt_spin_trylock_irqsave(lock, &(flags)) + +#define 
spin_unlock_wait(lock) rt_spin_unlock_wait(lock) + +#ifdef CONFIG_GENERIC_LOCKBREAK +# define spin_is_contended(lock) ((lock)->break_lock) +#else +# define spin_is_contended(lock) (((void)(lock), 0)) +#endif + +static inline int spin_can_lock(spinlock_t *lock) +{ + return !rt_mutex_is_locked(&lock->lock); +} + +static inline int spin_is_locked(spinlock_t *lock) +{ + return rt_mutex_is_locked(&lock->lock); +} + +static inline void assert_spin_locked(spinlock_t *lock) +{ + BUG_ON(!spin_is_locked(lock)); +} + +#endif Index: linux-5.4.5-rt3/include/linux/spinlock_types.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/spinlock_types.h +++ linux-5.4.5-rt3/include/linux/spinlock_types.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:12 @ * Released under the General Public License (GPL). */ -#if defined(CONFIG_SMP) -# include <asm/spinlock_types.h> -#else -# include <linux/spinlock_types_up.h> -#endif - -#include <linux/lockdep.h> - -typedef struct raw_spinlock { - arch_spinlock_t raw_lock; -#ifdef CONFIG_DEBUG_SPINLOCK - unsigned int magic, owner_cpu; - void *owner; -#endif -#ifdef CONFIG_DEBUG_LOCK_ALLOC - struct lockdep_map dep_map; -#endif -} raw_spinlock_t; - -#define SPINLOCK_MAGIC 0xdead4ead - -#define SPINLOCK_OWNER_INIT ((void *)-1L) - -#ifdef CONFIG_DEBUG_LOCK_ALLOC -# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } -#else -# define SPIN_DEP_MAP_INIT(lockname) -#endif +#include <linux/spinlock_types_raw.h> -#ifdef CONFIG_DEBUG_SPINLOCK -# define SPIN_DEBUG_INIT(lockname) \ - .magic = SPINLOCK_MAGIC, \ - .owner_cpu = -1, \ - .owner = SPINLOCK_OWNER_INIT, +#ifndef CONFIG_PREEMPT_RT +# include <linux/spinlock_types_nort.h> +# include <linux/rwlock_types.h> #else -# define SPIN_DEBUG_INIT(lockname) +# include <linux/rtmutex.h> +# include <linux/spinlock_types_rt.h> +# include <linux/rwlock_types_rt.h> #endif -#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \ - { \ - .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ - SPIN_DEBUG_INIT(lockname) \ - SPIN_DEP_MAP_INIT(lockname) } - -#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \ - (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname) - -#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x) - -typedef struct spinlock { - union { - struct raw_spinlock rlock; - -#ifdef CONFIG_DEBUG_LOCK_ALLOC -# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map)) - struct { - u8 __padding[LOCK_PADSIZE]; - struct lockdep_map dep_map; - }; -#endif - }; -} spinlock_t; - -#define __SPIN_LOCK_INITIALIZER(lockname) \ - { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } } - -#define __SPIN_LOCK_UNLOCKED(lockname) \ - (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname) - -#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x) - -#include <linux/rwlock_types.h> - #endif /* __LINUX_SPINLOCK_TYPES_H */ Index: linux-5.4.5-rt3/include/linux/spinlock_types_nort.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/spinlock_types_nort.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_SPINLOCK_TYPES_NORT_H +#define __LINUX_SPINLOCK_TYPES_NORT_H + +#ifndef __LINUX_SPINLOCK_TYPES_H +#error "Do not include directly. 
Include spinlock_types.h instead" +#endif + +/* + * The non RT version maps spinlocks to raw_spinlocks + */ +typedef struct spinlock { + union { + struct raw_spinlock rlock; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map)) + struct { + u8 __padding[LOCK_PADSIZE]; + struct lockdep_map dep_map; + }; +#endif + }; +} spinlock_t; + +#define __SPIN_LOCK_INITIALIZER(lockname) \ + { { .rlock = __RAW_SPIN_LOCK_INITIALIZER(lockname) } } + +#define __SPIN_LOCK_UNLOCKED(lockname) \ + (spinlock_t ) __SPIN_LOCK_INITIALIZER(lockname) + +#define DEFINE_SPINLOCK(x) spinlock_t x = __SPIN_LOCK_UNLOCKED(x) + +#endif Index: linux-5.4.5-rt3/include/linux/spinlock_types_raw.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/spinlock_types_raw.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_SPINLOCK_TYPES_RAW_H +#define __LINUX_SPINLOCK_TYPES_RAW_H + +#include <linux/types.h> + +#if defined(CONFIG_SMP) +# include <asm/spinlock_types.h> +#else +# include <linux/spinlock_types_up.h> +#endif + +#include <linux/lockdep.h> + +typedef struct raw_spinlock { + arch_spinlock_t raw_lock; +#ifdef CONFIG_DEBUG_SPINLOCK + unsigned int magic, owner_cpu; + void *owner; +#endif +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +} raw_spinlock_t; + +#define SPINLOCK_MAGIC 0xdead4ead + +#define SPINLOCK_OWNER_INIT ((void *)-1L) + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +# define SPIN_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname } +#else +# define SPIN_DEP_MAP_INIT(lockname) +#endif + +#ifdef CONFIG_DEBUG_SPINLOCK +# define SPIN_DEBUG_INIT(lockname) \ + .magic = SPINLOCK_MAGIC, \ + .owner_cpu = -1, \ + .owner = SPINLOCK_OWNER_INIT, +#else +# define SPIN_DEBUG_INIT(lockname) +#endif + +#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \ + { \ + .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \ + SPIN_DEBUG_INIT(lockname) \ + SPIN_DEP_MAP_INIT(lockname) } + +#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \ + (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname) + +#define DEFINE_RAW_SPINLOCK(x) raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x) + +#endif Index: linux-5.4.5-rt3/include/linux/spinlock_types_rt.h =================================================================== --- /dev/null +++ linux-5.4.5-rt3/include/linux/spinlock_types_rt.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +#ifndef __LINUX_SPINLOCK_TYPES_RT_H +#define __LINUX_SPINLOCK_TYPES_RT_H + +#ifndef __LINUX_SPINLOCK_TYPES_H +#error "Do not include directly. 
Include spinlock_types.h instead" +#endif + +#include <linux/cache.h> + +/* + * PREEMPT_RT: spinlocks - an RT mutex plus lock-break field: + */ +typedef struct spinlock { + struct rt_mutex lock; + unsigned int break_lock; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif +} spinlock_t; + +#ifdef CONFIG_DEBUG_RT_MUTEXES +# define __RT_SPIN_INITIALIZER(name) \ + { \ + .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ + .save_state = 1, \ + .file = __FILE__, \ + .line = __LINE__ , \ + } +#else +# define __RT_SPIN_INITIALIZER(name) \ + { \ + .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock), \ + .save_state = 1, \ + } +#endif + +/* +.wait_list = PLIST_HEAD_INIT_RAW((name).lock.wait_list, (name).lock.wait_lock) +*/ + +#define __SPIN_LOCK_UNLOCKED(name) \ + { .lock = __RT_SPIN_INITIALIZER(name.lock), \ + SPIN_DEP_MAP_INIT(name) } + +#define DEFINE_SPINLOCK(name) \ + spinlock_t name = __SPIN_LOCK_UNLOCKED(name) + +#endif Index: linux-5.4.5-rt3/include/linux/spinlock_types_up.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/spinlock_types_up.h +++ linux-5.4.5-rt3/include/linux/spinlock_types_up.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3 @ #ifndef __LINUX_SPINLOCK_TYPES_UP_H #define __LINUX_SPINLOCK_TYPES_UP_H -#ifndef __LINUX_SPINLOCK_TYPES_H -# error "please don't include this file directly" -#endif - /* * include/linux/spinlock_types_up.h - spinlock type definitions for UP * Index: linux-5.4.5-rt3/include/linux/stop_machine.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/stop_machine.h +++ linux-5.4.5-rt3/include/linux/stop_machine.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:29 @ struct cpu_stop_work { cpu_stop_fn_t fn; void *arg; struct cpu_stop_done *done; + /* Did not run due to disabled stopper; for nowait debug checks */ + bool disabled; }; int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg); Index: linux-5.4.5-rt3/include/linux/suspend.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/suspend.h +++ linux-5.4.5-rt3/include/linux/suspend.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:200 @ struct platform_s2idle_ops { void (*end)(void); }; +#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION) +extern bool pm_in_action; +#else +# define pm_in_action false +#endif + #ifdef CONFIG_SUSPEND extern suspend_state_t mem_sleep_current; extern suspend_state_t mem_sleep_default; Index: linux-5.4.5-rt3/include/linux/swait.h =================================================================== --- linux-5.4.5-rt3.orig/include/linux/swait.h +++ linux-5.4.5-rt3/include/linux/swait.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:163 @ static inline bool swq_has_sleeper(struc extern void swake_up_one(struct swait_queue_head *q); extern void swake_up_all(struct swait_queue_head *q); extern void swake_up_locked(struct swait_queue_head *q); +extern void swake_up_all_locked(struct swait_queue_head *q); +extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait); extern void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *wait, int state); extern long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait, int 
state);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:302 @ do { \
	__ret; \
})

+#define __swait_event_lock_irq(wq, condition, lock, cmd) \
+	___swait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, \
+		       raw_spin_unlock_irq(&lock); \
+		       cmd; \
+		       schedule(); \
+		       raw_spin_lock_irq(&lock))
+
+#define swait_event_lock_irq(wq_head, condition, lock) \
+	do { \
+		if (condition) \
+			break; \
+		__swait_event_lock_irq(wq_head, condition, lock, ); \
+	} while (0)
+
#endif /* _LINUX_SWAIT_H */
Index: linux-5.4.5-rt3/include/linux/thread_info.h
===================================================================
--- linux-5.4.5-rt3.orig/include/linux/thread_info.h
+++ linux-5.4.5-rt3/include/linux/thread_info.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:100 @ static inline int test_ti_thread_flag(st
#define test_thread_flag(flag) \
	test_ti_thread_flag(current_thread_info(), flag)

-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#ifdef CONFIG_PREEMPT_LAZY
+#define tif_need_resched() (test_thread_flag(TIF_NEED_RESCHED) || \
+			    test_thread_flag(TIF_NEED_RESCHED_LAZY))
+#define tif_need_resched_now() (test_thread_flag(TIF_NEED_RESCHED))
+#define tif_need_resched_lazy() test_thread_flag(TIF_NEED_RESCHED_LAZY)
+
+#else
+#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#define tif_need_resched_now() test_thread_flag(TIF_NEED_RESCHED)
+#define tif_need_resched_lazy() 0
+#endif

#ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
static inline int arch_within_stack_frames(const void * const stack,
Index: linux-5.4.5-rt3/include/linux/trace_events.h
===================================================================
--- linux-5.4.5-rt3.orig/include/linux/trace_events.h
+++ linux-5.4.5-rt3/include/linux/trace_events.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:65 @ struct trace_entry {
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
+	unsigned short migrate_disable;
+	unsigned short padding;
+	unsigned char preempt_lazy_count;
};

#define TRACE_EVENT_TYPE_MAX \
Index: linux-5.4.5-rt3/include/linux/uaccess.h
===================================================================
--- linux-5.4.5-rt3.orig/include/linux/uaccess.h
+++ linux-5.4.5-rt3/include/linux/uaccess.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:185 @ static __always_inline void pagefault_di
 */
static inline void pagefault_disable(void)
{
+	migrate_disable();
	pagefault_disabled_inc();
	/*
	 * make sure to have issued the store before a pagefault
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:202 @ static inline void pagefault_enable(void
 */
	barrier();
	pagefault_disabled_dec();
+	migrate_enable();
}

/*
Index: linux-5.4.5-rt3/include/linux/vmstat.h
===================================================================
--- linux-5.4.5-rt3.orig/include/linux/vmstat.h
+++ linux-5.4.5-rt3/include/linux/vmstat.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:57 @ DECLARE_PER_CPU(struct vm_event_state, v
 */
static inline void __count_vm_event(enum vm_event_item item)
{
+	preempt_disable_rt();
	raw_cpu_inc(vm_event_states.event[item]);
+	preempt_enable_rt();
}

static inline void count_vm_event(enum vm_event_item item)
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:69 @ static inline void count_vm_event(enum v
static inline void __count_vm_events(enum vm_event_item item, long delta)
{
+	preempt_disable_rt();
	raw_cpu_add(vm_event_states.event[item], delta);
+	preempt_enable_rt();
}

static inline void count_vm_events(enum vm_event_item item, long delta)
Index: linux-5.4.5-rt3/include/linux/wait.h
===================================================================
--- linux-5.4.5-rt3.orig/include/linux/wait.h
+++ linux-5.4.5-rt3/include/linux/wait.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:13 @
#include <asm/current.h>
#include <uapi/linux/wait.h>
+#include <linux/atomic.h>

typedef struct wait_queue_entry wait_queue_entry_t;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:24 @ int default_wake_function(struct wait_qu
#define WQ_FLAG_EXCLUSIVE 0x01
#define WQ_FLAG_WOKEN 0x02
#define WQ_FLAG_BOOKMARK 0x04
+#define WQ_FLAG_CUSTOM 0x08

/*
 * A single wait-queue entry structure:
Index: linux-5.4.5-rt3/include/net/gen_stats.h
===================================================================
--- linux-5.4.5-rt3.orig/include/net/gen_stats.h
+++ linux-5.4.5-rt3/include/net/gen_stats.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:9 @
#include <linux/socket.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>
+#include <net/net_seq_lock.h>

struct gnet_stats_basic_cpu {
	struct gnet_stats_basic_packed bstats;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:40 @ int gnet_stats_start_copy_compat(struct
			 spinlock_t *lock, struct gnet_dump *d,
			 int padattr);

-int gnet_stats_copy_basic(const seqcount_t *running,
+int gnet_stats_copy_basic(net_seqlock_t *running,
			  struct gnet_dump *d,
			  struct gnet_stats_basic_cpu __percpu *cpu,
			  struct gnet_stats_basic_packed *b);
-void __gnet_stats_copy_basic(const seqcount_t *running,
+void __gnet_stats_copy_basic(net_seqlock_t *running,
			     struct gnet_stats_basic_packed *bstats,
			     struct gnet_stats_basic_cpu __percpu *cpu,
			     struct gnet_stats_basic_packed *b);
-int gnet_stats_copy_basic_hw(const seqcount_t *running,
+int gnet_stats_copy_basic_hw(net_seqlock_t *running,
			     struct gnet_dump *d,
			     struct gnet_stats_basic_cpu __percpu *cpu,
			     struct gnet_stats_basic_packed *b);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:68 @ int gen_new_estimator(struct gnet_stats_
		      struct gnet_stats_basic_cpu __percpu *cpu_bstats,
		      struct net_rate_estimator __rcu **rate_est,
		      spinlock_t *lock,
-		      seqcount_t *running, struct nlattr *opt);
+		      net_seqlock_t *running, struct nlattr *opt);
void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
			  struct gnet_stats_basic_cpu __percpu *cpu_bstats,
			  struct net_rate_estimator __rcu **ptr,
			  spinlock_t *lock,
-			  seqcount_t *running, struct nlattr *opt);
+			  net_seqlock_t *running, struct nlattr *opt);
bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
			struct gnet_stats_rate_est64 *sample);
Index: linux-5.4.5-rt3/include/net/neighbour.h
===================================================================
--- linux-5.4.5-rt3.orig/include/net/neighbour.h
+++ linux-5.4.5-rt3/include/net/neighbour.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:462 @ static inline int neigh_hh_bridge(struct
}
#endif

-static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
+static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
{
	unsigned int hh_alen = 0;
	unsigned int seq;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:505 @ static inline int neigh_hh_output(const
static inline int neigh_output(struct neighbour *n, struct sk_buff *skb,
			       bool skip_cache)
{
-	const struct hh_cache *hh = &n->hh;
+	struct hh_cache *hh = &n->hh;

	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len && !skip_cache)
		return neigh_hh_output(hh, skb);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:546 @ struct neighbour_cb {
#define NEIGH_CB(skb) ((struct neighbour_cb *)(skb)->cb)

-static inline void neigh_ha_snapshot(char *dst, const struct neighbour *n,
+static inline void neigh_ha_snapshot(char *dst, struct neighbour *n,
				     const struct net_device *dev)
{
	unsigned int seq;
Index: linux-5.4.5-rt3/include/net/net_seq_lock.h
===================================================================
--- /dev/null
+++ linux-5.4.5-rt3/include/net/net_seq_lock.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @
+#ifndef __NET_NET_SEQ_LOCK_H__
+#define __NET_NET_SEQ_LOCK_H__
+
+#ifdef CONFIG_PREEMPT_RT
+# define net_seqlock_t seqlock_t
+# define net_seq_begin(__r) read_seqbegin(__r)
+# define net_seq_retry(__r, __s) read_seqretry(__r, __s)
+
+#else
+# define net_seqlock_t seqcount_t
+# define net_seq_begin(__r) read_seqcount_begin(__r)
+# define net_seq_retry(__r, __s) read_seqcount_retry(__r, __s)
+#endif
+
+#endif
Index: linux-5.4.5-rt3/include/net/sch_generic.h
===================================================================
--- linux-5.4.5-rt3.orig/include/net/sch_generic.h
+++ linux-5.4.5-rt3/include/net/sch_generic.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:13 @
#include <linux/percpu.h>
#include <linux/dynamic_queue_limits.h>
#include <linux/list.h>
+#include <net/net_seq_lock.h>
#include <linux/refcount.h>
#include <linux/workqueue.h>
#include <linux/mutex.h>
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:104 @ struct Qdisc {
	struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
	struct qdisc_skb_head q;
	struct gnet_stats_basic_packed bstats;
-	seqcount_t running;
+	net_seqlock_t running;
	struct gnet_stats_queue qstats;
	unsigned long state;
	struct Qdisc *next_sched;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:142 @ static inline bool qdisc_is_running(stru
{
	if (qdisc->flags & TCQ_F_NOLOCK)
		return spin_is_locked(&qdisc->seqlock);
+#ifdef CONFIG_PREEMPT_RT
+	return spin_is_locked(&qdisc->running.lock) ? true : false;
+#else
	return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
+#endif
}

static inline bool qdisc_is_percpu_stats(const struct Qdisc *q)
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:170 @ static inline bool qdisc_run_begin(struc
	} else if (qdisc_is_running(qdisc)) {
		return false;
	}
+#ifdef CONFIG_PREEMPT_RT
+	if (try_write_seqlock(&qdisc->running))
+		return true;
+	return false;
+#else
	/* Variant of write_seqcount_begin() telling lockdep a trylock
	 * was attempted.
	 */
	raw_write_seqcount_begin(&qdisc->running);
	seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_);
	return true;
+#endif
}

static inline void qdisc_run_end(struct Qdisc *qdisc)
{
+#ifdef CONFIG_PREEMPT_RT
+	write_sequnlock(&qdisc->running);
+#else
	write_seqcount_end(&qdisc->running);
+#endif
	if (qdisc->flags & TCQ_F_NOLOCK)
		spin_unlock(&qdisc->seqlock);
}
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:559 @ static inline spinlock_t *qdisc_root_sle
	return qdisc_lock(root);
}

-static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
+static inline net_seqlock_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
{
	struct Qdisc *root = qdisc_root_sleeping(qdisc);
Index: linux-5.4.5-rt3/include/xen/xen-ops.h
===================================================================
--- linux-5.4.5-rt3.orig/include/xen/xen-ops.h
+++ linux-5.4.5-rt3/include/xen/xen-ops.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:218 @ bool xen_running_on_version_or_later(uns
void xen_efi_runtime_setup(void);

-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION

static inline void xen_preemptible_hcall_begin(void)
{
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:242 @ static inline void xen_preemptible_hcall
	__this_cpu_write(xen_in_preemptible_hcall, false);
}

-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */

#endif /* INCLUDE_XEN_OPS_H */
Index: linux-5.4.5-rt3/init/Kconfig
===================================================================
--- linux-5.4.5-rt3.orig/init/Kconfig
+++ linux-5.4.5-rt3/init/Kconfig
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:927 @ config CFS_BANDWIDTH
config RT_GROUP_SCHED
	bool "Group scheduling for SCHED_RR/FIFO"
	depends on CGROUP_SCHED
+	depends on !PREEMPT_RT
	default n
	help
	  This feature lets you explicitly allocate real CPU bandwidth
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1634 @ config KALLSYMS_BASE_RELATIVE
# syscall, maps, verifier
config BPF_SYSCALL
	bool "Enable bpf() system call"
+	depends on !PREEMPT_RT
	select BPF
	select IRQ_WORK
	default n
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1810 @ choice
config SLAB
	bool "SLAB"
+	depends on !PREEMPT_RT
	select HAVE_HARDENED_USERCOPY_ALLOCATOR
	help
	  The regular slab allocator that is established and known to work
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1831 @ config SLUB
config SLOB
	depends on EXPERT
	bool "SLOB (Simple Allocator)"
+	depends on !PREEMPT_RT
	help
	   SLOB replaces the stock allocator with a drastically simpler allocator.
SLOB is generally more space efficient but @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1897 @ config SHUFFLE_PAGE_ALLOCATOR config SLUB_CPU_PARTIAL default y - depends on SLUB && SMP + depends on SLUB && SMP && !PREEMPT_RT bool "SLUB per cpu partial cache" help Per cpu partial caches accelerate objects allocation and freeing Index: linux-5.4.5-rt3/init/init_task.c =================================================================== --- linux-5.4.5-rt3.orig/init/init_task.c +++ linux-5.4.5-rt3/init/init_task.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:76 @ struct task_struct init_task .cpus_ptr = &init_task.cpus_mask, .cpus_mask = CPU_MASK_ALL, .nr_cpus_allowed= NR_CPUS, +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) && \ + defined(CONFIG_SCHED_DEBUG) + .pinned_on_cpu = -1, +#endif .mm = NULL, .active_mm = &init_mm, .restart_block = { Index: linux-5.4.5-rt3/init/main.c =================================================================== --- linux-5.4.5-rt3.orig/init/main.c +++ linux-5.4.5-rt3/init/main.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:697 @ asmlinkage __visible void __init start_k boot_init_stack_canary(); time_init(); - printk_safe_init(); perf_event_init(); profile_init(); call_function_init(); Index: linux-5.4.5-rt3/kernel/Kconfig.locks =================================================================== --- linux-5.4.5-rt3.orig/kernel/Kconfig.locks +++ linux-5.4.5-rt3/kernel/Kconfig.locks @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:104 @ config UNINLINE_SPIN_UNLOCK # unlock and unlock_irq functions are inlined when: # - DEBUG_SPINLOCK=n and ARCH_INLINE_*LOCK=y # or -# - DEBUG_SPINLOCK=n and PREEMPT=n +# - DEBUG_SPINLOCK=n and PREEMPTION=n # # unlock_bh and unlock_irqrestore functions are inlined when: # - DEBUG_SPINLOCK=n and ARCH_INLINE_*LOCK=y @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:142 @ config INLINE_SPIN_UNLOCK_BH config INLINE_SPIN_UNLOCK_IRQ def_bool y - depends on !PREEMPT || ARCH_INLINE_SPIN_UNLOCK_IRQ + depends on !PREEMPTION || ARCH_INLINE_SPIN_UNLOCK_IRQ config INLINE_SPIN_UNLOCK_IRQRESTORE def_bool y @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:171 @ config INLINE_READ_LOCK_IRQSAVE config INLINE_READ_UNLOCK def_bool y - depends on !PREEMPT || ARCH_INLINE_READ_UNLOCK + depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK config INLINE_READ_UNLOCK_BH def_bool y @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:179 @ config INLINE_READ_UNLOCK_BH config INLINE_READ_UNLOCK_IRQ def_bool y - depends on !PREEMPT || ARCH_INLINE_READ_UNLOCK_IRQ + depends on !PREEMPTION || ARCH_INLINE_READ_UNLOCK_IRQ config INLINE_READ_UNLOCK_IRQRESTORE def_bool y @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:208 @ config INLINE_WRITE_LOCK_IRQSAVE config INLINE_WRITE_UNLOCK def_bool y - depends on !PREEMPT || ARCH_INLINE_WRITE_UNLOCK + depends on !PREEMPTION || ARCH_INLINE_WRITE_UNLOCK config INLINE_WRITE_UNLOCK_BH def_bool y @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:216 @ config INLINE_WRITE_UNLOCK_BH config INLINE_WRITE_UNLOCK_IRQ def_bool y - depends on !PREEMPT || ARCH_INLINE_WRITE_UNLOCK_IRQ + depends on !PREEMPTION || 
ARCH_INLINE_WRITE_UNLOCK_IRQ config INLINE_WRITE_UNLOCK_IRQRESTORE def_bool y Index: linux-5.4.5-rt3/kernel/Kconfig.preempt =================================================================== --- linux-5.4.5-rt3.orig/kernel/Kconfig.preempt +++ linux-5.4.5-rt3/kernel/Kconfig.preempt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ # SPDX-License-Identifier: GPL-2.0-only +config HAVE_PREEMPT_LAZY + bool + +config PREEMPT_LAZY + def_bool y if HAVE_PREEMPT_LAZY && PREEMPT_RT + choice prompt "Preemption Model" default PREEMPT_NONE Index: linux-5.4.5-rt3/kernel/cgroup/cgroup.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/cgroup/cgroup.c +++ linux-5.4.5-rt3/kernel/cgroup/cgroup.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1960 @ static void init_cgroup_housekeeping(str cgrp->dom_cgrp = cgrp; cgrp->max_descendants = INT_MAX; cgrp->max_depth = INT_MAX; - INIT_LIST_HEAD(&cgrp->rstat_css_list); prev_cputime_init(&cgrp->prev_cputime); for_each_subsys(ss, ssid) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5015 @ static void css_release_work_fn(struct w list_del_rcu(&css->sibling); if (ss) { - /* css release path */ - if (!list_empty(&css->rstat_css_node)) { - cgroup_rstat_flush(cgrp); - list_del_rcu(&css->rstat_css_node); - } - cgroup_idr_replace(&ss->css_idr, NULL, css->id); if (ss->css_released) ss->css_released(css); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5076 @ static void init_and_link_css(struct cgr css->id = -1; INIT_LIST_HEAD(&css->sibling); INIT_LIST_HEAD(&css->children); - INIT_LIST_HEAD(&css->rstat_css_node); css->serial_nr = css_serial_nr_next++; atomic_set(&css->online_cnt, 0); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5084 @ static void init_and_link_css(struct cgr css_get(css->parent); } - if (cgroup_on_dfl(cgrp) && ss->css_rstat_flush) - list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list); - BUG_ON(cgroup_css(cgrp, ss)); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5185 @ static struct cgroup_subsys_state *css_c err_list_del: list_del_rcu(&css->sibling); err_free_css: - list_del_rcu(&css->rstat_css_node); INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); return ERR_PTR(err); Index: linux-5.4.5-rt3/kernel/cgroup/cpuset.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/cgroup/cpuset.c +++ linux-5.4.5-rt3/kernel/cgroup/cpuset.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:348 @ void cpuset_read_unlock(void) percpu_up_read(&cpuset_rwsem); } -static DEFINE_SPINLOCK(callback_lock); +static DEFINE_RAW_SPINLOCK(callback_lock); static struct workqueue_struct *cpuset_migrate_mm_wq; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1258 @ static int update_parent_subparts_cpumas * Newly added CPUs will be removed from effective_cpus and * newly deleted ones will be added back to effective_cpus. 
*/ - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); if (adding) { cpumask_or(parent->subparts_cpus, parent->subparts_cpus, tmp->addmask); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1277 @ static int update_parent_subparts_cpumas } parent->nr_subparts_cpus = cpumask_weight(parent->subparts_cpus); - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); return cmd == partcmd_update; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1382 @ static void update_cpumasks_hier(struct continue; rcu_read_unlock(); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cpumask_copy(cp->effective_cpus, tmp->new_cpus); if (cp->nr_subparts_cpus && @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1413 @ static void update_cpumasks_hier(struct = cpumask_weight(cp->subparts_cpus); } } - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); WARN_ON(!is_in_v2_mode() && !cpumask_equal(cp->cpus_allowed, cp->effective_cpus)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1531 @ static int update_cpumask(struct cpuset return -EINVAL; } - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1542 @ static int update_cpumask(struct cpuset cs->cpus_allowed); cs->nr_subparts_cpus = cpumask_weight(cs->subparts_cpus); } - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); update_cpumasks_hier(cs, &tmp); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1736 @ static void update_nodemasks_hier(struct continue; rcu_read_unlock(); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cp->effective_mems = *new_mems; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); WARN_ON(!is_in_v2_mode() && !nodes_equal(cp->mems_allowed, cp->effective_mems)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1806 @ static int update_nodemask(struct cpuset if (retval < 0) goto done; - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cs->mems_allowed = trialcs->mems_allowed; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); /* use trialcs->mems_allowed as a temp variable */ update_nodemasks_hier(cs, &trialcs->mems_allowed); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1899 @ static int update_flag(cpuset_flagbits_t spread_flag_changed = ((is_spread_slab(cs) != is_spread_slab(trialcs)) || (is_spread_page(cs) != is_spread_page(trialcs))); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cs->flags = trialcs->flags; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed) rebuild_sched_domains_locked(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2410 @ static int cpuset_common_seq_show(struct cpuset_filetype_t type = seq_cft(sf)->private; int ret = 0; - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); switch (type) { case FILE_CPULIST: @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2432 @ static int cpuset_common_seq_show(struct ret = -EINVAL; } - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2745 @ static int cpuset_css_online(struct cgro cpuset_inc(); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); if (is_in_v2_mode()) { cpumask_copy(cs->effective_cpus, parent->effective_cpus); cs->effective_mems = parent->effective_mems; cs->use_parent_ecpus = true; parent->child_ecpus_count++; } - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags)) goto out_unlock; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2779 @ static int cpuset_css_online(struct cgro } rcu_read_unlock(); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cs->mems_allowed = parent->mems_allowed; cs->effective_mems = parent->mems_allowed; cpumask_copy(cs->cpus_allowed, parent->cpus_allowed); cpumask_copy(cs->effective_cpus, parent->cpus_allowed); - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); out_unlock: percpu_up_write(&cpuset_rwsem); put_online_cpus(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2840 @ static void cpuset_css_free(struct cgrou static void cpuset_bind(struct cgroup_subsys_state *root_css) { percpu_down_write(&cpuset_rwsem); - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); if (is_in_v2_mode()) { cpumask_copy(top_cpuset.cpus_allowed, cpu_possible_mask); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2851 @ static void cpuset_bind(struct cgroup_su top_cpuset.mems_allowed = top_cpuset.effective_mems; } - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); percpu_up_write(&cpuset_rwsem); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2948 @ hotplug_update_tasks_legacy(struct cpuse { bool is_empty; - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cpumask_copy(cs->cpus_allowed, new_cpus); cpumask_copy(cs->effective_cpus, new_cpus); cs->mems_allowed = *new_mems; cs->effective_mems = *new_mems; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); /* * Don't call update_tasks_cpumask() if the cpuset becomes empty, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2990 @ hotplug_update_tasks(struct cpuset *cs, if (nodes_empty(*new_mems)) *new_mems = parent_cs(cs)->effective_mems; - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); cpumask_copy(cs->effective_cpus, new_cpus); cs->effective_mems = *new_mems; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); if (cpus_updated) update_tasks_cpumask(cs); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3148 @ static void cpuset_hotplug_workfn(struct /* synchronize cpus_allowed to cpu_active_mask */ if (cpus_updated) { - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); if (!on_dfl) cpumask_copy(top_cpuset.cpus_allowed, &new_cpus); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3168 @ static void 
cpuset_hotplug_workfn(struct } } cpumask_copy(top_cpuset.effective_cpus, &new_cpus); - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); /* we don't mess with cpumasks of tasks in top_cpuset */ } /* synchronize mems_allowed to N_MEMORY */ if (mems_updated) { - spin_lock_irq(&callback_lock); + raw_spin_lock_irq(&callback_lock); if (!on_dfl) top_cpuset.mems_allowed = new_mems; top_cpuset.effective_mems = new_mems; - spin_unlock_irq(&callback_lock); + raw_spin_unlock_irq(&callback_lock); update_tasks_nodemask(&top_cpuset); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3279 @ void cpuset_cpus_allowed(struct task_str { unsigned long flags; - spin_lock_irqsave(&callback_lock, flags); + raw_spin_lock_irqsave(&callback_lock, flags); rcu_read_lock(); guarantee_online_cpus(task_cs(tsk), pmask); rcu_read_unlock(); - spin_unlock_irqrestore(&callback_lock, flags); + raw_spin_unlock_irqrestore(&callback_lock, flags); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3344 @ nodemask_t cpuset_mems_allowed(struct ta nodemask_t mask; unsigned long flags; - spin_lock_irqsave(&callback_lock, flags); + raw_spin_lock_irqsave(&callback_lock, flags); rcu_read_lock(); guarantee_online_mems(task_cs(tsk), &mask); rcu_read_unlock(); - spin_unlock_irqrestore(&callback_lock, flags); + raw_spin_unlock_irqrestore(&callback_lock, flags); return mask; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3440 @ bool __cpuset_node_allowed(int node, gfp return true; /* Not hardwall and node outside mems_allowed: scan up cpusets */ - spin_lock_irqsave(&callback_lock, flags); + raw_spin_lock_irqsave(&callback_lock, flags); rcu_read_lock(); cs = nearest_hardwall_ancestor(task_cs(current)); allowed = node_isset(node, cs->mems_allowed); rcu_read_unlock(); - spin_unlock_irqrestore(&callback_lock, flags); + raw_spin_unlock_irqrestore(&callback_lock, flags); return allowed; } Index: linux-5.4.5-rt3/kernel/cgroup/rstat.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/cgroup/rstat.c +++ linux-5.4.5-rt3/kernel/cgroup/rstat.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:152 @ static struct cgroup *cgroup_rstat_cpu_p } /* see cgroup_rstat_flush() */ -static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) +static void cgroup_rstat_flush_locked(struct cgroup *cgrp) __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock) { int cpu; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:164 @ static void cgroup_rstat_flush_locked(st cpu); struct cgroup *pos = NULL; - raw_spin_lock(cpu_lock); - while ((pos = cgroup_rstat_cpu_pop_updated(pos, cgrp, cpu))) { - struct cgroup_subsys_state *css; - + raw_spin_lock_irq(cpu_lock); + while ((pos = cgroup_rstat_cpu_pop_updated(pos, cgrp, cpu))) cgroup_base_stat_flush(pos, cpu); - rcu_read_lock(); - list_for_each_entry_rcu(css, &pos->rstat_css_list, - rstat_css_node) - css->ss->css_rstat_flush(css, cpu); - rcu_read_unlock(); - } - raw_spin_unlock(cpu_lock); + raw_spin_unlock_irq(cpu_lock); - /* if @may_sleep, play nice and yield if necessary */ - if (may_sleep && (need_resched() || - spin_needbreak(&cgroup_rstat_lock))) { - spin_unlock_irq(&cgroup_rstat_lock); + if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) { + spin_unlock(&cgroup_rstat_lock); if 
(!cond_resched()) cpu_relax(); - spin_lock_irq(&cgroup_rstat_lock); + spin_lock(&cgroup_rstat_lock); } } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:196 @ void cgroup_rstat_flush(struct cgroup *c { might_sleep(); - spin_lock_irq(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, true); - spin_unlock_irq(&cgroup_rstat_lock); -} - -/** - * cgroup_rstat_flush_irqsafe - irqsafe version of cgroup_rstat_flush() - * @cgrp: target cgroup - * - * This function can be called from any context. - */ -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) -{ - unsigned long flags; - - spin_lock_irqsave(&cgroup_rstat_lock, flags); - cgroup_rstat_flush_locked(cgrp, false); - spin_unlock_irqrestore(&cgroup_rstat_lock, flags); + spin_lock(&cgroup_rstat_lock); + cgroup_rstat_flush_locked(cgrp); + spin_unlock(&cgroup_rstat_lock); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:210 @ void cgroup_rstat_flush_irqsafe(struct c * * This function may block. */ -void cgroup_rstat_flush_hold(struct cgroup *cgrp) +static void cgroup_rstat_flush_hold(struct cgroup *cgrp) __acquires(&cgroup_rstat_lock) { might_sleep(); - spin_lock_irq(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, true); + spin_lock(&cgroup_rstat_lock); + cgroup_rstat_flush_locked(cgrp); } /** * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold() */ -void cgroup_rstat_flush_release(void) +static void cgroup_rstat_flush_release(void) __releases(&cgroup_rstat_lock) { - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); } int cgroup_rstat_init(struct cgroup *cgrp) Index: linux-5.4.5-rt3/kernel/cpu.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/cpu.c +++ linux-5.4.5-rt3/kernel/cpu.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:334 @ void lockdep_assert_cpus_held(void) static void lockdep_acquire_cpus_lock(void) { - rwsem_acquire(&cpu_hotplug_lock.rw_sem.dep_map, 0, 0, _THIS_IP_); + rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_); } static void lockdep_release_cpus_lock(void) { - rwsem_release(&cpu_hotplug_lock.rw_sem.dep_map, 1, _THIS_IP_); + rwsem_release(&cpu_hotplug_lock.dep_map, 1, _THIS_IP_); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:852 @ static int take_cpu_down(void *_param) int err, cpu = smp_processor_id(); int ret; +#ifdef CONFIG_PREEMPT_RT + /* + * If any tasks disabled migration before we got here, + * go back and sleep again. + */ + if (cpu_nr_pinned(cpu)) + return -EAGAIN; +#endif + /* Ensure this CPU doesn't handle any more interrupts. */ err = __cpu_disable(); if (err < 0) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:890 @ static int take_cpu_down(void *_param) return 0; } +#ifdef CONFIG_PREEMPT_RT +struct task_struct *takedown_cpu_task; +#endif + static int takedown_cpu(unsigned int cpu) { struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:908 @ static int takedown_cpu(unsigned int cpu */ irq_lock_sparse(); +#ifdef CONFIG_PREEMPT_RT + WARN_ON_ONCE(takedown_cpu_task); + takedown_cpu_task = current; + +again: + /* + * If a task pins this CPU after we pass this check, take_cpu_down + * will return -EAGAIN. 
+ */ + for (;;) { + int nr_pinned; + + set_current_state(TASK_UNINTERRUPTIBLE); + nr_pinned = cpu_nr_pinned(cpu); + if (nr_pinned == 0) + break; + schedule(); + } + set_current_state(TASK_RUNNING); +#endif + /* * So now all preempt/rcu users must observe !cpu_active(). */ err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu)); +#ifdef CONFIG_PREEMPT_RT + if (err == -EAGAIN) + goto again; +#endif if (err) { +#ifdef CONFIG_PREEMPT_RT + takedown_cpu_task = NULL; +#endif /* CPU refused to die */ irq_unlock_sparse(); /* Unpark the hotplug thread so we can rollback there */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:959 @ static int takedown_cpu(unsigned int cpu wait_for_ap_thread(st, false); BUG_ON(st->state != CPUHP_AP_IDLE_DEAD); +#ifdef CONFIG_PREEMPT_RT + takedown_cpu_task = NULL; +#endif /* Interrupts are moved away from the dying cpu, reenable alloc/free */ irq_unlock_sparse(); Index: linux-5.4.5-rt3/kernel/events/core.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/events/core.c +++ linux-5.4.5-rt3/kernel/events/core.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:10259 @ static struct pmu *perf_init_event(struc goto unlock; } - list_for_each_entry_rcu(pmu, &pmus, entry) { + list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) { ret = perf_try_init_event(pmu, event); if (!ret) goto unlock; Index: linux-5.4.5-rt3/kernel/exit.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/exit.c +++ linux-5.4.5-rt3/kernel/exit.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:164 @ static void __exit_signal(struct task_st * Do this under ->siglock, we can race with another thread * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals. */ - flush_sigqueue(&tsk->pending); + flush_task_sigqueue(tsk); tsk->sighand = NULL; spin_unlock(&sighand->siglock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:261 @ void rcuwait_wake_up(struct rcuwait *w) wake_up_process(task); rcu_read_unlock(); } +EXPORT_SYMBOL_GPL(rcuwait_wake_up); /* * Determine if a process group is "orphaned", according to the POSIX Index: linux-5.4.5-rt3/kernel/fork.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/fork.c +++ linux-5.4.5-rt3/kernel/fork.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:46 @ #include <linux/hmm.h> #include <linux/fs.h> #include <linux/mm.h> +#include <linux/kprobes.h> #include <linux/vmacache.h> #include <linux/nsproxy.h> #include <linux/capability.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:293 @ static inline void free_thread_stack(str return; } - vfree_atomic(tsk->stack); + vfree(tsk->stack); return; } #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:700 @ void __mmdrop(struct mm_struct *mm) } EXPORT_SYMBOL_GPL(__mmdrop); +#ifdef CONFIG_PREEMPT_RT +/* + * RCU callback for delayed mm drop. Not strictly rcu, but we don't + * want another facility to make this work. 
+ */ +void __mmdrop_delayed(struct rcu_head *rhp) +{ + struct mm_struct *mm = container_of(rhp, struct mm_struct, delayed_drop); + + __mmdrop(mm); +} +#endif + static void mmdrop_async_fn(struct work_struct *work) { struct mm_struct *mm; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:754 @ void __put_task_struct(struct task_struc WARN_ON(refcount_read(&tsk->usage)); WARN_ON(tsk == current); + /* + * Remove function-return probe instances associated with this + * task and put them back on the free list. + */ + kprobe_flush_task(tsk); + + /* Task is done with its stack. */ + put_task_stack(tsk); + cgroup_free(tsk); task_numa_free(tsk, true); security_task_free(tsk); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:953 @ static struct task_struct *dup_task_stru tsk->splice_pipe = NULL; tsk->task_frag.page = NULL; tsk->wake_q.next = NULL; + tsk->wake_q_sleeper.next = NULL; account_kernel_stack(tsk, 1); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1927 @ static __latent_entropy struct task_stru spin_lock_init(&p->alloc_lock); init_sigpending(&p->pending); + p->sigqueue_cache = NULL; p->utime = p->stime = p->gtime = 0; #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME Index: linux-5.4.5-rt3/kernel/futex.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/futex.c +++ linux-5.4.5-rt3/kernel/futex.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:948 @ static void exit_pi_state_list(struct ta if (head->next != next) { /* retain curr->pi_lock for the loop invariant */ raw_spin_unlock(&pi_state->pi_mutex.wait_lock); + raw_spin_unlock_irq(&curr->pi_lock); spin_unlock(&hb->lock); + raw_spin_lock_irq(&curr->pi_lock); put_pi_state(pi_state); continue; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1559 @ static int wake_futex_pi(u32 __user *uad struct task_struct *new_owner; bool postunlock = false; DEFINE_WAKE_Q(wake_q); + DEFINE_WAKE_Q(wake_sleeper_q); int ret = 0; new_owner = rt_mutex_next_owner(&pi_state->pi_mutex); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1619 @ static int wake_futex_pi(u32 __user *uad pi_state->owner = new_owner; raw_spin_unlock(&new_owner->pi_lock); - postunlock = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); - + postunlock = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q, + &wake_sleeper_q); out_unlock: raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); if (postunlock) - rt_mutex_postunlock(&wake_q); + rt_mutex_postunlock(&wake_q, &wake_sleeper_q); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2249 @ retry_private: requeue_pi_wake_futex(this, &key2, hb2); drop_count++; continue; + } else if (ret == -EAGAIN) { + /* + * Waiter was woken by timeout or + * signal and has set pi_blocked_on to + * PI_WAKEUP_INPROGRESS before we + * tried to enqueue it on the rtmutex. 
+ */ + this->pi_state = NULL; + put_pi_state(pi_state); + continue; } else if (ret) { /* * rt_mutex_start_proxy_lock() detected a @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2967 @ retry_private: goto no_block; } - rt_mutex_init_waiter(&rt_waiter); + rt_mutex_init_waiter(&rt_waiter, false); /* * On PREEMPT_RT_FULL, when hb->lock becomes an rt_mutex, we must not @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2983 @ retry_private: * before __rt_mutex_start_proxy_lock() is done. */ raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock); + /* + * the migrate_disable() here disables migration in the in_atomic() fast + * path which is enabled again in the following spin_unlock(). We have + * one migrate_disable() pending in the slow-path which is reversed + * after the raw_spin_unlock_irq() where we leave the atomic context. + */ + migrate_disable(); + spin_unlock(q.lock_ptr); /* * __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2999 @ retry_private: */ ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current); raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock); + migrate_enable(); if (ret) { if (ret == 1) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3148 @ retry: * rt_waiter. Also see the WARN in wake_futex_pi(). */ raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + /* + * Magic trickery for now to make the RT migrate disable + * logic happy. The following spin_unlock() happens with + * interrupts disabled so the internal migrate_enable() + * won't undo the migrate_disable() which was issued when + * locking hb->lock. + */ + migrate_disable(); spin_unlock(&hb->lock); /* drops pi_state->pi_mutex.wait_lock */ ret = wake_futex_pi(uaddr, uval, pi_state); + migrate_enable(); put_pi_state(pi_state); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3332 @ static int futex_wait_requeue_pi(u32 __u struct hrtimer_sleeper timeout, *to; struct futex_pi_state *pi_state = NULL; struct rt_mutex_waiter rt_waiter; - struct futex_hash_bucket *hb; + struct futex_hash_bucket *hb, *hb2; union futex_key key2 = FUTEX_KEY_INIT; struct futex_q q = futex_q_init; int res, ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3353 @ static int futex_wait_requeue_pi(u32 __u * The waiter is allocated on our stack, manipulated by the requeue * code while we sleep on uaddr. */ - rt_mutex_init_waiter(&rt_waiter); + rt_mutex_init_waiter(&rt_waiter, false); ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE); if (unlikely(ret != 0)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3384 @ static int futex_wait_requeue_pi(u32 __u /* Queue the futex_q, drop the hb lock, wait for wakeup. */ futex_wait_queue_me(hb, &q, to); - spin_lock(&hb->lock); - ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); - spin_unlock(&hb->lock); - if (ret) - goto out_put_keys; + /* + * On RT we must avoid races with requeue and trying to block + * on two mutexes (hb->lock and uaddr2's rtmutex) by + * serializing access to pi_blocked_on with pi_lock. + */ + raw_spin_lock_irq(¤t->pi_lock); + if (current->pi_blocked_on) { + /* + * We have been requeued or are in the process of + * being requeued. 
+ */ + raw_spin_unlock_irq(¤t->pi_lock); + } else { + /* + * Setting pi_blocked_on to PI_WAKEUP_INPROGRESS + * prevents a concurrent requeue from moving us to the + * uaddr2 rtmutex. After that we can safely acquire + * (and possibly block on) hb->lock. + */ + current->pi_blocked_on = PI_WAKEUP_INPROGRESS; + raw_spin_unlock_irq(¤t->pi_lock); + + spin_lock(&hb->lock); + + /* + * Clean up pi_blocked_on. We might leak it otherwise + * when we succeeded with the hb->lock in the fast + * path. + */ + raw_spin_lock_irq(¤t->pi_lock); + current->pi_blocked_on = NULL; + raw_spin_unlock_irq(¤t->pi_lock); + + ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); + spin_unlock(&hb->lock); + if (ret) + goto out_put_keys; + } /* - * In order for us to be here, we know our q.key == key2, and since - * we took the hb->lock above, we also know that futex_requeue() has - * completed and we no longer have to concern ourselves with a wakeup - * race with the atomic proxy lock acquisition by the requeue code. The - * futex_requeue dropped our key1 reference and incremented our key2 - * reference count. + * In order to be here, we have either been requeued, are in + * the process of being requeued, or requeue successfully + * acquired uaddr2 on our behalf. If pi_blocked_on was + * non-null above, we may be racing with a requeue. Do not + * rely on q->lock_ptr to be hb2->lock until after blocking on + * hb->lock or hb2->lock. The futex_requeue dropped our key1 + * reference and incremented our key2 reference count. */ + hb2 = hash_futex(&key2); /* Check if the requeue code acquired the second futex for us. */ if (!q.rt_waiter) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3441 @ static int futex_wait_requeue_pi(u32 __u * did a lock-steal - fix up the PI-state in that case. */ if (q.pi_state && (q.pi_state->owner != current)) { - spin_lock(q.lock_ptr); + spin_lock(&hb2->lock); + BUG_ON(&hb2->lock != q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current); if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) { pi_state = q.pi_state; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3453 @ static int futex_wait_requeue_pi(u32 __u * the requeue_pi() code acquired for us. */ put_pi_state(q.pi_state); - spin_unlock(q.lock_ptr); + spin_unlock(&hb2->lock); } } else { struct rt_mutex *pi_mutex; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3467 @ static int futex_wait_requeue_pi(u32 __u pi_mutex = &q.pi_state->pi_mutex; ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter); - spin_lock(q.lock_ptr); + spin_lock(&hb2->lock); + BUG_ON(&hb2->lock != q.lock_ptr); if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter)) ret = 0; Index: linux-5.4.5-rt3/kernel/irq/handle.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/irq/handle.c +++ linux-5.4.5-rt3/kernel/irq/handle.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:188 @ irqreturn_t handle_irq_event_percpu(stru { irqreturn_t retval; unsigned int flags = 0; + struct pt_regs *regs = get_irq_regs(); + u64 ip = regs ? 
instruction_pointer(regs) : 0; retval = __handle_irq_event_percpu(desc, &flags); - add_interrupt_randomness(desc->irq_data.irq, flags); +#ifdef CONFIG_PREEMPT_RT + desc->random_ip = ip; +#else + add_interrupt_randomness(desc->irq_data.irq, flags, ip); +#endif if (!noirqdebug) note_interrupt(desc, retval); Index: linux-5.4.5-rt3/kernel/irq/manage.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/irq/manage.c +++ linux-5.4.5-rt3/kernel/irq/manage.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1102 @ static int irq_thread(void *data) if (action_ret == IRQ_WAKE_THREAD) irq_wake_secondary(desc, action); +#ifdef CONFIG_PREEMPT_RT + migrate_disable(); + add_interrupt_randomness(action->irq, 0, + desc->random_ip ^ (unsigned long) action); + migrate_enable(); +#endif wake_threads_waitq(desc); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2690 @ EXPORT_SYMBOL_GPL(irq_get_irqchip_state) * This call sets the internal irqchip state of an interrupt, * depending on the value of @which. * - * This function should be called with preemption disabled if the + * This function should be called with migration disabled if the * interrupt controller has per-cpu registers. */ int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which, Index: linux-5.4.5-rt3/kernel/irq/spurious.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/irq/spurious.c +++ linux-5.4.5-rt3/kernel/irq/spurious.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:445 @ MODULE_PARM_DESC(noirqdebug, "Disable ir static int __init irqfixup_setup(char *str) { +#ifdef CONFIG_PREEMPT_RT + pr_warn("irqfixup boot option not supported w/ CONFIG_PREEMPT_RT\n"); + return 1; +#endif irqfixup = 1; printk(KERN_WARNING "Misrouted IRQ fixup support enabled.\n"); printk(KERN_WARNING "This may impact system performance.\n"); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:461 @ module_param(irqfixup, int, 0644); static int __init irqpoll_setup(char *str) { +#ifdef CONFIG_PREEMPT_RT + pr_warn("irqpoll boot option not supported w/ CONFIG_PREEMPT_RT\n"); + return 1; +#endif irqfixup = 2; printk(KERN_WARNING "Misrouted IRQ fixup and polling support " "enabled\n"); Index: linux-5.4.5-rt3/kernel/irq_work.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/irq_work.c +++ linux-5.4.5-rt3/kernel/irq_work.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #include <linux/cpu.h> #include <linux/notifier.h> #include <linux/smp.h> +#include <linux/interrupt.h> #include <asm/processor.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ void __weak arch_irq_work_raise(void) /* Enqueue on current CPU, work must already be claimed and preempt disabled */ static void __irq_work_queue_local(struct irq_work *work) { + struct llist_head *list; + bool lazy_work, realtime = IS_ENABLED(CONFIG_PREEMPT_RT); + + lazy_work = work->flags & IRQ_WORK_LAZY; + /* If the work is "lazy", handle it from next tick if any */ - if (work->flags & IRQ_WORK_LAZY) { - if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) && - tick_nohz_tick_stopped()) - arch_irq_work_raise(); - } else { - if (llist_add(&work->llnode, this_cpu_ptr(&raised_list))) 
+ if (lazy_work || (realtime && !(work->flags & IRQ_WORK_HARD_IRQ))) + list = this_cpu_ptr(&lazy_list); + else + list = this_cpu_ptr(&raised_list); + + if (llist_add(&work->llnode, list)) { + if (!lazy_work || tick_nohz_tick_stopped()) arch_irq_work_raise(); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:118 @ bool irq_work_queue_on(struct irq_work * preempt_disable(); if (cpu != smp_processor_id()) { + struct llist_head *list; + /* Arch remote IPI send/receive backend aren't NMI safe */ WARN_ON_ONCE(in_nmi()); - if (llist_add(&work->llnode, &per_cpu(raised_list, cpu))) + if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(work->flags & IRQ_WORK_HARD_IRQ)) + list = &per_cpu(lazy_list, cpu); + else + list = &per_cpu(raised_list, cpu); + + if (llist_add(&work->llnode, list)) arch_send_call_function_single_ipi(cpu); } else { __irq_work_queue_local(work); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:146 @ bool irq_work_needs_cpu(void) raised = this_cpu_ptr(&raised_list); lazy = this_cpu_ptr(&lazy_list); - if (llist_empty(raised) || arch_irq_work_has_interrupt()) - if (llist_empty(lazy)) - return false; + if (llist_empty(raised) && llist_empty(lazy)) + return false; /* All work should have been flushed before going offline */ WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:161 @ static void irq_work_run_list(struct lli struct llist_node *llnode; unsigned long flags; +#ifndef CONFIG_PREEMPT_RT + /* + * nort: On RT IRQ-work may run in SOFTIRQ context. + */ BUG_ON(!irqs_disabled()); - +#endif if (llist_empty(list)) return; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:198 @ static void irq_work_run_list(struct lli void irq_work_run(void) { irq_work_run_list(this_cpu_ptr(&raised_list)); - irq_work_run_list(this_cpu_ptr(&lazy_list)); + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + /* + * NOTE: we raise softirq via IPI for safety, + * and execute in irq_work_tick() to move the + * overhead from hard to soft irq context. + */ + if (!llist_empty(this_cpu_ptr(&lazy_list))) + raise_softirq(TIMER_SOFTIRQ); + } else + irq_work_run_list(this_cpu_ptr(&lazy_list)); } EXPORT_SYMBOL_GPL(irq_work_run); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:217 @ void irq_work_tick(void) if (!llist_empty(raised) && !arch_irq_work_has_interrupt()) irq_work_run_list(raised); + + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + irq_work_run_list(this_cpu_ptr(&lazy_list)); +} + +#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_PREEMPT_RT) +void irq_work_tick_soft(void) +{ irq_work_run_list(this_cpu_ptr(&lazy_list)); } +#endif /* * Synchronize against the irq_work @entry, ensures the entry is not Index: linux-5.4.5-rt3/kernel/kexec_core.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/kexec_core.c +++ linux-5.4.5-rt3/kernel/kexec_core.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:975 @ void crash_kexec(struct pt_regs *regs) old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu); if (old_cpu == PANIC_CPU_INVALID) { /* This is the 1st CPU which comes here, so go ahead. 
*/ - printk_safe_flush_on_panic(); __crash_kexec(regs); /* Index: linux-5.4.5-rt3/kernel/ksysfs.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/ksysfs.c +++ linux-5.4.5-rt3/kernel/ksysfs.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:141 @ KERNEL_ATTR_RO(vmcoreinfo); #endif /* CONFIG_CRASH_CORE */ +#if defined(CONFIG_PREEMPT_RT) +static ssize_t realtime_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", 1); +} +KERNEL_ATTR_RO(realtime); +#endif + /* whether file capabilities are enabled */ static ssize_t fscaps_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:241 @ static struct attribute * kernel_attrs[] &rcu_expedited_attr.attr, &rcu_normal_attr.attr, #endif +#ifdef CONFIG_PREEMPT_RT + &realtime_attr.attr, +#endif NULL }; Index: linux-5.4.5-rt3/kernel/locking/Makefile =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/Makefile +++ linux-5.4.5-rt3/kernel/locking/Makefile @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ # and is generally not a function of system call inputs. KCOV_INSTRUMENT := n -obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o +obj-y += semaphore.o rwsem.o percpu-rwsem.o ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:15 @ CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE) endif -obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o obj-$(CONFIG_LOCKDEP) += lockdep.o ifeq ($(CONFIG_PROC_FS),y) obj-$(CONFIG_LOCKDEP) += lockdep_proc.o endif obj-$(CONFIG_SMP) += spinlock.o -obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o obj-$(CONFIG_PROVE_LOCKING) += spinlock.o obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o obj-$(CONFIG_RT_MUTEXES) += rtmutex.o obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o +ifneq ($(CONFIG_PREEMPT_RT),y) +obj-y += mutex.o +obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o +obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o +endif +obj-$(CONFIG_PREEMPT_RT) += mutex-rt.o rwsem-rt.o rwlock-rt.o obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o Index: linux-5.4.5-rt3/kernel/locking/lockdep.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/lockdep.c +++ linux-5.4.5-rt3/kernel/locking/lockdep.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4412 @ static void check_flags(unsigned long fl } } +#ifndef CONFIG_PREEMPT_RT /* * We dont accurately track softirq state in e.g. 
* hardirq contexts (such as on 4KSTACKS), so only @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4427 @ static void check_flags(unsigned long fl DEBUG_LOCKS_WARN_ON(!current->softirqs_enabled); } } +#endif if (!debug_locks) print_irqtrace_events(current); Index: linux-5.4.5-rt3/kernel/locking/locktorture.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/locktorture.c +++ linux-5.4.5-rt3/kernel/locking/locktorture.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:19 @ #include <linux/kthread.h> #include <linux/sched/rt.h> #include <linux/spinlock.h> -#include <linux/rwlock.h> #include <linux/mutex.h> #include <linux/rwsem.h> #include <linux/smp.h> Index: linux-5.4.5-rt3/kernel/locking/mutex-rt.c =================================================================== --- /dev/null +++ linux-5.4.5-rt3/kernel/locking/mutex-rt.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +/* + * kernel/rt.c + * + * Real-Time Preemption Support + * + * started by Ingo Molnar: + * + * Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> + * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com> + * + * historic credit for proving that Linux spinlocks can be implemented via + * RT-aware mutexes goes to many people: The Pmutex project (Dirk Grambow + * and others) who prototyped it on 2.4 and did lots of comparative + * research and analysis; TimeSys, for proving that you can implement a + * fully preemptible kernel via the use of IRQ threading and mutexes; + * Bill Huey for persuasively arguing on lkml that the mutex model is the + * right one; and to MontaVista, who ported pmutexes to 2.6. + * + * This code is a from-scratch implementation and is not based on pmutexes, + * but the idea of converting spinlocks to mutexes is used here too. + * + * lock debugging, locking tree, deadlock detection: + * + * Copyright (C) 2004, LynuxWorks, Inc., Igor Manyilov, Bill Huey + * Released under the General Public License (GPL). + * + * Includes portions of the generic R/W semaphore implementation from: + * + * Copyright (c) 2001 David Howells (dhowells@redhat.com). + * - Derived partially from idea by Andrea Arcangeli <andrea@suse.de> + * - Derived also from comments by Linus + * + * Pending ownership of locks and ownership stealing: + * + * Copyright (C) 2005, Kihon Technologies Inc., Steven Rostedt + * + * (also by Steven Rostedt) + * - Converted single pi_lock to individual task locks. + * + * By Esben Nielsen: + * Doing priority inheritance with help of the scheduler. + * + * Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com> + * - major rework based on Esben Nielsens initial patch + * - replaced thread_info references by task_struct refs + * - removed task->pending_owner dependency + * - BKL drop/reacquire for semaphore style locks to avoid deadlocks + * in the scheduler return path as discussed with Steven Rostedt + * + * Copyright (C) 2006, Kihon Technologies Inc. + * Steven Rostedt <rostedt@goodmis.org> + * - debugged and patched Thomas Gleixner's rework. + * - added back the cmpxchg to the rework. + * - turned atomic require back on for SMP. 
+ */ + +#include <linux/spinlock.h> +#include <linux/rtmutex.h> +#include <linux/sched.h> +#include <linux/delay.h> +#include <linux/module.h> +#include <linux/kallsyms.h> +#include <linux/syscalls.h> +#include <linux/interrupt.h> +#include <linux/plist.h> +#include <linux/fs.h> +#include <linux/futex.h> +#include <linux/hrtimer.h> + +#include "rtmutex_common.h" + +/* + * struct mutex functions + */ +void __mutex_do_init(struct mutex *mutex, const char *name, + struct lock_class_key *key) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + /* + * Make sure we are not reinitializing a held lock: + */ + debug_check_no_locks_freed((void *)mutex, sizeof(*mutex)); + lockdep_init_map(&mutex->dep_map, name, key, 0); +#endif + mutex->lock.save_state = 0; +} +EXPORT_SYMBOL(__mutex_do_init); + +void __lockfunc _mutex_lock(struct mutex *lock) +{ + mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); + __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL(_mutex_lock); + +void __lockfunc _mutex_lock_io(struct mutex *lock) +{ + int token; + + token = io_schedule_prepare(); + _mutex_lock(lock); + io_schedule_finish(token); +} +EXPORT_SYMBOL_GPL(_mutex_lock_io); + +int __lockfunc _mutex_lock_interruptible(struct mutex *lock) +{ + int ret; + + mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); + ret = __rt_mutex_lock_state(&lock->lock, TASK_INTERRUPTIBLE); + if (ret) + mutex_release(&lock->dep_map, 1, _RET_IP_); + return ret; +} +EXPORT_SYMBOL(_mutex_lock_interruptible); + +int __lockfunc _mutex_lock_killable(struct mutex *lock) +{ + int ret; + + mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); + ret = __rt_mutex_lock_state(&lock->lock, TASK_KILLABLE); + if (ret) + mutex_release(&lock->dep_map, 1, _RET_IP_); + return ret; +} +EXPORT_SYMBOL(_mutex_lock_killable); + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +void __lockfunc _mutex_lock_nested(struct mutex *lock, int subclass) +{ + mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_); + __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL(_mutex_lock_nested); + +void __lockfunc _mutex_lock_io_nested(struct mutex *lock, int subclass) +{ + int token; + + token = io_schedule_prepare(); + + mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_); + __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE); + + io_schedule_finish(token); +} +EXPORT_SYMBOL_GPL(_mutex_lock_io_nested); + +void __lockfunc _mutex_lock_nest_lock(struct mutex *lock, struct lockdep_map *nest) +{ + mutex_acquire_nest(&lock->dep_map, 0, 0, nest, _RET_IP_); + __rt_mutex_lock_state(&lock->lock, TASK_UNINTERRUPTIBLE); +} +EXPORT_SYMBOL(_mutex_lock_nest_lock); + +int __lockfunc _mutex_lock_interruptible_nested(struct mutex *lock, int subclass) +{ + int ret; + + mutex_acquire_nest(&lock->dep_map, subclass, 0, NULL, _RET_IP_); + ret = __rt_mutex_lock_state(&lock->lock, TASK_INTERRUPTIBLE); + if (ret) + mutex_release(&lock->dep_map, 1, _RET_IP_); + return ret; +} +EXPORT_SYMBOL(_mutex_lock_interruptible_nested); + +int __lockfunc _mutex_lock_killable_nested(struct mutex *lock, int subclass) +{ + int ret; + + mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_); + ret = __rt_mutex_lock_state(&lock->lock, TASK_KILLABLE); + if (ret) + mutex_release(&lock->dep_map, 1, _RET_IP_); + return ret; +} +EXPORT_SYMBOL(_mutex_lock_killable_nested); +#endif + +int __lockfunc _mutex_trylock(struct mutex *lock) +{ + int ret = __rt_mutex_trylock(&lock->lock); + + if (ret) + mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_); + + return ret; +} +EXPORT_SYMBOL(_mutex_trylock); + +void 
__lockfunc _mutex_unlock(struct mutex *lock) +{ + mutex_release(&lock->dep_map, 1, _RET_IP_); + __rt_mutex_unlock(&lock->lock); +} +EXPORT_SYMBOL(_mutex_unlock); + +/** + * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0 + * @cnt: the atomic which we are to dec + * @lock: the mutex to return holding if we dec to 0 + * + * return true and hold lock if we dec to 0, return false otherwise + */ +int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock) +{ + /* dec if we can't possibly hit 0 */ + if (atomic_add_unless(cnt, -1, 1)) + return 0; + /* we might hit 0, so take the lock */ + mutex_lock(lock); + if (!atomic_dec_and_test(cnt)) { + /* when we actually did the dec, we didn't hit 0 */ + mutex_unlock(lock); + return 0; + } + /* we hit 0, and we hold the lock */ + return 1; +} +EXPORT_SYMBOL(atomic_dec_and_mutex_lock); Index: linux-5.4.5-rt3/kernel/locking/percpu-rwsem.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/percpu-rwsem.c +++ linux-5.4.5-rt3/kernel/locking/percpu-rwsem.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ // SPDX-License-Identifier: GPL-2.0-only #include <linux/atomic.h> -#include <linux/rwsem.h> #include <linux/percpu.h> +#include <linux/wait.h> #include <linux/lockdep.h> #include <linux/percpu-rwsem.h> #include <linux/rcupdate.h> #include <linux/sched.h> +#include <linux/sched/task.h> #include <linux/errno.h> -#include "rwsem.h" - int __percpu_init_rwsem(struct percpu_rw_semaphore *sem, - const char *name, struct lock_class_key *rwsem_key) + const char *name, struct lock_class_key *key) { sem->read_count = alloc_percpu(int); if (unlikely(!sem->read_count)) return -ENOMEM; - /* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */ rcu_sync_init(&sem->rss); - __init_rwsem(&sem->rw_sem, name, rwsem_key); rcuwait_init(&sem->writer); - sem->readers_block = 0; + init_waitqueue_head(&sem->waiters); + atomic_set(&sem->block, 0); +#ifdef CONFIG_DEBUG_LOCK_ALLOC + debug_check_no_locks_freed((void *)sem, sizeof(*sem)); + lockdep_init_map(&sem->dep_map, name, key, 0); +#endif return 0; } EXPORT_SYMBOL_GPL(__percpu_init_rwsem); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:46 @ void percpu_free_rwsem(struct percpu_rw_ } EXPORT_SYMBOL_GPL(percpu_free_rwsem); -int __percpu_down_read(struct percpu_rw_semaphore *sem, int try) +static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem) { + __this_cpu_inc(*sem->read_count); + /* * Due to having preemption disabled the decrement happens on * the same CPU as the increment, avoiding the * increment-on-one-CPU-and-decrement-on-another problem. * - * If the reader misses the writer's assignment of readers_block, then - * the writer is guaranteed to see the reader's increment. + * If the reader misses the writer's assignment of sem->block, then the + * writer is guaranteed to see the reader's increment. * * Conversely, any readers that increment their sem->read_count after - * the writer looks are guaranteed to see the readers_block value, - * which in turn means that they are guaranteed to immediately - * decrement their sem->read_count, so that it doesn't matter that the - * writer missed them. + * the writer looks are guaranteed to see the sem->block value, which + * in turn means that they are guaranteed to immediately decrement + * their sem->read_count, so that it doesn't matter that the writer + * missed them. 
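atomic_dec_and_mutex_lock() above only takes the mutex when the decrement might be the final 1 -> 0 transition; every other decrement stays lock-free. A userspace sketch of the same pattern with C11 atomics and a pthread mutex (names are illustrative, atomic_add_unless is reimplemented locally):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* add @a to *v unless *v == @u; returns false if *v was @u */
static bool add_unless_sk(atomic_int *v, int a, int u)
{
        int old = atomic_load(v);

        while (old != u) {
                if (atomic_compare_exchange_weak(v, &old, old + a))
                        return true;
        }
        return false;
}

/* returns true holding @lock iff the counter dropped to zero */
bool dec_and_mutex_lock_sk(atomic_int *cnt, pthread_mutex_t *lock)
{
        /* fast path: decrement as long as we cannot possibly hit 0 */
        if (add_unless_sk(cnt, -1, 1))
                return false;

        /* we might hit 0, so serialize the final decrement */
        pthread_mutex_lock(lock);
        if (atomic_fetch_sub(cnt, 1) != 1) {
                pthread_mutex_unlock(lock);     /* somebody else raced in */
                return false;
        }
        return true;            /* counter is 0 and the mutex is held */
}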
*/ smp_mb(); /* A matches D */ /* - * If !readers_block the critical section starts here, matched by the + * If !sem->block the critical section starts here, matched by the * release in percpu_up_write(). */ - if (likely(!smp_load_acquire(&sem->readers_block))) - return 1; + if (likely(!atomic_read_acquire(&sem->block))) + return true; - /* - * Per the above comment; we still have preemption disabled and - * will thus decrement on the same CPU as we incremented. - */ - __percpu_up_read(sem); + __this_cpu_dec(*sem->read_count); - if (try) - return 0; + /* Prod writer to re-evaluate readers_active_check() */ + rcuwait_wake_up(&sem->writer); - /* - * We either call schedule() in the wait, or we'll fall through - * and reschedule on the preempt_enable() in percpu_down_read(). - */ - preempt_enable_no_resched(); + return false; +} - /* - * Avoid lockdep for the down/up_read() we already have them. - */ - __down_read(&sem->rw_sem); - this_cpu_inc(*sem->read_count); - __up_read(&sem->rw_sem); +static inline bool __percpu_down_write_trylock(struct percpu_rw_semaphore *sem) +{ + if (atomic_read(&sem->block)) + return false; - preempt_disable(); - return 1; + return atomic_xchg(&sem->block, 1) == 0; +} + +static bool __percpu_rwsem_trylock(struct percpu_rw_semaphore *sem, bool reader) +{ + if (reader) { + bool ret; + + preempt_disable(); + ret = __percpu_down_read_trylock(sem); + preempt_enable(); + + return ret; + } + return __percpu_down_write_trylock(sem); +} + +/* + * The return value of wait_queue_entry::func means: + * + * <0 - error, wakeup is terminated and the error is returned + * 0 - no wakeup, a next waiter is tried + * >0 - woken, if EXCLUSIVE, counted towards @nr_exclusive. + * + * We use EXCLUSIVE for both readers and writers to preserve FIFO order, + * and play games with the return value to allow waking multiple readers. + * + * Specifically, we wake readers until we've woken a single writer, or until a + * trylock fails. + */ +static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry, + unsigned int mode, int wake_flags, + void *key) +{ + struct task_struct *p = get_task_struct(wq_entry->private); + bool reader = wq_entry->flags & WQ_FLAG_CUSTOM; + struct percpu_rw_semaphore *sem = key; + + /* concurrent against percpu_down_write(), can get stolen */ + if (!__percpu_rwsem_trylock(sem, reader)) + return 1; + + list_del_init(&wq_entry->entry); + smp_store_release(&wq_entry->private, NULL); + + wake_up_process(p); + put_task_struct(p); + + return !reader; /* wake (readers until) 1 writer */ } -EXPORT_SYMBOL_GPL(__percpu_down_read); -void __percpu_up_read(struct percpu_rw_semaphore *sem) +static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader) { - smp_mb(); /* B matches C */ + DEFINE_WAIT_FUNC(wq_entry, percpu_rwsem_wake_function); + bool wait; + + spin_lock_irq(&sem->waiters.lock); /* - * In other words, if they see our decrement (presumably to aggregate - * zero, as that is the only time it matters) they will also see our - * critical section. + * Serialize against the wakeup in percpu_up_write(), if we fail + * the trylock, the wakeup must see us on the list. 
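percpu_rwsem_wake_function() above plays games with the exclusive-wakeup return value so that one wakeup sweep covers every queued reader up to and including the first writer. The sketch below shows only that FIFO batching policy over a plain singly linked list; it deliberately leaves out the per-waiter trylock and the "lock got stolen" handling that the real function also performs (types and names are invented):

#include <stdbool.h>
#include <stddef.h>

struct waiter_sk {
        struct waiter_sk *next;
        bool reader;
        void (*wake)(struct waiter_sk *);
};

/* Wake queued readers in FIFO order and stop after the first writer. */
void wake_batch_sketch(struct waiter_sk **head)
{
        while (*head) {
                struct waiter_sk *w = *head;

                *head = w->next;
                w->wake(w);
                if (!w->reader)         /* a writer ends the batch */
                        break;
        }
}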
*/ - __this_cpu_dec(*sem->read_count); + wait = !__percpu_rwsem_trylock(sem, reader); + if (wait) { + wq_entry.flags |= WQ_FLAG_EXCLUSIVE | reader * WQ_FLAG_CUSTOM; + __add_wait_queue_entry_tail(&sem->waiters, &wq_entry); + } + spin_unlock_irq(&sem->waiters.lock); + + while (wait) { + set_current_state(TASK_UNINTERRUPTIBLE); + if (!smp_load_acquire(&wq_entry.private)) + break; + schedule(); + } + __set_current_state(TASK_RUNNING); +} - /* Prod writer to recheck readers_active */ - rcuwait_wake_up(&sem->writer); +bool __percpu_down_read(struct percpu_rw_semaphore *sem, bool try) +{ + if (__percpu_down_read_trylock(sem)) + return true; + + if (try) + return false; + + preempt_enable(); + percpu_rwsem_wait(sem, /* .reader = */ true); + preempt_disable(); + + return true; } -EXPORT_SYMBOL_GPL(__percpu_up_read); +EXPORT_SYMBOL_GPL(__percpu_down_read); #define per_cpu_sum(var) \ ({ \ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:195 @ EXPORT_SYMBOL_GPL(__percpu_up_read); * zero. If this sum is zero, then it is stable due to the fact that if any * newly arriving readers increment a given counter, they will immediately * decrement that same counter. + * + * Assumes sem->block is set. */ static bool readers_active_check(struct percpu_rw_semaphore *sem) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:215 @ static bool readers_active_check(struct void percpu_down_write(struct percpu_rw_semaphore *sem) { + might_sleep(); + rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_); + /* Notify readers to take the slow path. */ rcu_sync_enter(&sem->rss); - down_write(&sem->rw_sem); - /* - * Notify new readers to block; up until now, and thus throughout the - * longish rcu_sync_enter() above, new readers could still come in. + * Try set sem->block; this provides writer-writer exclusion. + * Having sem->block set makes new readers block. */ - WRITE_ONCE(sem->readers_block, 1); + if (!__percpu_down_write_trylock(sem)) + percpu_rwsem_wait(sem, /* .reader = */ false); - smp_mb(); /* D matches A */ + /* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */ /* - * If they don't see our writer of readers_block, then we are - * guaranteed to see their sem->read_count increment, and therefore - * will wait for them. + * If they don't see our store of sem->block, then we are guaranteed to + * see their sem->read_count increment, and therefore will wait for + * them. */ - /* Wait for all now active readers to complete. */ + /* Wait for all active readers to complete. */ rcuwait_wait_event(&sem->writer, readers_active_check(sem)); } EXPORT_SYMBOL_GPL(percpu_down_write); void percpu_up_write(struct percpu_rw_semaphore *sem) { + rwsem_release(&sem->dep_map, 1, _RET_IP_); + /* * Signal the writer is done, no fast path yet. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:255 @ void percpu_up_write(struct percpu_rw_se * Therefore we force it through the slow path which guarantees an * acquire and thereby guarantees the critical section's consistency. */ - smp_store_release(&sem->readers_block, 0); + atomic_set_release(&sem->block, 0); /* - * Release the write lock, this will allow readers back in the game. + * Prod any pending reader/writer to make progress. 
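The reworked percpu-rwsem above replaces the embedded rwsem with an atomic sem->block flag: a reader increments its counter and then checks block; the writer sets block and then waits for the reader counters to drain to zero. Below is a compact userspace sketch of just that protocol, with one shared atomic standing in for the per-CPU read_count and sched_yield() loops standing in for the rcuwait/waitqueue machinery; every access is seq_cst, which subsumes the explicit "A matches D" smp_mb() pairing in the real code:

#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

struct pcpu_rwsem_sk {
        atomic_int readers;
        atomic_bool block;
};

bool read_trylock_sk(struct pcpu_rwsem_sk *s)
{
        atomic_fetch_add(&s->readers, 1);       /* A */
        if (!atomic_load(&s->block))            /* matches D below */
                return true;                    /* reader fast path */
        atomic_fetch_sub(&s->readers, 1);       /* writer pending: undo */
        return false;                           /* caller retries or blocks */
}

void read_unlock_sk(struct pcpu_rwsem_sk *s)
{
        atomic_fetch_sub(&s->readers, 1);       /* writer will recheck the sum */
}

void write_lock_sk(struct pcpu_rwsem_sk *s)
{
        while (atomic_exchange(&s->block, true))        /* writer-writer exclusion */
                sched_yield();
        while (atomic_load(&s->readers))                /* D: wait out readers */
                sched_yield();
}

void write_unlock_sk(struct pcpu_rwsem_sk *s)
{
        atomic_store(&s->block, false);         /* readers take the fast path again */
}

A failed read_trylock_sk() here simply means the caller must retry; the kernel version instead queues on sem->waiters and is woken by the batching logic sketched earlier.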
*/ - up_write(&sem->rw_sem); + __wake_up(&sem->waiters, TASK_NORMAL, 1, sem); /* * Once this completes (at least one RCU-sched grace period hence) the Index: linux-5.4.5-rt3/kernel/locking/rtmutex.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/rtmutex.c +++ linux-5.4.5-rt3/kernel/locking/rtmutex.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ * Copyright (C) 2005-2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com> * Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt * Copyright (C) 2006 Esben Nielsen + * Adaptive Spinlocks: + * Copyright (C) 2008 Novell, Inc., Gregory Haskins, Sven Dietrich, + * and Peter Morreale, + * Adaptive Spinlocks simplification: + * Copyright (C) 2008 Red Hat, Inc., Steven Rostedt <srostedt@redhat.com> * * See Documentation/locking/rt-mutex-design.rst for details. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:27 @ #include <linux/sched/wake_q.h> #include <linux/sched/debug.h> #include <linux/timer.h> +#include <linux/ww_mutex.h> +#include <linux/blkdev.h> #include "rtmutex_common.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:146 @ static void fixup_rt_mutex_waiters(struc WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS); } +static int rt_mutex_real_waiter(struct rt_mutex_waiter *waiter) +{ + return waiter && waiter != PI_WAKEUP_INPROGRESS && + waiter != PI_REQUEUE_INPROGRESS; +} + /* * We can speed up the acquire/release, if there's no debugging state to be * set up. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:245 @ static inline bool unlock_rt_mutex_safe( * Only use with rt_mutex_waiter_{less,equal}() */ #define task_to_waiter(p) \ - &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline } + &(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline, .task = (p) } static inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:285 @ rt_mutex_waiter_equal(struct rt_mutex_wa return 1; } +#define STEAL_NORMAL 0 +#define STEAL_LATERAL 1 + +static inline int +rt_mutex_steal(struct rt_mutex *lock, struct rt_mutex_waiter *waiter, int mode) +{ + struct rt_mutex_waiter *top_waiter = rt_mutex_top_waiter(lock); + + if (waiter == top_waiter || rt_mutex_waiter_less(waiter, top_waiter)) + return 1; + + /* + * Note that RT tasks are excluded from lateral-steals + * to prevent the introduction of an unbounded latency. 
+ */ + if (mode == STEAL_NORMAL || rt_task(waiter->task)) + return 0; + + return rt_mutex_waiter_equal(waiter, top_waiter); +} + static void rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:410 @ static bool rt_mutex_cond_detect_deadloc return debug_rt_mutex_detect_deadlock(waiter, chwalk); } +static void rt_mutex_wake_waiter(struct rt_mutex_waiter *waiter) +{ + if (waiter->savestate) + wake_up_lock_sleeper(waiter->task); + else + wake_up_process(waiter->task); +} + /* * Max number of times we'll walk the boosting chain: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:425 @ int max_lock_depth = 1024; static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p) { - return p->pi_blocked_on ? p->pi_blocked_on->lock : NULL; + return rt_mutex_real_waiter(p->pi_blocked_on) ? + p->pi_blocked_on->lock : NULL; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:562 @ static int rt_mutex_adjust_prio_chain(st * reached or the state of the chain has changed while we * dropped the locks. */ - if (!waiter) + if (!rt_mutex_real_waiter(waiter)) goto out_unlock_pi; /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:742 @ static int rt_mutex_adjust_prio_chain(st * follow here. This is the end of the chain we are walking. */ if (!rt_mutex_owner(lock)) { + struct rt_mutex_waiter *lock_top_waiter; + /* * If the requeue [7] above changed the top waiter, * then we need to wake the new top waiter up to try * to get the lock. */ - if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) - wake_up_process(rt_mutex_top_waiter(lock)->task); + lock_top_waiter = rt_mutex_top_waiter(lock); + if (prerequeue_top_waiter != lock_top_waiter) + rt_mutex_wake_waiter(lock_top_waiter); raw_spin_unlock_irq(&lock->wait_lock); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:852 @ static int rt_mutex_adjust_prio_chain(st * @task: The task which wants to acquire the lock * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL + * @mode: Lock steal mode (STEAL_NORMAL, STEAL_LATERAL) */ -static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, - struct rt_mutex_waiter *waiter) +static int __try_to_take_rt_mutex(struct rt_mutex *lock, + struct task_struct *task, + struct rt_mutex_waiter *waiter, int mode) { lockdep_assert_held(&lock->wait_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:892 @ static int try_to_take_rt_mutex(struct r */ if (waiter) { /* - * If waiter is not the highest priority waiter of - * @lock, give up. + * If waiter is not the highest priority waiter of @lock, + * or its peer when lateral steal is allowed, give up. */ - if (waiter != rt_mutex_top_waiter(lock)) + if (!rt_mutex_steal(lock, waiter, mode)) return 0; - /* * We can acquire the lock. Remove the waiter from the * lock waiters tree. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:914 @ static int try_to_take_rt_mutex(struct r */ if (rt_mutex_has_waiters(lock)) { /* - * If @task->prio is greater than or equal to - * the top waiter priority (kernel view), - * @task lost. 
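rt_mutex_steal() above gates lock stealing on priority: a strictly higher-priority waiter may always take the lock, while an equal-priority ("lateral") steal is allowed only in STEAL_LATERAL mode and only for non-RT tasks. A self-contained restatement of that predicate, ignoring the SCHED_DEADLINE fields and assuming the usual convention that kernel priority values below 100 denote realtime tasks (all names here are invented):

#include <stdbool.h>

#define MAX_RT_PRIO_SK  100             /* assumed kernel convention: prio < 100 => RT */

struct prio_waiter_sk {
        int prio;                       /* kernel view: lower value = higher priority */
};

enum { STEAL_NORMAL_SK, STEAL_LATERAL_SK };

bool can_steal_sk(const struct prio_waiter_sk *w,
                  const struct prio_waiter_sk *top, int mode)
{
        if (w == top || w->prio < top->prio)
                return true;            /* strictly higher priority always wins */

        if (mode == STEAL_NORMAL_SK || w->prio < MAX_RT_PRIO_SK)
                return false;           /* RT tasks never steal laterally */

        return w->prio == top->prio;    /* lateral steal among equal peers */
}

Excluding RT tasks from lateral steals is what keeps the latency of the top RT waiter bounded: an equal-priority spinner cannot repeatedly slip in ahead of it.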
+ * If @task->prio is greater than the top waiter + * priority (kernel view), or equal to it when a + * lateral steal is forbidden, @task lost. */ - if (!rt_mutex_waiter_less(task_to_waiter(task), - rt_mutex_top_waiter(lock))) + if (!rt_mutex_steal(lock, task_to_waiter(task), mode)) return 0; - /* * The current top waiter stays enqueued. We * don't have to change anything in the lock @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:966 @ takeit: return 1; } +#ifdef CONFIG_PREEMPT_RT +/* + * preemptible spin_lock functions: + */ +static inline void rt_spin_lock_fastlock(struct rt_mutex *lock, + void (*slowfn)(struct rt_mutex *lock)) +{ + might_sleep_no_state_check(); + + if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) + return; + else + slowfn(lock); +} + +static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock, + void (*slowfn)(struct rt_mutex *lock)) +{ + if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) + return; + else + slowfn(lock); +} +#ifdef CONFIG_SMP +/* + * Note that owner is a speculative pointer and dereferencing relies + * on rcu_read_lock() and the check against the lock owner. + */ +static int adaptive_wait(struct rt_mutex *lock, + struct task_struct *owner) +{ + int res = 0; + + rcu_read_lock(); + for (;;) { + if (owner != rt_mutex_owner(lock)) + break; + /* + * Ensure that owner->on_cpu is dereferenced _after_ + * checking the above to be valid. + */ + barrier(); + if (!owner->on_cpu) { + res = 1; + break; + } + cpu_relax(); + } + rcu_read_unlock(); + return res; +} +#else +static int adaptive_wait(struct rt_mutex *lock, + struct task_struct *orig_owner) +{ + return 1; +} +#endif + +static int task_blocks_on_rt_mutex(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + struct task_struct *task, + enum rtmutex_chainwalk chwalk); +/* + * Slow path lock function spin_lock style: this variant is very + * careful not to miss any non-lock wakeups. + * + * We store the current state under p->pi_lock in p->saved_state and + * the try_to_wake_up() code handles this accordingly. + */ +void __sched rt_spin_lock_slowlock_locked(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + unsigned long flags) +{ + struct task_struct *lock_owner, *self = current; + struct rt_mutex_waiter *top_waiter; + int ret; + + if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) + return; + + BUG_ON(rt_mutex_owner(lock) == self); + + /* + * We save whatever state the task is in and we'll restore it + * after acquiring the lock taking real wakeups into account + * as well. We are serialized via pi_lock against wakeups. See + * try_to_wake_up(). + */ + raw_spin_lock(&self->pi_lock); + self->saved_state = self->state; + __set_current_state_no_track(TASK_UNINTERRUPTIBLE); + raw_spin_unlock(&self->pi_lock); + + ret = task_blocks_on_rt_mutex(lock, waiter, self, RT_MUTEX_MIN_CHAINWALK); + BUG_ON(ret); + + for (;;) { + /* Try to acquire the lock again. */ + if (__try_to_take_rt_mutex(lock, self, waiter, STEAL_LATERAL)) + break; + + top_waiter = rt_mutex_top_waiter(lock); + lock_owner = rt_mutex_owner(lock); + + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + + debug_rt_mutex_print_deadlock(waiter); + + if (top_waiter != waiter || adaptive_wait(lock, lock_owner)) + schedule(); + + raw_spin_lock_irqsave(&lock->wait_lock, flags); + + raw_spin_lock(&self->pi_lock); + __set_current_state_no_track(TASK_UNINTERRUPTIBLE); + raw_spin_unlock(&self->pi_lock); + } + + /* + * Restore the task state to current->saved_state. 
We set it + * to the original state above and the try_to_wake_up() code + * has possibly updated it when a real (non-rtmutex) wakeup + * happened while we were blocked. Clear saved_state so + * try_to_wakeup() does not get confused. + */ + raw_spin_lock(&self->pi_lock); + __set_current_state_no_track(self->saved_state); + self->saved_state = TASK_RUNNING; + raw_spin_unlock(&self->pi_lock); + + /* + * try_to_take_rt_mutex() sets the waiter bit + * unconditionally. We might have to fix that up: + */ + fixup_rt_mutex_waiters(lock); + + BUG_ON(rt_mutex_has_waiters(lock) && waiter == rt_mutex_top_waiter(lock)); + BUG_ON(!RB_EMPTY_NODE(&waiter->tree_entry)); +} + +static void noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock) +{ + struct rt_mutex_waiter waiter; + unsigned long flags; + + rt_mutex_init_waiter(&waiter, true); + + raw_spin_lock_irqsave(&lock->wait_lock, flags); + rt_spin_lock_slowlock_locked(lock, &waiter, flags); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + debug_rt_mutex_free_waiter(&waiter); +} + +static bool __sched __rt_mutex_unlock_common(struct rt_mutex *lock, + struct wake_q_head *wake_q, + struct wake_q_head *wq_sleeper); +/* + * Slow path to release a rt_mutex spin_lock style + */ +void __sched rt_spin_lock_slowunlock(struct rt_mutex *lock) +{ + unsigned long flags; + DEFINE_WAKE_Q(wake_q); + DEFINE_WAKE_Q(wake_sleeper_q); + bool postunlock; + + raw_spin_lock_irqsave(&lock->wait_lock, flags); + postunlock = __rt_mutex_unlock_common(lock, &wake_q, &wake_sleeper_q); + raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + + if (postunlock) + rt_mutex_postunlock(&wake_q, &wake_sleeper_q); +} + +void __lockfunc rt_spin_lock(spinlock_t *lock) +{ + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); + rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock); +} +EXPORT_SYMBOL(rt_spin_lock); + +void __lockfunc __rt_spin_lock(struct rt_mutex *lock) +{ + rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock); +} + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass) +{ + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_); + rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock); +} +EXPORT_SYMBOL(rt_spin_lock_nested); +#endif + +void __lockfunc rt_spin_unlock(spinlock_t *lock) +{ + /* NOTE: we always pass in '1' for nested, for simplicity */ + spin_release(&lock->dep_map, 1, _RET_IP_); + rt_spin_lock_fastunlock(&lock->lock, rt_spin_lock_slowunlock); + migrate_enable(); + rcu_read_unlock(); + sleeping_lock_dec(); +} +EXPORT_SYMBOL(rt_spin_unlock); + +void __lockfunc __rt_spin_unlock(struct rt_mutex *lock) +{ + rt_spin_lock_fastunlock(lock, rt_spin_lock_slowunlock); +} +EXPORT_SYMBOL(__rt_spin_unlock); + +/* + * Wait for the lock to get unlocked: instead of polling for an unlock + * (like raw spinlocks do), we lock and unlock, to force the kernel to + * schedule if there's contention: + */ +void __lockfunc rt_spin_unlock_wait(spinlock_t *lock) +{ + spin_lock(lock); + spin_unlock(lock); +} +EXPORT_SYMBOL(rt_spin_unlock_wait); + +int __lockfunc rt_spin_trylock(spinlock_t *lock) +{ + int ret; + + sleeping_lock_inc(); + migrate_disable(); + ret = __rt_mutex_trylock(&lock->lock); + if (ret) { + spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); + rcu_read_lock(); + } else { + migrate_enable(); + sleeping_lock_dec(); + } + return ret; +} +EXPORT_SYMBOL(rt_spin_trylock); + +int __lockfunc 
rt_spin_trylock_bh(spinlock_t *lock) +{ + int ret; + + local_bh_disable(); + ret = __rt_mutex_trylock(&lock->lock); + if (ret) { + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); + } else + local_bh_enable(); + return ret; +} +EXPORT_SYMBOL(rt_spin_trylock_bh); + +int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags) +{ + int ret; + + *flags = 0; + ret = __rt_mutex_trylock(&lock->lock); + if (ret) { + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); + } + return ret; +} +EXPORT_SYMBOL(rt_spin_trylock_irqsave); + +void +__rt_spin_lock_init(spinlock_t *lock, const char *name, struct lock_class_key *key) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + /* + * Make sure we are not reinitializing a held lock: + */ + debug_check_no_locks_freed((void *)lock, sizeof(*lock)); + lockdep_init_map(&lock->dep_map, name, key, 0); +#endif +} +EXPORT_SYMBOL(__rt_spin_lock_init); + +#endif /* PREEMPT_RT */ + +#ifdef CONFIG_PREEMPT_RT + static inline int __sched +__mutex_lock_check_stamp(struct rt_mutex *lock, struct ww_acquire_ctx *ctx) +{ + struct ww_mutex *ww = container_of(lock, struct ww_mutex, base.lock); + struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx); + + if (!hold_ctx) + return 0; + + if (unlikely(ctx == hold_ctx)) + return -EALREADY; + + if (ctx->stamp - hold_ctx->stamp <= LONG_MAX && + (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) { +#ifdef CONFIG_DEBUG_MUTEXES + DEBUG_LOCKS_WARN_ON(ctx->contending_lock); + ctx->contending_lock = ww; +#endif + return -EDEADLK; + } + + return 0; +} +#else + static inline int __sched +__mutex_lock_check_stamp(struct rt_mutex *lock, struct ww_acquire_ctx *ctx) +{ + BUG(); + return 0; +} + +#endif + +static inline int +try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, + struct rt_mutex_waiter *waiter) +{ + return __try_to_take_rt_mutex(lock, task, waiter, STEAL_NORMAL); +} + /* * Task blocks on lock. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1336 @ static int task_blocks_on_rt_mutex(struc return -EDEADLK; raw_spin_lock(&task->pi_lock); + /* + * In the case of futex requeue PI, this will be a proxy + * lock. The task will wake unaware that it is enqueueed on + * this lock. Avoid blocking on two locks and corrupting + * pi_blocked_on via the PI_WAKEUP_INPROGRESS + * flag. futex_wait_requeue_pi() sets this when it wakes up + * before requeue (due to a signal or timeout). Do not enqueue + * the task if PI_WAKEUP_INPROGRESS is set. + */ + if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) { + raw_spin_unlock(&task->pi_lock); + return -EAGAIN; + } + + BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)); + waiter->task = task; waiter->lock = lock; waiter->prio = task->prio; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1375 @ static int task_blocks_on_rt_mutex(struc rt_mutex_enqueue_pi(owner, waiter); rt_mutex_adjust_prio(owner); - if (owner->pi_blocked_on) + if (rt_mutex_real_waiter(owner->pi_blocked_on)) chain_walk = 1; } else if (rt_mutex_cond_detect_deadlock(waiter, chwalk)) { chain_walk = 1; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1417 @ static int task_blocks_on_rt_mutex(struc * Called with lock->wait_lock held and interrupts disabled. 
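__mutex_lock_check_stamp() above implements the ww_mutex back-off rule: relocking under the same context reports -EALREADY, a younger context (larger stamp, with the pointer address as a tie-break) gets -EDEADLK and must release everything it holds, and an older context keeps waiting. The same decision restated with simplified types (only the stamp field is modelled; error values are the usual errno constants):

#include <errno.h>
#include <limits.h>
#include <stddef.h>

struct acquire_ctx_sk {
        unsigned long stamp;            /* monotonically increasing ticket */
};

int check_stamp_sk(struct acquire_ctx_sk *ctx, struct acquire_ctx_sk *hold_ctx)
{
        if (!hold_ctx)
                return 0;               /* lock not held under another context */

        if (ctx == hold_ctx)
                return -EALREADY;       /* we already own this ww_mutex */

        /* "ctx is younger than hold_ctx", wraparound-safe, tie broken by address */
        if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
            (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx))
                return -EDEADLK;        /* back off, release held ww_mutexes */

        return 0;                       /* older context: keep waiting */
}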
*/ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q, + struct wake_q_head *wake_sleeper_q, struct rt_mutex *lock) { struct rt_mutex_waiter *waiter; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1457 @ static void mark_wakeup_next_waiter(stru * Pairs with preempt_enable() in rt_mutex_postunlock(); */ preempt_disable(); - wake_q_add(wake_q, waiter->task); + if (waiter->savestate) + wake_q_add_sleeper(wake_sleeper_q, waiter->task); + else + wake_q_add(wake_q, waiter->task); raw_spin_unlock(¤t->pi_lock); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1475 @ static void remove_waiter(struct rt_mute { bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock)); struct task_struct *owner = rt_mutex_owner(lock); - struct rt_mutex *next_lock; + struct rt_mutex *next_lock = NULL; lockdep_assert_held(&lock->wait_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1501 @ static void remove_waiter(struct rt_mute rt_mutex_adjust_prio(owner); /* Store the lock on which owner is blocked or NULL */ - next_lock = task_blocked_on_lock(owner); + if (rt_mutex_real_waiter(owner->pi_blocked_on)) + next_lock = task_blocked_on_lock(owner); raw_spin_unlock(&owner->pi_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1538 @ void rt_mutex_adjust_pi(struct task_stru raw_spin_lock_irqsave(&task->pi_lock, flags); waiter = task->pi_blocked_on; - if (!waiter || rt_mutex_waiter_equal(waiter, task_to_waiter(task))) { + if (!rt_mutex_real_waiter(waiter) || + rt_mutex_waiter_equal(waiter, task_to_waiter(task))) { raw_spin_unlock_irqrestore(&task->pi_lock, flags); return; } next_lock = waiter->lock; - raw_spin_unlock_irqrestore(&task->pi_lock, flags); /* gets dropped in rt_mutex_adjust_prio_chain()! */ get_task_struct(task); + raw_spin_unlock_irqrestore(&task->pi_lock, flags); rt_mutex_adjust_prio_chain(task, RT_MUTEX_MIN_CHAINWALK, NULL, next_lock, NULL, task); } -void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter) +void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter, bool savestate) { debug_rt_mutex_init_waiter(waiter); RB_CLEAR_NODE(&waiter->pi_tree_entry); RB_CLEAR_NODE(&waiter->tree_entry); waiter->task = NULL; + waiter->savestate = savestate; } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1575 @ void rt_mutex_init_waiter(struct rt_mute static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, int state, struct hrtimer_sleeper *timeout, - struct rt_mutex_waiter *waiter) + struct rt_mutex_waiter *waiter, + struct ww_acquire_ctx *ww_ctx) { int ret = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1585 @ __rt_mutex_slowlock(struct rt_mutex *loc if (try_to_take_rt_mutex(lock, current, waiter)) break; - /* - * TASK_INTERRUPTIBLE checks for signals and - * timeout. Ignored otherwise. - */ - if (likely(state == TASK_INTERRUPTIBLE)) { - /* Signal pending? 
*/ - if (signal_pending(current)) - ret = -EINTR; - if (timeout && !timeout->task) - ret = -ETIMEDOUT; + if (timeout && !timeout->task) { + ret = -ETIMEDOUT; + break; + } + if (signal_pending_state(state, current)) { + ret = -EINTR; + break; + } + + if (ww_ctx && ww_ctx->acquired > 0) { + ret = __mutex_lock_check_stamp(lock, ww_ctx); if (ret) break; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1634 @ static void rt_mutex_handle_deadlock(int } } -/* - * Slow path lock function: - */ -static int __sched -rt_mutex_slowlock(struct rt_mutex *lock, int state, - struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk) +static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww, + struct ww_acquire_ctx *ww_ctx) { - struct rt_mutex_waiter waiter; - unsigned long flags; - int ret = 0; +#ifdef CONFIG_DEBUG_MUTEXES + /* + * If this WARN_ON triggers, you used ww_mutex_lock to acquire, + * but released with a normal mutex_unlock in this call. + * + * This should never happen, always use ww_mutex_unlock. + */ + DEBUG_LOCKS_WARN_ON(ww->ctx); - rt_mutex_init_waiter(&waiter); + /* + * Not quite done after calling ww_acquire_done() ? + */ + DEBUG_LOCKS_WARN_ON(ww_ctx->done_acquire); + + if (ww_ctx->contending_lock) { + /* + * After -EDEADLK you tried to + * acquire a different ww_mutex? Bad! + */ + DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww); + + /* + * You called ww_mutex_lock after receiving -EDEADLK, + * but 'forgot' to unlock everything else first? + */ + DEBUG_LOCKS_WARN_ON(ww_ctx->acquired > 0); + ww_ctx->contending_lock = NULL; + } /* - * Technically we could use raw_spin_[un]lock_irq() here, but this can - * be called in early boot if the cmpxchg() fast path is disabled - * (debug, no architecture support). In this case we will acquire the - * rtmutex with lock->wait_lock held. But we cannot unconditionally - * enable interrupts in that early boot case. So we need to use the - * irqsave/restore variants. + * Naughty, using a different class will lead to undefined behavior! */ - raw_spin_lock_irqsave(&lock->wait_lock, flags); + DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class); +#endif + ww_ctx->acquired++; +} + +#ifdef CONFIG_PREEMPT_RT +static void ww_mutex_account_lock(struct rt_mutex *lock, + struct ww_acquire_ctx *ww_ctx) +{ + struct ww_mutex *ww = container_of(lock, struct ww_mutex, base.lock); + struct rt_mutex_waiter *waiter, *n; + + /* + * This branch gets optimized out for the common case, + * and is only important for ww_mutex_lock. + */ + ww_mutex_lock_acquired(ww, ww_ctx); + ww->ctx = ww_ctx; + + /* + * Give any possible sleeping processes the chance to wake up, + * so they can recheck if they have to back off. 
+ */ + rbtree_postorder_for_each_entry_safe(waiter, n, &lock->waiters.rb_root, + tree_entry) { + /* XXX debug rt mutex waiter wakeup */ + + BUG_ON(waiter->lock != lock); + rt_mutex_wake_waiter(waiter); + } +} + +#else + +static void ww_mutex_account_lock(struct rt_mutex *lock, + struct ww_acquire_ctx *ww_ctx) +{ + BUG(); +} +#endif + +int __sched rt_mutex_slowlock_locked(struct rt_mutex *lock, int state, + struct hrtimer_sleeper *timeout, + enum rtmutex_chainwalk chwalk, + struct ww_acquire_ctx *ww_ctx, + struct rt_mutex_waiter *waiter) +{ + int ret; + +#ifdef CONFIG_PREEMPT_RT + if (ww_ctx) { + struct ww_mutex *ww; + + ww = container_of(lock, struct ww_mutex, base.lock); + if (unlikely(ww_ctx == READ_ONCE(ww->ctx))) + return -EALREADY; + } +#endif /* Try to acquire the lock again: */ if (try_to_take_rt_mutex(lock, current, NULL)) { - raw_spin_unlock_irqrestore(&lock->wait_lock, flags); + if (ww_ctx) + ww_mutex_account_lock(lock, ww_ctx); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1741 @ rt_mutex_slowlock(struct rt_mutex *lock, if (unlikely(timeout)) hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS); - ret = task_blocks_on_rt_mutex(lock, &waiter, current, chwalk); + ret = task_blocks_on_rt_mutex(lock, waiter, current, chwalk); - if (likely(!ret)) + if (likely(!ret)) { /* sleep on the mutex */ - ret = __rt_mutex_slowlock(lock, state, timeout, &waiter); + ret = __rt_mutex_slowlock(lock, state, timeout, waiter, + ww_ctx); + } else if (ww_ctx) { + /* ww_mutex received EDEADLK, let it become EALREADY */ + ret = __mutex_lock_check_stamp(lock, ww_ctx); + BUG_ON(!ret); + } if (unlikely(ret)) { __set_current_state(TASK_RUNNING); - remove_waiter(lock, &waiter); - rt_mutex_handle_deadlock(ret, chwalk, &waiter); + remove_waiter(lock, waiter); + /* ww_mutex wants to report EDEADLK/EALREADY, let it */ + if (!ww_ctx) + rt_mutex_handle_deadlock(ret, chwalk, waiter); + } else if (ww_ctx) { + ww_mutex_account_lock(lock, ww_ctx); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1768 @ rt_mutex_slowlock(struct rt_mutex *lock, * unconditionally. We might have to fix that up. */ fixup_rt_mutex_waiters(lock); + return ret; +} + +/* + * Slow path lock function: + */ +static int __sched +rt_mutex_slowlock(struct rt_mutex *lock, int state, + struct hrtimer_sleeper *timeout, + enum rtmutex_chainwalk chwalk, + struct ww_acquire_ctx *ww_ctx) +{ + struct rt_mutex_waiter waiter; + unsigned long flags; + int ret = 0; + + rt_mutex_init_waiter(&waiter, false); + + /* + * Technically we could use raw_spin_[un]lock_irq() here, but this can + * be called in early boot if the cmpxchg() fast path is disabled + * (debug, no architecture support). In this case we will acquire the + * rtmutex with lock->wait_lock held. But we cannot unconditionally + * enable interrupts in that early boot case. So we need to use the + * irqsave/restore variants. + */ + raw_spin_lock_irqsave(&lock->wait_lock, flags); + + ret = rt_mutex_slowlock_locked(lock, state, timeout, chwalk, ww_ctx, + &waiter); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1858 @ static inline int rt_mutex_slowtrylock(s * Return whether the current task needs to call rt_mutex_postunlock(). 
*/ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, - struct wake_q_head *wake_q) + struct wake_q_head *wake_q, + struct wake_q_head *wake_sleeper_q) { unsigned long flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1913 @ static bool __sched rt_mutex_slowunlock( * * Queue the next waiter for wakeup once we release the wait_lock. */ - mark_wakeup_next_waiter(wake_q, lock); + mark_wakeup_next_waiter(wake_q, wake_sleeper_q, lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); return true; /* call rt_mutex_postunlock() */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1927 @ static bool __sched rt_mutex_slowunlock( */ static inline int rt_mutex_fastlock(struct rt_mutex *lock, int state, + struct ww_acquire_ctx *ww_ctx, int (*slowfn)(struct rt_mutex *lock, int state, struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk)) + enum rtmutex_chainwalk chwalk, + struct ww_acquire_ctx *ww_ctx)) { if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 0; - return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK); + /* + * If rt_mutex blocks, the function sched_submit_work will not call + * blk_schedule_flush_plug (because tsk_is_pi_blocked would be true). + * We must call blk_schedule_flush_plug here, if we don't call it, + * a deadlock in I/O may happen. + */ + if (unlikely(blk_needs_flush_plug(current))) + blk_schedule_flush_plug(current); + + return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK, ww_ctx); } static inline int rt_mutex_timed_fastlock(struct rt_mutex *lock, int state, struct hrtimer_sleeper *timeout, enum rtmutex_chainwalk chwalk, + struct ww_acquire_ctx *ww_ctx, int (*slowfn)(struct rt_mutex *lock, int state, struct hrtimer_sleeper *timeout, - enum rtmutex_chainwalk chwalk)) + enum rtmutex_chainwalk chwalk, + struct ww_acquire_ctx *ww_ctx)) { if (chwalk == RT_MUTEX_MIN_CHAINWALK && likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 0; - return slowfn(lock, state, timeout, chwalk); + if (unlikely(blk_needs_flush_plug(current))) + blk_schedule_flush_plug(current); + + return slowfn(lock, state, timeout, chwalk, ww_ctx); } static inline int @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1981 @ rt_mutex_fasttrylock(struct rt_mutex *lo /* * Performs the wakeup of the the top-waiter and re-enables preemption. 
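rt_mutex_fastlock()/rt_mutex_fastunlock() above touch only the owner word on the uncontended path: cmpxchg NULL -> current to lock and current -> NULL to unlock, falling back to the slow path otherwise. A userspace C11 rendering of that shape follows; the slow path here is just a polite retry loop, not the real wait-queue/PI machinery, and all names are invented:

#include <stdatomic.h>
#include <stddef.h>
#include <sched.h>

struct rtmutex_sk {
        _Atomic(void *) owner;          /* NULL when unlocked */
};

/* stand-in for the real enqueue/PI slow path: retry until the CAS succeeds */
static void slowlock_sk(struct rtmutex_sk *m, void *self)
{
        void *expected;

        do {
                sched_yield();
                expected = NULL;
        } while (!atomic_compare_exchange_weak_explicit(&m->owner, &expected,
                                                        self,
                                                        memory_order_acquire,
                                                        memory_order_relaxed));
}

void lock_sk(struct rtmutex_sk *m, void *self)
{
        void *expected = NULL;

        /* acquire: the critical section sees the previous owner's writes */
        if (atomic_compare_exchange_strong_explicit(&m->owner, &expected, self,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
                return;                 /* uncontended fast path */
        slowlock_sk(m, self);
}

void unlock_sk(struct rtmutex_sk *m, void *self)
{
        void *expected = self;

        /* release: publish the critical section to the next owner */
        if (atomic_compare_exchange_strong_explicit(&m->owner, &expected, NULL,
                                                    memory_order_release,
                                                    memory_order_relaxed))
                return;                 /* nobody recorded as waiting */
        /*
         * The kernel gets here when the owner word also carries a waiter bit
         * and then takes wait_lock and wakes the top waiter; this sketch has
         * no waiter bit, so simply drop ownership.
         */
        atomic_store_explicit(&m->owner, NULL, memory_order_release);
}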
*/ -void rt_mutex_postunlock(struct wake_q_head *wake_q) +void rt_mutex_postunlock(struct wake_q_head *wake_q, + struct wake_q_head *wake_sleeper_q) { wake_up_q(wake_q); + wake_up_q_sleeper(wake_sleeper_q); /* Pairs with preempt_disable() in rt_mutex_slowunlock() */ preempt_enable(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1994 @ void rt_mutex_postunlock(struct wake_q_h static inline void rt_mutex_fastunlock(struct rt_mutex *lock, bool (*slowfn)(struct rt_mutex *lock, - struct wake_q_head *wqh)) + struct wake_q_head *wqh, + struct wake_q_head *wq_sleeper)) { DEFINE_WAKE_Q(wake_q); + DEFINE_WAKE_Q(wake_sleeper_q); if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) return; - if (slowfn(lock, &wake_q)) - rt_mutex_postunlock(&wake_q); + if (slowfn(lock, &wake_q, &wake_sleeper_q)) + rt_mutex_postunlock(&wake_q, &wake_sleeper_q); } -static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass) +int __sched __rt_mutex_lock_state(struct rt_mutex *lock, int state) { might_sleep(); + return rt_mutex_fastlock(lock, state, NULL, rt_mutex_slowlock); +} + +/** + * rt_mutex_lock_state - lock a rt_mutex with a given state + * + * @lock: The rt_mutex to be locked + * @state: The state to set when blocking on the rt_mutex + */ +static inline int __sched rt_mutex_lock_state(struct rt_mutex *lock, + unsigned int subclass, int state) +{ + int ret; mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_); - rt_mutex_fastlock(lock, TASK_UNINTERRUPTIBLE, rt_mutex_slowlock); + ret = __rt_mutex_lock_state(lock, state); + if (ret) + mutex_release(&lock->dep_map, 1, _RET_IP_); + return ret; +} + +static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass) +{ + rt_mutex_lock_state(lock, subclass, TASK_UNINTERRUPTIBLE); } #ifdef CONFIG_DEBUG_LOCK_ALLOC @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2074 @ EXPORT_SYMBOL_GPL(rt_mutex_lock); */ int __sched rt_mutex_lock_interruptible(struct rt_mutex *lock) { - int ret; - - might_sleep(); - - mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); - ret = rt_mutex_fastlock(lock, TASK_INTERRUPTIBLE, rt_mutex_slowlock); - if (ret) - mutex_release(&lock->dep_map, 1, _RET_IP_); - - return ret; + return rt_mutex_lock_state(lock, 0, TASK_INTERRUPTIBLE); } EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2092 @ int __sched __rt_mutex_futex_trylock(str } /** + * rt_mutex_lock_killable - lock a rt_mutex killable + * + * @lock: the rt_mutex to be locked + * @detect_deadlock: deadlock detection on/off + * + * Returns: + * 0 on success + * -EINTR when interrupted by a signal + */ +int __sched rt_mutex_lock_killable(struct rt_mutex *lock) +{ + return rt_mutex_lock_state(lock, 0, TASK_KILLABLE); +} +EXPORT_SYMBOL_GPL(rt_mutex_lock_killable); + +/** * rt_mutex_timed_lock - lock a rt_mutex interruptible * the timeout structure is provided * by the caller @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2130 @ rt_mutex_timed_lock(struct rt_mutex *loc mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_); ret = rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout, RT_MUTEX_MIN_CHAINWALK, + NULL, rt_mutex_slowlock); if (ret) mutex_release(&lock->dep_map, 1, _RET_IP_); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2139 @ rt_mutex_timed_lock(struct rt_mutex *loc } 
EXPORT_SYMBOL_GPL(rt_mutex_timed_lock); +int __sched __rt_mutex_trylock(struct rt_mutex *lock) +{ +#ifdef CONFIG_PREEMPT_RT + if (WARN_ON_ONCE(in_irq() || in_nmi())) +#else + if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq())) +#endif + return 0; + + return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock); +} + /** * rt_mutex_trylock - try to lock a rt_mutex * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2166 @ int __sched rt_mutex_trylock(struct rt_m { int ret; - if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq())) - return 0; - - ret = rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock); + ret = __rt_mutex_trylock(lock); if (ret) mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2174 @ int __sched rt_mutex_trylock(struct rt_m } EXPORT_SYMBOL_GPL(rt_mutex_trylock); +void __sched __rt_mutex_unlock(struct rt_mutex *lock) +{ + rt_mutex_fastunlock(lock, rt_mutex_slowunlock); +} + /** * rt_mutex_unlock - unlock a rt_mutex * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2187 @ EXPORT_SYMBOL_GPL(rt_mutex_trylock); void __sched rt_mutex_unlock(struct rt_mutex *lock) { mutex_release(&lock->dep_map, 1, _RET_IP_); - rt_mutex_fastunlock(lock, rt_mutex_slowunlock); + __rt_mutex_unlock(lock); } EXPORT_SYMBOL_GPL(rt_mutex_unlock); -/** - * Futex variant, that since futex variants do not use the fast-path, can be - * simple and will not need to retry. - */ -bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock, - struct wake_q_head *wake_q) +static bool __sched __rt_mutex_unlock_common(struct rt_mutex *lock, + struct wake_q_head *wake_q, + struct wake_q_head *wq_sleeper) { lockdep_assert_held(&lock->wait_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2210 @ bool __sched __rt_mutex_futex_unlock(str * avoid inversion prior to the wakeup. preempt_disable() * therein pairs with rt_mutex_postunlock(). */ - mark_wakeup_next_waiter(wake_q, lock); + mark_wakeup_next_waiter(wake_q, wq_sleeper, lock); return true; /* call postunlock() */ } +/** + * Futex variant, that since futex variants do not use the fast-path, can be + * simple and will not need to retry. 
+ */ +bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wake_q, + struct wake_q_head *wq_sleeper) +{ + return __rt_mutex_unlock_common(lock, wake_q, wq_sleeper); +} + void __sched rt_mutex_futex_unlock(struct rt_mutex *lock) { DEFINE_WAKE_Q(wake_q); + DEFINE_WAKE_Q(wake_sleeper_q); unsigned long flags; bool postunlock; raw_spin_lock_irqsave(&lock->wait_lock, flags); - postunlock = __rt_mutex_futex_unlock(lock, &wake_q); + postunlock = __rt_mutex_futex_unlock(lock, &wake_q, &wake_sleeper_q); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); if (postunlock) - rt_mutex_postunlock(&wake_q); + rt_mutex_postunlock(&wake_q, &wake_sleeper_q); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2277 @ void __rt_mutex_init(struct rt_mutex *lo if (name && key) debug_rt_mutex_init(lock, name, key); } -EXPORT_SYMBOL_GPL(__rt_mutex_init); +EXPORT_SYMBOL(__rt_mutex_init); /** * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2297 @ void rt_mutex_init_proxy_locked(struct r struct task_struct *proxy_owner) { __rt_mutex_init(lock, NULL, NULL); +#ifdef CONFIG_DEBUG_SPINLOCK + /* + * get another key class for the wait_lock. LOCK_PI and UNLOCK_PI is + * holding the ->wait_lock of the proxy_lock while unlocking a sleeping + * lock. + */ + raw_spin_lock_init(&lock->wait_lock); +#endif debug_rt_mutex_proxy_lock(lock, proxy_owner); rt_mutex_set_owner(lock, proxy_owner); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2328 @ void rt_mutex_proxy_unlock(struct rt_mut rt_mutex_set_owner(lock, NULL); } +static void fixup_rt_mutex_blocked(struct rt_mutex *lock) +{ + struct task_struct *tsk = current; + /* + * RT has a problem here when the wait got interrupted by a timeout + * or a signal. task->pi_blocked_on is still set. The task must + * acquire the hash bucket lock when returning from this function. + * + * If the hash bucket lock is contended then the + * BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in + * task_blocks_on_rt_mutex() will trigger. This can be avoided by + * clearing task->pi_blocked_on which removes the task from the + * boosting chain of the rtmutex. That's correct because the task + * is not longer blocked on it. + */ + raw_spin_lock(&tsk->pi_lock); + tsk->pi_blocked_on = NULL; + raw_spin_unlock(&tsk->pi_lock); +} + /** * __rt_mutex_start_proxy_lock() - Start lock acquisition for another task * @lock: the rt_mutex to take @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2378 @ int __rt_mutex_start_proxy_lock(struct r if (try_to_take_rt_mutex(lock, task, NULL)) return 1; +#ifdef CONFIG_PREEMPT_RT + /* + * In PREEMPT_RT there's an added race. + * If the task, that we are about to requeue, times out, + * it can set the PI_WAKEUP_INPROGRESS. This tells the requeue + * to skip this task. But right after the task sets + * its pi_blocked_on to PI_WAKEUP_INPROGRESS it can then + * block on the spin_lock(&hb->lock), which in RT is an rtmutex. + * This will replace the PI_WAKEUP_INPROGRESS with the actual + * lock that it blocks on. We *must not* place this task + * on this proxy lock in that case. + * + * To prevent this race, we first take the task's pi_lock + * and check if it has updated its pi_blocked_on. If it has, + * we assume that it woke up and we return -EAGAIN. 
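The PI_WAKEUP_INPROGRESS / PI_REQUEUE_INPROGRESS values referenced above (defined in rtmutex_common.h further down) are small casted constants stored in task->pi_blocked_on to mark transient futex-requeue states, and rt_mutex_real_waiter() earlier filters them out before the field is used as a pointer. The pattern in isolation, with invented names:

#include <stdbool.h>
#include <stddef.h>

struct pi_waiter_sk;

#define WAKEUP_INPROGRESS_SK    ((struct pi_waiter_sk *)1)
#define REQUEUE_INPROGRESS_SK   ((struct pi_waiter_sk *)2)

/* true only for a pointer that may actually be dereferenced */
bool real_waiter_sk(const struct pi_waiter_sk *w)
{
        return w && w != WAKEUP_INPROGRESS_SK && w != REQUEUE_INPROGRESS_SK;
}

Storing 1 or 2 in a pointer field only works because no real waiter can live at those addresses, which is why every consumer of pi_blocked_on in the patch is routed through the filter before dereferencing or chain-walking.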
+ * Otherwise, we set the task's pi_blocked_on to + * PI_REQUEUE_INPROGRESS, so that if the task is waking up + * it will know that we are in the process of requeuing it. + */ + raw_spin_lock(&task->pi_lock); + if (task->pi_blocked_on) { + raw_spin_unlock(&task->pi_lock); + return -EAGAIN; + } + task->pi_blocked_on = PI_REQUEUE_INPROGRESS; + raw_spin_unlock(&task->pi_lock); +#endif + /* We enforce deadlock detection for futexes */ ret = task_blocks_on_rt_mutex(lock, waiter, task, RT_MUTEX_FULL_CHAINWALK); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2420 @ int __rt_mutex_start_proxy_lock(struct r ret = 0; } + if (ret) + fixup_rt_mutex_blocked(lock); + debug_rt_mutex_print_deadlock(waiter); return ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2508 @ int rt_mutex_wait_proxy_lock(struct rt_m raw_spin_lock_irq(&lock->wait_lock); /* sleep on the mutex */ set_current_state(TASK_INTERRUPTIBLE); - ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter); + ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter, NULL); /* * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might * have to fix that up. */ fixup_rt_mutex_waiters(lock); + if (ret) + fixup_rt_mutex_blocked(lock); + raw_spin_unlock_irq(&lock->wait_lock); return ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2578 @ bool rt_mutex_cleanup_proxy_lock(struct return cleanup; } + +static inline int +ww_mutex_deadlock_injection(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) +{ +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH + unsigned tmp; + + if (ctx->deadlock_inject_countdown-- == 0) { + tmp = ctx->deadlock_inject_interval; + if (tmp > UINT_MAX/4) + tmp = UINT_MAX; + else + tmp = tmp*2 + tmp + tmp/2; + + ctx->deadlock_inject_interval = tmp; + ctx->deadlock_inject_countdown = tmp; + ctx->contending_lock = lock; + + ww_mutex_unlock(lock); + + return -EDEADLK; + } +#endif + + return 0; +} + +#ifdef CONFIG_PREEMPT_RT +int __sched +ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) +{ + int ret; + + might_sleep(); + + mutex_acquire_nest(&lock->base.dep_map, 0, 0, + ctx ? &ctx->dep_map : NULL, _RET_IP_); + ret = rt_mutex_slowlock(&lock->base.lock, TASK_INTERRUPTIBLE, NULL, 0, + ctx); + if (ret) + mutex_release(&lock->base.dep_map, 1, _RET_IP_); + else if (!ret && ctx && ctx->acquired > 1) + return ww_mutex_deadlock_injection(lock, ctx); + + return ret; +} +EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible); + +int __sched +ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx) +{ + int ret; + + might_sleep(); + + mutex_acquire_nest(&lock->base.dep_map, 0, 0, + ctx ? 
&ctx->dep_map : NULL, _RET_IP_);
+	ret = rt_mutex_slowlock(&lock->base.lock, TASK_UNINTERRUPTIBLE, NULL, 0,
+				ctx);
+	if (ret)
+		mutex_release(&lock->base.dep_map, 1, _RET_IP_);
+	else if (!ret && ctx && ctx->acquired > 1)
+		return ww_mutex_deadlock_injection(lock, ctx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ww_mutex_lock);
+
+void __sched ww_mutex_unlock(struct ww_mutex *lock)
+{
+	int nest = !!lock->ctx;
+
+	/*
+	 * The unlocking fastpath is the 0->1 transition from 'locked'
+	 * into 'unlocked' state:
+	 */
+	if (nest) {
+#ifdef CONFIG_DEBUG_MUTEXES
+		DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
+#endif
+		if (lock->ctx->acquired > 0)
+			lock->ctx->acquired--;
+		lock->ctx = NULL;
+	}
+
+	mutex_release(&lock->base.dep_map, nest, _RET_IP_);
+	__rt_mutex_unlock(&lock->base.lock);
+}
+EXPORT_SYMBOL(ww_mutex_unlock);
+
+int __rt_mutex_owner_current(struct rt_mutex *lock)
+{
+	return rt_mutex_owner(lock) == current;
+}
+EXPORT_SYMBOL(__rt_mutex_owner_current);
+#endif
Index: linux-5.4.5-rt3/kernel/locking/rtmutex_common.h
===================================================================
--- linux-5.4.5-rt3.orig/kernel/locking/rtmutex_common.h
+++ linux-5.4.5-rt3/kernel/locking/rtmutex_common.h
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:18 @
 #include <linux/rtmutex.h>
 #include <linux/sched/wake_q.h>
+#include <linux/sched/debug.h>
 
 /*
  * This is the control structure for tasks blocked on a rt_mutex,
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ struct rt_mutex_waiter {
 	struct rb_node		pi_tree_entry;
 	struct task_struct	*task;
 	struct rt_mutex		*lock;
+	bool			savestate;
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 	unsigned long		ip;
 	struct pid		*deadlock_task_pid;
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:135 @ enum rtmutex_chainwalk {
 /*
  * PI-futex support (proxy locking functions, etc.):
  */
+#define PI_WAKEUP_INPROGRESS	((struct rt_mutex_waiter *) 1)
+#define PI_REQUEUE_INPROGRESS	((struct rt_mutex_waiter *) 2)
+
 extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock);
 extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 				       struct task_struct *proxy_owner);
 extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 				  struct task_struct *proxy_owner);
-extern void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
+extern void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter, bool savestate);
 extern int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:161 @ extern int __rt_mutex_futex_trylock(stru
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
 extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,
-				 struct wake_q_head *wqh);
+				 struct wake_q_head *wqh,
+				 struct wake_q_head *wq_sleeper);
+
+extern void rt_mutex_postunlock(struct wake_q_head *wake_q,
+				struct wake_q_head *wake_sleeper_q);
+
+/* RW semaphore special interface */
+struct ww_acquire_ctx;
 
-extern void rt_mutex_postunlock(struct wake_q_head *wake_q);
+extern int __rt_mutex_lock_state(struct rt_mutex *lock, int state);
+extern int __rt_mutex_trylock(struct rt_mutex *lock);
+extern void __rt_mutex_unlock(struct rt_mutex *lock);
+int __sched rt_mutex_slowlock_locked(struct rt_mutex *lock, int state,
+				     struct hrtimer_sleeper *timeout,
+				     enum rtmutex_chainwalk chwalk,
+				     struct ww_acquire_ctx *ww_ctx,
+				     struct 
rt_mutex_waiter *waiter); +void __sched rt_spin_lock_slowlock_locked(struct rt_mutex *lock, + struct rt_mutex_waiter *waiter, + unsigned long flags); +void __sched rt_spin_lock_slowunlock(struct rt_mutex *lock); #ifdef CONFIG_DEBUG_RT_MUTEXES # include "rtmutex-debug.h" Index: linux-5.4.5-rt3/kernel/locking/rwlock-rt.c =================================================================== --- /dev/null +++ linux-5.4.5-rt3/kernel/locking/rwlock-rt.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +/* + */ +#include <linux/sched/debug.h> +#include <linux/export.h> + +#include "rtmutex_common.h" +#include <linux/rwlock_types_rt.h> + +/* + * RT-specific reader/writer locks + * + * write_lock() + * 1) Lock lock->rtmutex + * 2) Remove the reader BIAS to force readers into the slow path + * 3) Wait until all readers have left the critical region + * 4) Mark it write locked + * + * write_unlock() + * 1) Remove the write locked marker + * 2) Set the reader BIAS so readers can use the fast path again + * 3) Unlock lock->rtmutex to release blocked readers + * + * read_lock() + * 1) Try fast path acquisition (reader BIAS is set) + * 2) Take lock->rtmutex.wait_lock which protects the writelocked flag + * 3) If !writelocked, acquire it for read + * 4) If writelocked, block on lock->rtmutex + * 5) unlock lock->rtmutex, goto 1) + * + * read_unlock() + * 1) Try fast path release (reader count != 1) + * 2) Wake the writer waiting in write_lock()#3 + * + * read_lock()#3 has the consequence, that rw locks on RT are not writer + * fair, but writers, which should be avoided in RT tasks (think tasklist + * lock), are subject to the rtmutex priority/DL inheritance mechanism. + * + * It's possible to make the rw locks writer fair by keeping a list of + * active readers. A blocked writer would force all newly incoming readers + * to block on the rtmutex, but the rtmutex would have to be proxy locked + * for one reader after the other. We can't use multi-reader inheritance + * because there is no way to support that with + * SCHED_DEADLINE. Implementing the one by one reader boosting/handover + * mechanism is a major surgery for a very dubious value. + * + * The risk of writer starvation is there, but the pathological use cases + * which trigger it are not necessarily the typical RT workloads. + */ + +void __rwlock_biased_rt_init(struct rt_rw_lock *lock, const char *name, + struct lock_class_key *key) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + /* + * Make sure we are not reinitializing a held semaphore: + */ + debug_check_no_locks_freed((void *)lock, sizeof(*lock)); + lockdep_init_map(&lock->dep_map, name, key, 0); +#endif + atomic_set(&lock->readers, READER_BIAS); + rt_mutex_init(&lock->rtmutex); + lock->rtmutex.save_state = 1; +} + +int __read_rt_trylock(struct rt_rw_lock *lock) +{ + int r, old; + + /* + * Increment reader count, if lock->readers < 0, i.e. READER_BIAS is + * set. + */ + for (r = atomic_read(&lock->readers); r < 0;) { + old = atomic_cmpxchg(&lock->readers, r, r + 1); + if (likely(old == r)) + return 1; + r = old; + } + return 0; +} + +void __sched __read_rt_lock(struct rt_rw_lock *lock) +{ + struct rt_mutex *m = &lock->rtmutex; + struct rt_mutex_waiter waiter; + unsigned long flags; + + if (__read_rt_trylock(lock)) + return; + + raw_spin_lock_irqsave(&m->wait_lock, flags); + /* + * Allow readers as long as the writer has not completely + * acquired the semaphore for write. 
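As an illustration of the reader fast path used by __read_rt_trylock() above, here is a minimal user-space sketch, assuming only C11 atomics; the constant value and all names are invented, and none of the RT/PI machinery is modelled (illustrative only, not part of the patch):

/* Illustrative sketch only. While the biased counter is negative (no
 * writer has removed the bias), readers claim a slot with a CAS loop;
 * once a writer removed the bias the fast path fails.
 */
#include <stdatomic.h>
#include <stdio.h>

#define MODEL_READER_BIAS (-1000000)	/* invented stand-in for READER_BIAS */

static atomic_int readers = MODEL_READER_BIAS;

static int model_read_trylock(void)
{
	int r = atomic_load(&readers);

	while (r < 0) {
		/* claim one reader slot unless a writer appeared meanwhile */
		if (atomic_compare_exchange_weak(&readers, &r, r + 1))
			return 1;
		/* r was reloaded by the failed CAS; retry */
	}
	return 0;	/* bias gone: a writer is draining, slow path needed */
}

static void model_read_unlock(void)
{
	atomic_fetch_sub(&readers, 1);
}

int main(void)
{
	printf("trylock: %d\n", model_read_trylock());	/* 1: fast path */
	model_read_unlock();

	/* a "writer" removing the bias makes the fast path fail */
	atomic_fetch_sub(&readers, MODEL_READER_BIAS);
	printf("trylock: %d\n", model_read_trylock());	/* 0: slow path */
	return 0;
}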
+	 */
+	if (atomic_read(&lock->readers) != WRITER_BIAS) {
+		atomic_inc(&lock->readers);
+		raw_spin_unlock_irqrestore(&m->wait_lock, flags);
+		return;
+	}
+
+	/*
+	 * Call into the slow lock path with the rtmutex->wait_lock
+	 * held, so this can't result in the following race:
+	 *
+	 * Reader1		Reader2		Writer
+	 *			read_lock()
+	 *					write_lock()
+	 *					rtmutex_lock(m)
+	 *					swait()
+	 * read_lock()
+	 * unlock(m->wait_lock)
+	 *			read_unlock()
+	 *			swake()
+	 *					lock(m->wait_lock)
+	 *					lock->writelocked=true
+	 *					unlock(m->wait_lock)
+	 *
+	 *					write_unlock()
+	 *					lock->writelocked=false
+	 *					rtmutex_unlock(m)
+	 *			read_lock()
+	 *					write_lock()
+	 *					rtmutex_lock(m)
+	 *					swait()
+	 * rtmutex_lock(m)
+	 *
+	 * That would put Reader1 behind the writer waiting on
+	 * Reader2 to call read_unlock() which might be unbound.
+	 */
+	rt_mutex_init_waiter(&waiter, true);
+	rt_spin_lock_slowlock_locked(m, &waiter, flags);
+	/*
+	 * The slowlock() above is guaranteed to return with the rtmutex
+	 * now held, so there can't be a writer active. Increment the reader
+	 * count and immediately drop the rtmutex again.
+	 */
+	atomic_inc(&lock->readers);
+	raw_spin_unlock_irqrestore(&m->wait_lock, flags);
+	rt_spin_lock_slowunlock(m);
+
+	debug_rt_mutex_free_waiter(&waiter);
+}
+
+void __read_rt_unlock(struct rt_rw_lock *lock)
+{
+	struct rt_mutex *m = &lock->rtmutex;
+	struct task_struct *tsk;
+
+	/*
+	 * lock->readers can only hit 0 when a writer is waiting for the
+	 * active readers to leave the critical region.
+	 */
+	if (!atomic_dec_and_test(&lock->readers))
+		return;
+
+	raw_spin_lock_irq(&m->wait_lock);
+	/*
+	 * Wake the writer, i.e. the rtmutex owner. It might release the
+	 * rtmutex concurrently in the fast path, but to clean up the rw
+	 * lock it needs to acquire m->wait_lock. The worst case which can
+	 * happen is a spurious wakeup.
+	 */
+	tsk = rt_mutex_owner(m);
+	if (tsk)
+		wake_up_process(tsk);
+
+	raw_spin_unlock_irq(&m->wait_lock);
+}
+
+static void __write_unlock_common(struct rt_rw_lock *lock, int bias,
+				  unsigned long flags)
+{
+	struct rt_mutex *m = &lock->rtmutex;
+
+	atomic_add(READER_BIAS - bias, &lock->readers);
+	raw_spin_unlock_irqrestore(&m->wait_lock, flags);
+	rt_spin_lock_slowunlock(m);
+}
+
+void __sched __write_rt_lock(struct rt_rw_lock *lock)
+{
+	struct rt_mutex *m = &lock->rtmutex;
+	struct task_struct *self = current;
+	unsigned long flags;
+
+	/* Take the rtmutex as a first step */
+	__rt_spin_lock(m);
+
+	/* Force readers into slow path */
+	atomic_sub(READER_BIAS, &lock->readers);
+
+	raw_spin_lock_irqsave(&m->wait_lock, flags);
+
+	raw_spin_lock(&self->pi_lock);
+	self->saved_state = self->state;
+	__set_current_state_no_track(TASK_UNINTERRUPTIBLE);
+	raw_spin_unlock(&self->pi_lock);
+
+	for (;;) {
+		/* Have all readers left the critical region? 
*/ + if (!atomic_read(&lock->readers)) { + atomic_set(&lock->readers, WRITER_BIAS); + raw_spin_lock(&self->pi_lock); + __set_current_state_no_track(self->saved_state); + self->saved_state = TASK_RUNNING; + raw_spin_unlock(&self->pi_lock); + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + return; + } + + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + + if (atomic_read(&lock->readers) != 0) + schedule(); + + raw_spin_lock_irqsave(&m->wait_lock, flags); + + raw_spin_lock(&self->pi_lock); + __set_current_state_no_track(TASK_UNINTERRUPTIBLE); + raw_spin_unlock(&self->pi_lock); + } +} + +int __write_rt_trylock(struct rt_rw_lock *lock) +{ + struct rt_mutex *m = &lock->rtmutex; + unsigned long flags; + + if (!__rt_mutex_trylock(m)) + return 0; + + atomic_sub(READER_BIAS, &lock->readers); + + raw_spin_lock_irqsave(&m->wait_lock, flags); + if (!atomic_read(&lock->readers)) { + atomic_set(&lock->readers, WRITER_BIAS); + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + return 1; + } + __write_unlock_common(lock, 0, flags); + return 0; +} + +void __write_rt_unlock(struct rt_rw_lock *lock) +{ + struct rt_mutex *m = &lock->rtmutex; + unsigned long flags; + + raw_spin_lock_irqsave(&m->wait_lock, flags); + __write_unlock_common(lock, WRITER_BIAS, flags); +} + +/* Map the reader biased implementation */ +static inline int do_read_rt_trylock(rwlock_t *rwlock) +{ + return __read_rt_trylock(rwlock); +} + +static inline int do_write_rt_trylock(rwlock_t *rwlock) +{ + return __write_rt_trylock(rwlock); +} + +static inline void do_read_rt_lock(rwlock_t *rwlock) +{ + __read_rt_lock(rwlock); +} + +static inline void do_write_rt_lock(rwlock_t *rwlock) +{ + __write_rt_lock(rwlock); +} + +static inline void do_read_rt_unlock(rwlock_t *rwlock) +{ + __read_rt_unlock(rwlock); +} + +static inline void do_write_rt_unlock(rwlock_t *rwlock) +{ + __write_rt_unlock(rwlock); +} + +static inline void do_rwlock_rt_init(rwlock_t *rwlock, const char *name, + struct lock_class_key *key) +{ + __rwlock_biased_rt_init(rwlock, name, key); +} + +int __lockfunc rt_read_can_lock(rwlock_t *rwlock) +{ + return atomic_read(&rwlock->readers) < 0; +} + +int __lockfunc rt_write_can_lock(rwlock_t *rwlock) +{ + return atomic_read(&rwlock->readers) == READER_BIAS; +} + +/* + * The common functions which get wrapped into the rwlock API. 
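The writer side sketched below mirrors the drain loop of __write_rt_lock() in a stand-alone, user-space model; it is illustrative only and not part of the patch. All names and constants are invented, a pthread mutex stands in for the rtmutex, and sched_yield() stands in for schedule().

/* Illustrative sketch only -- build with -pthread. Writer: take the
 * internal mutex, remove READER_BIAS so new readers take the slow path,
 * wait for the active-reader count to drain to zero, then mark the lock
 * write locked. Unlock restores the bias and releases the mutex.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MODEL_READER_BIAS (-1000000)
#define MODEL_WRITER_BIAS 0x40000000

struct model_rwlock {
	pthread_mutex_t m;	/* stand-in for lock->rtmutex */
	atomic_int readers;
};

static void model_write_lock(struct model_rwlock *l)
{
	pthread_mutex_lock(&l->m);
	/* force readers into the slow path */
	atomic_fetch_sub(&l->readers, MODEL_READER_BIAS);
	/* drain: the kernel sleeps here (schedule()); the model just yields */
	while (atomic_load(&l->readers) != 0)
		sched_yield();
	atomic_store(&l->readers, MODEL_WRITER_BIAS);
}

static void model_write_unlock(struct model_rwlock *l)
{
	/* restore the reader bias, then release blocked readers */
	atomic_fetch_add(&l->readers, MODEL_READER_BIAS - MODEL_WRITER_BIAS);
	pthread_mutex_unlock(&l->m);
}

int main(void)
{
	static struct model_rwlock l = {
		.m = PTHREAD_MUTEX_INITIALIZER,
		.readers = MODEL_READER_BIAS,
	};

	model_write_lock(&l);
	printf("write locked, counter = %d\n", atomic_load(&l.readers));
	model_write_unlock(&l);
	printf("unlocked, bias restored = %d\n", atomic_load(&l.readers));
	return 0;
}

The kernel code additionally saves and restores the task state around schedule() and holds wait_lock while testing the counter; the model elides both.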
+ */ +int __lockfunc rt_read_trylock(rwlock_t *rwlock) +{ + int ret; + + sleeping_lock_inc(); + migrate_disable(); + ret = do_read_rt_trylock(rwlock); + if (ret) { + rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_); + rcu_read_lock(); + } else { + migrate_enable(); + sleeping_lock_dec(); + } + return ret; +} +EXPORT_SYMBOL(rt_read_trylock); + +int __lockfunc rt_write_trylock(rwlock_t *rwlock) +{ + int ret; + + sleeping_lock_inc(); + migrate_disable(); + ret = do_write_rt_trylock(rwlock); + if (ret) { + rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_); + rcu_read_lock(); + } else { + migrate_enable(); + sleeping_lock_dec(); + } + return ret; +} +EXPORT_SYMBOL(rt_write_trylock); + +void __lockfunc rt_read_lock(rwlock_t *rwlock) +{ + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_); + do_read_rt_lock(rwlock); +} +EXPORT_SYMBOL(rt_read_lock); + +void __lockfunc rt_write_lock(rwlock_t *rwlock) +{ + sleeping_lock_inc(); + rcu_read_lock(); + migrate_disable(); + rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_); + do_write_rt_lock(rwlock); +} +EXPORT_SYMBOL(rt_write_lock); + +void __lockfunc rt_read_unlock(rwlock_t *rwlock) +{ + rwlock_release(&rwlock->dep_map, 1, _RET_IP_); + do_read_rt_unlock(rwlock); + migrate_enable(); + rcu_read_unlock(); + sleeping_lock_dec(); +} +EXPORT_SYMBOL(rt_read_unlock); + +void __lockfunc rt_write_unlock(rwlock_t *rwlock) +{ + rwlock_release(&rwlock->dep_map, 1, _RET_IP_); + do_write_rt_unlock(rwlock); + migrate_enable(); + rcu_read_unlock(); + sleeping_lock_dec(); +} +EXPORT_SYMBOL(rt_write_unlock); + +void __rt_rwlock_init(rwlock_t *rwlock, char *name, struct lock_class_key *key) +{ + do_rwlock_rt_init(rwlock, name, key); +} +EXPORT_SYMBOL(__rt_rwlock_init); Index: linux-5.4.5-rt3/kernel/locking/rwsem-rt.c =================================================================== --- /dev/null +++ linux-5.4.5-rt3/kernel/locking/rwsem-rt.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +/* + */ +#include <linux/blkdev.h> +#include <linux/rwsem.h> +#include <linux/sched/debug.h> +#include <linux/sched/signal.h> +#include <linux/export.h> + +#include "rtmutex_common.h" + +/* + * RT-specific reader/writer semaphores + * + * down_write() + * 1) Lock sem->rtmutex + * 2) Remove the reader BIAS to force readers into the slow path + * 3) Wait until all readers have left the critical region + * 4) Mark it write locked + * + * up_write() + * 1) Remove the write locked marker + * 2) Set the reader BIAS so readers can use the fast path again + * 3) Unlock sem->rtmutex to release blocked readers + * + * down_read() + * 1) Try fast path acquisition (reader BIAS is set) + * 2) Take sem->rtmutex.wait_lock which protects the writelocked flag + * 3) If !writelocked, acquire it for read + * 4) If writelocked, block on sem->rtmutex + * 5) unlock sem->rtmutex, goto 1) + * + * up_read() + * 1) Try fast path release (reader count != 1) + * 2) Wake the writer waiting in down_write()#3 + * + * down_read()#3 has the consequence, that rw semaphores on RT are not writer + * fair, but writers, which should be avoided in RT tasks (think mmap_sem), + * are subject to the rtmutex priority/DL inheritance mechanism. + * + * It's possible to make the rw semaphores writer fair by keeping a list of + * active readers. A blocked writer would force all newly incoming readers to + * block on the rtmutex, but the rtmutex would have to be proxy locked for one + * reader after the other. 
We can't use multi-reader inheritance because there
+ * is no way to support that with SCHED_DEADLINE. Implementing the one by one
+ * reader boosting/handover mechanism is a major surgery for a very dubious
+ * value.
+ *
+ * The risk of writer starvation is there, but the pathological use cases
+ * which trigger it are not necessarily the typical RT workloads.
+ */
+
+void __rwsem_init(struct rw_semaphore *sem, const char *name,
+		  struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	/*
+	 * Make sure we are not reinitializing a held semaphore:
+	 */
+	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
+	lockdep_init_map(&sem->dep_map, name, key, 0);
+#endif
+	atomic_set(&sem->readers, READER_BIAS);
+}
+EXPORT_SYMBOL(__rwsem_init);
+
+int __down_read_trylock(struct rw_semaphore *sem)
+{
+	int r, old;
+
+	/*
+	 * Increment reader count if sem->readers < 0, i.e. READER_BIAS is
+	 * set.
+	 */
+	for (r = atomic_read(&sem->readers); r < 0;) {
+		old = atomic_cmpxchg(&sem->readers, r, r + 1);
+		if (likely(old == r))
+			return 1;
+		r = old;
+	}
+	return 0;
+}
+
+static int __sched __down_read_common(struct rw_semaphore *sem, int state)
+{
+	struct rt_mutex *m = &sem->rtmutex;
+	struct rt_mutex_waiter waiter;
+	int ret;
+
+	if (__down_read_trylock(sem))
+		return 0;
+	/*
+	 * If rt_mutex blocks, the function sched_submit_work will not call
+	 * blk_schedule_flush_plug (because tsk_is_pi_blocked would be true).
+	 * We must call blk_schedule_flush_plug here; if we don't, a
+	 * deadlock in I/O may happen.
+	 */
+	if (unlikely(blk_needs_flush_plug(current)))
+		blk_schedule_flush_plug(current);
+
+	might_sleep();
+	raw_spin_lock_irq(&m->wait_lock);
+	/*
+	 * Allow readers as long as the writer has not completely
+	 * acquired the semaphore for write.
+	 */
+	if (atomic_read(&sem->readers) != WRITER_BIAS) {
+		atomic_inc(&sem->readers);
+		raw_spin_unlock_irq(&m->wait_lock);
+		return 0;
+	}
+
+	/*
+	 * Call into the slow lock path with the rtmutex->wait_lock
+	 * held, so this can't result in the following race:
+	 *
+	 * Reader1		Reader2		Writer
+	 *			down_read()
+	 *					down_write()
+	 *					rtmutex_lock(m)
+	 *					swait()
+	 * down_read()
+	 * unlock(m->wait_lock)
+	 *			up_read()
+	 *			swake()
+	 *					lock(m->wait_lock)
+	 *					sem->writelocked=true
+	 *					unlock(m->wait_lock)
+	 *
+	 *					up_write()
+	 *					sem->writelocked=false
+	 *					rtmutex_unlock(m)
+	 *			down_read()
+	 *					down_write()
+	 *					rtmutex_lock(m)
+	 *					swait()
+	 * rtmutex_lock(m)
+	 *
+	 * That would put Reader1 behind the writer waiting on
+	 * Reader2 to call up_read() which might be unbound.
+	 */
+	rt_mutex_init_waiter(&waiter, false);
+	ret = rt_mutex_slowlock_locked(m, state, NULL, RT_MUTEX_MIN_CHAINWALK,
+				       NULL, &waiter);
+	/*
+	 * For ret = 0, the slowlock() above is guaranteed to return with the
+	 * rtmutex now held, so there can't be a writer active. Increment the
+	 * reader count and immediately drop the rtmutex again.
+	 * For ret != 0 we don't hold the rtmutex and only need to unlock the
+	 * wait_lock; we don't own the lock then.
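The reader slow path described in the comment above boils down to "briefly take the writer's lock as a gate, bump the reader count, drop the gate again". A minimal user-space model of just that hand-off follows; it is illustrative only, with invented names and a pthread mutex in place of the rtmutex, so none of the priority-inheritance behaviour is reproduced.

/* Illustrative sketch only -- build with -pthread. Readers never hold
 * the gate across their critical section; they only pass through it,
 * which is where a real RT reader would block on (and boost) the writer.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER; /* ~ sem->rtmutex */
static atomic_int active_readers;

static void model_down_read_slow(void)
{
	pthread_mutex_lock(&gate);	/* blocks while a writer holds it */
	atomic_fetch_add(&active_readers, 1);
	pthread_mutex_unlock(&gate);	/* dropped immediately: only the
					 * counter records the reader */
}

static void model_up_read(void)
{
	atomic_fetch_sub(&active_readers, 1);
	/* the kernel additionally wakes a draining writer when this hits 0 */
}

int main(void)
{
	model_down_read_slow();
	printf("active readers: %d\n", atomic_load(&active_readers));
	model_up_read();
	printf("active readers: %d\n", atomic_load(&active_readers));
	return 0;
}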
+ */ + if (!ret) + atomic_inc(&sem->readers); + raw_spin_unlock_irq(&m->wait_lock); + if (!ret) + __rt_mutex_unlock(m); + + debug_rt_mutex_free_waiter(&waiter); + return ret; +} + +void __down_read(struct rw_semaphore *sem) +{ + int ret; + + ret = __down_read_common(sem, TASK_UNINTERRUPTIBLE); + WARN_ON_ONCE(ret); +} + +int __down_read_killable(struct rw_semaphore *sem) +{ + int ret; + + ret = __down_read_common(sem, TASK_KILLABLE); + if (likely(!ret)) + return ret; + WARN_ONCE(ret != -EINTR, "Unexpected state: %d\n", ret); + return -EINTR; +} + +void __up_read(struct rw_semaphore *sem) +{ + struct rt_mutex *m = &sem->rtmutex; + struct task_struct *tsk; + + /* + * sem->readers can only hit 0 when a writer is waiting for the + * active readers to leave the critical region. + */ + if (!atomic_dec_and_test(&sem->readers)) + return; + + might_sleep(); + raw_spin_lock_irq(&m->wait_lock); + /* + * Wake the writer, i.e. the rtmutex owner. It might release the + * rtmutex concurrently in the fast path (due to a signal), but to + * clean up the rwsem it needs to acquire m->wait_lock. The worst + * case which can happen is a spurious wakeup. + */ + tsk = rt_mutex_owner(m); + if (tsk) + wake_up_process(tsk); + + raw_spin_unlock_irq(&m->wait_lock); +} + +static void __up_write_unlock(struct rw_semaphore *sem, int bias, + unsigned long flags) +{ + struct rt_mutex *m = &sem->rtmutex; + + atomic_add(READER_BIAS - bias, &sem->readers); + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + __rt_mutex_unlock(m); +} + +static int __sched __down_write_common(struct rw_semaphore *sem, int state) +{ + struct rt_mutex *m = &sem->rtmutex; + unsigned long flags; + + /* Take the rtmutex as a first step */ + if (__rt_mutex_lock_state(m, state)) + return -EINTR; + + /* Force readers into slow path */ + atomic_sub(READER_BIAS, &sem->readers); + might_sleep(); + + set_current_state(state); + for (;;) { + raw_spin_lock_irqsave(&m->wait_lock, flags); + /* Have all readers left the critical region? 
*/ + if (!atomic_read(&sem->readers)) { + atomic_set(&sem->readers, WRITER_BIAS); + __set_current_state(TASK_RUNNING); + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + return 0; + } + + if (signal_pending_state(state, current)) { + __set_current_state(TASK_RUNNING); + __up_write_unlock(sem, 0, flags); + return -EINTR; + } + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + + if (atomic_read(&sem->readers) != 0) { + schedule(); + set_current_state(state); + } + } +} + +void __sched __down_write(struct rw_semaphore *sem) +{ + __down_write_common(sem, TASK_UNINTERRUPTIBLE); +} + +int __sched __down_write_killable(struct rw_semaphore *sem) +{ + return __down_write_common(sem, TASK_KILLABLE); +} + +int __down_write_trylock(struct rw_semaphore *sem) +{ + struct rt_mutex *m = &sem->rtmutex; + unsigned long flags; + + if (!__rt_mutex_trylock(m)) + return 0; + + atomic_sub(READER_BIAS, &sem->readers); + + raw_spin_lock_irqsave(&m->wait_lock, flags); + if (!atomic_read(&sem->readers)) { + atomic_set(&sem->readers, WRITER_BIAS); + raw_spin_unlock_irqrestore(&m->wait_lock, flags); + return 1; + } + __up_write_unlock(sem, 0, flags); + return 0; +} + +void __up_write(struct rw_semaphore *sem) +{ + struct rt_mutex *m = &sem->rtmutex; + unsigned long flags; + + raw_spin_lock_irqsave(&m->wait_lock, flags); + __up_write_unlock(sem, WRITER_BIAS, flags); +} + +void __downgrade_write(struct rw_semaphore *sem) +{ + struct rt_mutex *m = &sem->rtmutex; + unsigned long flags; + + raw_spin_lock_irqsave(&m->wait_lock, flags); + /* Release it and account current as reader */ + __up_write_unlock(sem, WRITER_BIAS - 1, flags); +} Index: linux-5.4.5-rt3/kernel/locking/rwsem.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/rwsem.c +++ linux-5.4.5-rt3/kernel/locking/rwsem.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:31 @ #include <linux/rwsem.h> #include <linux/atomic.h> -#include "rwsem.h" +#ifndef CONFIG_PREEMPT_RT #include "lock_events.h" /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:663 @ static inline bool rwsem_can_spin_on_own unsigned long flags; bool ret = true; - BUILD_BUG_ON(!(RWSEM_OWNER_UNKNOWN & RWSEM_NONSPINNABLE)); - if (need_resched()) { lockevent_inc(rwsem_opt_fail); return false; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1336 @ static struct rw_semaphore *rwsem_downgr return sem; } + /* * lock for reading */ -inline void __down_read(struct rw_semaphore *sem) +static inline void __down_read(struct rw_semaphore *sem) { if (!rwsem_read_trylock(sem)) { rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1428 @ static inline int __down_write_trylock(s /* * unlock after reading */ -inline void __up_read(struct rw_semaphore *sem) +static inline void __up_read(struct rw_semaphore *sem) { long tmp; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1487 @ static inline void __downgrade_write(str if (tmp & RWSEM_FLAG_WAITERS) rwsem_downgrade_wake(sem); } +#endif /* * lock for reading @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1619 @ void _down_write_nest_lock(struct rw_sem } EXPORT_SYMBOL(_down_write_nest_lock); +#ifndef CONFIG_PREEMPT_RT void down_read_non_owner(struct rw_semaphore *sem) { 
might_sleep(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1627 @ void down_read_non_owner(struct rw_semap __rwsem_set_reader_owned(sem, NULL); } EXPORT_SYMBOL(down_read_non_owner); +#endif void down_write_nested(struct rw_semaphore *sem, int subclass) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1652 @ int __sched down_write_killable_nested(s } EXPORT_SYMBOL(down_write_killable_nested); +#ifndef CONFIG_PREEMPT_RT void up_read_non_owner(struct rw_semaphore *sem) { DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem); __up_read(sem); } EXPORT_SYMBOL(up_read_non_owner); +#endif #endif Index: linux-5.4.5-rt3/kernel/locking/rwsem.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/rwsem.h +++ linux-5.4.5-rt3/kernel/locking/rwsem.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1 @ -/* SPDX-License-Identifier: GPL-2.0 */ - -#ifndef __INTERNAL_RWSEM_H -#define __INTERNAL_RWSEM_H -#include <linux/rwsem.h> - -extern void __down_read(struct rw_semaphore *sem); -extern void __up_read(struct rw_semaphore *sem); - -#endif /* __INTERNAL_RWSEM_H */ Index: linux-5.4.5-rt3/kernel/locking/spinlock.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/spinlock.c +++ linux-5.4.5-rt3/kernel/locking/spinlock.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:127 @ void __lockfunc __raw_##op##_lock_bh(loc * __[spin|read|write]_lock_bh() */ BUILD_LOCK_OPS(spin, raw_spinlock); + +#ifndef CONFIG_PREEMPT_RT BUILD_LOCK_OPS(read, rwlock); BUILD_LOCK_OPS(write, rwlock); +#endif #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:215 @ void __lockfunc _raw_spin_unlock_bh(raw_ EXPORT_SYMBOL(_raw_spin_unlock_bh); #endif +#ifndef CONFIG_PREEMPT_RT + #ifndef CONFIG_INLINE_READ_TRYLOCK int __lockfunc _raw_read_trylock(rwlock_t *lock) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:361 @ void __lockfunc _raw_write_unlock_bh(rwl EXPORT_SYMBOL(_raw_write_unlock_bh); #endif +#endif /* !PREEMPT_RT */ + #ifdef CONFIG_DEBUG_LOCK_ALLOC void __lockfunc _raw_spin_lock_nested(raw_spinlock_t *lock, int subclass) Index: linux-5.4.5-rt3/kernel/locking/spinlock_debug.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/locking/spinlock_debug.c +++ linux-5.4.5-rt3/kernel/locking/spinlock_debug.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:34 @ void __raw_spin_lock_init(raw_spinlock_t EXPORT_SYMBOL(__raw_spin_lock_init); +#ifndef CONFIG_PREEMPT_RT void __rwlock_init(rwlock_t *lock, const char *name, struct lock_class_key *key) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:52 @ void __rwlock_init(rwlock_t *lock, const } EXPORT_SYMBOL(__rwlock_init); +#endif static void spin_dump(raw_spinlock_t *lock, const char *msg) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:144 @ void do_raw_spin_unlock(raw_spinlock_t * arch_spin_unlock(&lock->raw_lock); } +#ifndef CONFIG_PREEMPT_RT static void rwlock_bug(rwlock_t *lock, const char *msg) { if (!debug_locks_off()) @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:234 @ void do_raw_write_unlock(rwlock_t *lock) debug_write_unlock(lock); arch_write_unlock(&lock->raw_lock); } + +#endif Index: linux-5.4.5-rt3/kernel/panic.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/panic.c +++ linux-5.4.5-rt3/kernel/panic.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:240 @ void panic(const char *fmt, ...) * Bypass the panic_cpu check and call __crash_kexec directly. */ if (!_crash_kexec_post_notifiers) { - printk_safe_flush_on_panic(); __crash_kexec(NULL); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:263 @ void panic(const char *fmt, ...) */ atomic_notifier_call_chain(&panic_notifier_list, 0, buf); - /* Call flush even twice. It tries harder with a single online CPU */ - printk_safe_flush_on_panic(); kmsg_dump(KMSG_DUMP_PANIC); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:524 @ static u64 oops_id; static int init_oops_id(void) { +#ifndef CONFIG_PREEMPT_RT if (!oops_id) get_random_bytes(&oops_id, sizeof(oops_id)); else +#endif oops_id++; return 0; Index: linux-5.4.5-rt3/kernel/power/hibernate.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/power/hibernate.c +++ linux-5.4.5-rt3/kernel/power/hibernate.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:692 @ static int load_image_and_restore(void) return error; } +#ifndef CONFIG_SUSPEND +bool pm_in_action; +#endif + /** * hibernate - Carry out system hibernation, including saving the image. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:709 @ int hibernate(void) return -EPERM; } + pm_in_action = true; + lock_system_sleep(); /* The snapshot device should not be opened while we're running */ if (!atomic_add_unless(&snapshot_device_available, -1, 0)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:787 @ int hibernate(void) atomic_inc(&snapshot_device_available); Unlock: unlock_system_sleep(); + pm_in_action = false; pr_info("hibernation exit\n"); return error; Index: linux-5.4.5-rt3/kernel/power/suspend.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/power/suspend.c +++ linux-5.4.5-rt3/kernel/power/suspend.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:597 @ static int enter_state(suspend_state_t s return error; } +bool pm_in_action; + /** * pm_suspend - Externally visible function for suspending the system. * @state: System sleep state to enter. 
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:613 @ int pm_suspend(suspend_state_t state) if (state <= PM_SUSPEND_ON || state >= PM_SUSPEND_MAX) return -EINVAL; + pm_in_action = true; pr_info("suspend entry (%s)\n", mem_sleep_labels[state]); error = enter_state(state); if (error) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:623 @ int pm_suspend(suspend_state_t state) suspend_stats.success++; } pr_info("suspend exit\n"); + pm_in_action = false; return error; } EXPORT_SYMBOL(pm_suspend); Index: linux-5.4.5-rt3/kernel/printk/Makefile =================================================================== --- linux-5.4.5-rt3.orig/kernel/printk/Makefile +++ linux-5.4.5-rt3/kernel/printk/Makefile @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1 @ # SPDX-License-Identifier: GPL-2.0-only obj-y = printk.o -obj-$(CONFIG_PRINTK) += printk_safe.o obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o Index: linux-5.4.5-rt3/kernel/printk/internal.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/printk/internal.h +++ /dev/null @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1 @ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * internal.h - printk internal definitions - */ -#include <linux/percpu.h> - -#ifdef CONFIG_PRINTK - -#define PRINTK_SAFE_CONTEXT_MASK 0x3fffffff -#define PRINTK_NMI_DIRECT_CONTEXT_MASK 0x40000000 -#define PRINTK_NMI_CONTEXT_MASK 0x80000000 - -extern raw_spinlock_t logbuf_lock; - -__printf(5, 0) -int vprintk_store(int facility, int level, - const char *dict, size_t dictlen, - const char *fmt, va_list args); - -__printf(1, 0) int vprintk_default(const char *fmt, va_list args); -__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args); -__printf(1, 0) int vprintk_func(const char *fmt, va_list args); -void __printk_safe_enter(void); -void __printk_safe_exit(void); - -#define printk_safe_enter_irqsave(flags) \ - do { \ - local_irq_save(flags); \ - __printk_safe_enter(); \ - } while (0) - -#define printk_safe_exit_irqrestore(flags) \ - do { \ - __printk_safe_exit(); \ - local_irq_restore(flags); \ - } while (0) - -#define printk_safe_enter_irq() \ - do { \ - local_irq_disable(); \ - __printk_safe_enter(); \ - } while (0) - -#define printk_safe_exit_irq() \ - do { \ - __printk_safe_exit(); \ - local_irq_enable(); \ - } while (0) - -void defer_console_output(void); - -#else - -__printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; } - -/* - * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem - * semaphore and some of console functions (console_unlock()/etc.), so - * printk-safe must preserve the existing local IRQ guarantees. 
- */ -#define printk_safe_enter_irqsave(flags) local_irq_save(flags) -#define printk_safe_exit_irqrestore(flags) local_irq_restore(flags) - -#define printk_safe_enter_irq() local_irq_disable() -#define printk_safe_exit_irq() local_irq_enable() - -#endif /* CONFIG_PRINTK */ Index: linux-5.4.5-rt3/kernel/printk/printk.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/printk/printk.c +++ linux-5.4.5-rt3/kernel/printk/printk.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:48 @ #include <linux/irq_work.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/kthread.h> +#include <linux/clocksource.h> +#include <linux/printk_ringbuffer.h> #include <linux/sched/clock.h> #include <linux/sched/debug.h> #include <linux/sched/task_stack.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ #include "console_cmdline.h" #include "braille.h" -#include "internal.h" -int console_printk[4] = { +int console_printk[5] = { CONSOLE_LOGLEVEL_DEFAULT, /* console_loglevel */ MESSAGE_LOGLEVEL_DEFAULT, /* default_message_loglevel */ CONSOLE_LOGLEVEL_MIN, /* minimum_console_loglevel */ CONSOLE_LOGLEVEL_DEFAULT, /* default_console_loglevel */ + CONSOLE_LOGLEVEL_EMERGENCY, /* emergency_console_loglevel */ }; EXPORT_SYMBOL_GPL(console_printk); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:231 @ static int nr_ext_console_drivers; static int __down_trylock_console_sem(unsigned long ip) { - int lock_failed; - unsigned long flags; - - /* - * Here and in __up_console_sem() we need to be in safe mode, - * because spindump/WARN/etc from under console ->lock will - * deadlock in printk()->down_trylock_console_sem() otherwise. - */ - printk_safe_enter_irqsave(flags); - lock_failed = down_trylock(&console_sem); - printk_safe_exit_irqrestore(flags); - - if (lock_failed) + if (down_trylock(&console_sem)) return 1; mutex_acquire(&console_lock_dep_map, 0, 1, ip); return 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:240 @ static int __down_trylock_console_sem(un static void __up_console_sem(unsigned long ip) { - unsigned long flags; - mutex_release(&console_lock_dep_map, 1, ip); - printk_safe_enter_irqsave(flags); up(&console_sem); - printk_safe_exit_irqrestore(flags); } #define up_console_sem() __up_console_sem(_RET_IP_) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:257 @ static void __up_console_sem(unsigned lo static int console_locked, console_suspended; /* - * If exclusive_console is non-NULL then only this console is to be printed to. - */ -static struct console *exclusive_console; - -/* * Array of consoles built from command line options (console=) */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:352 @ enum log_flags { struct printk_log { u64 ts_nsec; /* timestamp in nanoseconds */ + u16 cpu; /* cpu that generated record */ u16 len; /* length of entire record */ u16 text_len; /* length of text buffer */ u16 dict_len; /* length of dictionary buffer */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:368 @ __packed __aligned(4) #endif ; -/* - * The logbuf_lock protects kmsg buffer, indices, counters. This can be taken - * within the scheduler's rq lock. 
It must be released before calling - * console_unlock() or anything else that might wake up a process. - */ -DEFINE_RAW_SPINLOCK(logbuf_lock); - -/* - * Helper macros to lock/unlock logbuf_lock and switch between - * printk-safe/unsafe modes. - */ -#define logbuf_lock_irq() \ - do { \ - printk_safe_enter_irq(); \ - raw_spin_lock(&logbuf_lock); \ - } while (0) - -#define logbuf_unlock_irq() \ - do { \ - raw_spin_unlock(&logbuf_lock); \ - printk_safe_exit_irq(); \ - } while (0) - -#define logbuf_lock_irqsave(flags) \ - do { \ - printk_safe_enter_irqsave(flags); \ - raw_spin_lock(&logbuf_lock); \ - } while (0) - -#define logbuf_unlock_irqrestore(flags) \ - do { \ - raw_spin_unlock(&logbuf_lock); \ - printk_safe_exit_irqrestore(flags); \ - } while (0) +DECLARE_STATIC_PRINTKRB_CPULOCK(printk_cpulock); #ifdef CONFIG_PRINTK -DECLARE_WAIT_QUEUE_HEAD(log_wait); -/* the next printk record to read by syslog(READ) or /proc/kmsg */ +/* record buffer */ +DECLARE_STATIC_PRINTKRB(printk_rb, CONFIG_LOG_BUF_SHIFT, &printk_cpulock); + +static DEFINE_MUTEX(syslog_lock); +DECLARE_STATIC_PRINTKRB_ITER(syslog_iter, &printk_rb); + +/* the last printk record to read by syslog(READ) or /proc/kmsg */ static u64 syslog_seq; -static u32 syslog_idx; static size_t syslog_partial; static bool syslog_time; -/* index and sequence number of the first record stored in the buffer */ -static u64 log_first_seq; -static u32 log_first_idx; - -/* index and sequence number of the next record to store in the buffer */ -static u64 log_next_seq; -static u32 log_next_idx; - -/* the next printk record to write to the console */ -static u64 console_seq; -static u32 console_idx; -static u64 exclusive_console_stop_seq; - /* the next printk record to read after the last 'clear' command */ static u64 clear_seq; -static u32 clear_idx; #ifdef CONFIG_PRINTK_CALLER #define PREFIX_MAX 48 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:395 @ static u32 clear_idx; #define LOG_LEVEL(v) ((v) & 0x07) #define LOG_FACILITY(v) ((v) >> 3 & 0xff) -/* record buffer */ -#define LOG_ALIGN __alignof__(struct printk_log) -#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) -#define LOG_BUF_LEN_MAX (u32)(1 << 31) -static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); -static char *log_buf = __log_buf; -static u32 log_buf_len = __LOG_BUF_LEN; - /* Return log buffer address */ char *log_buf_addr_get(void) { - return log_buf; + return printk_rb.buffer; } /* Return log buffer size */ u32 log_buf_len_get(void) { - return log_buf_len; + return (1 << printk_rb.size_bits); } /* human readable text of the record */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:419 @ static char *log_dict(const struct print return (char *)msg + sizeof(struct printk_log) + msg->text_len; } -/* get record by index; idx must point to valid msg */ -static struct printk_log *log_from_idx(u32 idx) -{ - struct printk_log *msg = (struct printk_log *)(log_buf + idx); - - /* - * A length == 0 record is the end of buffer marker. Wrap around and - * read the message at the start of the buffer. - */ - if (!msg->len) - return (struct printk_log *)log_buf; - return msg; -} - -/* get next record; idx must point to valid msg */ -static u32 log_next(u32 idx) -{ - struct printk_log *msg = (struct printk_log *)(log_buf + idx); - - /* length == 0 indicates the end of the buffer; wrap */ - /* - * A length == 0 record is the end of buffer marker. 
Wrap around and - * read the message at the start of the buffer as *this* one, and - * return the one after that. - */ - if (!msg->len) { - msg = (struct printk_log *)log_buf; - return msg->len; - } - return idx + msg->len; -} - -/* - * Check whether there is enough free space for the given message. - * - * The same values of first_idx and next_idx mean that the buffer - * is either empty or full. - * - * If the buffer is empty, we must respect the position of the indexes. - * They cannot be reset to the beginning of the buffer. - */ -static int logbuf_has_space(u32 msg_size, bool empty) -{ - u32 free; - - if (log_next_idx > log_first_idx || empty) - free = max(log_buf_len - log_next_idx, log_first_idx); - else - free = log_first_idx - log_next_idx; - - /* - * We need space also for an empty header that signalizes wrapping - * of the buffer. - */ - return free >= msg_size + sizeof(struct printk_log); -} - -static int log_make_free_space(u32 msg_size) -{ - while (log_first_seq < log_next_seq && - !logbuf_has_space(msg_size, false)) { - /* drop old messages until we have enough contiguous space */ - log_first_idx = log_next(log_first_idx); - log_first_seq++; - } - - if (clear_seq < log_first_seq) { - clear_seq = log_first_seq; - clear_idx = log_first_idx; - } - - /* sequence numbers are equal, so the log buffer is empty */ - if (logbuf_has_space(msg_size, log_first_seq == log_next_seq)) - return 0; - - return -ENOMEM; -} - -/* compute the message size including the padding bytes */ -static u32 msg_used_size(u16 text_len, u16 dict_len, u32 *pad_len) -{ - u32 size; - - size = sizeof(struct printk_log) + text_len + dict_len; - *pad_len = (-size) & (LOG_ALIGN - 1); - size += *pad_len; - - return size; -} - -/* - * Define how much of the log buffer we could take at maximum. The value - * must be greater than two. Note that only half of the buffer is available - * when the index points to the middle. - */ -#define MAX_LOG_TAKE_PART 4 -static const char trunc_msg[] = "<truncated>"; - -static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len, - u16 *dict_len, u32 *pad_len) -{ - /* - * The message should not take the whole buffer. Otherwise, it might - * get removed too soon. 
- */ - u32 max_text_len = log_buf_len / MAX_LOG_TAKE_PART; - if (*text_len > max_text_len) - *text_len = max_text_len; - /* enable the warning message */ - *trunc_msg_len = strlen(trunc_msg); - /* disable the "dict" completely */ - *dict_len = 0; - /* compute the size again, count also the warning message */ - return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len); -} +static void printk_emergency(char *buffer, int level, u64 ts_nsec, u16 cpu, + char *text, u16 text_len); /* insert record into the buffer, discard old ones, update heads */ static int log_store(u32 caller_id, int facility, int level, - enum log_flags flags, u64 ts_nsec, + enum log_flags flags, u64 ts_nsec, u16 cpu, const char *dict, u16 dict_len, const char *text, u16 text_len) { struct printk_log *msg; - u32 size, pad_len; - u16 trunc_msg_len = 0; - - /* number of '\0' padding bytes to next message */ - size = msg_used_size(text_len, dict_len, &pad_len); + struct prb_handle h; + char *rbuf; + u32 size; - if (log_make_free_space(size)) { - /* truncate the message if it is too long for empty buffer */ - size = truncate_msg(&text_len, &trunc_msg_len, - &dict_len, &pad_len); - /* survive when the log buffer is too small for trunc_msg */ - if (log_make_free_space(size)) - return 0; - } + size = sizeof(*msg) + text_len + dict_len; - if (log_next_idx + size + sizeof(struct printk_log) > log_buf_len) { + rbuf = prb_reserve(&h, &printk_rb, size); + if (!rbuf) { /* - * This message + an additional empty header does not fit - * at the end of the buffer. Add an empty header with len == 0 - * to signify a wrap around. + * An emergency message would have been printed, but + * it cannot be stored in the log. */ - memset(log_buf + log_next_idx, 0, sizeof(struct printk_log)); - log_next_idx = 0; + prb_inc_lost(&printk_rb); + return 0; } /* fill message */ - msg = (struct printk_log *)(log_buf + log_next_idx); + msg = (struct printk_log *)rbuf; memcpy(log_text(msg), text, text_len); msg->text_len = text_len; - if (trunc_msg_len) { - memcpy(log_text(msg) + text_len, trunc_msg, trunc_msg_len); - msg->text_len += trunc_msg_len; - } memcpy(log_dict(msg), dict, dict_len); msg->dict_len = dict_len; msg->facility = facility; msg->level = level & 7; msg->flags = flags & 0x1f; - if (ts_nsec > 0) - msg->ts_nsec = ts_nsec; - else - msg->ts_nsec = local_clock(); + msg->ts_nsec = ts_nsec; #ifdef CONFIG_PRINTK_CALLER msg->caller_id = caller_id; #endif - memset(log_dict(msg) + dict_len, 0, pad_len); + msg->cpu = cpu; msg->len = size; /* insert message */ - log_next_idx += msg->len; - log_next_seq++; + prb_commit(&h); return msg->text_len; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:532 @ static ssize_t msg_print_ext_header(char do_div(ts_usec, 1000); - return scnprintf(buf, size, "%u,%llu,%llu,%c%s;", + return scnprintf(buf, size, "%u,%llu,%llu,%c%s,%hu;", (msg->facility << 3) | msg->level, seq, ts_usec, - msg->flags & LOG_CONT ? 'c' : '-', caller); + msg->flags & LOG_CONT ? 
'c' : '-', caller, msg->cpu); } static ssize_t msg_print_ext_body(char *buf, size_t size, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:585 @ static ssize_t msg_print_ext_body(char * return p - buf; } +#define PRINTK_SPRINT_MAX (LOG_LINE_MAX + PREFIX_MAX) +#define PRINTK_RECORD_MAX (sizeof(struct printk_log) + \ + CONSOLE_EXT_LOG_MAX + PRINTK_SPRINT_MAX) + /* /dev/kmsg - userspace message inject/listen interface */ struct devkmsg_user { u64 seq; - u32 idx; + struct prb_iterator iter; struct ratelimit_state rs; struct mutex lock; char buf[CONSOLE_EXT_LOG_MAX]; + char msgbuf[PRINTK_RECORD_MAX]; }; static __printf(3, 4) __cold @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:679 @ static ssize_t devkmsg_read(struct file size_t count, loff_t *ppos) { struct devkmsg_user *user = file->private_data; + struct prb_iterator backup_iter; struct printk_log *msg; - size_t len; ssize_t ret; + size_t len; + u64 seq; if (!user) return -EBADF; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:692 @ static ssize_t devkmsg_read(struct file if (ret) return ret; - logbuf_lock_irq(); - while (user->seq == log_next_seq) { - if (file->f_flags & O_NONBLOCK) { - ret = -EAGAIN; - logbuf_unlock_irq(); - goto out; - } + /* make a backup copy in case there is a problem */ + prb_iter_copy(&backup_iter, &user->iter); - logbuf_unlock_irq(); - ret = wait_event_interruptible(log_wait, - user->seq != log_next_seq); - if (ret) - goto out; - logbuf_lock_irq(); + if (file->f_flags & O_NONBLOCK) { + ret = prb_iter_next(&user->iter, &user->msgbuf[0], + sizeof(user->msgbuf), &seq); + } else { + ret = prb_iter_wait_next(&user->iter, &user->msgbuf[0], + sizeof(user->msgbuf), &seq); } - - if (user->seq < log_first_seq) { - /* our last seen message is gone, return error and reset */ - user->idx = log_first_idx; - user->seq = log_first_seq; + if (ret == 0) { + /* end of list */ + ret = -EAGAIN; + goto out; + } else if (ret == -EINVAL) { + /* iterator invalid, return error and reset */ ret = -EPIPE; - logbuf_unlock_irq(); + prb_iter_init(&user->iter, &printk_rb, &user->seq); goto out; + } else if (ret < 0) { + /* interrupted by signal */ + goto out; + } + + user->seq++; + if (user->seq < seq) { + ret = -EPIPE; + goto restore_out; } - msg = log_from_idx(user->idx); + msg = (struct printk_log *)&user->msgbuf[0]; len = msg_print_ext_header(user->buf, sizeof(user->buf), msg, user->seq); len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len, log_dict(msg), msg->dict_len, log_text(msg), msg->text_len); - user->idx = log_next(user->idx); - user->seq++; - logbuf_unlock_irq(); - if (len > count) { ret = -EINVAL; - goto out; + goto restore_out; } if (copy_to_user(buf, user->buf, len)) { ret = -EFAULT; - goto out; + goto restore_out; } + ret = len; + goto out; +restore_out: + /* + * There was an error, but this message should not be + * lost because of it. Restore the backup and setup + * seq so that it will work with the next read. 
+ */ + prb_iter_copy(&user->iter, &backup_iter); + user->seq = seq - 1; out: mutex_unlock(&user->lock); return ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:757 @ out: static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence) { struct devkmsg_user *user = file->private_data; - loff_t ret = 0; + loff_t ret; + u64 seq; if (!user) return -EBADF; if (offset) return -ESPIPE; - logbuf_lock_irq(); + ret = mutex_lock_interruptible(&user->lock); + if (ret) + return ret; + switch (whence) { case SEEK_SET: /* the first record */ - user->idx = log_first_idx; - user->seq = log_first_seq; + prb_iter_init(&user->iter, &printk_rb, &user->seq); break; case SEEK_DATA: /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:780 @ static loff_t devkmsg_llseek(struct file * like issued by 'dmesg -c'. Reading /dev/kmsg itself * changes no global state, and does not clear anything. */ - user->idx = clear_idx; - user->seq = clear_seq; + for (;;) { + prb_iter_init(&user->iter, &printk_rb, &seq); + ret = prb_iter_seek(&user->iter, clear_seq); + if (ret > 0) { + /* seeked to clear seq */ + user->seq = clear_seq; + break; + } else if (ret == 0) { + /* + * The end of the list was hit without + * ever seeing the clear seq. Just + * seek to the beginning of the list. + */ + prb_iter_init(&user->iter, &printk_rb, + &user->seq); + break; + } + /* iterator invalid, start over */ + + /* reset clear_seq if it is no longer available */ + if (seq > clear_seq) + clear_seq = 0; + } + ret = 0; break; case SEEK_END: /* after the last record */ - user->idx = log_next_idx; - user->seq = log_next_seq; + for (;;) { + ret = prb_iter_next(&user->iter, NULL, 0, &user->seq); + if (ret == 0) + break; + else if (ret > 0) + continue; + /* iterator invalid, start over */ + prb_iter_init(&user->iter, &printk_rb, &user->seq); + } + ret = 0; break; default: ret = -EINVAL; } - logbuf_unlock_irq(); + + mutex_unlock(&user->lock); return ret; } +struct wait_queue_head *printk_wait_queue(void) +{ + /* FIXME: using prb internals! 
*/ + return printk_rb.wq; +} + static __poll_t devkmsg_poll(struct file *file, poll_table *wait) { struct devkmsg_user *user = file->private_data; + struct prb_iterator iter; __poll_t ret = 0; + int rbret; + u64 seq; if (!user) return EPOLLERR|EPOLLNVAL; - poll_wait(file, &log_wait, wait); + poll_wait(file, printk_wait_queue(), wait); - logbuf_lock_irq(); - if (user->seq < log_next_seq) { - /* return error when data has vanished underneath us */ - if (user->seq < log_first_seq) - ret = EPOLLIN|EPOLLRDNORM|EPOLLERR|EPOLLPRI; - else - ret = EPOLLIN|EPOLLRDNORM; - } - logbuf_unlock_irq(); + mutex_lock(&user->lock); + + /* use copy so no actual iteration takes place */ + prb_iter_copy(&iter, &user->iter); + + rbret = prb_iter_next(&iter, &user->msgbuf[0], + sizeof(user->msgbuf), &seq); + if (rbret == 0) + goto out; + + ret = EPOLLIN|EPOLLRDNORM; + + if (rbret < 0 || (seq - user->seq) != 1) + ret |= EPOLLERR|EPOLLPRI; +out: + mutex_unlock(&user->lock); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:890 @ static int devkmsg_open(struct inode *in mutex_init(&user->lock); - logbuf_lock_irq(); - user->idx = log_first_idx; - user->seq = log_first_seq; - logbuf_unlock_irq(); + prb_iter_init(&user->iter, &printk_rb, &user->seq); file->private_data = user; return 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:930 @ const struct file_operations kmsg_fops = */ void log_buf_vmcoreinfo_setup(void) { - VMCOREINFO_SYMBOL(log_buf); - VMCOREINFO_SYMBOL(log_buf_len); - VMCOREINFO_SYMBOL(log_first_idx); - VMCOREINFO_SYMBOL(clear_idx); - VMCOREINFO_SYMBOL(log_next_idx); /* * Export struct printk_log size and field offsets. User space tools can * parse it and detect any changes to structure down the line. 
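devkmsg_poll() above peeks at the ring buffer through a copy of the iterator so the reader's real position never advances, and devkmsg_read() keeps a backup copy that is restored when the copy-out fails. The same pattern on a toy array-backed iterator (all names invented, illustrative only, not part of the patch):

/* Illustrative sketch only. A copied iterator lets callers peek or retry
 * without disturbing the real read position.
 */
#include <stdio.h>
#include <string.h>

struct toy_iter {
	const char **msgs;
	int pos, count;
};

/* returns 1 and copies out the next message, 0 at the end of the list */
static int toy_iter_next(struct toy_iter *it, char *buf, size_t len)
{
	if (it->pos >= it->count)
		return 0;
	snprintf(buf, len, "%s", it->msgs[it->pos++]);
	return 1;
}

static int toy_poll(const struct toy_iter *it)
{
	struct toy_iter peek = *it;	/* work on a copy, like devkmsg_poll() */
	char buf[64];

	return toy_iter_next(&peek, buf, sizeof(buf));
}

int main(void)
{
	const char *log[] = { "first record", "second record" };
	struct toy_iter it = { .msgs = log, .pos = 0, .count = 2 };
	struct toy_iter backup;
	char buf[64];

	printf("poll sees data: %d\n", toy_poll(&it));	/* position unchanged */

	backup = it;			/* keep a backup before consuming */
	toy_iter_next(&it, buf, sizeof(buf));
	printf("read: %s\n", buf);

	it = backup;			/* pretend copy-out failed: restore */
	toy_iter_next(&it, buf, sizeof(buf));
	printf("read again after restore: %s\n", buf);
	return 0;
}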
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:945 @ void log_buf_vmcoreinfo_setup(void) } #endif +/* FIXME: no support for buffer resizing */ +#if 0 /* requested log_buf_len from kernel cmdline */ static unsigned long __initdata new_log_buf_len; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1012 @ static void __init log_buf_add_cpu(void) #else /* !CONFIG_SMP */ static inline void log_buf_add_cpu(void) {} #endif /* CONFIG_SMP */ +#endif /* 0 */ void __init setup_log_buf(int early) { +/* FIXME: no support for buffer resizing */ +#if 0 unsigned long flags; char *new_log_buf; unsigned int free; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1049 @ void __init setup_log_buf(int early) pr_info("log_buf_len: %u bytes\n", log_buf_len); pr_info("early log buf free: %u(%u%%)\n", free, (free * 100) / __LOG_BUF_LEN); +#endif } static bool __read_mostly ignore_loglevel; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1130 @ static inline void boot_delay_msec(int l static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME); module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR); +static size_t print_cpu(u16 cpu, char *buf) +{ + return sprintf(buf, "%03hu: ", cpu); +} + static size_t print_syslog(unsigned int level, char *buf) { return sprintf(buf, "<%u>", level); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1178 @ static size_t print_prefix(const struct buf[len++] = ' '; buf[len] = '\0'; } + len += print_cpu(msg->cpu, buf + len); return len; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1224 @ static size_t msg_print_text(const struc return len; } -static int syslog_print(char __user *buf, int size) +static int syslog_print(char __user *buf, int size, char *text, + char *msgbuf, int *locked) { - char *text; + struct prb_iterator iter; struct printk_log *msg; int len = 0; - - text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL); - if (!text) - return -ENOMEM; + u64 seq; + int ret; while (size > 0) { size_t n; size_t skip; - logbuf_lock_irq(); - if (syslog_seq < log_first_seq) { - /* messages are gone, move to first one */ - syslog_seq = log_first_seq; - syslog_idx = log_first_idx; - syslog_partial = 0; + for (;;) { + prb_iter_copy(&iter, &syslog_iter); + ret = prb_iter_next(&iter, msgbuf, + PRINTK_RECORD_MAX, &seq); + if (ret < 0) { + /* messages are gone, move to first one */ + prb_iter_init(&syslog_iter, &printk_rb, + &syslog_seq); + syslog_partial = 0; + continue; + } + break; } - if (syslog_seq == log_next_seq) { - logbuf_unlock_irq(); + if (ret == 0) break; + + /* + * If messages have been missed, the partial tracker + * is no longer valid and must be reset. 
+ */ + if (syslog_seq > 0 && seq - 1 != syslog_seq) { + syslog_seq = seq - 1; + syslog_partial = 0; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1269 @ static int syslog_print(char __user *buf if (!syslog_partial) syslog_time = printk_time; + msg = (struct printk_log *)msgbuf; + skip = syslog_partial; - msg = log_from_idx(syslog_idx); n = msg_print_text(msg, true, syslog_time, text, - LOG_LINE_MAX + PREFIX_MAX); + PRINTK_SPRINT_MAX); if (n - syslog_partial <= size) { /* message fits into buffer, move forward */ - syslog_idx = log_next(syslog_idx); - syslog_seq++; + prb_iter_next(&syslog_iter, NULL, 0, &syslog_seq); n -= syslog_partial; syslog_partial = 0; - } else if (!len){ + } else if (!len) { /* partial read(), remember position */ n = size; syslog_partial += n; } else n = 0; - logbuf_unlock_irq(); if (!n) break; + mutex_unlock(&syslog_lock); if (copy_to_user(buf, text + skip, n)) { if (!len) len = -EFAULT; + *locked = 0; break; } + ret = mutex_lock_interruptible(&syslog_lock); len += n; size -= n; buf += n; + + if (ret) { + if (!len) + len = ret; + *locked = 0; + break; + } } - kfree(text); return len; } -static int syslog_print_all(char __user *buf, int size, bool clear) +static int count_remaining(struct prb_iterator *iter, u64 until_seq, + char *msgbuf, int size, bool records, bool time) { - char *text; + struct prb_iterator local_iter; + struct printk_log *msg; int len = 0; - u64 next_seq; u64 seq; - u32 idx; + int ret; + + prb_iter_copy(&local_iter, iter); + for (;;) { + ret = prb_iter_next(&local_iter, msgbuf, size, &seq); + if (ret == 0) { + break; + } else if (ret < 0) { + /* the iter is invalid, restart from head */ + prb_iter_init(&local_iter, &printk_rb, NULL); + len = 0; + continue; + } + + if (until_seq && seq >= until_seq) + break; + + if (records) { + len++; + } else { + msg = (struct printk_log *)msgbuf; + len += msg_print_text(msg, true, time, NULL, 0); + } + } + + return len; +} + +static void syslog_clear(void) +{ + struct prb_iterator iter; + int ret; + + prb_iter_init(&iter, &printk_rb, &clear_seq); + for (;;) { + ret = prb_iter_next(&iter, NULL, 0, &clear_seq); + if (ret == 0) + break; + else if (ret < 0) + prb_iter_init(&iter, &printk_rb, &clear_seq); + } +} + +static int syslog_print_all(char __user *buf, int size, bool clear) +{ + struct prb_iterator iter; + struct printk_log *msg; + char *msgbuf = NULL; + char *text = NULL; + int textlen; + u64 seq = 0; + int len = 0; bool time; + int ret; - text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL); + text = kmalloc(PRINTK_SPRINT_MAX, GFP_KERNEL); if (!text) return -ENOMEM; + msgbuf = kmalloc(PRINTK_RECORD_MAX, GFP_KERNEL); + if (!msgbuf) { + kfree(text); + return -ENOMEM; + } time = printk_time; - logbuf_lock_irq(); + /* - * Find first record that fits, including all following records, - * into the user-provided buffer for this dump. + * Setup iter to last event before clear. Clear may + * be lost, but keep going with a best effort. 
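syslog_print_all() in this hunk sizes the dump in two passes: first it counts everything after the last clear, then it advances the start position until what remains fits the user buffer, and only then copies. A tiny stand-alone model of that sizing step (invented names, illustrative only, not part of the patch):

/* Illustrative sketch only: drop the oldest records until the rest fits. */
#include <stdio.h>
#include <stddef.h>

/* return the first index such that the records from there on fit in 'size' */
static size_t oldest_fitting(const size_t *len, size_t n, size_t size)
{
	size_t total = 0, first = 0;
	size_t i;

	for (i = 0; i < n; i++)			/* pass 1: total size */
		total += len[i];

	while (first < n && total > size)	/* pass 2: drop oldest */
		total -= len[first++];

	return first;
}

int main(void)
{
	const size_t rec[] = { 40, 10, 25, 30, 20 };	/* record sizes */
	size_t first = oldest_fitting(rec, 5, 64);

	/* records 0..first-1 are dropped, the remainder fits in 64 bytes */
	printf("start copying at record %zu\n", first);	/* prints 3 */
	return 0;
}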
*/ - seq = clear_seq; - idx = clear_idx; - while (seq < log_next_seq) { - struct printk_log *msg = log_from_idx(idx); - - len += msg_print_text(msg, true, time, NULL, 0); - idx = log_next(idx); - seq++; - } - - /* move first record forward until length fits into the buffer */ - seq = clear_seq; - idx = clear_idx; - while (len > size && seq < log_next_seq) { - struct printk_log *msg = log_from_idx(idx); + prb_iter_init(&iter, &printk_rb, NULL); + prb_iter_seek(&iter, clear_seq); + + /* count the total bytes after clear */ + len = count_remaining(&iter, 0, msgbuf, PRINTK_RECORD_MAX, + false, time); + + /* move iter forward until length fits into the buffer */ + while (len > size) { + ret = prb_iter_next(&iter, msgbuf, + PRINTK_RECORD_MAX, &seq); + if (ret == 0) { + break; + } else if (ret < 0) { + /* + * The iter is now invalid so clear will + * also be invalid. Restart from the head. + */ + prb_iter_init(&iter, &printk_rb, NULL); + len = count_remaining(&iter, 0, msgbuf, + PRINTK_RECORD_MAX, false, time); + continue; + } + msg = (struct printk_log *)msgbuf; len -= msg_print_text(msg, true, time, NULL, 0); - idx = log_next(idx); - seq++; - } - /* last message fitting into this dump */ - next_seq = log_next_seq; + if (clear) + clear_seq = seq; + } + /* copy messages to buffer */ len = 0; - while (len >= 0 && seq < next_seq) { - struct printk_log *msg = log_from_idx(idx); - int textlen = msg_print_text(msg, true, time, text, - LOG_LINE_MAX + PREFIX_MAX); + while (len >= 0 && len < size) { + if (clear) + clear_seq = seq; + + ret = prb_iter_next(&iter, msgbuf, + PRINTK_RECORD_MAX, &seq); + if (ret == 0) { + break; + } else if (ret < 0) { + /* + * The iter is now invalid. Make a best + * effort to grab the rest of the log + * from the new head. + */ + prb_iter_init(&iter, &printk_rb, NULL); + continue; + } + + msg = (struct printk_log *)msgbuf; + textlen = msg_print_text(msg, true, time, text, + PRINTK_SPRINT_MAX); + if (textlen < 0) { + len = textlen; + break; + } - idx = log_next(idx); - seq++; + if (len + textlen > size) + break; - logbuf_unlock_irq(); if (copy_to_user(buf + len, text, textlen)) len = -EFAULT; else len += textlen; - logbuf_lock_irq(); - - if (seq < log_first_seq) { - /* messages are gone, move to next one */ - seq = log_first_seq; - idx = log_first_idx; - } } - if (clear) { - clear_seq = log_next_seq; - clear_idx = log_next_idx; - } - logbuf_unlock_irq(); + if (clear && !seq) + syslog_clear(); - kfree(text); + if (text) + kfree(text); + if (msgbuf) + kfree(msgbuf); return len; } -static void syslog_clear(void) -{ - logbuf_lock_irq(); - clear_seq = log_next_seq; - clear_idx = log_next_idx; - logbuf_unlock_irq(); -} - int do_syslog(int type, char __user *buf, int len, int source) { bool clear = false; static int saved_console_loglevel = LOGLEVEL_DEFAULT; + struct prb_iterator iter; + char *msgbuf = NULL; + char *text = NULL; + int locked; int error; + int ret; error = check_syslog_permissions(type, source); if (error) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1495 @ int do_syslog(int type, char __user *buf return 0; if (!access_ok(buf, len)) return -EFAULT; - error = wait_event_interruptible(log_wait, - syslog_seq != log_next_seq); + + text = kmalloc(PRINTK_SPRINT_MAX, GFP_KERNEL); + msgbuf = kmalloc(PRINTK_RECORD_MAX, GFP_KERNEL); + if (!text || !msgbuf) { + error = -ENOMEM; + goto out; + } + + error = mutex_lock_interruptible(&syslog_lock); if (error) - return error; - error = syslog_print(buf, len); + goto out; + + /* + * 
Wait until a first message is available. Use a copy + * because no iteration should occur for syslog now. + */ + for (;;) { + prb_iter_copy(&iter, &syslog_iter); + + mutex_unlock(&syslog_lock); + ret = prb_iter_wait_next(&iter, NULL, 0, NULL); + if (ret == -ERESTARTSYS) { + error = ret; + goto out; + } + error = mutex_lock_interruptible(&syslog_lock); + if (error) + goto out; + + if (ret == -EINVAL) { + prb_iter_init(&syslog_iter, &printk_rb, + &syslog_seq); + syslog_partial = 0; + continue; + } + break; + } + + /* print as much as will fit in the user buffer */ + locked = 1; + error = syslog_print(buf, len, text, msgbuf, &locked); + if (locked) + mutex_unlock(&syslog_lock); break; /* Read/clear last kernel messages */ case SYSLOG_ACTION_READ_CLEAR: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1582 @ int do_syslog(int type, char __user *buf break; /* Number of chars in the log buffer */ case SYSLOG_ACTION_SIZE_UNREAD: - logbuf_lock_irq(); - if (syslog_seq < log_first_seq) { - /* messages are gone, move to first one */ - syslog_seq = log_first_seq; - syslog_idx = log_first_idx; - syslog_partial = 0; - } + msgbuf = kmalloc(PRINTK_RECORD_MAX, GFP_KERNEL); + if (!msgbuf) + return -ENOMEM; + + error = mutex_lock_interruptible(&syslog_lock); + if (error) + goto out; + if (source == SYSLOG_FROM_PROC) { /* * Short-cut for poll(/"proc/kmsg") which simply checks * for pending data, not the size; return the count of * records, not the length. */ - error = log_next_seq - syslog_seq; + error = count_remaining(&syslog_iter, 0, msgbuf, + PRINTK_RECORD_MAX, true, + printk_time); } else { - u64 seq = syslog_seq; - u32 idx = syslog_idx; - bool time = syslog_partial ? syslog_time : printk_time; - - while (seq < log_next_seq) { - struct printk_log *msg = log_from_idx(idx); - - error += msg_print_text(msg, true, time, NULL, - 0); - time = printk_time; - idx = log_next(idx); - seq++; - } + error = count_remaining(&syslog_iter, 0, msgbuf, + PRINTK_RECORD_MAX, false, + printk_time); error -= syslog_partial; } - logbuf_unlock_irq(); + + mutex_unlock(&syslog_lock); break; /* Size of the log buffer */ case SYSLOG_ACTION_SIZE_BUFFER: - error = log_buf_len; + error = prb_buffer_size(&printk_rb); break; default: error = -EINVAL; break; } - +out: + if (msgbuf) + kfree(msgbuf); + if (text) + kfree(text); return error; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1629 @ SYSCALL_DEFINE3(syslog, int, type, char return do_syslog(type, buf, len, SYSLOG_FROM_READER); } -/* - * Special console_lock variants that help to reduce the risk of soft-lockups. - * They allow to pass console_lock to another printk() call using a busy wait. - */ +int printk_delay_msec __read_mostly; -#ifdef CONFIG_LOCKDEP -static struct lockdep_map console_owner_dep_map = { - .name = "console_owner" -}; -#endif +static inline void printk_delay(int level) +{ + boot_delay_msec(level); + if (unlikely(printk_delay_msec)) { + int m = printk_delay_msec; -static DEFINE_RAW_SPINLOCK(console_owner_lock); -static struct task_struct *console_owner; -static bool console_waiter; + while (m--) { + mdelay(1); + touch_nmi_watchdog(); + } + } +} -/** - * console_lock_spinning_enable - mark beginning of code where another - * thread might safely busy wait - * - * This basically converts console_lock into a spinlock. This marks - * the section where the console_lock owner can not sleep, because - * there may be a waiter spinning (like a spinlock). 
Also it must be - * ready to hand over the lock at the end of the section. - */ -static void console_lock_spinning_enable(void) -{ - raw_spin_lock(&console_owner_lock); - console_owner = current; - raw_spin_unlock(&console_owner_lock); +static void print_console_dropped(struct console *con, u64 count) +{ + char text[64]; + int len; - /* The waiter may spin on us after setting console_owner */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + len = sprintf(text, "** %llu printk message%s dropped **\n", + count, count > 1 ? "s" : ""); + con->write(con, text, len); } -/** - * console_lock_spinning_disable_and_check - mark end of code where another - * thread was able to busy wait and check if there is a waiter - * - * This is called at the end of the section where spinning is allowed. - * It has two functions. First, it is a signal that it is no longer - * safe to start busy waiting for the lock. Second, it checks if - * there is a busy waiter and passes the lock rights to her. - * - * Important: Callers lose the lock if there was a busy waiter. - * They must not touch items synchronized by console_lock - * in this case. - * - * Return: 1 if the lock rights were passed, 0 otherwise. - */ -static int console_lock_spinning_disable_and_check(void) +static void format_text(struct printk_log *msg, u64 seq, + char *ext_text, size_t *ext_len, + char *text, size_t *len, bool time) { - int waiter; - - raw_spin_lock(&console_owner_lock); - waiter = READ_ONCE(console_waiter); - console_owner = NULL; - raw_spin_unlock(&console_owner_lock); + if (suppress_message_printing(msg->level)) { + /* + * Skip record that has level above the console + * loglevel and update each console's local seq. + */ + *len = 0; + *ext_len = 0; + return; + } - if (!waiter) { - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - return 0; + *len = msg_print_text(msg, console_msg_format & MSG_FORMAT_SYSLOG, + time, text, PRINTK_SPRINT_MAX); + if (nr_ext_console_drivers) { + *ext_len = msg_print_ext_header(ext_text, CONSOLE_EXT_LOG_MAX, + msg, seq); + *ext_len += msg_print_ext_body(ext_text + *ext_len, + CONSOLE_EXT_LOG_MAX - *ext_len, + log_dict(msg), msg->dict_len, + log_text(msg), msg->text_len); + } else { + *ext_len = 0; } +} - /* The waiter is now free to continue */ - WRITE_ONCE(console_waiter, false); +static void printk_write_history(struct console *con, u64 master_seq) +{ + struct prb_iterator iter; + bool time = printk_time; + static char *ext_text; + static char *text; + static char *buf; + u64 seq; - spin_release(&console_owner_dep_map, 1, _THIS_IP_); + ext_text = kmalloc(CONSOLE_EXT_LOG_MAX, GFP_KERNEL); + text = kmalloc(PRINTK_SPRINT_MAX, GFP_KERNEL); + buf = kmalloc(PRINTK_RECORD_MAX, GFP_KERNEL); + if (!ext_text || !text || !buf) + return; - /* - * Hand off console_lock to waiter. The waiter will perform - * the up(). After this, the waiter is the console_lock owner. - */ - mutex_release(&console_lock_dep_map, 1, _THIS_IP_); - return 1; -} + if (!(con->flags & CON_ENABLED)) + goto out; -/** - * console_trylock_spinning - try to get console_lock by busy waiting - * - * This allows to busy wait for the console_lock when the current - * owner is running in specially marked sections. It means that - * the current owner is running and cannot reschedule until it - * is ready to lose the lock. 
- * - * Return: 1 if we got the lock, 0 othrewise - */ -static int console_trylock_spinning(void) -{ - struct task_struct *owner = NULL; - bool waiter; - bool spin = false; - unsigned long flags; + if (!con->write) + goto out; - if (console_trylock()) - return 1; + if (!cpu_online(raw_smp_processor_id()) && + !(con->flags & CON_ANYTIME)) + goto out; - printk_safe_enter_irqsave(flags); + prb_iter_init(&iter, &printk_rb, NULL); - raw_spin_lock(&console_owner_lock); - owner = READ_ONCE(console_owner); - waiter = READ_ONCE(console_waiter); - if (!waiter && owner && owner != current) { - WRITE_ONCE(console_waiter, true); - spin = true; - } - raw_spin_unlock(&console_owner_lock); - - /* - * If there is an active printk() writing to the - * consoles, instead of having it write our data too, - * see if we can offload that load from the active - * printer, and do some printing ourselves. - * Go into a spin only if there isn't already a waiter - * spinning, and there is an active printer, and - * that active printer isn't us (recursive printk?). - */ - if (!spin) { - printk_safe_exit_irqrestore(flags); - return 0; - } + for (;;) { + struct printk_log *msg; + size_t ext_len; + size_t len; + int ret; - /* We spin waiting for the owner to release us */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); - /* Owner will clear console_waiter on hand off */ - while (READ_ONCE(console_waiter)) - cpu_relax(); - spin_release(&console_owner_dep_map, 1, _THIS_IP_); + ret = prb_iter_next(&iter, buf, PRINTK_RECORD_MAX, &seq); + if (ret == 0) { + break; + } else if (ret < 0) { + prb_iter_init(&iter, &printk_rb, NULL); + continue; + } - printk_safe_exit_irqrestore(flags); - /* - * The owner passed the console lock to us. - * Since we did not spin on console lock, annotate - * this as a trylock. Otherwise lockdep will - * complain. - */ - mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); + if (seq > master_seq) + break; - return 1; + con->printk_seq++; + if (con->printk_seq < seq) { + print_console_dropped(con, seq - con->printk_seq); + con->printk_seq = seq; + } + + msg = (struct printk_log *)buf; + format_text(msg, master_seq, ext_text, &ext_len, text, + &len, time); + + if (len == 0 && ext_len == 0) + continue; + + if (con->flags & CON_EXTENDED) + con->write(con, ext_text, ext_len); + else + con->write(con, text, len); + + printk_delay(msg->level); + } +out: + con->wrote_history = 1; + kfree(ext_text); + kfree(text); + kfree(buf); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1758 @ static int console_trylock_spinning(void * log_buf[start] to log_buf[end - 1]. * The console_lock must be held. 
*/ -static void call_console_drivers(const char *ext_text, size_t ext_len, - const char *text, size_t len) +static void call_console_drivers(u64 seq, const char *ext_text, size_t ext_len, + const char *text, size_t len, int level, + int facility) { struct console *con; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1770 @ static void call_console_drivers(const c return; for_each_console(con) { - if (exclusive_console && con != exclusive_console) - continue; if (!(con->flags & CON_ENABLED)) continue; + if (!con->wrote_history) { + if (con->flags & CON_PRINTBUFFER) { + printk_write_history(con, seq); + continue; + } + con->wrote_history = 1; + con->printk_seq = seq - 1; + } + if (con->flags & CON_BOOT && facility == 0) { + /* skip boot messages, already printed */ + if (con->printk_seq < seq) + con->printk_seq = seq; + continue; + } if (!con->write) continue; - if (!cpu_online(smp_processor_id()) && + if (!cpu_online(raw_smp_processor_id()) && !(con->flags & CON_ANYTIME)) continue; + if (con->printk_seq >= seq) + continue; + + con->printk_seq++; + if (con->printk_seq < seq) { + print_console_dropped(con, seq - con->printk_seq); + con->printk_seq = seq; + } + + /* for supressed messages, only seq is updated */ + if (len == 0 && ext_len == 0) + continue; + if (con->flags & CON_EXTENDED) con->write(con, ext_text, ext_len); else @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1811 @ static void call_console_drivers(const c } } -int printk_delay_msec __read_mostly; - -static inline void printk_delay(void) -{ - if (unlikely(printk_delay_msec)) { - int m = printk_delay_msec; - - while (m--) { - mdelay(1); - touch_nmi_watchdog(); - } - } -} - static inline u32 printk_caller_id(void) { return in_task() ? task_pid_nr(current) : @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1827 @ static struct cont { char buf[LOG_LINE_MAX]; size_t len; /* length == 0 means unused buffer */ u32 caller_id; /* printk_caller_id() of first print */ + int cpu_owner; /* cpu of first print */ u64 ts_nsec; /* time of first print */ u8 level; /* log level of first message */ u8 facility; /* log facility of first message */ enum log_flags flags; /* prefix, newline flags */ -} cont; +} cont[2]; -static void cont_flush(void) +static void cont_flush(int ctx) { - if (cont.len == 0) + struct cont *c = &cont[ctx]; + + if (c->len == 0) return; - log_store(cont.caller_id, cont.facility, cont.level, cont.flags, - cont.ts_nsec, NULL, 0, cont.buf, cont.len); - cont.len = 0; + log_store(c->caller_id, c->facility, c->level, c->flags, + c->ts_nsec, c->cpu_owner, NULL, 0, c->buf, c->len); + c->len = 0; } -static bool cont_add(u32 caller_id, int facility, int level, +static void cont_add(int ctx, int cpu, u32 caller_id, int facility, int level, enum log_flags flags, const char *text, size_t len) { + struct cont *c = &cont[ctx]; + + if (cpu != c->cpu_owner || !(flags & LOG_CONT)) + cont_flush(ctx); + /* If the line gets too long, split it up in separate records. 
*/ - if (cont.len + len > sizeof(cont.buf)) { - cont_flush(); - return false; - } + while (c->len + len > sizeof(c->buf)) + cont_flush(ctx); - if (!cont.len) { - cont.facility = facility; - cont.level = level; - cont.caller_id = caller_id; - cont.ts_nsec = local_clock(); - cont.flags = flags; + if (!c->len) { + c->facility = facility; + c->level = level; + c->caller_id = caller_id; + c->ts_nsec = local_clock(); + c->flags = flags; + c->cpu_owner = cpu; } - memcpy(cont.buf + cont.len, text, len); - cont.len += len; + memcpy(c->buf + c->len, text, len); + c->len += len; // The original flags come from the first line, // but later continuations can add a newline. if (flags & LOG_NEWLINE) { - cont.flags |= LOG_NEWLINE; - cont_flush(); + c->flags |= LOG_NEWLINE; } - - return true; } -static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len) +/* ring buffer used as memory allocator for temporary sprint buffers */ +DECLARE_STATIC_PRINTKRB(sprint_rb, + ilog2(PRINTK_RECORD_MAX + sizeof(struct prb_entry) + + sizeof(long)) + 2, &printk_cpulock); + +asmlinkage int vprintk_emit(int facility, int level, + const char *dict, size_t dictlen, + const char *fmt, va_list args) { const u32 caller_id = printk_caller_id(); + int ctx = !!in_nmi(); + enum log_flags lflags = 0; + int printed_len = 0; + struct prb_handle h; + size_t text_len; + u64 ts_nsec; + char *text; + char *rbuf; + int cpu; - /* - * If an earlier line was buffered, and we're a continuation - * write from the same context, try to add it to the buffer. - */ - if (cont.len) { - if (cont.caller_id == caller_id && (lflags & LOG_CONT)) { - if (cont_add(caller_id, facility, level, lflags, text, text_len)) - return text_len; - } - /* Otherwise, make sure it's flushed */ - cont_flush(); - } - - /* Skip empty continuation lines that couldn't be added - they just flush */ - if (!text_len && (lflags & LOG_CONT)) - return 0; + ts_nsec = local_clock(); - /* If it doesn't end in a newline, try to buffer the current line */ - if (!(lflags & LOG_NEWLINE)) { - if (cont_add(caller_id, facility, level, lflags, text, text_len)) - return text_len; + rbuf = prb_reserve(&h, &sprint_rb, PRINTK_SPRINT_MAX); + if (!rbuf) { + prb_inc_lost(&printk_rb); + return printed_len; } - /* Store it in the record log */ - return log_store(caller_id, facility, level, lflags, 0, - dict, dictlen, text, text_len); -} - -/* Must be called under logbuf_lock. */ -int vprintk_store(int facility, int level, - const char *dict, size_t dictlen, - const char *fmt, va_list args) -{ - static char textbuf[LOG_LINE_MAX]; - char *text = textbuf; - size_t text_len; - enum log_flags lflags = 0; + cpu = raw_smp_processor_id(); /* - * The printf needs to come first; we need the syslog - * prefix which might be passed-in as a parameter. + * If this turns out to be an emergency message, there + * may need to be a prefix added. Leave room for it. 
*/ - text_len = vscnprintf(text, sizeof(textbuf), fmt, args); + text = rbuf + PREFIX_MAX; + text_len = vscnprintf(text, PRINTK_SPRINT_MAX - PREFIX_MAX, fmt, args); - /* mark and strip a trailing newline */ + /* strip and flag a trailing newline */ if (text_len && text[text_len-1] == '\n') { text_len--; lflags |= LOG_NEWLINE; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1945 @ int vprintk_store(int facility, int leve if (dict) lflags |= LOG_NEWLINE; - return log_output(facility, level, lflags, - dict, dictlen, text, text_len); -} - -asmlinkage int vprintk_emit(int facility, int level, - const char *dict, size_t dictlen, - const char *fmt, va_list args) -{ - int printed_len; - bool in_sched = false, pending_output; - unsigned long flags; - u64 curr_log_seq; - - /* Suppress unimportant messages after panic happens */ - if (unlikely(suppress_printk)) - return 0; - - if (level == LOGLEVEL_SCHED) { - level = LOGLEVEL_DEFAULT; - in_sched = true; + /* + * NOTE: + * - rbuf points to beginning of allocated buffer + * - text points to beginning of text + * - there is room before text for prefix + */ + if (facility == 0) { + /* only the kernel can create emergency messages */ + printk_emergency(rbuf, level & 7, ts_nsec, cpu, text, text_len); } - boot_delay_msec(level); - printk_delay(); - - /* This stops the holder of console_sem just where we want him */ - logbuf_lock_irqsave(flags); - curr_log_seq = log_next_seq; - printed_len = vprintk_store(facility, level, dict, dictlen, fmt, args); - pending_output = (curr_log_seq != log_next_seq); - logbuf_unlock_irqrestore(flags); - - /* If called from the scheduler, we can not call up(). */ - if (!in_sched && pending_output) { - /* - * Disable preemption to avoid being preempted while holding - * console_sem which would prevent anyone from printing to - * console - */ - preempt_disable(); - /* - * Try to acquire and then immediately release the console - * semaphore. The release will print out buffers and wake up - * /dev/kmsg and syslog() users. 
- */ - if (console_trylock_spinning()) - console_unlock(); - preempt_enable(); + if ((lflags & LOG_CONT) || !(lflags & LOG_NEWLINE)) { + cont_add(ctx, cpu, caller_id, facility, level, lflags, text, text_len); + printed_len = text_len; + } else { + if (cpu == cont[ctx].cpu_owner) + cont_flush(ctx); + printed_len = log_store(caller_id, facility, level, lflags, ts_nsec, cpu, + dict, dictlen, text, text_len); } - if (pending_output) - wake_up_klogd(); + prb_commit(&h); return printed_len; } EXPORT_SYMBOL(vprintk_emit); +static __printf(1, 0) int vprintk_func(const char *fmt, va_list args) +{ + return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args); +} + asmlinkage int vprintk(const char *fmt, va_list args) { return vprintk_func(fmt, args); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2032 @ asmlinkage __visible int printk(const ch return r; } EXPORT_SYMBOL(printk); - -#else /* CONFIG_PRINTK */ - -#define LOG_LINE_MAX 0 -#define PREFIX_MAX 0 -#define printk_time false - -static u64 syslog_seq; -static u32 syslog_idx; -static u64 console_seq; -static u32 console_idx; -static u64 exclusive_console_stop_seq; -static u64 log_first_seq; -static u32 log_first_idx; -static u64 log_next_seq; -static char *log_text(const struct printk_log *msg) { return NULL; } -static char *log_dict(const struct printk_log *msg) { return NULL; } -static struct printk_log *log_from_idx(u32 idx) { return NULL; } -static u32 log_next(u32 idx) { return 0; } -static ssize_t msg_print_ext_header(char *buf, size_t size, - struct printk_log *msg, - u64 seq) { return 0; } -static ssize_t msg_print_ext_body(char *buf, size_t size, - char *dict, size_t dict_len, - char *text, size_t text_len) { return 0; } -static void console_lock_spinning_enable(void) { } -static int console_lock_spinning_disable_and_check(void) { return 0; } -static void call_console_drivers(const char *ext_text, size_t ext_len, - const char *text, size_t len) {} -static size_t msg_print_text(const struct printk_log *msg, bool syslog, - bool time, char *buf, size_t size) { return 0; } -static bool suppress_message_printing(int level) { return false; } - #endif /* CONFIG_PRINTK */ #ifdef CONFIG_EARLY_PRINTK @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2262 @ int is_console_locked(void) } EXPORT_SYMBOL(is_console_locked); -/* - * Check if we have any console that is capable of printing while cpu is - * booting or shutting down. Requires console_sem. - */ -static int have_callable_console(void) -{ - struct console *con; - - for_each_console(con) - if ((con->flags & CON_ENABLED) && - (con->flags & CON_ANYTIME)) - return 1; - - return 0; -} - -/* - * Can we actually use the console at this time on this cpu? - * - * Console drivers may assume that per-cpu resources have been allocated. So - * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't - * call them until this CPU is officially up. - */ -static inline int can_use_console(void) -{ - return cpu_online(raw_smp_processor_id()) || have_callable_console(); -} - /** * console_unlock - unlock the console system * * Releases the console_lock which the caller holds on the console system * and the console driver list. * - * While the console_lock was held, console output may have been buffered - * by printk(). If this is the case, console_unlock(); emits - * the output prior to releasing the lock. - * - * If there is output waiting, we wake /dev/kmsg and syslog() users. 
- * * console_unlock(); may be called from any context. */ void console_unlock(void) { - static char ext_text[CONSOLE_EXT_LOG_MAX]; - static char text[LOG_LINE_MAX + PREFIX_MAX]; - unsigned long flags; - bool do_cond_resched, retry; - if (console_suspended) { up_console_sem(); return; } - /* - * Console drivers are called with interrupts disabled, so - * @console_may_schedule should be cleared before; however, we may - * end up dumping a lot of lines, for example, if called from - * console registration path, and should invoke cond_resched() - * between lines if allowable. Not doing so can cause a very long - * scheduling stall on a slow console leading to RCU stall and - * softlockup warnings which exacerbate the issue with more - * messages practically incapacitating the system. - * - * console_trylock() is not able to detect the preemptive - * context reliably. Therefore the value must be stored before - * and cleared after the the "again" goto label. - */ - do_cond_resched = console_may_schedule; -again: - console_may_schedule = 0; - - /* - * We released the console_sem lock, so we need to recheck if - * cpu is online and (if not) is there at least one CON_ANYTIME - * console. - */ - if (!can_use_console()) { - console_locked = 0; - up_console_sem(); - return; - } - - for (;;) { - struct printk_log *msg; - size_t ext_len = 0; - size_t len; - - printk_safe_enter_irqsave(flags); - raw_spin_lock(&logbuf_lock); - if (console_seq < log_first_seq) { - len = sprintf(text, - "** %llu printk messages dropped **\n", - log_first_seq - console_seq); - - /* messages are gone, move to first one */ - console_seq = log_first_seq; - console_idx = log_first_idx; - } else { - len = 0; - } -skip: - if (console_seq == log_next_seq) - break; - - msg = log_from_idx(console_idx); - if (suppress_message_printing(msg->level)) { - /* - * Skip record we have buffered and already printed - * directly to the console when we received it, and - * record that has level above the console loglevel. - */ - console_idx = log_next(console_idx); - console_seq++; - goto skip; - } - - /* Output to all consoles once old messages replayed. */ - if (unlikely(exclusive_console && - console_seq >= exclusive_console_stop_seq)) { - exclusive_console = NULL; - } - - len += msg_print_text(msg, - console_msg_format & MSG_FORMAT_SYSLOG, - printk_time, text + len, sizeof(text) - len); - if (nr_ext_console_drivers) { - ext_len = msg_print_ext_header(ext_text, - sizeof(ext_text), - msg, console_seq); - ext_len += msg_print_ext_body(ext_text + ext_len, - sizeof(ext_text) - ext_len, - log_dict(msg), msg->dict_len, - log_text(msg), msg->text_len); - } - console_idx = log_next(console_idx); - console_seq++; - raw_spin_unlock(&logbuf_lock); - - /* - * While actively printing out messages, if another printk() - * were to occur on another CPU, it may wait for this one to - * finish. This task can not be preempted if there is a - * waiter waiting to take over. - */ - console_lock_spinning_enable(); - - stop_critical_timings(); /* don't trace print latency */ - call_console_drivers(ext_text, ext_len, text, len); - start_critical_timings(); - - if (console_lock_spinning_disable_and_check()) { - printk_safe_exit_irqrestore(flags); - return; - } - - printk_safe_exit_irqrestore(flags); - - if (do_cond_resched) - cond_resched(); - } - console_locked = 0; - - raw_spin_unlock(&logbuf_lock); - up_console_sem(); - - /* - * Someone could have filled up the buffer again, so re-check if there's - * something to flush. 
In case we cannot trylock the console_sem again, - * there's a new owner and the console_unlock() from them will do the - * flush, no worries. - */ - raw_spin_lock(&logbuf_lock); - retry = console_seq != log_next_seq; - raw_spin_unlock(&logbuf_lock); - printk_safe_exit_irqrestore(flags); - - if (retry && console_trylock()) - goto again; } EXPORT_SYMBOL(console_unlock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2329 @ void console_unblank(void) void console_flush_on_panic(enum con_flush_mode mode) { /* - * If someone else is holding the console lock, trylock will fail - * and may_schedule may be set. Ignore and proceed to unlock so - * that messages are flushed out. As this can be called from any - * context and we don't want to get preempted while flushing, - * ensure may_schedule is cleared. + * FIXME: This is currently a NOP. Emergency messages will have been + * printed, but what about if write_atomic is not available on the + * console? What if the printk kthread is still alive? */ - console_trylock(); - console_may_schedule = 0; - - if (mode == CONSOLE_REPLAY_ALL) { - unsigned long flags; - - logbuf_lock_irqsave(flags); - console_seq = log_first_seq; - console_idx = log_first_idx; - logbuf_unlock_irqrestore(flags); - } - console_unlock(); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2410 @ early_param("keep_bootcon", keep_bootcon void register_console(struct console *newcon) { int i; - unsigned long flags; struct console *bcon = NULL; struct console_cmdline *c; static bool has_preferred; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2525 @ void register_console(struct console *ne if (newcon->flags & CON_EXTENDED) nr_ext_console_drivers++; - if (newcon->flags & CON_PRINTBUFFER) { - /* - * console_unlock(); will print out the buffered messages - * for us. - */ - logbuf_lock_irqsave(flags); - console_seq = syslog_seq; - console_idx = syslog_idx; - /* - * We're about to replay the log buffer. Only do this to the - * just-registered console to avoid excessive message spam to - * the already-registered consoles. - * - * Set exclusive_console with disabled interrupts to reduce - * race window with eventual console_flush_on_panic() that - * ignores console_lock. - */ - exclusive_console = newcon; - exclusive_console_stop_seq = console_seq; - logbuf_unlock_irqrestore(flags); - } console_unlock(); console_sysfs_notify(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2534 @ void register_console(struct console *ne * boot consoles, real consoles, etc - this is to ensure that end * users know there might be something in the kernel's log buffer that * went to the bootconsole (that they do not see on the real console) + * + * This message is also important because it will trigger the + * printk kthread to begin dumping the log buffer to the newly + * registered console. */ pr_info("%sconsole [%s%d] enabled\n", (newcon->flags & CON_BOOT) ? 
"boot" : "" , @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2681 @ static int __init printk_late_init(void) late_initcall(printk_late_init); #if defined CONFIG_PRINTK -/* - * Delayed printk version, for scheduler-internal messages: - */ -#define PRINTK_PENDING_WAKEUP 0x01 -#define PRINTK_PENDING_OUTPUT 0x02 +static int printk_kthread_func(void *data) +{ + struct prb_iterator iter; + struct printk_log *msg; + size_t ext_len; + char *ext_text; + u64 master_seq; + size_t len; + char *text; + char *buf; + int ret; -static DEFINE_PER_CPU(int, printk_pending); + ext_text = kmalloc(CONSOLE_EXT_LOG_MAX, GFP_KERNEL); + text = kmalloc(PRINTK_SPRINT_MAX, GFP_KERNEL); + buf = kmalloc(PRINTK_RECORD_MAX, GFP_KERNEL); + if (!ext_text || !text || !buf) + return -1; -static void wake_up_klogd_work_func(struct irq_work *irq_work) -{ - int pending = __this_cpu_xchg(printk_pending, 0); + prb_iter_init(&iter, &printk_rb, NULL); - if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); + /* the printk kthread never exits */ + for (;;) { + ret = prb_iter_wait_next(&iter, buf, + PRINTK_RECORD_MAX, &master_seq); + if (ret == -ERESTARTSYS) { + continue; + } else if (ret < 0) { + /* iterator invalid, start over */ + prb_iter_init(&iter, &printk_rb, NULL); + continue; + } + + msg = (struct printk_log *)buf; + format_text(msg, master_seq, ext_text, &ext_len, text, + &len, printk_time); + + console_lock(); + call_console_drivers(master_seq, ext_text, ext_len, text, len, + msg->level, msg->facility); + if (len > 0 || ext_len > 0) + printk_delay(msg->level); + console_unlock(); } - if (pending & PRINTK_PENDING_WAKEUP) - wake_up_interruptible(&log_wait); -} + kfree(ext_text); + kfree(text); + kfree(buf); -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { - .func = wake_up_klogd_work_func, - .flags = IRQ_WORK_LAZY, -}; + return 0; +} -void wake_up_klogd(void) +static int __init init_printk_kthread(void) { - preempt_disable(); - if (waitqueue_active(&log_wait)) { - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + struct task_struct *thread; + + thread = kthread_run(printk_kthread_func, NULL, "printk"); + if (IS_ERR(thread)) { + pr_err("printk: unable to create printing thread\n"); + return PTR_ERR(thread); } - preempt_enable(); -} -void defer_console_output(void) -{ - preempt_disable(); - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - preempt_enable(); + return 0; } +late_initcall(init_printk_kthread); -int vprintk_deferred(const char *fmt, va_list args) +static int vprintk_deferred(const char *fmt, va_list args) { - int r; - - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - defer_console_output(); - - return r; + return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args); } int printk_deferred(const char *fmt, ...) 
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2870 @ module_param_named(always_kmsg_dump, alw */ void kmsg_dump(enum kmsg_dump_reason reason) { + struct kmsg_dumper dumper_local; struct kmsg_dumper *dumper; - unsigned long flags; if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump) return; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2881 @ void kmsg_dump(enum kmsg_dump_reason rea if (dumper->max_reason && reason > dumper->max_reason) continue; - /* initialize iterator with data about the stored records */ - dumper->active = true; + /* + * use a local copy to avoid modifying the + * iterator used by any other cpus/contexts + */ + memcpy(&dumper_local, dumper, sizeof(dumper_local)); - logbuf_lock_irqsave(flags); - dumper->cur_seq = clear_seq; - dumper->cur_idx = clear_idx; - dumper->next_seq = log_next_seq; - dumper->next_idx = log_next_idx; - logbuf_unlock_irqrestore(flags); + /* initialize iterator with data about the stored records */ + dumper_local.active = true; + kmsg_dump_rewind(&dumper_local); /* invoke dumper which will iterate over records */ - dumper->dump(dumper, reason); - - /* reset iterator */ - dumper->active = false; + dumper_local.dump(&dumper_local, reason); } rcu_read_unlock(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2919 @ void kmsg_dump(enum kmsg_dump_reason rea bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog, char *line, size_t size, size_t *len) { + struct prb_iterator iter; struct printk_log *msg; - size_t l = 0; - bool ret = false; + struct prb_handle h; + bool cont = false; + char *msgbuf; + char *rbuf; + size_t l; + u64 seq; + int ret; if (!dumper->active) - goto out; + return cont; + + rbuf = prb_reserve(&h, &sprint_rb, PRINTK_RECORD_MAX); + if (!rbuf) + return cont; + msgbuf = rbuf; +retry: + for (;;) { + prb_iter_init(&iter, &printk_rb, &seq); - if (dumper->cur_seq < log_first_seq) { - /* messages are gone, move to first available one */ - dumper->cur_seq = log_first_seq; - dumper->cur_idx = log_first_idx; + if (dumper->line_seq == seq) { + /* already where we want to be */ + break; + } else if (dumper->line_seq < seq) { + /* messages are gone, move to first available one */ + dumper->line_seq = seq; + break; + } + + ret = prb_iter_seek(&iter, dumper->line_seq); + if (ret > 0) { + /* seeked to line_seq */ + break; + } else if (ret == 0) { + /* + * The end of the list was hit without ever seeing + * line_seq. Reset it to the beginning of the list. 
+ */ + prb_iter_init(&iter, &printk_rb, &dumper->line_seq); + break; + } + /* iterator invalid, start over */ } - /* last entry */ - if (dumper->cur_seq >= log_next_seq) + ret = prb_iter_next(&iter, msgbuf, PRINTK_RECORD_MAX, + &dumper->line_seq); + if (ret == 0) goto out; + else if (ret < 0) + goto retry; - msg = log_from_idx(dumper->cur_idx); + msg = (struct printk_log *)msgbuf; l = msg_print_text(msg, syslog, printk_time, line, size); - dumper->cur_idx = log_next(dumper->cur_idx); - dumper->cur_seq++; - ret = true; -out: if (len) *len = l; - return ret; + cont = true; +out: + prb_commit(&h); + return cont; } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3002 @ out: bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog, char *line, size_t size, size_t *len) { - unsigned long flags; bool ret; - logbuf_lock_irqsave(flags); ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len); - logbuf_unlock_irqrestore(flags); return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3032 @ EXPORT_SYMBOL_GPL(kmsg_dump_get_line); bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog, char *buf, size_t size, size_t *len) { - unsigned long flags; - u64 seq; - u32 idx; - u64 next_seq; - u32 next_idx; - size_t l = 0; - bool ret = false; + struct prb_iterator iter; bool time = printk_time; + struct printk_log *msg; + u64 new_end_seq = 0; + struct prb_handle h; + bool cont = false; + char *msgbuf; + u64 end_seq; + int textlen; + u64 seq = 0; + char *rbuf; + int l = 0; + int ret; if (!dumper->active) - goto out; + return cont; - logbuf_lock_irqsave(flags); - if (dumper->cur_seq < log_first_seq) { - /* messages are gone, move to first available one */ - dumper->cur_seq = log_first_seq; - dumper->cur_idx = log_first_idx; - } + rbuf = prb_reserve(&h, &sprint_rb, PRINTK_RECORD_MAX); + if (!rbuf) + return cont; + msgbuf = rbuf; - /* last entry */ - if (dumper->cur_seq >= dumper->next_seq) { - logbuf_unlock_irqrestore(flags); - goto out; - } + prb_iter_init(&iter, &printk_rb, NULL); - /* calculate length of entire buffer */ - seq = dumper->cur_seq; - idx = dumper->cur_idx; - while (seq < dumper->next_seq) { - struct printk_log *msg = log_from_idx(idx); + /* + * seek to the start record, which is set/modified + * by kmsg_dump_get_line_nolock() + */ + ret = prb_iter_seek(&iter, dumper->line_seq); + if (ret <= 0) + prb_iter_init(&iter, &printk_rb, &seq); - l += msg_print_text(msg, true, time, NULL, 0); - idx = log_next(idx); - seq++; + /* work with a local end seq to have a constant value */ + end_seq = dumper->buffer_end_seq; + if (!end_seq) { + /* initialize end seq to "infinity" */ + end_seq = -1; + dumper->buffer_end_seq = end_seq; } +retry: + if (seq >= end_seq) + goto out; - /* move first record forward until length fits into the buffer */ - seq = dumper->cur_seq; - idx = dumper->cur_idx; - while (l >= size && seq < dumper->next_seq) { - struct printk_log *msg = log_from_idx(idx); + /* count the total bytes after seq */ + textlen = count_remaining(&iter, end_seq, msgbuf, + PRINTK_RECORD_MAX, 0, time); + + /* move iter forward until length fits into the buffer */ + while (textlen > size) { + ret = prb_iter_next(&iter, msgbuf, PRINTK_RECORD_MAX, &seq); + if (ret == 0) { + break; + } else if (ret < 0 || seq >= end_seq) { + prb_iter_init(&iter, &printk_rb, &seq); + goto retry; + } - l -= msg_print_text(msg, true, time, NULL, 0); - idx = log_next(idx); - seq++; + msg = (struct 
printk_log *)msgbuf; + textlen -= msg_print_text(msg, true, time, NULL, 0); } - /* last message in next interation */ - next_seq = seq; - next_idx = idx; + /* save end seq for the next interation */ + new_end_seq = seq + 1; + + /* copy messages to buffer */ + while (l < size) { + ret = prb_iter_next(&iter, msgbuf, PRINTK_RECORD_MAX, &seq); + if (ret == 0) { + break; + } else if (ret < 0) { + /* + * iterator (and thus also the start position) + * invalid, start over from beginning of list + */ + prb_iter_init(&iter, &printk_rb, NULL); + continue; + } - l = 0; - while (seq < dumper->next_seq) { - struct printk_log *msg = log_from_idx(idx); + if (seq >= end_seq) + break; - l += msg_print_text(msg, syslog, time, buf + l, size - l); - idx = log_next(idx); - seq++; + msg = (struct printk_log *)msgbuf; + textlen = msg_print_text(msg, syslog, time, buf + l, size - l); + if (textlen > 0) + l += textlen; + cont = true; } - dumper->next_seq = next_seq; - dumper->next_idx = next_idx; - ret = true; - logbuf_unlock_irqrestore(flags); -out: - if (len) + if (cont && len) *len = l; - return ret; +out: + prb_commit(&h); + if (new_end_seq) + dumper->buffer_end_seq = new_end_seq; + return cont; } EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3142 @ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer); */ void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper) { - dumper->cur_seq = clear_seq; - dumper->cur_idx = clear_idx; - dumper->next_seq = log_next_seq; - dumper->next_idx = log_next_idx; + dumper->line_seq = 0; + dumper->buffer_end_seq = 0; } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3156 @ void kmsg_dump_rewind_nolock(struct kmsg */ void kmsg_dump_rewind(struct kmsg_dumper *dumper) { - unsigned long flags; - - logbuf_lock_irqsave(flags); kmsg_dump_rewind_nolock(dumper); - logbuf_unlock_irqrestore(flags); } EXPORT_SYMBOL_GPL(kmsg_dump_rewind); +static bool console_can_emergency(int level) +{ + struct console *con; + + for_each_console(con) { + if (!(con->flags & CON_ENABLED)) + continue; + if (con->write_atomic && oops_in_progress) + return true; + if (con->write && (con->flags & CON_BOOT)) + return true; + } + return false; +} + +static void call_emergency_console_drivers(int level, const char *text, + size_t text_len) +{ + struct console *con; + + for_each_console(con) { + if (!(con->flags & CON_ENABLED)) + continue; + if (con->write_atomic && oops_in_progress) { + con->write_atomic(con, text, text_len); + continue; + } + if (con->write && (con->flags & CON_BOOT)) { + con->write(con, text, text_len); + continue; + } + } +} + +static void printk_emergency(char *buffer, int level, u64 ts_nsec, u16 cpu, + char *text, u16 text_len) +{ + struct printk_log msg; + size_t prefix_len; + + if (!console_can_emergency(level)) + return; + + msg.level = level; + msg.ts_nsec = ts_nsec; + msg.cpu = cpu; + msg.facility = 0; + + /* "text" must have PREFIX_MAX preceding bytes available */ + + prefix_len = print_prefix(&msg, + console_msg_format & MSG_FORMAT_SYSLOG, + printk_time, buffer); + /* move the prefix forward to the beginning of the message text */ + text -= prefix_len; + memmove(text, buffer, prefix_len); + text_len += prefix_len; + + text[text_len++] = '\n'; + + call_emergency_console_drivers(level, text, text_len); + + touch_softlockup_watchdog_sync(); + clocksource_touch_watchdog(); + rcu_cpu_stall_reset(); + touch_nmi_watchdog(); + + printk_delay(level); +} #endif + +void 
console_atomic_lock(unsigned int *flags) +{ + prb_lock(&printk_cpulock, flags); +} +EXPORT_SYMBOL(console_atomic_lock); + +void console_atomic_unlock(unsigned int flags) +{ + prb_unlock(&printk_cpulock, flags); +} +EXPORT_SYMBOL(console_atomic_unlock); Index: linux-5.4.5-rt3/kernel/printk/printk_safe.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/printk/printk_safe.c +++ /dev/null @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1 @ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * printk_safe.c - Safe printk for printk-deadlock-prone contexts - */ - -#include <linux/preempt.h> -#include <linux/spinlock.h> -#include <linux/debug_locks.h> -#include <linux/smp.h> -#include <linux/cpumask.h> -#include <linux/irq_work.h> -#include <linux/printk.h> - -#include "internal.h" - -/* - * printk() could not take logbuf_lock in NMI context. Instead, - * it uses an alternative implementation that temporary stores - * the strings into a per-CPU buffer. The content of the buffer - * is later flushed into the main ring buffer via IRQ work. - * - * The alternative implementation is chosen transparently - * by examinig current printk() context mask stored in @printk_context - * per-CPU variable. - * - * The implementation allows to flush the strings also from another CPU. - * There are situations when we want to make sure that all buffers - * were handled or when IRQs are blocked. - */ -static int printk_safe_irq_ready __read_mostly; - -#define SAFE_LOG_BUF_LEN ((1 << CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT) - \ - sizeof(atomic_t) - \ - sizeof(atomic_t) - \ - sizeof(struct irq_work)) - -struct printk_safe_seq_buf { - atomic_t len; /* length of written data */ - atomic_t message_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ - unsigned char buffer[SAFE_LOG_BUF_LEN]; -}; - -static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq); -static DEFINE_PER_CPU(int, printk_context); - -#ifdef CONFIG_PRINTK_NMI -static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); -#endif - -/* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) -{ - if (printk_safe_irq_ready) - irq_work_queue(&s->work); -} - -/* - * Add a message to per-CPU context-dependent buffer. NMI and printk-safe - * have dedicated buffers, because otherwise printk-safe preempted by - * NMI-printk would have overwritten the NMI messages. - * - * The messages are flushed from irq work (or from panic()), possibly, - * from other CPU, concurrently with printk_safe_log_store(). Should this - * happen, printk_safe_log_store() will notice the buffer->len mismatch - * and repeat the write. - */ -static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, - const char *fmt, va_list args) -{ - int add; - size_t len; - va_list ap; - -again: - len = atomic_read(&s->len); - - /* The trailing '\0' is not counted into len. */ - if (len >= sizeof(s->buffer) - 1) { - atomic_inc(&s->message_lost); - queue_flush_work(s); - return 0; - } - - /* - * Make sure that all old data have been read before the buffer - * was reset. This is not needed when we just append data. - */ - if (!len) - smp_rmb(); - - va_copy(ap, args); - add = vscnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap); - va_end(ap); - if (!add) - return 0; - - /* - * Do it once again if the buffer has been flushed in the meantime. 
- * Note that atomic_cmpxchg() is an implicit memory barrier that - * makes sure that the data were written before updating s->len. - */ - if (atomic_cmpxchg(&s->len, len, len + add) != len) - goto again; - - queue_flush_work(s); - return add; -} - -static inline void printk_safe_flush_line(const char *text, int len) -{ - /* - * Avoid any console drivers calls from here, because we may be - * in NMI or printk_safe context (when in panic). The messages - * must go only into the ring buffer at this stage. Consoles will - * get explicitly called later when a crashdump is not generated. - */ - printk_deferred("%.*s", len, text); -} - -/* printk part of the temporary buffer line by line */ -static int printk_safe_flush_buffer(const char *start, size_t len) -{ - const char *c, *end; - bool header; - - c = start; - end = start + len; - header = true; - - /* Print line by line. */ - while (c < end) { - if (*c == '\n') { - printk_safe_flush_line(start, c - start + 1); - start = ++c; - header = true; - continue; - } - - /* Handle continuous lines or missing new line. */ - if ((c + 1 < end) && printk_get_level(c)) { - if (header) { - c = printk_skip_level(c); - continue; - } - - printk_safe_flush_line(start, c - start); - start = c++; - header = true; - continue; - } - - header = false; - c++; - } - - /* Check if there was a partial line. Ignore pure header. */ - if (start < end && !header) { - static const char newline[] = KERN_CONT "\n"; - - printk_safe_flush_line(start, end - start); - printk_safe_flush_line(newline, strlen(newline)); - } - - return len; -} - -static void report_message_lost(struct printk_safe_seq_buf *s) -{ - int lost = atomic_xchg(&s->message_lost, 0); - - if (lost) - printk_deferred("Lost %d message(s)!\n", lost); -} - -/* - * Flush data from the associated per-CPU buffer. The function - * can be called either via IRQ work or independently. - */ -static void __printk_safe_flush(struct irq_work *work) -{ - static raw_spinlock_t read_lock = - __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); - unsigned long flags; - size_t len; - int i; - - /* - * The lock has two functions. First, one reader has to flush all - * available message to make the lockless synchronization with - * writers easier. Second, we do not want to mix messages from - * different CPUs. This is especially important when printing - * a backtrace. - */ - raw_spin_lock_irqsave(&read_lock, flags); - - i = 0; -more: - len = atomic_read(&s->len); - - /* - * This is just a paranoid check that nobody has manipulated - * the buffer an unexpected way. If we printed something then - * @len must only increase. Also it should never overflow the - * buffer size. - */ - if ((i && i >= len) || len > sizeof(s->buffer)) { - const char *msg = "printk_safe_flush: internal error\n"; - - printk_safe_flush_line(msg, strlen(msg)); - len = 0; - } - - if (!len) - goto out; /* Someone else has already flushed the buffer. */ - - /* Make sure that data has been written up to the @len */ - smp_rmb(); - i += printk_safe_flush_buffer(s->buffer + i, len - i); - - /* - * Check that nothing has got added in the meantime and truncate - * the buffer. Note that atomic_cmpxchg() is an implicit memory - * barrier that makes sure that the data were copied before - * updating s->len. 
- */ - if (atomic_cmpxchg(&s->len, len, 0) != len) - goto more; - -out: - report_message_lost(s); - raw_spin_unlock_irqrestore(&read_lock, flags); -} - -/** - * printk_safe_flush - flush all per-cpu nmi buffers. - * - * The buffers are flushed automatically via IRQ work. This function - * is useful only when someone wants to be sure that all buffers have - * been flushed at some point. - */ -void printk_safe_flush(void) -{ - int cpu; - - for_each_possible_cpu(cpu) { -#ifdef CONFIG_PRINTK_NMI - __printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work); -#endif - __printk_safe_flush(&per_cpu(safe_print_seq, cpu).work); - } -} - -/** - * printk_safe_flush_on_panic - flush all per-cpu nmi buffers when the system - * goes down. - * - * Similar to printk_safe_flush() but it can be called even in NMI context when - * the system goes down. It does the best effort to get NMI messages into - * the main ring buffer. - * - * Note that it could try harder when there is only one CPU online. - */ -void printk_safe_flush_on_panic(void) -{ - /* - * Make sure that we could access the main ring buffer. - * Do not risk a double release when more CPUs are up. - */ - if (raw_spin_is_locked(&logbuf_lock)) { - if (num_online_cpus() > 1) - return; - - debug_locks_off(); - raw_spin_lock_init(&logbuf_lock); - } - - printk_safe_flush(); -} - -#ifdef CONFIG_PRINTK_NMI -/* - * Safe printk() for NMI context. It uses a per-CPU buffer to - * store the message. NMIs are not nested, so there is always only - * one writer running. But the buffer might get flushed from another - * CPU, so we need to be careful. - */ -static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) -{ - struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); - - return printk_safe_log_store(s, fmt, args); -} - -void notrace printk_nmi_enter(void) -{ - this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK); -} - -void notrace printk_nmi_exit(void) -{ - this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK); -} - -/* - * Marks a code that might produce many messages in NMI context - * and the risk of losing them is more critical than eventual - * reordering. - * - * It has effect only when called in NMI context. Then printk() - * will try to store the messages into the main logbuf directly - * and use the per-CPU buffers only as a fallback when the lock - * is not available. - */ -void printk_nmi_direct_enter(void) -{ - if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK) - this_cpu_or(printk_context, PRINTK_NMI_DIRECT_CONTEXT_MASK); -} - -void printk_nmi_direct_exit(void) -{ - this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK); -} - -#else - -static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) -{ - return 0; -} - -#endif /* CONFIG_PRINTK_NMI */ - -/* - * Lock-less printk(), to avoid deadlocks should the printk() recurse - * into itself. It uses a per-CPU buffer to store the message, just like - * NMI. - */ -static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args) -{ - struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq); - - return printk_safe_log_store(s, fmt, args); -} - -/* Can be preempted by NMI. */ -void __printk_safe_enter(void) -{ - this_cpu_inc(printk_context); -} - -/* Can be preempted by NMI. */ -void __printk_safe_exit(void) -{ - this_cpu_dec(printk_context); -} - -__printf(1, 0) int vprintk_func(const char *fmt, va_list args) -{ - /* - * Try to use the main logbuf even in NMI. But avoid calling console - * drivers that might have their own locks. 
- */ - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK) && - raw_spin_trylock(&logbuf_lock)) { - int len; - - len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args); - raw_spin_unlock(&logbuf_lock); - defer_console_output(); - return len; - } - - /* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */ - if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK) - return vprintk_nmi(fmt, args); - - /* Use extra buffer to prevent a recursion deadlock in safe mode. */ - if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) - return vprintk_safe(fmt, args); - - /* No obstacles. */ - return vprintk_default(fmt, args); -} - -void __init printk_safe_init(void) -{ - int cpu; - - for_each_possible_cpu(cpu) { - struct printk_safe_seq_buf *s; - - s = &per_cpu(safe_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); - -#ifdef CONFIG_PRINTK_NMI - s = &per_cpu(nmi_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); -#endif - } - - /* - * In the highly unlikely event that a NMI were to trigger at - * this moment. Make sure IRQ work is set up before this - * variable is set. - */ - barrier(); - printk_safe_irq_ready = 1; - - /* Flush pending messages that did not have scheduled IRQ works. */ - printk_safe_flush(); -} Index: linux-5.4.5-rt3/kernel/ptrace.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/ptrace.c +++ linux-5.4.5-rt3/kernel/ptrace.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:183 @ static bool ptrace_freeze_traced(struct spin_lock_irq(&task->sighand->siglock); if (task_is_traced(task) && !__fatal_signal_pending(task)) { - task->state = __TASK_TRACED; + unsigned long flags; + + raw_spin_lock_irqsave(&task->pi_lock, flags); + if (task->state & __TASK_TRACED) + task->state = __TASK_TRACED; + else + task->saved_state = __TASK_TRACED; + raw_spin_unlock_irqrestore(&task->pi_lock, flags); ret = true; } spin_unlock_irq(&task->sighand->siglock); Index: linux-5.4.5-rt3/kernel/rcu/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/Kconfig +++ linux-5.4.5-rt3/kernel/rcu/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:164 @ config RCU_FAST_NO_HZ config RCU_BOOST bool "Enable RCU priority boosting" - depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT - default n + depends on (RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT) || PREEMPT_RT + default y if PREEMPT_RT help This option boosts the priority of preempted RCU readers that block the current preemptible RCU grace period for too long. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:203 @ config RCU_NOCB_CPU specified at boot time by the rcu_nocbs parameter. For each such CPU, a kthread ("rcuox/N") will be created to invoke callbacks, where the "N" is the CPU being offloaded, and where - the "p" for RCU-preempt (PREEMPT kernels) and "s" for RCU-sched - (!PREEMPT kernels). Nothing prevents this kthread from running + the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched + (!PREEMPTION kernels). Nothing prevents this kthread from running on the specified CPUs, but (1) the kthreads may be preempted between each callback, and (2) affinity or cgroups can be used to force the kthreads to run on whatever set of CPUs is desired. 
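/*
 * Illustrative restatement of the ptrace_freeze_traced() hunk above, not
 * part of the patch: on PREEMPT_RT a task that blocks on a sleeping
 * spinlock parks its original ->state in ->saved_state, so the traced
 * state must be written to whichever field currently holds the "real"
 * state, with ->pi_lock serializing against that transition.
 * "freeze_to_traced" is a hypothetical name.
 */
static void freeze_to_traced(struct task_struct *task)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&task->pi_lock, flags);
	if (task->state & __TASK_TRACED)
		task->state = __TASK_TRACED;	   /* not blocked on a sleeping lock */
	else
		task->saved_state = __TASK_TRACED; /* blocked: patch the parked state */
	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
}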
Index: linux-5.4.5-rt3/kernel/rcu/rcutorture.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/rcutorture.c +++ linux-5.4.5-rt3/kernel/rcu/rcutorture.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:63 @ MODULE_AUTHOR("Paul E. McKenney <paulmck #define RCUTORTURE_RDR_RBH 0x08 /* ... rcu_read_lock_bh(). */ #define RCUTORTURE_RDR_SCHED 0x10 /* ... rcu_read_lock_sched(). */ #define RCUTORTURE_RDR_RCU 0x20 /* ... entering another RCU reader. */ -#define RCUTORTURE_RDR_NBITS 6 /* Number of bits defined above. */ +#define RCUTORTURE_RDR_ATOM_BH 0x40 /* ... disabling bh while atomic */ +#define RCUTORTURE_RDR_ATOM_RBH 0x80 /* ... RBH while atomic */ +#define RCUTORTURE_RDR_NBITS 8 /* Number of bits defined above. */ #define RCUTORTURE_MAX_EXTEND \ (RCUTORTURE_RDR_BH | RCUTORTURE_RDR_IRQ | RCUTORTURE_RDR_PREEMPT | \ - RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED) + RCUTORTURE_RDR_RBH | RCUTORTURE_RDR_SCHED | \ + RCUTORTURE_RDR_ATOM_BH | RCUTORTURE_RDR_ATOM_RBH) #define RCUTORTURE_RDR_MAX_LOOPS 0x7 /* Maximum reader extensions. */ /* Must be power of two minus one. */ #define RCUTORTURE_RDR_MAX_SEGS (RCUTORTURE_RDR_MAX_LOOPS + 3) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1158 @ static void rcutorture_one_extend(int *r WARN_ON_ONCE((idxold >> RCUTORTURE_RDR_SHIFT) > 1); rtrsp->rt_readstate = newstate; - /* First, put new protection in place to avoid critical-section gap. */ + /* + * First, put new protection in place to avoid critical-section gap. + * Disable preemption around the ATOM disables to ensure that + * in_atomic() is true. + */ if (statesnew & RCUTORTURE_RDR_BH) local_bh_disable(); + if (statesnew & RCUTORTURE_RDR_RBH) + rcu_read_lock_bh(); if (statesnew & RCUTORTURE_RDR_IRQ) local_irq_disable(); if (statesnew & RCUTORTURE_RDR_PREEMPT) preempt_disable(); - if (statesnew & RCUTORTURE_RDR_RBH) - rcu_read_lock_bh(); if (statesnew & RCUTORTURE_RDR_SCHED) rcu_read_lock_sched(); + preempt_disable(); + if (statesnew & RCUTORTURE_RDR_ATOM_BH) + local_bh_disable(); + if (statesnew & RCUTORTURE_RDR_ATOM_RBH) + rcu_read_lock_bh(); + preempt_enable(); if (statesnew & RCUTORTURE_RDR_RCU) idxnew = cur_ops->readlock() << RCUTORTURE_RDR_SHIFT; - /* Next, remove old protection, irq first due to bh conflict. */ + /* + * Next, remove old protection, in decreasing order of strength + * to avoid unlock paths that aren't safe in the stronger + * context. Disable preemption around the ATOM enables in + * case the context was only atomic due to IRQ disabling. 
+ */ + preempt_disable(); if (statesold & RCUTORTURE_RDR_IRQ) local_irq_enable(); - if (statesold & RCUTORTURE_RDR_BH) + if (statesold & RCUTORTURE_RDR_ATOM_BH) local_bh_enable(); + if (statesold & RCUTORTURE_RDR_ATOM_RBH) + rcu_read_unlock_bh(); + preempt_enable(); if (statesold & RCUTORTURE_RDR_PREEMPT) preempt_enable(); - if (statesold & RCUTORTURE_RDR_RBH) - rcu_read_unlock_bh(); if (statesold & RCUTORTURE_RDR_SCHED) rcu_read_unlock_sched(); + if (statesold & RCUTORTURE_RDR_BH) + local_bh_enable(); + if (statesold & RCUTORTURE_RDR_RBH) + rcu_read_unlock_bh(); if (statesold & RCUTORTURE_RDR_RCU) cur_ops->readunlock(idxold >> RCUTORTURE_RDR_SHIFT); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1239 @ rcutorture_extend_mask(int oldmask, stru int mask = rcutorture_extend_mask_max(); unsigned long randmask1 = torture_random(trsp) >> 8; unsigned long randmask2 = randmask1 >> 3; + unsigned long preempts = RCUTORTURE_RDR_PREEMPT | RCUTORTURE_RDR_SCHED; + unsigned long preempts_irq = preempts | RCUTORTURE_RDR_IRQ; + unsigned long nonatomic_bhs = RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH; + unsigned long atomic_bhs = RCUTORTURE_RDR_ATOM_BH | + RCUTORTURE_RDR_ATOM_RBH; + unsigned long tmp; WARN_ON_ONCE(mask >> RCUTORTURE_RDR_SHIFT); /* Mostly only one bit (need preemption!), sometimes lots of bits. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1252 @ rcutorture_extend_mask(int oldmask, stru mask = mask & randmask2; else mask = mask & (1 << (randmask2 % RCUTORTURE_RDR_NBITS)); - /* Can't enable bh w/irq disabled. */ - if ((mask & RCUTORTURE_RDR_IRQ) && - ((!(mask & RCUTORTURE_RDR_BH) && (oldmask & RCUTORTURE_RDR_BH)) || - (!(mask & RCUTORTURE_RDR_RBH) && (oldmask & RCUTORTURE_RDR_RBH)))) - mask |= RCUTORTURE_RDR_BH | RCUTORTURE_RDR_RBH; + + /* + * Can't enable bh w/irq disabled. + */ + tmp = atomic_bhs | nonatomic_bhs; + if (mask & RCUTORTURE_RDR_IRQ) + mask |= oldmask & tmp; + + /* + * Ideally these sequences would be detected in debug builds + * (regardless of RT), but until then don't stop testing + * them on non-RT. + */ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + /* + * Can't release the outermost rcu lock in an irq disabled + * section without preemption also being disabled, if irqs + * had ever been enabled during this RCU critical section + * (could leak a special flag and delay reporting the qs). + */ + if ((oldmask & RCUTORTURE_RDR_RCU) && + (mask & RCUTORTURE_RDR_IRQ) && + !(mask & preempts)) + mask |= RCUTORTURE_RDR_RCU; + + /* Can't modify atomic bh in non-atomic context */ + if ((oldmask & atomic_bhs) && (mask & atomic_bhs) && + !(mask & preempts_irq)) { + mask |= oldmask & preempts_irq; + if (mask & RCUTORTURE_RDR_IRQ) + mask |= oldmask & tmp; + } + if ((mask & atomic_bhs) && !(mask & preempts_irq)) + mask |= RCUTORTURE_RDR_PREEMPT; + + /* Can't modify non-atomic bh in atomic context */ + tmp = nonatomic_bhs; + if (oldmask & preempts_irq) + mask &= ~tmp; + if ((oldmask | mask) & preempts_irq) + mask |= oldmask & tmp; + } + return mask ?: RCUTORTURE_RDR_RCU; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1796 @ static void rcu_torture_fwd_cb_cr(struct // Give the scheduler a chance, even on nohz_full CPUs. 
static void rcu_torture_fwd_prog_cond_resched(unsigned long iter) { - if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) { + if (IS_ENABLED(CONFIG_PREEMPTION) && IS_ENABLED(CONFIG_NO_HZ_FULL)) { // Real call_rcu() floods hit userspace, so emulate that. if (need_resched() || (iter & 0xfff)) schedule(); Index: linux-5.4.5-rt3/kernel/rcu/srcutiny.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/srcutiny.c +++ linux-5.4.5-rt3/kernel/rcu/srcutiny.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:106 @ EXPORT_SYMBOL_GPL(__srcu_read_unlock); /* * Workqueue handler to drive one grace period and invoke any callbacks - * that become ready as a result. Single-CPU and !PREEMPT operation + * that become ready as a result. Single-CPU and !PREEMPTION operation * means that we get away with murder on synchronization. ;-) */ void srcu_drive_gp(struct work_struct *wp) Index: linux-5.4.5-rt3/kernel/rcu/srcutree.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/srcutree.c +++ linux-5.4.5-rt3/kernel/rcu/srcutree.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:28 @ #include <linux/delay.h> #include <linux/module.h> #include <linux/srcu.h> +#include <linux/locallock.h> #include "rcu.h" #include "rcu_segcblist.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:739 @ static void srcu_flip(struct srcu_struct smp_mb(); /* D */ /* Pairs with C. */ } +static DEFINE_LOCAL_IRQ_LOCK(sp_llock); /* * If SRCU is likely idle, return true, otherwise return false. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:769 @ static bool srcu_might_be_idle(struct sr unsigned long t; /* If the local srcu_data structure has callbacks, not idle. */ - local_irq_save(flags); + local_lock_irqsave(sp_llock, flags); sdp = this_cpu_ptr(ssp->sda); if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) { - local_irq_restore(flags); + local_unlock_irqrestore(sp_llock, flags); return false; /* Callbacks already present, so not idle. */ } - local_irq_restore(flags); + local_unlock_irqrestore(sp_llock, flags); /* * No local callbacks, so probabalistically probe global state. 
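The srcu_might_be_idle() hunk above, and the __call_srcu() hunk just below, replace local_irq_save() with the sp_llock local lock so that the per-CPU srcu_data access no longer hard-disables interrupts: on PREEMPT_RT the spin_lock_rcu_node() taken inside that section is a sleeping lock and the code must stay preemptible. A minimal stand-alone sketch of the same pattern, with the per-CPU variable and lock names invented, assuming the RT <linux/locallock.h> API used by this patch:

#include <linux/locallock.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, demo_count);
static DEFINE_LOCAL_IRQ_LOCK(demo_llock);

static void demo_update(void)
{
        unsigned long flags;

        /*
         * On !PREEMPT_RT this should reduce to local_irq_save(); on
         * PREEMPT_RT it takes a per-CPU lock instead, which still gives
         * CPU-local exclusion but leaves the section preemptible.
         */
        local_lock_irqsave(demo_llock, flags);
        this_cpu_inc(demo_count);
        local_unlock_irqrestore(demo_llock, flags);
}

This is also why the combined spin_unlock_irqrestore_rcu_node() in __call_srcu() is split into spin_unlock_rcu_node() plus local_unlock_irqrestore(): the saved flags now belong to sp_llock rather than to a plain local_irq_save().
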
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:855 @ static void __call_srcu(struct srcu_stru } rhp->func = func; idx = srcu_read_lock(ssp); - local_irq_save(flags); + local_lock_irqsave(sp_llock, flags); sdp = this_cpu_ptr(ssp->sda); spin_lock_rcu_node(sdp); rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:871 @ static void __call_srcu(struct srcu_stru sdp->srcu_gp_seq_needed_exp = s; needexp = true; } - spin_unlock_irqrestore_rcu_node(sdp, flags); + spin_unlock_rcu_node(sdp); + local_unlock_irqrestore(sp_llock, flags); if (needgp) srcu_funnel_gp_start(ssp, sdp, s, do_norm); else if (needexp) Index: linux-5.4.5-rt3/kernel/rcu/tree.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/tree.c +++ linux-5.4.5-rt3/kernel/rcu/tree.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:103 @ struct rcu_state rcu_state = { static bool dump_tree; module_param(dump_tree, bool, 0444); /* By default, use RCU_SOFTIRQ instead of rcuc kthreads. */ -static bool use_softirq = 1; +static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT); +#ifndef CONFIG_PREEMPT_RT module_param(use_softirq, bool, 0444); +#endif /* Control rcu_node-tree auto-balancing at boot time. */ static bool rcu_fanout_exact; module_param(rcu_fanout_exact, bool, 0444); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1099 @ static int rcu_implicit_dynticks_qs(stru !rdp->rcu_iw_pending && rdp->rcu_iw_gp_seq != rnp->gp_seq && (rnp->ffmask & rdp->grpmask)) { init_irq_work(&rdp->rcu_iw, rcu_iw_handler); + rdp->rcu_iw.flags = IRQ_WORK_HARD_IRQ; rdp->rcu_iw_pending = true; rdp->rcu_iw_gp_seq = rnp->gp_seq; irq_work_queue_on(&rdp->rcu_iw, rdp->cpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2673 @ EXPORT_SYMBOL_GPL(kfree_call_rcu); /* * During early boot, any blocking grace-period wait automatically - * implies a grace period. Later on, this is never the case for PREEMPT. + * implies a grace period. Later on, this is never the case for PREEMPTION. * - * Howevr, because a context switch is a grace period for !PREEMPT, any + * Howevr, because a context switch is a grace period for !PREEMPTION, any * blocking grace-period wait automatically implies a grace period if * there is only one CPU online at any point time during execution of * either synchronize_rcu() or synchronize_rcu_expedited(). It is OK to Index: linux-5.4.5-rt3/kernel/rcu/tree_exp.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/tree_exp.h +++ linux-5.4.5-rt3/kernel/rcu/tree_exp.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:673 @ static void rcu_exp_handler(void *unused } } -/* PREEMPT=y, so no PREEMPT=n expedited grace period to clean up after. */ +/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. 
*/ static void sync_sched_exp_online_cleanup(int cpu) { } Index: linux-5.4.5-rt3/kernel/rcu/tree_plugin.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/tree_plugin.h +++ linux-5.4.5-rt3/kernel/rcu/tree_plugin.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:290 @ void rcu_note_context_switch(bool preemp struct task_struct *t = current; struct rcu_data *rdp = this_cpu_ptr(&rcu_data); struct rcu_node *rnp; + int sleeping_l = 0; trace_rcu_utilization(TPS("Start context switch")); lockdep_assert_irqs_disabled(); - WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0); +#if defined(CONFIG_PREEMPT_RT) + sleeping_l = t->sleeping_lock; +#endif + WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l); if (t->rcu_read_lock_nesting > 0 && !t->rcu_read_unlock_special.b.blocked) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:795 @ static void __init rcu_bootup_announce(v } /* - * Note a quiescent state for PREEMPT=n. Because we do not need to know + * Note a quiescent state for PREEMPTION=n. Because we do not need to know * how many quiescent states passed, just if there was at least one since * the start of the grace period, this just sets a flag. The caller must * have disabled preemption. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:845 @ void rcu_all_qs(void) EXPORT_SYMBOL_GPL(rcu_all_qs); /* - * Note a PREEMPT=n context switch. The caller must have disabled interrupts. + * Note a PREEMPTION=n context switch. The caller must have disabled interrupts. */ void rcu_note_context_switch(bool preempt) { Index: linux-5.4.5-rt3/kernel/rcu/update.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/rcu/update.c +++ linux-5.4.5-rt3/kernel/rcu/update.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:58 @ extern int rcu_expedited; /* from sysctl module_param(rcu_expedited, int, 0); extern int rcu_normal; /* from sysctl */ module_param(rcu_normal, int, 0); -static int rcu_normal_after_boot; +static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT); +#ifndef CONFIG_PREEMPT_RT module_param(rcu_normal_after_boot, int, 0); +#endif #endif /* #ifndef CONFIG_TINY_RCU */ #ifdef CONFIG_DEBUG_LOCK_ALLOC Index: linux-5.4.5-rt3/kernel/sched/completion.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/completion.c +++ linux-5.4.5-rt3/kernel/sched/completion.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:32 @ void complete(struct completion *x) { unsigned long flags; - spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); if (x->done != UINT_MAX) x->done++; - __wake_up_locked(&x->wait, TASK_NORMAL, 1); - spin_unlock_irqrestore(&x->wait.lock, flags); + swake_up_locked(&x->wait); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); } EXPORT_SYMBOL(complete); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ void complete_all(struct completion *x) { unsigned long flags; - spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); x->done = UINT_MAX; - __wake_up_locked(&x->wait, TASK_NORMAL, 0); - spin_unlock_irqrestore(&x->wait.lock, flags); + swake_up_all_locked(&x->wait); + 
raw_spin_unlock_irqrestore(&x->wait.lock, flags); } EXPORT_SYMBOL(complete_all); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:73 @ do_wait_for_common(struct completion *x, long (*action)(long), long timeout, int state) { if (!x->done) { - DECLARE_WAITQUEUE(wait, current); + DECLARE_SWAITQUEUE(wait); - __add_wait_queue_entry_tail_exclusive(&x->wait, &wait); do { if (signal_pending_state(state, current)) { timeout = -ERESTARTSYS; break; } + __prepare_to_swait(&x->wait, &wait); __set_current_state(state); - spin_unlock_irq(&x->wait.lock); + raw_spin_unlock_irq(&x->wait.lock); timeout = action(timeout); - spin_lock_irq(&x->wait.lock); + raw_spin_lock_irq(&x->wait.lock); } while (!x->done && timeout); - __remove_wait_queue(&x->wait, &wait); + __finish_swait(&x->wait, &wait); if (!x->done) return timeout; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:103 @ __wait_for_common(struct completion *x, complete_acquire(x); - spin_lock_irq(&x->wait.lock); + raw_spin_lock_irq(&x->wait.lock); timeout = do_wait_for_common(x, action, timeout, state); - spin_unlock_irq(&x->wait.lock); + raw_spin_unlock_irq(&x->wait.lock); complete_release(x); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:294 @ bool try_wait_for_completion(struct comp if (!READ_ONCE(x->done)) return false; - spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); if (!x->done) ret = false; else if (x->done != UINT_MAX) x->done--; - spin_unlock_irqrestore(&x->wait.lock, flags); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); return ret; } EXPORT_SYMBOL(try_wait_for_completion); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:325 @ bool completion_done(struct completion * * otherwise we can end up freeing the completion before complete() * is done referencing it. */ - spin_lock_irqsave(&x->wait.lock, flags); - spin_unlock_irqrestore(&x->wait.lock, flags); + raw_spin_lock_irqsave(&x->wait.lock, flags); + raw_spin_unlock_irqrestore(&x->wait.lock, flags); return true; } EXPORT_SYMBOL(completion_done); Index: linux-5.4.5-rt3/kernel/sched/core.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/core.c +++ linux-5.4.5-rt3/kernel/sched/core.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:59 @ const_debug unsigned int sysctl_sched_fe * Number of tasks to iterate in a single balance run. * Limited because this is done with IRQs disabled. */ +#ifdef CONFIG_PREEMPT_RT +const_debug unsigned int sysctl_sched_nr_migrate = 8; +#else const_debug unsigned int sysctl_sched_nr_migrate = 32; +#endif /* * period over which we measure -rt task CPU usage in us. 
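The completion.c hunks above move struct completion from a wait_queue_head under a spinlock_t to a simple waitqueue under a raw spinlock, so complete() and try_wait_for_completion() remain callable from contexts that must not sleep on PREEMPT_RT, such as hard interrupt handlers. A usage-style sketch of what that preserves; the driver names, IRQ wiring and timeout here are invented for illustration:

#include <linux/completion.h>
#include <linux/interrupt.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

static DECLARE_COMPLETION(demo_done);

static irqreturn_t demo_irq_handler(int irq, void *dev_id)
{
        /* Raw lock plus swake_up_locked(): still legal here on RT. */
        complete(&demo_done);
        return IRQ_HANDLED;
}

static int demo_wait_for_hw(void)
{
        /* Sleeps on the completion's simple waitqueue until the IRQ fires. */
        if (!wait_for_completion_timeout(&demo_done, msecs_to_jiffies(100)))
                return -ETIMEDOUT;
        return 0;
}

The flip side of the raw lock shows up in the swait.c hunk later in this series: complete_all() now warns when it has to wake more than two waiters, since long wake lists under a raw spinlock are exactly what the conversion tries to avoid.
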
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:417 @ static bool set_nr_if_polling(struct tas #endif #endif -static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task) +static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task, + bool sleeper) { - struct wake_q_node *node = &task->wake_q; + struct wake_q_node *node; + + if (sleeper) + node = &task->wake_q_sleeper; + else + node = &task->wake_q; /* * Atomically grab the task, if ->wake_q is !nil already it means @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:461 @ static bool __wake_q_add(struct wake_q_h */ void wake_q_add(struct wake_q_head *head, struct task_struct *task) { - if (__wake_q_add(head, task)) + if (__wake_q_add(head, task, false)) + get_task_struct(task); +} + +void wake_q_add_sleeper(struct wake_q_head *head, struct task_struct *task) +{ + if (__wake_q_add(head, task, true)) get_task_struct(task); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:490 @ void wake_q_add(struct wake_q_head *head */ void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task) { - if (!__wake_q_add(head, task)) + if (!__wake_q_add(head, task, false)) put_task_struct(task); } -void wake_up_q(struct wake_q_head *head) +void __wake_up_q(struct wake_q_head *head, bool sleeper) { struct wake_q_node *node = head->first; while (node != WAKE_Q_TAIL) { struct task_struct *task; - task = container_of(node, struct task_struct, wake_q); + if (sleeper) + task = container_of(node, struct task_struct, wake_q_sleeper); + else + task = container_of(node, struct task_struct, wake_q); + BUG_ON(!task); /* Task can safely be re-inserted now: */ node = node->next; - task->wake_q.next = NULL; + if (sleeper) + task->wake_q_sleeper.next = NULL; + else + task->wake_q.next = NULL; /* * wake_up_process() executes a full barrier, which pairs with * the queueing in wake_q_add() so as not to miss wakeups. 
*/ - wake_up_process(task); + if (sleeper) + wake_up_lock_sleeper(task); + else + wake_up_process(task); + put_task_struct(task); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:558 @ void resched_curr(struct rq *rq) trace_sched_wake_idle_without_ipi(cpu); } +#ifdef CONFIG_PREEMPT_LAZY + +static int tsk_is_polling(struct task_struct *p) +{ +#ifdef TIF_POLLING_NRFLAG + return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG); +#else + return 0; +#endif +} + +void resched_curr_lazy(struct rq *rq) +{ + struct task_struct *curr = rq->curr; + int cpu; + + if (!sched_feat(PREEMPT_LAZY)) { + resched_curr(rq); + return; + } + + lockdep_assert_held(&rq->lock); + + if (test_tsk_need_resched(curr)) + return; + + if (test_tsk_need_resched_lazy(curr)) + return; + + set_tsk_need_resched_lazy(curr); + + cpu = cpu_of(rq); + if (cpu == smp_processor_id()) + return; + + /* NEED_RESCHED_LAZY must be visible before we test polling */ + smp_mb(); + if (!tsk_is_polling(curr)) + smp_send_reschedule(cpu); +} +#endif + void resched_cpu(int cpu) { struct rq *rq = cpu_rq(cpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1532 @ static inline bool is_cpu_allowed(struct if (!cpumask_test_cpu(cpu, p->cpus_ptr)) return false; - if (is_per_cpu_kthread(p)) + if (is_per_cpu_kthread(p) || __migrate_disabled(p)) return cpu_online(cpu); return cpu_active(cpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1581 @ static struct rq *move_queued_task(struc struct migration_arg { struct task_struct *task; int dest_cpu; + bool done; }; /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1617 @ static int migration_cpu_stop(void *data struct task_struct *p = arg->task; struct rq *rq = this_rq(); struct rq_flags rf; + int dest_cpu = arg->dest_cpu; + + /* We don't look at arg after this point. 
*/ + smp_mb(); + arg->done = true; /* * The original target CPU might have gone down and we might @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1644 @ static int migration_cpu_stop(void *data */ if (task_rq(p) == rq) { if (task_on_rq_queued(p)) - rq = __migrate_task(rq, &rf, p, arg->dest_cpu); + rq = __migrate_task(rq, &rf, p, dest_cpu); else - p->wake_cpu = arg->dest_cpu; + p->wake_cpu = dest_cpu; } rq_unlock(rq, &rf); raw_spin_unlock(&p->pi_lock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1662 @ static int migration_cpu_stop(void *data void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask) { cpumask_copy(&p->cpus_mask, new_mask); - p->nr_cpus_allowed = cpumask_weight(new_mask); + if (p->cpus_ptr == &p->cpus_mask) + p->nr_cpus_allowed = cpumask_weight(new_mask); } +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +int __migrate_disabled(struct task_struct *p) +{ + return p->migrate_disable; +} +EXPORT_SYMBOL_GPL(__migrate_disabled); +#endif + void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask) { struct rq *rq = task_rq(p); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1740 @ static int __set_cpus_allowed_ptr(struct goto out; } - if (cpumask_equal(p->cpus_ptr, new_mask)) + if (cpumask_equal(&p->cpus_mask, new_mask)) goto out; dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1762 @ static int __set_cpus_allowed_ptr(struct } /* Can the task run on the task's current CPU? If so, we're done */ - if (cpumask_test_cpu(task_cpu(p), new_mask)) + if (cpumask_test_cpu(task_cpu(p), new_mask) || + p->cpus_ptr != &p->cpus_mask) goto out; if (task_running(rq, p) || p->state == TASK_WAKING) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1960 @ out: } #endif /* CONFIG_NUMA_BALANCING */ +static bool check_task_state(struct task_struct *p, long match_state) +{ + bool match = false; + + raw_spin_lock_irq(&p->pi_lock); + if (p->state == match_state || p->saved_state == match_state) + match = true; + raw_spin_unlock_irq(&p->pi_lock); + + return match; +} + /* * wait_task_inactive - wait for a thread to unschedule. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2016 @ unsigned long wait_task_inactive(struct * is actually now running somewhere else! 
*/ while (task_running(rq, p)) { - if (match_state && unlikely(p->state != match_state)) + if (match_state && !check_task_state(p, match_state)) return 0; cpu_relax(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2031 @ unsigned long wait_task_inactive(struct running = task_running(rq, p); queued = task_on_rq_queued(p); ncsw = 0; - if (!match_state || p->state == match_state) + if (!match_state || p->state == match_state || + p->saved_state == match_state) ncsw = p->nvcsw | LONG_MIN; /* sets MSB */ task_rq_unlock(rq, p, &rf); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2620 @ try_to_wake_up(struct task_struct *p, un int cpu, success = 0; preempt_disable(); + +#ifndef CONFIG_PREEMPT_RT if (p == current) { /* * We're waking current, this means 'p->on_rq' and 'task_cpu(p) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2644 @ try_to_wake_up(struct task_struct *p, un trace_sched_wakeup(p); goto out; } - +#endif /* * If we are going to wake up a thread waiting for CONDITION we * need to ensure that CONDITION=1 done by the caller can not be @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2653 @ try_to_wake_up(struct task_struct *p, un */ raw_spin_lock_irqsave(&p->pi_lock, flags); smp_mb__after_spinlock(); - if (!(p->state & state)) - goto unlock; + if (!(p->state & state)) { + /* + * The task might be running due to a spinlock sleeper + * wakeup. Check the saved state and set it to running + * if the wakeup condition is true. + */ + if (!(wake_flags & WF_LOCK_SLEEPER)) { + if (p->saved_state & state) { + p->saved_state = TASK_RUNNING; + success = 1; + } + } + raw_spin_unlock_irqrestore(&p->pi_lock, flags); + goto out_nostat; + } + /* + * If this is a regular wakeup, then we can unconditionally + * clear the saved state of a "lock sleeper". + */ + if (!(wake_flags & WF_LOCK_SLEEPER)) + p->saved_state = TASK_RUNNING; trace_sched_waking(p); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2765 @ try_to_wake_up(struct task_struct *p, un ttwu_queue(p, cpu, wake_flags); unlock: raw_spin_unlock_irqrestore(&p->pi_lock, flags); +#ifndef CONFIG_PREEMPT_RT out: +#endif if (success) ttwu_stat(p, cpu, wake_flags); +out_nostat: preempt_enable(); return success; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2793 @ int wake_up_process(struct task_struct * } EXPORT_SYMBOL(wake_up_process); +/** + * wake_up_lock_sleeper - Wake up a specific process blocked on a "sleeping lock" + * @p: The process to be woken up. + * + * Same as wake_up_process() above, but wake_flags=WF_LOCK_SLEEPER to indicate + * the nature of the wakeup. 
+ */ +int wake_up_lock_sleeper(struct task_struct *p) +{ + return try_to_wake_up(p, TASK_UNINTERRUPTIBLE, WF_LOCK_SLEEPER); +} + int wake_up_state(struct task_struct *p, unsigned int state) { return try_to_wake_up(p, state, 0); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3047 @ int sched_fork(unsigned long clone_flags p->on_cpu = 0; #endif init_task_preempt_count(p); +#ifdef CONFIG_HAVE_PREEMPT_LAZY + task_thread_info(p)->preempt_lazy_count = 0; +#endif #ifdef CONFIG_SMP plist_node_init(&p->pushable_tasks, MAX_PRIO); RB_CLEAR_NODE(&p->pushable_dl_tasks); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3377 @ static struct rq *finish_task_switch(str * provided by mmdrop(), * - a sync_core for SYNC_CORE. */ + /* + * We use mmdrop_delayed() here so we don't have to do the + * full __mmdrop() when we are the last user. + */ if (mm) { membarrier_mm_sync_core_before_usermode(mm); - mmdrop(mm); + mmdrop_delayed(mm); } if (unlikely(prev_state == TASK_DEAD)) { if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); - /* - * Remove function-return probe instances associated with this - * task and put them back on the free list. - */ - kprobe_flush_task(prev); - - /* Task is done with its stack. */ - put_task_stack(prev); - put_task_struct_rcu_user(prev); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4091 @ restart: BUG(); } +static void migrate_disabled_sched(struct task_struct *p); + /* * __schedule() is the main scheduler function. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4163 @ static void __sched notrace __schedule(b rq_lock(rq, &rf); smp_mb__after_spinlock(); + if (__migrate_disabled(prev)) + migrate_disabled_sched(prev); + /* Promote REQ to ACT */ rq->clock_update_flags <<= 1; update_rq_clock(rq); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4187 @ static void __sched notrace __schedule(b next = pick_next_task(rq, prev, &rf); clear_tsk_need_resched(prev); + clear_tsk_need_resched_lazy(prev); clear_preempt_need_resched(); if (likely(prev != next)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4375 @ static void __sched notrace preempt_sche } while (need_resched()); } +#ifdef CONFIG_PREEMPT_LAZY +/* + * If TIF_NEED_RESCHED is then we allow to be scheduled away since this is + * set by a RT task. Oterwise we try to avoid beeing scheduled out as long as + * preempt_lazy_count counter >0. 
+ */ +static __always_inline int preemptible_lazy(void) +{ + if (test_thread_flag(TIF_NEED_RESCHED)) + return 1; + if (current_thread_info()->preempt_lazy_count) + return 0; + return 1; +} + +#else + +static inline int preemptible_lazy(void) +{ + return 1; +} + +#endif + #ifdef CONFIG_PREEMPTION /* * This is the entry point to schedule() from in-kernel preemption @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4412 @ asmlinkage __visible void __sched notrac */ if (likely(!preemptible())) return; - + if (!preemptible_lazy()) + return; preempt_schedule_common(); } NOKPROBE_SYMBOL(preempt_schedule); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4440 @ asmlinkage __visible void __sched notrac if (likely(!preemptible())) return; + if (!preemptible_lazy()) + return; + do { /* * Because the function tracer can trace preempt_count_sub() @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6233 @ void init_idle(struct task_struct *idle, /* Set the preempt count _outside_ the spinlocks! */ init_idle_preempt_count(idle, cpu); - +#ifdef CONFIG_HAVE_PREEMPT_LAZY + task_thread_info(idle)->preempt_lazy_count = 0; +#endif /* * The idle tasks have their own, simple scheduling class: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6340 @ void sched_setnuma(struct task_struct *p #endif /* CONFIG_NUMA_BALANCING */ #ifdef CONFIG_HOTPLUG_CPU +static DEFINE_PER_CPU(struct mm_struct *, idle_last_mm); + /* * Ensure that the idle task is using init_mm right before its CPU goes * offline. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6357 @ void idle_task_exit(void) current->active_mm = &init_mm; finish_arch_post_lock_switch(); } - mmdrop(mm); + /* + * Defer the cleanup to an alive cpu. On RT we can neither + * call mmdrop() nor mmdrop_delayed() from here. 
+ */ + per_cpu(idle_last_mm, smp_processor_id()) = mm; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6439 @ static void migrate_tasks(struct rq *dea break; next = __pick_migrate_task(rq); + WARN_ON_ONCE(__migrate_disabled(next)); /* * Rules for changing task_struct::cpus_mask are holding @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6668 @ int sched_cpu_dying(unsigned int cpu) update_max_interval(); nohz_balance_exit_idle(rq); hrtick_clear(rq); + if (per_cpu(idle_last_mm, cpu)) { + mmdrop_delayed(per_cpu(idle_last_mm, cpu)); + per_cpu(idle_last_mm, cpu) = NULL; + } return 0; } #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6903 @ void __init sched_init(void) #ifdef CONFIG_DEBUG_ATOMIC_SLEEP static inline int preempt_count_equals(int preempt_offset) { - int nested = preempt_count() + rcu_preempt_depth(); + int nested = preempt_count() + sched_rcu_preempt_depth(); return (nested == preempt_offset); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:8129 @ const u32 sched_prio_to_wmult[40] = { }; #undef CREATE_TRACE_POINTS + +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) + +static inline void +update_nr_migratory(struct task_struct *p, long delta) +{ + if (unlikely((p->sched_class == &rt_sched_class || + p->sched_class == &dl_sched_class) && + p->nr_cpus_allowed > 1)) { + if (p->sched_class == &rt_sched_class) + task_rq(p)->rt.rt_nr_migratory += delta; + else + task_rq(p)->dl.dl_nr_migratory += delta; + } +} + +static inline void +migrate_disable_update_cpus_allowed(struct task_struct *p) +{ + p->cpus_ptr = cpumask_of(smp_processor_id()); + update_nr_migratory(p, -1); + p->nr_cpus_allowed = 1; +} + +static inline void +migrate_enable_update_cpus_allowed(struct task_struct *p) +{ + struct rq *rq; + struct rq_flags rf; + + rq = task_rq_lock(p, &rf); + p->cpus_ptr = &p->cpus_mask; + p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask); + update_nr_migratory(p, 1); + task_rq_unlock(rq, p, &rf); +} + +void migrate_disable(void) +{ + preempt_disable(); + + if (++current->migrate_disable == 1) { + this_rq()->nr_pinned++; + preempt_lazy_disable(); +#ifdef CONFIG_SCHED_DEBUG + WARN_ON_ONCE(current->pinned_on_cpu >= 0); + current->pinned_on_cpu = smp_processor_id(); +#endif + } + + preempt_enable(); +} +EXPORT_SYMBOL(migrate_disable); + +static void migrate_disabled_sched(struct task_struct *p) +{ + if (p->migrate_disable_scheduled) + return; + + migrate_disable_update_cpus_allowed(p); + p->migrate_disable_scheduled = 1; +} + +static DEFINE_PER_CPU(struct cpu_stop_work, migrate_work); +static DEFINE_PER_CPU(struct migration_arg, migrate_arg); + +void migrate_enable(void) +{ + struct task_struct *p = current; + struct rq *rq = this_rq(); + int cpu = task_cpu(p); + + WARN_ON_ONCE(p->migrate_disable <= 0); + if (p->migrate_disable > 1) { + p->migrate_disable--; + return; + } + + preempt_disable(); + +#ifdef CONFIG_SCHED_DEBUG + WARN_ON_ONCE(current->pinned_on_cpu != cpu); + current->pinned_on_cpu = -1; +#endif + + WARN_ON_ONCE(rq->nr_pinned < 1); + + p->migrate_disable = 0; + rq->nr_pinned--; +#ifdef CONFIG_HOTPLUG_CPU + if (rq->nr_pinned == 0 && unlikely(!cpu_active(cpu)) && + takedown_cpu_task) + wake_up_process(takedown_cpu_task); +#endif + + if (!p->migrate_disable_scheduled) + goto out; + + p->migrate_disable_scheduled = 0; + + migrate_enable_update_cpus_allowed(p); + + 
WARN_ON(smp_processor_id() != cpu); + if (!is_cpu_allowed(p, cpu)) { + struct migration_arg __percpu *arg; + struct cpu_stop_work __percpu *work; + struct rq_flags rf; + + work = this_cpu_ptr(&migrate_work); + arg = this_cpu_ptr(&migrate_arg); + WARN_ON_ONCE(!arg->done && !work->disabled && work->arg); + + arg->task = p; + arg->done = false; + + rq = task_rq_lock(p, &rf); + update_rq_clock(rq); + arg->dest_cpu = select_fallback_rq(cpu, p); + task_rq_unlock(rq, p, &rf); + + stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop, + arg, work); + } + +out: + preempt_lazy_enable(); + preempt_enable(); +} +EXPORT_SYMBOL(migrate_enable); + +int cpu_nr_pinned(int cpu) +{ + struct rq *rq = cpu_rq(cpu); + + return rq->nr_pinned; +} + +#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) +static void migrate_disabled_sched(struct task_struct *p) +{ +} + +void migrate_disable(void) +{ +#ifdef CONFIG_SCHED_DEBUG + current->migrate_disable++; +#endif + barrier(); +} +EXPORT_SYMBOL(migrate_disable); + +void migrate_enable(void) +{ +#ifdef CONFIG_SCHED_DEBUG + struct task_struct *p = current; + + WARN_ON_ONCE(p->migrate_disable <= 0); + p->migrate_disable--; +#endif + barrier(); +} +EXPORT_SYMBOL(migrate_enable); + +#else +static void migrate_disabled_sched(struct task_struct *p) +{ +} + +#endif Index: linux-5.4.5-rt3/kernel/sched/debug.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/debug.c +++ linux-5.4.5-rt3/kernel/sched/debug.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:961 @ void proc_sched_show_task(struct task_st P(dl.runtime); P(dl.deadline); } +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT) + P(migrate_disable); +#endif + P(nr_cpus_allowed); #undef PN_SCHEDSTAT #undef PN #undef __PN Index: linux-5.4.5-rt3/kernel/sched/fair.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/fair.c +++ linux-5.4.5-rt3/kernel/sched/fair.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4126 @ check_preempt_tick(struct cfs_rq *cfs_rq ideal_runtime = sched_slice(cfs_rq, curr); delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; if (delta_exec > ideal_runtime) { - resched_curr(rq_of(cfs_rq)); + resched_curr_lazy(rq_of(cfs_rq)); /* * The current task ran long enough, ensure it doesn't get * re-elected due to buddy favours. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4150 @ check_preempt_tick(struct cfs_rq *cfs_rq return; if (delta > ideal_runtime) - resched_curr(rq_of(cfs_rq)); + resched_curr_lazy(rq_of(cfs_rq)); } static void @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4293 @ entity_tick(struct cfs_rq *cfs_rq, struc * validating it and just reschedule. 
*/ if (queued) { - resched_curr(rq_of(cfs_rq)); + resched_curr_lazy(rq_of(cfs_rq)); return; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4418 @ static void __account_cfs_rq_runtime(str * hierarchy can be throttled */ if (!assign_cfs_rq_runtime(cfs_rq) && likely(cfs_rq->curr)) - resched_curr(rq_of(cfs_rq)); + resched_curr_lazy(rq_of(cfs_rq)); } static __always_inline @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5131 @ static void hrtick_start_fair(struct rq if (delta < 0) { if (rq->curr == p) - resched_curr(rq); + resched_curr_lazy(rq); return; } hrtick_start(rq, delta); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6733 @ static void check_preempt_wakeup(struct return; preempt: - resched_curr(rq); + resched_curr_lazy(rq); /* * Only set the backward buddy when the current task is still * on the rq. This can happen when a wakeup gets interleaved @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:10000 @ static void task_fork_fair(struct task_s * 'current' within the tree based on its new key value. */ swap(curr->vruntime, se->vruntime); - resched_curr(rq); + resched_curr_lazy(rq); } se->vruntime -= cfs_rq->min_vruntime; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:10024 @ prio_changed_fair(struct rq *rq, struct */ if (rq->curr == p) { if (p->prio > oldprio) - resched_curr(rq); + resched_curr_lazy(rq); } else check_preempt_curr(rq, p, 0); } Index: linux-5.4.5-rt3/kernel/sched/features.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/features.h +++ linux-5.4.5-rt3/kernel/sched/features.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:48 @ SCHED_FEAT(DOUBLE_TICK, false) */ SCHED_FEAT(NONTASK_CAPACITY, true) +#ifdef CONFIG_PREEMPT_RT +SCHED_FEAT(TTWU_QUEUE, false) +# ifdef CONFIG_PREEMPT_LAZY +SCHED_FEAT(PREEMPT_LAZY, true) +# endif +#else + /* * Queue remote wakeups on the target CPU and process them * using the scheduler IPI. Reduces rq->lock contention/bounces. */ SCHED_FEAT(TTWU_QUEUE, true) +#endif /* * When doing wakeups, attempt to limit superfluous scans of the LLC domain. Index: linux-5.4.5-rt3/kernel/sched/isolation.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/isolation.c +++ linux-5.4.5-rt3/kernel/sched/isolation.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:11 @ * */ #include "sched.h" +#include "../../mm/internal.h" DEFINE_STATIC_KEY_FALSE(housekeeping_overridden); EXPORT_SYMBOL_GPL(housekeeping_overridden); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:143 @ static int __init housekeeping_setup(cha static int __init housekeeping_nohz_full_setup(char *str) { unsigned int flags; + int ret; flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC; - return housekeeping_setup(str, flags); + ret = housekeeping_setup(str, flags); + + /* + * Protect struct pagevec with a lock instead using preemption disable; + * with lock protection, remote handling of events instead of queue + * work on remote cpu is default behavior. 
+ */ + if (ret) + static_branch_enable(&use_pvec_lock); + + return ret; } __setup("nohz_full=", housekeeping_nohz_full_setup); Index: linux-5.4.5-rt3/kernel/sched/sched.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/sched.h +++ linux-5.4.5-rt3/kernel/sched/sched.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1002 @ struct rq { /* Must be inspected within a rcu lock section */ struct cpuidle_state *idle_state; #endif + +#if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP) + int nr_pinned; +#endif }; #ifdef CONFIG_FAIR_GROUP_SCHED @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1651 @ static inline int task_on_rq_migrating(s #define WF_SYNC 0x01 /* Waker goes to sleep after wakeup */ #define WF_FORK 0x02 /* Child wakeup after fork */ #define WF_MIGRATED 0x4 /* Internal use, task got migrated */ +#define WF_LOCK_SLEEPER 0x08 /* wakeup spinlock "sleeper" */ /* * To aid in avoiding the subversion of "niceness" due to uneven distribution @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1879 @ extern void reweight_task(struct task_st extern void resched_curr(struct rq *rq); extern void resched_cpu(int cpu); +#ifdef CONFIG_PREEMPT_LAZY +extern void resched_curr_lazy(struct rq *rq); +#else +static inline void resched_curr_lazy(struct rq *rq) +{ + resched_curr(rq); +} +#endif + extern struct rt_bandwidth def_rt_bandwidth; extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime); Index: linux-5.4.5-rt3/kernel/sched/swait.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/swait.c +++ linux-5.4.5-rt3/kernel/sched/swait.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:35 @ void swake_up_locked(struct swait_queue_ } EXPORT_SYMBOL(swake_up_locked); +void swake_up_all_locked(struct swait_queue_head *q) +{ + struct swait_queue *curr; + int wakes = 0; + + while (!list_empty(&q->task_list)) { + + curr = list_first_entry(&q->task_list, typeof(*curr), + task_list); + wake_up_process(curr->task); + list_del_init(&curr->task_list); + wakes++; + } + if (pm_in_action) + return; + WARN(wakes > 2, "complete_all() with %d waiters\n", wakes); +} +EXPORT_SYMBOL(swake_up_all_locked); + void swake_up_one(struct swait_queue_head *q) { unsigned long flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:73 @ void swake_up_all(struct swait_queue_hea struct swait_queue *curr; LIST_HEAD(tmp); + WARN_ON(irqs_disabled()); raw_spin_lock_irq(&q->lock); list_splice_init(&q->task_list, &tmp); while (!list_empty(&tmp)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:92 @ void swake_up_all(struct swait_queue_hea } EXPORT_SYMBOL(swake_up_all); -static void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) +void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait) { wait->task = current; if (list_empty(&wait->task_list)) Index: linux-5.4.5-rt3/kernel/sched/topology.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sched/topology.c +++ linux-5.4.5-rt3/kernel/sched/topology.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:505 @ static int init_rootdomain(struct root_d 
rd->rto_cpu = -1; raw_spin_lock_init(&rd->rto_lock); init_irq_work(&rd->rto_push_work, rto_push_irq_work_func); + rd->rto_push_work.flags |= IRQ_WORK_HARD_IRQ; #endif init_dl_bw(&rd->dl_bw); Index: linux-5.4.5-rt3/kernel/signal.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/signal.c +++ linux-5.4.5-rt3/kernel/signal.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:23 @ #include <linux/sched/task.h> #include <linux/sched/task_stack.h> #include <linux/sched/cputime.h> +#include <linux/sched/rt.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/proc_fs.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:407 @ void task_join_group_stop(struct task_st } } +static inline struct sigqueue *get_task_cache(struct task_struct *t) +{ + struct sigqueue *q = t->sigqueue_cache; + + if (cmpxchg(&t->sigqueue_cache, q, NULL) != q) + return NULL; + return q; +} + +static inline int put_task_cache(struct task_struct *t, struct sigqueue *q) +{ + if (cmpxchg(&t->sigqueue_cache, NULL, q) == NULL) + return 0; + return 1; +} + /* * allocate a new signal queue record * - this may be called without locks if and only if t == current, otherwise an * appropriate lock must be held to stop the target task from exiting */ static struct sigqueue * -__sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimit) +__sigqueue_do_alloc(int sig, struct task_struct *t, gfp_t flags, + int override_rlimit, int fromslab) { struct sigqueue *q = NULL; struct user_struct *user; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:447 @ __sigqueue_alloc(int sig, struct task_st if (override_rlimit || atomic_read(&user->sigpending) <= task_rlimit(t, RLIMIT_SIGPENDING)) { - q = kmem_cache_alloc(sigqueue_cachep, flags); + if (!fromslab) + q = get_task_cache(t); + if (!q) + q = kmem_cache_alloc(sigqueue_cachep, flags); } else { print_dropped_signal(sig); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:467 @ __sigqueue_alloc(int sig, struct task_st return q; } +static struct sigqueue * +__sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, + int override_rlimit) +{ + return __sigqueue_do_alloc(sig, t, flags, override_rlimit, 0); +} + static void __sigqueue_free(struct sigqueue *q) { if (q->flags & SIGQUEUE_PREALLOC) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:483 @ static void __sigqueue_free(struct sigqu kmem_cache_free(sigqueue_cachep, q); } +static void sigqueue_free_current(struct sigqueue *q) +{ + struct user_struct *up; + + if (q->flags & SIGQUEUE_PREALLOC) + return; + + up = q->user; + if (rt_prio(current->normal_prio) && !put_task_cache(current, q)) { + atomic_dec(&up->sigpending); + free_uid(up); + } else + __sigqueue_free(q); +} + void flush_sigqueue(struct sigpending *queue) { struct sigqueue *q; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:511 @ void flush_sigqueue(struct sigpending *q } /* + * Called from __exit_signal. Flush tsk->pending and + * tsk->sigqueue_cache + */ +void flush_task_sigqueue(struct task_struct *tsk) +{ + struct sigqueue *q; + + flush_sigqueue(&tsk->pending); + + q = get_task_cache(tsk); + if (q) + kmem_cache_free(sigqueue_cachep, q); +} + +/* * Flush all pending signals for this kthread. 
*/ void flush_signals(struct task_struct *t) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:649 @ still_pending: (info->si_code == SI_TIMER) && (info->si_sys_private); - __sigqueue_free(first); + sigqueue_free_current(first); } else { /* * Ok, it wasn't in the queue. This must be @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:686 @ int dequeue_signal(struct task_struct *t bool resched_timer = false; int signr; + WARN_ON_ONCE(tsk != current); + /* We only dequeue private signals from ourselves, we don't let * signalfd steal them */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1371 @ force_sig_info_to_task(struct kernel_sig struct k_sigaction *action; int sig = info->si_signo; + /* + * On some archs, PREEMPT_RT has to delay sending a signal from a trap + * since it can not enable preemption, and the signal code's spin_locks + * turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME which will + * send the signal on exit of the trap. + */ +#ifdef ARCH_RT_DELAYS_SIGNAL_SEND + if (in_atomic()) { + struct task_struct *t = current; + + if (WARN_ON_ONCE(t->forced_info.si_signo)) + return 0; + + if (is_si_special(info)) { + WARN_ON_ONCE(info != SEND_SIG_PRIV); + t->forced_info.si_signo = info->si_signo; + t->forced_info.si_errno = 0; + t->forced_info.si_code = SI_KERNEL; + t->forced_info.si_pid = 0; + t->forced_info.si_uid = 0; + } else { + t->forced_info = *info; + } + + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); + return 0; + } +#endif spin_lock_irqsave(&t->sighand->siglock, flags); action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1896 @ EXPORT_SYMBOL(kill_pid); */ struct sigqueue *sigqueue_alloc(void) { - struct sigqueue *q = __sigqueue_alloc(-1, current, GFP_KERNEL, 0); + /* Preallocated sigqueue objects always from the slabcache ! */ + struct sigqueue *q = __sigqueue_do_alloc(-1, current, GFP_KERNEL, 0, 1); if (q) q->flags |= SIGQUEUE_PREALLOC; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2289 @ static void ptrace_stop(int exit_code, i if (gstop_done && ptrace_reparented(current)) do_notify_parent_cldstop(current, false, why); - /* - * Don't want to allow preemption here, because - * sys_ptrace() needs this task to be inactive. - * - * XXX: implement read_unlock_no_resched(). - */ - preempt_disable(); read_unlock(&tasklist_lock); cgroup_enter_frozen(); - preempt_enable_no_resched(); freezable_schedule(); cgroup_leave_frozen(true); } else { Index: linux-5.4.5-rt3/kernel/smp.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/smp.c +++ linux-5.4.5-rt3/kernel/smp.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:398 @ call: } EXPORT_SYMBOL_GPL(smp_call_function_any); -/** - * smp_call_function_many(): Run a function on a set of other CPUs. - * @mask: The set of cpus to run on (only runs on online subset). - * @func: The function to run. This must be fast and non-blocking. - * @info: An arbitrary pointer to pass to the function. - * @wait: If true, wait (atomically) until function has completed - * on other CPUs. - * - * If @wait is true, then returns once @func has returned. 
- * - * You must not call this function with disabled interrupts or from a - * hardware interrupt handler or from a bottom half handler. Preemption - * must be disabled when calling this function. - */ -void smp_call_function_many(const struct cpumask *mask, - smp_call_func_t func, void *info, bool wait) +static void smp_call_function_many_cond(const struct cpumask *mask, + smp_call_func_t func, void *info, + bool wait, smp_cond_func_t cond_func) { struct call_function_data *cfd; int cpu, next_cpu, this_cpu = smp_processor_id(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:438 @ void smp_call_function_many(const struct /* Fastpath: do that cpu by itself. */ if (next_cpu >= nr_cpu_ids) { - smp_call_function_single(cpu, func, info, wait); + if (!cond_func || cond_func(cpu, info)) + smp_call_function_single(cpu, func, info, wait); return; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:456 @ void smp_call_function_many(const struct for_each_cpu(cpu, cfd->cpumask) { call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu); + if (cond_func && !cond_func(cpu, info)) + continue; + csd_lock(csd); if (wait) csd->flags |= CSD_FLAG_SYNCHRONOUS; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:480 @ void smp_call_function_many(const struct } } } + +/** + * smp_call_function_many(): Run a function on a set of other CPUs. + * @mask: The set of cpus to run on (only runs on online subset). + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @wait: If true, wait (atomically) until function has completed + * on other CPUs. + * + * If @wait is true, then returns once @func has returned. + * + * You must not call this function with disabled interrupts or from a + * hardware interrupt handler or from a bottom half handler. Preemption + * must be disabled when calling this function. + */ +void smp_call_function_many(const struct cpumask *mask, + smp_call_func_t func, void *info, bool wait) +{ + smp_call_function_many_cond(mask, func, info, wait, NULL); +} EXPORT_SYMBOL(smp_call_function_many); /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:682 @ EXPORT_SYMBOL(on_each_cpu_mask); * @info: An arbitrary pointer to pass to both functions. * @wait: If true, wait (atomically) until function has * completed on other CPUs. - * @gfp_flags: GFP flags to use when allocating the cpumask - * used internally by the function. - * - * The function might sleep if the GFP flags indicates a non - * atomic allocation is allowed. * * Preemption is disabled to protect against CPUs going offline but not online. * CPUs going online during the call will not be seen or sent an IPI. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:689 @ EXPORT_SYMBOL(on_each_cpu_mask); * You must not call this function with disabled interrupts or * from a hardware interrupt handler or from a bottom half handler. 
*/ -void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags, const struct cpumask *mask) -{ - cpumask_var_t cpus; - int cpu, ret; - - might_sleep_if(gfpflags_allow_blocking(gfp_flags)); - - if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) { - preempt_disable(); - for_each_cpu(cpu, mask) - if (cond_func(cpu, info)) - __cpumask_set_cpu(cpu, cpus); - on_each_cpu_mask(cpus, func, info, wait); - preempt_enable(); - free_cpumask_var(cpus); - } else { - /* - * No free cpumask, bother. No matter, we'll - * just have to IPI them one by one. - */ - preempt_disable(); - for_each_cpu(cpu, mask) - if (cond_func(cpu, info)) { - ret = smp_call_function_single(cpu, func, - info, wait); - WARN_ON_ONCE(ret); - } - preempt_enable(); +void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait, const struct cpumask *mask) +{ + int cpu = get_cpu(); + + smp_call_function_many_cond(mask, func, info, wait, cond_func); + if (cpumask_test_cpu(cpu, mask) && cond_func(cpu, info)) { + unsigned long flags; + + local_irq_save(flags); + func(info); + local_irq_restore(flags); } + put_cpu(); } EXPORT_SYMBOL(on_each_cpu_cond_mask); -void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags) +void on_each_cpu_cond(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait) { - on_each_cpu_cond_mask(cond_func, func, info, wait, gfp_flags, - cpu_online_mask); + on_each_cpu_cond_mask(cond_func, func, info, wait, cpu_online_mask); } EXPORT_SYMBOL(on_each_cpu_cond); Index: linux-5.4.5-rt3/kernel/softirq.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/softirq.c +++ linux-5.4.5-rt3/kernel/softirq.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:28 @ #include <linux/smpboot.h> #include <linux/tick.h> #include <linux/irq.h> +#ifdef CONFIG_PREEMPT_RT +#include <linux/locallock.h> +#endif #define CREATE_TRACE_POINTS #include <trace/events/irq.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:108 @ static bool ksoftirqd_running(unsigned l * softirq and whether we just have bh disabled. 
*/ +#ifdef CONFIG_PREEMPT_RT +static DEFINE_LOCAL_IRQ_LOCK(bh_lock); +static DEFINE_PER_CPU(long, softirq_counter); + +void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) +{ + unsigned long __maybe_unused flags; + long soft_cnt; + + WARN_ON_ONCE(in_irq()); + if (!in_atomic()) { + local_lock(bh_lock); + rcu_read_lock(); + } + soft_cnt = this_cpu_inc_return(softirq_counter); + WARN_ON_ONCE(soft_cnt == 0); + current->softirq_count += SOFTIRQ_DISABLE_OFFSET; + +#ifdef CONFIG_TRACE_IRQFLAGS + local_irq_save(flags); + if (soft_cnt == 1) + trace_softirqs_off(ip); + local_irq_restore(flags); +#endif +} +EXPORT_SYMBOL(__local_bh_disable_ip); + +static void local_bh_disable_rt(void) +{ + local_bh_disable(); +} + +void _local_bh_enable(void) +{ + unsigned long __maybe_unused flags; + long soft_cnt; + + soft_cnt = this_cpu_dec_return(softirq_counter); + WARN_ON_ONCE(soft_cnt < 0); + +#ifdef CONFIG_TRACE_IRQFLAGS + local_irq_save(flags); + if (soft_cnt == 0) + trace_softirqs_on(_RET_IP_); + local_irq_restore(flags); +#endif + + current->softirq_count -= SOFTIRQ_DISABLE_OFFSET; + if (!in_atomic()) { + rcu_read_unlock(); + local_unlock(bh_lock); + } +} + +void _local_bh_enable_rt(void) +{ + _local_bh_enable(); +} + +void __local_bh_enable_ip(unsigned long ip, unsigned int cnt) +{ + u32 pending; + long count; + + WARN_ON_ONCE(in_irq()); + lockdep_assert_irqs_enabled(); + + local_irq_disable(); + count = this_cpu_read(softirq_counter); + + if (unlikely(count == 1)) { + pending = local_softirq_pending(); + if (pending && !ksoftirqd_running(pending)) { + if (!in_atomic()) + __do_softirq(); + else + wakeup_softirqd(); + } + trace_softirqs_on(ip); + } + count = this_cpu_dec_return(softirq_counter); + WARN_ON_ONCE(count < 0); + local_irq_enable(); + + if (!in_atomic()) { + rcu_read_unlock(); + local_unlock(bh_lock); + } + + current->softirq_count -= SOFTIRQ_DISABLE_OFFSET; + preempt_check_resched(); +} +EXPORT_SYMBOL(__local_bh_enable_ip); + +#else +static void local_bh_disable_rt(void) { } +static void _local_bh_enable_rt(void) { } + /* * This one is for softirq.c-internal use, * where hardirqs are disabled legitimately: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:300 @ void __local_bh_enable_ip(unsigned long preempt_check_resched(); } EXPORT_SYMBOL(__local_bh_enable_ip); +#endif /* * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:371 @ asmlinkage __visible void __softirq_entr pending = local_softirq_pending(); account_irq_enter_time(current); +#ifdef CONFIG_PREEMPT_RT + current->softirq_count |= SOFTIRQ_OFFSET; +#else __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET); +#endif in_hardirq = lockdep_softirq_start(); restart: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:409 @ restart: h++; pending >>= softirq_bit; } - +#ifndef CONFIG_PREEMPT_RT if (__this_cpu_read(ksoftirqd) == current) rcu_softirq_qs(); +#endif local_irq_disable(); pending = local_softirq_pending(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:426 @ restart: lockdep_softirq_end(in_hardirq); account_irq_exit_time(current); +#ifdef CONFIG_PREEMPT_RT + current->softirq_count &= ~SOFTIRQ_OFFSET; +#else __local_bh_enable(SOFTIRQ_OFFSET); +#endif WARN_ON_ONCE(in_interrupt()); current_restore_flags(old_flags, PF_MEMALLOC); } +#ifndef CONFIG_PREEMPT_RT asmlinkage __visible void 
do_softirq(void) { __u32 pending; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:453 @ asmlinkage __visible void do_softirq(voi local_irq_restore(flags); } +#endif /* * Enter an interrupt context. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:474 @ void irq_enter(void) __irq_enter(); } +#ifdef CONFIG_PREEMPT_RT + +static inline void invoke_softirq(void) +{ + if (this_cpu_read(softirq_counter) == 0) + wakeup_softirqd(); +} + +#else + static inline void invoke_softirq(void) { if (ksoftirqd_running(local_softirq_pending())) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:509 @ static inline void invoke_softirq(void) wakeup_softirqd(); } } +#endif static inline void tick_irq_exit(void) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:547 @ void irq_exit(void) /* * This function must run with irqs disabled! */ +#ifdef CONFIG_PREEMPT_RT +void raise_softirq_irqoff(unsigned int nr) +{ + __raise_softirq_irqoff(nr); + + /* + * If we're in an hard interrupt we let irq return code deal + * with the wakeup of ksoftirqd. + */ + if (in_irq()) + return; + /* + * If were are not in BH-disabled section then we have to wake + * ksoftirqd. + */ + if (this_cpu_read(softirq_counter) == 0) + wakeup_softirqd(); +} + +#else + inline void raise_softirq_irqoff(unsigned int nr) { __raise_softirq_irqoff(nr); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:585 @ inline void raise_softirq_irqoff(unsigne wakeup_softirqd(); } +#endif + void raise_softirq(unsigned int nr) { unsigned long flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:714 @ void tasklet_kill(struct tasklet_struct while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) { do { - yield(); + local_bh_disable(); + local_bh_enable(); } while (test_bit(TASKLET_STATE_SCHED, &t->state)); } tasklet_unlock_wait(t); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:745 @ static int ksoftirqd_should_run(unsigned static void run_ksoftirqd(unsigned int cpu) { + local_bh_disable_rt(); local_irq_disable(); if (local_softirq_pending()) { /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:754 @ static void run_ksoftirqd(unsigned int c */ __do_softirq(); local_irq_enable(); + _local_bh_enable_rt(); cond_resched(); return; } local_irq_enable(); + _local_bh_enable_rt(); } #ifdef CONFIG_HOTPLUG_CPU @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:833 @ static struct smp_hotplug_thread softirq static __init int spawn_ksoftirqd(void) { +#ifdef CONFIG_PREEMPT_RT + int cpu; + + for_each_possible_cpu(cpu) + lockdep_set_novalidate_class(per_cpu_ptr(&bh_lock.lock, cpu)); +#endif + cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL, takeover_tasklets); BUG_ON(smpboot_register_percpu_thread(&softirq_threads)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:848 @ static __init int spawn_ksoftirqd(void) } early_initcall(spawn_ksoftirqd); +#ifdef CONFIG_PREEMPT_RT + +/* + * On preempt-rt a softirq running context might be blocked on a + * lock. There might be no other runnable task on this CPU because the + * lock owner runs on some other CPU. 
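/*
 * Nesting sketch (illustration only, assuming the PREEMPT_RT scheme above):
 * BH-disabled sections are tracked by the per-CPU softirq_counter plus the
 * bh_lock local lock, so only the outermost transitions do real work.
 */
#include <linux/bottom_half.h>

static void example_bh_nesting(void)
{
	local_bh_disable();	/* 0 -> 1: takes bh_lock and enters an RCU read-side section */
	local_bh_disable();	/* 1 -> 2: only the counter moves */
	local_bh_enable();	/* 2 -> 1: still BH-disabled */
	local_bh_enable();	/* 1 -> 0: pending softirqs run here, or ksoftirqd is woken */
}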
So we have to go into idle with + * the pending bit set. Therefor we need to check this otherwise we + * warn about false positives which confuses users and defeats the + * whole purpose of this test. + * + * This code is called with interrupts disabled. + */ +void softirq_check_pending_idle(void) +{ + struct task_struct *tsk = __this_cpu_read(ksoftirqd); + static int rate_limit; + bool okay = false; + u32 warnpending; + + if (rate_limit >= 10) + return; + + warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK; + if (!warnpending) + return; + + if (!tsk) + return; + /* + * If ksoftirqd is blocked on a lock then we may go idle with pending + * softirq. + */ + raw_spin_lock(&tsk->pi_lock); + if (tsk->pi_blocked_on || tsk->state == TASK_RUNNING || + (tsk->state == TASK_UNINTERRUPTIBLE && tsk->sleeping_lock)) { + okay = true; + } + raw_spin_unlock(&tsk->pi_lock); + if (okay) + return; + /* + * The softirq lock is held in non-atomic context and the owner is + * blocking on a lock. It will schedule softirqs once the counter goes + * back to zero. + */ + if (this_cpu_read(softirq_counter) > 0) + return; + + printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n", + warnpending); + rate_limit++; +} + +#else + +void softirq_check_pending_idle(void) +{ + static int ratelimit; + + if (ratelimit < 10 && + (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { + pr_warn("NOHZ: local_softirq_pending %02x\n", + (unsigned int) local_softirq_pending()); + ratelimit++; + } +} + +#endif + /* * [ These __weak aliases are kept in a separate compilation unit, so that * GCC does not inline them incorrectly. ] Index: linux-5.4.5-rt3/kernel/stop_machine.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/stop_machine.c +++ linux-5.4.5-rt3/kernel/stop_machine.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:89 @ static bool cpu_stop_queue_work(unsigned enabled = stopper->enabled; if (enabled) __cpu_stop_queue_work(stopper, work, &wakeq); - else if (work->done) - cpu_stop_signal_done(work->done); + else { + work->disabled = true; + if (work->done) + cpu_stop_signal_done(work->done); + } raw_spin_unlock_irqrestore(&stopper->lock, flags); wake_up_q(&wakeq); Index: linux-5.4.5-rt3/kernel/sysctl.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/sysctl.c +++ linux-5.4.5-rt3/kernel/sysctl.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1491 @ static struct ctl_table vm_table[] = { .extra1 = &min_extfrag_threshold, .extra2 = &max_extfrag_threshold, }, +#ifndef CONFIG_PREEMPT_RT { .procname = "compact_unevictable_allowed", .data = &sysctl_compact_unevictable_allowed, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1501 @ static struct ctl_table vm_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, - +#endif #endif /* CONFIG_COMPACTION */ { .procname = "min_free_kbytes", Index: linux-5.4.5-rt3/kernel/time/hrtimer.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/hrtimer.c +++ linux-5.4.5-rt3/kernel/time/hrtimer.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1822 @ static void __hrtimer_init_sleeper(struc * expiry. 
*/ if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - if (task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT)) + if ((task_is_realtime(current) && !(mode & HRTIMER_MODE_SOFT)) || system_state != SYSTEM_RUNNING) mode |= HRTIMER_MODE_HARD; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1985 @ SYSCALL_DEFINE2(nanosleep_time32, struct } #endif +#ifdef CONFIG_PREEMPT_RT +/* + * Sleep for 1 ms in hope whoever holds what we want will let it go. + */ +void cpu_chill(void) +{ + unsigned int freeze_flag = current->flags & PF_NOFREEZE; + struct task_struct *self = current; + ktime_t chill_time; + + raw_spin_lock_irq(&self->pi_lock); + self->saved_state = self->state; + __set_current_state_no_track(TASK_UNINTERRUPTIBLE); + raw_spin_unlock_irq(&self->pi_lock); + + chill_time = ktime_set(0, NSEC_PER_MSEC); + + current->flags |= PF_NOFREEZE; + sleeping_lock_inc(); + schedule_hrtimeout(&chill_time, HRTIMER_MODE_REL_HARD); + sleeping_lock_dec(); + if (!freeze_flag) + current->flags &= ~PF_NOFREEZE; + + raw_spin_lock_irq(&self->pi_lock); + __set_current_state_no_track(self->saved_state); + self->saved_state = TASK_RUNNING; + raw_spin_unlock_irq(&self->pi_lock); +} +EXPORT_SYMBOL(cpu_chill); +#endif + /* * Functions related to boot-time initialization: */ Index: linux-5.4.5-rt3/kernel/time/jiffies.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/jiffies.c +++ linux-5.4.5-rt3/kernel/time/jiffies.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:61 @ static struct clocksource clocksource_ji .max_cycles = 10, }; -__cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock); +__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); +__cacheline_aligned_in_smp seqcount_t jiffies_seq; #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:71 @ u64 get_jiffies_64(void) u64 ret; do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); ret = jiffies_64; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); return ret; } EXPORT_SYMBOL(get_jiffies_64); Index: linux-5.4.5-rt3/kernel/time/posix-cpu-timers.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/posix-cpu-timers.c +++ linux-5.4.5-rt3/kernel/time/posix-cpu-timers.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ * Implement CPU time clocks for the POSIX clock interface. 
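/*
 * Usage sketch for cpu_chill() added above (a hypothetical caller; the atomic
 * flag is made up, and the declaration is assumed to live in linux/delay.h as
 * in the RT series): it is intended for retry loops that would otherwise spin
 * with cpu_relax() while the task they are waiting for may itself be
 * preempted. Non-RT builds fall back to cpu_relax().
 */
#include <linux/delay.h>
#include <linux/atomic.h>

static void example_wait_for(atomic_t *done)
{
	while (!atomic_read(done))
		cpu_chill();	/* ~1ms sleep on PREEMPT_RT instead of busy-waiting */
}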
*/ +#include <uapi/linux/sched/types.h> #include <linux/sched/signal.h> #include <linux/sched/cputime.h> +#include <linux/sched/rt.h> #include <linux/posix-timers.h> #include <linux/errno.h> #include <linux/math64.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:20 @ #include <linux/workqueue.h> #include <linux/compat.h> #include <linux/sched/deadline.h> +#include <linux/smpboot.h> #include "posix-timers.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ void posix_cputimers_group_init(struct p pct->bases[CPUCLOCK_PROF].nextevt = cpu_limit * NSEC_PER_SEC; pct->timers_active = true; } +#ifdef CONFIG_PREEMPT_RT + pct->posix_timer_list = NULL; +#endif } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:449 @ static int posix_cpu_timer_del(struct k_ return ret; } +static DEFINE_PER_CPU(spinlock_t, cpu_timer_expiry_lock) = __SPIN_LOCK_UNLOCKED(cpu_timer_expiry_lock); + +static void posix_cpu_wait_running(struct k_itimer *timer) +{ + int cpu = timer->it.cpu.firing_cpu; + + if (cpu >= 0) { + spinlock_t *expiry_lock = per_cpu_ptr(&cpu_timer_expiry_lock, cpu); + + spin_lock_irq(expiry_lock); + spin_unlock_irq(expiry_lock); + } +} + static void cleanup_timerqueue(struct timerqueue_head *head) { struct timerqueue_node *node; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:800 @ static u64 collect_timerqueue(struct tim return expires; ctmr->firing = 1; + ctmr->firing_cpu = smp_processor_id(); cpu_timer_dequeue(ctmr); list_add_tail(&ctmr->elist, firing); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:828 @ static inline void check_dl_overrun(stru } } -static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard) +static bool check_rlimit(struct task_struct *tsk, u64 time, u64 limit, + int signo, bool rt, bool hard) { if (time < limit) return false; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:837 @ static bool check_rlimit(u64 time, u64 l if (print_fatal_signals) { pr_info("%s Watchdog Timeout (%s): %s[%d]\n", rt ? "RT" : "CPU", hard ? "hard" : "soft", - current->comm, task_pid_nr(current)); + tsk->comm, task_pid_nr(tsk)); } - __group_send_sig_info(signo, SEND_SIG_PRIV, current); + __group_send_sig_info(signo, SEND_SIG_PRIV, tsk); return true; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:875 @ static void check_thread_timers(struct t /* At the hard limit, send SIGKILL. No further action. */ if (hard != RLIM_INFINITY && - check_rlimit(rttime, hard, SIGKILL, true, true)) + check_rlimit(tsk, rttime, hard, SIGKILL, true, true)) return; /* At the soft limit, send a SIGXCPU every second */ - if (check_rlimit(rttime, soft, SIGXCPU, true, false)) { + if (check_rlimit(tsk, rttime, soft, SIGXCPU, true, false)) { soft += USEC_PER_SEC; tsk->signal->rlim[RLIMIT_RTTIME].rlim_cur = soft; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:974 @ static void check_process_timers(struct /* At the hard limit, send SIGKILL. No further action. 
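/*
 * The posix_cpu_wait_running() helper above uses a common "lock and
 * immediately unlock" barrier idiom: the acquisition cannot succeed while the
 * expiry code on the target CPU still holds the per-CPU expiry lock, so the
 * caller simply waits for any in-flight expiry to finish. Generic sketch
 * (the function and lock names are made up):
 */
#include <linux/spinlock.h>

static void example_wait_for_section(spinlock_t *lock)
{
	spin_lock_irq(lock);	/* blocks until the current lock holder is done */
	spin_unlock_irq(lock);	/* nothing to protect; we only needed to wait */
}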
*/ if (hard != RLIM_INFINITY && - check_rlimit(ptime, hardns, SIGKILL, false, true)) + check_rlimit(tsk, ptime, hardns, SIGKILL, false, true)) return; /* At the soft limit, send a SIGXCPU every second */ - if (check_rlimit(ptime, softns, SIGXCPU, false, false)) { + if (check_rlimit(tsk, ptime, softns, SIGXCPU, false, false)) { sig->rlim[RLIMIT_CPU].rlim_cur = soft + 1; softns += NSEC_PER_SEC; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1135 @ static inline bool fastpath_timer_check( * already updated our counts. We need to check if any timers fire now. * Interrupts are disabled. */ -void run_posix_cpu_timers(void) +static void __run_posix_cpu_timers(struct task_struct *tsk) { - struct task_struct *tsk = current; struct k_itimer *timer, *next; unsigned long flags; + spinlock_t *expiry_lock; LIST_HEAD(firing); - lockdep_assert_irqs_disabled(); - /* * The fast path checks that there are no expired thread or thread * group timers. If that's so, just return. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1149 @ void run_posix_cpu_timers(void) if (!fastpath_timer_check(tsk)) return; - if (!lock_task_sighand(tsk, &flags)) + expiry_lock = this_cpu_ptr(&cpu_timer_expiry_lock); + spin_lock(expiry_lock); + + if (!lock_task_sighand(tsk, &flags)) { + spin_unlock(expiry_lock); return; + } /* * Here we take off tsk->signal->cpu_timers[N] and * tsk->cpu_timers[N] all the timers that are firing, and @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1188 @ void run_posix_cpu_timers(void) list_del_init(&timer->it.cpu.elist); cpu_firing = timer->it.cpu.firing; timer->it.cpu.firing = 0; + timer->it.cpu.firing_cpu = -1; /* * The firing flag is -1 if we collided with a reset * of the timer, which already reported this @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1198 @ void run_posix_cpu_timers(void) cpu_timer_fire(timer); spin_unlock(&timer->it_lock); } + spin_unlock(expiry_lock); } +#ifdef CONFIG_PREEMPT_RT +#include <linux/kthread.h> +#include <linux/cpu.h> +DEFINE_PER_CPU(struct task_struct *, posix_timer_task); +DEFINE_PER_CPU(struct task_struct *, posix_timer_tasklist); +DEFINE_PER_CPU(bool, posix_timer_th_active); + +static void posix_cpu_kthread_fn(unsigned int cpu) +{ + struct task_struct *tsk = NULL; + struct task_struct *next = NULL; + + BUG_ON(per_cpu(posix_timer_task, cpu) != current); + + /* grab task list */ + raw_local_irq_disable(); + tsk = per_cpu(posix_timer_tasklist, cpu); + per_cpu(posix_timer_tasklist, cpu) = NULL; + raw_local_irq_enable(); + + /* its possible the list is empty, just return */ + if (!tsk) + return; + + /* Process task list */ + while (1) { + /* save next */ + next = tsk->posix_cputimers.posix_timer_list; + + /* run the task timers, clear its ptr and + * unreference it + */ + __run_posix_cpu_timers(tsk); + tsk->posix_cputimers.posix_timer_list = NULL; + put_task_struct(tsk); + + /* check if this is the last on the list */ + if (next == tsk) + break; + tsk = next; + } +} + +static inline int __fastpath_timer_check(struct task_struct *tsk) +{ + /* tsk == current, ensure it is safe to use ->signal/sighand */ + if (unlikely(tsk->exit_state)) + return 0; + + if (!expiry_cache_is_inactive(&tsk->posix_cputimers)) + return 1; + + if (!expiry_cache_is_inactive(&tsk->signal->posix_cputimers)) + return 1; + + return 0; +} + +void run_posix_cpu_timers(void) +{ + unsigned int cpu = 
smp_processor_id(); + struct task_struct *tsk = current; + struct task_struct *tasklist; + + BUG_ON(!irqs_disabled()); + + if (per_cpu(posix_timer_th_active, cpu) != true) + return; + + /* get per-cpu references */ + tasklist = per_cpu(posix_timer_tasklist, cpu); + + /* check to see if we're already queued */ + if (!tsk->posix_cputimers.posix_timer_list && __fastpath_timer_check(tsk)) { + get_task_struct(tsk); + if (tasklist) { + tsk->posix_cputimers.posix_timer_list = tasklist; + } else { + /* + * The list is terminated by a self-pointing + * task_struct + */ + tsk->posix_cputimers.posix_timer_list = tsk; + } + per_cpu(posix_timer_tasklist, cpu) = tsk; + + wake_up_process(per_cpu(posix_timer_task, cpu)); + } +} + +static int posix_cpu_kthread_should_run(unsigned int cpu) +{ + return __this_cpu_read(posix_timer_tasklist) != NULL; +} + +static void posix_cpu_kthread_park(unsigned int cpu) +{ + this_cpu_write(posix_timer_th_active, false); +} + +static void posix_cpu_kthread_unpark(unsigned int cpu) +{ + this_cpu_write(posix_timer_th_active, true); +} + +static void posix_cpu_kthread_setup(unsigned int cpu) +{ + struct sched_param sp; + + sp.sched_priority = MAX_RT_PRIO - 1; + sched_setscheduler_nocheck(current, SCHED_FIFO, &sp); + posix_cpu_kthread_unpark(cpu); +} + +static struct smp_hotplug_thread posix_cpu_thread = { + .store = &posix_timer_task, + .thread_should_run = posix_cpu_kthread_should_run, + .thread_fn = posix_cpu_kthread_fn, + .thread_comm = "posixcputmr/%u", + .setup = posix_cpu_kthread_setup, + .park = posix_cpu_kthread_park, + .unpark = posix_cpu_kthread_unpark, +}; + +static int __init posix_cpu_thread_init(void) +{ + /* Start one for boot CPU. */ + unsigned long cpu; + int ret; + + /* init the per-cpu posix_timer_tasklets */ + for_each_possible_cpu(cpu) + per_cpu(posix_timer_tasklist, cpu) = NULL; + + ret = smpboot_register_percpu_thread(&posix_cpu_thread); + WARN_ON(ret); + + return 0; +} +early_initcall(posix_cpu_thread_init); + +#else /* CONFIG_PREEMPT_RT */ +void run_posix_cpu_timers(void) +{ + lockdep_assert_irqs_disabled(); + __run_posix_cpu_timers(current); +} +#endif /* CONFIG_PREEMPT_RT */ + /* * Set one of the process-wide special case CPU timers or RLIMIT_CPU. * The tsk->sighand->siglock must be held by the caller. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1461 @ static int do_cpu_nanosleep(const clocki spin_unlock_irq(&timer.it_lock); while (error == TIMER_RETRY) { + + posix_cpu_wait_running(&timer); /* * We need to handle case when timer was or is in the * middle of firing. 
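/*
 * Sketch of the list discipline used by the posixcputmr/%u thread above (a
 * generic illustration with a made-up node type, not kernel code): the
 * per-CPU task list is singly linked and terminated by an entry whose link
 * points to itself, so the walk needs no separate sentinel.
 */
struct example_node {
	struct example_node *next;	/* last node: next points to the node itself */
};

static void example_walk(struct example_node *head)
{
	struct example_node *node = head, *next;

	while (node) {
		next = node->next;
		/* ... process node ... */
		node->next = NULL;
		if (next == node)	/* self-pointer marks the end of the list */
			break;
		node = next;
	}
}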
In other cases we already freed @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1581 @ const struct k_clock clock_posix_cpu = { .timer_del = posix_cpu_timer_del, .timer_get = posix_cpu_timer_get, .timer_rearm = posix_cpu_timer_rearm, + .timer_wait_running = posix_cpu_wait_running, }; const struct k_clock clock_process = { Index: linux-5.4.5-rt3/kernel/time/tick-common.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/tick-common.c +++ linux-5.4.5-rt3/kernel/time/tick-common.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:86 @ int tick_is_oneshot_available(void) static void tick_periodic(int cpu) { if (tick_do_timer_cpu == cpu) { - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); /* Keep track of the next tick event */ tick_next_period = ktime_add(tick_next_period, tick_period); do_timer(1); - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:166 @ void tick_setup_periodic(struct clock_ev ktime_t next; do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); next = tick_next_period; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT); Index: linux-5.4.5-rt3/kernel/time/tick-sched.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/tick-sched.c +++ linux-5.4.5-rt3/kernel/time/tick-sched.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:68 @ static void tick_do_update_jiffies64(kti return; /* Reevaluate with jiffies_lock held */ - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); delta = ktime_sub(now, last_jiffies_update); if (delta >= tick_period) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:95 @ static void tick_do_update_jiffies64(kti /* Keep the tick_next_period variable up to date */ tick_next_period = ktime_add(last_jiffies_update, tick_period); } else { - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); return; } - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:111 @ static ktime_t tick_init_jiffy_update(vo { ktime_t period; - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); /* Did we start the jiffies update yet ? 
*/ if (last_jiffies_update == 0) last_jiffies_update = tick_next_period; period = last_jiffies_update; - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); return period; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:242 @ static void nohz_full_kick_func(struct i static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = { .func = nohz_full_kick_func, + .flags = IRQ_WORK_HARD_IRQ, }; /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:674 @ static ktime_t tick_nohz_next_event(stru /* Read jiffies and the time when jiffies were updated last */ do { - seq = read_seqbegin(&jiffies_lock); + seq = read_seqcount_begin(&jiffies_seq); basemono = last_jiffies_update; basejiff = jiffies; - } while (read_seqretry(&jiffies_lock, seq)); + } while (read_seqcount_retry(&jiffies_seq, seq)); ts->last_jiffies = basejiff; ts->timer_expires_base = basemono; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:907 @ static bool can_stop_idle_tick(int cpu, return false; if (unlikely(local_softirq_pending())) { - static int ratelimit; - - if (ratelimit < 10 && - (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) { - pr_warn("NOHZ: local_softirq_pending %02x\n", - (unsigned int) local_softirq_pending()); - ratelimit++; - } + softirq_check_pending_idle(); return false; } Index: linux-5.4.5-rt3/kernel/time/timekeeping.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/timekeeping.c +++ linux-5.4.5-rt3/kernel/time/timekeeping.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2400 @ EXPORT_SYMBOL(hardpps); */ void xtime_update(unsigned long ticks) { - write_seqlock(&jiffies_lock); + raw_spin_lock(&jiffies_lock); + write_seqcount_begin(&jiffies_seq); do_timer(ticks); - write_sequnlock(&jiffies_lock); + write_seqcount_end(&jiffies_seq); + raw_spin_unlock(&jiffies_lock); update_wall_time(); } Index: linux-5.4.5-rt3/kernel/time/timekeeping.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/timekeeping.h +++ linux-5.4.5-rt3/kernel/time/timekeeping.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:28 @ static inline void sched_clock_resume(vo extern void do_timer(unsigned long ticks); extern void update_wall_time(void); -extern seqlock_t jiffies_lock; +extern raw_spinlock_t jiffies_lock; +extern seqcount_t jiffies_seq; #define CS_NAME_LEN 32 Index: linux-5.4.5-rt3/kernel/time/timer.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/time/timer.c +++ linux-5.4.5-rt3/kernel/time/timer.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1786 @ static __latent_entropy void run_timer_s { struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); + irq_work_tick_soft(); + __run_timers(base); if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF])); Index: linux-5.4.5-rt3/kernel/trace/trace.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/trace/trace.c +++ linux-5.4.5-rt3/kernel/trace/trace.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2344 @ tracing_generic_entry_update(struct trac struct 
task_struct *tsk = current; entry->preempt_count = pc & 0xff; + entry->preempt_lazy_count = preempt_lazy_count(); entry->pid = (tsk) ? tsk->pid : 0; entry->type = type; entry->flags = @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2356 @ tracing_generic_entry_update(struct trac ((pc & NMI_MASK ) ? TRACE_FLAG_NMI : 0) | ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) | ((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) | - (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) | + (tif_need_resched_now() ? TRACE_FLAG_NEED_RESCHED : 0) | + (need_resched_lazy() ? TRACE_FLAG_NEED_RESCHED_LAZY : 0) | (test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0); + + entry->migrate_disable = (tsk) ? __migrate_disabled(tsk) & 0xFF : 0; } EXPORT_SYMBOL_GPL(tracing_generic_entry_update); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3586 @ unsigned long trace_total_entries(struct static void print_lat_help_header(struct seq_file *m) { - seq_puts(m, "# _------=> CPU# \n" - "# / _-----=> irqs-off \n" - "# | / _----=> need-resched \n" - "# || / _---=> hardirq/softirq \n" - "# ||| / _--=> preempt-depth \n" - "# |||| / delay \n" - "# cmd pid ||||| time | caller \n" - "# \\ / ||||| \\ | / \n"); + seq_puts(m, "# _--------=> CPU# \n" + "# / _-------=> irqs-off \n" + "# | / _------=> need-resched \n" + "# || / _-----=> need-resched_lazy \n" + "# ||| / _----=> hardirq/softirq \n" + "# |||| / _---=> preempt-depth \n" + "# ||||| / _--=> preempt-lazy-depth\n" + "# |||||| / _-=> migrate-disable \n" + "# ||||||| / delay \n" + "# cmd pid |||||||| time | caller \n" + "# \\ / |||||||| \\ | / \n"); } static void print_event_info(struct trace_buffer *buf, struct seq_file *m) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3632 @ static void print_func_help_header_irq(s seq_printf(m, "# %.*s _-----=> irqs-off\n", prec, space); seq_printf(m, "# %.*s / _----=> need-resched\n", prec, space); - seq_printf(m, "# %.*s| / _---=> hardirq/softirq\n", prec, space); - seq_printf(m, "# %.*s|| / _--=> preempt-depth\n", prec, space); - seq_printf(m, "# %.*s||| / delay\n", prec, space); - seq_printf(m, "# TASK-PID %.*sCPU# |||| TIMESTAMP FUNCTION\n", prec, " TGID "); - seq_printf(m, "# | | %.*s | |||| | |\n", prec, " | "); + seq_printf(m, "# %.*s| / _----=> need-resched\n", prec, space); + seq_printf(m, "# %.*s|| / _---=> hardirq/softirq\n", prec, space); + seq_printf(m, "# %.*s||| / _--=> preempt-depth\n", prec, space); + seq_printf(m, "# %.*s||||/ delay\n", prec, space); + seq_printf(m, "# TASK-PID %.*sCPU# ||||| TIMESTAMP FUNCTION\n", prec, " TGID "); + seq_printf(m, "# | | %.*s | ||||| | |\n", prec, " | "); } void @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3671 @ print_trace_header(struct seq_file *m, s "desktop", #elif defined(CONFIG_PREEMPT) "preempt", +#elif defined(CONFIG_PREEMPT_RT) + "preempt_rt", #else "unknown", #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:8940 @ void ftrace_dump(enum ftrace_dump_mode o tracing_off(); local_irq_save(flags); - printk_nmi_direct_enter(); /* Simulate the iterator */ trace_init_global_iter(&iter); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:9016 @ void ftrace_dump(enum ftrace_dump_mode o atomic_dec(&per_cpu_ptr(iter.trace_buffer->data, cpu)->disabled); } atomic_dec(&dump_running); - 
printk_nmi_direct_exit(); local_irq_restore(flags); } EXPORT_SYMBOL_GPL(ftrace_dump); Index: linux-5.4.5-rt3/kernel/trace/trace.h =================================================================== --- linux-5.4.5-rt3.orig/kernel/trace/trace.h +++ linux-5.4.5-rt3/kernel/trace/trace.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:129 @ struct kretprobe_trace_entry_head { * NEED_RESCHED - reschedule is requested * HARDIRQ - inside an interrupt handler * SOFTIRQ - inside a softirq handler + * NEED_RESCHED_LAZY - lazy reschedule is requested */ enum trace_flag_type { TRACE_FLAG_IRQS_OFF = 0x01, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:139 @ enum trace_flag_type { TRACE_FLAG_SOFTIRQ = 0x10, TRACE_FLAG_PREEMPT_RESCHED = 0x20, TRACE_FLAG_NMI = 0x40, + TRACE_FLAG_NEED_RESCHED_LAZY = 0x80, }; #define TRACE_BUF_SIZE 1024 Index: linux-5.4.5-rt3/kernel/trace/trace_events.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/trace/trace_events.c +++ linux-5.4.5-rt3/kernel/trace/trace_events.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:184 @ static int trace_define_common_fields(vo __common_field(unsigned char, flags); __common_field(unsigned char, preempt_count); __common_field(int, pid); + __common_field(unsigned short, migrate_disable); + __common_field(unsigned short, padding); return ret; } Index: linux-5.4.5-rt3/kernel/trace/trace_output.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/trace/trace_output.c +++ linux-5.4.5-rt3/kernel/trace/trace_output.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:429 @ int trace_print_lat_fmt(struct trace_seq { char hardsoft_irq; char need_resched; + char need_resched_lazy; char irqs_off; int hardirq; int softirq; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:460 @ int trace_print_lat_fmt(struct trace_seq break; } + need_resched_lazy = + (entry->flags & TRACE_FLAG_NEED_RESCHED_LAZY) ? 'L' : '.'; + hardsoft_irq = (nmi && hardirq) ? 'Z' : nmi ? 'z' : @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:471 @ int trace_print_lat_fmt(struct trace_seq softirq ? 's' : '.' ; - trace_seq_printf(s, "%c%c%c", - irqs_off, need_resched, hardsoft_irq); + trace_seq_printf(s, "%c%c%c%c", + irqs_off, need_resched, need_resched_lazy, + hardsoft_irq); if (entry->preempt_count) trace_seq_printf(s, "%x", entry->preempt_count); else trace_seq_putc(s, '.'); + if (entry->preempt_lazy_count) + trace_seq_printf(s, "%x", entry->preempt_lazy_count); + else + trace_seq_putc(s, '.'); + + if (entry->migrate_disable) + trace_seq_printf(s, "%x", entry->migrate_disable); + else + trace_seq_putc(s, '.'); + return !trace_seq_has_overflowed(s); } Index: linux-5.4.5-rt3/kernel/up.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/up.c +++ linux-5.4.5-rt3/kernel/up.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:71 @ EXPORT_SYMBOL(on_each_cpu_mask); * Preemption is disabled here to make sure the cond_func is called under the * same condtions in UP and SMP. 
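/*
 * Decoding sketch for the extra latency-format column added above (the helper
 * is made up; the flag value matches the TRACE_FLAG_NEED_RESCHED_LAZY
 * definition introduced in trace.h): an 'L' in the new column means a lazy
 * reschedule was pending when the event was recorded.
 */
static char example_lazy_resched_char(unsigned char flags)
{
	return (flags & 0x80) ? 'L' : '.';	/* TRACE_FLAG_NEED_RESCHED_LAZY */
}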
*/ -void on_each_cpu_cond_mask(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags, const struct cpumask *mask) +void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait, const struct cpumask *mask) { unsigned long flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:86 @ void on_each_cpu_cond_mask(bool (*cond_f } EXPORT_SYMBOL(on_each_cpu_cond_mask); -void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info), - smp_call_func_t func, void *info, bool wait, - gfp_t gfp_flags) +void on_each_cpu_cond(smp_cond_func_t cond_func, smp_call_func_t func, + void *info, bool wait) { - on_each_cpu_cond_mask(cond_func, func, info, wait, gfp_flags, NULL); + on_each_cpu_cond_mask(cond_func, func, info, wait, NULL); } EXPORT_SYMBOL(on_each_cpu_cond); Index: linux-5.4.5-rt3/kernel/workqueue.c =================================================================== --- linux-5.4.5-rt3.orig/kernel/workqueue.c +++ linux-5.4.5-rt3/kernel/workqueue.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:53 @ #include <linux/uaccess.h> #include <linux/sched/isolation.h> #include <linux/nmi.h> +#include <linux/swait.h> #include "workqueue_internal.h" @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:149 @ enum { /* struct worker is defined in workqueue_internal.h */ struct worker_pool { - spinlock_t lock; /* the pool lock */ + raw_spinlock_t lock; /* the pool lock */ int cpu; /* I: the associated cpu */ int node; /* I: the associated node ID */ int id; /* I: pool ID */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:304 @ static struct workqueue_attrs *wq_update static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */ static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */ -static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */ -static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */ +static DEFINE_RAW_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */ +static DECLARE_SWAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */ static LIST_HEAD(workqueues); /* PR: list of all workqueues */ static bool workqueue_freezing; /* PL: have wqs started freezing? */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:836 @ static struct worker *first_idle_worker( * Wake up the first idle worker of @pool. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void wake_up_worker(struct worker_pool *pool) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:889 @ void wq_worker_sleeping(struct task_stru return; worker->sleeping = 1; - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* * The counterpart of the following dec_and_test, implied mb, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:908 @ void wq_worker_sleeping(struct task_stru if (next) wake_up_process(next->task); } - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:919 @ void wq_worker_sleeping(struct task_stru * the scheduler to get a worker's last known identity. 
* * CONTEXT: - * spin_lock_irq(rq->lock) + * raw_spin_lock_irq(rq->lock) * * This function is called during schedule() when a kworker is going * to sleep. It's used by psi to identify aggregation workers during @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:950 @ work_func_t wq_worker_last_func(struct t * Set @flags in @worker->flags and adjust nr_running accordingly. * * CONTEXT: - * spin_lock_irq(pool->lock) + * raw_spin_lock_irq(pool->lock) */ static inline void worker_set_flags(struct worker *worker, unsigned int flags) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:975 @ static inline void worker_set_flags(stru * Clear @flags in @worker->flags and adjust nr_running accordingly. * * CONTEXT: - * spin_lock_irq(pool->lock) + * raw_spin_lock_irq(pool->lock) */ static inline void worker_clr_flags(struct worker *worker, unsigned int flags) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1023 @ static inline void worker_clr_flags(stru * actually occurs, it should be easy to locate the culprit work function. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). * * Return: * Pointer to worker which is executing @work if found, %NULL @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1058 @ static struct worker *find_worker_execut * nested inside outer list_for_each_entry_safe(). * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void move_linked_works(struct work_struct *work, struct list_head *head, struct work_struct **nextp) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1136 @ static void put_pwq_unlocked(struct pool * As both pwqs and pools are RCU protected, the * following lock operations are safe. */ - spin_lock_irq(&pwq->pool->lock); + raw_spin_lock_irq(&pwq->pool->lock); put_pwq(pwq); - spin_unlock_irq(&pwq->pool->lock); + raw_spin_unlock_irq(&pwq->pool->lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1171 @ static void pwq_activate_first_delayed(s * decrement nr_in_flight of its pwq and handle workqueue flushing. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1270 @ static int try_to_grab_pending(struct wo if (!pool) goto fail; - spin_lock(&pool->lock); + raw_spin_lock(&pool->lock); /* * work->data is guaranteed to point to pwq only while the work * item is queued on pwq->wq, and both updating work->data to point @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1299 @ static int try_to_grab_pending(struct wo /* work->data points to pwq iff queued, point to pool */ set_work_pool_and_keep_pending(work, pool->id); - spin_unlock(&pool->lock); + raw_spin_unlock(&pool->lock); rcu_read_unlock(); return 1; } - spin_unlock(&pool->lock); + raw_spin_unlock(&pool->lock); fail: rcu_read_unlock(); local_irq_restore(*flags); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1324 @ fail: * work_struct flags. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). 
*/ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work, struct list_head *head, unsigned int extra_flags) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1439 @ retry: if (last_pool && last_pool != pwq->pool) { struct worker *worker; - spin_lock(&last_pool->lock); + raw_spin_lock(&last_pool->lock); worker = find_worker_executing_work(last_pool, work); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1447 @ retry: pwq = worker->current_pwq; } else { /* meh... not running there, queue here */ - spin_unlock(&last_pool->lock); - spin_lock(&pwq->pool->lock); + raw_spin_unlock(&last_pool->lock); + raw_spin_lock(&pwq->pool->lock); } } else { - spin_lock(&pwq->pool->lock); + raw_spin_lock(&pwq->pool->lock); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1464 @ retry: */ if (unlikely(!pwq->refcnt)) { if (wq->flags & WQ_UNBOUND) { - spin_unlock(&pwq->pool->lock); + raw_spin_unlock(&pwq->pool->lock); cpu_relax(); goto retry; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1496 @ retry: insert_work(pwq, work, worklist, work_flags); out: - spin_unlock(&pwq->pool->lock); + raw_spin_unlock(&pwq->pool->lock); rcu_read_unlock(); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1616 @ EXPORT_SYMBOL_GPL(queue_work_node); void delayed_work_timer_fn(struct timer_list *t) { struct delayed_work *dwork = from_timer(dwork, t, timer); + unsigned long flags; - /* should have been called from irqsafe timer with irq already off */ + local_irq_save(flags); __queue_work(dwork->cpu, dwork->wq, &dwork->work); + local_irq_restore(flags); } EXPORT_SYMBOL(delayed_work_timer_fn); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1767 @ EXPORT_SYMBOL(queue_rcu_work); * necessary. * * LOCKING: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void worker_enter_idle(struct worker *worker) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1807 @ static void worker_enter_idle(struct wor * @worker is leaving idle state. Update stats. * * LOCKING: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void worker_leave_idle(struct worker *worker) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1945 @ static struct worker *create_worker(stru worker_attach_to_pool(worker, pool); /* start the newly created worker */ - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); worker->pool->nr_workers++; worker_enter_idle(worker); wake_up_process(worker->task); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); return worker; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1968 @ fail: * be idle. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). 
*/ static void destroy_worker(struct worker *worker) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1994 @ static void idle_worker_timeout(struct t { struct worker_pool *pool = from_timer(pool, t, idle_timer); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); while (too_many_workers(pool)) { struct worker *worker; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2012 @ static void idle_worker_timeout(struct t destroy_worker(worker); } - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } static void send_mayday(struct work_struct *work) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2043 @ static void pool_mayday_timeout(struct t struct worker_pool *pool = from_timer(pool, t, mayday_timer); struct work_struct *work; - spin_lock_irq(&pool->lock); - spin_lock(&wq_mayday_lock); /* for wq->maydays */ + raw_spin_lock_irq(&pool->lock); + raw_spin_lock(&wq_mayday_lock); /* for wq->maydays */ if (need_to_create_worker(pool)) { /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2057 @ static void pool_mayday_timeout(struct t send_mayday(work); } - spin_unlock(&wq_mayday_lock); - spin_unlock_irq(&pool->lock); + raw_spin_unlock(&wq_mayday_lock); + raw_spin_unlock_irq(&pool->lock); mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2077 @ static void pool_mayday_timeout(struct t * may_start_working() %true. * * LOCKING: - * spin_lock_irq(pool->lock) which may be released and regrabbed + * raw_spin_lock_irq(pool->lock) which may be released and regrabbed * multiple times. Does GFP_KERNEL allocations. Called only from * manager. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2086 @ __releases(&pool->lock) __acquires(&pool->lock) { restart: - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); /* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */ mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2102 @ restart: } del_timer_sync(&pool->mayday_timer); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* * This is necessary even after a new worker was just successfully * created as @pool->lock was dropped and the new worker might have @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2125 @ restart: * and may_start_working() is true. * * CONTEXT: - * spin_lock_irq(pool->lock) which may be released and regrabbed + * raw_spin_lock_irq(pool->lock) which may be released and regrabbed * multiple times. Does GFP_KERNEL allocations. * * Return: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2148 @ static bool manage_workers(struct worker pool->manager = NULL; pool->flags &= ~POOL_MANAGER_ACTIVE; - wake_up(&wq_manager_wait); + swake_up_one(&wq_manager_wait); return true; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2164 @ static bool manage_workers(struct worker * call this function to process a work. * * CONTEXT: - * spin_lock_irq(pool->lock) which is released and regrabbed. 
+ * raw_spin_lock_irq(pool->lock) which is released and regrabbed. */ static void process_one_work(struct worker *worker, struct work_struct *work) __releases(&pool->lock) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2246 @ __acquires(&pool->lock) */ set_work_pool_and_clear_pending(work, pool->id); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); lock_map_acquire(&pwq->wq->lockdep_map); lock_map_acquire(&lockdep_map); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2292 @ __acquires(&pool->lock) } /* - * The following prevents a kworker from hogging CPU on !PREEMPT + * The following prevents a kworker from hogging CPU on !PREEMPTION * kernels, where a requeueing work item waiting for something to * happen could deadlock with stop_machine as such work item could * indefinitely requeue itself while all other CPUs are trapped in @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2301 @ __acquires(&pool->lock) */ cond_resched(); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* clear cpu intensive status */ if (unlikely(cpu_intensive)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2327 @ __acquires(&pool->lock) * fetches a work from the top and executes it. * * CONTEXT: - * spin_lock_irq(pool->lock) which may be released and regrabbed + * raw_spin_lock_irq(pool->lock) which may be released and regrabbed * multiple times. */ static void process_scheduled_works(struct worker *worker) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2369 @ static int worker_thread(void *__worker) /* tell the scheduler that this is a workqueue worker */ set_pf_worker(true); woke_up: - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* am I supposed to die? */ if (unlikely(worker->flags & WORKER_DIE)) { - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); WARN_ON_ONCE(!list_empty(&worker->entry)); set_pf_worker(false); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2439 @ sleep: */ worker_enter_idle(worker); __set_current_state(TASK_IDLE); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); schedule(); goto woke_up; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2493 @ repeat: should_stop = kthread_should_stop(); /* see whether any pwq is asking for help */ - spin_lock_irq(&wq_mayday_lock); + raw_spin_lock_irq(&wq_mayday_lock); while (!list_empty(&wq->maydays)) { struct pool_workqueue *pwq = list_first_entry(&wq->maydays, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2505 @ repeat: __set_current_state(TASK_RUNNING); list_del_init(&pwq->mayday_node); - spin_unlock_irq(&wq_mayday_lock); + raw_spin_unlock_irq(&wq_mayday_lock); worker_attach_to_pool(rescuer, pool); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* * Slurp in all works issued via this workqueue and @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2538 @ repeat: * incur MAYDAY_INTERVAL delay inbetween. */ if (need_to_create_worker(pool)) { - spin_lock(&wq_mayday_lock); + raw_spin_lock(&wq_mayday_lock); /* * Queue iff we aren't racing destruction * and somebody else hasn't queued it already. 
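/*
 * Pattern sketch for the lock conversions above (the lock and function are
 * made up): spinlock_t becomes a sleeping lock on PREEMPT_RT, while the
 * worker-pool state is also touched from scheduler hooks and hard-irq-off
 * paths that must not sleep, so it moves to raw_spinlock_t and its critical
 * sections stay short and non-blocking.
 */
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(example_pool_lock);

static void example_touch_pool_state(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&example_pool_lock, flags);
	/* short, non-sleeping critical section; valid even with irqs off on RT */
	raw_spin_unlock_irqrestore(&example_pool_lock, flags);
}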
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2547 @ repeat: get_pwq(pwq); list_add_tail(&pwq->mayday_node, &wq->maydays); } - spin_unlock(&wq_mayday_lock); + raw_spin_unlock(&wq_mayday_lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2565 @ repeat: if (need_more_worker(pool)) wake_up_worker(pool); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); worker_detach_from_pool(rescuer); - spin_lock_irq(&wq_mayday_lock); + raw_spin_lock_irq(&wq_mayday_lock); } - spin_unlock_irq(&wq_mayday_lock); + raw_spin_unlock_irq(&wq_mayday_lock); if (should_stop) { __set_current_state(TASK_RUNNING); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2652 @ static void wq_barrier_func(struct work_ * underneath us, so we can't reliably determine pwq from @target. * * CONTEXT: - * spin_lock_irq(pool->lock). + * raw_spin_lock_irq(pool->lock). */ static void insert_wq_barrier(struct pool_workqueue *pwq, struct wq_barrier *barr, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2739 @ static bool flush_workqueue_prep_pwqs(st for_each_pwq(pwq, wq) { struct worker_pool *pool = pwq->pool; - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); if (flush_color >= 0) { WARN_ON_ONCE(pwq->flush_color != -1); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2756 @ static bool flush_workqueue_prep_pwqs(st pwq->work_color = work_color; } - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } if (flush_color >= 0 && atomic_dec_and_test(&wq->nr_pwqs_to_flush)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2956 @ reflush: for_each_pwq(pwq, wq) { bool drained; - spin_lock_irq(&pwq->pool->lock); + raw_spin_lock_irq(&pwq->pool->lock); drained = !pwq->nr_active && list_empty(&pwq->delayed_works); - spin_unlock_irq(&pwq->pool->lock); + raw_spin_unlock_irq(&pwq->pool->lock); if (drained) continue; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2994 @ static bool start_flush_work(struct work return false; } - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* see the comment in try_to_grab_pending() with the same code */ pwq = get_work_pwq(work); if (pwq) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3010 @ static bool start_flush_work(struct work check_flush_dependency(pwq->wq, work); insert_wq_barrier(pwq, barr, work, worker); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); /* * Force a lock recursion deadlock when using flush_work() inside a @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3029 @ static bool start_flush_work(struct work rcu_read_unlock(); return true; already_gone: - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); rcu_read_unlock(); return false; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3422 @ static bool wqattrs_equal(const struct w */ static int init_worker_pool(struct worker_pool *pool) { - spin_lock_init(&pool->lock); + raw_spin_lock_init(&pool->lock); pool->id = -1; pool->cpu = -1; pool->node = NUMA_NO_NODE; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3548 @ static void 
put_unbound_pool(struct work * @pool's workers from blocking on attach_mutex. We're the last * manager and @pool gets freed with the flag set. */ - spin_lock_irq(&pool->lock); - wait_event_lock_irq(wq_manager_wait, + raw_spin_lock_irq(&pool->lock); + swait_event_lock_irq(wq_manager_wait, !(pool->flags & POOL_MANAGER_ACTIVE), pool->lock); pool->flags |= POOL_MANAGER_ACTIVE; while ((worker = first_idle_worker(pool))) destroy_worker(worker); WARN_ON(pool->nr_workers || pool->nr_idle); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); mutex_lock(&wq_pool_attach_mutex); if (!list_empty(&pool->workers)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3712 @ static void pwq_adjust_max_active(struct return; /* this function can be called during early boot w/ irq disabled */ - spin_lock_irqsave(&pwq->pool->lock, flags); + raw_spin_lock_irqsave(&pwq->pool->lock, flags); /* * During [un]freezing, the caller is responsible for ensuring that @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3735 @ static void pwq_adjust_max_active(struct pwq->max_active = 0; } - spin_unlock_irqrestore(&pwq->pool->lock, flags); + raw_spin_unlock_irqrestore(&pwq->pool->lock, flags); } /* initialize newly alloced @pwq which is associated with @wq and @pool */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4137 @ static void wq_update_unbound_numa(struc use_dfl_pwq: mutex_lock(&wq->mutex); - spin_lock_irq(&wq->dfl_pwq->pool->lock); + raw_spin_lock_irq(&wq->dfl_pwq->pool->lock); get_pwq(wq->dfl_pwq); - spin_unlock_irq(&wq->dfl_pwq->pool->lock); + raw_spin_unlock_irq(&wq->dfl_pwq->pool->lock); old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq); out_unlock: mutex_unlock(&wq->mutex); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4352 @ void destroy_workqueue(struct workqueue_ struct worker *rescuer = wq->rescuer; /* this prevents new queueing */ - spin_lock_irq(&wq_mayday_lock); + raw_spin_lock_irq(&wq_mayday_lock); wq->rescuer = NULL; - spin_unlock_irq(&wq_mayday_lock); + raw_spin_unlock_irq(&wq_mayday_lock); /* rescuer will empty maydays list before exiting */ kthread_stop(rescuer->task); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4550 @ unsigned int work_busy(struct work_struc rcu_read_lock(); pool = get_work_pool(work); if (pool) { - spin_lock_irqsave(&pool->lock, flags); + raw_spin_lock_irqsave(&pool->lock, flags); if (find_worker_executing_work(pool, work)) ret |= WORK_BUSY_RUNNING; - spin_unlock_irqrestore(&pool->lock, flags); + raw_spin_unlock_irqrestore(&pool->lock, flags); } rcu_read_unlock(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4760 @ void show_workqueue_state(void) pr_info("workqueue %s: flags=0x%x\n", wq->name, wq->flags); for_each_pwq(pwq, wq) { - spin_lock_irqsave(&pwq->pool->lock, flags); + raw_spin_lock_irqsave(&pwq->pool->lock, flags); if (pwq->nr_active || !list_empty(&pwq->delayed_works)) show_pwq(pwq); - spin_unlock_irqrestore(&pwq->pool->lock, flags); + raw_spin_unlock_irqrestore(&pwq->pool->lock, flags); /* * We could be printing a lot from atomic context, e.g. * sysrq-t -> show_workqueue_state(). 
Avoid triggering @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4777 @ void show_workqueue_state(void) struct worker *worker; bool first = true; - spin_lock_irqsave(&pool->lock, flags); + raw_spin_lock_irqsave(&pool->lock, flags); if (pool->nr_workers == pool->nr_idle) goto next_pool; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4796 @ void show_workqueue_state(void) } pr_cont("\n"); next_pool: - spin_unlock_irqrestore(&pool->lock, flags); + raw_spin_unlock_irqrestore(&pool->lock, flags); /* * We could be printing a lot from atomic context, e.g. * sysrq-t -> show_workqueue_state(). Avoid triggering @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4826 @ void wq_worker_comm(char *buf, size_t si struct worker_pool *pool = worker->pool; if (pool) { - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* * ->desc tracks information (wq name or * set_worker_desc()) for the latest execution. If @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4840 @ void wq_worker_comm(char *buf, size_t si scnprintf(buf + off, size - off, "-%s", worker->desc); } - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4871 @ static void unbind_workers(int cpu) for_each_cpu_worker_pool(pool, cpu) { mutex_lock(&wq_pool_attach_mutex); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); /* * We've blocked all attach/detach operations. Make all workers @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4885 @ static void unbind_workers(int cpu) pool->flags |= POOL_DISASSOCIATED; - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); mutex_unlock(&wq_pool_attach_mutex); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4911 @ static void unbind_workers(int cpu) * worker blocking could lead to lengthy stalls. Kick off * unbound chain execution of currently pending work items. */ - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); wake_up_worker(pool); - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4940 @ static void rebind_workers(struct worker WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0); - spin_lock_irq(&pool->lock); + raw_spin_lock_irq(&pool->lock); pool->flags &= ~POOL_DISASSOCIATED; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4979 @ static void rebind_workers(struct worker WRITE_ONCE(worker->flags, worker_flags); } - spin_unlock_irq(&pool->lock); + raw_spin_unlock_irq(&pool->lock); } /** Index: linux-5.4.5-rt3/lib/Kconfig.debug =================================================================== --- linux-5.4.5-rt3.orig/lib/Kconfig.debug +++ linux-5.4.5-rt3/lib/Kconfig.debug @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ config CONSOLE_LOGLEVEL_QUIET will be used as the loglevel. 
IOW passing "quiet" will be the equivalent of passing "loglevel=<CONSOLE_LOGLEVEL_QUIET>" +config CONSOLE_LOGLEVEL_EMERGENCY + int "Emergency console loglevel (1-15)" + range 1 15 + default "5" + help + The loglevel to determine if a console message is an emergency + message. + + If supported by the console driver, emergency messages will be + flushed to the console immediately. This can cause significant system + latencies so the value should be set such that only significant + messages are classified as emergency messages. + + Setting a default here is equivalent to passing in + emergency_loglevel=<x> in the kernel bootargs. emergency_loglevel=<x> + continues to override whatever value is specified here as well. + config MESSAGE_LOGLEVEL_DEFAULT int "Default message log level (1-7)" range 1 7 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1073 @ config DEBUG_TIMEKEEPING config DEBUG_PREEMPT bool "Debug preemptible kernel" - depends on DEBUG_KERNEL && PREEMPT && TRACE_IRQFLAGS_SUPPORT + depends on DEBUG_KERNEL && PREEMPTION && TRACE_IRQFLAGS_SUPPORT default y help If you say Y here then the kernel will use a debug variant of the @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1251 @ config DEBUG_ATOMIC_SLEEP config DEBUG_LOCKING_API_SELFTESTS bool "Locking API boot-time self-tests" - depends on DEBUG_KERNEL + depends on DEBUG_KERNEL && !PREEMPT_RT help Say Y here if you want the kernel to run a short self-test during bootup. The self-test checks whether common types of locking bugs Index: linux-5.4.5-rt3/lib/Makefile =================================================================== --- linux-5.4.5-rt3.orig/lib/Makefile +++ linux-5.4.5-rt3/lib/Makefile @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:29 @ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o timerqueue.o xarray.o \ - idr.o extable.o \ + idr.o extable.o printk_ringbuffer.o \ sha1.o chacha.o irq_regs.o argv_split.o \ flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ Index: linux-5.4.5-rt3/lib/bust_spinlocks.c =================================================================== --- linux-5.4.5-rt3.orig/lib/bust_spinlocks.c +++ linux-5.4.5-rt3/lib/bust_spinlocks.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:29 @ void bust_spinlocks(int yes) unblank_screen(); #endif console_unblank(); - if (--oops_in_progress == 0) - wake_up_klogd(); + --oops_in_progress; } } Index: linux-5.4.5-rt3/lib/debugobjects.c =================================================================== --- linux-5.4.5-rt3.orig/lib/debugobjects.c +++ linux-5.4.5-rt3/lib/debugobjects.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:536 @ __debug_object_init(void *addr, struct d struct debug_obj *obj; unsigned long flags; - fill_pool(); +#ifdef CONFIG_PREEMPT_RT + if (preempt_count() == 0 && !irqs_disabled()) +#endif + fill_pool(); db = get_bucket((unsigned long) addr); Index: linux-5.4.5-rt3/lib/irq_poll.c =================================================================== --- linux-5.4.5-rt3.orig/lib/irq_poll.c +++ linux-5.4.5-rt3/lib/irq_poll.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:40 @ void irq_poll_sched(struct irq_poll *iop list_add_tail(&iop->list, this_cpu_ptr(&blk_cpu_iopoll)); 
raise_softirq_irqoff(IRQ_POLL_SOFTIRQ); local_irq_restore(flags); + preempt_check_resched_rt(); } EXPORT_SYMBOL(irq_poll_sched); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:76 @ void irq_poll_complete(struct irq_poll * local_irq_save(flags); __irq_poll_complete(iop); local_irq_restore(flags); + preempt_check_resched_rt(); } EXPORT_SYMBOL(irq_poll_complete); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:101 @ static void __latent_entropy irq_poll_so } local_irq_enable(); + preempt_check_resched_rt(); /* Even though interrupts have been re-enabled, this * access is safe because interrupts can only add new @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:139 @ static void __latent_entropy irq_poll_so __raise_softirq_irqoff(IRQ_POLL_SOFTIRQ); local_irq_enable(); + preempt_check_resched_rt(); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:203 @ static int irq_poll_cpu_dead(unsigned in this_cpu_ptr(&blk_cpu_iopoll)); __raise_softirq_irqoff(IRQ_POLL_SOFTIRQ); local_irq_enable(); + preempt_check_resched_rt(); return 0; } Index: linux-5.4.5-rt3/lib/locking-selftest.c =================================================================== --- linux-5.4.5-rt3.orig/lib/locking-selftest.c +++ linux-5.4.5-rt3/lib/locking-selftest.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:745 @ GENERATE_TESTCASE(init_held_rtmutex); #include "locking-selftest-spin-hardirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_spin) +#ifndef CONFIG_PREEMPT_RT + #include "locking-selftest-rlock-hardirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_hard_rlock) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:762 @ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_ #include "locking-selftest-wlock-softirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe1_soft_wlock) +#endif + #undef E1 #undef E2 +#ifndef CONFIG_PREEMPT_RT /* * Enabling hardirqs with a softirq-safe lock held: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:800 @ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A #undef E1 #undef E2 +#endif + /* * Enabling irqs with an irq-safe lock held: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:825 @ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2A #include "locking-selftest-spin-hardirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_spin) +#ifndef CONFIG_PREEMPT_RT + #include "locking-selftest-rlock-hardirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_hard_rlock) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:842 @ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B #include "locking-selftest-wlock-softirq.h" GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B_soft_wlock) +#endif + #undef E1 #undef E2 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:875 @ GENERATE_PERMUTATIONS_2_EVENTS(irqsafe2B #include "locking-selftest-spin-hardirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_spin) +#ifndef CONFIG_PREEMPT_RT + #include "locking-selftest-rlock-hardirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_hard_rlock) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:892 @ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_ #include 
"locking-selftest-wlock-softirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_soft_wlock) +#endif + #undef E1 #undef E2 #undef E3 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:927 @ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe3_ #include "locking-selftest-spin-hardirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_spin) +#ifndef CONFIG_PREEMPT_RT + #include "locking-selftest-rlock-hardirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_hard_rlock) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:944 @ GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_ #include "locking-selftest-wlock-softirq.h" GENERATE_PERMUTATIONS_3_EVENTS(irqsafe4_soft_wlock) +#endif + #undef E1 #undef E2 #undef E3 +#ifndef CONFIG_PREEMPT_RT + /* * read-lock / write-lock irq inversion. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1014 @ GENERATE_PERMUTATIONS_3_EVENTS(irq_inver #undef E2 #undef E3 +#endif + +#ifndef CONFIG_PREEMPT_RT + /* * read-lock / write-lock recursion that is actually safe. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1056 @ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_ #undef E2 #undef E3 +#endif + /* * read-lock / write-lock recursion that is unsafe. */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2088 @ void locking_selftest(void) printk(" --------------------------------------------------------------------------\n"); +#ifndef CONFIG_PREEMPT_RT /* * irq-context testcases: */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2101 @ void locking_selftest(void) DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion); // DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2); +#else + /* On -rt, we only do hardirq context test for raw spinlock */ + DO_TESTCASE_1B("hard-irqs-on + irq-safe-A", irqsafe1_hard_spin, 12); + DO_TESTCASE_1B("hard-irqs-on + irq-safe-A", irqsafe1_hard_spin, 21); + + DO_TESTCASE_1B("hard-safe-A + irqs-on", irqsafe2B_hard_spin, 12); + DO_TESTCASE_1B("hard-safe-A + irqs-on", irqsafe2B_hard_spin, 21); + + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 123); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 132); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 213); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 231); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 312); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #1", irqsafe3_hard_spin, 321); + + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 123); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 132); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 213); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 231); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 312); + DO_TESTCASE_1B("hard-safe-A + unsafe-B #2", irqsafe4_hard_spin, 321); +#endif ww_tests(); Index: linux-5.4.5-rt3/lib/nmi_backtrace.c =================================================================== --- linux-5.4.5-rt3.orig/lib/nmi_backtrace.c +++ linux-5.4.5-rt3/lib/nmi_backtrace.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:78 @ void nmi_trigger_cpumask_backtrace(const touch_softlockup_watchdog(); } - /* - * Force flush any remote buffers that might be stuck in IRQ 
context - * and therefore could not run their irq_work. - */ - printk_safe_flush(); - clear_bit_unlock(0, &backtrace_flag); put_cpu(); } Index: linux-5.4.5-rt3/lib/printk_ringbuffer.c =================================================================== --- /dev/null +++ linux-5.4.5-rt3/lib/printk_ringbuffer.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4 @ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/sched.h> +#include <linux/smp.h> +#include <linux/string.h> +#include <linux/errno.h> +#include <linux/printk_ringbuffer.h> + +#define PRB_SIZE(rb) (1 << rb->size_bits) +#define PRB_SIZE_BITMASK(rb) (PRB_SIZE(rb) - 1) +#define PRB_INDEX(rb, lpos) (lpos & PRB_SIZE_BITMASK(rb)) +#define PRB_WRAPS(rb, lpos) (lpos >> rb->size_bits) +#define PRB_WRAP_LPOS(rb, lpos, xtra) \ + ((PRB_WRAPS(rb, lpos) + xtra) << rb->size_bits) +#define PRB_DATA_SIZE(e) (e->size - sizeof(struct prb_entry)) +#define PRB_DATA_ALIGN sizeof(long) + +static bool __prb_trylock(struct prb_cpulock *cpu_lock, + unsigned int *cpu_store) +{ + unsigned long *flags; + unsigned int cpu; + + cpu = get_cpu(); + + *cpu_store = atomic_read(&cpu_lock->owner); + /* memory barrier to ensure the current lock owner is visible */ + smp_rmb(); + if (*cpu_store == -1) { + flags = per_cpu_ptr(cpu_lock->irqflags, cpu); + local_irq_save(*flags); + if (atomic_try_cmpxchg_acquire(&cpu_lock->owner, + cpu_store, cpu)) { + return true; + } + local_irq_restore(*flags); + } else if (*cpu_store == cpu) { + return true; + } + + put_cpu(); + return false; +} + +/* + * prb_lock: Perform a processor-reentrant spin lock. + * @cpu_lock: A pointer to the lock object. + * @cpu_store: A "flags" pointer to store lock status information. + * + * If no processor has the lock, the calling processor takes the lock and + * becomes the owner. If the calling processor is already the owner of the + * lock, this function succeeds immediately. If lock is locked by another + * processor, this function spins until the calling processor becomes the + * owner. + * + * It is safe to call this function from any context and state. + */ +void prb_lock(struct prb_cpulock *cpu_lock, unsigned int *cpu_store) +{ + for (;;) { + if (__prb_trylock(cpu_lock, cpu_store)) + break; + cpu_relax(); + } +} + +/* + * prb_unlock: Perform a processor-reentrant spin unlock. + * @cpu_lock: A pointer to the lock object. + * @cpu_store: A "flags" object storing lock status information. + * + * Release the lock. The calling processor must be the owner of the lock. + * + * It is safe to call this function from any context and state. 
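+ *
+ * Editor's sketch (not part of the original patch): a minimal lock/unlock
+ * pairing. "mylock" is a hypothetical struct prb_cpulock instance set up
+ * elsewhere; the value that prb_lock() stores through @cpu_store must be
+ * handed back unchanged to prb_unlock():
+ *
+ *	unsigned int cpu_store;
+ *
+ *	prb_lock(&mylock, &cpu_store);
+ *	... critical section; a nested prb_lock() on the same CPU sees
+ *	    itself as owner and returns immediately ...
+ *	prb_unlock(&mylock, cpu_store);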
+ */ +void prb_unlock(struct prb_cpulock *cpu_lock, unsigned int cpu_store) +{ + unsigned long *flags; + unsigned int cpu; + + cpu = atomic_read(&cpu_lock->owner); + atomic_set_release(&cpu_lock->owner, cpu_store); + + if (cpu_store == -1) { + flags = per_cpu_ptr(cpu_lock->irqflags, cpu); + local_irq_restore(*flags); + } + + put_cpu(); +} + +static struct prb_entry *to_entry(struct printk_ringbuffer *rb, + unsigned long lpos) +{ + char *buffer = rb->buffer; + buffer += PRB_INDEX(rb, lpos); + return (struct prb_entry *)buffer; +} + +static int calc_next(struct printk_ringbuffer *rb, unsigned long tail, + unsigned long lpos, int size, unsigned long *calced_next) +{ + unsigned long next_lpos; + int ret = 0; +again: + next_lpos = lpos + size; + if (next_lpos - tail > PRB_SIZE(rb)) + return -1; + + if (PRB_WRAPS(rb, lpos) != PRB_WRAPS(rb, next_lpos)) { + lpos = PRB_WRAP_LPOS(rb, next_lpos, 0); + ret |= 1; + goto again; + } + + *calced_next = next_lpos; + return ret; +} + +static bool push_tail(struct printk_ringbuffer *rb, unsigned long tail) +{ + unsigned long new_tail; + struct prb_entry *e; + unsigned long head; + + if (tail != atomic_long_read(&rb->tail)) + return true; + + e = to_entry(rb, tail); + if (e->size != -1) + new_tail = tail + e->size; + else + new_tail = PRB_WRAP_LPOS(rb, tail, 1); + + /* make sure the new tail does not overtake the head */ + head = atomic_long_read(&rb->head); + if (head - new_tail > PRB_SIZE(rb)) + return false; + + atomic_long_cmpxchg(&rb->tail, tail, new_tail); + return true; +} + +/* + * prb_commit: Commit a reserved entry to the ring buffer. + * @h: An entry handle referencing the data entry to commit. + * + * Commit data that has been reserved using prb_reserve(). Once the data + * block has been committed, it can be invalidated at any time. If a writer + * is interested in using the data after committing, the writer should make + * its own copy first or use the prb_iter_ reader functions to access the + * data in the ring buffer. + * + * It is safe to call this function from any context and state. + */ +void prb_commit(struct prb_handle *h) +{ + struct printk_ringbuffer *rb = h->rb; + bool changed = false; + struct prb_entry *e; + unsigned long head; + unsigned long res; + + for (;;) { + if (atomic_read(&rb->ctx) != 1) { + /* the interrupted context will fixup head */ + atomic_dec(&rb->ctx); + break; + } + /* assign sequence numbers before moving head */ + head = atomic_long_read(&rb->head); + res = atomic_long_read(&rb->reserve); + while (head != res) { + e = to_entry(rb, head); + if (e->size == -1) { + head = PRB_WRAP_LPOS(rb, head, 1); + continue; + } + while (atomic_long_read(&rb->lost)) { + atomic_long_dec(&rb->lost); + rb->seq++; + } + e->seq = ++rb->seq; + head += e->size; + changed = true; + } + atomic_long_set_release(&rb->head, res); + + atomic_dec(&rb->ctx); + + if (atomic_long_read(&rb->reserve) == res) + break; + atomic_inc(&rb->ctx); + } + + prb_unlock(rb->cpulock, h->cpu); + + if (changed) { + atomic_long_inc(&rb->wq_counter); + if (wq_has_sleeper(rb->wq)) { +#ifdef CONFIG_IRQ_WORK + irq_work_queue(rb->wq_work); +#else + if (!in_nmi()) + wake_up_interruptible_all(rb->wq); +#endif + } + } +} + +/* + * prb_reserve: Reserve an entry within a ring buffer. + * @h: An entry handle to be setup and reference an entry. + * @rb: A ring buffer to reserve data within. + * @size: The number of bytes to reserve. + * + * Reserve an entry of at least @size bytes to be used by the caller. 
If + * successful, the data region of the entry belongs to the caller and cannot + * be invalidated by any other task/context. For this reason, the caller + * should call prb_commit() as quickly as possible in order to avoid preventing + * other tasks/contexts from reserving data in the case that the ring buffer + * has wrapped. + * + * It is safe to call this function from any context and state. + * + * Returns a pointer to the reserved entry (and @h is setup to reference that + * entry) or NULL if it was not possible to reserve data. + */ +char *prb_reserve(struct prb_handle *h, struct printk_ringbuffer *rb, + unsigned int size) +{ + unsigned long tail, res1, res2; + int ret; + + if (size == 0) + return NULL; + size += sizeof(struct prb_entry); + size += PRB_DATA_ALIGN - 1; + size &= ~(PRB_DATA_ALIGN - 1); + if (size >= PRB_SIZE(rb)) + return NULL; + + h->rb = rb; + prb_lock(rb->cpulock, &h->cpu); + + atomic_inc(&rb->ctx); + + do { + for (;;) { + tail = atomic_long_read(&rb->tail); + res1 = atomic_long_read(&rb->reserve); + ret = calc_next(rb, tail, res1, size, &res2); + if (ret >= 0) + break; + if (!push_tail(rb, tail)) { + prb_commit(h); + return NULL; + } + } + } while (!atomic_long_try_cmpxchg_acquire(&rb->reserve, &res1, res2)); + + h->entry = to_entry(rb, res1); + + if (ret) { + /* handle wrap */ + h->entry->size = -1; + h->entry = to_entry(rb, PRB_WRAP_LPOS(rb, res2, 0)); + } + + h->entry->size = size; + + return &h->entry->data[0]; +} + +/* + * prb_iter_copy: Copy an iterator. + * @dest: The iterator to copy to. + * @src: The iterator to copy from. + * + * Make a deep copy of an iterator. This is particularly useful for making + * backup copies of an iterator in case a form of rewinding it needed. + * + * It is safe to call this function from any context and state. But + * note that this function is not atomic. Callers should not make copies + * to/from iterators that can be accessed by other tasks/contexts. + */ +void prb_iter_copy(struct prb_iterator *dest, struct prb_iterator *src) +{ + memcpy(dest, src, sizeof(*dest)); +} + +/* + * prb_iter_init: Initialize an iterator for a ring buffer. + * @iter: The iterator to initialize. + * @rb: A ring buffer to that @iter should iterate. + * @seq: The sequence number of the position preceding the first record. + * May be NULL. + * + * Initialize an iterator to be used with a specified ring buffer. If @seq + * is non-NULL, it will be set such that prb_iter_next() will provide a + * sequence value of "@seq + 1" if no records were missed. + * + * It is safe to call this function from any context and state. + */ +void prb_iter_init(struct prb_iterator *iter, struct printk_ringbuffer *rb, + u64 *seq) +{ + memset(iter, 0, sizeof(*iter)); + iter->rb = rb; + iter->lpos = PRB_INIT; + + if (!seq) + return; + + for (;;) { + struct prb_iterator tmp_iter; + int ret; + + prb_iter_copy(&tmp_iter, iter); + + ret = prb_iter_next(&tmp_iter, NULL, 0, seq); + if (ret < 0) + continue; + + if (ret == 0) + *seq = 0; + else + (*seq)--; + break; + } +} + +static bool is_valid(struct printk_ringbuffer *rb, unsigned long lpos) +{ + unsigned long head, tail; + + tail = atomic_long_read(&rb->tail); + head = atomic_long_read(&rb->head); + head -= tail; + lpos -= tail; + + if (lpos >= head) + return false; + return true; +} + +/* + * prb_iter_data: Retrieve the record data at the current position. + * @iter: Iterator tracking the current position. + * @buf: A buffer to store the data of the record. May be NULL. + * @size: The size of @buf. (Ignored if @buf is NULL.) 
+ * @seq: The sequence number of the record. May be NULL. + * + * If @iter is at a record, provide the data and/or sequence number of that + * record (if specified by the caller). + * + * It is safe to call this function from any context and state. + * + * Returns >=0 if the current record contains valid data (returns 0 if @buf + * is NULL or returns the size of the data block if @buf is non-NULL) or + * -EINVAL if @iter is now invalid. + */ +int prb_iter_data(struct prb_iterator *iter, char *buf, int size, u64 *seq) +{ + struct printk_ringbuffer *rb = iter->rb; + unsigned long lpos = iter->lpos; + unsigned int datsize = 0; + struct prb_entry *e; + + if (buf || seq) { + e = to_entry(rb, lpos); + if (!is_valid(rb, lpos)) + return -EINVAL; + /* memory barrier to ensure valid lpos */ + smp_rmb(); + if (buf) { + datsize = PRB_DATA_SIZE(e); + /* memory barrier to ensure load of datsize */ + smp_rmb(); + if (!is_valid(rb, lpos)) + return -EINVAL; + if (PRB_INDEX(rb, lpos) + datsize > + PRB_SIZE(rb) - PRB_DATA_ALIGN) { + return -EINVAL; + } + if (size > datsize) + size = datsize; + memcpy(buf, &e->data[0], size); + } + if (seq) + *seq = e->seq; + /* memory barrier to ensure loads of entry data */ + smp_rmb(); + } + + if (!is_valid(rb, lpos)) + return -EINVAL; + + return datsize; +} + +/* + * prb_iter_next: Advance to the next record. + * @iter: Iterator tracking the current position. + * @buf: A buffer to store the data of the next record. May be NULL. + * @size: The size of @buf. (Ignored if @buf is NULL.) + * @seq: The sequence number of the next record. May be NULL. + * + * If a next record is available, @iter is advanced and (if specified) + * the data and/or sequence number of that record are provided. + * + * It is safe to call this function from any context and state. + * + * Returns 1 if @iter was advanced, 0 if @iter is at the end of the list, or + * -EINVAL if @iter is now invalid. + */ +int prb_iter_next(struct prb_iterator *iter, char *buf, int size, u64 *seq) +{ + struct printk_ringbuffer *rb = iter->rb; + unsigned long next_lpos; + struct prb_entry *e; + unsigned int esize; + + if (iter->lpos == PRB_INIT) { + next_lpos = atomic_long_read(&rb->tail); + } else { + if (!is_valid(rb, iter->lpos)) + return -EINVAL; + /* memory barrier to ensure valid lpos */ + smp_rmb(); + e = to_entry(rb, iter->lpos); + esize = e->size; + /* memory barrier to ensure load of size */ + smp_rmb(); + if (!is_valid(rb, iter->lpos)) + return -EINVAL; + next_lpos = iter->lpos + esize; + } + if (next_lpos == atomic_long_read(&rb->head)) + return 0; + if (!is_valid(rb, next_lpos)) + return -EINVAL; + /* memory barrier to ensure valid lpos */ + smp_rmb(); + + iter->lpos = next_lpos; + e = to_entry(rb, iter->lpos); + esize = e->size; + /* memory barrier to ensure load of size */ + smp_rmb(); + if (!is_valid(rb, iter->lpos)) + return -EINVAL; + if (esize == -1) + iter->lpos = PRB_WRAP_LPOS(rb, iter->lpos, 1); + + if (prb_iter_data(iter, buf, size, seq) < 0) + return -EINVAL; + + return 1; +} + +/* + * prb_iter_wait_next: Advance to the next record, blocking if none available. + * @iter: Iterator tracking the current position. + * @buf: A buffer to store the data of the next record. May be NULL. + * @size: The size of @buf. (Ignored if @buf is NULL.) + * @seq: The sequence number of the next record. May be NULL. + * + * If a next record is already available, this function works like + * prb_iter_next(). Otherwise block interruptible until a next record is + * available. 
+ * + * When a next record is available, @iter is advanced and (if specified) + * the data and/or sequence number of that record are provided. + * + * This function might sleep. + * + * Returns 1 if @iter was advanced, -EINVAL if @iter is now invalid, or + * -ERESTARTSYS if interrupted by a signal. + */ +int prb_iter_wait_next(struct prb_iterator *iter, char *buf, int size, u64 *seq) +{ + unsigned long last_seen; + int ret; + + for (;;) { + last_seen = atomic_long_read(&iter->rb->wq_counter); + + ret = prb_iter_next(iter, buf, size, seq); + if (ret != 0) + break; + + ret = wait_event_interruptible(*iter->rb->wq, + last_seen != atomic_long_read(&iter->rb->wq_counter)); + if (ret < 0) + break; + } + + return ret; +} + +/* + * prb_iter_seek: Seek forward to a specific record. + * @iter: Iterator to advance. + * @seq: Record number to advance to. + * + * Advance @iter such that a following call to prb_iter_data() will provide + * the contents of the specified record. If a record is specified that does + * not yet exist, advance @iter to the end of the record list. + * + * Note that iterators cannot be rewound. So if a record is requested that + * exists but is previous to @iter in position, @iter is considered invalid. + * + * It is safe to call this function from any context and state. + * + * Returns 1 on succces, 0 if specified record does not yet exist (@iter is + * now at the end of the list), or -EINVAL if @iter is now invalid. + */ +int prb_iter_seek(struct prb_iterator *iter, u64 seq) +{ + u64 cur_seq; + int ret; + + /* first check if the iterator is already at the wanted seq */ + if (seq == 0) { + if (iter->lpos == PRB_INIT) + return 1; + else + return -EINVAL; + } + if (iter->lpos != PRB_INIT) { + if (prb_iter_data(iter, NULL, 0, &cur_seq) >= 0) { + if (cur_seq == seq) + return 1; + if (cur_seq > seq) + return -EINVAL; + } + } + + /* iterate to find the wanted seq */ + for (;;) { + ret = prb_iter_next(iter, NULL, 0, &cur_seq); + if (ret <= 0) + break; + + if (cur_seq == seq) + break; + + if (cur_seq > seq) { + ret = -EINVAL; + break; + } + } + + return ret; +} + +/* + * prb_buffer_size: Get the size of the ring buffer. + * @rb: The ring buffer to get the size of. + * + * Return the number of bytes used for the ring buffer entry storage area. + * Note that this area stores both entry header and entry data. Therefore + * this represents an upper bound to the amount of data that can be stored + * in the ring buffer. + * + * It is safe to call this function from any context and state. + * + * Returns the size in bytes of the entry storage area. + */ +int prb_buffer_size(struct printk_ringbuffer *rb) +{ + return PRB_SIZE(rb); +} + +/* + * prb_inc_lost: Increment the seq counter to signal a lost record. + * @rb: The ring buffer to increment the seq of. + * + * Increment the seq counter so that a seq number is intentially missing + * for the readers. This allows readers to identify that a record is + * missing. A writer will typically use this function if prb_reserve() + * fails. + * + * It is safe to call this function from any context and state. 
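+ *
+ * Editor's sketch (not part of the original patch) showing how the writer
+ * and reader sides above fit together. "rb", "msg", "len" and the 128-byte
+ * buffer are hypothetical placeholders supplied by the caller.
+ *
+ * Writer side:
+ *
+ *	struct prb_handle h;
+ *	char *buf;
+ *
+ *	buf = prb_reserve(&h, rb, len);
+ *	if (buf) {
+ *		memcpy(buf, msg, len);
+ *		prb_commit(&h);
+ *	} else {
+ *		prb_inc_lost(rb);
+ *	}
+ *
+ * Reader side, consuming records in order (a negative return from
+ * prb_iter_next() means the iterator was overtaken by writers and has to
+ * be re-initialized):
+ *
+ *	struct prb_iterator iter;
+ *	char text[128];
+ *	u64 seq;
+ *	int ret;
+ *
+ *	prb_iter_init(&iter, rb, NULL);
+ *	for (;;) {
+ *		ret = prb_iter_next(&iter, text, sizeof(text), &seq);
+ *		if (ret < 0)
+ *			prb_iter_init(&iter, rb, NULL);
+ *		else if (ret == 0)
+ *			break;
+ *		... otherwise text[] now holds one record tagged with seq ...
+ *	}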
+ */ +void prb_inc_lost(struct printk_ringbuffer *rb) +{ + atomic_long_inc(&rb->lost); +} Index: linux-5.4.5-rt3/lib/radix-tree.c =================================================================== --- linux-5.4.5-rt3.orig/lib/radix-tree.c +++ linux-5.4.5-rt3/lib/radix-tree.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:29 @ #include <linux/slab.h> #include <linux/string.h> #include <linux/xarray.h> - +#include <linux/locallock.h> /* * Radix tree node cache. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:75 @ struct radix_tree_preload { struct radix_tree_node *nodes; }; static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, }; +static DEFINE_LOCAL_IRQ_LOCK(radix_tree_preloads_lock); static inline struct radix_tree_node *entry_to_node(void *ptr) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:273 @ radix_tree_node_alloc(gfp_t gfp_mask, st * succeed in getting a node here (and never reach * kmem_cache_alloc) */ - rtp = this_cpu_ptr(&radix_tree_preloads); + rtp = &get_locked_var(radix_tree_preloads_lock, radix_tree_preloads); if (rtp->nr) { ret = rtp->nodes; rtp->nodes = ret->parent; rtp->nr--; } + put_locked_var(radix_tree_preloads_lock, radix_tree_preloads); /* * Update the allocation stack trace as this is more useful * for debugging. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:345 @ static __must_check int __radix_tree_pre */ gfp_mask &= ~__GFP_ACCOUNT; - preempt_disable(); + local_lock(radix_tree_preloads_lock); rtp = this_cpu_ptr(&radix_tree_preloads); while (rtp->nr < nr) { - preempt_enable(); + local_unlock(radix_tree_preloads_lock); node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask); if (node == NULL) goto out; - preempt_disable(); + local_lock(radix_tree_preloads_lock); rtp = this_cpu_ptr(&radix_tree_preloads); if (rtp->nr < nr) { node->parent = rtp->nodes; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:394 @ int radix_tree_maybe_preload(gfp_t gfp_m if (gfpflags_allow_blocking(gfp_mask)) return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE); /* Preloading doesn't help anything with this gfp mask, skip it */ - preempt_disable(); + local_lock(radix_tree_preloads_lock); return 0; } EXPORT_SYMBOL(radix_tree_maybe_preload); +void radix_tree_preload_end(void) +{ + local_unlock(radix_tree_preloads_lock); +} +EXPORT_SYMBOL(radix_tree_preload_end); + static unsigned radix_tree_load_root(const struct radix_tree_root *root, struct radix_tree_node **nodep, unsigned long *maxindex) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1489 @ EXPORT_SYMBOL(radix_tree_tagged); void idr_preload(gfp_t gfp_mask) { if (__radix_tree_preload(gfp_mask, IDR_PRELOAD_SIZE)) - preempt_disable(); + local_lock(radix_tree_preloads_lock); } EXPORT_SYMBOL(idr_preload); +void idr_preload_end(void) +{ + local_unlock(radix_tree_preloads_lock); +} +EXPORT_SYMBOL(idr_preload_end); + void __rcu **idr_get_free(struct radix_tree_root *root, struct radix_tree_iter *iter, gfp_t gfp, unsigned long max) Index: linux-5.4.5-rt3/lib/scatterlist.c =================================================================== --- linux-5.4.5-rt3.orig/lib/scatterlist.c +++ linux-5.4.5-rt3/lib/scatterlist.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:814 @ void 
sg_miter_stop(struct sg_mapping_ite flush_kernel_dcache_page(miter->page); if (miter->__flags & SG_MITER_ATOMIC) { - WARN_ON_ONCE(preemptible()); + WARN_ON_ONCE(!pagefault_disabled()); kunmap_atomic(miter->addr); } else kunmap(miter->page); Index: linux-5.4.5-rt3/lib/smp_processor_id.c =================================================================== --- linux-5.4.5-rt3.orig/lib/smp_processor_id.c +++ linux-5.4.5-rt3/lib/smp_processor_id.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:26 @ unsigned int check_preemption_disabled(c * Kernel threads bound to a single CPU can safely use * smp_processor_id(): */ - if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu))) +#if defined(CONFIG_PREEMPT_RT) && (defined(CONFIG_SMP) || defined(CONFIG_SCHED_DEBUG)) + if (current->migrate_disable) + goto out; +#endif + + if (current->nr_cpus_allowed == 1) goto out; /* Index: linux-5.4.5-rt3/localversion-rt =================================================================== --- /dev/null +++ linux-5.4.5-rt3/localversion-rt @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1 @ +-rt9 Index: linux-5.4.5-rt3/mm/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/mm/Kconfig +++ linux-5.4.5-rt3/mm/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:372 @ config NOMMU_INITIAL_TRIM_EXCESS config TRANSPARENT_HUGEPAGE bool "Transparent Hugepage Support" - depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE + depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE && !PREEMPT_RT select COMPACTION select XARRAY_MULTI help Index: linux-5.4.5-rt3/mm/compaction.c =================================================================== --- linux-5.4.5-rt3.orig/mm/compaction.c +++ linux-5.4.5-rt3/mm/compaction.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1593 @ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. 
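 *
 * Editor's note (not part of the original patch): on PREEMPT_RT the knob is
 * hard-wired to 0 below. Compacting mlocked/unevictable pages makes a
 * real-time task take minor faults the next time it touches them, so the
 * unevictable LRU is simply never scanned on RT kernels.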
*/ +#ifdef CONFIG_PREEMPT_RT +#define sysctl_compact_unevictable_allowed 0 +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2247 @ check_drain: block_start_pfn(cc->migrate_pfn, cc->order); if (last_migrated_pfn < current_block_start) { - cpu = get_cpu(); - lru_add_drain_cpu(cpu); - drain_local_pages(cc->zone); - put_cpu(); + if (static_branch_likely(&use_pvec_lock)) { + cpu = raw_smp_processor_id(); + lru_add_drain_cpu(cpu); + drain_cpu_pages(cpu, cc->zone); + } else { + cpu = get_cpu(); + lru_add_drain_cpu(cpu); + drain_local_pages(cc->zone); + put_cpu(); + } /* No more flushing until we migrate again */ last_migrated_pfn = 0; } Index: linux-5.4.5-rt3/mm/highmem.c =================================================================== --- linux-5.4.5-rt3.orig/mm/highmem.c +++ linux-5.4.5-rt3/mm/highmem.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:33 @ #include <linux/kgdb.h> #include <asm/tlbflush.h> - +#ifndef CONFIG_PREEMPT_RT #if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32) DEFINE_PER_CPU(int, __kmap_atomic_idx); +EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx); +#endif #endif /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:113 @ static inline wait_queue_head_t *get_pkm atomic_long_t _totalhigh_pages __read_mostly; EXPORT_SYMBOL(_totalhigh_pages); -EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx); - unsigned int nr_free_highpages (void) { struct zone *zone; Index: linux-5.4.5-rt3/mm/internal.h =================================================================== --- linux-5.4.5-rt3.orig/mm/internal.h +++ linux-5.4.5-rt3/mm/internal.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:35 @ /* Do not use these with a slab allocator */ #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) +#ifdef CONFIG_PREEMPT_RT +extern struct static_key_true use_pvec_lock; +#else +extern struct static_key_false use_pvec_lock; +#endif + void page_writeback_init(void); vm_fault_t do_swap_page(struct vm_fault *vmf); Index: linux-5.4.5-rt3/mm/kmemleak.c =================================================================== --- linux-5.4.5-rt3.orig/mm/kmemleak.c +++ linux-5.4.5-rt3/mm/kmemleak.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:16 @ * * The following locks and mutexes are used by kmemleak: * - * - kmemleak_lock (rwlock): protects the object_list modifications and + * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and * accesses to the object_tree_root. The object_list is the main list * holding the metadata (struct kmemleak_object) for the allocated memory * blocks. The object_tree_root is a red black tree used to look-up @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:25 @ * object_tree_root in the create_object() function called from the * kmemleak_alloc() callback and removed in delete_object() called from the * kmemleak_free() callback - * - kmemleak_object.lock (spinlock): protects a kmemleak_object. Accesses to - * the metadata (e.g. count) are protected by this lock. Note that some - * members of this structure may be protected by other means (atomic or - * kmemleak_lock). 
This lock is also held when scanning the corresponding - * memory block to avoid the kernel freeing it via the kmemleak_free() - * callback. This is less heavyweight than holding a global lock like - * kmemleak_lock during scanning + * - kmemleak_object.lock (raw_spinlock_t): protects a kmemleak_object. + * Accesses to the metadata (e.g. count) are protected by this lock. Note + * that some members of this structure may be protected by other means + * (atomic or kmemleak_lock). This lock is also held when scanning the + * corresponding memory block to avoid the kernel freeing it via the + * kmemleak_free() callback. This is less heavyweight than holding a global + * lock like kmemleak_lock during scanning. * - scan_mutex (mutex): ensures that only one thread may scan the memory for * unreferenced objects at a time. The gray_list contains the objects which * are already referenced or marked as false positives and need to be @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:138 @ struct kmemleak_scan_area { * (use_count) and freed using the RCU mechanism. */ struct kmemleak_object { - spinlock_t lock; + raw_spinlock_t lock; unsigned int flags; /* object status flags */ struct list_head object_list; struct list_head gray_list; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:194 @ static int mem_pool_free_count = ARRAY_S static LIST_HEAD(mem_pool_free_list); /* search tree for object boundaries */ static struct rb_root object_tree_root = RB_ROOT; -/* rw_lock protecting the access to object_list and object_tree_root */ -static DEFINE_RWLOCK(kmemleak_lock); +/* protecting the access to object_list and object_tree_root */ +static DEFINE_RAW_SPINLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:429 @ static struct kmemleak_object *mem_pool_ } /* slab allocation failed, try the memory pool */ - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); object = list_first_entry_or_null(&mem_pool_free_list, typeof(*object), object_list); if (object) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:438 @ static struct kmemleak_object *mem_pool_ object = &mem_pool[--mem_pool_free_count]; else pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n"); - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:456 @ static void mem_pool_free(struct kmemlea } /* add the object to the memory pool free list */ - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); list_add(&object->object_list, &mem_pool_free_list); - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:517 @ static struct kmemleak_object *find_and_ struct kmemleak_object *object; rcu_read_lock(); - read_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); - read_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); /* check whether the 
object is still available */ if (object && !get_object(object)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:549 @ static struct kmemleak_object *find_and_ unsigned long flags; struct kmemleak_object *object; - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); if (object) __remove_object(object); - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:588 @ static struct kmemleak_object *create_ob INIT_LIST_HEAD(&object->object_list); INIT_LIST_HEAD(&object->gray_list); INIT_HLIST_HEAD(&object->area_list); - spin_lock_init(&object->lock); + raw_spin_lock_init(&object->lock); atomic_set(&object->use_count, 1); object->flags = OBJECT_ALLOCATED; object->pointer = ptr; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:620 @ static struct kmemleak_object *create_ob /* kernel backtrace */ object->trace_len = __save_stack_trace(object->trace); - write_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr); min_addr = min(min_addr, untagged_ptr); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:652 @ static struct kmemleak_object *create_ob list_add_tail_rcu(&object->object_list, &object_list); out: - write_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:670 @ static void __delete_object(struct kmeml * Locking here also ensures that the corresponding memory block * cannot be freed when it is being scanned. 
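 *
 * Editor's note (not part of the original patch): throughout this file
 * kmemleak_lock and object->lock are converted to raw_spinlock_t. They are
 * taken with interrupts disabled, and object->lock nests inside
 * kmemleak_lock in the scanning path (scan_block()), so on PREEMPT_RT,
 * where a plain spinlock_t becomes a sleeping lock, both must remain
 * truly spinning locks.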
*/ - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); object->flags &= ~OBJECT_ALLOCATED; - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:742 @ static void paint_it(struct kmemleak_obj { unsigned long flags; - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); __paint_it(object, color); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } static void paint_ptr(unsigned long ptr, int color) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:801 @ static void add_scan_area(unsigned long if (scan_area_cache) area = kmem_cache_alloc(scan_area_cache, gfp_kmemleak_mask(gfp)); - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if (!area) { pr_warn_once("Cannot allocate a scan area, scanning the full object\n"); /* mark the object for full scan to avoid false positives */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:823 @ static void add_scan_area(unsigned long hlist_add_head(&area->node, &object->area_list); out_unlock: - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:845 @ static void object_set_excess_ref(unsign return; } - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); object->excess_ref = excess_ref; - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:867 @ static void object_no_scan(unsigned long return; } - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); object->flags |= OBJECT_NO_SCAN; - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1029 @ void __ref kmemleak_update_trace(const v return; } - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); object->trace_len = __save_stack_trace(object->trace); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1236 @ static void scan_block(void *_start, voi unsigned long flags; unsigned long untagged_ptr; - read_lock_irqsave(&kmemleak_lock, flags); + raw_spin_lock_irqsave(&kmemleak_lock, flags); for (ptr = start; ptr < end; ptr++) { struct kmemleak_object *object; unsigned long pointer; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1271 @ static void scan_block(void *_start, voi * previously acquired in scan_object(). These locks are * enclosed by scan_mutex. 
*/ - spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); + raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); /* only pass surplus references (object already gray) */ if (color_gray(object)) { excess_ref = object->excess_ref; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1280 @ static void scan_block(void *_start, voi excess_ref = 0; update_refs(object); } - spin_unlock(&object->lock); + raw_spin_unlock(&object->lock); if (excess_ref) { object = lookup_object(excess_ref, 0); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1289 @ static void scan_block(void *_start, voi if (object == scanned) /* circular reference, ignore */ continue; - spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); + raw_spin_lock_nested(&object->lock, SINGLE_DEPTH_NESTING); update_refs(object); - spin_unlock(&object->lock); + raw_spin_unlock(&object->lock); } } - read_unlock_irqrestore(&kmemleak_lock, flags); + raw_spin_unlock_irqrestore(&kmemleak_lock, flags); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1327 @ static void scan_object(struct kmemleak_ * Once the object->lock is acquired, the corresponding memory block * cannot be freed (the same lock is acquired in delete_object). */ - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if (object->flags & OBJECT_NO_SCAN) goto out; if (!(object->flags & OBJECT_ALLOCATED)) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1347 @ static void scan_object(struct kmemleak_ if (start >= end) break; - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); cond_resched(); - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); } while (object->flags & OBJECT_ALLOCATED); } else hlist_for_each_entry(area, &object->area_list, node) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1357 @ static void scan_object(struct kmemleak_ (void *)(area->start + area->size), object); out: - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1410 @ static void kmemleak_scan(void) /* prepare the kmemleak_object's */ rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); #ifdef DEBUG /* * With a few exceptions there should be a maximum of @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1427 @ static void kmemleak_scan(void) if (color_gray(object) && get_object(object)) list_add_tail(&object->gray_list, &gray_list); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } rcu_read_unlock(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1495 @ static void kmemleak_scan(void) */ rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if (color_white(object) && (object->flags & OBJECT_ALLOCATED) && update_checksum(object) && get_object(object)) { /* color it gray temporarily */ object->count = object->min_count; 
list_add_tail(&object->gray_list, &gray_list); } - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } rcu_read_unlock(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1522 @ static void kmemleak_scan(void) */ rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if (unreferenced_object(object) && !(object->flags & OBJECT_REPORTED)) { object->flags |= OBJECT_REPORTED; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1532 @ static void kmemleak_scan(void) new_leaks++; } - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } rcu_read_unlock(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1684 @ static int kmemleak_seq_show(struct seq_ struct kmemleak_object *object = v; unsigned long flags; - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) print_unreferenced(seq, object); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1717 @ static int dump_str_object_info(const ch return -EINVAL; } - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); dump_object_info(object); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); put_object(object); return 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1738 @ static void kmemleak_clear(void) rcu_read_lock(); list_for_each_entry_rcu(object, &object_list, object_list) { - spin_lock_irqsave(&object->lock, flags); + raw_spin_lock_irqsave(&object->lock, flags); if ((object->flags & OBJECT_REPORTED) && unreferenced_object(object)) __paint_it(object, KMEMLEAK_GREY); - spin_unlock_irqrestore(&object->lock, flags); + raw_spin_unlock_irqrestore(&object->lock, flags); } rcu_read_unlock(); Index: linux-5.4.5-rt3/mm/memcontrol.c =================================================================== --- linux-5.4.5-rt3.orig/mm/memcontrol.c +++ linux-5.4.5-rt3/mm/memcontrol.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:66 @ #include <net/sock.h> #include <net/ip.h> #include "slab.h" +#include <linux/locallock.h> #include <linux/uaccess.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:96 @ int do_swap_account __read_mostly; static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq); #endif +static DEFINE_LOCAL_IRQ_LOCK(event_lock); + /* Whether legacy memory+swap accounting is active */ static bool do_memsw_account(void) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2271 @ static void drain_all_stock(struct mem_c * as well as workers from this path always operate on the local * per-cpu data. CPU up doesn't touch memcg_stock at all. 
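 *
 * Editor's note (not part of the original patch): get_cpu_light() and
 * put_cpu_light() are helpers added by this patch series. On PREEMPT_RT
 * they only pin the task to the current CPU (disable migration) instead of
 * disabling preemption, so the drain loop below stays preemptible; on
 * non-RT kernels they behave like get_cpu()/put_cpu().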
*/ - curcpu = get_cpu(); + curcpu = get_cpu_light(); for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); struct mem_cgroup *memcg; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2292 @ static void drain_all_stock(struct mem_c schedule_work_on(cpu, &stock->work); } } - put_cpu(); + put_cpu_light(); mutex_unlock(&percpu_charge_mutex); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5502 @ static int mem_cgroup_move_account(struc ret = 0; - local_irq_disable(); + local_lock_irq(event_lock); mem_cgroup_charge_statistics(to, page, compound, nr_pages); memcg_check_events(to, page); mem_cgroup_charge_statistics(from, page, compound, -nr_pages); memcg_check_events(from, page); - local_irq_enable(); + local_unlock_irq(event_lock); out_unlock: unlock_page(page); out: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6555 @ void mem_cgroup_commit_charge(struct pag commit_charge(page, memcg, lrucare); - local_irq_disable(); + local_lock_irq(event_lock); mem_cgroup_charge_statistics(memcg, page, compound, nr_pages); memcg_check_events(memcg, page); - local_irq_enable(); + local_unlock_irq(event_lock); if (do_memsw_account() && PageSwapCache(page)) { swp_entry_t entry = { .val = page_private(page) }; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6627 @ static void uncharge_batch(const struct memcg_oom_recover(ug->memcg); } - local_irq_save(flags); + local_lock_irqsave(event_lock, flags); __mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon); __mod_memcg_state(ug->memcg, MEMCG_CACHE, -ug->nr_file); __mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6635 @ static void uncharge_batch(const struct __count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout); __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, nr_pages); memcg_check_events(ug->memcg, ug->dummy_page); - local_irq_restore(flags); + local_unlock_irqrestore(event_lock, flags); if (!mem_cgroup_is_root(ug->memcg)) css_put_many(&ug->memcg->css, nr_pages); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6798 @ void mem_cgroup_migrate(struct page *old commit_charge(newpage, memcg, false); - local_irq_save(flags); + local_lock_irqsave(event_lock, flags); mem_cgroup_charge_statistics(memcg, newpage, compound, nr_pages); memcg_check_events(memcg, newpage); - local_irq_restore(flags); + local_unlock_irqrestore(event_lock, flags); } DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6993 @ void mem_cgroup_swapout(struct page *pag struct mem_cgroup *memcg, *swap_memcg; unsigned int nr_entries; unsigned short oldid; + unsigned long flags; VM_BUG_ON_PAGE(PageLRU(page), page); VM_BUG_ON_PAGE(page_count(page), page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:7039 @ void mem_cgroup_swapout(struct page *pag * important here to have the interrupts disabled because it is the * only synchronisation we have for updating the per-CPU variables. 
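 *
 * Editor's note (not part of the original patch): the statistics update is
 * now serialized by the event_lock local lock taken below. On PREEMPT_RT,
 * local_lock_irqsave() acquires a per-CPU lock without actually disabling
 * interrupts, which is why the irqs_disabled() assertion is compiled out
 * for RT builds.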
*/ + local_lock_irqsave(event_lock, flags); +#ifndef CONFIG_PREEMPT_RT VM_BUG_ON(!irqs_disabled()); +#endif mem_cgroup_charge_statistics(memcg, page, PageTransHuge(page), -nr_entries); memcg_check_events(memcg, page); + local_unlock_irqrestore(event_lock, flags); if (!mem_cgroup_is_root(memcg)) css_put_many(&memcg->css, nr_entries); Index: linux-5.4.5-rt3/mm/memory.c =================================================================== --- linux-5.4.5-rt3.orig/mm/memory.c +++ linux-5.4.5-rt3/mm/memory.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2136 @ static inline int pte_unmap_same(struct pte_t *page_table, pte_t orig_pte) { int same = 1; -#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT) +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION) if (sizeof(pte_t) > sizeof(unsigned long)) { spinlock_t *ptl = pte_lockptr(mm, pmd); spin_lock(ptl); Index: linux-5.4.5-rt3/mm/page_alloc.c =================================================================== --- linux-5.4.5-rt3.orig/mm/page_alloc.c +++ linux-5.4.5-rt3/mm/page_alloc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:64 @ #include <linux/hugetlb.h> #include <linux/sched/rt.h> #include <linux/sched/mm.h> +#include <linux/locallock.h> #include <linux/page_owner.h> #include <linux/kthread.h> #include <linux/memcontrol.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:361 @ EXPORT_SYMBOL(nr_node_ids); EXPORT_SYMBOL(nr_online_nodes); #endif +static DEFINE_LOCAL_IRQ_LOCK(pa_lock); + +#ifdef CONFIG_PREEMPT_RT +# define cpu_lock_irqsave(cpu, flags) \ + local_lock_irqsave_on(pa_lock, flags, cpu) +# define cpu_unlock_irqrestore(cpu, flags) \ + local_unlock_irqrestore_on(pa_lock, flags, cpu) +#else +# define cpu_lock_irqsave(cpu, flags) local_irq_save(flags) +# define cpu_unlock_irqrestore(cpu, flags) local_irq_restore(flags) +#endif + int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1252 @ static inline void prefetch_buddy(struct } /* - * Frees a number of pages from the PCP lists + * Frees a number of pages which have been collected from the pcp lists. * Assumes all pages on list are in same zone, and of same order. * count is the number of pages to free. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1262 @ static inline void prefetch_buddy(struct * And clear the zone's pages_scanned counter, to hold off the "all pages are * pinned" detection logic. */ -static void free_pcppages_bulk(struct zone *zone, int count, - struct per_cpu_pages *pcp) +static void free_pcppages_bulk(struct zone *zone, struct list_head *head, + bool zone_retry) +{ + bool isolated_pageblocks; + struct page *page, *tmp; + unsigned long flags; + + spin_lock_irqsave(&zone->lock, flags); + isolated_pageblocks = has_isolate_pageblock(zone); + + /* + * Use safe version since after __free_one_page(), + * page->lru.next will not point to original list. + */ + list_for_each_entry_safe(page, tmp, head, lru) { + int mt = get_pcppage_migratetype(page); + + if (page_zone(page) != zone) { + /* + * free_unref_page_list() sorts pages by zone. If we end + * up with pages from a different NUMA nodes belonging + * to the same ZONE index then we need to redo with the + * correct ZONE pointer. Skip the page for now, redo it + * on the next iteration. 
+ */ + WARN_ON_ONCE(zone_retry == false); + if (zone_retry) + continue; + } + + /* MIGRATE_ISOLATE page should not go to pcplists */ + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); + /* Pageblock could have been isolated meanwhile */ + if (unlikely(isolated_pageblocks)) + mt = get_pageblock_migratetype(page); + + list_del(&page->lru); + __free_one_page(page, page_to_pfn(page), zone, 0, mt); + trace_mm_page_pcpu_drain(page, 0, mt); + } + spin_unlock_irqrestore(&zone->lock, flags); +} + +static void isolate_pcp_pages(int count, struct per_cpu_pages *pcp, + struct list_head *dst) + { int migratetype = 0; int batch_free = 0; int prefetch_nr = 0; - bool isolated_pageblocks; - struct page *page, *tmp; - LIST_HEAD(head); + struct page *page; while (count) { struct list_head *list; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1344 @ static void free_pcppages_bulk(struct zo if (bulkfree_pcp_prepare(page)) continue; - list_add_tail(&page->lru, &head); + list_add_tail(&page->lru, dst); /* * We are going to put the page back to the global @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1359 @ static void free_pcppages_bulk(struct zo prefetch_buddy(page); } while (--count && --batch_free && !list_empty(list)); } - - spin_lock(&zone->lock); - isolated_pageblocks = has_isolate_pageblock(zone); - - /* - * Use safe version since after __free_one_page(), - * page->lru.next will not point to original list. - */ - list_for_each_entry_safe(page, tmp, &head, lru) { - int mt = get_pcppage_migratetype(page); - /* MIGRATE_ISOLATE page should not go to pcplists */ - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page); - /* Pageblock could have been isolated meanwhile */ - if (unlikely(isolated_pageblocks)) - mt = get_pageblock_migratetype(page); - - __free_one_page(page, page_to_pfn(page), zone, 0, mt); - trace_mm_page_pcpu_drain(page, 0, mt); - } - spin_unlock(&zone->lock); } static void free_one_page(struct zone *zone, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1459 @ static void __free_pages_ok(struct page return; migratetype = get_pfnblock_migratetype(page, pfn); - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); __count_vm_events(PGFREE, 1 << order); free_one_page(page_zone(page), page, pfn, order, migratetype); - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); } void __free_pages_core(struct page *page, unsigned int order) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2826 @ void drain_zone_pages(struct zone *zone, { unsigned long flags; int to_drain, batch; + LIST_HEAD(dst); - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); batch = READ_ONCE(pcp->batch); to_drain = min(pcp->count, batch); if (to_drain > 0) - free_pcppages_bulk(zone, to_drain, pcp); - local_irq_restore(flags); + isolate_pcp_pages(to_drain, pcp, &dst); + + local_unlock_irqrestore(pa_lock, flags); + + if (to_drain > 0) + free_pcppages_bulk(zone, &dst, false); } #endif @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2853 @ static void drain_pages_zone(unsigned in unsigned long flags; struct per_cpu_pageset *pset; struct per_cpu_pages *pcp; + LIST_HEAD(dst); + int count; - local_irq_save(flags); + cpu_lock_irqsave(cpu, flags); pset = per_cpu_ptr(zone->pageset, cpu); pcp = &pset->pcp; - if (pcp->count) - free_pcppages_bulk(zone, pcp->count, pcp); - 
local_irq_restore(flags); + count = pcp->count; + if (count) + isolate_pcp_pages(count, pcp, &dst); + + cpu_unlock_irqrestore(cpu, flags); + + if (count) + free_pcppages_bulk(zone, &dst, false); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2886 @ static void drain_pages(unsigned int cpu } } +void drain_cpu_pages(unsigned int cpu, struct zone *zone) +{ + if (zone) + drain_pages_zone(cpu, zone); + else + drain_pages(cpu); +} + /* * Spill all of this CPU's per-cpu pages back into the buddy allocator. * @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2904 @ void drain_local_pages(struct zone *zone { int cpu = smp_processor_id(); - if (zone) - drain_pages_zone(cpu, zone); - else - drain_pages(cpu); + drain_cpu_pages(cpu, zone); } static void drain_local_pages_wq(struct work_struct *work) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2991 @ void drain_all_pages(struct zone *zone) cpumask_clear_cpu(cpu, &cpus_with_pcps); } - for_each_cpu(cpu, &cpus_with_pcps) { - struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu); + if (static_branch_likely(&use_pvec_lock)) { + for_each_cpu(cpu, &cpus_with_pcps) + drain_cpu_pages(cpu, zone); + } else { + for_each_cpu(cpu, &cpus_with_pcps) { + struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu); - drain->zone = zone; - INIT_WORK(&drain->work, drain_local_pages_wq); - queue_work_on(cpu, mm_percpu_wq, &drain->work); + drain->zone = zone; + INIT_WORK(&drain->work, drain_local_pages_wq); + queue_work_on(cpu, mm_percpu_wq, &drain->work); + } + for_each_cpu(cpu, &cpus_with_pcps) + flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work); } - for_each_cpu(cpu, &cpus_with_pcps) - flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work); mutex_unlock(&pcpu_drain_mutex); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3076 @ static bool free_unref_page_prepare(stru return true; } -static void free_unref_page_commit(struct page *page, unsigned long pfn) +static void free_unref_page_commit(struct page *page, unsigned long pfn, + struct list_head *dst) { struct zone *zone = page_zone(page); struct per_cpu_pages *pcp; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3106 @ static void free_unref_page_commit(struc pcp->count++; if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); - free_pcppages_bulk(zone, batch, pcp); + + isolate_pcp_pages(batch, pcp, dst); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3118 @ void free_unref_page(struct page *page) { unsigned long flags; unsigned long pfn = page_to_pfn(page); + struct zone *zone = page_zone(page); + LIST_HEAD(dst); if (!free_unref_page_prepare(page, pfn)) return; - local_irq_save(flags); - free_unref_page_commit(page, pfn); - local_irq_restore(flags); + local_lock_irqsave(pa_lock, flags); + free_unref_page_commit(page, pfn, &dst); + local_unlock_irqrestore(pa_lock, flags); + if (!list_empty(&dst)) + free_pcppages_bulk(zone, &dst, false); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3139 @ void free_unref_page_list(struct list_he struct page *page, *next; unsigned long flags, pfn; int batch_count = 0; + struct list_head dsts[__MAX_NR_ZONES]; + int i; + + for (i = 0; i < __MAX_NR_ZONES; i++) + INIT_LIST_HEAD(&dsts[i]); /* Prepare pages for freeing */ 
list_for_each_entry_safe(page, next, list, lru) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3153 @ void free_unref_page_list(struct list_he set_page_private(page, pfn); } - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); list_for_each_entry_safe(page, next, list, lru) { unsigned long pfn = page_private(page); + enum zone_type type; set_page_private(page, 0); trace_mm_page_free_batched(page); - free_unref_page_commit(page, pfn); + type = page_zonenum(page); + free_unref_page_commit(page, pfn, &dsts[type]); /* * Guard against excessive IRQ disabled times when we get * a large list of pages to free. */ if (++batch_count == SWAP_CLUSTER_MAX) { - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); batch_count = 0; - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); } } - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); + + for (i = 0; i < __MAX_NR_ZONES; ) { + struct page *page; + struct zone *zone; + + if (list_empty(&dsts[i])) { + i++; + continue; + } + + page = list_first_entry(&dsts[i], struct page, lru); + zone = page_zone(page); + + free_pcppages_bulk(zone, &dsts[i], true); + } } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3323 @ static struct page *rmqueue_pcplist(stru struct page *page; unsigned long flags; - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); pcp = &this_cpu_ptr(zone->pageset)->pcp; list = &pcp->lists[migratetype]; page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3331 @ static struct page *rmqueue_pcplist(stru __count_zid_vm_events(PGALLOC, page_zonenum(page), 1); zone_statistics(preferred_zone, zone); } - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); return page; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3358 @ struct page *rmqueue(struct zone *prefer * allocate greater than order-1 page units with __GFP_NOFAIL. 
*/ WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); - spin_lock_irqsave(&zone->lock, flags); + local_spin_lock_irqsave(pa_lock, &zone->lock, flags); do { page = NULL; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3378 @ struct page *rmqueue(struct zone *prefer __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone); - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); out: /* Separate test+clear to avoid unnecessary atomics */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3391 @ out: return page; failed: - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); return NULL; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:8615 @ void zone_pcp_reset(struct zone *zone) struct per_cpu_pageset *pset; /* avoid races with drain_pages() */ - local_irq_save(flags); + local_lock_irqsave(pa_lock, flags); if (zone->pageset != &boot_pageset) { for_each_online_cpu(cpu) { pset = per_cpu_ptr(zone->pageset, cpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:8624 @ void zone_pcp_reset(struct zone *zone) free_percpu(zone->pageset); zone->pageset = &boot_pageset; } - local_irq_restore(flags); + local_unlock_irqrestore(pa_lock, flags); } #ifdef CONFIG_MEMORY_HOTREMOVE Index: linux-5.4.5-rt3/mm/slab.c =================================================================== --- linux-5.4.5-rt3.orig/mm/slab.c +++ linux-5.4.5-rt3/mm/slab.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:236 @ static void kmem_cache_node_init(struct parent->shared = NULL; parent->alien = NULL; parent->colour_next = 0; - spin_lock_init(&parent->list_lock); + raw_spin_lock_init(&parent->list_lock); parent->free_objects = 0; parent->free_touched = 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:561 @ static noinline void cache_free_pfmemall page_node = page_to_nid(page); n = get_node(cachep, page_node); - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); free_block(cachep, &objp, 1, page_node, &list); - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); slabs_destroy(cachep, &list); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:691 @ static void __drain_alien_cache(struct k struct kmem_cache_node *n = get_node(cachep, node); if (ac->avail) { - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); /* * Stuff objects into the remote nodes shared array first. 
* That way we could avoid the overhead of putting the objects @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:702 @ static void __drain_alien_cache(struct k free_block(cachep, ac->entry, ac->avail, node, list); ac->avail = 0; - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:775 @ static int __cache_free_alien(struct kme slabs_destroy(cachep, &list); } else { n = get_node(cachep, page_node); - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); free_block(cachep, &objp, 1, page_node, &list); - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); slabs_destroy(cachep, &list); } return 1; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:818 @ static int init_cache_node(struct kmem_c */ n = get_node(cachep, node); if (n) { - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); n->free_limit = (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num; - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:900 @ static int setup_kmem_cache_node(struct goto fail; n = get_node(cachep, node); - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); if (n->shared && force_change) { free_block(cachep, n->shared->entry, n->shared->avail, node, &list); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:918 @ static int setup_kmem_cache_node(struct new_alien = NULL; } - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); slabs_destroy(cachep, &list); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:957 @ static void cpuup_canceled(long cpu) if (!n) continue; - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); /* Free limit for this kmem_cache_node */ n->free_limit -= cachep->batchcount; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:968 @ static void cpuup_canceled(long cpu) nc->avail = 0; if (!cpumask_empty(mask)) { - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); goto free_slab; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:982 @ static void cpuup_canceled(long cpu) alien = n->alien; n->alien = NULL; - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); kfree(shared); if (alien) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1166 @ static void __init init_list(struct kmem /* * Do not assume that spinlocks can be initialized via memcpy: */ - spin_lock_init(&ptr->list_lock); + raw_spin_lock_init(&ptr->list_lock); MAKE_ALL_LISTS(cachep, ptr, nodeid); cachep->node[nodeid] = ptr; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1337 @ slab_out_of_memory(struct kmem_cache *ca for_each_kmem_cache_node(cachep, node, n) { unsigned long total_slabs, free_slabs, free_objs; - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); total_slabs = n->total_slabs; free_slabs = n->free_slabs; free_objs = n->free_objects; - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); pr_warn(" node %d: slabs: %ld/%ld, 
objs: %ld/%ld\n", node, total_slabs - free_slabs, total_slabs, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2099 @ static void check_spinlock_acquired(stru { #ifdef CONFIG_SMP check_irq_off(); - assert_spin_locked(&get_node(cachep, numa_mem_id())->list_lock); + assert_raw_spin_locked(&get_node(cachep, numa_mem_id())->list_lock); #endif } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2107 @ static void check_spinlock_acquired_node { #ifdef CONFIG_SMP check_irq_off(); - assert_spin_locked(&get_node(cachep, node)->list_lock); + assert_raw_spin_locked(&get_node(cachep, node)->list_lock); #endif } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2147 @ static void do_drain(void *arg) check_irq_off(); ac = cpu_cache_get(cachep); n = get_node(cachep, node); - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); free_block(cachep, ac->entry, ac->avail, node, &list); - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); slabs_destroy(cachep, &list); ac->avail = 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2167 @ static void drain_cpu_caches(struct kmem drain_alien_cache(cachep, n->alien); for_each_kmem_cache_node(cachep, node, n) { - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); drain_array_locked(cachep, n->shared, node, true, &list); - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); slabs_destroy(cachep, &list); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2191 @ static int drain_freelist(struct kmem_ca nr_freed = 0; while (nr_freed < tofree && !list_empty(&n->slabs_free)) { - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); p = n->slabs_free.prev; if (p == &n->slabs_free) { - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); goto out; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2207 @ static int drain_freelist(struct kmem_ca * to the cache. 
*/ n->free_objects -= cache->num; - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); slab_destroy(cache, page); nr_freed++; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2660 @ static void cache_grow_end(struct kmem_c INIT_LIST_HEAD(&page->slab_list); n = get_node(cachep, page_to_nid(page)); - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); n->total_slabs++; if (!page->active) { list_add_tail(&page->slab_list, &n->slabs_free); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2670 @ static void cache_grow_end(struct kmem_c STATS_INC_GROWN(cachep); n->free_objects += cachep->num - page->active; - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); fixup_objfreelist_debug(cachep, &list); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2836 @ static struct page *get_first_slab(struc { struct page *page; - assert_spin_locked(&n->list_lock); + assert_raw_spin_locked(&n->list_lock); page = list_first_entry_or_null(&n->slabs_partial, struct page, slab_list); if (!page) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2863 @ static noinline void *cache_alloc_pfmema if (!gfp_pfmemalloc_allowed(flags)) return NULL; - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); page = get_first_slab(n, true); if (!page) { - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); return NULL; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2875 @ static noinline void *cache_alloc_pfmema fixup_slab_list(cachep, n, page, &list); - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); fixup_objfreelist_debug(cachep, &list); return obj; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2934 @ static void *cache_alloc_refill(struct k if (!n->free_objects && (!shared || !shared->avail)) goto direct_grow; - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); shared = READ_ONCE(n->shared); /* See if we can refill from the shared array */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2958 @ static void *cache_alloc_refill(struct k must_grow: n->free_objects -= ac->avail; alloc_done: - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); fixup_objfreelist_debug(cachep, &list); direct_grow: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3183 @ static void *____cache_alloc_node(struct BUG_ON(!n); check_irq_off(); - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); page = get_first_slab(n, false); if (!page) goto must_grow; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3201 @ static void *____cache_alloc_node(struct fixup_slab_list(cachep, n, page, &list); - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); fixup_objfreelist_debug(cachep, &list); return obj; must_grow: - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid); if (page) { /* This slab isn't counted yet so don't update free_objects */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3382 @ static void cache_flusharray(struct kmem check_irq_off(); n = get_node(cachep, node); - spin_lock(&n->list_lock); + 
raw_spin_lock(&n->list_lock); if (n->shared) { struct array_cache *shared_array = n->shared; int max = shared_array->limit - shared_array->avail; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3411 @ free_done: STATS_SET_FREEABLE(cachep, i); } #endif - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); slabs_destroy(cachep, &list); ac->avail -= batchcount; memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3833 @ static int __do_tune_cpucache(struct kme node = cpu_to_mem(cpu); n = get_node(cachep, node); - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); free_block(cachep, ac->entry, ac->avail, node, &list); - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); slabs_destroy(cachep, &list); } free_percpu(prev); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3960 @ static void drain_array(struct kmem_cach return; } - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); drain_array_locked(cachep, ac, node, false, &list); - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); slabs_destroy(cachep, &list); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4046 @ void get_slabinfo(struct kmem_cache *cac for_each_kmem_cache_node(cachep, node, n) { check_irq_on(); - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); total_slabs += n->total_slabs; free_slabs += n->free_slabs; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4055 @ void get_slabinfo(struct kmem_cache *cac if (n->shared) shared_avail += n->shared->avail; - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); } num_objs = total_slabs * cachep->num; active_slabs = total_slabs - free_slabs; Index: linux-5.4.5-rt3/mm/slab.h =================================================================== --- linux-5.4.5-rt3.orig/mm/slab.h +++ linux-5.4.5-rt3/mm/slab.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:599 @ static inline void slab_post_alloc_hook( * The slab lists for all objects. 
*/ struct kmem_cache_node { - spinlock_t list_lock; + raw_spinlock_t list_lock; #ifdef CONFIG_SLAB struct list_head slabs_partial; /* partial list first, better asm code */ Index: linux-5.4.5-rt3/mm/slub.c =================================================================== --- linux-5.4.5-rt3.orig/mm/slub.c +++ linux-5.4.5-rt3/mm/slub.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1179 @ static noinline int free_debug_processin unsigned long uninitialized_var(flags); int ret = 0; - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); slab_lock(page); if (s->flags & SLAB_CONSISTENCY_CHECKS) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1214 @ out: bulk_cnt, cnt); slab_unlock(page); - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); if (!ret) slab_fix(s, "Object at 0x%p not freed", object); return ret; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1384 @ static inline void dec_slabs_node(struct #endif /* CONFIG_SLUB_DEBUG */ +struct slub_free_list { + raw_spinlock_t lock; + struct list_head list; +}; +static DEFINE_PER_CPU(struct slub_free_list, slub_free_list); + /* * Hooks for other subsystems that check memory allocations. In a typical * production configuration these hooks all should produce no code at all. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1630 @ static struct page *allocate_slab(struct void *start, *p, *next; int idx; bool shuffle; + bool enableirqs = false; flags &= gfp_allowed_mask; if (gfpflags_allow_blocking(flags)) + enableirqs = true; + +#ifdef CONFIG_PREEMPT_RT + if (system_state > SYSTEM_BOOTING) + enableirqs = true; +#endif + if (enableirqs) local_irq_enable(); flags |= s->allocflags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1700 @ static struct page *allocate_slab(struct page->frozen = 1; out: - if (gfpflags_allow_blocking(flags)) + if (enableirqs) local_irq_disable(); if (!page) return NULL; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1748 @ static void __free_slab(struct kmem_cach __free_pages(page, order); } +static void free_delayed(struct list_head *h) +{ + while (!list_empty(h)) { + struct page *page = list_first_entry(h, struct page, lru); + + list_del(&page->lru); + __free_slab(page->slab_cache, page); + } +} + static void rcu_free_slab(struct rcu_head *h) { struct page *page = container_of(h, struct page, rcu_head); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1769 @ static void free_slab(struct kmem_cache { if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) { call_rcu(&page->rcu_head, rcu_free_slab); + } else if (irqs_disabled()) { + struct slub_free_list *f = this_cpu_ptr(&slub_free_list); + + raw_spin_lock(&f->lock); + list_add(&page->lru, &f->list); + raw_spin_unlock(&f->lock); } else __free_slab(s, page); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1882 @ static void *get_partial_node(struct kme if (!n || !n->nr_partial) return NULL; - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); list_for_each_entry_safe(page, page2, &n->partial, slab_list) { void *t; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1907 @ 
static void *get_partial_node(struct kme break; } - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); return object; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1996 @ static void *get_partial(struct kmem_cac return get_any_partial(s, flags, c); } -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION /* * Calculate the next globally unique transaction for disambiguiation * during cmpxchg. The transactions start with the cpu number and are then @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2041 @ static inline void note_cmpxchg_failure( pr_info("%s %s: cmpxchg redo ", n, s->name); -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION if (tid_to_cpu(tid) != tid_to_cpu(actual_tid)) pr_warn("due to cpu change %d -> %d\n", tid_to_cpu(tid), tid_to_cpu(actual_tid)); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2155 @ redo: * that acquire_slab() will see a slab page that * is frozen */ - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); } } else { m = M_FULL; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2166 @ redo: * slabs from diagnostic functions will not see * any frozen slabs. */ - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2190 @ redo: goto redo; if (lock) - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); if (m == M_PARTIAL) stat(s, tail); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2229 @ static void unfreeze_partials(struct kme n2 = get_node(s, page_to_nid(page)); if (n != n2) { if (n) - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); n = n2; - spin_lock(&n->list_lock); + raw_spin_lock(&n->list_lock); } do { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2261 @ static void unfreeze_partials(struct kme } if (n) - spin_unlock(&n->list_lock); + raw_spin_unlock(&n->list_lock); while (discard_page) { page = discard_page; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2298 @ static void put_cpu_partial(struct kmem_ pobjects = oldpage->pobjects; pages = oldpage->pages; if (drain && pobjects > s->cpu_partial) { + struct slub_free_list *f; unsigned long flags; + LIST_HEAD(tofree); /* * partial array is full. Move the existing * set to the per node partial list. 
*/ local_irq_save(flags); unfreeze_partials(s, this_cpu_ptr(s->cpu_slab)); + f = this_cpu_ptr(&slub_free_list); + raw_spin_lock(&f->lock); + list_splice_init(&f->list, &tofree); + raw_spin_unlock(&f->lock); local_irq_restore(flags); + free_delayed(&tofree); oldpage = NULL; pobjects = 0; pages = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2380 @ static bool has_cpu_slab(int cpu, void * static void flush_all(struct kmem_cache *s) { - on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC); + LIST_HEAD(tofree); + int cpu; + + on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1); + for_each_online_cpu(cpu) { + struct slub_free_list *f; + + if (!has_cpu_slab(cpu, s)) + continue; + + f = &per_cpu(slub_free_list, cpu); + raw_spin_lock_irq(&f->lock); + list_splice_init(&f->list, &tofree); + raw_spin_unlock_irq(&f->lock); + free_delayed(&tofree); + } } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2450 @ static unsigned long count_partial(struc unsigned long x = 0; struct page *page; - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) x += get_count(page); - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); return x; } #endif /* CONFIG_SLUB_DEBUG || CONFIG_SYSFS */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2592 @ static inline void *get_freelist(struct * already disabled (which is the case for bulk allocation). */ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, - unsigned long addr, struct kmem_cache_cpu *c) + unsigned long addr, struct kmem_cache_cpu *c, + struct list_head *to_free) { + struct slub_free_list *f; void *freelist; struct page *page; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2651 @ load_freelist: VM_BUG_ON(!c->page->frozen); c->freelist = get_freepointer(s, freelist); c->tid = next_tid(c->tid); + +out: + f = this_cpu_ptr(&slub_free_list); + raw_spin_lock(&f->lock); + list_splice_init(&f->list, to_free); + raw_spin_unlock(&f->lock); + return freelist; new_slab: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2673 @ new_slab: if (unlikely(!freelist)) { slab_out_of_memory(s, gfpflags, node); - return NULL; + goto out; } page = c->page; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2686 @ new_slab: goto new_slab; /* Slab failed checks. Next slab needed */ deactivate_slab(s, page, get_freepointer(s, freelist), c); - return freelist; + goto out; } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2698 @ static void *__slab_alloc(struct kmem_ca { void *p; unsigned long flags; + LIST_HEAD(tofree); local_irq_save(flags); -#ifdef CONFIG_PREEMPT +#ifdef CONFIG_PREEMPTION /* * We may have been preempted and rescheduled on a different * cpu before disabling interrupts. 
Need to reload cpu area @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2710 @ static void *__slab_alloc(struct kmem_ca c = this_cpu_ptr(s->cpu_slab); #endif - p = ___slab_alloc(s, gfpflags, node, addr, c); + p = ___slab_alloc(s, gfpflags, node, addr, c, &tofree); local_irq_restore(flags); + free_delayed(&tofree); return p; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2756 @ redo: * as we end up on the original cpu again when doing the cmpxchg. * * We should guarantee that tid and kmem_cache are retrieved on - * the same cpu. It could be different if CONFIG_PREEMPT so we need + * the same cpu. It could be different if CONFIG_PREEMPTION so we need * to check if it is matched or not. */ do { tid = this_cpu_read(s->cpu_slab->tid); c = raw_cpu_ptr(s->cpu_slab); - } while (IS_ENABLED(CONFIG_PREEMPT) && + } while (IS_ENABLED(CONFIG_PREEMPTION) && unlikely(tid != READ_ONCE(c->tid))); /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2911 @ static void __slab_free(struct kmem_cach do { if (unlikely(n)) { - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); n = NULL; } prior = page->freelist; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2943 @ static void __slab_free(struct kmem_cach * Otherwise the list_lock will synchronize with * other processors updating the list of slabs. */ - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2984 @ static void __slab_free(struct kmem_cach add_partial(n, page, DEACTIVATE_TO_TAIL); stat(s, FREE_ADD_PARTIAL); } - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); return; slab_empty: @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2999 @ slab_empty: remove_full(s, n, page); } - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); stat(s, FREE_SLAB); discard_slab(s, page); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3036 @ redo: do { tid = this_cpu_read(s->cpu_slab->tid); c = raw_cpu_ptr(s->cpu_slab); - } while (IS_ENABLED(CONFIG_PREEMPT) && + } while (IS_ENABLED(CONFIG_PREEMPTION) && unlikely(tid != READ_ONCE(c->tid))); /* Same with comment on barrier() in slab_alloc_node() */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3202 @ int kmem_cache_alloc_bulk(struct kmem_ca void **p) { struct kmem_cache_cpu *c; + LIST_HEAD(to_free); int i; /* memcg and kmem_cache debug support */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3226 @ int kmem_cache_alloc_bulk(struct kmem_ca * of re-populating per CPU c->freelist */ p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE, - _RET_IP_, c); + _RET_IP_, c, &to_free); if (unlikely(!p[i])) goto error; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3241 @ int kmem_cache_alloc_bulk(struct kmem_ca } c->tid = next_tid(c->tid); local_irq_enable(); + free_delayed(&to_free); /* Clear memory outside IRQ disabled fastpath loop */ if (unlikely(slab_want_init_on_alloc(flags, s))) { @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3256 @ int kmem_cache_alloc_bulk(struct kmem_ca return i; error: local_irq_enable(); + free_delayed(&to_free); slab_post_alloc_hook(s, flags, i, p); __kmem_cache_free_bulk(s, i, p); return 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3392 @ static void init_kmem_cache_node(struct kmem_cache_node *n) { n->nr_partial = 0; - spin_lock_init(&n->list_lock); + raw_spin_lock_init(&n->list_lock); INIT_LIST_HEAD(&n->partial); #ifdef CONFIG_SLUB_DEBUG atomic_long_set(&n->nr_slabs, 0); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3741 @ static void list_slab_objects(struct kme const char *text) { #ifdef CONFIG_SLUB_DEBUG +#ifdef CONFIG_PREEMPT_RT + /* XXX move out of irq-off section */ + slab_err(s, page, text, s->name); +#else + void *addr = page_address(page); void *p; unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3765 @ static void list_slab_objects(struct kme slab_unlock(page); bitmap_free(map); #endif +#endif } + /* * Attempt to free all partial slabs on a node. * This is called from __kmem_cache_shutdown(). We must take list_lock @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3780 @ static void free_partial(struct kmem_cac struct page *page, *h; BUG_ON(irqs_disabled()); - spin_lock_irq(&n->list_lock); + raw_spin_lock_irq(&n->list_lock); list_for_each_entry_safe(page, h, &n->partial, slab_list) { if (!page->inuse) { remove_partial(n, page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3790 @ static void free_partial(struct kmem_cac "Objects remaining in %s on __kmem_cache_shutdown()"); } } - spin_unlock_irq(&n->list_lock); + raw_spin_unlock_irq(&n->list_lock); list_for_each_entry_safe(page, h, &discard, slab_list) discard_slab(s, page); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4062 @ int __kmem_cache_shrink(struct kmem_cach for (i = 0; i < SHRINK_PROMOTE_MAX; i++) INIT_LIST_HEAD(promote + i); - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); /* * Build lists of slabs to discard or promote. 
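The slub.c changes do two related things: kmem_cache_node.list_lock becomes a raw_spinlock_t, which keeps spinning on PREEMPT_RT where an ordinary spinlock_t turns into a sleeping lock, and slab pages that would otherwise be handed back to the page allocator with interrupts disabled are parked on the per-CPU slub_free_list and released later by free_delayed() once interrupts are enabled again. The following is a minimal user-space sketch of that defer-and-drain pattern only; the names, the malloc()/free() payloads and the pthread spinlock are illustrative stand-ins, not kernel APIs.

/*
 * Defer-and-drain: work that must not run inside the restricted section
 * is queued under a small lock, then drained from a safe context.
 */
#include <pthread.h>
#include <stdlib.h>

struct deferred {
	struct deferred *next;
	void *payload;
};

static pthread_spinlock_t deferred_lock;
static struct deferred *deferred_head;

/* Called where the real release is not allowed (cf. free_slab() with IRQs off). */
static void defer_release(void *payload)
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		return;		/* drop it; good enough for a sketch */
	d->payload = payload;
	pthread_spin_lock(&deferred_lock);
	d->next = deferred_head;
	deferred_head = d;
	pthread_spin_unlock(&deferred_lock);
}

/* Called later, from a safe context (cf. free_delayed() after local_irq_restore()). */
static void drain_deferred(void)
{
	struct deferred *list;

	pthread_spin_lock(&deferred_lock);
	list = deferred_head;
	deferred_head = NULL;
	pthread_spin_unlock(&deferred_lock);

	while (list) {
		struct deferred *d = list;

		list = d->next;
		free(d->payload);	/* the deferred work itself */
		free(d);
	}
}

int main(void)
{
	pthread_spin_init(&deferred_lock, PTHREAD_PROCESS_PRIVATE);
	defer_release(malloc(32));	/* queued from the restricted section */
	drain_deferred();		/* drained once it is safe again */
	pthread_spin_destroy(&deferred_lock);
	return 0;
}

The kernel code differs in the details (a raw spinlock, page->lru as the list linkage, __free_slab() as the deferred work, and list_splice_init() to grab the whole list at once), but the locking shape is the same.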
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4093 @ int __kmem_cache_shrink(struct kmem_cach for (i = SHRINK_PROMOTE_MAX - 1; i >= 0; i--) list_splice(promote + i, &n->partial); - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); /* Release empty slabs */ list_for_each_entry_safe(page, t, &discard, slab_list) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4300 @ void __init kmem_cache_init(void) { static __initdata struct kmem_cache boot_kmem_cache, boot_kmem_cache_node; + int cpu; + + for_each_possible_cpu(cpu) { + raw_spin_lock_init(&per_cpu(slub_free_list, cpu).lock); + INIT_LIST_HEAD(&per_cpu(slub_free_list, cpu).list); + } if (debug_guardpage_minorder()) slub_max_order = 0; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4507 @ static int validate_slab_node(struct kme struct page *page; unsigned long flags; - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) { validate_slab_slab(s, page, map); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4529 @ static int validate_slab_node(struct kme s->name, count, atomic_long_read(&n->nr_slabs)); out: - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); return count; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4715 @ static int list_locations(struct kmem_ca if (!atomic_long_read(&n->nr_slabs)) continue; - spin_lock_irqsave(&n->list_lock, flags); + raw_spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry(page, &n->partial, slab_list) process_slab(&t, s, page, alloc, map); list_for_each_entry(page, &n->full, slab_list) process_slab(&t, s, page, alloc, map); - spin_unlock_irqrestore(&n->list_lock, flags); + raw_spin_unlock_irqrestore(&n->list_lock, flags); } for (i = 0; i < t.count; i++) { Index: linux-5.4.5-rt3/mm/swap.c =================================================================== --- linux-5.4.5-rt3.orig/mm/swap.c +++ linux-5.4.5-rt3/mm/swap.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:47 @ /* How many pages do we try to swap or page in/out together? 
*/ int page_cluster; -static DEFINE_PER_CPU(struct pagevec, lru_add_pvec); -static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs); -static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs); -static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs); -static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs); +#ifdef CONFIG_PREEMPT_RT +DEFINE_STATIC_KEY_TRUE(use_pvec_lock); +#else +DEFINE_STATIC_KEY_FALSE(use_pvec_lock); +#endif + +struct swap_pagevec { + spinlock_t lock; + struct pagevec pvec; +}; + +#define DEFINE_PER_CPU_PAGEVEC(lvar) \ + DEFINE_PER_CPU(struct swap_pagevec, lvar) = { \ + .lock = __SPIN_LOCK_UNLOCKED((lvar).lock) } + +static DEFINE_PER_CPU_PAGEVEC(lru_add_pvec); +static DEFINE_PER_CPU_PAGEVEC(lru_rotate_pvecs); +static DEFINE_PER_CPU_PAGEVEC(lru_deactivate_file_pvecs); +static DEFINE_PER_CPU_PAGEVEC(lru_deactivate_pvecs); +static DEFINE_PER_CPU_PAGEVEC(lru_lazyfree_pvecs); #ifdef CONFIG_SMP -static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs); +static DEFINE_PER_CPU_PAGEVEC(activate_page_pvecs); #endif +static inline +struct swap_pagevec *lock_swap_pvec(struct swap_pagevec __percpu *p) +{ + struct swap_pagevec *swpvec; + + if (static_branch_likely(&use_pvec_lock)) { + swpvec = raw_cpu_ptr(p); + + spin_lock(&swpvec->lock); + } else { + swpvec = &get_cpu_var(*p); + } + return swpvec; +} + +static inline struct swap_pagevec * +lock_swap_pvec_cpu(struct swap_pagevec __percpu *p, int cpu) +{ + struct swap_pagevec *swpvec = per_cpu_ptr(p, cpu); + + if (static_branch_likely(&use_pvec_lock)) + spin_lock(&swpvec->lock); + + return swpvec; +} + +static inline struct swap_pagevec * +lock_swap_pvec_irqsave(struct swap_pagevec __percpu *p, unsigned long *flags) +{ + struct swap_pagevec *swpvec; + + if (static_branch_likely(&use_pvec_lock)) { + swpvec = raw_cpu_ptr(p); + + spin_lock_irqsave(&swpvec->lock, (*flags)); + } else { + local_irq_save(*flags); + + swpvec = this_cpu_ptr(p); + } + return swpvec; +} + +static inline struct swap_pagevec * +lock_swap_pvec_cpu_irqsave(struct swap_pagevec __percpu *p, int cpu, + unsigned long *flags) +{ + struct swap_pagevec *swpvec = per_cpu_ptr(p, cpu); + + if (static_branch_likely(&use_pvec_lock)) + spin_lock_irqsave(&swpvec->lock, *flags); + else + local_irq_save(*flags); + + return swpvec; +} + +static inline void unlock_swap_pvec(struct swap_pagevec *swpvec, + struct swap_pagevec __percpu *p) +{ + if (static_branch_likely(&use_pvec_lock)) + spin_unlock(&swpvec->lock); + else + put_cpu_var(*p); + +} + +static inline void unlock_swap_pvec_cpu(struct swap_pagevec *swpvec) +{ + if (static_branch_likely(&use_pvec_lock)) + spin_unlock(&swpvec->lock); +} + +static inline void +unlock_swap_pvec_irqrestore(struct swap_pagevec *swpvec, unsigned long flags) +{ + if (static_branch_likely(&use_pvec_lock)) + spin_unlock_irqrestore(&swpvec->lock, flags); + else + local_irq_restore(flags); +} + /* * This path almost never happens for VM activity - pages are normally * freed via pagevecs. But it gets used by networking. 
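The mm/swap.c changes above wrap each per-CPU pagevec in a struct swap_pagevec that carries its own spinlock, and the use_pvec_lock static key (defined true on PREEMPT_RT, false otherwise) selects at run time whether the lock_swap_pvec*() helpers take that spinlock or fall back to the traditional get_cpu_var()/local_irq_save() protection. Because the conversion is spread over many call sites in the hunks that follow, here is the resulting caller shape in one place, condensed from the __lru_cache_add() hunk below; my_pvecs and my_cache_add() are made-up names, and the helpers are the ones introduced above.

/* Condensed caller pattern; assumes the swap_pagevec helpers defined above. */
static DEFINE_PER_CPU_PAGEVEC(my_pvecs);

static void my_cache_add(struct page *page)
{
	struct swap_pagevec *swpvec = lock_swap_pvec(&my_pvecs);
	struct pagevec *pvec = &swpvec->pvec;

	get_page(page);
	if (!pagevec_add(pvec, page) || PageCompound(page))
		__pagevec_lru_add(pvec);
	unlock_swap_pvec(swpvec, &my_pvecs);
}

On non-RT kernels the static key is false, so the helpers reduce to the old get_cpu_var()/put_cpu_var() (or local_irq_save()) scheme and the fast path is unchanged; on RT the per-pagevec spinlock provides the mutual exclusion instead, which also lets lru_add_drain_all() drain remote CPUs directly rather than queuing per-CPU work items.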
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:350 @ void rotate_reclaimable_page(struct page { if (!PageLocked(page) && !PageDirty(page) && !PageUnevictable(page) && PageLRU(page)) { + struct swap_pagevec *swpvec; struct pagevec *pvec; unsigned long flags; get_page(page); - local_irq_save(flags); - pvec = this_cpu_ptr(&lru_rotate_pvecs); + + swpvec = lock_swap_pvec_irqsave(&lru_rotate_pvecs, &flags); + pvec = &swpvec->pvec; if (!pagevec_add(pvec, page) || PageCompound(page)) pagevec_move_tail(pvec); - local_irq_restore(flags); + unlock_swap_pvec_irqrestore(swpvec, flags); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:395 @ static void __activate_page(struct page #ifdef CONFIG_SMP static void activate_page_drain(int cpu) { - struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu); + struct swap_pagevec *swpvec = lock_swap_pvec_cpu(&activate_page_pvecs, cpu); + struct pagevec *pvec = &swpvec->pvec; if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, __activate_page, NULL); + unlock_swap_pvec_cpu(swpvec); } static bool need_activate_page_drain(int cpu) { - return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0; + return pagevec_count(per_cpu_ptr(&activate_page_pvecs.pvec, cpu)) != 0; } void activate_page(struct page *page) { page = compound_head(page); if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { - struct pagevec *pvec = &get_cpu_var(activate_page_pvecs); + struct swap_pagevec *swpvec; + struct pagevec *pvec; get_page(page); + swpvec = lock_swap_pvec(&activate_page_pvecs); + pvec = &swpvec->pvec; if (!pagevec_add(pvec, page) || PageCompound(page)) pagevec_lru_move_fn(pvec, __activate_page, NULL); - put_cpu_var(activate_page_pvecs); + unlock_swap_pvec(swpvec, &activate_page_pvecs); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:442 @ void activate_page(struct page *page) static void __lru_cache_activate_page(struct page *page) { - struct pagevec *pvec = &get_cpu_var(lru_add_pvec); + struct swap_pagevec *swpvec = lock_swap_pvec(&lru_add_pvec); + struct pagevec *pvec = &swpvec->pvec; int i; /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:465 @ static void __lru_cache_activate_page(st } } - put_cpu_var(lru_add_pvec); + unlock_swap_pvec(swpvec, &lru_add_pvec); } /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:507 @ EXPORT_SYMBOL(mark_page_accessed); static void __lru_cache_add(struct page *page) { - struct pagevec *pvec = &get_cpu_var(lru_add_pvec); + struct swap_pagevec *swpvec = lock_swap_pvec(&lru_add_pvec); + struct pagevec *pvec = &swpvec->pvec; get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) __pagevec_lru_add(pvec); - put_cpu_var(lru_add_pvec); + unlock_swap_pvec(swpvec, &lru_add_pvec); } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:697 @ static void lru_lazyfree_fn(struct page */ void lru_add_drain_cpu(int cpu) { - struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu); + struct swap_pagevec *swpvec = lock_swap_pvec_cpu(&lru_add_pvec, cpu); + struct pagevec *pvec = &swpvec->pvec; + unsigned long flags; if (pagevec_count(pvec)) __pagevec_lru_add(pvec); + unlock_swap_pvec_cpu(swpvec); - pvec = &per_cpu(lru_rotate_pvecs, cpu); + swpvec = lock_swap_pvec_cpu_irqsave(&lru_rotate_pvecs, cpu, &flags); + pvec = &swpvec->pvec; if 
(pagevec_count(pvec)) { - unsigned long flags; /* No harm done if a racing interrupt already did this */ - local_irq_save(flags); pagevec_move_tail(pvec); - local_irq_restore(flags); } + unlock_swap_pvec_irqrestore(swpvec, flags); - pvec = &per_cpu(lru_deactivate_file_pvecs, cpu); + swpvec = lock_swap_pvec_cpu(&lru_deactivate_file_pvecs, cpu); + pvec = &swpvec->pvec; if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); + unlock_swap_pvec_cpu(swpvec); - pvec = &per_cpu(lru_deactivate_pvecs, cpu); + swpvec = lock_swap_pvec_cpu(&lru_deactivate_pvecs, cpu); + pvec = &swpvec->pvec; if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); + unlock_swap_pvec_cpu(swpvec); - pvec = &per_cpu(lru_lazyfree_pvecs, cpu); + swpvec = lock_swap_pvec_cpu(&lru_lazyfree_pvecs, cpu); + pvec = &swpvec->pvec; if (pagevec_count(pvec)) pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); + unlock_swap_pvec_cpu(swpvec); activate_page_drain(cpu); } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:745 @ void lru_add_drain_cpu(int cpu) */ void deactivate_file_page(struct page *page) { + struct swap_pagevec *swpvec; + struct pagevec *pvec; + /* * In a workload with many unevictable page such as mprotect, * unevictable page deactivation for accelerating reclaim is pointless. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:756 @ void deactivate_file_page(struct page *p return; if (likely(get_page_unless_zero(page))) { - struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs); + swpvec = lock_swap_pvec(&lru_deactivate_file_pvecs); + pvec = &swpvec->pvec; if (!pagevec_add(pvec, page) || PageCompound(page)) pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); - put_cpu_var(lru_deactivate_file_pvecs); + unlock_swap_pvec(swpvec, &lru_deactivate_file_pvecs); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:776 @ void deactivate_file_page(struct page *p void deactivate_page(struct page *page) { if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { - struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs); + struct swap_pagevec *swpvec; + struct pagevec *pvec; + + swpvec = lock_swap_pvec(&lru_deactivate_pvecs); + pvec = &swpvec->pvec; get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); - put_cpu_var(lru_deactivate_pvecs); + unlock_swap_pvec(swpvec, &lru_deactivate_pvecs); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:798 @ void deactivate_page(struct page *page) */ void mark_page_lazyfree(struct page *page) { + struct swap_pagevec *swpvec; + struct pagevec *pvec; + if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && !PageSwapCache(page) && !PageUnevictable(page)) { - struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs); + swpvec = lock_swap_pvec(&lru_lazyfree_pvecs); + pvec = &swpvec->pvec; get_page(page); if (!pagevec_add(pvec, page) || PageCompound(page)) pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); - put_cpu_var(lru_lazyfree_pvecs); + unlock_swap_pvec(swpvec, &lru_lazyfree_pvecs); } } void lru_add_drain(void) { - lru_add_drain_cpu(get_cpu()); - put_cpu(); + if (static_branch_likely(&use_pvec_lock)) { + lru_add_drain_cpu(raw_smp_processor_id()); + } else { + lru_add_drain_cpu(get_cpu()); + put_cpu(); + } } #ifdef CONFIG_SMP @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:841 @ static void lru_add_drain_per_cpu(struct */ void lru_add_drain_all(void) { - static DEFINE_MUTEX(lock); - static struct cpumask has_work; - int cpu; + if (static_branch_likely(&use_pvec_lock)) { + int cpu; - /* - * Make sure nobody triggers this path before mm_percpu_wq is fully - * initialized. - */ - if (WARN_ON(!mm_percpu_wq)) - return; + for_each_online_cpu(cpu) { + if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) || + pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_deactivate_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) || + need_activate_page_drain(cpu)) { + lru_add_drain_cpu(cpu); + } + } + } else { + static DEFINE_MUTEX(lock); + static struct cpumask has_work; + int cpu; - mutex_lock(&lock); - cpumask_clear(&has_work); + /* + * Make sure nobody triggers this path before mm_percpu_wq + * is fully initialized. + */ + if (WARN_ON(!mm_percpu_wq)) + return; - for_each_online_cpu(cpu) { - struct work_struct *work = &per_cpu(lru_add_drain_work, cpu); + mutex_lock(&lock); + cpumask_clear(&has_work); - if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) || - pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) || - pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) || - pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) || - pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) || - need_activate_page_drain(cpu)) { - INIT_WORK(work, lru_add_drain_per_cpu); - queue_work_on(cpu, mm_percpu_wq, work); - cpumask_set_cpu(cpu, &has_work); + for_each_online_cpu(cpu) { + struct work_struct *work = &per_cpu(lru_add_drain_work, cpu); + + if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) || + pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_deactivate_pvecs.pvec, cpu)) || + pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) || + need_activate_page_drain(cpu)) { + INIT_WORK(work, lru_add_drain_per_cpu); + queue_work_on(cpu, mm_percpu_wq, work); + cpumask_set_cpu(cpu, &has_work); + } } - } - for_each_cpu(cpu, &has_work) - flush_work(&per_cpu(lru_add_drain_work, cpu)); + for_each_cpu(cpu, &has_work) + flush_work(&per_cpu(lru_add_drain_work, cpu)); - mutex_unlock(&lock); + mutex_unlock(&lock); + } } #else void lru_add_drain_all(void) Index: linux-5.4.5-rt3/mm/vmalloc.c =================================================================== --- linux-5.4.5-rt3.orig/mm/vmalloc.c +++ linux-5.4.5-rt3/mm/vmalloc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1080 @ static struct vmap_area *alloc_vmap_area retry: /* - * Preload this CPU with one extra vmap_area object to ensure - * that we have it available when fit type of free area is - * NE_FIT_TYPE. + * Preload this CPU with one extra vmap_area object. It is used + * when fit type of free area is NE_FIT_TYPE. Please note, it + * does not guarantee that an allocation occurs on a CPU that + * is preloaded, instead we minimize the case when it is not. + * It can happen because of cpu migration, because there is a + * race until the below spinlock is taken. * * The preload is done in non-atomic context, thus it allows us * to use more permissive allocation masks to be more stable under - * low memory condition and high memory pressure. + * low memory condition and high memory pressure. 
In rare case, + * if not preloaded, GFP_NOWAIT is used. * - * Even if it fails we do not really care about that. Just proceed - * as it is. "overflow" path will refill the cache we allocate from. + * Set "pva" to NULL here, because of "retry" path. */ - preempt_disable(); - if (!__this_cpu_read(ne_fit_preload_node)) { - preempt_enable(); - pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node); - preempt_disable(); + pva = NULL; - if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) { - if (pva) - kmem_cache_free(vmap_area_cachep, pva); - } - } + if (!this_cpu_read(ne_fit_preload_node)) + /* + * Even if it fails we do not really care about that. + * Just proceed as it is. If needed "overflow" path + * will refill the cache we allocate from. + */ + pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node); spin_lock(&vmap_area_lock); - preempt_enable(); + + if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) + kmem_cache_free(vmap_area_cachep, pva); /* * If an allocation fails, the "vend" address is @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1465 @ static void *new_vmap_block(unsigned int struct vmap_block *vb; struct vmap_area *va; unsigned long vb_idx; - int node, err; + int node, err, cpu; void *vaddr; node = numa_node_id(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1508 @ static void *new_vmap_block(unsigned int BUG_ON(err); radix_tree_preload_end(); - vbq = &get_cpu_var(vmap_block_queue); + cpu = get_cpu_light(); + vbq = this_cpu_ptr(&vmap_block_queue); spin_lock(&vbq->lock); list_add_tail_rcu(&vb->free_list, &vbq->free); spin_unlock(&vbq->lock); - put_cpu_var(vmap_block_queue); + put_cpu_light(); return vaddr; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1582 @ static void *vb_alloc(unsigned long size struct vmap_block *vb; void *vaddr = NULL; unsigned int order; + int cpu; BUG_ON(offset_in_page(size)); BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1597 @ static void *vb_alloc(unsigned long size order = get_order(size); rcu_read_lock(); - vbq = &get_cpu_var(vmap_block_queue); + cpu = get_cpu_light(); + vbq = this_cpu_ptr(&vmap_block_queue); list_for_each_entry_rcu(vb, &vbq->free, free_list) { unsigned long pages_off; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1621 @ static void *vb_alloc(unsigned long size break; } - put_cpu_var(vmap_block_queue); + put_cpu_light(); rcu_read_unlock(); /* Allocate new block if nothing was found */ Index: linux-5.4.5-rt3/mm/vmstat.c =================================================================== --- linux-5.4.5-rt3.orig/mm/vmstat.c +++ linux-5.4.5-rt3/mm/vmstat.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:324 @ void __mod_zone_page_state(struct zone * long x; long t; + preempt_disable_rt(); x = delta + __this_cpu_read(*p); t = __this_cpu_read(pcp->stat_threshold); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:334 @ void __mod_zone_page_state(struct zone * x = 0; } __this_cpu_write(*p, x); + preempt_enable_rt(); } EXPORT_SYMBOL(__mod_zone_page_state); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:346 @ void __mod_node_page_state(struct pglist long x; long t; + 
preempt_disable_rt(); x = delta + __this_cpu_read(*p); t = __this_cpu_read(pcp->stat_threshold); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:356 @ void __mod_node_page_state(struct pglist x = 0; } __this_cpu_write(*p, x); + preempt_enable_rt(); } EXPORT_SYMBOL(__mod_node_page_state); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:389 @ void __inc_zone_state(struct zone *zone, s8 __percpu *p = pcp->vm_stat_diff + item; s8 v, t; + preempt_disable_rt(); v = __this_cpu_inc_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v > t)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:398 @ void __inc_zone_state(struct zone *zone, zone_page_state_add(v + overstep, zone, item); __this_cpu_write(*p, -overstep); } + preempt_enable_rt(); } void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:407 @ void __inc_node_state(struct pglist_data s8 __percpu *p = pcp->vm_node_stat_diff + item; s8 v, t; + preempt_disable_rt(); v = __this_cpu_inc_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v > t)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:416 @ void __inc_node_state(struct pglist_data node_page_state_add(v + overstep, pgdat, item); __this_cpu_write(*p, -overstep); } + preempt_enable_rt(); } void __inc_zone_page_state(struct page *page, enum zone_stat_item item) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:437 @ void __dec_zone_state(struct zone *zone, s8 __percpu *p = pcp->vm_stat_diff + item; s8 v, t; + preempt_disable_rt(); v = __this_cpu_dec_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v < - t)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:446 @ void __dec_zone_state(struct zone *zone, zone_page_state_add(v - overstep, zone, item); __this_cpu_write(*p, overstep); } + preempt_enable_rt(); } void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:455 @ void __dec_node_state(struct pglist_data s8 __percpu *p = pcp->vm_node_stat_diff + item; s8 v, t; + preempt_disable_rt(); v = __this_cpu_dec_return(*p); t = __this_cpu_read(pcp->stat_threshold); if (unlikely(v < - t)) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:464 @ void __dec_node_state(struct pglist_data node_page_state_add(v - overstep, pgdat, item); __this_cpu_write(*p, overstep); } + preempt_enable_rt(); } void __dec_zone_page_state(struct page *page, enum zone_stat_item item) Index: linux-5.4.5-rt3/mm/workingset.c =================================================================== --- linux-5.4.5-rt3.orig/mm/workingset.c +++ linux-5.4.5-rt3/mm/workingset.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:370 @ static struct list_lru shadow_nodes; void workingset_update_node(struct xa_node *node) { + struct address_space *mapping; + /* * Track non-empty nodes that contain only shadow entries; * unlink those that contain pages or are being freed. 
@ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:380 @ void workingset_update_node(struct xa_no * already where they should be. The list_empty() test is safe * as node->private_list is protected by the i_pages lock. */ - VM_WARN_ON_ONCE(!irqs_disabled()); /* For __inc_lruvec_page_state */ + mapping = container_of(node->array, struct address_space, i_pages); + lockdep_assert_held(&mapping->i_pages.xa_lock); if (node->count && node->count == node->nr_values) { if (list_empty(&node->private_list)) { Index: linux-5.4.5-rt3/mm/zsmalloc.c =================================================================== --- linux-5.4.5-rt3.orig/mm/zsmalloc.c +++ linux-5.4.5-rt3/mm/zsmalloc.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:60 @ #include <linux/wait.h> #include <linux/pagemap.h> #include <linux/fs.h> +#include <linux/locallock.h> #define ZSPAGE_MAGIC 0x58 @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:78 @ */ #define ZS_MAX_ZSPAGE_ORDER 2 #define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER) - #define ZS_HANDLE_SIZE (sizeof(unsigned long)) +#ifdef CONFIG_PREEMPT_RT + +struct zsmalloc_handle { + unsigned long addr; + struct mutex lock; +}; + +#define ZS_HANDLE_ALLOC_SIZE (sizeof(struct zsmalloc_handle)) + +#else + +#define ZS_HANDLE_ALLOC_SIZE (sizeof(unsigned long)) +#endif + /* * Object location (<PFN>, <obj_idx>) is encoded as * as single (unsigned long) handle value. @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:343 @ static void SetZsPageMovable(struct zs_p static int create_cache(struct zs_pool *pool) { - pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE, + pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_ALLOC_SIZE, 0, 0, NULL); if (!pool->handle_cachep) return 1; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:367 @ static void destroy_cache(struct zs_pool static unsigned long cache_alloc_handle(struct zs_pool *pool, gfp_t gfp) { - return (unsigned long)kmem_cache_alloc(pool->handle_cachep, - gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE)); + void *p; + + p = kmem_cache_alloc(pool->handle_cachep, + gfp & ~(__GFP_HIGHMEM|__GFP_MOVABLE)); +#ifdef CONFIG_PREEMPT_RT + if (p) { + struct zsmalloc_handle *zh = p; + + mutex_init(&zh->lock); + } +#endif + return (unsigned long)p; } +#ifdef CONFIG_PREEMPT_RT +static struct zsmalloc_handle *zs_get_pure_handle(unsigned long handle) +{ + return (void *)(handle &~((1 << OBJ_TAG_BITS) - 1)); +} +#endif + static void cache_free_handle(struct zs_pool *pool, unsigned long handle) { kmem_cache_free(pool->handle_cachep, (void *)handle); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:406 @ static void cache_free_zspage(struct zs_ static void record_obj(unsigned long handle, unsigned long obj) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + WRITE_ONCE(zh->addr, obj); +#else /* * lsb of @obj represents handle lock while other bits * represent object value the handle is pointing so * updating shouldn't do store tearing. 
*/ WRITE_ONCE(*(unsigned long *)handle, obj); +#endif } /* zpool driver */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:500 @ MODULE_ALIAS("zpool-zsmalloc"); /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */ static DEFINE_PER_CPU(struct mapping_area, zs_map_area); +static DEFINE_LOCAL_IRQ_LOCK(zs_map_area_lock); static bool is_zspage_isolated(struct zspage *zspage) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:910 @ static unsigned long location_to_obj(str static unsigned long handle_to_obj(unsigned long handle) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + return zh->addr; +#else return *(unsigned long *)handle; +#endif } static unsigned long obj_to_head(struct page *page, void *obj) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:930 @ static unsigned long obj_to_head(struct static inline int testpin_tag(unsigned long handle) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + return mutex_is_locked(&zh->lock); +#else return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle); +#endif } static inline int trypin_tag(unsigned long handle) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + return mutex_trylock(&zh->lock); +#else return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle); +#endif } static void pin_tag(unsigned long handle) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + return mutex_lock(&zh->lock); +#else bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle); +#endif } static void unpin_tag(unsigned long handle) { +#ifdef CONFIG_PREEMPT_RT + struct zsmalloc_handle *zh = zs_get_pure_handle(handle); + + return mutex_unlock(&zh->lock); +#else bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle); +#endif } static void reset_page(struct page *page) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1395 @ void *zs_map_object(struct zs_pool *pool class = pool->size_class[class_idx]; off = (class->size * obj_idx) & ~PAGE_MASK; - area = &get_cpu_var(zs_map_area); + area = &get_locked_var(zs_map_area_lock, zs_map_area); area->vm_mm = mm; if (off + class->size <= PAGE_SIZE) { /* this object is contained entirely within a page */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1449 @ void zs_unmap_object(struct zs_pool *poo __zs_unmap_object(area, pages, off, class->size); } - put_cpu_var(zs_map_area); + put_locked_var(zs_map_area_lock, zs_map_area); migrate_read_unlock(zspage); unpin_tag(handle); Index: linux-5.4.5-rt3/mm/zswap.c =================================================================== --- linux-5.4.5-rt3.orig/mm/zswap.c +++ linux-5.4.5-rt3/mm/zswap.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:21 @ #include <linux/highmem.h> #include <linux/slab.h> #include <linux/spinlock.h> +#include <linux/locallock.h> #include <linux/types.h> #include <linux/atomic.h> #include <linux/frontswap.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:984 @ static void zswap_fill_page(void *ptr, u memset_l(page, value, PAGE_SIZE / sizeof(unsigned long)); } +/* protect zswap_dstmem from concurrency */ +static 
DEFINE_LOCAL_IRQ_LOCK(zswap_dstmem_lock); /********************************* * frontswap hooks **********************************/ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1063 @ static int zswap_frontswap_store(unsigne } /* compress */ - dst = get_cpu_var(zswap_dstmem); - tfm = *get_cpu_ptr(entry->pool->tfm); + dst = get_locked_var(zswap_dstmem_lock, zswap_dstmem); + tfm = *this_cpu_ptr(entry->pool->tfm); src = kmap_atomic(page); ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen); kunmap_atomic(src); - put_cpu_ptr(entry->pool->tfm); if (ret) { ret = -EINVAL; goto put_dstmem; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1091 @ static int zswap_frontswap_store(unsigne memcpy(buf, &zhdr, hlen); memcpy(buf + hlen, dst, dlen); zpool_unmap_handle(entry->pool->zpool, handle); - put_cpu_var(zswap_dstmem); + put_locked_var(zswap_dstmem_lock, zswap_dstmem); /* populate entry */ entry->offset = offset; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1119 @ insert_entry: return 0; put_dstmem: - put_cpu_var(zswap_dstmem); + put_locked_var(zswap_dstmem_lock, zswap_dstmem); zswap_pool_put(entry->pool); freepage: zswap_entry_cache_free(entry); Index: linux-5.4.5-rt3/net/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/net/Kconfig +++ linux-5.4.5-rt3/net/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:281 @ config CGROUP_NET_CLASSID config NET_RX_BUSY_POLL bool - default y + default y if !PREEMPT_RT config BQL bool Index: linux-5.4.5-rt3/net/core/dev.c =================================================================== --- linux-5.4.5-rt3.orig/net/core/dev.c +++ linux-5.4.5-rt3/net/core/dev.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:199 @ static unsigned int napi_gen_id = NR_CPU static DEFINE_READ_MOSTLY_HASHTABLE(napi_hash, 8); static seqcount_t devnet_rename_seq; +static DEFINE_MUTEX(devnet_rename_mutex); static inline void dev_base_seq_inc(struct net *net) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:222 @ static inline struct hlist_head *dev_ind static inline void rps_lock(struct softnet_data *sd) { #ifdef CONFIG_RPS - spin_lock(&sd->input_pkt_queue.lock); + raw_spin_lock(&sd->input_pkt_queue.raw_lock); #endif } static inline void rps_unlock(struct softnet_data *sd) { #ifdef CONFIG_RPS - spin_unlock(&sd->input_pkt_queue.lock); + raw_spin_unlock(&sd->input_pkt_queue.raw_lock); #endif } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:824 @ EXPORT_SYMBOL(dev_get_by_napi_id); * * The use of raw_seqcount_begin() and cond_resched() before * retrying is required as we want to give the writers a chance - * to complete when CONFIG_PREEMPT is not set. + * to complete when CONFIG_PREEMPTION is not set. 
*/ int netdev_get_name(struct net *net, char *name, int ifindex) { @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:843 @ retry: strcpy(name, dev->name); rcu_read_unlock(); if (read_seqcount_retry(&devnet_rename_seq, seq)) { - cond_resched(); + mutex_lock(&devnet_rename_mutex); + mutex_unlock(&devnet_rename_mutex); goto retry; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1121 @ int dev_change_name(struct net_device *d likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK))) return -EBUSY; - write_seqcount_begin(&devnet_rename_seq); + mutex_lock(&devnet_rename_mutex); + __raw_write_seqcount_begin(&devnet_rename_seq); - if (strncmp(newname, dev->name, IFNAMSIZ) == 0) { - write_seqcount_end(&devnet_rename_seq); - return 0; - } + if (strncmp(newname, dev->name, IFNAMSIZ) == 0) + goto outunlock; memcpy(oldname, dev->name, IFNAMSIZ); err = dev_get_valid_name(net, dev, newname); - if (err < 0) { - write_seqcount_end(&devnet_rename_seq); - return err; - } + if (err < 0) + goto outunlock; if (oldname[0] && !strchr(oldname, '%')) netdev_info(dev, "renamed from %s\n", oldname); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1144 @ rollback: if (ret) { memcpy(dev->name, oldname, IFNAMSIZ); dev->name_assign_type = old_assign_type; - write_seqcount_end(&devnet_rename_seq); - return ret; + err = ret; + goto outunlock; } - write_seqcount_end(&devnet_rename_seq); + __raw_write_seqcount_end(&devnet_rename_seq); + mutex_unlock(&devnet_rename_mutex); netdev_adjacent_rename_links(dev, oldname); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1170 @ rollback: /* err >= 0 after dev_alloc_name() or stores the first errno */ if (err >= 0) { err = ret; - write_seqcount_begin(&devnet_rename_seq); + mutex_lock(&devnet_rename_mutex); + __raw_write_seqcount_begin(&devnet_rename_seq); memcpy(dev->name, oldname, IFNAMSIZ); memcpy(oldname, newname, IFNAMSIZ); dev->name_assign_type = old_assign_type; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1184 @ rollback: } return err; + +outunlock: + __raw_write_seqcount_end(&devnet_rename_seq); + mutex_unlock(&devnet_rename_mutex); + return err; } /** @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2678 @ static void __netif_reschedule(struct Qd sd->output_queue_tailp = &q->next_sched; raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_restore(flags); + preempt_check_resched_rt(); } void __netif_schedule(struct Qdisc *q) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:2741 @ void __dev_kfree_skb_irq(struct sk_buff __this_cpu_write(softnet_data.completion_queue, skb); raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_restore(flags); + preempt_check_resched_rt(); } EXPORT_SYMBOL(__dev_kfree_skb_irq); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:3429 @ end_run: * This permits qdisc->running owner to get the lock more * often and dequeue packets faster. 
*/ +#ifdef CONFIG_PREEMPT_RT + contended = true; +#else contended = qdisc_is_running(q); +#endif if (unlikely(contended)) spin_lock(&q->busylock); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4226 @ drop: rps_unlock(sd); local_irq_restore(flags); + preempt_check_resched_rt(); atomic_long_inc(&skb->dev->rx_dropped); kfree_skb(skb); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4441 @ static int netif_rx_internal(struct sk_b struct rps_dev_flow voidflow, *rflow = &voidflow; int cpu; - preempt_disable(); + migrate_disable(); rcu_read_lock(); cpu = get_rps_cpu(skb->dev, skb, &rflow); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4451 @ static int netif_rx_internal(struct sk_b ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail); rcu_read_unlock(); - preempt_enable(); + migrate_enable(); } else #endif { unsigned int qtail; - ret = enqueue_to_backlog(skb, get_cpu(), &qtail); - put_cpu(); + ret = enqueue_to_backlog(skb, get_cpu_light(), &qtail); + put_cpu_light(); } return ret; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:4497 @ int netif_rx_ni(struct sk_buff *skb) trace_netif_rx_ni_entry(skb); - preempt_disable(); + local_bh_disable(); err = netif_rx_internal(skb); - if (local_softirq_pending()) - do_softirq(); - preempt_enable(); + local_bh_enable(); trace_netif_rx_ni_exit(err); return err; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5251 @ static void flush_backlog(struct work_st skb_queue_walk_safe(&sd->input_pkt_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->input_pkt_queue); - kfree_skb(skb); + __skb_queue_tail(&sd->tofree_queue, skb); input_queue_head_incr(sd); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5261 @ static void flush_backlog(struct work_st skb_queue_walk_safe(&sd->process_queue, skb, tmp) { if (skb->dev->reg_state == NETREG_UNREGISTERING) { __skb_unlink(skb, &sd->process_queue); - kfree_skb(skb); + __skb_queue_tail(&sd->tofree_queue, skb); input_queue_head_incr(sd); } } + if (!skb_queue_empty(&sd->tofree_queue)) + raise_softirq_irqoff(NET_RX_SOFTIRQ); local_bh_enable(); + } static void flush_all_backlogs(void) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5852 @ static void net_rps_action_and_irq_enabl sd->rps_ipi_list = NULL; local_irq_enable(); + preempt_check_resched_rt(); /* Send pending IPI's to kick RPS processing on remote cpus. 
*/ net_rps_send_ipi(remsd); } else #endif local_irq_enable(); + preempt_check_resched_rt(); } static bool sd_has_rps_ipi_waiting(struct softnet_data *sd) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5889 @ static int process_backlog(struct napi_s while (again) { struct sk_buff *skb; + local_irq_disable(); while ((skb = __skb_dequeue(&sd->process_queue))) { + local_irq_enable(); rcu_read_lock(); __netif_receive_skb(skb); rcu_read_unlock(); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5899 @ static int process_backlog(struct napi_s if (++work >= quota) return work; + local_irq_disable(); } - local_irq_disable(); rps_lock(sd); if (skb_queue_empty(&sd->input_pkt_queue)) { /* @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:5939 @ void __napi_schedule(struct napi_struct local_irq_save(flags); ____napi_schedule(this_cpu_ptr(&softnet_data), n); local_irq_restore(flags); + preempt_check_resched_rt(); } EXPORT_SYMBOL(__napi_schedule); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6383 @ static __latent_entropy void net_rx_acti unsigned long time_limit = jiffies + usecs_to_jiffies(netdev_budget_usecs); int budget = netdev_budget; + struct sk_buff_head tofree_q; + struct sk_buff *skb; LIST_HEAD(list); LIST_HEAD(repoll); + __skb_queue_head_init(&tofree_q); + local_irq_disable(); + skb_queue_splice_init(&sd->tofree_queue, &tofree_q); list_splice_init(&sd->poll_list, &list); local_irq_enable(); + while ((skb = __skb_dequeue(&tofree_q))) + kfree_skb(skb); + for (;;) { struct napi_struct *n; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:9899 @ static int dev_cpu_dead(unsigned int old raise_softirq_irqoff(NET_TX_SOFTIRQ); local_irq_enable(); + preempt_check_resched_rt(); #ifdef CONFIG_RPS remsd = oldsd->rps_ipi_list; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:9913 @ static int dev_cpu_dead(unsigned int old netif_rx_ni(skb); input_queue_head_incr(oldsd); } - while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) { + while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) { netif_rx_ni(skb); input_queue_head_incr(oldsd); } + while ((skb = __skb_dequeue(&oldsd->tofree_queue))) { + kfree_skb(skb); + } return 0; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:10230 @ static int __init net_dev_init(void) INIT_WORK(flush, flush_backlog); - skb_queue_head_init(&sd->input_pkt_queue); - skb_queue_head_init(&sd->process_queue); + skb_queue_head_init_raw(&sd->input_pkt_queue); + skb_queue_head_init_raw(&sd->process_queue); + skb_queue_head_init_raw(&sd->tofree_queue); #ifdef CONFIG_XFRM_OFFLOAD skb_queue_head_init(&sd->xfrm_backlog); #endif Index: linux-5.4.5-rt3/net/core/gen_estimator.c =================================================================== --- linux-5.4.5-rt3.orig/net/core/gen_estimator.c +++ linux-5.4.5-rt3/net/core/gen_estimator.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:45 @ struct net_rate_estimator { struct gnet_stats_basic_packed *bstats; spinlock_t *stats_lock; - seqcount_t *running; + net_seqlock_t *running; struct gnet_stats_basic_cpu __percpu *cpu_bstats; u8 ewma_log; u8 intvl_log; /* period : (250ms << intvl_log) */ @ 
linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:128 @ int gen_new_estimator(struct gnet_stats_ struct gnet_stats_basic_cpu __percpu *cpu_bstats, struct net_rate_estimator __rcu **rate_est, spinlock_t *lock, - seqcount_t *running, + net_seqlock_t *running, struct nlattr *opt) { struct gnet_estimator *parm = nla_data(opt); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:226 @ int gen_replace_estimator(struct gnet_st struct gnet_stats_basic_cpu __percpu *cpu_bstats, struct net_rate_estimator __rcu **rate_est, spinlock_t *lock, - seqcount_t *running, struct nlattr *opt) + net_seqlock_t *running, struct nlattr *opt) { return gen_new_estimator(bstats, cpu_bstats, rate_est, lock, running, opt); Index: linux-5.4.5-rt3/net/core/gen_stats.c =================================================================== --- linux-5.4.5-rt3.orig/net/core/gen_stats.c +++ linux-5.4.5-rt3/net/core/gen_stats.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:141 @ __gnet_stats_copy_basic_cpu(struct gnet_ } void -__gnet_stats_copy_basic(const seqcount_t *running, +__gnet_stats_copy_basic(net_seqlock_t *running, struct gnet_stats_basic_packed *bstats, struct gnet_stats_basic_cpu __percpu *cpu, struct gnet_stats_basic_packed *b) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:154 @ __gnet_stats_copy_basic(const seqcount_t } do { if (running) - seq = read_seqcount_begin(running); + seq = net_seq_begin(running); bstats->bytes = b->bytes; bstats->packets = b->packets; - } while (running && read_seqcount_retry(running, seq)); + } while (running && net_seq_retry(running, seq)); } EXPORT_SYMBOL(__gnet_stats_copy_basic); static int -___gnet_stats_copy_basic(const seqcount_t *running, +___gnet_stats_copy_basic(net_seqlock_t *running, struct gnet_dump *d, struct gnet_stats_basic_cpu __percpu *cpu, struct gnet_stats_basic_packed *b, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:203 @ ___gnet_stats_copy_basic(const seqcount_ * if the room in the socket buffer was not sufficient. */ int -gnet_stats_copy_basic(const seqcount_t *running, +gnet_stats_copy_basic(net_seqlock_t *running, struct gnet_dump *d, struct gnet_stats_basic_cpu __percpu *cpu, struct gnet_stats_basic_packed *b) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:227 @ EXPORT_SYMBOL(gnet_stats_copy_basic); * if the room in the socket buffer was not sufficient. 
*/ int -gnet_stats_copy_basic_hw(const seqcount_t *running, +gnet_stats_copy_basic_hw(net_seqlock_t *running, struct gnet_dump *d, struct gnet_stats_basic_cpu __percpu *cpu, struct gnet_stats_basic_packed *b) Index: linux-5.4.5-rt3/net/kcm/Kconfig =================================================================== --- linux-5.4.5-rt3.orig/net/kcm/Kconfig +++ linux-5.4.5-rt3/net/kcm/Kconfig @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:6 @ config AF_KCM tristate "KCM sockets" depends on INET + depends on !PREEMPT_RT select BPF_SYSCALL select STREAM_PARSER ---help--- Index: linux-5.4.5-rt3/net/packet/af_packet.c =================================================================== --- linux-5.4.5-rt3.orig/net/packet/af_packet.c +++ linux-5.4.5-rt3/net/packet/af_packet.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:60 @ #include <linux/if_packet.h> #include <linux/wireless.h> #include <linux/kernel.h> +#include <linux/delay.h> #include <linux/kmod.h> #include <linux/slab.h> #include <linux/vmalloc.h> @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:664 @ static void prb_retire_rx_blk_timer_expi if (BLOCK_NUM_PKTS(pbd)) { while (atomic_read(&pkc->blk_fill_in_prog)) { /* Waiting for skb_copy_bits to finish... */ - cpu_relax(); + cpu_chill(); } } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:926 @ static void prb_retire_current_block(str if (!(status & TP_STATUS_BLK_TMO)) { while (atomic_read(&pkc->blk_fill_in_prog)) { /* Waiting for skb_copy_bits to finish... */ - cpu_relax(); + cpu_chill(); } } prb_close_block(pkc, pbd, po, status); Index: linux-5.4.5-rt3/net/sched/sch_api.c =================================================================== --- linux-5.4.5-rt3.orig/net/sched/sch_api.c +++ linux-5.4.5-rt3/net/sched/sch_api.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1251 @ static struct Qdisc *qdisc_create(struct rcu_assign_pointer(sch->stab, stab); } if (tca[TCA_RATE]) { - seqcount_t *running; + net_seqlock_t *running; err = -EOPNOTSUPP; if (sch->flags & TCQ_F_MQROOT) { Index: linux-5.4.5-rt3/net/sched/sch_generic.c =================================================================== --- linux-5.4.5-rt3.orig/net/sched/sch_generic.c +++ linux-5.4.5-rt3/net/sched/sch_generic.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:560 @ struct Qdisc noop_qdisc = { .ops = &noop_qdisc_ops, .q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock), .dev_queue = &noop_netdev_queue, +#ifdef CONFIG_PREEMPT_RT + .running = __SEQLOCK_UNLOCKED(noop_qdisc.running), +#else .running = SEQCNT_ZERO(noop_qdisc.running), +#endif .busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock), .gso_skb = { .next = (struct sk_buff *)&noop_qdisc.gso_skb, @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:860 @ struct Qdisc *qdisc_alloc(struct netdev_ spin_lock_init(&sch->busylock); /* seqlock has the same scope of busylock, for NOLOCK qdisc */ spin_lock_init(&sch->seqlock); +#ifdef CONFIG_PREEMPT_RT + seqlock_init(&sch->running); +#else seqcount_init(&sch->running); +#endif sch->ops = ops; sch->flags = ops->static_flags; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:878 @ struct Qdisc *qdisc_alloc(struct netdev_ if (sch != &noop_qdisc) { 
lockdep_set_class(&sch->busylock, &dev->qdisc_tx_busylock_key); lockdep_set_class(&sch->seqlock, &dev->qdisc_tx_busylock_key); +#ifdef CONFIG_PREEMPT_RT + lockdep_set_class(&sch->running.seqcount, &dev->qdisc_running_key); + lockdep_set_class(&sch->running.lock, &dev->qdisc_running_key); +#else lockdep_set_class(&sch->running, &dev->qdisc_running_key); +#endif } return sch; @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:1231 @ void dev_deactivate_many(struct list_hea /* Wait for outstanding qdisc_run calls. */ list_for_each_entry(dev, head, close_list) { while (some_qdisc_is_busy(dev)) - yield(); + msleep(1); /* The new qdisc is assigned at this point so we can safely * unwind stale skb lists and qdisc statistics */ Index: linux-5.4.5-rt3/net/sunrpc/svc_xprt.c =================================================================== --- linux-5.4.5-rt3.orig/net/sunrpc/svc_xprt.c +++ linux-5.4.5-rt3/net/sunrpc/svc_xprt.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:414 @ void svc_xprt_do_enqueue(struct svc_xprt if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) return; - cpu = get_cpu(); + cpu = get_cpu_light(); pool = svc_pool_for_cpu(xprt->xpt_server, cpu); atomic_long_inc(&pool->sp_stats.packets); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:438 @ void svc_xprt_do_enqueue(struct svc_xprt rqstp = NULL; out_unlock: rcu_read_unlock(); - put_cpu(); + put_cpu_light(); trace_svc_xprt_do_enqueue(xprt, rqstp); } EXPORT_SYMBOL_GPL(svc_xprt_do_enqueue); Index: linux-5.4.5-rt3/security/apparmor/include/path.h =================================================================== --- linux-5.4.5-rt3.orig/security/apparmor/include/path.h +++ linux-5.4.5-rt3/security/apparmor/include/path.h @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:39 @ struct aa_buffers { #include <linux/percpu.h> #include <linux/preempt.h> +#include <linux/locallock.h> DECLARE_PER_CPU(struct aa_buffers, aa_buffers); +DECLARE_LOCAL_IRQ_LOCK(aa_buffers_lock); #define ASSIGN(FN, A, X, N) ((X) = FN(A, N)) #define EVAL1(FN, A, X) ASSIGN(FN, A, X, 0) /*X = FN(0)*/ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:52 @ DECLARE_PER_CPU(struct aa_buffers, aa_bu #define for_each_cpu_buffer(I) for ((I) = 0; (I) < MAX_PATH_BUFFERS; (I)++) -#ifdef CONFIG_DEBUG_PREEMPT +#ifdef CONFIG_PREEMPT_RT +static inline void AA_BUG_PREEMPT_ENABLED(const char *s) +{ + struct local_irq_lock *lv; + + lv = this_cpu_ptr(&aa_buffers_lock); + WARN_ONCE(lv->owner != current, + "__get_buffer without aa_buffers_lock\n"); +} + +#elif defined(CONFIG_DEBUG_PREEMPT) #define AA_BUG_PREEMPT_ENABLED(X) AA_BUG(preempt_count() <= 0, X) #else #define AA_BUG_PREEMPT_ENABLED(X) /* nop */ @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:78 @ DECLARE_PER_CPU(struct aa_buffers, aa_bu #define get_buffers(X...) \ do { \ - struct aa_buffers *__cpu_var = get_cpu_ptr(&aa_buffers); \ + struct aa_buffers *__cpu_var; \ + __cpu_var = get_locked_ptr(aa_buffers_lock, &aa_buffers); \ __get_buffers(__cpu_var, X); \ } while (0) #define put_buffers(X, Y...) 
\ do { \ __put_buffers(X, Y); \ - put_cpu_ptr(&aa_buffers); \ + put_locked_ptr(aa_buffers_lock, &aa_buffers); \ } while (0) #endif /* __AA_PATH_H */ Index: linux-5.4.5-rt3/security/apparmor/lsm.c =================================================================== --- linux-5.4.5-rt3.orig/security/apparmor/lsm.c +++ linux-5.4.5-rt3/security/apparmor/lsm.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:47 @ int apparmor_initialized; DEFINE_PER_CPU(struct aa_buffers, aa_buffers); - +DEFINE_LOCAL_IRQ_LOCK(aa_buffers_lock); /* * LSM hook functions Index: linux-5.4.5-rt3/virt/kvm/arm/arch_timer.c =================================================================== --- linux-5.4.5-rt3.orig/virt/kvm/arm/arch_timer.c +++ linux-5.4.5-rt3/virt/kvm/arm/arch_timer.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:83 @ static inline bool userspace_irqchip(str static void soft_timer_start(struct hrtimer *hrt, u64 ns) { hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns), - HRTIMER_MODE_ABS); + HRTIMER_MODE_ABS_HARD); } static void soft_timer_cancel(struct hrtimer *hrt) @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:700 @ void kvm_timer_vcpu_init(struct kvm_vcpu update_vtimer_cntvoff(vcpu, kvm_phys_timer_read()); ptimer->cntvoff = 0; - hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); + hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); timer->bg_timer.function = kvm_bg_timer_expire; - hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); - hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); + hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); + hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD); vtimer->hrtimer.function = kvm_hrtimer_expire; ptimer->hrtimer.function = kvm_hrtimer_expire; Index: linux-5.4.5-rt3/virt/kvm/arm/arm.c =================================================================== --- linux-5.4.5-rt3.orig/virt/kvm/arm/arm.c +++ linux-5.4.5-rt3/virt/kvm/arm/arm.c @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:703 @ int kvm_arch_vcpu_ioctl_run(struct kvm_v * involves poking the GIC, which must be done in a * non-preemptible context. */ - preempt_disable(); + migrate_disable(); kvm_pmu_flush_hwstate(vcpu); @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:752 @ int kvm_arch_vcpu_ioctl_run(struct kvm_v kvm_timer_sync_hwstate(vcpu); kvm_vgic_sync_hwstate(vcpu); local_irq_enable(); - preempt_enable(); + migrate_enable(); continue; } @ linux-5.4.5-rt3/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html:830 @ int kvm_arch_vcpu_ioctl_run(struct kvm_v /* Exit types that need handling before we can be preempted */ handle_exit_early(vcpu, run, ret); - preempt_enable(); + migrate_enable(); ret = handle_exit(vcpu, run, ret); }
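
Note on the conversion pattern used in the hunks above: many of them replace get_cpu_var()/put_cpu_var() or get_cpu()/put_cpu() pairs with the locallock primitives introduced earlier in this series (DEFINE_LOCAL_IRQ_LOCK(), get_locked_var(), put_locked_var()), so that per-CPU data stays serialized on PREEMPT_RT without depending on disabled preemption. The following is a minimal sketch of that pattern, assuming the locallock.h API carried by this series and a hypothetical per-CPU scratch buffer named example_buf; it is illustrative only and is not meant to be applied as part of the patch.

/* Illustrative sketch only, not part of the patch series. */
#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/locallock.h>	/* RT-specific header carried by this series */

struct example_buf {
	char data[128];
};

/* Hypothetical per-CPU scratch buffer and the local lock protecting it. */
static DEFINE_PER_CPU(struct example_buf, example_buf);
static DEFINE_LOCAL_IRQ_LOCK(example_buf_lock);

static void example_use_buf(void)
{
	struct example_buf *buf;

	/*
	 * Acquire this CPU's buffer. With the locallock semantics used in
	 * this series, this is expected to behave like get_cpu_var() on
	 * !PREEMPT_RT builds, while on PREEMPT_RT it takes a per-CPU lock
	 * instead, so the critical section remains preemptible.
	 */
	buf = &get_locked_var(example_buf_lock, example_buf);

	memset(buf->data, 0, sizeof(buf->data));

	put_locked_var(example_buf_lock, example_buf);
}

The zswap, zsmalloc and apparmor hunks above follow the same shape: the per-CPU variable keeps its definition and only the accessors change, which leaves the !PREEMPT_RT fast path effectively unchanged.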