- 07 Mar, 2021 18 commits
-
-
Announce all clock event chips as they are registered to the out-of-band tick device infrastructure, so that we can interpose on key handlers in their descriptors.
-
Add the core API for enabling (regular) kernel event notifications to a co-kernel running over the head domain. For instance, such a co-kernel may need to know when a task is about to be resumed upon signal receipt, or when it gets an access fault trap. This commit adds the client-side API for enabling such notification for class of events, but does not provide the notification points per se, which comes later.
-
A raw output handler (.write_raw) is added to the console descriptor for writing (short) text output unmodified, without any logging, header or preparation whatsoever, usable from any pipeline domain. The dedicated raw_printk() variant formats the output message then passes it on to the handler holding a hard spinlock, irqs off. This is a very basic debug channel for situations when resorting to the fairly complex printk() handling is not an option. Unlike early consoles, regular consoles can provide a raw output service past the boot sequence. Raw output handlers are typically provided by serial console devices.
-
When dumping a stack backtrace, we neither need nor want to disable root stage IRQs over the head stage, where CPU migration can't happen. Conversely, we neither need nor want to disable hard IRQs from the head stage, so that latency won't skyrocket either.
-
Just like printk(), dev_printk() cannot run from the head domain and/or with hard IRQs disabled. In such a case, log the output directly into the staging buffer we use for printk(). NOTE: when redirected to the buffer, the output does not include the dev_printk() header text but is merely sent as-is to the log.
-
The printk() machinery cannot immediately invoke the console driver(s) when called from the head domain, since such driver code belongs to the root domain and cannot be shared between domains. Output issued from the head domain is formatted then logged into a staging buffer, and a dedicated virtual IRQ is posted to the root domain for notification. When the virtual IRQ handler runs, the contents of the staging buffer is flushed to the printk() interface anew, which may eventually pass the output on to the console drivers from such a context.
-
We must not allow out-of-band activity to resume while we are busy suspending the devices in the system, until the PM sleep state has been fully entered. Pair existing virtual IRQ disabling calls which only apply to the root domain with hard ones.
-
We might have out-of-band code calling try_module_get() from the head domain, or from a section covered by a hard spinlock where the root domain must not reschedule. This requires the preemption management calls in try_module_get() (and the converse module_put()) to be converted to their hard variant. REVISIT: using try_module_get() from such contexts is questionable, client domains should be fixed.
-
Make the KGDB stub runnable over the head domain since we may take traps and interrupts from that context too, by converting the locks to hard spinlocks.
-
Context tracking is a prerequisite for FULL_NOHZ, so that the RCU subsystem can detect CPU idleness without relying on the (regular) timer tick. Out-of-band activity running over the head domain should by definition not be involved in such detection logic, as the root domain has no knowledge of what happens - and when - on the head domain whatsoever.
-
There can be no CPU migration from the head stage, however the out-of-band code currently running smp_processor_id() might have preempted the regular kernel code from within a preemptible section, which might cause false positive in the end. These are the two reasons why we certainly neither need nor want to do the preemption check in that case.
-
Because of the virtualization of interrupt masking for the regular kernel code when the pipeline is enabled, atomic helpers relying on common interrupt disabling helpers such as local_irq_save/restore pairs would not be atomic anymore, leading to data corruption. This commit restores true atomicity for the atomic helpers that would be otherwise affected by interrupt virtualization.
-
Some inner code of the interrupt pipeline may have to traverse regular kernel code which manipulates the preemption count, expecting full serialization including with out-of-band contexts. The hard_preempt_*() variants are substituted to the original preempt_{enable, disable}() calls in these cases.
-
As described in Documentation/ipipe.rst, irq_chip drivers need to be specifically adapted for dealing with interrupt pipelining safely. The basic issue to address is proper serialization between some irq_chip handlers which may be called from out-of-band context immediately upon receipt of an IRQ, and the rest of the driver which may access the same data / IO registers from the regular - in-band - context on the same CPU. This commit converts the generic irq_chip lock to a hard spinlock, which ensures such serialization.
-
The latency tracer is a variant of ftrace's 'function' tracer providing detailed information about the current interrupt state at each function entry (i.e. virtual interrupt flag and CPU interrupt disable bit). This commit introduces the generic tracer code, which builds upon the regular ftrace API. The arch-specific code should provide for ipipe_read_tsc(), a helper routine returning a 64bit monotonic time value for timestamping purpose. HAVE_IPIPE_TRACER_SUPPORT should be selected by the arch-specific code for enabling the tracer, which in turn makes CONFIG_IPIPE_TRACE available from the Kconfig interface.
-
The out-of-band tick device manages the timer hardware by interposing on selected clockevent handlers transparently, so that a client domain (e.g. a co-kernel) eventually controls such hardware for scheduling the high-precision timer events it needs to. Those events are delivered to out-of-hand activities running on the head stage, unimpeded by (only virtually) interrupt-free sections of the regular kernel code. This commit introduces the generic API for controlling the out-of-band tick device from a co-kernel. It also provides for the internal API clock event chip drivers should use for enabling high-precision timing for their hardware. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com>
-
Hard spinlocks manipulate the CPU interrupt mask, without affecting the kernel preemption state in locking/unlocking operations. This type of spinlock is useful for implementing a critical section to serialize concurrent accesses from both in-band and out-of-band contexts, i.e. from root and head stages. Hard spinlocks exclusively depend on the pre-existing arch-specific bits which implement regular spinlocks. They can be seen as basic spinlocks still affecting the CPU's interrupt state when all other spinlock types only deal with the virtual interrupt flag managed by the pipeline core - i.e. only disable interrupts for the regular in-band kernel activity.
-
This commit provides the arch-independent bits for implementing the interrupt pipeline core, a lightweight layer introducing a separate, high-priority execution stage for handling all IRQs in pseudo-NMI mode, which cannot be delayed by the regular kernel code. See Documentation/ipipe.rst for details about interrupt pipelining. Architectures which support interrupt pipelining should select HAVE_IPIPE_SUPPORT, along with implementing the required arch-specific code. In such a case, CONFIG_IPIPE becomes available to the user via the Kconfig interface for enabling the feature.
-
- 04 Mar, 2021 22 commits
-
-
Greg Kroah-Hartman authored
Tested-by:
Hulk Robot <hulkci@huawei.com> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Jason Self <jason@bluehome.net> Link: https://lore.kernel.org/r/20210301161048.294656001@linuxfoundation.org Link: https://lore.kernel.org/r/20210301194420.658523615@linuxfoundation.org Link: https://lore.kernel.org/r/20210302122324.851128359@linuxfoundation.org Link: https://lore.kernel.org/r/20210302123219.029306163@linuxfoundation.org Link: https://lore.kernel.org/r/20210302192606.592235492@linuxfoundation.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John Wang authored
commit d050d049f8b8077025292c1ecf456c4ee7f96861 upstream. Signed-off-by:
John Wang <wangzhiqiang.bj@bytedance.com> Reviewed-by:
Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20201202051634.490-2-wangzhiqiang.bj@bytedance.com Signed-off-by:
Joel Stanley <joel@jms.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Takeshi Misawa authored
commit fc0494ead6398609c49afa37bc949b61c5c16b91 upstream. If qrtr_endpoint_register() failed, tun is leaked. Fix this, by freeing tun in error path. syzbot report: BUG: memory leak unreferenced object 0xffff88811848d680 (size 64): comm "syz-executor684", pid 10171, jiffies 4294951561 (age 26.070s) hex dump (first 32 bytes): 80 dd 0a 84 ff ff ff ff 00 00 00 00 00 00 00 00 ................ 90 d6 48 18 81 88 ff ff 90 d6 48 18 81 88 ff ff ..H.......H..... backtrace: [<0000000018992a50>] kmalloc include/linux/slab.h:552 [inline] [<0000000018992a50>] kzalloc include/linux/slab.h:682 [inline] [<0000000018992a50>] qrtr_tun_open+0x22/0x90 net/qrtr/tun.c:35 [<0000000003a453ef>] misc_open+0x19c/0x1e0 drivers/char/misc.c:141 [<00000000dec38ac8>] chrdev_open+0x10d/0x340 fs/char_dev.c:414 [<0000000079094996>] do_dentry_open+0x1e6/0x620 fs/open.c:817 [<000000004096d290>] do_open fs/namei.c:3252 [inline] [<000000004096d290>] path_openat+0x74a/0x1b00 fs/namei.c:3369 [<00000000b8e64241>] do_filp_open+0xa0/0x190 fs/namei.c:3396 [<00000000a3299422>] do_sys_openat2+0xed/0x230 fs/open.c:1172 [<000000002c1bdcef>] do_sys_open fs/open.c:1188 [inline] [<000000002c1bdcef>] __do_sys_openat fs/open.c:1204 [inline] [<000000002c1bdcef>] __se_sys_openat fs/open.c:1199 [inline] [<000000002c1bdcef>] __x64_sys_openat+0x7f/0xe0 fs/open.c:1199 [<00000000f3a5728f>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 [<000000004b38b7ec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 28fb4e59 ("net: qrtr: Expose tunneling endpoint to user space") Reported-by: syzbot+5d6e4af21385f5cfc56a@syzkaller.appspotmail.com Signed-off-by:
Takeshi Misawa <jeliantsurux@gmail.com> Link: https://lore.kernel.org/r/20210221234427.GA2140@DESKTOP Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit 2099b145d77c1d53f5711f029c37cc537897cee6 upstream. In case of a system crash, dm-era might fail to mark blocks as written in its metadata, although the corresponding writes to these blocks were passed down to the origin device and completed successfully. Consider the following sequence of events: 1. We write to a block that has not been yet written in the current era 2. era_map() checks the in-core bitmap for the current era and sees that the block is not marked as written. 3. The write is deferred for submission after the metadata have been updated and committed. 4. The worker thread processes the deferred write (process_deferred_bios()) and marks the block as written in the in-core bitmap, **before** committing the metadata. 5. The worker thread starts committing the metadata. 6. We do more writes that map to the same block as the write of step (1) 7. era_map() checks the in-core bitmap and sees that the block is marked as written, **although the metadata have not been committed yet**. 8. These writes are passed down to the origin device immediately and the device reports them as completed. 9. The system crashes, e.g., power failure, before the commit from step (5) finishes. When the system recovers and we query the dm-era target for the list of written blocks it doesn't report the aforementioned block as written, although the writes of step (6) completed successfully. The issue is that era_map() decides whether to defer or not a write based on non committed information. The root cause of the bug is that we update the in-core bitmap, **before** committing the metadata. Fix this by updating the in-core bitmap **after** successfully committing the metadata. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vlad Buslov authored
commit 396d7f23adf9e8c436dd81a69488b5b6a865acf8 upstream. When police action is created by cls API tcf_exts_validate() first conditional that calls tcf_action_init_1() directly, the action idr is not updated according to latest changes in action API that require caller to commit newly created action to idr with tcf_idr_insert_many(). This results such action not being accessible through act API and causes crash reported by syzbot: ================================================================== BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline] BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204 CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 ================================================================== Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G B 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 panic+0x306/0x73d kernel/panic.c:231 end_report+0x58/0x5e mm/kasan/report.c:100 __kasan_report mm/kasan/report.c:403 [inline] kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] __tcf_idr_release net/sched/act_api.c:178 [inline] tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598 tc_action_net_exit include/net/act_api.h:151 [inline] police_exit_net+0x168/0x360 net/sched/act_police.c:390 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Kernel Offset: disabled Fix the issue by calling tcf_idr_insert_many() after successful action initialization. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com Signed-off-by:
Vlad Buslov <vladbu@nvidia.com> Reviewed-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit ee576c47db60432c37e54b1e2b43a8ca6d3a8dca upstream. The icmp{,v6}_send functions make all sorts of use of skb->cb, casting it with IPCB or IP6CB, assuming the skb to have come directly from the inet layer. But when the packet comes from the ndo layer, especially when forwarded, there's no telling what might be in skb->cb at that point. As a result, the icmp sending code risks reading bogus memory contents, which can result in nasty stack overflows such as this one reported by a user: panic+0x108/0x2ea __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read from it. The optlen parameter there is of particular note, as it can induce writes beyond bounds. There are quite a few ways that can happen in __ip_options_echo. For example: // sptr/skb are attacker-controlled skb bytes sptr = skb_network_header(skb); // dptr/dopt points to stack memory allocated by __icmp_send dptr = dopt->__data; // sopt is the corrupt skb->cb in question if (sopt->rr) { optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data // this now writes potentially attacker-controlled data, over // flowing the stack: memcpy(dptr, sptr+sopt->rr, optlen); } In the icmpv6_send case, the story is similar, but not as dire, as only IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is worse than the iif case, but it is passed to ipv6_find_tlv, which does a bit of bounds checking on the value. This is easy to simulate by doing a `memset(skb->cb, 0x41, sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by good fortune and the rarity of icmp sending from that context that we've avoided reports like this until now. For example, in KASAN: BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 Write of size 38 at addr ffff888006f1f80e by task ping/89 CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 Call Trace: dump_stack+0x9a/0xcc print_address_description.constprop.0+0x1a/0x160 __kasan_report.cold+0x20/0x38 kasan_report+0x32/0x40 check_memory_region+0x145/0x1a0 memcpy+0x39/0x60 __ip_options_echo+0xa0e/0x12b0 __icmp_send+0x744/0x1700 Actually, out of the 4 drivers that do this, only gtp zeroed the cb for the v4 case, while the rest did not. So this commit actually removes the gtp-specific zeroing, while putting the code where it belongs in the shared infrastructure of icmp{,v6}_ndo_send. This commit fixes the issue by passing an empty IPCB or IP6CB along to the functions that actually do the work. For the icmp_send, this was already trivial, thanks to __icmp_send providing the plumbing function. For icmpv6_send, this required a tiny bit of refactoring to make it behave like the v4 case, after which it was straight forward. Fixes: a2b78e9b ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") Reported-by:
SinYu <liuxyon@gmail.com> Reviewed-by:
Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Leon Romanovsky authored
commit 1faba27f11c8da244e793546a1b35a9b1da8208e upstream. The W=1 compilation of allmodconfig generates the following warning: net/ipv6/icmp.c:448:6: warning: no previous prototype for 'icmp6_send' [-Wmissing-prototypes] 448 | void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, | ^~~~~~~~~~ Fix it by providing function declaration for builds with ipv6 as a module. Signed-off-by:
Leon Romanovsky <leonro@nvidia.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
commit cc7a21b6fbd945f8d8f61422ccd27203c1fafeb7 upstream. If IPv6 is builtin, we do not need an expensive indirect call to reach icmp6_send(). v2: put inline keyword before the type to avoid sparse warnings. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit 45942ba890e6f35232727a5fa33d732681f4eb9f upstream. Because xfrmi is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit 67c9a7e1e3ac491b5df018803639addc36f154ba upstream. Because sunvnet is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. While we're at it, doing the additional route lookup before calling icmp_ndo_send is superfluous, since this is the job of the icmp code in the first place. Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Cc: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit e0fce6f945a26d4e953a147fe7ca11410322c9fe upstream. Because gtp is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Cc: Harald Welte <laforge@gnumonks.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit a8e41f6033a0c5633d55d6e35993c9e2005d872f upstream. The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by:
Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jason A. Donenfeld authored
commit 0b41713b606694257b90d61ba7e2712d8457648b upstream. This introduces a helper function to be called only by network drivers that wraps calls to icmp[v6]_send in a conntrack transformation, in case NAT has been used. We don't want to pollute the non-driver path, though, so we introduce this as a helper to be called by places that actually make use of this, as suggested by Florian. Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ville Syrjälä authored
commit 7a6c6243b44a439bda4bf099032be35ebcf53406 upstream. The BXT/GLK DPLL can't generate certain frequencies. We already reject the 233-240MHz range on both. But on GLK the DPLL max frequency was bumped from 300MHz to 594MHz, so now we get to also worry about the 446-480MHz range (double the original problem range). Reject any frequency within the higher problematic range as well. Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3000 Signed-off-by:
Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210203093044.30532-1-ville.syrjala@linux.intel.com Reviewed-by:
Mika Kahola <mika.kahola@intel.com> (cherry picked from commit 41751b3e5c1ac656a86f8d45a8891115281b729e) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit cca2c6aebe86f68103a8615074b3578e854b5016 upstream. Metadata resize shouldn't happen in the ctr. The ctr loads a temporary (inactive) table that will only become active upon resume. That is why resize should always be done in terms of resume. Otherwise a load (ctr) whose inactive table never becomes active will incorrectly resize the metadata. Also, perform the resize directly in preresume, instead of using the worker to do it. The worker might run other metadata operations, e.g., it could start digestion, before resizing the metadata. These operations will end up using the old size. This could lead to errors, like: device-mapper: era: metadata_digest_transcribe_writeset: dm_array_set_value failed device-mapper: era: process_old_eras: digest step failed, stopping digestion The reason of the above error is that the worker started the digestion of the archived writeset using the old, larger size. As a result, metadata_digest_transcribe_writeset tried to write beyond the end of the era array. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit 2524933307fd0036d5c32357c693c021ab09a0b0 upstream. In case of devices with at most 64 blocks, the digestion of consecutive eras uses the writeset of the first era as the writeset of all eras to digest, leading to lost writes. That is, we lose the information about what blocks were written during the affected eras. The digestion code uses a dm_disk_bitset object to access the archived writesets. This structure includes a one word (64-bit) cache to reduce the number of array lookups. This structure is initialized only once, in metadata_digest_start(), when we kick off digestion. But, when we insert a new writeset into the writeset tree, before the digestion of the previous writeset is done, or equivalently when there are multiple writesets in the writeset tree to digest, then all these writesets are digested using the same cache and the cache is not re-initialized when moving from one writeset to the next. For devices with more than 64 blocks, i.e., the size of the cache, the cache is indirectly invalidated when we move to a next set of blocks, so we avoid the bug. But for devices with at most 64 blocks we end up using the same cached data for digesting all archived writesets, i.e., the cache is loaded when digesting the first writeset and it never gets reloaded, until the digestion is done. As a result, the writeset of the first era to digest is used as the writeset of all the following archived eras, leading to lost writes. Fix this by reinitializing the dm_disk_bitset structure, and thus invalidating the cache, every time the digestion code starts digesting a new writeset. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit 64f2d15afe7b336aafebdcd14cc835ecf856df4b upstream. Fix the writeset tree equality test function to use the right value size when comparing two btree values. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by:
Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit 904e6b266619c2da5c58b5dce14ae30629e39645 upstream. Deallocate the memory allocated for the in-core bitsets when destroying the target and in error paths. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by:
Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit c8e846ff93d5eaa5384f6f325a1687ac5921aade upstream. dm-era doesn't support changing the data block size of existing devices, so check explicitly that the requested block size for a new target matches the one stored in the metadata. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Reviewed-by:
Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nikos Tsironis authored
commit de89afc1e40fdfa5f8b666e5d07c43d21a1d3be0 upstream. Following a system crash, dm-era fails to recover the committed writeset for the current era, leading to lost writes. That is, we lose the information about what blocks were written during the affected era. dm-era assumes that the writeset of the current era is archived when the device is suspended. So, when resuming the device, it just moves on to the next era, ignoring the committed writeset. This assumption holds when the device is properly shut down. But, when the system crashes, the code that suspends the target never runs, so the writeset for the current era is not archived. There are three issues that cause the committed writeset to get lost: 1. dm-era doesn't load the committed writeset when opening the metadata 2. The code that resizes the metadata wipes the information about the committed writeset (assuming it was loaded at step 1) 3. era_preresume() starts a new era, without taking into account that the current era might not have been archived, due to a system crash. To fix this: 1. Load the committed writeset when opening the metadata 2. Fix the code that resizes the metadata to make sure it doesn't wipe the loaded writeset 3. Fix era_preresume() to check for a loaded writeset and archive it, before starting a new era. Fixes: eec40579 ("dm: add era target") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by:
Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit 4134455f2aafdfeab50cabb4cccb35e916034b93 upstream. Do not attempt to write any data beyond the end of the underlying data device while shrinking it. The DM writecache device must be suspended when the underlying data device is shrunk. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit a666e5c05e7c4aaabb2c5d58117b0946803d03d2 upstream. The system would deadlock when swapping to a dm-crypt device. The reason is that for each incoming write bio, dm-crypt allocates memory that holds encrypted data. These excessive allocations exhaust all the memory and the result is either deadlock or OOM trigger. This patch limits the number of in-flight swap bios, so that the memory consumed by dm-crypt is limited. The limit is enforced if the target set the "limit_swap_bios" variable and if the bio has REQ_SWAP set. Non-swap bios are not affected becuase taking the semaphore would cause performance degradation. This is similar to request-based drivers - they will also block when the number of requests is over the limit. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-