PT-2026-43774 · Linux · Linux
Published 2026-05-27 · Updated 2026-05-27 · CVE-2026-45907
None
No severity ratings or metrics are available. When they are, we'll update the corresponding info on the page.
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev trylock to protect
against concurrent tear down of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.
The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.
But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover
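
To make the inversion concrete, the following is a minimal user-space sketch of a lockdep-style dependency check. It is not kernel code. The enum values and the reachable() and record_flow() helpers are hypothetical names introduced only for this illustration. Each flow is modelled as the sequence of locks it takes while still holding the earlier ones. A flow is rejected if it takes a lock in an order that contradicts an order recorded from an earlier flow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers for this sketch; not kernel symbols. */
enum lock_id { DEVLINK_LOCK, RTNL_LOCK, NETDEV_LOCK, NUM_LOCKS };

/* dep[a][b] is true once some flow has taken b while already holding a. */
static bool dep[NUM_LOCKS][NUM_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(enum lock_id from, enum lock_id to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_LOCKS; next++)
        if (dep[from][next] && !seen[next] &&
            reachable((enum lock_id)next, to, seen))
            return true;
    return false;
}

/*
 * Record one flow's nested acquisition sequence. Returns false if any
 * "held -> acquired" pair contradicts an order recorded earlier, meaning
 * two flows could block on each other's locks.
 */
static bool record_flow(const enum lock_id *seq, size_t n)
{
    bool ok = true;

    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            bool seen[NUM_LOCKS] = { false };

            /* seq[j] -> seq[i] closes a cycle if seq[i] already leads to seq[j]. */
            if (reachable(seq[i], seq[j], seen))
                ok = false;
            dep[seq[j]][seq[i]] = true;
        }
    }
    return ok;
}
```

Recording the init flow as { DEVLINK_LOCK, RTNL_LOCK, NETDEV_LOCK } is accepted. Recording the recovery flow as { NETDEV_LOCK, DEVLINK_LOCK } afterwards is rejected, because the devlink lock already leads to the netdev lock.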
The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout
Fix these by moving the netdev trylock calls from the work handlers
lower in the call stack, in the respective recovery functions, where
they are actually necessary.
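
The shape of the fix can be sketched in user space with POSIX mutexes standing in for the kernel locks. This is an illustration under assumptions, not the actual patch. The function names err_work(), health_report() and recover_channel() are hypothetical. Returning an error when the trylock fails is also an assumption, made to keep the example short. The point is only that the work handler no longer holds the netdev lock when the devlink lock is taken, so the order stays devlink lock -> netdev lock.

```c
#include <pthread.h>

/* User-space stand-ins for the devlink and netdev instance locks. */
static pthread_mutex_t devlink_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t netdev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts completed recoveries so callers can observe the outcome. */
static int recoveries_done;

/*
 * Innermost recovery step (hypothetical). It is the only place that
 * touches channel state, so it is the only place that takes the netdev
 * lock. The trylock backs off instead of blocking if the lock is busy,
 * for example during a concurrent teardown.
 */
static int recover_channel(void)
{
    if (pthread_mutex_trylock(&netdev_lock) != 0)
        return -1; /* assumption: report failure and skip recovery */
    recoveries_done++;
    pthread_mutex_unlock(&netdev_lock);
    return 0;
}

/* Stand-in for the health report path: devlink lock first, then recovery. */
static int health_report(void)
{
    int err;

    pthread_mutex_lock(&devlink_lock);
    err = recover_channel();
    pthread_mutex_unlock(&devlink_lock);
    return err;
}

/* Work handler after the fix: it takes no lock before reporting. */
static int err_work(void)
{
    return health_report();
}
```

With this arrangement, a call to err_work() while netdev_lock is free performs one recovery. A call made while netdev_lock is held elsewhere returns -1 without blocking.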
Related Identifiers
Affected Products
Linux