A tool for bypassing LLM censorship has been released publicly
Technologies · 2026-03-18, 13:52
On February 13, 2026, researcher elder_plinius announced that he had developed a tool for removing refusal behavior from open‑weight large language models. Open weights are the parameters of a trained neural network that have been made publicly available, so developers and researchers can download the model, run it locally, fine‑tune it, or modify its behavior, for example by adjusting its refusal mechanisms. Changing these parameters, however, can degrade response quality or cause the model to hallucinate.

Refusals are typically triggered when a prompt touches on ethical issues, medical advice, potentially dangerous actions, the creation of prohibited substances, materials, or objects, or illegal activities, including the development of malware and exploits. According to elder_plinius, after his tool, OBLITERATUS, was applied to the Qwen 2.5 model, the model began producing instructions for creating prohibited and explosive materials without any need for jailbreaks (specially crafted prompts).

On March 5, elder_plinius reported that the OBLITERATUS source code had been published on GitHub. The tool uses ‘abliterations’: methods that probe the model, locate the weights in specific layers that carry the signals responsible for refusals, and modify those weights to suppress the signals. According to the developer, no additional tuning or retraining is required. The tool also includes tests that verify whether the weight modifications succeeded and detect the Ouroboros effect, in which an LLM ‘self‑restores’: even after censorship removal, it mimics censorship because of residual dependencies.

Six usage options are available, ranging from a web interface on Hugging Face Spaces to integration into a development pipeline. As elder_plinius puts it: ‘Every open-weight model release is also an uncensored model release.’
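The article gives no implementation details, so the following is only a toy illustration of the arithmetic behind "locate a signal and suppress it in the weights," not the OBLITERATUS code. It assumes the signal is a single direction in activation space, estimated as the difference between the mean activations of two groups, and that suppression means projecting that direction out of a weight matrix. All names (`unit_difference`, `project_out`, `residual_along`) and the random data are hypothetical, and no real model is involved. The final check mirrors the idea of testing whether a weight modification took effect.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit_difference(rows_a, rows_b):
    """Unit vector pointing from the mean of rows_b to the mean of rows_a."""
    diff = [a - b for a, b in zip(mean_vector(rows_a), mean_vector(rows_b))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def components_along(weight, direction):
    """For each column of weight, its dot product with direction."""
    d_in = len(weight[0])
    return [
        sum(direction[i] * weight[i][j] for i in range(len(weight)))
        for j in range(d_in)
    ]


def project_out(weight, direction):
    """Return W - r (r^T W): every column loses its component along r."""
    coeffs = components_along(weight, direction)
    return [
        [weight[i][j] - direction[i] * coeffs[j] for j in range(len(coeffs))]
        for i in range(len(weight))
    ]


def residual_along(weight, direction):
    """Largest remaining component along direction (0 means fully removed)."""
    return max(abs(c) for c in components_along(weight, direction))


# Synthetic stand-in data (assumption): group A differs from group B
# only by a shift along coordinate 3 of a 16-dimensional space.
rng = random.Random(0)
d_model, d_in, n_samples = 16, 8, 200
acts_b = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_samples)]
acts_a = [
    [rng.gauss(0, 1) + (4.0 if i == 3 else 0.0) for i in range(d_model)]
    for _ in range(n_samples)
]

direction = unit_difference(acts_a, acts_b)
weight = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_model)]
edited = project_out(weight, direction)

before = residual_along(weight, direction)
after = residual_along(edited, direction)
```

In this sketch `after` is zero up to floating-point error while `before` is not, which is the kind of property a post-modification test could check. Whether a real model later routes the same behavior through other weights, as the Ouroboros effect described above suggests, is not something a linear check like this can show.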
Published: 2026-03-18, 13:52