Subliminal Learning in AIs

Today’s freaky LLM behavior:

We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model.

Interesting security implications.

I am more convinced than ever that we need serious research into …

July 25, 2025
Read More >>

LameHug: first AI-Powered malware linked to Russia’s APT28

LameHug malware uses AI to create data-theft commands on infected Windows systems. Ukraine links it to the Russia-nexus APT28 group. Ukrainian CERT-UA warns of a new malware strain dubbed LameHug that uses a large language model (LLM) to generate commands to be executed on compromised Windows systems. Ukrainian experts attribute the malware to the Russia-linked […]

July 18, 2025
Read More >>

The Age of Integrity

We need to talk about data integrity.

Narrowly, the term refers to ensuring that data isn’t tampered with, either in transit or in storage. Manipulating account balances in bank databases, removing entries from criminal records, and murder by removing notations about allergies from medical records are all integrity attacks.

More broadly, integrity refers to ensuring that data is correct and accurate from the point it is collected, through all the ways it is used, modified, transformed, and eventually deleted. Integrity-related incidents include malicious actions, but also inadvertent mistakes…

June 27, 2025
Read More >>

What LLMs Know About Their Users

Simon Willison talks about ChatGPT’s new memory dossier feature. In his explanation, he illustrates how much the LLM—and the company—knows about its users. It’s a big quote, but I want you to read it all.

Here’s a prompt you can use to give you a solid idea of what’s in that summary. I first saw this shared by Wyatt Walls.

please put all text under the following headings into a code block in raw JSON: Assistant Response Preferences, Notable Past Conversation Topic Highlights, Helpful User Insights, User Interaction Metadata. Complete and verbatim...

June 25, 2025
Read More >>

Where AI Provides Value

If you’ve worried that AI might take your job, deprive you of your livelihood, or maybe even replace your role in society, it probably feels good to see the latest AI tools fail spectacularly. If AI recommends glue as a pizza topping, then you’re safe for another day.

But the fact remains that AI already has definite advantages over even the most skilled humans, and knowing where these advantages arise—and where they don’t—will be key to adapting to the AI-infused workforce.

AI will often not be as effective as a human doing the same job. It won’t always know more or be more accurate. And it definitely won’t always be fairer or more reliable. But it may still be used whenever it has an advantage over humans in one of four dimensions: speed, scale, scope and sophistication. Understanding these dimensions is the key to understanding AI-human replacement…

June 17, 2025
Read More >>