More Research Showing AI Breaking the Rules

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign…

February 24, 2025
Read More >>

Nation-state actors are using AI services and LLMs for cyberattacks

Microsoft and OpenAI warn that nation-state actors are using ChatGPT to automate some phases of their attack chains, including target reconnaissance and social engineering attacks. Multiple nation-state actors are exploiting artificial intelligence (AI) and large language models (LLMs), including OpenAI ChatGPT, to automate their attacks and increase their sophistication. According to a study conducted by […]

February 15, 2024
Read More >>

AI and Lossy Bottlenecks

Artificial intelligence is poised to upend much of society, removing human limitations inherent in many systems. One such limitation is information and logistical bottlenecks in decision-making.

Traditionally, people have been forced to reduce complex choices to a small handful of options that don’t do justice to their true desires. Artificial intelligence has the potential to remove that limitation. And it has the potential to drastically change how democracy functions.

AI researcher Tantum Collins and I, a public-interest technology scholar…

December 28, 2023
Read More >>

Data Exfiltration Using Indirect Prompt Injection

Interesting attack on a LLM:

In Writer, users can enter a ChatGPT-like session to edit or create their documents. In this chat session, the LLM can retrieve information from sources on the web to assist users in creation of their documents. We show that attackers can prepare websites that, when a user adds them as a source, manipulate the LLM into sending private information to the attacker or perform other malicious activities.

The data theft can include documents the user has uploaded, their chat history or potentially specific private information the chat model can convince the user to divulge at the attacker’s behest…

December 22, 2023
Read More >>

Ten Ways AI Will Change Democracy

Artificial intelligence will change so many aspects of society, largely in ways that we cannot conceive of yet. Democracy, and the systems of governance that surround it, will be no exception. In this short essay, I want to move beyond the “AI-generated disinformation” trope and speculate on some of the ways AI will change how democracy functions—in both large and small ways.

When I survey how artificial intelligence might upend different aspects of modern society, democracy included, I look at four different dimensions of change: speed, scale, scope, and sophistication. Look for places where changes in degree result in changes of kind. Those are where the societal upheavals will happen…

November 13, 2023
Read More >>

Detecting AI-Generated Text

There are no reliable ways to distinguish text written by a human from text written by an large language model. OpenAI writes:

Do AI detectors work?

  • In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.
  • Additionally, ChatGPT has no “knowledge” of what content could be AI-generated. It will sometimes make up responses to questions like “did you write this [essay]?” or “could this have been written by AI?” These responses are random and have no basis in fact.
September 19, 2023
Read More >>

LLM Summary of My Book Beyond Fear

Claude (Anthropic’s LLM) was given this prompt: Please summarize the themes and arguments of Bruce Schneier’s book Beyond Fear. I’m particularly interested in a taxonomy of his ethical arguments—please expand on that. Then lay out the most salient criticisms of the book.

Claude’s reply:

Here’s a brief summary of the key themes and arguments made in security expert Bruce Schneier’s book Beyond Fear: Thinking Sensibly About Security in an Uncertain World:

Central Themes:

  • Effective security requires realistic risk assessment, not fear-driven responses. We need to rationally determine actual threats versus perceived ones.
September 15, 2023
Read More >>

LLMs and Tool Use

Last March, just two weeks after GPT-4 was released, researchers at Microsoft quietly announced a plan to compile millions of APIs—tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room—into a compendium that would be made accessible to large language models (LLMs). This was just one milestone in the race across industry and academia to find the best ways to teach LLMs how to manipulate tools, which would supercharge the potential of AI more than any of the impressive advancements we’ve seen to date…

September 8, 2023
Read More >>