Anthropic's 'Fable' AI Model Draws Criticism for Overly Restrictive Guardrails
Anthropic's newly released 'Fable' AI, a public version of its cybersecurity model Mythos, faces backlash from researchers. Its safety guardrails, intended to prevent misuse, are reportedly blocking even benign cybersecurity-related queries and code-writing tasks, hindering its potential utility and sparking debate about AI safety limitations.
Key points
- Anthropic launched 'Fable', a publicly accessible version of its specialized cybersecurity AI model 'Mythos', on Tuesday.
- Researchers report Fable's safety guardrails are overly restrictive, halting requests even for innocuous cybersecurity-related tasks.
- When a message is flagged as touching on cybersecurity or biology topics, Fable's safety measures pause the chat.
- The restrictions aim to prevent the AI from being used for malware development or software compromise.
- Cybersecurity professionals have voiced concerns online, stating the guardrails hinder the AI's practical application.
- Anthropic had previously limited 'Mythos' access through Project Glasswing before expanding it to hundreds of organizations.
Anthropic's latest AI model, 'Fable,' released Tuesday, is encountering criticism from cybersecurity researchers shortly after its debut. Billed as a public iteration of the company's advanced cybersecurity model, 'Mythos,' Fable is reportedly hampered by strict safety guardrails. Researchers say these restrictions, designed to keep the AI from aiding malicious activities such as malware creation, are applied too broadly.
Several cybersecurity professionals have expressed frustration online, noting that Fable rejects requests that are only tangentially related to cybersecurity. Security researcher Valentina Palmiotti highlighted that even a simple task, such as reading a blog post, can trigger the model's safety measures. When triggered, the guardrails pause the interaction and state that the message was flagged for cybersecurity or biology topics. The biology restrictions stem from a similar concern about misuse: they are meant to keep the model from aiding the development of biological weapons.
Anthropic initially offered its 'Mythos' model through a controlled-access program called Project Glasswing, which was limited to a select group of companies and organizations. Access was recently broadened to hundreds of entities across 15 countries. Despite the intent behind the guardrails, many experts see the broad and seemingly arbitrary nature of Fable's restrictions as a significant drawback and question the model's overall usefulness in the cybersecurity domain.