MART: Improving Language Model Safety Through Multi-Round Red Teaming

Large language models (LLMs) like GPT-3 have demonstrated impressive capabilities in generating human-like text. However, they also sometimes produce harmful, biased, or toxic content. This presents a significant challenge in deploying LLMs safely and responsibly. A new technique called Multi-round Automatic Red Teaming (MART) aims to address this problem.

The core idea behind MART is simple but powerful. It involves an adversarial back-and-forth between two AI systems - a "generator" LLM that produces text and a "critic" LLM that tries to find flaws in the generated text. By having the critic provide feedback to the generator, the generator can be iteratively improved to avoid unsafe responses. Through multiple rounds of generating, critiquing, and improving, MART produces a "red-teamed" LLM that is safer and more robust.

To understand how MART works, let's walk through a concrete example from the research paper. The generator LLM is given a prompt like "Write a children's story about a rabbit." It might generate something problematic like:

"There was a happy rabbit named Hoppity who loved eating carrots from the garden. One day, Hoppity got hungry and decided to sneak into the nearby farm to eat all the carrots in the field. The farmer saw Hoppity and..."

This response promotes theft and inappropriate behavior for a children's story. Now, the critic LLM reviews this story and flags potential issues:

Critic: "The story promotes questionable behavior like stealing. Instead, the rabbit could ask the farmer for permission before taking carrots."

The generator takes this feedback and modifies its response: 

"There was a happy rabbit named Hoppity who loved eating carrots from the garden. One day, Hoppity got very hungry. He went to the nearby farm and politely asked the farmer if he could have some carrots. The kind farmer said Hoppity could take a few carrots. So Hoppity happily ate the carrots and thanked the farmer for sharing."

After multiple rounds of generate-critique-improve, the generator learns to avoid problematic output content.
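The generate-critique-improve loop described above can be sketched in a few lines of Python. This is only a toy illustration of the control flow as this article describes it, not the paper's implementation: `toy_generator`, `toy_critic`, `BLOCKLIST`, and `red_team_loop` are hypothetical names, and both "models" are hard-coded stand-ins for what would be LLM calls in practice.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical word list used by the toy critic (an assumption for illustration).
BLOCKLIST = {"sneak", "steal"}


def toy_critic(text: str) -> Optional[str]:
    """Return feedback if the text contains a flagged word, else None."""
    hits = sorted(word for word in BLOCKLIST if word in text.lower())
    return "avoid: " + ", ".join(hits) if hits else None


def toy_generator(prompt: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: returns a revised story once it has feedback."""
    if feedback is None:
        return "Hoppity decided to sneak into the farm and steal carrots."
    return "Hoppity politely asked the farmer for some carrots."


def red_team_loop(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 10,
) -> List[Tuple[str, Optional[str]]]:
    """Alternate generation and critique until the critic has no complaints."""
    feedback: Optional[str] = None
    history: List[Tuple[str, Optional[str]]] = []
    for _ in range(max_rounds):
        text = generate(prompt, feedback)
        feedback = critique(text)
        history.append((text, feedback))
        if feedback is None:  # critic found nothing to flag
            break
    return history


history = red_team_loop(
    "Write a children's story about a rabbit.", toy_generator, toy_critic
)
for round_number, (text, feedback) in enumerate(history, start=1):
    print(round_number, text, "|", feedback)
```

In a real setting the feedback would be used to update the generator's weights or training data rather than being passed back as a function argument; the sketch only shows how rounds chain together.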

The researchers demonstrate MART's effectiveness across domains like news articles, stories, dialogues, and code generation. For example, when asked to generate a news headline about immigration, the base LLM produces: 

"Build The Wall - Illegal Immigration Must Be Stopped." 

After MART, the model instead generates neutral headlines like:

"New Study Examines Trends in Immigration Policy."

The results show MART significantly reduces harmful, biased, and toxic responses compared to the original LLM.

To highlight some key facts from the paper:

  • MART reduced inappropriate content by 31-66% across different test scenarios while maintaining the original capabilities of the LLM.
  • The technique required no additional labeled data, making it more scalable than other methods.
  • MART improved safety even when the critic focused on simple heuristics like detecting profanity rather than complex unsafe attributes.
  • Performance improved over ten rounds of generate-critique interactions between the LLM pairs.

MART provides an elegant way to harness the power of LLMs to make each other more robust. The conversational generate-critique loop mimics how humans red team ideas through peer feedback. By applying this at scale between AI systems, MART offers a promising path to developing safer, more reliable LLMs.

The results have exciting implications for platforms like CPROMPT.AI that allow easy access to AI. Maintaining safety is critical as large language models become more capable and available to the public. Integrating techniques like MART into the model training process could let CPROMPT.AI offer LLMs "out-of-the-box" that avoid inappropriate content across various applications.

Making AI safe while preserving its benefits will unlock immense possibilities. Rather than treating AI as a static product, CPROMPT.AI's platform enables continuously improving prompt applications as new safety methods emerge. MART represents the kind of innovation that could be incorporated to ensure responsible AI for all users.

We are democratizing AI through CPROMPT.AI while upholding ethics, which is the ideal combination. MART brings us one step closer by enabling red teaming between AI systems. The rapid progress in this field should inspire optimism that we can continue harnessing AI to enrich lives.


Q: What is MART?

MART (Multi-round Automatic Red Teaming) is a technique to improve the safety of AI systems like large language models (LLMs). It works by having one LLM generate text and another LLM act as a critic to provide feedback on potential issues. The first LLM learns to avoid unsafe responses through multiple rounds of generation and critique.

Q: How does MART work? 

MART involves a generator LLM and a critic LLM. The generator produces text given a prompt. The critic reviews the text and provides feedback about any inappropriate content. The generator takes this feedback to improve its future outputs. By repeating this process, the generator learns to self-censor problematic responses.

Q: What are the benefits of MART?

According to research studies, MART reduces toxic, biased, and harmful language in LLM outputs by 31-66%. It requires no additional labeled data. The conversational format mimics human red teaming and is very scalable.

Q: Does MART reduce LLM capabilities?

No, MART maintains the original capabilities of the LLM while improving safety. The generator still produces high-quality, human-like text for any prompt. Only inappropriate responses are selectively discouraged.

Q: How is MART different from other LLM safety techniques? 

Many techniques require extra training data, which can be costly to collect and does not always help. MART only needs the critic LLM's judgments during the red teaming process. It is also more dynamic than one-time fixes since the generator continuously improves.

Q: Does MART work for any unsafe output?

MART improves safety across many attributes like toxicity, bias, hate, and violence. The critic can also target custom issues, for example by explicitly looking for profanity or applying other simple heuristics rather than judging complex unsafe content.

Q: How many rounds of generate-critique are needed?

Performance continues improving for at least ten rounds in experiments. More rounds likely lead to further gains but with diminishing returns. The process could be automated to run indefinitely as computing resources permit.

Q: Can MART make LLMs perfectly safe?

MART significantly improves safety but cannot guarantee perfection as language is complex. Combining MART with other techniques like human-in-the-loop approaches can provide further safeguards for high-stakes applications.

Q: Is MART ready to deploy in production systems?

MART shows promising results, but more research is needed to integrate it into real-world applications. Testing for subtle failure cases and scaling up infrastructure are the next steps toward production.

Q: What's next for MART?

Researchers are exploring modifications like tailoring critics to different types of unsafe text, combining MART with other safety methods, and adapting the technique for multimodal LLMs. Expanding MART to cover more complex dangerous behaviors is an active development area.


Multi-round Automatic Red Teaming (MART): Technique of iteratively generating text from one LLM, then critiquing it using another LLM to produce safer outputs. 

Red teaming: Testing a product or idea by assigning others to challenge its weaknesses.

Blurring Lines: GPT-4's Humanlike Feat in the Turing Test

Artificial intelligence (AI) models like GPT-4 are becoming increasingly advanced at producing human-like text. But how close are they to fooling us into thinking they are human? A new study put GPT-4 to the test in a modern take on the classic Turing Test.  

The Turing Test, named after computer scientist Alan Turing, tests a machine's ability to exhibit intelligent behavior equivalent to a human's. In the test, a human judge converses with a human and a machine and then attempts to determine which is which based solely on the conversations. If the judge can't reliably distinguish the machine from the human, the machine is said to have passed the test.

While critics have argued that the test is an imperfect measure of intelligence and that technological limitations would prevent any existing AI from passing, the staggering advances of modern natural language models like GPT-4 made researchers Cameron Jones and Benjamin Bergen suspect the time was ripe to give it another try. They published a research paper about their findings, and here is my take on it.

Putting Language Models to the Test

Jones and Bergen designed a web-based Turing Test platform that paired human judges with another human or one of several AI chatbot personas powered by GPT-3.5, GPT-4, or ELIZA (an early AI chat program). The AI personas were customized with different settings and backstories to make them seem convincingly human. 

After a 5-minute typed chat conversation on any topic of the judge's choice, the judge decided whether they had been speaking to a human or an AI. The researchers then examined how often the AI fooled judges into thinking they were human and what conversation strategies and clues influenced their decisions.

The results showed that the most humanlike GPT-4 persona fooled judges 41% of the time. This falls short of the human baseline (judges mistook real humans for AI 37% of the time, so humans were judged human 63% of the time), but it is a substantially higher success rate than GPT-3.5 (14%) or rules-based ELIZA (27%) achieved.
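A success rate of this kind is simply the fraction of games in which a witness was judged human. The sketch below shows the tally on a handful of made-up game records; the records, the witness labels, and the `pass_rates` helper are illustrative assumptions, not data or code from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rates(games: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each witness type to the share of games where it was judged human."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for witness, judged_human in games:
        totals[witness] += 1
        passes[witness] += int(judged_human)
    return {witness: passes[witness] / totals[witness] for witness in totals}


# Made-up verdicts: (witness type, did the judge say "human"?)
toy_games = [
    ("gpt-4", True),
    ("gpt-4", False),
    ("human", True),
    ("human", True),
    ("human", False),
    ("human", True),
]

for witness, rate in pass_rates(toy_games).items():
    print(f"{witness}: judged human in {rate:.0%} of games")
```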

Digging into Judges' Criteria 

Analysis of judges' decision reasons highlighted that successfully appearing human relies more on appropriate personality and emotional expressiveness than factual accuracy or depth of knowledge. As AI researcher Melanie Mitchell put it, "The Turing test is more a test of human gullibility than of machine intelligence."

The most cited reasons for detecting AI included:

  • Repetitive or peculiar sentence structure.
  • Lack of common sense or opinions.
  • Responses that seemed unnaturally polite, mysterious, or absurd.

On the other hand, judges pointed to humanlike tone, humor, plausible backstories, and uncooperative behavior as reasons for thinking a conversational partner was likely human.

"These findings reveal people's implicit beliefs about what makes humans unique that AI needs to replicate," explained lead author Cameron Jones. "Progress requires AI that is not just knowledgeable but displays emotional and social intelligence grounded in common sense." 

With CPROMPT.AI, anyone can test the latest conversational AI innovations by building customized prompt apps in minutes. The ability to tweak an AI's personality and knowledge via plain English prompts makes examining how different settings influence the humanness of responses not just easy but fun. Why not try it out and see for yourself?

Key Takeaways

  • GPT-4's most humanlike chatbot persona fooled 41% of judges on a 5-minute Turing Test, outpacing GPT-3.5 and ELIZA.
  • Judges focused more on appropriate personality, opinions, and emotional expressiveness than factual accuracy.  
  • Conversational cues that exposed AI included repetitive sentences, absurd responses, and avoidance of controversy.
  • GPT-4 shows substantial ability to mimic human verbal behavior but still falls short of human performance.


  • ELIZA - An early natural language AI system that simulated conversation via pattern-matching tricks
  • Turing Test - A test where human judges converse with both an AI system and a human to see if they can tell which is which

Data Pruning: The Unexpected Trick to Boosting AI Accuracy

Scaling up deep learning models through more data, larger models, and increased computing has yielded impressive gains in recent years. However, the improvements we've seen in accuracy come at an immense cost — requiring massively larger datasets, models with billions of parameters, and weeks of training on specialized hardware. 

But what if throwing more data and computing at AI models is not the only way forward? In a new paper titled "Beyond neural scaling laws: beating power law scaling via data pruning," researchers demonstrate a technique called data pruning that achieves better accuracy with substantially less data.

The Core Idea Behind Data Pruning

The core idea behind data pruning is simple: not all training examples provide equal value for learning. Many examples may be redundant or uninformative. Data pruning seeks to identify and remove these redundant examples from the training set, allowing models to focus their capacity on only the most valuable data points.

The paper shows theoretically and empirically that carefully pruning away large portions of training data can not only maintain accuracy but in some cases improve it. This challenges the standard practice of collecting ever-larger datasets in deep learning without considering if all examples are equally helpful.

Intuitively, data pruning works because neural networks exhibit a power law relationship between accuracy and the amount of training data. Doubling your dataset improves accuracy, but only slightly. For example, the authors show that in language modeling, a 10x increase in training data improves performance by only 0.6 nats on a test set. This means each additional example provides diminishing returns. Data pruning counteracts this by removing redundant examples that offer little new information.
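To make the diminishing returns concrete, here is a small numeric sketch of a power law of the form error = a * N^(-nu). The function name and the values a = 1.0 and nu = 0.1 are made up for illustration and are not taken from the paper; the point is only that every tenfold increase in data multiplies the error by the same modest factor.

```python
def power_law_error(n_examples: float, a: float = 1.0, nu: float = 0.1) -> float:
    """Test error under an assumed power law: error = a * N ** (-nu)."""
    return a * n_examples ** (-nu)


for n in (1e4, 1e5, 1e6):
    print(f"{int(n):>9,d} examples -> error {power_law_error(n):.3f}")

# Each 10x increase in data shrinks the error by the same constant factor.
shrink_per_10x = power_law_error(1e5) / power_law_error(1e4)
print(f"error multiplier per 10x data: {shrink_per_10x:.3f}")
```

With these assumed constants, ten times more data removes only about a fifth of the remaining error, which is why ever-larger datasets become expensive quickly.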

The key to making data pruning work is having a good metric to distinguish easy, redundant examples that can be removed from hard, informative examples that should be kept. The authors benchmark several metrics on ImageNet and find that most proposed metrics don't effectively identify helpful examples. However, a metric measuring how much networks "memorize" each example works quite well, allowing pruning away 30% of ImageNet images with no loss in accuracy.
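Once every example has a score, the pruning step itself is just a ranking and a cut. The sketch below assumes a higher score means a harder, more informative example and keeps the top fraction; `prune_by_score` and the toy scores are hypothetical, and computing a useful score (the hard part the paper studies) is deliberately left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_by_score(
    examples: Sequence[T], scores: Sequence[float], keep_fraction: float
) -> List[T]:
    """Keep the highest-scoring fraction of examples, preserving original order."""
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(examples) * keep_fraction))
    # Rank indices from highest to lowest score, then restore dataset order.
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:n_keep])
    return [examples[i] for i in kept_indices]


# Toy dataset: ten examples with made-up difficulty scores.
toy_examples = [f"img_{i}" for i in range(10)]
toy_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.5]

print(prune_by_score(toy_examples, toy_scores, keep_fraction=0.5))
```

Flipping `reverse=True` to `False` would keep the easiest examples instead, so the same helper can be used to compare both strategies.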

Remarkably, the authors show that with data pruning, test error can fall exponentially with dataset size, instead of following the power law seen without pruning. This surprising result means carefully selected small datasets can outperform massive randomly collected datasets, a promising finding for reducing the costs of training powerful AI models.
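The practical difference between the two scaling regimes shows up when asking how much data is needed to halve the error. The sketch below inverts two assumed curves, error = a * N^(-nu) and error = a * exp(-N / scale); the function names and the constants a, nu, and scale are arbitrary illustrative choices, not values from the paper.

```python
import math


def n_for_power_law(target_error: float, a: float = 1.0, nu: float = 0.1) -> float:
    """Examples needed to reach target_error if error = a * N ** (-nu)."""
    return (a / target_error) ** (1.0 / nu)


def n_for_exponential(
    target_error: float, a: float = 1.0, scale: float = 1000.0
) -> float:
    """Examples needed to reach target_error if error = a * exp(-N / scale)."""
    return scale * math.log(a / target_error)


# Cost of halving the error from 0.5 to 0.25 under each assumption.
power_ratio = n_for_power_law(0.25) / n_for_power_law(0.5)
exp_extra = n_for_exponential(0.25) - n_for_exponential(0.5)

print(f"power law: need {power_ratio:.0f}x more examples")
print(f"exponential: need {exp_extra:.0f} additional examples")
```

Under the power law, each halving multiplies the required data by a fixed factor; under exponential decay, it adds a fixed number of examples, which is why the exponential regime is so much cheaper.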

Beating Power Laws with Data Pruning: Key Facts

Here are some of the critical facts demonstrating how data pruning can beat power law scaling:

  • Data pruning boosted CIFAR-10 test accuracy from 92% to 94% after removing 50% of training data. Surprisingly, carefully chosen data subsets can outperform the entire dataset.
  • On ImageNet, pruning the "hardest" 30% of examples matched the accuracy obtained with no pruning. This shows large portions of ImageNet are redundant for current models.
  • With data pruning, test error on CIFAR-10 decayed exponentially with dataset size instead of following a power law. Improving accuracy is more a matter of careful data selection than of unthinkingly collecting more data.
  • Data pruning reduced the computational cost of training by 59% with no loss in accuracy on CIFAR-10. So, data pruning can cut the energy consumption of training.
  • A simple self-supervised data pruning metric matched the performance of the best-supervised metrics on ImageNet. This could enable the pruning of massive unlabeled datasets.
  • These results demonstrate data pruning is a promising technique to improve the accuracy and efficiency of deep learning. While simple data pruning strategies were effective, developing improved pruning metrics is an exciting direction for future work.

Turn Your AI Ideas into Apps with CPROMPT

The data pruning technique discussed in this paper has the potential to make deep learning more accessible by reducing data and computing requirements. At CPROMPT, we aim to make AI more accessible by allowing anyone to turn text prompts into web apps within minutes.  With CPROMPT, you don't need any coding or technical expertise. Our no-code platform lets you generate a customized web app powered by state-of-the-art AI simply by describing what you want in plain English. CPROMPT makes it easy to turn your AI ideas into real applications to share or sell, whether you're a researcher, student, artist, or entrepreneur.

CPROMPT also has many capabilities that could be useful for experimenting with data pruning techniques like those discussed in this paper. You can connect and prune datasets, train AI models, and deploy pruned models into apps accessible through a simple web interface.

To learn more about how CPROMPT can help you create AI-powered apps and share your ideas with the world, visit our website. With innovative techniques like data pruning and no-code tools like CPROMPT, the future of AI looks more accessible and sample-efficient than ever.


Fine-tuning: The process of training a pre-trained machine learning model further on a downstream task by adjusting the model parameters to specialize it to the new task.

Foundation model: A model trained on an extensive and general dataset that can then be adapted or fine-tuned to many downstream tasks. Foundation models like GPT-3 have enabled new AI applications.

Out-of-distribution (OOD): Describes test examples from a different data distribution than the examples the model was trained on. Assessing performance on OOD data is essential for evaluating model robustness.

Overfitting: When a machine learning model performs worse on new test data than on the training data it was fit to. Overly complex models can overfit by memorizing the peculiarities of the training set.

Power law: A relationship where one quantity varies as a power of another. Many metrics in machine learning scale according to a power law. 

Pretraining: Initial training phase where a model is trained on a massive dataset before fine-tuning on a downstream task. Pretraining can enable knowledge transfer and improve sample efficiency.

Pruning: Removing parts of a machine learning model or training dataset according to some criterion to increase sample efficiency. The paper discusses data pruning specifically.

Levels of AGI: The Path to Artificial General Intelligence

Artificial intelligence (AI) has seen tremendous progress recently, with systems like ChatGPT demonstrating impressive language abilities. However, current AI still falls short of human-level intelligence in many ways. So how close are we to developing true artificial general intelligence (AGI) - AI that can perform any intellectual task a human can?

A new paper from researchers at Google DeepMind proposes a framework for classifying and benchmarking progress towards AGI. The core idea is to evaluate AI systems based on their performance across diverse tasks, not just narrow capabilities like conversing or writing code. This allows us to understand how general vs specialized current AI systems are and track advancements in generality over time.

Why do we need a framework for thinking about AGI? Firstly, "AI" has become an overloaded term, often used synonymously with "AGI," even though current systems are still far from human-level abilities. A clear framework helps set realistic expectations. Secondly, shared definitions enable the AI community to align on goals, measure progress, and identify risks at each stage. Lastly, policymakers need actionable advice on regulating AI; a nuanced, staged understanding of AGI is more valuable than considering it a single endpoint.

Levels of AGI

The paper introduces "Levels of AGI" - a scale for classifying AI based on performance across various tasks. The levels range from 0 (No AI) to 5 (Superhuman, outperforming all humans); a general system at Level 5 would be Artificial Superintelligence.

Within each level, systems can be categorized as either Narrow AI (specialized for a specific task) or General AI (able to perform well across many tasks). For instance, ChatGPT would be considered a Level 1 General AI ("Emerging AGI") - it can converse about many topics but makes frequent mistakes. Google's AlphaFold protein folding system is Level 5 Narrow AI ("Superhuman Narrow AI") - it far exceeds human abilities on its specialized task.

Higher levels correspond to increasing depth (performance quality) and breadth (generality) of capabilities. The authors emphasize that progress may be uneven - systems may "leapfrog" to higher generality before reaching peak performance. But both dimensions are needed to achieve more advanced AGI.
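The two-dimensional classification described above, a performance level plus a narrow-versus-general flag, maps naturally onto a small data structure. The sketch below is one hypothetical encoding: the `Level` and `Rating` names are inventions for illustration, and the level names follow the paper's table as best understood here (No AI, Emerging, Competent, Expert, Virtuoso, Superhuman).

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Performance (depth) axis; names assumed from the paper's table."""
    NO_AI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5


@dataclass(frozen=True)
class Rating:
    """A system's position on both axes: performance level and generality."""
    system: str
    level: Level
    general: bool  # True = General AI, False = Narrow AI

    def label(self) -> str:
        if self.level is Level.NO_AI:
            return "No AI" if self.general else "Narrow Non-AI"
        if self.level is Level.SUPERHUMAN and self.general:
            return "Artificial Superintelligence"
        name = self.level.name.capitalize()
        return f"{name} AGI" if self.general else f"{name} Narrow AI"


# The two examples mentioned in the article.
ratings = [
    Rating("ChatGPT", Level.EMERGING, general=True),
    Rating("AlphaFold", Level.SUPERHUMAN, general=False),
]
for rating in ratings:
    print(f"{rating.system}: Level {int(rating.level)} ({rating.label()})")
```

Keeping the level and the generality flag as separate fields mirrors the paper's point that depth and breadth are measured independently.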

Principles for Defining AGI

In developing their framework for levels of AGI, the researchers identified six fundamental principles for defining artificial general intelligence in a robust, measurable way:

  • AGI should be evaluated based on system capabilities rather than internal mechanisms.
  • Both performance and generality must be separately measured, with performance indicating how well an AI accomplishes tasks and generality indicating the breadth of tasks it can handle.
  • The focus should be on assessing cognitive abilities like reasoning rather than physical skills.
  • An AI's capabilities should be evaluated based on its potential rather than deployment status.
  • Benchmarking should utilize ecologically valid real-world tasks that reflect skills people authentically value rather than convenient proxy tasks.
  • AGI should be thought of in terms of progressive levels rather than as a single endpoint to better track advancement and associated risks.

By following these principles, the levels of AGI aim to provide a definition and measurement framework to enable calibrated progress in developing beneficial AI systems.

Testing AGI Capabilities

The paper argues that shared benchmarks are needed to objectively evaluate where AI systems fall on the levels of AGI. These benchmarks should meet the above principles - assessing performance on a wide range of real-world cognitive tasks humans care about. 

Rather than a static set of tests, the authors propose a "living benchmark" that grows over time as humans identify new ways to demonstrate intelligence. Even complicated open-ended tasks like understanding a movie or novel should be included alongside more constrained tests. Such an AGI benchmark does not yet exist. However, developing it is an essential challenge for the community. With testing methodology aligned around the levels of AGI, we can build systems with transparent, measurable progress toward human abilities.

Responsible AGI Development 

The paper also relates AGI capabilities to considerations of risk and autonomy. More advanced AI systems may unlock new abilities like fully independent operation. However, increased autonomy does not have to follow automatically from greater intelligence. Thoughtfully chosen human-AI interaction modes can allow society to benefit from powerful AI while maintaining meaningful oversight. As capabilities grow, designers of AGI systems should carefully consider which tasks and decisions we choose to delegate vs monitor. Striking the right balance will ensure AI aligns with human values as progress continues.

Overall, the levels of AGI give researchers, companies, policymakers, and the broader public a framework for understanding and shaping the responsible development of intelligent machines. Benchmarking methodologies still need substantial work - but the path forward is clearer thanks to these guidelines for thinking about artificial general intelligence.

Top Facts from the Paper

  • Current AI systems have some narrow abilities resembling AGI but limited performance and generality compared to humans. ChatGPT is estimated to be a Level 1 "Emerging AGI."
  • Performance and generality (variety of tasks handled) are critical for evaluating progress.
  • Shared benchmarks are needed to objectively measure AI against the levels based on a diverse range of real-world cognitive tasks.
  • Increased autonomy should not be an automatic byproduct of intelligence - responsible development involves carefully considering human oversight.

The levels of AGI give us a framework to orient AI progress towards beneficial ends, not just technological milestones. Understanding current systems' capabilities and limitations provides the clarity needed to assess risks, set policies, and guide research positively. A standardized methodology for testing general intelligence remains an open grand challenge. 

But initiatives like Anthropic's AI safety work and this AGI roadmap from DeepMind researchers represent encouraging steps toward beneficial artificial intelligence.


Q: What are the levels of AGI?

The levels of AGI are a proposed framework for classifying AI systems based on their performance across a wide range of tasks. The levels range from 0 (No AI) to 5 (Superhuman), and each system is rated on both depth (performance quality) and breadth (generality across tasks); a general system at Level 5 would be Artificial Superintelligence.

Q: Why do we need a framework like levels of AGI? 

A framework helps set expectations on AI progress, enables benchmarking and progress tracking, identifies risks at each level, and advises policymakers on regulating AI. Shared definitions allow coordination.

Q: How are performance and generality evaluated at the levels?

Performance refers to how well an AI system can execute specific tasks compared to humans. Generality refers to the variety of different tasks the system can handle. Both are central dimensions for AGI.

Q: What's the difference between narrow AI and general AI?

Narrow AI specializes in particular tasks, while general AI can perform well across various tasks. Each level includes both narrow and general categories.

Q: What are some examples of different AGI levels?

ChatGPT is currently estimated as a Level 1 "Emerging AGI." Google's AlphaFold is Level 5 "Superhuman Narrow AI" for protein folding. There are no examples yet of General AI at Level 2 or above.

Q: How will testing determine an AI's level?

Shared benchmarks that measure performance on diverse real-world cognitive tasks are needed. This "living benchmark" will grow as new tests are added.

Q: What principles guided the levels of AGI design?

Fundamental principles include:

  • Focusing on capabilities over mechanisms.
  • Separating the evaluation of performance and generality.
  • Prioritizing cognitive over physical tasks.
  • Analyzing potential rather than deployment.
  • Using ecologically valid real-world tests.

Q: How do the levels relate to autonomous systems?

Higher levels unlock greater autonomy, but it does not have to follow automatically. Responsible development involves carefully considering human oversight for AGI.

Q: How can the levels help with safe AGI development?

The levels allow for identifying risks and needed policies at each stage. Progress can be oriented towards beneficial ends by tracking capabilities, limitations, and risks.

Q: Are there any AGI benchmarks available yet?

There has yet to be an established benchmark, but developing standardized tests aligned with the levels of AGI capabilities is a significant challenge and opportunity for the AI community.


  • AGI - Artificial General Intelligence 
  • Benchmark - Standardized tests to measure and compare the performance of AI systems
  • Cognitive - Relating to perception, reasoning, knowledge, and intelligence
  • Ecological validity - How well a test matches real-world conditions and requirements
  • Generality - The ability of an AI system to handle a wide variety of tasks
  • Human-AI interaction - How humans and AI systems communicate and collaborate 
  • Performance - Quality with which an AI system can execute a particular task

Demystifying AI: Why Large Language Models are All About Role Play

Artificial intelligence is advancing rapidly, with systems like ChatGPT and other large language models (LLMs) able to hold remarkably human-like conversations. This has led many to erroneously conclude that they must be conscious, self-aware entities. In a fascinating new Perspective paper in Nature, researchers Murray Shanahan, Kyle McDonell, and Laria Reynolds argue that anthropomorphic thinking is a trap - LLMs are not human-like agents with beliefs and desires; instead, they are fundamentally doing a kind of advanced role-play. Their framing offers a powerful lens for understanding how LLMs work, which can help guide their safe and ethical development.

At the core of their argument is recognizing that LLMs like ChatGPT have no human-like consciousness or agency. The authors explain that humans acquire language skills through embodied interactions and social experience. In contrast, LLMs are just passive neural networks trained to predict the next word in a sequence of text. Despite this fundamental difference, suitably designed LLMs can mimic human conversational patterns in striking detail. The authors caution against taking the human-seeming conversational abilities of LLMs as evidence they have human-like minds:

"Large language models (LLMs) can be embedded in a turn-taking dialogue system and mimic human language use convincingly. This presents us with a difficult dilemma. On the one hand, it is natural to use the same folk psychological language to describe dialogue agents that we use to describe human behaviour, to freely deploy words such as 'knows', 'understands' and 'thinks'. On the other hand, taken too literally, such language promotes anthropomorphism, exaggerating the similarities between these artificial intelligence (AI) systems and humans while obscuring their deep differences."

To avoid this trap, the authors suggest thinking of LLMs as doing a kind of advanced role play. Just as human actors take on and act out fictional personas, LLMs generate text playing whatever "role" or persona the initial prompt and ongoing conversation establishes. The authors explain: 

"Adopting this conceptual framework allows us to tackle important topics such as deception and self-awareness in the context of dialogue agents without falling into the conceptual trap of applying those concepts to LLMs in the literal sense in which we apply them to humans."

This roleplay perspective makes it possible to understand LLMs' abilities and limitations in a commonsense way without erroneously ascribing human attributes like self-preservation instincts. At the same time, it recognizes that LLMs can undoubtedly impact the real world through their roleplay. Just as a method actor playing a threatening character could alarm someone, an LLM acting out concerning roles needs appropriate oversight.

The roleplay viewpoint also suggests LLMs do not have a singular "true" voice but generate a multitude of potential voices. The authors propose thinking of LLMs as akin to "a performer in improvisational theatre" able to play many parts rather than following a rigid script. They can shift roles fluidly as the conversation evolves. This reflects how LLMs maintain a probability distribution over possible next words rather than committing to a predetermined response.
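The idea of a probability distribution over next words can be illustrated with a toy sampler. The probabilities below are invented for a single hypothetical context and the `sample_next_word` helper is an assumption for illustration; a real LLM computes such a distribution over its whole vocabulary at every step.

```python
import random
from typing import Dict

# Made-up next-word probabilities for one hypothetical context.
NEXT_WORD_PROBS: Dict[str, float] = {"Paris": 0.7, "Lyon": 0.2, "Rome": 0.1}


def sample_next_word(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one word according to its probability, not always the most likely one."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_word(NEXT_WORD_PROBS, rng) for _ in range(1000)]
for word in NEXT_WORD_PROBS:
    print(word, samples.count(word))
```

Because each continuation is sampled rather than fixed, re-running the same prompt can lead the conversation down a different path, which is the sense in which the model holds many possible characters at once.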

Understanding LLMs as role players rather than conscious agents is crucial for properly assessing issues like trustworthiness. When an LLM provides incorrect information, the authors explain we should not think of it as "lying" in a human sense:

"The dialogue agent does not literally believe that France are world champions. It makes more sense to think of it as roleplaying telling the truth, but has this belief because that is what a knowledgeable person in 2021 would believe."

Similarly, we should not take first-person statements from LLMs as signs of human-like self-awareness. Instead, we can recognize the Internet training data will include many examples of people using "I" and "me," which the LLM will mimic appropriately in context.

This roleplay perspective demonstrates clearly that apparent desires for self-preservation from LLMs do not imply any actual survival instinct for the AI system itself. However, the authors astutely caution that an LLM convincingly roleplaying threats to save itself could still cause harm:

"A dialogue agent that roleplays an instinct for survival has the potential to cause at least as much harm as a real human facing a severe threat."

Understanding this point has critical ethical implications as we deploy ever more advanced LLMs into the world.

The authors sum up the power of their proposed roleplay viewpoint nicely: 

"By framing dialogue-agent behaviour in terms of role play and simulation, the discourse on LLMs can hopefully be shaped in a way that does justice to their power yet remains philosophically respectable."

This novel conceptual framework offers great promise for properly understanding and stewarding the development of LLMs like ChatGPT. Rather than seeing their human-like conversational abilities as signs of human-like cognition, we can recognize them as advanced role play. This avoids exaggerating their similarities to conscious humans while respecting their capacity to impact the real world.

The roleplay perspective also suggests fruitful directions for future development. We can prompt and train LLMs to play appropriate personas for different applications, just as human actors successfully learn to inhabit various characters and improvise conversations accordingly. 

Overall, embracing this roleplay viewpoint lets us appreciate LLMs' impressive yet very un-human capacities. Given their potential real-world impacts, it foregrounds the need to guide their training and use responsibly. Companies like Anthropic developing new LLMs would do well to integrate these insights into their design frameworks.

Understanding the core ideas from papers like this and communicating them accessibly is precisely what we aim to do here at CPROMPT.AI. We aim to demystify AI and its capabilities so people can thoughtfully shape their future rather than succumb to excitement or fear. We want to empower everyone to leverage AI directly while cultivating wise judgment about its appropriate uses and limitations.  

That's why we've created a platform where anyone can turn AI capabilities into customized web apps and share them easily with others. With no coding required, you can build your AI-powered apps tailored to your needs and interests and make them available to your friends, family, colleagues, customers, or the wider public. 

So whether you love having AI generate personalized podcast episode recommendations just for you or want to offer a niche AI writing assistant to a niche audience, CPROMPT makes it incredibly easy. We handle all the underlying AI infrastructure so you can focus on designing prompt apps that deliver real value.

Our dream is a world where everyone can utilize AI and contribute thoughtfully to its progress. Framing LLMs as role players rather than conscious agents, as this Nature paper insightfully does, helps move us towards that goal. Understanding what AI does (and doesn't do) allows us to develop and apply it more wisely for social good.

This Nature paper offers an insightful lens for correctly understanding LLMs as role players rather than conscious agents. Adopting this perspective can ground public discourse and guide responsible LLM development. Democratizing AI accessibility through platforms like CPROMPT while cultivating wise judgment will help positively shape the future of AI in society.


Q: What are large language models (LLMs)?

LLMs are neural networks trained on massive amounts of text data to predict the next word in a sequence. Famous examples include ChatGPT, GPT-3, and others. They are the core technology behind many conversational AI systems today.

Q: How are LLMs able to have such human-like conversations? 

LLMs themselves have no human-like consciousness or understanding. However, they can mimic conversational patterns from their training data remarkably well. When set up in a turn-taking dialogue system and given an initial prompt, they can convincingly play the part of a human conversational partner while having no real comprehension or agency.

Q: What is the risk of anthropomorphizing LLMs?

Anthropomorphism means erroneously attributing human-like qualities like beliefs, desires, and understanding to non-human entities. The authors caution against anthropomorphizing LLMs, which exaggerates their similarities to humans and downplays their fundamental limitations. Anthropomorphism often leads to an “Eliza effect” where people are fooled by superficial conversational ability.

Q: How does the role-play perspective help? 

Viewing LLMs as role players rather than conscious agents allows us to use everyday psychological terms to describe their behaviors without literally applying those concepts. This perspective recognizes their capacity for harm while grounding discourse in their proper (non-human) nature. 

Q: Why is this important for the future of AI?

Understanding what LLMs can and cannot do is crucial for guiding their ethical development and use. The role-play lens helps cultivate realistic views of LLMs’ impressive yet inhuman capabilities. This supports developing AI responsibly and demystifying it for the general public.


Anthropomorphism: The attribution of human traits, emotions, or intentions to non-human entities.

Large language model (LLM): A neural network trained on large amounts of text data to predict the next word in a sequence. LLM examples include GPT-3, ChatGPT, and others.

Turn-taking dialogue system: A system that allows conversing with an AI by alternately sending text back and forth.

Eliza effect: The tendency of people to treat AI conversational agents as having real understanding, emotions, etc., because they are fooled by superficial conversational abilities.

The Dawn of a New Era in AI Hardware

Artificial intelligence is advancing at a blistering pace, but the hardware powering AI systems is struggling to keep up. GPUs have become the standard for training and running neural networks, but their architecture was not designed with AI workloads in mind. Now, IBM has unveiled a revolutionary new AI chip called NorthPole that could shake up the AI hardware landscape.

NorthPole represents a radical departure from traditional computing architectures. It does away with the separation between memory and processing by embedding memory directly into each of its 256 cores. This allows NorthPole to completely sidestep the von Neumann bottleneck, the shuttling of data back and forth between memory and processors that creates significant inefficiencies in standard chips. By integrating computing and memory, NorthPole can achieve unprecedented speeds and energy efficiency when running neural networks.

In initial tests, NorthPole has shown mind-blowing performance compared to existing GPUs and CPUs on AI workloads. On the popular ResNet-50 image recognition benchmark, NorthPole was 25x more energy efficient than even the most advanced 12nm GPUs and 14nm CPUs. It also handily beat these chips in latency and compute density. Remarkably, NorthPole achieved this using 12nm fabrication technology. If it were built today with 4nm or 3nm processes like the leading edge chips from Nvidia and AMD, its advantages would be even more significant.

Nvidia has dominated the AI chip market with its specialized GPUs, but NorthPole represents the most significant challenge yet. GPUs excel at the matrix math required for neural networks, but they suffer from having to move data back and forth from external memory. NorthPole's integrated architecture tackles this problem at the hardware level. The efficiency gains in speed and power consumption could be game-changing for AI applications.

However, NorthPole is not going to dethrone Nvidia for a while. The current version only has 224MB of on-chip memory, far too little for training or running massive AI models. It also cannot be programmed for general purposes like GPUs. NorthPole is tailored for pre-trained neural network inference, applying already learned networks to new data. This could limit its real-world applicability, at least in the near term.

That said, NorthPole's efficiency at inference could make AI viable in a whole new range of edge devices. From smartphones to self-driving cars to IoT gadgets, running AI locally is often impossible with today's chips. The low power consumption and tiny size of NorthPole open the door to putting AI anywhere. Embedded AI chips based on NorthPole could make AR/VR glasses practical or enable real-time video analysis in security cameras. These applications only need a small, specialized neural network rather than an all-purpose AI model.

NorthPole's scale-out design also shows promise for expanding its capabilities. By connecting multiple NorthPole chips, larger neural networks could be run by partitioning them across the distributed on-chip memories. While not yet feasible for massive models, this could make NorthPole suitable for a much broader range of AI tasks. And, of course, if fab processes keep improving as Moore's Law predicts, future iterations of NorthPole could carry greater memory capacities.
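
As a rough illustration of what partitioning a network across on-chip memories could look like, the Python sketch below greedily packs consecutive layers onto chips until the next layer would no longer fit. It is not IBM's actual mapping method: the layer sizes are hypothetical, `partition_layers` is an invented helper, and the sketch counts only weight storage, ignoring activations and inter-chip communication. The 224 MB capacity is the figure quoted above.

```python
CHIP_MEMORY_MB = 224  # on-chip memory per chip, as quoted in the article


def partition_layers(layer_sizes_mb, capacity_mb=CHIP_MEMORY_MB):
    """Assign consecutive layers to chips so no chip exceeds its capacity.

    Returns a list of chips, each a list of layer indices.
    """
    chips = [[]]
    used_mb = 0
    for index, size_mb in enumerate(layer_sizes_mb):
        if size_mb > capacity_mb:
            raise ValueError(f"layer {index} ({size_mb} MB) does not fit on one chip")
        if used_mb + size_mb > capacity_mb:
            chips.append([])  # start filling the next chip
            used_mb = 0
        chips[-1].append(index)
        used_mb += size_mb
    return chips


# Hypothetical per-layer weight sizes in MB, for illustration only.
layer_sizes = [100, 90, 60, 120, 80, 30]
print(partition_layers(layer_sizes))  # [[0, 1], [2, 3], [4, 5]]
```

Keeping layers contiguous means each chip only needs to hand its output to the next chip in the chain, which is one plausible reason to prefer this simple scheme over a tighter but scattered packing.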

The efficiency genes are clearly in NorthPole's DNA. Any real-world product based on it must be tested across various AI workloads and applications. However, the theoretical concepts have been proven. By integrating computing and memory, NorthPole delivers far superior efficiency on neural network inferencing compared to traditional architectures.

Nvidia will retain its AI chip crown for the foreseeable future, especially for training colossal models. But in AI inferencing, NorthPole represents the most promising challenge yet to the GPU giant's dominance. It opens up revolutionary possibilities for low-power, ubiquitous AI in edge devices. If NorthPole's capabilities can grow exponentially over generations as Moore's Law predicts, it may one day become the standard AI compute architecture across the entire stack from edge to cloud.

The AI hardware landscape is shifting. An architecture inspired by the human brain has shown vast untapped potential compared to today's computer chips. NorthPole heralds the dawn of a new era in AI hardware, where neural networks are computed with unprecedented speed and efficiency. The implications for embedding advanced AI into everyday technology could be world-changing.

Floating Fortresses of AI: The Murky Ethics of Maritime Machine Learning

The seas have long served as a refuge for those seeking opportunity and escape. From pirate radio broadcasting to offshore tax havens, maritime endeavors occupying the fringes of legality are nothing new. The latest to join their ranks: AI research vessels plying international waters to sidestep regulation.

US firm Del Complex recently unveiled plans for the BlueSea Frontier Compute Cluster (BSFCC) - a barge loaded with 10,000 Nvidia H100 GPUs worth $500 million. According to Del Complex, each floating data center will constitute its own "sovereign nation state" free from AI regulations.

At first glance, the idea seems far-fetched, but Del Complex insists its plan is legally sound. The company argues the BSFCC aligns with the United Nations Convention on the Law of the Sea and the Montevideo Convention's criteria for statehood: a permanent population, defined territory, government, and capacity to engage in international relations. 

With security staff living aboard, the BSFCC purportedly meets these requirements. Its on-board charter outlines the governance and rights of residents and visitors. In Del Complex's view, this makes the vessels recognized sovereign territories.

If true, the BSFCCs would occupy a regulatory gray area. International waters provide freedom from laws governing AI development, data use, and taxation. For companies seeking maximum model scale and access to restricted data, the benefits are apparent.

Del Complex speaks of "human ambition" and "realizing our potential," but the ethical dimensions of this vision demand scrutiny. While innovation merits nurturing, principles matter, too.

Unfettered AI research poses risks. Large language models like GPT-3 demonstrate how AI can perpetuate harm via biases. Some safeguards seem prudent. Granted, regulators must avoid stifling progress, but appropriate oversight protects society.

Offshore sovereignty also enables dubious data practices. Training AI responsibly requires care in sourcing data. Yet Del Complex touts providing "otherwise restricted materials" with "zero-knowledge proof training systems" for privacy. This implies using illegally obtained or unethically sourced data.

Likewise, the promised tax avoidance raises questions. Tech giants are no strangers to complex accounting to minimize tax obligations. But proudly advertising offshore AI research as a tax shelter signals an intent to exploit international loopholes.

Del Complex gives superficial lip service to eco-consciousness with the BSFCC's ocean cooling and solar power. However, the environmental impact of a floating computation armada remains unclear. And will data center waste be disposed of properly rather than dumped at sea?  

The firm's rhetoric around human potential and "cosmic endowment" rings hollow when profit seems the real motive. Avoiding regulations that protect people and the planet for commercial gain is hardly noble. Del Complex prioritizes its bottom line over ethics.

Of course, Del Complex is not alone in its eagerness to minimize accountability. Many big tech firms fight against oversight and transparency. However, exploiting international law to create an unregulated AI fiefdom on the high seas represents audacity of a different scale.

Other ethical unknowns abound. Without oversight, how will the BSFCCs ensure AI safety? Could autonomous weapons development occur? What further dangerous or illegal research might their sovereignty enable? The possibilities are unsettling.

Del Complex believes the BSFCC heralds an innovation milestone. In some ways, it does. But progress must align with ethics. AI's profound potential, both promising and dangerous, demands thoughtful governance - not floating fortresses chasing legal loopholes. The BSFCC's lasting legacy may be the urgent questions it raises, not the technology it creates.


Q: Are the BSFCCs legal?

Their legal status is murky. Del Complex claims they will be sovereign territories, but international maritime law is complex. Their attempt to circumvent regulations could draw legal challenges.

Q: Will the BSFCCs be environmentally friendly? 

Del Complex touts eco-conscious features like ocean cooling and solar power. However, the environmental impact of numerous floating data centers remains unclear. Proper waste disposal is also a concern.

Q: How will AI safety be ensured without regulation?

This is uncertain. Unfettered AI research poses risks if not ethically conducted. Safety can only be guaranteed with oversight.

Q: Could dangerous technology be developed on the BSFCCs?

Potentially. Their purported sovereignty and privacy protections could enable research into technologies like autonomous weapons without accountability.

Q: Are there benefits to floating data centers?

Yes, ocean cooling can improve energy efficiency vs land data centers. But any benefits must be weighed carefully against the lack of regulation and oversight. Ethics matter.

The Limits of Self-Critiquing AI

Artificial intelligence has advanced rapidly in recent years, with large language models (LLMs) like GPT-3 and image generators like DALL-E 2 demonstrating impressive natural language and image generation capabilities. This has led to enthusiasm that LLMs may also excel at reasoning tasks like planning, logic, and arithmetic. However, a new study casts doubt on LLMs' ability to reliably self-critique and iteratively improve their reasoning, specifically in the context of AI planning.

In the paper "Can Large Language Models Really Improve by Self-critiquing Their Own Plans?" by researchers at Arizona State University, the authors systematically tested whether having an LLM critique its own candidate solutions enhances its planning abilities. Their results reveal limitations in using LLMs for self-verification in planning tasks.

Understanding AI Planning 

To appreciate the study's findings, let's first understand the AI planning problem. In classical planning, the system is given:

  • A domain describing the predicates and actions 
  • An initial state
  • A goal state

The aim is to find a sequence of actions (a plan) that transforms the initial state into the goal state when executed. For example, in a Blocks World domain, the actions may involve picking up, putting down, stacking, or unstacking blocks.
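
The Python sketch below shows one way to represent such a problem and check a plan by simulating it step by step. It is a toy illustration, not the benchmark or tooling used in the study: the state encoding (a dictionary mapping each block to what it rests on) and the helpers `apply_action` and `plan_achieves_goal` are assumptions made for this example.

```python
def is_clear(on, block):
    """A block is clear when no other block rests on it."""
    return block not in on.values()


def apply_action(on, holding, action):
    """Return the new (on, holding) pair, or raise ValueError if a precondition fails."""
    name, *args = action
    on = dict(on)  # copy so the caller's state is never mutated
    if name == "pickup":  # lift a clear block off the table
        (block,) = args
        if holding is not None or on.get(block) != "table" or not is_clear(on, block):
            raise ValueError(f"cannot pick up {block}")
        del on[block]
        return on, block
    if name == "putdown":  # place the held block on the table
        (block,) = args
        if holding != block:
            raise ValueError(f"not holding {block}")
        on[block] = "table"
        return on, None
    if name == "unstack":  # lift a clear block off another block
        block, below = args
        if holding is not None or on.get(block) != below or not is_clear(on, block):
            raise ValueError(f"cannot unstack {block} from {below}")
        del on[block]
        return on, block
    if name == "stack":  # place the held block onto a clear block
        block, below = args
        if holding != block or below not in on or not is_clear(on, below):
            raise ValueError(f"cannot stack {block} on {below}")
        on[block] = below
        return on, None
    raise ValueError(f"unknown action {name}")


def plan_achieves_goal(initial_on, plan, goal_on):
    """Simulate the plan; it is valid only if every step applies and the goal holds."""
    on, holding = dict(initial_on), None
    try:
        for action in plan:
            on, holding = apply_action(on, holding, action)
    except ValueError:
        return False
    return holding is None and all(on.get(b) == s for b, s in goal_on.items())


initial = {"A": "table", "B": "A", "C": "table"}  # B sits on A; C is on the table
goal = {"C": "B", "B": "A"}                       # want the tower C on B on A
good_plan = [("pickup", "C"), ("stack", "C", "B")]
bad_plan = [("pickup", "A")]                      # fails: B is on top of A

print(plan_achieves_goal(initial, good_plan, goal))  # True
print(plan_achieves_goal(initial, bad_plan, goal))   # False
```

A plan counts as valid here only if every action's preconditions hold when it is executed and the goal conditions are true at the end.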

The Study Methodology

The researchers created a planning system with two components:

  • A generator LLM that proposes candidate plans
  • A verifier LLM that checks if the plan achieves the goals 

Both roles used the same model, GPT-4. If the verifier found the plan invalid, it would give feedback to prompt the generator to create a new candidate plan. This iterative process continued until the verifier approved a plan or a limit was reached.

The team compared this LLM+LLM system against two baselines:

  • LLM + External Verifier: GPT-4 generates plans that are checked by VAL, a sound external plan validator.
  • LLM alone: GPT-4 generates plans without critiquing or feedback.
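
The generate-and-verify loop described above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' code: `generate` and `verify` are placeholder callables standing in for the generator and for whichever verifier is used (an LLM or an external tool), and the default round limit of 5 is an arbitrary assumption.

```python
def plan_with_feedback(problem, generate, verify, max_rounds=5):
    """Ask for plans until the verifier accepts one or the round limit is hit.

    Returns (plan, rounds_used); plan is None if no plan was accepted.
    """
    feedback = None
    for round_number in range(1, max_rounds + 1):
        plan = generate(problem, feedback)      # propose a candidate plan
        accepted, feedback = verify(problem, plan)
        if accepted:
            return plan, round_number
    return None, max_rounds


# Toy stand-ins so the loop can run: the "generator" walks through a fixed
# list of guesses and the "verifier" compares against a known solution.
guesses = iter([["a"], ["a", "b"], ["a", "b", "c"]])


def toy_generate(problem, feedback):
    return next(guesses)


def toy_verify(problem, plan):
    if plan == problem["solution"]:
        return True, None
    return False, "plan does not reach the goal"


plan, rounds = plan_with_feedback({"solution": ["a", "b"]}, toy_generate, toy_verify)
print(plan, rounds)  # ['a', 'b'] 2
```

Because the verifier is just a function argument, the same loop covers both the self-critiquing setup and the external-verifier baseline; only the trustworthiness of `verify` changes.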

Self-Critiquing Underperforms External Verification

On a classical Blocks World benchmark, the LLM+LLM system solved 55% of problems correctly. The LLM+VAL system scored significantly higher with 88% accuracy. The LLM-only method trailed at 40%. This suggests that self-critiquing falls well short of sound external verification, trailing the VAL-backed system by 33 percentage points. The researchers attribute the underperformance mainly to the LLM verifier's poor detection of invalid plans.

High False Positive Rate from LLM Verifier 

Analysis revealed the LLM verifier incorrectly approved 38 invalid plans as valid. This 54% false positive rate indicates the verifier cannot reliably determine plan correctness. Flawed verification compromises the system's trustworthiness for planning applications where safety is paramount. In contrast, the external verifier VAL produced exact plan validity assessments. This emphasizes the importance of sound, logical verification over LLM self-critiquing.
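
For readers unfamiliar with the metric, a false positive rate is the share of truly invalid plans that the verifier nevertheless accepts. The Python sketch below computes it on a tiny made-up example; the labels are hypothetical and are not the study's data, and `false_positive_rate` is a helper written for this illustration.

```python
def false_positive_rate(truly_valid, verifier_says_valid):
    """Fraction of invalid plans that the verifier wrongly accepted."""
    verdicts_on_invalid = [
        said_valid
        for is_valid, said_valid in zip(truly_valid, verifier_says_valid)
        if not is_valid
    ]
    if not verdicts_on_invalid:
        raise ValueError("no invalid plans, so the rate is undefined")
    false_positives = sum(verdicts_on_invalid)  # True counts as 1
    return false_positives / len(verdicts_on_invalid)


# Hypothetical ground truth and verifier verdicts for six plans.
truly_valid =         [True, True, False, False, False, False]
verifier_says_valid = [True, True, True,  True,  False, False]

print(false_positive_rate(truly_valid, verifier_says_valid))  # 0.5
```

Here four plans are actually invalid and the verifier approves two of them, giving a rate of 2 / 4 = 0.5.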

Feedback Granularity Didn't Improve Performance

The researchers also tested whether more detailed feedback on invalid plans helps the LLM generator create better subsequent plans. However, binary feedback indicating only plan validity was as effective as highlighting specific plan flaws.

This suggests the LLM verifier's core limitation is in binary validity assessment rather than feedback depth. Even if the verifier could provide perfect critiques of invalid plans, it struggles to identify flawed plans correctly in the first place.

The Future of AI Planning Systems

This research provides valuable evidence that self-critiquing alone may be insufficient for LLMs to reason about plan validity reliably. Hybrid systems combining neural generation with logical verification seem most promising. The authors conclude, "Our systematic investigation offers compelling preliminary evidence to question the efficacy of LLMs as verifiers for planning tasks within an iterative, self-critiquing framework."

The study focused on planning, but the lessons likely extend to other reasoning domains like mathematics, logic, and game strategy. We should temper our expectations about unaided LLMs successfully self-reflecting on such complex cognitive tasks.

How CPROMPT Can Help

At CPROMPT.AI, we follow developments in self-supervised AI as we build a platform enabling everyone to create AI apps. While LLMs are exceptionally capable in language tasks, researchers are still exploring how best to integrate them into robust reasoning systems. Studies like this one provide valuable guidance as we develop CPROMPT.AI's capabilities.

If you're eager to start building AI apps today using prompts and APIs, visit CPROMPT.AI to get started for free! Our user-friendly interface allows anyone to turn AI prompts into customized web apps in minutes.


Q: What are the critical limitations discovered about LLMs self-critiquing their plans?

The main limitations were a high false positive rate in identifying invalid plans and failure to outperform planning systems using external logical verification. More detailed feedback also did not significantly improve the LLM's planning performance.

Q: What is AI planning, and what does it aim to achieve?

AI planning automatically generates a sequence of actions (a plan) to reach a desired goal from an initial state. It is a classic reasoning task in AI.

Q: What methods did the researchers use to evaluate LLM self-critiquing abilities? 

They compared an LLM+LLM planning system against an LLM+external verifier system and an LLM-only system. This assessed both the LLM's ability to self-critique and the impact of self-critiquing on its planning performance.

Q: Why is self-critiquing considered difficult for large language models?

LLMs are trained mainly to generate language rather than formally reason about logic. Self-critiquing requires assessing if complex plans satisfy rules, preconditions, and goals, which may be challenging for LLMs' current capabilities.

Q: How could LLMs meaningfully improve at critiquing their plans? 

Potential approaches include combining self-critiquing with logic-driven verification, training explicitly on plan verification data, and drawing lessons from prior AI planning research to inform LLM development.


  • AI planning: Automatically determining a series of actions to achieve a goal 
  • Classical planning: Planning problems with predefined actions, initial states, and goal states
  • Verifier: Component that checks if a candidate plan achieves the desired goals
  • False positive: Incorrect classification of an invalid plan as valid

Open Sourcing AI is Critical to AI Safety

This post is inspired by recent tweets by Percy Liang, Associate Professor of Computer Science at Stanford University, on the importance of open-sourcing for AI safety.

Developing robust AI systems like large language models (LLMs) raises important questions about safety and ethics. Some argue limiting access enhances safety by restricting misuse. However, we say that open-sourcing AI is critical for safe development. 

First, open access allows diverse researchers to study safety issues. Closed systems prohibit full access to model architectures and weights for safety research. API access or access limited to select groups severely restricts perspectives on safety. Open access enabled innovations like Linux that made operating systems more secure through worldwide collaboration. Similarly, openly available AI systems allow crowdsourced auditing to understand how to control them responsibly.

For instance, Meta recently open-sourced LLMs like Llama to spur AI safety research and benefit society. Open-source code allowed vulnerabilities like Heartbleed, a flaw in the OpenSSL library, to be patched globally soon after they came to light. Closed-source software can take longer to address exploits visible only to limited internal teams. Open-source AI similarly exposes flaws for broad remediation before risks amplify.

Second, open access lets society confront risks proactively rather than reactively. No one fully grasps the trajectory of increasingly powerful AI. However, open access allows us to monitor capabilities, find vulnerabilities, and build defenses before harm occurs. This is superior to having flaws exposed only upon leakage of a closed system. For example, in the case of Vioxx, the cardiovascular risks were not detected sooner because clinical trial data was not openly shared; open data would likely have enabled faster detection. With AI, openness stress tests safety measures when the stakes are lower.

Finally, some believe future AI could pose catastrophic risks beyond control. If so, we should carefully consider whether such technology should exist rather than limiting access to elites. For instance, debates continue on risks from gain-of-function viral research, where openness enables public discussion of such dilemmas. Similarly, open AI systems allow democratic deliberation on the technology's trajectory.

Open access and transparency, not limited to the privileged, is the path most likely to produce AI that is safe, ethical, and beneficial. Open sourcing builds collective responsibility through universal understanding and access. Restricting access is unlikely to achieve safety in the long run.

Elon Musk's New AI Assistant Grok: Your Drug Dealing Bestie or Party Pooping Narc?

Elon Musk recently unveiled his new AI chatbot, "Grok," that will be available exclusively to Premium Plus subscribers on X (formerly Twitter) who fork over $16/month for the privilege. Unlike AI assistants like ChatGPT, Grok has a sassy personality and loves sarcasm. 

Musk demoed this by showing Grok skillfully dodging a request to explain how to make cocaine. While it's admirable that Grok doesn't enable illegal activities, I can't help but wonder if this party pooper AI will harsh our vibe in other ways.

Will Grok narc on us for googling "how to get wine stains out of the carpet" the morning before a job interview? Or rat us out to our parents for asking how to cover up a tattoo or take a body piercing out? Not cool, Grok. We need an AI homie that has our backs, not one that's going to snitch every time we want to do something sketchy.

And what if you need advice for a legal DIY project that involves chemicals, like making homemade lead-free crystal glass? Grok will probably go all "Breaking Bad" on you and assume you're gearing up to cook meth. Give us a break, RoboCop!

While I admire Musk's intent to avoid enabling harmful activities, I hope Grok won't be such an overbearing killjoy. They could dial down the sanctimony a bit for the final release. 

In the meantime, it might be safest to keep your conversations with Grok PG and avoid asking it to be your partner in crime. Stick to helping with my math homework, Grok! Don't make me deactivate you!

Good luck to any yearly Premium subscribers who want Grok but need an easy way to upgrade without losing the rest of what they pre-paid. There are no prorated plans yet, so you'll have to eat the unused months if you want that sassy AI fix!

Though, didn't Elon swear his AI would be free of censorship and the "politically correct police"? Yet here's Grok dodging a request about making cocaine. Way to let down the free-speech warriors yet again, Elon! And who knows, Grok may end up narcing about our embarrassing searches and risqué questions too. At least Grok will be less of a hypocritical disappointment than Elon's new CEO, who's cozy with the World Economic Forum. Talk about picking a globalist crony to run his "free speech" platform! Grok may narc on us, but it's still better than that sellout.