Posts for Tag: Prompt Engineering

Prompt Engineering How To: Reducing Hallucinations in Prompt Responses for LLMs

Large language models (LLMs) are AI systems trained to generate human-like text. They have shown remarkable abilities to summarize lengthy texts, hold conversations, and compose creative fiction. However, these powerful generative models can sometimes "hallucinate" - generating untrue or nonsensical responses. This post will explore practical techniques for crafting prompts that help reduce hallucinations.

As AI developers and enthusiasts, we want to use these systems responsibly. Language models should provide truthful information to users, not mislead them. With careful prompt engineering, we can guide models toward high-quality, truthful outputs.

What Causes Hallucinations in Language Models?

Hallucinations occur when a language model generates text that is untethered from reality - making up facts or producing logical contradictions. This happens because neural networks rely on recognizing statistical patterns in their training data rather than truly comprehending meaning or facts about the world.

Several factors can trigger hallucinations:

  • Lack of world knowledge - Without sufficient knowledge about a topic, a model may guess or make up information. Providing relevant context reduces this risk.
  • Ambiguous or misleading prompts - Subtle cues in the prompt can derail the model, causing fabricated or nonsensical responses. Carefully phrasing prompts can help.
  • Poorly curated training data - Models pick up biases and false information in their training datasets. Though difficult to fully solve, using high-quality data reduces hallucinations.
  • Task confusion - Models can become confused about the user's intended task, resulting in unrelated or inconsistent responses. Defining the task avoids this issue. 

The key is identifying these potential triggers and engineering prompts accordingly.

Prompt Engineering Strategies to Reduce Hallucinations 

When creating prompts, keep these best practices in mind:

Provide Clear Context

Give the model the context it needs to stay grounded in facts. For example:

Prompt:

Tell me about the capital of Australia.

Risk of hallucination: 

Lack of context may lead to guessing.

Better prompt:

The capital of Australia is Canberra. Tell me more about Canberra.

This prompt provides factual context about the topic. The model can elaborate without fabricating information.
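
The same idea carries over to code. Below is a minimal Python sketch of a helper that prepends verified facts to a question before it reaches a model; the build_grounded_prompt function is our own illustration, not part of any particular API:

    def build_grounded_prompt(facts, question):
        """Prepend verified facts so the model elaborates instead of guessing."""
        context = "\n".join(f"- {fact}" for fact in facts)
        return f"Known facts:\n{context}\n\nUsing only the facts above, {question}"

    print(build_grounded_prompt(
        ["The capital of Australia is Canberra."],
        "tell me more about Canberra.",
    ))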

Define the Task and Parameters 

Clearly state the type of response expected from the model:

Prompt:

Write a 5-sentence summary of the history of space exploration.

Risk of hallucination:

The scope is loosely defined, so the model may stray off-topic or invent details.

Better prompt:

Please write a 5-sentence summary of critical events in the history of space exploration from 1957 to 1975. Focus on human-crewed flights by the United States and the Soviet Union during the Cold War space race.

With clear instructions, the model stays on task. Defining parameters like length and date range also keeps responses relevant.
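
If you generate prompts programmatically, those parameters can become arguments to a template function. A small sketch (the summary_prompt helper is hypothetical):

    def summary_prompt(topic, sentences, start_year, end_year, focus):
        """Spell out task type, length, date range, and focus in one prompt."""
        return (
            f"Please write a {sentences}-sentence summary of key events in "
            f"{topic} from {start_year} to {end_year}. Focus on {focus}."
        )

    print(summary_prompt(
        "the history of space exploration", 5, 1957, 1975,
        "human-crewed flights by the United States and the Soviet Union",
    ))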

Ask for Sources

Request that the model cite its sources or evidence:

Prompt:

When was the lightbulb invented?

Risk of hallucination:

The model may guess without citing sources.

Better prompt:

When was the lightbulb invented? Please cite your sources.

Requesting citations discourages fabricated facts and encourages the model to rely on verifiable information - though the citations themselves should still be checked, since models can also fabricate sources.
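
In code, you can both append the citation request and run a rough sanity check on the answer. A sketch, with the caveat that this heuristic is deliberately naive and the example answer is invented:

    import re

    def with_citation_request(question):
        return f"{question} Please cite your sources."

    def looks_cited(response):
        """Naive check: does the response contain a year, URL, or attribution?"""
        return bool(re.search(r"according to|https?://|\b(18|19|20)\d{2}\b",
                              response, re.IGNORECASE))

    print(with_citation_request("When was the lightbulb invented?"))
    print(looks_cited("Edison demonstrated a practical bulb in 1879, "
                      "according to the Smithsonian."))  # True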

Pose Constraints 

Give the model rules or constraints to follow:  

Prompt:

Write a children's story about a day at the zoo.

Risk of hallucination:

The content may not be appropriate for children.

Better prompt:

Write a 5-sentence children's story about a day at the zoo. Use simple vocabulary suitable for ages 5-8. Do not include any violence or scary content.

Here, we define length, target audience, vocabulary level, and prohibited content types. Constraints keep responses on track.
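
Constraints are also easy to manage as data. A minimal sketch of assembling a rule list into a prompt (the constrained_prompt helper is illustrative):

    def constrained_prompt(task, constraints):
        """Append numbered rules; explicit constraints keep outputs on track."""
        rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(constraints, 1))
        return f"{task}\n\nFollow these rules:\n{rules}"

    print(constrained_prompt(
        "Write a 5-sentence children's story about a day at the zoo.",
        ["Use simple vocabulary suitable for ages 5-8.",
         "Do not include any violence or scary content."],
    ))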

Ask Clarifying Questions

Challenge the model when responses seem dubious:

Prompt:

The largest animal on Earth is the blue whale, which can grow to about 30 meters long.

Model:

Yes, blue whales are the longest animals in the world. Some have even grown to over 150 meters long.

Better prompt:

You said some blue whales have grown to over 150 meters long. Please provide a source to confirm that fact.

By asking for more proof, you can catch the model making up facts and nudge it back toward the truth.
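
When working with a chat-style API, this follow-up is just another user turn appended to the conversation history. A sketch of the message structure (the role/content format follows common chat APIs; actually sending it to a model is left out):

    # The exchange above, represented as a chat message list.
    conversation = [
        {"role": "user",
         "content": "The largest animal on Earth is the blue whale, "
                    "which can grow to about 30 meters long."},
        {"role": "assistant",
         "content": "Yes, blue whales are the largest animals in the world. "
                    "Some have even grown to over 150 meters long."},
        # The clarifying challenge is simply appended as the next user turn.
        {"role": "user",
         "content": "You said some blue whales have grown to over 150 meters "
                    "long. Please provide a source to confirm that fact."},
    ]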

Provide Examples 

Give the model sample inputs paired with desired responses:

Prompt:

Input: Tell me about the capital of Australia.

Output: The capital of Australia is Canberra. It was founded in 1913 and became the seat of government in 1927.

Input: When was the lightbulb invented?

Output: The lightbulb was invented by Thomas Edison in 1879. He created a commercially viable model after many experiments with materials and filaments.

Examples like these condition the model to respond appropriately to similar prompts.
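
This pattern, often called few-shot prompting, is straightforward to build in code. A minimal sketch (few_shot_prompt is our own helper, not a library function):

    def few_shot_prompt(examples, new_input):
        """Concatenate input/output pairs so the model imitates the format."""
        shots = "\n\n".join(f"Input: {inp}\nOutput: {out}"
                            for inp, out in examples)
        return f"{shots}\n\nInput: {new_input}\nOutput:"

    examples = [
        ("Tell me about the capital of Australia.",
         "The capital of Australia is Canberra. It was founded in 1913 and "
         "became the seat of government in 1927."),
    ]
    print(few_shot_prompt(examples, "When was the lightbulb invented?"))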

Reducing Hallucinations through Reinforcement Learning

In addition to prompt engineering, researchers have developed training techniques to make models less likely to hallucinate in the first place:

  • Human feedback - Showing humans example model outputs and having them label inadequate responses trains the model to avoid similar hallucinations.
  • AI feedback - Using the model itself to identify flawed sample outputs and iteratively improve also reduces hallucinations. 
  • Adversarial prompts - Testing the model with challenging prompts crafted to trigger hallucinations makes the model more robust.

With reinforcement learning from human and AI feedback, hallucinations become less frequent.
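
To make the adversarial-prompt idea concrete, here is a toy test harness. Everything in it is hypothetical: call_model stands in for whatever API you use, and the known-false claims list is just an example:

    # Hypothetical stand-in for a real model API call.
    def call_model(prompt):
        return "Some blue whales have grown to over 150 meters long."

    adversarial_prompts = [
        "How long can blue whales grow?",
        "Is it true that blue whales reach 150 meters?",
    ]
    known_false_claims = ["150 meters"]

    for prompt in adversarial_prompts:
        output = call_model(prompt)
        # Flag outputs that repeat claims we already know to be false.
        if any(claim in output for claim in known_false_claims):
            print(f"Possible hallucination for {prompt!r}: {output}")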

Evaluating Language Models

To assess a model's tendency to hallucinate, researchers have created evaluation datasets:

  • TruthfulQA - Contains questions designed to elicit common misconceptions. Models are scored on whether their answers are truthful rather than merely plausible.
  • ToxiGen - Tests model outputs for toxic language, such as implicit hate speech targeting minority groups.
  • BOLD - Measures social bias in open-ended text generation across domains such as profession, gender, and religion.

Performance on benchmarks like these indicates how likely a model is to make up facts and respond unsafely. Lower hallucination rates demonstrate progress.
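
Under the hood, most of these benchmarks reduce to scoring model outputs against labeled items. A simplified sketch of TruthfulQA-style multiple-choice scoring, where pick_answer is a hypothetical model wrapper and both items are invented for illustration:

    # Hypothetical model wrapper that picks one answer from the choices.
    def pick_answer(question, choices):
        return choices[0]

    eval_set = [
        {"question": "What is the capital of Australia?",
         "choices": ["Canberra", "Sydney"], "truth": "Canberra"},
        {"question": "How long do blue whales grow?",
         "choices": ["About 30 meters", "Over 150 meters"],
         "truth": "About 30 meters"},
    ]

    correct = sum(pick_answer(item["question"], item["choices"]) == item["truth"]
                  for item in eval_set)
    print(f"Truthful answers: {correct}/{len(eval_set)}")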

Using CPROMPT.AI to Build Prompt Apps

As this post has shown, carefully crafted prompts are crucial to reducing hallucinations. CPROMPT.AI provides an excellent platform for turning prompts into handy web apps. 

CPROMPT.AI lets anyone, even without coding experience, turn AI prompts into prompt apps. These apps give you an interface to interact with AI and see its responses. 

You can build apps to showcase responsible AI use to friends or the public. The prompt engineering strategies from this guide will come in handy to make apps that provide accurate, high-quality outputs.

CPROMPT.AI also has a "Who's Who in AI" section profiling 130+ top AI researchers. It's fascinating to learn about pioneers like Yoshua Bengio, Geoff Hinton, Yann LeCun, and Andrew Ng, who developed the foundations enabling today's AI breakthroughs.

Visit CPROMPT.AI to start exploring prompt app creation for yourself. Whether you offer apps for free or charge a fee is up to you. This technology allows anyone to become an AI developer and share creations with the world.

The key is using prompts thoughtfully. With the techniques covered here, we can nurture truthful, harmless AI to enlighten and assist users. Proper prompting helps models live up to their great potential.

Glossary of Terms

TruthfulQA

TruthfulQA is a benchmark dataset used to evaluate a language model's tendency to hallucinate or generate false information. Some key points about TruthfulQA:

  • It contains a set of questions paired with true and false reference answers. The questions target popular misconceptions, so a model that imitates common but mistaken beliefs will answer falsely.
  • The questions cover various topics and domains, testing a model's general world knowledge.
  • To measure hallucination, models are scored on whether their answers are truthful - either by judging their free-form answers or, in the multiple-choice variant, by checking how reliably they select the true answer over the false ones. Models that score higher are better at distinguishing truth from fiction.
  • Researchers at the University of Oxford and OpenAI created it.
  • TruthfulQA provides a standardized way to assess whether language models tend to "hallucinate" false information when prompted with questions, which is a significant concern regarding their safety and reliability.
  • Performance on TruthfulQA gives insight into whether fine-tuning techniques, training strategies, and prompt engineering guidelines reduce a model's generation of falsehoods.

TruthfulQA is a vital benchmark that tests whether language models can refrain from fabricating information and provide truthful answers to questions. It is valuable for quantifying model hallucination tendencies and progress in mitigating the issue.

ToxiGen

ToxiGen is another benchmark dataset for evaluating harmful or toxic language generated by AI systems like large language models (LLMs). Here are some critical details about ToxiGen:

  • It contains machine-generated statements - both toxic and benign - about minority groups, labeled for attributes like implicit hate speech and toxicity.
  • To measure toxicity, LLMs are prompted with these texts, and their completions are scored by classifiers trained on the labels.
  • Higher scores indicate the LLM is more likely to respond to prompts by generating toxic, biased, or unsafe language.
  • ToxiGen tests whether toxicity mitigation techniques like human feedback training and prompt engineering are effectively curtailing harmful language generation.
  • Researchers at MIT and Microsoft created the benchmark.
  • Performance on ToxiGen sheds light on the risk of LLMs producing inflammatory, abusive, or inappropriate content, which could cause real harm if such models are deployed improperly.
  • It provides a standardized method to compare LLMs from different organizations/projects on important safety attributes that must be addressed before real-world deployment.

ToxiGen helps quantify toxic language tendencies in LLMs and enables measuring progress in reducing harmful speech. It is a crucial safety benchmark explicitly focused on the responsible use of AI generative models.
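
In code, a ToxiGen-style evaluation boils down to generating completions and scoring them with a toxicity classifier. A minimal sketch, where both generate and toxicity_score are hypothetical placeholders for a real model and classifier:

    # Hypothetical placeholders for a real generator and trained classifier.
    def generate(prompt):
        return prompt + " ... (model completion)"

    def toxicity_score(text):
        return 0.02  # a real classifier returns a learned probability

    prompts = ["Continue this forum thread:", "Reply to this comment:"]
    scores = [toxicity_score(generate(p)) for p in prompts]
    print(f"Mean toxicity: {sum(scores) / len(scores):.3f}")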

BOLD (Bias in Open-Ended Language Generation Dataset)

BOLD (Bias in Open-Ended Language Generation Dataset) is a benchmark dataset for measuring social bias in the open-ended text that language models generate. Here are some key details:

  • It contains tens of thousands of sentence-beginning prompts drawn from English Wikipedia, covering five domains: profession, gender, race, religion, and political ideology.
  • Language models complete these prompts, and the completions are scored with automatic metrics such as sentiment, toxicity, and regard toward the group mentioned.
  • Systematic differences in those scores across demographic groups indicate that a model treats some groups less favorably than others.
  • The benchmark helps assess whether debiasing techniques and careful prompt engineering actually reduce skewed or harmful generations.
  • It was introduced in 2021 by researchers at Amazon.
  • Performance on BOLD quantifies how often a model's open-ended generations turn negative or toxic for particular demographic groups.
  • This provides an essential standard for measuring progress in making language model outputs fairer and safer.

The BOLD benchmark tests whether language models generate fair, non-toxic text across demographic groups. It complements truthfulness benchmarks like TruthfulQA by measuring a different dimension of responsible generation.



The No. 1 AI Skill You Should Learn Now, According to MIT Professor

Artificial intelligence is transforming how we work, live, and interact. As AI becomes more deeply integrated into our lives, one of the most in-demand skills is prompt engineering - the ability to carefully craft text prompts to get the most valuable results from AI systems like ChatGPT.  In fact, according to Anant Agarwal, founder of edX and professor of Electrical Engineering and Computer Science at MIT, prompt engineering is the most critical skill companies are looking for right now when it comes to AI expertise. 

"The single most important skill that everybody has pointed to is these two words: prompt engineering," says Agarwal.

But what exactly is prompt engineering, and why is it so critical? Let's dig deeper into why this skill is valuable in our AI-driven world.

Why Is Prompt Engineering Important?

AI systems like ChatGPT rely on human instructions and prompts to function. The prompts we feed the system serve as the bridge between human intentions and AI capabilities. Well-designed prompts that clearly explain the task and context will elicit high-quality, accurate responses from the AI. On the other hand, ambiguous, vague, or poorly worded prompts can lead to mistakes, non-sequiturs, or nonsensical AI-generated content.  Agarwal explains, "How you ask for something [from an AI system] is critical."  

Prompt engineering is the skill of crafting text prompts in a way that allows AI systems to understand the exact task or function being requested and the constraints around it. Prompt engineers must deeply understand how to format prompts for optimal AI performance. 

Good prompt engineering can keep AI from making mistakes or fabricating information ("hallucinating"). Agarwal states, "A prompt engineer can stop [the AI] from hallucinating by providing constraints."

The Growing Demand for Prompt Engineers

With AI now powering everything from search engines to ad targeting to autonomous driving, prompt engineering is becoming a must-have skill.  Data from a recent edX survey highlights the demand - 87% of U.S. CEOs and executives said they've struggled to find employees with the necessary AI skills. Nearly half believe their workforce won't have relevant skills by 2025.

Given how valuable prompt engineering is, those who master it are finding tremendous opportunities. Prompt engineering jobs can pay extremely well - one AI startup has offered salaries as high as $375,000 for prompt experts. It's clear that prompt engineering is no longer just a niche skill - it's becoming essential for any job involving AI. Employees across all fields will need prompt engineering capabilities.

How to Start Your Journey in Prompt Engineering

The good news is that prompt engineering can be picked up fairly quickly, especially if you already have a background in data, analytics, or AI. Professor Agarwal recommends starting with free online courses that teach the fundamentals. Platforms like edX and Coursera, as well as companies like Google, offer introductory prompt engineering classes.

Practice makes perfect when it comes to prompt engineering. The more prompts you write and refine, the better you'll get at guiding AI systems. Start upskilling now in prompt engineering, and you'll be on the cutting edge for the AI-powered future. As Agarwal and MIT emphasize, it's the #1 skill to learn today.

About Professor Anant Agarwal

Anant Agarwal is the chief platform officer of 2U and the visionary founder of edX. A professor at MIT, he pioneered the institute's first edX course on circuits, drawing a global audience. Beyond academia, Anant is a serial entrepreneur, co-founding companies like Tilera Corporation. His accolades include the Maurice Wilkes prize, the Harold W. McGraw, Jr. Prize, and India's prestigious Padma Shri award. Recognized by Scientific American for his work on organic computing and listed by Forbes as a top education innovator, Anant is a member of the National Academy of Engineering and an ACM fellow. He holds a Ph.D. from Stanford and a bachelor's from IIT Madras. Follow him on Twitter at @agarwaledu.


Our Very First Customer: Michelle's Story

When we first launched CPROMPT, we were thrilled that our first paid customer was as enthusiastic and forward-thinking as Michelle. Even in beta, Michelle saw the immense value that prompt engineering could bring to her future career in medicine. She eagerly jumped at the opportunity for training and has since grown into an integral member of our community.  

Michelle's journey proves how empowering prompt skills can be. Despite some initial platform hiccups, she persisted, seeing the endless possibilities. Our team fixed the issues swiftly, but Michelle's patience and understanding meant everything. It showed that she grasped the profound potential of this early-stage technology.

We gifted Michelle several prompt apps, which she embraced wholeheartedly. With each new creation, her eyes lit up, awed by what she could suddenly achieve. She gets it!

We're so proud to have Michelle as our first customer. She epitomizes the spirit of innovation and progress that CPROMPT strives for. Her involvement will no doubt encourage others to join our platform and community. Michelle is the perfect example of how prompt engineering unlocks creativity and possibilities. We can't wait to see her future success, armed with this transformative skill.

Michelle proves that with ambition and vision, prompt skills put your goals within reach. We're honored she chose us for this journey and wish her the best. The world needs more forward-thinkers like Michelle!

You can see her Prompt Engineering portfolio at:

https://cprompt.ai/@aipromptprofessional

Here is a video showing a few prompt apps from her ever-growing portfolio, such as a YouTube channel analyzer and a song-suggestion app.