How to Detect (and Avoid Creating) Bad Generative AI

June 10, 2024

There is no doubt that generative AI presents countless opportunities. However, poorly designed AI can do much more harm to your organization than good. 

There are plenty of funny generative AI fails making the rounds online. But when your own generative AI application fails, it can do real damage to your brand and customer experience (CX). Poor experiences with AI and a perceived overreliance on the technology are why 52% of Americans say they feel more concerned than excited about its increased use.

Using generative AI in a natural, seamless, and secure way allows your organization to benefit from increased productivity and efficiency while improving CX.

Generative AI Fails

The main concern for businesses starting a generative AI development project is the risk of becoming one of the many public failure stories of the past few years. Here are a few of the funniest generative AI fails to avoid:

Devin Claims to be the First AI Software Engineer

Earlier this year, Cognition AI released a demo of its new LLM-powered coding assistant, Devin, which it claimed had already completed real software engineering jobs on Upwork. These claims were quickly debunked, and the demo appears to have been faked. Owners of several of the Upwork projects the company claimed Devin had completed have since revealed that Devin did not meet the actual requirements of their projects.

AI Image Generators Need a Hand

AI image generators like Midjourney, DALL-E 2, and Stable Diffusion can be used to quickly create incredible art. There are also entire accounts and subreddits dedicated to the nightmarish images they can produce. Hands, feet, and animals are all things these generators struggle to depict correctly. The technology is improving as it is trained on more data, but we can still enjoy these telltale signs while they last.

AI-Generated Poll Asks Readers to Vote How a Woman Died

Microsoft’s news aggregator, Microsoft Start, highlighted a story from The Guardian about a 21-year-old Australian woman who was found dead with head injuries. An AI-generated poll accompanying the story asked readers to vote on how they thought the woman died: murder, accident, or suicide. The Guardian accused Microsoft of damaging its brand by publishing the poll alongside the story.

Attorney Uses AI-generated Briefs with Fake Citations

A New York federal judge sanctioned attorneys Peter LoDuca and Steven Schwartz for submitting legal briefs written by ChatGPT. The briefs included citations to non-existent cases and fake quotes. Their case was ultimately dismissed, and they were ordered to pay fines and to notify the judges who had been falsely identified as authors of the fabricated opinions.

Air Canada’s Chatbot Incorrectly Offers Customer a Discount

In February, Canada's Civil Resolution Tribunal (CRT) ruled that Air Canada had to honor a refund after a passenger was given incorrect information by its chatbot. The chatbot told the passenger they could apply for a bereavement discount within 90 days of traveling, when in fact the company’s policy required passengers to apply for the discount before traveling.

Microsoft’s AI Chatbot Becomes Racist

One of the most famous AI chatbot fails is Tay, a Twitter bot that Microsoft unveiled in 2016. Unfortunately, the AI wasn’t developed with any guardrails and merely parroted back what Twitter users fed it. Within a day it was sending out racist, hate-filled tweets, and it was quickly shut down.

How to Detect Bad Generative AI

The above cases are pretty easy to spot. However, generative AI isn’t always as obvious. Here are the best ways you can spot bad generative AI.

  • Don’t Rely on AI Detectors: AI detectors like GPTZero and Originality.ai claim to detect when written content was created by ChatGPT or another large language model. However, these detectors are not very reliable. When tested on content generated by GPT-4, GPTZero classified 80% of test cases as “very unlikely AI-generated.”
  • Check the LMSYS Chatbot Arena: This crowdsourced open platform ranks nearly a hundred LLMs based on user ratings, which gives a good indication of which models are the most reliable.
  • Look for Common Language Signals and Behaviors: AI-generated content leans on words and phrases that stand out next to human writing, such as transition words like “furthermore” and stock phrases like “in today’s world.” Generative AI also overuses adjectives, adverbs, and technical jargon. (A toy example of this kind of check follows this list.)
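
To make that last point concrete, here is a toy Python heuristic that counts how often a few telltale words and phrases appear per 100 words. The phrase list is an illustrative assumption, not a validated signal set, and a check like this should only ever supplement human review:

    import re

    # Words and stock phrases that LLM output tends to overuse.
    # This list is an illustrative assumption, not a validated detector.
    AI_TELLTALES = [
        "furthermore", "moreover", "in today's world",
        "delve into", "it's important to note",
    ]

    def telltale_score(text: str) -> float:
        """Count telltale phrases per 100 words of text."""
        words = len(text.split())
        if words == 0:
            return 0.0
        lowered = text.lower()
        hits = sum(len(re.findall(re.escape(p), lowered)) for p in AI_TELLTALES)
        return 100.0 * hits / words

    sample = "Furthermore, in today's world it's important to note that..."
    print(f"Telltale phrases per 100 words: {telltale_score(sample):.1f}")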

AI Development Services

See the benefits of Generative AI without becoming an AI fail. Learn more about Gigster’s AI development services.

Why Does AI Fail?

When AI fails, it can fail pretty spectacularly. Successfully implementing artificial intelligence requires understanding the limitations and potential pitfalls of the technology. Here are a few reasons an AI tool might fail or produce poor results.

Bias

In 2018, Amazon abandoned its AI recruitment tool when it showed discrimination toward female candidates. The AI was trained on historical data and inherited the human bias already present in the organization. At the time, 74% of Amazon’s managers were men. Since it was trained on this biased data, the AI tool learned to discriminate against female candidates.

When AI is trained on incorrect, poor-quality, or biased data, the resulting algorithm can amplify these issues. Detecting and mitigating bias and training models on sufficient, representative data are crucial to preventing this AI failure.
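
As a minimal sketch of one common bias check, the snippet below computes selection rates by group on hypothetical screening results and applies the four-fifths (80%) rule of thumb. The counts are invented for illustration:

    # Hypothetical screening results; all counts are invented for illustration.
    selected = {"men": 120, "women": 45}    # candidates the model advanced
    screened = {"men": 400, "women": 300}   # candidates the model reviewed

    rates = {g: selected[g] / screened[g] for g in screened}
    for group, rate in rates.items():
        print(f"{group}: selection rate {rate:.1%}")

    # Four-fifths rule of thumb: flag for review if one group's selection
    # rate falls below 80% of another group's rate.
    impact_ratio = min(rates.values()) / max(rates.values())
    flag = " - flag for review" if impact_ratio < 0.8 else ""
    print(f"Impact ratio: {impact_ratio:.2f}{flag}")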

AI Hallucinations

AI hallucinations occur when a generative AI output contains false or misleading information presented as fact. The attorneys’ briefs with fake citations and the chatbot’s false information about a bereavement discount are both examples of AI hallucinations. A Stanford RegLab study of AI responses to legal queries found that 69-88% of responses included hallucinations.

The main causes of AI hallucinations are insufficient training data, improperly encoded prompts, and overfitting. Overfitting occurs when the model is too closely fit to the training data and can’t accurately generalize to new data.
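
A quick way to see overfitting is to fit a deliberately over-complex model to a handful of noisy points: training error looks excellent while error on held-out data balloons. The sketch below does this with polynomial fits; the degrees, noise level, and sample sizes are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Ten noisy training points drawn from a simple underlying curve.
    x_train = np.linspace(0, 1, 10)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)

    # Held-out points from the same curve, without noise.
    x_test = np.linspace(0.05, 0.95, 50)
    y_test = np.sin(2 * np.pi * x_test)

    for degree in (3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")

The degree-9 fit passes through nearly every training point, so its training error is close to zero, but it swings wildly between them and scores far worse on the held-out data. That gap is exactly the failure to generalize described above.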

Brittle AI

Overfitting is one example of the brittleness that can make an AI project fail. Brittle AI models may perform extremely well in training but break with a minor tweak. For example, self-driving cars can read street signs in ideal conditions, but when researchers put stickers on a stop sign, the AI misclassified it as a 45 mph speed limit sign or a right-turn sign.

In addition to overfitting, AI can be brittle due to a lack of contextual understanding, rigidity in algorithms, insufficient feature representation, and model complexity. To avoid brittleness, companies need to invest in production-level testing and validation in real-world scenarios.

How to Leverage Generative AI Effectively

Despite these AI fails, the technology can be extremely successful when applied correctly. How can your organization avoid generative AI fails and succeed at AI development?

  • Understand AI’s Limitations: Generative AI can do some amazing things at this stage of its lifecycle, but it still has major limitations. Understanding these limitations and conducting product ideation with them in mind will help you pursue viable applications of the technology instead of attempting an AI development project that isn’t practical.
  • Use High-quality Data: Every AI project needs to start with the right data. Make sure your training data is unbiased and sufficient to achieve the model you are looking for.
  • Create Guardrails: Including guardrails that prevent your AI model from generating certain things can help avoid incidents like the Tay chatbot. It also keeps your AI on brand. For example, if Coca-Cola were developing an AI project, it would include a guardrail to never recommend Pepsi (see the sketch after this list).
  • Production-level Deployment and Monitoring: AI needs to be validated in real-world scenarios before deployment. Training and testing your model at production levels will help avoid brittleness and hallucinations and improve the overall quality of your AI tool.
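
As a minimal sketch of an output guardrail, using the hypothetical Coca-Cola example above, the function below screens generated text against a blocklist before it reaches the user. The blocklist and fallback message are assumptions for illustration; production guardrails are typically far more sophisticated, layering classifiers, policy models, and human review:

    # Toy output guardrail: screen model output against brand rules before
    # showing it to a user. Blocklist and fallback message are assumptions.
    BLOCKED_TERMS = ["pepsi"]  # e.g., competitor names
    FALLBACK = "Sorry, I can't help with that. Can I suggest one of our products?"

    def apply_guardrail(model_output: str) -> str:
        lowered = model_output.lower()
        if any(term in lowered for term in BLOCKED_TERMS):
            return FALLBACK
        return model_output

    print(apply_guardrail("You might enjoy a cold Pepsi."))   # returns the fallback
    print(apply_guardrail("Try our new zero-sugar cola!"))    # passes through unchanged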

If you have an idea for a generative AI development project, Gigster can take you from proof of concept to MVP. Our expert AI development teams will help you select a secure solution and tailor the right large language model to suit your needs. Share your proof of concept today.

