There is no doubt that generative AI presents countless opportunities. However, poorly designed AI can do much more harm to your organization than good.
Plenty of funny generative AI fails make the rounds online. But when your own generative AI application fails, it can do real damage to your brand and customer experience (CX). Poor experiences with AI and a perceived overreliance on the technology are why 52% of Americans say they feel more concerned than excited about its increased use.
Using generative AI in a natural, seamless, and secure way allows your organization to benefit from increased productivity and efficiency while improving CX.
The main concern for businesses starting a generative AI development project is the risk of becoming one of the many public cautionary tales of the past few years. Here are a few of the funniest generative AI fails to avoid:
Earlier this year, Cognition AI released a demo of its new LLM-powered coding assistant, Devin, which it claimed had already completed real software engineering jobs on Upwork. These claims were quickly debunked, and the demo appears to have been staged. Several owners of the Upwork projects the company claimed Devin had completed have since revealed that Devin did not meet the actual requirements of their projects.
"Today we're excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork."
— Cognition (@cognition_labs), March 12, 2024
AI image generators like Midjourney, DALL-E 2, and Stable Diffusion can be used to quickly create incredible art. There are also entire accounts and subreddits dedicated to the nightmarish images they produce. Hands, feet, and animals are things all of these generators struggle to depict correctly. The technology is improving as it is trained on more data, but we can still enjoy these telltale signs while they last.
Microsoft’s news aggregator, Microsoft Start, highlighted a story from The Guardian about a 21-year-old Australian woman who was found dead with head injuries. An AI-generated poll accompanying the story asked readers to vote on how they thought the woman died: murder, accident, or suicide. The Guardian accused Microsoft of damaging its brand by publishing the poll alongside the news story.
A New York federal judge sanctioned attorneys Peter LoDuca and Steven Schwartz for submitting legal briefs written by ChatGPT. The briefs included citations to non-existent cases and fake quotes. Their case was ultimately dismissed, and they were ordered to pay fines and to notify the judges who were falsely cited in the AI-generated briefs.
In February, Canada's Civil Resolution Tribunal (CRT) ruled that Air Canada had to honor a refund after a passenger was given incorrect information by its chatbot. The chatbot told the passenger they could apply for a bereavement discount within 90 days of traveling when, in fact, the company’s policy required the discount to be requested before traveling.
One of the most famous AI chatbot fails is Tay, a Twitter bot that Microsoft unveiled in 2016. Unfortunately, the AI was developed without any guardrails and merely parroted back whatever Twitter users fed it. Within one day it was sending out racist and hate-filled tweets, and it was quickly shut down.
The above cases are pretty easy to spot, but bad generative AI isn’t always so obvious. Here are the best ways to spot it.
When AI fails, it can fail pretty spectacularly. Successfully implementing artificial intelligence requires understanding the limitations and potential pitfalls of the technology. Here are a few reasons an AI tool might fail or produce poor results.
In 2018, Amazon abandoned its AI recruitment tool when it showed discrimination against female candidates. The AI was trained on historical hiring data and inherited the human bias already present in the organization. At the time, 74% of Amazon’s managers were men, and because the tool was trained on this skewed data, it learned to discriminate against female candidates.
When AI is trained on incorrect, poor-quality, or biased data, the resulting algorithm can amplify these issues. Detecting and mitigating bias and training models on sufficient, representative data are crucial to preventing this AI failure.
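To make that concrete, here is a minimal sketch of one common fairness check, the disparate impact ratio, run on hypothetical resume-screening output. The function name, column names, and sample data are illustrative rather than part of any specific toolkit; a ratio well below 1.0 signals that outcomes differ sharply between groups and deserve a closer look.

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of favorable-outcome rates between groups in group_col.

    A common rule of thumb flags ratios below 0.8 as potential bias
    worth investigating before the model goes anywhere near production.
    """
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()

# Hypothetical screening output: 1 = candidate advanced, 0 = candidate rejected
results = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "advanced": [0,   1,   0,   0,   1,   1,   1,   0],
})

print(f"Disparate impact ratio: {disparate_impact(results, 'gender', 'advanced'):.2f}")
```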
AI hallucinations occur when a generative AI output contains false or misleading information that is presented as fact. The attorneys submitting briefs with fake citations and the chatbot giving false information about a bereavement discount are both examples of AI hallucinations. A Stanford RegLab study of using AI for legal queries found that 69-88% of responses included hallucinations.
The main causes of AI hallucinations are insufficient training data, improperly encoded prompts, and overfitting. Overfitting occurs when a model fits its training data too closely and can’t accurately generalize to new data.
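Overfitting is easy to demonstrate on toy data. The sketch below (assuming NumPy and scikit-learn are available) fits a low-degree and a deliberately over-flexible polynomial to the same noisy samples; a training score far above the held-out test score is the overfitting pattern described above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Noisy samples drawn from a simple underlying signal
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # A near-perfect train score paired with a much worse test score
    # means the model has memorized noise instead of the signal.
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```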
Overfitting is one example of the brittleness that makes it easy for an AI project to fail. Brittle AI models may perform extremely well in training but break with a minor tweak. For example, self-driving cars can read street signs in ideal conditions, but when researchers put stickers on a stop sign, the AI misclassified it as a 45 mph speed limit sign or a right turn sign.
In addition to overfitting, AI can be brittle due to a lack of contextual understanding, rigidity in algorithms, insufficient feature representation, and model complexity. To avoid brittleness, companies need to invest in production-level testing and validation in real-world scenarios.
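As a rough sketch of what that testing can look like, the snippet below assumes you already have a trained classifier exposed as a `predict` callable and a held-out test set; it simply re-scores the model on noise-perturbed copies of the inputs and compares the result to the clean baseline. The function names and noise levels are illustrative, and real validation would also cover blur, occlusion, lighting changes, and genuinely out-of-distribution inputs.

```python
import numpy as np

def accuracy(predict, inputs, labels):
    return float(np.mean(predict(inputs) == labels))

def robustness_report(predict, inputs, labels, noise_levels=(0.05, 0.1, 0.2)):
    """Compare clean accuracy with accuracy on noise-perturbed copies.

    `predict` is any callable mapping a batch of inputs to predicted labels.
    A large accuracy drop at small perturbations is a warning sign of brittleness.
    """
    rng = np.random.default_rng(0)
    print(f"clean accuracy: {accuracy(predict, inputs, labels):.3f}")
    for sigma in noise_levels:
        noisy = np.clip(inputs + rng.normal(0, sigma, inputs.shape), 0.0, 1.0)
        print(f"noise sigma={sigma:.2f}  accuracy: {accuracy(predict, noisy, labels):.3f}")

# Hypothetical usage with a trained model and held-out data:
# robustness_report(model.predict, X_test, y_test)
```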
Despite these AI fails, the technology can be extremely successful when applied correctly. How can your organization avoid generative AI fails and succeed at AI development?
If you have an idea for a generative AI development project, Gigster can take you from proof of concept to MVP. Our expert AI development teams will help you select a secure solution and tailor the right large language model (LLM) to suit your needs. Share your proof of concept today.