Today, I gave ChatGPT 4o the following prompt:
“Link to the New York Times review of the movie ‘Seven’ (1995).”
My objective was to quickly obtain the link to the article. Instead, ChatGPT 4o returned this:
"Seven," directed by David Fincher and released in 1995, is a crime thriller starring Brad Pitt and Morgan Freeman as detectives investigating a series of murders inspired by the seven deadly sins.
The New York Times reviewed "Seven" on September 22, 1995.
In the review, critic Janet Maslin described the film as "dark, grisly, horrifying and intelligent enough to make you wonder why you enjoyed it so much."
She praised the film's atmosphere, performances, and its unflinching portrayal of evil.
The review is available on the New York Times website.
OK, I didn't get the link, but I did get a summary of the review, right?
Wrong.
ChatGPT 4o nailed Fincher, Pitt, Freeman, and the plot summary. It got the date of the NYT review right (9/22/1995), and its author (Janet Maslin). However, the two paragraphs that describe the review are utterly false. Maslin did not write the quote attributed to her, nor did she "praise the film's atmosphere, performances, and its unflinching portrayal of evil."
Her opinion of the movie was quite the opposite. She despised it. Please read Maslin's actual review (it's behind the New York Times paywall, but I gifted the article from my subscription).
(For the record, I vehemently disagree with most aspects of Maslin's review of "Seven" and, ironically, wholeheartedly agree with the "hallucinated" review that ChatGPT 4o describes.)
So, a large language model-based AI assertively spewed out totally false information. It was certainly a “dog bites man” story, not worthy of publication, but I was so jarred by the experience that I felt compelled to share it.
I've been using ChatGPT 4o frequently lately, and its shocking effectiveness has lulled me into a false sense of confidence. I could easily have accepted its summary of Maslin’s review of “Seven” and continued with my day. The thought of that terrifies me. Not because I’d have tainted my personal body of knowledge with a bit of false yet innocuous information, but because, no matter how intellectually aware I am that today’s large language model-based AI is prone to such confidently expressed falsehoods, seeing it happen before me so explicitly took my breath away and made me think about the broader implications of a tool that is astonishingly powerful yet incongruously brittle.
Would we use a calculator that would, every once in a while, give us the wrong answer? A keyboard that would, from time to time, type the wrong letter? A phone that would, occasionally, dial the wrong number?
Would we trust a partner, business or romantic, who lied to us every so often?
I don’t think so. Yet we are increasingly relying on AI that sometimes, insidiously, returns false information.
What can possibly go wrong?
Image by Mike MacKenzie