Ben Dickson, a software engineer and the founder of the blog TechTalks, says that hallucinations are a “serious problem. LLMs are wont to generate plausible text that is not factually correct, such as made-up names of papers and journals.”
Moreover, there is new research, as well as anecdotal evidence, indicating that ChatGPT’s output has gotten worse, or “drifted,” over time.
For example, research by Stanford University and UC Berkeley professors examined ChatGPT’s ability to identify prime numbers. Even on a task as straightforward as this, the researchers found that in March, ChatGPT distinguished prime from composite numbers with 84 percent accuracy, but by June its accuracy on the same task had fallen to just 51 percent.