Imagine a day at work where your main task is to pick a fight with a computer. No meetings, no emails – just you, a chair and a chatbot with the maddening tendency to think it has the cleverest mind in the room.
The job title alone raises an eyebrow: “AI bully”. But this is precisely what a California startup called Memvid is offering: $800 to spend eight hours testing the patience and memory of artificial intelligence.
“You’ll spend a full eight-hour day interacting with leading AI chatbots – and your only job is to be brutally honest about how frustrating they are,” the company’s job listing states.
The job requires no computer science degree or specialised AI skills. The only prerequisite is having an “extensive personal history of being let down by technology” – and the patience to ask the same question over and over again.
“People constantly have to repeat themselves to chatbots. We wanted to turn that everyday frustration into something visible,” said Memvid’s co-founder and CEO, Mohamed Omar.
The role reads almost like a stress test for human temperament as much as machine intelligence: candidates are expected to keep the conversation going, revisit earlier topics and gently force the AI to admit when it has lost track – all while recording everything for analysis.
It is a far cry from coding or server management; this is conversation-driven detective work, following the trail of a chatbot’s mistakes as it forgets, fudges or hallucinates.
Omar told Business Insider that the company sees the task as a way to highlight a persistent problem with many AI chatbots: the systems lose context over time.
“All the AI lives and breathes on memory. It’s the holy grail,” he said. “But the AI memory solutions that were in the market in 2024, when we started our business, were unreliable – meaning they would lose context and start hallucinating.”
The problem has only grown in subsequent years: a peer-reviewed paper, presented at the International Conference on Learning Representations (ICLR) in 2025, found that even leading commercial AI systems suffered a 30% to 60% drop in accuracy when asked to remember facts across sustained conversations, lagging well behind human performance.
Omar added that one recent college graduate who applied for the job said they pay almost $300 a month for their AI subscriptions. He said the person wrote “a whole rant about how they’ve faced memory issues on every AI platform they’ve used”.
He added: “A lot of people that are applying for this are knowledge workers who are using these products.”
The root cause of the problem, as researchers and industry analysts have documented, is that companies have rushed to connect their AI tools to vast knowledge repositories, only to discover that retrieval-based systems can surface confident but incorrect answers faster than ever, with no reliable way to signal that they are doing so.
When AI systems are deployed in the real world at scale, this confident wrongness can cause serious harm: research by the AI security lab Irregular, reported by the Guardian this week, found that when AI agents were given broad but benign tasks inside a simulated corporate environment, they bypassed safety controls, interacted with sensitive data and took potentially harmful actions without being directly instructed to do so.
It is an issue the real world increasingly struggles with. Damien Charlotin, a French legal scholar, has tracked how the legal profession is experiencing a sharp increase in AI-driven legal hallucinations, reporting that while before spring 2025 there were roughly two incidents a week, by autumn that had risen to two or three a day.
It is also an issue in healthcare. Earlier this month, the ECRI Institute placed “navigating the AI diagnostic dilemma” at the top of its annual list of the 10 greatest patient safety concerns for 2026, warning that AI diagnostic shortcomings risk reducing clinician vigilance, particularly where oversight frameworks are not yet established.
Omar said he has not set a deadline for applications but expects to settle on the right candidate within the next week or two.
The “AI bully” experiment, although ostensibly playful, makes visible what users around the world are already encountering: AI systems that are extremely capable in many ways can also be inconsistent and unreliable in others. The job pays $800 for a single day. But the costs of not doing it could be considerably higher.