The Curious Case of AI Cannibalism & Possible Solutions

July 26, 2023. By Cogito Tech.

AI cannibalism is a new-age dilemma threatening the performance and long-term viability of large language models (LLMs). Let’s take a deep dive and explore possible solutions.

AI cannibalism occurs when LLMs scour the Internet and build their responses primarily from content that other AI systems have already generated. This results in low-quality output.

Recently, there has been a noticeable shift in sentiment among paid users of GPT-4. Several users have expressed disappointment on social media platforms about a dip in the LLM’s output quality. The chatbot’s responses have been described as “dumber” and “lazier” than those of previous versions, leading to concerns about the overall user experience. This has sparked conversations within the AI community about the factors behind the decline.

According to developers, generic data is no longer enough to push the performance of AI models further. According to Aidan Gomez, chief executive of $2bn LLM start-up Cohere, “If you could get all the data that you needed off the web, that would be fantastic. In reality, the web is so noisy and messy that it’s not really representative of the data that you want. The web just doesn’t do everything we need.”

Now let’s shift our focus to possible solutions to address this issue.

  1. Research & Development: AI models will require unique and sophisticated datasets to improve their performance and address challenges in science, medicine, or business. These datasets will have to be created by experts such as scientists, doctors, authors, actors, or engineers, or acquired from large companies such as pharmaceutical firms, banks, and retailers. There is a growing need for datasets that are generated and curated by humans without any help from foundation LLMs. This will help ensure that the output delivered by AI models is accurate and reliable.

  2. Model Collapse: LLMs exposed to AI-generated data over a period of time are affected by a degenerative process called ‘model collapse’. As a result, they gradually forget the true underlying data distribution. This also has far-reaching implications for the future of generative AI technology.

  3. Increase in Volume of AI-Generated Content: There must be a clear mechanism for differentiating between real, human-authored information and AI-produced material. Failure to address this issue can lead to a decline in the functionality and efficacy of AI tools like ChatGPT, which increasingly depend on human-generated data for training and content generation.
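The model collapse described in item 2 can be illustrated with a toy simulation. The sketch below is not an LLM and is not taken from any particular study; it assumes the simplest possible "model", a Gaussian that is refit in each generation using only samples drawn from the previous generation's fit. The function name `simulate_collapse`, the sample size, and the number of generations are all illustrative choices.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation at every generation,
    starting with the original ("human") distribution at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the original data distribution
    history = [std]
    for _ in range(generations):
        # Each new "model" sees only data produced by the previous model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood estimate
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.6f}")
```

Under these assumptions the fitted spread tends to shrink as generations pass, because each small sample slightly under-represents the tails of the distribution it came from and nothing restores the lost variety. Real LLM training is far more complex, so this should be read only as an intuition for why a steady supply of human-generated data matters.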

In summary

AI cannibalism raises doubts about the sustainability of AI systems and their ability to advance in a way that aligns with human intellect. It underscores the need for extensive research and development to mitigate the risks linked to cannibalization, so that AI models consistently offer accurate, dependable, and intelligent output.

At Cogito, we endeavor to address the above challenge by:

Engaging our subject matter experts, who research the Internet to gain an in-depth understanding of each topic. This ensures that the training data is well researched and created manually, without the help of foundation models.

Ensuring our team works in a supervised, on-site environment so that they can access only resources that are conducive to proper research and development.

If you wish to learn more about Cogito’s data annotation services, please contact our expert.