From Stochastic Parrots to Intelligent Assistants—The Secrets of Data and Human Interventions

Published by IEEE in May 2023


Generative AI is all the rage nowadays, driven primarily by ChatGPT, which captured the public imagination and attracted hundreds of millions of users in record time (100 million in its first two months). However, OpenAI has disclosed little about the technology, the methodology, and how it actually makes ChatGPT work, and this opacity compounds the mystique and speculation. I focus on what we know, with particular emphasis on the aspects that the makers of ChatGPT avoid discussing publicly. In particular, I look at the system's heavy dependence on human effort: manual curation of training data, data labeling, operational interventions by humans, and reinforcement learning. Unfortunately, although these issues are critical to the scientific community, they are rarely discussed. In this article, I attempt to address some of them in the hope of stimulating further study of these less glamorous but critical topics.

