Even the ratings themselves are suspect. Most, if not all, of the people who provide this feedback to AI vendors are low-paid workers who are unlikely to have specialised knowledge of the topic they’re rating, and even those who do rarely have time to fact-check everything. That means they end up ranking conversations almost entirely on tone and sentence structure. This is why I think RLHF has effectively become a reward system that optimises language models specifically for generating validation statements: Forer statements, shotgunning, vanishing negatives, and statistical guesses. In trying to make the LLM sound more human, more confident, and more engaging, but without any way to edit specific details in its output, AI researchers seem to have created a mechanical mentalist.
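
To see why a tone-only reward signal ends up favouring validation, here is a minimal sketch: a crude keyword score stands in for a reward model trained on such ratings, and a greedy selection step stands in for the optimisation. The word lists, weights, and candidate responses are all illustrative assumptions, not any vendor’s actual pipeline.

```python
# A toy, tone-only "reward model": a crude stand-in for ratings from
# rushed, non-expert raters. The word lists and weights are invented
# for illustration; no real pipeline works this literally.

CONFIDENT_TONE = {"absolutely", "exactly", "clearly", "definitely", "great"}
HEDGES = {"might", "perhaps", "possibly", "unsure", "depends"}

def tone_only_reward(response: str) -> int:
    """Reward confident, validating language; penalise hedging.
    Factual accuracy never enters the score."""
    words = [w.strip(".,!;") for w in response.lower().split()]
    return sum(w in CONFIDENT_TONE for w in words) - sum(w in HEDGES for w in words)

candidates = [
    "You're absolutely right! That's exactly the insight most people miss.",
    "It depends; that might hold in some cases, but I'm unsure without checking.",
]

# Greedily pick whatever the reward model scores highest, which is all
# RLHF-style optimisation can do with the signal it is given.
print(max(candidates, key=tone_only_reward))  # the flattering answer wins
```

The fact-free, flattering answer wins every time, because nothing in the reward ever asks whether it is true.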