Three mistakes in an hour isn’t terrible, especially if that’s the last generation. As another commenter put it, this is the worst it’s ever going to be.
People keep parroting this line, but it's not a given, especially for such an ill-defined metric as "better". If I ask an LLM how its day was, there's no one right answer. (Users anthropomorphizing the LLM is a given these days, no matter how you may feel about that.)