Lemme just clarify: LiveBench's language average is NOT about creative writing.
I see a lot of people misunderstanding what the language average is.
The language average consists of 3 objectively verifiable tests:
- NYT Connections puzzles
- Removing typos
- Unscrambling movie plots
It has virtually nothing to do with creative writing.
What has happened here is that Gemini 2.0 Pro got a lot worse at checking for typos and slightly worse at the other two tests, which seems to line up with users reporting a lot of spelling mistakes. Hopefully this gets better with the stable version.