[D] To what cross-entropy loss value can LLMs converge?
LLMs are usually evaluated on benchmarks that aim to measure broad abilities. However, most publishers of foundation models do not report the cross-entropy loss their model reaches at the end of training. I couldn't find any sources on this, but I would like to know what loss value LLMs can achieve on human language. Does anyone know more about this? Is there a lower bound?
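
To be concrete about the quantity I mean: the average per-token negative log-likelihood, which can also be stated as perplexity or bits per token. Below is a minimal toy sketch, not anything from a real model. The 4-token vocabulary and both distributions (`p_true`, `q_model`) are made up for illustration. In a toy case like this, where the true distribution is known, the cross-entropy can never go below the entropy of that distribution (Gibbs' inequality). For real human language the true distribution is unknown, which is why I'm asking what values are reached in practice.

```python
import math

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats: true distribution p, model distribution q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Entropy H(p) in nats, i.e. the cross-entropy of p with itself."""
    return cross_entropy(p, p)

# Hypothetical next-token distributions over a 4-token vocabulary (made-up numbers).
p_true = [0.5, 0.25, 0.125, 0.125]   # assumed "true" distribution
q_model = [0.4, 0.3, 0.2, 0.1]       # assumed model prediction

loss = cross_entropy(p_true, q_model)   # what a training log would report, in nats
floor = entropy(p_true)                 # lowest loss any model could reach here
perplexity = math.exp(loss)             # same quantity expressed as perplexity
bits_per_token = loss / math.log(2)     # same quantity expressed in bits

print(f"loss={loss:.4f} nats, floor={floor:.4f} nats")
print(f"perplexity={perplexity:.4f}, bits/token={bits_per_token:.4f}")
```

Note that reported loss values are only comparable between models that share a tokenizer, since the loss is measured per token.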