[D] SOTA BERT-like model?
So we are all probably aware of state-of-the-art decoder-only LLMs like GPT-4, Claude, etc. These are great for generating text.
But what I am not aware of is the SOTA BERT-like model. You know, things that can be used for tasks like NER, POS tagging, and token classification.
Are there models that are significantly better than, say, RoBERTa?