My prediction for 2024 is that we'll see a proliferation of smaller language (and vision) models and active research into efficient attention mechanisms or alternatives. We've already seen the latter progress this year.
I’d agree with that, in part because there are many benefits to getting an LLM to run comfortably on a mobile device, and in part because we’re hitting a local maximum in scaling compute for new models.