In the past few weeks, both OpenAI and Google have introduced smaller-scale large language models. OpenAI’s GPT-4o mini is estimated at roughly 8B parameters (OpenAI has not disclosed its actual size), while Google’s Gemma 2 2B is even more compact at just 2B parameters, small enough to run on the free-tier T4 GPUs in Google Colab.
It’s exciting to see the advance toward more lightweight models in the large language model (LLM) space. Through techniques like Knowledge Distillation, Gemma 2 2B delivers performance rivaling models many times its size, such as GPT-3.5 Turbo.
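The core idea of Knowledge Distillation is simple: instead of training the small model only on hard labels, you also train it to match the larger teacher model’s softened output distribution. A minimal, illustrative sketch of the classic distillation loss (the logit values, temperature, and mixing weight below are hypothetical, not Gemma’s actual training recipe):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target loss (KL divergence from the teacher's
    softened distribution to the student's) and the usual hard-label
    cross-entropy. Hyperparameters here are illustrative defaults."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable across temperatures, as in Hinton et al.'s formulation
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    soft_loss = (temperature ** 2) * kl
    # standard cross-entropy against the one-hot ground-truth label
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A higher temperature exposes the teacher’s “dark knowledge”: the relative probabilities it assigns to wrong answers, which carry far more signal per example than a single hard label and let a small student approximate a much larger teacher.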
One major challenge with LLMs has been the need to send user requests to the provider’s server for processing, with responses then transmitted back to the user’s device. Despite significant efforts in privacy and security, the transmission of data poses unavoidable risks. In particular, unanonymized personal data processed in the cloud raises the risk of malicious behavior prediction and manipulation.
Apple Intelligence, announced in June, offers an end-to-end solution. By prioritizing on-device processing, Apple Intelligence minimizes the need to transmit personal data over networks, addressing privacy concerns head-on. While the real-world experience of Apple Intelligence remains to be seen, the growing focus on enabling powerful AI capabilities locally could be a game-changer for personal privacy in the age of LLMs.