Microsoft's lightweight Phi-3 Mini model can run on smartphones
It can supposedly provide responses close to those of models 10 times its size.
Microsoft has unveiled its latest lightweight AI model, Phi-3 Mini, designed to run on smartphones and other local devices, the company revealed in a new research paper. With 3.8 billion parameters, it's the first of three small Phi-3 language models the company will release in the near future. The aim is to provide a cheaper alternative to cloud-powered LLMs, allowing smaller organizations to adopt AI.
According to Microsoft, the new model handily outperforms its previous Phi-2 small model and is on par with larger models like Llama 2. In fact, the company says Phi-3 Mini provides responses close to the level of a model 10 times its size.
"The innovation lies entirely in our dataset for training," according to the research paper. That dataset is based on the Phi-2 model, but uses "heavily filtered web data and synthetic data," the team states. In fact, a separate LLM was used to do both of those chores, effectively creating new data that allows the smaller language model to be more efficient. The team was supposedly inspired by children's books that use simpler language to get across complex topics, according to The Verge.
While it still can't produce the results of cloud-powered LLMs, Phi-3 Mini can outperform Phi-2 and other small language models (Mistral, Gemma, Llama 3) in tasks ranging from math to programming to academic tests. At the same time, it runs on devices as simple as smartphones, with no internet connection required.
Its main limitation is its breadth of "factual knowledge," a consequence of the smaller training dataset, which is why it doesn't perform well on the TriviaQA benchmark. Still, it should be a good fit for applications that only require smallish internal datasets. That could allow companies that can't afford cloud-connected LLMs to jump into AI, Microsoft hopes.
Phi-3 Mini is now available on Azure, Hugging Face and Ollama. Microsoft is next set to release Phi-3 Small and Phi-3 Medium, which offer significantly higher capabilities (7 billion and 14 billion parameters, respectively).