Hi everyone!
While on-device models still have a way to go to catch up with the cloud, the progress recently has been incredibly fast. The promise has always been about bringing powerful capabilities locally, and MiniCPM-V 4.5 is a huge step in that direction.
It’s an 8B open-source multimodal model that outperforms much larger proprietary models like GPT-4o and Gemini Pro on major vision benchmarks.
Its efficiency and accessibility are awesome. It has strong OCR and video understanding, and it’s easy to run locally with tools like Ollama and llama.cpp. This makes it a very powerful new option for building on the edge.
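If you want to try the Ollama route, here's a minimal sketch, assuming you have Ollama installed and that the model is published to the Ollama library (the `minicpm-v` tag below matches an earlier MiniCPM-V release; check the library for the exact 4.5 tag):

```shell
# Download the model weights locally (tag is an assumption -- verify
# the MiniCPM-V 4.5 tag in the Ollama model library first).
ollama pull minicpm-v

# Multimodal models in Ollama accept an image path inside the prompt.
ollama run minicpm-v "Describe this image: ./photo.jpg"
```

The same GGUF weights also work with llama.cpp if you prefer running without the Ollama wrapper.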
Try this model here on Gradio.