HPT 1.5 Air: Best open-sourced 8B Multimodal LLM with Llama 3
May 3rd, 2024 - HyperGAI Team
Overview
We are excited to announce the release of HPT 1.5 Air, the best open-sourced 8B multimodal LLM, built on our previously released HPT Air architecture. HPT 1.5 Air sets a new standard for efficacy, efficiency, and transparency. With only ~8.5B parameters in total, HPT 1.5 Air belongs to the small model category (<10B), yet it punches above its weight, outperforming bigger, proprietary models on several occasions. We now highlight the exciting new changes in HPT 1.5 Air:
- Improved visual understanding and complex reasoning. HPT 1.5 Air works well in real-world scenarios while maintaining competitive performance on other types of inputs such as charts and diagrams.
- Impressive performance. HPT 1.5 Air is the best multimodal Llama 3 model on the market, even outperforming bigger, proprietary models on several benchmarks.
- Transparency. We released HPT 1.5 Air with all of its components publicly available under the Apache 2.0 license.
Model Architecture
HPT 1.5 Air follows a similar recipe to its predecessor, HPT 1.0 Air, with a visual encoder, the novel H-Former, and an LLM. Compared with HPT 1.0 Air, we upgraded the visual encoder, changed the LLM to the latest LLaMA 3 8B version, and trained on a larger, improved dataset that mixes image and text data. As a result, the new HPT 1.5 Air is more powerful and capable. It is open-sourced and fully available on both Hugging Face and GitHub, empowering developers to build various real-world applications.
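To make the three-stage data flow concrete, here is a minimal, purely illustrative sketch in plain Python. It only shows the order of the stages named above (visual encoder, then H-Former, then LLM input). This post does not describe H-Former's internals, so the `bridge` function is a hypothetical placeholder that mean-pools patch features, and every function name, shape, and the choice to prepend visual tokens to the text embeddings is an assumption rather than the actual HPT implementation.

```python
from typing import List

Vector = List[float]


def visual_encoder(patches: List[Vector]) -> List[Vector]:
    """Placeholder encoder: returns one feature vector per image patch.

    A real encoder would be a trained vision model; here the patches are
    copied through unchanged so the data flow stays easy to follow.
    """
    return [list(p) for p in patches]


def bridge(features: List[Vector], num_tokens: int) -> List[Vector]:
    """Hypothetical stand-in for H-Former.

    Mean-pools contiguous groups of patch features into `num_tokens`
    vectors. This is an assumption for illustration only, not the
    published H-Former design.
    """
    if num_tokens <= 0 or len(features) % num_tokens != 0:
        raise ValueError("number of patches must be a positive multiple of num_tokens")
    group = len(features) // num_tokens
    tokens: List[Vector] = []
    for i in range(num_tokens):
        chunk = features[i * group:(i + 1) * group]
        dim = len(chunk[0])
        tokens.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return tokens


def build_llm_input(visual_tokens: List[Vector], text_embeddings: List[Vector]) -> List[Vector]:
    """Assumed layout: visual tokens first, then the text embeddings."""
    return visual_tokens + text_embeddings


# Toy example: 4 image patches of dimension 2, pooled into 2 visual tokens,
# followed by 3 text embeddings.
patches = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
text = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
visual_tokens = bridge(visual_encoder(patches), num_tokens=2)
llm_input = build_llm_input(visual_tokens, text)
```

In this toy layout the LLM sees a single sequence in which the image has been reduced to a fixed number of vectors in the same space as the text embeddings. For the real component definitions, the released code in the GitHub repo is the authoritative source.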
Benchmark Performance
We compare HPT 1.5 Air with many competitors across a wide range of benchmarks. Overall, HPT 1.5 Air achieved the best results among multimodal LLMs with fewer than 10B parameters. Interestingly, HPT 1.5 Air even outperforms bigger or proprietary models such as LLaVA-Next, GPT-4V, and Gemini 1.0 Pro on several benchmarks such as SEED-I, SQA, and MMStar. In the following table, we provide a comprehensive comparison of HPT 1.5 Air, highlighting the best results in bold and underlining the second-best results within the open-sourced category.
Examples
With its improved visual understanding and complex reasoning capabilities, HPT 1.5 Air demonstrates impressive performance in many scenarios. Below, we provide several examples showcasing its ability to understand social references, solve complex visual math problems, and operate well in real-world environments.
The Bottom Line
With the full release of HPT 1.5 Air, we're eager to see what people can create with it! Additionally, our HPT Pro models are currently in training, with many impressive features such as better OCR capabilities, multi-image understanding, support for higher-resolution inputs, and many more. You can join our waitlist to get early access and the latest updates on our HPT Pro series.
How to Access HPT
- Open-source release of HPT 1.5 Air
  - GitHub repo: https://github.com/hyperGAI/HPT
  - HuggingFace: https://huggingface.co/HyperGAI/HPT1_5-Air-Llama-3-8B-Instruct-multimodal
- Early Access to HPT Pro prototype/API
  - Subscribe to our waitlist to get early access to HPT prototype/API
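For readers who want to fetch the released files programmatically, the following is a minimal sketch that assumes the `huggingface_hub` package is installed. It derives the repository id from the Hugging Face URL listed above and downloads a snapshot of the repository. The helper names and the `./hpt1_5_air` target directory are hypothetical, and the sketch only downloads files; for running inference, follow the instructions in the GitHub repo.

```python
from urllib.parse import urlparse

# URL taken from the access list above.
MODEL_URL = "https://huggingface.co/HyperGAI/HPT1_5-Air-Llama-3-8B-Instruct-multimodal"


def repo_id_from_url(url: str) -> str:
    """Extract the "owner/name" repository id from a Hugging Face model URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return f"{parts[0]}/{parts[1]}"


def download_hpt_files(local_dir: str = "./hpt1_5_air") -> str:
    """Download a snapshot of the model repository and return its local path.

    The import is kept inside the function so the rest of this sketch runs
    even when huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id_from_url(MODEL_URL), local_dir=local_dir)


if __name__ == "__main__":
    # Note: this downloads the full set of released files, which is large.
    print(download_hpt_files())
```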
Explore More
- Contact: Research at hpt@hypergai.com or Business at info@hypergai.com
- Follow us on: LinkedIn, X