
Advancing Multimodal
Generative AI Research

Learn more about our groundbreaking research!
We are spearheading the future of multimodal generative AI
Multimodal Understanding /
Multimodal LLM
We aim to train multimodal large language models (LLMs) that can understand multimodal input, including text, images, videos, and more.
HyperGAI's innovation framework, HPT, enables LLMs to comprehend and solve a broad range of complex vision and language tasks.
Our models demonstrate strong deliberate reasoning capabilities over multimodal inputs such as text and images. They are already adept at understanding both fine-grained details and abstract, high-level information.
Multimodal Generation /
Multimodal Diffusion
We aim to train multimodal diffusion models to achieve personalized content generation, including photo generation, video generation, and more.
Our models have demonstrated a high level of success at generating high-quality, detailed, and realistic images.
We continue to develop other forms of multimodal content generation and are excited to bring new updates and demonstrate our advanced capabilities.
Our MultiModal LLM-HPT Family
We are excited to announce HPT, a groundbreaking family of state-of-the-art multimodal large language models. Our models are available in various sizes to support a wide range of needs. We released HPT 1.0 in two variants, HPT 1.0 Air and Pro, which demonstrated strong multimodal understanding capabilities. In subsequent releases, we have continued to innovate, making HPT more efficient while improving its already impressive performance. Our HPT 1.5 Air was the best multimodal LLM based on Llama 3 at the time of release, even reaching GPT-4V performance in some cases. Our latest release, HPT 1.5 Edge, aims to make our cutting-edge models more accessible to users, especially for on-device usage. The HPT series boasts exceptional vision and language understanding, making it ideal for complex tasks at an affordable cost.
HPT 1.5 Edge

HPT 1.5 Edge is our latest open-source model for edge devices.

With only around 4B parameters, Edge is extremely efficient yet still achieves impressive results on many challenging benchmarks (MMMU, POPE, SEED-I, and more). The model is publicly available on Hugging Face and GitHub.

HPT 1.5 Edge Performance
HPT 1.5 Edge, Phi-3-vision-128k-instruct, xgen-mm-phi3-mini-instruct-r-v1 Comparison
  • HPT 1.5 Edge achieves competitive performance, with the best results on MMMU, POPE, and MathVista among models of similar size.
HPT 1.5 Air

HPT 1.5 Air is our best open-source 8B multimodal model built on Llama 3.

Our hyper-capable HPT 1.5 Air packs a punch on real-world understanding and complex reasoning. HPT 1.5 Air achieves the best results among models under 10B parameters across a wide range of challenging benchmarks (MMMU, POPE, SEED-I, and more). HPT 1.5 Air is publicly available on Hugging Face and GitHub.

HPT 1.5 Air Performance
HPT 1.5 Air, LLaVA, LLaVA++ Comparison
  • HPT 1.5 Air is the best publicly available multimodal Llama 3, achieving the best results on the challenging MMMU benchmark.
  • HPT 1.5 Air achieves lower hallucination (best POPE results) while showing superlative results on all four benchmarks.
HPT 1.0 Pro

HPT Pro is HyperGAI's proprietary and most optimized model, highly capable of solving very complex multimodal tasks.

HPT Pro outperforms larger proprietary models such as GPT-4V and Gemini Pro on the MMBench and SEED-Image benchmarks.

HPT Pro achieves state-of-the-art results for a model of its size on the MMMU leaderboard.

HPT 1.0 Pro Performance
HPT Pro, GPT-4V, Gemini Pro Comparison
  • HPT 1.0 Pro demonstrates the best result among models of similar size in multimodal understanding, evaluated on both MMBench and MMBench-CN.
  • HPT 1.0 Pro ranks second on the MMMU(val) for college-level understanding.
  • HPT 1.0 Pro performs the best in visual perception and understanding as seen on SEED(Img).
HPT 1.0 Air

HPT Air is HyperGAI's first free-to-use, open-source model.

Our most efficient model for its size, HPT Air is capable of solving a wide range of vision and language tasks. HPT Air is publicly available and achieves state-of-the-art results on the MMMU benchmark among all open-source multimodal LLMs of similar or smaller size.

HPT 1.0 Air Performance
HPT Air, LLaVa-NeXT, Qwen-VL-Chat Comparison
  • HPT 1.0 Air demonstrates the best result among models of similar size in multimodal understanding in English, evaluated on the MMBench.
  • HPT 1.0 Air achieves the best result on the MMMU(val) for college-level understanding and reasoning.
  • HPT 1.0 Air ranks second in visual perception and understanding as seen on SEED(Img).
Research Principles
Our research goal is
“To build the leading Multimodal Foundation Models that are capable of understanding all kinds of inputs and generating any type of intended output, which becomes the digital brain to empower everyone to unlock their creativity and improve workplace productivity.”
- Dr. Steven Hoi, Founder and CEO
Dedication to Responsible AI
Our collective vision is rooted in the pursuit of multimodal generative AI that not only demonstrates cutting-edge capabilities, but also adheres to ethical and responsible development. Our fundamental goal is to mitigate the risks of creating and amplifying the biases inherent in existing data. Although this task is not easy, we constantly improve our research to create AI systems that are fair and equitable. We scrutinize our technologies through internal assessments and proactive monitoring, adjusting their behaviors to eliminate anticipated risks and minimize unforeseen downsides.
We believe that transparency not only fosters an environment of trust and accountability, but also facilitates collaborative advancements and the cumulative growth of shared wisdom in the research community. It is through transparent practices that the research enterprise promotes inclusivity, diversity, and the democratization of knowledge. By open-sourcing our HPT Air model, we invite the community to evaluate our model and to build useful AI applications on top of it.
High standards for research integrity, rigor & excellence
We are dedicated to maintaining the highest research integrity by adhering to ethical standards throughout our research process. We work meticulously to ensure that our data collection, research development, and analysis are impartial, authentic, and free from any manipulation. We aspire to high standards of research rigor and excellence, and fully commit to advancing the field of generative AI. We constantly work with various stakeholders to ensure that our technologies are benefiting society in a manner that is not only safe but also ethically responsible.
Past Research
Prior to HyperGAI, our founding team contributed to important AI breakthroughs in the generative AI field, including high-impact research publications and popular open-source foundation models and libraries, especially in multimodal generative AI: multimodal LLMs (e.g., ALBEF, BLIP, BLIP-2, InstructBLIP), multimodal generation (e.g., BLIP-Diffusion), and LLMs for text and code (e.g., CodeT5, CodeT5+, CodeRL).
Empowering everyone with best-in-class generative AI
HyperGAI © 2024. All rights reserved