Advancing Multimodal
Generative AI Research

Learn more about our groundbreaking research.

Overview

We are spearheading the future of multimodal generative AI

Multimodal Understanding /
Multimodal LLM

We aim to train multimodal large language models (LLMs) that can understand multimodal input, including text, images, videos, and more.

HyperGAI's innovative framework, HPT, enables LLMs to comprehend and solve a broad range of complex vision and language tasks.

Our models demonstrate strong deliberate reasoning capabilities from multimodal inputs such as text and images. They are already advanced in understanding fine-grained detail and abstract, high-level information.

Multimodal Generation /
Multimodal Diffusion

We aim to train multimodal diffusion models for personalized content generation, including photo generation, video generation, and more.

Our models have proven highly successful at generating high-quality, detailed, and realistic images.

We continue to develop other forms of multimodal content generation and are excited to bring new updates and demonstrate our advanced capabilities.

Our Multimodal LLM: The HPT Family

We are excited to announce HPT, a groundbreaking family of state-of-the-art multimodal large language models. Our models are available in various sizes to support a wide range of needs. We released HPT 1.0 in two variants, HPT 1.0 Air and HPT 1.0 Pro, which demonstrated strong multimodal understanding capabilities. In subsequent releases, we have continued to make HPT more efficient while improving its performance. Our HPT 1.5 Air was the best multimodal LLM based on Llama 3 at the time of release, even reaching GPT-4V performance in some cases. Our latest release, HPT 1.5 Edge, aims to make our cutting-edge models more accessible to users, especially for on-device use. The HPT series offers exceptional vision and language understanding, making it well suited to complex tasks at an affordable cost.

HPT 1.5 Edge

HPT 1.5 Edge is our latest open-sourced model for edge devices.

With only around 4B parameters, Edge is extremely efficient and still achieves impressive results on many challenging benchmarks (MMMU, POPE, SEED-I, and more). The model is publicly available on Hugging Face and GitHub.

Download:

HPT 1.5 Edge Performance

HPT 1.5 Edge, Phi-3-vision-128k-instruct, xgen-mm-phi3-mini-instruct-r-v1 Comparison

HPT 1.5 Edge achieves competitive performance, with the best results on MMMU, POPE, and MathVista among models of similar size.

HPT 1.5 Air

HPT 1.5 Air is our best open-source 8B multimodal model based on Llama 3.

Our highly capable HPT 1.5 Air excels at real-world understanding and complex reasoning. It achieves the best results among models under 10B parameters across a wide range of challenging benchmarks (MMMU, POPE, SEED-I, and more). HPT 1.5 Air is publicly available on Hugging Face and GitHub.

Download:

HPT 1.5 Air Performance

HPT 1.5 Air, LLaVA, LLaVA++ Comparison

HPT 1.5 Air is the best publicly available multimodal model based on Llama 3, achieving the best results on the challenging MMMU benchmark.
HPT 1.5 Air hallucinates less (best POPE results) while showing strong results on all four benchmarks.

HPT 1.0 Pro

HPT Pro is HyperGAI's proprietary and most optimized model, highly capable of solving very complex multimodal tasks.

HPT Pro outperforms larger proprietary models such as GPT-4V and Gemini Pro on the MMBench and SEED-Image benchmarks.

HPT Pro achieves state-of-the-art results for a model of its size on the MMMU leaderboard.

Join waitlist

HPT 1.0 Pro Performance

HPT Pro, GPT-4V, Gemini Pro Comparison

HPT 1.0 Pro achieves the best results among models of similar size in multimodal understanding, evaluated on both MMBench and MMBench-CN.
HPT 1.0 Pro ranks second on MMMU (val) for college-level understanding.
HPT 1.0 Pro performs best in visual perception and understanding, as seen on SEED (Img).

HPT 1.0 Air

HPT Air is HyperGAI's first free-to-use, open-source model.

Our most efficient model for its size, HPT Air is capable of solving a wide range of vision and language tasks. HPT Air is publicly available and achieves state-of-the-art results on the MMMU benchmark among open-source multimodal LLMs of similar or smaller size.

Download:

HPT 1.0 Air Performance

HPT Air, LLaVA-NeXT, Qwen-VL-Chat Comparison

HPT 1.0 Air achieves the best results among models of similar size in English multimodal understanding, evaluated on MMBench.
HPT 1.0 Air achieves the best result on MMMU (val) for college-level understanding and reasoning.
HPT 1.0 Air ranks second in visual perception and understanding, as seen on SEED (Img).

Research Principles

Our research goal is

“To build the leading Multimodal Foundation Models that are capable of understanding all kinds of inputs and generating any type of intended output, which becomes the digital brain to empower everyone to unlock their creativity and improve workplace productivity.”
- Dr. Steven Hoi, Founder and CEO

Approach

Dedication to Responsible AI

Our collective vision is rooted in the pursuit of multimodal generative AI that not only demonstrates cutting-edge capabilities, but also adheres to ethical and responsible development. Our fundamental goal is to mitigate the risk of creating or amplifying biases inherent in existing data. Although this task is not easy, we constantly improve our research to create AI systems that are fair and equitable. We scrutinize our technologies through internal assessments, proactively monitoring and adjusting their behavior to eliminate anticipated risks and minimize unforeseen downsides.

Transparency

We believe that being transparent not only fosters an environment of trust and accountability, but also facilitates collaborative advancements and the cumulative growth of shared wisdom in the research community. It is through transparent practices that the research enterprise promotes inclusivity, diversity, and the democratization of knowledge. By open-sourcing our HPT Air model, we invite the community to evaluate our model and build useful AI applications on top of it.

High standards for research integrity, rigor & excellence

We are dedicated to maintaining the highest research integrity by adhering to ethical standards throughout our research process. We work meticulously to ensure that our data collection, research development, and analysis are impartial, authentic, and free from any manipulation. We aspire to high standards of research rigor and excellence, and fully commit to advancing the field of generative AI. We constantly work with various stakeholders to ensure that our technologies are benefiting society in a manner that is not only safe but also ethically responsible.

Past Research

Prior to HyperGAI, our founding team contributed to important breakthroughs in generative AI, including high-impact research publications and popular open-source foundation models and libraries. These span multimodal LLMs (e.g., ALBEF, BLIP, BLIP-2, InstructBLIP), multimodal generation (e.g., BLIP-Diffusion), and LLMs for text and code (e.g., CodeT5, CodeT5+, CodeRL).

Blog

HPT 1.5 Edge: The Best Open-Source 4B Multimodal LLM for Edge and Mobile Devices

research

We release HPT 1.5 Edge as our latest open-source, lightweight multimodal LLM tailored to edge and mobile devices. Despite its small size (~4B parameters), HPT 1.5 Edge demonstrates state-of-the-art performance and impressive capabilities among multimodal LLMs of similar size, while being extremely efficient.

June 6th, 2024 - HyperGAI Team