Explore Baidu's ERNIE 4.5 VL 28B A3B Model

Advanced multimodal AI for text and image understanding.

Input: text · image Output: text Context: 30,000 tokens Release: 2025-08-12
Discover Baidu's ERNIE 4.5 VL 28B A3B, released on August 12, 2025. This multimodal model excels in processing both text and images, outputting text with a context window of up to 30,000 tokens. Utilizing a Mixture-of-Experts architecture, it activates 3 billion parameters per token, enhancing its ability to understand and generate language and vision tasks. With a focus on efficient training and inference, it incorporates techniques like SFT, DPO, and UPO to optimize performance. The model supports cross-modal reasoning and generation, making it adaptable for diverse applications.

Use Cases

Here are a few ways teams apply Baidu: ERNIE 4.5 VL 28B A3B in practice—from fast drafting to multimodal understanding. Adapt these ideas to your workflow.

Enhance text and image content understanding

Optimize AI-driven visual and textual analysis

Improve cross-modal content generation

Facilitate large-scale data processing

Support complex reasoning tasks

Key Features

A quick look at the capabilities that make this model useful in real projects.

Multimodal Mixture-of-Experts architecture

28B total parameters, 3B activated per token

Supports 30,000 token context length

Text and image input, text output

Uses SFT, DPO, UPO for performance

Cross-modal reasoning and generation

Specs

Overview
Vendor
baidu
Model ID
baidu/ernie-4.5-vl-28b-a3b
Release
2025-08-12
Modalities & context
Input
text · image
Output
text
Context
30,000 tokens
Parameters & defaults

Supported parameters: frequency_penalty, include_reasoning, max_tokens, presence_penalty, reasoning, repetition_penalty, seed, stop, temperature, tool_choice, tools, top_k, top_p

Defaults: temperature 0.2, top_p 0.95

Benchmark tests: Baidu: ERNIE 4.5 VL 28B A3B

We ran this model against a few representative prompts to show its range. Review the outputs below and be the judge.

Text

Prompt:
Write 150 words on how AI might positively upend work, leisure and creativity
Artificial intelligence holds transformative potential to revolutionize work, leisure, and creativity. In the workplace, AI can streamline repetitive tasks, freeing humans to focus on complex problem-solving and innovation. By automating routine operations, it boosts productivity and reduces burnout, while predictive analytics and AI-driven insights empower smarter decision-making. For leisure, AI personalizes experiences—curating entertainment, travel, and hobbies to match individual preferences—making downtime more engaging and enriching. It can also democratize access to education and cultural resources, fostering lifelong learning. Creatively, AI acts as a collaborative tool, augmenting human imagination rather than replacing it. Artists, writers, and musicians leverage AI to explore new styles, generate ideas, or refine work, pushing boundaries of expression. It democratizes creativity by lowering barriers to entry, enabling diverse voices to thrive. Ultimately, AI could redefine work as more fulfilling, leisure as immersive, and creativity as boundless—ushering in an era where technology amplifies human potential rather than diminishes it.
Run this prompt on Upend.AI

Image

Prompt:
Generate an SVG of a pelican riding a bicycle.
Benchmark image
Run this prompt on Upend.AI

Ready to try Baidu: ERNIE 4.5 VL 28B A3B?

Chat with Baidu: ERNIE 4.5 VL 28B A3B
up.end
/ˌəpˈend/
verb

To “upend” means to completely disrupt, overturn, or drastically change the established order or structure of something. It implies a significant shift or alteration that can potentially have far-reaching consequences. When something is upended, it is turned upside down or transformed in a way that challenges conventional norms or expectations. The term often carries a sense of innovation, transformation, and sometimes even a hint of upheaval, indicating that the changes are not just minor adjustments but rather a fundamental reimagining of the status quo.