Discover StepFun's Step3 Multimodal Model

Efficiently process text and images with Step3's advanced reasoning.

Input: image · text Output: text Context: 65,536 tokens Release: 2025-08-28
StepFun's Step3, released in August 2025, is a sophisticated multimodal reasoning model designed to handle both text and image inputs, producing text outputs. With a context window of 65,536 tokens, Step3 leverages a Mixture-of-Experts architecture, operating with 321 billion parameters, of which 38 billion are actively used. This design ensures efficient performance, particularly in vision-language reasoning tasks. The integration of Multi-Matrix Factorization Attention and Attention-FFN Disaggregation allows Step3 to maintain high efficiency across various hardware, from flagship devices to more modest accelerators. Whether you're working with complex datasets or need streamlined processing, Step3 offers a versatile solution.

Use Cases

Here are a few ways teams apply StepFun: Step3 in practice—from fast drafting to multimodal understanding. Adapt these ideas to your workflow.

Enhance vision-language understanding in research

Optimize resource use in complex data environments

Streamline text generation from multimodal inputs

Support diverse hardware setups efficiently

Key Features

A quick look at the capabilities that make this model useful in real projects.

Multimodal reasoning with text and image inputs

65,536 token context length for comprehensive analysis

Mixture-of-Experts architecture with 321B parameters

Efficient operation with 38B active parameters

Supports vision-language reasoning tasks

Optimized for various hardware accelerators

Specs

Overview
Vendor
stepfun-ai
Model ID
stepfun-ai/step3
Release
2025-08-28
Modalities & context
Input
image · text
Output
text
Context
65,536 tokens
Parameters & defaults

Supported parameters: frequency_penalty, include_reasoning, reasoning, response_format, structured_outputs, temperature, tool_choice, tools, top_k, top_p

Defaults: temperature 0.2, top_p 0.95

Benchmark tests: StepFun: Step3

We ran this model against a few representative prompts to show its range. Review the outputs below and be the judge.

Text

Prompt:
Write 150 words on how AI might positively upend work, leisure and creativity
## The Upside of Upheaval: AI's Positive Disruption Artificial Intelligence isn't just automating tasks; it's poised to fundamentally reshape our work, leisure, and creative landscapes for the better. **At Work**, AI acts as a powerful augmenter. It can automate tedious, repetitive tasks (data entry, scheduling, report generation), freeing human workers to focus on higher-value activities like strategic thinking, complex problem-solving, collaboration, and empathy-driven interaction. Imagine doctors using AI for rapid diagnostics, allowing more time for patient care, or engineers leveraging AI simulations to accelerate innovation cycles. This shift promises not just productivity gains, but more meaningful, fulfilling roles for humans. **In Leisure**, AI becomes a personalized curator and enabler. It can analyze our preferences to recommend hyper-personalized entertainment, travel experiences, or learning opportunities we might never have discovered alone. AI-powered accessibility tools can open up new forms of enjoyment and connection for people with disabilities. Gaming and virtual worlds could become infinitely more immersive and adaptive, driven by AI companions and dynamic environments. **For Creativity**, AI is a potent collaborator, not a replacement. It can handle technical execution (generating drafts, composing variations, suggesting color palettes), lowering barriers to entry. Artists, writers, and musicians can use AI as a brainstorming partner, a rapid prototyping tool, or a source of unexpected inspiration, pushing the boundaries of their craft. This democratizes creation and fosters novel forms of expression. Ultimately, AI's greatest positive potential lies in **freeing human capacity**. By handling routine burdens and amplifying our capabilities, it offers the tantalizing possibility of more time for deep thinking, meaningful connection, and the pursuit of truly human passions – reshaping not just *how* we work and play, but *why*.
Run this prompt on Upend.AI

Image

Prompt:
Generate an SVG of a pelican riding a bicycle.
Benchmark image
Run this prompt on Upend.AI

Ready to try StepFun: Step3?

Chat with Step3
up.end
/ˌəpˈend/
verb

To “upend” means to completely disrupt, overturn, or drastically change the established order or structure of something. It implies a significant shift or alteration that can potentially have far-reaching consequences. When something is upended, it is turned upside down or transformed in a way that challenges conventional norms or expectations. The term often carries a sense of innovation, transformation, and sometimes even a hint of upheaval, indicating that the changes are not just minor adjustments but rather a fundamental reimagining of the status quo.