Explore NVIDIA's Nemotron Nano 12B 2 VL Model

Advanced AI for video understanding and document intelligence.

Input: image · text Output: text Context: 128,000 tokens Release: 2025-10-28
Discover NVIDIA's Nemotron Nano 12B 2 VL, an advanced AI model released on October 28, 2025. This model excels in video understanding and document intelligence using a hybrid Transformer-Mamba architecture. With a context window of 128,000 tokens, it processes text and image inputs to generate text outputs. Ideal for tasks like optical character recognition and multimodal comprehension, it offers high throughput and low latency. Open weights and fine-tuning recipes make it accessible for various applications.

Use Cases

Here are a few ways teams apply NVIDIA: Nemotron Nano 12B 2 VL (free) in practice—from fast drafting to multimodal understanding. Adapt these ideas to your workflow.

Enhance video content analysis

Improve document processing accuracy

Streamline optical character recognition

Facilitate multimodal comprehension

Optimize chart reasoning tasks

Key Features

A quick look at the capabilities that make this model useful in real projects.

12-billion-parameter model

Hybrid Transformer-Mamba architecture

Processes text and multi-image inputs

Produces natural language outputs

128,000 token context window

Open weights and fine-tuning recipes

Specs

Overview
Vendor
nvidia
Model ID
nvidia/nemotron-nano-12b-v2-vl:free
Release
2025-10-28
Modalities & context
Input
image · text
Output
text
Context
128,000 tokens
Parameters & defaults

Supported parameters: include_reasoning, reasoning, tool_choice, tools

Defaults: temperature 0.2, top_p 0.95

Benchmark tests: NVIDIA: Nemotron Nano 12B 2 VL (free)

We ran this model against a few representative prompts to show its range. Review the outputs below and be the judge.

Text

Prompt:
Write 150 words on how AI might positively upend work, leisure and creativity
AI has the potential to positively upend work, leisure, and creativity by transforming how we engage with tasks, experiences, and innovation. In work, AI automates repetitive tasks, freeing humans to focus on strategic, creative, and interpersonal roles. This shift could boost productivity, reduce burnout, and create new job opportunities in AI development, ethics, and oversight. For leisure, AI personalizes entertainment—curating music, games, or content tailored to individual preferences—while also enabling immersive experiences like virtual travel or interactive storytelling. This could deepen engagement and foster new hobbies. In creativity, AI tools assist artists, writers, and designers by generating ideas, refining concepts, or handling technical details, allowing humans to focus on vision and emotion. By democratizing access to creative resources, AI empowers more people to innovate. Together, these changes could enhance efficiency, enrich personal time, and expand creative possibilities, ultimately amplifying human potential while fostering a more dynamic, inclusive, and imaginative society. (150 words)
Run this prompt on Upend.AI

Image

Prompt:
Generate an SVG of a pelican riding a bicycle.
Benchmark image
Run this prompt on Upend.AI

Ready to try NVIDIA: Nemotron Nano 12B 2 VL (free)?

Chat with NVIDIA: Nemotron Nano 12B 2 VL
up.end
/ˌəpˈend/
verb

To “upend” means to completely disrupt, overturn, or drastically change the established order or structure of something. It implies a significant shift or alteration that can potentially have far-reaching consequences. When something is upended, it is turned upside down or transformed in a way that challenges conventional norms or expectations. The term often carries a sense of innovation, transformation, and sometimes even a hint of upheaval, indicating that the changes are not just minor adjustments but rather a fundamental reimagining of the status quo.