Tomato AI Daily · Tuesday, April 28, 2026

Highlights

vLLM v0.20.0 enhances memory and MoE serving efficiency with TurboQuant 2-bit KV cache.
NVIDIA's Nemotron 3 Nano Omni is a multimodal MoE model with 256K context for agentic workloads.
Mistral launched Workflows for durable, fault-tolerant AI process orchestration.
Open-model economics are reshaping deployment strategies with cost control advantages.
Closed-model dependence is increasingly viewed as an operational risk.

vLLM v0.20.0 focuses on memory and MoE serving efficiency with TurboQuant 2-bit KV cache.

vLLM · TurboQuant

NVIDIA's Nemotron 3 Nano Omni is a 30B multimodal MoE model with 256K context for agentic workloads.

NVIDIA · Nemotron

Mistral launched Workflows for durable, fault-tolerant AI process orchestration.

Mistral · Workflows

Microsoft's TRELLIS.2 is a 4B image-to-3D model producing up to 1536³ PBR textured assets.

Microsoft · TRELLIS.2

Hermes is gaining traction, outperforming OpenClaw in instruction-following and practical workflows.

Hermes · Agents

Open-model economics are reshaping deployment strategies with cost control advantages.

Open-model · Economics

Closed-model dependence is increasingly viewed as an operational risk.

Closed-model · Risk