We Just Compressed Days of Work Into Minutes. Here's How.

Creating work instructions has always been one of manufacturing's most stubborn bottlenecks. We built an AI agent that changes that.

Video thumbnail: Valentijn Kint, Head of Product at Azumuta, explaining how VLM-powered process capture turns a video into a draft work instruction.

Published on: 22 June 2026

Updated on: 22 June 2026

TL;DR

Process engineers spend too much time writing documentation and not enough time improving processes.
Azumuta built a VLM-powered AI agent that watches a video of an operation and generates a structured, review-ready work instruction automatically.
This reduces work instruction creation time by 90% compared to fully manual methods.
It is live in production at some of the world's largest discrete manufacturers today.
Read the full research on Azumuta Labs.

Process engineers are some of the most valuable people on your shop floor. They understand operations deeply. They know where variation creeps in, where quality breaks down, where tribal knowledge lives. And for decades, too much of their time has been eaten up by something far below their skill level: writing documentation.

Walk the floor. Record the process. Transfer the files. Write the steps. Format the document. Review it. And when something changes, start over.

That cycle has been the cost of doing things right. Until now.

90% less time. Starting from video.

We set out to answer one question: how do we compress the most time-consuming parts of process capture into something fast enough to actually scale?

By building a VLM-powered AI agent into the process capture workflow, we achieved a 90% reduction in work instruction creation time compared to fully manual methods. Not a marginal improvement. Not a nice-to-have efficiency gain. A fundamental shift in what's possible.

What used to take an engineer a full day now takes minutes. Film the operation. The AI watches the video, structures the procedure into steps, extracts the right frames, and generates a draft instruction ready for review. The knowledge that lives in your best operators' hands gets captured directly from the demonstration itself. No more losing it when someone retires. No more processes drifting because updating the documentation felt like too much effort.

Hear it from our Head of Product

Valentijn Kint, Head of Product at Azumuta, explains how this VLM capability works, what problem it solves, and how it fits into our broader product strategy.

Already live. Already working.

This isn't a concept paper. It isn't a prototype. It is already running in production at some of the world's largest discrete manufacturers, on real shop floors, in live operations.

The engineers using it are not cutting corners. They are redirecting their expertise toward the work that actually requires engineering judgment: process improvement, quality, training, scale.

That shift is significant. When documentation stops being a bottleneck, knowledge transfer accelerates. Standardization across lines and sites becomes achievable. And the gap between how a process is supposed to run and how it actually runs gets a lot smaller.

We published our full findings on Azumuta Labs, including the tradeoffs we encountered, what it takes to make this reliable in production, and where the technology is headed next.

If you're a process engineer, a quality manager, or anyone responsible for getting knowledge off the shop floor and into the hands of operators, this one is for you.

Read the full research on Azumuta Labs →

Quick FAQs to get you up to speed

What is a VLM, and how does Azumuta use it?

VLM stands for Vision Language Model, a type of AI that can analyze video and images and generate structured text from them. In Azumuta's process capture workflow, a VLM agent watches a video recording of a manufacturing operation and automatically identifies the steps, extracts relevant frames, and generates a draft work instruction. The engineer then reviews and publishes it, rather than writing it from scratch.

How much time does it save?

Azumuta's VLM agent reduces work instruction creation time by 90% compared to fully manual methods. A process that previously took an engineer a full day to document (filming, transcribing, formatting, reviewing) now takes minutes from video to draft.

How does the agent turn a video into a work instruction?

The VLM agent is trained to understand manufacturing operations. It segments a video into discrete steps, identifies the correct keyframes for each step, and organizes the output into a structured instruction format. The result is a complete draft, not a set of raw notes, that an engineer can review, edit, and publish directly in Azumuta.
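To make "structured instruction format" concrete, here is a minimal sketch of what such a draft could look like as data. It is an illustration only, not Azumuta's actual schema: the names `Step`, `DraftInstruction`, and `build_draft` are hypothetical, the segments are hand-written stand-ins for what a model would detect, and picking the midpoint of each segment as the keyframe is a placeholder rule.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a draft instruction (hypothetical structure)."""
    index: int
    description: str
    start_s: float      # where the step begins in the video, in seconds
    end_s: float        # where the step ends
    keyframe_s: float   # timestamp of the frame chosen to illustrate the step


@dataclass
class DraftInstruction:
    title: str
    steps: list[Step] = field(default_factory=list)
    status: str = "draft"  # stays "draft" until an engineer approves it


def build_draft(title, segments):
    """Turn (description, start_s, end_s) segments into a draft.

    The keyframe is simply the midpoint of each segment, which is an
    assumption made for this sketch, not a claim about the real agent.
    """
    draft = DraftInstruction(title=title)
    for i, (description, start_s, end_s) in enumerate(segments, start=1):
        if end_s <= start_s:
            raise ValueError(f"segment {i} has no duration")
        draft.steps.append(
            Step(i, description, start_s, end_s, (start_s + end_s) / 2)
        )
    return draft


# Hand-written example segments standing in for model output.
draft = build_draft(
    "Example assembly",
    [
        ("Place housing in fixture", 0.0, 4.0),
        ("Insert and tighten screws", 4.0, 12.0),
    ],
)
```

Keeping the timestamps on each step means a reviewer could jump straight to the relevant part of the video when checking a step.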

Does an engineer still review the result?

Yes. The AI generates the draft; the engineer validates and approves it. This is by design. Engineering judgment remains in the loop for quality, accuracy, and compliance. Azumuta supports the engineer's decisions; it does not replace them.
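One way to picture this human-in-the-loop rule is as an approval gate: a generated draft cannot reach operators until a named engineer signs off. The sketch below is a hypothetical illustration, not Azumuta's API; `Draft`, `approve`, and `visible_to_operators` are invented names, and the two-state status is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    """A generated instruction awaiting review (hypothetical structure)."""
    title: str
    status: str = "draft"              # "draft" or "published"
    approved_by: Optional[str] = None  # set only when an engineer signs off


def approve(draft: Draft, engineer: str) -> Draft:
    """Publish a draft, but only with a named engineer's sign-off."""
    if not engineer.strip():
        raise ValueError("an engineer must sign off before publishing")
    if draft.status != "draft":
        raise ValueError("only drafts can be approved")
    draft.status = "published"
    draft.approved_by = engineer
    return draft


def visible_to_operators(draft: Draft) -> bool:
    """Operators only ever see instructions that a person has approved."""
    return draft.status == "published" and draft.approved_by is not None


generated = Draft(title="Example assembly")
before = visible_to_operators(generated)            # False: not reviewed yet
after = visible_to_operators(approve(generated, "Process engineer"))  # True
```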

Is this a prototype, or is it already in production?

It is live in production. Azumuta's VLM-powered process capture is already running at some of the world's largest discrete manufacturers, in real operations. It is not a prototype or a pilot.

Who is it for?

Process engineers, quality managers, and operations leaders responsible for maintaining and scaling work instructions across lines and sites. It is also for anyone whose team has ever delayed a process update because the documentation effort wasn't worth it.

What happens when a process changes?

With traditional documentation, change means starting over. With Azumuta, you re-record the updated operation and the AI generates a new draft. This makes it practical to keep instructions current, which is what closes the gap between how a process is supposed to run and how it actually runs.

Where can I read the full research?

The complete findings, including methodology, tradeoffs, and production requirements, are published on Azumuta Labs.