Sense is an MCP server that lets AI agents see your desktop. No more copying error messages, describing layouts, or screenshotting UI bugs. Just say "take a look" — and it does.
Every time your AI agent needs visual context, you stop what you're doing. You screenshot. You crop. You paste. You describe what you're looking at in words that are never quite right.
"The button is slightly off. No, more to the left. The padding is wrong — not that padding, the inner one."
Words are lossy. Screenshots are manual. And every token you spend describing what's on screen is a token not spent solving the problem.
Sense doesn't flood your context window with high-res screenshots. Your agent is smart about how it looks.
First, a low-resolution capture to get the lay of the land. Cheap. Fast. Usually enough.
OCR extracts visible text without sending an image at all. Error messages, console output, UI labels — as raw text. Tiny payload.
When detail matters, the agent crops into a specific region at higher resolution. Pixel-level precision, only where it's needed.
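The escalation above can be sketched as a simple decision loop. Everything here is illustrative: the function names (`capture_screen`, `ocr_region`, `capture_region`), the payload sizes, and the `look` helper are hypothetical stand-ins, not Sense's actual API.

```python
# Hypothetical sketch of the coarse-to-fine strategy described above.
# Tool names and byte counts are illustrative, not Sense's real API.

def capture_screen(scale: float) -> dict:
    """Stub: a downscaled full-screen capture (cheap, low detail)."""
    return {"kind": "image", "scale": scale, "bytes": int(500_000 * scale)}

def ocr_region(region: tuple) -> dict:
    """Stub: text extracted from a region -- tiny payload, no image."""
    return {"kind": "text", "region": region, "bytes": 2_000}

def capture_region(region: tuple) -> dict:
    """Stub: a full-resolution crop of one region."""
    return {"kind": "image", "region": region, "bytes": 120_000}

def look(need_text: bool, need_detail: bool, region=(0, 0, 400, 300)) -> list:
    """Escalate only as far as the task requires."""
    steps = [capture_screen(scale=0.25)]        # 1. cheap overview first
    if need_text:
        steps.append(ocr_region(region))        # 2. text without image cost
    if need_detail:
        steps.append(capture_region(region))    # 3. pixel-level crop
    return steps

# Reading an error message never pays for a high-res image:
payloads = look(need_text=True, need_detail=False)
print([s["kind"] for s in payloads])  # → ['image', 'text']
```

The point of the sketch is the ordering: the expensive full-resolution crop is the last resort, not the default.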
Screen and window capture — full frame or any region. Works even on occluded windows.
OCR text extraction from any part of the screen. Get content without the image cost.
Click at precise coordinates. Screen-absolute or window-relative.
Send text and key combinations. Ctrl+C, Enter, Tab — anything.
Scroll within any window at any position.
List open windows, bring them to the foreground, target by process or title.
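Since Sense is an MCP server, the capabilities above would surface to an agent as MCP tools. The manifest below is a guess at what that list could look like; the tool names and JSON Schemas are assumptions for illustration, not Sense's published interface.

```python
# Hypothetical MCP-style tool manifest covering the capabilities above.
# Tool names and schemas are illustrative; Sense's real API may differ.

def region_schema() -> dict:
    return {"type": "array", "items": {"type": "integer"},
            "minItems": 4, "maxItems": 4,
            "description": "x, y, width, height"}

TOOLS = [
    {"name": "capture",
     "description": "Capture the screen, a window, or a region; works on occluded windows.",
     "inputSchema": {"type": "object",
                     "properties": {"window": {"type": "string"},
                                    "region": region_schema()}}},
    {"name": "ocr",
     "description": "Extract visible text from any part of the screen.",
     "inputSchema": {"type": "object",
                     "properties": {"region": region_schema()}}},
    {"name": "click",
     "description": "Click at screen-absolute or window-relative coordinates.",
     "inputSchema": {"type": "object",
                     "properties": {"x": {"type": "integer"},
                                    "y": {"type": "integer"},
                                    "window": {"type": "string"}},
                     "required": ["x", "y"]}},
    {"name": "type_text",
     "description": "Send text or key combinations (Ctrl+C, Enter, Tab).",
     "inputSchema": {"type": "object",
                     "properties": {"text": {"type": "string"}},
                     "required": ["text"]}},
    {"name": "scroll",
     "description": "Scroll within a window at a given position.",
     "inputSchema": {"type": "object",
                     "properties": {"window": {"type": "string"},
                                    "dy": {"type": "integer"}}}},
    {"name": "list_windows",
     "description": "List open windows; target by process or title.",
     "inputSchema": {"type": "object", "properties": {}}},
]

print(sorted(t["name"] for t in TOOLS))
```

Each entry follows the shape MCP uses for `tools/list` responses (a name, a description, and a JSON Schema for inputs), so an agent can discover and call them without custom glue.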
Everything runs locally on your machine. Screen captures, OCR, input — all processed on-device. Nothing is sent to a server. Your screen data never leaves your computer.
A visual bug in your app. Instead of writing "the card component has a 2px gap on the right side when the viewport is below 768px" — you say "look at the app window, the cards look off." The agent sees it, understands it, fixes it.
Your designer hands you a Figma file. Point the agent at the Figma window. It reads the specs, sees the layout, and writes code that matches — without you translating design into words.
The agent makes a change. Instead of asking you "does it look right?" — it checks itself. Capture the window, compare to the intent, iterate. You stay in flow.
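That change-look-compare loop is simple enough to sketch. The stubs below (`apply_fix`, `capture_window`, `matches_intent`) are hypothetical placeholders for the agent's edit step, a Sense-style window capture, and a visual comparison; none of them are real Sense calls.

```python
# Hypothetical self-check loop for the pattern above: change, look,
# compare, iterate. All three inner functions are illustrative stubs.

def run_self_check(intent: str, max_rounds: int = 3) -> bool:
    state = {"gap_px": 2}  # pretend the card gap starts 2px off

    def apply_fix(s):            # stub: the agent edits the code
        s["gap_px"] -= 1

    def capture_window(s):       # stub: Sense-style window capture
        return dict(s)

    def matches_intent(shot):    # stub: visual comparison against intent
        return shot["gap_px"] == 0

    for _ in range(max_rounds):
        apply_fix(state)
        shot = capture_window(state)   # look at the result
        if matches_intent(shot):       # good enough? stop asking the user
            return True
    return False

print(run_self_check("cards flush to the right edge"))  # → True
```

The human stays out of the loop until `max_rounds` is exhausted; only then does the question "does it look right?" come back to you.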
Better together
Sense gives your agents eyes. Superlite lets you run them in parallel.
Get your API key and start in under a minute.
Get Started