How it started: I didn't set out to build a "collaborative whiteboarding platform". I set out to answer one question: "Why does every free whiteboard tool feel like garbage?"
Every whiteboard tool I tried had the same issues. I wanted something that felt instant: draw, and everyone sees it. No spinners. No "syncing…" messages. No nonsense.
That's how Sketchr started. A real-time collaborative whiteboard with an infinite canvas, in-room chat, video calls, and AI-powered flowchart generation. Simple idea. Brutal execution.
I made this decision early and it saved everything: Don't think in features. Think in layers.
Sketchr has four layers: the canvas, the socket-based sync layer, chat and video, and the AI layer.
Each layer is independent. The canvas works without sockets. Chat works without the canvas. AI works without either. When layers don't depend on each other, nothing cascades when something breaks.
Drawing on a screen sounds simple. It is not.
Raw mouse coordinates give you jagged, robotic lines. Nobody wants that. I used perfect-freehand to simulate pressure-sensitive strokes with smooth Bézier curves.
But here's what nobody tells you: the library gives you points, not a rendered path. You still have to convert those points into SVG path data, render them on the canvas, and make sure redrawing thousands of strokes doesn't kill performance.
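That conversion step can be sketched as follows. This is a minimal version of the points-to-path helper, assuming the input is the outline of `[x, y]` points that perfect-freehand's `getStroke` returns (the function name and rounding are illustrative, not Sketchr's actual code):

```typescript
// Convert an outline of [x, y] points (e.g. the output of
// perfect-freehand's getStroke) into SVG path data, drawing quadratic
// curves through the midpoints so the stroke stays smooth.
function getSvgPathFromStroke(points: number[][]): string {
  if (points.length === 0) return "";

  const d = points.reduce(
    (acc, [x, y], i, arr) => {
      // Wrap around to the first point so the outline closes cleanly.
      const [nx, ny] = arr[(i + 1) % arr.length];
      // Current point is the control point; the midpoint is the target.
      acc.push(
        x.toFixed(2),
        y.toFixed(2),
        ((x + nx) / 2).toFixed(2),
        ((y + ny) / 2).toFixed(2)
      );
      return acc;
    },
    ["M", ...points[0].map((n) => n.toFixed(2)), "Q"]
  );

  return d.join(" ") + " Z";
}
```

The resulting string goes straight into a `<path d="...">` element or a `Path2D` for canvas rendering; caching it per stroke is what keeps redrawing thousands of strokes cheap.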
Lesson: Libraries solve 60% of the problem. The remaining 40% is yours.
Zoom and pan sound like two simple transforms. In reality, every single interaction (click positions, element placement, cursor coordinates) has to be translated between screen space and canvas space.
I got this wrong three times before it clicked. Every bug was the same: something was positioned in the wrong coordinate system. A shape drawn at one zoom level would appear in the wrong place at another.
The fix was boring: one utility function that converts between spaces, used everywhere. No exceptions. No shortcuts.
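A minimal sketch of that utility, assuming the viewport is modeled as a pan offset plus a zoom factor (the `Camera` shape and function names here are illustrative):

```typescript
// Viewport state: pan offset in screen pixels, plus a zoom factor.
interface Camera { x: number; y: number; zoom: number; }
interface Point { x: number; y: number; }

// Screen space -> canvas space: undo the pan, then undo the zoom.
function screenToCanvas(p: Point, cam: Camera): Point {
  return { x: (p.x - cam.x) / cam.zoom, y: (p.y - cam.y) / cam.zoom };
}

// Canvas space -> screen space: the exact inverse.
function canvasToScreen(p: Point, cam: Camera): Point {
  return { x: p.x * cam.zoom + cam.x, y: p.y * cam.zoom + cam.y };
}
```

The point is that these two functions are the only place the math lives: every click handler converts on the way in, every render converts on the way out, and elements are stored exclusively in canvas space.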
Undo sounds trivial. Push to a stack, pop on Ctrl+Z. But when you have multiple element types (strokes, shapes, sticky notes, text) and each has different properties, and some operations are "add" while others are "update" or "delete"…
The history stack becomes a state machine. I used Zustand for global board state and maintained a separate action history. The trick was storing actions, not snapshots. Snapshots bloat memory fast. Actions are tiny and reversible.
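A stripped-down sketch of the action-based approach, independent of the actual Zustand store (the `Action` shape and class names are assumptions for illustration):

```typescript
type ElementId = string;
interface Element { id: ElementId; [key: string]: unknown; }

// One reversible action. "update" keeps both versions so undo can
// restore the old one without storing a whole-board snapshot.
type Action =
  | { kind: "add"; element: Element }
  | { kind: "delete"; element: Element }
  | { kind: "update"; before: Element; after: Element };

class History {
  private undoStack: Action[] = [];
  private redoStack: Action[] = [];

  record(action: Action) {
    this.undoStack.push(action);
    this.redoStack = []; // a fresh action invalidates the redo branch
  }

  // Apply the inverse of the last action to the board state.
  undo(board: Map<ElementId, Element>) {
    const action = this.undoStack.pop();
    if (!action) return;
    if (action.kind === "add") board.delete(action.element.id);
    else if (action.kind === "delete") board.set(action.element.id, action.element);
    else board.set(action.before.id, action.before);
    this.redoStack.push(action);
  }

  // Re-apply the last undone action.
  redo(board: Map<ElementId, Element>) {
    const action = this.redoStack.pop();
    if (!action) return;
    if (action.kind === "add") board.set(action.element.id, action.element);
    else if (action.kind === "delete") board.delete(action.element.id);
    else board.set(action.after.id, action.after);
    this.undoStack.push(action);
  }
}
```

Each action is a few hundred bytes regardless of board size, which is exactly why actions beat snapshots on memory.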
This is where Sketchr goes from "a drawing app" to "a collaborative tool". And this is where I lost the most sleep.
Here's the truth about real-time apps: they're not real-time. There's always latency. The trick is hiding it.
My approach: optimistic local rendering. When you draw a stroke, it appears on YOUR screen instantly, with no round trip. Simultaneously, the event fires to the server via Socket.IO, which broadcasts it to everyone else.
This means your canvas is always slightly ahead of the server. And that's fine. Perceived speed matters more than actual speed.
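The pattern reduces to a strict ordering: local state first, network second. A sketch, with `addToLocalState` and `emit` standing in for the Zustand setter and `socket.emit` (the event name `"element:add"` is illustrative):

```typescript
interface Stroke { id: string; points: number[][]; }

// Optimistic rendering: the local update happens before the network
// call, so pixels appear with zero round-trip latency. The server
// broadcast catches everyone else up a moment later.
function drawStroke(
  stroke: Stroke,
  addToLocalState: (s: Stroke) => void,
  emit: (event: string, payload: Stroke) => void
) {
  addToLocalState(stroke);   // instant feedback on the drawing user's screen
  emit("element:add", stroke); // fire-and-forget to the server for broadcast
}
```

Remote users receive the broadcast and apply the same element to their own state, so every client converges on the same board without anyone waiting on an acknowledgment.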
My first approach was writing every element directly to MongoDB on every change. At 60fps with multiple users drawing simultaneously, that's hundreds of writes per second.
MongoDB said no.
So I built an in-memory cache on the server: a Map<roomId, Map<elementId, Element>>. All real-time changes hit the cache. A debounced function flushes to MongoDB every 5 seconds during active sessions.
This single decision took the database from hundreds of writes per second during busy sessions to one batched flush per room every five seconds.
The tradeoff: If the server crashes, you lose up to 5 seconds of work. For a whiteboarding app, that's acceptable. For a banking app, absolutely not. Know your tolerance.
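A minimal sketch of the write-behind cache, with `persist` standing in for the MongoDB bulk write (the class and method names are assumptions, not Sketchr's actual server code):

```typescript
type Element = { id: string; [k: string]: unknown };

// Per-room write-behind cache: all real-time edits hit memory; a
// debounced timer flushes each dirty room to the database.
class RoomCache {
  private rooms = new Map<string, Map<string, Element>>();
  private timers = new Map<string, ReturnType<typeof setTimeout>>();

  constructor(
    private persist: (roomId: string, elements: Element[]) => void,
    private flushDelayMs = 5000
  ) {}

  upsert(roomId: string, element: Element) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId)!.set(element.id, element);
    this.scheduleFlush(roomId);
  }

  private scheduleFlush(roomId: string) {
    // Debounce: reset the timer on every write, so a burst of edits
    // collapses into a single database write.
    const pending = this.timers.get(roomId);
    if (pending) clearTimeout(pending);
    this.timers.set(roomId, setTimeout(() => this.flush(roomId), this.flushDelayMs));
  }

  flush(roomId: string) {
    const pending = this.timers.get(roomId);
    if (pending) clearTimeout(pending);
    this.timers.delete(roomId);
    const room = this.rooms.get(roomId);
    if (room) this.persist(roomId, [...room.values()]);
  }
}
```

Calling `flush` on room teardown (last user leaves) closes the durability gap for the common case; only a hard crash inside the window loses work.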
Users refresh pages. Tabs crash. WiFi drops. If I destroyed room state on every disconnect, the experience would be terrible.
So I added a 30-second retention window. When a user disconnects, their slot is held. If they reconnect within 30 seconds, they rejoin seamlessly with full state. If not, they're removed and the slot opens up.
This tiny feature made Sketchr feel reliable instead of fragile.
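The retention logic is small enough to sketch in full. This version uses an injected timestamp instead of real timers so the behavior is easy to test (the `Room` class and method names are illustrative):

```typescript
interface Session { userId: string; disconnectedAt: number | null; }

// Hold a disconnected user's slot for a grace period (30 s in Sketchr)
// instead of evicting immediately. Timestamps are passed in rather than
// read from Date.now() so the window is deterministic to test.
class Room {
  private sessions = new Map<string, Session>();
  constructor(private retentionMs = 30_000) {}

  join(userId: string) {
    this.sessions.set(userId, { userId, disconnectedAt: null });
  }

  onDisconnect(userId: string, now: number) {
    const s = this.sessions.get(userId);
    if (s) s.disconnectedAt = now; // hold the slot, start the clock
  }

  // True if the user rejoined within the window (full state kept);
  // false means the slot was released and they must join fresh.
  onReconnect(userId: string, now: number): boolean {
    const s = this.sessions.get(userId);
    if (!s || s.disconnectedAt === null) return !!s;
    if (now - s.disconnectedAt <= this.retentionMs) {
      s.disconnectedAt = null;
      return true;
    }
    this.sessions.delete(userId);
    return false;
  }
}
```

In production the same check can run on a timer that sweeps expired sessions, but the core decision is just this timestamp comparison.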
Showing other users' cursors seems easy. Emit cursor position, render a dot. Done.
Not quite. At high mouse-move frequency, you flood the socket with events. And rendering 10 cursors updating 60 times per second causes jank.
The fix: throttle cursor emissions to ~20fps (still feels smooth to humans) and use CSS transforms for cursor rendering instead of re-rendering React components. Never let cosmetic features steal performance from core features.
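The emission side is a standard leading-edge throttle. A sketch with an injectable clock (the generic helper below is illustrative; any throttle utility works):

```typescript
// Limit a high-frequency callback to at most one call per interval.
// 50 ms between emissions is roughly 20 fps. The clock is injectable
// so the behavior can be tested without real time passing.
function throttle<T extends unknown[]>(
  fn: (...args: T) => void,
  intervalMs: number,
  now: () => number = Date.now
): (...args: T) => void {
  let last = -Infinity;
  return (...args: T) => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(...args);
    }
  };
}
```

Wiring it up is one line on the mousemove handler, e.g. `const emitCursor = throttle((x, y) => socket.emit("cursor", { x, y }), 50);` (the event name is illustrative). On the receiving side, moving each cursor with a CSS `transform: translate(...)` skips React reconciliation entirely.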
I added peer-to-peer video calls using PeerJS (a WebRTC abstraction). On paper, WebRTC is elegant. In practice, it's chaos.
WebRTC needs peers to find each other through firewalls and NATs. Sometimes it works flawlessly. Sometimes two users on different networks simply cannot connect. TURN servers fix this, but they add cost and latency.
I used PeerJS's cloud signaling server for the MVP. It works for most cases. Perfection isn't shipping. Shipping is shipping.
Mute, unmute, camera on, camera off, screen share. Each is a state toggle, but they interact. Screen sharing replaces the video track. Stopping screen share should restore the camera. Muting should persist across track swaps.
I burned hours on edge cases like: "User starts screen share → mutes → stops screen share → unmutes. Is the camera on or off?"
Every boolean becomes a state machine if you add enough of them.
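One way to tame it is to make the state explicit and pure, so every transition is a function you can reason about. A sketch (the `MediaState` shape is an assumption for illustration, not the actual Sketchr code):

```typescript
// Media controls as one explicit state object instead of loose booleans.
// Invariants: mute persists across track swaps, and stopping screen
// share restores whatever the camera state was before sharing started.
interface MediaState {
  muted: boolean;
  cameraOn: boolean;
  screenSharing: boolean;
  cameraBeforeShare: boolean; // remembered so stopShare can restore it
}

const initial: MediaState = {
  muted: false,
  cameraOn: true,
  screenSharing: false,
  cameraBeforeShare: true,
};

function toggleMute(s: MediaState): MediaState {
  return { ...s, muted: !s.muted };
}

function startShare(s: MediaState): MediaState {
  // Screen share replaces the video track; remember the camera state.
  return { ...s, screenSharing: true, cameraBeforeShare: s.cameraOn, cameraOn: false };
}

function stopShare(s: MediaState): MediaState {
  return { ...s, screenSharing: false, cameraOn: s.cameraBeforeShare };
}
```

Under this model the share → mute → stop share → unmute sequence has an unambiguous answer: the camera comes back in its pre-share state, and the mute toggles never touch it.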
This was actually the most enjoyable feature to build. Users type a natural language prompt like "authentication flow with login, register, and password reset" and Gemini generates a structured flowchart that renders directly on the canvas.
I don't ask Gemini for text. I ask for JSON β nodes and connectors with positions, labels, and types. The prompt engineering forces a strict schema: