How it started: I didn't set out to build a "collaborative whiteboarding platform". I set out to answer one question: "Why does every free whiteboard tool feel like garbage?"
Every whiteboard tool I tried had the same issues. I wanted something that felt instant: draw, and everyone sees it. No spinners. No "syncing…" messages. No nonsense.
That's how Sketchr started. A real-time collaborative whiteboard with an infinite canvas, in-room chat, video calls, and AI-powered flowchart generation. Simple idea. Brutal execution.
I made this decision early and it saved everything: Don't think in features. Think in layers.
Sketchr has four layers: the canvas, the socket-based sync layer, chat and video, and the AI layer.
Each layer is independent. The canvas works without sockets. Chat works without the canvas. AI works without either. When layers don't depend on each other, nothing cascades when something breaks.
Drawing on a screen sounds simple. It is not.
Raw mouse coordinates give you jagged, robotic lines. Nobody wants that. I used perfect-freehand to simulate pressure-sensitive strokes with smooth Bézier curves.
But here's what nobody tells you: the library gives you points, not a rendered path. You still have to convert those points into SVG path data, render them on the canvas, and make sure redrawing thousands of strokes doesn't kill performance.
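That conversion step can be sketched as follows. This is a minimal version of the points-to-path helper, assuming the input is the outline of `[x, y]` points that perfect-freehand's `getStroke` returns (the function name and rounding are illustrative, not Sketchr's actual code):

```typescript
// Convert an outline of [x, y] points (e.g. the output of
// perfect-freehand's getStroke) into SVG path data, drawing quadratic
// curves through the midpoints so the stroke stays smooth.
function getSvgPathFromStroke(points: number[][]): string {
  if (points.length === 0) return "";

  const d = points.reduce(
    (acc, [x, y], i, arr) => {
      // Wrap around to the first point so the outline closes cleanly.
      const [nx, ny] = arr[(i + 1) % arr.length];
      // Current point is the control point; the midpoint is the target.
      acc.push(
        x.toFixed(2),
        y.toFixed(2),
        ((x + nx) / 2).toFixed(2),
        ((y + ny) / 2).toFixed(2)
      );
      return acc;
    },
    ["M", ...points[0].map((n) => n.toFixed(2)), "Q"]
  );

  return d.join(" ") + " Z";
}
```

The resulting string goes straight into a `<path d="...">` element or a `Path2D` for canvas rendering; caching it per stroke is what keeps redrawing thousands of strokes cheap.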
Lesson: Libraries solve 60% of the problem. The remaining 40% is yours.
Zoom and pan sound like two simple transforms. In reality, every single interaction (click positions, element placement, cursor coordinates) has to be translated between screen space and canvas space.
I got this wrong three times before it clicked. Every bug was the same: something was positioned in the wrong coordinate system. A shape drawn at one zoom level would appear in the wrong place at another.
The fix was boring: one utility function that converts between spaces, used everywhere. No exceptions. No shortcuts.
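A minimal sketch of that utility, assuming the viewport is modeled as a pan offset plus a zoom factor (the `Camera` shape and function names here are illustrative):

```typescript
// Viewport state: pan offset in screen pixels, plus a zoom factor.
interface Camera { x: number; y: number; zoom: number; }
interface Point { x: number; y: number; }

// Screen space -> canvas space: undo the pan, then undo the zoom.
function screenToCanvas(p: Point, cam: Camera): Point {
  return { x: (p.x - cam.x) / cam.zoom, y: (p.y - cam.y) / cam.zoom };
}

// Canvas space -> screen space: the exact inverse.
function canvasToScreen(p: Point, cam: Camera): Point {
  return { x: p.x * cam.zoom + cam.x, y: p.y * cam.zoom + cam.y };
}
```

The point is that these two functions are the only place the math lives: every click handler converts on the way in, every render converts on the way out, and elements are stored exclusively in canvas space.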
Undo sounds trivial. Push to a stack, pop on Ctrl+Z. But when you have multiple element types (strokes, shapes, sticky notes, text) and each has different properties, and some operations are "add" while others are "update" or "delete"…
The history stack becomes a state machine. I used Zustand for global board state and maintained a separate action history. The trick was storing actions, not snapshots. Snapshots bloat memory fast. Actions are tiny and reversible.
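A stripped-down sketch of the action-based approach, independent of the actual Zustand store (the `Action` shape and class names are assumptions for illustration):

```typescript
type ElementId = string;
interface Element { id: ElementId; [key: string]: unknown; }

// One reversible action. "update" keeps both versions so undo can
// restore the old one without storing a whole-board snapshot.
type Action =
  | { kind: "add"; element: Element }
  | { kind: "delete"; element: Element }
  | { kind: "update"; before: Element; after: Element };

class History {
  private undoStack: Action[] = [];
  private redoStack: Action[] = [];

  record(action: Action) {
    this.undoStack.push(action);
    this.redoStack = []; // a fresh action invalidates the redo branch
  }

  // Apply the inverse of the last action to the board state.
  undo(board: Map<ElementId, Element>) {
    const action = this.undoStack.pop();
    if (!action) return;
    if (action.kind === "add") board.delete(action.element.id);
    else if (action.kind === "delete") board.set(action.element.id, action.element);
    else board.set(action.before.id, action.before);
    this.redoStack.push(action);
  }

  // Re-apply the last undone action.
  redo(board: Map<ElementId, Element>) {
    const action = this.redoStack.pop();
    if (!action) return;
    if (action.kind === "add") board.set(action.element.id, action.element);
    else if (action.kind === "delete") board.delete(action.element.id);
    else board.set(action.after.id, action.after);
    this.undoStack.push(action);
  }
}
```

Each action is a few hundred bytes regardless of board size, which is exactly why actions beat snapshots on memory.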
This is where Sketchr goes from "a drawing app" to "a collaborative tool". And this is where I lost the most sleep.
Here's the truth about real-time apps: they're not real-time. There's always latency. The trick is hiding it.
My approach: optimistic local rendering. When you draw a stroke, it appears on YOUR screen instantly, with no round trip. Simultaneously, the event fires to the server via Socket.IO, which broadcasts it to everyone else.
This means your canvas is always slightly ahead of the server. And that's fine. Perceived speed matters more than actual speed.
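The pattern reduces to a strict ordering: local state first, network second. A sketch, with `addToLocalState` and `emit` standing in for the Zustand setter and `socket.emit` (the event name `"element:add"` is illustrative):

```typescript
interface Stroke { id: string; points: number[][]; }

// Optimistic rendering: the local update happens before the network
// call, so pixels appear with zero round-trip latency. The server
// broadcast catches everyone else up a moment later.
function drawStroke(
  stroke: Stroke,
  addToLocalState: (s: Stroke) => void,
  emit: (event: string, payload: Stroke) => void
) {
  addToLocalState(stroke);   // instant feedback on the drawing user's screen
  emit("element:add", stroke); // fire-and-forget to the server for broadcast
}
```

Remote users receive the broadcast and apply the same element to their own state, so every client converges on the same board without anyone waiting on an acknowledgment.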
My first approach was writing every element directly to MongoDB on every change. At 60fps with multiple users drawing simultaneously, that's hundreds of writes per second.
MongoDB said no.
So I built an in-memory cache on the server: a Map<roomId, Map<elementId, Element>>. All real-time changes hit the cache. A debounced function flushes to MongoDB every 5 seconds during active sessions.
This single decision took the database from hundreds of writes per second during busy sessions to one batched flush per room every five seconds.
The tradeoff: If the server crashes, you lose up to 5 seconds of work. For a whiteboarding app, that's acceptable. For a banking app, absolutely not. Know your tolerance.
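A minimal sketch of the write-behind cache, with `persist` standing in for the MongoDB bulk write (the class and method names are assumptions, not Sketchr's actual server code):

```typescript
type Element = { id: string; [k: string]: unknown };

// Per-room write-behind cache: all real-time edits hit memory; a
// debounced timer flushes each dirty room to the database.
class RoomCache {
  private rooms = new Map<string, Map<string, Element>>();
  private timers = new Map<string, ReturnType<typeof setTimeout>>();

  constructor(
    private persist: (roomId: string, elements: Element[]) => void,
    private flushDelayMs = 5000
  ) {}

  upsert(roomId: string, element: Element) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId)!.set(element.id, element);
    this.scheduleFlush(roomId);
  }

  private scheduleFlush(roomId: string) {
    // Debounce: reset the timer on every write, so a burst of edits
    // collapses into a single database write.
    const pending = this.timers.get(roomId);
    if (pending) clearTimeout(pending);
    this.timers.set(roomId, setTimeout(() => this.flush(roomId), this.flushDelayMs));
  }

  flush(roomId: string) {
    const pending = this.timers.get(roomId);
    if (pending) clearTimeout(pending);
    this.timers.delete(roomId);
    const room = this.rooms.get(roomId);
    if (room) this.persist(roomId, [...room.values()]);
  }
}
```

Calling `flush` on room teardown (last user leaves) closes the durability gap for the common case; only a hard crash inside the window loses work.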
Users refresh pages. Tabs crash. WiFi drops. If I destroyed room state on every disconnect, the experience would be terrible.
So I added a 30-second retention window. When a user disconnects, their slot is held. If they reconnect within 30 seconds, they rejoin seamlessly with full state. If not, they're removed and the slot opens up.
This tiny feature made Sketchr feel reliable instead of fragile.
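The retention logic is small enough to sketch in full. This version uses an injected timestamp instead of real timers so the behavior is easy to test (the `Room` class and method names are illustrative):

```typescript
interface Session { userId: string; disconnectedAt: number | null; }

// Hold a disconnected user's slot for a grace period (30 s in Sketchr)
// instead of evicting immediately. Timestamps are passed in rather than
// read from Date.now() so the window is deterministic to test.
class Room {
  private sessions = new Map<string, Session>();
  constructor(private retentionMs = 30_000) {}

  join(userId: string) {
    this.sessions.set(userId, { userId, disconnectedAt: null });
  }

  onDisconnect(userId: string, now: number) {
    const s = this.sessions.get(userId);
    if (s) s.disconnectedAt = now; // hold the slot, start the clock
  }

  // True if the user rejoined within the window (full state kept);
  // false means the slot was released and they must join fresh.
  onReconnect(userId: string, now: number): boolean {
    const s = this.sessions.get(userId);
    if (!s || s.disconnectedAt === null) return !!s;
    if (now - s.disconnectedAt <= this.retentionMs) {
      s.disconnectedAt = null;
      return true;
    }
    this.sessions.delete(userId);
    return false;
  }
}
```

In production the same check can run on a timer that sweeps expired sessions, but the core decision is just this timestamp comparison.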
Showing other users' cursors seems easy. Emit cursor position, render a dot. Done.
Not quite. At high mouse-move frequency, you flood the socket with events. And rendering 10 cursors updating 60 times per second causes jank.
The fix: throttle cursor emissions to ~20fps (still feels smooth to humans) and use CSS transforms for cursor rendering instead of re-rendering React components. Never let cosmetic features steal performance from core features.
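The emission side is a standard leading-edge throttle. A sketch with an injectable clock (the generic helper below is illustrative; any throttle utility works):

```typescript
// Limit a high-frequency callback to at most one call per interval.
// 50 ms between emissions is roughly 20 fps. The clock is injectable
// so the behavior can be tested without real time passing.
function throttle<T extends unknown[]>(
  fn: (...args: T) => void,
  intervalMs: number,
  now: () => number = Date.now
): (...args: T) => void {
  let last = -Infinity;
  return (...args: T) => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(...args);
    }
  };
}
```

Wiring it up is one line on the mousemove handler, e.g. `const emitCursor = throttle((x, y) => socket.emit("cursor", { x, y }), 50);` (the event name is illustrative). On the receiving side, moving each cursor with a CSS `transform: translate(...)` skips React reconciliation entirely.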
I added peer-to-peer video calls using PeerJS (a WebRTC abstraction). On paper, WebRTC is elegant. In practice, it's chaos.
WebRTC needs peers to find each other through firewalls and NATs. Sometimes it works flawlessly. Sometimes two users on different networks simply cannot connect. TURN servers fix this, but they add cost and latency.
I used PeerJS's cloud signaling server for the MVP. It works for most cases. Perfection isn't shipping. Shipping is shipping.
Mute, unmute, camera on, camera off, screen share. Each is a state toggle, but they interact. Screen sharing replaces the video track. Stopping screen share should restore the camera. Muting should persist across track swaps.
I burned hours on edge cases like: "User starts screen share → mutes → stops screen share → unmutes. Is the camera on or off?"
Every boolean becomes a state machine if you add enough of them.
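One way to tame it is to make the state explicit and pure, so every transition is a function you can reason about. A sketch (the `MediaState` shape is an assumption for illustration, not the actual Sketchr code):

```typescript
// Media controls as one explicit state object instead of loose booleans.
// Invariants: mute persists across track swaps, and stopping screen
// share restores whatever the camera state was before sharing started.
interface MediaState {
  muted: boolean;
  cameraOn: boolean;
  screenSharing: boolean;
  cameraBeforeShare: boolean; // remembered so stopShare can restore it
}

const initial: MediaState = {
  muted: false,
  cameraOn: true,
  screenSharing: false,
  cameraBeforeShare: true,
};

function toggleMute(s: MediaState): MediaState {
  return { ...s, muted: !s.muted };
}

function startShare(s: MediaState): MediaState {
  // Screen share replaces the video track; remember the camera state.
  return { ...s, screenSharing: true, cameraBeforeShare: s.cameraOn, cameraOn: false };
}

function stopShare(s: MediaState): MediaState {
  return { ...s, screenSharing: false, cameraOn: s.cameraBeforeShare };
}
```

Under this model the share → mute → stop share → unmute sequence has an unambiguous answer: the camera comes back in its pre-share state, and the mute toggles never touch it.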
This was actually the most enjoyable feature to build. Users type a natural language prompt like "authentication flow with login, register, and password reset" and Gemini generates a structured flowchart that renders directly on the canvas.
I don't ask Gemini for text. I ask for JSON β nodes and connectors with positions, labels, and types. The prompt engineering forces a strict schema: