How I Built Pointerful's AI Detection System from Scratch
A deep dive into the engineering behind Pointerful's real-time cursor tracking, click detection, and smart zoom AI — written by the developer who built it.
Every time you record with Pointerful, an AI watches over your shoulder — detecting every click, tracking every cursor movement, and planning cinematic camera moves automatically. Here's how I built that.
The Problem: Editing is the Bottleneck
When I first started building Pointerful, I watched users record amazing demos and tutorials, then spend hours manually editing out dead time, zooming into actions, and adding smooth transitions. The editing was taking 10x longer than the recording.
I knew there had to be a better way: what if the recorder itself could understand what was important?
Phase 1: Capturing the Raw Data
The first challenge was figuring out what data to capture. A screen recording is just a video — but to make it "smart," we needed more than pixels.
The Event Pipeline
I built an event capture system that hooks into the browser's event APIs:
- Mouse Events: every mousemove, mousedown, and mouseup, with precise timestamps and coordinates
- Click Events: left clicks, right clicks, and double clicks, each tagged with the DOM element that was clicked
- Keyboard Events: typing activity patterns (timing and frequency only, never the actual keys)
- Scroll Events: page scrolls with direction and velocity
- Navigation Events: tab switches, URL changes, and window resizes
All these events get streamed into a single timeline alongside the video frames. The key insight? Store everything — filter later. Storage is cheap, but missing an event means the AI is blind.
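The capture code itself isn't shown, so here is a minimal sketch of one way to hold the five event categories in a single time-ordered timeline that keeps everything and filters only at read time. The event shapes, field names, and the `EventTimeline` class are hypothetical, and the browser listeners that would call `push` are left out.

```typescript
// Hypothetical event shapes: the article lists the categories, not the fields.
type RecordedEvent =
  | { kind: "mouse"; t: number; x: number; y: number; action: "move" | "down" | "up" }
  | { kind: "click"; t: number; x: number; y: number; button: "left" | "right"; double: boolean; target: string }
  | { kind: "key"; t: number } // timing only: the key itself is never stored
  | { kind: "scroll"; t: number; dy: number } // sign of dy gives direction
  | { kind: "nav"; t: number; what: "tab" | "url" | "resize" };

// Append-only timeline: store everything, filter later.
class EventTimeline {
  private events: RecordedEvent[] = [];

  push(e: RecordedEvent): void {
    // Events usually arrive in order; fall back to an ordered insert if not.
    const last = this.events[this.events.length - 1];
    if (last === undefined || last.t <= e.t) {
      this.events.push(e);
      return;
    }
    let i = this.events.length - 1;
    while (i > 0 && this.events[i - 1].t > e.t) i--;
    this.events.splice(i, 0, e);
  }

  // Filtering happens at read time, never at capture time.
  between(from: number, to: number, kind?: RecordedEvent["kind"]): RecordedEvent[] {
    return this.events.filter(
      (e) => e.t >= from && e.t < to && (kind === undefined || e.kind === kind),
    );
  }

  get size(): number {
    return this.events.length;
  }
}
```

Keeping raw events unfiltered means later stages (scoring, replay) can be changed without re-recording.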
Phase 2: The Attention Engine
With raw event data streaming in, the next problem was: how does the AI know what to zoom into?
The Scoring Algorithm
I built what I call the Attention Engine — a deterministic scoring system that evaluates every moment of the recording:
| Signal | Weight | Why |
|---|---|---|
| Click event | High | User interacted = viewer should see it |
| Mouse movement followed by a pause | High | User read something = important content |
| Rapid clicks | Medium | Workflow demonstration |
| Scroll followed by pause | Medium | User found what they were looking for |
| No activity > 5s | Negative (remove) | Dead time |
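The table gives only qualitative weights. A minimal sketch of a deterministic scorer follows, assuming numeric stand-ins (High = 3, Medium = 2) and made-up thresholds for a "pause" (600 ms) and a "burst" of clicks (3 within 1 s). The function `scoreTimeline` and every constant except the 5 s dead-time limit are hypothetical. It reads the mouse signal as movement followed by a pause and reports silences longer than 5 s separately as dead time to cut.

```typescript
// Hypothetical numeric weights: the article only says High / Medium / Negative.
const WEIGHTS = { click: 3, pauseAfterMove: 3, rapidClicks: 2, scrollThenPause: 2 } as const;
const DEAD_TIME_MS = 5_000;    // from the table: no activity > 5 s is dead time
const PAUSE_MS = 600;          // assumed: stillness this long counts as a pause
const RAPID_WINDOW_MS = 1_000; // assumed window for "rapid clicks"
const RAPID_MIN_CLICKS = 3;    // assumed burst size

type Sample = { t: number; kind: "move" | "click" | "scroll" };
type Moment = { t: number; score: number; reasons: string[] };
type DeadSpan = { start: number; end: number };

// Expects samples sorted by time. Every score is traceable to its reasons.
function scoreTimeline(samples: Sample[]): { moments: Moment[]; dead: DeadSpan[] } {
  const byTime = new Map<number, Moment>();
  const add = (t: number, w: number, why: string): void => {
    const m: Moment = byTime.get(t) ?? { t, score: 0, reasons: [] };
    m.score += w;
    m.reasons.push(why);
    byTime.set(t, m);
  };
  const dead: DeadSpan[] = [];

  samples.forEach((s, i) => {
    const next = samples[i + 1];
    const gap = next ? next.t - s.t : Infinity;
    if (s.kind === "click") {
      add(s.t, WEIGHTS.click, "click");
      // Bonus when this click ends a sliding window holding >= RAPID_MIN_CLICKS clicks.
      const recent = samples.filter(
        (o) => o.kind === "click" && o.t <= s.t && s.t - o.t < RAPID_WINDOW_MS,
      ).length;
      if (recent >= RAPID_MIN_CLICKS) add(s.t, WEIGHTS.rapidClicks, "rapid clicks");
    }
    // Stillness after movement or scrolling (but short of dead time): probably reading.
    if (gap >= PAUSE_MS && gap <= DEAD_TIME_MS) {
      if (s.kind === "move") add(s.t, WEIGHTS.pauseAfterMove, "pause after move");
      if (s.kind === "scroll") add(s.t, WEIGHTS.scrollThenPause, "scroll then pause");
    }
    // Negative signal: long silences are reported as spans to remove.
    if (next && gap > DEAD_TIME_MS) dead.push({ start: s.t, end: next.t });
  });

  const moments = Array.from(byTime.values()).sort((a, b) => a.t - b.t);
  return { moments, dead };
}
```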
The Zoom Planning Algorithm
Once the Attention Engine identifies important moments, the system plans camera movements. This was the hardest part.
```
For each important moment:
  1. Identify the bounding box of the action (click position ± context)
  2. Calculate the optimal zoom level (not too tight, not too wide)
  3. Plan a smooth Bezier curve path from the current camera position
  4. Add 200 ms of dwell time before and after each zoom
  5. Ensure at least 1.5 seconds between camera moves
```
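Steps 1 to 3 can be sketched as follows, assuming a simple camera model (center point plus zoom factor), a fixed 160 px context padding, and zoom limits of 1.25x to 2.5x; all of these numbers and the names `actionBox`, `zoomFor`, `cameraAt`, and `targetCamera` are hypothetical. The cubic Bezier here places its inner control points on the endpoints, which keeps the path a straight line but gives it zero velocity at both ends; a curved path would only require moving those control points off the line.

```typescript
// Hypothetical camera model and tuning constants.
type Camera = { cx: number; cy: number; zoom: number };
type Box = { x: number; y: number; w: number; h: number };

const CONTEXT_PX = 160; // assumed padding around the click ("± context")
const MIN_ZOOM = 1.25;  // assumed: not so wide that the zoom looks accidental
const MAX_ZOOM = 2.5;   // assumed: not so tight that the UI loses context

// Step 1: bounding box of the action, clipped to the screen.
function actionBox(x: number, y: number, screenW: number, screenH: number): Box {
  const left = Math.max(0, x - CONTEXT_PX);
  const top = Math.max(0, y - CONTEXT_PX);
  const right = Math.min(screenW, x + CONTEXT_PX);
  const bottom = Math.min(screenH, y + CONTEXT_PX);
  return { x: left, y: top, w: right - left, h: bottom - top };
}

// Step 2: largest zoom at which the whole box stays visible, clamped to a range.
function zoomFor(box: Box, screenW: number, screenH: number): number {
  const fit = Math.min(screenW / box.w, screenH / box.h);
  return Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, fit));
}

function targetCamera(x: number, y: number, screenW: number, screenH: number): Camera {
  const box = actionBox(x, y, screenW, screenH);
  return { cx: box.x + box.w / 2, cy: box.y + box.h / 2, zoom: zoomFor(box, screenW, screenH) };
}

// Scalar cubic Bezier: B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3.
function bezier(p0: number, p1: number, p2: number, p3: number, t: number): number {
  const u = 1 - t;
  return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
}

// Step 3: camera state at progress t in [0, 1]. With p1 = p0 and p2 = p3 the
// move eases in and out instead of starting or stopping abruptly.
function cameraAt(from: Camera, to: Camera, t: number): Camera {
  const k = Math.min(1, Math.max(0, t));
  const ease = (a: number, b: number) => bezier(a, a, b, b, k);
  return { cx: ease(from.cx, to.cx), cy: ease(from.cy, to.cy), zoom: ease(from.zoom, to.zoom) };
}
```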
The 1.5-second minimum was discovered through hours of testing. Any faster and viewers got motion sickness. Any slower and the video felt sluggish.
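Steps 4 and 5 can be sketched as a small scheduler. The 200 ms dwell and 1.5 s spacing come from the outline above; the greedy strategy (keep the highest-scoring moments first, drop any within 1.5 s of one already kept) and the names `scheduleMoves` and `CameraMove` are assumptions, since the article doesn't say how conflicts are resolved.

```typescript
type ScoredMoment = { t: number; score: number };
// One reading of step 4: a window padded by DWELL_MS on each side of the focus time.
type CameraMove = { dwellStart: number; focus: number; dwellEnd: number };

const DWELL_MS = 200;     // dwell before and after each zoom
const MIN_GAP_MS = 1_500; // minimum spacing between camera moves

// Assumed greedy strategy: most important moments claim their slot first.
function scheduleMoves(moments: ScoredMoment[]): CameraMove[] {
  const byScore = [...moments].sort((a, b) => b.score - a.score || a.t - b.t);
  const accepted: number[] = [];
  for (const m of byScore) {
    if (accepted.every((t) => Math.abs(t - m.t) >= MIN_GAP_MS)) accepted.push(m.t);
  }
  return accepted
    .sort((a, b) => a - b)
    .map((t) => ({ dwellStart: t - DWELL_MS, focus: t, dwellEnd: t + DWELL_MS }));
}
```

Choosing by score rather than by time means a weak moment can't block a stronger one that follows it 800 ms later.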
Phase 3: Real-Time Processing Constraints
One of the toughest constraints: the AI has to work in real-time during recording, with zero perceptible lag.
I couldn't run heavy ML models in the browser without destroying performance. The solution was a hybrid approach:
1. During recording: lightweight heuristics and scoring (pure math, no models)
2. During export: optional deep analysis with the full model pipeline
3. In preview mode: deterministic replay of saved event data
This split keeps the recorder snappy while letting the export take as long as it needs to get things right.
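A minimal sketch of the recording-time half of this split and of preview replay, assuming a hypothetical `LiveScorer` that does constant work per event (no models, no I/O) and an assumed 600 ms pause threshold. The export-time model pipeline is not shown. Because the scorer reads no clock and uses no randomness, replaying a saved log always reproduces the same candidates.

```typescript
// Hypothetical streaming scorer used while recording.
type LiveEvent = { t: number; kind: "move" | "click" | "scroll" };
type Candidate = { t: number; reason: "click" | "pause" };

const LIVE_PAUSE_MS = 600; // assumed stillness threshold

class LiveScorer {
  private lastT = -Infinity;
  private lastKind: LiveEvent["kind"] | null = null;
  readonly candidates: Candidate[] = [];

  // O(1) per event. A pause can only be confirmed once the next event arrives.
  push(e: LiveEvent): void {
    if (this.lastKind === "move" && e.t - this.lastT >= LIVE_PAUSE_MS) {
      this.candidates.push({ t: this.lastT, reason: "pause" });
    }
    if (e.kind === "click") this.candidates.push({ t: e.t, reason: "click" });
    this.lastT = e.t;
    this.lastKind = e.kind;
  }
}

// Preview mode: feed the saved log through the same scorer.
function replay(log: LiveEvent[]): Candidate[] {
  const scorer = new LiveScorer();
  for (const e of log) scorer.push(e);
  return scorer.candidates;
}
```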
Phase 4: The Edge Cases That Almost Broke Me
The "Frantic Clicker" Problem
Some users click rapidly — 5+ clicks per second. The AI would try to zoom into each one, creating a seizure-inducing video. Fix: Debounce clicks within 800ms windows and only zoom to the cluster centroid.
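A minimal sketch of that fix, assuming "debounce" in its usual sense: a click joins the current group if it lands within 800 ms of the previous click, and each group produces one zoom target at its centroid. The names `clusterClicks` and `ClickCluster` are hypothetical.

```typescript
type Click = { t: number; x: number; y: number };
type ClickCluster = { start: number; end: number; x: number; y: number; count: number };

const CLUSTER_WINDOW_MS = 800; // from the article

// Assumption: the gap is measured from the previous click, not the first one.
function clusterClicks(clicks: Click[]): ClickCluster[] {
  const sorted = [...clicks].sort((a, b) => a.t - b.t);
  const groups: Click[][] = [];
  for (const c of sorted) {
    const current = groups[groups.length - 1];
    const prev = current?.[current.length - 1];
    if (prev && c.t - prev.t <= CLUSTER_WINDOW_MS) current.push(c);
    else groups.push([c]);
  }
  // One zoom per group, aimed at the centroid of its clicks.
  return groups.map((g) => ({
    start: g[0].t,
    end: g[g.length - 1].t,
    x: g.reduce((sum, c) => sum + c.x, 0) / g.length,
    y: g.reduce((sum, c) => sum + c.y, 0) / g.length,
    count: g.length,
  }));
}
```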
The "Invisible Scroll" Problem
On long pages, users scroll continuously. The AI thought everything was important. Fix: Only trigger on scroll-stop events (scroll + 200ms pause = potential point of interest).
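A minimal offline sketch of the scroll-stop rule over recorded scroll samples, assuming each sample carries the page's scroll offset `y`. The names `scrollStops` and `quietUntil` are hypothetical; `quietUntil` is the end of the recording (or the current time when run live), so a final scroll can still qualify.

```typescript
type ScrollSample = { t: number; y: number }; // y = scroll offset (assumed field)
type ScrollStop = { t: number; y: number };

const SCROLL_SETTLE_MS = 200; // from the article: scroll + 200 ms pause

// Assumes samples are sorted by time. A sample counts as a stop only when
// no further scrolling happens for at least SCROLL_SETTLE_MS after it.
function scrollStops(samples: ScrollSample[], quietUntil: number): ScrollStop[] {
  const stops: ScrollStop[] = [];
  samples.forEach((s, i) => {
    const nextT = i + 1 < samples.length ? samples[i + 1].t : quietUntil;
    if (nextT - s.t >= SCROLL_SETTLE_MS) stops.push({ t: s.t, y: s.y });
  });
  return stops;
}
```

Continuous scrolling produces no stops at all, which is exactly what keeps long pages from flooding the timeline with zooms.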
The "Where Did My Cursor Go" Problem
Users would move their cursor off-screen and the AI would zoom into empty space. Fix: Filter out-of-bounds cursor positions and predict trajectory for brief exits.
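A minimal sketch of that fix, assuming a hypothetical `cursorTarget` helper, a 500 ms limit on what counts as a "brief" exit, and linear extrapolation from the last two on-screen samples as the trajectory predictor (the article doesn't name the prediction method). Out-of-bounds samples are never used directly, and predictions are clamped to the screen edge.

```typescript
type CursorSample = { t: number; x: number; y: number };
type ScreenPoint = { x: number; y: number };

const MAX_EXIT_MS = 500; // assumed: exits shorter than this are "brief"

function inBounds(p: CursorSample, w: number, h: number): boolean {
  return p.x >= 0 && p.y >= 0 && p.x < w && p.y < h;
}

// Point the camera should follow at time t, or null to stop following.
function cursorTarget(history: CursorSample[], t: number, w: number, h: number): ScreenPoint | null {
  const seen = history.filter((p) => p.t <= t);
  const latest = seen[seen.length - 1];
  if (!latest) return null;
  if (inBounds(latest, w, h)) return { x: latest.x, y: latest.y };

  const visible = seen.filter((p) => inBounds(p, w, h));
  const last = visible[visible.length - 1];
  const prev = visible[visible.length - 2];
  // Long exits, or exits we cannot extrapolate, are dropped instead of guessed.
  if (!last || !prev || t - last.t > MAX_EXIT_MS || last.t === prev.t) return null;

  const dt = t - last.t;
  const vx = (last.x - prev.x) / (last.t - prev.t);
  const vy = (last.y - prev.y) / (last.t - prev.t);
  return {
    x: Math.min(w - 1, Math.max(0, last.x + vx * dt)),
    y: Math.min(h - 1, Math.max(0, last.y + vy * dt)),
  };
}
```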
What I Learned
Building the AI detection system taught me that intelligence doesn't need to be a black box. The Attention Engine is entirely deterministic — there's no mystery about why a zoom happens. Every camera movement can be traced back to specific mouse events and scoring rules.
This transparency turned out to be a feature: users can adjust zoom sensitivity, minimum zoom duration, and even manually override any AI decision in the timeline editor.
The AI isn't the boss — it's an extremely fast assistant.
What's Next
I'm currently working on the next generation of the Attention Engine that adds:
- Contextual understanding: detecting whether you're in a code editor, a slideshow, or a browser
- Voice-guided framing: using speech detection to time zooms with what you're saying
- Smart text detection: automatically framing readable text regions
The goal remains the same: make the AI invisible, so you can focus on creating.