Over the past few months, I’ve been building one of the most exciting—and technically demanding—projects I’ve ever worked on: a real-time, on-device computer vision web app designed specifically for low-end mobile devices.
The app runs entirely on the edge:
- in the browser,
- on mobile hardware,
- with intermittent connectivity,
- and with strict resource constraints.
It wasn’t just about making it work—it had to be efficient, especially when it came to battery usage.
That’s where things got interesting.
The Problem: 1% Battery Drain… per Minute
At the heart of the application is a lightweight ONNX model that runs inference every second. Each inference produces an embedding, and that embedding must be compared with a database of 2000+ embeddings to find the nearest matches.
This similarity search computes a cosine similarity between:
- 1 query embedding
- and 2000+ database embeddings
- each containing 32 float values
Every. Single. Second.
That’s an O(N) operation, and it was expensive.
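To make the cost concrete, here is roughly what that baseline looks like in plain JavaScript (a sketch; the function and variable names are illustrative, not the app's actual code):

```javascript
// Cosine similarity between two embeddings (e.g. 32-dim Float32Array or number[]).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force O(N) search: score the query against every database embedding,
// then sort. With 2000+ embeddings, this runs once per second.
function bruteForceTopK(query, database, k) {
  return database
    .map((emb, index) => ({ index, score: cosineSimilarity(query, emb) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Every call touches every embedding, so the work grows linearly with the database and never gets cheaper per frame.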
During early tests, the app was draining 1% of battery per minute, even on some standard-spec devices.
That’s an instant deal-breaker.
Why This Problem Matters Today More Than Ever
Computer vision workloads have been shifting rapidly toward embedding-based approaches. Instead of relying solely on classification heads or heavy post-processing, modern vision systems increasingly represent images, gestures, poses, or features as fixed-length embedding vectors.
This opens the door to:
- multimodal search,
- semantic comparison,
- clustering,
- personalization,
- and real-time context-aware applications.
And with the explosive growth of edge AI, many of these systems are no longer running on powerful servers—they’re being deployed directly on:
- phones,
- wearables,
- AR devices,
- lightweight IoT hardware,
- and browser environments using ONNX, WebAssembly, and WebGPU.
The rise of embedding-first computer vision makes similarity search—especially efficient, battery-friendly search—one of the most important challenges in modern edge computing.
This is why solving this problem wasn’t just an optimization exercise.
It reflects a shift in where computer vision is heading.
Why Edge Computing Changes Everything
Today’s apps are becoming more complex and increasingly pushed toward the edge. Running machine-learning pipelines locally gives users improved privacy, lower latency, and offline capability—but it also exposes limitations that cloud apps never have to think about:
- limited CPU
- limited memory
- limited battery
- uncertain browser optimizations
- uneven device capability
The architecture itself had to respect these constraints.
So, I explored two major optimization strategies:
- SIMD
- Approximate Nearest Neighbors (ANN) with HNSW
Both were promising. Only one survived.
Exploring SIMD: Why It Almost Worked
Before jumping into complex algorithms, I looked into low-level optimization: SIMD, which stands for Single Instruction, Multiple Data.
SIMD allows the CPU to apply the same instruction across multiple data points simultaneously. Perfect for vector math like cosine similarity.
But there's a catch. Low-end mobile devices—the ones I was optimizing for—often:
- don’t support WASM SIMD,
- or support only partial instruction sets,
- or silently fall back to a non-SIMD path.
In theory, SIMD could have cut the per-inference computation cost by up to a factor of 4 (around 75%), because it processes multiple float values in parallel with a single instruction. But when a device falls back to scalar execution, that benefit disappears entirely—sometimes even making things slower and increasing battery usage.
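You can at least detect WASM SIMD support at runtime instead of trusting the build target. A common technique (popularized by the wasm-feature-detect library) is to ask the engine to validate a tiny module containing a single v128 instruction; the sketch below follows that approach:

```javascript
// Detect WebAssembly SIMD support by validating a minimal module whose
// body uses a v128 instruction. If the engine rejects it, a SIMD-compiled
// code path would fall back to (or fail on) scalar execution.
// Byte sequence follows the wasm-feature-detect approach.
function simdSupported() {
  return WebAssembly.validate(new Uint8Array([
    0, 97, 115, 109, 1, 0, 0, 0,                    // \0asm magic + version
    1, 5, 1, 96, 0, 1, 123,                          // type section: () -> v128
    3, 2, 1, 0,                                      // function section
    10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,    // code: i32.const 0; i8x16.splat; ...
  ]));
}

// Choose the kernel once at startup.
const useSimdKernel = simdSupported();
```

Detection tells you *whether* SIMD is available, but it doesn't solve the real problem: the scalar fallback path still has to be fast enough on its own.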
I needed a solution that worked consistently across all devices, not just flagship phones.
So SIMD, despite being beautiful, wasn’t reliable enough.
Switching Strategies: Approximate Nearest Neighbors (ANN)
If I couldn’t make O(N) cheaper, maybe I didn’t need O(N) at all.
Approximate Nearest Neighbor algorithms dramatically reduce search time by returning near-exact results without scanning the entire dataset.
They trade a tiny bit of accuracy for huge speed gains.
The algorithm that stood out was HNSW.
Understanding HNSW: Hierarchical Navigable Small World
HNSW stands for Hierarchical Navigable Small World.
It structures the embedding database as multiple graph layers:
- upper layers are sparse
- lower layers are dense
- you traverse from top to bottom to zoom in on nearest neighbors
Think of it like a multi-level map: the top layers give a coarse, country-scale view, and each layer down zooms in toward street level.
Instead of checking all 2000 embeddings, you:
- Start high up in a sparse graph
- Make greedy hops toward the query embedding
- Drop down layers as you get closer
- End in a local neighborhood containing the likely top matches
This reduces search complexity from O(N) to roughly O(log N).
For my dataset, that’s a massive improvement.
If exact search costs ~10 units of battery, O(log N) search brings it closer to ~1 unit.
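The greedy descent described above can be sketched in a few lines. This is illustrative only, not a real HNSW implementation: it omits the candidate beam (the efSearch parameter), graph construction, and all bookkeeping, and the `layers`, `vectors`, and `dist` names are my own, not a library API:

```javascript
// Toy sketch of HNSW-style layered greedy search.
// `layers` is an array of adjacency Maps (node -> neighbor indices),
// with layers[0] the densest bottom layer and the last entry the sparsest top.
function hnswGreedySearch(layers, vectors, query, entryPoint, dist) {
  let current = entryPoint;
  // Walk from the sparsest (top) layer down to layer 0.
  for (let level = layers.length - 1; level >= 0; level--) {
    let improved = true;
    while (improved) {
      improved = false;
      // Greedy hop: move to any neighbor strictly closer to the query.
      for (const neighbor of layers[level].get(current) ?? []) {
        if (dist(vectors[neighbor], query) < dist(vectors[current], query)) {
          current = neighbor;
          improved = true;
        }
      }
    }
  }
  return current; // node reached by the greedy descent
}
```

Each layer only needs a handful of hops, which is where the logarithmic behavior comes from: the sparse upper layers cover long distances cheaply, and the dense bottom layer only refines locally.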
But ANN Introduces a New Problem: Accuracy
HNSW is incredibly fast… but it’s approximate.
Typically you get 99% to 99.5% recall of the true nearest neighbors.
Which sounds high—but for a computer vision pipeline, even 0.5% error can be problematic. I needed a way to recover precision without losing the ANN speed advantage.
So I came up with a hybrid solution.
My Hybrid Strategy: More Candidates + Exact Re-Ranking
To maintain accuracy, I modified the pipeline in two key ways:
1. Increasing Top-K Results from ANN
Originally, I retrieved top-3 results from exact search.
With HNSW, this might miss the true match—so I expanded the list.
I increased the top-K to 10, giving:
- more coverage
- higher chance the true match appears
- ~30 combined candidate embeddings
2. Exact Re-Ranking on the Shortlist
Once ANN produced its candidate set, I performed a precise cosine similarity calculation only on that small subset.
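Put together, the hybrid pipeline looks roughly like this (a sketch: `annSearch` stands in for the HNSW index lookup and is assumed to return candidate indices; the names and default parameters are illustrative):

```javascript
// Hybrid search: take a generous top-K candidate set from the ANN index,
// then re-rank that small shortlist with exact cosine similarity.
function hybridSearch(query, database, annSearch, { annK = 10, finalK = 3 } = {}) {
  const candidates = annSearch(query, annK);          // approximate, fast
  return candidates
    .map((index) => ({ index, score: cosine(query, database[index]) }))
    .sort((a, b) => b.score - a.score)                // exact re-ranking
    .slice(0, finalK);
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

The exact math now runs over ~10 candidates instead of 2000+, so the re-ranking cost is negligible while recovering almost all of the lost precision.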

This gave me:
- ANN efficiency
- Near exact accuracy
- Great battery performance
- Stable behavior across all devices
The best of both worlds.
The Outcome: Over 50% Reduction in Battery Drain
As you might already know, measuring battery drain in mobile browsers is notoriously difficult, mostly because browser APIs don’t expose this sort of hardware-level information. However, with help from the rest of the team, we confirmed a reduction of more than 50% in battery drain compared to our former baseline.
After integrating HNSW + re-ranking:
- 🚀 Similarity search CPU cost dropped sharply
- 🔋 Battery usage reduced by more than 50%
- ⚡ Inference remained real-time (1 Hz)
- 🎯 Accuracy remained nearly 100%
This made the application practical even for low-end devices—the very users it was built for.
Closing Thoughts
We’re entering a new era where computer vision and embedding-based search are becoming the default way apps understand the world. But as these workloads increasingly move to the edge—into browsers, onto cheap devices, and into places with unstable connectivity—efficiency becomes as important as accuracy.
This project forced me to rethink old assumptions and design for the constraints of the real world. It’s been one of the most rewarding optimization challenges I’ve worked on.
And this is only the beginning.
