Bringing Computer Vision to the Edge: How I Cut Real-Time Similarity Search Battery Usage by 50%

Over the past few months, I’ve been building one of the most exciting—and technically demanding—projects I’ve ever worked on: a real-time, on-device computer vision web app designed specifically for low-end mobile devices.

The app runs entirely on the edge:

  • in the browser,
  • on mobile hardware,
  • with intermittent connectivity,
  • and with strict resource constraints.

It wasn’t just about making it work—it had to be efficient, especially when it came to battery usage.

That’s where things got interesting.


The Problem: 1% Battery Drain… per Minute

At the heart of the application is a lightweight ONNX model that runs inference every second. Each inference produces an embedding, and that embedding must be compared with a database of 2000+ embeddings to find the nearest matches.

This similarity search computes a cosine similarity between:

  • 1 query embedding
  • and 2000+ database embeddings
  • each containing 32 float values

Every. Single. Second.

That’s an O(N) operation, and it was expensive.
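For context, the baseline exact search looks roughly like this (a minimal sketch in plain JavaScript; the function names and array shapes are illustrative, not the app's actual code):

```javascript
// Exact cosine similarity between two embeddings
// (plain arrays of 32 floats in this app's case).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// O(N) brute-force scan: score every database embedding, keep the top-k.
function bruteForceTopK(query, db, k) {
  return db
    .map((emb, index) => ({ index, score: cosineSimilarity(query, emb) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With 2000+ vectors of 32 floats, that is roughly 64,000 multiply-adds per second for the dot products alone, before the norms and the sort. Trivial on a laptop, but a steady drain on a budget phone's CPU.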

During early tests, the app was draining 1% of battery per minute, even on some standard-spec devices.

That’s an instant deal-breaker.


Why This Problem Matters Today More Than Ever

Computer vision workloads have been shifting rapidly toward embedding-based approaches. Instead of relying solely on classification heads or heavy post-processing, modern vision systems increasingly represent images, gestures, poses, or features as fixed-length embedding vectors.

This opens the door to:

  • multimodal search,
  • semantic comparison,
  • clustering,
  • personalization,
  • and real-time context-aware applications.

And with the explosive growth of edge AI, many of these systems are no longer running on powerful servers—they’re being deployed directly on:

  • phones,
  • wearables,
  • AR devices,
  • lightweight IoT hardware,
  • and browser environments using ONNX, WebAssembly, and WebGPU.

The rise of embedding-first computer vision makes similarity search—especially efficient, battery-friendly search—one of the most important challenges in modern edge computing.

This is why solving this problem wasn’t just an optimization exercise.
It reflects a shift in where computer vision is heading.


Why Edge Computing Changes Everything

Today’s apps are becoming more complex and increasingly pushed toward the edge. Running machine-learning pipelines locally gives users improved privacy, lower latency, and offline capability—but it also exposes limitations that cloud apps never have to think about:

  • limited CPU
  • limited memory
  • limited battery
  • uncertain browser optimizations
  • uneven device capability

The architecture itself had to respect these constraints.

So, I explored two major optimization strategies:

  1. SIMD
  2. Approximate Nearest Neighbors (ANN) with HNSW

Both were promising. Only one survived.


Exploring SIMD: Why It Almost Worked

Before jumping into complex algorithms, I looked into low-level optimization: SIMD, which stands for Single Instruction, Multiple Data.

SIMD allows the CPU to apply the same instruction across multiple data points simultaneously. Perfect for vector math like cosine similarity.
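To illustrate the idea, here is a scalar JavaScript sketch that mimics 4-lane processing. Real WASM SIMD would execute each group of four multiply-adds as a single instruction on a 128-bit v128 vector, which plain JavaScript cannot express directly; the four accumulators below only model that lane-wise layout:

```javascript
// Illustration only: SIMD processes several "lanes" per instruction.
// Here, four accumulators mimic the four float32 lanes of a v128 register.
function dotProduct4Lane(a, b) {
  const acc = [0, 0, 0, 0];
  // Assumes a.length is a multiple of 4 (true for 32-float embeddings).
  for (let i = 0; i + 4 <= a.length; i += 4) {
    // With real SIMD, these four multiply-adds would be one instruction.
    acc[0] += a[i] * b[i];
    acc[1] += a[i + 1] * b[i + 1];
    acc[2] += a[i + 2] * b[i + 2];
    acc[3] += a[i + 3] * b[i + 3];
  }
  // Horizontal sum collapses the lanes back to a scalar result.
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```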

But there was a catch.

Low-end mobile devices—the ones I was optimizing for—often:

  • don’t support WASM SIMD,
  • or support only partial instruction sets,
  • or silently fall back to a non-SIMD path.

In theory, SIMD could have reduced the per-inference computation cost by up to 4× (around 75%) because it can process multiple float values in parallel with a single instruction. But when a device falls back to scalar execution, that benefit disappears entirely—sometimes even making things slower and increasing battery usage.

I needed a solution that worked consistently across all devices, not just flagship phones.

So SIMD, despite being beautiful, wasn’t reliable enough.


Switching Strategies: Approximate Nearest Neighbors (ANN)

If I couldn’t make O(N) cheaper, maybe I didn’t need O(N) at all.

Approximate Nearest Neighbor algorithms dramatically reduce search time by returning near-exact results without scanning the entire dataset.

They trade a tiny bit of accuracy for huge speed gains.

The algorithm that stood out was HNSW.


Understanding HNSW: Hierarchical Navigable Small World

HNSW structures the embedding database as multiple graph layers:

  • upper layers are sparse
  • lower layers are dense
  • you traverse from top to bottom to zoom in on nearest neighbors

Think of it like a multi-level map. Instead of checking all 2000+ embeddings, you:

  1. Start high up in a sparse graph
  2. Make greedy hops toward the query embedding
  3. Drop down layers as you get closer
  4. End in a local neighborhood containing the likely top matches
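A heavily simplified sketch of that greedy hop, restricted to a single layer (real HNSW implementations such as hnswlib maintain multiple layers, entry points, and candidate heaps; the toy `graph` and distance function here are my own illustration):

```javascript
// Squared Euclidean distance between two embeddings.
function squaredDistance(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) d += (a[i] - b[i]) ** 2;
  return d;
}

// Greedy nearest-neighbor walk over one graph layer.
// graph[i] lists the neighbor indices of node i; vectors[i] is its embedding.
function greedySearch(query, vectors, graph, entryPoint) {
  let current = entryPoint;
  let currentDist = squaredDistance(query, vectors[current]);
  while (true) {
    let improved = false;
    // Hop to any neighbor strictly closer to the query.
    for (const n of graph[current]) {
      const d = squaredDistance(query, vectors[n]);
      if (d < currentDist) {
        current = n;
        currentDist = d;
        improved = true;
      }
    }
    // No closer neighbor: we've reached a local neighborhood of candidates.
    if (!improved) return current;
  }
}
```

In full HNSW, this walk runs on the sparse top layer first, then repeats on each denser layer below, using the previous result as the entry point.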

This reduces search complexity from O(N) to O(log N).

For my dataset, that’s a massive improvement.

If exact search cost ~10 units of battery,
O(log N) brings it closer to ~1 unit.


But ANN Introduces a New Problem: Accuracy

HNSW is incredibly fast… but it’s approximate.

Typically you get:

  • 99% to 99.5% accuracy

Which sounds high—but for a computer vision pipeline, even 0.5% error can be problematic. I needed a way to recover precision without losing the ANN speed advantage.

So I came up with a hybrid solution.


My Hybrid Strategy: More Candidates + Exact Re-Ranking

To maintain accuracy, I modified the pipeline in two key ways:


1. Increasing Top-K Results from ANN

Originally, I retrieved top-3 results from exact search.
With HNSW, this might miss the true match—so I expanded the list.

I increased the top-K to 10, giving:

  • more coverage
  • higher chance the true match appears
  • ~30 combined candidate embeddings

2. Exact Re-Ranking on the Shortlist

Once ANN produced its candidate set, I performed a precise cosine similarity calculation only on that small subset.
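Put together, the hybrid pipeline looks roughly like this (the `annTopK` parameter stands in for whatever the HNSW index returns; everything here is a sketch, not the app's actual code):

```javascript
// Exact cosine similarity, as in the brute-force baseline.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hybrid search: wide ANN shortlist, then exact re-ranking on that subset.
// annTopK(query, k) stands in for the HNSW index; it returns candidate indices.
function hybridSearch(query, db, annTopK, finalK) {
  const candidates = annTopK(query, 10); // over-fetch: top-10 instead of top-3
  return candidates
    .map(index => ({ index, score: cosine(query, db[index]) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, finalK); // exact top matches from the shortlist
}
```

The exact pass now touches ~10 vectors instead of 2000+, so its cost is negligible, while correcting any ordering mistakes the approximate index made within the shortlist.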

This gave me:

  • ANN efficiency
  • Near exact accuracy
  • Great battery performance
  • Stable behavior across all devices

The best of both worlds.


The Outcome: Over 50% Reduction in Battery Drain

As you might already know, measuring battery drain in mobile browsers is notoriously difficult, mostly because browser APIs don’t expose this kind of hardware-level information. With the help of the rest of the team, however, we confirmed a roughly 50% reduction in battery drain compared to our former baseline.

After integrating HNSW + re-ranking:

  • 🚀 Similarity search CPU cost dropped sharply
  • 🔋 Battery usage reduced by more than 50%
  • ⚡ Inference remained real-time (1 Hz)
  • 🎯 Accuracy remained nearly 100%

This made the application practical even for low-end devices—the very users it was built for.


Closing Thoughts

We’re entering a new era where computer vision and embedding-based search are becoming the default way apps understand the world. But as these workloads increasingly move to the edge—into browsers, onto cheap devices, and into places with unstable connectivity—efficiency becomes as important as accuracy.

This project forced me to rethink old assumptions and design for the constraints of the real world. It’s been one of the most rewarding optimization challenges I’ve worked on.

And this is only the beginning.
