1. Introduction
A few weeks ago, I was assigned a task at work involving object detection. As someone who primarily works on the frontend, I became curious: was it possible to implement an object detection model directly in a browser-based React app, without relying on a backend or Python-based inference?
This blog post is a continuation of that research. It documents the step-by-step process I took to run a YOLOv7 model using TensorFlow.js within a React project. Along the way, I encountered several technical challenges, particularly around model conversion and client-side rendering, that I believe are worth sharing.
My goal is to make this post useful for fellow developers who are exploring the same idea or simply want to integrate machine learning into their frontend applications. I’ll walk you through everything from model conversion and preprocessing to inference and displaying the results in the browser.
Let’s get started.
2. What is YOLO and Why TensorFlow.js?
🧠 A Quick Overview of YOLO
YOLO (You Only Look Once) is a well-known family of real-time object detection models. It became popular for its ability to detect multiple objects in a single forward pass - making it fast and efficient for applications like surveillance, robotics, and real-time analytics.
Over time, YOLO has evolved into several versions maintained by different contributors:
- YOLOv3 & YOLOv4: Older but still widely used, lightweight, and efficient
- YOLOv5, v8, and v11: Developed and maintained by Ultralytics, offering better tooling and performance improvements (YOLOv6 is a separate project maintained by Meituan)
- YOLOv7: Developed by WongKinYiu, widely appreciated for its balance of accuracy and speed, and considered one of the most stable and community-driven versions
⚖️ Why Licensing Matters (and Why You Should Care)
When working with open-source models, licensing is not just a legal formality - it determines how you can use, share, or deploy that model. And in many real-world cases, misuse of licenses (even unintentionally) can cause serious issues, especially in commercial settings.
Here's a brief overview:
🔹 AGPLv3 (used by Ultralytics for YOLOv5+):
If you use this in a public-facing app, you're required to open-source your entire application, including any code that interacts with the model, even if you didn't modify the model itself.
🔹 YOLOv4:
Released under a custom license that explicitly restricts commercial use, which makes it risky to use in production unless you've obtained special permission.
🔹 YOLOv3 and YOLOv7:
These are safer choices for projects that may eventually be used commercially or shared publicly. YOLOv7, in particular, offers excellent performance without restrictive licensing.
🛑 Note: Always double-check the license of any model you use; don't treat open source as "free to use without conditions." It's better to be cautious than to deal with legal issues later on.
🌐 Why TensorFlow.js?
To run the model entirely in the browser, I used TensorFlow.js, a JavaScript library that brings machine learning to the web.
A few reasons it was a good fit:
- No backend or server needed
- Seamless integration with React
- GPU acceleration via WebGL
- Ideal for building lightweight prototypes, interactive tools, and real-time demos
In this project, TensorFlow.js allowed me to take a fully trained YOLOv7 model, convert it, and run object detection directly in a React app - no Python, no API calls, no external inference servers.
3. Converting YOLOv7 to TensorFlow.js
Most pre-trained YOLO models - like YOLOv7 - are built in PyTorch, which can't be used directly in the browser. To make it work with TensorFlow.js, we need to convert the model through several stages. Each step transforms the model into a format that gets us closer to running it in the browser.
Below is the step-by-step pipeline I used:
🔄 Conversion Flow
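At a glance, the pipeline looks like this:

PyTorch checkpoint (.pt) → ONNX (.onnx) → TensorFlow SavedModel → TensorFlow.js graph model (model.json + binary weight shards)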
Step 1: Get and Export YOLOv7 from PyTorch to ONNX
First, I used the official YOLOv7 export script to convert the .pt model file into the ONNX format.
Get the model from the official YOLOv7 repository by WongKinYiu:
# Download trained weights
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
Export the YOLOv7 model to ONNX:
!python export.py --weights ./yolov7-tiny.pt \
--grid --end2end --simplify \
--topk-all 100 --iou-thres 0.65 --conf-thres 0.35 \
--img-size 640 640 --max-wh 640
# For onnxruntime, --max-wh must be an integer: 0 means class-agnostic NMS, otherwise non-agnostic NMS
The result is a .onnx file containing the YOLOv7 model structure and weights.
📎 See the full notebook: here
Step 2: Convert ONNX to TensorFlow.js
Next, I converted the ONNX model into TensorFlow’s SavedModel format using onnx2tf.
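If you're starting from a clean Colab/Python environment, the converter packages need to be installed first. A minimal sketch (exact versions and extra dependencies such as onnx and onnxruntime may vary):

# Install the converters used below
!pip install -U onnx2tf onnx onnxruntime tensorflowjs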
# Convert ONNX to TensorFlow SavedModel using onnx2tf
# (replace best2.onnx with the name of your exported ONNX file, e.g. yolov7-tiny.onnx)
!python -m onnx2tf -i best2.onnx -ois input:1,3,640,640 -osd -dgc
# Convert SavedModel to TensorFlow.js (tfjs)
!tensorflowjs_converter \
--input_format=tf_saved_model \
--output_format=tfjs_graph_model \
saved_model \
tfjs_model
This creates two folders: saved_model (the intermediate TensorFlow model) and tfjs_model (the web-ready model).
📎 Reference: TFJS Converter Docs
4. Integrating the Model into ReactJS
With the model converted and ready to be used in the browser, the next step is integrating it into a React application. For this project, I used React with Vite, along with Tailwind CSS for UI, and @tensorflow/tfjs for inference.
Here’s a breakdown of how I structured the integration:
🧱 Project Setup
First, I initialized the project with Vite and installed necessary dependencies:
npm create vite@latest object-detection-yolo -- --template react-ts
cd object-detection-yolo
npm install
Then I installed TensorFlow.js:
npm install @tensorflow/tfjs
Optional: Tailwind CSS for styling
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p
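One detail that isn't obvious from the setup commands: the converted model has to be served as static files. In this sketch I assume the tfjs_model output folder is copied into Vite's public directory and renamed to yolov7tiny_web_model, since that's the path the loading code below requests:

public/
└── yolov7tiny_web_model/
    ├── model.json
    └── group1-shard1ofN.bin   # one or more binary weight shards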
⚙️ Loading and Preparing the Model in React
In the React application, I used a useEffect() hook to load the model and prepare it for inference as soon as the component mounts. This process includes downloading the model, warming it up, and storing relevant metadata in the component’s state.
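For context, the effect below writes into two pieces of component state. Their exact shape isn't shown in the original snippet, so here's a rough sketch of how they might be declared (an assumption; adjust to your own state management):

import { useEffect, useState } from 'react';
import * as tf from '@tensorflow/tfjs';

// inside the component:
// loading indicator: whether the model is still downloading and how far along it is
const [loading, setLoading] = useState({ loading: true, progress: 0 });

// the loaded graph model plus its expected input shape, e.g. [1, 640, 640, 3]
const [model, setModel] = useState<{
  net: tf.GraphModel | null;
  inputShape: number[];
}>({ net: null, inputShape: [1, 0, 0, 3] });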
useEffect(() => {
  tf.ready().then(async () => {
    const yolov7 = await tf.loadGraphModel(
      `${window.location.origin}/yolov7tiny_web_model/model.json`,
      {
        onProgress: (fractions) => {
          setLoading({ loading: true, progress: fractions }); // track download progress
        },
      }
    ); // load model
    if (!yolov7) return;

    // warm up the model with a dummy input matching its input shape
    const dummyInput = tf.ones(yolov7.inputs[0].shape!);
    const warmupResults = await yolov7.executeAsync(dummyInput);

    setLoading({ loading: false, progress: 1 });
    setModel({
      net: yolov7,
      inputShape: yolov7.inputs[0].shape ?? [1, 0, 0, 3],
    }); // set model & input shape

    tf.dispose([warmupResults, dummyInput]); // clean up warm-up tensors
  });
}, []);
Key Steps Explained:
Model Loading: The model is loaded using tf.loadGraphModel() from the local public directory. The progress of the model loading is tracked using the onProgress callback to show a loading indicator on the UI.
Warm-Up Step: Before performing any actual detection, the model is run once with a dummy input (tf.ones(...)) that matches its input shape. This “warms up” the model by initializing memory and caching computation graphs, which helps reduce lag on the first real inference.
Set Model in State: Once the model is ready, it’s stored in the component’s state using setModel, along with its expected input shape. This makes the model available to other parts of the app for processing images or video.
Memory Management: Temporary tensors used during the warm-up are disposed of using tf.dispose() to avoid memory leaks—especially important in browser-based apps where resources are limited.
This entire lifecycle setup ensures that the model is loaded efficiently and ready for real-time inference as soon as the user interacts with the app.
🎨 Preprocessing the Input
Before passing an image or video frame into the model, it needs to be preprocessed to match the model’s expected input format. In this case, the YOLOv7 model (converted to TensorFlow.js) expects an input shape of [1, 640, 640, 3], meaning a single RGB image with dimensions 640×640 pixels.
Here’s how the preprocess() function handles that:
const preprocess = (
source:
| tf.PixelData
| ImageData
| HTMLImageElement
| HTMLCanvasElement
| HTMLVideoElement
| ImageBitmap,
modelWidth: number,
modelHeight: number,
) => {
const { input, xRatio, yRatio } = tf.tidy(() => {
const img = tf.browser.fromPixels(source)
// padding image to square => [n, m] to [n, n], n > m
const [h, w] = img.shape.slice(0, 2) // get source width and height
const maxSize = Math.max(w, h) // get max size
const imgPadded = img.pad([
[0, maxSize - h], // padding y [bottom only]
[0, maxSize - w], // padding x [right only]
[0, 0],
]) as tf.Tensor<tf.Rank.R3>
const xRatio = maxSize / w // update xRatio
const yRatio = maxSize / h // update yRatio
const input = tf.image
.resizeBilinear(imgPadded, [modelWidth, modelHeight]) // resize frame
.div(255.0) // normalize
.expandDims(0) // add batch
return {
input: input,
xRatio: xRatio,
yRatio: yRatio,
}
})
return { input, xRatio, yRatio }
}
What it does:
- Converts the input to a tensor: The image, canvas, or video frame is converted to a TensorFlow tensor using tf.browser.fromPixels().
- Pads the image to make it square: Since many real-world images are rectangular, the function calculates the larger of the two dimensions (height or width) and pads the shorter side so that the image becomes a square. This avoids distortion when resizing later.
- Calculates scale ratios: The original aspect ratio is preserved by storing the horizontal (xRatio) and vertical (yRatio) scaling factors. These will later be used to map bounding box coordinates back to the original image size.
- Resizes and normalizes the image: The square image is resized to the model’s expected dimensions (modelWidth × modelHeight), normalized to values between 0 and 1, and expanded to include the batch dimension.
- Memory-safe execution with tf.tidy(): The entire process is wrapped in tf.tidy() to automatically dispose of intermediate tensors and prevent memory leaks in the browser.
Output:
The function returns:
- input: the preprocessed image tensor ready to be passed to the model
- xRatio and yRatio: scaling factors to restore original coordinate positions later during post-processing
This preprocessing step ensures that any input image, video frame, or canvas can be fed into the model without shape mismatch errors, while also preserving spatial accuracy for rendering detection results.
🔍 Running Inference and Rendering the Result
Once the input image is preprocessed, the next step is to run it through the model and render the detection results. This is handled by the detect2() function, which performs inference, processes the output, and visualizes the detected objects on a <canvas> element.
export const detect2 = async (
  source:
    | tf.PixelData
    | ImageData
    | HTMLImageElement
    | HTMLCanvasElement
    | HTMLVideoElement
    | ImageBitmap,
  model: { net: tf.GraphModel<string | tf.io.IOHandler>; inputShape: number[] },
  threshold: number,
  canvasRef: HTMLCanvasElement,
  callback = () => {},
) => {
  const [modelWidth, modelHeight] = model.inputShape.slice(1, 3) // get model width and height
  tf.engine().startScope() // start scoping tf engine
  const { input, xRatio, yRatio } = preprocess(source, modelWidth, modelHeight) // preprocess image
  const res = (await model.net.executeAsync(input)) as tf.Tensor<tf.Rank.R2> // run inference
  const dets = res.arraySync() // raw detection rows: boxes, class ids, and scores
  renderBoxesSimple(canvasRef, dets, [xRatio, yRatio], threshold)
  tf.dispose([res]) // clear memory
  callback()
  tf.engine().endScope() // end of scoping
}
What the function does:
- Extracts model dimensions: The model’s expected input width and height are taken from its inputShape and passed to the preprocessing function.
- Starts a memory scope: tf.engine().startScope() is called to ensure that any tensors created within this block are tracked and can be cleaned up afterward. This is important for long-running apps like webcam feeds, where unmanaged memory usage can grow rapidly.
- Preprocesses the input: The input (image, video, canvas, etc.) is processed using the preprocess() function, which returns a normalized, padded, and resized tensor along with the scaling ratios needed to map detections back to the original image.
- Runs model inference: The preprocessed input is passed to executeAsync(), which returns a prediction tensor. This tensor contains the raw detection results: bounding boxes, class IDs, and confidence scores.
- Processes output and renders detections: The output tensor is converted to a JavaScript array with arraySync() and passed to a custom rendering function (renderBoxesSimple). This function draws the bounding boxes and labels directly onto the canvas using the correct scale and position.
- Cleans up memory: After inference is complete, the result tensor is disposed with tf.dispose(), and the scope is ended with tf.engine().endScope(), ensuring all temporary tensors are released.
- Executes optional callback: A callback can be provided to trigger any additional logic after the detection is complete (e.g., logging, UI updates, analytics).
Summary:
This function acts as the main detection loop. It takes an image, processes it, feeds it into the model, and then displays the result, all within a memory-safe scope. It’s designed to be reused in real-time pipelines, like webcam-based detection systems or image upload flows.
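To show how this fits into the UI, here's a rough usage sketch for a static image. The refs, the import path, and the 35% threshold are my assumptions, not code from the original project:

import { useRef } from 'react';
import { detect2 } from './utils/detect'; // assumed path — wherever detect2 lives in your project

// inside a component that already holds `model` in state (see the loading effect above):
const imageRef = useRef<HTMLImageElement>(null);
const canvasRef = useRef<HTMLCanvasElement>(null);

return (
  <div className="relative">
    <img
      ref={imageRef}
      src="/sample.jpg"
      onLoad={() => {
        if (model.net && imageRef.current && canvasRef.current) {
          // threshold is a percentage; detect2 divides it by 100 before comparing with the score
          detect2(imageRef.current, model, 35, canvasRef.current);
        }
      }}
    />
    {/* the canvas overlays the image; sizing and positioning depend on your layout */}
    <canvas ref={canvasRef} width={640} height={640} className="absolute top-0 left-0" />
  </div>
);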
🖍️ Rendering the Bounding Boxes and Labels on Canvas
Once the model produces detection results, the final step is to visualize them. This is handled by the renderBoxesSimple() function, which draws bounding boxes and corresponding class labels onto an HTML <canvas>.
export const renderBoxesSimple = (
  canvasRef: HTMLCanvasElement,
  boxes_data: number[][],
  ratios: number[],
  threshold: number,
) => {
  const ctx = canvasRef.getContext('2d')
  if (!ctx) return
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height) // clean canvas

  // font configs
  const font = `${Math.max(
    Math.round(Math.max(ctx.canvas.width, ctx.canvas.height) / 40),
    14,
  )}px Arial`
  ctx.font = font
  ctx.textBaseline = 'top'

  boxes_data.forEach((det) => {
    // eslint-disable-next-line @typescript-eslint/no-unused-vars
    const [_, x0, y0, x1, y1, cls_id, score] = det
    if (score < threshold / 100) return
    const [xRatio, yRatio] = ratios

    // convert coordinates back to the original image size
    const origX0 = x0 * xRatio
    const origY0 = y0 * yRatio
    const origX1 = x1 * xRatio
    const origY1 = y1 * yRatio

    const colors = new Colors()
    const color = colors.get(cls_id)

    // draw the translucent box background
    ctx.fillStyle = Colors.hexToRgba(color, 0.2)!
    ctx.fillRect(origX0, origY0, origX1 - origX0, origY1 - origY0)

    // draw the bounding box border
    ctx.strokeStyle = color
    ctx.lineWidth = 2
    ctx.strokeRect(origX0, origY0, origX1 - origX0, origY1 - origY0)

    // draw the label background
    ctx.fillStyle = color
    const text = `${labels[cls_id]}: ${(score * 100).toFixed(2)}%` // score is a 0–1 fraction, shown as a percentage
    const textWidth = ctx.measureText(text).width
    const textHeight = parseInt(font, 10) // base 10
    const yText = origY0 - (textHeight + ctx.lineWidth)
    ctx.fillRect(
      origX0 - 1,
      yText < 0 ? 0 : yText, // handle overflow label box
      textWidth + ctx.lineWidth,
      textHeight + ctx.lineWidth,
    )

    // draw the label text
    ctx.fillStyle = '#ffffff'
    ctx.fillText(text, origX0 - 1, yText < 0 ? 0 : yText)
  })
}
What the function does:
Prepares the canvas
- It starts by getting the canvas rendering context (ctx) and clearing any existing drawings using clearRect().
- The font size is set dynamically based on the canvas size to ensure label text scales appropriately.
Iterates through detection results
- For each detection in boxes_data, the function extracts the bounding box coordinates (x0, y0, x1, y1), class ID (cls_id), and confidence score (score).
- If the confidence score is below the defined threshold, the detection is skipped.
Scales bounding boxes
- Coordinates are rescaled back to the original image dimensions using the xRatio and yRatio values obtained during preprocessing.
Draws bounding boxes and background
- A semi-transparent background is drawn to highlight the detected object.
- A colored border (stroke) is rendered around the object using a consistent color assigned to the class ID.
Adds labels
- A solid background is drawn behind the label text to improve readability.
- The label includes the class name and confidence score, and is positioned just above the bounding box.
- White text (#ffffff) is used for high contrast.
Color management
- The function uses a helper class Colors() (not shown here) to assign consistent, visually distinct colors for each class.
Example output:
- A green box around a person with the label person: 94.23%
- A blue box around a car with the label car: 88.17%
This function ensures that detection results are not just computed, but clearly and professionally visualized — making it useful for demos, prototypes, and real-time visual feedback in the browser.
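Since the Colors helper isn't shown in the post, here's a minimal stand-in so the rendering code above can run on its own. This is my own sketch, not the project's actual implementation; all it needs is a deterministic get(classId) and a static hexToRgba(hex, alpha):

// Minimal stand-in for the Colors helper used by renderBoxesSimple (an assumption, not the original code)
export class Colors {
  private palette = [
    '#FF3838', '#FF701F', '#FFB21D', '#CFD231', '#48F90A',
    '#3DDB86', '#00C2FF', '#2C99A8', '#344593', '#CB38FF',
  ];

  // deterministically map a class id to a palette color
  get(classId: number): string {
    return this.palette[Math.floor(classId) % this.palette.length];
  }

  // convert "#RRGGBB" to an rgba(...) string with the given alpha; returns null for malformed input
  static hexToRgba(hex: string, alpha: number): string | null {
    const match = /^#?([a-f\d]{2})([a-f\d]{2})([a-f\d]{2})$/i.exec(hex);
    if (!match) return null;
    const [r, g, b] = match.slice(1).map((c) => parseInt(c, 16));
    return `rgba(${r}, ${g}, ${b}, ${alpha})`;
  }
}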
5. Results and Performance
After integrating everything, I was able to run real-time object detection entirely in the browser using a React app — no server-side processing, no backend API, and no Python code involved at runtime.
✅ What Worked Well
- Client-side inference with TensorFlow.js worked surprisingly well for images and short video clips.
- Bounding boxes and labels rendered cleanly on top of a canvas element, with consistent performance.
- Warm-up step noticeably improved initial response time, avoiding delays on first detection.
- The model ran on WebGL acceleration, making it fairly efficient even on mid-range laptops.
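If you want to confirm which backend TensorFlow.js actually picked on a given device, a quick check like this helps when debugging performance (my addition, not part of the original code):

import * as tf from '@tensorflow/tfjs';

// e.g. inside an async init function
await tf.setBackend('webgl'); // request the WebGL backend explicitly
await tf.ready();
console.log('TFJS backend:', tf.getBackend()); // "webgl", or "cpu" if WebGL isn't available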
🖼️ Visual Output
I tested the system on a variety of images with multiple objects. The model was able to:
- Detect and classify multiple objects with reasonable accuracy
- Adjust bounding boxes according to the original image ratio
- Display real-time updates when used with webcam or video input
If you’re curious to try it yourself:
🚀 Live Demo
⚠️ Limitations and Considerations
As with any frontend-only machine learning project, there are trade-offs:
- Browser memory usage can spike, especially with large input images or repeated inference
- Model size and load time: The TFJS model (~30–50MB) can take a few seconds to download depending on connection
- Performance varies: On mobile or low-end devices, detection can lag or cause dropped frames
- Output format from YOLOv7 required some adjustment to interpret correctly in TensorFlow.js
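Concretely, with the --end2end export used earlier, each detection comes back as a flat row that renderBoxesSimple destructures. The layout below is inferred from that rendering code, so verify it against your own export:

// Layout of one detection row, inferred from the destructuring in renderBoxesSimple:
//   [batchId, x0, y0, x1, y1, classId, score]
// - batchId        : batch index of the detection (a single image gives batch 0)
// - x0, y0, x1, y1 : bounding-box corners, rescaled with xRatio/yRatio before drawing
// - classId        : index into the class-label list
// - score          : confidence between 0 and 1, compared against threshold / 100
type Detection = [number, number, number, number, number, number, number];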
That said, for prototyping, learning, and lightweight client-side ML applications — this approach works surprisingly well.
6. Final Thoughts
This project started as part of a task at work, but it quickly grew into a deeper exploration of what’s possible with machine learning on the frontend. Running an object detection model like YOLOv7 directly in a browser — without any backend — might not be the most common approach, but it’s a powerful proof-of-concept that opens up a lot of possibilities.
Along the way, I faced several challenges — from converting the model across formats to adapting the output for frontend rendering. But those obstacles were exactly what made this process meaningful — and now, I hope, useful for others too.
If you’re a frontend developer curious about AI, or someone working on rapid prototyping with limited backend infrastructure, I hope this guide provides both inspiration and practical guidance.
🔗 Resources Recap
GitHub Repo: github.com/ihda06/object-detection-yolo
Live Demo: object-detection-yolo-ihda.vercel.app
Model Conversion (Colab):
If you found this helpful, feel free to share it or fork the repo.
And if you’re working on something similar — I’d love to connect, collaborate, or just chat.
Thanks for reading 🙌