Vyorius Test (Computer Vision Intern)

The Vyorius Test involves using a zero-shot vision model to recognize custom object categories from real-time or pre-recorded video, ensuring that none of the detected objects are from the COCO dataset. The project requires implementation in Python with OpenCV and PyTorch, and includes tasks such as displaying annotated video with bounding boxes and confidence scores. Deliverables include a GitHub repository with code, a README, a short write-up on the project's workings and challenges, and a video demonstration.

Use a zero-shot vision model to recognize objects from a real-time or pre-recorded video.
The twist: the object categories provided will not be part of the common COCO dataset.
This tests the model's generalization ability in a live setting.

Task Breakdown

1. Accept input from a webcam or a local video file.

2. Use a list of custom object categories as text prompts (examples below).

3. Run each frame through a zero-shot model.

4. Display annotated video with:

   ◦ Bounding boxes (if supported by the model)

   ◦ Labels & confidence scores

5. Ensure none of the detected objects are from the COCO dataset.

Object Categories (Not in COCO):

You must detect objects that are not in COCO. Here are some examples you can use:

• A lightbulb

• A matchstick

• A monitor

• A lion

• A gaming console

You're welcome to add more, as long as they're not in COCO (no chairs, people, dogs, etc.).
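The COCO check from step 5 can be enforced with a simple label filter. A minimal sketch, assuming detections arrive as `(label, score)` pairs; the `COCO_CLASSES` set below is only a small excerpt of the full 80-class list, and `filter_non_coco` is this sketch's own helper name:

```python
# Excerpt of COCO class names (the real list has 80 entries); any detection
# whose label matches one of these is rejected.
COCO_CLASSES = {
    "person", "bicycle", "car", "dog", "cat", "chair", "tv",
    "bottle", "laptop", "cell phone", "book",
}


def filter_non_coco(detections):
    """Keep only detections whose label is not a COCO class.

    Comparison is case-insensitive and ignores leading articles,
    so "A Dog" is still rejected.
    """
    def norm(label):
        label = label.lower().strip()
        for article in ("a ", "an ", "the "):
            if label.startswith(article):
                label = label[len(article):]
        return label

    return [d for d in detections if norm(d[0]) not in COCO_CLASSES]
```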

Technical Requirements

• Must be implemented in Python

• Use OpenCV for video input

• Use PyTorch and pre-trained zero-shot models like CLIP or OWL-ViT

• Write clean, modular, and well-commented code

• Either display results in a live window or print predictions to the console
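With OWL-ViT, single-frame detection might look like the sketch below. It assumes the `transformers`, `torch`, and `Pillow` packages and the public `google/owlvit-base-patch32` checkpoint; the `detect_owlvit` name and the 0.1 score threshold are this sketch's own choices. In a real pipeline the processor and model would be loaded once, not per frame.

```python
# Prompts taken from the example categories above.
PROMPTS = ["a lightbulb", "a matchstick", "a monitor", "a lion",
           "a gaming console"]


def detect_owlvit(frame_bgr, prompts=PROMPTS, threshold=0.1):
    """Run OWL-ViT zero-shot detection on one BGR frame.

    Returns a list of (label, score, [x1, y1, x2, y2]) tuples.
    Heavy imports are deferred so the module loads without torch installed.
    """
    import torch
    from PIL import Image
    from transformers import OwlViTProcessor, OwlViTForObjectDetection

    processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
    model = OwlViTForObjectDetection.from_pretrained(
        "google/owlvit-base-patch32")

    image = Image.fromarray(frame_bgr[..., ::-1].copy())  # BGR -> RGB
    inputs = processor(text=[prompts], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes)[0]
    return [(prompts[int(label)], float(score), [int(v) for v in box])
            for score, label, box in zip(results["scores"],
                                         results["labels"],
                                         results["boxes"])]
```

A CLIP-based variant would instead score image crops against the text prompts, since CLIP alone does not localize objects; OWL-ViT is the simpler fit for bounding boxes.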


Bonus Points

• Live prompt editing (change detection classes during runtime)

• Frame rate optimization (>=10 FPS)

• Logging predictions to a file (JSON or CSV)

• Using ONNX or TorchScript to accelerate inference

• Visualize detections with a minimal dashboard or UI
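For the logging bonus, one line of JSON per frame is easy to append and to parse later. A minimal sketch; the `log_predictions` name and the record layout are this sketch's own choices, and detections are assumed to be `(label, score, [x1, y1, x2, y2])` tuples:

```python
import json
import time


def log_predictions(path, frame_idx, detections):
    """Append one JSON line per frame: timestamp, frame index, detections."""
    record = {
        "time": time.time(),
        "frame": frame_idx,
        "detections": [
            {"label": label, "score": round(float(score), 4), "box": box}
            for label, score, box in detections
        ],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

The same records could be flattened to CSV rows (one row per detection) if a spreadsheet-friendly format is preferred.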

Deliverables

• GitHub repo or zipped folder

• Code + README with:

   ◦ Setup instructions

   ◦ Model download/usage steps

• Short write-up (1–2 paragraphs) on:

   ◦ How it works

   ◦ Challenges faced

   ◦ What could be improved or added next

• A video demonstration of the above
