Yorigo

Inspiration

Short-form food videos are one of the most consumed content formats today. People constantly save recipes from Instagram Reels, TikTok, and YouTube Shorts, intending to cook them later, yet most of those saved recipes never turn into actual meals.

We encountered this problem ourselves. Even when a recipe looked simple, cooking it meant replaying the video, manually extracting ingredients, recalculating portions, searching for products, and comparing prices across grocery apps. The friction between watching and doing was surprisingly high.

We realized the real problem wasn’t content discovery, but the lack of an execution layer that connects short-form videos to real-world cooking. That insight became the starting point for Yorigo.

What it does

Yorigo is an AI-powered cooking execution platform that transforms short-form recipe videos into structured recipes and instant grocery carts.

Users can input a cooking video link, and Yorigo automatically:

Extracts ingredients, quantities, and cooking steps
Adjusts portions and merges multiple recipes
Generates a ready-to-buy grocery cart linked to online grocery platforms
Provides a step-by-step cooking mode to help users complete the meal

Yorigo focuses on turning saved recipes into finished cooking.
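
As an illustration of the portion adjustment and recipe merging described above, here is a minimal sketch. The `Ingredient` type and the `scale` and `merge` helpers are hypothetical names chosen for this example, not Yorigo's actual code, and the sketch assumes quantities scale linearly and that ingredients only combine when name and unit match exactly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingredient:
    # Hypothetical shape of one extracted ingredient line.
    name: str
    quantity: float
    unit: str


def scale(ingredients, from_servings, to_servings):
    """Scale every quantity linearly from one serving count to another."""
    factor = to_servings / from_servings
    return [Ingredient(i.name, i.quantity * factor, i.unit) for i in ingredients]


def merge(*recipes):
    """Sum quantities of ingredients that share the same name and unit."""
    totals = defaultdict(float)
    for recipe in recipes:
        for i in recipe:
            # Names are lowercased so "Garlic" and "garlic" land in one bucket.
            totals[(i.name.strip().lower(), i.unit)] += i.quantity
    return [Ingredient(name, qty, unit) for (name, unit), qty in sorted(totals.items())]


pasta = [Ingredient("Garlic", 2, "clove"), Ingredient("olive oil", 30, "ml")]
soup = [Ingredient("garlic", 1, "clove")]
shopping_list = merge(scale(pasta, 2, 4), soup)
```

Here the pasta is doubled from two to four servings before merging, so the garlic from both recipes ends up on a single line. A real system would likely also need unit conversion (for example grams to kilograms) before merging, which this sketch leaves out.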

How we used Gemini 3

We used Gemini 3 as the core reasoning engine for turning short-form videos into actionable cooking data.

Specifically, Gemini 3 helps Yorigo:

Understand multimodal inputs (video frames + on-screen text + spoken narration) to infer ingredients and steps that are often implicit in short-form content
Extract and normalize ingredient names, quantities, and units into a consistent schema (e.g., converting “a handful / a pinch” into structured, editable values when possible)
Resolve ambiguities by combining signals across modalities (e.g., when captions omit quantities but the visual shows the container or portion)
Generate structured outputs (recipe JSON) that our app can use for portion scaling, multi-recipe merging, and cart generation
Validate outputs with lightweight rules (required fields, unit normalization, duplicates) to make results reliable for real-world execution
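
The lightweight validation rules in the last item could look roughly like the sketch below. The recipe schema (`title`, `ingredients`, `steps`), the `UNIT_ALIASES` table, and the `validate_recipe` function are assumptions made for illustration; the writeup does not specify the actual schema or rule set.

```python
# Hypothetical alias table mapping spelled-out units to one canonical form.
UNIT_ALIASES = {
    "tbsp": "tbsp", "tablespoon": "tbsp", "tablespoons": "tbsp",
    "tsp": "tsp", "teaspoon": "tsp", "teaspoons": "tsp",
    "g": "g", "gram": "g", "grams": "g",
    "ml": "ml", "milliliter": "ml", "milliliters": "ml",
}

# Assumed top-level fields of the structured recipe JSON.
REQUIRED_FIELDS = ("title", "ingredients", "steps")


def validate_recipe(recipe: dict) -> list:
    """Normalize units in place and return a list of human-readable problems."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not recipe.get(field):  # missing or empty both count as a problem
            problems.append(f"missing field: {field}")

    seen = set()
    for item in recipe.get("ingredients") or []:
        name = str(item.get("name", "")).strip().lower()
        if not name:
            problems.append("ingredient without a name")
            continue
        unit = item.get("unit")
        if unit is not None:
            canonical = UNIT_ALIASES.get(str(unit).strip().lower())
            if canonical is None:
                # Vague units such as "pinch" are flagged for the user to edit.
                problems.append(f"unknown unit for {name}: {unit}")
            else:
                item["unit"] = canonical
        if name in seen:
            problems.append(f"duplicate ingredient: {name}")
        seen.add(name)
    return problems


recipe = {
    "title": "Garlic pasta",
    "ingredients": [
        {"name": "Olive oil", "quantity": 2, "unit": "Tablespoons"},
        {"name": "olive oil", "quantity": 1, "unit": "tbsp"},
        {"name": "salt", "quantity": None, "unit": "pinch"},
    ],
    "steps": [],
}
problems = validate_recipe(recipe)
```

Returning a list of problems instead of raising on the first one fits the editable-values idea above: the app can show every flagged item at once and let the user correct it.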

In short: Gemini 3 enables Yorigo to move from “video understanding” to “execution-ready structured recipes.”

How we built it

Yorigo is built as a multimodal AI pipeline optimized for real-world execution:

Multimodal Parsing (Gemini 3)
We analyze video frames, audio narration, and on-screen text, then use Gemini 3 to reason over these signals and extract cooking instructions and ingredients.
Recipe Structuring
Unstructured video content is converted into standardized recipe cards that support portion scaling and multi-recipe aggregation.
Action Conversion
Structured recipes are translated into optimized grocery carts, reducing redundant items and decision fatigue.
Execution-Focused UX
A dedicated cooking mode guides users step by step, minimizing friction during the actual cooking process.
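
The Action Conversion step can be sketched as follows, assuming the merged ingredient totals are already available. The `CATALOG` entries, SKUs, and package sizes are made up for this example, and `build_cart` is a hypothetical helper; how Yorigo actually matches ingredients to grocery products is not described in this writeup.

```python
import math

# Hypothetical product catalog; a real one would come from a grocery platform.
CATALOG = {
    "olive oil": {"sku": "OIL-500", "unit": "ml", "package_size": 500},
    "garlic": {"sku": "GAR-10", "unit": "clove", "package_size": 10},
}


def build_cart(needs, catalog=CATALOG):
    """Turn {(name, unit): total quantity} into {sku: package count}.

    Ingredients with no catalog match (or a mismatched unit) are returned
    separately so the user can resolve them by hand.
    """
    cart, unmatched = {}, []
    for (name, unit), quantity in needs.items():
        product = catalog.get(name)
        if product is None or product["unit"] != unit:
            unmatched.append(name)
            continue
        # Round up: you cannot buy a fraction of a package.
        cart[product["sku"]] = math.ceil(quantity / product["package_size"])
    return cart, unmatched


cart, unmatched = build_cart({
    ("olive oil", "ml"): 60,
    ("garlic", "clove"): 12,
    ("saffron", "g"): 1,
})
```

Because the input is keyed by ingredient name and unit, an ingredient that appears in several merged recipes produces one cart line, which is one way to reduce redundant items.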

Challenges we ran into

Incomplete and noisy video data
Short-form videos often omit quantities or skip steps, making reliable extraction difficult.
Multimodal alignment
Synchronizing visual cues, speech, and text overlays required careful prompting and output validation for Gemini 3.
Designing for completion, not just accuracy
High extraction accuracy alone was not enough; we also had to optimize for user follow-through and cooking completion.

Accomplishments that we're proud of

Built a working MVP that converts short-form videos into actionable grocery carts
Successfully applied Gemini 3 multimodal reasoning to a real-life execution problem
Designed an end-to-end pipeline focused on action and completion
Demonstrated how AI can reduce friction in everyday decision-making