Inspiration

Short-form food videos are one of the most consumed content formats today. People constantly save recipes from Instagram Reels, TikTok, and YouTube Shorts, intending to cook them later, yet most of those saved recipes never turn into actual meals.

We encountered this problem ourselves. Even when a recipe looked simple, cooking it required replaying videos, manually extracting ingredients, recalculating portions, searching for products, and comparing prices across grocery apps. The friction between watching and doing was surprisingly high.

We realized the real problem wasn’t content discovery but the lack of an execution layer connecting short-form videos to real-world cooking. That insight became the starting point for Yorigo.

What it does

Yorigo is an AI-powered cooking execution platform that transforms short-form recipe videos into structured recipes and instant grocery carts.

Users can input a cooking video link, and Yorigo automatically:

  • Extracts ingredients, quantities, and cooking steps
  • Adjusts portions and merges multiple recipes
  • Generates a ready-to-buy grocery cart linked to online grocery platforms
  • Provides a step-by-step cooking mode to help users complete the meal

Yorigo focuses on turning saved recipes into finished meals.

How we used Gemini 3

We used Gemini 3 as the core reasoning engine for turning short-form videos into actionable cooking data.

Specifically, Gemini 3 helps Yorigo:

  • Understand multimodal inputs (video frames + on-screen text + spoken narration) to infer ingredients and steps that are often implicit in short-form content
  • Extract and normalize ingredient names, quantities, and units into a consistent schema (e.g., converting “a handful / a pinch” into structured, editable values when possible)
  • Resolve ambiguities by combining signals across modalities (e.g., when captions omit quantities but the visual shows the container or portion)
  • Generate structured outputs (recipe JSON) that our app can use for portion scaling, multi-recipe merging, and cart generation
  • Validate outputs with lightweight rules (required fields, unit normalization, duplicates) to make results reliable for real-world execution

In short: Gemini 3 enables Yorigo to move from “video understanding” to “execution-ready structured recipes.”
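
The normalization and validation steps listed above could look roughly like the sketch below. It is not Yorigo's actual code: the recipe fields (`title`, `ingredients`, `steps`, `name`, `quantity`, `unit`), the `UNIT_ALIASES` table, and the `validate_recipe` function are all hypothetical names chosen for illustration. It assumes the model has already returned a recipe as a plain dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical alias table: maps unit spellings to one canonical form.
UNIT_ALIASES = {
    "tablespoon": "tbsp", "tablespoons": "tbsp", "tbsp": "tbsp",
    "teaspoon": "tsp", "teaspoons": "tsp", "tsp": "tsp",
    "gram": "g", "grams": "g", "g": "g",
    "cup": "cup", "cups": "cup",
}


@dataclass
class Ingredient:
    name: str
    quantity: Optional[float]  # None when the video never states an amount
    unit: Optional[str]


def normalize_unit(unit: Optional[str]) -> Optional[str]:
    """Lower-case the unit and map known spellings to a canonical form."""
    if unit is None:
        return None
    key = unit.strip().lower()
    return UNIT_ALIASES.get(key, key)


def validate_recipe(recipe: dict) -> Tuple[List[Ingredient], List[str]]:
    """Return cleaned ingredients plus a list of human-readable issues."""
    issues = []
    # Rule 1: required top-level fields must be present and non-empty.
    for field in ("title", "ingredients", "steps"):
        if not recipe.get(field):
            issues.append(f"missing required field: {field}")

    cleaned, seen = [], set()
    for raw in recipe.get("ingredients") or []:
        name = str(raw.get("name", "")).strip().lower()
        if not name:
            issues.append("ingredient without a name")
            continue
        # Rule 2: report duplicates instead of silently keeping both.
        if name in seen:
            issues.append(f"duplicate ingredient: {name}")
            continue
        seen.add(name)
        # Rule 3: normalize units so later steps can compare them.
        cleaned.append(
            Ingredient(name, raw.get("quantity"), normalize_unit(raw.get("unit")))
        )
    return cleaned, issues
```

Returning the issues alongside the cleaned data, rather than raising an error, would let the app show a partially extracted recipe and ask the user to fix only the flagged parts.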

How we built it

Yorigo is built as a multimodal AI pipeline optimized for real-world execution:

  1. Multimodal Parsing (Gemini 3)
    We analyze video frames, audio narration, and on-screen text, using Gemini 3 to reason over these signals and extract cooking instructions and ingredients.

  2. Recipe Structuring
    Unstructured video content is converted into standardized recipe cards that support portion scaling and multi-recipe aggregation.

  3. Action Conversion
    Structured recipes are translated into grocery carts with redundant items consolidated, which reduces decision fatigue.

  4. Execution-Focused UX
    A dedicated cooking mode guides users step by step, minimizing friction during the actual cooking process.
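
Steps 2 and 3 above (portion scaling, multi-recipe aggregation, and cart generation) can be illustrated with a minimal sketch. The dictionary layout, `scale_recipe`, and `merge_into_cart` are hypothetical names, not Yorigo's actual code, and the sketch assumes every ingredient already has a numeric quantity and a normalized unit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scale_recipe(recipe: dict, target_servings: int) -> dict:
    """Return a copy of the recipe with quantities scaled to target_servings."""
    factor = target_servings / recipe["servings"]
    scaled = [
        {**ing, "quantity": round(ing["quantity"] * factor, 2)}
        for ing in recipe["ingredients"]
    ]
    return {**recipe, "servings": target_servings, "ingredients": scaled}


def merge_into_cart(recipes: List[dict]) -> Dict[Tuple[str, str], float]:
    """Sum quantities across recipes, keyed by (ingredient name, unit)."""
    cart = defaultdict(float)
    for recipe in recipes:
        for ing in recipe["ingredients"]:
            # The same ingredient in the same unit becomes one cart line.
            cart[(ing["name"], ing["unit"])] += ing["quantity"]
    return dict(cart)


pasta = {
    "servings": 2,
    "ingredients": [
        {"name": "garlic", "quantity": 2, "unit": "clove"},
        {"name": "pasta", "quantity": 200, "unit": "g"},
    ],
}
soup = {
    "servings": 4,
    "ingredients": [{"name": "garlic", "quantity": 4, "unit": "clove"}],
}

# Scale the pasta to four servings, then combine both recipes into one cart.
cart = merge_into_cart([scale_recipe(pasta, 4), soup])
```

Keying the cart on the name and unit together means garlic from both recipes collapses into a single line, while the same ingredient measured in different units stays separate until a unit conversion is applied.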

Challenges we ran into

  • Incomplete and noisy video data
    Short-form videos often omit quantities or skip steps, making reliable extraction difficult.

  • Multimodal alignment
    Synchronizing visual cues, speech, and text overlays required careful prompting and output validation for Gemini 3.

  • Designing for completion, not just accuracy
    High extraction accuracy alone was not enough; we also had to optimize for user follow-through and cooking completion.

Accomplishments that we're proud of

  • Built a working MVP that converts short-form videos into actionable grocery carts
  • Successfully applied Gemini 3 multimodal reasoning to a real-life execution problem
  • Designed an end-to-end pipeline focused on action and completion
  • Demonstrated how AI can reduce friction in everyday decision-making

What we learned

  • Multimodal AI delivers the most value when it drives real-world behavior
  • Execution-oriented UX is as important as model performance
  • Reducing decisions is often more powerful than adding features

This project reinforced our belief that AI should help users do, not just understand.

What's next for Yorigo

Next, we plan to:

  • Improve multimodal parsing accuracy through user feedback loops
  • Expand integrations with online grocery platforms
  • Use cooking execution data to enable smarter personalization
  • Evolve Yorigo into a broader daily-life execution platform

Our long-term goal is simple:
turn saved recipes into real meals, automatically.

Built With

  • dart(flutter)
  • docker
  • geminillm
  • json
  • python(fastapi+services)
  • shell
  • typescript(react)
  • yaml