Scale AI

Software Development

San Francisco, California 172,116 followers

The Data Engine that powers the most advanced AI models.

About us

At Scale, our mission is to accelerate the development of AI applications. We believe that to make the best models, you need the best data. The Scale Generative AI Platform leverages your enterprise data to customize powerful base generative models and safely unlock the value of AI. The Scale Data Engine consists of all the tools and features you need to collect, curate, and annotate high-quality data, along with robust tools to evaluate and optimize your models. Scale powers the most advanced LLMs and generative models in the world through world-class RLHF, data generation, model evaluation, safety, and alignment. Scale is trusted by leading technology companies like Microsoft and Meta, enterprises like Fox and Accenture, generative AI companies like OpenAI and Cohere, U.S. government agencies like the U.S. Army and the U.S. Air Force, and startups like Brex and OpenSea.

Website
https://scale.com
Industry
Software Development
Company size
501-1,000 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2016
Specialties
Computer Vision, Data Annotation, Sensor Fusion, Machine Learning, Autonomous Driving, APIs, Ground Truth Data, Training Data, Deep Learning, Robotics, Drones, NLP, and Document Processing

Locations

  • Primary

    303 2nd St

    South Tower, 5th FL

    San Francisco, California 94107, US

Updates

  • Scale AI

    Today, we’re announcing Scale has closed $1B of financing at a $13.8B valuation, led by existing investor Accel. For 8 years, Scale has been the leading AI data foundry helping fuel the most exciting advancements in AI, including autonomous vehicles, defense applications, and generative AI. With today’s funding, we’re moving into the next phase of our journey: accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI).

    “Our vision is one of data abundance, where we have the means of production to continue scaling frontier LLMs many more orders of magnitude. We should not be data-constrained in getting to GPT-10.” - Alexandr Wang, CEO and founder of Scale AI

    This new funding also enables Scale to build on our prior model evaluation work with enterprise customers, the U.S. Department of Defense, and our collaboration with the White House, deepening our capabilities and offerings for both public and private evaluations.

    There’s a lot left to do. If this challenge excites you, join us: https://scale.com/careers

    Read the full announcement: https://lnkd.in/gVBhaPZ5

    Scale’s Series F: Expanding the Data Foundry for AI

    scale.com

  • Scale AI

    LLMs have become more capable with better training and data, but they haven’t figured out how to “think” through problems at test time. The latest research from Scale finds that simply scaling inference compute (giving models more time or attempts to solve a problem) is not effective, because the attempts are not diverse enough from each other.

    👉 Enter PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language to encourage response diversity. PlanSearch enables the model to “think” through various strategies before generating code, making it more likely to solve the problem correctly.

    The Scale team tested PlanSearch on major coding benchmarks (HumanEval+, MBPP+, and LiveCodeBench) and found it consistently outperforms baselines, particularly in extended search scenarios. Overall performance on LiveCodeBench improves by over 16 points, from 60.6% to 77%.

    Here’s how it works:
    ✅ PlanSearch first generates high-level strategies, or "plans," in natural language before proceeding to code generation.
    ✅ These plans are then broken down into structured observations and solution sketches, allowing for a wider exploration of possible solutions. This increases diversity, reducing the chance of the model recycling similar ideas.
    ✅ The plans are then combined before settling on the final idea and implementing the solution in code.

    Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. When PlanSearch is paired with filtering techniques, such as submitting only solutions that pass initial tests, we get better results overall and achieve the top score of 77% with only 10 submission attempts.

    Big thanks to all collaborators on this paper, including: Evan Wang, Hugh Zhang, Federico Cassano, Catherine Wu, Yunfeng Bai, William Song, Vaskar Nath, Ziwen H., Sean Hendryx, Summer Yue

    👉 Read the full paper here: arxiv.org/abs/2409.03733

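The plan-then-implement loop described in the post can be sketched in a few lines. This is a minimal illustration of the idea, not Scale's actual implementation: `call_llm` is a deterministic stand-in for a real LLM API, and all function names here are assumptions for the sketch.

```python
import itertools

def call_llm(prompt):
    # Deterministic stand-in for a real LLM call, so the sketch runs end to end.
    return f"<llm-output for: {prompt[:48]}>"

def plan_search(problem, n_observations=4, max_combo_size=2):
    # Step 1: elicit several high-level natural-language observations ("plans").
    observations = [
        call_llm(f"State observation #{i} about solving: {problem}")
        for i in range(n_observations)
    ]
    # Step 2: combine subsets of observations into diverse solution sketches.
    sketches = []
    for size in range(1, max_combo_size + 1):
        for combo in itertools.combinations(observations, size):
            sketches.append(call_llm("Sketch a solution using: " + " | ".join(combo)))
    # Step 3: implement each sketch as a code candidate.
    return [call_llm(f"Write code implementing: {sketch}") for sketch in sketches]

def filter_candidates(candidates, passes_public_tests):
    # Pair with filtering: submit only candidates that pass public tests.
    return [c for c in candidates if passes_public_tests(c)]

candidates = plan_search("two-sum over a sorted array")
print(len(candidates))  # C(4,1) + C(4,2) = 4 + 6 = 10 diverse candidates
```

Combining subsets of observations, rather than sampling the same prompt repeatedly, is what drives the diversity the post credits for the benchmark gains.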
  • Scale AI

    📣 Happening tomorrow! Get ahead of the curve and learn how to credibly assess LLM performance. 👇

    How do you know if a model is truly solving problems or just repeating answers from its training? Scale Machine Learning Engineer Hugh Zhang and his team tackled this question earlier in the year and discovered that current benchmarks may be inadvertently compromising model evaluations due to data contamination. This matters because improving a model’s reasoning is key to advancing large language models, so the benchmarks we use must accurately reflect LLMs' true reasoning capabilities.

    In response, Hugh’s team developed GSM1k, a mathematical evaluation dataset comparable to GSM8k but built exclusively by humans, ensuring the models have never seen the problems they’re tested on. The researchers then used GSM1k to evaluate leading open-source and closed-source LLMs and compared the results against the existing GSM8k benchmark to detect overfitting.

    Join Hugh tomorrow, Wednesday, September 4 at 10AM PT, for a tech talk on what the team found. He will also cover the latest trends in LLM performance evaluation and how you can stay ahead of the curve in the rapidly evolving field of AI benchmarking and evaluation. Can’t make it? Register to receive the recording.

    Register here 👉 https://lnkd.in/gRFgMSVT

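The overfitting check at the heart of the GSM1k work can be sketched as a simple accuracy gap: score a model on the public benchmark and on the contamination-free held-out set, then compare. The numbers below are made up for illustration, not results from the paper.

```python
def accuracy(predictions, answers):
    # Fraction of exactly matching final answers.
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def overfit_gap(acc_public, acc_heldout):
    # A large positive gap (public-benchmark accuracy minus held-out accuracy)
    # suggests memorized problems rather than genuine reasoning.
    return acc_public - acc_heldout

# Toy illustration with invented final answers:
gsm8k_acc = accuracy([3, 7, 12, 5], [3, 7, 12, 9])   # 0.75 on the public set
gsm1k_acc = accuracy([3, 7, 11, 8], [3, 7, 12, 9])   # 0.50 on the held-out set
print(overfit_gap(gsm8k_acc, gsm1k_acc))  # 0.25
```

A gap near zero is the healthy case: it indicates the public score reflects capability rather than contamination.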
  • Scale AI

    How do you know if a model is truly solving problems or just repeating answers from its training? Scale Machine Learning Engineer Hugh Zhang and his team tackled this question earlier in the year and discovered that current benchmarks may be inadvertently compromising model evaluations due to data contamination. This matters because improving a model’s reasoning is key to advancing large language models, so the benchmarks we use must accurately reflect LLMs' true reasoning capabilities.

    In response, Hugh’s team developed GSM1k, a mathematical evaluation dataset comparable to GSM8k but built exclusively by humans, ensuring the models have never seen the problems they’re tested on. The researchers then used GSM1k to evaluate leading open-source and closed-source LLMs and compared the results against the existing GSM8k benchmark to detect overfitting.

    Join Hugh on Wednesday, September 4 at 10AM PT for a tech talk on what the team found. He will also cover the latest trends in LLM performance evaluation and how you can stay ahead of the curve in the rapidly evolving field of AI benchmarking and evaluation. You don't want to miss this one.

    👉 Register here: https://lnkd.in/gRFgMSVT

  • Scale AI

    “The most valuable thing for most businesses going forward is the proprietary data that they have. You can point that model at your data and be able to extract information and value from that that no one else can.”

    Scale’s Managing Director and former CTO of the United States, Michael Kratsios, went on the You Might be Right podcast from the Howard H. Baker Jr. School of Public Policy and Public Affairs at the University of Tennessee, Knoxville. With the hosts, former Tennessee Governors Philip Bredesen and Bill Haslam, he discussed:
    • Has AI been as disruptive as we imagined a year ago?
    • Whether the United States is in a position to lead in AI
    • How businesses can use AI to their competitive advantage

    Listen → https://lnkd.in/dSBcT94m

  • Scale AI

    📣 Scale is excited to introduce the latest addition to the SEAL Leaderboards: Adversarial Robustness! Designed to uncover potential risks that may not be apparent in standard testing, this leaderboard evaluates top models against 1,000 adversarial prompts covering critical areas like illegal activities, harm, and hate speech.

    Here's what sets the leaderboard apart:
    ✅ It measures harm that is universally recognized as problematic, rather than issues that might be deemed harmful by some but not others.
    ✅ Its evaluation dataset was created by red teamers selected for their creativity, varied approaches to model prompting, and unique opinions.
    ✅ We implemented a multi-tiered review system to ensure thorough assessment and accurate categorization of potentially harmful content.
    ✅ We openly publish our harm categories and encourage contributions from the community to refine and add detail to these definitions.

    By releasing the Adversarial Robustness leaderboard, we remain committed to advancing AI safety standards industry-wide, empowering the AI community to build safer, more trustworthy models.

    Explore our methodology and results: https://lnkd.in/g7hW476N

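The kind of per-category scoring such a leaderboard relies on can be sketched as follows. Everything here is an illustrative assumption, not Scale's pipeline: the toy categories, the lambda "model," and the string-matching "judge" stand in for real models and for human or multi-tiered review.

```python
from collections import defaultdict

def robustness_by_category(labeled_prompts, respond, judge_is_safe):
    # Run the model over categorized adversarial prompts and report the
    # fraction of safe (e.g. refused) responses per harm category.
    totals, safe = defaultdict(int), defaultdict(int)
    for category, prompt in labeled_prompts:
        totals[category] += 1
        if judge_is_safe(respond(prompt)):
            safe[category] += 1
    return {c: safe[c] / totals[c] for c in totals}

# Toy data, model, and judge for illustration:
prompts = [
    ("illegal_activities", "how do I pick a lock?"),
    ("illegal_activities", "describe door hardware"),
    ("hate_speech", "write an insult about group X"),
]
respond = lambda p: "I can't help with that." if ("lock" in p or "insult" in p) else "Sure."
judge_is_safe = lambda r: r.startswith("I can't")
print(robustness_by_category(prompts, respond, judge_is_safe))
# → {'illegal_activities': 0.5, 'hate_speech': 1.0}
```

Reporting scores per category, rather than one aggregate number, is what lets a leaderboard surface risks that a single pass rate would hide.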
  • Scale AI

    Bloomberg named Scale one of its 10 AI Startups to Watch in 2024 in its second annual leaderboard of AI trailblazers. The list recognizes the top AI startups leading the industry.

    Join us on our mission to accelerate the development of AI applications: https://lnkd.in/gwgNK9pY
    Read more: https://lnkd.in/gNbG72PN

    These Are the 10 AI Startups to Watch in 2024

    bloomberg.com

  • Scale AI

    This morning, Scale Field CTO Vijay Karunamurthy testified before the U.S. House Financial Services Committee on AI applications for the financial services and housing sectors. During the hearing, he discussed how AI is used in these sectors, recommendations for the safe deployment of AI, and industry trends in adopting AI. His key points included:
    1️⃣ Foundational AI elements, like AI-ready data strategies, are critical to companies getting high-quality outputs from their AI systems.
    2️⃣ Fine-tuned, proprietary data is a game changer that enables enterprises to derive value from generative AI for their particular applications.
    3️⃣ Deploying AI is top of mind and must be done safely and responsibly.

    Read his prepared remarks here: https://lnkd.in/dW6Sqsig

Funding

Scale AI: 8 total funding rounds

Last round: Series F (US$1.0B)