Our project, CorroSight, was inspired by the critical need to modernize pipeline integrity management. Historically, pipeline operators have struggled with the manual, error-prone alignment of In-Line Inspection (ILI) data collected years apart by different vendors. We saw an opportunity to apply AI and advanced data science to a field where they remain underused, transforming fragmented datasets into a unified "digital twin" of pipeline health.

Inspiration

The inspiration came from the physical complexity of pipeline inspections. Imagine a "smart pig" traveling 57,000 feet through a pipe while its odometer slips and slides along the way. We were fascinated by the challenge of "odometer drift," where a physical feature such as a weld appears at different distances in different years, and by the potential for AI to bridge the gap between these inconsistent data "snapshots."

What it does

CorroSight is an AI-powered pipeline integrity platform that automates the complex process of aligning and analyzing In-Line Inspection (ILI) data from multiple years and vendors. Using stable structural features such as girth welds as anchors, the system corrects the distance measurement errors known as "odometer drift," allowing corrosion spots to be tracked precisely across datasets. It then uses the Hungarian Algorithm to identify matching defects and regression analysis to calculate their growth rates and predict future pipe wall loss. Integrated with xAI’s Grok, the platform offers a "Virtual ILI Predictor" and an AI chat copilot that simulate future risks and generate prioritized repair schedules. The result is a predictive digital twin, built from fragmented spreadsheets, that helps prevent environmental leaks and ensure operational safety.

How we built it

We built CorroSight as an AI-powered platform using a modern tech stack: FastAPI for the backend engine and Angular 17 for the interactive frontend.

  1. The Data Engine: We developed a configuration-driven normalization layer to handle 59+ different event types and inconsistent column naming across three inspection runs spanning 15 years (2007, 2015, and 2022).
  2. The Alignment Algorithm: We used 1,603 common girth weld joints as "ground-truth anchors," applying piecewise linear interpolation to eliminate odometer drift.
  3. The Matching Logic: To pair defects across years, we implemented the Hungarian Algorithm for globally optimal matching, ensuring no anomaly "stole" a match from another.
  4. The AI Layer: We integrated xAI's Grok to provide a "Chat Copilot" and a "Virtual ILI Predictor" that simulates future inspection results based on historical growth trends.
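
The alignment and matching steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CorroSight's actual code: the function names `align_distances` and `match_anomalies`, the distance and clock tolerances, and the cost weighting are all hypothetical. `np.interp` performs the piecewise linear mapping between matched girth welds. SciPy's `linear_sum_assignment` solves the same optimal assignment problem as the Hungarian Algorithm (SciPy implements a modified Jonker-Volgenant method rather than the classic Hungarian procedure).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_distances(dist, weld_src, weld_ref):
    """Hypothetical helper: map one run's odometer distances onto the reference run.

    weld_src / weld_ref hold the distances (ft) of the same girth welds in the
    source and reference runs, sorted ascending. Between consecutive welds the
    mapping is linear, so every joint gets its own local stretch factor.
    """
    weld_src = np.asarray(weld_src, dtype=float)
    weld_ref = np.asarray(weld_ref, dtype=float)
    # Piecewise linear interpolation between weld anchors; values beyond the
    # first/last weld are clamped to the end anchors by np.interp.
    return np.interp(np.asarray(dist, dtype=float), weld_src, weld_ref)


def match_anomalies(prev, curr, max_dist_ft=3.0, max_clock_h=1.0, clock_weight=2.0):
    """Hypothetical helper: globally optimal one-to-one anomaly matching.

    prev, curr: sequences of (aligned distance ft, clock position in hours).
    The tolerances and clock_weight are illustrative assumptions.
    Returns (prev_index, curr_index) pairs.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Pairwise axial and circumferential separations, shape (len(prev), len(curr)).
    d_dist = np.abs(prev[:, 0][:, None] - curr[:, 0][None, :])
    d_clock = np.abs(prev[:, 1][:, None] - curr[:, 1][None, :])
    d_clock = np.minimum(d_clock, 12.0 - d_clock)  # 11:45 and 0:15 are 0.5 h apart
    feasible = (d_dist <= max_dist_ft) & (d_clock <= max_clock_h)
    # Infeasible pairs get a prohibitive cost so the solver avoids them when it can.
    cost = np.where(feasible, d_dist + clock_weight * d_clock, 1e6)
    rows, cols = linear_sum_assignment(cost)
    # Drop any forced pairing that falls outside tolerance.
    return [(int(i), int(j)) for i, j in zip(rows, cols) if feasible[i, j]]


# Toy data: the weld at 100 ft in the source run sits at 102 ft in the reference run.
print(align_distances([50, 150], [0, 100, 200], [0, 102, 206]))
# Two anomalies from an older run and two candidates from a newer run.
print(match_anomalies([[10.0, 3.0], [11.0, 3.0]], [[10.6, 3.0], [12.5, 3.0]]))
```

In the matching example, a greedy pass that locks in the single closest pair first would give `curr[0]` to `prev[1]` (0.4 ft apart) and leave `prev[0]` with a 2.5 ft match. Minimizing the total cost instead pairs them as (0, 0) and (1, 1), which is the "no stolen matches" behavior described above. Pairs outside tolerance are dropped, so newly appeared anomalies stay unpaired.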

Challenges we ran into

  1. Vendor Inconsistency: Each vendor used different clock formats and measurement granularities. We solved this by creating a universal parser that normalizes all positions to a decimal 12-hour scale.
  2. Non-Linear Drift: Odometer drift isn't constant; it changes as the tool speeds up or slows down. Piecewise interpolation was essential here because it applies a separate local correction within each roughly 35-foot joint, between consecutive girth welds.
  3. Data Scarcity for ML: With only three data points per defect (2007, 2015, 2022), traditional ML models would overfit. We chose linear and quadratic regression for growth trends instead, as they give the most "honest" fit the available data can support.
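
The clock normalization and growth fitting described in these challenges can be sketched as follows, again as a hypothetical illustration. The accepted input formats ("H:MM" or "H:MM:SS" strings, decimal hours, or degrees via an explicit flag), the function names, and the sample depths are assumptions rather than details from the vendor files. `np.polyfit` fits the linear or quadratic trend.

```python
import numpy as np


def clock_to_decimal(value, degrees=False):
    """Hypothetical helper: normalize a clock position to decimal hours in [0, 12).

    Accepts "H:MM" or "H:MM:SS" strings, plain numbers in hours, or, with
    degrees=True, angles in degrees (360 degrees = 12 hours).
    """
    if degrees:
        hours = float(value) / 30.0
    elif isinstance(value, str) and ":" in value:
        parts = value.strip().split(":")
        hours = int(parts[0]) + int(parts[1]) / 60.0
        if len(parts) > 2:
            hours += int(parts[2]) / 3600.0
    else:
        hours = float(value)
    return hours % 12.0  # 12:00 and 0:00 are the same position


def fit_growth(years, depths, degree=1):
    """Fit depth (% wall thickness) against inspection year; return a predictor.

    With three inspections, degree=1 leaves one residual degree of freedom,
    while degree=2 passes exactly through all three points.
    """
    years = np.asarray(years, dtype=float)
    depths = np.asarray(depths, dtype=float)
    if degree >= len(years):
        raise ValueError("need more inspections than the polynomial degree")
    t0 = years[0]  # shift the time origin to keep the fit well conditioned
    coeffs = np.polyfit(years - t0, depths, degree)
    return lambda year: float(np.polyval(coeffs, year - t0))


print(clock_to_decimal("4:30"), clock_to_decimal(135, degrees=True))
# Illustrative depths for one matched anomaly (not project data).
linear = fit_growth([2007, 2015, 2022], [10.0, 18.0, 25.0], degree=1)
print(linear(2030))
```

Returning a closure keeps the fitted coefficients and time origin together, so the same predictor can be evaluated for any future year a simulated inspection needs.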

Accomplishments that we're proud of

We are incredibly proud of building a fully functional end-to-end pipeline that transforms raw, messy Excel data from multiple decades into a cohesive digital twin in under 10 seconds. Successfully implementing the Hungarian Algorithm for global anomaly matching was a major milestone, as it moved beyond simple proximity matching to a mathematically rigorous solution that accounts for the physical reality of corrosion. Additionally, we are proud of our "Virtual ILI Predictor," which uses xAI’s Grok to generate "future" inspection reports. Seeing the system correctly identify that a specific anomaly from 2007 was the same one detected in 2022 despite thousands of feet of odometer drift validated our entire technical approach.

What we learned

This project taught us that in industrial data science, domain physics is just as important as the code. We learned that simple linear alignment isn't enough for 57,000 feet of pipe; instead, we had to use "piecewise" logic to account for the tool’s variable speed. We also gained deep experience in data normalization, learning how to map 59 different event types and three distinct vendor formats into a single universal schema. On the AI side, we learned how to use LLMs effectively for structured engineering tasks, moving beyond simple chat to generating narrative "story cards" that explain the history and risk of a specific physical defect.
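
A configuration-driven normalization layer of this kind might look like the minimal sketch below. The vendor keys, raw column names, event labels, and the `normalize_rows` function are all hypothetical stand-ins, not the actual vendor schemas. The point of the design is that supporting a new vendor format means adding a config entry rather than writing new parsing code.

```python
# Hypothetical per-vendor configuration: raw column name -> universal field,
# and raw event label -> universal event type. Real schemas will differ.
VENDOR_CONFIG = {
    "vendor_a": {
        "columns": {"Log Dist. [ft]": "distance_ft", "O'clock": "clock", "Event": "event"},
        "events": {"GIRTH WELD": "girth_weld", "Metal Loss": "metal_loss"},
    },
    "vendor_b": {
        "columns": {"Distance (ft)": "distance_ft", "Orientation": "clock", "Feature": "event"},
        "events": {"GW": "girth_weld", "CORR": "metal_loss"},
    },
}


def normalize_rows(rows, vendor, config=VENDOR_CONFIG):
    """Hypothetical helper: rename columns and event labels into one schema."""
    spec = config[vendor]
    out = []
    for row in rows:
        # Keep only columns the config knows about, under their universal names.
        rec = {spec["columns"][k]: v for k, v in row.items() if k in spec["columns"]}
        # Unrecognized event labels fall into a catch-all bucket for later review.
        raw_event = str(rec.get("event", "")).strip()
        rec["event"] = spec["events"].get(raw_event, "other")
        out.append(rec)
    return out


rows = [{"Distance (ft)": 35.2, "Orientation": "4:30", "Feature": "GW", "Comment": "x"}]
print(normalize_rows(rows, "vendor_b"))
```

Unmapped columns are dropped and unknown event labels map to "other", so a new vendor quirk surfaces as data to review instead of a parser crash.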

What's next for CorroSight

  1. Interpretability for Onsite Workers: We plan to develop an interactive 3D Pipeline Visualization module that transforms complex sensor data into a realistic digital twin. Instead of reading technical spreadsheets, onsite technicians can use a tablet to rotate, zoom, and "walk through" a high-fidelity 3D model of the pipeline. This visualization will highlight critical segments in red, allowing workers to instantly grasp the physical location and severity of risks before they even break ground.
  2. Computer Vision Integration: To bridge the gap between virtual models and physical reality, we will integrate Computer Vision algorithms. By training models on historical "dig" photos and actual pipe surface imagery, the system will be able to cross-reference visual data with ILI sensor readings. This will allow the platform to identify the "visual signature" of specific types of corrosion and precisely overlay that risk onto the 3D model, ensuring workers locate the exact spot of concern with millimeter precision.
  3. Augmented Reality (AR): We are also exploring possible AR integration.
