2025.11.09 Data Extraction Needs UiPath
1 engine_description
1 engine_description
1 engine_description
1 engine_description
1 engine_description
1 draught
1 speed
1 deadweight (name of column in dataset)
1 design_max_speed
1 container_capacity
1 gear
1 hull_description
1 cargo_capacity
1 cargo_capacity
1 fuel_consumption
1 fuel_consumption
1 fuel_consumption
1 engine_description
1 hull_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
engine_description
builder
hull_description
hull_description
hull_description
hull_description
hull_description
hull_description
hull_description
hull_description
cargo_capacity
cargo_capacity
cargo_capacity
cargo_capacity
cargo_capacity
cargo_capacity
cargo_capacity
Data Category
Fuel Consumption (the data are in the engine_description column, column Y in the CSV file)
Engine Type and Manufacturer Identification
Engine Power (kW)
Engine Power (HP)
Engine RPM
Draught Measurements
Speed Analysis
Deadweight Information (the dataset column is named deadweight; column L in the CSV file)
Maximum Speed Design
Container Capacity
Cargo Handling Gear
Cargo Tank and Frame Details
Number of Cargo Tanks
Cargo Handling Gear
Fuel Capacity
Auxiliary Fuel Consumption
Fuel Type
Main Engine Details
Draught and Depth
Transmission Type
Starter Type
Cylinders, Bore, and Stroke
Propulsion System and Thrusters
Generators and Electric Installations
Boilers and Auxiliary Equipment
Boiler Type
Compressed Air Receivers
Shaft and Propeller Details
Bow Thruster
Stern Thruster
Rudder
Shipbuilder Identification (builder column, column V in the CSV file)
Hull Type
Hull Material
Construction Details
Bulkheads
Deck Information
Classification and Ice Class
Additional Notations
Bow Type
Cargo Hold Capacity
Tank Capacity and Volume
Cargo Spaces and Volume Details
Segregated Ballast
Specialized Cargo Types
Heating Coils and Temperature
Bunker and Fuel Capacity
Pattern/Keyword
"Fuel consumption:", "t/day at", "knots", "per day", "mt per day", "Speed & Consumption:", "tonnes per day", "ISO 700"
"Main engine:", "Engine builder:", "Diesel engine", "MAN B&W", "Sulzer", "Mitsubishi", "Akasaka", "Doosan", "Hyundai", "Mi
"power:", "kW", "Power (kW)", "kw."
"hp", "bhp", "hp.", "bhp.", "Power (hp)"
"rpm", "Revolution (rpm)"
"m", "ft", "draft", "depth"
"Avg/Min/Max", "knots", "kn", "km/h", "Service speed"
"tons", "DWT"
"Max. Speed:", "kn", "design speed"
"TEU", "Twenty-Foot Equivalent Unit"
"Crane:", "Pump:", "CO2 tanks", "handling gear"
"Cargo tanks:", "Longitudinal frames:", "stainless steel"
"Number of cargo tanks:", "Cargo tanks:", "Tanks:"
"Cargo handling gear:", "Crane", "E/R Overhead Crane"
"Fuel:", "Fuel Capacity:", "Distillate Fuel:", "Residual Fuel:", "Bunkers:"
"Aux Generator:", "Auxiliary Engines", "Generators:", "kW"
"Fuel type:", "MGO", "Distillate Fuel", "LNG", "Residual Fuel", "marine diesel"
"Main engine:", "stroke", "cylinders", "engine power:", "rpm"
"Draught:", "Depth:", "Draft:", "Freeboard:"
"Transmission:", "main propulsion acts directly on the propellershaft"
"Starter Type:", "compressed air direct"
"Cylinders:", "Bore:", "Stroke:", "mm", "Bore x Stroke (mm)"
"Propulsion engine:", "Fixed pitch propeller", "Controllable pitch", "Bow thruster", "Stern thruster"
"Main generator", "Emergency generator", "kVA", "plant", "V", "Ah", "battery"
"Oil fired boiler", "Exh gas boiler", "steam boiler", "boiler", "Pressure (MPa)", "Evaporation rate"
"Boilers:", "steam boiler", "oil fired", "exh.gas heated", "KangRim Industries Co."
"Compressed Air Receivers", "Receiver", "liters", "bar"
"Propeller:", "solid propeller", "keyless", "aft", "propeller revolution:"
"Bow thruster:", "transverse thruster", "kW"
"Stern thruster:", "no"
"Rudder:", numerical values if present
"[Shipbuilder]", "[Location]", "yard", "shipyard"
"Hull Type:", "Single hull", "Double hull", "Double Bottom", "Double Sides"
"Hull Material:", "material: steel", "material Steel"
"Construction Detail", "Statcode5:", "Welded", "Connections:"
"Bulkheads:", "Watertight bulkheads:", "Transverse bulkheads"
"Decks:", "Deck numbers:", "Deckhouse", "Forecastle", "Poop"
"Class:", "Class Symbol:", "Ice Class", "DNV", "RINA", "FS Ice Class"
"Navigation Notation:", "Additional Class Notation", "Service Notations", "Aut-UMS", "MON-SHAFT"
"Bulbous bow"
"Cargo hold capacity:", "hold capacity", "Cargo holds"
"Tank capacity:", "Volume:", "Capacity of tanks:", "Tank capacities"
"Cargo spaces:", "Cargo volume:", "Volume of Cargo Space:", "Cargo capacities:"
"Segregated Ballast", "Ballast"
"Liquid gas", "Liquid/oil", "Asphalt", "CO2", "Gas oil", "MDO", "Fresh water"
"Cargo heating coils", "Maximum Temp", "Flash Point"
"Bunker:", "Fuel Capacity:", "FO", "FW"
Extraction Method
Use regex to capture numerical values for consumption rates in tons or metric tons (mt), often listed as t/day or mt/day.
Extract manufacturer and model details using regex to capture text following these keywords.
Use regex to capture numerical values followed by "kW" or similar variations for kilowatt values.
Use regex to capture numerical values followed by "hp", "bhp", or similar variations for horsepower values.
Extract RPM values using regex to capture numerical values followed by "rpm".
Use regex to extract average, minimum, and maximum draught values.
Use regex to extract average, minimum, and maximum speed values.
Extract numerical values followed by "tons" or "DWT" to capture deadweight tonnage.
Extract maximum design speed values using regex to capture numerical values and units like "kn".
Extract container capacity values using units like "TEU".
Extract equipment and machinery details using regex to capture text and numerical values.
Extract cargo tank count and frame materials if listed, such as "stainless steel".
Extract the number of cargo tanks if specified following these keywords.
Capture details of cargo handling equipment, such as cranes, and their capacities.
Extract fuel capacity for various fuel types, including distillate and residual fuels.
Capture fuel usage or power ratings for auxiliary engines or generators.
Extract fuel types and their capacities where specified.
Extract engine type, stroke, cylinder count, power, and RPM specifications following "Main engine:".
Capture draught, depth, and freeboard measurements, including units like "m" or "mm".
Capture transmission type using keywords "Transmission" and "propulsion acts directly on the propellershaft".
Use regex to capture the starter type as described after "Starter Type:".
Extract cylinder count, bore, and stroke measurements using regex to capture numerical values and units.
Extract propulsion system types, including propeller type and thrusters, using regex to capture descriptive text.
Capture generator details and specifications for electric installations, including voltage (V) and ampere-hour (Ah) values.
Use regex to capture boiler types, pressure values in MPa, and evaporation rate (ton/h) details.
Extract boiler types and heating mechanisms (e.g., oil-fired or exhaust gas heated) following "Boilers:".
Extract details on compressed air receivers, capturing capacity (liters) and pressure (bar).
Capture propeller specifications, including type (solid, keyless), position (aft), and revolution rate.
Use regex to capture bow thruster power rating (e.g., "900 kW") and type (transverse).
Capture if stern thruster is specified (or "no" if absent) following "Stern thruster:".
Extract rudder details, capturing the presence and count if specified.
Extract builder names and locations using regex to capture text patterns.
Extract hull type (e.g., "Single hull", "Double hull") following these keywords.
Extract material (e.g., "Steel") following "Hull Material" or similar phrases.
Capture construction details, such as connections and Statcode, using regex.
Extract bulkhead count and type (e.g., "Transverse") following "Bulkheads:".
Capture deck-related information, including deck count and structure details.
Capture classification society and ice class (e.g., "DNV", "RINA", "Ice Class 1A").
Extract additional notations related to navigation and class, following keywords.
Capture "Bulbous bow" or other specific bow types if present in the description.
Extract cargo hold capacity in specified units (e.g., m³, CBM, tonnes) following these terms.
Extract tank capacity values, capturing units such as "m³", "CBM", or "mt".
Capture cargo space count and volume details in m³ or other units.
Extract ballast capacities when specified in terms like "Segregated Ballast" or "Ballast".
Extract capacities for specialized cargo types if mentioned.
Capture information about heating coils and temperature limits.
Extract fuel and freshwater capacities, including "FO" (Fuel Oil) and "FW" (Fresh Water).
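The kW, HP, and RPM extraction methods listed above could be sketched roughly as follows. This is only an illustration: the function name `extract_power_and_rpm`, the result keys, and the assumption that commas are thousands separators are not part of the dataset or of any UiPath activity.

```python
import re

# A number such as "16660", "16,660" or "12.5" (commas assumed to be thousands separators).
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"

# Separate patterns so that kW and HP values are never confused with each other.
KW_RE = re.compile(NUMBER + r"\s*kw\b", re.IGNORECASE)
HP_RE = re.compile(NUMBER + r"\s*b?hp\b", re.IGNORECASE)
RPM_RE = re.compile(NUMBER + r"\s*rpm\b", re.IGNORECASE)


def extract_power_and_rpm(description):
    """Return the first kW, HP and RPM values found in a free-text description."""
    patterns = (("power_kw", KW_RE), ("power_hp", HP_RE), ("rpm", RPM_RE))
    result = {}
    for key, pattern in patterns:
        match = pattern.search(description)
        # Missing values are reported as None instead of raising an error.
        result[key] = float(match.group(1).replace(",", "")) if match else None
    return result


print(extract_power_and_rpm("Main engine: engine power: 16660 kW / 105 rpm"))
```

Keeping one pattern per unit means a cell that lists both kW and bhp fills both fields independently.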
Remarks
Regex should handle complex phrases like "68 t/day at avg. speed" and "19.00 knots on 39.00 tonnes per day". Capture auxiliary fuel and fuel by mode if present.
Expanded manufacturer names to include additional known brands for accurate identification.
Ensure that regex can differentiate between kW and HP values if both are present.
Added separate row for HP to ensure specific extraction for horsepower in various formats.
Regex should handle different formats, such as "Revolution (rpm)" or "rpm" alone.
Add alternative terms like "draft" and "depth" to capture all variations.
Add "Service speed" for entries listing typical speeds.
Consistency in units (tons or DWT) is essential for clear interpretation.
Ensure that regex handles unit variations and captures the full speed designation.
"TEU" is standard, but clarify if "Twenty-Foot Equivalent Unit" appears in the data.
Clarify if each piece of equipment should be listed separately or as a combined entry.
Cargo tank materials and framing are essential for safety and chemical compatibility.
Capture numerical values following keywords like "Number of cargo tanks" or "Tanks".
Cargo handling equipment details provide insight into loading/unloading capabilities.
Capacity data are often given per fuel type, with a volume (e.g., "264 cu m").
Include auxiliary generator details, often measured in kW or tonnes per day.
Different fuel types require specific storage and handling; capture all mentioned types.
Ensure regex captures multi-part descriptions, including power and RPM together (e.g., "16660 kW / 105 rpm").
Ensure all variations of depth-related terms are captured, including "Draft" and "Freeboard".
Direct transmission details indicate engine configuration, which affects efficiency.
Starter type is critical for understanding engine starting mechanism.
Ensure that bore and stroke are extracted together in combined formats (e.g., "Bore x Stroke (mm)").
Added auxiliary propulsion terms to capture more complex configurations.
Capture multiple generators/batteries as separate entries if listed.
Expanded to include various boiler types and pressure/evaporation rates.
Multiple boiler types might be specified; capture both "oil fired" and "exh.gas heated" if present.
Important for auxiliary equipment specifications; ensure regex captures units correctly.
Extract details like "propeller revolution: 105 min" to understand propulsion speed and efficiency.
Ensure that both thruster type and power are captured.
Presence or absence of stern thruster affects maneuverability.
Presence of rudder affects steering capabilities; capture numerical count if available.
Adding terms like "yard" or "shipyard" improves identification accuracy.
Ensure that single/double hull variations are captured, including "Double Sides" and "Double Bottom".
Ensure that "Steel" is captured as hull material regardless of formatting.
Statcode and welding connections indicate structural standards and techniques.
Bulkheads often indicate compartmentalization, affecting stability and safety.
Ensure regex can handle terms like "Decks: 1 dk" and structures like "Deckhouse".
Classification details are essential for compliance and performance in specific conditions.
Notations provide information on vessel capabilities and compliance with various standards.
Bow type can affect hydrodynamics and fuel efficiency.
Ensure regex captures capacity with units like m³, CBM, or metric tonnes.
Ensure regex handles different units and formats for capacity.
Cargo spaces and volume provide a general indication of storage capacity.
Capture numerical values and units like "m³" or "mt" following these terms.
Different cargo types require specific storage conditions; capture capacities for each.
These details are crucial for cargo that requires temperature control.
Capture both fuel and freshwater capacities to understand onboard fuel and freshwater reserves.
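The fuel consumption remark (phrases such as "68 t/day at avg. speed" and "19.00 knots on 39.00 tonnes per day") could be handled along these lines. This is a minimal sketch: the function name `parse_fuel_consumption` and the output keys are hypothetical, and only the first consumption and speed value in each cell are returned.

```python
import re

# A number such as "68", "39.00" or "1,250.5" (commas assumed to be thousands separators).
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"

# Daily consumption: "t/day", "mt/d", "mt per day", "tonnes per day", ...
CONSUMPTION_RE = re.compile(
    NUMBER + r"\s*(?:tonnes?|tons?|mt|t)\s*(?:/|per)\s*(?:day|d)\b",
    re.IGNORECASE,
)
# Speed in knots: "19.00 knots", "14 kn"
SPEED_RE = re.compile(NUMBER + r"\s*(?:knots?|kn)\b", re.IGNORECASE)


def _first_number(pattern, text):
    """Return the first numeric capture of pattern in text, or None."""
    match = pattern.search(text)
    return float(match.group(1).replace(",", "")) if match else None


def parse_fuel_consumption(description):
    """Extract daily consumption and, if stated numerically, the related speed."""
    return {
        "tonnes_per_day": _first_number(CONSUMPTION_RE, description),
        "speed_knots": _first_number(SPEED_RE, description),
    }


print(parse_fuel_consumption("Speed & Consumption: 19.00 knots on 39.00 tonnes per day"))
print(parse_fuel_consumption("Fuel consumption: 68 t/day at avg. speed"))
```

When a cell lists several speed and consumption pairs, or auxiliary fuel by mode, `re.finditer` would be needed instead of `search`; that extension is left out here.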
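For the capacity rows (cargo hold, tank, ballast, bunker), one possible way to capture values together with their units is sketched below. The function name `extract_capacities`, the normalised unit labels "m3" and "t", and the rule that skips values followed by "per" or "/" (so daily consumption rates are not read as capacities) are all assumptions made for illustration.

```python
import re

# A number such as "264", "1,200" or "85.5" (commas assumed to be thousands separators).
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"

# Volume units (m³, m3, CBM, cu m) and mass units (mt, tonnes, tons).
# The lookaheads reject longer words and rates such as "tonnes per day" or "mt/d".
CAPACITY_RE = re.compile(
    NUMBER + r"\s*(m³|m3|cbm|cu\.?\s*m|mt|tonnes?|tons?)(?![a-z])(?!\s*(?:/|per\b))",
    re.IGNORECASE,
)

VOLUME_UNITS = {"m³", "m3", "cbm", "cum"}


def extract_capacities(text):
    """Return (value, unit) pairs, with unit normalised to 'm3' or 't'."""
    found = []
    for match in CAPACITY_RE.finditer(text):
        value = float(match.group(1).replace(",", ""))
        # Drop spaces and dots so "cu m" and "cu.m" compare equal.
        unit_key = re.sub(r"[\s.]", "", match.group(2).lower())
        unit = "m3" if unit_key in VOLUME_UNITS else "t"
        found.append((value, unit))
    return found


print(extract_capacities("Fuel Capacity: 264 cu m, Tank capacity: 1,200 CBM; FW 85.5 mt"))
```

Returning every match as a separate pair keeps per-fuel-type or per-tank capacities distinct, which can then be linked to the preceding keyword (for example "FO" or "FW") in a later step.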