
Validating Vehicle Telemetry by Reconciling Disparate Datasets
Bridging the gap between raw data and trip-level summaries to enhance data quality and fuel efficiency analysis.
The Problem to Solve
The objective was to reconcile a detailed, high-frequency telemetry dataset with an aggregated, trip-level dataset. This meant identifying consistent driving segments, computing the distance and fuel consumed for each segment from the detailed data, and comparing those computed values against the aggregated dataset to validate data quality and generate performance metrics.
Our Solution Approach
Segmentation of Initial Dataset
Split the trips in the 30-second-interval dataset into segments wherever the direction of travel changed by 180 degrees, so that each segment represents one continuous stretch of travel.
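One way to sketch this segmentation step in Python is shown below. It is an illustration under stated assumptions, not the project's actual code: the function names, the input shape (a list of latitude/longitude pairs in time order), and the 30-degree tolerance around a full 180-degree reversal are all hypothetical. The heading of each leg is taken as the initial bearing between consecutive GPS fixes, and a new segment starts when that heading differs from the previous one by close to 180 degrees.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def angular_diff(a, b):
    """Smallest absolute difference between two bearings, in degrees [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def split_on_reversal(points, tolerance_deg=30.0):
    """Split a time-ordered list of (lat, lon) fixes into segments.

    Returns a list of segments, each a list of indices into `points`.
    The turning point is shared: it ends one segment and starts the next.
    `tolerance_deg` is an assumed slack around an exact 180-degree reversal.
    """
    if len(points) < 2:
        return [list(range(len(points)))]
    segments = [[0]]
    prev_bearing = None
    for i in range(1, len(points)):
        if points[i] == points[i - 1]:
            # Stationary sample: no heading can be derived, keep it in place.
            segments[-1].append(i)
            continue
        current = bearing_deg(*points[i - 1], *points[i])
        if (prev_bearing is not None
                and angular_diff(current, prev_bearing) >= 180.0 - tolerance_deg):
            # Direction reversed: open a new segment at the turning point.
            segments.append([i - 1])
        segments[-1].append(i)
        prev_bearing = current
    return segments
```

For a vehicle that drives north for two intervals and then returns south, this yields two segments that share the turning point. With real GPS data, low-speed jitter can produce spurious reversals, so a minimum leg length or speed filter would likely be needed before applying a rule like this.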
Feature Extraction per Segment
Calculated, for each segment, the distance traveled (from latitude/longitude) and the fuel consumed (from changes in fuel volume).
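The two per-segment features can be sketched as follows. This is a minimal illustration, not the project's implementation: the haversine great-circle formula is one common way to get distance from latitude/longitude, and the field names (`lat`, `lon`, `fuel_l`) are hypothetical. It also assumes that only decreases in fuel volume count as consumption, so that a refuelling event inside a segment does not cancel out fuel already burned.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_features(samples):
    """Compute distance and fuel consumed for one segment.

    `samples` is a time-ordered list of dicts with the hypothetical keys
    'lat', 'lon' (degrees) and 'fuel_l' (fuel volume in litres).
    """
    distance_km = 0.0
    fuel_l = 0.0
    for prev, cur in zip(samples, samples[1:]):
        distance_km += haversine_km(prev["lat"], prev["lon"],
                                    cur["lat"], cur["lon"])
        drop = prev["fuel_l"] - cur["fuel_l"]
        if drop > 0:
            # Assumption: a rise in fuel volume is a refill (or sensor noise)
            # and is ignored rather than subtracted from consumption.
            fuel_l += drop
    return {"distance_km": distance_km, "fuel_l": fuel_l}
```

Summing leg-by-leg distances tends to slightly underestimate the true path on curved roads at a 30-second sampling interval, which is worth keeping in mind when these values are later compared against the aggregated figures.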
Dataset Alignment
Mapped the aggregated trips onto the new segmented structure using start and end timestamps, ensuring that segment boundaries matched across both datasets.
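A simple way to picture the timestamp-based mapping is sketched below. The record layout (`segment_id`, `trip_id`, `start`, `end`) is hypothetical, and assigning each segment to the aggregated trip with the largest time overlap is an assumed rule chosen for illustration; the original matching logic may have been stricter, for example requiring exact boundary matches.

```python
from datetime import datetime


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def align_segments_to_trips(segments, trips):
    """Map each segment to the aggregated trip it overlaps most in time.

    Both inputs are lists of dicts with datetime 'start' and 'end' values.
    Returns {segment_id: trip_id}, with None for segments that overlap
    no trip at all.
    """
    mapping = {}
    for seg in segments:
        best_id, best_overlap = None, 0.0
        for trip in trips:
            ov = overlap_seconds(seg["start"], seg["end"],
                                 trip["start"], trip["end"])
            if ov > best_overlap:
                best_id, best_overlap = trip["trip_id"], ov
        mapping[seg["segment_id"]] = best_id
    return mapping
```

Segments that map to None are useful in their own right: they point to raw telemetry with no corresponding aggregated record. The nested loop is quadratic, so for large datasets a sorted interval join would be a more practical choice.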
Validation & Metrics
Compared the computed distance and fuel values against the aggregated dataset's values to validate data consistency and determine quality metrics.
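The comparison can be illustrated with a small relative-error check. The metric choice and the 5% tolerance below are assumptions made for the sake of the example, not figures from the project; the same function would be applied once to distance pairs and once to fuel pairs.

```python
def relative_error(computed, reported):
    """Absolute difference as a fraction of the reported (aggregated) value."""
    if reported == 0:
        return 0.0 if computed == 0 else float("inf")
    return abs(computed - reported) / abs(reported)


def validation_summary(pairs, tolerance=0.05):
    """Summarise agreement between computed and reported values.

    `pairs` is a list of (computed, reported) tuples.
    `tolerance` is a hypothetical acceptance threshold (5% by default).
    """
    errors = [relative_error(c, r) for c, r in pairs]
    finite = [e for e in errors if e != float("inf")]
    within = sum(1 for e in errors if e <= tolerance)
    return {
        "n": len(errors),
        "mean_rel_error": sum(finite) / len(finite) if finite else None,
        "share_within_tolerance": within / len(errors) if errors else None,
    }
```

A usage example: `validation_summary([(10.0, 10.0), (9.0, 10.0), (10.4, 10.0)])` reports three pairs, two of which fall within the assumed 5% tolerance. The share within tolerance serves as a simple data-quality metric, while the pairs outside it are candidates for closer inspection.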
Outcome
The project successfully created a segment-level mapping between the high-frequency and aggregated datasets. This enabled per-segment fuel efficiency analysis and provided a reliable way to validate aggregated data using raw telemetry. The work also established a foundation for future machine learning analysis, including anomaly detection, driver behavior profiling, and trip optimization.