Validating Vehicle Telemetry by Reconciling Disparate Datasets

Bridging the gap between raw, high-frequency telemetry and trip-level summaries to improve data quality and support fuel-efficiency analysis.

The Problem to Solve

The objective was to reconcile two views of the same vehicle activity: a detailed, high-frequency telemetry dataset and an aggregated, trip-level dataset. To do so, we needed to identify consistent driving segments in the detailed data, compute the distance traveled and fuel consumed for each segment, and compare these values against the aggregated dataset to validate its quality and derive performance metrics.

Our Solution Approach

Segmentation of Initial Dataset

Split each trip in the 30-second-interval dataset into segments, starting a new segment at each 180-degree change in direction, so that every segment represents one continuous stretch of travel.
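The segmentation rule can be sketched in Python. This is a minimal sketch, not the project's implementation: it assumes each trip is an ordered list of (latitude, longitude) fixes, derives a heading for each leg from consecutive fixes, and interprets a "180-degree change in direction" as the cumulative signed turn since the segment began reaching 180 degrees. The names `bearing_deg`, `wrap_delta`, and `split_on_reversal` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def bearing_deg(a: Point, b: Point) -> float:
    """Initial great-circle bearing from a to b, in degrees in [0, 360)."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dlon = math.radians(b[1] - a[1])
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def wrap_delta(h1: float, h2: float) -> float:
    """Signed smallest turn from heading h1 to heading h2, in (-180, 180]."""
    d = (h2 - h1 + 180.0) % 360.0 - 180.0
    return 180.0 if d == -180.0 else d


def split_on_reversal(points: Sequence[Point], threshold_deg: float = 180.0,
                      tol_deg: float = 1e-6) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs; consecutive segments share a boundary fix.

    Hypothetical rule: cut where the cumulative turn since the segment start
    reaches threshold_deg (a reversal, for the default of 180 degrees).
    """
    n = len(points)
    if n < 2:
        return [(0, 0)] if n == 1 else []
    # One heading per leg; a zero-length leg (vehicle stopped) reuses the previous heading.
    headings: List[float] = []
    for a, b in zip(points, points[1:]):
        headings.append(headings[-1] if a == b and headings else bearing_deg(a, b))
    segments, start, turned = [], 0, 0.0
    for i in range(1, n - 1):
        turned += wrap_delta(headings[i - 1], headings[i])  # turn made at fix i
        if abs(turned) >= threshold_deg - tol_deg:
            segments.append((start, i))
            start, turned = i, 0.0
    segments.append((start, n - 1))
    return segments
```

For a trip that heads north and then doubles back, this returns two segments that share the turning-point fix. Because 30-second GPS fixes are noisy, small heading jitter also accumulates in the cumulative turn; smoothing headings or ignoring very short legs before applying the rule may be worth considering.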

Feature Extraction per Segment

For each segment, calculated the distance traveled from its sequence of latitude/longitude fixes and the fuel consumed from changes in the recorded fuel volume.
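The per-segment features can be sketched in the same way. This sketch assumes the haversine formula with a mean Earth radius of 6371.0088 km for distance, and assumes fuel consumed is the sum of drops in the fuel-volume reading, treating any rise as refuelling. Both choices, and the function names, are illustrative assumptions rather than the project's documented method.

```python
import math
from typing import Sequence

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two fixes, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(a, 1.0)))


def segment_distance_km(lats: Sequence[float], lons: Sequence[float]) -> float:
    """Sum of leg distances across the consecutive fixes of one segment."""
    return sum(haversine_km(lats[i], lons[i], lats[i + 1], lons[i + 1])
               for i in range(len(lats) - 1))


def segment_fuel_used(fuel_levels: Sequence[float]) -> float:
    """Fuel consumed, in the same units as the fuel-volume readings.

    Assumption: only decreases count; increases are treated as refuelling.
    """
    return sum(max(prev - cur, 0.0) for prev, cur in zip(fuel_levels, fuel_levels[1:]))
```

Raw tank-level readings can jitter between samples (for example, from fuel sloshing), so smoothing the series before differencing is one option if small spurious drops inflate the totals.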

Dataset Alignment

Mapped the aggregated trips onto the new segment structure using their start and end timestamps, so that segment boundaries aligned across both datasets.
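One way to implement the timestamp mapping is to pair each computed segment with the aggregated record whose start and end times both fall within a tolerance of the segment's boundaries. The sketch below assumes a 30-second tolerance (one sampling interval of the detailed dataset) and a hypothetical `Interval` record type; the matching rule itself is an assumption, not a description of the project's code.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Interval:
    key: str          # hypothetical identifier (segment id or aggregated trip id)
    start: datetime
    end: datetime


def match_segments(segments: Sequence[Interval], aggregated: Sequence[Interval],
                   tolerance: timedelta = timedelta(seconds=30)) -> Dict[str, Optional[str]]:
    """Map each segment key to the best-matching aggregated key, or None if nothing aligns."""
    agg = sorted(aggregated, key=lambda r: r.start)
    starts = [r.start for r in agg]
    result: Dict[str, Optional[str]] = {}
    for seg in segments:
        best: Optional[str] = None
        best_err: Optional[timedelta] = None
        # Only records whose start lies within the tolerance window are candidates.
        for rec in agg[bisect_left(starts, seg.start - tolerance):]:
            if rec.start > seg.start + tolerance:
                break
            if abs(rec.end - seg.end) <= tolerance:
                err = abs(rec.start - seg.start) + abs(rec.end - seg.end)
                if best_err is None or err < best_err:
                    best, best_err = rec.key, err
        result[seg.key] = best
    return result
```

Requiring both boundaries to agree, rather than mere overlap, means a segment left unmatched (`None`) is itself a useful data-quality signal.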

Validation & Metrics

Compared the computed distance and fuel values against the corresponding values in the aggregated dataset to check consistency and derive data-quality metrics.
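The comparison step can be summarised with a few simple metrics over matched (computed, reported) pairs. The metric set below (relative error, share of pairs within a tolerance, mean and median relative error) and the 5% tolerance are illustrative assumptions, not the quality metrics the project actually reported.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence, Tuple


def relative_error(computed: float, reported: float) -> float:
    """|computed - reported| / |reported|; infinite when only the reported value is zero."""
    if reported == 0:
        return 0.0 if computed == 0 else math.inf
    return abs(computed - reported) / abs(reported)


def quality_metrics(pairs: Sequence[Tuple[float, float]], tolerance: float = 0.05) -> Dict[str, float]:
    """Summarise agreement between computed and aggregated values for one field."""
    if not pairs:
        raise ValueError("no matched pairs to compare")
    errors = [relative_error(c, r) for c, r in pairs]
    finite = [e for e in errors if math.isfinite(e)]
    return {
        "n_pairs": float(len(errors)),
        "share_within_tolerance": sum(e <= tolerance for e in errors) / len(errors),
        "mean_relative_error": mean(finite) if finite else math.nan,
        "median_relative_error": median(finite) if finite else math.nan,
    }
```

Running it separately for distance pairs and fuel pairs may help indicate whether disagreement comes from the position data or the fuel data.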

Outcome

The project successfully created a segment-level mapping between the high-frequency and aggregated datasets. This enabled per-segment fuel efficiency analysis and provided a reliable way to validate aggregated data using raw telemetry. The work also established a foundation for future machine learning analysis, including anomaly detection, driver behavior profiling, and trip optimization.