Senior Data Engineer
We are looking for a highly analytical data engineer who can design and implement complex SQL and PySpark logic over massive, high-frequency time-series data from real-world vehicle sensor recordings. This role requires the ability to craft intricate temporal and spatial query logic, including multi-table joins, windowing across time, and event-detection rules, to extract meaningful driving behaviors from a rich 3D world model (obstacles, lanes, traffic signs, etc.). The ideal candidate excels at breaking complex behavioral patterns down into precise, scalable query logic, optimizes those queries to run efficiently on large datasets, and is able to work independently.
Project Description
This long-term engagement supports simulation data processing for Autonomous Vehicle (AV) development. Key scenarios include obstacle detection, path planning, and complex traffic situations (e.g., tunnels, unusual vehicles, or temporary network issues). The candidate will work with high-volume sensor data from a test AV fleet (8–12 cameras, LiDAR, radar) generating up to ~1 TB/hour, so familiarity with the AV domain and real-world edge cases will be valuable. The scope covers full-cycle data curation, from raw sensor input to simulation-ready datasets, in close cooperation with the engineers and researchers who develop and validate safety-critical AV features.
Key Responsibilities:
• Analyze real-world sensor data to identify edge cases (e.g., hard braking, nearby vehicles)
• Create advanced SQL, Python, and Spark/PySpark queries for data filtering and transformation
• Work with internal tools for data search and auto-labeling workflows
• Process structured/semi-structured data (e.g., object detection output)
• Identify relevant data for AV simulations and ML pipelines
• Suggest and validate improvements in data discovery processes
• Build and maintain data mining scripts and ETL processes
• Develop tools to enhance analytics and streamline workflows
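To give a flavour of the event-detection work above, here is a minimal sketch of flagging hard-braking events in time-stamped speed samples. The deceleration threshold and record layout are illustrative assumptions, not project specifics; at fleet scale, logic like this would typically be expressed as a PySpark window query rather than a Python loop.

```python
# Minimal sketch: flag hard-braking events in sorted (timestamp, speed) samples.
# The -4.0 m/s^2 threshold and the tuple layout are illustrative assumptions.

def detect_hard_braking(samples, decel_threshold=-4.0):
    """Return timestamps where deceleration meets or exceeds the threshold.

    samples: list of (timestamp_s, speed_mps) tuples, sorted by time.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        accel = (v1 - v0) / dt  # m/s^2; negative values indicate braking
        if accel <= decel_threshold:
            events.append(t1)
    return events

# Speed drops 5 m/s per second (-5 m/s^2), so both intervals are flagged
print(detect_hard_braking([(0.0, 20.0), (1.0, 15.0), (2.0, 10.0)]))
# → [1.0, 2.0]
```

In a Spark pipeline, the pairwise comparison would be done with a `lag` over a window ordered by timestamp, partitioned per recording, so the rule scales across the full dataset.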
Requirements
• SQL (advanced)
• Python (advanced)
• Spark / PySpark (advanced)
• Hands-on experience with Databricks
• Understanding of ML workflows
• University degree in Computer Science (nice to have)
• Over 4 years of commercial experience
Department: Software Delivery
Role: Data Engineer
Locations: Buenos Aires (AR)
Remote status: Fully Remote
Employment type: Full-time
About Spyrosoft
Spyrosoft is an authentic, cutting-edge software engineering company, established in 2016. In 2021 and 2022, we were among the fastest-growing technology companies in Europe, according to the Financial Times. We were founded by a group of tech experts with established backgrounds in software engineering, who created an ‘engineer-to-engineer’ workplace powered by enthusiasm, fairness and authentic relationships. With a unique offering that bridges the gap between technology and business, we specialise in technology solutions for the Industry 4.0, automotive, geospatial, healthcare & life sciences, employee experience & education and financial services industries.