100% Remote role
Responsibilities
• Design and build large-scale data pipelines for ingestion, transformation and processing
• Build ETL/ELT workflows that handle a variety of data types
• Build and maintain end-to-end ML pipelines from data preparation to deployment and monitoring
• Collaborate with data scientists to productionize ML models
• Work on feature engineering, training pipelines and model serving
• Ensure data quality, monitoring and pipeline reliability
• Optimize systems for performance, scalability and cost
• Contribute to clean, maintainable, production-grade Python code
Experience Required
• 8+ years of software engineering experience with Python as the primary language
• Strong background in data engineering (ETL/ELT, pipelines, data processing)
• Hands-on experience building and maintaining ML pipelines in production environments
• Experience with PySpark / Apache Spark
• Experience with workflow orchestration tools like Airflow, Dagster, or Prefect
• Good understanding of streaming and data processing systems (Kafka, Kinesis, etc.)
• Experience working with cloud platforms (AWS / GCP / Azure)
• Strong SQL skills and experience with data warehouses
• Comfortable working in a distributed/remote engineering setup
Plus
• Experience with NLP or LLM-based systems
• Familiarity with MLOps tools like MLflow, Kubeflow, or similar
• Experience with feature stores
• Exposure to data privacy, PII detection, or compliance-related systems