Data Architect (Data Modeling, Apache Spark, ETL)
A startup in the Loop downtown, focused on an advanced natural language generation platform, is looking for a Python software engineer to join its growing Engineering team. The candidate will help the company put its data to work and make data-driven decisions. The team is building out and scaling a data platform that integrates with their transaction platform, to be migrated/separated into their analytics and streaming platforms.
The ideal candidate will have a solid background in distributed applications, data streaming with Apache Spark and Kafka, and experience working in SQL and/or NoSQL database environments. You will be responsible for data modeling and writing ETL code using Airflow.
Required Skills & Experience
- Multiple years of experience working with text mining or Hadoop clusters/lakes using D3, Spark or Apache Flink
- 4+ years of experience with large-scale distributed infrastructure
- 4+ years of development experience using Python, R, and/or Java
- Strong knowledge of cloud technologies, including AWS and Google Cloud
- Expertise in statistical algorithms for data analysis
- Test-driven development experience
- Bachelor’s Degree in Computer Science (minimum)
Benefits & Perks
- Matching 401k, unlimited PTO, and health, dental and vision coverage