Job Title: Big Data Engineer
Location: Remote
Employment Type: [Full-Time/Contract]
Department: Data Engineering / Analytics
About the Role:
We are looking for a highly skilled and experienced Big Data Engineer to join our growing data team. As a Big Data Engineer, you will be responsible for designing, developing, and optimizing scalable data pipelines and architectures that enable data-driven decision-making across the organization. You'll work closely with data scientists, analysts, and software engineers to ensure reliable, efficient, and secure data infrastructure.
Key Responsibilities:
- Design, develop, and maintain robust and scalable data pipelines for batch and real-time processing.
- Build and optimize data architectures to support advanced analytics and machine learning workloads.
- Ingest data from various structured and unstructured sources using tools like Apache Kafka, Apache NiFi, or custom connectors.
- Develop ETL/ELT processes using tools such as Spark, Hive, Flink, Airflow, or dbt.
- Work with big data technologies such as Hadoop, Spark, HDFS, Hive, and Presto.
- Implement data quality checks, validation processes, and monitoring systems.
- Collaborate with data scientists and analysts to ensure data is accessible, accurate, and clean.
- Manage and optimize data storage solutions including cloud-based data lakes (AWS S3, Azure Data Lake, Google Cloud Storage).
- Implement and ensure compliance with data governance, privacy, and security best practices.
- Evaluate and integrate new data tools and technologies to enhance platform capabilities.
Required Skills and Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, Information Systems, or a related field.
- 3+ years of experience in data engineering or software engineering roles with a focus on big data.
- Strong programming skills in Python, Scala, or Java.
- Proficiency with big data processing frameworks such as Apache Spark, Hadoop, or Flink.
- Experience with SQL and NoSQL databases (e.g., PostgreSQL, Cassandra, MongoDB, HBase).
- Hands-on experience with data pipeline orchestration tools such as Apache Airflow or Luigi.
- Familiarity with cloud data services (AWS, GCP, or Azure), particularly services such as EMR, Databricks, BigQuery, or Glue.
- Solid understanding of data modeling, data warehousing, and performance optimization.
- Experience with CI/CD for data pipelines and with infrastructure-as-code tools such as Terraform or CloudFormation is a plus.
Preferred Qualifications:
- Experience working in agile development environments.
- Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
- Knowledge of data privacy and regulatory compliance standards (e.g., GDPR, HIPAA).
- Experience with real-time data processing and streaming technologies (e.g., Kafka Streams, Spark Streaming).
Why Join Us:
- Work with a modern data stack and cutting-edge technologies.
- Be part of a data-driven culture in a fast-paced, innovative environment.
- Collaborate with talented professionals from diverse backgrounds.