Data Engineer

  • Contact: James Lesniak


We are seeking a highly skilled and experienced Data Engineer to join our team. In this role, you will play a critical part in migrating our Cloudera Hadoop infrastructure to AWS EMR. You will work closely with our data engineering team and collaborate with other stakeholders to ensure a smooth, successful migration.


Responsibilities:

  • Lead the migration from Cloudera Hadoop to AWS EMR, ensuring a smooth transition of our big data infrastructure.
  • Assess the existing Cloudera Hadoop environment, including infrastructure, configurations, and data storage, to inform the migration plan.
  • Design and implement the migration strategy, considering factors such as data transfer, compatibility, security, and performance.
  • Collaborate with cross-functional teams, including data engineers, solution architects, and operations teams, to coordinate migration activities and keep them aligned with business objectives.
  • Develop and run migration scripts, tools, and processes that automate and streamline migration tasks.
  • Optimize the AWS EMR environment for performance, scalability, and cost-effectiveness, making use of relevant AWS services and best practices.
  • Perform data validation and testing to ensure data consistency, accuracy, and integrity during the migration process.
  • Monitor and troubleshoot issues or bottlenecks that arise during the migration and resolve them promptly.
  • Document the migration process, including configurations, workflows, and lessons learned, to create a knowledge base for future reference.


Required Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer, including hands-on experience migrating Cloudera Hadoop workloads to AWS EMR.
  • Strong knowledge of big data technologies, including Cloudera Hadoop, AWS EMR, HDFS, MapReduce, and Hive.
  • Solid understanding of AWS cloud services, particularly EMR, S3, and IAM.
  • Proficiency in a programming language such as Python, Scala, or Java for data processing and scripting.
  • Experience with data modeling, schema design, and data integration techniques.
  • Familiarity with data warehousing concepts and tools such as Amazon Redshift, and with data formats such as Apache Parquet and Apache Avro.
  • Strong problem-solving skills and the ability to troubleshoot complex data engineering issues.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.

Preferred Qualifications:

  • AWS Certification (e.g., AWS Certified Big Data - Specialty).
  • Experience with other big data technologies such as Apache Spark, Apache Kafka, or Apache Pig.
  • Knowledge of data pipeline orchestration tools such as Apache Airflow or AWS Glue.