We are seeking a highly skilled and experienced Data Engineer to join our team. In this role, you will lead the migration of our Cloudera Hadoop infrastructure to AWS EMR, working closely with our data engineering team and other stakeholders to ensure a seamless migration.
Responsibilities:
- Lead the migration process from Cloudera Hadoop to AWS EMR, ensuring a smooth transition of our big data infrastructure.
- Assess the existing Cloudera Hadoop environment, including infrastructure, configurations, and data storage, to inform the migration plan.
- Design and implement the migration strategy, considering factors such as data transfer, compatibility, security, and performance.
- Collaborate with cross-functional teams, including data engineers, solution architects, and operations teams, to coordinate the migration activities and ensure alignment with business objectives.
- Develop and execute migration scripts, tools, and processes to automate and streamline the migration tasks.
- Optimize the AWS EMR environment for performance, scalability, and cost-effectiveness, leveraging the available AWS services and best practices.
- Perform data validation and testing to ensure data consistency, accuracy, and integrity during the migration process.
- Monitor and troubleshoot any issues or bottlenecks that may arise during the migration and provide timely resolutions.
- Document the migration process, including configurations, workflows, and lessons learned, to create a knowledge base for future reference.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer, specifically with hands-on experience in migrating Cloudera Hadoop to AWS EMR.
- Strong knowledge of big data technologies, including Cloudera Hadoop, AWS EMR, HDFS, MapReduce, and Hive.
- Solid understanding of AWS cloud services, particularly EMR, S3, and IAM.
- Proficient in programming languages like Python, Scala, or Java for data processing and scripting.
- Experience with data modeling, schema design, and data integration techniques.
- Familiarity with data warehousing concepts, storage and serialization formats such as Apache Parquet and Apache Avro, and data warehouses such as Amazon Redshift.
- Strong problem-solving skills and ability to troubleshoot complex data engineering issues.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Preferred:
- AWS certification (e.g., AWS Certified Data Analytics - Specialty, formerly AWS Certified Big Data - Specialty).
- Experience with other big data technologies such as Apache Spark, Apache Kafka, or Apache Pig.
- Knowledge of data pipeline orchestration and ETL tools such as Apache Airflow or AWS Glue.