We are seeking a highly skilled and experienced Data Engineer to join our team. In this role, you will lead the migration of our Cloudera Hadoop infrastructure to AWS EMR, working closely with our data engineering team and other stakeholders to ensure a seamless migration.
Responsibilities:
- Lead the migration process from Cloudera Hadoop to AWS EMR, ensuring a smooth transition of our big data infrastructure.
- Assess the existing Cloudera Hadoop environment, including infrastructure, configurations, and data storage, to inform the migration plan.
- Design and implement the migration strategy, considering factors such as data transfer, compatibility, security, and performance.
- Collaborate with cross-functional teams, including data engineers, solution architects, and operations teams, to coordinate the migration activities and ensure alignment with business objectives.
- Develop and execute migration scripts, tools, and processes to automate and streamline the migration tasks.
- Optimize the AWS EMR environment for performance, scalability, and cost-effectiveness, leveraging the available AWS services and best practices.
- Perform data validation and testing to ensure data consistency, accuracy, and integrity during the migration process.
- Monitor and troubleshoot any issues or bottlenecks that may arise during the migration and provide timely resolutions.
- Document the migration process, including configurations, workflows, and lessons learned, to create a knowledge base for future reference.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer, specifically with hands-on experience in migrating Cloudera Hadoop to AWS EMR.
- Strong knowledge of big data technologies, including Cloudera Hadoop, AWS EMR, HDFS, MapReduce, and Hive.
- Solid understanding of AWS cloud services, particularly EMR, S3, and IAM.
- Proficient in programming languages like Python, Scala, or Java for data processing and scripting.
- Experience with data modeling, schema design, and data integration techniques.
- Familiarity with data warehousing concepts, storage and serialization formats such as Apache Parquet and Apache Avro, and data warehouses such as Amazon Redshift.
- Strong problem-solving skills and ability to troubleshoot complex data engineering issues.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Preferred:
- AWS certification (e.g., AWS Certified Data Analytics - Specialty, formerly AWS Certified Big Data - Specialty).
- Experience with other big data technologies such as Apache Spark, Apache Kafka, or Apache Pig.
- Knowledge of data pipeline orchestration and ETL tools such as Apache Airflow or AWS Glue.