Summary
Job Description
Job Summary
As a Hadoop big data engineer, you will develop, operate, and drive a scalable and resilient data platform based on the Hadoop ecosystem to address business requirements:
• Ensure industry best practices are followed for data pipelines, metadata management, data quality, data governance, and data privacy
• Design and implement business-specific large-scale data processing pipelines
• Work with complex data structures: manipulate and cleanse data, and perform transformations to derive insights from it
• Ingest data from files, streams, and databases, and process it with PySpark, Kafka, Hive, Hive LLAP…
• Develop efficient code leveraging Spark and Big Data technologies for the various use cases built on the platform
• Deliver operational excellence, ensuring high availability and platform stability
Job Requirements
➢ Must have:
• Experience with the Hadoop ecosystem, including HDFS, MapReduce, YARN, HBase, ZooKeeper, Pig, Hive…
• Experience building large-scale data processing systems (batch processing, stream processing)
• Experience with Apache Spark, preferably PySpark
• Understanding of SLAs and of meeting timelines for support activities
➢ Good to have:
• Experience with Hadoop distributions such as Cloudera and Hortonworks, including comparison and feasibility assessment
• Experience with Apache Kafka and Apache Beam
• Experience with data warehousing
• Experience in ETL
• Experience in data management:
  o Data Governance
  o Data Architecture
  o Data Modelling
  o Data Quality
  o Data Integration
• Experience with SQL and NoSQL databases
• Proficiency in Python and Java
• Experience with SRE, patching, and automation: Kubernetes or Docker and containerization
• Experience working with Big Data in a cloud environment
• Experience with Google Cloud
• Experience in backend development using Java
• Experience with data APIs
• Architecture knowledge or experience
Languages
- English
Speaking: Intermediate - Reading: Intermediate - Writing: Intermediate
Technical Requirements
- Hadoop
- Python
- Java
- NoSQL
- MS SQL
- MapReduce
- HBase
- Docker
- HDFS
- Apache Spark
- Data Modeling
- Architecture
- Pig script
- Big Data
- Apache Kafka
- Stream processing
- Kubernetes
- GCP
- Cloudera
- Apache ZooKeeper
- YARN
- Hortonworks
Company Information
HCL Technologies is a next-generation global technology company.
We help enterprises reimagine their businesses for the digital age. With a worldwide network of R&D, innovation labs and delivery centers, and 150,000+ ‘Ideapreneurs’ working in 49 countries, HCL serves leading enterprises across key industries, including 250 of the Fortune 500 and 650 of the Global 2000. HCL generated consolidated revenues of US$ 9.93 bn for the 12 months as of 30th June 2020.
We offer an integrated portfolio of products, solutions, services, and IP through our Mode 1-2-3 strategy built around Digital, IoT, Cloud, Automation, Cybersecurity, Analytics, Infrastructure Management and Engineering Services, amongst others, to help enterprises reimagine their businesses for the digital age.