Summary
- Technical requirements:
- Networking
- HPC
- Protocol
- Quantum
- Linux
- Ethernet
- Fintech
- GPU
- Switches
- Jetson
Job Description
Job Summary
We are looking for a highly specialized AI Infrastructure Network Engineer to design, implement, and optimize the high-speed data fabric that powers our supercomputing and AI clusters. You will be responsible for the low-latency, high-throughput interconnects that allow thousands of GPUs to work as a single unit. Your expertise in InfiniBand (IB), RDMA, and advanced network topologies will be critical in scaling our AI training and inference capabilities.
- Fabric Design & Architecture: Design and scale high-performance InfiniBand (IB) fabrics using advanced topologies such as Fat-Tree, Dragonfly, and Torus to support massive AI workloads.
- Interconnect Optimization: Manage and optimize NVLink (NVL) domains and multi-GPU communication across nodes to ensure maximum throughput and minimal collective communication overhead.
- High-Speed Data Transmission: Implement and fine-tune RDMA (Remote Direct Memory Access), including RoCE and InfiniBand Verbs, to reduce CPU overhead and latency in data transfers.
- Supercomputer Networking: Configure and maintain the backend "Compute Fabric" specifically tailored for distributed deep learning and large-scale parallel processing.
- Performance Tuning: Monitor and troubleshoot congestion, adaptive routing, and quality of service (QoS) within the IB fabric to prevent bottlenecks during large-scale model training.
- Collaboration: Work closely with AI Systems Engineers to align network performance with the requirements of frameworks like PyTorch and distributed training libraries.
Salary & Allowances
- 13-month salary with annual performance bonus, project incentives, sales incentives (based on position)
- Lunch allowance: 730.000 VND/month
- Special occasion bonus: 3.000.000 - 5.000.000 VND/year
- Annual leave: Up to 20 days/year (based on level)
- Health: Social insurance, premium health insurance, yearly health check
- Laptop, monitor, and other equipment, accounts, and tools needed for work
Career Growth
- Yearly salary review and promotion
- Diverse career paths: Management or Expert tracks, with opportunities to rotate across functions
- Free learning resources on the Udemy, Coursera, and O'Reilly platforms; internal workshops, certification sponsorship, and exclusive mentoring from C-level executives
- Recognition and awards at team and organizational levels.
Working Environment
- Open, collaborative workspace that fosters both individual focus and teamwork
- Young, dynamic, and collaborative working atmosphere
- Unwind zones: gaming, table tennis, yoga, gym, bathrooms, sleep corner.
- Quarterly and yearly team-building activities and engaging internal events.
Job Requirements
- Expertise in HPC Networking: Deep understanding of data transmission mechanics within supercomputers and AI clusters.
- Network Topologies: Practical experience or strong theoretical knowledge of Fat-Tree, Dragonfly, and Slim Fly architectures.
- Protocol Mastery: Advanced knowledge of the InfiniBand stack, RDMA, and Ethernet-based high-speed networking.
- Hardware Knowledge: Familiarity with NVIDIA/Mellanox Quantum switches, ConnectX NICs, and NVLink/NVSwitch technologies.
- Systems Proficiency: Strong Linux networking skills, including experience with OFED (OpenFabrics Enterprise Distribution) and subnet managers.
- Education: Relevant experience in AI infrastructure or honors programs is highly valued. No degree required, so long as you can prove your knowledge and value.
Preferred Skills
- Experience in Fintech or large-scale AI production environments.
- Knowledge of GPU-aware MPI and collective communication libraries (NCCL).
- Experience managing networking for NVIDIA Jetson or GPU clusters.
Languages
- English
Speaking: Intermediate - Reading: Intermediate - Writing: Intermediate
Competencies
- Communication Skills
Company Information
One Mount Group (1MG) aims to build Vietnam’s largest technology ecosystem.
1MG was established with the vision of improving the efficiency of the economy, creating a technology infrastructure that helps Vietnamese businesses increase their added value, and providing products to consumers at a more competitive cost.
1MG is committed to building a strong and sustainable Vietnamese business and creating a broad playing field to nurture and grow future start-ups. We believe that Vietnam’s next “giant” businesses will emerge from our core infrastructure. The ecosystem's solutions are meant to link, optimize, and close gaps in the value chains of key, fast-growing economic sectors in Vietnam.
With a sound financial position and sound business administration, 1MG has the competitive advantages to attract and retain the best Vietnamese talent from around the world.