Data Center Operations Engineer

Work from home Full-time role Hiring

At reputed company, we hire and develop leaders and innovators who want to make an impact on the world of technology.

Job Summary

The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems, GPU server deployments, and InfiniBand networking. This role requires hands-on expertise in data center operations, cluster bring-up, hardware installation, and troubleshooting across compute, network, and GPU environments. The engineer will collaborate closely with reputed company, development, and operations teams to ensure reliable, secure, and scalable service delivery.

Key Responsibilities

Provide hands-on operational support for reputed company data center projects, deployments, and repair activities.
Participate in an on-call rotation and provide on-site or remote support during maintenance windows and incidents.
Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, networking, and storage infrastructure.
Support customer and internal deployments, ensuring timely and successful bring-up of GPU servers and clusters.
Perform InfiniBand fabric bring-up, fabric configuration, subnet management, and troubleshooting.
Conduct daily health checks of Linux systems and infrastructure components, proactively identifying and mitigating risks.
Install, configure, test, and maintain server hardware (rack and stack, labeling, HDDs, memory, CPUs, RAID batteries, NICs, etc.).
Install, configure, and troubleshoot networking equipment, including routers, switches, and terminal servers for out-of-band management.
Review and validate equipment deployments against approved design documentation and standards.
Support data center builds, refreshes, migrations, and expansions while adhering to quality and safety standards.
Coordinate with vendors and onsite staff for hardware delivery, diagnostics, replacement, and warranty services.
Use monitoring and alerting frameworks to identify issues, escalate appropriately, and ensure timely service restoration.
Maintain accurate documentation of operational procedures, system configurations, and runbooks.
Follow established incident management and escalation procedures, and meet service-level agreements (SLAs).
Collaborate with global teams across time zones to support operational initiatives and continuous improvement efforts.
Contribute to process improvement initiatives and ensure adherence to documented policies, processes, and procedures.

Required Qualifications

Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
Strong hands-on experience in Linux environments, including system administration, troubleshooting, and performance validation.
Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
Experience with cluster bring-up, driver installation, and system-level configuration.
Hands-on experience setting up and validating GPU servers in clustered environments.
Experience with end-to-end GPU testing in InfiniBand-based clusters.
Working knowledge of InfiniBand networking, including fabric configuration and subnet management.
Solid understanding of networking fundamentals, including the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP).
Experience installing, configuring, and troubleshooting routers, switches, and terminal servers.
Familiarity with fiber and copper cabling, including IP and SAN deployments.
Experience managing incident tickets, maintaining acceptable ticket loads, and meeting SLAs.
Strong organizational skills with meticulous attention to detail in data center environments.
Ability to follow and enforce documented escalation procedures and operational policies.
Strong verbal and written communication skills, with the ability to collaborate effectively with cross-functional and global teams.

Preferred Qualifications

Experience supporting HPC, AI, or large-scale GPU environments.
Exposure to data center monitoring.
Experience documenting operational processes and maintaining technical runbooks.
Familiarity with large-scale data center buildouts or refresh programs.

Physical Requirements

Ability to perform the essential functions of the role, including lifting, moving, and installing equipment weighing 50 pounds or more, with or without reasonable accommodation.
Ability to work in data center environments, including raised floors, equipment racks, and confined spaces.
Willingness to work shifts, including nights, weekends, and on-call rotations as required.

Work Environment

On-site data center environment with occasional remote coordination.
Interaction with hardware vendors, service providers, and internal engineering teams.
Fast-paced operational setting requiring attention to detail, adherence to safety standards, and rapid problem resolution.

We’re doing work that matters. Help us solve what others can’t.
