Full Job Description
Druva’s award-winning solutions intelligently collect data and unify backup, disaster recovery, archival, and governance capabilities onto a single, optimized data set. As the industry’s fastest-growing data protection provider, Druva is trusted by over 4,000 global organizations and protects over 40 PB of data.
We are currently seeking an exceptional Cloud Operations Engineer-L2 as we enhance the support model for our SaaS platform. If you are eager to work in a large, complex, fast-paced environment with new technologies, care about ensuring cloud uptime, and enjoy being a team player who works effectively with other members of a global team, this position might be for you.
Role and Responsibility:
The CloudOps Engineer is responsible for the 24/7 availability of Druva's cloud SaaS platform
Support and sustain the customer-facing AWS production environment
Provide front-line support for cloud monitoring – infrastructure and application
Respond to, troubleshoot, and resolve production alerts
Communicate about and troubleshoot operational issues in a complex environment
Initiate Incident Response for cloud outages
Analyze trends to proactively prevent incidents
Respond to product escalations from Support as well as Engineering
Scale infrastructure capacity on production
Participate in Cloud Updates and Maintenance
Assist with security vulnerabilities and their remediation
Must feel comfortable working in a fast-paced, dynamic and flexible environment
Participate in an on-call rotation and operate effectively in a global 24×7 environment
Must be able to work extended hours as needed, including being available for off-hours production support
Experience:
5 – 9 years of experience
Ability to learn new technologies quickly with some support and guidance
Strong Linux/Unix administration skills
Knowledge of cloud providers such as Amazon Web Services (AWS), Google Cloud Platform, or Microsoft Azure
Knowledge of configuration management (SaltStack) for complex software management
Monitor and analyze system logs and perform root cause analysis (RCA)
Monitor site reliability and performance
Scripting knowledge (Shell, Python)
High-level understanding of standard networking protocols and concepts such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting, and load balancing
Ability to think outside the box to generate creative solutions to problems
Ability to multitask and work well under pressure
Excellent communication skills, both verbal and written