Full Job Description
This SRC is a 24x7x365 multi-disciplinary operations support function that includes system administration, software engineering, process expertise, incident response & management, change enablement, multi-tier network operations centre (NOC), and customer engagement.
As the ‘Lead SRC Operation’ engineer, the successful candidate will lead by example and mentor the team to deliver 24×7 operations for multiple product lines of Nuance Communications. You will also act as an escalation point and guide the team on escalated issues. This role is uniquely positioned to see the entire division and interact across all international teams, and it forms an important structural part of the team.
Qualifications:
Education: Bachelor’s degree in Computer Science, Engineering, or equivalent demonstrated IT work experience with an emphasis on production support of high-capacity, mission-critical systems.
Work Experience: 5+ years of relevant experience supporting day-to-day operations for large, complex IT environments.
1+ years of experience leading a team.
Shifts: 24x7 rotational
Required Skills:
Experience in leading a team – mentoring and managing shifts.
Strong Linux Server and/or Windows Server administration & operational support skills in a production environment
Knowledge of database technologies, with experience in MySQL and/or Microsoft SQL Server.
Knowledge of monitoring tools such as Zabbix, Nagios, SCOM, SolarWinds, etc.
TCP/IP networking knowledge and troubleshooting.
Ability to plan and develop longer-term projects that directly impact the SRC and LOB relationship, as well as our understanding of and ability to support the related products.
Experience supporting and troubleshooting large-scale distributed systems across the application, OS, and infrastructure layers.
Phone and email based direct customer support for escalated issues.
Familiarity with cloud support engineering practices.
Good hands-on experience with any of these technologies: Kubernetes, AMQ, MSSQL, Grafana, Sumo Logic, Nagios, SaltStack, Zenoss, HP OpenView, Remedyforce, Confluence, Jira, PagerDuty.
Working experience in Linux and Windows based production environments and strong knowledge of fundamentals and internals: file systems, memory management, threads, processes, etc.
Strong understanding of networking protocols, IP packets, DNS, OSI layers and load balancing.
Ability to solve operations-related challenges through automation or process improvements.
Effective problem-solving, mediation, and stakeholder relationship-building skills.
Self-motivated and proactive, with demonstrated creative and critical thinking capabilities.
Excellent time management and organizational skills.
Preferred Skills:
Basic knowledge of public cloud deployment architecture & administration (ideally Azure)
Experience with any automation tools.
Scripting language experience. Any of the following: Bash / PowerShell / Python / Perl / UiPath (RPA).
Principal Duties and Responsibilities:
Train new and current team members on various tasks, assign duties and delegate responsibilities within shift.
Resolution of all incidents which have been escalated from Support teams or identified through events and alerts. Guide the team members on escalated incidents.
Responsible for all server and network related support tickets and incidents. Ensure incidents and support tickets are resolved within agreed SLAs and meet customer expectations.
Run forensics on escalated incidents to identify process and knowledge gaps, and devise a strategy to rectify the incidents and gaps.
Develop and train current team members to become experts in specific fields of operations
Support 24x7x365 SRC operations.
Administration of Event Management rules in Nagios, SCOM, and other monitoring tools.
Monitor alerts mailbox and Event Management systems for Events and follow KB articles for resolution actions, performing functional escalations to on-call resources as needed
Monitor Application dashboards for indications of incidents
Invoke the Incident Management process for incidents that cannot be resolved within this team (open the conference bridge if needed).
Execute routine checklists to validate system functionality and batch process completion (backups, scheduled tasks, etc.)
Define monthly and weekly activities such as systems patching, vulnerability management, and standard changes.
Document and communicate system status per process definitions.
Perform tasks related to securing the products, tools, and processes that you are responsible for, and keeping them secure.
Nuance offers a compelling and rewarding work environment. We offer market competitive salaries, bonus, equity, benefits, meaningful growth and development opportunities and a casual yet technically challenging work environment. Join our dynamic, entrepreneurial team and become part of our continuing success.