Role: Windows SRE Engineer
Location: Remote within UK (except London; candidates local to London need to be on-site 5 days a week)
Full Time Employment - initial 6-month contract
Inside IR35 via Umbrella
CLIENT
Site Reliability Engineering are responsible for delivering continuous improvement, automation and self-service offerings to operational teams across Bank EMEA and Securities International.
The main purpose of the role:
- Responsible for the reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil operations must perform
- Member of the L3 Engineering team, providing subject matter expertise and acting as the ultimate escalation point
KEY RESPONSIBILITIES
Primary:
- Develop software to make infrastructure services self-managing and self-service
- Deliver continuous service improvement by developing Infrastructure as Code
- Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value
- Improve system performance, make effective use of resources, distribute load and reduce latency
- Identify SLOs (Service Level Objectives) to meet availability and latency objectives
- Develop proactive monitoring solutions that alert on symptoms and not just on outages
- Perform detailed root cause analyses (RCAs) on incidents and outages to prevent recurrence
- Partner with development teams to improve services via rigorous testing and release procedures
- Identify technical debt and partner with application teams to build remediation plans
- Develop standard operational procedures and produce effective documentation
- Analyse workloads and devise suitable cloud migration strategies where appropriate
- Ensure all project / investment workloads are delivered according to the defined plans and budget
- Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests
- Deputise for the team lead when required and act up accordingly
- Identify cost saving and optimisation opportunities across the group
- Build strong working relationships across the organisation
- Adhere to the core values of the bank
Secondary:
- Perform daily health and compliance checks for all systems as required
- Ensure all systems are backed up successfully and any issues are promptly resolved
- Validate that monitoring alerts and batch job failures are detected promptly and satisfactorily resolved
- Ensure sufficient capacity is available to accommodate drive growth
- Respond to emails sent to the team distribution list / mailboxes in a timely manner
- Handle incidents and requests with efficiency and a “customer first” mindset
- Maintain infrastructure in a highly available, reliable, secure and performant manner
- General Server / Database / Virtualisation Administration maintenance activities
- Provide technical support to application support and development teams
- Provide consultancy to application support and development teams
- Take part in On-Call & weekend work rotation; triaging and addressing production issues as they arise
SKILLS AND EXPERIENCE
Essential:
- Exceptional skills in Microsoft Windows Server internals and related technologies
- Excellent skills in managing and maintaining Active Directory, DHCP, DNS, LDAP and Kerberos
- Extensive experience in hardware performance monitoring and tuning complex low-latency systems
- Agile, Site Reliability Engineering (SRE) and DevOps principles and practices
- Exceptional knowledge of scripting and programming languages such as PowerShell, Python and C#
- Fluent in Backup and Recovery processes and procedures
- Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques
- Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability
- Excellent performance tuning skills, with in-depth knowledge of system internals, performance counters, and performance measurement and analysis tools
- Ability to interpret and implement CIS security hardening recommendations in a controlled manner
- Acute awareness of Security and Auditing requirements in a regulated environment
- “Infrastructure as Code” principles and practices
- “Continuous Integration (CI) and Continuous Delivery (CD)” principles and practices
- Git, Ansible, Terraform and TeamCity
- Serena Deployment Automation (SDA) and Jenkins
Highly Desirable:
- Experience in writing and managing plays/playbooks on AWX / Ansible Tower
- Advanced working knowledge of Kubernetes and Docker container orchestration
- Microsoft SQL Server, Oracle, Sybase ASE, MongoDB and Snowflake
- IBM Tivoli / Netcool
- Nutanix HCI and VMware ESX
- Networking Protocols (TCP/IP, DNS, DHCP, VLANs)
- RHEL, Oracle Linux, Oracle Solaris and related technologies
- Cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle
- Knowledge of data security governance and regulations such as GDPR and SOX
Desirable:
- Dell EMC PowerStore (SAN) and Isilon (NAS)
- Rubrik, EMC NetWorker, Data Domain and IBM Tivoli Storage Manager
- CyberArk
- Splunk
- Qualys
- Cisco Tetration
- ServiceNow
- JIRA and Confluence
Contract Details
- Contract Type: Permanent
- Salary Type: per annum
- Last Date: 20/02/2025