Job ID 310465
Site Reliability Engineer – SaaS, Azure, DevOps
Up to £70,000 – DOE
We are working with a company that is a recognized leader in its field, providing solutions that turn both structured and unstructured data into valuable, actionable information. Their success started with capturing interactions, and their expertise has evolved into applying analytics to those interactions. Combined with years of domain expertise cultivated in partnership with their customers, this helps customers not only understand what is happening in real time but also predict what will happen next.
We are looking for a Site Reliability Engineer / Performance Engineer to join our client's Development team. The Site Reliability Engineer / Performance Engineer champions the reliability and performance of our cloud services, working with the Secure Operations and Product Development teams. The role involves reviewing performance and security both at design time and during operation.
Must Haves:
Have experience of design and analysis of software service performance and reliability.
Be comfortable working as part of a multidisciplinary and distributed team.
Be able to communicate effectively with both internal and external stakeholders.
Responsibilities
Analyse system reliability and performance to address and prevent issues.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Automate measurement and monitoring.
Scale systems sustainably through mechanisms like automation and evolve systems by making code and configuration changes that improve reliability and velocity.
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement.
Practice sustainable incident response, blameless post-mortems, and root cause analysis.
Train and mentor development and operations engineers in performance and reliability topics.
Troubleshoot and respond to downtime, performance degradation and outside attacks.
Identify bottlenecks and manage architectural changes that would improve system performance.
Provide input into the production of system sizing spreadsheets for presales activities.
Work with operation teams, understanding defects and anomalies in production.
Keep current with advances in performance tools and methodologies.
Communicate clear status and risk reports.
Experience scaling high traffic SaaS applications
Experience with Kubernetes
Experience with Application Monitoring Metrics
Familiarity with agile software development life cycle and practices.
Experience in Roadmap development and execution; converting strategies into executable tasks and milestones
Previous UK security clearance, or meeting the criteria for UK NPPV level 3 / SC clearance (including 3 years' UK residency).
Benefits
25 days holiday per annum (there are also 8 Statutory Bank Holidays)
Pension – employee must contribute 5% and can put in more; employer matches at 5%.
Private Medical Insurance (company pays for employee, employee can add family members by paying a contribution).
Security clearance will be essential for this position.