Reliability Engineer V

Job Locations US-MN
Req ID
2025-7930
Category
Engineering
Type
Full-Time Regular
Security Access Level
Access 1: US Citizenship Only (No Dual) / CFIUS Approval / Sole US Citizen (DMV & FBI Programs)
Work Schedule
Core Business Hours

Overview

IDEMIA is the global leader in identity and security. Our mission is to create a safe and simple future where identity verification is indisputable, and only you can assert your identity. We are a distributed company leveraging the latest technologies to deliver world-class products in the private and public sectors of finance, telecom, identity, security, retail, sports entertainment, commercial, government, and IoT. We use a variety of technologies and approaches to deliver quality products and services to government agencies and technology companies. IDEMIA is made up of 14,000 diverse people from different nationalities, speaking over 20 different languages. Together, our solutions impact the everyday lives of citizens and nations. In this ever-changing world, protecting your identity is paramount. Join the team that is ensuring one person, one identity.

Responsibilities

  • Manages and technically leads a team of site reliability engineers.

  • Serve as the face and lead of site reliability engineering and operations across all business units.

  • Provide vision and leadership for SRE operations within a digital landscape.

  • Responsible for understanding the requirements (services, features, and timing) of the customers of the cloud platform services in order to support them effectively at operational scale. Develops strategic plans, roadmaps, and business cases in collaboration with technology leaders and architects.

  • Engage teams in highly collaborative activities to drive alignment and partnership. Use effective negotiation and influence skills to ensure priorities are met.

  • Set and communicate team priorities that support the broader organization's goals. Align strategy, processes, and decision-making across teams.

  • Function as an engineering manager working in an agile environment, which includes, but is not limited to, story writing workshops, backlog refinement, planning, and standups, all maintained through Jira.

  • As a manager, provides technical guidance to the team and mentorship to less experienced team members.

  • Engage with key stakeholders, internal and external, to help foster and strengthen working relationships.

  • Applies analytical, logical, and rational thinking to build enterprise-level, scalable, highly available, and performant systems.

  • Provide and operate SRE functions (as needed) within a Kubernetes / EKS environment in AWS GovCloud.

  • Serve as an SRE (as needed) with an emphasis on operations: respond to, triage, and remediate reported issues, categorized and prioritized by severity.

  • Serve as an SRE (as needed) to proactively build tooling that monitors, analyzes, reports on, and observes the health and upkeep of the systems and environments.

  • Establish key practices that use automation to ensure that availability, stability, scalability, performance, monitoring, and incident response are handled appropriately.

  • Participate in an on-call rotation to field and support issues as they arise.

  • As a senior engineer, provides technical guidance and mentorship to less experienced team members.

  • Collaborate with specific SMEs from various teams to investigate, troubleshoot, and resolve issues.

  • Implement automation to mitigate risks and faults based on reactive and proactive measures.

  • Construct and maintain incident response playbooks with documented corrective actions.

  • Adhere to an established, well-defined escalation process to handle reported incidents.

  • Function as an engineering team lead/manager in an agile environment, which includes, but is not limited to, story writing workshops, backlog refinement, planning, and standups, all maintained through Jira.

  • Participate in the thorough investigation and breakdown of technical issues, and support troubleshooting, identifying, and addressing root causes.

  • Establishes proactive solutions to prevent faults within the system and underlying infrastructure.

  • Build automation practices, wherever applicable, that improve the overall efficiency and scalability of our applications and infrastructure.

  • Documents work consistently for knowledge sharing and redundancy, as part of the definition of done.

  • Demonstrate proficiency in creating reusable tools through scripting or development languages such as Python, PowerShell, Perl, Java, Bash, shell, or other languages.

  • Automates pipelines used for SRE functions in a continuous integration and continuous delivery/deployment (CI/CD) model.

  • Analyze all platform-level changes and monitor for resulting issues to formulate effective technical solutions.

  • Work with cross-functional internal teams in North America and Europe.

Qualifications

  • Bachelor's degree in Computer Science or a related field, or equivalent work experience.

  • 3+ years of experience in an engineering leadership role with direct/indirect reports.

  • Possesses deep knowledge of AWS (or other cloud) foundational principles and design, cloud security and compliance, cloud networking, and pipelines.

  • Extensive experience with Agile development methodologies, automation, SRE, and/or DevOps principles.

  • Experience managing large scale environments.

  • Communicates with honesty and kindness and creates the space for others to do the same.

  • Leads with courage, knowing the possibility of greatness is bigger than the fear of failure.

  • Fosters connection by putting people first and building trusting relationships.

  • Hands-on working knowledge of, or familiarity with, observability services such as the ELK stack, CloudWatch, Jaeger, Kiali, Grafana, Prometheus, New Relic, Datadog, and Netdata.

  • Experience being on call and working with incident response tools such as PagerDuty and VictorOps.

  • 7+ years of experience working with cloud providers: AWS, Microsoft Azure, Google Cloud.

  • 5+ years of experience working with deployment automation tools such as Ansible, Helm, Chef, Puppet, and Vagrant.

  • 5+ years of experience working with Infrastructure as Code (IaC) tools such as Terraform, CDK, and CloudFormation.

  • 5+ years of experience working with source control tools such as Bitbucket, Git, and SVN.

  • 5+ years of experience working with CI/CD tools such as Jenkins, Bamboo, TeamCity, and GitLab.

  • 5+ years of experience working as an SRE or DevOps Engineer.

  • Hands-on working knowledge of, or familiarity with, service mesh architectures (specifically Istio) is a plus.
