Site Reliability Engineer (shift work hours)

Job Summary 

We are looking for a seasoned Site Reliability Engineer to augment our team and support its strategy of driving products and technology into everything it delivers to accelerate business growth. As an SRE, you'll work as part of a team of problem solvers, helping to solve complex business issues from strategy to execution.

Job Responsibilities

The team covers a variety of responsibilities that are executed by DevSecOps, Site Reliability and ML Ops Engineers, including:

  •          Defining reliability and resilience standards for infrastructure and application components.
  •          Proactively optimizing redundancy, monitoring, and alerting practices and patterns.
  •          Developing resilient and highly available distributed systems.
  •          Infrastructure as Code development for building cloud tools.
  •          Secrets and configuration management
  •          Monitoring systems and services, providing incident and emergency response to triage and resolve system or client issues
  •          Managing the application ecosystem and improving platform infrastructure and applications for high reliability, resiliency, performance, and quality
  •          Supporting documentation, knowledge articles, and runbooks
  •          Designing, building, and implementing SRE patterns that adhere to our client’s security guidelines and policies.

Minimum Qualifications

  •          Availability to work until 9-10 p.m. Ukrainian time is required.
  •          At least 4 years of relevant working experience.
  •          Advanced Kubernetes – Must have strong skills in Kubernetes at scale using one of GKE, AKS, EKS or RKE. Experience with kubectl and Helm.
  •          Containers: Experience deploying Java (Spring Boot) microservices in dockerized environments.
  •          Observability – Experience in setting up tools like Prometheus/Grafana, Datadog, AppDynamics, Splunk to give actionable insight into a microservice environment, including but not limited to synthetics, application performance monitoring, logging, and alerting (PagerDuty/Opsgenie integrations).
  •          CI/CD – Good expertise with Jenkins, Azure DevOps, GitHub Actions, Argo CD, Artifactory, Azure Container Registry, Google Container Registry, and other similar tooling.
  •          SCM – Working with tools like GitHub/GitLab for source code management, as well as experience with branching strategies like GitFlow and trunk-based development.
  •          Strong troubleshooting skills – Be able to move all the way down to code level to give development teams a head start on application issues. Be able to contribute effectively to root cause analysis exercises after problem resolution.
  •          Good communication skills – Active listening, verbal and non-verbal communication, clarity and concision, confidence, open-mindedness, respect.
  •          Good documentation skills – Be able to effectively document any automation and technical efforts to ensure easy adoption of a solution.
  •          Good collaboration skills – Must be able to work effectively with Scrum/Dev teams with a push/pull (push back and prioritize work pulled in) philosophy to manage expectations and contribute to the stability and improvement of the platform.

Nice to Have Qualifications

  •          IaC – Terraform, Pulumi. Preferably has developed modules in the past rather than just using them.
  •          Security – Has worked with encryption-at-rest and encryption-in-transit patterns. Experience with tools like Azure Key Vault, HashiCorp Vault, Google Cloud KMS.
  •          Security – Experience with tools like Veracode and Black Duck for AppSec testing, Qualys scanners for infra testing, and Twistlock/Aqua for container scanning.
  •          Automation – Able to identify toil and opportunities to reduce it within the team.
  •          Authentication/Authorization – Familiarity with Authn/Authz schemes like OpenID, OAuth 2.0, SAML.
  •          Scripting and Programming – Experience with Python, PowerShell, Go, Java, Node.js.
  •          Event Driven/Event Sourcing Patterns – Familiarity with distributed event streaming platforms like Kafka, Azure Event Hubs, RabbitMQ and patterns like CQRS.


  • A brand-new office in the “Natsionalnyy” business center, plus remote working options
  • 18 business days of vacation, 10 days of sick leave, national holidays off
  • Compensation for technical conferences/events participation
  • Free English classes
  • Gym and shower in the office
  • Medical insurance


Become part of a professional and patriotic team that works for and supports the Armed Forces of Ukraine and people in need. Join in 💙