Site Reliability Engineer III
About the job
The world’s most critical--and at-risk--business applications have been neglected for far too long. Onapsis eliminates this blind spot by providing cybersecurity solutions dedicated to business-critical applications. Onapsis helps nearly 30% of the Forbes Global 100 understand the threats and risks across their SAP and Oracle landscapes, whether running on-premises, in the cloud, or in a hybrid environment.
We are looking for a semi-senior Site Reliability Engineer (SRE) who will work to ensure the availability of Onapsis Security Products. The SRE will be directly accountable for monitoring, inspecting, troubleshooting, and resolving service and product issues while continuously working with engineering partners to improve telemetry and automation.
What you will be doing, your legacy:
You’ll participate in writing and deploying software to improve the availability, scalability, and efficiency of Onapsis products and services while solving complex problems related to cloud services and remotely deployed products. You’ll also learn from designs, architectures, standards, and methods for large-scale distributed systems.
As a member of the global Site Reliability Engineering (SRE) team, which shares full-stack ownership, you’ll understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
This role will spend significant time on "ops"-related work such as handling production issues and service on-call. When not working on operations, you will work on software engineering tasks such as designing and developing systems that increase our reliability and scalability and reduce operational overhead through automation. The ideal SRE candidate is a proficient programmer with considerable breadth of knowledge and experience in areas such as networking, internet protocols, and Linux systems.
What we are looking for:
- Bachelor's or Master's degree in Computer Science or a related field, or equivalent experience
- 3+ years of experience, including DevOps, software engineering, Site Reliability Engineering (SRE), and on-call rotations, working on highly scalable distributed systems
- 1+ years of programming experience in object-oriented languages such as Java or Python, or in shell scripting
- Experience managing Linux OS (Debian/Ubuntu), queue servers (RabbitMQ), and database instances
- Experience working with monitoring systems, both APM and infrastructure monitoring
- Knowledge of version control systems such as Git
- Knowledge of orchestration/automation tools (Ansible, Terraform, Chef, etc.)
- Knowledge of container technologies (Docker, Kubernetes)
- Experience deploying code using CI/CD tools like GitLab Pipelines within change management procedures
- Experience participating in or running incident bridges
- Experience with cloud-native application development and cloud technologies in AWS, GCP, or Azure
- Experience troubleshooting complex software and networking issues
- Customer obsession, passion for delighting customers
- Understanding of cloud concepts, on-premises infrastructure, and platforms
- Proven ability to quickly learn new technical domains and then train others
- Great verbal and written communication skills
- Service on-call or shift rotations will be required.
Desired skills or interests in:
- Knowledge of cloud support, operations, NOC, or similar is preferred but not required
- Security software development best practices
- Knowledge of test-driven development (TDD), CI/CD tooling, and Agile methodologies
- Knowledge of professional software engineering practices & best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience with distributed computing and enterprise-wide systems
- Good at communicating with users and other technical teams to collect requirements and describe software product features and technical designs
- Experience working with junior software engineers to improve their skills
What we offer:
- A role in shaping the future of protecting the most critical applications that run the world's business and a career that grows as the company grows.
- A unique culture of high achievement and teamwork.
- Supportive and humble colleagues who are the space's top problem solvers and innovators.
- Financial security through competitive compensation and incentives.
Onapsis is establishing a new development center in Bucharest. This is a hybrid role, so candidates must be able to commute to Bucharest.
Onapsis protects the business applications that run the global economy. The Onapsis Platform delivers vulnerability management, change assurance, and continuous compliance for business applications from leading vendors such as SAP, Oracle, and others. The Onapsis Platform is powered by the Onapsis Research Labs, the team responsible for the discovery and mitigation of more than 1,000 zero-day vulnerabilities in business applications.
Onapsis is headquartered in Boston, MA, with offices in Dallas, TX; Heidelberg, Germany; Bucharest, Romania; and Buenos Aires, Argentina. It proudly serves hundreds of the world’s leading brands, including close to 30% of the Forbes Global 100, six of the top 10 automotive companies, five of the top 10 chemical companies, four of the top 10 technology companies, and three of the top 10 oil and gas companies.