Senior Site Reliability Engineer
About the job
The world’s most critical, and most at-risk, business applications have been neglected for far too long. Onapsis eliminates this blind spot by providing cybersecurity solutions dedicated to business-critical applications. Onapsis helps nearly 30% of the Forbes Global 100 understand the threats and risks across their SAP and Oracle landscapes, whether running on-premises, in the cloud, or in a hybrid environment.
We are looking for a robust Senior Site Reliability Engineer (SRE) who will work to ensure the availability of Onapsis Security Products. The SRE will be directly accountable for monitoring, inspecting, troubleshooting, and resolving service and product issues while continuously working with engineering partners to improve telemetry and automation.
What you will be doing, your legacy:
You’ll design, write, and deploy software to improve the availability, scalability, and efficiency of Onapsis products and services while solving complex problems related to cloud services and remotely deployed products. You’ll also develop designs, architectures, standards, and methods for large-scale distributed systems. You’ll facilitate service capacity planning, demand forecasting, software performance analysis, and system tuning.
As a member of the global Site Reliability Engineering (SRE) team with shared full-stack ownership, you’ll understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
This role will spend significant time doing "ops" related work such as production issues and service on-call. When not working on operations, you will work on software engineering tasks such as designing and developing systems that increase our reliability and scalability and reduce operational overhead through automation. The ideal SRE candidate is a proficient programmer who has a considerable breadth of knowledge and experience, including areas such as networking, internet protocols, and Linux systems.
- Bachelor's or Master’s degree in Computer Science or related fields or equivalent experience.
- 8+ years of experience in DevOps, Software Engineering, or Site Reliability Engineering (SRE), working on highly scalable distributed systems
- 2+ years of programming experience in object-oriented languages such as Java or Python, or in shell scripting
- Experience managing Linux OS (Debian/Ubuntu), Queue servers (RabbitMQ), and Database instances
- Experience working with monitoring systems, both APM and infrastructure monitoring
- Experience using version control systems such as Git
- Experience with orchestration/automation tools (Ansible, Terraform, Chef, etc.)
- Experience with container technologies (Docker, Kubernetes)
- Experience deploying code using CI/CD tools like GitLab Pipelines within change management procedures
- Experience participating in or running incident bridges
- Cloud-native application development and Cloud Technologies in AWS, GCP, or Azure
- Experience troubleshooting complex software and networking issues
- Customer obsession, passion for delighting customers
- Strong understanding of cloud concepts, on-premises infrastructure, and platforms
- Proven ability to quickly learn new technical domains and then train others
- Great verbal and written communication skills in English
Desired skills or interests:
- Experience in cloud support, operations, NOC, or similar is preferred but not required
- Security software development best practices
- Knowledge of test-driven development (TDD), CI / CD tooling, and Agile methodologies
- Experience taking a leading role in building complex software systems that have been successfully delivered to customers
- Knowledge of professional software engineering practices & best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience with distributed computing and enterprise-wide systems
- Experience in communicating with users, other technical teams, and senior management to collect requirements, describe software product features, technical designs, and product strategy
- Experience mentoring junior software engineers to improve their skills and make them more effective product software engineers
- Experience influencing software engineers' best practices within your team
What we offer:
- A role in shaping the future of protecting the most critical applications that run the world's business and a career that grows as the company grows.
- A unique culture of high achievement and teamwork.
- Supportive and humble colleagues who are among the space's top problem solvers and innovators.
- Financial security through competitive compensation and incentives.
Onapsis is establishing a new development center in Bucharest. This is a hybrid role, so candidates must be within commuting distance of Bucharest.
Onapsis protects the business applications that run the global economy. The Onapsis Platform delivers vulnerability management, change assurance, and continuous compliance for business applications from leading vendors such as SAP, Oracle, and others. The Onapsis Platform is powered by the Onapsis Research Labs, the team responsible for the discovery and mitigation of more than 1,000 zero-day vulnerabilities in business applications.
Onapsis is headquartered in Boston, MA, with offices in Dallas, TX; Heidelberg, Germany; Bucharest, Romania; and Buenos Aires, Argentina. The company proudly serves hundreds of the world’s leading brands, including close to 30% of the Forbes Global 100, six of the top 10 automotive companies, five of the top 10 chemical companies, four of the top 10 technology companies, and three of the top 10 oil and gas companies.