Full-Time
Remote

Site Reliability Engineer

Corellium is seeking an experienced Site Reliability Engineer to help us build, maintain, and troubleshoot our rapidly expanding infrastructure. In this role, you will focus on measuring and improving the availability, reliability, performance, and capacity of our cloud-based enterprise software services. Contributions will range from small cloud scaling operations and logging initiatives to large-scale multi-faceted rebuilds of production systems. You will also help define the infrastructure strategy, tooling, metrics, processes, and overall product scalability as we seek to grow our customer base and increase production efficiency.

You’ll be successful in this role if you have experience improving the reliability and performance of AWS-based production systems, and providing thought leadership to implement best practices and tools. The position requires an ability to work across departments while negotiating outcomes with other engineers. A holistic end-to-end approach to reliability will require general programming skills with strong computer science fundamentals. As a startup, we place a strong emphasis on individual contribution and diversity of thought, and a friendly, collaborative voice is greatly appreciated.

Successful candidates will have experience with the following tools and languages:

  • AWS
  • Shell scripting
  • Terraform / Ansible
  • Docker / Kubernetes
  • Node.js (we use a homegrown Node.js-based CI/CD system for automated iOS and Android testing)
  • Jenkins
  • Git
  • Grafana
  • Jira

Responsibilities

  • Owning cloud-related #alerts: tracking down the cause of each alert, finding the relevant logs, troubleshooting, and resolving the underlying issue. This may range from diagnosing an errored virtual device to determining why a server went offline.
  • Owning, overseeing, and managing our AWS resources – including AWS accounts, permissions, settings, rogue unused resources, etc.
  • Optimizing our strategy for auto-scaling.
  • Debugging any cloud-related services and infrastructure bugs.
  • Facilitating system maintenance and incident response.
  • Analyzing logs to detect new bugs, anomalies, and malicious use.
  • Managing observability services and expanding the metrics we can observe; using metrics for performance tuning and anomaly detection.
  • Analyzing technology currently in use, developing plans for improvement, recommending performance enhancements and cost optimizations, and identifying alternative solutions.
  • Contributing to code reviews.
  • Managing by, with, and through Service Level Objectives (SLOs).
  • Iterative development of a holistic reliability approach.
  • Creating documentation, procedures, and reports.

Qualifications

  • At least 4 years of experience in DevOps (infrastructure), including mentorship and coaching of junior engineers and leadership across an organization
  • At least 3 years of experience as an SRE
  • Engineering background or degree
  • Experience working in a startup environment
  • Experience in risk-based testing
  • Strong JavaScript / Node.js skills

Benefits

  • Competitive salary, benefits, and stock package
  • Completely remote, work-from-home position with a good work-life balance
  • Work with impressive engineers and state-of-the-art technology
  • Join a small team where your actions have a great impact on the company’s success
  • Sponsored learning and development

At Corellium®, we create virtual models of mobile phones and other Arm-based smart devices to eliminate barriers to testing and development. Our goal is to ensure engineers are well-equipped to research, work, and test on Arm-based technologies — whether that's testing a mobile banking app at scale, creating software for a new smart car, or looking for security flaws in the latest router firmware.

We're a fully remote team with headquarters in South Florida.


The problem with the current ecosystem is that physical devices don't scale and emulators don’t provide a true native experience. The solution: virtual devices. We create virtual models of Arm-based devices and run them on Arm servers, combining the fidelity of a real device with the convenience and scale of the cloud.

These devices run on top of our proprietary custom hypervisor, purpose-built to model complex peripherals and chipsets. From rapid hardware prototyping to advanced security testing, our groundbreaking platform accelerates development work on Arm technologies.


As a fully remote team, we rely on a lot of tools to make asynchronous work efficient. Our favourites at the moment are GitHub, Slack and Linear.


Corellium® is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. 


Think you fit the role?

We’d love to have a chat. Get in touch!

Locations

Remote

Who you'll work with

Chris Wade

CTO and Co-Founder

Stan Skowronek

Chief Architect and Co-Founder