
At Stackable we believe that the future of software development is open source. That's why we've made it our mission to make it as easy as possible to build and operate scalable data and streaming infrastructures for event processing, machine learning (ML) and AI with open source products. Modern DevOps concepts, such as a consistent Infrastructure-as-Code approach, enable companies of all sizes to easily integrate and manage their data platforms in their own IT infrastructure, be it in their own data center or in the cloud.

Automation and Site Reliability Engineer (f/m/d)

Full or Part Time
Germany
Experienced Professional
June 16, 2023
Remote

Who are we?

At Stackable we are building a data platform based on Kubernetes. We frequently have to make design decisions that will either haunt or help us for years.

 

Who we are looking for:

A solid grasp of Kubernetes fundamentals

  • You know best practices and common pitfalls
  • Ideally, you have experience administering and troubleshooting Kubernetes clusters and managing deployments with operators, Helm charts or manually
  • You have dug around the Kubernetes codebase out of curiosity or to understand its weird behaviour – that's a bonus

Programming experience

  • We are not fussy about whether Python, Java, Go or Haskell floats your boat
  • Willingness to learn Rust, as part of your job will be code reviews and actual coding

 

What your day-to-day job would look like: 

Your job is to help us build operators according to Kubernetes community best practices. We also need your help in integrating and staying up to date with related technologies, and in suggesting how we can improve our product. Typical questions and problems include:

  • How best to version CRDs?
  • Useful abstraction level in operator CRDs
  • Permission issues in pods when our custom CSI driver provisions files in an ephemeral volume
  • How can we represent an LDAP endpoint in a way that makes it reusable across multiple operators?
  • Can we integrate continuous profiling? eBPF anyone?
  • What about tracing?

Looking at the CNCF landscape, there are hundreds of projects we could integrate with. We'd like you to research and evaluate which of those we should prioritize.

We run nightly tests of our platform across many managed Kubernetes providers, and sometimes these break with fun error messages, so a bit of your time would be spent diagnosing broken clusters.
