Senior Software Engineer

Posted 10 days ago

Job Description

As a Senior Software Engineer, you will build Upbound Spaces for cloud infrastructure management, working in multi-tenant SaaS environments and contributing to open-source projects such as Crossplane.

Responsibilities:

  • Actively build and operate Upbound Spaces in production, troubleshooting and resolving issues across multi-tenant SaaS environments, and contribute to Upbound's open-source projects, including Crossplane.
  • Take ownership of building features in high demand among Upbound's customers and deliver new functionality that will delight and amaze our users.
  • Investigate and debug complex issues in customer environments, including multi-control plane scenarios, resource reconciliation problems, and performance bottlenecks.
  • Communicate through thoughtful and thorough design documents for new initiatives and detailed post-incident reviews that drive system improvements.
  • Support the full project lifecycle for highly scalable and reliable services running in a cloud environment – discovery, analysis, architecture, design, review, documentation, building, migration, automation, deployment, production-readiness, and ongoing operational support.
  • Write and maintain Go code that interfaces with the Kubernetes API, such as operators, controllers, and add-ons, with a focus on observability, debuggability, and operational excellence.
  • Deploy, manage, and troubleshoot our Kubernetes services in production, using metrics, logs, and traces to identify and resolve issues quickly.
  • Build and maintain operational tooling for debugging customer environments, analyzing control plane health, and automating incident response.
  • Author documentation, user guides, runbooks, and blog posts to support and promote new features that you release.
  • Support the software release cycle for Spaces self-hosted distributions, including diagnosing issues in customer-managed deployments.
  • Participate in the on-call rotation to support Upbound Cloud, responding to incidents and driving them to resolution.

Requirements:

  • Have experience operating production cloud services at scale: monitoring, alerting, incident response, post-mortems, and continuous improvement of service reliability.
  • Have strong debugging skills across distributed systems, including experience with observability tools (Prometheus, Grafana, OpenTelemetry, distributed tracing) and techniques for diagnosing issues in production environments.
  • Have experience building and operating controllers that interact with the Kubernetes API server, including troubleshooting reconciliation loops, managing API rate limits, and optimizing controller performance.
  • Are comfortable working directly with customers to understand, reproduce, and resolve complex technical issues in their environments.
  • Take responsibility and ownership for solving problems even if they are outside your lane, especially during incidents affecting customer workloads.
  • Demonstrate excellence in your work, constantly trying to improve your skills and the operational posture of the systems you build.
  • Have empathy for customers and keep them in mind as you build solutions, understanding that reliability and debuggability are features.
  • Realize the importance of clear communication and effective collaboration to work as a team, deliver great results, and support customers through technical challenges.
  • Help create a safe environment where everyone can contribute, learn from failures, share on-call knowledge, and help each other grow as operators and engineers.

Benefits:

  • Health insurance
  • Retirement plans
  • Paid time off
  • Flexible work arrangements
  • Professional development

Upbound

Computer Software

The platform for platform teams.

Cloud Computing, Enterprise