Job Description
About the Role
We are looking for Software Development Engineers to help build our Agent Platform: the infrastructure that enables teams to develop, deploy, and operate AI agents in production.
In this role, you will design and implement backend services, systems, and tooling that support the Software Development Life Cycle (SDLC) for AI applications. You’ll work on problems spanning distributed systems, orchestration, and developer experience, helping teams reliably build, test, and scale AI-powered workflows.
This is a great opportunity for an engineer who enjoys building production systems, solving complex technical problems, and working at the intersection of platform engineering and AI.
Key Responsibilities:
- Build and maintain services and tooling that support agent deployment, testing, and lifecycle management within the CI/CD pipeline.
- Develop systems for workflow coordination, state management, and tool integration in the context of development and operations.
- Write high-quality, maintainable code in Python to power platform capabilities and APIs.
- Deploy and operate services on Kubernetes, ensuring reliability and scalability.
- Contribute to systems for observability, logging, tracing, and debugging of distributed workflows in development and production.
- Improve system performance across latency, throughput, and fault tolerance.
- Build internal tools and APIs that improve the developer experience for teams using the platform.
- Collaborate with engineers, product managers, and AI teams to deliver production-ready solutions and development infrastructure.
- Participate in design discussions and contribute to system architecture and technical decisions.
About You
Basic Qualifications (Software Engineer):
- 3+ years of software development experience building backend systems or services.
- Strong proficiency in Python (or similar languages such as Go or Java).
- Experience designing and building distributed systems and scalable services.
- Experience running and operating services in Kubernetes-based environments.
- Familiarity with machine learning or LLM-powered applications and the challenges of running them in production, especially those related to deployment and operational tooling.
- Experience designing systems with a focus on reliability, scalability, observability, and maintainability.
- Strong understanding of APIs, asynchronous processing, and service-oriented architecture.
- Bachelor’s degree in Computer Science, Engineering, or related discipline, or equivalent practical experience.
Basic Qualifications (Senior Software Engineer):
- 5+ years of software engineering experience building and operating production-grade backend or platform systems.
- Strong proficiency in Python (or similar languages such as Go or Java).
- Experience designing and building distributed systems and scalable services.
- Experience running and operating services in Kubernetes-based environments.
- Familiarity with machine learning or LLM-powered applications and the challenges of running them in production.
- Experience designing systems with a focus on reliability, scalability, observability, and maintainability.
- Strong understanding of APIs, asynchronous processing, and service-oriented architecture.
- Bachelor’s degree in Computer Science, Engineering, or related discipline, or equivalent practical experience.
Other Qualifications:
- Ability to navigate ambiguity, make sound technical decisions, and drive projects end-to-end.
- Strong collaboration and communication skills.
- Experience working on platforms, infrastructure, or developer tooling.
- Familiarity with workflow orchestration or multi-step processing systems.
- Experience with monitoring, logging, and observability tools.
- Familiarity with cloud platforms and modern deployment practices (CI/CD).
- Experience building or supporting platforms, developer infrastructure, or internal tooling.
- Experience with agent-based systems, workflow orchestration, or complex multi-step pipelines from an infrastructure or SDLC perspective.
- Exposure to LLM-based applications or agent-style architectures.
- Familiarity with LLM application patterns such as tool integration, retrieval-augmented generation (RAG), context and memory management, and multi-step workflows.
- Experience with observability stacks, tracing systems, and debugging distributed workflows.
- Familiarity with model serving, vector databases, or evaluation frameworks in the context of MLOps.
- Experience mentoring engineers and contributing to technical direction.
Workday Pay Transparency Statement
Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things.
Primary Location: CAN.ON.Toronto
Primary Location Base Pay Range: $112,000 CAD – $168,000 CAD
Primary CAN Base Pay Range: $112,000 – $168,000 CAD