Cloud Infrastructure Engineer
Software Engineering, Other Engineering
Bellevue, WA, USA
Posted on Wednesday, May 19, 2021
TruEra provides the first AI Quality platform to help enterprises analyze machine learning, improve model quality, and build trust. Powered by enterprise-class Artificial Intelligence (AI) Explainability technology based on six years of research at Carnegie Mellon University, TruEra’s platform helps eliminate the black box surrounding widely used AI and ML technologies. This visibility leads to higher-quality, explainable models that achieve measurable business results, address unfair bias, and ensure governance and compliance.
We are excited about the amazing team we’re building at TruEra. One of the core cultural principles at TruEra is: “Create what’s not there.” We’re building a team of creator-builders who are excited about our mission and keen to build large-scale systems and drive cutting-edge research in support of it.
We are a rapidly growing Series B company funded by Greylock, Wing, and Menlo Ventures, and working with both Fortune 100 customers and startups throughout the world!
About the job
As a Cloud Infrastructure Engineer on the TruEra Infrastructure team, you will manage a scalable, highly available data platform and AI/ML infrastructure ecosystem. We're developing the platform for both public and private cloud environments, with containers as first-class citizens. Infrastructure is at the core of our platform, and we're constantly innovating to make our systems more performant, timely, cost-effective, and capable while maintaining high reliability. You'll own our core data and ML infrastructure and pipelines, customer sandbox, production systems, and CI/CD pipeline.
What You Will be Doing:
- Solve customer challenges: Understand customers' installation and deployment environments, which may be on-prem, cloud, or hybrid. Understand customer infrastructure, security requirements, and identity integration, and set up customer incident management.
- Build tailored solutions: TruEra integrates with customers' data lakes, machine learning infrastructure, access management systems, etc. This role requires actively helping customers integrate the TruEra platform with their ML or data ecosystems, including supporting them and troubleshooting any integration issues.
- Provide technical clarity: Develop repeatable automation and best practices for both on-prem and off-prem deployments. Influence and participate in design discussions to create a reference architecture for each deployment model.
- Infrastructure as Code: Create programmable infrastructure that can interact with hosts or containers (cloud or on-prem) for provisioning, deployment, and configuration management.
Qualifications:
- 5+ years of experience working on both public cloud (AWS, Azure, GCP) and on-prem systems (OpenShift) or similar enterprise Kubernetes ecosystems.
- Strong DevOps/Infrastructure background, with expertise across numerous technologies.
- Expertise in working with containerized applications: Docker, Kubernetes, the Container Storage Interface (CSI), container networking, etc.
- Hands-on experience with IaC automation tools such as Ansible, Terraform, Puppet, or Chef.
- Deep understanding of security, identity, and access management for on-premise and cloud setups.
- Understanding of logging, monitoring, and security best practices.
- Experience with disaster recovery tiers designed for highly available workloads.
- Experience deploying and operating CI/CD systems such as Jenkins and CircleCI, as well as GitOps workflows.
- Familiarity with incident management tools and processes.
- Proficiency in Python or Java and in scripting languages such as Bash.
- Experience in working with data and ML systems.
- Experience in working with enterprise IT infrastructure.
Any unsolicited resumes/candidate profiles submitted through our website or to personal email accounts of employees of TruEra are considered property of TruEra and are not subject to payment of agency fees.