AI/ML Infrastructure Engineer

Remote · Los Gatos, CA

Come join a team of industry and science leaders to achieve a vision of empowering innovation through state-of-the-art artificial intelligence and machine learning. We are addressing exciting challenges for our customers at the intersection of AI/ML and cutting-edge cloud infrastructure, with ML being both a core enabler for, and a major feature of, our platform.

We are looking for candidates who are adept in both AI/ML engineering and infrastructure engineering…


  • Design, architect, and implement infrastructure solutions that scale, keeping in mind the performance and infrastructure costs associated with an AI system.
  • Develop and adopt low-latency, scalable infrastructure that leverages AI models.
  • Demonstrate high competency in understanding the requirements of an AI-powered solution and structuring the development and releases of the software system.
  • Promote the latest data science and engineering practices within our organization to ensure the scalability of our systems and the agility of development.
  • Collaborate with data science, engineering, and company leadership, helping to set the strategy and standards for data science, engineering, and advanced analytics.
  • Conceive and prototype innovative AI products and solutions to enable our current and potential customers to adopt aiXplain’s platform.
  • Innovate, design, develop, test, deploy, maintain, and enhance software solutions, along with managing project priorities, deadlines, and deliverables.


  • Entrepreneurial: comfortable dealing with ambiguity and working in a highly collaborative tech-startup environment while maintaining a customer-centric approach.
  • 2+ years of working experience in an infrastructure role with strong Kubernetes experience.
  • BSc in Computer Science, Computer/Software Engineering, or a similar technical field.
  • Experience in designing, building, and deploying end-to-end ML pipelines using DL frameworks like PyTorch and TensorFlow 2.0.
  • Experience in MLOps, AutoML, and big-data platforms such as Kubeflow, MLflow, Hadoop, Spark, H2O, Kubernetes, and Docker.
  • Experience in designing, building, and deploying highly scalable distributed ML models and/or software systems.
  • Knowledge of Python and experience with at least one MVC framework (e.g., Django) and one statically typed language (e.g., C++ or Go).
  • Experience with SQL/NoSQL databases and big-data platforms (MongoDB, Hadoop, Elasticsearch, Redis, Cassandra, Spark, H2O).
  • Experience configuring, scripting, and managing cloud infrastructure environments (AWS, Azure, GCP, and Linux/Unix administration).

Desired skills

  • 3+ years of experience in software development, machine learning engineering, infrastructure engineering, and backend engineering.
  • Kaggle achievements and/or open-source project contributions.
  • Talks delivered at tech conferences on machine learning software or infrastructure, preferably covering applications involving AI/ML.
  • Excellence in deep learning frameworks: PyTorch, TensorFlow 2.0.
  • Experience in unsupervised, semi-supervised, and active learning.
  • Experience in hyperparameter optimization methods and frameworks.
  • Experience in neural architecture search and model compression/distillation.
  • Experience with cloud infrastructure orchestration using scripting tools.
