Hi, I'm Mustapha Unubi Momoh.

Machine Learning Engineer
I solve real-world problems with machine learning

About

I build production machine learning systems (e.g., recommender systems), model serving stacks, and machine learning infrastructure on cloud and Kubernetes-based systems.

Core Focus

  • Recommender systems and personalization
  • Model serving and inference infrastructure
  • Training pipelines, evaluation, and monitoring
  • Applied GenAI and document intelligence

Featured Projects

Production ML Case Study

Production Multistage Multimodal Recommender on Amazon Elastic Kubernetes Service (EKS)

Built and deployed an end-to-end recommender system with candidate generation, ranking, reranking, filtering, feature caching, and Triton serving on Kubernetes.

Multistage multimodal recommender system serving pipeline
Why This Design

The target use case is an ecommerce homepage recommender that serves both registered users and anonymous visitors. Recommendations need to account for request context such as device type, time of day, and day of week, while still producing reasonable cold-start results for users with little or no history.

The system also has to scale to large product catalogs. Scoring millions of items on every request is impractical, so the architecture uses a multistage design: a lightweight retrieval stage quickly narrows the candidate set, then a heavier ranking stage scores the smaller pool.
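
As a rough illustration of the retrieve-then-rank idea (not the production code), the sketch below uses a brute-force dot-product scan as a stand-in for the ANN index and a lookup table as a stand-in for the heavy ranker. The item ids, embeddings, and click probabilities are made-up values.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, item_vecs, k):
    """Stage 1: cheap similarity search that keeps only the top-k item ids.

    A brute-force scan stands in for the ANN index here.
    """
    return heapq.nlargest(k, item_vecs, key=lambda i: dot(query_vec, item_vecs[i]))


def rank(candidates, heavy_score):
    """Stage 2: apply the expensive scorer to the small candidate pool only."""
    return sorted(candidates, key=heavy_score, reverse=True)


# Toy catalog: item id -> 2-d embedding (hypothetical values).
ITEMS = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
# Hypothetical "heavy" ranker: a table of precomputed click probabilities.
CLICK_PROB = {"a": 0.10, "b": 0.40, "c": 0.90, "d": 0.25}

candidates = retrieve([1.0, 0.0], ITEMS, k=3)  # ['a', 'b', 'd']
ranked = rank(candidates, CLICK_PROB.get)      # ['b', 'd', 'a']
```

Item "c" has the highest click probability but never reaches the ranker, because retrieval already dropped it. That is the tradeoff a multistage design accepts in exchange for scoring far fewer items per request.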

To keep the system current without rebuilding the full retrieval stack every day, I separated the workflow into an initial Kubeflow pipeline and an incremental fine-tuning pipeline. The initial pipeline builds preprocessing workflows, trains models from scratch, creates the ANN index, and deploys Triton. The incremental pipeline updates the query tower and ranker with new interactions while keeping item embeddings fixed.

System Components
  • Two-Tower candidate generation, seen-item filtering, DLRM ranking, and diversity reranking.
  • Cold-start handling with feature masking, context-aware recommendations, and multimodal learning with CLIP image and Sentence-BERT embeddings.
  • Serving stack with Amazon EKS, NVIDIA Triton, TensorFlow, Feast feature stores (offline store backed by S3 and Athena, online store backed by Redis-compatible Valkey), a FAISS ANN index, Kubeflow Pipelines, and Valkey-backed Bloom filters.
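
To make the seen-item filtering component concrete, here is a minimal Bloom filter sketch. It is an assumption-heavy stand-in: the real filters are Valkey-backed, while this one keeps its bits in an in-process bytearray, and the class name, sizes, and hashing scheme are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def drop_seen(candidates, seen):
    """Keep only candidates the filter has definitely not recorded."""
    return [c for c in candidates if c not in seen]


seen = BloomFilter()
seen.add("sku_42")                            # the user already interacted with sku_42
fresh = drop_seen(["sku_42", "sku_7"], seen)  # sku_42 is always removed
```

A Bloom filter can wrongly report an unseen item as seen, but never the reverse. For this use case that only costs an occasional missed recommendation, which is usually an acceptable price for a compact per-user structure.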
Online Feature Updates

Recommendation requests and feature updates run through separate Lambda functions in the same VPC as ElastiCache and the EKS node subnets hosting Triton. Triton is exposed through an internal AWS Network Load Balancer, while DynamoDB, S3, and SQS are reached privately through VPC endpoints.

Online feature update and recommendation request flow with Lambda, internal Network Load Balancer, Triton on EKS, ElastiCache, DynamoDB, S3, SQS, and VPC endpoints
Interactive Demo

The demo shows recommendations adapting to changing user preferences in near real time, using the online feature update path above to refresh user features as new interactions are generated. Click the image to view the demo.

YouTube thumbnail for the interactive multistage recommender system demo with user, context, Top-K controls, scores, and recommendation cards
Triton Serving Ensemble

A single client request flows through a Triton ensemble: context preprocessing, Feast-backed user lookup, NVTabular transforms, Two-Tower retrieval with FAISS, Bloom-filter seen-item removal, item feature lookup, DLRM ranking, and final softmax sampling.
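
The final stage is described only as softmax sampling, so the exact scheme is not specified here. One plausible reading, sketched under that assumption, is to turn ranker scores into probabilities and draw the Top-K without replacement, so that high-scoring items are favored but the list is not fully deterministic. The function names and the temperature parameter are hypothetical.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_sample(items, scores, k, temperature=1.0, rng=None):
    """Draw up to k distinct items, re-normalizing after each pick."""
    rng = rng or random.Random()
    pool = list(zip(items, scores))
    chosen = []
    while pool and len(chosen) < k:
        probs = softmax([s for _, s in pool], temperature)
        idx = rng.choices(range(len(pool)), weights=probs, k=1)[0]
        chosen.append(pool.pop(idx)[0])
    return chosen


probs = softmax([1.0, 2.0, 3.0])
picked = softmax_sample(["a", "b", "c", "d"], [0.1, 0.5, 0.2, 0.9], k=3,
                        rng=random.Random(0))
```

A lower temperature pushes the result toward a plain sort by score, while a higher one spreads exposure more evenly across candidates.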

Triton serving graph for multistage recommender with context, retrieval, filtering, item features, ranking, and response stages
Feature Caching Optimization

Loading item features into an in-memory NumPy cache at model initialization reduced lookup latency from 195 ms to 0.5 ms, cut end-to-end latency by 54%, and improved throughput by 310%.
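
The pattern behind this optimization can be sketched as follows: read every item's features once at initialization, then answer each request from memory. This is a simplified illustration only. Plain Python lists stand in for the NumPy array, and `ItemFeatureCache` and `fetch_all_item_features` are hypothetical names with a stub in place of the feature store.

```python
class ItemFeatureCache:
    """Loads all item features once, then serves lookups from memory."""

    def __init__(self, fetch_all):
        # fetch_all is a hypothetical callable returning {item_id: features};
        # in a real system this would be one bulk read from the feature store.
        table = fetch_all()
        self._row = {item_id: i for i, item_id in enumerate(table)}
        self._features = [table[item_id] for item_id in table]

    def lookup(self, item_ids):
        """Return feature rows for item_ids without touching the network."""
        return [self._features[self._row[i]] for i in item_ids]


# Stub feature store that counts how many times it is called.
calls = {"n": 0}


def fetch_all_item_features():
    calls["n"] += 1
    return {"sku_1": [0.1, 0.2], "sku_2": [0.3, 0.4], "sku_3": [0.5, 0.6]}


cache = ItemFeatureCache(fetch_all_item_features)  # one bulk read at init
batch_a = cache.lookup(["sku_3", "sku_1"])
batch_b = cache.lookup(["sku_2"])

# Ratio of the reported lookup latencies (195 ms before, 0.5 ms after).
speedup = 195 / 0.5  # 390.0
```

The tradeoff is memory footprint and staleness: cached item features only change when the model is reloaded, which fits item features better than fast-changing user features.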

Before and after feature caching optimization showing Feast network lookups replaced by in-memory NumPy cache
Training, Deployment, and Monitoring

The MLOps flow keeps training and serving on Amazon EKS: Kubeflow prepares data and models, artifacts are persisted to Amazon EFS, NVIDIA Triton Inference Server serves the 14-model ensemble, and Prometheus/Grafana track utilization, throughput, and latency for capacity planning.

MLOps architecture for multistage recommender on Amazon EKS with Kubeflow, Triton, Prometheus, Grafana, GPU nodes, and CPU nodes
Initial Pipeline Run

The initial run builds the system from scratch. This includes the data preprocessing, feature store setup, Two-Tower and Deep Learning Recommendation Model training, ANN index setup, and Triton Server deployment.

Initial training pipeline for multistage recommender showing full data preparation, feature engineering, model training, artifact generation, and serving preparation
Incremental Pipeline Run

The incremental run updates the system without rebuilding everything from scratch. The Two-Tower model is fine-tuned with the candidate encoder frozen, so only the query tower is updated. The ranker is fine-tuned with all layers trainable. Training uses recent data and some historical data for stability. The fine-tuned models are then deployed to the server, where Triton picks them up.
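
Mixing recent and historical data can be sketched as a simple replay sampler. The mixing ratio is not stated here, so `replay_fraction` is a hypothetical knob, and the function name is illustrative.

```python
import random


def build_finetune_set(recent, historical, replay_fraction=0.2, seed=0):
    """Return all recent rows plus a random sample of historical rows.

    replay_fraction is the share of the final set drawn from history.
    """
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_replay / (len(recent) + n_replay) = replay_fraction for n_replay.
    n_replay = round(len(recent) * replay_fraction / (1.0 - replay_fraction))
    n_replay = min(n_replay, len(historical))
    mixed = list(recent) + rng.sample(list(historical), n_replay)
    rng.shuffle(mixed)
    return mixed


recent_rows = list(range(80))            # stand-ins for new interactions
historical_rows = list(range(1000, 1200))  # stand-ins for older interactions
mixed = build_finetune_set(recent_rows, historical_rows, replay_fraction=0.2)
```

Replaying a slice of older interactions is a common way to limit forgetting when a model is updated on a narrow window of new data.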

Incremental update pipeline for multistage recommender showing partial refresh of data, features, embeddings, indexes, and serving artifacts

For the full implementation details, architecture decisions, and deployment notes, see the TDS article, Medium article, demo, or source code linked above.

Tools: Amazon EKS, NVIDIA Merlin, NVIDIA Triton, Feast, FAISS, Kubeflow, Redis/Valkey, CLIP, Sentence-BERT

ML Infrastructure Case Study

Recommender System with Continuous Retraining on Amazon Elastic Kubernetes Service (EKS)

Built and deployed a recommender system for ad ranking on Amazon EKS. It includes a monitoring component that triggers incremental retraining when model performance drops below a defined threshold.

DCN-based recommender system architecture with continuous retraining on Amazon EKS
System
  • Trains a Deep & Cross Network (DCN) model to predict the click probability for an ad from user and item (ad) features. The training data is a subset of the Criteo 1TB click logs.
  • The monitoring component triggers incremental fine-tuning when model performance (measured by AUC-ROC) drops below a defined threshold.
  • One Kubeflow pipeline orchestrates both the full and incremental training runs. Incremental training is either triggered by the monitoring component or run on a schedule.
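
The monitoring check can be sketched as: compute AUC-ROC on recent labeled traffic and compare it with a threshold. The AUC routine below uses the standard rank-sum formulation. The threshold values and function names are hypothetical, and how the real component gathers labels or starts the pipeline is not shown.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def should_retrain(labels, scores, threshold):
    """True when measured AUC-ROC has fallen below the threshold."""
    return auc_roc(labels, scores) < threshold


clicks = [0, 0, 1, 1]               # observed click labels
predicted = [0.1, 0.4, 0.35, 0.8]   # model click probabilities
current_auc = auc_roc(clicks, predicted)  # 0.75
trigger = should_retrain(clicks, predicted, threshold=0.78)
```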
Autoscaling Strategies

Triton autoscaling uses a custom queue-latency metric. When the average request queue time exceeds the 200 ms target, the Horizontal Pod Autoscaler (HPA) adds Triton replicas. The project includes two node-scaling paths for pending GPU workloads: in one variant, Karpenter launches GPU nodes directly; in the other, Cluster Autoscaler increases the desired capacity of a GPU Auto Scaling group and lets AWS Auto Scaling provision the nodes.
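
For intuition, the replica count HPA aims for follows the scaling rule in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that rule to the 200 ms queue-time target. The sample metric values are made up, and min/max replica bounds and stabilization windows are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count from the documented HPA ratio rule (bounds omitted)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    return max(1, math.ceil(current_replicas * ratio))


TARGET_QUEUE_MS = 200

scale_up = desired_replicas(2, 500, TARGET_QUEUE_MS)    # 5
hold = desired_replicas(2, 210, TARGET_QUEUE_MS)        # 2 (within tolerance)
scale_down = desired_replicas(4, 100, TARGET_QUEUE_MS)  # 2
```

If the new replicas cannot be scheduled because no GPU node has room, they stay pending, which is the signal Karpenter or Cluster Autoscaler reacts to.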

Kubernetes HPA with Cluster Autoscaler for recommender serving on Amazon EKS
HPA with Cluster Autoscaler
Kubernetes HPA with Karpenter for recommender serving on Amazon EKS
HPA with Karpenter

Full implementation details and source code are linked at the top of this case study.

Tools: Amazon EKS, NVIDIA Merlin, HugeCTR, Kubernetes, model monitoring, autoscaling

Experience

Machine Learning Engineer (Recommender systems)
  • Designed and proposed recommender-system architecture options on AWS and GCP, evaluating tradeoffs in training speed, inference latency, delivery timelines, and operating costs across data ingestion, model training, and inference.
  • Collaborated with the product team to define data and ranking requirements for personalized search and recommendation features in the Pigment app.
  • Collaborated with engineering to train recommendation models for the Pigment app, enabling homepage content personalization for millions of users.
  • Led discussions around recommendation request/response caching to optimize performance, including evaluating trade-offs between different cache types.
  • Tools: Recommendation algorithms, Vertex AI, and Cloud Functions
November 2024 – July 2025 | United States, Remote (Contract)
Data Scientist (OCR, ETL, and Automation)
  • Designed and deployed an ETL pipeline to extract mortgage rates from structured documents using Azure AI Document Intelligence, Azure Functions, and Blob triggers.
  • Benchmarked OCR pipeline tools, including Amazon Textract, Google Document AI, Azure AI Document Intelligence, and vision-language models for tabular data extraction.
  • Automated document processing with blob-triggered functions and upserted extracted mortgage-rate data into PostgreSQL for application use.
  • Tools: Azure AI Document Intelligence, Azure Functions, Blob Storage, PostgreSQL, Amazon Textract, Google Document AI
May 2024 – December 2024 | Vancouver, Remote
Machine Learning and Generative AI Engineer (Contracts)

Several companies, including stealth startups and Upwork clients

  • Trained, packaged, and deployed deep learning models for spoofing verification for credit card and spend management companies.
  • Worked with the VP of Engineering to set up API gateways and collaborated on API specifications and technical reports detailing benchmarking results.
  • Led the data science team in pitches to two corporate credit card and spend management companies, receiving positive feedback.
  • Consulted on generative AI for the development of a copilot for a visual programming language.
  • Built AWS Well-Architected solutions for startups and companies, for use cases including medical GenAI, LLM-assisted injury claims processing, image upscaling, and OCR for check management.
  • Worked on a POC of an AI shopping assistant similar to Shopify’s shop.app but tailored to the client’s inventory.
  • Worked on a beauty retail generative AI POC using PaLM 2, Stable Diffusion, and Vertex AI.
  • Worked on a causal analysis of REM sleep, deep sleep, and sleep latency using TabNet, SHAP, and PyMC.
  • Tools: Python, AWS Lambda, SageMaker Endpoint, TensorFlow Serving, SQS, Docker, API Gateway, AWS Bedrock, Large Language Models (Titan), text embedding, vector DBs, Amazon Kendra, Streamlit, AWS EC2, Stable Diffusion, GCP, Vertex AI, AI agents, knowledge graphs, Amazon Neptune, Neo4j, entity extraction, intent recognition, explainable AI with SHAP, Bayesian causal inference, and machine learning
April 2023 – Present | United States (NYC) | Canada (Remote)

Selected Projects and Hackathons

Documentation Review Application for Atlassian
2024 NVIDIA AI Hackathon: AI-Assisted Documentation Review and Update

AI-Assisted Documentation Review and Update Application using an AWQ-quantized 13B Llama model and TensorRT-LLM

Accomplishments
  • Tools: Llama 2, NVIDIA TensorRT, TensorRT-LLM, quantization, Streamlit, Docker, NVIDIA RTX 4090
  • Launch the app and log in with your Atlassian Confluence credentials.
  • The documentation and articles in your Confluence space are downloaded and indexed automatically.
  • Chat with the documentation, or
  • Create new content by providing a title, editing the generated content, and publishing it.
GEN-AI app
Retrieval Augmented Generation with AWS Bedrock, Kendra, and Amazon Titan

Retrieval Augmented Generation with AWS Bedrock, Kendra, and Amazon Titan for content and slides generation

Accomplishments
  • Tools: AWS Bedrock, Amazon Kendra, EC2, Amazon Titan model, prompt engineering, Amazon S3
  • Clone the repo.
  • Launch the application and create long-form articles or short PowerPoint slides.
Medical Decision Support

Machine Learning Clinical Decision Support System Proof of Concept with LIME and Decision Trees.

Accomplishments
  • Engineered features such as speech speed, average characters, average nouns, and sentiment from interview recordings and transcripts.
  • Trained a decision tree classifier to estimate the likelihood of depression.
  • Used Local Interpretable Model-agnostic Explanations (LIME) to produce local feature contributions and visualizations for interpretable ML.
  • The app generates and displays prediction probabilities, decision trees, LIME plots, and feature importance on the interface.
  • Users can generate a short medical report with their assessment.
Interactive Text Label Explorer

An Interactive Dashboard for Text Label Exploration.

Accomplishments
The preprocessing steps include:
  • Creating word embeddings.
  • Projecting the embedding vectors onto a 2D plane with dimensionality reduction techniques (17 are used in the project).
  • Topic modeling to produce topic-based clusters.
The dashboard lets users interactively explore the data and labels in several panels, including:
  • Label-based groupings view
  • Topic-based groupings view
  • Top sentences view
  • Top words and word cloud view
Based on findings from the exploration, users can select data for review directly from the scatterplots and download the selection by clicking a button. Please watch the demo video for more details.
Microsoft Responsible AI Hackathon - Deep Learning-Assisted Diagnosis of Primary Open-Angle Glaucoma

Deep Learning-Assisted Diagnosis of Primary Open-Angle Glaucoma

Accomplishments
  • The solution uses a fine-tuned ResNet50 model and an Azure Custom Vision classifier to analyze fundus images for glaucomatous changes.
  • It uses smart tagging to identify the optic disc region, which supports calculating the cup-to-disc ratio.
  • Training and test data come from the Retina Fundus Images for Glaucoma Analysis (RIGA) and Drishti datasets.
Multivariate Regression and Explainable AI with SHAP

Multivariate Regression and Explainable AI with SHAP: exploring factors affecting sleep latency, REM sleep, deep sleep, and number of awakenings.

Accomplishments
  • Developed regression models to predict variables such as awake time, REM sleep time, deep sleep time, sleep latency, and number of awakenings.
  • Used SHAP and sensitivity analysis to explain the models' predictions.
  • Models used in this project include Support Vector Machines, XGBoost, and TabNet Regressor.
Animated Node-link and Adjacency Matrix Transition

Animated Node-link and Adjacency Matrix Transition using the Les Miserables dataset

Accomplishments
  • Implements an animated transition between a force-directed graph and an adjacency matrix.
  • Users can hover over a node to enlarge it and highlight its direct connections. The hover also displays the character's name and description.
  • Click and drag nodes to reposition them. Other nodes repel and move accordingly.
  • After 'Start Transition' is clicked, interactions are limited to hover details, because overlapping elements disable link highlighting and node dragging.
  • Users can toggle between the node-link and adjacency matrix views.
  • If you reorder nodes in the matrix view, let the reordering finish before switching back to the node-link view.
  • Overlapping names in the matrix view resolve once the reordering completes.

Skills

Production ML and Recommender Systems

Retrieval, Ranking, Reranking, Two-Tower models, Transformer-based sequence encoders, Session-based recommendations, DLRM, DCN, Context-aware recommendations, Cold-start handling, ANN search with FAISS, Feature stores, Bloom filters, MRR, NDCG, Precision@K

ML Serving and MLOps

NVIDIA Triton Inference Server, NVIDIA Merlin, HugeCTR, NVTabular, Kubeflow Pipelines, Docker, Kubernetes, Amazon EKS, Prometheus, Grafana, HPA, Karpenter, Cluster Autoscaler

Cloud and Data Infrastructure

AWS Lambda, Amazon S3, Amazon Athena, DynamoDB, SQS, Amazon EFS, ElastiCache / Valkey, SageMaker, API Gateway, Vertex AI, Cloud Functions, Azure Functions, Blob Storage, PostgreSQL

GenAI, OCR, and Applied AI

RAG, AWS Bedrock, Amazon Titan, Amazon Kendra, Prompt engineering, Stable Diffusion, Azure AI Document Intelligence, Amazon Textract, Google Document AI, Vision-language models, OpenCV, SHAP

Languages, Frameworks, and Analysis

Python, SQL, R, Bash, TensorFlow, PyTorch, Keras, scikit-learn, NumPy, Pandas, PyMC, matplotlib, Git

Education

University of Waterloo

Ontario, Canada

Degree: Master of Applied Science in Systems Design Engineering

Thesis: Remote Medical Diagnosis in Virtual Reality: A Mixed-methods approach to understanding Patients and Physicians’ Perceptions through Thematic Analysis and Regression Discontinuity Design.

Relevant Coursework:

Contact