Hi, I'm Mustapha Unubi Momoh.

I solve real-world problems with machine learning

About

Machine Learning Engineer

I build production recommender systems, ML infrastructure, and applied GenAI systems on AWS, Kubernetes, and NVIDIA tooling.

My recent work focuses on ranking systems, model serving, feature retrieval, offline evaluation, retraining workflows, and operational reliability for machine learning systems.

Core Focus

  • Production recommender systems
  • ML serving and infrastructure
  • Offline evaluation and monitoring
  • Generative AI applications

Stack

  • Languages: Python, R, Bash
  • ML: TensorFlow, PyTorch, NVIDIA Merlin
  • Data: NumPy, Pandas, OpenCV, PyMC
  • Cloud: AWS, GCP, Azure, Docker, Kubernetes

Open to challenging opportunities in recommender systems, production ML, GenAI, and MLOps.

Featured Projects

Production ML Case Study

Production Multistage Multimodal Recommender on Amazon Elastic Kubernetes Service (EKS)

End-to-end recommender system with candidate generation, ranking, reranking, filtering, feature caching, and Triton serving on Kubernetes.

Multistage multimodal recommender system serving pipeline
System
  • Two-Tower candidate generation, DLRM ranking, diversity reranking, and seen-item filtering.
  • Cold-start handling with feature masking, context-aware recommendations, CLIP embeddings, and Sentence-BERT embeddings.
  • Serving stack with Amazon EKS, NVIDIA Triton, Feast, FAISS, Kubeflow, and Valkey-backed Bloom filters.
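
The seen-item filtering step can be illustrated with a small Bloom filter. This is a minimal in-process sketch, not the Valkey-backed filter used in the project: the class name, the `user_id:item_id` key format, the bit-array size, and the number of hash functions are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def filter_seen(user_id: str, candidates: list[str], seen: BloomFilter) -> list[str]:
    """Drop candidates that the filter reports as already seen by this user."""
    # Hypothetical key format: "<user_id>:<item_id>".
    return [item for item in candidates if f"{user_id}:{item}" not in seen]
```

A false positive only hides an unseen item from one response, which is usually an acceptable trade for constant memory per user.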
Impact
  • 99.7% item feature lookup latency reduction
  • 54% end-to-end latency reduction
  • 310% throughput improvement
Interactive Demo

The demo exposes the serving system as a recommender UI with controls for user ID, device type, time of day, and Top-K results, making the personalization inputs and scored outputs visible without opening the code.

Interactive multistage recommender system demo with user, context, Top-K controls, scores, and recommendation cards
Triton Serving Ensemble

A single client request flows through a Triton ensemble: context preprocessing, Feast-backed user lookup, NVTabular transforms, Two-Tower retrieval with FAISS, Bloom-filter seen-item removal, item feature lookup, DLRM ranking, and final softmax sampling.

Triton serving graph for multistage recommender with context, retrieval, filtering, item features, ranking, and response stages
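
The request flow described above can be sketched in plain NumPy. This is an illustrative stand-in, not the Triton ensemble itself: brute-force dot products replace the FAISS index, a Python set replaces the Bloom filter, and `rank_fn` is a hypothetical placeholder for the DLRM ranker. The function and parameter names are assumptions.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def recommend(user_vec, item_vecs, seen_ids, rank_fn, top_k=5, n_candidates=50, rng=None):
    """Retrieve, filter, rank, then sample top_k items for one user."""
    if rng is None:
        rng = np.random.default_rng()
    # 1. Retrieval: brute-force dot-product search (stand-in for FAISS).
    scores = item_vecs @ user_vec
    candidates = np.argsort(-scores)[: min(n_candidates, len(scores))]
    # 2. Filtering: drop items the user has already seen (stand-in for the Bloom filter).
    candidates = np.array([i for i in candidates if int(i) not in seen_ids], dtype=int)
    if candidates.size == 0:
        return candidates, np.array([])
    # 3. Ranking: score the surviving candidates (stand-in for DLRM).
    rank_scores = rank_fn(user_vec, item_vecs[candidates])
    # 4. Softmax sampling: draw items without replacement, weighted by rank score.
    k = min(top_k, candidates.size)
    picked = rng.choice(candidates.size, size=k, replace=False, p=softmax(rank_scores))
    return candidates[picked], rank_scores[picked]
```

Sampling from the softmax instead of taking a strict top-K lets lower-ranked items surface occasionally, so repeated requests do not always return the same list.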
Feature Caching Optimization

Profiling showed per-request item feature lookup caused hundreds of Redis/Valkey round trips through Feast. Loading item features into an in-memory NumPy cache at model initialization reduced lookup latency from 195 ms to 0.5 ms.

Before and after feature caching optimization showing Feast network lookups replaced by in-memory NumPy cache
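
A minimal sketch of the caching idea, assuming item features fit in memory as one dense array. The class and method names are hypothetical, and in the real system the array would be populated from the feature store when the model is initialized.

```python
import numpy as np


class ItemFeatureCache:
    """Holds all item features in one array so a lookup is a single NumPy index."""

    def __init__(self, item_ids, features):
        # Loaded once (for example at model initialization), not once per request.
        self.features = np.asarray(features, dtype=np.float32)
        self.row_of = {item_id: row for row, item_id in enumerate(item_ids)}

    def lookup(self, item_ids):
        rows = np.fromiter((self.row_of[i] for i in item_ids), dtype=np.int64)
        return self.features[rows]  # one vectorized gather, no network round trips


def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative latency reduction, in percent."""
    return 100.0 * (before_ms - after_ms) / before_ms
```

With the figures quoted above, `percent_reduction(195, 0.5)` rounds to 99.7, which matches the reported lookup latency reduction.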
Training, Deployment, and Monitoring

The MLOps flow keeps training and serving on Amazon EKS: Kubeflow prepares data and models, artifacts are persisted to Amazon EFS, NVIDIA Triton Inference Server serves the 14-model ensemble, and Prometheus/Grafana track utilization, throughput, and latency for capacity planning.

MLOps architecture for multistage recommender on Amazon EKS with Kubeflow, Triton, Prometheus, Grafana, GPU nodes, and CPU nodes

Tools: Amazon EKS, NVIDIA Merlin, NVIDIA Triton, Feast, FAISS, Kubeflow, Redis/Valkey, CLIP, Sentence-BERT

ML Infrastructure Case Study

Recommender System with Continuous Retraining on Amazon Elastic Kubernetes Service (EKS)

DCN-based ads-ranking recommender deployed on Amazon EKS with monitoring, drift-triggered retraining, and autoscaling.

DCN-based recommender system architecture with continuous retraining on Amazon EKS
System
  • Built and deployed a DCN-based ads-ranking recommender system on Amazon EKS.
  • Triggered retraining whenever model performance, measured by AUC-ROC, dropped below a defined threshold.
  • Designed the project around serving, monitoring, retraining triggers, and server autoscaling.
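
The retraining trigger can be sketched as a simple threshold check on AUC-ROC. This is an illustration under stated assumptions: the threshold value of 0.75 is hypothetical, and the pairwise AUC computation is a small pure-Python version written for clarity, not the monitoring code used in the project.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Ties count as half a win. O(P * N), fine for a small monitoring window.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def should_retrain(labels, scores, threshold=0.75):
    """Return (trigger, auc); trigger is True when AUC falls below the threshold."""
    auc = auc_roc(labels, scores)
    return auc < threshold, auc
```

In practice the check would run on a recent window of labeled traffic, and a True result would start the retraining workflow.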
Focus
  • Production ranking architecture
  • Model performance monitoring
  • Kubernetes-native deployment

Tools: Amazon EKS, NVIDIA Merlin, HugeCTR, Kubernetes, model monitoring, autoscaling

Experience

Machine Learning Engineer (Recommender systems)
  • Designed and proposed recommender-system architecture options on AWS and GCP, evaluating tradeoffs in training speed, inference latency, delivery timelines, and operating costs across data ingestion, model training, and inference.
  • Collaborated with the product team to define data and ranking requirements for personalized search and recommendation features for the Pigment app.
  • Collaborated with engineering to train recommendation models for the Pigment app, enabling homepage content personalization for millions of users.
  • Led discussions around recommendation request/response caching to optimize performance, including evaluating trade-offs between different cache types.
  • Tools: recommendation algorithms, Vertex AI, and Cloud Functions
November 2024 – July 2025 | United States, Remote (Contract)

AWS

AWS Community Builder in Machine Learning and GenAI
  • Fostering a learning community around AWS services and Machine Learning.
  • Tools: AWS services, Machine Learning, Generative AI
February 2024 – present | Waterloo, Canada
Data Scientist (OCR, ETL, and Automation)
  • Designed and deployed an ETL pipeline to extract mortgage rates from structured documents using Azure AI Document Intelligence, Azure Functions, and Blob triggers.
  • Benchmarked OCR pipeline tools, including Amazon Textract, Google Document AI, Azure AI Document Intelligence, and vision-language models for tabular data extraction.
  • Automated document processing with blob-triggered functions and upserted extracted mortgage-rate data into PostgreSQL for application use.
  • Tools: Azure AI Document Intelligence, Azure Functions, Blob Storage, PostgreSQL, Amazon Textract, Google Document AI
May 2024 – December 2024 | Vancouver, Remote
Machine Learning Engineer (DS Team lead)
  • Trained, packaged, and deployed deep learning models for spoofing verification for credit card and spend management companies.
  • Worked with the VP of Engineering to set up API gateways, and collaborated on writing the API specification and the technical report detailing the benchmarking results.
  • Led the Data Science team in pitching to two corporate credit card and spend management companies, receiving positive feedback.
  • Tools: Python, AWS Lambda, SageMaker endpoints, TensorFlow Serving, SQS, Docker, API Gateway
March 2024 – May 2024 | United States, Remote (Contract)
Generative AI Engineer (expert-vetted)
    Several companies, including stealth startups, Capgemini, Checkcare, and Upwork
  • Consulted on generative AI for the development of a copilot for a visual programming language.
  • Built AWS Well-Architected solutions for startups and companies, for use cases including medical GenAI, LLM-assisted injury claims processing, image upscaling, and OCR for check management.
  • Worked on a POC of an AI shopping assistant similar to Shopify’s shop.app but tailored to the client’s inventory.
  • Worked on a beauty retail generative AI POC using PaLM-2, Stable Diffusion, and Vertex AI.
  • Worked on a project on the causal understanding of REM sleep, deep sleep, and sleep latency with TabNet, SHAP, and PyMC.
  • Tools: AWS Bedrock, large language models (Titan), text embedding, vector DBs, Amazon Kendra, Streamlit, AWS EC2, Stable Diffusion, GCP, Vertex AI, AI agents, knowledge graphs, Amazon Neptune, Neo4j, entity extraction, intent recognition, explainable AI with SHAP, Bayesian causal inference, and machine learning
April 2023 – present | United States (NYC) | Canada (Remote)
Founder
  • Liaised with clients to identify their needs and goals for data analytics.
  • Trained individuals and corporate teams on SQL and Python for data science.
  • Tools: Python, Data science
June 2020 – present | Lagos, Nigeria

Selected Projects and Hackathons

Documentation Review Application for Atlassian
2024 NVIDIA AI Hackathon: AI-Assisted Documentation Review and Update

AI-assisted documentation review and update application using an AWQ-quantized Llama 2 13B model and TensorRT-LLM

Accomplishments
  • Tools: Llama 2, NVIDIA TensorRT, TensorRT-LLM, quantization, Streamlit, Docker, NVIDIA RTX 4090
  • Launch the app and log in with your Atlassian Confluence credentials.
  • Your documentation and articles in the Confluence space are automatically downloaded and indexed.
  • Chat with the documentation.
  • Or create new content by providing a title, editing the generated content, and publishing it.
GEN-AI app
Retrieval Augmented Generation with AWS Bedrock, Kendra, and Amazon Titan

Retrieval Augmented Generation with AWS Bedrock, Kendra, and Amazon Titan for content and slides generation

Accomplishments
  • Tools: AWS Bedrock, AWS Kendra, EC2, Amazon Titan model, prompt engineering, Amazon S3
  • Clone the repo.
  • Launch the application and create long-form articles or short PowerPoint slides.
screenshot of app
GenAI App for a Beauty Retailer

A GenAI proof of concept for a beauty retailer

Accomplishments
  • Tools: Python, Vertex AI, PaLM-2, Stable Diffusion fine-tuning, large language model (chat-bison) fine-tuning, model deployment
  • Automates content creation for a beauty retail website.
  • User purchase histories and interests, along with the query, are used to generate branded product and model images in various scenes and before-and-after transformation photos, alongside relevant descriptions on the home page.
  • The user provides basic information such as age, race, and hair type and length, and a prompt is constructed automatically to generate suitable hair product, beauty model, and transformation photos.
Screenshot of web app
Medical Decision Support

Machine Learning Clinical Decision Support System Proof of Concept with LIME and Decision Trees.

Accomplishments
  • Engineered features such as speech speed, average characters, average nouns, and sentiment from interview recordings and transcripts.
  • Trained a decision tree classifier to estimate the likelihood of depression.
  • Used Local Interpretable Model-agnostic Explanations (LIME) to produce local feature contributions and visualizations for interpretable ML.
  • The app generates and displays prediction probabilities, decision trees, LIME plots, and feature importance in the interface.
  • Users can generate a short medical report with their assessment.
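
Two of the transcript features listed above can be sketched as follows. This is an illustration, not the project's feature code: it assumes "speech speed" means words per minute and "average characters" means average characters per word, and the function and key names are hypothetical. Noun counts and sentiment need an NLP model and are left out.

```python
def transcript_features(transcript: str, duration_seconds: float) -> dict:
    """Compute simple speech-rate and word-length features from one transcript."""
    words = transcript.split()
    if not words or duration_seconds <= 0:
        raise ValueError("need a non-empty transcript and a positive duration")
    # Strip common punctuation so it does not inflate word lengths.
    lengths = [len(w.strip(".,!?;:")) for w in words]
    return {
        "speech_speed_wpm": len(words) * 60.0 / duration_seconds,
        "avg_chars_per_word": sum(lengths) / len(words),
    }
```

Features like these form one row per interview, which can then be fed to a decision tree classifier.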
Screenshot of web app
Interactive Text Label Explorer

An Interactive Dashboard for Text Label Exploration.

Accomplishments
    The preprocessing steps include:
  • Creating word embeddings.
  • Projecting the embedding vectors onto a 2D plane using dimensionality reduction techniques (17 are used in the project).
  • Topic modeling to produce clusters based on topics.
    The dashboard lets users interactively explore the data and labels in different panels, including:
  • Label-based groupings view
  • Topic-based groupings view
  • Top sentences view
  • Top words and word cloud view
    Based on findings from the exploration, the user can select data for review directly from the scatterplots and download the selection by clicking a button. Please watch the demo video for more details.
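
The 2D projection step can be illustrated with PCA, computed here from an SVD in NumPy. PCA is used only as a familiar example of dimensionality reduction; which 17 techniques the project uses is not stated here, and the function name is an assumption.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Each row of the result is an (x, y) coordinate that can be plotted directly in a scatterplot and colored by label or topic.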
Screenshot of web app
Microsoft Responsible AI Hackathon - Deep Learning-Assisted Diagnosis of Primary Open-Angle Glaucoma

Deep Learning-Assisted Diagnosis of Primary Open-Angle Glaucoma

Accomplishments
  • The solution uses a fine-tuned ResNet50 model and an Azure Custom Vision classifier to analyze fundus images for glaucomatous changes.
  • It uses techniques such as smart tagging to identify the optic disc region, which supports calculation of the cup-to-disc ratio.
  • The training and testing datasets include Retinal Fundus Images for Glaucoma Analysis (RIGA) and the Drishti datasets.
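
Once the optic disc and cup regions are identified, the cup-to-disc ratio is a short calculation. The sketch below assumes binary segmentation masks are already available and computes the vertical ratio, a commonly used variant. The function name and the mask inputs are assumptions for illustration, not the project's code.

```python
import numpy as np


def vertical_cup_to_disc_ratio(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Ratio of vertical cup height to vertical disc height from binary masks."""

    def height(mask: np.ndarray) -> int:
        rows = np.flatnonzero(mask.any(axis=1))  # rows containing any foreground pixel
        return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

    disc_height = height(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return height(cup_mask) / disc_height
```

A larger ratio means the cup occupies more of the disc, which is one of the signs clinicians look for when assessing glaucomatous change.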
Screenshot of web app
Multivariate Regression and Explainable AI with SHAP

Multivariate Regression and Explainable AI with SHAP: exploring factors affecting sleep latency, REM sleep, deep sleep, and number of awakenings.

Accomplishments
  • Developed regression models that predict variables such as awake time, REM sleep time, deep sleep time, sleep latency, and number of awakenings.
  • Used SHAP and sensitivity analysis to explain the models' predictions.
  • Models used in this project include Support Vector Machines, XGBoost, and TabNet Regressor.
GIF of Demo
Animated Node-link and Adjacency Matrix Transition

Animated Node-link and Adjacency Matrix Transition using the Les Miserables dataset

Accomplishments
  • Implements an animated transition between a force-directed graph and an adjacency matrix.
  • Hover over a node to enlarge it and highlight its direct connections; this also displays the character's name and description.
  • Click and drag nodes to reposition them; other nodes repel and move accordingly.
  • After you initiate 'Start Transition', interactions are limited to hover details, because overlapping elements disable other interactions such as link highlighting and node dragging.
  • Users can toggle between the node-link and adjacency matrix views.
  • If you reorder nodes in the matrix view, allow the reordering to complete before switching back to the node-link view.
  • Overlapping names in the matrix view can be resolved by completing the reordering process.
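
Both views draw from the same graph data. The sketch below shows how a node-link edge list maps to an adjacency matrix and how reordering nodes permutes it. It is written in Python for illustration only and is not the visualization code; the function names and the `(source, target, weight)` link format are assumptions.

```python
import numpy as np


def adjacency_matrix(nodes, links):
    """Build a symmetric weighted adjacency matrix from (source, target, weight) links."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = np.zeros((len(nodes), len(nodes)), dtype=int)
    for source, target, weight in links:
        i, j = index[source], index[target]
        matrix[i, j] = matrix[j, i] = weight  # undirected: mirror across the diagonal
    return matrix


def reorder(matrix, order):
    """Apply the same node permutation to rows and columns."""
    return matrix[np.ix_(order, order)]
```

Reordering changes only where each cell is drawn, not the underlying connections, which is why the matrix view can be re-sorted without touching the graph.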

Skills

Frameworks

Keras
TensorFlow
PyTorch
FastAI
PyMC
Django
Flask

Languages and Databases

Python
SQL
R
MATLAB
Shell Scripting

Libraries

NumPy
Pandas
OpenCV
scikit-learn
matplotlib

Cloud and Others

AWS
GCP
Azure
Heroku
Git

Education

University of Waterloo

Ontario, Canada

Degree: Master of Applied Science in Systems Design Engineering

Thesis: Remote Medical Diagnosis in Virtual Reality: A Mixed-methods approach to understanding Patients and Physicians’ Perceptions through Thematic Analysis and Regression Discontinuity Design.

Relevant Coursework:

Contact