A production-focused NLP system for large-scale taxonomy modernization, built for a high-volume enterprise environment with hundreds of millions of product records.

What I built

  • Hierarchical classification workflows to route predictions across multi-level taxonomies
  • Sentence-transformer fine-tuning and active-learning loops for long-tail coverage
  • Kubernetes-based orchestration for training, inference, and rollout monitoring
  • Experiment tracking and delivery workflows that supported rapid iteration
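
A minimal sketch of the hierarchical routing idea, not the production code: each taxonomy level scores its children, and a record descends only while the top score clears a confidence threshold; otherwise it is handed to manual review with whatever partial path was reached. The `Node`, `keyword_scorer`, and `route` names, the 0.7 threshold, and the toy keyword scorers are all assumptions for illustration; in a real system a fine-tuned model would stand in for each scorer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scorer type: maps a description to {child_label: probability}.
Scorer = Callable[[str], Dict[str, float]]


@dataclass
class Node:
    label: str
    scorer: Optional[Scorer] = None  # None marks a leaf category
    children: Dict[str, "Node"] = field(default_factory=dict)


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Toy stand-in for a trained classifier: normalized keyword hit counts."""
    def score(text: str) -> Dict[str, float]:
        lowered = text.lower()
        hits = {label: sum(word in lowered for word in words)
                for label, words in keywords.items()}
        total = sum(hits.values())
        if total == 0:
            # No evidence at all: fall back to a uniform distribution.
            return {label: 1.0 / len(hits) for label in hits}
        return {label: count / total for label, count in hits.items()}
    return score


def route(text: str, root: Node, threshold: float = 0.7) -> Tuple[List[str], bool]:
    """Descend the taxonomy level by level.

    Returns (path, automated). automated is False when confidence fell
    below the threshold and the record should go to manual review.
    """
    path: List[str] = []
    node = root
    while node.scorer is not None:
        scores = node.scorer(text)
        best, prob = max(scores.items(), key=lambda item: item[1])
        if prob < threshold or best not in node.children:
            return path, False
        path.append(best)
        node = node.children[best]
    return path, True


# Assumed two-level toy taxonomy.
electronics = Node(
    "electronics",
    keyword_scorer({"phones": ["sim", "android"], "laptops": ["keyboard", "ssd"]}),
    {"phones": Node("phones"), "laptops": Node("laptops")},
)
root = Node(
    "root",
    keyword_scorer({"electronics": ["usb", "battery", "screen"],
                    "apparel": ["cotton", "sleeve"]}),
    {"electronics": electronics, "apparel": Node("apparel")},
)

confident = route("Android phone with dual SIM and 6.1 inch screen", root)
ambiguous = route("Screen protector sleeve", root)
partial = route("Laptop battery", root)
```

Returning the partial path alongside the review flag means a reviewer only has to resolve the level where the model was unsure, rather than reclassifying the record from the top.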

Outcome

  • Supported automated classification across 400M+ product records
  • Improved recall on sparse and ambiguous product descriptions
  • Reduced manual review load by routing more records through reliable automation

Stack

Python, PyTorch, Transformers, active learning, Kubernetes, and production ML operations