RL Inventory Optimization

Deep reinforcement learning agents (PPO & DQN) trained to optimize inventory management and minimize stockouts.

Benchmarked the PPO and DQN agents against a supervised learning baseline on an inventory management problem.

Applied epsilon-greedy exploration for DQN, with ε annealed from 1.0 to 0.05, to balance exploration and exploitation over the course of training.
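
The annealing schedule can be sketched as below. This is a minimal illustration, not the project's code: the linear shape, the function name linear_epsilon, and the exploration_fraction value of 0.1 are assumptions. Only the 1.0 and 0.05 endpoints come from the description above.

```python
def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05,
                   exploration_fraction=0.1):
    """Return epsilon at a given training step.

    Epsilon falls linearly from eps_start to eps_end over the first
    `exploration_fraction` of training, then stays at eps_end.
    The linear shape and the 0.1 fraction are assumptions.
    """
    anneal_steps = max(1, int(exploration_fraction * total_steps))
    progress = min(step / anneal_steps, 1.0)  # clamp once annealing ends
    return eps_start + progress * (eps_end - eps_start)


if __name__ == "__main__":
    for s in (0, 5_000, 10_000, 50_000):
        print(s, round(linear_epsilon(s, total_steps=100_000), 3))
```

With probability ε the agent takes a random action; otherwise it takes the action with the highest estimated Q-value.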

Hyperparameters: learning rate = 1e-4, replay buffer = 50k transitions, discount factor γ = 0.99, target network update every 1000 steps
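
As a rough sketch, these values map onto the keyword arguments of the Stable Baselines 3 DQN constructor as shown below. Reading the list as DQN settings is an assumption (the replay buffer and target update only apply to DQN), and the names DQN_KWARGS and build_dqn, the "MlpPolicy" choice, and the env argument are hypothetical.

```python
# Values from the hyperparameter list, keyed by SB3 DQN argument names.
DQN_KWARGS = {
    "learning_rate": 1e-4,
    "buffer_size": 50_000,            # replay buffer capacity (transitions)
    "gamma": 0.99,                    # discount factor
    "target_update_interval": 1000,   # env steps between target-net syncs
    "exploration_initial_eps": 1.0,   # epsilon-greedy start
    "exploration_final_eps": 0.05,    # epsilon-greedy end
}


def build_dqn(env):
    """Build a DQN agent for `env` (a Gymnasium environment).

    The import is deferred so the config can be inspected without
    Stable Baselines 3 installed. "MlpPolicy" is an assumption.
    """
    from stable_baselines3 import DQN
    return DQN("MlpPolicy", env, **DQN_KWARGS)
```

Calling build_dqn(env).learn(total_timesteps=...) would then start training; the timestep budget is not stated in this summary.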

Key Metrics

  • Cumulative reward — primary optimization objective
  • Service level — fraction of demand met (no stockout)
  • Convergence — mean episode reward vs training timesteps
  • Reward distribution — episode-level variance across 200 evaluation runs
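
The service-level and reward-distribution metrics could be computed along these lines. This is a sketch under assumptions: "fraction of demand met" is ambiguous between a unit-based fill rate and a per-period stockout-free rate, so both are shown, and all function names and input formats are hypothetical.

```python
from statistics import mean, stdev


def fill_rate(demand, fulfilled):
    """Fraction of demanded units that were served from stock."""
    total_demand = sum(demand)
    if total_demand == 0:
        return 1.0  # no demand, so nothing was missed
    served = sum(min(f, d) for d, f in zip(demand, fulfilled))
    return served / total_demand


def stockout_free_rate(demand, fulfilled):
    """Fraction of periods in which demand was met in full."""
    periods = list(zip(demand, fulfilled))
    if not periods:
        return 1.0
    return sum(f >= d for d, f in periods) / len(periods)


def summarize_rewards(episode_rewards):
    """Spread of cumulative reward across evaluation episodes.

    Needs at least two episodes, since stdev is the sample deviation.
    """
    return {
        "mean": mean(episode_rewards),
        "std": stdev(episode_rewards),
        "min": min(episode_rewards),
        "max": max(episode_rewards),
    }
```

For the evaluation described above, episode_rewards would hold the 200 per-episode cumulative rewards.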

Tech Stack: Gymnasium, Stable Baselines 3 (PPO & DQN), PyTorch, scikit-learn, Matplotlib.