Ondřej Kutil


ML From Scratch

Core ML algorithms implemented from first principles using only NumPy — building deep mathematical intuition.

Python · NumPy · Mathematics
github ↗

The Problem

When a model fails in production, someone needs to open the black box and explain why. That requires understanding the mathematics beneath the surface, not just calling sklearn.fit().

This repository builds core ML algorithms from first principles using only NumPy and pandas, then validates each against the sklearn equivalent on identical data. The goal is the kind of deep technical understanding needed to audit model behaviour, diagnose failures, and explain predictions to non-technical stakeholders.

Algorithms Covered

Algorithm                  Key concept practised
Linear regression (OLS)    Normal equations, gradient descent optimisation
k-means                    Centroid initialisation, iterative assignment/update steps
Neural networks            Forward/backward propagation, activation functions
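The k-means row above can be sketched in plain NumPy. The function name, the random-sample initialisation, and the demo data below are illustrative choices, not the repository's exact code:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain NumPy k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by sampling k distinct data points (one common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Pairwise squared distances, shape (n_points, k).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments stable, converged
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(5.0, 0.1, size=(20, 2))])
centroids, labels = kmeans(X_demo, k=2)
```

Note the deliberately explicit distance computation via broadcasting: nothing is hidden behind a library call, which is the point of the exercise.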

Every implementation follows the same constraint: NumPy (plus pandas for data handling) only. No scikit-learn, no PyTorch, no shortcuts: every matrix operation, gradient computation, and update rule is written explicitly.
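As a minimal sketch of that constraint, the OLS row of the table can be written with every matrix operation in view. The function name is illustrative:

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares via the normal equations: w = (XᵀX)⁻¹ Xᵀy.
    A bias column of ones is prepended so the intercept is learned too."""
    Xb = np.column_stack([np.ones(len(X)), X])
    # np.linalg.solve is preferred over forming an explicit inverse
    # for numerical stability.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Recover known coefficients from noiseless data: y = 2x + 1.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() + 1
w = ols_fit(x, y)  # → approximately [1., 2.] (intercept, slope)
```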

Methodology

Each algorithm follows a structured five-step process:

  1. Select — Choose an algorithm with clear mathematical foundations and practical relevance.
  2. Derive — Work through the mathematics: loss functions, gradients, update rules. Document each derivation step-by-step.
  3. Implement — Build from scratch in NumPy or pandas. Every matrix operation is visible and explainable.
  4. Validate — Run both the from-scratch implementation and the sklearn equivalent on the same dataset. Compare outputs numerically and visually.
  5. Document — Write up the case study in notebook format, explaining not just what the algorithm does but why each design decision matters.
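Step 4, validation, can be sketched entirely in NumPy: here np.linalg.lstsq stands in as the reference implementation (an assumption for this sketch; the repository compares against sklearn itself):

```python
import numpy as np

# Synthetic regression data with known coefficients.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
true_w = np.array([1.0, 3.0])
y = X @ true_w + rng.normal(scale=0.01, size=100)

# From-scratch fit: batch gradient descent on mean squared error.
w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    grad = (2 / len(y)) * X.T @ (X @ w - y)  # dMSE/dw, written out
    w -= lr * grad

# Reference fit: closed-form least squares.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# Numerical comparison, the core of the validation step.
assert np.allclose(w, w_ref, atol=1e-4)
```

Both fits minimise the same convex objective, so after enough iterations the gradient-descent solution should match the closed-form one to within tolerance; a persistent gap signals a bug in the derivation or the code.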

What This Demonstrates

This project is less about novel results and more about demonstrating the ability to read a paper, translate mathematics into working code, and verify correctness rigorously.

The case study format — one notebook per algorithm — forces clear written explanation of every step. That's the same skill required when presenting model audits to stakeholders: explaining why a model makes the predictions it does, what its failure modes are, and where the boundaries of its reliability lie.

Key competencies shown:

  • Mathematical fluency — loss functions, derivatives, matrix operations implemented from first principles
  • Training dynamics — learning rates, convergence analysis, optimisation algorithms
  • Validation discipline — every implementation verified against a reference library
  • Technical communication — each notebook is a self-contained explainer, not just code
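A compressed sketch of the forward/backward mechanics the neural-network notebook walks through. The XOR task, layer sizes, and learning rate here are illustrative, not the repository's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy task: learn XOR with a tiny 2-8-1 network.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    return h, sigmoid(h @ W2 + b2)    # output layer

losses = []
lr = 0.5
for _ in range(2000):
    h, out = forward(X)
    losses.append(((out - y) ** 2).mean())
    # Backward pass: the chain rule written out by hand.
    d_out = 2 * (out - y) / len(y) * out * (1 - out)  # MSE grad × sigmoid'
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)               # tanh'
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Tracking the loss trajectory, as `losses` does here, is the simplest convergence analysis: it should fall steadily under a well-chosen learning rate and oscillate or diverge under a bad one.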

Code & Artifacts