Kubernetes & AI Infrastructure

Kubernetes & AI Infrastructure,
Built from Metal Up.

Deep-dive content on Kubernetes at scale, GPU clusters, distributed training, and the real engineering behind production AI systems. No fluff. No slides-only talks.

500+
Microservices in Production
5x
CNCF Kubestronaut
GPU
Bare-Metal Cluster

What we build here

Engineering education that
ships real infrastructure.

Every piece of content comes from running production systems — not textbooks.

☸️

Kubernetes at Scale

Running 500+ microservices across 30+ teams on bare metal. Cilium CNI, BGP routing, Envoy Gateway, etcd internals, and capacity planning from first principles.

GPU Infrastructure & Distributed Training

Multi-node DDP, NCCL collectives, vLLM deployment, tensor parallelism, and GPU operator internals — from real bare-metal clusters with RTX 4090s and A100s.

🔁

MLOps & AI Platform Engineering

MLflow, Argo Workflows, KServe, Kubeflow Pipelines, and the platform layer that keeps AI teams shipping without blocking on infra.

📡

Observability & GitOps

Prometheus, Grafana, Loki, ArgoCD — how to actually run GitOps at scale across 30+ teams without it becoming a mess.

🧠

Transformer Architecture & Inference

Attention mechanics, GQA/MQA, quantization (AWQ/GPTQ), and inference optimisation from a platform engineer's perspective.

🛠️

Career in AI Infrastructure

How to break into top-tier AI infra roles at companies such as CoreWeave, Nebius, and Lambda Labs: interview prep, portfolio builds, and what actually matters.

Topics

Everything in the stack.

Kubernetes
NVIDIA GPU Operator
vLLM
Distributed Training
PyTorch DDP
NCCL
Cilium CNI
Envoy Gateway
ArgoCD
MLflow
Argo Workflows
Terraform
KServe
Kubeflow Pipelines
QLoRA Finetuning
Axolotl
Prometheus & Grafana
Rook-Ceph
BGP Routing
HPC Networking
Transformer Architecture
AWQ Quantization

Credentials

Built by someone
running production.

Not a course creator who read the docs. A Lead Platform Engineer who ships this daily.

🏆

CNCF Kubestronaut

All five CNCF Kubernetes certifications — CKA, CKAD, CKS, KCNA, KCSA

☁️

AWS Community Builder

Containers category — recognised contributor to the AWS ecosystem

⚙️

500+ Microservice Cluster

Managing bare-metal Kubernetes for 30+ engineering teams in production

🖥️

Live GPU Infrastructure

RTX 4090s, RTX 5090s, and A100s running vLLM and NLP inference in production

🔬

HashiCorp Terraform Associate

Infrastructure as Code practitioner across multi-cloud and bare-metal

🎓

Live ML Inference Endpoint

Running at inference.barilon.com — GB electricity forecasting + LLM chat

The person behind it

Isreal Urephu

Founder & Lead Platform Engineer

Not just content.
Built from production.

I'm a Lead Platform Engineer with a decade of experience running large-scale Kubernetes infrastructure in production. I manage a bare-metal cluster of 500+ microservices across 30+ engineering teams, operate GPU nodes running vLLM and distributed training workloads, and hold all five CNCF certifications behind the Kubestronaut title, plus AWS Community Builder status.

Barilon is where I document what I actually build — no slides-only content, no theory disconnected from production. Every post, video, and breakdown comes directly from real systems I run daily.

CNCF Kubestronaut
AWS Community Builder
Lead Platform Engineer
Kubernetes at Scale

Newsletter

Stay sharp.

Weekly breakdowns on Kubernetes internals, GPU infrastructure, and AI platform engineering. Written by a practitioner, not a content farm.

No spam. Unsubscribe anytime.

Contact

Let's work together.

Whether it's a workshop, a technical consultation, a partnership, or just a conversation about Kubernetes and AI infrastructure — I'm open to it.
