Machine Learning

Skip-gram vs CBOW: Understanding Word2vec Training Examples

A deep dive into how Skip-gram and CBOW generate training examples differently, and why it matters for your dataset size.

Oct 12, 2025

KL Divergence, Cross Entropy

tbd

Oct 12, 2025

Understanding DeepSeek's Multi-Head Latent Attention: One Trillion Dollar Math Trick

A comprehensive mathematical derivation of DeepSeek MLA's weight absorption mechanism, explaining how it compresses the KV cache by 57× while maintaining performance.

Oct 11, 2025

Understanding Normalization in Deep Learning: A Complete Guide

A comprehensive exploration of normalization techniques in neural networks, from basic concepts to advanced comparisons and real-world applications.

Apr 11, 2025