Engineers trace nanoGPT speedrun spikes to 2015 Marathi blog
Engineers have traced performance spikes in nanoGPT training runs to a 2015 Marathi blog post that mixes English text with dense Devanagari script. The post slipped past the standard content filters applied to AI training datasets, raising questions about data quality and its effect on model training.
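To make the mixed-script problem concrete, here is a minimal sketch of how a data pipeline might measure the share of Devanagari versus Latin characters in a document. It is not the filter the engineers used, which the report does not describe. The function names `script_fractions` and `is_mixed_script` and the 0.2 and 0.8 thresholds are hypothetical, chosen only for illustration.

```python
def script_fractions(text: str) -> dict[str, float]:
    """Return the share of counted characters per script bucket.

    Counts every code point in the Unicode Devanagari block (U+0900-U+097F),
    ASCII letters as Latin, and any other alphabetic character as "other".
    Whitespace, ASCII digits, and ASCII punctuation are ignored.
    """
    counts = {"devanagari": 0, "latin": 0, "other": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:
            counts["devanagari"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
        elif ch.isalpha():
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


def is_mixed_script(text: str, low: float = 0.2, high: float = 0.8) -> bool:
    """Flag text whose Devanagari share falls between two thresholds.

    The thresholds are hypothetical; a real pipeline would tune them.
    """
    share = script_fractions(text)["devanagari"]
    return low <= share <= high


if __name__ == "__main__":
    sample = "hello नमस्ते"
    print(script_fractions(sample))
    print(is_mixed_script(sample))
```

A per-document check like this would flag pages that a filter expecting one dominant language might wave through, though whether that explains this particular case is an assumption.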