Sparse Autoencoders (SAEs) and Cross-Layer Transcoders (CLTs) are two approaches to interpreting the internals of transformer models. A look at what each is good for and how they differ; a minimal sketch of the two architectures follows.
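
A minimal sketch (not taken from the article) of how the two objectives differ, assuming PyTorch; the dimension names and the exact CLT wiring are simplified and illustrative:

```python
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """Reconstructs activations at ONE site (e.g. one residual stream),
    with an L1 penalty pushing the hidden features toward sparsity."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = F.relu(self.enc(x))  # sparse feature activations
        return self.dec(f), f    # reconstruction of the SAME activations


class CrossLayerTranscoder(nn.Module):
    """Reads the MLP input at one layer and predicts the MLP OUTPUTS of
    that layer and the layers after it, so a single feature set can
    explain computation that is spread across layers."""

    def __init__(self, d_model: int, d_hidden: int, n_out_layers: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.decs = nn.ModuleList(
            nn.Linear(d_hidden, d_model) for _ in range(n_out_layers)
        )

    def forward(self, x):
        f = F.relu(self.enc(x))
        return [dec(f) for dec in self.decs], f  # one prediction per layer


def sae_loss(recon, target, feats, l1_coef: float = 1e-3):
    # Reconstruction error plus the sparsity penalty on feature activations.
    return F.mse_loss(recon, target) + l1_coef * feats.abs().mean()
```

The key contrast: an SAE is a dictionary for activations at a single point, while a transcoder replaces a computation (input to output), and the cross-layer variant lets one feature write to several downstream layers at once.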
A brief history of LLM Scaling Laws, from compute-optimal training and inference to scaling test-time compute, and whether Scaling Laws are coming to an end.
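
As one concrete anchor for the compute-optimal era, the parametric loss fitted by Hoffmann et al. (2022, "Chinchilla"); the constants $E, A, B, \alpha, \beta$ are fit empirically:

```latex
% N = parameters, D = training tokens, C \approx 6ND = training FLOPs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% Compute-optimal training minimizes loss under a fixed compute budget:
\min_{N, D} \; L(N, D) \quad \text{s.t.} \quad 6ND = C

% which yields power-law optima
N^{*}(C) \propto C^{a}, \qquad D^{*}(C) \propto C^{b}, \qquad a \approx b \approx 0.5
```

The fitted exponents $a \approx b \approx 0.5$ are what motivated scaling parameters and data roughly equally, rather than the parameter-heavy regime suggested by earlier scaling-law work.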
An overview of the motivations and techniques for generating synthetic data for LLM post-training, as seen in the Llama 3.1, AFM, Qwen2, and Hunyuan-Large papers.
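
A hedged sketch of one recipe that recurs across these reports, rejection sampling: draw several candidate responses per prompt, score them with a reward model, and keep only the best. The `generate` and `reward` callables here are hypothetical stand-ins for real model calls:

```python
from typing import Callable


def rejection_sample(
    prompts: list[str],
    generate: Callable[[str], str],       # policy model: prompt -> response
    reward: Callable[[str, str], float],  # reward model: (prompt, response) -> score
    k: int = 8,
    threshold: float = 0.5,
) -> list[dict]:
    """Build a fine-tuning dataset from the best-of-k responses per prompt."""
    dataset = []
    for prompt in prompts:
        # Sample k candidates, then keep the highest-scoring one.
        candidates = [generate(prompt) for _ in range(k)]
        best = max(candidates, key=lambda r: reward(prompt, r))
        # Drop prompts where even the best candidate scores poorly.
        if reward(prompt, best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset
```

The resulting prompt-response pairs feed supervised fine-tuning or preference-pair construction; the papers differ mainly in how candidates are generated, scored, and filtered.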