Data Story: What Microsoft's MAI-Thinking-1 Dataset Actually Contains
Microsoft built a frontier reasoning model from scratch on training data it claims is fully human-authored and appropriately licensed, then published a 109-page report describing how. This breakdown of the MAI-Thinking-1 dataset separates what Microsoft documented from what it left undisclosed, and explains why the gap matters for anyone building on curated training data.

