In engineering and AI metrics, an average can mask shifting sample sizes, uneven task difficulty, and successes that were expensive to obtain.
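A small sketch may make the claim concrete. The numbers below are invented for illustration and are not taken from any real benchmark; the bucket names, the two "weeks", and the helper functions `pooled_rate`, `per_bucket_rates`, and `cost_per_success` are all hypothetical. The point is only that a pooled success rate can rise while every difficulty bucket gets worse, simply because the mix of attempts moved toward easy cases, and that the same period can also cost more per success.

```python
# Hypothetical data, invented for illustration only.
# Each entry maps a difficulty bucket to (successes, attempts, total_cost).
week1 = {"easy": (90, 100, 100.0), "hard": (30, 100, 200.0)}
week2 = {"easy": (162, 190, 570.0), "hard": (2, 10, 50.0)}


def pooled_rate(buckets):
    """Overall success rate, ignoring how attempts split across buckets."""
    successes = sum(s for s, _, _ in buckets.values())
    attempts = sum(n for _, n, _ in buckets.values())
    return successes / attempts


def per_bucket_rates(buckets):
    """Success rate computed separately for each difficulty bucket."""
    return {name: s / n for name, (s, n, _) in buckets.items()}


def cost_per_success(buckets):
    """Total cost divided by total successes across all buckets."""
    total_cost = sum(c for _, _, c in buckets.values())
    successes = sum(s for s, _, _ in buckets.values())
    return total_cost / successes


if __name__ == "__main__":
    for label, data in (("week1", week1), ("week2", week2)):
        print(label)
        print("  pooled rate:     ", round(pooled_rate(data), 3))
        print("  per-bucket rates:", per_bucket_rates(data))
        print("  cost per success:", round(cost_per_success(data), 2))
```

Reporting the per-bucket rates, the attempt counts, and the cost alongside the headline average is one way to keep these effects visible; which breakdowns matter will depend on the metric in question.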