Evals is OpenAI's open-source framework for evaluating large language models and the systems built on top of them. Developers can run evals from the framework's existing registry of benchmarks or write custom evaluations for their own use cases, making it a key tool for anyone building with or on top of OpenAI models.
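As a sketch of how a custom eval might be registered, the fragment below shows a registry-style YAML entry. The eval name `my-eval`, its `id`, and the sample file path are illustrative placeholders; the exact schema is defined by the Evals repository, which ships built-in eval classes such as the exact-match `Match` eval used here.

```yaml
# registry/evals/my-eval.yaml — hypothetical registry entry for a custom eval
my-eval:
  id: my-eval.dev.v0
  description: Checks that the model's completion matches the expected answer
  metrics: [accuracy]

my-eval.dev.v0:
  class: evals.elsuite.basic.match:Match   # built-in exact-match eval class
  args:
    samples_jsonl: my_eval/samples.jsonl   # dataset: one JSON sample per line
```

Once registered, an eval like this could be run against a model with the `oaieval` command-line tool (e.g. `oaieval gpt-3.5-turbo my-eval`), which executes the samples and reports the configured metrics.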