Explore the dataset We'll use the CoQA dataset which contains a diverse set of passages, questions, and answers. Because CoQA is quite large, we'll just look at the first several passages. As with any public dataset, there's a chance that the underlying LLMs have memorized aspects of the dataset, so when developing your own scorers, it's a good idea to test them using your own private data. import