# Big Bench Hard
## Summary
* [Introduction](#introduction)
* [Dataset Structure](#dataset-structure)
* [Reference](#reference)
* [License](#license)
* [Citation](#citation)
## Introduction
Big Bench Hard (BBH) is a subset of the BIG-Bench dataset. It consists of 23 tasks that are particularly hard for the current generation of language models.
## Dataset Structure
### Data Instances
```
{
'input': The text of the example,
'target': The label for that example
}
```
### Data Fields
* input: string
* target: string
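
As a minimal sketch of the record layout described above, the following Python snippet checks that a record carries the two string fields `input` and `target`. The names `BBHExample` and `validate_example` are hypothetical helpers introduced here for illustration; they are not part of the dataset or of any library, and the sample record is made up.

```python
from typing import TypedDict


class BBHExample(TypedDict):
    """One record with the two documented string fields."""
    input: str
    target: str


def validate_example(record: dict) -> BBHExample:
    """Check that a record has exactly the fields `input` and `target`, both strings.

    Hypothetical helper for illustration; not part of any library.
    """
    expected = {"input", "target"}
    if set(record) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(record)}")
    for key in sorted(expected):
        if not isinstance(record[key], str):
            raise TypeError(f"field {key!r} must be a string")
    return BBHExample(input=record["input"], target=record["target"])


# Made-up record for illustration only; not taken from the dataset.
example = validate_example({"input": "Is 2 + 2 equal to 4?", "target": "Yes"})
```

Running a check like this once after loading a subset can catch malformed records before they reach an evaluation loop.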
### Data Splits
Every subset has 250 samples. There are no validation or test splits.
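
Because no splits are provided, anyone who needs a held-out portion has to create it themselves. The sketch below shows one possible approach, a seeded shuffle followed by a cut. The function `split_examples`, the 20% fraction, and the placeholder records are all assumptions made for illustration, not something the dataset prescribes.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy with a fixed seed and return (dev, test) lists.

    Hypothetical helper; the fraction and seed are arbitrary choices.
    """
    if not 0.0 <= dev_fraction <= 1.0:
        raise ValueError("dev_fraction must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[:n_dev], shuffled[n_dev:]


# Placeholder records standing in for one 250-sample subset.
records = [{"input": f"question {i}", "target": "answer"} for i in range(250)]
dev, test = split_examples(records)
print(len(dev), len(test))  # 50 200
```

Fixing the seed keeps the split reproducible across runs, which matters when results from different models are compared on the same held-out portion.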
## Reference
We would like to acknowledge Suzgun, Mirac et al. for creating and maintaining the Big Bench Hard dataset as a valuable resource for the natural language processing and machine learning research community. For more information about the Big Bench Hard dataset and its creators, please visit [the Big Bench Hard repository](https://github.com/suzgunmirac/BIG-Bench-Hard).
## License
The dataset has been released under the MIT License.
## Citation
```
@article{suzgun2022challenging,
title={Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them},
author={Suzgun, Mirac and Scales, Nathan and Sch{\"a}rli, Nathanael and Gehrmann, Sebastian and Tay, Yi and Chung, Hyung Won and Chowdhery, Aakanksha and Le, Quoc V and Chi, Ed H and Zhou, Denny and Wei, Jason},
journal={arXiv preprint arXiv:2210.09261},
year={2022}
}
```