# CodeComplex: Code Complexity Prediction Dataset
## Summary
* [Introduction](#introduction)
* [Dataset Structure](#dataset-structure)
* [Reference](#reference)
* [License](#license)
* [Citation](#citation)
## Introduction
CodeComplex consists of 4,200 Java programs submitted to programming competitions by human programmers, together with time complexity labels annotated by a group of algorithm experts.
## Dataset Structure
```
DatasetDict({
    train: Dataset({
        features: ['src', 'complexity', 'problem', 'from'],
        num_rows: 4517
    })
})
```
### Data Instances
```
{
    'src': 'import java.io.*;\nimport java.math.BigInteger;\nimport java.util.InputMismatchException;...',
    'complexity': 'quadratic',
    'problem': '1179_B. Tolik and His Uncle',
    'from': 'CODEFORCES'
}
```
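As a rough illustration of working with one instance, the sketch below splits the `problem` string into a contest id, a problem index, and a title. The layout `<contest id>_<problem index>. <title>` is an assumption inferred from the single example above, and `parse_problem` is a hypothetical helper, not part of the dataset or any library.

```python
import re

# Example record copied from the instance above (source code elided as in the original).
record = {
    "src": "import java.io.*;\nimport java.math.BigInteger;\nimport java.util.InputMismatchException;...",
    "complexity": "quadratic",
    "problem": "1179_B. Tolik and His Uncle",
    "from": "CODEFORCES",
}

# Assumed layout: "<contest id>_<problem index>. <title>".
PROBLEM_PATTERN = re.compile(
    r"^(?P<contest>\d+)_(?P<index>[A-Za-z]\d*)\.\s*(?P<title>.+)$"
)


def parse_problem(problem: str) -> dict:
    """Split a problem string into contest id, problem index, and title.

    Falls back to returning the whole string as the title when the
    assumed layout does not match.
    """
    match = PROBLEM_PATTERN.match(problem)
    if match is None:
        return {"contest": None, "index": None, "title": problem}
    return match.groupdict()


parsed = parse_problem(record["problem"])
print(parsed)
# {'contest': '1179', 'index': 'B', 'title': 'Tolik and His Uncle'}
```

Because the pattern is only inferred from one row, the fallback branch keeps unexpected strings intact rather than raising an error.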
### Data Fields
* `src`: a string feature, representing the source code in Java.
* `complexity`: a string feature, representing the time complexity class of the program.
* `problem`: a string feature, representing the problem name.
* `from`: a string feature, representing the source of the problem.

The `complexity` field has seven classes, each containing around 500 programs. The seven classes are constant, linear, quadratic, cubic, log(n), nlog(n) and NP-hard.
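For classification, the string labels usually need to be mapped to integer ids. The following is a minimal sketch that derives the mapping from the labels actually present in the data instead of hard-coding them. The toy `records` list is hypothetical; only the spelling `'quadratic'` is confirmed by the example instance, and the other label strings are assumed.

```python
from collections import Counter

# Toy stand-ins for dataset rows. Only "quadratic" is a confirmed label
# string; "linear" and "constant" are assumed spellings.
records = [
    {"complexity": "quadratic"},
    {"complexity": "linear"},
    {"complexity": "constant"},
    {"complexity": "quadratic"},
]

# Count how many programs fall into each class.
counts = Counter(r["complexity"] for r in records)

# Build a deterministic label <-> id mapping from the observed labels.
label2id = {label: i for i, label in enumerate(sorted(counts))}
id2label = {i: label for label, i in label2id.items()}

# Encode every record's label as an integer id.
encoded = [label2id[r["complexity"]] for r in records]

print(counts["quadratic"])  # 2
print(label2id)             # {'constant': 0, 'linear': 1, 'quadratic': 2}
print(encoded)              # [2, 1, 0, 2]
```

Sorting the labels keeps the mapping stable across runs, and inspecting `counts` on the real data is a quick way to check the class balance.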
### Data Splits
The dataset only contains a train split.
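Since no validation or test split is provided, users may want to hold out part of the data themselves. The sketch below shows one possible approach, a per-class (stratified) split in plain Python. The function `stratified_split`, its parameters, and the toy rows are all hypothetical and are not part of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="complexity", test_fraction=0.2, seed=0):
    """Split records into train/test lists, preserving per-class proportions."""
    # Group records by their class label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the input is not mutated
        rng.shuffle(group)
        n_test = int(round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows: 10 per class for two classes (label strings are illustrative).
toy = [{"src": f"// program {i}", "complexity": label}
       for label in ("linear", "quadratic") for i in range(10)]

train, test = stratified_split(toy)
print(len(train), len(test))  # 16 4
```

Several solutions may belong to the same problem, so splitting by the `problem` field instead may be preferable when leakage between train and test is a concern.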
### Dataset Creation
The authors first collected problems and their Java solutions from CodeForces. Experienced human annotators then inspected each solution and labeled it with its time complexity. After labeling, a different group of programming experts verified the class assigned to each sample.
## Reference
We would like to acknowledge Mingi Jeon, Seung-Yeop Baik, et al. for creating and maintaining the CodeComplex dataset as a valuable resource for the machine learning research community. For more information about the CodeComplex dataset and its creators, please visit [the CodeComplex repository](https://github.com/yonsei-toc/CodeComple).
## License
The dataset has been released under the Apache 2.0 License.
All non-code materials provided are made available under the terms of the CC BY 4.0 license (Creative Commons Attribution 4.0 International license).
## Citation
```
@article{JeonBHHK22,
  author = {Mingi Jeon and Seung-Yeop Baik and Joonghyuk Hahn and Yo-Sub Han and Sang-Ki Ko},
  title = {{Deep Learning-based Code Complexity Prediction}},
  year = {2022},
}
```