# Structure your codebase using conventions

Below is a simple, generic example of a folder tree for a software project, followed by references to project templates. The exercise is to adopt a structure that fits your project's needs: after looking at the example below and the templates we reference, decide which project template to use.

## About project structure and its utility

A well-organized project structure is crucial for project scalability and developer productivity. This guide divides the project into distinct directories, each serving a specific purpose. Here's a brief overview of a conventional project structure:

```
ProjectName/
├── src/
│   ├── main.py              # Main application script
│   └── module/
│       ├── __init__.py      # Makes Python treat directories as containing packages
│       └── helper.py        # Supplementary functions
├── lib/
│   └── README.md            # Information about included libraries
├── tests/
│   ├── unit/
│   │   └── test_helper.py   # Unit tests for helper module
│   └── integration/
│       └── test_flow.py     # Integration tests
├── docs/
│   ├── README.md            # Project overview documentation
│   └── setup.md             # Setup instructions
├── bin/
│   └── run.sh               # Executable script for starting the application
├── data/
│   └── dataset.csv          # Example dataset (consider .gitignore for large/sensitive data)
├── notebooks/
│   └── analysis.ipynb       # Jupyter notebook for data analysis
├── config/
│   ├── app_config.yaml      # Application configuration file
│   └── .env.example         # Environment variables template
├── build/
│   └── app.exe              # Compiled application (for distribution)
└── .gitignore               # Specifies intentionally untracked files to ignore
```

This folder tree provides a high-level overview of how the different parts of your project can be organized:

- **`/src`** contains your project's source code, including the main application script and any additional modules.
- **`/lib`** is reserved for any third-party libraries that are not managed by a package manager.
- **`/tests`** is organized into subdirectories for different types of tests (unit, integration).
- **`/docs`** includes your project documentation, such as the README and setup instructions.
- **`/bin`** contains any scripts or executables needed to run your application.
- **`/data`** is used for storing datasets or other data files used by your project.
- **`/notebooks`** is for Jupyter notebooks, particularly useful for data analysis and exploration.
- **`/config`** holds configuration files and templates for environment variables.
- **`/build`** includes compiled or built versions of your application for distribution.

Remember, this structure is a guideline and can be adapted based on the specific needs and practices of your project.

Project templates provide a ready-made structure for common kinds of projects. The sections below list example templates for different cases. They are particularly useful with tools like Cookiecutter, a command-line utility that creates projects from templates.

## Instructions

1. Look at the example templates and decide which one best suits your needs.
2. If none of these templates suits your needs:
   1. Let us know.
   2. Search online and/or ask around.
   3. If you find something interesting, let us know as well.
   4. If none of this works, follow the guidelines explained above.

### Templates for Python Packages

- **Cookiecutter PyPackage:** A comprehensive template for Python projects, facilitating the creation of Python packages with best practices in testing, documentation, and package structure. Ideal for developers looking to distribute their Python libraries.
  - GitHub: [cookiecutter-pypackage](https://github.com/audreyfeldroy/cookiecutter-pypackage)
  - GitHub: [Netherlands eScience Center template](https://github.com/NLeSC/python-template)

### Templates for Research Software (Equivalent to Data Science)

- **Cookiecutter Data Science:** Tailored for data science projects, this template organizes data, models, analyses, and notebooks, ensuring that data science projects are reproducible and well-documented from the start.
  - GitHub: [cookiecutter-data-science](https://github.com/drivendata/cookiecutter-data-science)

### Templates for Machine Learning Focused Projects

- **Cookiecutter Machine Learning:** Designed specifically for machine learning projects, this template includes directories for datasets, models, notebooks, and scripts, supporting ML project best practices and facilitating experimentation and collaboration.
  - DagsHub: [Cookiecutter-MLOps](https://dagshub.com/DagsHub/Cookiecutter-MLOps)
  - GitHub: [cookiecutter-mlops](https://github.com/Chim-SO/cookiecutter-mlops)

### How to Use These Templates

To use these templates, first ensure you have Cookiecutter installed. If not, you can install it via pip:

```
pip install cookiecutter
```

Then, create a new project by running:

```
cookiecutter [template-url]
```

Replace `[template-url]` with the URL of the repository that hosts the template you wish to use.

### Contribute if you find useful templates for other languages

We acknowledge the diversity of programming languages and project needs in the research community. While we currently focus on Python and its ecosystem because of its wide adoption in data science and machine learning, we are open to expanding our template repository. If you have or know of project templates for MATLAB or other programming languages that align with research software development best practices, please share them with our community. Your contributions can help broaden the support for various research software development needs.

By sharing and utilizing these templates, we can collectively enhance the efficiency, quality, and reproducibility of research software development projects.
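
If none of the templates fits and you fall back on the conventional layout described earlier, the folder tree can be created with a few lines of Python instead of by hand. The following is a minimal sketch, not part of any template: the function name `scaffold` and the `LAYOUT` and `EMPTY_DIRS` lists are hypothetical names chosen for illustration, and the paths simply mirror the example tree in this guide. It creates empty placeholder files only.

```python
from pathlib import Path

# Placeholder files copied from the example tree in this guide.
# Parent directories are created on demand.
LAYOUT = [
    "src/main.py",
    "src/module/__init__.py",
    "src/module/helper.py",
    "lib/README.md",
    "tests/unit/test_helper.py",
    "tests/integration/test_flow.py",
    "docs/README.md",
    "docs/setup.md",
    "bin/run.sh",
    "config/app_config.yaml",
    "config/.env.example",
    ".gitignore",
]

# Directories whose contents are supplied or generated later,
# so they are created empty (an assumption of this sketch).
EMPTY_DIRS = ["data", "notebooks", "build"]


def scaffold(project_name: str, parent: Path = Path(".")) -> Path:
    """Create the example layout under parent/project_name and return its root.

    Hypothetical helper for illustration; existing files are left untouched.
    """
    root = Path(parent) / project_name
    for rel in LAYOUT:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)  # creates an empty file if it is missing
    for rel in EMPTY_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `scaffold("ProjectName")` from the directory where the project should live creates the tree and returns its root path. Running it again is harmless, because existing files and directories are kept as they are. Note that Git does not track empty directories, so `data/`, `notebooks/`, and `build/` will only appear in the repository once they contain a file.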