High Performance Data Analytics in Python - Event Page
Jan. 21-23, 2025, 9:00-12:00 (CET)
Welcome to the online workshop on High Performance Data Analytics in Python on Jan. 21-23, 2025. Python is a modern, object-oriented, industry-standard programming language for working with data at all levels of the data analytics pipeline. A rich ecosystem of libraries for data analysis and scientific computing has grown up around Python, ranging from generic numerical libraries to special-purpose and domain-specific packages.
This online workshop, spread over three half-days, gives an overview of working with research data in Python using general-purpose libraries for storing, processing, analyzing and sharing data, with a focus on improving performance. After covering tools for performant processing on a single workstation (NetCDF, NumPy, pandas, SciPy), the focus shifts to parallel, distributed and GPU computing (Snakemake, Numba, Dask, multiprocessing, mpi4py).
This material is for all researchers and engineers who work with large or small datasets and who want to learn powerful tools and best practices for writing more performant, parallelised, robust and reproducible data analysis pipelines. This workshop is an interactive online event, featuring live coding, demos, and practical exercises. We aim to equip you with the tools and knowledge to write efficient, high-performance code using Python.
After attending the workshop, you should:
Day 1 (Jan. 21)
Time | Contents | Instructor(s) |
---|---|---|
09:00-09:15 | Welcome | Yonglei |
09:15-09:30 | Motivation | Yonglei |
09:30-10:20 | Scientific data | Francesco |
10:20-10:40 | Break | |
10:40-11:55 | Efficient array computing | Francesco |
11:55-12:00 | Q/A & Reflections | |
Day 2 (Jan. 22)
Time | Contents | Instructor(s) |
---|---|---|
09:05-10:20 | Parallel computing | Qiang |
10:20-10:40 | Break | |
10:40-11:55 | Profiling and optimizing | Ashwin |
11:55-12:00 | Q/A & Reflections | |
Parallel: https://aaltoscicomp.github.io/python-for-scicomp/parallel/
Profiling: https://aaltoscicomp.github.io/python-for-scicomp/profiling/
Day 3 (Jan. 23)
Time | Contents | Instructor(s) |
---|---|---|
09:05-10:15 | Performance boosting | Yonglei |
10:15-10:30 | Break | |
10:30-11:55 | Dask for scalable analytics | Ashwin |
11:55-12:00 | Q/A & Summary | Yonglei |
Due to EuroCC2 regulations, we CANNOT ACCEPT generic or private email addresses. Please register with your official university or company email address.
This training is for users who live and work in the European Union or in a country associated with Horizon 2020. You can read more about the countries associated with Horizon 2020 HERE.
For questions regarding this workshop or general questions about ENCCS training events, please contact training@enccs.se.