![](https://media.enccs.se/2024/11/python-hpda-25.webp)

<p style="text-align: center"><b><font size=5 color=blueyellow>High-Performance Data Analytics with Python - Day 3</font></b></p>

:::success
**High-Performance Data Analytics with Python — Schedule**: https://hackmd.io/@yonglei/python-hpda-2025-schedule
:::

## Schedule

| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 09:05-10:15 | Performance boosting | YW |
| 10:15-10:30 | Break | |
| 10:30-11:55 | Dask for scalable analytics | AM |
| 11:55-12:00 | Q/A & Summary | YW |

---

## Useful Links

:::warning
- [Lesson material](https://enccs.github.io/hpda-python/)
- [Setup programming environment on personal computer](https://enccs.github.io/hpda-python/setup/#local-installation)
- [Access to programming environment on LUMI](https://enccs.github.io/hpda-python/setup/#using-pyhpda-programming-environment)
:::

---

:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and similar issues.
:::

## Questions, answers and information

### 6. Performance Boosting

:::info
Timings reported by participants after each optimization step:

| Participant | Step 0 | Step 1 | Step 2 | Step 3 | Step 4 |
| :-: | :----: | :----: | :----: | :----: | :----: |
| YL | 200 | 139 | 65.6 | 58 | 14.3 |
| MC | 132 | 106 | 53.2 | 27.2 | 0.5 |
| KS | 177 | 133 | 41.5 | 38.3 | 0.5 |
| MK | 305 | 177 | 71.1 | 59 | 0.812 |
| Em | 88 | 79 | 35 | 28.7 | 0.356 |
| IV | 532 | 320 | 144 | 129 | 40.6 |
| TW | 175 | 159 | 41.6 | 41.5 | 0.6 |
| OS | 271 | 229 | 78.6 | 63.2 | 1.2 |
| OR | 262 | 229 | 74.3 | 62.6 | 0.87 |
:::

- Do you know if `numpy.typing` annotations are compatible with Cython? I.e., do they also result in a speedup?
    - Yes and no :smile: Cython 3.0 supports standard Python annotations, but you will have to use something like
      ```py
      def sum3d(arr: cython.int[:, :, :]) -> cython.int:
      ```
      instead of `numpy.typing` annotations. See here for an example: https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html#memoryviews
- How persistent is the Numba cache? Will it be cleared upon, *e.g.*, closing my project? Or is it only persistent at runtime, or something else?
    - The compiled code is stored only in memory, which is why it is called Just-In-Time and not Ahead-of-Time. So with Numba, the first execution will always be nearly as slow as pure Python code.
    - Found this [interesting Numba link](https://numba.pydata.org/numba-doc/dev/developer/caching.html). Do you have any experience with this? Looks like a new feature.
        - Interesting, TIL! Thanks for sharing.
- The last command `%timeit apply_integrate_f_numba_dtype(df['a'].to_numpy(), df['b'].to_numpy(), df['N'].to_numpy())` gives me a TypeError: `No matching definition for argument type(s) array(float64, 1d, C), array(float64, 1d, C), array(int32, 1d, C)`. How can I solve this? Do I need to install something in VS Code?
    - It is not related to installation or VS Code. Can you check `df['a'].to_numpy().shape`, `df['b'].to_numpy().shape`, and so on, and also `df['a'].to_numpy().dtype` etc.? Once you have that, you need to either modify the inputs or modify the types in the `@numba.jit` decorator for that function (see the sketch after this list).
    - It seems that in your code the integer array is `int32`, while the declared signature expects `int64`.
- Can you please show how to load Cython in VSCode?
    - To run Cython code in VSCode you need the Jupyter VSCode extension. Once you have that, along with a conda environment / virtual environment containing Cython chosen as the interpreter, Cython should work. Does that answer your question?
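The dtype mismatch above can be reproduced and resolved with a small example. This is a minimal sketch, not the lesson's `apply_integrate_f_numba_dtype`: the function name `weighted_sum` and the explicit `int64` signature are assumptions chosen to illustrate why Numba raises "No matching definition" and two ways around it.

```py
import numpy as np
import numba

# Hypothetical function compiled eagerly with an explicit signature
# that expects a float64 array and an int64 array.
@numba.njit("float64(float64[:], int64[:])")
def weighted_sum(a, n):
    total = 0.0
    for i in range(len(a)):
        total += a[i] * n[i]
    return total

a = np.random.rand(10)              # float64
n = np.arange(10, dtype=np.int32)   # int32 -> "No matching definition" if passed as-is

# Option 1: cast the input so it matches the declared signature.
result = weighted_sum(a, n.astype(np.int64))

# Option 2: omit the signature and let Numba compile lazily for
# whatever dtypes it first sees (here the int32 array works directly).
@numba.njit
def weighted_sum_lazy(a, n):
    total = 0.0
    for i in range(len(a)):
        total += a[i] * n[i]
    return total

result2 = weighted_sum_lazy(a, n)
```

Either approach works; casting keeps the eager, explicitly typed compilation, while the lazy variant trades that for flexibility across input dtypes.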
:::info
### Break until XX:30
:::

### 7. Dask for Scalable Analytics

- I am getting a "Nanny failed to start" error when I try to run a Dask cluster/client.
- Both `dask.visualize(sum_da)` and `sum_da.visualize()` produce the same tree-like task-graph figure, shown below (see the sketch at the end of this page).
  ![20250123105850](https://hackmd.io/_uploads/BJQlm9y_yl.png)
    - I can see the `4 chunks in a graph layer`.
      ![20250123105624](https://hackmd.io/_uploads/r1ypQ5ku1g.png)

#### Exercise set 1 (until xx:30)

https://enccs.github.io/hpda-python/dask#exercise-set-1

- How do I open Jupyter on LUMI with the correct environment? If I open it through the LUMI login webpage, it doesn't connect to the *pyhpda* conda environment, and therefore doesn't have Dask installed.
    - Follow the instructions [here](https://enccs.github.io/hpda-python/setup/#using-pyhpda-programming-environment), under **Login to LUMI cluster via web-interface**, to set up the environment on LUMI.
    - ==Below are the details you should fill in when you launch a Jupyter notebook on LUMI==
      ```
      Project: project_465001310
      Partition: interactive
      Number of CPU cores: 2
      Time: 4:00:00
      Working directory: /projappl/project_465001310
      Python: Custom
      Path to python: /project/project_465001310/miniconda3/envs/pyhpda/bin/python
      Check "Enable system installed packages on venv creation"
      Check "Enable packages under ~/.local/lib on venv start"

      Click the Launch button and wait a few minutes until your requested session is created.
      Click the Connect to Jupyter button, and then select the Python kernel "Python 3 (venv)" for the created Jupyter notebooks.
      ```

#### Exercise set 2 (until xx:55)

https://enccs.github.io/hpda-python/dask/#exercise-set-2

---

:::info
*Always ask questions at the very bottom of this document, right **above** this.*
:::

---
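For reference, here is a minimal sketch of the kind of Dask array computation whose task graph is discussed in section 7. The array size and chunking are assumptions (chosen so that the graph shows 4 chunks, as in the screenshot), not necessarily the lesson's exact `sum_da`; rendering the graph requires Graphviz to be installed.

```py
import dask
import dask.array as da

# Hypothetical chunked array: 20 000 random numbers in 4 chunks of 5 000.
x = da.random.random(20_000, chunks=5_000)

# Lazy reduction: nothing is computed yet, Dask only records the task graph.
sum_da = x.sum()

# Both calls render the same tree-like task graph (needs graphviz).
sum_da.visualize(filename="sum_da_graph.png")
dask.visualize(sum_da, filename="sum_da_graph.png")

# Trigger the actual computation.
result = sum_da.compute()
print(result)
```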