
<p style="text-align: center"><b><font size=5 color=blueyellow>High-Performance Data Analytics with Python - Day 3</font></b></p>
:::success
**High-Performance Data Analytics with Python — Schedule**: https://hackmd.io/@yonglei/python-hpda-2025-schedule
:::
## Schedule
| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 09:05-10:15 | Performance boosting | YW |
| 10:15-10:30 | Break | |
| 10:30-11:55 | Dask for scalable analytics | AM |
| 11:55-12:00 | Q/A & Summary | YW |
---
## Useful Links
:::warning
- [Lesson material](https://enccs.github.io/hpda-python/)
- [Setup programming environment on personal computer](https://enccs.github.io/hpda-python/setup/#local-installation)
- [Access to programming environment on LUMI](https://enccs.github.io/hpda-python/setup/#using-pyhpda-programming-environment)
:::
---
:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
:::
## Questions, answers and information
### 6. Performance Boosting
:::info
| | Step 0 | Step 1 | Step 2 | Step 3 | Step 4 |
| :-: | :----: | :----: | :----: | :----: | :----: |
| YL | 200 | 139 | 65.6 | 58 | 14.3 |
| MC | 132 | 106 | 53.2 | 27.2 | 0.5 |
| KS | 177 | 133 | 41.5 | 38.3 | 0.5 |
| MK | 305 | 177 | 71.1 | 59 | 0.812 |
| Em | 88 | 79 | 35 | 28.7 | 0.356 |
| IV | 532 | 320 | 144 | 129 | 40.6 |
| TW | 175 | 159 | 41.6 | 41.5 | 0.6 |
| OS | 271 | 229 | 78.6 | 63.2 | 1.2 |
| OR | 262 | 229 | 74.3 | 62.6 | 0.87 |
:::
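To compare machines with very different absolute timings, the table above can be turned into relative speedups. The snippet below is only a sketch: it copies the Step 0 and Step 4 columns from the table and assumes that each participant reported both numbers in the same unit (the unit itself is not recorded in these notes).

```python
# Step 0 and Step 4 timings copied from the table above.
# Assumption: within each row, both numbers use the same (unrecorded) unit.
timings = {
    "YL": (200, 14.3),
    "MC": (132, 0.5),
    "KS": (177, 0.5),
    "MK": (305, 0.812),
    "Em": (88, 0.356),
    "IV": (532, 40.6),
    "TW": (175, 0.6),
    "OS": (271, 1.2),
    "OR": (262, 0.87),
}

# Speedup = time before optimisation / time after optimisation.
speedups = {who: step0 / step4 for who, (step0, step4) in timings.items()}

for who, factor in speedups.items():
    print(f"{who}: Step 4 is about {factor:.0f}x faster than Step 0")
```

Because the result is a ratio, it does not depend on the unit, only on the unit being consistent within a row.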
- Do you know if numpy.typing annotations are compatible with Cython? I.e., they also result in a speedup?
- Yes, and no :smile: Cython 3.0 supports standard Python annotations, but you will have to use something like:
```py
def sum3d(arr: cython.int[:, :, :]) -> cython.int:
```
- instead of `numpy.typing` annotations. See here for an example: https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html#memoryviews
- How persistent is the Numba cache? Will it be cleared upon, *e.g.*, closing my project? Or is it only persistent at runtime, or something else?
- By default, the compiled code is stored only in memory, which is why it is called Just-In-Time and not Ahead-of-Time. So with Numba, the first execution in a session will always be nearly as slow as pure Python code.
- Found this [interesting Numba link](https://numba.pydata.org/numba-doc/dev/developer/caching.html). Do you have any experience with this? Looks like a new feature.
- Interesting. TIL! Thanks for sharing.
- The last command `%timeit apply_integrate_f_numba_dtype(df['a'].to_numpy(), df['b'].to_numpy(), df['N'].to_numpy())` gives me a TypeError: `No matching definition for argument type(s) array(float64, 1d, C), array(float64, 1d, C), array(int32, 1d, C)`. How can I solve this? Do I need to install something in my VS Code?
- It is not related to installation or VS Code. Can you check `df['a'].to_numpy().shape`, `df['b'].to_numpy().shape`, and so on, as well as `df['a'].to_numpy().dtype`, etc.? Once you have that, you need to either modify the inputs or modify the types in the `@numba.jit` decorator for that function.
- It seems that when you run the code, the integer array you pass is `int32`, while the one expected by the function is `int64`.
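The check suggested above, and the "modify the inputs" fix, can be sketched with plain NumPy. The arrays here are hypothetical stand-ins for `df['a'].to_numpy()`, `df['b'].to_numpy()` and `df['N'].to_numpy()`, and the sketch assumes the compiled function was declared with a 64-bit integer array for its third argument.

```python
import numpy as np

# Hypothetical stand-ins for the three DataFrame columns.
a = np.linspace(0.0, 1.0, 5)
b = a + 1.0
N = np.arange(5, dtype=np.int32)  # forced to int32 to reproduce the mismatch

# Inspect what is actually being passed to the compiled function.
for name, arr in (("a", a), ("b", b), ("N", N)):
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")

# Fix on the input side: cast to the integer width the signature expects.
N64 = N.astype(np.int64)
print(f"N64: shape={N64.shape}, dtype={N64.dtype}")
```

The other route is to change the declared types in the `@numba.jit` decorator so that they match the data, or to leave the explicit signature out and let Numba infer the types on the first call.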
- Can you please show how to load Cython in VSCode?
- To run Cython code in VSCode you need the Jupyter VSCode extension. Once you have that, along with a conda or virtual environment containing Cython selected as the interpreter, Cython should work. Does that answer your question?
:::info
### Break until XX:30
:::
### 7. Dask for Scalable Analytics
- I am getting a "Nanny failed to start" error when I try to run a Dask cluster/client.
- The output is a tree-like figure. Both `dask.visualize(sum_da)` and `sum_da.visualize()` produce the same figure.

- I can see the `4 chunks in a graph layer`
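
For anyone who wants to reproduce the chunk count without rendering the figure, here is a minimal sketch. The array is a hypothetical stand-in, not necessarily the one used in the lesson: a 4x4 array of ones split into four 2x2 chunks.

```python
import dask.array as da

# Hypothetical stand-in array: 4 x 4 ones, split into 2 x 2 blocks.
x = da.ones((4, 4), chunks=(2, 2))

# Lazy reduction: this only builds the task graph, nothing is computed yet.
sum_da = x.sum()

print(x.numblocks)    # number of chunks along each axis
print(x.npartitions)  # total number of chunks

# Trigger the actual computation.
result = sum_da.compute()
print(result)

# Drawing the graph needs the optional graphviz dependency:
# sum_da.visualize()
```

Each chunk becomes a leaf of the tree, and the partial sums are combined on the way up to a single result.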

#### Exercise set 1 (until xx:30)
https://enccs.github.io/hpda-python/dask#exercise-set-1
- How to open Jupyter on LUMI with the correct environment? If I open it through the LUMI login webpage, it doesn't connect to *pyhpda* conda environment, and therefore doesn't have Dask installed.
- Follow the instructions [here](https://enccs.github.io/hpda-python/setup/#using-pyhpda-programming-environment), in particular the **Login to LUMI cluster via web-interface** part, to set up the environment on LUMI.
- ==Below are the details you should fill in when you launch a Jupyter notebook on LUMI:==
```
Project: project_465001310
Partition: interactive
Number of CPU cores: 2
Time: 4:00:00
Working directory: /projappl/project_465001310
Python: Custom
Path to python: /project/project_465001310/miniconda3/envs/pyhpda/bin/python
Check: Enable system installed packages on venv creation
Check: Enable packages under ~/.local/lib on venv start
Click the Launch button and wait a few minutes until your requested session has been created.
Click the Connect to Jupyter button, and then select the Python kernel "Python 3 (venv)" for the created Jupyter notebooks.
```
#### Exercise set 2 (until xx:55)
https://enccs.github.io/hpda-python/dask/#exercise-set-2
---
:::info
*Always ask questions at the very bottom of this document, right **above** this.*
:::
---