High-Performance Data Analytics with Python - Day 3
High-Performance Data Analytics with Python — Schedule: https://hackmd.io/@yonglei/python-hpda-2025-schedule
Time | Contents | Instructor(s) |
---|---|---|
09:05-10:15 | Performance boosting | YW |
10:15-10:30 | Break | |
10:30-11:55 | Dask for scalable analytics | AM |
11:55-12:00 | Q/A & Summary | YW |
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
Name | Step 0 | Step 1 | Step 2 | Step 3 | Step 4 |
---|---|---|---|---|---|
YL | 200 | 139 | 65.6 | 58 | 14.3 |
MC | 132 | 106 | 53.2 | 27.2 | 0.5 |
KS | 177 | 133 | 41.5 | 38.3 | 0.5 |
MK | 305 | 177 | 71.1 | 59 | 0.812 |
Em | 88 | 79 | 35 | 28.7 | 0.356 |
IV | 532 | 320 | 144 | 129 | 40.6 |
TW | 175 | 159 | 41.6 | 41.5 | 0.6 |
OS | 271 | 229 | 78.6 | 63.2 | 1.2 |
OR | 262 | 229 | 74.3 | 62.6 | 0.87 |
Do you know if numpy.typing annotations are compatible with Cython? I.e., they also result in a speedup?
numpy.typing annotations. See here for an example: https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html#memoryviews
How persistent is the Numba cache? Will it be cleared when, e.g., I close my project? Or does it persist only for the runtime, or something else?
The last command, %timeit apply_integrate_f_numba_dtype(df['a'].to_numpy(), df['b'].to_numpy(), df['N'].to_numpy()), gives me a TypeError: "No matching definition for argument type(s) array(float64, 1d, C), array(float64, 1d, C), array(int32, 1d, C)".
How can I solve this? Do I need to install something in VS Code?
Check df['a'].to_numpy().shape and df['b'].to_numpy().shape and so on. Also df['a'].to_numpy().dtype etc. Once you have that, you need to either modify the inputs or modify the types in the @numba.jit decorator for that function.
int32 but another one is int64
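A minimal sketch of the check described above, assuming the jitted function was declared with an int64 signature for its third argument (the TypeError suggests this, but the actual decorator is not shown here). The DataFrame below is a hypothetical stand-in for the workshop data, with column N deliberately stored as int32 to reproduce the mismatch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the workshop DataFrame; column N is deliberately
# int32 to reproduce the mismatch reported in the TypeError.
df = pd.DataFrame({
    "a": np.random.randn(1000),
    "b": np.random.randn(1000),
    "N": np.random.randint(100, 1000, size=1000).astype(np.int32),
})

# Step 1: inspect shape and dtype of every array passed to the jitted function.
for name in ("a", "b", "N"):
    arr = df[name].to_numpy()
    print(name, arr.shape, arr.dtype)

# Step 2 (option A): cast the input so that it matches an int64 signature.
col_N = df["N"].to_numpy().astype(np.int64)
print(col_N.dtype)

# Option B (not run here): keep the input as it is and declare int32 for that
# argument in the explicit signature passed to @numba.jit instead.
```

With option A, the cast array col_N would be passed in place of df['N'].to_numpy(). Nothing needs to be installed in VS Code for this; the error comes from the argument types, not from the editor.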
Can you please show how to load Cython in VSCode?
I am getting a "Nanny failed to start" error when I try to run a Dask cluster/client.
A tree-like figure from the output. Both dask.visualize(sum_da) and sum_da.visualize() will produce the same figure shown below.
I can see the 4 chunks in a graph layer.
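To illustrate where the four chunks come from, here is a small sketch. The array x and its chunking are assumptions chosen for illustration (the exercise may define sum_da differently); only the two visualize calls are taken from the discussion above.

```python
import dask
import dask.array as da

# Hypothetical array: 4 x 4 ones split into four 2 x 2 chunks.
x = da.ones((4, 4), chunks=(2, 2))
sum_da = x.sum()  # lazy: builds a task graph, computes nothing yet

print(x.numblocks)    # number of blocks along each axis
print(x.npartitions)  # total number of chunks

# Either call renders the same task graph. Both need the optional graphviz
# dependency, so they are left commented out here.
# dask.visualize(sum_da)
# sum_da.visualize()

total = sum_da.compute()  # triggers the actual computation
print(total)
```

Each chunk appears as its own branch at the bottom of the tree, and the partial sums are combined towards the top.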
https://enccs.github.io/hpda-python/dask#exercise-set-1
https://enccs.github.io/hpda-python/dask/#exercise-set-2
Always ask questions at the very bottom of this document, right above this.