https://hackmd.io/@CAM-Gerlach/Scipy-2024-IDE-BoF
# What would you like to see in your scientific IDE?
- Context: I support a small stats group; the biggest challenge is helping students learn scientific computing. The group has a rugged-individualist culture and needs to learn how to share knowledge with others. How can we do that with IDEs?
- Where I work, the platforms for accessing data require a lot of authentication, and the default IDE/platform given to us is JupyterHub.
- Very little flexibility in the choice of an IDE; when using JupyterHub, JupyterLab must be used as the IDE
- Could a web-based IDE be used as the platform, with awareness of the JupyterHub environment?
- Thoughts about using a web-based application vs. a desktop application, even if the desktop app uses a webview as its UI?
- A web browser can help me be effective for collaboration, but may not be as useful for single-user work as a native app would be
- Concerns with usability of Zoom screen sharing
- vscode.dev, which runs in a browser and which can be self-hosted — Python support may not be "fully on par". Run inside container — impact on security?
- WASM — run apps in the browser in a deployable container?
- Xarray/pandas extension systems and accessors are opaque to PyCharm-style autocomplete without further tagging or augmentation of the IDE or the code. IDEs should be better aware of extensions (see the accessor sketch below).
- LSP servers can help make editors better aware of extensible syntaxes
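As a concrete illustration of the accessor point above, here is a minimal sketch of a pandas dataframe accessor registered at runtime; the accessor name `geo` and its contents are made up. Because the attribute is attached dynamically, most static autocompletion cannot see `df.geo` without extra type stubs or an IDE plugin (xarray's `register_dataset_accessor` raises the same issue).

```python
# Minimal sketch of a runtime-registered pandas accessor; the accessor name
# "geo" and its contents are hypothetical, for illustration only.
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, df):
        self._df = df

    @property
    def center(self):
        """Mean latitude/longitude of the frame."""
        return self._df["lat"].mean(), self._df["lon"].mean()


df = pd.DataFrame({"lat": [44.0, 45.0], "lon": [-122.0, -123.0]})
print(df.geo.center)  # works at runtime, but `df.geo.` is invisible to most
                      # static autocomplete because it is attached dynamically
```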
Using dataframes and other rich data structures instead of just CSV — can IDEs add a "database browser" instead of just a browser for one table?
- Data lives in databases, not in file systems — use this as the norm
- Support any database type that pandas supports, with the simplest interface to it, so it feels like a CSV file
- [from Slack] Ibis covers a lot of territory for exploring tabular data in many different data storage backends. Potentially that might be a path to database exploration in IDEs
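A minimal sketch of the "as simple as a CSV file" idea above, using pandas and SQLAlchemy against a local SQLite file; the file and table names are hypothetical. An IDE database browser could do roughly this behind the scenes (or go through Ibis to reach more backends).

```python
# Hypothetical example: read a database table into a DataFrame almost as if it
# were a CSV. The file name "experiments.db" and table "measurements" are made up.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///experiments.db")

# Roughly the database analogue of pd.read_csv("measurements.csv")
df = pd.read_sql("SELECT * FROM measurements", engine)
print(df.head())
```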
Debugging — very slow stepping (30s per step) when a very large variable needs to be refetched at every step.
- No standard facility for "lazy data loading" in Python; everything needs its own wrapper
- Idea: a streaming-wrapper protocol that types could implement to provide lazy loading, usable across all IDEs and notebooks.
- How to balance "latent computation" need with this desire?
- Need some way to indicate to consumers that generating a repr of the data is expensive (see the sketch below)
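A rough sketch of that "expensive repr" signal; the `__expensive_repr__` marker is purely hypothetical (no such protocol exists today) and only illustrates how a debugger or variable explorer might check for a cheap summary before fetching data.

```python
# Hypothetical convention (not an existing protocol): objects flag that building
# their repr is costly, so debuggers/variable explorers can show a cheap summary
# instead of refetching the full data at every step.
class RemoteTable:
    __expensive_repr__ = True  # hypothetical marker attribute

    def __init__(self, uri, n_rows):
        self.uri = uri
        self.n_rows = n_rows

    def summary(self):
        # Cheap, metadata-only preview; no data transfer.
        return f"<RemoteTable {self.uri!r}, ~{self.n_rows} rows (not fetched)>"

    def __repr__(self):
        # Imagine this triggering a slow fetch of the full table.
        return self.summary()


def preview(obj):
    """What a variable explorer might do before rendering a value."""
    if getattr(obj, "__expensive_repr__", False) and hasattr(obj, "summary"):
        return obj.summary()
    return repr(obj)


print(preview(RemoteTable("s3://bucket/table.parquet", 10_000_000)))
```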
Spyder feedback:
- Need multi-line editing
- Really like plots in separate windows
"We need RStudio for Python"
Generative AI in coding
- Scientific coding is different; the Copilots of the world may not help with the scientific aspects of coding and may break code "a lot" when used
- Spyder is working with GitHub on a potential Copilot integration
- GenAI interfaces
- Any that work well, for scientific programming or general programming? (silence)
- A large scientific organization was very reticent to use Jupyter AI because the AI was effectively a black box, and they needed to be able to show where their code comes from. How much would people be interested in an AI that generates citations?
- I'd still need to go and verify the citation, so not sure how much that would save me
- Most important thing is tests that verify that the code does what it is supposed to do
- As to that comment, AIs tend to do much better iterating on existing code than writing brand-new code. That is one thing I'd recommend people try: if you have code, ask the AI to write tests for it (see the sketch after this list)
- What I've found is that AIs tend to do much better on code that has proper variable names and literate structure
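As a minimal illustration of the testing point above, here is the kind of test one might ask an AI to draft and then verify by hand; the `moving_average` function and its expected values are made up for illustration.

```python
# Hypothetical example of tests one might ask an AI to draft, then review and
# run yourself. The function under test and the expected values are made up.
import numpy as np


def moving_average(x, window):
    """Simple trailing moving average."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")


def test_moving_average_constant_input():
    # A constant series should average to the same constant.
    assert np.allclose(moving_average([2.0, 2.0, 2.0, 2.0], window=2), [2.0, 2.0, 2.0])


def test_moving_average_simple_values():
    assert np.allclose(moving_average([1.0, 2.0, 3.0], window=2), [1.5, 2.5])
```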
- I'm in the geospatial field, and interactivity with these data is limited in IDEs. I generally like notebooks, but there are workflows where an IDE would be nice. GIS tools like QGIS or ArcGIS have native support for interactive map display instead of just static matplotlib images; that would make for much more rapid prototyping (see the sketch below).
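A small sketch of that interactive-map gripe using folium (one option among several, e.g. ipyleaflet); the coordinates and output file name are arbitrary. A notebook renders the map inline as an interactive widget, whereas a classic IDE typically only shows a saved HTML file or a static image.

```python
# Minimal interactive map with folium; coordinates and output file are arbitrary.
# In a notebook, the Map object renders as an interactive widget; in a plain IDE
# you typically have to save to HTML and open it in a browser.
import folium

m = folium.Map(location=[39.74, -104.99], zoom_start=11)
folium.Marker([39.74, -104.99], popup="Sample point").add_to(m)
m.save("map.html")  # open this file in a browser to pan/zoom interactively
```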
- From Nvidia; we're getting serious about Python and CUDA. Would really love to hear people's gripes about developing GPU apps in Python: what tooling do you use, what problems do you face, and how do you debug them? In terms of Python and native code (C++), what are your experiences there?
- In terms of native code, I had to compile a Fortran module and import it from Python. I just used f2py; the workflow was already there and worked: change the native code, recompile, rinse and repeat. It would be nice to offer a workflow that shortens that development loop. With two separate codebases, changes in one can affect the other, but the coupling between them is difficult to conceptualize and work with (see the f2py sketch after this exchange).
- How would you like, when in a debugger, the ability to step from your Python code into the Fortran but still be able to inspect your variables on the Python side?
- Sounds awesome
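A minimal sketch of the f2py loop described above; the file name `mean_f.f90` and module name `flib` are made up, and building requires NumPy plus a Fortran compiler on the PATH.

```python
# Hypothetical minimal f2py loop: write the Fortran source, build it with
# numpy.f2py, import the extension, and repeat after each native-code change.
import subprocess
import sys

fortran_src = """
subroutine mean(x, n, result)
    integer, intent(in) :: n
    double precision, intent(in) :: x(n)
    double precision, intent(out) :: result
    result = sum(x) / n
end subroutine mean
"""

with open("mean_f.f90", "w") as f:
    f.write(fortran_src)

# Equivalent to running: python -m numpy.f2py -c mean_f.f90 -m flib
subprocess.run(
    [sys.executable, "-m", "numpy.f2py", "-c", "mean_f.f90", "-m", "flib"],
    check=True,
)

import flib  # the compiled extension module produced by f2py

# f2py infers n from the length of x and returns the intent(out) argument.
print(flib.mean([1.0, 2.0, 3.0]))  # -> 2.0
```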
- I have a postdoc using a bunch of A100 GPUs. I asked him whether he was getting the most performance out of them; he replied "PyTorch runs", because it was painful enough for him just to get it running. We have new students now doing JAX, so we're going to be in the middle of this again. Diving into hardware-level optimization is something students really won't do and can be very intimidating. I have students who don't know how to use the 4 CPUs in their laptop, much less the GPUs in the supercomputer center
- What do you feel is the main problem? Documentation, tooling?
- Courage on the student's part to actually go down there; the student clearly likes the platform, 40 new A100s were bought, and he's fine with the platform abstracting that away for him. But it's a real sociological challenge to get students used to using, much less optimizing, GPUs
- What kinds of things could IDEs do to make GPU workflows more convenient?
- Perhaps better integration with NVSMI, or an integrated dashboard of GPU usage to see, for example, that a particular kernel is crunching data that is too big to use the full potential of the GPU (see the sketch after this list)
- Nvidia does have a profiling tool that would potentially address some of those concerns; it gives visibility into GPU utilization, bottlenecks, and efficiency. Get in touch.
- Any support for that in a VS Code extension?
- The tools have their own UI, but we could explore IDE integration. Tooling in Python tends to be pretty thin, so users often don't go looking for tools and just use the ones that are there. The profiler is really intimidating; it throws so much information at you.
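As a rough sketch of the GPU-usage dashboard idea above, here is one possible data source an IDE panel could poll: the NVML Python bindings (`nvidia-ml-py`, imported as `pynvml`). This is not a description of any existing IDE integration, just an assumption about what such a panel might read.

```python
# Rough sketch of what an IDE GPU-usage panel could poll, using the NVML Python
# bindings (pip install nvidia-ml-py). Requires an NVIDIA driver to be present.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes
        print(
            f"GPU {i} ({name}): {util.gpu}% compute, "
            f"{mem.used / mem.total:.0%} memory in use"
        )
finally:
    pynvml.nvmlShutdown()
```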