#t20-gostin Brainstorming

tags: t20-gostin

:loudspeaker: gostin is a hackathon project in the awesome Transform2020 event from Swung.

TLDR;

About

The idea is simple: there are many cool geoscientific communities doing awesome stuff, but connecting them is not always easy. We want to lower that barrier and need your help for it! The lofty long-term goal is to have a bridge, a package with a common denominator between as many tools as possible: modelling, inversion, visualization. The short-term goal is to prove that we can combine them, via case studies, notebooks, you-name-it. What you get out of this hackathon depends mostly on you, but we will try to be here to help and assist you, as we are really interested in what you create with our tools!

Currently involved communities and people ready to help out on Slack (with their slack names):

  • GemPy (Miguel [@Miguel de la Varga], Alex [@Alex [UoA, GemPy]])
  • PyVista (Bane [@Bane [Kitware - PyVista]])
  • SimPEG (Lindsey [@lindsey]; Dom [@Dom [simpeg]])
  • Fatiando (Leo [@leouieda])
  • empymod/emg3d (Dieter [@prisae [TU Delft]])
  • (Martin [@mtb-za]) -> Fatiando to GemPy
  • :arrow_right: PLEASE ADD YOUR PROJECT/NAME IF YOU WANT TO PARTICIPATE WITH YOUR AWESOME PROJECT!

:link: NOTE gostin currently has a focus on EM/Potential-methods. A similar attempt with seismic data happens over in #t20-segysak; check that channel if you are more of a seismic nature. In the future these projects will hopefully come together, but for now we are trying to find common ground within these sub-groups.

The lofty final objective is a subsurface library to connect an increasing number of geo-tools. Part of this lofty goal could be to take inspiration from OMF (not really maintained) and rebuild it, but using pydantic instead of properties (not really maintained), and also build it on xarray. It would be a good idea to include something like pint (or the yt project's unyt) for units. xarray does have some unit support, but more as a means of description. Conversion will need something like unyt or pint.
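
To make the difference between units as description and units as conversion concrete, here is a minimal standard-library sketch. The `Quantity` class and the `TO_METRES` table are hypothetical stand-ins for what a real units package (such as unyt) would provide; they are not the API of any existing library.

```python
from dataclasses import dataclass

# Hypothetical conversion table: factor that converts each length unit to metres.
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048}


@dataclass(frozen=True)
class Quantity:
    """A value whose unit takes part in the arithmetic (illustration only)."""

    value: float
    unit: str

    def to(self, unit: str) -> "Quantity":
        # Convert via metres; an unknown unit raises KeyError instead of passing silently.
        metres = self.value * TO_METRES[self.unit]
        return Quantity(metres / TO_METRES[unit], unit)


# Units as description only: the label travels with the data,
# but nothing checks it or converts it.
described = {"values": [1500.0, 2500.0], "attrs": {"units": "m"}}

# Units as conversion: the unit is part of the value.
depth = Quantity(1500.0, "m")
depth_km = depth.to("km")
print(depth_km)
```

A real units package would also cover compound units and dimensional checks, which is the reason the notes suggest reusing one rather than writing a table like this.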

For more info check-out:

Getting started

Ideas

There is a lofty goal, to have a common package to easily move between different tools. But to find out what is really needed we need a lot of examples. Core ideas:

  • If you want to get to know one of the above projects, this is your time! Many core devs will be around to answer questions. If you can link two projects, even better, we would be thrilled to see your results!
  • If you are already familiar with some of the projects then we would find it awesome if you could move between them. Start a project in X, and move the result to Y => all your struggles will help to design this bridge.
  • If you are more into core developing and interested in getting your hands dirty on this new bridge, reach out to us.
  • The idea is to get as much experience as possible on the first weekend by creating many cross-project examples. On the second weekend we would like to try and define the path forward.
  • Sort of an I/O-library for as many tools as possible; needs parsers
  • Good point from Jesper: "Just saying, but if this thing connected to tensorflow, you have a winner."
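
The "I/O-library that needs parsers" idea could start from something as small as a parser registry. The sketch below is an assumption about how that might look, not an existing API: the `register` decorator, the `parse` function, the two toy formats and the returned dictionary layout are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry that maps a file suffix to a parser function.
PARSERS: Dict[str, Callable[[str], dict]] = {}


def register(suffix: str):
    """Decorator that registers a parser for files ending in `suffix`."""

    def decorator(func):
        PARSERS[suffix.lower()] = func
        return func

    return decorator


@register(".xyz")
def parse_xyz(text: str) -> dict:
    # Whitespace-separated x y z triples, one point per line.
    rows = [tuple(float(v) for v in line.split()) for line in text.splitlines() if line.strip()]
    return {"kind": "points", "xyz": rows}


@register(".csv")
def parse_csv(text: str) -> dict:
    # Comma-separated x,y,z triples without a header line.
    rows = [tuple(float(v) for v in line.split(",")) for line in text.splitlines() if line.strip()]
    return {"kind": "points", "xyz": rows}


def parse(filename: str, text: str) -> dict:
    """Pick the parser from the file suffix and return the common structure."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"no parser registered for {suffix!r}")
    return PARSERS[suffix](text)


points = parse("survey.xyz", "0 0 10\n1 0 12\n")
print(points)
```

Each tool would then only need to contribute a parser into the common structure instead of a converter to every other tool.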

Collection of various other ideas:

  • Probably - get it running, explain how to for newbies?
  • Create an environment.yml that works for everyone
  • Use a model from GemPy->discretize->SimPEG->PyVista
  • Reproducibility of all (many) packages
  • :arrow_right: DON'T BE SHY, list your ideas here!

OMF (Open Mining Format)

It is not dead, but not very alive at the moment either.

Some notes from @fwkoch:

  • Overall, I'd love to see more life injected into this project. Wider adoption would likely spur positive feedback of library improvements and even more adoption.
  • There are 3 aspects to the Python OMF repository
    • OMF data structures: These are the most useful thing to adopt and share. How do we, the geo/geophys community, represent data-rich points, lines, meshes, etc.? This provides a common language (both verbal and programmatic API) when we are talking about data.
    • OMF file serialization: This enables file-based interoperability, but the serialized format for v1 is a bit opaque on its own. v2 is better (it leverages other standard formats) but incomplete.
    • Additional features to make the library / format usable: Aside from data validation, there is very little of this. I suspect most of this should live outside the OMF itself (e.g. https://github.com/OpenGeoVis/omfvista)
  • The biggest piece of pseudo-in-progress work is OMF v2
    • Data structures: These have already been finalized at a GMG meeting. However, they are mostly an extension of v1. No major changes, mostly just additions. Details can be found here https://github.com/gmggroup/omf/wiki
    • File serialization: The approach has been decided, leveraging standard formats, but not yet implemented.
    • Additional features / helper functionality: Again, very little of this. The most substantial requirement is v1 -> v2 upgrade functionality (we cannot claim to be a nice open standard, then leave all early adopters in the dirt).
  • Regarding activity / contribution:
    • Contributions are totally welcome: https://github.com/gmggroup/omf/blob/dev/CONTRIBUTING.md
    • Currently @fwkoch is the only maintainer and will help with any contributions
    • New features cannot impact the data structures - these are agreed on by the community (well, unless they are totally backwards compatible or a proposal for v2.1)
    • Any help with v2 (reviews, new PRs related to existing v2 issues, etc) is great
    • Any other features/functionality you would find useful are also great
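
As an illustration of what "data-rich points" with data validation could mean in code, here is a small sketch. It uses standard-library dataclasses to stay dependency-free (the notes elsewhere suggest pydantic for this role), and the `PointSet` class with its fields is hypothetical: it does not reproduce the actual OMF classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


@dataclass
class PointSet:
    """Hypothetical 'data-rich points' element: geometry plus per-vertex data."""

    name: str
    vertices: List[Vertex]
    data: Dict[str, List[float]] = field(default_factory=dict)

    def __post_init__(self):
        # Validation: every data array must have one value per vertex.
        for key, values in self.data.items():
            if len(values) != len(self.vertices):
                raise ValueError(
                    f"data {key!r} has {len(values)} values for {len(self.vertices)} vertices"
                )


samples = PointSet(
    name="drillhole samples",
    vertices=[(0.0, 0.0, -10.0), (0.0, 0.0, -20.0)],
    data={"density": [2.67, 2.90]},
)
print(samples)
```

Agreeing on a handful of such elements (points, lines, surfaces, volumes) is the "common language" part; serialization and helper functions can be layered on top separately.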

Work-In Progress updates and screenshots

Fatiando (verde/harmonica) to GemPy

Proof of concept notebooks work, see this pull request.

  • Fatiando to netCDF is solid, might need more metadata for GemPy to understand.
  • netCDF to GemPy is less simple, but not too arduous. Will depend in large part on the data.
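
The grid-to-points step can be sketched roughly as follows. This is not the code from the notebooks: the grid is synthetic, the surface name `basement` is made up, and the column headers `X`, `Y`, `Z`, `surface` are an assumption about what GemPy expects that should be checked against the GemPy version in use. In a real workflow the grid would be read from the netCDF file instead of being generated.

```python
import numpy as np
import pandas as pd

# Synthetic regular grid standing in for a gridded surface read from netCDF.
easting = np.linspace(0.0, 900.0, 10)
northing = np.linspace(0.0, 400.0, 5)
xx, yy = np.meshgrid(easting, northing)
depth = -1000.0 - 0.1 * xx  # gently dipping surface, in metres

# Decimate: keep every second node in each direction to limit the number of points.
step = 2
xs, ys, zs = (a[::step, ::step].ravel() for a in (xx, yy, depth))

# Long-format table with the assumed column headers.
surface_points = pd.DataFrame({"X": xs, "Y": ys, "Z": zs, "surface": "basement"})
print(surface_points.head())
```

The decimation step matters because a dense grid yields far more points than an interpolator needs; how aggressive it should be depends on the data.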

Final output from netCDF to GemPy notebook:

Grand goals

Dom (simpeg)

what I would like to achieve, and I don't think we are far off is something around this

which right now can only deal with blocks or spheres, because that's all SimPEG can do

I need to be able to leverage the discretization of volumes from surfaces of GemPy

so we can do actual forward of geology

without the cost of Gocad or whatnot

Bane (PyVista)

I have some thoughts on how we might “connect the many open-source geo islands”….

We could leverage VTK/PyVista as an underlying layer across all these packages, i.e., continue to build out the linking between this software and PyVista and take advantage of the growth around VTK for 3D web viz, analytics, etc. Currently, there are links between the following software and PyVista (VTK): GemPy, SimPEG (via discretize), PyGIMLi, GSTools.

We (at Kitware and the PyVista community in general) are working on expanding VTK’s presence in scientific Python and Jupyter ecosystems. I don’t think we are far from having a mainstream toolkit out there for embedded 3D rendering directly in notebooks (there are currently solutions with Panel and itkwidgets, but they are all hacky and in alpha stage IMO)…. I think there will be some exciting new software coming your way all in the VTK/PyVista ecosystem :wink:

Here’s a diagram I made a while back conceptualizing some of this, where all the “geo islands” of software are fragmented, and working across them is currently a challenge. If we can let PyVista be an underlying glue between these tools - as a layer where we bring data down from the islands and move laterally between software - I think a lot of really cool things are possible. Plus, you get to leverage all of PyVista/VTK’s 3D viz functionality.

… also, for those unfamiliar with PyVista, it is not just a visualization tool: 3D data structures and filtering/analysis tools for those data structures are its core. I.e., PyVista is built for managing and tracking spatial data (we have some work to do to improve handling 4D data, but it's possible).

So, all in all, I think it would be cool to explore PyVista/VTK as an option for being the glue between many of the libraries. At the end of the day, PyVista may make sense for some libraries and may not work for others, so it may also be worth exploring software like xarray in many other places and having some sort of xarray-to-PyVista bridge for 3D viz/analysis.

Here's the figure I meant to share… where PyVista acts as a layer/shell connecting fragmented geo software on the fringes of the bigger scientific Python ecosystem

I’ve been meaning to put together some blog posts on this after my thesis… I suppose now is the time!

Miguel (GemPy)

I agree that we need something like what you propose, but I think it would be better if xarray is the pivotal point.

There were some discussions about this last year at Transform 2019, and it was identified that a library that contains and converts data across the whole geo ecosystem is missing. @matt, @Alex and @jesper started prototyping https://github.com/softwareunderground/subsurface (which is still conceptual) based on xarray.

For me there is no question that PyVista should be the default 3D renderer, but I am not so convinced that using PyVista for the shared data structures is a good idea. It would force people to install VTK even if they want to work on 2D problems. I propose 2 possibilities.

  1. Either we split the PyVista data structures into their own standalone library and use it as a base. I think this would look very similar to discretize, so maybe some type of merging.
  2. Or we develop subsurface into a multidimensional database capable of containing all that we need, and in each library we add functionality for reading/writing data from subsurface. The advantage of doing this is that each library only has to maintain I/O with one other library while getting access to all the other compatible libraries.

An example would look like:

subsurface (input data)-> GemPy (interpolation)-> Subsurface (scalar fields/ mesh) -> pyvista (visualization) -> simpeg (forward magnetics)
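
The advantage of the second possibility can be put in numbers, and the chain above can be mimicked with plain functions. The sketch below is purely illustrative: `converters_needed`, `interpolate`, `forward` and `run` are hypothetical names, and a dict stands in for the shared container.

```python
from typing import Callable, Dict, List


def converters_needed(n_libraries: int) -> Dict[str, int]:
    """Count one-directional converters: pairwise links versus a shared hub."""
    return {
        "pairwise": n_libraries * (n_libraries - 1),  # every library to every other one
        "hub": 2 * n_libraries,  # one reader and one writer per library
    }


# Hypothetical stand-ins for the steps in the chain; each takes and returns
# the shared container, which is just a dict in this sketch.
def interpolate(container: dict) -> dict:
    return {**container, "scalar_field": [0.25, 0.5, 0.75]}


def forward(container: dict) -> dict:
    return {**container, "response": sum(container["scalar_field"])}


def run(container: dict, steps: List[Callable[[dict], dict]]) -> dict:
    """Pass the container through each step in order."""
    for step in steps:
        container = step(container)
    return container


result = run({"points": [(0.0, 0.0, 0.0)]}, [interpolate, forward])
print(converters_needed(5))
print(sorted(result))
```

With five libraries the pairwise approach needs 20 one-directional converters, while the hub needs 10, and the gap widens with every library that joins.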

Since at the end of the day xarray/pandas are just using NumPy arrays, if we code it right subsurface would only be passing pointers from one library to the next, so we would not add much overhead.

It is great to have this conversation now, thanks @prisae for moving it! If we are able to agree on some specifications, this could speed up the integration of all these awesome tools a lot.
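
The "only passing pointers" point can be checked with NumPy alone. In this sketch a plain dict stands in for the shared container (an assumption made for illustration, not how subsurface is implemented); what matters is that handing the array on does not copy it.

```python
import numpy as np

values = np.arange(12, dtype=float).reshape(3, 4)

# Hypothetical container: it stores a reference to the array, not a copy.
container = {"data": values, "dims": ("y", "x")}

# A downstream library asking for an array gets the same buffer back.
received = np.asarray(container["data"])
column = received[:, 1]  # basic slicing returns a view, not a copy

print(received is values)
print(np.shares_memory(column, values))

# Caveat: without a copy, an in-place change downstream is visible upstream.
column[0] = -1.0
print(values[0, 1])
```

The flip side of zero-copy hand-over is shared mutable state, so any such library would need a convention for who is allowed to modify data in place.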

=> Some discussion followed on whether or not to use xarray, subsurface, etc.

Andrew Annex (user)

Andrew is a user of GemPy, PyVista and discretize for his Ph.D. research.

Ideas:

  1. better interop of gempy/verde/etc. with the scikit-learn model API; I want to compare various model fits without changing code
  2. better dependency management/conda integration for easier installs
  3. better integrations with GDAL/OGR world through the geopandas/rasterio stack
  4. Make GemPy play well with discretize
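
The first idea, comparing model fits without changing code, boils down to a shared `fit`/`predict` shape. The sketch below uses only the standard library; `NearestNeighbour` and `InverseDistance` are toy interpolators written for this example and do not correspond to any GemPy, Verde or scikit-learn class.

```python
from typing import List, Protocol, Sequence, Tuple

Point = Tuple[float, float]


class Estimator(Protocol):
    """The shared fit/predict shape that every model is expected to follow."""

    def fit(self, X: Sequence[Point], y: Sequence[float]) -> "Estimator": ...
    def predict(self, X: Sequence[Point]) -> List[float]: ...


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


class NearestNeighbour:
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        # Return the value of the closest training point.
        return [
            min(zip(self.X_, self.y_), key=lambda pair: squared_distance(p, pair[0]))[1]
            for p in X
        ]


class InverseDistance:
    def __init__(self, power: float = 2.0):
        self.power = power

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for p in X:
            d2 = [squared_distance(p, q) for q in self.X_]
            if 0.0 in d2:  # query coincides with a data point
                out.append(self.y_[d2.index(0.0)])
                continue
            weights = [d ** (-self.power / 2.0) for d in d2]
            out.append(sum(w * v for w, v in zip(weights, self.y_)) / sum(weights))
        return out


X_train, y_train = [(0.0, 0.0), (10.0, 0.0)], [100.0, 200.0]
X_new = [(2.0, 0.0)]

# The comparison loop is identical for every model: only the list changes.
models: List[Estimator] = [NearestNeighbour(), InverseDistance(power=2.0)]
predictions = {type(m).__name__: m.fit(X_train, y_train).predict(X_new) for m in models}
print(predictions)
```

An existing interpolator with a different interface could be wrapped in a thin adapter exposing the same two methods, which keeps the comparison code untouched.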

Meeting notes

Fr, 16:00 UTC: Pre-Hack meeting

Miguel, Alex, Lindsey, Dom, Bane, Dieter
Video: https://youtu.be/dTa1g2U1vTQ

Short intro; what does everyone expect from the hack

Also, how much time does everyone plan to invest.

  • SimPEG/discretize: Dom; Lindsey
  • PyVista: Bane
    • Reproducible workflow for their research
    • Interested in having an underlying "glue" for working across open software
    • Recently started at Kitware
    • Interested in positioning geoscientists well to leverage emerging ML technologies
  • GemPy: Miguel, Alex
    • Do all the scientific support, but they also recently started the company. Basically open to help anyone using GemPy.
    • Input of GemPy
    • Another library, like subsurface, to connect the different tools
    • Need for I/O library
  • Fatiando: Leo (not very much time, weekend is already overbooked | I would be happy to have a shared environment and example use case)
    • Load, save, plot, plug into other things is already possible in Python
    • NumPy is already a good denominator; NumPy, Pandas, Xarray
    • Better bridge between tools.
    • Shared environment for all main packages
    • Potential point of contact between Gempy and Verde: https://github.com/fatiando/verde/issues/262
  • emg3d: Dieter

:rocket: Thanks for participating!

Plan the hack

  • Should there be a main focus (overarching goal) or many foci?
    • Two possible approaches: data driven or application (?) driven
    • Build use cases first, so we know what the underlying 'thing' could look like.
    • Bane did some previous work with the forge geothermal site that he can share (with different kind of geophysical data)
    • FORGE data is here: https://gdr.openei.org/search?sort=pub_date_desc
  • One general repo or everyone in their own repo?
    • Maybe 1 use case repo and a hackmd document for planning and discussion?
    • On SWUNG or on subsurface?
  • Any other ideas/suggestions from seasoned hackathon attendees?
    • List of communities involved
  • How shall we handle the time-zone difficulty (meetings in Europe evening)
    • How about 1 meeting at 5 or 6pm GMT to hand-off from Europe to US West Coast?
    • Need to be good at taking notes somewhere so people know what's going on. These documents seem like a good place (we can archive them to Github easily).

Sa, 08:15 UTC: Breakout meeting

Martin, Wesley, Richard, Miguel, Alexander, Dieter
No Video (forgot to record)

  • Richard: Interest in getting things to work for folks not necessarily geo (e.g. data scientist); focus on 3D viz; "Backwards and forwards examples sound like a good idea, certainly."
  • Martin: Is going to generate a grid or two in verde/harmonica and then see about getting that into GemPy. In theory the grid will constrain the dip and depth of the causative bodies, which GemPy can then create a model for. (Because harmonica is usable but not feature-complete for the approaches for doing this that I know of, this is unlikely to be at all realistic in terms of the underlying geology's geometry, but hopefully it will illustrate where pain points might be.) Ideally, this model would then be suitable for forward modelling, but I am not sure where that would happen. I assume we would pull it out of GemPy into something else.
  • Wesley: Interested in making it easier for non-numerical geologists to use some of the awesome tools.
  • Alex: Will be working on a tutorial/guide notebook for going from Petrel seismic interpretation to a probabilistic geomodel in GemPy, and probably outlining how and where something like subsurface could help with that.
  • Miguel: Will mainly work on next week's GemPy tutorial, but will support anything that needs help around GemPy.
  • Dieter: First working on integrating GemPy-discretize-emg3d and making it work with TravisCI; then moving on to GemPy-discretize/PyVista-SimPEG.

Sa, 17:00 UTC: Catch-up meeting

Bane, Miguel, Lindsey, Andrew, Brian, Martin, Jesper, Dom, Dieter
Video: https://youtu.be/-LWA3p7Ab1Y

  • Martin got a grid (xarray) from Fatiando to netCDF and is trying to get it into GemPy, but GemPy expects it as a DataFrame (pandas). Relevant notebooks in this PR.
    • From fatiando to netCDF is trivial, since fatiando uses xarray natively.
    • Conversion of netCDF to a DataFrame is easy; the pain point (such as it is) is that things like specific column headers and data decimation are needed to get GemPy to create points. See above PR for details.
  • Jesper: The idea of subsurface was a common data format, but it stalled after Transform2019. It is based on xarray, and xarray is based on pandas with dimensions. But pandas, and xarray even less so, do not support subclassing. MetPy does a lot of registering, which might make it a good project to look at.
  • Brian: Taking an image of a sedimentary log, turning it into a striplog, and adding deviation to it to get it into 3D space and eventually into GemPy.
  • Andrew is having a crack at getting GemPy into conda-forge.
  • A good idea might be to take inspiration from OMF (not really maintained) and rebuild it, but using pydantic instead of properties (not really maintained), and also build it on xarray. It would be a good idea to include something like pint (or the yt project's unyt) for units. xarray does have some unit support, but more as a means of description. Conversion will need something like unyt or pint.

So, 17:00 UTC: Wrap-up first weekend

Martin, Lindsey, Brian, Dom, Dieter
Video: https://youtu.be/_aQSUpG4C-c

  • There was generally little activity in the group today, but still some bits got done (see first screenshots above under Work-In Progress updates and screenshots)
  • Martin finished a first go from Fatiando to GemPy :rocket:; Bane started with some Docker-files for an overarching environment; Andrew got a first try for a GemPy recipe in conda-forge; and probably many more things that I forgot after a tiring weekend. PLEASE ADD YOUR ACHIEVEMENTS!
  • OMF is not dead, but in limbo. So we could potentially use it. The question is how "open" is the OMF? Is it open as "it is on GitHub, you can use it", or is it open to accept pull requests and incorporate new ideas? Franklin can hopefully shed some light on that. See also https://github.com/gmggroup/omf/blob/dev/CONTRIBUTING.md
    • From @fwkoch: "Openness" depends on the new ideas. Fundamental, structural ideas about the format will need wider discussion (but still open to this discussion, and GitHub is probably the best place for these discussions to start). For ideas to improve the library (make it more usable, faster, etc.), PRs will be accepted.
  • We should meet for another pre-weekend-hack meeting this coming Friday. I will reach out to some people and will let you know about the time in advance.

Fr, 17:00 UTC: Pre-hack meeting second weekend

Participants: Dom, Aaron, Wesley, Martin, Franklin, Miguel, Alexander Jüstel, Andrew, Alexander Schaaf, Dieter, Bane, Rowan
Video: https://youtu.be/AwISXbtKXAE

Agenda

  • Who is interested AND willing to invest time
    • Lead - who (can be several) is taking the lead (not necessarily coding wise, but organisational wise)
  • What is expected (think big [framework], but start small [few packages then grow])
    :arrow_right: Do we find a common denominator?
  • What do we have to define today, what can be left for later?
    • Storage in memory (more restrictive than :arrow_down:)
    • Storage on disk (more flexible than :arrow_up: [different backends])
    • Where hosted? (github.com/softwareunderground)
    • Name?
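
The split between one in-memory representation and several on-disk backends can be sketched with the standard library. Everything here is an assumption made for illustration: the column-dict `model`, the `BACKENDS` mapping and the two toy backends (JSON and CSV) merely stand in for whatever formats end up being chosen.

```python
import csv
import io
import json

# One in-memory representation: a dict of equally long columns.
model = {"X": [0.0, 1.0], "Y": [0.0, 0.0], "Z": [-10.0, -12.0]}


def to_json(table: dict) -> str:
    return json.dumps(table)


def from_json(text: str) -> dict:
    return json.loads(text)


def to_csv(table: dict) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(table.keys())  # header row with the column names
    writer.writerows(zip(*table.values()))  # one row per point
    return buffer.getvalue()


def from_csv(text: str) -> dict:
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return {name: [float(row[i]) for row in body] for i, name in enumerate(header)}


# Several on-disk backends behind the same pair of functions.
BACKENDS = {"json": (to_json, from_json), "csv": (to_csv, from_csv)}

roundtrips = {name: load(dump(model)) == model for name, (dump, load) in BACKENDS.items()}
print(roundtrips)
```

Fixing the in-memory structure first and treating file formats as interchangeable backends is one way to read the "more restrictive" versus "more flexible" distinction in the agenda.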

Notes

"Doesn't matter how good it is initially, but we have to agree that we use it" (Rowan) If we try to make the best tool that takes forever yet it is not used then it is useless. So most importantly we have to agree that we want something and that we use it.

Communities committed so far:

  • GemPy (Miguel)
  • SimPEG (Dom)
  • PyVista (Bane)
  • Fatiando (Martin, Leo/Santiago?)
  • pyGIMLi (Aaron, Florian?)
  • OMF Expertise: Franklin, Rowan
  • OGC: Andrew
  • map2loop (Mark Jessell signalled that he is interested)

Everyone in the meeting is interested, of course (otherwise they wouldn't have joined the meeting), but the persons listed above volunteered to take on a particular responsibility. We hope that Leo/Santiago for Fatiando and Florian for pyGIMLi are on it as well.

General

  • Repo: subsurface on github.com/softwareunderground seems to be a good fit.
  • We should then also move those discussions to the #subsurface channel on Slack

Other notes

  • Examples: OGC (Open Geospatial Consortium), OMF (Discussed above)
  • Rowan, Andrew: Want to essentially have an API approach - can change the internals later if we have to. Start with a minimal infrastructure.
  • Bane: VTK tied to rectilinear Cartesian coords. Can be difficult for other cases. Look at OMF, reimplement using xarray. PyVista can start working for visualising all of these things.
  • Challenge will be the dynamic link between libraries: all libraries change data, so we need to know where it came from and how it has been processed.
  • Downside of xarray - also mostly structured data. Use pandas to deal with unstructured data instead.
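
The provenance challenge noted above (knowing where data came from and how it has been processed) could be approached by letting the data carry its own processing history. The sketch below is one hypothetical way to do that; the `Tracked` class, the step labels and the file name `survey.nc` are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Tracked:
    """Immutable values plus a log of every step applied to them."""

    values: Tuple[float, ...]
    history: Tuple[str, ...] = ()

    def apply(self, label: str, func: Callable[[float], float]) -> "Tracked":
        # Return a new object, so the input and its history stay untouched.
        return Tracked(tuple(func(v) for v in self.values), self.history + (label,))


raw = Tracked((1.0, 2.0, 3.0), ("loaded from survey.nc",))
processed = raw.apply("library A: scale by 2", lambda v: 2.0 * v).apply(
    "library B: add offset 1", lambda v: v + 1.0
)

print(processed.values)
for entry in processed.history:
    print(" -", entry)
```

Because every step returns a new object, the original data stays available and the history reads as an ordered record of which library did what.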

Other things

  • Bring in welly, wellpathpy
  • Andrew likes scikit-learn interface. Might be worth trying to use it more generally.
    • Miguel - GemPy has a really specific and strange interpolator.
    • Andrew - could be something to put in front of existing stuff
  • https://www.ogc.org/ogc/members
    From Wesley Banfield to Everyone: early project but has the sk interface https://github.com/mmaelicke/scikit-gstat
  • OMF:
    • has a focus on mining (as it says in the name);
    • 3D first approach;
    • Few people, agile
  • Richard - I agree with Andrew on geopandas/rasterio et al too

Sa, 15:20 UTC: Catch-up meeting

Participants: Dom, Martin, Miguel, Andrew, Dieter, Bane, anyone else?
Video: Not recorded

I thought this would be a 5-minute catch-up, which turned out to be a ~40-minute chat about what subsurface should look like.