# Subsurface

DataHub for geoscientific data in Python. Two main purposes:

+ Unify geometric data into data objects (using numpy arrays as memory representation) that all the packages of the stack understand
+ Basic interactions with those data objects:
    + Write/Read
    + Categorized/Meta data
    + Visualization

## Requirements

+ The core of the package has to be **light**
+ I/O **has** to happen at the level of **primary structures**. Once the primary structure has been exported/imported we can keep going up the pile (i.e. elements, geological objects, etc.)
+ **NEW 28.09.20** Even if primary structures only parse numerical data (e.g. vertex, edges, attributes), **all data levels** should be able to contain all raw data (`dicts` and even `strings`). Therefore, the difference between data levels is **not** which data they store but which data they **parse and understand**. The rationale for this is to be able to pass any object along while keeping the I/O in subsurface.

## Optional libraries:

+ I/O
    + segyio
    + welly
    + rasterio
    + ...
+ Visualization
    + Matplotlib
    + pyvista
+ Standard formats
    + OMF

## Data Levels:

The difference between data levels is **not** which data they store but which data they **parse and understand**.

**Human**

    \=================================/'
     \===============================/ ' \
      \==========geo_format=========/ '   \     -> Additional context/meta information about the data
       \===========================/'  '   \
        \=======geo_object========/ '  '    \   -> Elements that represent some
         \=======================/  '  '    /      geological concept. E.g: faults, seismic
          \=====================/' '  '  ' /
           \======element======/' '  '  ' /     -> type of geometric object: PointSet,
            \=================/' '  ' ' /          TriSurf, LineSet, Tetramesh
             \=primary_struct/ ''      /        -> Set of arrays that define a geometric object:
              \=============/ '  '    /            e.g. StructuredData, UnstructuredData
               \============/''      /
                \DF/Xarray/ ' '/                -> Labeled numpy.arrays
                 \=======/''  /
                  \array/'   /                  -> Memory allocation
                   \===/    /
                    \=// '

**Computer**

## Primary Structures definitions:

### Unstructured: NumPy, Pandas

Basic components:

- vertex: NDArray[(Any, 3), FloatX]: XYZ point data
- edges: NDArray[(Any, ...), IntX]: Combination of vertices that create different geometric elements
- attributes: NDArray[(Any, ...), FloatX]: Number associated to an element

Depending on the shape of `edges` the following unstructured elements can be created:

- edges NDArray[(Any, 0), IntX] or NDArray[(Any, 1), IntX] -> *Point cloud*. E.g. Outcrop scan with lidar
- edges NDArray[(Any, 2), IntX] -> *Lines*. E.g. Borehole
- edges NDArray[(Any, 3), IntX] -> *Mesh*. E.g. surface-DEM Topography
- edges NDArray[(Any, 4), IntX]
    - -> *tetrahedron*
    - -> *quadrilateral (or tetragon)* UNSUPPORTED?
- edges NDArray[(Any, 8), IntX] -> *Hexahedron: Unstructured grid/Prisms*

### Structured: NumPy, XArray

The main distinction from unstructured data is that we do not need to provide edges, since they can be determined by the order of the points (vertex) and the description of the coordinates.

Basic components (XArray lingo):

- DataSets: Number associated to a structured element
- Coordinates: Define the **center** of the element

Depending on the number of coordinates of the XArray:

- 2D: *Structured surface*
    - defined by 1 array per axis (two axes in 2D). Usually the axes are parallel to XY but technically they don't have to be. They can also be rotated
    - Z is a function of XY, i.e. it could be seen as simply another attribute (DataArray)
- 3D: *Structured grid*
    - defined by 1 array per axis (three axes in 3D). Usually the axes are perpendicular to Cartesian but technically they don't have to be
    - Optional: Rotation
    - **Special case:** *Uniform grid*. It is a structured grid with all spacing constant. Defined by:
        - Extent, resolution or
        - origin and spacing
        - Rotation?
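
The mapping between the shape of `edges` and the unstructured element it describes can be illustrated with a minimal NumPy sketch. The names `ELEMENT_BY_WIDTH` and `element_type` are hypothetical and are not part of the subsurface API. The width-4 case is reported as a tetrahedron here only because the list above marks quadrilaterals as possibly unsupported.

```python
import numpy as np

# Hypothetical lookup (not the subsurface API):
# number of vertex indices per row of `edges` -> kind of unstructured element.
ELEMENT_BY_WIDTH = {
    0: "point cloud",
    1: "point cloud",
    2: "lines",
    3: "mesh",
    4: "tetrahedron",  # assumption: quadrilaterals are flagged as unsupported
    8: "hexahedron",
}


def element_type(vertex, edges):
    """Return the element kind implied by the second dimension of `edges`."""
    vertex = np.asarray(vertex, dtype=float)
    edges = np.asarray(edges, dtype=int)
    if vertex.ndim != 2 or vertex.shape[1] != 3:
        raise ValueError("vertex must have shape (n, 3): XYZ point data")
    if edges.ndim != 2:
        raise ValueError("edges must be a 2D array of vertex indices")
    # Every index stored in `edges` has to point at an existing vertex.
    if edges.size and (edges.min() < 0 or edges.max() >= len(vertex)):
        raise ValueError("edges reference a vertex that does not exist")
    width = edges.shape[1]
    if width not in ELEMENT_BY_WIDTH:
        raise ValueError(f"no element defined for edges of width {width}")
    return ELEMENT_BY_WIDTH[width]


vertex = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
triangles = np.array([[0, 1, 2], [0, 1, 3]])
no_edges = np.empty((len(vertex), 0), dtype=int)

print(element_type(vertex, triangles))
print(element_type(vertex, no_edges))
```

Keeping the check to the array shape alone reflects the idea that a primary structure only parses numerical data; anything more specific is left to the element level.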
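
The two ways of defining a uniform grid (extent and resolution, or origin and spacing) can be shown to be equivalent with a short NumPy sketch. The function names, the extent ordering `(xmin, xmax, ymin, ymax, zmin, zmax)`, and the reading of "origin" as the center of the first cell are assumptions made for this illustration; rotation is left out.

```python
import numpy as np


def centers_from_origin(origin, spacing, resolution):
    """One 1D array of cell centers per axis.

    Assumption: `origin` is the center of the first cell along each axis.
    """
    return [o + s * np.arange(n) for o, s, n in zip(origin, spacing, resolution)]


def centers_from_extent(extent, resolution):
    """Same result, starting from an extent and a number of cells per axis.

    Assumption: extent = (xmin, xmax, ymin, ymax, zmin, zmax).
    """
    extent = np.asarray(extent, dtype=float).reshape(3, 2)
    resolution = np.asarray(resolution, dtype=int)
    spacing = (extent[:, 1] - extent[:, 0]) / resolution
    origin = extent[:, 0] + spacing / 2  # shift from the cell edge to the cell center
    return centers_from_origin(origin, spacing, resolution)


x, y, z = centers_from_extent((0, 10, 0, 4, 0, 2), (5, 2, 1))
x2, y2, z2 = centers_from_origin((1, 1, 1), (2, 2, 2), (5, 2, 1))
print(x, y, z)
```

The resulting per-axis arrays are the kind of one-array-per-axis coordinates described above for structured data, so they could serve as the coordinates of an XArray object.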