Joris Van den Bossche

@jorisvandenbossche

Joined on Mar 7, 2019

  • Patrick and I went for an ice cream, and two hours later ... a potential proposal. User visible end state: make all data types (whether based on numpy or arrow) use NA as the (visible) missing value sentinel, with NA/nullable semantics (i.e. propagation of NA in comparisons and Kleene logic in boolean operations). Implementation steps: Make ExtensionArrays fully support 2D (and make everything use ExtensionArray/Dtype, including the numpy-based ones). Change the pd.NA scalar to be less annoying (probably mostly bool(pd.NA) not raising?). Fix conversion to numpy to not use object dtype. Only use NA for the masked (numpy-based) Floating dtype (so NaN is not allowed as a visible value and there is no need to distinguish the two; a NaN could still be present in the underlying data, but would be hidden by the mask). This makes conversion numpy <-> pandas clearer (numpy only has NaN, pandas only has NA, so the conversion on input/output is unambiguous).
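    The NA semantics mentioned in the proposal above (NA propagating through comparisons, Kleene logic for boolean operations) can be illustrated with a minimal pure-Python sketch. This is not how pandas implements it: `NA` here is just `None` used as a stand-in sentinel, and the helper names `kleene_and`, `kleene_or` and `na_eq` are hypothetical, chosen only for this illustration.

    ```python
    # Illustrative stand-in for a missing-value sentinel (assumption: not pandas' pd.NA).
    NA = None


    def kleene_and(a, b):
        """Three-valued AND: False wins even if the other operand is missing."""
        if a is False or b is False:
            return False
        if a is NA or b is NA:
            return NA
        return True


    def kleene_or(a, b):
        """Three-valued OR: True wins even if the other operand is missing."""
        if a is True or b is True:
            return True
        if a is NA or b is NA:
            return NA
        return False


    def na_eq(a, b):
        """Comparison that propagates NA instead of returning False."""
        if a is NA or b is NA:
            return NA
        return a == b
    ```

    The point of Kleene logic is that the result is only missing when the missing operand could actually change the answer: `kleene_and(False, NA)` is `False`, while `kleene_and(True, NA)` stays `NA`.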
  • The Geometry classes Currently, Shapely provides a set of classes (Point, LineString, Polygon, MultiPoint, etc), one for each of the basic geometry types. PyGEOS on the other hand has only a single Geometry extension type. So we need to either be fine with going with such a single class and provide some functionality to smooth the transition, or find workarounds to still have multiple classes. First, some reasons that we might like those different classes: Different interfaces: currently, each of the classes can have its own set of methods. For example, Point has x, y and z attributes, while Polygon has exterior and interiors attributes (and some others). But in the end, they have most of their methods in common, and there are only a few exceptions. Those can also raise an error if they are not applicable for the geometry type in question.
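    The single-class option described above, where type-specific attributes raise an error when they do not apply, could look roughly like the following sketch. It is a toy model only: the `Geometry` class, its constructor arguments and its error messages are assumptions made for illustration and do not reflect the actual Shapely or PyGEOS API.

    ```python
    class Geometry:
        """Toy single geometry class (hypothetical; not the Shapely/PyGEOS API)."""

        def __init__(self, geom_type, coords):
            self.geom_type = geom_type   # e.g. "Point" or "Polygon"
            self.coords = list(coords)   # list of (x, y) tuples

        @property
        def x(self):
            # Only meaningful for points; raise for every other geometry type.
            if self.geom_type != "Point":
                raise AttributeError(f"'x' is not defined for {self.geom_type}")
            return self.coords[0][0]

        @property
        def exterior(self):
            # Only meaningful for polygons; raise for every other geometry type.
            if self.geom_type != "Polygon":
                raise AttributeError(f"'exterior' is not defined for {self.geom_type}")
            return self.coords


    point = Geometry("Point", [(1.0, 2.0)])
    polygon = Geometry("Polygon", [(0, 0), (1, 0), (1, 1), (0, 0)])
    ```

    With this design, `point.x` works while `polygon.x` raises `AttributeError`, so code written against the per-type attributes keeps a familiar failure mode even though there is only one class.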