title: NumPy Dtype Requirements and Approaches
author: Sebastian Berg
tags: NumPy, Dtypes
*Note that I continued part of this in new documents which are not yet online, but this should still be relevant because it lists many things (including some smaller side notes).*
**Prepared Document for Meeting:** https://hackmd.io/B5TPPP-8QiKOFODPqmXtfA
## TODO Document restructuring/new document
I will probably try to split the document up into:
1. Requirements (possibly split into different sections)
2. Decisions that we probably need to make
3. Either in 2. or separately, some suggestions for it.
It may be interesting to have some python dummy implementation...
# NumPy Dtype Requirements and Approaches
This is a work in progress to try to structure my thoughts.
It may be a bit random and some of these are probably premature.
**Please feel free to change/edit this!**
Depending on how types/instances are created, things can be confusing, so I would suggest calling:
* **dtype**: a class (whether or not an actual type)
* **descriptor** (or **a dtype**): the object tagged onto the array (an instance of a dtype)
* **scalar type**: a type which, when instantiated, provides the scalar instance
* a **scalar**: an instance of a scalar type
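In current NumPy terms, the distinction looks like this (just illustrating the naming above, no new API involved):

```python
import numpy as np

arr = np.array([1.0, 2.0])
descr = arr.dtype            # descriptor: the instance tagged onto the array
scalar_type = descr.type     # scalar type (np.float64 here)
scalar = scalar_type(3.0)    # a scalar: instance of the scalar type

assert scalar_type is np.float64
assert isinstance(scalar, np.generic)  # all numpy scalars derive from np.generic
assert not isinstance(descr, type)     # the descriptor is *not* itself a type today
```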
Just to note, the names below are just working names, and they would all need an `__array_` or `__numpy_` prefix of course.
Please try to list all related documents here (in case I missed some):
* Matti's NEP, discusses the technical side of subclassing more from the side of `ArrFunctions`
* https://hackmd.io/ok21UoAQQmOtSVk6keaJhw and https://hackmd.io/s/ryTFaOPHE
* Thoughts by Matti and Eric about dtype subclassing
* (2019-04-30) Eric's is probably the furthest along for subclassing
* Discussion about the calling convention of ufuncs, but also
includes teardown/setup needs.
* https://github.com/BIDS-numpy/docs/blob/master/meetings/2018-11-30-dev-meeting.md and [NEP: high level data types and universal functions](https://hackmd.io/6YmDt_PgSVORRNRxHyPaNQ)
* BIDS meeting on November 30, 2018 and a document by Stephan Hoyer about what NumPy should provide and thoughts on how to get there. Meeting with Eric Wieser, Matti Picus, Charles Harris, and Travis Oliphant.
* Important summaries of use cases.
* [SciPy 2018 brainstorming session](https://github.com/numpy/numpy/wiki/Dtype-Brainstorming)
* Good list of user stories/use cases.
* Lists some requirements and some ideas on implementations
* [BIDS talk by Nathaniel](https://www.youtube.com/watch?v=fowHwlpGb34) from Oct. 2017. (Mostly historical; not much information on this topic.)
* [xnd-project](https://github.com/xnd-project) with ndtypes and gumath
* Does not implement promotion rules, instead loops are registered for all variations.
* Does it even allow (easy) definition of new dtypes?
* [`__array_ufunc__` NEP](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html):
* May be interesting to keep APIs similar.
* [similar document to this](https://hackmd.io/ThZt5S7iSXWfcPd_l3E-nw), which I started
  * an attempt to make things a bit more structured based on what needs to happen when e.g. ufunc calls occur.
## Descriptor (dtype instance) Requirements
* Provides an immutable instance acting as descriptor tagged to the array
* Maybe for some dtypes with their own storage management this
  would not be true (though maybe it would, since if the storage
  changes it is probably some kind of shared storage belonging to
  the dtype type and not the descriptor)
* Dtype comparison and hash
* Allowing to override:
* slots (ArrFuncs) from python and C.
* Inner loop implementations
* Probably at instantiation time, so that a Python
  callback can be set at the "slot" level (I guess this is
  what Python does as well).
* Casting and promotion rules
* Decision needed: How do subclasses inherit these?
* Should a new type be allowed to make an impossible loop
possible easily? (thoughts below depend on this!)
* Say a unit dtype could define `timedelta * timedelta -> unit[timedelta**2]`;
  even though datetime itself does not know this, it may be plausible.
* Thinking more about this: probably not immediately, but if we
  allow registering arbitrary loops, it becomes possible (depends
  on who we ask).
* Is it OK if import order/loop-adding order can affect results?
* It would be nice to stay close to `__array_ufunc__`
when it comes to implementation.
* Assuming we go that way, we probably need caching?
* If we have a `unit[m]` is the casting handled on the
* Currently I do not think promotion rules really exist
aside from "`result_type`" (and the less its ideal scalar
* Promotion rules may be ufunc dependent, but `np.result_type`
  does try to handle them generically.
* While value-based promotion is problematic, it would be good to be
  able to support it?! (also affects possible caching)
* Promotion could be handled by the ufunc loop:
* Input dtypes may not realize a loop exists?
* Could ask `np.result_type` *common type* (promotion) on failure
(this is what Julia does I think)
* `np.result_type` like *common type* operation is also needed
* i.e. for `concatenate`
* Casting tables could solve this (a safe-casting graph of depth 1?)
* Ask all involved dtypes (but what if there is a more awesome one
which knows how to cast them all, say a unit which understands
ints and datetime)
* Ufunc hook:
* Dtypes should be able to reuse existing inner-loops.
* Do we need to expose them somehow (including to Python),
  or is it enough to have a way to do this for subclassing?
* But need to be able to run setup and teardown code?
* Reason: units need to check the inputs and find the output
  descriptor. However, after they have done this, they can
  simply use the normal numpy inner-loops.
* Currently, we use the type resolver ufunc
  functions for this. *It needs to be possible to
  inject logic here for custom dtypes!*
* Dtype hooks run after `__array_ufunc__`
  (just noting; this seems logical and pretty obvious).
* Storage of metadata (e.g. similar to units)
* Storage of additional data for ragged arrays/variable-length strings.
* "Reference counting"
* Some DTypes can require reference counting like hooks
on `PyArray_XDECREF`, etc.
([issue requesting such a feature](https://github.com/numpy/numpy/issues/10721))
* Associated with a scalar type, this is currently only done through `dtype.type`
(so already exists).
* Extensible `ArrFunctions`:
* Example: New sorting implementations
([current timsort hack](https://github.com/numpy/numpy/pull/12945)).
* We must be free to add new slots/methods to dtypes
(although sometimes we should likely move them to functions)
##### Other Possible Requirements:
* `dtype` parsing capability, for `dtype="MyType[unit]"`.
(Frankly, I am *not* convinced of this at all, but it can wait
in any case. Mentioned in https://github.com/numpy/numpy/wiki/Dtype-Brainstorming)
* "fused types" on the python level as mentioned by Matti and
* `np.array(b, dtype=np.blasable)` for `(s, d, c, z)`
* `np.array(b, dtype=np.floating)`
Are these like ABCs that types register to? Is this similar to
a flexible type?
They could also be used for casting...
* `np.load` and `np.save` would require pickling for user dtypes,
that seems very annoying.
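For comparison, NumPy's existing abstract scalar types (`np.floating`, `np.integer`, …) already form such a queryable hierarchy, though they cannot be used to *construct* arrays in the sense sketched above:

```python
import numpy as np

# The abstract hierarchy can be queried today:
assert np.issubdtype(np.float32, np.floating)
assert np.issubdtype(np.int64, np.integer)
assert not np.issubdtype(np.float32, np.integer)
```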
<details> <summary> <b> List of (current) ArrFunction slots/attributes </b> </summary>
* get- and setitem (Python/PyObject coercion casting)
* copyswap and copyswapn (copying dtype)
* compare (comparison function, like Python `__cmp__` slots)
* dot product
* scanfunc (parsing an ASCII file)
* fromstr (parse a single string)
* arange (FillFunc) implementation
* fill with scalar (fillwithscalar)
* Sorting and Argsorting implementations (fixed size)
* "Fast" functions:
* clip, putmask, take
###### Other slots:
* casting dictionary (castdict)
* Cast between user types
* Casting rules:
* ScalarKindFunc (same kind casting)
* can cast scalar kind to
* Item Refcount
* Has Object.
* Convert to list for pickling (TODO: What is this?)
* Is pointer: Item is a pointer (For extension types?)
* Needs Init (e.g. objects need initialization, no "empty")
* Needs PyAPI
* Use Getitem/Setitem for extraction 0-D array from scalar
* Rational/user types in general need to define this, I think?
* Object arrays use this
* (Aligned Struct)

</details>

##### Possible nice-to-haves?
* `isinstance(scalar, descriptor)` could be nice, but that
does not necessarily mean that `descriptor is scalar_type`?
* We already have `(dtype|descriptor).type` which may be
easier to think of in any case.
* I am also wondering if we can make such things as a
`decimal_dtype`, in a sense specialize python types
(which mostly would mean asserting the output type and
possibly allowing to expose some of its methods)
* A place to put dtype specific ufuncs?
* (Seb.) A convenient/standard place to put something like methods?
  ```python
  MyDate.normalize(arr)  # use dtype namespace?
  # could expose as a method-like (thought for later)
  ```
* Should also work for Operators/dunder methods.
*The specific proposals are outdated and need some thought.*
1. Let's not worry about making the descriptor a type (and thus the same as scalars). This *could* probably be added later, but I/we are not convinced it is a good idea.
2. The exact steps still need to be decided, but:
* We should aim to keep close to other APIs (`__array_ufunc__`)
Proposed API for subclassing numpy dtypes. These are really brainstorming right now:

```python
def __promote__(self, other):
    """Used by np.promote_types; identical to result_type,
    but without the scalar logic."""
    if isinstance(other, type(self)):
        # make sure to give priority to subclasses:
        self, other = other, self
    raise TypeError("cannot promote to common type.")

itemsize = 8
type = pyunit  # the associated scalar type

# what is specifically needed here?
def __item_unpack__(self, val):
    """How to do this as low-level loops?"""

def __item_pack__(self, obj):
    if not isinstance(obj, self.type):
        ...

@classmethod  # not sure?
def __promote__(cls, dt1, dt2):
    """By default, checks can_cast(dt1, dt2)."""
    return np.result_type(dt1, dt2)

def __can_cast__(cls, dt1, dt2, casting="safe"):
    return True or False  # or unit["m"], loop?

def __get_loop__(self, ufunc, in_types, out_types):
    ...
```

Alternatively, register loops and:
* check exact loops → promote → check loops again
* (the loop could still refuse)

```python
# probably largely C-slots, but could be filled from python
setup_loop = None      # e.g. FloatClearErr
teardown_loop = None   # e.g. FloatCheckErr
needs_api = False      # flag (or allow setup to set/override?)
identity = NotImplemented
# more flags to add

# Other or even extendable things?:
specialized_inner_loops = None  # contiguous (copy code), AVX?
```
Casting inner loops basically seem like ``"Type1->Type2"`` ufuncs,
so whatever API we end up using, unary ufunc calls and the final
casting call should likely look identical. In fact, they could
probably be identical.
#### Details about promotion
Promotion is necessary if there is no ufunc loop for the specific types.
Pushing it to the dtype objects may also allow hacking in the value-based
promotion that we are currently stuck with.
We do have some "special" promotion rules currently hacked in; the first
thing that comes to mind is integer addition using at least `long` precision.
*Should check the current `TypeResolution` functions for other special cases.*
The question is how to split it up. We currently have `np.result_type` and
it would be nice if that can just call the `__promote__` logic.
*This is also used elsewhere, i.e. in concatenate*
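For reference, the current common-type behavior that such a `__promote__` hook would have to reproduce (`np.promote_types` is the pure dtype-based part; `concatenate` effectively needs the same operation):

```python
import numpy as np

# pure dtype-based promotion (no value-based scalar logic):
assert np.promote_types(np.int8, np.uint8) == np.dtype(np.int16)
assert np.promote_types(np.int32, np.float32) == np.dtype(np.float64)

# concatenate relies on the same common-type operation:
a = np.array([1, 2], dtype=np.int32)
b = np.array([0.5], dtype=np.float64)
assert np.concatenate([a, b]).dtype == np.dtype(np.float64)
```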
If the promotion gets additional information, it could handle most
of the things done in setup (see below), making the default setup
basically a no-op.
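A rough Python sketch of how a `__promote__`-style protocol could slot under `np.promote_types` (the name `__promote__` and the `UnitDType` class are assumptions from the brainstorming above, not NumPy API):

```python
import numpy as np

class UnitDType:
    """Hypothetical unit dtype, purely for illustration."""
    def __init__(self, unit):
        self.unit = unit

    def __promote__(self, other):
        # a unit dtype that knows how to absorb plain numeric dtypes:
        if isinstance(other, np.dtype) and other.kind in "iuf":
            return UnitDType(self.unit)
        return NotImplemented

def promote(dt1, dt2):
    """Sketch of a promote_types-like function that asks the dtypes first,
    mirroring Python's binary-operator NotImplemented protocol."""
    for a, b in ((dt1, dt2), (dt2, dt1)):
        meth = getattr(a, "__promote__", None)
        if meth is not None:
            res = meth(b)
            if res is not NotImplemented:
                return res
    # fall back to the current generic rules:
    return np.promote_types(dt1, dt2)

m = UnitDType("m")
assert promote(m, np.dtype(np.float64)).unit == "m"
assert promote(np.dtype(np.int8), np.dtype(np.int16)) == np.dtype(np.int16)
```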
##### Other things to keep in mind:
* Flexible dtypes may be an issue. If promotion does not know about
the ufunc, it may have to return a generic string (or something like a
generic string user dtype).
* One possible thought: have `dtype=IntWithUnit` but the unit still
  flexible? Note that this is likely such a corner case that
  I am not sure we have to worry about it.
* Flexible types are an option to handle promotion though:
* `unit * unit -> unit`, and the ufunc setup time check
decides which unit.
* Special rule for addition reduction (sum)! This is hardcoded "promotion"
loop selection logic, which is hard to represent (unless you pass `method="reduce"`
to some dtype/loop related setup).
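This hardcoded reduction rule is visible today:

```python
import numpy as np

a = np.ones(3, dtype=np.int8)
# the element-wise loop stays at int8:
assert (a + a).dtype == np.dtype(np.int8)
# but the add-reduction (sum) is special-cased to use at least the
# default integer (platform long) precision:
assert a.sum().dtype.itemsize >= np.dtype(np.int_).itemsize
```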
#### Existing Implementations:
<details><summary> Julia and XND </summary>
* Generally use "exact" signature
* promotion rules can be registered `type1, type2 → typeX` with `promote`
* `promote` can promote any number of arguments of course
* Super type (`Number`) implementation of functions (math operators):
* only used if no more specific implementation is found
* calls `promote(*args)` and tries again
* Does not know about promotion as such.
* gumath seems to simply stuff promotion into the ufunc
loops, i.e. you register all loops that you may want to use.
  (I do not think there is casting involved at all in function calls?)
  For xnd, the interesting part is resolving the shape part of the
  signature.

</details>
#### Details about casting
We currently have the following casting levels:
* unsafe casting
* safe casting
* same kind casting
* equivalent (byte order changes)
* No casting
And python coercion (can be seen as unsafe casting to/from PyObject):
* *to PyObject* (a special type of casting used by `item`)
* *from PyObject* (maybe these are identical to unsafe casting)
I think it may be possible to get around `same_kind` casting, but
maybe there is also not much reason for it.
There is also the concept of *construction* which we may also need for
`np.array(..., dtype=dtype)` logic. [Julia's reasoning](https://docs.julialang.org/en/v1/manual/conversion-and-promotion/#Conversion-vs.-Construction-1); although possibly it can be seen as unsafe casting?
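These levels map directly onto today's `np.can_cast`:

```python
import numpy as np

assert np.can_cast(np.int16, np.float64, casting="safe")
assert np.can_cast(np.float64, np.float32, casting="same_kind")    # same kind, lossy
assert not np.can_cast(np.float64, np.int16, casting="same_kind")  # different kind
assert np.can_cast(np.float64, np.int16, casting="unsafe")
# "equiv" only permits byte-order changes:
assert np.can_cast(np.dtype("<i4"), np.dtype(">i4"), casting="equiv")
assert not np.can_cast(np.int32, np.int64, casting="equiv")
# "no" requires the identical dtype:
assert np.can_cast(np.int32, np.int32, casting="no")
```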
###### Other Notes:
* Should the inner loop dispatcher be allowed to override casting?
  This may be useful, e.g. to allow unsafe casting when the existing loop
  you want to reuse has, for example, a more precise output dtype.
* Inheritance/chained casting:
  * `MyInt(int)` could inherit casting from `int`, but often that
    probably does not make sense. However, unsafe casting may make sense.
  * Type1 → Type2 and Type2 → Type3 may be defined, but not Type1 → Type3.
    (Probably we should just not – accidentally – allow this chaining?)
* Dtype discovery for `np.array` (unsupported for user types?)
#### Details about setup/teardown
At some point, we have to discover which inner loop to use and give
the ufunc a chance to set other information:
* Discover the correct output dtype (to get there, promotion of the input
dtypes may be necessary!).
* Since some loops may be mixed and others are not (timedelta * float
is OK, but timedelta * timedelta is not), it seems like pushing
this into promotion may be simpler (promotion would know the ufunc!?)
* Return the inner loop function (in some form or another)
* It could be plausible to just return inner loop types, such as
`"f,f->f"` which together with the ufunc name defines the inner loop
* More likely: expose `np.add.loop["f,f->f"]` using some PyCapsule
style wrapper object.
* Set whether the inner loop requires the Py-API (maybe tagged onto the
  inner-loop object itself; e.g. Python-implemented ones always need it
  anyway, ours never will).
* Run additional setup code:
* Setup working memory
* Clear error flags
* Setup a teardown function to:
* Free working memory
* Check error flags and give warnings, or raise errors
* Kwargs to ufuncs, forwarded to inner-loops?
  * parameter-type arguments (e.g. a precision argument)
  * Resolved during ufunc setup?
* Broadcastable arguments seem difficult/out of scope
(Example: `np.clip(arr, minval=None, maxval=None)`)
* `arr.view(new_dtype)` is buggy if types use their own storage area:
* Could have a flag/reuse HASREF (probably have to?)
* (Other things to keep in mind?)
* Depending on type may or may not make sense?
(object → object is OK, but for a type with metadata and refs
it can probably go both ways)
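A Python toy of the setup/teardown flow sketched above (every name here — `LoopInfo`, `run_loop` — is made up for illustration):

```python
import numpy as np

class LoopInfo:
    """Hypothetical container for a resolved inner loop (made-up name)."""
    def __init__(self, inner, setup=None, teardown=None, needs_api=False):
        self.inner = inner
        self.setup = setup        # e.g. clear FPU flags, allocate working memory
        self.teardown = teardown  # e.g. check FPU flags and warn/raise, free memory
        self.needs_api = needs_api

def run_loop(info, *chunks):
    state = info.setup() if info.setup else None
    try:
        for chunk in chunks:      # stand-in for the buffered/outer iteration
            info.inner(chunk)
    finally:
        if info.teardown:
            info.teardown(state)

# usage with dummy hooks, just to show the call order:
log = []
info = LoopInfo(inner=lambda c: log.append(int(c.sum())),
                setup=lambda: log.append("setup"),
                teardown=lambda state: log.append("teardown"))
run_loop(info, np.arange(3), np.arange(3))
assert log == ["setup", 3, 3, "teardown"]
```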
### What needs to happen if we call a ufunc:
1. Ask dtypes if they implement a ufunc loop that should be used
* TypeResolution step returns the loop which should be used
and thus output dtype. Although that could still be cast in principle.
* *If none found*: Additional promotion step here (like Julia)?
2. Decide if casting input to the output can be handled.
3. Run the inner-loop:
   1. Allow for inner-loop specific setup:
      * e.g. FPU error clearing
   2. Run the inner loop until finished or a stop flag is given
   3. Allow for inner-loop specific teardown:
      * e.g. FPU error checking
→ What type of access should we give these steps? E.g. access to the values?
For comparison, what currently happens:
1. TypeResolver of the ufunc gets run (these currently do not really know about user types)
   * This typically calls `ResultType` to find the output type
   * (often) does a linear search over the existing loops
     (unless there is a specific TypeResolver for the function)
2. Loop selector is run; this is set on the ufunc object:
   * Finds the actual loop
   * Can force `needs_api` (otherwise the iterator will decide)
3. Ufunc machinery decides on what casting is necessary
4. Runs the loop:
   * Run the loop until it finishes (except breaking on PyErrors)
   * Check for floating point errors (when done)
*Main issue:* Everything is tagged onto the ufunc object, so the ufunc object would have to ask the dtypes specifically.
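As a toy sketch of the dispatch sequence above (exact loop lookup, promote on failure, then cast and run; all names invented):

```python
import numpy as np

def call_ufunc(loops, promote, cast, *arrays):
    """Toy dispatch: exact loop lookup -> promote -> retry -> cast -> run."""
    in_types = tuple(a.dtype for a in arrays)
    loop = loops.get(in_types)
    if loop is None:
        # no exact loop found: promote the inputs (Julia-style) and retry
        common = promote(*in_types)
        loop = loops.get((common,) * len(arrays))
        if loop is None:
            raise TypeError(f"no loop for {in_types}")
        arrays = [cast(a, common) for a in arrays]
    return loop(*arrays)

# a "ufunc" with a single float64,float64 loop:
loops = {(np.dtype(np.float64),) * 2: lambda a, b: a + b}
res = call_ufunc(loops, np.promote_types, lambda a, dt: a.astype(dt),
                 np.arange(3, dtype=np.int32), np.arange(3, dtype=np.float64))
assert res.dtype == np.dtype(np.float64)
```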
### C-API vs. Python API
* Make wrapping elementwise functions into inner-loops easy
  * Probably similar to `np.frompyfunc`, except it would return some
  * Type annotations could be nice to use here as well.
* Provide a way to register/give existing inner-loops (or say cython
defined inner loops easily).
* It should be OK to have a python object that implements a few
  fast loops in Cython and tags them on from python during instantiation
* Something like PyCapsule, or NpyInnerLoopCapsule?:
* Requires API flag?
* Leave room for other flags? (e.g. optimization hints
or even alignment requirements)
From the python side, it may be an alternative to simply view the array
and then call `np.add` again, OTOH that may force some consistency
checking/wrapping to make sure the python side cannot just return a wrong
dtype or shape.
* Inner-loop registration and capsuling should be similar.
* Need to make existing inner-loops "capsules" available probably?
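For the Python side, the closest existing tool is `np.frompyfunc`; it shows both the convenience and the limitation (everything goes through object dtype, with no fast typed inner loop):

```python
import numpy as np

# np.frompyfunc wraps a Python scalar function into a ufunc-like object:
hypot_py = np.frompyfunc(lambda a, b: (a * a + b * b) ** 0.5, 2, 1)
out = hypot_py(np.array([3.0, 5.0]), np.array([4.0, 12.0]))
assert out.dtype == object
assert float(out[0]) == 5.0 and float(out[1]) == 13.0
```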
### Arguments for and against making descriptors types
* Feels logical for simple scalars and `float32_arr.dtype(3)`
and `isinstance(float32_arr, float32_arr.dtype)` is nice.
* It is probably possible to change this later without a large compatibility break.
* Implementation needs Metaclasses, which are somewhat harder to reason about
(Although it is also not very hard probably).
* Say I want to create a dtype for `decimal.Decimal`. It is not
  possible to change `Decimal` to add numpy-specific information, so
  it has to be off-loaded into a dtype/descriptor with `npdecimal.type is Decimal`.
* (There is at least some discussion whether scalars are even a good idea.)
#### Ufunc properties
* Ufuncs have an `identity`; should this move to the loop implementation
  (or be overridable)? The loop implementation knows the correct output type.
  (It could live on the dtype, but that seems unnecessary/strange?)
#### Ufunc signatures
The ufunc inner loops are pretty limited right now. I am (personally) not
in favor of bloating the API too much, but we may want to add some things
to it while we are at it:
1. Possibly a return value to signal errors or a stop?
2. Better payload/metadata values to use for custom dtypes, these
may already fit into the current pointer we have. But are definitely
3. Possibly more fields related/in gufuncs?
Plausibly, the old ufuncs could receive a very lightweight wrapper.