# String dtype roadblocks
These issues are ordered from least tractable to most tractable.
* ### No API to control iteration/indexing
* I think we'll need some way to override NumPy's iteration machinery. I'm a little fuzzy on how this currently works in NumPy and whether it's a real problem.
* Not an issue for ufuncs or casting, because we fully control the inner loop and will have access to the needed offset information there.
* Could be an issue for e.g. data selection. There's no API to control which elements are selected when someone does `arr[::3]`. For fixed-width types this isn't a problem, but for variable-width types NumPy can't just jump a fixed number of bytes ahead to account for strides, as the sketch below illustrates.
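* As a sketch of the difference: the fixed-width selection below is pure pointer arithmetic, while the variable-width version has to consult per-element metadata. The `offsets` table is an assumption about our planned layout, not existing NumPy API, and `ptrdiff_t` stands in for `npy_intp` to keep the example self-contained:
```C
#include <stddef.h>
#include <string.h>

/* Fixed-width arr[start::step]: the stride is a constant byte count. */
static void
select_fixed(char *dst, const char *data, ptrdiff_t start,
             ptrdiff_t step, ptrdiff_t n, ptrdiff_t itemsize)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        memcpy(dst + i * itemsize,
               data + (start + i * step) * itemsize, itemsize);
    }
}

/* Variable-width: each element's location and size are data-dependent,
 * so the loop needs an offsets table (n_elements + 1 entries) instead
 * of a fixed byte stride. */
static void
select_variable(char *dst, const char *data, const ptrdiff_t *offsets,
                ptrdiff_t start, ptrdiff_t step, ptrdiff_t n)
{
    ptrdiff_t written = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        ptrdiff_t j = start + i * step;
        ptrdiff_t size = offsets[j + 1] - offsets[j];
        memcpy(dst + written, data + offsets[j], size);
        written += size;
    }
}
```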
* ### The signatures for the dtype's getitem and setitem methods need to be updated.
* Currently setitem gets a reference to the array's dtype instance, a reference to the object being assigned, and a pointer into the array data:
```C
static int asciidtype_setitem(
    ASCIIDTypeObject *descr,  /* the array's dtype instance */
    PyObject *obj,            /* the value being assigned */
    char *dataptr)            /* pointer to the element's storage */
```
* The pointer will in general point into the middle of the array buffer, and for variable-width dtypes there's no way to know the size of the array element `dataptr` points to.
* We also don't have control over how `dataptr` is selected. Does the machinery in NumPy assume fixed-width dtypes?
* Assuming NumPy knows how to determine `dataptr`, additionally passing setitem and getitem the `index` of the item would be sufficient for our purposes.
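* For example, a hypothetical extension of the signature above; this is not existing NumPy API, just an illustration of what would suffice:
```C
static int asciidtype_setitem(
    ASCIIDTypeObject *descr,  /* the array's dtype instance */
    PyObject *obj,            /* the value being assigned */
    char *dataptr,            /* pointer to the element's storage */
    npy_intp index)           /* new: the element's position, letting the
                                 dtype look up the element's size in its
                                 own metadata */
```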
* ### Mutating array elements might require allocating new storage for the array data.
* If someone does a `__setitem__` call from Python using a string bigger than the item already stored there, we'll need to reallocate the entire array buffer.
* How do we communicate that to NumPy?
* A single memory buffer is a poor fit for a mutable variable-length string container, but the assumption that an array's data lives in a single heap buffer is baked pretty deeply into NumPy; the sketch below illustrates the problem.
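* A minimal sketch of grow-on-write under the planned layout, a single `data` buffer plus an `offsets` table; `strbuf` and everything in it are illustrative names, not NumPy API. The `realloc` is exactly the step NumPy currently has no way to be told about:
```C
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef struct {
    char *data;          /* one heap buffer holding all string data */
    ptrdiff_t *offsets;  /* n + 1 entries; element i occupies
                            offsets[i]..offsets[i+1] */
    ptrdiff_t n;         /* number of elements */
} strbuf;

/* Replace element idx with the len bytes at s. */
static int
strbuf_setitem(strbuf *b, ptrdiff_t idx, const char *s, ptrdiff_t len)
{
    ptrdiff_t old_len = b->offsets[idx + 1] - b->offsets[idx];
    ptrdiff_t delta = len - old_len;
    ptrdiff_t total = b->offsets[b->n];

    if (delta > 0) {
        /* The new value is bigger, so the whole buffer must grow:
         * this invalidates every pointer held into the old data. */
        char *new_data = realloc(b->data, total + delta);
        if (new_data == NULL) {
            return -1;
        }
        b->data = new_data;
    }
    /* Shift everything after the element and patch later offsets. */
    memmove(b->data + b->offsets[idx + 1] + delta,
            b->data + b->offsets[idx + 1],
            total - b->offsets[idx + 1]);
    memcpy(b->data + b->offsets[idx], s, len);
    for (ptrdiff_t i = idx + 1; i <= b->n; i++) {
        b->offsets[i] += delta;
    }
    return 0;
}
```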
* ### No mechanism for per-array data storage.
* The current plan is to store an `offsets` array we can use to look up where each element starts in the array data buffer. There currently isn't a facility for per-array data storage.
* It might be possible to store the `offsets` on the dtype instance itself, but we'd need to be very careful to make sure a new dtype instance is created every time a new view is created. I don't know if this will turn out to be a leaky abstraction.
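* A sketch of that layout, following the experimental dtype API's pattern of embedding `PyArray_Descr` as the first member; the `offsets` and `n` fields are our additions, not existing API:
```C
#include <Python.h>
#include <numpy/arrayobject.h>

typedef struct {
    PyArray_Descr base;  /* standard descriptor header */
    npy_intp *offsets;   /* where each element starts in the data buffer */
    npy_intp n;          /* number of elements this instance describes */
} StringDTypeObject;
```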
* ### Identify a Unicode string library we can depend on
* We need a compatibly licensed Unicode string library.
* Possible options:
* GNU Libiconv (LGPL)
* Converts between UTF-8 and the Unicode flavors NumPy and Python use.
* Only useful for converting encodings; doesn't provide e.g. `len()`, uppercasing, or other functionality we may need.
* libicu (ICU license, BSD-like, need to ship license)
* Very full-featured, but also heavyweight; it can't be vendored, so it would add an external C library dependency.
* Python's unicode API
* Requires the GIL in ufunc and casting loops.
* UTF8-CPP (Boost license, BSD-like)
* Small and lightweight, with support for encoding/decoding between UTF-16/32 and UTF-8.
* C++, not C
* utf8proc (MIT)
* Small and lightweight; we can just vendor it. A usage sketch follows this list.
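* For a sense of how much utf8proc gives us, a codepoint-counting `len()` is just a loop over `utf8proc_iterate` (a sketch assuming UTF-8 storage; `utf8_len` is our name):
```C
#include "utf8proc.h"

/* Count the codepoints in a UTF-8 buffer; returns a negative
 * utf8proc error code on invalid input. */
static utf8proc_ssize_t
utf8_len(const utf8proc_uint8_t *buf, utf8proc_ssize_t nbytes)
{
    utf8proc_ssize_t count = 0, pos = 0;
    utf8proc_int32_t codepoint;
    while (pos < nbytes) {
        utf8proc_ssize_t n =
            utf8proc_iterate(buf + pos, nbytes - pos, &codepoint);
        if (n < 0) {
            return n;
        }
        pos += n;
        count++;
    }
    return count;
}
```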
* ### Hooking into np.char / string ufuncs
* `np.char` defers to the scalar to implement functionality.
* I think we can get `np.char` working just by making the scalar we define for the string dtype a subclass of `str`, as sketched below.
* This will still require some code changes in NumPy to support.
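* A minimal sketch of such a scalar in C; `StringScalar_Type` is a hypothetical name. Setting `tp_base` to `PyUnicode_Type` before `PyType_Ready` makes every instance a real `str` subclass:
```C
#include <Python.h>

/* Hypothetical scalar type for the string dtype; instances are str
 * subclasses, so Python-level string methods work unchanged. */
static PyTypeObject StringScalar_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name = "stringdtype.StringScalar",
    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
};

static int
init_string_scalar(void)
{
    /* tp_base must be set before PyType_Ready; the size and
     * behavior are inherited from str. */
    StringScalar_Type.tp_base = &PyUnicode_Type;
    return PyType_Ready(&StringScalar_Type);
}
```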