The proposal is to have an option, mode.no_default_index, under which:
In [3]: with pd.option_context('mode.no_default_index', True):
...: df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
...:
In [4]: df
Out[4]:
a b
1 4
2 5
3 6
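As a rough illustration of how such an option would be scoped, here is a minimal sketch using pd.option_context with an option that already exists. Note that display.max_rows is only a stand-in: mode.no_default_index is proposed here and is not available in pandas today.

```python
import pandas as pd

# 'display.max_rows' is a real option used purely as a stand-in;
# 'mode.no_default_index' is only proposed and does not exist in pandas.
before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 5):
    # The option only has this value inside the block.
    inside = pd.get_option("display.max_rows")

# On exit, the previous value is restored.
after = pd.get_option("display.max_rows")
```

In the proposal's example above, the DataFrame is created inside the block and displayed outside it, which suggests the index-free state would stay attached to the object rather than to the option's current value.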
The Index can be a source of confusion and frustration for pandas users. For example, consider the inputs:
In [37]: ser1 = df.groupby('sender')['amount'].sum()
In [38]: ser2 = df.groupby('receiver')['amount'].sum()
In [39]: ser1
Out[39]:
sender
1 10
2 15
3 20
5 25
Name: amount, dtype: int64
In [40]: ser2
Out[40]:
receiver
1 10
2 15
3 20
4 25
Name: amount, dtype: int64
Then, it can be unexpected that summing Series with the same length (but different indices) produces NaNs in the result (https://stackoverflow.com/q/66094702/4451315):
In [41]: ser1 + ser2
Out[41]:
1 20.0
2 30.0
3 40.0
4 NaN
5 NaN
Name: amount, dtype: float64
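The session above can be reproduced with a small sketch. The input df is an assumption: it is not given in the text and has been reconstructed from the outputs shown. The sketch also shows one way to get a positional sum on current pandas, by dropping both indices first.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs above (not given in the source).
df = pd.DataFrame(
    {
        "sender": [1, 2, 3, 5],
        "receiver": [1, 2, 3, 4],
        "amount": [10, 15, 20, 25],
    }
)
ser1 = df.groupby("sender")["amount"].sum()
ser2 = df.groupby("receiver")["amount"].sum()

# Label-aligned addition: labels 4 and 5 each appear in only one operand,
# so those two rows come out as NaN.
aligned = ser1 + ser2

# Positional addition: drop both indices so that rows pair up by position.
positional = ser1.reset_index(drop=True) + ser2.reset_index(drop=True)
```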
Concatenation, even with ignore_index=True, still aligns on the index (https://github.com/pandas-dev/pandas/issues/25349):
In [42]: pd.concat([ser1, ser2], axis=1, ignore_index=True)
Out[42]:
0 1
1 10.0 10.0
2 15.0 15.0
3 20.0 20.0
5 25.0 NaN
4 NaN 25.0
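A minimal sketch of the same behaviour, with the two Series built directly (their values are copied from the outputs above). With axis=1, ignore_index=True only relabels the resulting columns; the rows are still aligned on the index, so a positional result on current pandas needs the indices dropped beforehand.

```python
import pandas as pd

# Values copied from the ser1 / ser2 outputs above.
ser1 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 5], name="amount")
ser2 = pd.Series([10, 15, 20, 25], index=[1, 2, 3, 4], name="amount")

# Rows are aligned on the union of the two indices: 5 rows, 2 of them with a NaN.
aligned = pd.concat([ser1, ser2], axis=1, ignore_index=True)

# Dropping the indices first gives a purely positional, 4-row result.
side_by_side = pd.concat(
    [ser1.reset_index(drop=True), ser2.reset_index(drop=True)],
    axis=1,
    ignore_index=True,
)
```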
It can be frustrating to have to repeatedly call .reset_index() (https://twitter.com/chowthedog/status/1559946277315641345):
In [45]: df.value_counts(['sender', 'receiver']).reset_index().rename(columns={0: 'count'})
Out[45]:
sender receiver count
0 1 1 1
1 2 2 1
2 3 3 1
3 5 4 1
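For reference, a sketch of the same pattern that does not depend on how the counts column happens to be named. The input df is again an assumption reconstructed from the outputs. In pandas 2.0 and later value_counts already names its result count, so the rename in the session above is only needed on older versions; passing name= to reset_index should work either way.

```python
import pandas as pd

# Assumed input, reconstructed from the outputs above (not given in the source).
df = pd.DataFrame(
    {
        "sender": [1, 2, 3, 5],
        "receiver": [1, 2, 3, 4],
        "amount": [10, 15, 20, 25],
    }
)

# value_counts returns a Series indexed by (sender, receiver);
# reset_index is still needed to turn those labels back into columns.
counts = df.value_counts(["sender", "receiver"]).reset_index(name="count")
```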
With this mode enabled, two major changes would happen:
as_index option in groupby, and allowing for value_counts to not set an index.

With this option enabled, users who don't want to worry about indices could safely ignore them.
A DataFrame without an index would have an index which would behave like a RangeIndex, except for the following differences:
name could only be None;
start could only be 0, step 1;
Index should still be NoIndex;
NoIndex;
NoIndex (so transpose would need some adjustments);
insert and delete should raise. In particular, .drop with axis=0 would always raise;

Some pandas methods create an Index by default. This can sometimes be opted out of (e.g. with as_index=False in .groupby), but other times there is no choice but to call reset_index after the operation (e.g. with .pivot_table and .value_counts).
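The difference between the two situations can be sketched on current pandas as follows. The input df is an assumption, chosen to match the sender/amount example used earlier.

```python
import pandas as pd

# Assumed input, matching the sender/amount example used earlier.
df = pd.DataFrame({"sender": [1, 2, 3, 5], "amount": [10, 15, 20, 25]})

# Default behaviour: the group keys become the Index of the result.
by_label = df.groupby("sender")["amount"].sum()

# Opting out: the group keys stay as an ordinary column.
as_columns = df.groupby("sender", as_index=False)["amount"].sum()

# pivot_table offers no such switch, so reset_index has to follow.
pivoted = df.pivot_table(index="sender", values="amount", aggfunc="sum")
pivoted_flat = pivoted.reset_index()
```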
A couple of solutions come to mind:
add as_index options to these methods, whose default could be False under this option;

The second would keep API size down, whilst the first one would give the most flexibility to users. I'd be more inclined towards the former.
It should be fine to do df.reset_index().set_index('index'), no need to add a new method.
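For reference, this is what that chain does on current pandas with a default index. How it would behave on a DataFrame created under the proposed mode is not specified here, so treat this only as a sketch of the existing behaviour.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# reset_index moves the (default, unnamed) index into a column called 'index'.
as_column = df.reset_index()

# set_index turns that column back into a regular, label-based Index.
restored = as_column.set_index("index")
```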
Seaborn makes extensive use of label-based indexing, and so NoIndex DataFrames would break it:
In [1]: df = pd.DataFrame({'a': [1, 1, 2], 'b': [1, 3, 4]})
In [2]: import seaborn as sns
In [3]: sns.lineplot(df)
NotImplementedError: Can't reindex a DataFrame without an index. First, give it an index.
Even if df had an Index, seaborn.lineplot would still error because internally it creates new DataFrames (which now wouldn't have an index) and then it would call things that wouldn't work on them, such as data.loc[[]].
This would need some working out.
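To make the data.loc[[]] case concrete, here is a small sketch of what that call does on a regular DataFrame today. It is label-based selection with an empty list of labels, which is the kind of operation the error above says a DataFrame without an index could not support.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 3, 4]})

# Selecting an empty list of row labels gives an empty frame
# that keeps the original columns.
empty = df.loc[[]]
```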
Should .index be None, rather than a NoIndex? .index methods are quite common to call, e.g. in:
ser = Series([1,2,3])
breakpoint()
ser.loc[ser>1]
In pandas 2.x.0, introduce the mode.no_default_index option. It's unlikely that this could ever be made the default, but it could be made the default in a separate namespace (which would try to be compliant with the DataFrame standards API).
pandas issue: https://github.com/pandas-dev/pandas/issues/48880