Lazy Dataframes Refactor

Each command listed below was either modified to be lazy-only or is an eager command that I left unchanged.

Commands were left unchanged for one of two reasons:

  • The operation is not available in a lazy form
  • The operation requires that the dataframe be collected

Type conversion

I removed all of the automatic eager/lazy type conversion code. Only two conversions remain: if a command is lazy-only, an eager dataframe passed to it is still converted to a lazy frame, and each remaining eager command converts a lazy dataframe to an eager dataframe.
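The conversion rule can be modeled with a small sketch. This is a hypothetical, stdlib-only Python model, not the Nushell implementation: EagerFrame, LazyFrame, to_lazy, to_eager, and the toy drop and shape functions are illustrative names only. A lazy-only command wraps an eager input as a lazy frame, and an eager command collects a lazy input before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, int]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class EagerFrame:
    """Hypothetical stand-in for a materialized (eager) dataframe."""
    rows: List[Row]


@dataclass
class LazyFrame:
    """Hypothetical stand-in for a lazy frame: source rows plus deferred steps."""
    source: List[Row]
    plan: List[Step] = field(default_factory=list)

    def then(self, step: Step) -> "LazyFrame":
        # Record one more step; nothing is executed yet.
        return LazyFrame(self.source, self.plan + [step])

    def collect(self) -> EagerFrame:
        # Run every deferred step in order and materialize the result.
        rows = list(self.source)
        for step in self.plan:
            rows = step(rows)
        return EagerFrame(rows)


Frame = Union[EagerFrame, LazyFrame]


def to_lazy(df: Frame) -> LazyFrame:
    # Used by lazy-only commands: an eager input is wrapped as a lazy frame.
    return df if isinstance(df, LazyFrame) else LazyFrame(df.rows)


def to_eager(df: Frame) -> EagerFrame:
    # Used by eager commands: a lazy input is collected first.
    return df.collect() if isinstance(df, LazyFrame) else df


def drop(df: Frame, column: str) -> LazyFrame:
    # Toy lazy-only command: always returns a LazyFrame.
    return to_lazy(df).then(
        lambda rows: [{k: v for k, v in r.items() if k != column} for r in rows]
    )


def shape(df: Frame) -> Tuple[int, int]:
    # Toy eager command: it needs the data, so a lazy input is collected.
    rows = to_eager(df).rows
    return (len(rows), len(rows[0]) if rows else 0)


df = EagerFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
lf = drop(df, "b")
print(type(lf).__name__)  # LazyFrame
print(shape(lf))          # (2, 1)
```

In this model the conversion happens only at the entry of each command, so a chain of lazy-only commands keeps extending one plan and nothing is executed until an eager command needs the data.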

append

STATUS: UNMODIFIED
This command requires collecting a lazy frame. Perhaps it should stay an eager-dataframe command?

cast

STATUS: MODIFIED
Changed to convert an eager dataframe to a lazy frame, so both kinds of input are handled as lazy.

columns

STATUS: UNMODIFIED
This command requires collecting a dataframe.

drop

STATUS: MODIFIED
Re-implemented to be lazy

drop-duplicates

STATUS: MODIFIED
Re-implemented to be lazy

drop-nulls

STATUS: MODIFIED
Re-implemented to be lazy

dummies

STATUS: UNMODIFIED
This operation only works with eager dataframes

filter-with

STATUS: MODIFIED
Re-implemented to be lazy-only. The filter must be given as an expression; filtering with a dataframe mask is no longer supported.
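The difference between the two forms can be illustrated with a hypothetical, stdlib-only Python sketch (Col, LazyFrame, and filter_with here are illustrative names, not Nushell's API). A boolean mask is a materialized column that presumes the rows already exist, whereas an expression can be stored in the plan and evaluated when the frame is collected.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class Col:
    """Hypothetical column reference used to build filter expressions."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: int) -> Predicate:
        # Build a predicate; it is not evaluated until the plan runs.
        name = self.name
        return lambda row: row[name] > value


class LazyFrame:
    def __init__(self, rows: List[Row], plan: Optional[List[Predicate]] = None) -> None:
        self.rows = rows
        self.plan = plan or []

    def filter_with(self, expr: Predicate) -> "LazyFrame":
        # Only expressions are accepted; a precomputed mask (a list of bools) is rejected.
        if not callable(expr):
            raise TypeError("filter_with expects an expression, not a mask")
        return LazyFrame(self.rows, self.plan + [expr])

    def collect(self) -> List[Row]:
        # Each stored predicate is evaluated only now, at collect time.
        out = self.rows
        for pred in self.plan:
            out = [r for r in out if pred(r)]
        return out


lf = LazyFrame([{"a": 1}, {"a": 2}, {"a": 3}])
print(lf.filter_with(Col("a") > 1).collect())  # [{'a': 2}, {'a': 3}]
```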

first

STATUS: MODIFIED
Re-implemented to work on lazy frames or as an expression.

query

STATUS: UNMODIFIED

last

STATUS: MODIFIED
Re-implemented to work on lazy frames or as an expression. The behavior was also changed to match the Nushell last command more closely: when called without a row count, it returns only one row.
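The new default for a missing row count can be sketched as follows. This is a hypothetical Python helper over a plain list of rows, not the actual command; note the guard for a count of zero, since a slice like rows[-0:] would return every row.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def last(rows: List[T], n: Optional[int] = None) -> List[T]:
    """Hypothetical helper: return the last n rows, or a single row when n is omitted."""
    count = 1 if n is None else n  # default mirrors the Nushell last command
    if count <= 0:
        return []  # rows[-0:] would be the whole list, so handle zero explicitly
    return rows[-count:]


print(last([1, 2, 3]))     # [3]
print(last([1, 2, 3], 2))  # [2, 3]
```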

shape

STATUS: UNMODIFIED

melt

STATUS: MODIFIED
Re-implemented to be lazy

slice

STATUS: MODIFIED
Re-implemented to be lazy

take

STATUS: UNMODIFIED
No equivalent lazy functionality

open

STATUS: MODIFIED
All file types that can be read as lazy frames now return lazy frames, and the --lazy flag has been removed.

Returns a lazy dataframe:

  • parquet
  • json lines
  • ipc / arrow
  • CSV

Returns an eager dataframe:

  • json
  • avro
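The split above amounts to a lookup on the file extension. The sketch below is a hypothetical Python helper (frame_kind and the extension sets are illustrative names); the extension spellings, such as .jsonl for JSON lines and .ipc or .arrow for Arrow IPC, are assumptions and may not match exactly what open recognizes.

```python
from pathlib import Path

# Groupings follow the lists above; the exact extension spellings are assumptions.
LAZY_EXTENSIONS = {".parquet", ".jsonl", ".ipc", ".arrow", ".csv"}
EAGER_EXTENSIONS = {".json", ".avro"}


def frame_kind(path: str) -> str:
    """Hypothetical helper: report which kind of frame open would return for a path."""
    ext = Path(path).suffix.lower()
    if ext in LAZY_EXTENSIONS:
        return "lazy"
    if ext in EAGER_EXTENSIONS:
        return "eager"
    raise ValueError(f"unsupported file type: {ext!r}")


print(frame_kind("trips.csv"))    # lazy
print(frame_kind("config.json"))  # eager
```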

unique

STATUS: MODIFIED
Changed to work only with lazy frames.

to-lazy

STATUS: REMOVED
The command is now redundant: every operation that works with lazy frames automatically converts an eager input to a lazy frame.

to-<file format>

STATUS: UNMODIFIED
All of these commands require that the data be collected before writing anyway.