This file demonstrates the application of rOpenSci's standards for statistical software to three EDA software packages. These applications are not intended to evaluate or assess the packages, and particularly not the extent to which they fail to meet standards. Rather, the demonstrations are intended to highlight aspects of the software which could be productively improved by adhering to the standards, and thereby to demonstrate the general usefulness of these standards in improving software quality.
Prior to considering these demonstrations, it is recommended to peruse a recent R Journal article entitled "The Landscape of R Packages for Automated Exploratory Data Analysis". A general overview of the packages mentioned in that article is also provided in this GitHub repository (work in progress).
SmartEDA
This package conveniently offers a single “master function”, ExpReport, which generates a stand-alone html report containing the output of most of the package’s functions. The current README of the autotest package includes an example using SmartEDA which demonstrates its failure to appropriately process or reject rectangular input objects which have list-columns.
Statistical Terminology
Function-level Documentation
roxygen
Supplementary Documentation
Uni-variate (Vector) Input
autotest, but will be confirmed there.)
match.arg() not used for single-valued character inputs, nor is tolower() or equivalent used to avoid sensitivity to case.
Tabular Input
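The match.arg() and tolower() point noted above for uni-variate input can be sketched in base R as follows. This is only an illustration of the general technique: the helper name match_choice and the example choices are hypothetical and are not part of SmartEDA.

```r
# Hypothetical helper (not part of SmartEDA): validate a single-valued
# character argument so that matching is insensitive to case and
# unsupported values are rejected with an error.
match_choice <- function(x, choices) {
  # Insist on a single character value before any matching is attempted
  stopifnot(is.character(x), length(x) == 1L)
  # tolower() removes case sensitivity; match.arg() then matches the value
  # against the permitted choices and errors if nothing matches
  match.arg(tolower(x), choices)
}

match_choice("HTML", c("html", "pdf"))  # returns "html"
```

An unsupported value such as "docx" makes match.arg() raise an error instead of being silently accepted.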
Missing or Undefined Values
ExpReport() function) not appropriately parsed, rather simply assumed to be *.html.
Test Data Sets
Responses to Unexpected Input
Algorithm Tests
Extended tests
Index Columns
Multi-tabular input
Classes and Sub-Classes
autotest.
integer input types not maintained, rather all values are converted to numeric.
print methods apply to all return objects, but default plot methods either fail (for ExpData), or are not accessible (for ExpNumStat, ExpCTable, and others, which rely on plot.default, and so simply plot a grid of ncol-by-ncol results, often with no labels to enable interpretation).
ExpNumViz and ExpCatViz functions, provides accessible colour schemes, as well as allowing overrides of defaults through additional parameters.
Summary and Screen-based Output
storage.mode
General Standards for Visualization (Static and Dynamic)
ExpNumViz includes no scale
Return Values
data.frame-type tabular objects
Graphical Output
insight
Statistical Terminology
Function-level Documentation
roxygen used to document all functions.
Supplementary Documentation
Uni-variate (Vector) Input
component arguments of the find_...() functions).
match.arg() is not used
tolower() is not used to avoid sensitivity to case
integer via as.integer()
as.numeric()
as.character()
as.factor()
as...() functions
factor type, so not applicable
Tabular Input
Missing or Undefined Values
Test Data Sets
Responses to Unexpected Input
Algorithm Tests
Extended tests
numeric, although get_random() does return factor for factor input, so standard met in that regard.
format_ functions to specify such.
print and plot methods give sensible results, and also implement additional print_ and format_ methods.
plot or other graphical functions.
Summary and Screen-based Output
getOption("digits"), and so uses default print formatting for numeric types, with no user control possible.
storage.mode, class, or equivalent defining attribute of each column.
General Standards for Visualization (Static and Dynamic)
naniar
Statistical Terminology
Function-level Documentation
roxygen to document all functions.
roxygen format
Supplementary Documentation
Uni-variate (Vector) Input
missing parameter of add_any_miss() function).
integer uses as.integer()
as.numeric()
paste or paste0)
factor type, so not applicable.
Tabular Input
naniar does not accept matrix/array input, even though it easily could.
Missing or Undefined Values
NA. NaN is treated exactly as NA, and Inf is simply ignored.
Test Data Sets
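The distinction between NA, NaN, and Inf noted above under Missing or Undefined Values can be made explicit with base R predicates alone. The sketch below is purely illustrative: the helper classify_values is hypothetical, is not part of naniar, and does not describe how naniar itself handles these values.

```r
# Hypothetical helper (not part of naniar): label each element of a numeric
# vector as finite, infinite, NaN, or NA, using base R predicates only.
classify_values <- function(x) {
  stopifnot(is.numeric(x))
  out <- rep("finite", length(x))
  out[is.infinite(x)] <- "infinite"  # Inf and -Inf
  out[is.na(x)] <- "NA"              # is.na() is TRUE for both NA and NaN
  out[is.nan(x)] <- "NaN"            # so NaN is relabelled afterwards
  out
}

x <- c(1, NA, NaN, Inf, -Inf)
classify_values(x)
```

Because is.na() is TRUE for both NA and NaN, a summary based on is.na() alone cannot separate the two, and infinite values are only detected through is.infinite() or is.finite().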
Responses to Unexpected Input
Algorithm Tests
Extended tests
Index Columns
Multi-tabular input
Classes and Sub-Classes
tibble objects regardless of class of input.
tibble classes ensures explicit control of numeric precision
tibble classes ensures default print and plot methods give sensible results.
Summary and Screen-based Output
numeric types, rather relies on print.tibble throughout.
storage.mode via tibble.
General Standards for Visualization (Static and Dynamic)
ggplot2 to produce sensibly rounded values.
Dynamic Visualization
Return Values
Graphical Output