Machine Learning Demonstrations
This file demonstrates the application of rOpenSci's standards for statistical software to one Machine Learning software package. These applications are not intended to represent or reflect evaluations or assessments of the package, and particularly not of the extent to which it fails to meet standards. Rather, the demonstrations are intended to highlight aspects of the software which could be productively improved by adhering to the standards, and thereby to demonstrate the general usefulness of these standards in advancing and improving software quality.
applicable
General Standards
1 Documentation
The package lists no primary reference; its only citation is to itself.
1.1 Statistical Terminology
1.2 Function-level Documentation
`roxygen` to document all functions.
`roxygen` format.
Internal functions are not documented at all, merely given commented titles to separate them.
2 Input Structures
2.1 Uni-variate (Vector) Input
Length controls are not implemented (for example, `add_pca(..., threshold = rep(1, 2))` passes silently).
`tolower()` or equivalent to ensure input of character parameters is not case dependent; or explicitly document that parameters are strictly case-sensitive.
Parameters like `type` in `score.apd_hat_values` are matched but are case sensitive. That’s probably okay here since “numeric” is the only acceptable value anyway.
Explicit conversion is not implemented. The following is possible:
That works silently, which is okay, but then:
No `factor` input is expected, so not relevant.
2.2 Missing or Undefined Values
(`NA`) data, with options minimally including:
Functions neither document whether missing data may be submitted, nor implement any pre-processing checks. Missing data are passed on to further routines, triggering unhelpful error messages.
`na.rm = FALSE`-type parameters (such as `mean()`, `sd()` or `cor()`).
Functions assume non-missingness, and pass missing values through to base routines such as `svd()`.
(`NaN`, `Inf` and `-Inf`), including potentially ignoring or removing such values.
No such options are provided.
3 Output Structures
4 Testing
4.1 Test Data Sets
These standards are not explicitly fulfilled, but as tests can all be implemented with relatively small data sets, they may be considered not relevant.
4.2 Responses to Unexpected Input
4.3 Algorithm Tests
Tests are only run with a single random seed.
is not tested
is not tested
`NA` fields or columns or all identical fields or columns
Processing of missing data is not tested.
are not tested
(`.Machine$double.eps`) to data does not meaningfully change results
is not tested
is not tested
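Some of the untested properties noted above (multiple random seeds, noise at the scale of `.Machine$double.eps`, columns of identical values) could be checked along the following lines. This is a sketch only: `fit_sdev()` is a hypothetical stand-in built on base R's `prcomp()`, not the package's own fitting code, and the tolerance is an assumed value.

```r
# Stand-in for a model-fitting routine: standard deviations of the
# principal components from base R's prcomp(). NOT the package's code.
fit_sdev <- function(x) prcomp(x, center = TRUE, scale. = FALSE)$sdev

# Run the same check under several random seeds rather than a single one.
for (seed in 1:5) {
  set.seed(seed)
  x <- matrix(rnorm(200), ncol = 4)

  # Noise on the order of machine precision added to every value
  noise <- matrix(runif(length(x), -1, 1), ncol = ncol(x)) *
    .Machine$double.eps

  # Results should agree within a loose numerical tolerance (assumed 1e-8)
  stopifnot(isTRUE(all.equal(fit_sdev(x), fit_sdev(x + noise),
                             tolerance = 1e-8)))
}

# A column of identical values should be handled without error
# when the data are centred but not scaled.
x_const <- cbind(x, 1)
sdev_const <- fit_sdev(x_const)
```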
Machine Learning Standards
5 Input Data Specification
Documentation refers frequently to “training data”, yet without any clear interpretation of this phrase.
`print` methods summarise contents of training data sets.
5.1 Missing Values
No such processing is implemented
No explanation is given of whether or not missing values are admitted.
No such examples are provided.
6 Pre-processing
The vignette for continuous data utilizes several `recipes` steps for transforming, “variables to be distributed as Gaussian-like as possible,” and normalizing, “numeric data to have a mean of zero and standard deviation of one,” yet no explanation is given for why this is necessary, nor for why these values are used.
7 Model and Algorithm Specification
No comparisons are made with equivalent methods from other software, even though this could readily be done.
7.1 Control Parameters
There is no ability to use alternative ways of exploring the search space, nor to use multiple loss functions (or equivalent).
7.2 CPU and GPU processing
8 Model Training
The package exports several distinct functions for model training. This both leaves it up to the user to select an appropriate one, and suggests a design in which each new mode of training is likely to require a new exported function.
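One alternative design is a single training function that selects the mode of training through an argument. The sketch below illustrates the idea with base R only; `train_model()`, its `method` values, and the `"trained_model"` class are hypothetical names, not the package's API.

```r
# Hypothetical single entry point; the function name and `method` values
# are illustrative assumptions, not the package's API.
train_model <- function(x, method = c("pca", "hat_values")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  fit <- switch(method,
    pca = prcomp(x, center = TRUE),
    hat_values = {
      # Leverage (hat) values from the thin QR decomposition of x
      q <- qr.Q(qr(x))
      rowSums(q^2)
    }
  )
  structure(list(method = method, fit = fit), class = "trained_model")
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
m1 <- train_model(x, method = "pca")
m2 <- train_model(x, method = "hat_values")
```

Under this design, adding a new mode of training means adding a new value for `method` rather than a new exported function.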
8.1 Batch Processing
8.2 Re-sampling
9 Model Output and Performance
9.1 Model Output
No such comparison is made.
No such documentation is provided.
Such documentation is not provided, even though it could be.
9.2 Model Performance
The `score` function is effectively hard-coded and does not permit the use of alternative scoring metrics.
It is not possible to submit custom scoring metrics.
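A scoring function that accepts user-supplied metrics could be sketched as follows. The names `score_with()` and `metric`, and the default distance-to-centroid metric, are assumptions made for illustration; they do not describe how the package's `score` function works.

```r
# Hypothetical scoring function that accepts a user-supplied metric.
# `score_with()` and `metric` are assumed names, not the package's API.
score_with <- function(train, new_data, metric = NULL) {
  train <- as.matrix(train)
  new_data <- as.matrix(new_data)
  if (is.null(metric)) {
    # Assumed default: Euclidean distance to the training-set centroid
    metric <- function(train, new_data) {
      centre <- colMeans(train)
      sqrt(rowSums(sweep(new_data, 2, centre)^2))
    }
  }
  stopifnot(is.function(metric))
  out <- metric(train, new_data)
  if (!is.numeric(out) || length(out) != nrow(new_data)) {
    stop("`metric` must return one numeric value per row of `new_data`.",
         call. = FALSE)
  }
  out
}

train <- matrix(c(0, 0, 2, 2), ncol = 2, byrow = TRUE)
new_data <- matrix(c(1, 1, 4, 5), ncol = 2, byrow = TRUE)

default_scores <- score_with(train, new_data)

# Custom metric: Manhattan distance to the training-set centroid
manhattan <- function(train, new_data) {
  rowSums(abs(sweep(new_data, 2, colMeans(train))))
}
custom_scores <- score_with(train, new_data, metric = manhattan)
```

Validating the return value of `metric` keeps the output shape predictable regardless of which metric is supplied.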
10 Documentation
No demonstration is provided for how the workflow enabled by this package can be embedded within a more complete ML workflow, even though such documentation could readily be provided.
Also not done.
11 Testing
11.1 Input Data
No such tests implemented.
11.2 Model Classes
No tests implemented to demonstrate restrictions on classes of objects generated by this package.
No such tests are present.
11.3 Model Training
Different algorithms are not tested, although they could be.
Different loss functions are not tested, although they could be.
Not implemented.
11.4 Model Performance
Performance metrics are neither tested nor compared, and they could readily be.