---
title: ml-demos
tags: statistical-software-demos, statistical-software
robots: noindex, nofollow
---

Machine Learning Demonstrations
===============================

This file demonstrates the application of [rOpenSci](https://ropensci.org)’s [standards for statistical software](https://ropenscilabs.github.io/statistical-software-review-book/standards.html) to one [Machine Learning software](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#machine-learning-softwaree) package. These applications are not intended to represent or reflect evaluations or assessments of the packages, and particularly not of the extent to which they fail to meet standards. Rather, the demonstrations are intended to highlight aspects of the software which could be productively improved by adhering to the standards, and thereby to demonstrate more generally the usefulness of these standards in advancing and improving software quality.

[`applicable`](https://applicable.tidymodels.org/)
--------------------------------------------------

``` r
library (applicable)
```

------------------------------------------------------------------------

[General Standards](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#general-standards-for-statistical-software)
----------------------------------------------------------------------------------------------------------------------------------------------

### 1 Documentation

- [ ] **G1.0** *Statistical Software should list at least one primary reference from published academic literature.*

The package lists no primary reference, and its only citation is to the package itself.
#### 1.1 Statistical Terminology

- [x] **G1.1** *All statistical terminology is clarified and unambiguously defined.*

#### 1.2 Function-level Documentation

- [x] **G1.2** *Software should use [`roxygen`](https://roxygen2.r-lib.org/) to document all functions.*
  - [ ] **G1.2a** *All internal (non-exported) functions should also be documented in standard [`roxygen`](https://roxygen2.r-lib.org/) format.* Internal functions are not documented at all. They are merely given commented titles to separate them.
- [x] **G1.3**-**G1.4** Not applicable, as no performance claims are made.

### 2 Input Structures

#### 2.1 Uni-variate (Vector) Input

- [ ] **G2.0** *Implement assertions on lengths of inputs, particularly through asserting that inputs expected to be single- or multi-valued are indeed so.*
  - [ ] **G2.0a** Provide explicit secondary documentation of any expectations on lengths of inputs.
- [x] **G2.1** *Implement assertions on types of inputs (see the initial point on nomenclature above).*
- [ ] **G2.2** *Appropriately prohibit or restrict submission of multivariate input to parameters expected to be univariate.* Length controls are not implemented (for example, `apd_pca (..., threshold = rep (1, 2))` passes silently).
- [ ] **G2.3** *For univariate character input:*
  - [ ] **G2.3b** *Either: use `tolower()` or equivalent to ensure input of character parameters is not case dependent; or explicitly document that parameters are strictly case-sensitive.* Parameters such as `type` in `score.apd_hat_values` are matched but are case-sensitive. That is probably okay here, since “numeric” is the only acceptable value anyway.
- [ ] **G2.4** *Provide appropriate mechanisms to convert between different data types, potentially including:*

Explicit conversion is not implemented.
The following is possible:

``` r
predictors <- mtcars [, -1]
mod <- apd_pca (predictors, threshold = "0.5")
```

That works silently, which is okay, but then:

``` r
print (mod)
```

```
## [1] "Error in x$threshold * 100 : non-numeric argument to binary operator"
```

- [x] **G2.5** No `factor` input expected, so not relevant.
- [x] **G2.6** Standard tabular forms accepted.
- [x] **G2.7** Standard tabular forms converted appropriately.
- [x] **G2.8** Not applicable.
- [x] **G2.9** List-column extraction works consistently.

#### 2.2 Missing or Undefined Values

- [ ] **G2.10** *Statistical Software should implement appropriate checks for missing data as part of initial pre-processing prior to passing data to analytic algorithms.*

``` r
predictors <- mtcars [, -1]
predictors [1, 1] <- NA
mod <- apd_pca (predictors)
```

```
## <simpleError in svd(x, nu = 0, nv = k): infinite or missing values in 'x'>
```

- [ ] **G2.11** *Where possible, all functions should provide options for users to specify how to handle missing (`NA`) data, with options minimally including:*

Functions neither document whether or not missing data may be submitted, nor do they implement any pre-processing checks. Missing data are passed on to further routines, triggering unhelpful error messages.

- [ ] **G2.12** *Functions should never assume non-missingness, and should never pass data with potential missing values to any base routines with default `na.rm = FALSE`-type parameters (such as [`mean()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html), [`sd()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sd.html) or [`cor()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html)).*

Functions assume non-missingness, and pass missing values through to base routines such as `svd()`.

- [ ] **G2.13** *All functions should also provide options to handle undefined values (e.g., `NaN`, `Inf` and `-Inf`), including potentially ignoring or removing such values.*

No such options are provided.
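
The checks called for by G2.0, G2.2, G2.4, G2.10 and G2.13 can all be made before any computation starts. The following is a minimal base-R sketch, not code from `applicable`. The helpers `check_threshold()` and `check_predictors()` are hypothetical, and the accepted range of `threshold` (a single proportion in (0, 1]) is an assumption suggested by `print()` multiplying it by 100.

``` r
# Hypothetical helper, not part of applicable: validate `threshold`,
# assumed here to be a single proportion in (0, 1].
check_threshold <- function(threshold) {
  # G2.0 / G2.2: exactly one value is expected
  if (length(threshold) != 1L) {
    stop("`threshold` must be a single value, not a vector of length ",
         length(threshold), ".", call. = FALSE)
  }
  # G2.4: convert character input explicitly instead of failing later
  if (is.character(threshold)) {
    converted <- suppressWarnings(as.numeric(threshold))
    if (is.na(converted)) {
      stop("`threshold` could not be converted to a number.", call. = FALSE)
    }
    threshold <- converted
  }
  if (!is.numeric(threshold) || !is.finite(threshold) ||
      threshold <= 0 || threshold > 1) {
    stop("`threshold` must be a number in (0, 1].", call. = FALSE)
  }
  threshold
}

# Hypothetical helper: reject empty, non-numeric or non-finite predictors
# (G2.10, G2.13) with a message naming the offending columns.
check_predictors <- function(x) {
  x <- as.matrix(x)
  if (nrow(x) == 0L) {
    stop("`x` has no rows.", call. = FALSE)
  }
  if (!is.numeric(x)) {
    stop("All columns of `x` must be numeric.", call. = FALSE)
  }
  bad <- !is.finite(x)  # TRUE for NA, NaN, Inf and -Inf
  if (any(bad)) {
    cols <- unique(colnames(x)[col(x)[bad]])
    stop("Missing or non-finite values in column(s): ",
         paste(cols, collapse = ", "), call. = FALSE)
  }
  invisible(x)
}

# Usage: the NA example from G2.10 now fails early and informatively
predictors <- mtcars[, -1]
predictors[1, 1] <- NA
msg <- tryCatch(check_predictors(predictors), error = conditionMessage)
print(msg)
# "Missing or non-finite values in column(s): cyl"
```

With checks like these, `check_threshold("0.5")` returns the number `0.5` instead of deferring the failure to `print()`, and `rep(1, 2)` is rejected instead of passing silently.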
### 3 Output Structures

- [x] **G4.0** No writing to local files implemented, so not applicable.

### 4 Testing

#### 4.1 Test Data Sets

- [x] **G5.0** *Where applicable or practicable, tests should use standard data sets with known properties (for example, the [NIST Standard Reference Datasets](https://www.itl.nist.gov/div898/strd/), or data sets provided by other widely-used R packages).*
- [x] **G5.1** *Data sets created within, and used to test, a package should be exported (or otherwise made generally available) so that users can confirm tests and run examples.*

These standards are not explicitly fulfilled, but as tests can all be implemented with relatively small data sets, they may be considered not relevant.

#### 4.2 Responses to Unexpected Input

- [ ] **G5.2** Some, but not all, error and warning behaviour is explicitly tested.
- [ ] **G5.3** Absence of missing or undefined data in return objects is not explicitly tested.

#### 4.3 Algorithm Tests

- [x] **G5.4**–**G5.5** Correctness tests are not applicable.
- [ ] **G5.6b** *Parameter recovery tests should be run with multiple random seeds when either data simulation or the algorithm contains a random component. (When long-running, such tests may be part of an extended, rather than regular, test suite; see G4.8-4.10, below).* Tests are only run with a single random seed.
- [ ] **G5.7** Algorithm performance tests are not implemented.
- [ ] **G5.8** **Edge condition tests** *to test that these conditions produce expected behaviour such as clear warnings or errors when confronted with data with extreme properties including but not limited to:*
  - [ ] **G5.8a** *Zero-length data* is not tested.
  - [ ] **G5.8b** *Data of unsupported types (e.g., character or complex numbers in for functions designed only for numeric data)* is not tested.
  - [ ] **G5.8c** *Data with all-`NA` fields or columns or all identical fields or columns*: processing of missing data is not tested.
  - [ ] **G5.8d** *Data outside the scope of the algorithm (for example, data with more fields (columns) than observations (rows) for some regression algorithms)* are not tested.
- [ ] **G5.9** **Noise susceptibility tests** *Packages should test for expected stochastic behaviour, such as through the following conditions:*
  - [ ] **G5.9a** *Adding trivial noise (for example, at the scale of `.Machine$double.eps`) to data does not meaningfully change results* is not tested.
  - [ ] **G5.9b** *Running under different random seeds or initial conditions does not meaningfully change results* is not tested.

------------------------------------------------------------------------

[Machine Learning Standards](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#machine-learning-software)
--------------------------------------------------------------------------------------------------------------------------------------

### 5 Input Data Specification

- [ ] **ML1.0** Documentation should make a clear conceptual distinction between training and test data (even where such may ultimately be confounded as described above.)

Documentation refers frequently to “training data”, yet without any clear interpretation of this phrase.

- [x] **ML1.1**–**ML1.** Design decisions distinguishing “training” from “test” data are clarified and justified.
- [x] **ML1.5** Default `print` methods summarise contents of training data sets.
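
To make the training/test distinction of ML1.0 concrete, the sketch below reproduces the general idea of a PCA-based applicability domain in base R. It is an illustration under stated assumptions, not the `applicable` implementation. It keeps all principal components rather than truncating them at a variance `threshold`, and the 24/8 split of `mtcars` is arbitrary.

``` r
# Base-R sketch (not the applicable implementation): separating the
# training stage from the scoring of new data in a PCA-based domain.
set.seed(1)
idx   <- sample(nrow(mtcars), 24)
train <- mtcars[idx, -1]   # "training" data: defines the domain
test  <- mtcars[-idx, -1]  # "test" (new) data: scored against it

# Training stage: the PCA is estimated from the training data only
pca      <- prcomp(train, center = TRUE, scale. = TRUE)
train_pc <- pca$x

# Training scores are centred, so this is each sample's distance
# from the centre of the training data in PC space
train_dist <- sqrt(rowSums(train_pc^2))

# Scoring stage: new data are projected with the *training* rotation,
# centring and scaling, which are never re-estimated from `test`
test_pc   <- predict(pca, newdata = test)
test_dist <- sqrt(rowSums(test_pc^2))

# Percentile of each new distance within the training distribution
test_pctl <- 100 * ecdf(train_dist)(test_dist)
round(test_pctl)
```

Only `train` is ever passed to `prcomp()`. The test rows are projected using quantities estimated from the training data, which is the conceptual separation that ML1.0 asks documentation to make explicit.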
#### 5.1 Missing Values

- [ ] **ML1.6** ML software which does not admit missing values, and which expects no missing values, should implement explicit pre-processing routines to identify whether data has any missing values, and should generally error appropriately and informatively when passed data with missing values. In addition, ML software which does not admit missing values should:
  - [ ] **ML1.6a** Explain why missing values are not admitted.
  - [ ] **ML1.6b** Provide explicit examples (in function documentation, vignettes, or both) for how missing values may be imputed, rather than simply discarded.

No such pre-processing is implemented. No explanation is given of whether or not missing values are admitted (**ML1.6a**), and no examples of imputation are provided (**ML1.6b**).

- [x] **ML1.7**–**ML1.8** Missing values not admitted, so not applicable.

### 6 Pre-processing

- [x] **ML2.0** Pre-processing steps are appropriately defined and parametrized.
- [x] **ML2.1** Broadcasting not used to reconcile dimensionally incommensurate input data, so not applicable.
- [ ] **ML2.2b** Any extended documentation (such as vignettes) which demonstrates the use of explicit values for numeric transformations should explicitly describe why particular values are used.

The vignette for continuous data uses several `recipes` steps, for transforming “variables to be distributed as Gaussian-like as possible,” and for normalizing “numeric data to have a mean of zero and standard deviation of one,” yet no explanation is given for why these steps are necessary, nor for why these particular values are used.

- [ ] **ML2.3** No data on numeric transformations recorded in return objects.
- [x] **ML2.4** There are no default values defining numeric transformations, so not applicable.
- [x] **ML2.5** Transformations must be explicitly defined, so not applicable.
- [x] **ML2.6** There are no distinct functions for implementing transformations, so not applicable.
- [ ] **ML2.7** Explicit transformations are documented, but not how they may be reversed, even though this could be documented.

### 7 Model and Algorithm Specification

- [x] **ML3.1** Model specification is implemented as a distinct stage.
- [x] **ML3.2** Models cannot be specified without directly fitting them, but nor would this be meaningful in the context of this package.
- [ ] **ML3.3** Where ML software implements its own distinct classes of model objects, the properties and behaviours of those specific classes of objects should be explicitly compared with objects produced by other ML software. In particular, where possible, ML software should provide extended documentation (as vignettes or equivalent) comparing model objects with those from other ML software, noting both unique abilities and restrictions of any implemented classes.

No comparisons are made with equivalent methods from other software, even though this could readily be done.

- [x] **ML3.4** No training rates are used or implemented, so not applicable.

#### 7.1 Control Parameters

- [x] **ML3.5** No control parameters implemented, so not applicable.
- [ ] **ML3.6** Unless explicitly justified otherwise (for example because ML software under consideration is an implementation of one specific algorithm), ML software should:
  - [ ] **ML3.6a** Implement or otherwise permit usage of multiple ways of exploring search space.
  - [ ] **ML3.6b** Implement or otherwise permit usage of multiple loss functions.

There is no ability to use alternative ways of exploring the search space, nor to use multiple loss functions (or equivalent).

#### 7.2 CPU and GPU processing

- [x] **ML3.7** There is no C++ code, so not applicable.
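
Regarding **ML2.7**, reversing a centring-and-scaling transformation only requires that its parameters are retained. Base R's `scale()` stores them as attributes, as the following sketch shows. This uses base R only and is not the mechanism used by this package or by `recipes`.

``` r
# Base-R illustration of documenting how a numeric transformation
# can be reversed.
x <- as.matrix(mtcars[, c("disp", "hp", "wt")])

# Forward transformation: centre each column, then divide by its SD
z <- scale(x, center = TRUE, scale = TRUE)

# scale() records the parameters it used as attributes
ctr <- attr(z, "scaled:center")  # column means
sds <- attr(z, "scaled:scale")   # column standard deviations

# Reverse transformation: multiply by the SD, then add back the mean
x_back <- sweep(sweep(z, 2, sds, "*"), 2, ctr, "+")

# Drop the bookkeeping attributes carried over from `z`
attr(x_back, "scaled:center") <- NULL
attr(x_back, "scaled:scale")  <- NULL

stopifnot(isTRUE(all.equal(x_back, x, check.attributes = FALSE)))
```

Documentation of this kind only needs to state where the fitted parameters are stored and the order in which the inverse operations are applied.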
### 8 Model Training

- [ ] **ML4.0** *ML software should generally implement a unified single-function interface to model training, able to receive as input a model specified according to all preceding standards.*

The package exports several distinct functions for model training. This leaves it up to the user to select an appropriate one, and suggests a design in which each new mode of training is likely to be added as yet another function.

- [x] **ML4.1**–**ML4.2** No optimizer implemented directly, so not applicable.

#### 8.1 Batch Processing

- [x] **ML4.3**–**ML4.6** There is no explicit ability to implement batch processing, so not applicable.

#### 8.2 Re-sampling

- [x] **ML4.7**–**ML4.8** There are no explicit re-sampling routines, so not applicable.

### 9 Model Output and Performance

#### 9.1 Model Output

- [ ] **ML5.0** No single function is defined via **ML4.0**, so a variety of return objects are implemented rather than a single, unified object.
- [ ] **ML5.1** … the properties and behaviours of trained models produced by ML software should be explicitly compared with equivalent objects produced by other ML software. No such comparison is made.
- [ ] **ML5.2** The structure and functionality of objects representing trained ML models should be thoroughly documented. In particular,
  - [ ] **ML5.2a** Either all functionality extending from the class of model object should be explicitly documented, or a method for listing or otherwise accessing all associated functionality explicitly documented and demonstrated in example code. No such documentation is provided.
  - [ ] **ML5.2b** Documentation should include examples of how to save and re-load trained model objects for their re-use in accordance with **ML3.1**, above. Such documentation is not provided, even though it could be.

#### 9.2 Model Performance

- [x] **ML5.3** Model performance is assessed via distinct functions.
- [ ] **ML5.4** Model performance should be able to be assessed according to a variety of metrics. The `score` function is effectively hard-coded and does not permit the use of alternative scoring metrics.
  - [ ] **ML5.4b** It should be possible to submit custom metrics to a model assessment function, and the ability to do so should be clearly documented including through example code. It is not possible to submit custom scoring metrics.

### 10 Documentation

- [x] **ML6.0** Descriptions clearly distinguish training and testing stages and associated data sets.
- [ ] **ML6.1** ML software intentionally designed to address only a restricted subset of the workflow described here should clearly document how it can be embedded within a typical *full* ML workflow in the sense considered here.

No demonstration is provided of how the workflow enabled by this package can be embedded within a more complete ML workflow, even though such documentation could readily be provided.

- [ ] **ML6.1a** Such demonstrations should include and contrast embedding within a full workflow using at least two other packages to implement that workflow. This is also not done.

### 11 Testing

#### 11.1 Input Data

- [x] **ML7.0** Input data need not be labelled, so not applicable.
- [ ] **ML7.1** Tests should demonstrate effects of different numeric scaling of input data (see **ML2.2**). No such tests are implemented.
- [x] **ML7.2** Missing data are not imputed, so not applicable.

#### 11.2 Model Classes

- [ ] **ML7.3a** These tests should explicitly identify restrictions on the functionality of model objects in comparison with those of other packages. No tests are implemented to demonstrate restrictions on classes of objects generated by this package.
- [ ] **ML7.3b** These tests should explicitly identify functional advantages and unique abilities of the model objects in comparison with those of other packages. No such tests are present.
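
A test of the kind described by **ML7.1** might compare a fit on the original data with a fit after rescaling one column, for example after a change of units. The sketch below uses base R's `prcomp()` as a stand-in model, because the point concerns standardisation in general rather than any function in `applicable`. The factor of 1000 applied to `disp` is arbitrary.

``` r
# Sketch of an ML7.1-style test: effect of rescaling one input column.
# prcomp() is a stand-in model, not a function from applicable.
x  <- as.matrix(mtcars[, -1])
x2 <- x
x2[, "disp"] <- x2[, "disp"] * 1000  # e.g. a change of units

# With standardisation, rescaling a column leaves the fit unchanged
fit_std  <- prcomp(x,  scale. = TRUE)
fit_std2 <- prcomp(x2, scale. = TRUE)
stopifnot(isTRUE(all.equal(fit_std$sdev, fit_std2$sdev)))

# Without standardisation, the results change ...
fit_raw  <- prcomp(x,  scale. = FALSE)
fit_raw2 <- prcomp(x2, scale. = FALSE)
stopifnot(!isTRUE(all.equal(fit_raw$sdev, fit_raw2$sdev)))

# ... and the rescaled column dominates the first component
stopifnot(abs(fit_raw2$rotation["disp", 1]) > 0.99)
```

In a package test suite, the `stopifnot()` calls would typically become `testthat` expectations such as `expect_equal()`.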
#### 11.3 Model Training

- [x] **ML7.4**–**ML7.5** Training rates not explicitly considered, so not applicable.
- [x] **ML7.6** Training epochs not explicitly considered, so not applicable.
- [ ] **ML7.7** ML software should explicitly test different optimization algorithms, even where software is intended to implement one specific algorithm. Different algorithms are not tested, although they could be.
- [ ] **ML7.8** ML software should explicitly test different loss functions, even where software is intended to implement one specific measure of loss. Different loss functions are not tested, although they could be.
- [ ] **ML7.9** Tests should explicitly compare all possible combinations in categorical differences in model architecture, such as different model architectures with same optimization algorithms, same model architectures with different optimization algorithms, and differences in both. Not implemented.
- [x] **ML7.10** There is no information on paths taken by optimizers, so this cannot be tested and is not applicable.

#### 11.4 Model Performance

- [ ] **ML7.11** All performance metrics available for a given class of trained model should be thoroughly tested and compared. Performance metrics are neither tested nor compared, although they readily could be.
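
As a sketch of the metric tests called for by **ML7.11**, combined with the noise-susceptibility checks of **G5.9a** and **G5.9b**, the following defines a hypothetical distance metric, `pca_distance()`. It is a stand-in, not the package's `score()`. The sketch checks that adding noise at the scale of `.Machine$double.eps`, generated under several seeds, leaves the metric essentially unchanged. The tolerance of `1e-8` is an arbitrary choice for illustration.

``` r
# Hypothetical metric (a stand-in, not applicable's score()): distance of
# each row of `new_data` from the centre of `train` in standardised PC space.
pca_distance <- function(train, new_data) {
  pca    <- prcomp(train, center = TRUE, scale. = TRUE)
  scores <- predict(pca, newdata = new_data)
  # Training scores are centred, so this is the distance from the centre
  sqrt(rowSums(scores^2))
}

train  <- as.matrix(mtcars[, -1])
d_orig <- pca_distance(train, train)

# Basic property checks on the metric itself
stopifnot(all(is.finite(d_orig)), all(d_orig >= 0))

# G5.9a/G5.9b: perturb the data with machine-epsilon noise under several
# seeds, and record the largest resulting change in the metric
max_change <- vapply(1:5, function(seed) {
  set.seed(seed)
  noise <- matrix(rnorm(length(train), sd = .Machine$double.eps),
                  nrow = nrow(train))
  noisy <- train + noise
  max(abs(pca_distance(noisy, noisy) - d_orig))
}, numeric(1))

stopifnot(all(max_change < 1e-8))
```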
