---
title: ml-demos
tags: statistical-software-demos, statistical-software
robots: noindex, nofollow
---

Machine Learning Demonstrations
===============================

This file demonstrates the application of [rOpenSci](https://ropensci.org)’s [standards for statistical software](https://ropenscilabs.github.io/statistical-software-review-book/standards.html) to one [Machine Learning software](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#machine-learning-software) package. These applications are not intended to represent or reflect evaluations or assessment of the packages, and particularly not of the extent to which they fail to meet standards. Rather, the demonstrations are intended to highlight aspects of the software which could be productively improved by adhering to the standards, and thereby to demonstrate the general usefulness of these standards in advancing and improving software quality.

[`applicable`](https://applicable.tidymodels.org/)
--------------------------------------------------

``` r
library (applicable)
```

------------------------------------------------------------------------

[General Standards](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#general-standards-for-statistical-software)
----------------------------------------------------------------------------------------------------------------------------------------------

### 1 Documentation

- [ ] **G1.0** *Statistical Software should list at least one primary reference from published academic literature.*

The package lists no primary reference, and only has itself as a citation.

#### 1.1 Statistical Terminology

- [x] **G1.1** *All statistical terminology is clarified and unambiguously defined.*

#### 1.2 Function-level Documentation

- [x] **G1.2** *Software should use [`roxygen`](https://roxygen2.r-lib.org/) to document all functions.*
    - [ ] **G1.2a** *All internal (non-exported) functions should also be documented in standard [`roxygen`](https://roxygen2.r-lib.org/) format.*
- [x] **G1.3**-**G1.4** Not applicable, as no performance claims are made.

Internal functions are not documented at all, merely given commented titles to separate them.

### 2 Input Structures

#### 2.1 Uni-variate (Vector) Input

- [ ] **G2.0** *Implement assertions on lengths of inputs, particularly through asserting that inputs expected to be single- or multi-valued are indeed so.*
    - [ ] **G2.0a** Provide explicit secondary documentation of any expectations on lengths of inputs
- [x] **G2.1** *Implement assertions on types of inputs (see the initial point on nomenclature above).*
- [ ] **G2.2** *Appropriately prohibit or restrict submission of multivariate input to parameters expected to be univariate.*

Length controls are not implemented (for example, `apd_pca (..., threshold = rep (1, 2))` passes silently).

- [ ] **G2.3** *For univariate character input:*
    - [ ] **G2.3b** *Either: use `tolower()` or equivalent to ensure input of character parameters is not case dependent; or explicitly document that parameters are strictly case-sensitive.*

Parameters like `type` in `score.apd_hat_values` are matched but are case sensitive. That’s probably okay here since “numeric” is the only acceptable value anyway.

- [ ] **G2.4** *Provide appropriate mechanisms to convert between different data types, potentially including:*

Explicit conversion is not implemented.

The following is possible:

``` r
predictors <- mtcars [, -1]
mod <- apd_pca (predictors, threshold = "0.5")
```

That works silently, which is okay, but then:

``` r
print (mod)
```

    ## [1] "Error in x$threshold * 100 : non-numeric argument to binary operator"

- [x] **G2.5** No `factor` input expected, so not relevant
- [x] **G2.6** Standard tabular forms accepted
- [x] **G2.7** Standard tabular forms converted appropriately
- [x] **G2.8** Not applicable
- [x] **G2.9** List-column extraction works consistently

#### 2.2 Missing or Undefined Values

- [ ] **G2.10** *Statistical Software should implement appropriate checks for missing data as part of initial pre-processing prior to passing data to analytic algorithms.*

``` r
predictors <- mtcars [, -1]
predictors [1, 1] <- NA
mod <- apd_pca (predictors)
```

    ## <simpleError in svd(x, nu = 0, nv = k): infinite or missing values in 'x'>

- [ ] **G2.11** *Where possible, all functions should provide options for users to specify how to handle missing (`NA`) data, with options minimally including:*

Functions neither document whether or not missing data may be submitted, nor do they implement any pre-processing checks. Missing data is passed on to further routines, triggering unhelpful error messages.

- [ ] **G2.12** *Functions should never assume non-missingness, and should never pass data with potential missing values to any base routines with default `na.rm = FALSE`-type parameters (such as [`mean()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html), [`sd()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sd.html) or [`cor()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html)).*

Functions assume non-missingness, and pass missing values through to base routines such as `svd()`.

- [ ] **G2.13** *All functions should also provide options to handle undefined values (e.g., `NaN`, `Inf` and `-Inf`), including potentially ignoring or removing such values.*

No such options provided.

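
The input checks discussed in this section (lengths and types for **G2.0**–**G2.4**, missing and undefined values for **G2.10**–**G2.13**) could be sketched roughly as follows. This is an illustration in base R only, not code from `applicable`: the helpers `check_threshold()` and `check_predictors()` are hypothetical names, and the error messages are assumptions about what a helpful message might say.

``` r
# Hypothetical helpers, not part of `applicable`: a sketch of early,
# informative input checks written in base R only.

# Assert that `threshold` is a single, finite numeric value.
check_threshold <- function (threshold) {
    if (length (threshold) != 1L) {
        stop ("'threshold' must be a single value, not length ",
              length (threshold), call. = FALSE)
    }
    if (!is.numeric (threshold) || !is.finite (threshold)) {
        stop ("'threshold' must be a finite numeric value", call. = FALSE)
    }
    invisible (threshold)
}

# Stop before any analytic routine if a predictor holds NA, NaN or +/-Inf.
check_predictors <- function (x) {
    x <- as.matrix (x)
    if (!is.numeric (x)) {
        stop ("predictors must all be numeric", call. = FALSE)
    }
    bad <- colSums (!is.finite (x)) > 0
    if (any (bad)) {
        stop ("missing or undefined values in column(s): ",
              paste (colnames (x) [bad], collapse = ", "), call. = FALSE)
    }
    invisible (x)
}

# Same data as in the example above, with one missing value inserted.
predictors <- mtcars [, -1]
predictors [1, 1] <- NA
msg <- tryCatch (check_predictors (predictors),
                 error = function (e) conditionMessage (e))
print (msg)
```

Checks of this kind, run before data reach `svd()`, would let the error name the offending column instead of surfacing a message from an internal routine.
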
### 3 Output Structures

- [x] **G4.0** No writing to local files implemented, so not applicable.

### 4 Testing

#### 4.1 Test Data Sets

- [x] **G5.0** *Where applicable or practicable, tests should use standard data sets with known properties (for example, the [NIST Standard Reference Datasets](https://www.itl.nist.gov/div898/strd/), or data sets provided by other widely-used R packages).*
- [x] **G5.1** *Data sets created within, and used to test, a package should be exported (or otherwise made generally available) so that users can confirm tests and run examples.*

These standards are not explicitly fulfilled, but as tests can all be implemented with relatively small data sets, they may be considered not relevant.

#### 4.2 Responses to Unexpected Input

- [ ] **G5.2** Some but not all error and warning behaviour is explicitly tested
- [ ] **G5.3** Absence of missing or undefined data in return objects is not explicitly tested.

#### 4.3 Algorithm Tests

- [x] **G5.4**–**G5.5** Correctness tests are not applicable
- [ ] **G5.6b** *Parameter recovery tests should be run with multiple random seeds when either data simulation or the algorithm contains a random component. (When long-running, such tests may be part of an extended, rather than regular, test suite; see G4.8-4.10, below).* Tests are only run with a single random seed
- [ ] **G5.7** Algorithm performance tests are not implemented.
- [ ] **G5.8** **Edge condition tests** *to test that these conditions produce expected behaviour such as clear warnings or errors when confronted with data with extreme properties including but not limited to:*
    - [ ] **G5.8a** *Zero-length data* is not tested
    - [ ] **G5.8b** *Data of unsupported types (e.g., character or complex numbers for functions designed only for numeric data)* is not tested
    - [ ] **G5.8c** *Data with all-`NA` fields or columns or all identical fields or columns* Processing of missing data is not tested
    - [ ] **G5.8d** *Data outside the scope of the algorithm (for example, data with more fields (columns) than observations (rows) for some regression algorithms)* are not tested
- [ ] **G5.9** **Noise susceptibility tests** *Packages should test for expected stochastic behaviour, such as through the following conditions:*
    - [ ] **G5.9a** *Adding trivial noise (for example, at the scale of `.Machine$double.eps`) to data does not meaningfully change results* is not tested
    - [ ] **G5.9b** *Running under different random seeds or initial conditions does not meaningfully change results* is not tested

------------------------------------------------------------------------

[Machine Learning Standards](https://ropenscilabs.github.io/statistical-software-review-book/standards.html#machine-learning-software)
--------------------------------------------------------------------------------------------------------------------------------------

### 5 Input Data Specification

- [ ] **ML1.0** Documentation should make a clear conceptual distinction between training and test data (even where such may ultimately be confounded as described above.)

    Documentation refers frequently to “training data”, yet without any clear interpretation of this phrase.

- [x] **ML1.1**–**ML1.** Design decisions distinguishing “training” from “test” data clarified and justified
- [x] **ML1.5** Default `print` methods summarise contents of training data sets.

#### 5.1 Missing Values

- [ ] **ML1.6** ML software which does not admit missing values, and which expects no missing values, should implement explicit pre-processing routines to identify whether data has any missing values, and should generally error appropriately and informatively when passed data with missing values. In addition, ML software which does not admit missing values should:

    No such processing is implemented

    - [ ] **ML1.6a** Explain why missing values are not admitted.

        No explanation is given of whether or not missing values are admitted.

    - [ ] **ML1.6b** Provide explicit examples (in function documentation, vignettes, or both) for how missing values may be imputed, rather than simply discarded.

        No such examples are provided.

- [x] **ML1.7**–**ML1.8** Missing values not admitted, so not applicable.

### 6 Pre-processing

- [x] **ML2.0** Pre-processing steps are appropriately defined and parametrized.
- [x] **ML2.1** Broadcasting not used to reconcile dimensionally incommensurate input data, so not applicable.
- [ ] **ML2.2b** Any extended documentation (such as vignettes) which demonstrates the use of explicit values for numeric transformations should explicitly describe why particular values are used.

    The vignette for continuous data utilizes several `recipes` steps for transforming, “variables to be distributed as Gaussian-like as possible,” and normalizing, “numeric data to have a mean of zero and standard deviation of one,” yet no explanation is given for why this is necessary, nor for why these values are used.

- [ ] **ML2.3** No data on numeric transformations recorded in return objects.
- [x] **ML2.4** There are no default values defining numeric transformations, so not applicable.
- [x] **ML2.5** Transformations must be explicitly defined, so not applicable.
- [x] **ML2.6** There are no distinct functions for implementing transformations, so not applicable.
- [ ] **ML2.7** Explicit transformations are documented, but not how these may be reversed, even though this could be documented.

### 7 Model and Algorithm Specification

- [x] **ML3.1** Model specification is implemented as a distinct stage
- [x] **ML3.2** Models cannot be specified without directly fitting, but nor is this meaningful in the context of this package.
- [ ] **ML3.3** Where ML software implements its own distinct classes of model objects, the properties and behaviours of those specific classes of objects should be explicitly compared with objects produced by other ML software. In particular, where possible, ML software should provide extended documentation (as vignettes or equivalent) comparing model objects with those from other ML software, noting both unique abilities and restrictions of any implemented classes.

    No comparisons are made with equivalent methods from other software, even though this could readily be done.

- [x] **ML3.4** No training rates used or implemented, so not applicable.

#### 7.1 Control Parameters

- [x] **ML3.5** No control parameters implemented, so not applicable.
- [ ] **ML3.6** Unless explicitly justified otherwise (for example because ML software under consideration is an implementation of one specific algorithm), ML software should:
    - [ ] **ML3.6a** Implement or otherwise permit usage of multiple ways of exploring search space
    - [ ] **ML3.6b** Implement or otherwise permit usage of multiple loss functions.

    There is no ability to use alternative ways of exploring search space, nor to use multiple loss functions (or equivalent).

#### 7.2 CPU and GPU processing

- [x] **ML3.7** There is no C++ code, so not applicable.

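
Returning to **ML2.3** and **ML2.7** in Section 6, the following sketch shows one way transformation values might be recorded and then used to reverse a transformation. It is an illustration under stated assumptions: base R `scale()` stands in for the `recipes` steps used in the package vignette, and the variable names are hypothetical.

``` r
# Sketch only: base R `scale()` stands in for the `recipes` steps used in the
# package vignette; nothing here is taken from `applicable` itself.
x <- as.matrix (mtcars [, -1])

# `scale()` records the centring and scaling values as attributes, which is
# the kind of information ML2.3 asks to be kept in return objects.
z <- scale (x, center = TRUE, scale = TRUE)
centre <- attr (z, "scaled:center")
spread <- attr (z, "scaled:scale")

# Reverse the transformation: multiply by the recorded standard deviations,
# then add back the recorded means.
x_back <- sweep (z, 2, spread, `*`)
x_back <- sweep (x_back, 2, centre, `+`)

# Drop the attributes added by `scale()` before comparing with the input.
attr (x_back, "scaled:center") <- NULL
attr (x_back, "scaled:scale") <- NULL
stopifnot (isTRUE (all.equal (x_back, x)))
```

Documenting a round trip of this kind would show users how scores computed on transformed data relate back to the original measurement scales.
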
### 8 Model Training

- [ ] **ML4.0** *ML software should generally implement a unified single-function interface to model training, able to receive as input a model specified according to all preceding standards.*

    The package exports several distinct functions for model training. This both leaves it up to the user to select an appropriate one, and suggests a design likely to grow by adding a new function for each new mode of training.

- [x] **ML4.1**–**ML4.2** No optimizer implemented directly, so not applicable.

#### 8.1 Batch Processing

- [x] **ML4.3**–**ML4.6** There is no explicit ability to implement batch processing, so not applicable.

#### 8.2 Re-sampling

- [x] **ML4.7**–**ML4.8** There are no explicit re-sampling routines, so not applicable.

### 9 Model Output and Performance

#### 9.1 Model Output

- [ ] **ML5.0** No single function defined via **ML4.0**, so a variety of return objects are implemented rather than a single, unified object.
- [ ] **ML5.1** … the properties and behaviours of trained models produced by ML software should be explicitly compared with equivalent objects produced by other ML software.

    No such comparison is made.

- [ ] **ML5.2** The structure and functionality of objects representing trained ML models should be thoroughly documented. In particular,
    - [ ] **ML5.2a** Either all functionality extending from the class of model object should be explicitly documented, or a method for listing or otherwise accessing all associated functionality explicitly documented and demonstrated in example code.

        No such documentation is provided.

    - [ ] **ML5.2b** Documentation should include examples of how to save and re-load trained model objects for their re-use in accordance with **ML3.1**, above.

        Such documentation is not provided, even though it could be.

#### 9.2 Model Performance

- [x] **ML5.3** Model performance is assessed via distinct functions.
- [ ] **ML5.4** Model performance should be able to be assessed according to a variety of metrics.

    The `score` function is effectively hard-coded and unable to permit usage of alternative scoring metrics.

    - [ ] **ML5.4b** It should be possible to submit custom metrics to a model assessment function, and the ability to do so should be clearly documented including through example code.

        It is not possible to submit custom scoring metrics.

### 10 Documentation

- [x] **ML6.0** Descriptions clearly distinguish training and testing stages and associated data sets.
- [ ] **ML6.1** ML software intentionally designed to address only a restricted subset of the workflow described here should clearly document how it can be embedded within a typical *full* ML workflow in the sense considered here.

    No demonstration is provided for how the workflow enabled by this package can be embedded within a more complete ML workflow, even though such documentation could readily be provided.

    - [ ] **ML6.1a** Such demonstrations should include and contrast embedding within a full workflow using at least two other packages to implement that workflow.

        Also not done.

### 11 Testing

#### 11.1 Input Data

- [x] **ML7.0** Input data need not be labelled, so not applicable.
- [ ] **ML7.1** Tests should demonstrate effects of different numeric scaling of input data (see **ML2.2**).

    No such tests implemented.

- [x] **ML7.2** Missing data are not imputed, so not applicable.

#### 11.2 Model Classes

- [ ] **ML7.3a** These tests should explicitly identify restrictions on the functionality of model objects in comparison with those of other packages.

    No tests implemented to demonstrate restrictions on classes of objects generated by this package.

- [ ] **ML7.3b** These tests should explicitly identify functional advantages and unique abilities of the model objects in comparison with those of other packages.

    No such tests are present.

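
For **ML7.1** above, a test of the effects of numeric scaling might take roughly the following shape. This is a sketch only: base R `prcomp()` stands in for the package's own PCA routines, the helper `pc1_share()` is a hypothetical name, and the factor of 1000 is an arbitrary stand-in for a change of measurement units.

``` r
# Sketch only: base R `prcomp()` stands in for the package's PCA routines.
predictors <- mtcars [, -1]

# Proportion of total variance carried by the first principal component.
pc1_share <- function (x, standardise) {
    sdev <- stats::prcomp (x, center = TRUE, scale. = standardise)$sdev
    sdev [1]^2 / sum (sdev^2)
}

raw <- pc1_share (predictors, standardise = FALSE)
scaled <- pc1_share (predictors, standardise = TRUE)

# Change the units of one column (an arbitrary factor, for illustration).
rescaled <- predictors
rescaled$disp <- rescaled$disp * 1000

stopifnot (
    # Without standardisation, the change of units alters the result ...
    !isTRUE (all.equal (raw, pc1_share (rescaled, standardise = FALSE))),
    # ... whereas with standardisation the result is unaffected.
    isTRUE (all.equal (scaled, pc1_share (rescaled, standardise = TRUE)))
)
```

A test like this documents, in executable form, why the normalisation step matters for PCA-based results.
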
#### 11.3 Model Training

- [x] **ML7.4**–**ML7.5** Training rates not explicitly considered, so not applicable.
- [x] **ML7.6** Training epochs not explicitly considered, so not applicable.
- [ ] **ML7.7** ML software should explicitly test different optimization algorithms, even where software is intended to implement one specific algorithm.

    Different algorithms not tested, and they could be.

- [ ] **ML7.8** ML software should explicitly test different loss functions, even where software is intended to implement one specific measure of loss.

    Different loss functions not tested, and they could be.

- [ ] **ML7.9** Tests should explicitly compare all possible combinations in categorical differences in model architecture, such as different model architectures with same optimization algorithms, same model architectures with different optimization algorithms, and differences in both.

    Not implemented

- [x] **ML7.10** There is no information on paths taken by optimizers, so this cannot be tested and is not applicable.

#### 11.4 Model Performance

- [ ] **ML7.11** All performance metrics available for a given class of trained model should be thoroughly tested and compared.

    Performance metrics are neither tested nor compared, and they could readily be.
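
As an illustration of what **ML5.4b** and **ML7.11** ask for, the sketch below scores new samples with a user-supplied metric and then compares two metrics in a test. It is not the `score()` method of `applicable`: the function `score_distance()` and its `metric` argument are hypothetical, base R `prcomp()` stands in for the package's PCA model, and the artificial outlier is constructed purely for the test.

``` r
# Sketch only: hypothetical helper, not the `score()` method of `applicable`.
# Distances of new samples from the training centre in principal-component
# space, computed with a user-supplied metric (Euclidean by default).
score_distance <- function (train, new_data,
                            metric = function (z) sqrt (rowSums (z^2))) {
    stopifnot (is.function (metric))
    pca <- stats::prcomp (train, center = TRUE, scale. = TRUE)
    z <- stats::predict (pca, newdata = new_data)
    metric (z)
}

train <- mtcars [1:25, -1]
new_data <- mtcars [26:32, -1]

# Append an artificial sample far outside the range of the training data.
outlier <- new_data [1, ] * 10
new_data <- rbind (new_data, outlier)

euclidean <- score_distance (train, new_data)
manhattan <- score_distance (train, new_data,
                             metric = function (z) rowSums (abs (z)))

# Both metrics should flag the artificial sample as the most distant.
stopifnot (
    which.max (euclidean) == nrow (new_data),
    which.max (manhattan) == nrow (new_data)
)
```

Accepting the metric as a function argument keeps a single scoring interface while letting tests compare the metrics directly.
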
