Smit Patel
# CSCD01 Assignment 2

# Table of Contents

- [Bugs](#Bugs)
- [Bugs Fixed](#Bugs-Fixed)
- [Customer Acceptance](#Customer-Acceptance)
- [Software Development Process](#Software-Development-Process)
    * [Sprint Board](#Sprint-Board)
    * [Sprint Backlog](#Sprint-Backlog)
    * [Challenges](#Challenges)
    * [Daily Scrum](#Daily-Scrum)
    * [Team Meetings](#Team-Meetings)

# Bugs

*Part of A1 Interview*

| Member Name | Bug to Demonstrate |
| --- | --- |
| Sharjeel | [Duplicate check_finite when calling scipy.linalg functions #18837](https://github.com/scikit-learn/scikit-learn/issues/18837) [Easy] <br><br> The bug causes unnecessary overhead: data that has already been validated is validated again. <br><br> The fix involves checking the scipy linalg functions used in different packages of scikit-learn and adding `check_finite=False` to the parameters of the affected calls. |
| Anand | [PolynomialFeatures doesn't work correctly when degree=0 #19551](https://github.com/scikit-learn/scikit-learn/issues/19551) [No Rating] <br><br> PolynomialFeatures pre-processes data by transforming it into a new feature matrix consisting of all polynomial combinations of the same features with degree less than or equal to the specified degree. However, given ``degree=0``, a ``ValueError`` is raised due to a broadcasting problem. <br><br> To fix, add a condition that raises an exception when ``degree < 1``. |
| Charles | [Error on using None for missing values in SimpleImputer with boolean arrays #18461](https://github.com/scikit-learn/scikit-learn/issues/18461) [Easy] <br><br> When imputing data, placeholder values such as ``np.nan`` or ``None`` are used to identify which data points need to be imputed. However, the placeholders cannot be used interchangeably, as choosing one over the other may produce different outputs for the same input data. <br><br> To fix, validate the input data and set all missing values to a common placeholder value. |
| Smit | [sklearn.utils.multiclass.type_of_target with sparse csr matrix raises ValueError #14860](https://github.com/scikit-learn/scikit-learn/issues/14860) [Easy] <br><br> This bug occurs when a sparse CSR matrix is passed to the function ``type_of_target()``. Instead of returning the expected output for a valid input, the function raises an error. <br><br> A potential fix is to extend the boolean check it performs with extra constraints that test whether the input is a sparse matrix. This breaks each case down into multiple if/else statements. |
| Donnie | [Sequential forward selection - unsupervised fit_transform bug #19538](https://github.com/scikit-learn/scikit-learn/issues/19538) [No Rating] <br><br> The bug is triggered when the ``Y`` vector is not provided as a parameter to the ``fit_transform()`` function of the ``SequentialFeatureSelector``. This is a bug because the ``Y`` vector is not required for unsupervised transformations. <br><br> The fix is most likely to skip all processing of the ``Y`` vector when it is ``None``. |
| Jun | [OrdinalEncoder: Deprecate automatically assuming lexicographic ordering #14954](https://github.com/scikit-learn/scikit-learn/issues/14954) [Easy] <br><br> The argument referenced can be found here: https://sourcegraph.com/github.com/scikit-learn/scikit-learn/-/blob/sklearn/preprocessing/_encoders.py#L720 <br><br> The issue is with the call to the ``_unique()`` function, which re-orders your labels. This can be fixed by addressing the code in the lines located at: https://sourcegraph.com/github.com/scikit-learn/scikit-learn/-/blob/sklearn/utils/_encode.py#L7 |
| Gaurav | [OneHotEncoder.get_feature_names doesn't work with integer column names #16593](https://github.com/scikit-learn/scikit-learn/issues/16593) [Easy] <br><br> This bug stems from Python being strongly typed: concatenating the integer column name with a string raises an error. <br><br> There is some discussion that column names should not be integers in the first place. If necessary, data should be sanitized and type-converted as it is passed into the function. |

# Bugs Fixed

## [#16593](https://github.com/scikit-learn/scikit-learn/issues/16593) [\[PR #8\]](https://github.com/UTSCCSCD01/course-project-bestteam/pull/8/files)

**Description:**

- developers addressed concerns about typecasting numerical data types to a string data type
- developers agreed that column names should not be a numerical data type
- a custom error message was implemented; it is raised if the column names are not strings

**Files Changed:**

- ``sklearn/preprocessing/_encoders.py``
- ``sklearn/preprocessing/tests/test_encoders.py``

## [#19551](https://github.com/scikit-learn/scikit-learn/issues/19551) [\[PR #11\]](https://github.com/UTSCCSCD01/course-project-bestteam/pull/11/files)

**Description:**

- the developer who opened the issue believed that the transform method in ``PolynomialFeatures`` should not result in an error when the degree is 0
- the ``scikit-learn`` community members decided that such a feature would be inconsistent with the rest of the code base and would not add anything of value
- the bug was fixed by raising a ``ValueError`` when `include_bias` is false, otherwise returning a constant column of ones for the sake of composability
- in [this comment](https://github.com/scikit-learn/scikit-learn/issues/19551#issuecomment-786111368), a scikit-learn user outlines the details of this particular fix

**Files Changed:**

- ``sklearn/preprocessing/_data.py``
- ``sklearn/preprocessing/tests/test_data.py``

## [#18461](https://github.com/scikit-learn/scikit-learn/issues/18461) [\[PR #12\]](https://github.com/UTSCCSCD01/course-project-bestteam/pull/12/files)

**Description:**

- developers raised concerns about ``None`` being used in place of ``np.NaN`` as the empty placeholder when imputing data
- ``mean`` and ``median`` imputation should only involve numerical data types and the empty placeholder
- ``most_frequent`` and ``constant`` can accept ``None`` as a valid element
- ``None`` is not a numerical data type and should not be treated as such in the imputation strategies ``mean`` and ``median``
- an error message was implemented; it is raised when ``None`` appears as an element in the context of numerical computation

**Files Changed:**

- ``sklearn/impute/_base.py``
- ``sklearn/impute/tests/test_impute.py``

## [#19538](https://github.com/scikit-learn/scikit-learn/issues/19538) [\[PR #9\]](https://github.com/UTSCCSCD01/course-project-bestteam/pull/9/files) and [\[PR #10\]](https://github.com/UTSCCSCD01/course-project-bestteam/pull/10/files)

**Description:**

- the Sequential Feature Selector (SFS) transforms (adds or removes) features in the input data (X) based on the fit of a particular learning algorithm
- in unsupervised learning algorithms, the fit is determined purely by the input matrix X
- when using unsupervised learning models, ``fit_transform()`` (fit model and transform data) raises an error requiring the user to provide the response vector Y, even though the unsupervised model does not require it
- this issue was solved by making the response vector an optional parameter and delegating verification of the response vector to the underlying model itself

**Files Changed:**

- ``sklearn/feature_selection/_sequential.py``
- ``sklearn/feature_selection/tests/test_sequential.py``

# Customer Acceptance

``scikit-learn`` uses the ``pytest`` module to write test cases for each of its modules. The team has created additional test cases or modified existing ones to verify that the bugs listed above are in fact correctly fixed. The following commands can be used to verify each of the changed modules.

### [#16593](https://github.com/scikit-learn/scikit-learn/issues/16593) and [#19551](https://github.com/scikit-learn/scikit-learn/issues/19551)

```
$ python -m pytest sklearn/preprocessing
```

### [#18461](https://github.com/scikit-learn/scikit-learn/issues/18461)

```
$ python -m pytest sklearn/impute
```

### [#19538](https://github.com/scikit-learn/scikit-learn/issues/19538)

```
$ python -m pytest sklearn/feature_selection
```

# Software Development Process

## Sprint Board

![](https://i.imgur.com/klJHin2.png)

## Sprint Backlog

- [BES-4 Clone scikit-learn repo to CSCD01 course repo](https://mcsapps.utm.utoronto.ca/jira/browse/BES-4)
- [BES-1 Error on using None for missing values in SimpleImputer with boolean arrays](https://mcsapps.utm.utoronto.ca/jira/browse/BES-2)
- [BES-2 OneHotEncoder.get_feature_names doesn't work with integer column names](https://mcsapps.utm.utoronto.ca/jira/browse/BES-2)
- [BES-3 PolynomialFeatures doesn't work correctly when degree=0](https://mcsapps.utm.utoronto.ca/jira/browse/BES-3)
- [BES-7 Sequential forward selection - unsupervised fit_transform bug](https://mcsapps.utm.utoronto.ca/jira/browse/BES-7)

## Challenges

1. Adding the scikit-learn repository to our repository without referencing the main `scikit-learn` repo.
2. Project setup and test case execution were challenging because the `Scikit Developer's Guide` was not intuitive; we ran into some undocumented issues.

## Daily Scrum

### Smit

| Date | Task |
| --- | --- |
| Feb 22 | Shortlisted potential bugs to reproduce and fix. |
| Feb 24 | Picked out a bug, performed an investigation, and reproduced the bug on a local machine. Prepared for the A1 interview. |
| Feb 26 | Cloned the scikit-learn repo into our repo. |
| Mar 1 | Held a meeting with team members to pick two or three bugs to fix. Organized ourselves and assigned everyone tasks to finish by the A2 deadline. |
| Mar 3 | Fixed the cloned scikit-learn repo so that it does not point to the original repo. Helped team members with building the scikit package. |
| Mar 5 | Did technical writing on our software development process and reviewed JIRA tickets to make sure we finish on time. |

### Sharjeel

| Date | Task |
| --- | --- |
| Feb 22 | Chose a bug to investigate and reproduce. Prepared for the A1 interview. |
| Feb 24 | Worked on reproducing my chosen bug and understanding the cause behind it. |
| Feb 26 | Researched potential fixes to the bug that I chose and looked through all the packages in the codebase to find places affected by that bug. |
| Mar 1 | Attended the team meeting and decided which bugs to work on for the sprint and who would lead them. Created a new sprint and tickets for assignment 2 on the Jira board. |
| Mar 3 | Worked with a team member on the sequential forward selection [bug](https://mcsapps.utm.utoronto.ca/jira/browse/BES-7). |
| Mar 5 | Continued working on the bug and created a test suite for it. Reviewed PRs created by other team members. |

### Gaurav

| Date | Task |
| --- | --- |
| Feb 22 | Chose a bug to investigate and reproduce. Investigated the bug documentation on GitHub and went over the comments. |
| Feb 24 | Experimented with the code to see which inputs cause the bug and which do not. Went over the A1 document to refresh my memory for the interview. |
| Feb 26 | Created one test scenario that reproduces the bug and another that shows the expected output. |
| Mar 1 | Attended the team meeting to decide which bugs to fix. Picked the bug I was investigating earlier. |
| Mar 3 | Experimented with the code to figure out where the bug was happening. |
| Mar 5 | Implemented the bug fix and tests. |

### Anand

| Date | Task |
| --- | --- |
| Feb 22 | Investigated open issues on scikit-learn's GitHub repository. |
| Feb 24 | Chose a sufficiently easy bug whose issue also mentioned implementation strategies. |
| Feb 26 | Prepared for the interview with the D01 TAs by setting up the conditions required to display the bug. |
| Mar 1 | Participated in the team meeting to decide which issues to take up this sprint. |
| Mar 3 | Researched PolynomialFeatures to understand the chosen issue and its fixes. |
| Mar 5 | Implemented a fix for the bug and worked on adding new unit tests. |

### Donnie

| Date | Task |
| --- | --- |
| Feb 22 | Selected a bug for potential fixing. |
| Feb 24 | Investigated bug reproduction and a potential solution for fixing the bug, and prepared for the CSCD01 A1 interview. |
| Feb 26 | Attended the CSCD01 interview. |
| Mar 1 | Attended the team meeting for sprint planning, bug fix selection, and task assignment. |
| Mar 3 | Investigated how to build the project and run test cases. |
| Mar 5 | Worked on the implementation of the bug fix for [BES-8](https://mcsapps.utm.utoronto.ca/jira/browse/BES-8). |

### Jun

| Date | Task |
| --- | --- |
| Feb 22 | Investigated issues on scikit-learn's repo. |
| Feb 24 | Selected a bug and investigated it to find a solution. |
| Feb 26 | Attended the interview for Assignment 1 during tutorial hours. |
| Mar 1 | Reviewed the JIRA board to see assigned tasks for team members. |
| Mar 3 | Set up a local development environment and investigated how to build and run tests for scikit-learn. |
| Mar 5 | Implemented the bug fix and tests. |

### Charles

| Date | Task |
| --- | --- |
| Feb 22 | Looked over existing issues on scikit-learn's repo for something to reproduce and possibly fix. |
| Feb 24 | Selected the None bug. Reproduced it under several variations of different code segments. |
| Feb 26 | Presented in tutorials. Helped shortlist bugs. |
| Mar 1 | Went over previously selected issues and narrowed the shortlist down even further. |
| Mar 3 | Worked on setting up test cases for the None issue. |
| Mar 5 | Worked on creating an implementation to solve the new test cases. |

## Team Meetings

Team meetings were held at 9:00 PM EST on a Discord voice call.

### Feb 21

**Goal:** CSCD01 interview preparation

**Outcome:**

- For the next meeting, every team member was expected to select a unique bug for potential fixing and determine how to reproduce the bug.
- For the interview, every team member was expected to familiarize themselves with the A1 report.

### Feb 24

**Goal:** Pooling of bugs and bug reproduction demonstration

**Outcome:**

- Documentation of selected bugs on GitHub.

### Mar 1

**Goal:** Sprint planning and task distribution

**Outcome:**

- Selection of bugs for fixing.
- Assignment of members to each selected bug.
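
# Appendix: Illustration of the #16593 Check

The fix for #16593 described under Bugs Fixed raises a custom error when column names are not strings. The following is a minimal sketch of that idea in plain Python, not the code from PR #8. The function `feature_names`, its parameters, the choice of `TypeError`, and the message wording are all hypothetical names and assumptions made for illustration; the actual change lives in ``sklearn/preprocessing/_encoders.py`` and may differ.

```python
def feature_names(input_features, categories):
    """Hypothetical stand-in for the naming step of a one-hot encoder.

    Builds "<column>_<category>" names and rejects non-string column
    names up front, instead of failing inside the string concatenation.
    """
    for name in input_features:
        if not isinstance(name, str):
            # Assumed error type and wording; the team's PR may differ.
            raise TypeError(
                f"column names must be strings, got {name!r} "
                f"of type {type(name).__name__}"
            )
    return [
        name + "_" + str(category)
        for name, column_categories in zip(input_features, categories)
        for category in column_categories
    ]


# String column names produce one feature name per category.
print(feature_names(["color", "size"], [["red", "blue"], ["S"]]))
# -> ['color_red', 'color_blue', 'size_S']

# An integer column name is rejected with a descriptive message.
try:
    feature_names([0, 1], [["red", "blue"], ["S"]])
except TypeError as exc:
    print(exc)
```

Without the explicit check, the expression `name + "_"` would itself fail for an integer `name`, but with a generic operand-type message that does not mention column names. Validating first gives the caller an error that points at the real cause.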
