
Part 3 - Data Analysis

Group 1

  1. Data Cleaning and Transformation

    • What does it mean to you to "clean" data, and where do you put this data?

    • How do you incorporate and document manual steps?
  2. Analyses

    • What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
    • To what extent does a tool, such as Git and GitHub, address these challenges?
    • What are some pros/cons of sharing your analysis code?

Group 2

  1. Data Cleaning and Transformation

    • What does it mean to you to "clean" data, and where do you put this data?
      • addressing missing data

      • correct format - string or numeric data

      • outliers

      • dealing with image datasets: filtering out anything that is not trustworthy

      • metadata

      • raw data and clean data kept in different folders

      • what data is raw data?

    • How do you incorporate and document manual steps?
      • comments
      • write code for our future selves
      • tracing
      • version control
  2. Analyses

    • What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
    • To what extent does a tool, such as Git and GitHub, address these challenges?
    • What are some pros/cons of sharing your analysis code?
      • Cons

        • The code works for me but not for others
        • Setting up the environment is easy for me and hard for others
          • Python venv vs. conda
            • MATLAB is better
        • Temptation to assume that sharing code is enough: "I've shared my code, so I can call it a day; you can just read it and understand it"
        • Takes time/effort to comment every line of code
      • Pros

        • I can see what my teammate has done and understand more about the project
        • Papers that share code help me to build my own research on top of it
        • Consistency of naming strategies
        • Forces you to clean up your processes
          • You want to make it as painless as possible for the next person
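
Several of Group 2's cleaning points (missing data, string vs. numeric format, outliers, and keeping raw and clean data in different folders) can be illustrated with a short script. What follows is only a minimal sketch using the Python standard library, not a method the group described. The function names `to_float` and `clean_file`, the `column` argument, the `outlier` column, and the z-score cutoff of 3 are all hypothetical choices made for illustration.

```python
import csv
import statistics
from pathlib import Path


def to_float(value):
    """Return value as a float, or None if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_file(raw_path, clean_path, column, z_cutoff=3.0):
    """Copy rows from raw_path to clean_path, cleaning one numeric column.

    Rows with a missing or non-numeric value are dropped. Rows far from the
    mean are flagged in an 'outlier' column rather than deleted. Returns a
    small log of counts. (All names and the cutoff are illustrative.)
    """
    with open(raw_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Missing data and format: keep only rows whose value parses as a number.
    kept = []
    for row in rows:
        number = to_float(row.get(column))
        if number is not None:
            kept.append({**row, column: number})

    # Outliers: flag values more than z_cutoff standard deviations from the mean.
    values = [row[column] for row in kept]
    mean = statistics.fmean(values) if values else 0.0
    spread = statistics.pstdev(values) if len(values) > 1 else 0.0
    for row in kept:
        is_outlier = spread > 0 and abs(row[column] - mean) / spread > z_cutoff
        row["outlier"] = "yes" if is_outlier else "no"

    # The raw file is never modified; the result goes to a separate location.
    clean_path = Path(clean_path)
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(kept[0].keys()) if kept else [column, "outlier"]
    with open(clean_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    return {
        "rows_read": len(rows),
        "rows_dropped": len(rows) - len(kept),
        "rows_flagged": sum(row["outlier"] == "yes" for row in kept),
    }
```

The returned counts could be saved next to the clean file or committed with the script, which is one way to leave a trace of each cleaning step for the next person.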

Group 3

  1. Data Cleaning and Transformation

    • What does it mean to you to "clean" data, and where do you put this data?
    • How do you incorporate and document manual steps?
  2. Analyses

    • What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
    • To what extent does a tool, such as Git and GitHub, address these challenges?
    • What are some pros/cons of sharing your analysis code?

Group 4

  1. Data Cleaning and Transformation

    • What does it mean to you to "clean" data, and where do you put this data?
    • How do you incorporate and document manual steps?
  2. Analyses

    • What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
    • To what extent does a tool, such as Git and GitHub, address these challenges?
    • What are some pros/cons of sharing your analysis code?

Group 5

  1. Data Cleaning and Transformation
    • What does it mean to you to "clean" data, and where do you put this data?
    • How do you incorporate and document manual steps?

  2. Analyses
    • What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
    • To what extent does a tool, such as Git and GitHub, address these challenges?
    • What are some pros/cons of sharing your analysis code?