# Part 3 - Data Analysis
# Group 1
1. **Data Cleaning and Transformation**
- What does it mean to you to "clean" data, and where do you put this data?
- How do you incorporate and document manual steps?
2. **Analyses**
- What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
- To what extent does a tool, such as Git and GitHub, address these challenges?
- What are some pros/cons of sharing your analysis code?
# Group 2
1. **Data Cleaning and Transformation**
- What does it mean to you to "clean" data, and where do you put this data?
  - addressing missing data
  - using the correct format (string or numeric data)
  - handling outliers
  - for image datasets, filtering out anything that is not trustworthy
  - metadata
  - keeping raw data and clean data in different folders
  - what counts as raw data?
- How do you incorporate and document manual steps?
  - comments
  - write code for our future selves
  - tracing
  - version control
2. **Analyses**
- What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
- To what extent does a tool, such as Git and GitHub, address these challenges?
- What are some pros/cons of sharing your analysis code?
  - Cons
    - The code works for me but not for others
    - Setting up the environment is easy for me and hard for others
      - Python venv vs. conda
      - MATLAB is better
    - "Because I've shared my code, I can call it a day; you can just read the code and understand it"
    - It takes time and effort to comment every line of code
  - Pros
    - I can see what my teammates have done and understand more about the project
    - Papers that share code help me build my own research on top of them
    - Consistency of naming strategies
    - Forces you to clean up your processes
    - You want to make it as painless as possible for the next person
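
As an illustration of the cleaning steps Group 2 lists (missing data, string vs. numeric format, outliers, keeping raw and clean data apart, and tracing what was done), here is a minimal sketch using only the Python standard library. The column names, the sample values, and the outlier rule (distance from the median measured in median absolute deviations) are all assumptions made for the example, not something the group specified.

```python
import csv
import io
import statistics

# Hypothetical raw data. In practice this would live in a separate raw/ folder
# and never be edited by hand.
RAW_CSV = """sample_id,weight_g
a1,10.2
a2,
a3,9.8
a4,not recorded
a5,10.1
a6,250.0
"""


def clean_rows(raw_text, max_deviations=3.0):
    """Return (clean_rows, log); the raw text itself is left untouched."""
    log = []      # one entry per dropped row, so every decision is traceable
    parsed = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        sample_id = row["sample_id"]
        value = row["weight_g"].strip()
        if value == "":
            log.append(f"{sample_id}: dropped, missing weight_g")
            continue
        try:
            weight = float(value)  # enforce numeric format
        except ValueError:
            log.append(f"{sample_id}: dropped, non-numeric weight_g {value!r}")
            continue
        parsed.append({"sample_id": sample_id, "weight_g": weight})

    # Assumed outlier rule: drop values more than `max_deviations` median
    # absolute deviations (MAD) away from the median.
    weights = [r["weight_g"] for r in parsed]
    median = statistics.median(weights)
    mad = statistics.median(abs(w - median) for w in weights)

    clean = []
    for r in parsed:
        if mad > 0 and abs(r["weight_g"] - median) > max_deviations * mad:
            log.append(f"{r['sample_id']}: dropped, outlier weight_g {r['weight_g']}")
        else:
            clean.append(r)
    return clean, log


clean, log = clean_rows(RAW_CSV)
for entry in log:
    print(entry)
```

Writing `clean` to a separate folder and committing `log` alongside the script is one way to keep the raw file intact while leaving a record of each step for the next person.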
# Group 3
1. **Data Cleaning and Transformation**
- What does it mean to you to "clean" data, and where do you put this data?
- How do you incorporate and document manual steps?
2. **Analyses**
- What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
- To what extent does a tool, such as Git and GitHub, address these challenges?
- What are some pros/cons of sharing your analysis code?
# Group 4
1. **Data Cleaning and Transformation**
- What does it mean to you to "clean" data, and where do you put this data?
- How do you incorporate and document manual steps?
2. **Analyses**
- What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
- To what extent does a tool, such as Git and GitHub, address these challenges?
- What are some pros/cons of sharing your analysis code?
# Group 5
1. **Data Cleaning and Transformation**
- What does it mean to you to "clean" data, and where do you put this data?
- How do you incorporate and document manual steps?
2. **Analyses**
- What are some challenges you've faced, or foresee facing, when it comes to documenting how you've analyzed data?
- To what extent does a tool, such as Git and GitHub, address these challenges?
- What are some pros/cons of sharing your analysis code?