# PSNM - Paper Outline
1. Goal: a review paper showing the state of the art across Data Science & Machine Learning. There are three points this paper needs to clarify:
- Data Science: what is it? How does it work? What are its problems and solutions? Finally, what are today's challenges and new approaches (the state of the art)?
- Machine Learning: how can we classify ML? What are the principles behind ML approaches? Likewise, what are the challenges of Machine Learning?
- Emerging topics and trends: it would be more interesting to include some of the points that you both mentioned here:
+ pitfalls
+ issues and technical debt
2. Outline (like the format of a common paper)
### Abstract
There are 4 points in writing the abstract (~10-12 sentences, this also depends...):
* Introduce the field of this review paper/research (~1-2 sentences)
* The problem in this topic we would like to mention (~2-3 sentences)
* Contribution of this work/What is your approach? (~2-3 sentences)
* What can readers gain from your paper? (Why should they read it?) (~2-3 sentences)
### I. Introduction
The main content of Introduction: normally there are 4 parts (this could be changed flexibly)
* Paragraph 1: introduce the topic and state the main problem (this paper is a survey, so the main problem could revolve around the state of the art and challenges across Data Science & ML). It would be good to mention some of the main related works here.
* Paragraph 2: given the problem, what is your contribution/method? Since this is a survey, the contributions could be listed as the points you want to summarize and survey here.
* Paragraph 3: a bit more detail about the survey points; you could give an overview of the survey's highlights in this paper. So, what are our contributions/research questions?
* **What is data science, and what activities does it involve?**
* **How is data science connected to other disciplines?**
* **What is the role of ML in Data Science?**
* **What challenges in Data and Machine Learning are people concerned about (state of the art)?**
* Paragraph 4: finally, summarize the organization of this paper. For example, Section II reveals ..., Section III shows ...
### II. Related Work
### III. Background

1. Data Science and Multidisciplinary
We could introduce data science and its relationship to other terms, then list and classify the activities of data scientists here. (Then, in Section II, we will discuss in detail the main points and the state of the art.) *We could give a definition of Data Science here before going through sub-section 1.*
Data Science Activities
* Ref: http://web.cs.ucla.edu/~miryung/Publications/icse2018-datascientist-slides.pdf
2. Principles of Machine Learning
Here, let's talk about the common principle of ML and DL, namely "*extracting insights from data features*". For that, we need an objective function for training and a loss function for validation. So, what do they look like? We could give a definition of Machine Learning here.
* a common ML-algorithm
* loss function
* regularization
* cross-validation
I think here are the basic things about ML. Then, in Section III, you could also list the specific ML types including:
* Linear: Linear regression vs Logistic Regression
* Tree-based: Decision Tree, Random Forest, Gradient Boosting
* Neural Networks: neural networks
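
As a possible illustration for this subsection, here is a minimal sketch of how the listed ingredients (a loss function, regularization, cross-validation) fit together. Everything in it is an assumption made for illustration only: the one-parameter model `y ≈ w * x`, the L2 penalty, the function names, and the synthetic data are not taken from the planned paper.

```python
import random


def regularized_loss(w, xs, ys, lam):
    """Training objective: mean squared error plus an L2 penalty on w."""
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * w ** 2


def fit(xs, ys, lam):
    """Closed-form minimizer of regularized_loss for the model y ~ w * x."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


def validation_loss(w, xs, ys):
    """Plain mean squared error on held-out data (no penalty term)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cv(xs, ys, lam, k=5, seed=0):
    """Average validation loss over k train/validation splits."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index groups
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        w = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx], lam)
        scores.append(
            validation_loss(w, [xs[j] for j in val_idx], [ys[j] for j in val_idx])
        )
    return sum(scores) / k


if __name__ == "__main__":
    # Synthetic data (hypothetical): y = 2x plus small Gaussian noise.
    rng = random.Random(42)
    xs = [rng.uniform(-1, 1) for _ in range(100)]
    ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]
    for lam in (0.0, 0.1, 10.0):
        print(f"lambda={lam}: CV loss = {k_fold_cv(xs, ys, lam):.4f}")
```

The penalty is applied only while fitting; the held-out folds are scored with the plain error, which is one common way to compare regularization strengths.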
### II. Data Science - State of the Art and Challenges
1. Metrics in data science activities
- Ref: https://agenda.infn.it/event/19049/contributions/98115/attachments/66537/81428/Kuznetsov-Tuesday.pdf
- These include, e.g., querying data, gathering data, etc.
Qualitative
Quantitative
*Then, you could mention the process of analyzing data here*
2. Challenges and new scientific approaches
### III. Machine Learning - State of the Art and Challenges

1. The principles of machine learning
2. Common Tasks and problems that can be solved using machine learning
3. Forms of machine learning (e.g. supervised, unsupervised, reinforcement etc.)
4. Specific machine learning algorithms
a) Linear Regression
b) Logistic Regression
c) Decision Trees
d) Random Forests
e) Gradient Boosting Machines
...
d) Neural Networks (in much more detail than the rest, because of their importance)
- https://www.youtube.com/watch?v=njKP3FqW3Sk&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=1
5. Challenges and new approaches ***Maybe part of Emerging topics and trends***
- Domingos, A Few Useful Things to Know about Machine Learning, Communications of the ACM, 2012
- https://research.google/pubs/pub43146/
### IV. Emerging topics and trends
1. Model Interpretability
[1] Carvalho, Diogo V., Eduardo M. Pereira, and Jaime S. Cardoso. "Machine learning interpretability: A survey on methods and metrics." Electronics 8, no. 8 (2019): 832.
2. Fairness and Bias
[2] Mehrabi, Ninareh, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. "A survey on bias and fairness in machine learning." arXiv preprint arXiv:1908.09635 (2019).
***Pitfalls in Machine Learning (do they belong under this point?)***
3. Data Privacy and Safety
[3] Saria, Suchi, and Adarsh Subbaswamy. "Tutorial: safe and reliable machine learning." arXiv preprint arXiv:1904.07204 (2019).
*I'm thinking of some review results that we could turn into plots in this section.*
<!-- ### V. A taxonomy of Data Analysis & Machine Learning Frameworks
* What do you think?
* We could make a survey here.
* Ref: https://lexfridman.com/files/slides/2020_01_06_deep_learning_state_of_the_art.pdf -->
<!-- *I remember seeing a review about this, but let me check later* -->
### VI. Conclusion
### References