# Manuel asking:
## Sampling and Generalizability
> Brian Caffo said in his lecture that machine learning does not assume a population model. To me, this seems a little shortsighted (or maybe I’m too much on the statistics-side). How do people practicing machine learning justify the claim that their predictions should be valid for new data if there is no assumption that the new data and the old data come from the same population?
- [x] 1. (re Brian Caffo) no population model in ML: how can we assume the results are valid for a new dataset if we do not assume that it comes from the same population?
- [x] 2. reweighting: is that what ML people usually do?
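
As a point of reference for question 2, here is a minimal sketch of one common form of reweighting (post-stratification, equivalent to importance weighting on a single categorical variable). The group labels, error values, and target shares are hypothetical toy data, and the function names are illustrative, not taken from any lecture or paper discussed here.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each unit by (population share) / (sample share) of its group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values, normalized by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: per-person prediction errors in a sample that
# over-represents one group relative to the assumed target population.
groups = ["young"] * 8 + ["old"] * 2
errors = [0.1] * 8 + [0.5] * 2
target_shares = {"young": 0.5, "old": 0.5}  # assumed population shares

weights = poststratification_weights(groups, target_shares)
naive_error = sum(errors) / len(errors)            # treats the sample as the population
reweighted_error = weighted_mean(errors, weights)  # corrects for the group imbalance
```

The reweighted estimate is only as good as the assumed target shares, which is exactly where a population model re-enters the picture.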
## Explanation vs. Prediction
> As scientists, we want to explain the world. We want to find structures, regularities, rules that describe empirical phenomena. To what extent can machine learning help us achieve this goal? If we can predict something (more or less) perfectly, but we don’t know why and our method is ignorant with regard to general rules, is the method still scientific?
- [x] 3. the explanatory duty of science (structures, regularities, rules that describe empirical phenomena): isn't prediction ignorant of scientific concerns?
> There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
- [x] 4. There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
## Construct Validity
> Markowetz and colleagues state that their paper is based on a single central thesis: “The user’s mental state affects the way he interacts with a machine”. To what extent has this claim been tested (for situations where data from digital devices was used to infer something about a person’s characteristics) and how would you test it given the simultaneous claim that self reports are a poor measure?
- [x] 5. how can we validate the claim that there is a strong link between affective states and a person's behavior, if we have invalidated questionnaires and self-reports?
> To what extent is the machine learning community in psychology interested in traditional quality criteria of testing, e.g., validity, reliability, objectivity?
- [x] 6. how much do ML-inspired psychologists/computer scientists care about psychometric qualities of instruments (validity, reliability, objectivity)?
> They also overlook construct validity of their proposed measures (e.g. typos as indicators of stress) and take it for granted.
- [x] 7. Markowetz and colleagues overlook important aspects of construct validity. Can we simply assume that typos are good indicators of stress?
> I found Markowetz et al. relying on strawman argumentation while downplaying traditional psychometrics.
- [x] 8. They rely on strawman arguments: they downplay traditional psychometrics by attributing weaknesses to it.
## Hype
> Using a bit of exaggeration in my wording, I have the feeling that Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. I am missing a bit of modesty and reflection in this paper. As far as I remember, there was no single mention of drawbacks and difficulties of psycho-informatics. In my impression, they blindly assume that the number of social interactions on a smartphone can be directly translated into the severity of depression, to mention just one example. I have doubts that this connection can be so easily made. And I certainly think that psycho-informatics has more problems that need to be overcome. For example, there are concerns about privacy. In the paper these concerns are only mentioned briefly and are quickly trivialised. Where do you stand on the advantage-disadvantage debate?
- [x] 9. Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. Maybe there is a bit of modesty and reflection missing in this paper. Drawbacks and difficulties of psycho-informatics are not treated. They simply assume that the number of social interactions on a smartphone can be directly translated into the severity of depression. Can this connection be so easily made? What are potential drawbacks and problems that psycho-informatics needs to overcome? What is your position on the advantage-disadvantage debate?
> They also mention that BD and ML are over-hyped buzzwords, but they still treat these terms as if they were not so overrated and see extraordinary potential in them.
- [x] 10. Markowetz and colleagues make conflicting statements when they say that Big Data and Machine Learning are over-hyped buzzwords but at the same time treat these terms as not over-hyped and see extraordinary potential.