# Interview Katrijn van Deun
## Xyntheia:
> A first question: Rob Meijer told us yesterday that a) he is in favor of standardized (and preferably validated) tests; and b) he would like to see behavior, since that is difficult to predict. This is particularly relevant for high-stakes contexts. Big data obtained via devices, as proposed by Markowetz et al. (2014), appear to provide a whole new type of behavioral data, thus accommodating his second point. But with the COTAN-criteria in mind, I'm wondering about a way to standardize and validate such behavioral measures. How urgent is a standardization and/or validation procedure for these data? Are these data currently used in high-stakes contexts? Are there any ideas on how to make these data suitable for high-stakes contexts?
- [x] - standardizing and validating behavioral measures of such data: an urgent call?
- [x] - using in high-stakes contexts: are they being used for it? will they be?
> Related to my high-stakes question, a practical example. I'm thinking of the forensic context now, but it may well apply to other contexts where self-report or observation is unlikely to give us answers we need. We do use big data to complement evidence in forensic investigation, but I don't think we're allowed to use big data as a complement in forensic psychiatric reports (not sure about it though). The former is used to evaluate guilt, while the second one evaluates the psychiatric status of the suspect. Both contribute to a decision regarding punishment. That makes me wonder about the current "legal" status of big data, and makes me ask a question similar to that of @Esther Maassen yesterday. Do we have or need legislation for the use of big data?
- [x] - legal status of BD in forensic investigations: can we use them to complement psychiatric assessment (measuring guilt vs. "diagnosis")?
## Angelika:
> As scientists, we want to explain the world. We want to find structures, regularities, rules that describe empirical phenomena. To what extent can machine learning help us achieve this goal? If we can predict something (more or less) perfectly, but we don’t know why and our method is ignorant with regard to general rules, is the method still scientific?
- [x] - the explanatory duty of science (structures, regularities, rules that describe empirical phenomena): isn't prediction ignorant of scientific concerns?
> Brian Caffo said in his lecture that machine learning does not assume a population model. To me, this seems a little shortsighted (or maybe I’m too much on the statistics-side). How do people practicing machine learning justify the claim that their predictions should be valid for new data if there is no assumption that the new data and the old data come from the same population?
- [x] - (re Brian Caffo) no population model in ML: how can we assume validity of the results to a new dataset if we do not assume that it comes from the same population?
> Data from digital devices and services has been used to measure personality. According to the APA, personality is defined as individual differences in characteristic patterns of thinking, feeling and behavior. I can see how you would get valid measures of behavior with tracking data on people’s digital devices. However, it seems much more natural to me to use self-reports for thoughts and feelings. Do you think you can assess thoughts and feelings in a valid way through data from digital devices and services?
- [x] - at their best, passive BD from smartphones measure behavior. However, personality is "defined as individual differences in characteristic patterns of thinking, feeling and behavior". How representative are behaviors of thoughts and emotions?
> Markowetz and colleagues state that their paper is based on a single central thesis: “The user’s mental state affects the way he interacts with a machine”. To what extent has this claim been tested (for situations where data from digital devices was used to infer something about a person’s characteristics) and how would you test it given the simultaneous claim that self reports are a poor measure?
- [x] - how can we validate the claim that there is a strong link between affective states and a person's behavior, if we have invalidated questionnaires and self-reports?
> In psychological studies, participants often receive some feedback or a debriefing at the end. How do you explain the results of studies using big data / machine learning to participants? How would you explain the results of a psychological testing procedure that is based on Big Data?
- [x] - in psych studies we give feedback on how individuals have performed **(right?)**; how can we communicate the results of BD to participants?
> A maybe related point: There has been a lot of discussion about fairness of machine learning algorithms. To what extent do you think could the increased use of machine learning methods in psychological research (and testing!) lead to an increase in (perceived) unfairness of the procedures? How can we tackle this issue?
- [x] - are ML algorithms fair, given the hot discussion over them being "racist" and discriminating, e.g., in hiring processes?
> To what extent is the machine learning community in psychology interested in traditional quality criteria of testing, e.g., validity, reliability, objectivity?
- [x] - how much do ML-inspired psychologists/computer scientists care about psychometric qualities of instruments (validity, reliability, objectivity)?
## Maximilian:
> Machine learning approaches often have a better prediction performance than standard statistical approaches. In turn, machine learning approaches oftentimes lack interpretability. Suppose an analyst is only interested in performance and not at all in interpretability. Other stakeholders might argue that interpretability is still important to justify decisions to other people. Examples would be personnel selection and medical diagnosis. How can we justify a decision if we cannot even communicate properly how the decision was made? What are your thoughts about this?
- [x] **Reform**: Machine learning approaches often have a better prediction performance than standard statistical approaches. In turn, machine learning approaches oftentimes lack interpretability. Suppose an analyst is only interested in performance and not at all in interpretability. Other stakeholders might argue that interpretability is still important to justify decisions to other people. Examples would be personnel selection and medical diagnosis. How can we justify a decision if we cannot even communicate properly how the decision was made? What are your thoughts about this?
> There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
- [x] **Reform**: There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
> I cannot remember that I learned anything about machine learning techniques in my Bachelor and Master. Why is machine learning not in the standard curriculum of social sciences? How should statistical/programming/mathematical education change in your opinion?
- [x] **Reform**: In most Bachelor and Master programs in the social sciences, machine learning techniques are not taught. Why is that the case? How should statistical/programming/mathematical education in the social sciences change in your opinion?
> Markowetz and colleagues mention that data from digital devices can be used by the medical doctor or the psychotherapist to adjust medication doses, adjust treatment plans, etc. Since the data would be extremely complex, it would require a lot of further education and work for the doctor/psychotherapist to be able to properly judge the data and not rely on eye balling and gut feelings. Is that even realistic? Does that require an additional data scientist/statistician in the staff team to help out the doctor/psychotherapist?
- [x] **Reform**: Markowetz and colleagues mention that data from digital devices can be used by the medical doctor or the psychotherapist to adjust medication doses, adjust treatment plans, etc. Since the data would be extremely complex, it would require a lot of further education and work for the doctor/psychotherapist to be able to properly judge the data and not rely on eye balling and gut feelings. Is that even realistic? Does that require an additional data scientist/statistician in the staff team to help out the doctor/psychotherapist?
> Using a bit of exaggeration in my wording, I have the feeling that Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. I am missing a bit of modesty and reflection in this paper. As far as I remember, there was no single mention of drawbacks and difficulties of psycho-informatics. In my impression, they blindly assume that the number of social interactions on a smartphone can be directly translated into the severity of depression, to mention just one example. I have doubts that this connection can be so easily made. And I certainly think that psycho-informatics has more problems that need to be overcome. For example, there are concerns about privacy. In the paper these concerns are only mentioned briefly and are quickly trivialised. Where do you stand on the advantage-disadvantage debate?
- [x] **Reform**: Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. Maybe there is a bit of modesty and reflection missing in this paper. Drawbacks and difficulties of psycho-informatics are not treated. They simply assume that the number of social interactions on a smartphone can be directly translated into the severity of depression. Can this connection be so easily made? What are potential drawbacks and problems that psycho-informatics needs to overcome? Where is your position on the advantage-disadvantage debate?
> To what extent is the machine learning community in psychology interested in traditional quality criteria of testing, e.g., validity, reliability, objectivity?
- [x] - how much do ML-inspired psychologists/computer scientists care about psychometric qualities of instruments (validity, reliability, objectivity)?
## Manuel:
> I found Markowetz et al. relying on strawman argumentation while downplaying traditional psychometrics.
- [x] **Reform**: They attack traditional psychometrics by attributing
> They also overlook construct validity of their proposed measures (e.g. typos as indicators of stress) and take it for granted.
- [x] **Reform**: Markowetz and colleagues overlook important aspects of construct validity. Can we simply assume that typos are good indicators of stress?
> They also mention that the whole BD and ML are over-hyped buzzwords, but they still treat them as if these terms are not so overrated and see extraordinary potential in them.
- [x] **Reform**: Markowetz and colleagues make conflicting statements when they say that Big Data and Machine Learning are over-hyped buzzwords but at the same time treat these terms as not over-hyped and see extraordinary potential in them.
## Denny:
> I wonder what Katrijn thinks of the by now well known problem that a lot of bias tends to creep into ML and DL approaches because of their sole focus on prediction (biased items are usually good predictors) and because of the fact that training data are often a very selective subset (e.g. only white faces, etc.)
- [x] **Reform**: Bias tends to creep into machine learning approaches because of their sole focus on prediction (biased items are usually good predictors). Is that a problem in your opinion? Also, machine learning relies on training data, which are often a very selective subset (e.g., only white faces).
---
# Maximilian asking:
> A first question: Rob Meijer told us yesterday that a) he is in favor of standardized (and preferably validated) tests; and b) he would like to see behavior, since that is difficult to predict. This is particularly relevant for high-stakes contexts. Big data obtained via devices, as proposed by Markowetz et al. (2014), appear to provide a whole new type of behavioral data, thus accommodating his second point. But with the COTAN-criteria in mind, I'm wondering about a way to standardize and validate such behavioral measures. How urgent is a standardization and/or validation procedure for these data? Are these data currently used in high-stakes contexts? Are there any ideas on how to make these data suitable for high-stakes contexts?
- standardizing and validating behavioral measures of such data: an urgent call?
- using in high-stakes contexts: are they being used for it? will they be?
> Related to my high-stakes question, a practical example. I'm thinking of the forensic context now, but it may well apply to other contexts where self-report or observation is unlikely to give us answers we need. We do use big data to complement evidence in forensic investigation, but I don't think we're allowed to use big data as a complement in forensic psychiatric reports (not sure about it though). The former is used to evaluate guilt, while the second one evaluates the psychiatric status of the suspect. Both contribute to a decision regarding punishment. That makes me wonder about the current "legal" status of big data, and makes me ask a question similar to that of @Esther Maassen yesterday. Do we have or need legislation for the use of big data?
- legal status of BD in forensic investigations: can we use them to complement psychiatric assessment (measuring guilt vs. "diagnosis")?
> A maybe related point: There has been a lot of discussion about fairness of machine learning algorithms. To what extent do you think could the increased use of machine learning methods in psychological research (and testing!) lead to an increase in (perceived) unfairness of the procedures? How can we tackle this issue?
- are ML algorithms fair, given the hot discussion over them being "racist" and discriminating, e.g., in hiring processes?
> In psychological studies, participants often receive some feedback or a debriefing at the end. How do you explain the results of studies using big data / machine learning to participants? How would you explain the results of a psychological testing procedure that is based on Big Data?
- in psych studies we give feedback on how individuals have performed **(right?)**; how can we communicate the results of BD to participants?
> Data from digital devices and services has been used to measure personality. According to the APA, personality is defined as individual differences in characteristic patterns of thinking, feeling and behavior. I can see how you would get valid measures of behavior with tracking data on people’s digital devices. However, it seems much more natural to me to use self-reports for thoughts and feelings. Do you think you can assess thoughts and feelings in a valid way through data from digital devices and services?
- at their best, passive BD from smartphones measure behavior. However, personality is "defined as individual differences in characteristic patterns of thinking, feeling and behavior". How representative are behaviors of thoughts and emotions?
> Machine learning approaches often have a better prediction performance than standard statistical approaches. In turn, machine learning approaches oftentimes lack interpretability. Suppose an analyst is only interested in performance and not at all in interpretability. Other stakeholders might argue that interpretability is still important to justify decisions to other people. Examples would be personnel selection and medical diagnosis. How can we justify a decision if we cannot even communicate properly how the decision was made? What are your thoughts about this?
**Reform**: Machine learning approaches often have a better prediction performance than standard statistical approaches. In turn, machine learning approaches oftentimes lack interpretability. Suppose an analyst is only interested in performance and not at all in interpretability. Other stakeholders might argue that interpretability is still important to justify decisions to other people. Examples would be personnel selection and medical diagnosis. How can we justify a decision if we cannot even communicate properly how the decision was made? What are your thoughts about this?
> I cannot remember that I learned anything about machine learning techniques in my Bachelor and Master. Why is machine learning not in the standard curriculum of social sciences? How should statistical/programming/mathematical education change in your opinion?
**Reform**: In most Bachelor and Master programs in the social sciences, machine learning techniques are not taught. Why is that the case? How should statistical/programming/mathematical education in the social sciences change in your opinion?
> Markowetz and colleagues mention that data from digital devices can be used by the medical doctor or the psychotherapist to adjust medication doses, adjust treatment plans, etc. Since the data would be extremely complex, it would require a lot of further education and work for the doctor/psychotherapist to be able to properly judge the data and not rely on eye balling and gut feelings. Is that even realistic? Does that require an additional data scientist/statistician in the staff team to help out the doctor/psychotherapist?
**Reform**: Markowetz and colleagues mention that data from digital devices can be used by the medical doctor or the psychotherapist to adjust medication doses, adjust treatment plans, etc. Since the data would be extremely complex, it would require a lot of further education and work for the doctor/psychotherapist to be able to properly judge the data and not rely on eye balling and gut feelings. Is that even realistic? Does that require an additional data scientist/statistician in the staff team to help out the doctor/psychotherapist?
> I wonder what Katrijn thinks of the by now well known problem that a lot of bias tends to creep into ML and DL approaches because of their sole focus on prediction (biased items are usually good predictors) and because of the fact that training data are often a very selective subset (e.g. only white faces, etc.)
**Reform**: Bias tends to creep into machine learning approaches because of their sole focus on prediction (biased items are usually good predictors). Is that a problem in your opinion? Also, machine learning relies on training data, which are often a very selective subset (e.g., only white faces).
# Manuel asking:
> As scientists, we want to explain the world. We want to find structures, regularities, rules that describe empirical phenomena. To what extent can machine learning help us achieve this goal? If we can predict something (more or less) perfectly, but we don’t know why and our method is ignorant with regard to general rules, is the method still scientific?
- the explanatory duty of science (structures, regularities, rules that describe empirical phenomena): isn't prediction ignorant of scientific concerns?
> Brian Caffo said in his lecture that machine learning does not assume a population model. To me, this seems a little shortsighted (or maybe I’m too much on the statistics-side). How do people practicing machine learning justify the claim that their predictions should be valid for new data if there is no assumption that the new data and the old data come from the same population?
- (re Brian Caffo) no population model in ML: how can we assume validity of the results to a new dataset if we do not assume that it comes from the same population?
> Markowetz and colleagues state that their paper is based on a single central thesis: “The user’s mental state affects the way he interacts with a machine”. To what extent has this claim been tested (for situations where data from digital devices was used to infer something about a person’s characteristics) and how would you test it given the simultaneous claim that self reports are a poor measure?
- how can we validate the claim that there is a strong link between affective states and a person's behavior, if we have invalidated questionnaires and self-reports?
> There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
- [ ] **Reform**: There is a huge divergence between Breiman's statement that "[t]he goal is not interpretability, but accurate information" and what most statisticians and researchers in the social sciences think. The latter mostly prefer simple models. What is your own opinion on this matter?
> Using a bit of exaggeration in my wording, I have the feeling that Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. I am missing a bit of modesty and reflection in this paper. As far as I remember, there was no single mention of drawbacks and difficulties of psycho-informatics. In my impression, they blindly assume that the number of social interactions on a smartphone can be directly translated into the severity of depression, to mention just one example. I have doubts that this connection can be so easily made. And I certainly think that psycho-informatics has more problems that need to be overcome. For example, there are concerns about privacy. In the paper these concerns are only mentioned briefly and are quickly trivialised. Where do you stand on the advantage-disadvantage debate?
- [ ] **Reform**: Markowetz and colleagues praise psycho-informatics as a revolution that kind of solves all problems. Maybe there is a bit of modesty and reflection missing in this paper. Drawbacks and difficulties of psycho-informatics are not treated. They simply assume that the number of social interactions on a smartphone can be directly translated into the severity of depression. Can this connection be so easily made? What are potential drawbacks and problems that psycho-informatics needs to overcome? Where is your position on the advantage-disadvantage debate?
> To what extent is the machine learning community in psychology interested in traditional quality criteria of testing, e.g., validity, reliability, objectivity?
- [ ] - how much do ML-inspired psychologists/computer scientists care about psychometric qualities of instruments (validity, reliability, objectivity)?
> I found Markowetz et al. relying on strawman argumentation while downplaying traditional psychometrics.
**Reform**: They attack traditional psychometrics by attributing
> They also overlook construct validity of their proposed measures (e.g. typos as indicators of stress) and take it for granted.
**Reform**: Markowetz and colleagues overlook important aspects of construct validity. Can we simply assume that typos are good indicators of stress?
> They also mention that the whole BD and ML are over-hyped buzzwords, but they still treat them as if these terms are not so overrated and see extraordinary potential in them.
**Reform**: Markowetz and colleagues make conflicting statements when they say that Big Data and Machine Learning are over-hyped buzzwords but at the same time treat these terms as not over-hyped and see extraordinary potential in them.