Negligence(過失)

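# Negligence (過失)

[toc]

----

## Outline

1. Two kinds of negligence
    - In scientific analysis: e.g., [Growth in a Time of Debt](https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt)
    - In scientific design: e.g., randomization and blinding go unreported; low-powered, small-sample studies are tolerated.

----

2. Methods to detect negligence in scientific analysis
    - statcheck: audit of 8 psychology journals, 1985 to 2013.
    - GRIM: audit of Festinger and Carlsmith's 1959 "cognitive dissonance" paper.
    - John Carlisle's error-spotting technique: audit of Yoshitaka Fujii's results.

----

3. Potential negative consequences of negligence in scientific design
    - Contaminated immortalised cell lines in biology experiments.
    - Animal studies that do not routinely report how randomization and blinding were carried out.

---

### Negligence in scientific analysis

- [Reinhart and Rogoff (2010)](https://www.aeaweb.org/articles?id=10.1257/aer.100.2.573) compiled annual debt-to-GDP ratios across countries and reported that once the ratio exceeds 90%, GDP growth turns negative.
- The claim was promoted by prominent political economists and widely cited by finance ministers of advanced economies when setting national policy.
- After numerical errors were found, the corrected analysis showed that GDP still grows when the ratio exceeds 90%, only more slowly [(Shuchman, 2013)](https://www.forbes.com/sites/realspin/2013/04/18/that-reinhart-and-rogoff-committed-a-spreadsheet-error-completely-misses-the-point/?sh=382dc83a37e2).

----

#### Demonstration video

{%youtube ItGMz0ERvcw %}

----

#### [statcheck](http://statcheck.io/) detects statistical reporting flaws in psychology journal papers

- [Nuijten, Hartgerink, van Assen, et al. (2016)](https://doi.org/10.3758/s13428-015-0664-2)

Top panel: for each journal, the average proportion of papers with at least one miscalculated p-value. Bottom panel: the average proportion of papers in which a miscalculated p-value changes the conclusion.

![](https://media.springernature.com/full/springer-static/image/art%3A10.3758%2Fs13428-015-0664-2/MediaObjects/13428_2015_664_Fig3_HTML.gif =400x)

----

#### [GRIM test](https://en.wikipedia.org/wiki/GRIM_test) catches errors that stood for half a century

- The classic cognitive dissonance experiment [(Festinger and Carlsmith, 1959)](https://psycnet.apa.org/doiLanding?doi=10.1037%2Fh0041593).

![](https://i.imgur.com/NVMp8Fc.png)

[Heino (2016)](https://mattiheino.com/tag/social-influence/)

----

| Ones digit | Divided by 20 |
|---|---|
| 0 | 0.00 |
| 1 | 0.05 |
| 2 | 0.10 |
| 3 | 0.15 |
| 4 | 0.20 |
| 5 | 0.25 |
| 6 | 0.30 |
| 7 | 0.35 |
| 8 | 0.40 |
| 9 | 0.45 |

----

#### [Scientists Make Mistakes. I Made a Big One.](https://elemental.medium.com/when-science-needs-self-correcting-a130eacb4235)

|![](https://miro.medium.com/fit/c/131/131/0*7Rr80OqJ8zu9tdk4)|![](https://i.imgur.com/ZPOQYVv.png)|
|---|---|

![](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.mm.bing.net%2Fth%3Fid%3DOIP.X-b4YqzGcczTg-rnYLBKTgHaE5%26pid%3DApi&f=1 =300x)

----

### Why do we need to develop data error-detection techniques?

- Most scholars publish papers without opening their data, which violates the norm of communism (scientific knowledge as common property).
- Withholding part of the data misleads the evaluation by peer experts, which violates the norm of organized skepticism.
- **If every scholar shared their data openly, would error-detection techniques still be necessary?**

---

### Routine negligence caused by scientific community conventions

----

#### [Non-random sampling detection method](https://associationofanaesthetists-publications.onlinelibrary.wiley.com/doi/full/10.1111/anae.13938)

- Carlisle (2017)
- 5% of anaesthesia studies were not based on genuinely randomized sample data.

----

##### Cancer cell line culture

- Invented in the 1950s to provide experimental samples for developing anti-cancer drugs.
- Many laboratories adopted the technique, but a large volume of data errors accumulated from handling mistakes and sample contamination.
- A workgroup of scholars, the [American Type Culture Collection Standards Development Organization Workgroup ASN-0002](https://www.nature.com/articles/nrc2852#citeas), conducted a review; its 2010 report concluded that the errors had become **too entrenched to undo easily**.

----

- How unreliable are cancer cell lines in China? [(Huang, Liu, Zheng, and Shen, 2017)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170384)
- Stuart Ritchie followed the workgroup's efforts after 2010 and found that these scholars borrowed the approach of air-crash investigations to reconstruct the causes of data errors in cancer cell line research and to improve working practices across the research environment.

----

##### Design flaws hidden in routine animal-experiment practice

- [Macleod et al. (2015)](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002273) surveyed the procedures reported in 149 animal studies.
    - Only 25% reported details of the randomization method.
    - About 30% reported how blinding was carried out.
    - Only 0.7% reported the number of experimental animals used.
- Animal experiments typically use sample sizes of no more than 20.
- This is a source of p-hacking.

---

### The problem with small samples

----

- Interactive demo: [Distribution of Cohen's d, p-values, and power curves for an independent two-tailed t-test](https://shiny.ieis.tue.nl/d_p_power/)
    - Suggested settings: Cohen's d (δ) = 0.5; alpha = 0.05; p-value (lower limit) = 0.
    - Step "Participants per group" through 20, 30, 40, ..., 100.
    - Watch how the number changes in this sentence: "you can expect XXXX% of p-values to fall in the selected area between p = 0 and p = 0.05 ."

----

- Reflect on what the interactive demo shows: if the difference to be tested experimentally is 0.5, which series of experiments would you trust?
    - 100 experiments with 20 participants per group
    - 100 experiments with 40 participants per group
    - 100 experiments with 60 participants per group
    - 100 experiments with 80 participants per group

----

#### How large are effect sizes in psychology research?

![](https://www.frontiersin.org/files/Articles/442717/fpsyg-10-00813-HTML/image_m/fpsyg-10-00813-g003.jpg =600x)

[Schäfer and Schwarz (2019)](https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00813/full)

----

#### How low is statistical power in psychology research?

![](https://i.imgur.com/nAQFkhf.png =600x)

Cumulative distribution of studies falling below a given power, by field. From [Szucs and Ioannidis (2017)](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2000797)

----

#### Problems caused by a literature full of low-powered studies

1. Research topics with small effects are overestimated.
2. Non-significant results are more likely to be rejected.
3. Publication bias in the literature increases.

----

#### Research on the heritability of human traits

- Association between **5-HT2a** and human memory performance [(de Quervain, Henke, Aerni, et al., 2003)](https://www.nature.com/articles/nn1146)
- According to Stuart Ritchie, **5-HT2a** was a hot topic in academia around 2005 but attracted little interest by 2014.
- Over that decade, genetic research methods improved and attainable sample sizes grew. Later studies using the higher-powered GWAS approach [(Marioni et al., 2018)](https://www.nature.com/articles/s41398-018-0150-6) confirmed that the 2003 study had overestimated the association.

---

### Where does negligence in scientific analysis and design come from?

- Habits shaped during scientists' training
- The peer review system overlooks or fails to detect these kinds of negligence
    - Attention goes to whether a paper's argument persuades the reviewers, and little goes to data accuracy or design flaws
- Research community culture has drifted away from the [scientific ethos](https://hackmd.io/@tcpsr/H1JWzwNyd#/1/6)

---

#### Glossary

stylised fact (典型化事實), Wikipedia: https://en.wikipedia.org/wiki/Stylized_fact

###### tags:`2021 Book` `Journal club`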
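----

The "divided by 20" table on the GRIM slide can be turned into a small check. The sketch below is only an illustration of the idea, not the procedure used in the Festinger and Carlsmith audit: it assumes every response is an integer score, and the function name `grim_consistent` and the example means are hypothetical, chosen for demonstration rather than taken from the 1959 paper.

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if a mean of n integer scores could round to reported_mean.

    Assumption: every individual score is an integer, so the sum of the
    scores is an integer and the true mean must equal (integer total) / n.
    """
    # Largest rounding error allowed at the reported precision,
    # e.g. 0.005 when means are reported to two decimals.
    half_unit = 0.5 * 10 ** (-decimals)
    # The integer total closest to reported_mean * n gives the closest
    # achievable mean; if even that one is too far away, none will fit.
    nearest_total = round(reported_mean * n)
    closest_mean = nearest_total / n
    # The tiny epsilon absorbs floating-point noise in the comparison.
    return abs(closest_mean - reported_mean) <= half_unit + 1e-9


if __name__ == "__main__":
    # Hypothetical reported means for a group of 20 participants.
    for mean in (1.35, 3.08):
        verdict = "possible" if grim_consistent(mean, 20) else "impossible"
        print(f"mean {mean:.2f} with n = 20 is {verdict}")
```

With 20 participants, only means whose decimals are multiples of 0.05 pass, which is what the table lists. The check loses its bite once `n` is large enough that every two-decimal mean is reachable.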
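----

The interactive demo on small samples can be approximated offline. This is a minimal sketch under a stated assumption: it uses a normal approximation to the two-tailed independent t-test, so its numbers will be close to, but not identical with, what the Shiny app reports from the exact t distribution. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-group test for effect size d.

    Assumption: the test statistic is treated as normally distributed,
    which slightly overstates power for small groups compared with the
    exact noncentral t calculation.
    """
    std_normal = NormalDist()
    # Critical value for a two-tailed test at the given alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d
    # and both groups have n_per_group participants.
    shift = d * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same settings as suggested for the demo: d = 0.5, alpha = 0.05.
    for n in range(20, 101, 10):
        print(f"n per group = {n:3d}  approx. power = {approx_power(0.5, n):.2f}")
```

Reading the printed column from top to bottom mirrors stepping "Participants per group" in the demo: the share of experiments expected to reach p < 0.05 rises with group size.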