Data Dojo Würzburg 34

DataDojo@Lunch - live

June 2025

  • When: Wednesday, June 18th, 2025 at 11:00am until 12:30pm (90 minutes)
  • Where: CCTB or online (ask for hybrid setup and link)
  • Info: DataDojo Website, Repo

Towards DataDojo 2.0

At the CCTB general assembly in April 2025, we decided to make some changes to the DataDojo to make it more valuable and fun. This is what we'll try first:

  • everyone reads the two-page Points of Significance paper Importance of being uncertain before the event
  • on May 7 we meet at 11 in the large seminar room (regular seminar time) and discuss the content of the paper for roughly 15 minutes
  • then we split into pairs - each pair works together on one machine to reproduce one of the figures (probably Fig. 2) for roughly 60 minutes - I will provide the required data
  • then we meet again and every pair shares their result and lessons learned. This takes another 15 minutes

As before, it will be hands-on and last 90 minutes, so feel free to bring your lunch. But it will be more focused and use smaller groups.

Assign pairs

Currently a dumb version (does not take experience level and language preferences into account).

"Caro Sascha Axel Joana Felix Jannis Magdalena Markus"
⍉⬚""↯⊟2⌈÷2⊸⧻°⍆⊜□⊸≠@ 
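For readers who do not use Uiua, the same idea (shuffle the names, then split them into groups of two) can be sketched in Python. This is only an illustration, not the code used for the dojo: the function name assign_pairs and its seed parameter are made up here, and with an odd number of names the last person is left alone instead of being paired with an empty string.

```python
import random

def assign_pairs(names, seed=None):
    """Shuffle the names and split them into consecutive pairs.

    Hypothetical helper: with an odd number of names, the last
    group contains a single person.
    """
    rng = random.Random(seed)      # seed only makes the draw repeatable
    shuffled = list(names)         # copy, so the input list is not modified
    rng.shuffle(shuffled)
    return [shuffled[i:i + 2] for i in range(0, len(shuffled), 2)]

names = "Caro Sascha Axel Joana Felix Jannis Magdalena Markus".split()
for pair in assign_pairs(names):
    print(" & ".join(pair))
```

Like the version above, this ignores experience level and language preferences.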

Dataset

PoS - Error Bars

Please read Error Bars before the dojo.

Comprehension Questions

Make them answerable with yes/no or similar to make the first part more interactive

  • what are the three most common types of error bars?
  • which of the error bar types does not tell us anything about our measurement uncertainty?
  • True or false:
    • if the s.e.m. error bar of a sample does not include 0, then the mean of the underlying population is significantly different from 0
    • if the 95% CI of two samples overlap, the difference between the means of those groups is not significant
    • for large sample sizes (n>15) the length of the 95% CI is roughly twice the s.e.m. length
    • if we take two samples with the same sample size from the same distribution, the mean of the second has a 95% chance to fall within the 95% CI of the first (and vice-versa)

Task

Reproduce Figure 2a using simulations.

To calculate the width of a 95% CI for a sample of size n drawn from a normal distribution:

  • calculate the standard error of the mean (s.e.m.) for the sample (sd(x)/sqrt(n))
    • (if you use np.std in the calculation, set ddof=1)
  • calculate the critical t-value for n-1 degrees of freedom at a probability of 0.975 (it depends only on n, not on the specific sample)
    • R (qt(.975, n-1))
    • Julia with Distributions (quantile(TDist(n-1), .975))
    • python with scipy (scipy.stats.t.ppf(.975, n-1))
  • multiply the s.e.m. by the t-value; the product is the distance from the sample mean to each end of the 95% CI
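The steps above can be put together in a few lines of Python. This is a minimal sketch assuming NumPy and SciPy are installed; the function name ci95_half_width and the example sample (10 draws from a standard normal distribution) are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

def ci95_half_width(x):
    """Return t * s.e.m., the distance from the sample mean to each CI end."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sem = np.std(x, ddof=1) / np.sqrt(n)   # ddof=1 gives the sample sd
    t_crit = stats.t.ppf(0.975, n - 1)     # depends only on n
    return t_crit * sem

# Illustrative sample: population and sample size are assumptions
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=10)
half = ci95_half_width(sample)
mean = sample.mean()
print(f"mean = {mean:.3f}, 95% CI = [{mean - half:.3f}, {mean + half:.3f}]")
```

The interval itself runs from mean - half to mean + half, so its full width is twice the returned value.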

Getting started

Further ideas

  • for each sample, calculate the fraction of 100 other random samples of the same size whose means fall within the 95% CI of the first - what is the distribution of these fractions if you repeat this many times?
  • reproduce any of the other figures
  • make your figures interactive
  • add explanatory text
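As a possible starting point for the first idea, here is a rough Python sketch using NumPy and SciPy. Everything specific in it is an assumption made for illustration: the function name capture_fractions, the standard normal population, the sample size n=10, and the 1000 repetitions.

```python
import numpy as np
from scipy import stats

def capture_fractions(n=10, n_other=100, n_repeats=1000, seed=0):
    """For each of n_repeats first samples, return the fraction of
    n_other further sample means that lie inside its 95% CI."""
    rng = np.random.default_rng(seed)
    t_crit = stats.t.ppf(0.975, n - 1)

    # First samples: one row per repetition
    first = rng.normal(size=(n_repeats, n))
    mean = first.mean(axis=1)
    half = t_crit * first.std(axis=1, ddof=1) / np.sqrt(n)

    # Means of n_other further samples per repetition
    other_means = rng.normal(size=(n_repeats, n_other, n)).mean(axis=2)

    # True where a further mean lies within the first sample's CI
    inside = np.abs(other_means - mean[:, None]) <= half[:, None]
    return inside.mean(axis=1)

fractions = capture_fractions()
print(f"mean fraction = {fractions.mean():.3f}")
print("quartiles:", np.quantile(fractions, [0.25, 0.5, 0.75]))
```

Plotting a histogram of fractions shows the distribution asked about; changing n or the population is a natural next experiment.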

Collaborative Tools and Workflow

Use your own device or CoCalc. Free choice of programming language, libraries, and tools.

Future Suggestions

Feel free to add suggestions to the list.

Points of Significance

Go through the papers of the Points of Significance series.

Points of View

Go through the papers of the Points of View series.

Medical Statistics

The Medical Statistics series consists of 14 reviews in the journal BMC Critical Care.

Book club

Go through chapters of one of these books

Coding Dojo Katas

With a stronger focus on coding rather than data analysis, there is a nice collection of Katas.