Data Dojo Würzburg 17

DataDojo@Lunch - live

October 2022

  • When: Thursday, October 13th, 2022 at 11:30am until 1:00pm Wednesday, October 5th, 2022 at 11:00am until 12:30pm (90 minutes)
  • Where: CCTB and online (CCTB Seminar Zoom Link)
  • Info: DataDojo Website, Repo

Participants

Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and remove it again if you cannot make it. Feel free to add your preferred tool or programming language.

  • Markus (python/scikit-learn)
  • » Add your name here «

Dataset

Machine Learning Series

We are doing a series of Data Dojos on machine learning. The task is to classify tree species by their traits (e.g. height, stem diameter, geographic location).

We use a subset of the recently published database: Tallo

The full dataset contains measurements for almost 500k individual trees from more than 5k species.

In the first dojo of the series, we filtered the full set to 3 species with reasonable overlap (Fagus sylvatica, Pinus pinaster, Quercus ilex). Now we want to try different Machine Learning methods to classify tree species from traits.

In the second dojo we created our first models: a very simple "Majority Vote" model and some K-Nearest-Neighbor (KNN) models with scikit-learn.
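As a reminder of what a "Majority Vote" baseline does, here is a minimal sketch in plain Python. It is not the code from the dojo: the helper names `fit_majority` and `accuracy` and the label counts are made up for illustration, and the toy labels are not taken from Tallo. The model ignores all traits and always predicts the most frequent species in the training data, which gives a floor that any real model should beat.

```python
from collections import Counter

def fit_majority(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy labels (counts are invented, not from the Tallo subset)
train_labels = ["Fagus sylvatica"] * 5 + ["Pinus pinaster"] * 3 + ["Quercus ilex"] * 2
test_labels = ["Fagus sylvatica", "Pinus pinaster", "Fagus sylvatica", "Quercus ilex"]

majority = fit_majority(train_labels)
predictions = [majority] * len(test_labels)  # same answer for every tree
baseline_accuracy = accuracy(test_labels, predictions)
print(majority, baseline_accuracy)
```

In scikit-learn, `DummyClassifier(strategy="most_frequent")` should behave in the same way.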

In the third dojo we explored the effect of scaling on the performance of the KNN models.
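Why scaling matters for KNN can be seen in a small sketch. The numbers below are invented (not Tallo measurements), the units are deliberately mismatched, and the helper names `nearest_label` and `standardize` are hypothetical. With raw values, the Euclidean distance is dominated by the feature with the larger numeric range; after standardizing each feature with the training mean and standard deviation, both features contribute comparably.

```python
import math
from statistics import mean, pstdev

def nearest_label(train_X, train_y, query):
    """1-nearest-neighbor: label of the closest training point (Euclidean)."""
    distances = [math.dist(row, query) for row in train_X]
    return train_y[distances.index(min(distances))]

def standardize(train_X, rows):
    """Scale each column of rows to zero mean and unit variance, using train_X statistics."""
    columns = list(zip(*train_X))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]

# Hypothetical toy data: (height in cm, stem diameter in m).
# The species differ in diameter; height is uninformative here.
train_X = [(1000, 0.10), (2100, 0.12), (3000, 0.11),
           (1500, 0.50), (2600, 0.52), (3500, 0.51)]
train_y = ["species_a"] * 3 + ["species_b"] * 3
query = (2050, 0.49)  # diameter suggests species_b

unscaled_prediction = nearest_label(train_X, train_y, query)

scaled_train = standardize(train_X, train_X)
scaled_query = standardize(train_X, [query])[0]
scaled_prediction = nearest_label(scaled_train, train_y, scaled_query)

print(unscaled_prediction, scaled_prediction)
```

The scaling statistics are computed on the training data only and then applied to the query, which mirrors fitting a scaler on the training set before transforming new samples.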

Session 4 - Decision Trees

Question Pool:

  • Generic
    • What is supervised machine learning?
    • How to evaluate the performance of our model(s)?
    • What kinds of (classical) models exist?
  • Specific
    • Manually build a decision tree
    • Create and evaluate a decision tree model in scikit-learn
    • Visualize a decision tree
    • Add your own questions
  • Further Ideas
    • TBD
    • Add your own ideas
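For the "manually build a decision tree" question, one possible starting point is to find a single split by hand. The sketch below assumes Gini impurity as the split criterion (one common choice, and the default in scikit-learn's `DecisionTreeClassifier`); the helper names `gini` and `best_split` and the toy heights are made up for illustration. It tries thresholds midway between neighboring values of one feature and keeps the one with the lowest weighted impurity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best single split on one feature."""
    best = (None, gini(labels))  # no split: impurity of the whole node
    unique = sorted(set(values))
    for lo, hi in zip(unique, unique[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical toy heights in metres (not Tallo measurements)
heights = [8.0, 10.0, 12.0, 25.0, 28.0, 30.0]
species = ["species_a"] * 3 + ["species_b"] * 3

root_impurity = gini(species)
threshold, impurity = best_split(heights, species)
print(root_impurity, threshold, impurity)
```

A full tree repeats this search over all features and then recursively on the left and right subsets until the nodes are pure or a depth limit is reached.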

Collaborative Tools and Workflow

For notebooks (R, Python, Julia, JS, etc.) with real-time collaboration, CoCalc seems to be the best option right now. It worked great the last couple of times, so we'll stick to it for now. You need to register an account there (it is free).

Future Suggestions

Add your suggestions to the list and add a mark to the end of any line you are interested in.

Data Sets

Tools/Languages

Skills

  • interactive maps
  • dashboards
  • animations

Data Sources

All data types are welcome, including tables, images, videos, sounds, DNA, etc.
