# Mito reads filtering with FastK

<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/YvPVQo0MQKqagtPLTY0VKQ?both

---

A few slides on dealing with mitochondrial HiFi reads

---

## A good example:

![](https://i.imgur.com/kHjmvQ4.png)

---

**Mito reads selected by mapping against a closely related mito**

sum = 7237237, n = 816, largest = 16347, smallest = 2320, N50 = 10613

![](https://i.imgur.com/D6hdQ4z.png)

---

Make a dict of the k-mer occurrences within a read:

```
1 1302
2 428
3 330
```

![](https://i.imgur.com/lGWnR97.png)

---

Make a rule: filter out reads that contain too many low-coverage k-mers.

In the next example, a read is filtered out if more than 200 of its k-mers occur 1 to 20 times in the reads.

---

![](https://i.imgur.com/wrliNAY.png)

---

- Sea star mito with frameshift before filtering
- No frameshift after filtering

---

- `passed_no.py`: code to get only the "passed" reads

But I don't think this is the only or the best filter. Let's look at some other cases:

![](https://i.imgur.com/javpgYw.png)

---

Using the same filter for this species

![](https://i.imgur.com/JMrYOCU.png)

---

So the best filter would be:

- separate the reads around the different peaks (still studying: how to automatically determine the peaks, which ones to keep, etc.)
- then plot the profiles again and remove outliers

![](https://i.imgur.com/bGXRyXu.png)

---

But... what about the bee case?

https://hackmd.io/SdEMPFz0S1GTZH6ZJm3WqA?both=#

---

Plants (separating chloroplasts and mitos)

https://hackmd.io/XnCMhNDSSwiX7jtHdp2f9g

---

Further ideas:

Calculate the median of the k-mer coverage in each read?

---

All mito k-mer coverage

![](https://i.imgur.com/BnpoYCO.png)

---
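
A rough sketch of the filtering rule from the slides (drop a read if more than 200 of its k-mers occur 1 to 20 times), plus the "median per read" idea. This is not the content of `passed_no.py`, just an illustration. It assumes each read has already been turned into a list of per-k-mer counts (for example exported from a FastK profile; that export step is not shown). The function names, the `profiles` dict and the toy data are all hypothetical.

```python
from collections import Counter
from statistics import median

# Thresholds taken from the slide; they will likely need tuning per dataset.
LOW_COUNT_RANGE = (1, 20)  # k-mer counts treated as "low coverage"
MAX_LOW_KMERS = 200        # reads with more low-coverage k-mers are dropped


def kmer_histogram(profile):
    """Map each count value to the number of k-mers in the read with that count."""
    return Counter(profile)


def passes_filter(profile, low_range=LOW_COUNT_RANGE, max_low=MAX_LOW_KMERS):
    """True if at most `max_low` k-mers of the read have a count inside `low_range`."""
    lo, hi = low_range
    hist = kmer_histogram(profile)
    n_low = sum(n for count, n in hist.items() if lo <= count <= hi)
    return n_low <= max_low


def filter_reads(profiles):
    """Return the names of the reads that pass the filter.

    `profiles` is a hypothetical dict: read name -> list of per-k-mer counts.
    """
    return [name for name, profile in profiles.items() if passes_filter(profile)]


if __name__ == "__main__":
    # Toy profiles (made up): mostly high-count k-mers, plus some rare ones.
    profiles = {
        "read_good": [350] * 5000 + [3] * 50,
        "read_bad": [350] * 4000 + [2] * 900,
    }
    print(filter_reads(profiles))  # ['read_good']
    # "Further ideas": median k-mer coverage per read
    print({name: median(p) for name, p in profiles.items()})
```

A fixed threshold like this only makes sense when the k-mer coverage has a single main peak. For the multi-peak cases shown earlier, the reads would first have to be separated by peak.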