# Day *2* Q&A
<!--- I remind you that these documents will be uploaded to the repository branch that will be created and that the NBIS training code of conduct should be followed. Be respectful to each other so you do not edit others posts. Hack md alows for simultaneous editing. -->
## General questions
- Q: The schedule shows there is no coffee break, is that true?
- A: You'll get coffee breaks :)
- Q:
- A:
- Q:
- A:
- Q:
- A:
---------------------------------------------------
## Intro & fundamentals
- Q: Could you give an example, such as the Google search engine? What type of algorithm does Google use?
- A: They use a graph-based algorithm called PageRank: https://en.wikipedia.org/wiki/PageRank. It seems to be only one of several algorithms in use nowadays.
- A: A small comment: not all displayed pages follow the algorithm, but that is part of their business model.
- Q: Find largest POSITIVE integer?
- A: I think in this example, yes, but in general this could be used for anything that is “comparable”. In Python, for example, this means the object needs to have a `__lt__` (less than) dunder method implemented.
- Comment: de Bruijn Graphs are used in genome assemblies
- Comment: How Fast Do Algorithms Improve?: https://ieeexplore.ieee.org/document/9540991
- Comment: Dijkstra's two-stack algorithm is a very simple and cool algorithm for evaluating mathematical expressions, just using two stacks in a clever way: https://switzerb.github.io/imposter/algorithms/2021/01/12/dijkstra-two-stack.html
- Comment: One of the postdocs in my previous group wrote an algorithm that can solve the electrostatics of N particles in linear time (though it is typically order $N^2$ because the particles 'speak to each other'). More info can be found here: https://www.sciencedirect.com/science/article/pii/S0021999120301534. We are now trying to implement it in my new group in Uppsala too; in case you're interested in knowing more, let me know: thijs.smolders@kemi.uu.se.
- Q: At around what N do we usually see a stack overflow here (recursive Fibonacci sequence)?
- A: It depends on the memory of your computer. Some programming languages also have a "maximum recursion depth" which is a guard against stack overflow. /Matias
- Q: If big-O notation is based on the algorithmic steps, written in pseudocode, how can we know for sure what constitutes a "single" step? I.e. looking up the index of a list was just described as a single step; does this hold true for the way the computer processes the request?
- A: Good question! In big-O notation we are interested in which "step" dominates the overall complexity. In an iterative algorithm this usually means how many times the innermost loop is executed for a given N. Looking up an index in an array is a constant-time operation on the array data structure. However, resizing an array when elements are added or removed is often done with an algorithm where, e.g., the array's capacity is doubled each time it gets full. This means we don't need to copy N items each time a new element is added. Does this sort of answer your question? /Matias //I think so!, thanks
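
To make the array-doubling argument concrete, here is a minimal Python sketch. `DynamicArray`, its `copies` counter and the helper `copies_for` are hypothetical names made up for this illustration (Python's built-in `list` already resizes itself behind the scenes). The sketch counts how many element copies the resizing causes when N items are appended one by one; with doubling, the total stays below 2N, so the cost per append is constant on average.

```python
class DynamicArray:
    """Toy growable array that doubles its capacity when full (illustration only)."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # element copies caused by resizing so far

    def append(self, item):
        if self.size == self.capacity:
            self._resize(2 * self.capacity)  # double instead of growing by one
        self.slots[self.size] = item
        self.size += 1

    def _resize(self, new_capacity):
        new_slots = [None] * new_capacity
        for i in range(self.size):
            new_slots[i] = self.slots[i]
            self.copies += 1
        self.slots = new_slots
        self.capacity = new_capacity

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.slots[index]  # index lookup is a single constant-time step


def copies_for(n):
    """Append n items and return the total number of copies made while resizing."""
    arr = DynamicArray()
    for value in range(n):
        arr.append(value)
    return arr.copies
```

For example, `copies_for(8)` gives 7 copies in total, compared with 1 + 2 + ... + 7 = 28 if the array grew by one slot at a time.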
## Data structures
- Common data structures:
  - Class
  - Dictionary
  - array
  - heap
  - graphs
  - red-black tree
  - linked list
  - matrix
  - hash table
  - set
  - queue
  - binary tree
- Q: Would you also consider a 3-d array a type of data structure?
- A: Yes, I would say so; it's a generalization of a 1-d array. In the machine learning literature it would probably be referred to as a tensor data structure. /Matias
- A: As above, a 3-d array can also be seen as a data cube, an extension of a matrix (which is a 2-d array). Arrays can of course be of any order n, built as arrays of arrays and so on.
- Q: In one application, can we combine multiple data structures? E.g., can I use both tree and array at the same time?
- A: Can you elaborate on what you mean? In general, an application will contain multiple data structures. A given data structure can usually be nested; for example, a node in a binary tree could hold an array. More commonly, a class contains properties that are themselves classes or other data structures. The same data can also be represented in multiple data structures, but once you start converting between data structures you need to be careful about the design you are implementing.
- Q: A tree is a graph, earlier you said (if I understood correctly) a graph was a data type, how is a tree a data structure then?
- A: You are correct in that a tree is a type of graph. In particular, it's a graph without any cycles. I think what Marcus meant (but he might correct me) is that a graph is a very general mathematical object. I would however call both a graph and a tree a data structure, each of which can have different implementations. For example, a graph can be implemented using a linked list, an adjacency matrix or an adjacency list. /Matias
- Comment: Amortized cost is the average time per operation over a sequence of operations. The canonical example is array resizing. /Matias
- Q: Why is a matrix a data structure and not a data type? Especially if it is just represented as an array in memory?
- A: In the literature, for example the book Intermediate Problem Solving and Data Structures by Helman et al., the two are used interchangeably; Knuth (The Art of Computer Programming, vol. 1, Fundamental Algorithms, third ed.) calls them information structures. In general I would classify a data type as an implementation of a data structure. One could also argue that data types are primitives, or that a data structure needs to have a relation to other data. /Lars
- Q: (Talking about binary trees) Does this relate to depth-first vs breadth-first search?
- A: Yes.
- A: They can also be used in more general graphs. In an unweighted graph (directed or undirected), BFS will also give you the shortest path from a start vertex to every other vertex in linear time (faster than Dijkstra's) :) /Matias
- Q: This is a super primitive question, so sorry about that, but what is the definition of "complexity" in the assignment? Also, what kind of information can I take as a given for a tree data structure?
- A: Not at all primitive! It should be understood as time complexity, i.e. how long it takes to run the algorithm. A binary tree is a graph without cycles, where each node has at most two children. /Matias
- Q: Maybe trivial, but when we talk about time complexity are we referring to runtime, or to an abstract measure?
- A: Yes we are, but in a mathematical sense, as in how many times something that takes some constant time would be executed by your computer. The actual running time will of course still vary depending on architecture and programming language. However, the beauty of mathematical analysis like this is that if you can, for example, show that an algorithm takes on the order of $2^N$ time, then it will probably be prohibitively slow independent of other factors. /Matias
- Comment: Menti does not show indentations
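
The BFS remark above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the graph is unweighted and stored as an adjacency list (a dictionary mapping each vertex to a list of neighbours), and the function name `bfs_distances` and the example `tree` are made up here. Each vertex is put on the queue at most once and each edge is looked at once, which is where the linear running time comes from.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return {vertex: number of edges on a shortest path from start}."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph.get(vertex, []):
            if neighbour not in distances:  # first visit is via a shortest path
                distances[neighbour] = distances[vertex] + 1
                queue.append(neighbour)
    return distances


# Hypothetical example: a small binary tree stored as an adjacency list.
tree = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["f"],
    "d": [],
    "e": [],
    "f": [],
}

print(bfs_distances(tree, "a"))
```

Vertices that cannot be reached from `start` simply do not appear in the result.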
## Searching
- Q:
- A:
## Sorting
- Comment: https://www.youtube.com/playlist?list=PLOmdoKois7_FK-ySGwHBkltzB11snW7KQ How to visualise different sorting algorithms as Hungarian folk dances. (Less risk of death from audio overload.) (The last one on the list is this representation.)
https://www.youtube.com/watch?v=ibtN8rY7V5k
- Comment: Link to the sorting Canvas page: https://uppsala.instructure.com/courses/69215/pages/sorting?module_item_id=611182
- Q: Why is the condition i > 0 necessary in the insertion sort pseudocode? As it is, it seems to me that i will always be greater than 1, am I missing something? Did you mean i = i - 1?
- A: You're right (I think). I believe this is a mistake caused by mixing the computer science and mathematical conventions for the index of the first element of an array, i.e. whether it starts at 0 or 1. It should be i = j - 1. If the outer loop is instead set to cover the range 1..len(array), the code should run in e.g. Python. /Matias
- Q: In the heap sort pseudocode iRightChild is not used. Is child+1 the equivalent?
- A: Yes it is. In general, for a heap stored in a zero-indexed array, the children of node $i$ are at indices $left = 2i + 1$ and $right = 2i + 2$. /Matias
- Q:
- A:
- Q:
- A:
- Q:
- A:
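
As a follow-up to the insertion sort question above, here is a sketch of how the corrected pseudocode could look in Python. The course pseudocode itself is not reproduced in these notes, so the variable names (`j` for the outer loop, `i` for the inner loop, `key` for the element being inserted) are an assumption based on the question. Note that with zero-based indexing the guard becomes `i >= 0`; the `i > 0` form belongs to one-based indexing.

```python
def insertion_sort(array):
    """Sort a list in place with insertion sort and return it."""
    for j in range(1, len(array)):  # outer loop covers 1..len(array) - 1
        key = array[j]
        i = j - 1  # the fix discussed above: start just left of j
        # Shift larger elements one step right until key's position is found.
        while i >= 0 and array[i] > key:
            array[i + 1] = array[i]
            i = i - 1
        array[i + 1] = key
    return array


print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

Without the `i >= 0` guard the inner loop would run past the start of the list whenever `key` is smaller than everything to its left.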
## BLAST
- Comment: The *Needleman-Wunsch* pseudocode for calculating the alignment includes the call that calculates F, so it is easy to assume that assembling the alignment is the more complex step. The question, however, was about comparing the cost of calculating F with the cost of assembling the alignment once F has been calculated.
- Q: This question is not really related to the algorithms themselves, but about the resulting alignments from the *Smith Waterman* or *Needleman-Wunsch* algorithms, can we compare the scores between different alignments? What would be a "good" score? Is it decided by a rule of thumb?
- A: Marcus: You can absolutely compare the score between different alignments, because this is how you determine which pair of sequences are a "match". In constructing phylogenetic trees, many matches are made and their scores are used to reconstruct evolutionary distance (kind of). As to what is "good" — that depends on the specifics of the case. An actual bioinformatician will know more.
- A: Jonas: As long as you are using the same algorithm, a better alignment will get a higher score than a worse one. To get a sense of what a "good" score is, you can align a sequence to itself to see the maximum score. Scoring values and sequence lengths make it hard to determine an objectively "good" score.
- Great, thanks!!
- Q:
- A:
- Q:
- A:
- Q:
- A:
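
To illustrate the two points above (filling F versus assembling the alignment, and self-alignment as a reference for the maximum score), here is a minimal *Needleman-Wunsch* sketch in Python. The scoring values (`MATCH = 1`, `MISMATCH = -1`, `GAP = -1`) and all function names are assumptions chosen for this example, not taken from the course material. Filling F visits every cell, so it takes on the order of len(a) * len(b) steps, while the traceback takes at most len(a) + len(b) steps.

```python
MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scoring scheme for this sketch


def substitution(x, y):
    return MATCH if x == y else MISMATCH


def score_matrix(a, b):
    """Fill the (len(a) + 1) x (len(b) + 1) matrix F, one cell at a time."""
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        F[i][0] = i * GAP
    for j in range(1, len(b) + 1):
        F[0][j] = j * GAP
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + substitution(a[i - 1], b[j - 1]),
                F[i - 1][j] + GAP,
                F[i][j - 1] + GAP,
            )
    return F


def traceback(a, b, F):
    """Walk back from the bottom-right cell of F to assemble one optimal alignment."""
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + substitution(a[i - 1], b[j - 1]):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def align(a, b):
    """Return (score, (aligned_a, aligned_b)) for a global alignment of a and b."""
    F = score_matrix(a, b)
    return F[len(a)][len(b)], traceback(a, b, F)


print(align("GATTACA", "GATTACA"))  # self-alignment: the maximum possible score
print(align("GCATGCU", "GATTACA"))
```

With this scoring scheme, aligning a sequence to itself scores its length, which gives a reference point for judging other alignments of that sequence.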
## Parallelism
- Comment: Merge sort in parallel: https://www.mcs.anl.gov/~itf/dbpp/text/node127.html
- Q: Now that SoC architectures (like the ARM-based M1) are becoming more popular, will that help us worry less about memory management issues?
- A:
- Q:
- A:
- Q:
- A:
- Q:
- A:
## Questions above this line
-----------------------------------------------------------------
# Day 2 feedback
- F: It would be quite useful to talk about multi-platform work tomorrow, for instance covering some basic principles. I also have a collaborative project with NBIS and have found that cross-platform compatibility really matters. (I am technically quite comfortable with Unix and Linux, but within the team we have Windows users as well.)
- F: Good pacing, but a bit short. Could have done with some more examples of how to think about algorithms and calculate complexity.
- F: