# Thesis YE
## Chapter 1
line 15: Explain the interest of virtualization: it is not only for legacy software applications. Also explain that you can have RT guest OSs and non-RT guest OSs.
line 30: Talk about several OSs instead of the RTOS only. The problem that you explain is also encountered in VMs.
lines 32 and 28 express the same idea!
line 35: Why are we talking about IoT? Many other domains can make use of virtualization.
line 38: I don't understand
> thereby significantly optimizes the use of resources and contributes to improve computing performances.
## Chapter 2
line 10: What is the role of this sentence? It does not bring anything here...
> Even though hardware context switch has been a research subject for over a decade, there are still some challenges that need to be overcome, especially in terms of preempting hardware accelerators.
line 17: I don't understand:
>In addition to existing Central Processing Units (CPU), applications parallelism is also possible with the supports from appropriate computer hardware peripherals
>
line 22: Is the FPGA a hardware accelerator, or an IP implemented in the FPGA?
line 26: I guess you can find more references, and more pertinent ones, related to reconfigurable architectures.
line 28: I don't understand:
>An FPGA is an integrated circuit that provides the probability of digital circuit implementation after fabrication
>
line 40: Not always true...
> FPGAs configuration in a reconfigurable architecture is typically performed by a CPU.
>
line 54: I don't see the logical link between paragraphs 54, 55, and 56.
line 57: Already said? Why here?
>These architectures enable high-performance applications through interleaved hardware and software execution. Coupling is very important because slow communication can severely affect the potential advantages of acceleration\cite{Vipin}.
lines 82 to 103: There is no link with the previous paragraphs. I don't understand the logic or the interest.
line 127: This could be transferred to the introduction.
line 171: I don't understand the following sentence:
> Today, the term “virtual machine” is used for static architectures which provide support for accelerators and are often referred to as Hypervisor for virtual FPGAs (vFPGAs) \cite{Knodel}.
>
line 179: The link with the next section (DPR resource virtualization) is not clear.
line 211: I don't understand:
>Works in such domain tend to consider the FPGA accelerators as static coprocessors that servers for multiple virtual machines
>
line 298: Open contradiction
>A computer with a CPU core can obviously perform multiple tasks at the same time, that is, software tasks run in parallel. But that is not the case.
>
line 327: I don't understand:
>Due to the reconfigurable feature of FPGAs, in a reconfigurable architecture, it is common to reuse a third-party or legacy IP because most FPGAs are dedicated to specific functions
>
line 337: I don't understand:
>Such FPGAs have more complex circuits, which is essential for their functions as hardware IPs or accelerators in reconfigurable computing systems.
line 413: I don't understand:
> Since the readback technique cannot extract a completely relevant context, resulting in a large data footprint
>
line 487: ????
>Unfortunately, these existing methods allow multiple hardware tasks preemption but operating systems are not supported.
>
## Chapter 3
line 10: I don't understand:
>including extensive factors into consideration
>
line 94: not clear
> Ker-ONE is responsible for guaranteeing real-time constraints with no or at least minimal modification of the original RTOS scheduling settings.
>
line 236: I don't understand:
> In this section, we discuss how to efficiently explore the design space for appropriate designations.
>
line 499: I am not sure that it is true
> This is because the amount of resources in a PR is linearly related to the size of reconfiguration bitstream.
>
line 1095: not a sentence
> based on the new task model, using a schedulability-check tool to check the updated real-time tasks
>
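Also, if a standard test is meant here, it would help to spell it out. For instance, a classical utilization-bound check would read as follows (an assumption on my part; the thesis may instead use response-time analysis; $C_i$ is the WCET of task $i$, including any reconfiguration overhead, and $T_i$ its period):

$$
U \;=\; \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n}-1\right)
$$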
line 1155: The example is not really clear. Can we describe it in a little more detail, with real IPs?
## Chapter 4
### General remarks
Overall, I think this chapter is difficult to read. I had difficulty understanding the problem here.
1. What is the problem we try to resolve exactly ?
2. What is the role of Ker-one in all that (especially for DMAs)?
Maybe you need to summarize the problem at the beginning of the chapter by explaining that several VMs may need to access some resources at the same time.
For the DMA, we need to show the problem of several VMs trying to access the resource at the same time. What happens in this case? Can we preempt the IP? What about the DMA in this case? Which information needs to be stored when switching from one task to another? How is it handled? Etc. (see the sketch below).
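To make the DMA question concrete, the chapter could list exactly which registers constitute the "DMA context". A minimal sketch, assuming the Xilinx AXI DMA in direct register mode (the register offsets come from the AXI DMA product guide; the structure name, function names, and the save/restore policy are purely illustrative, not Ker-ONE's actual mechanism):

```c
#include <stdint.h>

/* Registers of one Xilinx AXI DMA channel in direct register mode
 * (offsets per the AXI DMA product guide); MM2S (read) channel shown. */
#define MM2S_DMACR   0x00u  /* control: run/stop, IRQ enables                */
#define MM2S_DMASR   0x04u  /* status: halted/idle flags                     */
#define MM2S_SA      0x18u  /* source address of the current transfer       */
#define MM2S_LENGTH  0x28u  /* transfer length; writing it starts a transfer */

/* Hypothetical per-VM DMA context that a hypervisor would have to keep. */
struct dma_ctx {
    uint32_t cr;
    uint32_t sa;
    uint32_t length;
};

/* What a VM switch would have to save... */
static void dma_save_ctx(volatile uint32_t *dma, struct dma_ctx *ctx)
{
    ctx->cr     = dma[MM2S_DMACR  / 4];
    ctx->sa     = dma[MM2S_SA     / 4];
    ctx->length = dma[MM2S_LENGTH / 4];
}

/* ...and restore for the incoming VM. */
static void dma_restore_ctx(volatile uint32_t *dma, const struct dma_ctx *ctx)
{
    dma[MM2S_DMACR  / 4] = ctx->cr;
    dma[MM2S_SA     / 4] = ctx->sa;      /* must be programmed before LENGTH */
    dma[MM2S_LENGTH / 4] = ctx->length;  /* re-issuing LENGTH restarts the transfer */
}
```

Note that an in-flight AXI DMA transfer cannot simply be paused mid-burst, so the text should also say at which points such a save is safe; this is exactly the preemption question raised above.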
### Specific comments
line 32: We talk about VMs, but before that only about simple OSs. Maybe we need to talk about guest OSs in the case of virtualization...
line 47: A link is missing before this paragraph.
line 77: Is the approach only compatible with ARM? I guess you can say that it is a more generic approach and give an example on the ARM platform.
line 98: I think we need to explain what TTC and AXI timers are. This could be explained in this chapter or in Chapter 2.
line 100: I don't understand:
> Although this TTC timer virtualization mechanism slightly increases the VM switch overhead, this mechanism is still preferred for GPOSs since it avoids frequent hyper-calls or traps and facilitates the VM timer emulation.
????
>For example, in addition to TTC timers, a set of hardware timers may be available in the PS part/OS.
>
line 122: Already said at line 120:
> This mechanism makes it possible to avoid the overhead caused by saving and restoring the timers context i.e. their registers
>
line 300: not clear
> For example, the output can be re-routed to another hardware accelerator or to another external device (such as chip to chip data exchange), without the need of going back and forth to the main memory.
line 312: This is not a sentence:
>The virtualization mechanism which provides the benefits of AXI DMA in the PL part transfers without need for CPU intervention and supports data-dependent memory transfers.
>
line 322: not clear
> Once the VMM receives the hyper-call, the data stream is exposed to the AXI DMA as a DMA buffer for fetching data and writing results back.
line 324: I don't understand:
> Alternatively, there is still data in the AXI DMA buffer has not been processed, which means that the request has to wait for the AXI DMA.
>
line 364: What do you call a custom IP design? Is it a generic interface?
line 387: What is an exclusive approach???
>we should note that the AXI DMA devices with an exclusive approach are not fully used, due to the waiting times for the end of an algorithm, or when the input/output buffers of an interface are empty.
>
line 419: I don't understand:
>Two concatenate DR blocks mean that the DMA read operation occupies two timing units.
line 436: I don't see the 'half of the current data block' on the figure:
> For example, as shown in the figure, the time to start the DMA read operation of the next data block can be set when the hardware accelerator finishes half of the current data block computation
>
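If I read the figure correctly, this is classic double buffering: the DMA read of block i+1 is issued while the accelerator is still busy with block i. A sketch of the intended overlap (all names are hypothetical, none come from the thesis; the figure's refinement of waiting until the accelerator is half done is the same idea with a delayed start):

```c
#define BLK 4096u  /* block size in bytes (illustrative) */

/* Hypothetical driver wrappers (names illustrative): */
void dma_read_start(void *dst, const void *src, unsigned len);
void dma_read_wait(void);
void acc_run(const void *in, void *out, unsigned len);
void acc_wait(void);

/* Ping-pong buffering: the DMA read of block i+1 is issued while the
 * accelerator is still computing block i, hiding the transfer time. */
void process(const char *in, char *out, unsigned nblocks)
{
    static char buf[2][BLK];

    dma_read_start(buf[0], in, BLK);              /* prefetch block 0    */
    for (unsigned i = 0; i < nblocks; i++) {
        dma_read_wait();                          /* block i has arrived */
        if (i + 1 < nblocks)                      /* overlap: fetch i+1  */
            dma_read_start(buf[(i + 1) & 1], in + (i + 1) * BLK, BLK);
        acc_run(buf[i & 1], out + i * BLK, BLK);  /* compute block i     */
        acc_wait();                               /* result i complete   */
    }
}
```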
Figure 4.13: Solutions to the issue. According to this figure, it seems that solutions 1 or 2 are better than 3, because data are processed more rapidly... yet you say exactly the opposite.
## Chapter 5
### Global comments
This chapter looks OK to me.
### Specific comments
line 22: I think this has to be in the related works chapter, not here
line 27: This is not a sentence:
> First of all, considering that the proposed priority-based preemptive round-robin scheduling policy is applied in Ker-ONE, thereby enabling preemptive scheduling of FPGA resources among multiple OSs
line 27: This is not a sentence:
> Since some FPGAs have the ability to dynamically replace HW accelerators, in the same reconfigurable region
>
line 47: To remove or replace:
> As mentioned in the paper earlier
>
line 49: To put in the related works chapter?
line 148: Why are we talking about the SD card?
line 197: I don't understand:
>In order to prevent overflow of the receiver FIFO, PCAP has to transfer this data fills up the receiver FIFO
>
line 199: Not well phrased:
>When readback continuously, due to the lack of data flow control on the PL part of the PCAP, the DevC DMA must have sufficient bandwidth to deal with the PL readback.
>
line 201: ????
>It means that when specifying source and destination lengths of the data transfer must be taken into consideration.
## Chapter 6
equations 6.4.3 and 6.4.4: I don't understand the rows and columns of the matrix.
> Ye: equations 6.4.3 and 6.4.4: the rows of the matrix represent the PRs; the columns represent the functions.
> For equation 6.4.3: the RTOS relies on 4 hardware accelerators to fulfill its computation, so there are 4 columns.
> For equation 6.4.4: the GPOS only relies on 3 hardware accelerators.
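For the final text, a small labelled instance next to the equations would remove any ambiguity, e.g. (assuming 3 PRs; the 0/1 entries are illustrative placeholders, so their exact meaning, e.g. "function $f_j$ can be hosted in $pr_i$", should be stated in the thesis):

$$
M_{rt} \;=\;
\begin{array}{c|cccc}
     & f_1 & f_2 & f_3 & f_4 \\ \hline
pr_1 & 1 & 0 & 1 & 0 \\
pr_2 & 0 & 1 & 0 & 1 \\
pr_3 & 1 & 1 & 0 & 0
\end{array}
$$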
line 1028: Define $V^P$, $E_{rt}$
> Ye: Already defined in Chapter 3. The vertex set of the graph is $\mathit{V}=\{\mathit{V}^{f},\mathit{V}^{p}\}$; $\mathit{V}^{f}=\{\mathit{v}^{f_1}, \mathit{v}^{f_2}, ...\}$ includes the vertices of the function set $\mathcal{F}$, and $\mathit{V}^{p}=\{\mathit{v}^{p_1}, \mathit{v}^{p_2}, ...\}$ stands for the set $\mathcal{P}$.
> The edge set $E$ is the set of undirected edges between function nodes and PR nodes.
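Even so, restating a tiny instance next to Figure 6.7 would help, e.g. (illustrative: $f_1$ can be placed in $pr_1$ or $pr_2$, $f_2$ only in $pr_2$):

$$
V^{f}=\{v^{f_1}, v^{f_2}\},\qquad V^{p}=\{v^{p_1}, v^{p_2}\},\qquad E=\{(v^{f_1},v^{p_1}),\,(v^{f_1},v^{p_2}),\,(v^{f_2},v^{p_2})\}
$$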
Figure 6.7: does p1 mean $pr_1$?
> Ye: Yes.
line 1049: We need an explanation for 0.6 $\mu$s = 0.50 + 0.05 + 0.05. Where does it come from? Which case are we in?
line 1072: not clear
>The major difference between the Tian's work and this thesis is in the preemption overhead($T_{preempt}$), since in this work, the preemption overhead depends on the algorithm of accelerators
>
line 1072: ???
>Besides that the influences of the positions of the consistency points have to be considered, which determines the worst-case waiting time before an accelerator is successfully preempted.
equation 6.4.5: All the terms need to be defined
> $c^* + \max\Big\{\Delta(v^f_{rt,1}, v^f_{gp,1}), \Delta(v^f_{rt,1}, v^f_{gp,2}), \Delta(v^f_{rt,1}, v^f_{gp,3}), \Delta(v^f_{rt,1}, v^f_{rt,2}) \Big\}$
line 1127: not clear to me... sorry. I don't really understand the three steps and the associated graphs.
line 1190: It looks like this summary is the summary of Chapter 3!
## Chapter 7
line 20: not a sentence
> As FPGA devices have demonstrated their capability to provide high performance and accelerate heavy computation.
line 23: What about the schedulability studies in the contributions?