# A2O Core notes

## Instruction Unit:

The most important parts of the instruction unit are the instruction fetch unit, the slice unit, two branch history tables, the completion unit and a branch target buffer table.

### Instruction fetch unit:

The main components of this unit are the instruction cache unit, the microcode unit and a branch prediction table.

#### Instruction cache unit:

The cache unit has two separate cache tables, one for the directory and one for the data (tri_128_34_4w_1r1w for the directory and tri_512x162_4w_0 for the data). The other components of the cache unit are the cache miss handling and the cache select unit.

##### about ERATS

In the instruction cache unit there is a shadow translation lookaside buffer (I-Erat), which translates effective addresses to real addresses (iuq_ic_ierat). A shadow TLB shadows the main TLB. The A2O has 2 of them: 1 for data accesses and 1 for instruction fetches. These shadow TLBs hold a copy of a subset of the main TLB. They are used to reduce the latency of the address translation and to avoid contention in the main TLB between instruction fetches and data accesses. An Erat is implemented as a content-addressable memory in which the virtual address is the key and the physical address is the result. The Erat replaces its entries in least-recently-used fashion.

##### Some more info for address translation:

The virtual address is 88 bits long, while the effective address is 64 bits long. The real address is 42 bits long, which allows up to 4 TB of main memory (2^42 bytes). The real address is generated by the MMU as part of the translation from the effective address. More in the manual on p.51 or p.60, p.176.

#### Microcode unit (iuq_uc):

It is used to implement higher-level instructions on top of the machine code. In the microcode unit, these instructions are translated into machine code instructions. IBM also uses "microcode" as a synonym for firmware. There is one read-only memory each for the even and the odd addresses of the microcode.
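Coming back to the Erat description under "about ERATS": the lookup-and-replace behaviour can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the logic of iuq_ic_ierat. The class name `EratSketch`, the 4 KB page size and the use of a plain dict as the "main TLB" are all made up for the illustration, and the real Erat surely compares more fields than the page number.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumption: 4 KB pages, chosen only for this sketch


class EratSketch:
    """Toy fully associative translation cache with LRU replacement."""

    def __init__(self, entries, main_tlb):
        self.entries = entries        # number of entries the cache can hold
        self.main_tlb = main_tlb      # hypothetical main TLB: page -> real page
        self.cam = OrderedDict()      # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def translate(self, ea):
        page = ea >> PAGE_BITS
        offset = ea & ((1 << PAGE_BITS) - 1)
        if page in self.cam:
            self.hits += 1
            self.cam.move_to_end(page)            # now the most recently used
        else:
            self.misses += 1
            if len(self.cam) >= self.entries:
                self.cam.popitem(last=False)      # evict the least recently used
            self.cam[page] = self.main_tlb[page]  # refill from the main TLB
        return (self.cam[page] << PAGE_BITS) | offset
```

With 2 entries and the accesses page 1, page 2, page 1, page 3, the third access is a hit and the fourth one evicts page 2, because page 1 was used more recently.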
#### Completion unit (iuq_cpl):

This unit receives signals from lq, xu and axu when an instruction has completed or when some spr has been activated. It also sends control signals to the fixed-point unit. The register renamer also sends its signals to it, so the completion unit knows which execution has completed.

#### Slice unit (iuq_slice):

A well-performing out-of-order processor needs a dispatcher and a register renamer. In A2O, these units are integrated in the slice unit. There is also an instruction decoder, which does the first-level decode so that the instruction unit knows which reservation station to send the instruction to.

##### Register rename (iuq_rn):

To enhance the performance of the out-of-order processor, a register rename unit can be implemented to break the dependencies between registers. This can be done before the instruction dispatch.

##### Dispatcher unit (iuq_dispatcher):

The dispatcher collects instructions from the register rename unit and sends them to the reservation station. It can dispatch up to 2 instructions per cycle.

### Connections of signals in instruction unit:

After an instruction is fetched, it goes to the decoder, then to the register renamer, and is then dispatched to the reservation station. To keep track of which instructions have completed, the completion unit receives signals from the execution units.

#### other component in Instruction fetch unit:

#### iuq_spr:

Calculates some special purpose registers such as the cpcr's, which are only used in the instruction unit itself. The cpcr's only count credits. **strange thing:** in the manual there are only 3 cpcr, but in iuq_spr there are 5 cpcr.

#### iuq_ram:

**not sure what this unit does**

## Execution Units:

### Fixed-point Units:

There are two fixed-point units (FXU), xu0 and xu1.
While xu0 has specific modules that can handle more complex instructions such as multiply and divide, xu1 only has an ALU to handle addition, subtraction, logical functions and rotates. Each fixed-point unit has a decoder and a bypasser.

Rotator: rearranges the bits, e.g. LSB to MSB.

Each FXU has a bypass module, which is used for pipeline bypassing. Bypassing: e.g. if the next instruction's execution stage needs a value from the previous execution stage, but that stage hasn't written it to memory yet, the bypasser can still pass the value to the current execution stage.

#### Module for Adder for each FXU: xu_alu_add

#### Module for rotate for each FXU: tri_st_rot

#### Module for logical function tri_st_rot_ins

The module for logical functions is tri_st_rot_ins, which is a submodule of tri_st_rot.

#### Decoder in fixed point Unit (xu0_dec, xu1_dec)

The decoder module lists which instructions the FXU can execute and which bits of the instruction register are responsible for each of them. It also forwards these instructions to the component that is responsible for them. **question about decoder in xu:** it says that a simple add and sub (p.46) needs 1 delay cycle, but it is possible for a simple add and sub to be executed in a later stage by adding a latch to the signal to buffer it to the next cycle. why?

#### Bypass in fixed point Unit(xu0_byp, xu1_byp):

The bypasser in this core is a "full bypasser", which means it forwards the result to every possible stage of the pipeline. The forwarded values are the results calculated in the corresponding fixed-point unit. **not sure: It also looks like the actual values, e.g. the dividend and divisor, are passed through the bypass instead of in the instruction itself.**

#### Divider (xu0_div_r4):

**question about the divider:** The module itself doesn't loop at all, even though commented C code for the algorithm is inside. I don't understand how the divider works at all.
It also says that the divider has to recirculate the bits for the calculation, but there is no connection to the rotate module at all.

#### multiplier (tri_st_mult):

**question about the multiplier:** the manual says that the multiply unit is pipelined but has single-cycle throughput and single-cycle latency for 32 x 32 bit multiply operations. However, on p.46 it says that the multiply latency is 3-6 cycles. does this make sense?

#### General purpose register file (xu_gpr):

xu_gpr is the general purpose register file. **not sure why it is stored in two 144*78 modules. It should be 32 registers of 64 bits each, but that doesn't fit this register file module at all. p.108 states that the gpr has 32 registers, p.80 for the register list per thread.**

#### Special purpose register file (xu_spr):

xu_spr is the register file for the special purpose registers. It has core-specific SPRs and thread-specific SPRs. The core-specific spr part calculates the control registers and distributes them.

#### Difference of xu0 from xu1:

xu0 has many more features:

- a two-cycle comparator: xu_alu_cmp
- a count-leading-zeros module (1 cycle): tri_st_cntlz
- a population counter to determine the power of 10: tri_st_popcnt
- a module to determine the leftmost zero byte: xu0_dlmzb
- a conversion module from binary coded decimal (BCD) to densely packed decimal (DPD); some IBM computers use this encoding for their alphabet coding or for storage: xu0_bcd
- a divider module. It needs 65 cycles for some divide instructions. It loops using the registers ex3_cycles_q and ex3_cycles_d and decrements with ex3_cycles_din: xu0_div_r4
- a multiply module: tri_st_mult
- a branch module which calculates the branch instructions: xu0_br
- a module just for bit permutation with a mask: xu0_bprm

### Load/Store Unit

It is split into 3 big submodules: lq_lsq, lq_data and lq_ctl. As its name says, it handles the instructions that load and store data. It also manages the data cache and the D-Erat.
### Load/Store Control Unit (lq_ctl):

Although it is called the control unit, this submodule of lq contains a lot of other features. The control parts are: lq_byp, lq_dcc, lq_spr. The cache part, the translation table and the decoder are also located in lq_ctl.

#### D-Erat(lq_derat):

Erats are described in the Instruction Unit section. The D-Erat contains 32 entries. If a miss happens in the D-Erat, the penalty is 19 cycles.

#### Level 1 Data Cache Control (lq_dcc):

p.157 for more information.

#### Bypasser in the Load/Store Unit(lq_byp):

Mostly the same as in the FXU. Here it forwards the result of a load instruction.

#### Decoder (lq_dec):

Further decodes the instruction from iuq_idec, which is first renamed, then dispatched to the reservation station, and finally arrives here. The different load/store instructions are decoded there. Nearly every signal goes to lq_dcc. The outputs are mainly 1-bit signals that tell the cache which instruction it really is, instead of an encoded bit sequence, which I personally would have expected.

#### Special purpose register Unit (lq_spr):

Like in the other units, it receives signals and calculates special purpose registers, which are sent to other units. But here it doesn't have any storage unit inside.

#### Level 1 Data Directory Wrapper (lq_dir):

**not sure: I would assume that the data cache can also do a directory search like the instruction cache can. But the structure of lq_dir is not really the same as in iuq_ic_dir. It has a directory array, a valid register array, a flush generator and an address generator adder.** The directory array is initialised outside of lq_dir (tri_64x34_8w_1r1w).

#### Prefetcher (lq_pfetch):

The prefetcher is used to load data from main memory into the cache. There is a Reference Predictor Table inside lq_pfetch. I would assume this table is used for faster loading from main memory into the cache.

#### Load/Store Data Rotator (lq_data):

The 32KB cache table is located there.
There are also submodules for the store rotator (lq_data_st) and the load rotator (lq_data_ld). These rotators work like the bit rotation in the FXU. **not sure what function this would serve in cache storage though**.

#### Level 2 Command Queue (lq_lsq):

Queues are used to absorb bursts in cache accesses and to maintain the order of memory operations by keeping all in-flight memory instructions in program order. In the store queue, data is not stored until it reaches the retirement point. This avoids dependency problems between loads and stores. The load queue has a similar function. The load queue is implemented in the lq_ldq module, the store queue in lq_stq. So in general lq_lsq is used as the interface to the L2 cache and to arrange the data flow it gets from the L2 cache. lq_arb is the submodule of lq_lsq which communicates with the L2 cache. The queues are stored in a tri_64x34_8w_1r_1w register array. To arrange the order, it needs to communicate with the reservation station. This is implemented in lq_odq. There is also another submodule called lq_imq, which serves to communicate with the mmu.

### Floating Point Unit

I assume it is similar to the FXU, so I will investigate it later if I have time.

## Reservation Station (RV)

The reservation station is used to reorder the instructions which are dispatched from the instruction unit. Normally, instructions would be renamed inside the RV, but in A2O this is already done in the instruction unit before dispatch. Each execution unit gets its own submodule as its specific reservation station. **There is also a register file tri_144x78_2r4w which is described as Load Queue register file. Why is this register file not inside the lq module?** Like the other units, the reservation station has a bypass module.

### Reservation Station dependencies (rv_deps):

rv_deps collects the signals from the instruction unit and distributes them to the other rv stations.
There are 2 decoders for each thread to decode whether the instruction is one of the following: mulli, mulld, mulldo, mulhd, mulhdu, erativax, ldawx, dcbtls, dcbtstls, lbarx, ldarx, lharx or lwarx. Their outputs are brick_cycle and is_brick. These are some special multiply or load instructions. **not sure why though**. There is also a rv_dep submodule inside, which holds the dependency scorecards and the second level of itag muxing. **also not sure why because i don't understand itag.**

#### rv_station:

Each execution unit instantiates the module rv_station with different inputs as its reservation station. **need some deeper understanding about the rv_station, e.g. why the units are separated and how they work.**

## Memory management Unit (mmq)

As mentioned earlier, the Erats are only shadow TLBs of the main TLB. This TLB is located in the MMU. The corresponding submodules are: mmq_tlb_lrat, mmq_tlb_req, mmq_tlb_ctl, mmq_tlb_cmp and the TLB register file, 4 times tri_128x168_1w_0.

### Control Unit of the TLB (mmq_tlb_ctl):

Since the MMU has

### Request queue (mmq_tlb_req):

If the D-Erat or the I-Erat has a miss, it needs to get the address translation from the main TLB. So a queue is implemented to structure the requests. This module, however, is not responsible for sending the requested addresses back.

### Logical to real address translation (mmq_tlb_lrat):

This is a translation from logical address to real address. **not sure why there is a logical address and what it is used for.**

### MMU Invalidate Control Logic (mmq_inval):

This module returns the real address to the load/store unit and also sends signals to the debug unit inside the MMU if something invalid happens.

### Special purpose register (mmq_spr):

Like the spr modules in other units, it produces SPRs to signal other units.
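As a rough picture of the request queue described under mmq_tlb_req, here is a small Python sketch. It only shows the idea of collecting I-Erat and D-Erat misses in arrival order and handing them on one at a time. The names `TlbRequestQueueSketch` and `lookup_in_main_tlb` are hypothetical, plain first-in-first-out ordering is an assumption (the real module may arbitrate differently), and the lookup is kept outside the queue class because the notes above say the queue does not send the translation back itself.

```python
from collections import deque


class TlbRequestQueueSketch:
    """Toy FIFO that collects Erat misses on their way to the main TLB."""

    def __init__(self):
        self.pending = deque()

    def report_miss(self, source, page):
        # source is "ierat" or "derat"; requests keep their arrival order
        self.pending.append((source, page))

    def next_request(self):
        # hand out the oldest waiting request, or None if the queue is empty
        return self.pending.popleft() if self.pending else None


def lookup_in_main_tlb(main_tlb, request):
    """Done by other logic, not by the queue; None models a TLB miss."""
    source, page = request
    return source, page, main_tlb.get(page)
```

A D-Erat miss that arrives before an I-Erat miss is also served first in this model.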
### mmq_htw

### mmq_tlb_cmp

### mmq_perv

## Investigation of the Connections between the parts

Typical variable naming: origin_destination_name, val=valid, **act=activate?**

Since the register files are stored in xu_spr and xu_gpr, which are located in xu, many registers have to be transferred from xu to other units. These registers are built from the ex3_spr_wd signal, which is an output of xu_spr_cspr.

**what i expected:** I thought that the control signals would also be passed together as one bit sequence, but instead the code is written so that the bit sequence is split up and each part is named. This leads to confusion for people like me, who don't have enough experience with cores yet.

### FXU Unit to Instruction Unit:

### Control registers

#### Machine state register related:

The Machine State Register (MSR) is a unique register type which controls important chip functions such as the enabling or disabling of various interrupt types. It is a type of SPR. It can be written from a GPR or read into a GPR. It is used in iuq_fetch, iuq_slice_top and iuq_cpl_top. **question about these signals:** somehow, when tracking the signal, we end up in iuq_idec, where the signal is only used on scan latches and the output of the latches isn't used at all. p.286 for function definitions and bit order in an instruction.

xu_iu_msr is the machine state register, which is split into:

- xu_iu_msr_ucle(spr_msr_ucle): machine state register user-mode cache lock enable bit. Description in manual: p.164.
- xu_iu_msr_de(spr_msr_de): debug interrupt enable. A debug interrupt is a type of imprecise interrupt. An imprecise interrupt is a type of interrupt which occurs in pipelined architectures and leaves the system in a not well-defined state. p.331 for more on debug interrupts.
- xu_iu_msr_pr(spr_msr_pr): problem state, supervisor mode or user mode.
- xu_iu_msr_is(spr_msr_is): directs instruction fetches to address space 0 or 1.
The address spaces are separated so that one is associated with interrupt handling and other system-level code and/or data, and the other with application-level code and/or data. (address spaces are most likely implemented in the mmu)

Not enough time to understand every bit of this register, so leave it like this for now. Everything is written on p.286 anyway if deeper understanding is needed.

#### debug control register related

debug control register 0 (dbcr0): used to enable debug modes and events, reset the processor, and control timer operation when debugging. Used in iuq_ifetch and iuq_cpl. Some register bits are named weirdly: xu_iu_iac1_en,..., xu_iu_iac4_en are used in dbcr0, but the name doesn't say so. Also, xu_iu_t0_dbcr0_dac1,..., xu_iu_t0_dbcr0_dac4 have a t0 in the name, which specifies the thread that is used. **question about data address compare debug event enables:** why is this the only signal in dbcr0 which is thread-specific while the others are not, e.g. instruction address compare or other control signals in dbcr0?

debug control register 1 (dbcr1): only IAC12M and IAC34M are passed out of xu_spr to iuq. The other bits are used in xu_spr_tspr itself.

#### embedded control register related (EPCR)

Register to handle interrupts in guest state. **question:** what are guest states? Everything except the DMIUH bit is passed from xu_spr to iuq.

#### core configuration register 2 related (ccr2)

The passed values are:

- en_dcr: enable device control instructions
- ifrat: instruction force real address translation; access the i-erat or just use real address = effective address as the translation
- ifratsc: storage control
- ucode_dis: enable or disable microcode

#### Execution configuration register 4 related (xucr4)

Only mmu_mchk is passed, which determines the hardware behavior after an ERAT or TLB address translation parity or multihit error is detected.

### integer exception register

#### ucode_xer

**not sure:** a 7-bit sequence, but not sure what it can be.
The signal comes from the bypass and goes to the microcode unit. It could be the string index of the xer register. There is also a ucoder_xer_val bit, which should be a valid bit.

### debug state register

dbsr_ide: states whether it is an imprecise debug event.

### Interrupt signals

external_mchk, ext_interrupt, dec_interrupt, udec_interrupt, perf_interrupt, fit_interrupt, crit_interrupt... more are listed in iuq. These signals also come from xu_spr. Most likely they state whether an interrupt is happening in the thread.

### some instruction identifier after decode for erat

xu_iu_is_eratre, xu_iu_is_eratwe, xu_iu_is_eratsx, xu_iu_is_eratilx, xu_iu_is_erativax

### other signals

- xu_iu_execute_vld: comes from xu0_dec and goes to iuq_cpl_top. **maybe:** it indicates whether xu0 has executed its instruction.
- xu1_iu_execute_vld
- xu_iu_itag, xu1_iu_itag
- xu_iu_n_flush, xu_iu_np1_flush, xu_iu_flush2ucode, xu_iu_np1_async_flush: flush things because of branch and jump?
- xu_iu_exception: comes from xu0_dec, signals if an exception occurs?
- xu_iu_mtiar:
- xu_iu_btar:
- xu_iu_perf_events: signal for performance events.
- xu_iu_t0_rest_ifar:
- xu_iu_raise_iss_pri:
- xu_iu_single_instr_mode: if only one instruction is passed through?
- xu_iu_run_thread: thread control?
- xu_iu_rs_data: data, the whole 32/64-bit data sequence
- xu_iu_rb: some bits of a register?
- xu_iu_ra_entry: 4-bit real address entry?
- xu_iu_ws: can stand for write to set. Comes from xu0_dec.
- xu_iu_pri: priority. To iuq_spr.
- xu_iu_pri_val: priority? To iuq_spr.
- xu_iu_val:

### Instruction Unit to FXU

**a lot of signals are named ord and act. what do they mean? ordered**

Since the completion control unit controls xu_spr (the register file for special purpose registers), a lot of signals are transferred from iuq_cpl_ctrl to xu_spr.

#### Signals

- iu_xu_ord_read_done, iu_xu_ord_write_done: from iuq_ierat to xu_spr; should signal that the ierat has finished a write or read. **for saving or loading a value from an address?**
- iu_xu_ord_n_flush_req: need to flush.
- iu_xu_ord_par_err: parity error signal.
- iu_xu_rfi, iu_xu_rfgi, iu_xu_rfci, iu_xu_rfmci: from iuq_cpl_ctrl to xu_spr. rfi stands for return from interrupt, rfgi for return from guest interrupt. Signals that execution has returned from an interrupt.
- iu_xu_act: **signal to tell XU to activate?**
- iu_xu_int, iu_xu_gint, iu_xu_cint, iu_xu_mcint: different integer types, **don't understand these integer types**
- iu_xu_dear_update: data exception address register update, from iuq_cpl_ctrl to xu_spr.
- iu_xu_dbsr_update: debug status register update, from iuq_cpl_ctrl to xu_spr.
- iu_xu_esr_update: exception syndrome register update, from iuq_cpl_ctrl to xu_spr.
- iu_xu_dbsr_ude: unconditional debug event, from iuq_cpl_ctrl to xu_spr.
- iu_xu_dbsr_ide: imprecise debug event, from iuq_cpl_ctrl to xu_spr.
- iu_xu_force_gsrr: guest save restore register, but not sure about force.
- iu_xu_quiesce: tells the xu that the iu is quiesced.
- iu_xu_icache_quiesce: tells the xu that the icache is quiesced (silenced, stopped).
- iu_xu_instr_cpl: from iuq_cpl_table, just a bit, not sure what it means.
- iu_xu_async_complete: maybe set when the async interrupt has been handled.
- iu_xu_credits_returned: **not sure but** since the command interface is a credit-based interface, this is used to tell the xu that the instruction unit has a credit again?

#### Control registers

- iu_xu_nia_t0, iu_xu_nia_t1
- iu_xu_esr_t0, iu_xu_esr_t1: exception syndrome register for exceptions
- iu_xu_mcsr_t0, iu_xu_mcsr_t1: machine check syndrome register, the last 15 bits, which are not reserved. Error detection register.
- iu_xu_dear_t0, iu_xu_dear_t1: data exception address register, for both threads, from iuq_cpl_ctrl to xu_spr
- iu_xu_dbell_taken, iu_xu_cdbell_taken, iu_xu_gdbell_taken, iu_xu_gcdbell_taken, iu_xu_gmcdbell_taken: doorbell interrupts, p.342 for more.
- iu_xu_stop: from iuq_cpl_ctrl to xu_spr_cspr. Looks like the instruction unit tells the execution unit to stop execution, and spr_cspr sends further signals to other units using special purpose registers.
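On the guess about iu_xu_credits_returned in the signal list above: a credit-based interface in general works as in the following Python sketch. The sender starts with a fixed number of credits, spends one per request and has to wait at zero until the receiver hands a credit back. The class `CreditSender` and its method names are hypothetical, and whether iu_xu_credits_returned behaves exactly like this is not confirmed by anything in these notes.

```python
class CreditSender:
    """Toy sender side of a credit-based interface."""

    def __init__(self, credits):
        self.credits = credits    # free slots the receiver has promised

    def can_send(self):
        return self.credits > 0

    def send(self):
        if not self.can_send():
            raise RuntimeError("no credit available, sender must wait")
        self.credits -= 1         # one receiver slot is now in use

    def credit_returned(self):
        self.credits += 1         # receiver freed a slot again
```

The point of the scheme is that the sender never needs a "busy" signal from the receiver; it just counts.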
#### datapath

iu_xu_ex5_data: from iuq_ierat to the bypass; most likely the instruction which has been loaded from the ierat. **question about instruction transfer from instruction unit to execution unit:** since the datapath is only 32/64 bits wide, does that mean that only one new instruction goes to the xu each cycle, and the full potential of the xu is not used with 2 threads and 4 xu units in total?

#### Level 1 Cache to Level 2 Cache Connections:

The Level 2 cache is not part of the core code, so a Level 2 cache is needed to simulate the core.

##### from lq_arb to L2 Cache:

- ac_an_st_data: 128 bit for the data tunnel.
- ac_an_st_data_pwr_token: data token
- ac_an_req: valid bit
- ac_an_req_ttype: ttype
- ac_an_req_thread: thread id
- ac_an_req_pwr_token: power token
- ac_an_req_ld_core_tag: core tag
- ac_an_req_ld_xfr_len: op size
- ac_an_req_ra: physical address
- ac_an_req_wimg_w: write-through
- ac_an_req_wimg_i: caching-inhibited
- ac_an_req_wimg_m: memory coherency required
- ac_an_req_wimg_g: guarded
- ac_an_req_endian: endianness attributes
- ac_an_req_user_defined: user
- ac_an_st_byte_enbl: 16-bit byte enable, 16*8=128

## other things:

Information about register size p.213 p.442 tri_event_mux1t diagram

ex0, ex1, ex2, ...: stages of the execution pipeline. iu0, iu1, ...: stages of the instruction pipeline.

In some modules there are RAMB36 instantiations, a module which is not defined in the core. This is actually a Xilinx-specific module, which is used for the block RAM of the FPGA. So maybe we can only simulate with Xilinx.

**not sure:** there are a lot of ex stages, but some of them just pass values through, and i don't know why they do this. I think some values are just passed along because they need to stall and wait for other values to be calculated before they can progress.

**still not sure:** why they are called latches although they use @posedge; a flip-flop would also be more reasonable to store and pass on a value.
E.g. in xu0_bcd the value ex3_bcd_rt_q is just taken from ex2_bcd_rt and passed to the output after the p-latch.

**question about ex00:** Why is the stimuli application delay shorter than a clock cycle? Is it because the model under test isn't always a clock-edge-sensitive design?

**vlsi2 solution ex00** /home/vlsi2/ex00/solution

**to investigate** BCD to DPD module used where? divider swapped out?

**meaning of JTAG is Joint Test Action Group?**

**meaning of ITAG is Information Technology Architecture Group?**

**slowspr means slow special purpose register? if so, why do fast and slow special purpose registers exist?**

**is the IAR register the program counter? somehow i can't find it in the code. should it be in the instruction fetch unit or a separate one?**

bram, wrapper signal understanding, parameter to tune.

## External connections

### L2 Cache connections

- ac_an_st_data: output, 128x2 bit for the data tunnel to store.
- ac_an_st_data_pwr_token: output, data token
- ac_an_req: output, valid bit
- ac_an_req_ttype: output, ttype
- ac_an_req_thread: output, thread id
- ac_an_req_pwr_token: output, power token
- ac_an_req_ld_core_tag: output, core tag
- ac_an_req_ld_xfr_len: output, op size
- ac_an_req_ra: output, physical address
- ac_an_req_wimg_w: output, write-through
- ac_an_req_wimg_i: output, caching-inhibited
- ac_an_req_wimg_m: output, memory coherency required
- ac_an_req_wimg_g: output, guarded
- ac_an_req_endian: output, endianness attributes
- ac_an_req_user_defined: output, user
- ac_an_st_byte_enbl: output, 16-bit byte enable, 16*8=128
- clk: input, defines the processor clock
- clk2x: input, defines the processor 2x clock
- clk4x: input, defines the processor 4x clock
- reset: input, reset button
- an_ac_coreid: input, core id to distinguish this core from the other cores in the system.
- an_ac_pm_thread_stop: input, can be used to stop the A2 core from fetching instructions. Stopping a thread causes all instructions that have begun executing to be completed and all prefetched instructions to be discarded.
- an_ac_ext_interrupt: input, xu_spr
- an_ac_crit_interrupt: input, xu_spr
- an_ac_perf_interrupt: input, xu_spr
- an_ac_external_mchk: input, xu_spr
- an_ac_flh2l2_gate: input, forward load hit to Level 2
- an_ac_reservation_vld: input, xu_spr
- an_ac_debug_stop: input, a core input signal used by external debug tools to simultaneously stop all core threads.
- ac_an_debug_trigger: output, pulse. **not sure: a debug event has happened inside the core**
- an_ac_tb_update_enable: input, can be used to stop timer clock pulses from incrementing the timers. When set to zero, the an_ac_tb_update_enable input signal blocks the timer clock from incrementing timer facilities.
- an_ac_tb_update_pulse: input, when the Timer Clock Select (TCS) field is 1, an_ac_tb_update_pulse is the timer clock.
- an_ac_hang_pulse: input, xu_spr
- ac_an_pm_thread_running: output, thread stop status
- ac_an_machine_check: output, input, xu_spr
- ac_an_recov_err: output,
- ac_an_checkstop: output,
- ac_an_local_checkstop: output, reports a checkstop error
- an_ac_stcx_complete: input, something with the L2 cache and the stcx instruction
- an_ac_stcx_pass: input,
- an_ac_reld_data_vld: input, reload data is coming next cycle
- an_ac_reld_core_tag: input, reload data destination tag (which load queue)
- an_ac_reld_data: input, reload data
- an_ac_reld_qw: input, quadword address of the reload data beat
- an_ac_reld_ecc_err: input, reload data contains a correctable ecc error
- an_ac_reld_ecc_err_ue: input, reload data contains an uncorrectable ecc error
- an_ac_reld_data_coming: input,
- an_ac_reld_crit_qw: input,
- an_ac_reld_l1_dump: input,
- an_ac_req_ld_pop: input, credit for a load
- an_ac_req_st_pop: input, credit for a store
- an_ac_req_st_gather: input, credit for a store due to L2 gathering of store commands
- an_ac_sync_ack: input,

to be done: high level picture of the memory, Synopsys synthesis (sa), track the address that the core starts with.
or generally how it starts up, understand the signals, maybe write an easy tb to initialize the core. (so)

### short summary about isa manual p.1193

#### Reset Mechanisms

A thread reset operation is invoked internally either by the watchdog timer or by the debug facilities using DBCR0(rst). An external mechanism such as a reset button can also cause a thread to be reset.

#### Thread State after Reset

The initial thread state is controlled by the register contents after reset.

##### Thread Enable Register

After reset, the Thread Enable Register is set to the value 0x0000_0000_0000_0001, which indicates that only thread 0 is enabled.

##### Machine State Register

The state of the MSR for all threads is set as follows:

- CM=0: computation mode
- GS=0: hypervisor state
- UCLE=0: user cache locking enable
- SPV=0: SPE/Embedded Floating-Point/Vector unavailable
- CE=0: Critical Input interrupts disabled
- DE=0: Debug interrupts disabled
- EE=0: External Input interrupts disabled
- PR=0: Supervisor mode
- FP=0: FP unavailable
- ME=0: Machine Check interrupts disabled
- FE0=0: FP exception type Program interrupts disabled
- FE1=0
- IS=0: Instruction Address Space 0
- DS=0: Data Address Space 0
- PMM=0: Performance Monitor Mark

##### Logical Partition Identification Register

LPIDR is set to 0.

##### Processor Version Register

Implementation-dependent.
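To make the Thread Enable Register reset value above a bit more concrete, here is a tiny Python sketch that turns a register value into the list of enabled threads. The function name `enabled_threads` is made up, and the mapping "thread n is the n-th least significant bit" is an assumption. It is consistent with the reset value 0x0000_0000_0000_0001 meaning "only thread 0", but the exact bit layout should be checked in the manual.

```python
TER_RESET = 0x0000_0000_0000_0001  # reset value quoted in the notes above


def enabled_threads(ter, num_threads=2):
    """Return the thread numbers whose enable bit is set.

    Assumption: thread n is controlled by the n-th least significant bit.
    """
    return [t for t in range(num_threads) if (ter >> t) & 1]
```

Under that assumption, `enabled_threads(TER_RESET)` gives only thread 0, and a value of 0x3 would enable both threads of an smt2 setup.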
##### TLB entry

One TLB entry maps the last page in the implemented effective storage address space, with the following settings:

- V=1: valid
- EPN=last page of the effective address space
- RPN=last page of the physical address space
- TS=0: translation address space
- IND=0: direct entry
- TLPID: translation logical partition ID
- TGS: translation hypervisor state
- SIZE: smallest page size supported
- SX=1: page is execute accessible in supervisor mode
- SR=1: page is read accessible in supervisor mode
- SW=1: page is write accessible in supervisor mode
- VF=0: no virtualization fault

### Initialization Requirements

The following steps are necessary for the initialization:

- invalidate the instruction cache and data cache
- initialize system memory as required by the operating system or application code
- initialize the interrupt vector prefix register and interrupt vector offset register
- initialize other registers as needed by the system
- initialize off-chip system facilities
- dispatch the operating system or application code

### Code changes

The ramb modules are replaced with tc_sram provided by ETH. The generate code has to be changed since Synopsys doesn't support it.
Concretely:

- add a parameter p=0 and change the standalone generate block by adding if(p==0) and adding names for the begin blocks
- === to ==
- some <= within always blocks to =
- iuq_ic_select 1546, 1547 changed to 'b0

overleaf latex

## Tuneable parameter

tri_a2o.vh

- line 13 define THREADS1: sets whether the core is single-threaded or multithreaded

c_wrapper.v:

- line 10: commented out for smt2 setup
- line 165: [0:0] to wire [0:1] ac_an_special_attn;
- line 181: [0:0] to wire [0:1] an_ac_pm_fetch_halt;
- line 205: [0:0] to wire [0:1] an_ac_sleep_en;
- line 207: [0:0] to wire [0:1] an_ac_uncond_dbg_event;
- line 171: [0:3] to wire [0:7] ac_an_event_bus0;
- line 172: [0:3] to wire [0:7] ac_an_event_bus1;

to run it in smt2

- line 42: sets the floating point register file size, ENC: power of 2
- line 45: defines the gpr width
- line 51: sets the general purpose register file size
- line 54: sets the CR register file size
- line 59: sets the lr_pool register file size
- line 62: sets the ctr register file size

c.v

- line 175 float_type: sets whether the core has a floating point unit or not.

mmu_a2o.vh

- line 25: sets whether the core runs in erat-only mode or mmu mode.