# [Exp] Highlights in NVMain source code
###### tags: `research-GraphRC`

[TOC]

## Demystify NVMain column
- Comparison: Column in RC-NVM and DRAM
    - DRAM column
        - ![](https://i.imgur.com/FgvGFdQ.png)
        - For a x2 DRAM, each column in a subarray contains 2 bits
        - For a x4 DRAM, each column in a subarray contains 4 bits
    - ![](https://i.imgur.com/CCJLJcK.jpg)
    - Both RC-NVM and traditional DRAM operate in a similar manner
- Justifying my guess about the configuration settings (proven wrong after the author gave us the source code for RC-NVM)
```c=1
;********************************************************************************
; General memory system configuration

; Number of banks per rank
BANKS 8
; Number of ranks per channel
; for 3D DRAM, rank and channel are interchangeable
RANKS 4
; Number of channels in the system
CHANNELS 2
; Number of rows in one bank
ROWS 8192
; Number of VISIBLE columns in one bank (real column number = COLS * DeviceWidth)
COLS 1024
; whether enable sub-array level parallelism (SALP)
; options:
;   No SALP: MATHeight = ROWS
;   SALP: number of subarrays = ROWS / MATHeight
MATHeight 1024
...
; address mapping scheme
; options: SA:R:RK:BK:CH:C (SA-Subarray, R-row, C:column, BK:bank, RK:rank, CH:channel)
AddressMappingScheme RK:SA:BK:CH:R:C
```

## Gem5 trace format
|Field Name| CYCLE | OP | ADDRESS | DATA (64B) | THREADID |
|--------- | ----- | -- | ------- |----------- | -------- |
|Field Num | 0 | 1 | 2 | 3 | 4 |
|Instr |0 |R | 0x180 | 00e8da000000e8d5eb05004883c408c331ed4989d15e4889e24883e4f0505449c7c0a00a400048c7c1e00a400048c7c7a4024000e8a7010000f490904883ec08| 0|
| |6 |R | 0x1c0 | 488b05112f28004885c07402ffd04883c408c390909090909090909090909090554889e5534883ec08803db03b280000755fbb48306800488b05aa3b28004881| 0|
| |7 |R | 0x87e60 | 0100000000000000d2efffffff7f0000000000000000000000000000000000002100000000000000000060ffffffffff1000000000000000fffb8b0700000000| 0|
| |9 |R | 0x340 | bf07004600e8960c0000b800000000c9c39090909090909090909090909090904157b8000000004d89c741564989ce415541544d89cc55534881ec7802000048| 0|
| |19 |R | 0x380 | 8954240831d24885c048897c2418897424140f85a902000048637c2414488b442408891560352800488d7cf808488b8424b002000048893d24552800488905a5| 0|

## Testing environment
- Ubuntu 16.04
- gcc 5
- [Update] Should work on Ubuntu 20.04 with gcc 9.3.0

## NVMain structure
- ![](https://i.imgur.com/56jJ1nL.png)

### Address mapping
- Input: Physical address mapping scheme
    - e.g. Channel=1, Rank=1, Row=10, Bank=3, Subarray=6, Col=8
- Output: Ordered address mapping
    - e.g. `order = {3, 0, 2, 4, 5, 1}; // R, C, BA, RA, CH, SA`
- [src/TranslationMethod.cpp](https://github.com/WeiCheng14159/NVmain/blob/master/src/TranslationMethod.cpp)

### Address Translation
- Input: Physical address
- Output: The index in each memory domain (i.e. channel, rank, bank, row, column, subarray)
- [src/AddressTranslator.cpp](https://github.com/WeiCheng14159/NVmain/blob/ad28c0c72d0d4eb838b2c3b3f779ca1e4a67cce3/src/AddressTranslator.cpp#L185)

:::spoiler code
```cpp=1
/*
 *  Translate() translates the physical address and returns the corresponding
 *  values of each memory domain
 */
void AddressTranslator::Translate( uint64_t address, uint64_t *row, uint64_t *col,
                                   uint64_t *bank, uint64_t *rank, uint64_t *channel,
                                   uint64_t *subarray )
{
    uint64_t refAddress;
    MemoryPartition part;

    uint64_t *partitions[6] = { row, col, bank, rank, channel, subarray };

    if( GetTranslationMethod( ) == NULL )
    {
        std::cerr << "Divider Translator: Translation method not specified!" << std::endl;
        return;
    }

    int busOffsetBits = mlog2( busWidth / 8 );
    int burstBits = mlog2( (busWidth * burstLength) / 8 );
    lowColBits = burstBits - busOffsetBits;

    /* first of all, truncate the bus offset bits */
    refAddress = address >> busOffsetBits;

    /* then, truncate the lowest column bits */
    refAddress >>= lowColBits;

    /* 0->5, low to high, FindOrder() will find the correct one */
    for( int i = 0; i < 6; i++ )
    {
        FindOrder( i, &part );

        /*
         *  The new memsize does not include this partition, so dividing by the
         *  new memsize will give us the right channel/rank/bank/whatever.
         */
        *partitions[part] = Modulo( refAddress, part );

        /*
         *  "Mask off" the first partition number we got. For example if memsize = 1000
         *  and the address is 8343, the partition would be 8, and we will look at 343
         *  to determine the rest of the address now.
         */
        refAddress = Divide( refAddress, part );
    }
}
```
:::

### Memory Controller
- Schedule memory request
- [src/MemoryController.cpp](https://github.com/WeiCheng14159/NVmain/blob/ad28c0c72d0d4eb838b2c3b3f779ca1e4a67cce3/src/MemoryController.cpp#L231)

:::spoiler code
```cpp=1
void MemoryController::ScheduleCommandWake( )
{
    /* Schedule wake event for memory commands if not scheduled.
    */
    ncycle_t nextWakeup = NextIssuable( NULL );

    /* Avoid scheduling multiple duplicate events. */
    bool nextWakeupScheduled = GetEventQueue()->FindCallback( this,
                                   (CallbackPtr)&MemoryController::CommandQueueCallback,
                                   nextWakeup, NULL, commandQueuePriority );

    if( !nextWakeupScheduled )
    {
        GetEventQueue( )->InsertCallback( this,
                              (CallbackPtr)&MemoryController::CommandQueueCallback,
                              nextWakeup, NULL, commandQueuePriority );
    }
}
```
:::

### Rank
- `Rank` is a subclass of `NVMObject`
- `StandardRank` is a subclass of `Rank`
- [Ranks/StandardRank/StandardRank.cpp](https://github.com/WeiCheng14159/NVmain/blob/master/Ranks/StandardRank/StandardRank.cpp)
- `void StandardRank::Cycle( ncycle_t steps )`

:::spoiler code
```cpp=
void StandardRank::Cycle( ncycle_t steps )
{
    for( ncounter_t childIdx = 0; childIdx < GetChildCount( ); childIdx++ )
        GetChild( childIdx )->Cycle( steps );

    /* Count cycle numbers and calculate background energy for each state */
    switch( state )
    {
        /* active powerdown */
        case STANDARDRANK_PDA:
            fastExitActiveCycles += steps;
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD3P * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Epda * (double)steps );
            break;

        /* precharge powerdown fast exit */
        case STANDARDRANK_PDPF:
            fastExitPrechargeCycles += steps;
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD2P1 * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Epdpf * (double)steps );
            break;

        /* precharge powerdown slow exit */
        case STANDARDRANK_PDPS:
            slowExitCycles += steps;
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD2P0 * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Epdps * (double)steps );
            break;

        /* active standby */
        case STANDARDRANK_REFRESHING:
        case STANDARDRANK_OPEN:
            activeCycles += steps;
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD3N * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Eactstdby * (double)steps );
            break;

        /* precharge standby */
        case STANDARDRANK_CLOSED:
            standbyCycles += steps;
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD2N * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Eprestdby * (double)steps );
            break;

        default:
            if( p->EnergyModel == "current" )
                backgroundEnergy += ( p->EIDD2N * (double)steps ) * (double)deviceCount;
            else
                backgroundEnergy += ( p->Eprestdby * (double)steps );
            break;
    }
}
```
:::

- `bool StandardRank::IsIssuable( NVMainRequest *req, FailReason *reason )`

:::spoiler code
```cpp=
bool StandardRank::IsIssuable( NVMainRequest *req, FailReason *reason )
{
    uint64_t opBank;
    bool rv;

    req->address.GetTranslatedAddress( NULL, NULL, &opBank, NULL, NULL, NULL );

    rv = true;

    if( req->type == ACTIVATE )
    {
        if( nextActivate > GetEventQueue( )->GetCurrentCycle( )
            || ( lastActivate[(RAWindex + 1) % rawNum] + p->tRAW )
                 > GetEventQueue()->GetCurrentCycle() )
        {
            rv = false;
            if( reason )
                reason->reason = RANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }

        if( rv == false )
        {
            if( nextActivate > GetEventQueue( )->GetCurrentCycle( ) )
            {
                actWaits++;
                actWaitTotal += nextActivate - GetEventQueue( )->GetCurrentCycle( );
            }

            if( ( lastActivate[RAWindex] + p->tRRDR )
                  > GetEventQueue( )->GetCurrentCycle( ) )
            {
                rrdWaits++;
                rrdWaitTotal += ( lastActivate[RAWindex] + p->tRRDR
                                  - (GetEventQueue()->GetCurrentCycle()) );
            }

            if( ( lastActivate[( RAWindex + 1 ) % rawNum] + p->tRAW )
                  > GetEventQueue( )->GetCurrentCycle( ) )
            {
                fawWaits++;
                fawWaitTotal += ( lastActivate[( RAWindex + 1 ) % rawNum] + p->tRAW
                                  - GetEventQueue( )->GetCurrentCycle( ) );
            }
        }
    }
    else if( req->type == READ || req->type == READ_PRECHARGE )
    {
        if( nextRead > GetEventQueue( )->GetCurrentCycle( ) )
        {
            rv = false;
            if( reason )
                reason->reason = RANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == WRITE || req->type == WRITE_PRECHARGE )
    {
        if( nextWrite > GetEventQueue( )->GetCurrentCycle( ) )
        {
            rv = false;
            if( reason )
                reason->reason = RANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == PRECHARGE || req->type == PRECHARGE_ALL )
    {
        if( nextPrecharge > GetEventQueue( )->GetCurrentCycle( ) )
        {
            rv = false;
            if( reason )
                reason->reason = RANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == POWERDOWN_PDA
             || req->type == POWERDOWN_PDPF
             || req->type == POWERDOWN_PDPS )
    {
        rv = CanPowerDown( req );

        if( !rv && reason )
            reason->reason = RANK_TIMING;
    }
    else if( req->type == POWERUP )
    {
        rv = CanPowerUp( req );

        if( !rv && reason )
            reason->reason = RANK_TIMING;
    }
    else if( req->type == REFRESH )
    {
        /* firstly, check whether REFRESH can be issued to a rank */
        if( nextActivate > GetEventQueue()->GetCurrentCycle()
            || ( lastActivate[( RAWindex + 1 ) % rawNum] + p->tRAW
                 > GetEventQueue( )->GetCurrentCycle( ) ) )
        {
            rv = false;
            if( reason )
                reason->reason = RANK_TIMING;

            return rv;
        }

        /* REFRESH can only be issued when all banks in the group are issuable */
        assert( (opBank + banksPerRefresh) <= bankCount );

        for( ncounter_t i = 0; i < banksPerRefresh; i++ )
        {
            rv = GetChild( opBank + i )->IsIssuable( req, reason );
            if( rv == false )
                return rv;
        }
    }
    else
    {
        /* Unknown command -- See if child module can handle it. */
        rv = GetChild( req )->IsIssuable( req, reason );
    }

    return rv;
}
```
:::

- `bool StandardRank::IssueCommand( NVMainRequest *req )`

:::spoiler code
```cpp=
bool StandardRank::IssueCommand( NVMainRequest *req )
{
    bool rv = false;

    if( !IsIssuable( req ) )
    {
        ...
    }
    else
    {
        rv = true;

        switch( req->type )
        {
            case ACTIVATE:
                rv = this->Activate( req );
                break;

            case READ:
            case READ_PRECHARGE:
                rv = this->Read( req );
                break;

            case WRITE:
            case WRITE_PRECHARGE:
                rv = this->Write( req );
                break;

            case PRECHARGE:
            case PRECHARGE_ALL:
                rv = this->Precharge( req );
                break;

            case POWERDOWN_PDA:
            case POWERDOWN_PDPF:
            case POWERDOWN_PDPS:
                rv = this->PowerDown( req );
                break;

            case POWERUP:
                rv = this->PowerUp( req );
                break;

            case REFRESH:
                rv = this->Refresh( req );
                break;

            default:
                std::cout << "NVMain: Rank: Unknown operation in command queue! "
                          << req->type << std::endl;
                break;
        }
    }

    return rv;
}
```
:::

- `bool StandardRank::Activate( NVMainRequest *request )`

:::spoiler code
```cpp=
bool StandardRank::Activate( NVMainRequest *request )
{
    ...
    /*
     *  Ensure that the time since the last bank activation is >= tRRD. This is to limit
     *  power consumption.
     */
    if( nextActivate <= GetEventQueue()->GetCurrentCycle()
        && lastActivate[( RAWindex + 1 ) % rawNum] + p->tRAW
           <= GetEventQueue( )->GetCurrentCycle( ) )
    {
        /* issue ACTIVATE to target bank */
        GetChild( request )->IssueCommand( request );

        if( state == STANDARDRANK_CLOSED )
            state = STANDARDRANK_OPEN;

        /* move to the next counter */
        RAWindex = (RAWindex + 1) % rawNum;
        lastActivate[RAWindex] = GetEventQueue()->GetCurrentCycle();
        nextActivate = MAX( nextActivate,
                            GetEventQueue()->GetCurrentCycle() + p->tRRDR );
    }
    else
    {
        std::cerr << "NVMain Error: Rank Activation FAILED! "
                  << "Did you check IsIssuable?" << std::endl;
    }

    return true;
}
```
:::

- `bool StandardRank::Read( NVMainRequest *request )`

:::spoiler code
```cpp=
bool StandardRank::Read( NVMainRequest *request )
{
    ...
    /* issue READ or READ_PRECHARGE to target bank */
    bool success = GetChild( request )->IssueCommand( request );

    /* Even though the command may be READ_PRECHARGE, it still works */
    nextRead = MAX( nextRead,
                    GetEventQueue()->GetCurrentCycle()
                        + MAX( p->tBURST, p->tCCD ) * request->burstCount );

    nextWrite = MAX( nextWrite,
                     GetEventQueue()->GetCurrentCycle()
                         + MAX( p->tBURST, p->tCCD ) * (request->burstCount - 1)
                         + p->tCAS + p->tBURST + p->tRTRS - p->tCWD );
    ...
}
```
:::

### Bank
- `Bank` is a subclass of `NVMObject`
- `DDR3Bank` is a subclass of `Bank`
- [Banks/DDR3Bank/DDR3Bank.cpp](https://github.com/WeiCheng14159/NVmain/blob/master/Banks/DDR3Bank/DDR3Bank.cpp)
- `bool DDR3Bank::IsIssuable( NVMainRequest *req, FailReason *reason )`

:::spoiler code
```cpp=
bool DDR3Bank::IsIssuable( NVMainRequest *req, FailReason *reason )
{
    bool rv = true;
    uint64_t opRank, opBank, opRow, opSubArray;

    req->address.GetTranslatedAddress( &opRow, NULL, &opBank, &opRank, NULL, &opSubArray );

    if( nextCommand != CMD_NOP )
        return false;

    if( req->type == ACTIVATE )
    {
        /* if the bank-level nextActive is not satisfied, cannot issue */
        if( nextActivate > ( GetEventQueue()->GetCurrentCycle() )
            || state == DDR3BANK_PDPF
            || state == DDR3BANK_PDPS
            || state == DDR3BANK_PDA )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;

            actWaits++;
            actWaitTotal += nextActivate - (GetEventQueue()->GetCurrentCycle());
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == READ || req->type == READ_PRECHARGE )
    {
        if( nextRead > (GetEventQueue()->GetCurrentCycle())
            || state != DDR3BANK_OPEN )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == WRITE || req->type == WRITE_PRECHARGE )
    {
        if( nextWrite > (GetEventQueue()->GetCurrentCycle())
            || state != DDR3BANK_OPEN )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else if( req->type == PRECHARGE || req->type == PRECHARGE_ALL )
    {
        if( nextPrecharge > (GetEventQueue()->GetCurrentCycle())
            || ( state != DDR3BANK_CLOSED && state != DDR3BANK_OPEN ) )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }
        else
        {
            if( req->type == PRECHARGE_ALL )
            {
                std::deque<ncounter_t>::iterator it;

                for( it = activeSubArrayQueue.begin();
                     it != activeSubArrayQueue.end(); ++it )
                {
                    rv = GetChild( (*it) )->IsIssuable( req, reason );

                    if( rv == false )
                        break;
                }
            }
            else
            {
                rv = GetChild( req )->IsIssuable( req, reason );
            }
        }
    }
    else if( req->type == POWERDOWN_PDA
             || req->type == POWERDOWN_PDPF
             || req->type == POWERDOWN_PDPS )
    {
        if( nextPowerDown > (GetEventQueue()->GetCurrentCycle())
            || ( state != DDR3BANK_CLOSED && state != DDR3BANK_OPEN )
            || ( ( req->type == POWERDOWN_PDPF || req->type == POWERDOWN_PDPS )
                 && state == DDR3BANK_OPEN ) )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }

        for( ncounter_t saIdx = 0; saIdx < subArrayNum; saIdx++ )
        {
            if( !GetChild(saIdx)->IsIssuable( req ) )
            {
                rv = false;
                break;
            }
        }
    }
    else if( req->type == POWERUP )
    {
        if( nextPowerUp > (GetEventQueue()->GetCurrentCycle())
            || ( state != DDR3BANK_PDPF
                 && state != DDR3BANK_PDPS
                 && state != DDR3BANK_PDA ) )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }

        for( ncounter_t saIdx = 0; saIdx < subArrayNum; saIdx++ )
        {
            if( !GetChild(saIdx)->IsIssuable( req ) )
            {
                rv = false;
                break;
            }
        }
    }
    else if( req->type == REFRESH )
    {
        if( nextActivate > ( GetEventQueue()->GetCurrentCycle() )
            || ( state != DDR3BANK_CLOSED && state != DDR3BANK_OPEN ) )
        {
            rv = false;
            if( reason )
                reason->reason = BANK_TIMING;
        }
        else
        {
            rv = GetChild( req )->IsIssuable( req, reason );
        }
    }
    else
    {
        /* Unknown command, just ask child modules.
        */
        rv = GetChild( req )->IsIssuable( req, reason );
    }

    return rv;
}
```
:::

### NVMObject
- This is the MOST important class in this simulator
- Nearly ALL components are derived from this class
```cpp=
class NVMObject
{
  public:
    NVMObject( );
    virtual ~NVMObject( );

    virtual void AddChild( NVMObject *c );
    NVMObject *_FindChild( NVMainRequest *req, const char *childClass );
    NVMObject_hook *GetParent( );
    std::vector<NVMObject_hook *>& GetChildren( );
    NVMObject_hook *GetChild( NVMainRequest *req );
    NVMObject_hook *GetChild( ncounter_t child );
    NVMObject_hook *GetChild( );
    ncounter_t GetChildId( NVMObject *c );
    ncounter_t GetChildCount( );
    ...

  protected:
    NVMObject_hook *parent;
    AddressTranslator *decoder;
    ...
    std::vector<NVMObject_hook *> children;
    std::vector<NVMObject *> *hooks;
    HookType hookType, currentHookType;
};
```

## Tracing code execution in NVMain
- main function
```cpp=
int main( int argc, char *argv[] )
{
    TraceMain *traceRunner = new TraceMain( );

    return traceRunner->RunTrace( argc, argv );
}
```
- `int TraceMain::RunTrace( int argc, char *argv[] )`
```cpp=
int TraceMain::RunTrace( int argc, char *argv[] )
{
    ...
    AddChild( nvmain );
    nvmain->SetParent( this );
    ...
    nvmain->SetConfig( config, "defaultMemory", true );
    ...
```
- `void NVMObject::AddChild( NVMObject *c )`
```cpp=
void NVMObject::AddChild( NVMObject *c )
{
    NVMObject_hook *hook = new NVMObject_hook( c );
    std::vector<NVMObject *>::iterator it;

    /*
     *  Copy all hooks from the parent to the child NVMObject.
     */
    for( int i = 0; i < static_cast<int>(NVMHOOK_COUNT); i++ )
    {
        for( it = hooks[i].begin(); it != hooks[i].end(); it++ )
        {
            c->AddHook( (*it) );
        }
    }

    children.push_back( hook );
}
```

### Function calling graph
```graphviz
digraph g {
    graph [ rankdir = "LR" ];
    node [ fontsize = "16" shape = "record" ];
    edge [ ];

    "traceRunner" [ label = "<f0> traceRunner| <f1>" ];
    "nvmain" [ label = "<f0> nvmain| <f1>" ];
    "memoryControllers_0" [ label = "<f0> memoryControllers[0]| <f1>" ];
    "memoryControllers_1" [ label = "<f0> memoryControllers[1]| <f1>" ];
    "InterconnectFactory" [ label = "<f0> InterconnectFactory| <f1>" ];

    main -> traceRunner;
    traceRunner -> nvmain;
    nvmain:f0->memoryControllers_0:f0;
    nvmain:f0->memoryControllers_1:f0->InterconnectFactory:f0;
}
```

### Run simulator
- Run command
```bash=1
./nvmain.debug Config/PCM_ISSCC_2012_4GB.config Tests/Traces/hello_world.nvt 0
```
- Simulator calling hierarchy
```c=1
defaultMemory
-- defaultMemory.channel0.FRFCFS-WQF
---- defaultMemory.channel0.FRFCFS-WQF.channel0
------ defaultMemory.channel0.FRFCFS-WQF.channel0.rank0
-------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank0
---------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank0.subarray0
-------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank1
---------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank1.subarray0
-------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank2
---------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank2.subarray0
-------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank3
---------- defaultMemory.channel0.FRFCFS-WQF.channel0.rank0.bank3.subarray0
...
```
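
- Sketch: how a parent/child tree yields the listing above
    - The hierarchy listing follows from the parent/child links that `AddChild()` builds. The block below is only a minimal standalone sketch of that idea, not NVMain code: `Node`, `FullName`, `DumpHierarchy` and `HierarchyToString` are hypothetical names, and the rule "two dashes per level, dotted path from the root" is an assumption read off the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NVMObject-like tree node (NOT NVMain's class).
struct Node
{
    std::string name;                             // local name, e.g. "rank0"
    Node *parent;                                 // non-owning back pointer
    std::vector<std::unique_ptr<Node>> children;  // owned children

    explicit Node( std::string n ) : name( std::move( n ) ), parent( nullptr ) { }

    // Attach a new child and return it so calls can be chained.
    Node *AddChild( const std::string &childName )
    {
        assert( !childName.empty( ) );
        children.push_back( std::unique_ptr<Node>( new Node( childName ) ) );
        children.back( )->parent = this;
        return children.back( ).get( );
    }

    // Dotted path from the root down to this node.
    std::string FullName( ) const
    {
        return parent ? parent->FullName( ) + "." + name : name;
    }
};

// Depth-first dump: one line per node, two dashes per nesting level.
void DumpHierarchy( const Node &node, std::ostream &out, std::size_t depth = 0 )
{
    if( depth > 0 )
        out << std::string( 2 * depth, '-' ) << ' ';
    out << node.FullName( ) << '\n';

    for( const auto &child : node.children )
        DumpHierarchy( *child, out, depth + 1 );
}

// Convenience wrapper that returns the dump as a string.
std::string HierarchyToString( const Node &root )
{
    std::ostringstream out;
    DumpHierarchy( root, out );
    return out.str( );
}
```

    - Usage: create `Node root( "defaultMemory" )`, chain `AddChild( "channel0.FRFCFS-WQF" )`, `AddChild( "channel0" )`, `AddChild( "rank0" )`, `AddChild( "bank0" )`, `AddChild( "subarray0" )`, then call `HierarchyToString( root )`. The children are held through `std::unique_ptr`, so the raw pointers returned by `AddChild` stay valid when the vector grows.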