
Study Notes for NVMe

2020/08/04

1. NVMe Queue

In NVMe base spec. revision 1.4 (hereinafter referred to as spec. 1.4) page 7:

Two types of queues: Admin queues and I/O queues.
Each type has a Submission Queue and a Completion Queue:
Submission queue
In spec. 1.4 page 8:

A Submission Queue (SQ) is a circular buffer with a fixed slot size that the host software uses to submit commands for execution by the controller.

The Submission Queue entry byte layout is shown below:


Submission Queue Entry Byte Layout (reference: spec. 1.4 pages 65-66; figure from https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2012/20120821_TD12_Onufryk.pdf)

In spec. 1.4 page 65:

Each command is 64 bytes in size.
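
To make the 64-byte layout concrete, here is a minimal C sketch of the common command format, assuming a little-endian host and the spec. 1.4 field positions: CDW0 at bytes 0-3, NSID at 4-7, bytes 8-15 reserved, MPTR at 16-23, DPTR at 24-39, and CDW10-CDW15 at 40-63. The struct `nvme_sqe` and the helper `nvme_cdw0` are hypothetical names for illustration, not taken from any particular driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: hypothetical names, little-endian host assumed. */
struct nvme_sqe {
    uint8_t  opc;    /* byte 0: Opcode (CDW0 bits 07:00) */
    uint8_t  flags;  /* byte 1: FUSE in bits 1:0, PSDT in bits 7:6 */
    uint16_t cid;    /* bytes 2-3: Command Identifier (CDW0 bits 31:16) */
    uint32_t nsid;   /* bytes 4-7: Namespace Identifier */
    uint64_t rsvd;   /* bytes 8-15: reserved in the common command format */
    uint64_t mptr;   /* bytes 16-23: Metadata Pointer (MPTR) */
    uint64_t prp1;   /* bytes 24-39: Data Pointer (DPTR) as PRP1/PRP2, */
    uint64_t prp2;   /* or one 16-byte SGL descriptor when PSDT != 00b */
    uint32_t cdw10, cdw11, cdw12, cdw13, cdw14, cdw15; /* bytes 40-63 */
};

_Static_assert(sizeof(struct nvme_sqe) == 64, "SQ entry is 64 bytes");
_Static_assert(offsetof(struct nvme_sqe, mptr) == 16, "MPTR starts at byte 16");
_Static_assert(offsetof(struct nvme_sqe, prp1) == 24, "DPTR starts at byte 24");
_Static_assert(offsetof(struct nvme_sqe, cdw10) == 40, "CDW10 starts at byte 40");

/* Compose CDW0 from Opcode, FUSE, PSDT and Command Identifier. */
static uint32_t nvme_cdw0(uint8_t opc, unsigned fuse, unsigned psdt, uint16_t cid)
{
    assert(fuse <= 2); /* FUSE = 11b is reserved */
    assert(psdt <= 2); /* PSDT = 11b is reserved */
    return (uint32_t)opc | ((uint32_t)fuse << 8) | ((uint32_t)psdt << 14) |
           ((uint32_t)cid << 16);
}
```

For example, `nvme_cdw0(0x02, 0, 0, 7)` encodes an NVM Read (opcode 02h) with Command Identifier 7, no fusing, and PRPs for the data pointer (PSDT = 00b).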

Completion queue
In spec. 1.4 page 9:

A Completion Queue (CQ) is a circular buffer with a fixed slot size used to post status for completed commands.


The Completion Queue entry byte layout is shown below:


Completion Queue Entry Byte Layout (reference: spec. 1.4 page 77; figure from https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2012/20120821_TD12_Onufryk.pdf)

In spec. 1.4 page 77:

An entry in the Completion Queue is at least 16 bytes in size.
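
The 16-byte Completion Queue entry can be sketched in C as well. This assumes a little-endian host and the page-77 layout: DW0 is command specific, DW1 is reserved, DW2 holds the SQ Head Pointer and SQ Identifier, and DW3 holds the Command Identifier, the Phase Tag (bit 16) and the Status Field (bits 31:17). The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical names, little-endian host assumed. */
struct nvme_cqe {
    uint32_t dw0;    /* DW0: command specific */
    uint32_t dw1;    /* DW1: reserved */
    uint16_t sqhd;   /* DW2 bits 15:00: SQ Head Pointer */
    uint16_t sqid;   /* DW2 bits 31:16: SQ Identifier */
    uint16_t cid;    /* DW3 bits 15:00: Command Identifier */
    uint16_t status; /* DW3 bits 31:16: bit 0 = Phase Tag, bits 15:1 = Status Field */
};

_Static_assert(sizeof(struct nvme_cqe) == 16, "CQ entry is 16 bytes");

/* Phase Tag: DW3 bit 16. */
static unsigned cqe_phase(const struct nvme_cqe *e) { return e->status & 0x1u; }

/* Status Code (SC): Status Field bits 7:0, i.e. DW3 bits 24:17. */
static unsigned cqe_sc(const struct nvme_cqe *e) { return (e->status >> 1) & 0xFFu; }

/* Status Code Type (SCT): Status Field bits 10:8, i.e. DW3 bits 27:25. */
static unsigned cqe_sct(const struct nvme_cqe *e) { return (e->status >> 9) & 0x7u; }
```

A successful completion has SCT = 0 and SC = 0; the SQ Head Pointer tells the host which Submission Queue slots the controller has consumed.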

1. Admin queues

Only commands that are part of the Admin Command Set may be submitted to the Admin Submission Queue.

2. I/O queues

The NVM Express interface is based on a paired Submission and Completion Queue mechanism.
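
Each queue in a pair is a circular buffer of fixed-size slots (one slot holds one entry: 64 bytes in a Submission Queue, 16 bytes in a Completion Queue). The spec treats a queue as empty when Head equals Tail and full when Head equals Tail + 1 (modulo the queue size), so one slot always stays unused. A minimal sketch of the host-side Submission Queue bookkeeping; `sq_ring` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: hypothetical host-side state for one Submission Queue. */
struct sq_ring {
    uint16_t size; /* number of slots (entries), not bytes */
    uint16_t head; /* latest SQ Head Pointer reported in a completion entry */
    uint16_t tail; /* next free slot; this value goes to the SQ Tail Doorbell */
};

static bool sq_empty(const struct sq_ring *q) { return q->head == q->tail; }

/* Full when Head == Tail + 1 (mod size): one slot always stays unused. */
static bool sq_full(const struct sq_ring *q)
{
    return (uint16_t)((q->tail + 1u) % q->size) == q->head;
}

/* Reserve the slot at Tail and advance Tail with wrap-around.
 * Returns the slot index, or -1 if the queue is full. */
static int sq_push(struct sq_ring *q)
{
    assert(q->size >= 2);
    if (sq_full(q))
        return -1;
    int slot = q->tail;
    q->tail = (uint16_t)((q->tail + 1u) % q->size);
    return slot;
}
```

After copying a command into the returned slot, the host writes the new `tail` to the SQ Tail Doorbell; `head` only moves forward when completion entries report a new SQ Head Pointer.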

3. Confused Issues:

In spec. 1.4 page 7:
Does not require uncacheable / MMIO register reads in the command submission or completion path;

In spec. 1.4 page 284, Figure 432: Command Processing
Step 3. The controller transfers the command(s) from the Submission Queue slot(s) into the controller for future execution.
Is this not a read operation? What does "the controller transfers" mean here?
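
For NVMe over PCIe, the host-side submission and completion paths need only MMIO writes: the host writes the new SQ Tail to the SQ Tail Doorbell and the new CQ Head to the CQ Head Doorbell. The "transfer" in step 3 is the controller itself reading (fetching) the command from host memory, which is not a host register read. The doorbells start at offset 1000h in the controller registers and are spaced by the doorbell stride `4 << CAP.DSTRD` bytes. A sketch of the offset calculation; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical helpers. dstrd is the CAP.DSTRD field value. */

/* Submission Queue y Tail Doorbell: 1000h + (2y * (4 << CAP.DSTRD)). */
static uint32_t sq_tail_doorbell(uint16_t qid, uint8_t dstrd)
{
    return 0x1000u + (2u * qid) * (4u << dstrd);
}

/* Completion Queue y Head Doorbell: 1000h + ((2y + 1) * (4 << CAP.DSTRD)). */
static uint32_t cq_head_doorbell(uint16_t qid, uint8_t dstrd)
{
    return 0x1000u + (2u * qid + 1u) * (4u << dstrd);
}
```

With CAP.DSTRD = 0 the Admin queue doorbells (queue 0) sit at 1000h and 1004h, and I/O queue 1 uses 1008h and 100Ch.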

What is a slot, and what is the slot size?
How many bytes is one logical block? (512 bytes?) See the namespace LBA Format (LBAF).

What exactly does the Metadata Pointer store? (an address?)
Does it depend on whether PRPs or SGLs are used?
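
The logical block size is not fixed at 512 bytes. Each LBA Format entry (LBAF0 to LBAF15) in the Identify Namespace data reports LBADS in bits 23:16 as a power of two and the per-block metadata size MS in bits 15:00, and FLBAS bits 3:0 select the format in use. When metadata is kept in a separate buffer (FLBAS bit 4 cleared), MPTR holds the address of that buffer. A sketch decoding one LBAF entry, assuming it has already been read as a 32-bit value; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical helpers for one 32-bit LBAF entry. */

/* LBA data size in bytes = 2^LBADS; LBADS below 9 (512 bytes) means unsupported. */
static uint32_t lbaf_block_size(uint32_t lbaf)
{
    unsigned lbads = (lbaf >> 16) & 0xFFu;
    if (lbads < 9 || lbads > 31) /* upper bound only guards the shift below */
        return 0;
    return 1u << lbads;
}

/* MS: metadata bytes per logical block. */
static uint16_t lbaf_metadata_size(uint32_t lbaf) { return (uint16_t)(lbaf & 0xFFFFu); }

/* FLBAS bits 3:0: index of the LBA Format currently in use. */
static unsigned flbas_format_index(uint8_t flbas) { return flbas & 0xFu; }
```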

2. Why are there two different lists, the Physical Region Page Entry and List (PRP) and the Scatter Gather List (SGL)?

In spec. 1.4 page 66:
PRPs shall be used for all Admin commands for NVMe over PCIe implementations.

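SGLs shall be used for all Admin and I/O commands for NVMe over Fabrics implementations. (Are SGLs only for Fabrics? Not exactly: for NVMe over PCIe, I/O commands may also use SGLs if the controller reports SGL support in the Identify Controller SGLS field; the PSDT field in CDW0 selects PRPs or SGLs per command.)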

3. Physical Region Page Entry and List (PRP)

1. PRP

In spec. 1.4 page 68:
A physical region page (PRP) entry is a pointer to a physical memory page.
where the physical memory page size is defined in spec. 1.4 page 48:

Offset 14h: CC Controller Configuration
10:07 Memory Page Size (MPS):
The minimum host memory page size is 4 KiB and the maximum host memory page size is 128 MiB.
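
CC.MPS is a 4-bit exponent: the page size is 2^(12 + MPS) bytes, so MPS = 0 gives 4 KiB and MPS = 15 gives 128 MiB, matching the range quoted above. A given controller further restricts the value to the range between CAP.MPSMIN and CAP.MPSMAX. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical helpers for the CC.MPS field (bits 10:07). */

/* Host memory page size in bytes = 2^(12 + MPS). */
static uint64_t mps_page_size(unsigned mps)
{
    assert(mps <= 15);
    return 1ull << (12 + mps);
}

/* Return cc with its MPS field replaced by mps, other bits unchanged. */
static uint32_t cc_with_mps(uint32_t cc, unsigned mps)
{
    assert(mps <= 15);
    return (cc & ~(0xFu << 7)) | ((uint32_t)mps << 7);
}
```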

2. PRP List

In spec. 1.4 page 69:
A physical region page list (PRP List) is a set of PRP entries in a single page of contiguous memory.
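
How PRP1 and PRP2 are filled depends on how many memory pages the data buffer touches. PRP1 points into the first page (possibly with an offset); if the buffer spans exactly two pages, PRP2 holds the address of the second page; if it spans more, PRP2 points to a PRP List instead, and one list page holds page_size / 8 entries. A sketch of that decision, assuming a power-of-two page size and a non-empty buffer; it ignores PRP List chaining when the list itself needs more than one page, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical names. */
enum prp2_use { PRP2_UNUSED, PRP2_SECOND_PAGE, PRP2_LIST_POINTER };

/* Number of memory pages touched by [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page)
{
    assert(len > 0 && page != 0 && (page & (page - 1)) == 0);
    uint64_t offset = addr & (page - 1); /* offset into the first page (PRP1) */
    return (offset + len + page - 1) / page;
}

/* What PRP2 must hold for this transfer. */
static enum prp2_use prp2_kind(uint64_t addr, uint64_t len, uint64_t page)
{
    uint64_t n = pages_spanned(addr, len, page);
    if (n == 1)
        return PRP2_UNUSED;
    if (n == 2)
        return PRP2_SECOND_PAGE;
    return PRP2_LIST_POINTER;
}

/* Entries needed in the PRP List: every page after the first. */
static uint64_t prp_list_entries(uint64_t addr, uint64_t len, uint64_t page)
{
    uint64_t n = pages_spanned(addr, len, page);
    return n > 2 ? n - 1 : 0;
}
```

For example, a 4 KiB transfer starting 200h bytes into a 4 KiB page crosses into a second page, so PRP2 holds that second page address rather than a list pointer.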

4. Scatter Gather List (SGL)

In spec. 1.4 page 70:
A Scatter Gather List (SGL) is a data structure in memory address space used to describe a data buffer.
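
Unlike a PRP entry, an SGL descriptor carries its own length, so one descriptor can describe a buffer of arbitrary size and alignment, and Segment descriptors chain to further lists of descriptors. Each descriptor is 16 bytes: Address (bytes 07:00), Length (bytes 11:08), reserved bytes 14:12, and the SGL Identifier in byte 15 (descriptor type in bits 7:4, sub type in bits 3:0). A sketch assuming a little-endian host; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical names. Descriptor types (SGL Identifier bits 7:4). */
enum sgl_type {
    SGL_DATA_BLOCK = 0x0,
    SGL_BIT_BUCKET = 0x1,
    SGL_SEGMENT = 0x2,
    SGL_LAST_SEGMENT = 0x3,
};

struct sgl_desc {
    uint64_t addr;    /* bytes 07:00: Address */
    uint32_t length;  /* bytes 11:08: Length in bytes */
    uint8_t  rsvd[3]; /* bytes 14:12: reserved */
    uint8_t  id;      /* byte 15: type in bits 7:4, sub type in bits 3:0 */
};

_Static_assert(sizeof(struct sgl_desc) == 16, "SGL descriptor is 16 bytes");

/* Build a descriptor with sub type 0h (the Address field is an address). */
static struct sgl_desc sgl_make(enum sgl_type type, uint64_t addr, uint32_t length)
{
    struct sgl_desc d = {addr, length, {0, 0, 0}, (uint8_t)(type << 4)};
    return d;
}

static enum sgl_type sgl_get_type(const struct sgl_desc *d)
{
    return (enum sgl_type)(d->id >> 4);
}
```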

What is the definition of a data buffer in the NVMe spec.?

5. Command Set

1. Admin Command Set

Refer to spec. 1.4 chapter 5, "ADMIN COMMAND SET".

2. Confused Issues:

Admin vs. NVM Command Set commands, and Admin vs. NVM Vendor Specific commands:
how are they distinguished?

> In spec. 1.4 page 94, Figure 139: are these commands identified by the opcode range C0h to FFh?
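
The opcode alone does not decide it: the same value means different things on the Admin queue and on an I/O queue (01h is Create I/O Submission Queue on the Admin queue but Write on an NVM I/O queue), so the queue a command is submitted to decides whether it is an Admin or an NVM command. Within the Admin opcodes, C0h-FFh are vendor specific and 80h-BFh are I/O Command Set specific (e.g. Format NVM is 80h); within the NVM opcodes, 80h-FFh are vendor specific. Bits 1:0 of an opcode encode the data transfer direction. A sketch of that classification; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: hypothetical names. */
enum queue_kind { ADMIN_QUEUE, IO_QUEUE };

/* Opcode bits 1:0: data transfer direction. */
enum xfer_dir { XFER_NONE = 0, XFER_HOST_TO_CTRL = 1, XFER_CTRL_TO_HOST = 2, XFER_BIDIRECTIONAL = 3 };

static enum xfer_dir opcode_xfer(uint8_t opc) { return (enum xfer_dir)(opc & 0x3u); }

/* Vendor specific: C0h-FFh on the Admin queue, 80h-FFh on NVM I/O queues. */
static bool is_vendor_specific(enum queue_kind q, uint8_t opc)
{
    return q == ADMIN_QUEUE ? opc >= 0xC0 : opc >= 0x80;
}

/* Admin opcodes 80h-BFh are I/O Command Set specific. */
static bool is_admin_io_cmdset_specific(enum queue_kind q, uint8_t opc)
{
    return q == ADMIN_QUEUE && opc >= 0x80 && opc <= 0xBF;
}
```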

3. NVM Command Set

Refer to spec. 1.4 chapter 6, "NVM COMMAND SET".

6. Controller Architecture

In spec. 1.4 page 274:
There are three types of controllers: I/O, Administrative, and Discovery.

1. I/O Controller

In spec. 1.4 page 275:
An I/O controller is a general purpose controller that supports commands that provide access to an NVM subsystem’s non-volatile storage medium and may support commands that provide management capabilities.

2. Administrative Controller

In spec. 1.4 page 278:
An administrative controller is a controller whose intended purpose is to provide NVM subsystem management capabilities.

3. Discovery Controller

In spec. 1.4 page 283:
A discovery controller is a special type of controller used in NVMe over Fabrics to provide access to a Discovery Log Page.

4. Command Submission and Completion Mechanism

A picture is worth a thousand words: the command processing flow is shown in spec. 1.4 page 284, Figure 432: Command Processing.


1. Confused issues:

In step 5.
Phase Tag inverted from the previous entry to indicate to the host that this completion queue entry is a new entry.
Does "inverted" mean toggled (0 to 1, 1 to 0)?

In step 6.
The controller optionally generates an interrupt to the host
Does "optionally" mean this step is not mandatory?
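
The Phase Tag does toggle on each pass through the Completion Queue: host software initializes all Phase Tags to '0', the controller writes '1' on its first pass, '0' on the second, and so on. The host therefore keeps an expected phase and flips it whenever its head index wraps. Since new entries can be detected this way, a host can also poll without interrupts (a Completion Queue can be created with interrupts disabled), which fits step 6 being optional. A sketch of the host-side check that models only the upper 16 bits of DW3; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical host-side state for one Completion Queue. */
struct cq_state {
    uint16_t size;  /* number of entries */
    uint16_t head;  /* next entry the host will examine */
    uint8_t  phase; /* Phase Tag value that marks a new entry; starts at 1 */
};

/* status[i] models DW3 bits 31:16 of entry i (bit 0 = Phase Tag).
 * Returns the index of a newly posted entry and advances the head,
 * or -1 if the entry at head is still from the previous pass. */
static int cq_consume(struct cq_state *cq, const uint16_t *status)
{
    if ((status[cq->head] & 1u) != cq->phase)
        return -1;
    int idx = cq->head;
    if (++cq->head == cq->size) { /* wrapped: the controller inverts P on its next pass */
        cq->head = 0;
        cq->phase ^= 1u;
    }
    return idx;
}
```

After consuming one or more entries, the host writes the new head index to the CQ Head Doorbell (step 8 in Figure 432).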

5. Fused Operation

When does the controller execute a fused operation? (Is it atomic? When?)
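
In the NVM Command Set the fused operation is Compare and Write: the host places a Compare (opcode 05h) with FUSE = 01b (first command) and a Write (opcode 01h) with FUSE = 10b (second command) in adjacent slots of the same Submission Queue. The controller executes the pair as a unit and performs the write only if the compare succeeds; the largest size for which this is atomic is reported in the ACWU (Atomic Compare & Write Unit) field of Identify Controller. A sketch of the CDW0 encoding for the pair; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical names. FUSE is CDW0 bits 09:08. */
#define FUSE_NORMAL 0u /* 00b: normal operation */
#define FUSE_FIRST  1u /* 01b: first command of a fused operation */
#define FUSE_SECOND 2u /* 10b: second command of a fused operation */
#define OPC_WRITE   0x01u
#define OPC_COMPARE 0x05u

static uint32_t make_cdw0(uint8_t opc, unsigned fuse, uint16_t cid)
{
    assert(fuse <= FUSE_SECOND);
    return (uint32_t)opc | ((uint32_t)fuse << 8) | ((uint32_t)cid << 16);
}

/* CDW0 values for two adjacent SQ slots: Compare first, then Write. */
static void build_compare_and_write(uint32_t cdw0_out[2], uint16_t cid_cmp, uint16_t cid_wr)
{
    cdw0_out[0] = make_cdw0(OPC_COMPARE, FUSE_FIRST, cid_cmp);
    cdw0_out[1] = make_cdw0(OPC_WRITE, FUSE_SECOND, cid_wr);
}
```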

6. Resets

1. NVM Subsystem Reset

In spec. 1.4 page 290:
A value of 4E564D65h (“NVMe”) is written to the NSSR.NSSRC field;
In spec. 1.4 page 42:
As shown in Figure 68: Register Definition,
NSSR (NVM Subsystem Reset).
In spec. 1.4 page 50:
As shown in Figure 80: NSSRC - NVM Subsystem Reset Control.
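
Triggering an NVM Subsystem Reset is a single 32-bit write of 4E564D65h (the ASCII bytes "NVMe") to the NSSR register at offset 20h; a host would first check that CAP.NSSRS reports support for it. A sketch against a fake register window; `mmio_write32` is a hypothetical stand-in for a real driver's MMIO accessor.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical names. */
#define NVME_REG_NSSR  0x20u       /* NSSR register offset */
#define NVME_NSSR_NVME 0x4E564D65u /* ASCII "NVMe" */

/* Hypothetical MMIO helper; bar0 must be 4-byte aligned. */
static void mmio_write32(volatile uint8_t *bar0, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(bar0 + off) = val;
}

/* Initiate an NVM Subsystem Reset by writing "NVMe" to NSSR.NSSRC. */
static void nvm_subsystem_reset(volatile uint8_t *bar0)
{
    mmio_write32(bar0, NVME_REG_NSSR, NVME_NSSR_NVME);
}
```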

2. Controller Level Reset

In spec. 1.4 page 290:
There are five methods to initiate a Controller Level Reset:
• NVM Subsystem Reset;
• Conventional Reset (i.e., PCI Express Hot, Warm, or Cold reset);
• PCI Express transaction layer Data Link Down status;
• Function Level Reset (i.e., PCI reset); and
• Controller Reset (i.e., CC.EN transitions from ‘1’ to ‘0’).

3. Queue Level Reset

In spec. 1.4 page 291:
A queue level reset is performed by deleting and then recreating the queue.

7. Controller Initialization

The details are in spec. 1.4 pages 295-296; the actions can be summarized as follows:

  1. Configure the PCI and PCI Express registers.
  2. Wait for CSTS.RDY to become '0' (the previous reset has completed).
  3. Set the Admin Queue registers AQA, ASQ, and ACQ.
  4. Configure the controller:
     • arbitration mechanism: CC.AMS
     • memory page size: CC.MPS
     • I/O Command Set: CC.CSS
  5. Set CC.EN to '1'.
  6. Wait for CSTS.RDY to become '1'.
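
Steps 3 to 5 mostly amount to composing register values. A sketch, assuming the CC field positions from spec. 1.4 page 48 (EN bit 0, CSS bits 06:04, MPS bits 10:07, AMS bits 13:11, IOSQES bits 19:16, IOCQES bits 23:20) and the 0's based AQA sizes (ASQS bits 11:00, ACQS bits 27:16). CSS = 000b selects the NVM Command Set and AMS = 000b selects round robin; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: hypothetical helpers for steps 3-5. */

/* CC value with EN still 0. IOSQES/IOCQES are set here as an assumption
 * (as many drivers do); they must be valid before I/O queues are created. */
static uint32_t build_cc(unsigned css, unsigned mps, unsigned ams)
{
    uint32_t cc = 0;
    cc |= (uint32_t)(css & 0x7u) << 4;  /* CSS: I/O Command Set Selected */
    cc |= (uint32_t)(mps & 0xFu) << 7;  /* MPS: page size 2^(12 + MPS) */
    cc |= (uint32_t)(ams & 0x7u) << 11; /* AMS: arbitration mechanism */
    cc |= 6u << 16;                     /* IOSQES: 2^6 = 64-byte SQ entries */
    cc |= 4u << 20;                     /* IOCQES: 2^4 = 16-byte CQ entries */
    return cc;
}

/* Step 5: set CC.EN (bit 0). */
static uint32_t cc_enable(uint32_t cc) { return cc | 0x1u; }

/* AQA: Admin SQ and CQ sizes in entries, stored 0's based. */
static uint32_t build_aqa(uint16_t sq_entries, uint16_t cq_entries)
{
    assert(sq_entries >= 2 && sq_entries <= 4096);
    assert(cq_entries >= 2 && cq_entries <= 4096);
    return (uint32_t)(sq_entries - 1u) | ((uint32_t)(cq_entries - 1u) << 16);
}
```

In this sketch a host would write AQA, ASQ and ACQ, then CC from `build_cc`, then the value from `cc_enable`, and finally poll CSTS.RDY until it reads '1'.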