# PCI and AHCI
###### tags: `PCI` `PCI configuration space` `AHCI`
## PCI (Peripheral Component Interconnect)
Peripheral Component Interconnect (abbreviated PCI, also referred to as Conventional PCI to differentiate it from its successor PCI Express) is a local computer bus for attaching hardware devices in a computer and is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus but in a standardized format that is independent of any particular processor's native bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space. It is a parallel bus, synchronous to a single bus clock.
see: [Peripheral Component Interconnect](https://en.wikipedia.org/wiki/Peripheral_Component_Interconnect)
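>> PCI is a local computer bus and part of the PCI Local Bus standard; it lets other hardware communicate with the CPU. The PCI bus supports the functions of a processor bus.
>> Devices on the PCI bus are assigned addresses in the CPU's address space. PCI is a parallel bus, synchronous to a single bus clock.
>> >> A local bus is a bus connected directly (or nearly directly) to the CPU, which avoids introducing extra bottlenecks.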
## PCI configuration space
PCI configuration space is the underlying way that the Conventional PCI, PCI-X and PCI Express perform auto configuration of the cards inserted into their bus.
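> In the PCIe topology there are at most 256 buses, each bus supports at most 32 devices, and each device supports at most 8 functions, so Bus:Device:Function (BDF) forms a unique "ID number" for every function.
> For legacy PCI the configuration space is 256 bytes; for PCIe it is 4096 bytes.
> > The host's physical address space contains a 256 MB memory-mapped window that exposes all of the configuration space. Why 256 MB? Every function has 4 KB, so
> > 256 (Bus) * 32 (Device) * 8 (Function) * 4 KB = 256 MB
> With so many functions, how does the host know what each one does? The answer is that every function has its own 4 KB configuration space. During power-on, after the whole PCI bus has been enumerated, the configuration space of every BDF can be read by the host through this window.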
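The 256 MB figure and the BDF packing can be checked with a few lines of C. This is only an illustrative sketch: the macro and function names (`PCI_MAX_BUS`, `pci_function_count`, `pcie_cfg_window_bytes`, `pci_bdf`) are made up for this note and are not taken from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Limits of the PCIe topology described above. */
#define PCI_MAX_BUS    256u
#define PCI_MAX_DEV     32u   /* devices per bus */
#define PCI_MAX_FUNC     8u   /* functions per device */
#define PCIE_CFG_SIZE 4096u   /* bytes of configuration space per function */

/* Total number of addressable functions, i.e. distinct BDF values. */
static uint32_t pci_function_count(void)
{
    return PCI_MAX_BUS * PCI_MAX_DEV * PCI_MAX_FUNC;
}

/* Size in bytes of a window that maps every function's configuration space. */
static uint64_t pcie_cfg_window_bytes(void)
{
    return (uint64_t)pci_function_count() * PCIE_CFG_SIZE;
}

/* Pack a BDF into the usual 16-bit form: bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_bdf(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < PCI_MAX_BUS && dev < PCI_MAX_DEV && fn < PCI_MAX_FUNC);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}
```

For example, `pcie_cfg_window_bytes()` evaluates to 268435456 (256 MiB), and `pci_bdf(0, 0x12, 0)` packs the device shown as `00:12.0` by `lspci` into `0x0090`.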


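> x86 CPUs define memory and I/O instructions but no instructions for configuration space, so a decoder has to translate configuration accesses. This decoder usually sits in the north bridge; current Intel CPUs integrate the north bridge, so the CPU can do the translation itself. There are two ways to access configuration space:
> * I/O method (CF8h/CFCh)
> * Memory method (ECAM)
>
> I/O method: on the x86 architecture, the CPU provides two groups of I/O registers for accessing configuration space
> > Configuration address register: CF8h-CFBh
> > Configuration data register: CFCh-CFFh

> The access code looks like this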
```c=
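// for PCI: bit 31 enables the access; Reg is rounded down to a dword boundary
address = BIT31 | ((Bus & 0xFF) << 16) | ((Dev & 0x1F) << 11) | ((Fun & 0x7) << 8) | (Reg & 0xFC);
// write the full 32-bit address to the configuration address register
IoWrite32(0xcf8, address);
// read the data register; byte and word reads add the offset within the dword
data8 = IoRead8(0xcfc + (Reg & 0x3));
data16 = IoRead16(0xcfc + (Reg & 0x2));
data32 = IoRead32(0xcfc);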
```
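> The code above works for both PCI and PCIe configuration registers. For PCIe, however, the I/O method can only reach the first 256 bytes of each function's configuration space.
> > The register number occupies bits 2-7 of the address, so 2^6 dwords * 4 bytes per dword = 256 bytes
> [name=Ztex]
> Bit 31 is the enable bit. It must be set, otherwise the access has no effect.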
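The address layout for the I/O method can be sketched as a pure function, which makes the bit positions easy to check without touching real I/O ports. The helper names `pci_cf8_address` and `pci_cfc_port` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CFG_ENABLE 0x80000000u  /* bit 31: enable configuration access */

/* Build the value written to the configuration address register (CF8h). */
static uint32_t pci_cf8_address(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(reg < 256);  /* the I/O method only reaches the first 256 bytes */
    return PCI_CFG_ENABLE
         | ((bus & 0xFFu) << 16)
         | ((dev & 0x1Fu) << 11)
         | ((fn  & 0x07u) << 8)
         | (reg & 0xFCu);           /* dword-aligned register number */
}

/* Data port for a byte or word access: CFCh plus the offset inside the dword. */
static uint16_t pci_cfc_port(unsigned reg)
{
    return (uint16_t)(0xCFCu + (reg & 0x3u));
}
```

For bus 0, device 12h, function 0, register 24h (BAR5), `pci_cf8_address` yields `0x80009024`; a byte read of register 0Ah would use data port `0xCFE`.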
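> Memory method (ECAM)
> Accessing PCI/PCIe configuration space through memory requires the MMCFG base address, which the BIOS programs into the mmcfg_base register. Reading a PCI/PCIe register through ECAM is then much like an ordinary memory access; in C, a pointer is all it takes.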
```c=
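#include <stdint.h>
// ECAM address = base + (bus << 20) + (device << 15) + (function << 12) + 12-bit register offset
#define pcie_addr(m, b, d, f, o) ((uintptr_t)(m) + (((uintptr_t)(b) & 0xff) << 20) + (((uintptr_t)(d) & 0x1f) << 15) + (((uintptr_t)(f) & 0x7) << 12) + ((uintptr_t)(o) & 0xfff))
// volatile keeps the compiler from caching or dropping the register access
#define mmio_read8(addr) (*(volatile uint8_t *)(addr))
#define mmio_write8(addr, data8) (*(volatile uint8_t *)(addr) = (data8))
#define mmio_read16(addr) (*(volatile uint16_t *)(addr))
#define mmio_write16(addr, data16) (*(volatile uint16_t *)(addr) = (data16))
#define mmio_read32(addr) (*(volatile uint32_t *)(addr))
#define mmio_write32(addr, data32) (*(volatile uint32_t *)(addr) = (data32))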
```
> The MMCFG base address is platform-specific; check the CPU specification
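To see how the ECAM layout maps a BDF to an address, here is a small sketch that only computes the address and never dereferences it. The function name `ecam_address` is an assumption for this note, and the base `0xe0000000` is just an example value (it happens to match the `PCI MMCONFIG` line in the `/proc/iomem` output later in this note).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the ECAM address of a configuration register.
 * Each function owns 4 KB, each device 8 functions, each bus 32 devices,
 * so one bus covers 1 MB of the window.
 */
static uint64_t ecam_address(uint64_t mmcfg_base, unsigned bus, unsigned dev,
                             unsigned fn, unsigned off)
{
    assert(off < 4096);  /* PCIe configuration space is 4 KB per function */
    return mmcfg_base
         + ((uint64_t)(bus & 0xFFu) << 20)
         + ((uint64_t)(dev & 0x1Fu) << 15)
         + ((uint64_t)(fn  & 0x07u) << 12)
         + off;
}
```

With base `0xe0000000`, register 24h of `00:12.0` lands at `0xe0090024`, and the last byte of bus 3fh is `0xe3ffffff`.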
see: [PCI configuration space](https://en.wikipedia.org/wiki/PCI_configuration_space)
see: [浅析PCI配置空间](https://blog.csdn.net/zhuzongpeng/article/details/78809687)
see: [PCI/PCIe 的那些事 (2)- 配置空间 (Configuration Space)](https://blog.csdn.net/huangkangying/article/details/50570612)
see : [硬件设备识别扫盲篇](https://www.jianshu.com/p/91c2649b43b8)
## AHCI (Advanced Host Controller Interface)
This specification defines the functional behavior and software interface of the **Advanced Host Controller Interface, which is a hardware mechanism that allows software to communicate with Serial ATA devices**.
AHCI is a PCI class device that acts as a data movement engine **between system memory and Serial ATA devices**.
AHCI host devices (referred to as **host bus adapters, or HBA**) support from **1 to 32 ports**.
An HBA must support ATA and ATAPI devices, and **must support both the PIO and DMA protocols**.
An HBA may optionally support a command list on each port for overhead reduction, and to support Serial ATA Native Command Queuing via the FPDMA Queued Command protocol for **each device of up to 32 entries**.
An HBA may optionally support 64-bit addressing.
AHCI describes a **system memory structure** which contains a **generic area for control and status**, and **a table of entries describing a command list** (an HBA which does not support a command list shall have a depth of one for this table).
Each command list entry contains information necessary to program an SATA device, and a pointer to a descriptor table for transferring data between system memory and the device.
see: [spec](https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/serial-ata-ahci-spec-rev1-3-1.pdf)
## PCI in Linux
references:
1. [PCI Drivers](https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch12.html)
2. [理解 lspci](https://silverwind1982.pixnet.net/blog/post/359421940-%E7%90%86%E8%A7%A3-lspci)
3. [lspci詳解分析](https://www.itread01.com/content/1557387607.html)
4. [Linux下查看PCI-E插槽信息 @ 立你斯學習記錄](https://b8807053.pixnet.net/blog/post/45059362)
5. [linux下遍历访问PCIE设备配置空间](https://blog.csdn.net/penghuicheng/article/details/84454803?utm_medium=distribute.pc_relevant_download.none-task-blog-baidujs-9.nonecase&depth_1-utm_source=distribute.pc_relevant_download.none-task-blog-baidujs-9.nonecase)
* `/proc/iomem` shows how the physical address space is laid out, including the memory-mapped I/O regions claimed for each device
see: https://stackoverflow.com/questions/20469549/understanding-proc-iomem
also see: https://www.kernel.org/doc/html/v4.18/vm/highmem.html
```shell=
$> cat /proc/iomem
...
e0000000-e3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
e0000000-e3ffffff : reserved
fea00000-feafffff : PCI Bus 0000:00
fea00000-feafffff : pnp 00:01
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed00000-fed003ff : PCI Bus 0000:00
fed01000-fed01fff : reserved
fed01000-fed01fff : PCI Bus 0000:00
fed01000-fed01fff : pnp 00:01
fed03000-fed03fff : PCI Bus 0000:00
fed03000-fed03fff : pnp 00:01
fed06000-fed06fff : PCI Bus 0000:00
fed06000-fed06fff : pnp 00:01
fed08000-fed09fff : PCI Bus 0000:00
fed08000-fed09fff : pnp 00:01
fed1c000-fed1cfff : PCI Bus 0000:00
fed1c000-fed1cfff : pnp 00:01
fed40000-fed44fff : MSFT0101:00
fed64000-fed64fff : dmar0
fed65000-fed65fff : dmar1
fed80000-fedbffff : PCI Bus 0000:00
fed80000-fedbffff : pnp 00:01
fee00000-feefffff : PCI Bus 0000:00
fee00000-feefffff : pnp 00:01
fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
100000000-17fffffff : System RAM
```
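> Suppose we have a 32-bit system; traditionally the Linux kernel splits the virtual address space 3/1
> So 0x00000000-0xbfffffff is user memory and 0xc0000000-0xffffffff is kernel memory
> So the kernel can map at most 1 GiB of physical memory at a time, and in practice less (~896 MiB)
> What we see here are the regions claimed with `request_mem_region`, which tells everyone else that this range of memory is in use
> [name=ztex]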
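The `PCI MMCONFIG` line in the output above ties back to the ECAM layout: the window spans buses 00-3f, and each bus needs 1 MiB. The sketch below parses such a line and lets us check that arithmetic. The struct and function names are invented for this example, and the format string assumes the exact line shape shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct mmconfig_range {
    uint64_t start, end;          /* inclusive physical address range */
    unsigned bus_first, bus_last; /* inclusive bus number range */
};

/*
 * Parse a line such as
 *   "e0000000-e3ffffff : PCI MMCONFIG 0000 [bus 00-3f]"
 * Returns 0 on success, -1 if the line does not have this shape.
 */
static int parse_mmconfig_line(const char *line, struct mmconfig_range *out)
{
    unsigned long long start, end;
    unsigned seg, first, last;

    assert(line && out);
    if (sscanf(line, " %llx-%llx : PCI MMCONFIG %x [bus %x-%x]",
               &start, &end, &seg, &first, &last) != 5)
        return -1;
    out->start = start;
    out->end = end;
    out->bus_first = first;
    out->bus_last = last;
    return 0;
}

static uint64_t mmconfig_size(const struct mmconfig_range *r)
{
    return r->end - r->start + 1;
}

static unsigned mmconfig_bus_count(const struct mmconfig_range *r)
{
    return r->bus_last - r->bus_first + 1;
}
```

For the line above this gives a 64 MiB window covering 64 buses, i.e. exactly 1 MiB (32 devices * 8 functions * 4 KB) per bus.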
* `lspci`
```shell=
$> lspci -v
...
00:12.0 Class 0106: Device 8086:31e3 (rev 06) (prog-if 01)
Subsystem: Device 7270:8086
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 341
Memory at a1310000 (32-bit, non-prefetchable) [size=8K]
Memory at a131b000 (32-bit, non-prefetchable) [size=256]
I/O ports at 4080 [size=8]
I/O ports at 4088 [size=4]
I/O ports at 4060 [size=32]
Memory at a1319000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Kernel driver in use: ahci
...
01:00.0 Class 0106: Device 1b4b:9235 (rev 11) (prog-if 01)
Subsystem: Device 1b4b:9235
Flags: bus master, fast devsel, latency 0, IRQ 342
I/O ports at 3028 [size=8]
I/O ports at 3034 [size=4]
I/O ports at 3020 [size=8]
I/O ports at 3030 [size=4]
I/O ports at 3000 [size=32]
Memory at a1200000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at a1210000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [e0] SATA HBA v0.0
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci
...
```
What does `+ -1b.0- [02-3a]-` mean in the `lspci -t` tree output?
```
1b.0 is a slot and function number of the PCIe root hub. In this case, it contains a PCIe bridge. The busses behind this bridge would be numbered 02 to 3a, even though there are currently no devices attached to them.
In a similar way, your GPU is behind the bridge 01.0, and your LAN controller behind the bridge 1d.0, which may be an internal bridge.
```
* `vendor id`:`device id`
```cpp
/* Searching by vendor and device ID: */
struct pci_dev *dev = NULL;
while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
	configure_device(dev);

/* Searching by class ID (iterate in a similar way): */
dev = pci_get_class(CLASS_ID, dev);

/* Searching by both vendor/device and subsystem vendor/device ID: */
dev = pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev);
```
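> Use `lspci` to list the PCI buses and PCI devices. Since we want to control SATA disks, look for the devices bound to the `ahci` driver (AHCI is an interface for talking to SATA devices).
> Take 1b.0- [02-3a] as an example: 1b.0 is the slot and function number. This function is a bridge, and [02-3a] is the range of bus numbers behind it.
> We can see that the base addresses of the two AHCI controllers are `a1319000` and `a1200000`
> We add an offset to this base address to reach the port registers
> [name=ztex]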
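The identifiers printed by `lspci` (`00:12.0`, `8086:31e3`) are plain hexadecimal fields, so they can be pulled apart with `sscanf` before being passed on to something like `pci_get_device`. This is a userspace sketch; `struct pci_slot_id`, `parse_bdf` and `parse_vendor_device` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdio.h>

struct pci_slot_id {
    unsigned bus, dev, fn;
};

/* Parse "bb:dd.f" (all hex) as printed by lspci. Returns 0 on success. */
static int parse_bdf(const char *s, struct pci_slot_id *out)
{
    unsigned bus, dev, fn;

    assert(s && out);
    if (sscanf(s, "%x:%x.%x", &bus, &dev, &fn) != 3)
        return -1;
    if (bus > 0xFF || dev > 0x1F || fn > 0x7)
        return -1;  /* outside the 256/32/8 limits of a BDF */
    out->bus = bus;
    out->dev = dev;
    out->fn = fn;
    return 0;
}

/* Parse "vvvv:dddd" (vendor id : device id, both hex). Returns 0 on success. */
static int parse_vendor_device(const char *s, unsigned *vendor, unsigned *device)
{
    assert(s && vendor && device);
    if (sscanf(s, "%x:%x", vendor, device) != 2)
        return -1;
    if (*vendor > 0xFFFF || *device > 0xFFFF)
        return -1;  /* both ids are 16-bit fields */
    return 0;
}
```

Parsing `"00:12.0"` gives bus 0, device 12h, function 0, and `"8086:31e3"` gives the vendor and device ids of the first controller in the listing above.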
* access the AHCI base address register
see [ahci spec](https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/serial-ata-ahci-spec-rev1-3-1.pdf) **2.1 PCI Header**, which places the **AHCI Base Address (ABAR)** in **BAR5** (offset 24h of the PCI header).

From BAR5 we can get the **AHCI Base Address** like ...
```clike=
#define REQUIRE_BAR 5
io_base = pci_resource_start(dev, REQUIRE_BAR);
/*
also see:
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
bus and slot and number. If the device is
found, its reference count is increased.
pci_set_power_state() Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability() Find specified capability in device's capability
list.
pci_resource_start() Returns bus start address for a given PCI region
pci_resource_end() Returns bus end address for a given PCI region
pci_resource_len() Returns the byte length of a PCI region
pci_set_drvdata() Set private driver data pointer for a pci_dev
pci_get_drvdata() Return private driver data pointer for a pci_dev
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
*/
```
in `pci.h`
```clike=
#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)
#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags)
#define pci_resource_len(dev,bar) \
((pci_resource_start((dev), (bar)) == 0 && \
pci_resource_end((dev), (bar)) == \
pci_resource_start((dev), (bar))) ? 0 : \
\
(pci_resource_end((dev), (bar)) - \
pci_resource_start((dev), (bar)) + 1))
```
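The length computation in `pci_resource_len` can be mirrored in userspace to check it against the sizes that `lspci` prints. The struct below is a simplified stand-in, not the kernel's `struct resource`, and the names `resource_model` and `resource_len` are assumptions for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a BAR resource: inclusive start and end addresses. */
struct resource_model {
    uint64_t start;
    uint64_t end;
};

/* Same rule as the macro above: an all-zero resource has length 0. */
static uint64_t resource_len(const struct resource_model *r)
{
    assert(r && r->end >= r->start);
    if (r->start == 0 && r->end == r->start)
        return 0;
    return r->end - r->start + 1;
}
```

A BAR starting at `a1200000` and ending at `a12007ff` has length 800h, which matches the `[size=2K]` shown by `lspci` for the second controller.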
Next we need to locate the **port registers** in order to read **SSTATUS**.
see [ahci spec](https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/serial-ata-ahci-spec-rev1-3-1.pdf) **3 HBA Memory Registers**, which shows that the **port 0 registers** start at offset **100h** (port 1 starts at 180h, and so on).

The formula for the **offset of a given port's register set** is in **3.3 Port Registers (one set per port)**:

$$\text{Port offset} = 100\text{h} + (\text{port number} \times 80\text{h})$$

Finally, **3.3.10 Offset 28h: PxSSTS – Port x Serial ATA Status (SCR0: SStatus)** describes how to read the device detection state.
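As a quick worked example of the port offset formula, the sketch below computes where PxSSTS sits for a given AHCI base address and port number. The macro and function names are made up for this note; the base addresses in the usage line are the ones from the `lspci` output above.

```c
#include <assert.h>
#include <stdint.h>

#define AHCI_PORT_REGS_BASE 0x100u  /* offset of port 0's register set */
#define AHCI_PORT_REGS_SIZE 0x80u   /* each port owns 80h bytes */
#define AHCI_PXSSTS_OFFSET  0x28u   /* PxSSTS inside a port's register set */

/* Port offset = 100h + (port number * 80h) */
static uint32_t ahci_port_offset(unsigned port)
{
    assert(port < 32);  /* an HBA supports at most 32 ports */
    return AHCI_PORT_REGS_BASE + port * AHCI_PORT_REGS_SIZE;
}

/* Physical address of PxSSTS for a port, given the AHCI base address (BAR5). */
static uint64_t ahci_pxssts_address(uint64_t abar, unsigned port)
{
    return abar + ahci_port_offset(port) + AHCI_PXSSTS_OFFSET;
}
```

With base `a1200000`, port 1's PxSSTS is at `a12001a8`; with base `a1319000`, port 0's is at `a1319128`.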

> From ahci spec 2.1 PCI Header, we see that the AHCI base address is in BAR 5
> From ahci spec 3 HBA Memory Registers, the port registers start at offset 100h
> In 3.3 Port Registers, offset 28h is SStatus
> Linux also provides the handy `pci.h`, so my approach is:
>> Use `pci_get_device` with vendor id + device id (trying them one by one to find which PCI device the disk sits on) to get the pci_dev
>> `pci_enable_device`
>> Use `pci_resource_start` to get the AHCI base address; it can be checked against the address shown by `lspci`, and they match
>> address + port register offset (formula above; try the ports one by one to find the disk) + 28h
>> Use `ioremap` to map the register, read the value, and decode which state it represents
> [name=ztex]
So the code looks like ...
```clike=
#include <linux/pci.h>
#define CONTROLLER1_VENDOR_ID 0x8086
#define CONTROLLER1_DEVICE_ID 0x31e3
#define CONTROLLER2_VENDOR_ID 0x1b4b
#define CONTROLLER2_DEVICE_ID 0x9235
#define REQUIRE_BAR 5
#define SSTATUS_OFFSET 0x28
#define PORT_REGISTER_OFFSET(port_number) (0x100 + (port_number) * 0x80)
#define SLOT1_PORT 1 // belongs to controller 2
#define SLOT2_PORT 2 // belongs to controller 2
#define SLOT3_PORT 0 // belongs to controller 1
#define SLOT4_PORT 1 // belongs to controller 1
#define DET_NDNP 0x0
#define DET_PDNP 0x1
#define DET_PDEP 0x3
#define DET_OFFLINE 0x4
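unsigned int show_sstatus(unsigned long port_register_base, unsigned long offset) {
/* PxSSTS is memory-mapped I/O: map it, read it with ioread32(), then unmap it */
void __iomem *reg = ioremap(port_register_base + offset, 4);
unsigned int register_data, det;
if (!reg) {
printk(KERN_WARNING "ioremap failed for %lx\n", port_register_base + offset);
return DET_NDNP; /* treat an unmappable register as "no device" */
}
register_data = ioread32(reg);
iounmap(reg);
det = register_data & 0xf; // bits 3:0: device detection (DET)
printk(KERN_INFO "sstatus: %x\n", register_data);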
switch(det) {
case DET_NDNP:
printk(KERN_INFO "No device detected and Phy communication not established\n");
break;
case DET_PDNP:
printk(KERN_INFO "Device presence detected but Phy communication not established\n");
break;
case DET_PDEP:
printk(KERN_INFO "Device presence detected and Phy communication established\n");
break;
case DET_OFFLINE:
printk(KERN_INFO "Phy in offline mode as a result of the interface being disabled or running in a BIST loopback mode\n");
break;
default:
printk(KERN_INFO "Unrecognized device detection value\n");
break;
}
return det;
}
static int __init pci_monitor_init(void)
{
int ret = 0;
size_t i = 0;
/* search for pci device through vendor id, device id */
controller1.dev = pci_get_device(controller1.vendor, controller1.device, controller1.dev);
if (controller1.dev == NULL) {
printk(KERN_WARNING "Cannot find pci dev (1) through: %x:%x\n", controller1.vendor, controller1.device);
return -ENODEV;
}
controller2.dev = pci_get_device(controller2.vendor, controller2.device, controller2.dev);
if (controller2.dev == NULL) {
printk(KERN_WARNING "Cannot find pci dev (2) through: %x:%x\n", controller2.vendor, controller2.device);
pci_dev_put(controller1.dev); /* drop the reference taken above */
return -ENODEV;
}
// Enable pci device; bail out on failure instead of falling through to the success message
ret = pci_enable_device(controller1.dev);
if (ret < 0) {
printk(KERN_WARNING "pci (1) enable fail, vendor(%x):device(%x)\n", controller1.vendor, controller1.device);
pci_dev_put(controller1.dev);
pci_dev_put(controller2.dev);
return ret;
}
printk(KERN_INFO "pci (1) enable success, vendor(%x):device(%x)\n", controller1.vendor, controller1.device);
ret = pci_enable_device(controller2.dev);
if (ret < 0) {
printk(KERN_WARNING "pci (2) enable fail, vendor(%x):device(%x)\n", controller2.vendor, controller2.device);
pci_disable_device(controller1.dev);
pci_dev_put(controller1.dev);
pci_dev_put(controller2.dev);
return ret;
}
printk(KERN_INFO "pci (2) enable success, vendor(%x):device(%x)\n", controller2.vendor, controller2.device);
/* Get the I/O base address from the appropriate base address register (bar) in the configuration space */
controller1.io_base = pci_resource_start(controller1.dev, REQUIRE_BAR);
controller2.io_base = pci_resource_start(controller2.dev, REQUIRE_BAR);
/* Assign which pci the slots belong to*/
slots[1].controller = &controller2;
slots[2].controller = &controller2;
slots[3].controller = &controller1;
slots[4].controller = &controller1;
slots[1].port_number = SLOT1_PORT;
slots[2].port_number = SLOT2_PORT;
slots[3].port_number = SLOT3_PORT;
slots[4].port_number = SLOT4_PORT;
/* Get the port register base with port register offset */
for(i = 1; i < 5; i++) {
slots[i].port_base = slots[i].controller->io_base + PORT_REGISTER_OFFSET(slots[i].port_number);
}
for(i = 1; i < 5; i++) {
printk(KERN_INFO "slot%zu port register base: %lx\n", i, slots[i].port_base);
slots[i].detection_state = show_sstatus(slots[i].port_base, SSTATUS_OFFSET);
}
for(i = 1; i < 5; i++) {
printk(KERN_INFO "slot%zu detection state: %u\n", i, slots[i].detection_state);
}
/* register cdev: alloc_chrdev_region() picks the major number for us */
ret = alloc_chrdev_region(&pci_monitor_dev, 0, num_of_dev, DRIVER_NAME);
if (ret < 0) {
printk(KERN_ALERT "%s driver: alloc_chrdev_region error.\n", DRIVER_NAME);
goto err_pci;
}
pci_monitor_major = MAJOR(pci_monitor_dev);
cdev_init(&pci_monitor_cdev, &fops);
ret = cdev_add(&pci_monitor_cdev, pci_monitor_dev, num_of_dev);
if (ret < 0) {
printk(KERN_ALERT "%s driver: cdev_add error.\n", DRIVER_NAME);
unregister_chrdev_region(pci_monitor_dev, num_of_dev);
goto err_pci;
}
printk(KERN_INFO "%s driver(major: %d) installed.\n", DRIVER_NAME, pci_monitor_major);
return 0;
err_pci:
/* undo the PCI setup done earlier in this function */
pci_disable_device(controller1.dev);
pci_disable_device(controller2.dev);
pci_dev_put(controller1.dev);
pci_dev_put(controller2.dev);
return ret;
}
static void __exit pci_monitor_exit(void)
{
/* cdev: remove the cdev first, then give back the device numbers */
cdev_del(&pci_monitor_cdev);
unregister_chrdev_region(pci_monitor_dev, num_of_dev);
/* no pci_release_region() here: init never called pci_request_region() */
pci_disable_device(controller1.dev);
pci_disable_device(controller2.dev);
pci_dev_put(controller1.dev);
pci_dev_put(controller2.dev);
printk(KERN_INFO "pci uninstall\n");
}
/*
struct resource * request_region (unsigned long start, unsigned long n, const char *name)
Allocate I/O port region.
struct resource * request_mem_region (unsigned long start, unsigned long n, const char *name)
Allocate I/O memory region.
void release_region (unsigned long start, unsigned long n)
Release I/O port region.
void release_mem_region (unsigned long start, unsigned long n)
Release I/O memory region.
int release_resource (struct resource *res)
Release any resource.
int check_region (unsigned long start, unsigned long n)
Check I/O port region availability.
int check_mem_region (unsigned long start, unsigned long n)
Check I/O memory region availability.
void * ioremap (unsigned long phys_addr, unsigned long size)
Remap I/O memory into kernel address space.
void * ioremap_nocache (unsigned long phys_addr, unsigned long size)
Remap I/O memory into kernel address space (no cache).
void iounmap (void *addr)
Unmap I/O memory from kernel address space.
*/
```
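The bit-field decoding done in `show_sstatus` can also be tried outside the kernel. The sketch below splits a raw PxSSTS value into its three fields as laid out in section 3.3.10 of the AHCI spec (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The struct and function names are hypothetical, and `0x133` in the usage note is just an example value, not a reading taken from the hardware above.

```c
#include <assert.h>
#include <stdint.h>

struct sstatus_fields {
    unsigned det;  /* bits 3:0  device detection */
    unsigned spd;  /* bits 7:4  negotiated interface speed */
    unsigned ipm;  /* bits 11:8 interface power management state */
};

/* Split a raw PxSSTS register value into its fields. */
static struct sstatus_fields decode_sstatus(uint32_t value)
{
    struct sstatus_fields f;

    f.det = value & 0xFu;
    f.spd = (value >> 4) & 0xFu;
    f.ipm = (value >> 8) & 0xFu;
    return f;
}

/* Human-readable name for a DET value; other values are reserved. */
static const char *det_to_string(unsigned det)
{
    assert(det <= 0xFu);
    switch (det) {
    case 0x0: return "no device, no Phy communication";
    case 0x1: return "device present, no Phy communication";
    case 0x3: return "device present, Phy communication established";
    case 0x4: return "Phy offline";
    default:  return "reserved";
    }
}
```

For instance, `decode_sstatus(0x133)` yields DET = 3, SPD = 3 and IPM = 1, i.e. a device is present and the link is up.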
## Translating Addresses in Kernel Space
references:
1. [Translating Addresses in Kernel Space](https://www.tldp.org/LDP/khg/HyperNews/get/devices/addrxlate.html)
2. [How to allow access to memory in a kernel module? [closed]](https://unix.stackexchange.com/questions/296906/how-to-allow-access-to-memory-in-a-kernel-module)
3. [ioremap_nocache函数说明](https://blog.csdn.net/Tommy_wxie/article/details/8539451)
4. [[知乎]老狼:深入PCI与PCIe之二:软件篇](https://www.shuzhiduo.com/A/GBJryXjEz0/)
5. [How To Write Linux PCI Drivers](https://www.kernel.org/doc/html/latest/PCI/pci.html)
6. [Accessing PCI Regions](http://www.embeddedlinux.org.cn/essentiallinuxdevicedrivers/final/ch10lev1sec3.html)
7. [[经典]Linux内核中ioremap映射的透彻理解](https://blog.csdn.net/do2jiang/article/details/5450839)
8. [I/O Resource Management](https://os.inf.tu-dresden.de/l4env/doc/html/dde_linux/group__mod__res.html)
9. [OREILLY Chapter 12. PCI Drivers](https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch12.html)
10. [PCI Support Library](https://www.kernel.org/doc/html/v4.18/driver-api/pci.html#)
11. [2. The PCI Express Port Bus Driver Guide HOWTO](https://www.kernel.org/doc/html/latest/PCI/pciebus-howto.html)
12. [How to write register from linux kernel module (cpu: ARM)](https://stackoverflow.com/questions/16935041/how-to-write-register-from-linux-kernel-module-cpu-arm)
13. [The modernization of PCIe hotplug in Linux](https://lwn.net/Articles/767885/)
14. [Linux Daemon Writing HOWTO](http://www.netzmafia.de/skripten/unix/linux-daemon-howto.html)