PCI
PCI configuration space
AHCI
Peripheral Component Interconnect[2] (abbreviated PCI, also referred to as Conventional PCI[citation needed] to differentiate from its successor PCI Express) is a local computer bus for attaching hardware devices in a computer and is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus but in a standardized format that is independent of any particular processor's native bus. Devices connected to the PCI bus appear to a bus master to be connected directly to its own bus and are assigned addresses in the processor's address space.[3] It is a parallel bus, synchronous to a single bus clock.
see: Peripheral Component Interconnect
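PCI is a local computer bus and part of the PCI Local Bus Standard; it lets other hardware communicate with the CPU. The PCI bus supports the functions of a processor bus.
Devices on PCI are assigned addresses in the CPU's address space. It is a parallel bus, synchronous to a single bus clock. A "local bus" is a bus connected directly (or almost directly) to the CPU, which avoids extra bottlenecks.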
PCI configuration space is the underlying way that the Conventional PCI, PCI-X and PCI Express perform auto configuration of the cards inserted into their bus.
In the PCIe topology there are at most 256 buses, each bus supports at most 32 devices, and each device supports at most 8 functions. Bus:Device:Function (BDF) therefore forms a unique "ID number" for every function.
For legacy PCI the configuration space is 256 bytes; for PCIe it is 4096 bytes. The host reserves a 256 MB block of its physical memory address space that maps the configuration spaces of all functions. Why 256 MB? Every function gets 4 KB, so:
256 (buses) × 32 (devices) × 8 (functions) × 4 KB = 256 MB
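A quick check of the arithmetic above, written as C constants. The macro and function names are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Hypothetical constants for the PCIe topology limits described above */
#define PCI_MAX_BUSES       256u
#define PCI_MAX_DEVICES     32u    /* per bus */
#define PCI_MAX_FUNCTIONS   8u     /* per device */
#define PCIE_CFG_SPACE_SIZE 4096u  /* bytes per function (legacy PCI: 256) */

/* Size of a window that covers the configuration space of every BDF */
static uint64_t ecam_window_size(void)
{
    return (uint64_t)PCI_MAX_BUSES * PCI_MAX_DEVICES * PCI_MAX_FUNCTIONS
           * PCIE_CFG_SPACE_SIZE;
}

/* Number of distinct BDF triples: 8 + 5 + 3 bits = 2^16 */
static uint32_t bdf_count(void)
{
    return PCI_MAX_BUSES * PCI_MAX_DEVICES * PCI_MAX_FUNCTIONS;
}
```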
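With this many functions, how does the host know what each one does? Each function has a 4 KB configuration space that describes it. During power-on, the whole PCI hierarchy is enumerated and the configuration space of every BDF is read by the host.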
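A minimal sketch of enumeration in its simplest form: probe every BDF and treat a Vendor ID of FFFFh (what an absent function returns) as empty. `read_vendor_fn` and the mock are hypothetical stand-ins for a real CF8h/CFCh or ECAM read. A real enumerator would also check the multi-function bit in the Header Type register and follow each bridge's secondary bus number; this sketch skips both.

```c
#include <stdint.h>

/* Hypothetical accessor: returns the 16-bit Vendor ID (config offset 0x00)
   of bus:dev.fn. A real one would use CF8h/CFCh or ECAM. */
typedef uint16_t (*read_vendor_fn)(uint8_t bus, uint8_t dev, uint8_t fn);

/* Brute-force scan of every BDF; returns how many functions responded */
static unsigned enumerate_pci(read_vendor_fn read_vendor)
{
    unsigned found = 0;
    for (unsigned bus = 0; bus < 256; bus++)
        for (unsigned dev = 0; dev < 32; dev++)
            for (unsigned fn = 0; fn < 8; fn++)
                if (read_vendor((uint8_t)bus, (uint8_t)dev, (uint8_t)fn) != 0xFFFF)
                    found++;   /* 0xFFFF would mean "no function here" */
    return found;
}

/* Mock config space: only 00:12.0 (8086) and 01:00.0 (1b4b) exist,
   mirroring the two SATA controllers in the lspci output of these notes */
static uint16_t mock_read_vendor(uint8_t bus, uint8_t dev, uint8_t fn)
{
    if (bus == 0x00 && dev == 0x12 && fn == 0) return 0x8086;
    if (bus == 0x01 && dev == 0x00 && fn == 0) return 0x1b4b;
    return 0xFFFF;
}
```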
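x86 CPUs have instructions for memory and I/O access, but none for configuration space. A decoder is therefore needed to translate configuration requests; it used to sit in the north bridge. Modern Intel CPUs integrate the north bridge, so the CPU can do the translation itself. Concretely, there are two ways to access configuration space:
I/O method (CF8h/CFCh)
Memory method (ECAM)
I/O method: on x86, the CPU provides two I/O registers for accessing configuration space:
configuration address register (CONFIG_ADDRESS), ports CF8h-CFBh
configuration data register (CONFIG_DATA), ports CFCh-CFFh
The access code is as follows (IoRead*/IoWrite* are EDK2 IoLib-style port I/O helpers):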
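// for PCI: build the CONFIG_ADDRESS value (only bits 7:2 of Reg fit)
address = BIT31 | ((Bus & 0xFF) << 16) | ((Dev & 0x1F) << 11) | ((Fun & 0x7) << 8) | (Reg & 0xFC);
// write the address register; this must be a 32-bit access
IoWrite32(0xcf8, address);
// read the data register; the low 2 bits of Reg select the byte lane
data8 = IoRead8(0xcfc + (Reg & 3));
data16 = IoRead16(0xcfc + (Reg & 2));
data32 = IoRead32(0xcfc);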
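The code above works for both PCI and PCIe devices, but for PCIe the I/O method can only reach the first 256 bytes of configuration space.
This is because the register field has only 6 bits (bits 2-7) selecting a dword: 2^6 × 4 bytes = 256 bytes.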
Bit 31 is the enable bit. It must be set, otherwise the access has no effect.
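The CONFIG_ADDRESS layout described above, as pure functions that only compute values and never touch I/O ports. The function names are hypothetical.

```c
#include <stdint.h>

/* CONFIG_ADDRESS value for port 0xCF8:
   bit 31 = enable, 23:16 = bus, 15:11 = device, 10:8 = function,
   7:2 = dword-aligned register offset (bits 1:0 are always 0) */
static uint32_t cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    return (1u << 31)
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1F) << 11)
         | ((uint32_t)(fn & 0x07) << 8)
         | (uint32_t)(reg & 0xFC);
}

/* Data port for a byte access at offset reg: 0xCFC plus the byte lane */
static uint16_t cf8_data_port(uint16_t reg)
{
    return (uint16_t)(0xCFC + (reg & 3));
}
```

For example, register 10h of 00:12.0 gives 80009010h, and register 110h produces the same value as 10h because everything above bit 7 is masked off, which is the 256-byte limit in action.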
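Memory method (ECAM)
Accessing PCI/PCIe configuration space through memory requires the MMCFG base address, which the BIOS programs into the mmcfg_base register. Reading PCI/PCIe registers through ECAM is then not much different from ordinary memory access; in C, a volatile pointer is enough.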
// o is the 12-bit register offset (0x000-0xFFF); byte/word accesses need no alignment mask
#define pcie_addr(m, b, d, f, o) ((m) + (((b) & 0xff) << 20) + (((d) & 0x1f) << 15) + (((f) & 0x7) << 12) + ((o) & 0xfff))
#define mmio_read8(addr) (*(volatile uint8_t *)(addr))
#define mmio_write8(addr, data8) (*(volatile uint8_t *)(addr) = (data8))
#define mmio_read16(addr) (*(volatile uint16_t *)(addr))
#define mmio_write16(addr, data16) (*(volatile uint16_t *)(addr) = (data16))
#define mmio_read32(addr) (*(volatile uint32_t *)(addr))
#define mmio_write32(addr, data32) (*(volatile uint32_t *)(addr) = (data32))
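The MMCFG base address is platform specific; check the CPU/chipset spec (on Linux it also shows up as "PCI MMCONFIG" in /proc/iomem).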
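The ECAM address computation as a plain function, assuming the same bus/device/function layout as pcie_addr() above and a 12-bit offset. `ecam_addr` is a hypothetical name, and the base e0000000 in the checks is simply the `PCI MMCONFIG` range from the /proc/iomem output below; on another machine, take the base from there or from the ACPI MCFG table.

```c
#include <stdint.h>

/* Physical address of config register `off` of bus:dev.fn in an ECAM window
   at `base`: bus -> bits 27:20, device -> 19:15, function -> 14:12,
   offset -> 11:0 (so each bus takes 1 MB, each function 4 KB) */
static uint64_t ecam_addr(uint64_t base, uint8_t bus, uint8_t dev,
                          uint8_t fn, uint16_t off)
{
    return base
         + ((uint64_t)bus << 20)
         + ((uint64_t)(dev & 0x1F) << 15)
         + ((uint64_t)(fn & 0x07) << 12)
         + (uint64_t)(off & 0xFFF);
}
```

With that base, 00:12.0 lands at e0090000, and the last byte of bus 3f is e3ffffff, exactly where the `PCI MMCONFIG 0000 [bus 00-3f]` range ends (64 buses × 1 MB).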
see: PCI configuration space
see: 浅析PCI配置空间
see: PCI/PCIe 的那些事 (2)- 配置空间 (Configuration Space)
see: 硬件设备识别扫盲篇
This specification defines the functional behavior and software interface of the Advanced Host Controller Interface, which is a hardware mechanism that allows software to communicate with Serial ATA devices.
AHCI is a PCI class device that acts as a data movement engine between system memory and Serial ATA devices.
AHCI host devices (referred to as host bus adapters, or HBA) support from 1 to 32 ports.
An HBA must support ATA and ATAPI devices, and must support both the PIO and DMA protocols.
An HBA may optionally support a command list on each port for overhead reduction, and to support Serial ATA Native Command Queuing via the FPDMA Queued Command protocol for each device of up to 32 entries.
An HBA may optionally support 64-bit addressing.
AHCI describes a system memory structure which contains a generic area for control and status, and a table of entries describing a command list (an HBA which does not support a command list shall have a depth of one for this table).
Each command list entry contains information necessary to program an SATA device, and a pointer to a descriptor table for transferring data between system memory and the device.
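To make the command list concrete: each entry is a 32-byte command header, and a port's command list holds up to 32 of them. The field layout below follows my reading of AHCI 1.3 section 4.2.2; the struct and helper names are hypothetical.

```c
#include <stdint.h>

/* One command list entry ("command header"), 32 bytes */
struct ahci_cmd_header {
    uint32_t dw0;         /* CFL[4:0] A[5] W[6] P[7] R[8] B[9] C[10] PMP[15:12] PRDTL[31:16] */
    uint32_t prdbc;       /* PRD byte count, updated by the HBA */
    uint32_t ctba;        /* command table base address, 128-byte aligned */
    uint32_t ctbau;       /* upper 32 bits of CTBA (64-bit capable HBAs only) */
    uint32_t reserved[4];
};

_Static_assert(sizeof(struct ahci_cmd_header) == 32, "command header is 32 bytes");

/* Build DW0: cfl_dwords = command FIS length in dwords (2..16),
   write = 1 when data goes host -> device, prdtl = number of PRD entries */
static uint32_t ahci_cmd_dw0(unsigned cfl_dwords, int write, uint16_t prdtl)
{
    return (cfl_dwords & 0x1F)
         | (write ? (1u << 6) : 0u)
         | ((uint32_t)prdtl << 16);
}
```

A Register H2D FIS is 20 bytes, i.e. a CFL of 5 dwords, so a read command with one PRD entry would carry a DW0 of 0x00010005.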
see: spec
/proc/iomem
shows how the physical address space is laid out: which ranges are System RAM and which are claimed by devices, buses and firmware
$> cat /proc/iomem
...
e0000000-e3ffffff : PCI MMCONFIG 0000 [bus 00-3f]
e0000000-e3ffffff : reserved
fea00000-feafffff : PCI Bus 0000:00
fea00000-feafffff : pnp 00:01
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed00000-fed003ff : PCI Bus 0000:00
fed01000-fed01fff : reserved
fed01000-fed01fff : PCI Bus 0000:00
fed01000-fed01fff : pnp 00:01
fed03000-fed03fff : PCI Bus 0000:00
fed03000-fed03fff : pnp 00:01
fed06000-fed06fff : PCI Bus 0000:00
fed06000-fed06fff : pnp 00:01
fed08000-fed09fff : PCI Bus 0000:00
fed08000-fed09fff : pnp 00:01
fed1c000-fed1cfff : PCI Bus 0000:00
fed1c000-fed1cfff : pnp 00:01
fed40000-fed44fff : MSFT0101:00
fed64000-fed64fff : dmar0
fed65000-fed65fff : dmar1
fed80000-fedbffff : PCI Bus 0000:00
fed80000-fedbffff : pnp 00:01
fee00000-feefffff : PCI Bus 0000:00
fee00000-feefffff : pnp 00:01
fee00000-fee00fff : Local APIC
ff800000-ffffffff : reserved
100000000-17fffffff : System RAM
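Suppose we have a 32-bit system. Traditionally the Linux kernel uses a 3G/1G split,
so 0x00000000-0xbfffffff is user memory and 0xc0000000-0xffffffff is kernel memory.
The kernel can therefore directly map at most 1 GiB of physical memory, and in practice less (~896 MiB), because the top 128 MiB of kernel space is reserved for vmalloc/ioremap mappings.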
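The 896 MiB figure as arithmetic, assuming the default x86-32 layout: PAGE_OFFSET at C0000000h and a 128 MiB vmalloc reserve (tunable with the vmalloc= boot parameter). The macro and function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_OFFSET_3G  0xC0000000ull   /* start of kernel space with a 3G/1G split */
#define ADDR_SPACE_TOP  0x100000000ull  /* 4 GiB, end of the 32-bit space */
#define VMALLOC_RESERVE (128ull << 20)  /* assumed default reserve */

/* Size of the kernel's virtual address range */
static uint64_t kernel_va_size(void)
{
    return ADDR_SPACE_TOP - PAGE_OFFSET_3G;
}

/* Physical memory the kernel can map directly ("lowmem") */
static uint64_t lowmem_size(void)
{
    return kernel_va_size() - VMALLOC_RESERVE;
}
```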
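The ranges shown here are entries in the kernel's resource tree, for example regions claimed with request_mem_region(), which a driver uses to tell everyone "I am using this memory range".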
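A minimal parser for one /proc/iomem line (`start-end : name`, hex addresses, nesting shown by leading spaces), sketched with sscanf. `parse_iomem_line` is a hypothetical helper. On recent kernels unprivileged readers see all-zero addresses, so the file has to be read as root to get useful values.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Parse "start-end : name"; leading spaces are skipped. Returns 1 on success. */
static int parse_iomem_line(const char *line, uint64_t *start, uint64_t *end,
                            char *name, size_t name_len)
{
    unsigned long long s, e;
    int consumed = 0;

    /* %n records how many characters were used up to the name */
    if (sscanf(line, " %llx-%llx : %n", &s, &e, &consumed) != 2 || consumed == 0)
        return 0;
    *start = s;
    *end = e;
    snprintf(name, name_len, "%s", line + consumed);
    name[strcspn(name, "\n")] = '\0';   /* drop a trailing newline, if any */
    return 1;
}
```

Applied to the `PCI MMCONFIG` line above, it yields e0000000-e3ffffff, i.e. a 64 MB window.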
lspci
$> lspci -v
...
00:12.0 Class 0106: Device 8086:31e3 (rev 06) (prog-if 01)
Subsystem: Device 7270:8086
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 341
Memory at a1310000 (32-bit, non-prefetchable) [size=8K]
Memory at a131b000 (32-bit, non-prefetchable) [size=256]
I/O ports at 4080 [size=8]
I/O ports at 4088 [size=4]
I/O ports at 4060 [size=32]
Memory at a1319000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Kernel driver in use: ahci
...
01:00.0 Class 0106: Device 1b4b:9235 (rev 11) (prog-if 01)
Subsystem: Device 1b4b:9235
Flags: bus master, fast devsel, latency 0, IRQ 342
I/O ports at 3028 [size=8]
I/O ports at 3034 [size=4]
I/O ports at 3020 [size=8]
I/O ports at 3030 [size=4]
I/O ports at 3000 [size=32]
Memory at a1200000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at a1210000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [e0] SATA HBA v0.0
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci
...
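Both controllers above report Class 0106 with prog-if 01, which is how an AHCI controller can be recognized without knowing its vendor:device ID. A small sketch that parses an lspci slot name and applies that class check; the helper names are hypothetical, and parse_bdf assumes the short bus:dev.fn form without a domain prefix.

```c
#include <stdio.h>

/* Parse an lspci slot name "bus:dev.fn" (hex), e.g. "00:12.0".
   Returns 1 only if all three fields were read and are in range. */
static int parse_bdf(const char *s, unsigned *bus, unsigned *dev, unsigned *fn)
{
    if (sscanf(s, "%2x:%2x.%1x", bus, dev, fn) != 3)
        return 0;
    return *bus <= 0xFF && *dev <= 0x1F && *fn <= 0x7;
}

/* Class 0106 = mass storage (01) / SATA controller (06); prog-if 01 = AHCI 1.0 */
static int is_ahci_controller(unsigned class_code, unsigned prog_if)
{
    return class_code == 0x0106 && prog_if == 0x01;
}
```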
What does +-1b.0-[02-3a]- (from the lspci -t tree view) mean?
1b.0 is a slot and function number of the PCIe root hub. In this case, it contains a PCIe bridge. The busses behind this bridge would be numbered 02 to 3a, even though there are currently no devices attached to them.
In a similar way, your GPU is behind the bridge 01.0, and your LAN controller behind the bridge 1d.0, which may be an internal bridge.
vendor id:device id
Searching by vendor and device ID:
struct pci_dev *dev = NULL;
while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
configure_device(dev);
Searching by class ID (iterate in a similar way):
pci_get_class(CLASS_ID, dev)
Searching by both vendor/device and subsystem vendor/device ID:
pci_get_subsys(VENDOR_ID,DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev).
Use lspci to look at the PCI buses and PCI devices. Since we want to control SATA disks here, we look for the devices using the ahci driver (AHCI is an interface for communicating with SATA devices).
Taking 1b.0-[02-3a] as an example: 1b.0 is the slot (device) and function number. This hub contains a bridge, and [02-3a] is the range of buses behind it.
We can see that the base addresses of the two AHCI controllers (their BAR5 regions) are a1319000 and a1200000.
Adding an offset to this base address gives us the port registers.
#define REQUIRE_BAR 5
io_base = pci_resource_start(dev, REQUIRE_BAR);
/*
also see:
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
bus and slot and number. If the device is
found, its reference count is increased.
pci_set_power_state() Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability() Find specified capability in device's capability
list.
pci_resource_start() Returns bus start address for a given PCI region
pci_resource_end() Returns bus end address for a given PCI region
pci_resource_len() Returns the byte length of a PCI region
pci_set_drvdata() Set private driver data pointer for a pci_dev
pci_get_drvdata() Return private driver data pointer for a pci_dev
pci_set_mwi() Enable Memory-Write-Invalidate transactions.
pci_clear_mwi() Disable Memory-Write-Invalidate transactions.
*/
In include/linux/pci.h:
#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)
#define pci_resource_end(dev, bar) ((dev)->resource[(bar)].end)
#define pci_resource_flags(dev, bar) ((dev)->resource[(bar)].flags)
#define pci_resource_len(dev,bar) \
((pci_resource_start((dev), (bar)) == 0 && \
pci_resource_end((dev), (bar)) == \
pci_resource_start((dev), (bar))) ? 0 : \
\
(pci_resource_end((dev), (bar)) - \
pci_resource_start((dev), (bar)) + 1))
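The same length rule as pci_resource_len(), replayed in userspace on a stand-in struct so it can be checked. `fake_resource` is hypothetical; the kernel's struct resource has more fields. For the controller at 00:12.0, a 2K BAR5 starting at a1319000 ends at a13197ff, which gives back the size lspci reports.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct resource (end is inclusive) */
struct fake_resource {
    uint64_t start;
    uint64_t end;
};

/* An unset BAR (start == 0 and end == start) has length 0,
   otherwise the length is end - start + 1 */
static uint64_t resource_len(const struct fake_resource *r)
{
    if (r->start == 0 && r->end == r->start)
        return 0;
    return r->end - r->start + 1;
}
```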
Next we need the port registers in order to read SStatus. The relevant parts of the AHCI spec are:
2.1 PCI Header: the AHCI base address (ABAR) is in BAR5.
3 HBA Memory Registers: the port registers start at offset 100h, so the register set of port 0 is at 100h.
3.3 Port Registers (one set per port): the register set of port x starts at offset 100h + (x × 80h).
3.3.10 Offset 28h: PxSSTS – Port x Serial ATA Status (SCR0: SStatus): its DET field (bits 3:0) reports device detection.
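The offsets above combined into one address computation. The macro and function names are hypothetical; the ABAR values in the checks are the two BAR5 addresses lspci showed.

```c
#include <stdint.h>

#define AHCI_PORT_BASE   0x100u  /* port 0 register set */
#define AHCI_PORT_STRIDE 0x80u   /* size of one port register set */
#define AHCI_PXSSTS      0x28u   /* PxSSTS (SStatus) within a port set */

/* Physical address of PxSSTS for `port`, given the ABAR (BAR5) address */
static uint64_t ahci_sstatus_addr(uint64_t abar, unsigned port)
{
    return abar + AHCI_PORT_BASE + (uint64_t)port * AHCI_PORT_STRIDE + AHCI_PXSSTS;
}
```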
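Linux also provides the convenient pci.h, so my approach is:
pci_get_device(): use vendor ID + device ID to get the pci_dev (trying each controller to find which PCI device the disk sits behind).
pci_enable_device(): enable the device.
pci_resource_start(): get the AHCI base address; it matches the address shown by lspci.
Add the port register offset (the section 3.3 formula; trying each port to find which one the disk is on) and then 28h to that base address.
ioremap() the result, read the value, and work out which state it represents.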
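show_sstatus() in the module below looks only at DET, but PxSSTS also carries the negotiated speed (SPD, bits 7:4; 1 = Gen1, 2 = Gen2, 3 = Gen3) and the interface power state (IPM, bits 11:8; 1 = active). A userspace sketch of the decoding with hypothetical names; the value 0x123 used in the checks is an example, not a reading from this machine.

```c
#include <stdint.h>

/* Fields of PxSSTS (SStatus) */
struct sstatus {
    unsigned det;  /* bits 3:0  device detection */
    unsigned spd;  /* bits 7:4  negotiated interface speed */
    unsigned ipm;  /* bits 11:8 interface power management state */
};

static struct sstatus decode_sstatus(uint32_t v)
{
    struct sstatus s = { v & 0xF, (v >> 4) & 0xF, (v >> 8) & 0xF };
    return s;
}

/* DET == 3: device presence detected and Phy communication established */
static int sstatus_link_up(uint32_t v)
{
    return (v & 0xF) == 0x3;
}
```

For example, 0x123 decodes to DET = 3 (device present, Phy established), SPD = 2 (3 Gbps) and IPM = 1 (active).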
So the code looks like this:
#include <linux/module.h>
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/io.h>
#include <linux/fs.h>
#include <linux/cdev.h>

#define DRIVER_NAME "pci_monitor"
#define CONTROLLER1_VENDOR_ID 0x8086
#define CONTROLLER1_DEVICE_ID 0x31e3
#define CONTROLLER2_VENDOR_ID 0x1b4b
#define CONTROLLER2_DEVICE_ID 0x9235
#define REQUIRE_BAR 5 // ABAR lives in BAR5 (AHCI spec 2.1)
#define SSTATUS_OFFSET 0x28 // PxSSTS offset inside a port register set
#define PORT_REGISTER_OFFSET(port_number) (0x100 + (port_number) * 0x80)
#define SLOT1_PORT 1 // belongs to controller 2
#define SLOT2_PORT 2 // belongs to controller 2
#define SLOT3_PORT 0 // belongs to controller 1
#define SLOT4_PORT 1 // belongs to controller 1
#define DET_NDNP 0x0
#define DET_PDNP 0x1
#define DET_PDEP 0x3
#define DET_OFFLINE 0x4

struct controller {
	unsigned short vendor;
	unsigned short device;
	struct pci_dev *dev;
	resource_size_t io_base; // ABAR, physical address
};

struct slot {
	struct controller *controller;
	unsigned int port_number;
	resource_size_t port_base;
	int detection_state;
};

static struct controller controller1 = { .vendor = CONTROLLER1_VENDOR_ID, .device = CONTROLLER1_DEVICE_ID };
static struct controller controller2 = { .vendor = CONTROLLER2_VENDOR_ID, .device = CONTROLLER2_DEVICE_ID };
static struct slot slots[5]; // slots[1]..slots[4] are used, index 0 is unused
static dev_t pci_monitor_dev;
static int pci_monitor_major;
static struct cdev pci_monitor_cdev;
static const unsigned int num_of_dev = 1;
static const struct file_operations fops = { .owner = THIS_MODULE };

static int show_sstatus(resource_size_t port_register_base, unsigned long offset)
{
	void __iomem *reg;
	u32 register_data;
	int det;

	reg = ioremap(port_register_base + offset, 4);
	if (!reg)
		return -ENOMEM;
	register_data = readl(reg); // MMIO must be read with readl(), not a plain pointer dereference
	iounmap(reg);
	det = register_data & 0xf; // bits 3:0: device detection (DET)
	printk(KERN_INFO "sstatus: %x\n", register_data);
	switch (det) {
	case DET_NDNP:
		printk(KERN_INFO "No device detected and Phy communication not established\n");
		break;
	case DET_PDNP:
		printk(KERN_INFO "Device presence detected but Phy communication not established\n");
		break;
	case DET_PDEP:
		printk(KERN_INFO "Device presence detected and Phy communication established\n");
		break;
	case DET_OFFLINE:
		printk(KERN_INFO "Phy in offline mode as a result of the interface being disabled or running in a BIST loopback mode\n");
		break;
	default:
		printk(KERN_INFO "Unrecognized device detection\n");
		break;
	}
	return det;
}

static int __init pci_monitor_init(void)
{
	int ret;
	size_t i;

	/* search for pci device through vendor id, device id */
	controller1.dev = pci_get_device(controller1.vendor, controller1.device, NULL);
	if (!controller1.dev) {
		printk(KERN_WARNING "Cannot find pci dev (1): %x:%x\n", controller1.vendor, controller1.device);
		return -ENODEV;
	}
	controller2.dev = pci_get_device(controller2.vendor, controller2.device, NULL);
	if (!controller2.dev) {
		printk(KERN_WARNING "Cannot find pci dev (2): %x:%x\n", controller2.vendor, controller2.device);
		ret = -ENODEV;
		goto put_devs;
	}

	// Enable pci device (the ahci driver already did; this takes one more enable reference)
	ret = pci_enable_device(controller1.dev);
	if (ret < 0) {
		printk(KERN_WARNING "pci (1) enable fail, vendor(%x):device(%x)\n", controller1.vendor, controller1.device);
		goto put_devs;
	}
	printk(KERN_INFO "pci (1) enable success, vendor(%x):device(%x)\n", controller1.vendor, controller1.device);
	ret = pci_enable_device(controller2.dev);
	if (ret < 0) {
		printk(KERN_WARNING "pci (2) enable fail, vendor(%x):device(%x)\n", controller2.vendor, controller2.device);
		goto disable1;
	}
	printk(KERN_INFO "pci (2) enable success, vendor(%x):device(%x)\n", controller2.vendor, controller2.device);

	/* Get the ABAR from the appropriate base address register (bar) in the configuration space */
	controller1.io_base = pci_resource_start(controller1.dev, REQUIRE_BAR);
	controller2.io_base = pci_resource_start(controller2.dev, REQUIRE_BAR);

	/* Assign which controller each slot belongs to */
	slots[1].controller = &controller2;
	slots[2].controller = &controller2;
	slots[3].controller = &controller1;
	slots[4].controller = &controller1;
	slots[1].port_number = SLOT1_PORT;
	slots[2].port_number = SLOT2_PORT;
	slots[3].port_number = SLOT3_PORT;
	slots[4].port_number = SLOT4_PORT;

	/* Port register base = ABAR + port register offset */
	for (i = 1; i < 5; i++)
		slots[i].port_base = slots[i].controller->io_base + PORT_REGISTER_OFFSET(slots[i].port_number);
	for (i = 1; i < 5; i++) {
		printk(KERN_INFO "slot%zu port register base: %llx\n", i, (unsigned long long)slots[i].port_base);
		slots[i].detection_state = show_sstatus(slots[i].port_base, SSTATUS_OFFSET);
	}
	for (i = 1; i < 5; i++)
		printk(KERN_INFO "slot%zu detection state: %d\n", i, slots[i].detection_state);

	/* register cdev */
	ret = alloc_chrdev_region(&pci_monitor_dev, 0, num_of_dev, DRIVER_NAME);
	if (ret < 0) {
		printk(KERN_ALERT "%s driver: alloc_chrdev_region error.\n", DRIVER_NAME);
		goto disable2;
	}
	pci_monitor_major = MAJOR(pci_monitor_dev);
	cdev_init(&pci_monitor_cdev, &fops);
	ret = cdev_add(&pci_monitor_cdev, pci_monitor_dev, num_of_dev);
	if (ret < 0) {
		printk(KERN_ALERT "%s driver: cdev_add error.\n", DRIVER_NAME);
		goto unregister;
	}
	printk(KERN_INFO "%s driver(major: %d) installed.\n", DRIVER_NAME, pci_monitor_major);
	return 0;

unregister:
	unregister_chrdev_region(pci_monitor_dev, num_of_dev);
disable2:
	pci_disable_device(controller2.dev);
disable1:
	pci_disable_device(controller1.dev);
put_devs:
	pci_dev_put(controller2.dev); // pci_dev_put(NULL) is a no-op
	pci_dev_put(controller1.dev);
	return ret;
}

static void __exit pci_monitor_exit(void)
{
	/* cdev first, then the device number region it uses */
	cdev_del(&pci_monitor_cdev);
	unregister_chrdev_region(pci_monitor_dev, num_of_dev);
	pci_disable_device(controller1.dev);
	pci_disable_device(controller2.dev);
	/* no pci_release_region(): this module never called pci_request_region(),
	   and the BAR is owned by the ahci driver */
	pci_dev_put(controller1.dev);
	pci_dev_put(controller2.dev);
	printk(KERN_INFO "pci uninstall\n");
}

module_init(pci_monitor_init);
module_exit(pci_monitor_exit);
MODULE_LICENSE("GPL");
/*
struct resource * request_region (unsigned long start, unsigned long n, const char *name)
Allocate I/O port region.
struct resource * request_mem_region (unsigned long start, unsigned long n, const char *name)
Allocate I/O memory region.
void release_region (unsigned long start, unsigned long n)
Release I/O port region.
void release_mem_region (unsigned long start, unsigned long n)
Release I/O memory region.
int release_resource (struct resource *res)
Release any resource.
int check_region (unsigned long start, unsigned long n)
Check I/O port region availability. (Obsolete; use request_region() instead.)
int check_mem_region (unsigned long start, unsigned long n)
Check I/O memory region availability. (Obsolete; use request_mem_region() instead.)
void * ioremap (unsigned long phys_addr, unsigned long size)
Remap I/O memory into kernel address space.
void * ioremap_nocache (unsigned long phys_addr, unsigned long size)
Remap I/O memory into kernel address space (no cache). (Removed in Linux 5.6; plain ioremap() already maps uncached.)
void iounmap (void *addr)
Unmap I/O memory from kernel address space.
*/