
                          The Linux IPMI Driver
                          ---------------------
                              Corey Minyard
                          <minyard@mvista.com>
                            <minyard@acm.org>

The Intelligent Platform Management Interface, or IPMI, is a
standard for controlling intelligent devices that monitor a system.
It provides for dynamic discovery of sensors in the system and the
ability to monitor the sensors and be informed when the sensors'
values change or go outside certain boundaries. It also has a
standardized database for field-replaceable units (FRUs) and a watchdog
timer.

To use this, you need an interface to an IPMI controller in your
system (called a Baseboard Management Controller, or BMC) and
management software that can use the IPMI system.

This document describes how to use the IPMI driver for Linux. If you
are not familiar with IPMI itself, see the web site at
http://www.intel.com/design/servers/ipmi/index.htm. IPMI is a big
subject and I can't cover it all here!

Configuration
-------------

The Linux IPMI driver is modular, which means you have to pick several
things to have it work right depending on your hardware. Most of
these are available in the 'Character Devices' menu, then the IPMI
menu.

No matter what, you must pick 'IPMI top-level message handler' to use
IPMI. What you do beyond that depends on your needs and hardware.

The message handler does not provide any user-level interfaces.
Kernel code (like the watchdog) can still use it. If you need access
from userland through a device driver, select 'Device interface for
IPMI'.

The driver interface depends on your hardware. If your system
properly provides the SMBIOS info for IPMI, the driver will detect it
and just work. If you have a board with a standard interface (these
will generally be either "KCS", "SMIC", or "BT"; consult your hardware
manual), choose the 'IPMI SI handler' option.
A driver also exists
for direct I2C access to the IPMI management controller. Some boards
support this, but it is unknown if it will work on every board. For
this, choose 'IPMI SMBus handler', but be ready to do some
figuring to see if it will work on your system if the SMBIOS/ACPI
information is wrong or not present. It is fairly safe to have both
of these enabled and let the drivers auto-detect what is present.

You should generally enable ACPI on your system, as systems with IPMI
can have ACPI tables describing them.

If you have a standard interface and the board manufacturer has done
their job correctly, the IPMI controller should be automatically
detected (via ACPI or SMBIOS tables) and should just work. Sadly,
many boards do not have this information. The driver attempts
standard defaults, but they may not work. If you fall into this
situation, you need to read the section below named 'The SI Driver' or
'The SMBus Driver' on how to hand-configure your system.

IPMI defines a standard watchdog timer. You can enable this with the
'IPMI Watchdog Timer' config option. If you compile the driver into
the kernel, then via a kernel command-line option you can have the
watchdog timer start as soon as it initializes. It also has a lot
of other options; see the 'Watchdog' section below for more details.
Note that you can also have the watchdog continue to run if it is
closed (by default it is disabled on close). Go into the 'Watchdog
Cards' menu, enable 'Watchdog Timer Support', and enable the option
'Disable watchdog shutdown on close'.

IPMI systems can often be powered off using IPMI commands. Select
'IPMI Poweroff' to do this. The driver will auto-detect if the system
can be powered off by IPMI. It is safe to enable this even if your
system doesn't support this option. This works on ATCA systems, the
Radisys CPI1 card, and any IPMI system that supports standard chassis
management commands.

If you want the driver to put an event into the event log on a panic,
enable the 'Generate a panic event to all BMCs on a panic' option. If
you want the whole panic string put into the event log using OEM
events, enable the 'Generate OEM events containing the panic string'
option.

Basic Design
------------

The Linux IPMI driver is designed to be very modular and flexible; you
only need to take the pieces you need and you can use it in many
different ways. Because of that, it's broken into many chunks of
code. These chunks (by module name) are:

ipmi_msghandler - This is the central piece of software for the IPMI
system. It handles all messages, message timing, and responses. The
IPMI users tie into this, and the IPMI physical interfaces (called
System Management Interfaces, or SMIs) also tie in here. This
provides the kernelland interface for IPMI, but does not provide an
interface for use by application processes.

ipmi_devintf - This provides a userland IOCTL interface for the IPMI
driver. Each open file for this device ties in to the message handler
as an IPMI user.

ipmi_si - A driver for various system interfaces. This supports KCS,
SMIC, and BT interfaces. Unless you have an SMBus interface or your
own custom interface, you probably need to use this.

ipmi_smb - A driver for accessing BMCs on the SMBus. It uses the
I2C kernel driver's SMBus interfaces to send and receive IPMI messages
over the SMBus.

ipmi_watchdog - IPMI requires systems to have a very capable watchdog
timer. This driver implements the standard Linux watchdog timer
interface on top of the IPMI message handler.

ipmi_poweroff - Some systems support the ability to be turned off via
IPMI commands.

These are all individually selectable via configuration options.

Note that the KCS-only interface has been removed.
The af_ipmi driver
is no longer supported and has been removed because it was impossible
to do 32-bit emulation on 64-bit kernels with it.

Much documentation for the interface is in the include files. The
IPMI include files are:

net/af_ipmi.h - Contains the socket interface.

linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI.

linux/ipmi_smi.h - Contains the interface for system management interfaces
(things that interface to IPMI controllers) to use.

linux/ipmi_msgdefs.h - General definitions for base IPMI messaging.


Addressing
----------

IPMI addressing works much like IP addressing: you have an overlay
to handle the different address types. The overlay is:

  struct ipmi_addr
  {
        int   addr_type;
        short channel;
        char  data[IPMI_MAX_ADDR_SIZE];
  };

The addr_type determines what the address really is. The driver
currently understands two different types of addresses.

"System Interface" addresses are defined as:

  struct ipmi_system_interface_addr
  {
        int   addr_type;
        short channel;
  };

and the type is IPMI_SYSTEM_INTERFACE_ADDR_TYPE. This is used for talking
straight to the BMC on the current card. The channel must be
IPMI_BMC_CHANNEL.

Messages that are destined to go out on the IPMB bus use the
IPMI_IPMB_ADDR_TYPE address type. The format is:

  struct ipmi_ipmb_addr
  {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
  };

The "channel" here is generally zero, but some devices support more
than one channel. It corresponds to the channel as defined in the IPMI
spec.
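
As a rough illustration of the overlay described above, the following
userland sketch fills a generic address with an IPMB address and reads
the slave address back. The ex_ structures are local copies of the
ones shown above so that the sketch builds without kernel headers. The
EX_ constants are placeholder values chosen for this example (they are
not taken from linux/ipmi.h), and the function names are hypothetical.

```c
#include <string.h>

/* Placeholder values for illustration only; NOT copied from linux/ipmi.h. */
#define EX_MAX_ADDR_SIZE   32
#define EX_IPMB_ADDR_TYPE  1

/* Local copy of the generic overlay shown above. */
struct ex_ipmi_addr {
        int   addr_type;
        short channel;
        char  data[EX_MAX_ADDR_SIZE];
};

/* Local copy of the IPMB address layout shown above. */
struct ex_ipmi_ipmb_addr {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
};

/* Build an IPMB address and copy it into the generic overlay. */
static void ex_fill_ipmb_addr(struct ex_ipmi_addr *generic, short channel,
                              unsigned char slave_addr, unsigned char lun)
{
        struct ex_ipmi_ipmb_addr ipmb;

        memset(&ipmb, 0, sizeof(ipmb));
        ipmb.addr_type  = EX_IPMB_ADDR_TYPE;
        ipmb.channel    = channel;
        ipmb.slave_addr = slave_addr;
        ipmb.lun        = lun;

        /* The specific type is smaller than the overlay, so it always fits. */
        memset(generic, 0, sizeof(*generic));
        memcpy(generic, &ipmb, sizeof(ipmb));
}

/* Recover the slave address from an overlay that holds an IPMB address. */
static unsigned char ex_ipmb_slave_addr(const struct ex_ipmi_addr *generic)
{
        struct ex_ipmi_ipmb_addr ipmb;

        memcpy(&ipmb, generic, sizeof(ipmb));
        return ipmb.slave_addr;
}
```

Real code would include linux/ipmi.h and use the structures and
constants defined there instead of these copies. The point is only
that addr_type and channel sit at the same place in every address
type, so the receiver can look at addr_type first and then interpret
the rest.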


Messages
--------

Messages are defined as:

  struct ipmi_msg
  {
        unsigned char netfn;
        unsigned char lun;
        unsigned char cmd;
        unsigned char *data;
        int           data_len;
  };

The driver takes care of adding/stripping the header information. The
data portion is just the data to be sent (do NOT put addressing info
here) or the response. Note that the completion code of a response is
the first item in "data"; it is not stripped out because that is how
all the messages are defined in the spec (and thus makes counting the
offsets a little easier :-).

When using the IOCTL interface from userland, you must provide a block
of data for "data", fill it, and set data_len to the length of the
block of data, even when receiving messages. Otherwise the driver
will have no place to put the message.

Messages coming up from the message handler in kernelland will come in
as:

  struct ipmi_recv_msg
  {
        struct list_head link;

        /* The type of message as defined in the "Receive Types"
           defines above. */
        int         recv_type;

        ipmi_user_t      *user;
        struct ipmi_addr addr;
        long             msgid;
        struct ipmi_msg  msg;

        /* Call this when done with the message. It will presumably free
           the message and do any other necessary cleanup. */
        void (*done)(struct ipmi_recv_msg *msg);

        /* Place-holder for the data, don't make any assumptions about
           the size or existence of this, since it may change. */
        unsigned char   msg_data[IPMI_MAX_MSG_LENGTH];
  };

You should look at the receive type and handle the message
appropriately.


The Upper Layer Interface (Message Handler)
-------------------------------------------

The upper layer of the interface provides the users with a consistent
view of the IPMI interfaces.
It allows multiple SMI interfaces to be
addressed (because some boards actually have multiple BMCs on them)
and the user should not have to care what type of SMI is below them.


Creating the User

To use the message handler, you must first create a user using
ipmi_create_user(). The interface number specifies which SMI you want
to connect to, and you must supply callback functions to be called
when data comes in. The callback function can run at interrupt level,
so be careful using the callbacks. This also allows you to pass in a
piece of data, the handler_data, that will be passed back to you on
all calls.

Once you are done, call ipmi_destroy_user() to get rid of the user.

From userland, opening the device automatically creates a user, and
closing the device automatically destroys the user.


Messaging

To send a message from kernel-land, the ipmi_request() call does
pretty much all message handling. Most of the parameters are
self-explanatory. However, it takes a "msgid" parameter. This is NOT
the sequence number of messages. It is simply a long value that is
passed back when the response for the message is returned. You may
use it for anything you like.

Responses come back in the function pointed to by the ipmi_recv_hndl
field of the "handler" that you passed in to ipmi_create_user().
Remember again, these may be running at interrupt level. Remember to
look at the receive type, too.

From userland, you fill out an ipmi_req_t structure and use the
IPMICTL_SEND_COMMAND ioctl. For incoming stuff, you can use select()
or poll() to wait for messages to come in. However, you cannot use
read() to get them; you must call the IPMICTL_RECEIVE_MSG ioctl with the
ipmi_recv_t structure to actually get the message.
Remember that you
must supply a pointer to a block of data in the msg.data field, and
you must fill in the msg.data_len field with the size of the data.
This gives the receiver a place to actually put the message.

If the message cannot fit into the data you provide, you will get an
EMSGSIZE error and the driver will leave the data in the receive
queue. If you want to get it and have it truncate the message, use
the IPMICTL_RECEIVE_MSG_TRUNC ioctl.

When you send a command (which is defined by the lowest-order bit of
the netfn per the IPMI spec) on the IPMB bus, the driver will
automatically assign the sequence number to the command and save the
command. If the response is not received in the IPMI-specified 5
seconds, it will generate a response automatically saying the command
timed out. If an unsolicited response comes in (if it was after 5
seconds, for instance), that response will be ignored.

In kernelland, after you receive a message and are done with it, you
MUST call ipmi_free_recv_msg() on it, or you will leak messages. Note
that you should NEVER mess with the "done" field of a message; that is
required to properly clean up the message.

Note that when sending, there is an ipmi_request_supply_msgs() call
that lets you supply the smi and receive message. This is useful for
pieces of code that need to work even if the system is out of buffers
(the watchdog timer uses this, for instance). You supply your own
buffer and own free routines. This is not recommended for normal use,
though, since it is tricky to manage your own buffers.


Events and Incoming Commands

The driver takes care of polling for IPMI events and receiving
commands (commands are messages that are not responses; they are
commands that other things on the IPMB bus have sent you). To receive
these, you must register for them; they will not automatically be sent
to you.
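
The command/response distinction used above is carried in the netfn:
per the IPMI spec, requests use an even netfn and the matching
response uses the next odd value. A minimal sketch of that check,
together with picking the completion code out of a response's "data",
might look like the following. The function names are hypothetical
and are not part of the driver.

```c
/* A response has the lowest-order bit of the netfn set. */
static int ex_netfn_is_response(unsigned char netfn)
{
        return (netfn & 1) != 0;
}

/* The response netfn for a given request netfn is the request value
 * with the lowest-order bit set. */
static unsigned char ex_response_netfn(unsigned char request_netfn)
{
        return (unsigned char)(request_netfn | 1);
}

/* The completion code is the first byte of a response's data.
 * Returns -1 if the response carries no data at all. */
static int ex_completion_code(const unsigned char *data, int data_len)
{
        if (data == 0 || data_len < 1)
                return -1;
        return data[0];
}
```

For instance, a request sent with netfn 0x06 comes back with netfn
0x07, and a completion code of 0x00 in data[0] means the command
completed normally.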

To receive events, you must call ipmi_set_gets_events() and set the
"val" to non-zero. Any events that have been received by the driver
since startup will immediately be delivered to the first user that
registers for events. After that, if multiple users are registered
for events, they will all receive all events that come in.

For receiving commands, you have to individually register commands you
want to receive. Call ipmi_register_for_cmd() and supply the netfn
and command name for each command you want to receive. Only one user
may be registered for each netfn/cmd, but different users may register
for different commands.

From userland, equivalent IOCTLs are provided to do these functions.


The Lower Layer (SMI) Interface
-------------------------------

As mentioned before, multiple SMI interfaces may be registered to the
message handler. Each of these is assigned an interface number when
it registers with the message handler. They are generally assigned
in the order they register, although if an SMI unregisters and then
another one registers, all bets are off.

The ipmi_smi.h file defines the interface for management interfaces; see
that for more details.


The SI Driver
-------------

The SI driver allows up to 4 KCS or SMIC interfaces to be configured
in the system. By default, the driver scans the ACPI tables for
interfaces, and if it doesn't find any it will attempt to register one
KCS interface at the spec-specified I/O port 0xca2 without interrupts.
You can change this at module load time (for a module) with:

  modprobe ipmi_si.o type=<type1>,<type2>....
       ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
       irqs=<irq1>,<irq2>... trydefaults=[0|1]
       regspacings=<sp1>,<sp2>,... regsizes=<size1>,<size2>,...
       regshifts=<shift1>,<shift2>,...

Each of these except trydefaults is a list, the first item for the
first interface, second item for the second interface, etc.

The type may be either "kcs", "smic", or "bt". If you leave it blank, it
defaults to "kcs".

If you specify addrs as non-zero for an interface, the driver will
use the memory address given as the address of the device. This
overrides ports.

If you specify ports as non-zero for an interface, the driver will
use the I/O port given as the device address.

If you specify irqs as non-zero for an interface, the driver will
attempt to use the given interrupt for the device.

trydefaults sets whether the standard IPMI interface at 0xca2 and
any interfaces specified by ACPI are tried. By default, the driver
tries them; set this value to zero to turn this off.

The next three parameters have to do with register layout. The
registers used by the interfaces may not appear at successive
locations and they may not be in 8-bit registers. These parameters
allow the layout of the data in the registers to be more precisely
specified.

The regspacings parameter gives the number of bytes between successive
register start addresses. For instance, if the regspacing is set to 4
and the start address is 0xca2, then the address for the second
register would be 0xca6. This defaults to 1.

The regsizes parameter gives the size of a register, in bytes. The
data used by IPMI is 8 bits wide, but it may be inside a larger
register. This parameter allows the read and write type to be specified.
It may be 1, 2, 4, or 8. The default is 1.

Since the register size may be larger than 8 bits, the IPMI data may not
be in the lower 8 bits. The regshifts parameter gives the amount to shift
the data to get to the actual IPMI data.
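
The register layout arithmetic above can be sketched in a few lines.
This is only an illustration of how the spacing and shift values
combine, not the driver's own code. The function names are
hypothetical, and the sketch assumes the shift is counted in bits.

```c
/* Address of register number 'index' (0 for the first register),
 * given the start address and the spacing in bytes between
 * successive register start addresses. */
static unsigned long ex_reg_addr(unsigned long base, unsigned int index,
                                 unsigned int regspacing)
{
        return base + (unsigned long)index * regspacing;
}

/* Pull the 8-bit IPMI data out of a wider register value.
 * Assumption: regshift is a bit count. */
static unsigned char ex_extract_data(unsigned long long raw,
                                     unsigned int regshift)
{
        return (unsigned char)((raw >> regshift) & 0xff);
}
```

With a start address of 0xca2 and a regspacing of 4, ex_reg_addr()
gives 0xca2 for the first register and 0xca6 for the second, matching
the example in the text.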

When compiled into the kernel, the addresses can be specified on the
kernel command line as:

  ipmi_si.type=<type1>,<type2>...
       ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
       ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1]
       ipmi_si.regspacings=<sp1>,<sp2>,...
       ipmi_si.regsizes=<size1>,<size2>,...
       ipmi_si.regshifts=<shift1>,<shift2>,...

These work the same as the module parameters of the same names.

By default, the driver will attempt to detect any device specified by
ACPI, and if it finds none, a KCS device at the spec-specified
0xca2. If you want to turn this off, set the "trydefaults" option to
false.

If you have high-res timers compiled into the kernel, the driver will
use them to provide much better performance. Note that if you do not
have high-res timers enabled in the kernel and you don't have
interrupts enabled, the driver will run VERY slowly. Don't blame me,
these interfaces suck.


The SMBus Driver
----------------

The SMBus driver allows up to 4 SMBus devices to be configured in the
system. By default, the driver will register any SMBus interfaces it finds
in the I2C address range of 0x20 to 0x4f on any adapter. You can change this
at module load time (for a module) with:

  modprobe ipmi_smb.o
        addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]]
        dbg=<flags1>,<flags2>...
        slave_addrs=<addr1>,<addr2>,...
        [defaultprobe=0] [dbg_probe=1]
        force_kipmid=<enable1>,<enable2>,...

The addresses are specified in pairs: the first is the adapter ID and the
second is the I2C address on that adapter.

The debug flags are bit flags for each BMC found. They are:
IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8

Setting defaultprobe to zero disables the default probing of SMBus
interfaces at address range 0x20 to 0x4f.
This means that only the
BMCs specified on the addr line will be detected.

Setting dbg_probe to 1 will enable debugging of the probing and
detection process for BMCs on the SMBusses.

Discovering the IPMI compliant BMC on the SMBus can cause devices
on the I2C bus to fail. The SMBus driver writes a "Get Device ID" IPMI
message as a block write to the I2C bus and waits for a response.
This action can be detrimental to some I2C devices. It is highly recommended
that the known I2C address be given to the SMBus driver in the addr
parameter. The default address range will not be used when an addr
parameter is provided.

The force_kipmid parameter forcefully enables (if set to 1) or disables
(if set to 0) the kernel IPMI daemon. Normally this is auto-detected
by the driver, but systems with broken interrupts might need an enable,
or users that don't want the daemon (don't need the performance, don't
want the CPU hit) can disable it.

When compiled into the kernel, the addresses can be specified on the
kernel command line as:

  ipmi_smb.addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]]
        ipmi_smb.dbg=<flags1>,<flags2>...
        ipmi_smb.defaultprobe=0 ipmi_smb.dbg_probe=1
        ipmi_smb.force_kipmid=<enable1>,<enable2>,...

These are the same options as on the module command line.

Note that you might need some I2C changes if CONFIG_IPMI_PANIC_EVENT
is enabled along with this, so the I2C driver knows to run to
completion during sending a panic event.


Other Pieces
------------

Watchdog
--------

A watchdog timer is provided that implements the Linux-standard
watchdog timer interface.
It has several module parameters that can be
used to control it:

  modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
      preaction=<preaction type> preop=<preop type> start_now=x
      nowayout=x

The timeout is the number of seconds to the action, and the pretimeout
is the number of seconds before the reset that the pre-timeout panic will
occur (if pretimeout is zero, then pretimeout will not be enabled). Note
that the pretimeout is the time before the final timeout. So if the
timeout is 50 seconds and the pretimeout is 10 seconds, then the pretimeout
will occur in 40 seconds (10 seconds before the timeout).

The action may be "reset", "power_cycle", or "power_off", and
specifies what to do when the timer times out. It defaults to
"reset".

The preaction may be "pre_smi" for an indication through the SMI
interface, "pre_int" for an indication through the SMI with an
interrupt, and "pre_nmi" for an NMI on a preaction. This is how
the driver is informed of the pretimeout.

The preop may be set to "preop_none" for no operation on a pretimeout,
"preop_panic" to set the preoperation to panic, or "preop_give_data"
to provide data to read from the watchdog device when the pretimeout
occurs. A "pre_nmi" setting CANNOT be used with "preop_give_data"
because you can't do data operations from an NMI.

When preop is set to "preop_give_data", one byte comes ready to read
on the device when the pretimeout occurs. Select and fasync work on
the device, as well.

If start_now is set to 1, the watchdog timer will start running as
soon as the driver is loaded.

If nowayout is set to 1, the watchdog timer will not stop when the
watchdog device is closed. The default value of nowayout is true
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.
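
To make the timeout/pretimeout relationship and the one forbidden
preaction/preop combination concrete, here is a small sketch. It is
not taken from the driver; the function names are hypothetical and
exist only to restate the rules above in code.

```c
#include <string.h>

/* Seconds after the timer is started at which the pretimeout fires,
 * or -1 if the pretimeout is disabled (pretimeout of zero).
 * The sketch does not check that pretimeout is smaller than timeout. */
static int ex_pretimeout_fires_at(int timeout, int pretimeout)
{
        if (pretimeout <= 0)
                return -1;
        return timeout - pretimeout;
}

/* Returns 1 if the preaction/preop pair is allowed, 0 if not.
 * The only combination ruled out above is "pre_nmi" together with
 * "preop_give_data". */
static int ex_preaction_preop_ok(const char *preaction, const char *preop)
{
        if (strcmp(preaction, "pre_nmi") == 0 &&
            strcmp(preop, "preop_give_data") == 0)
                return 0;
        return 1;
}
```

With timeout=50 and pretimeout=10, ex_pretimeout_fires_at() returns
40, as in the example above.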

When compiled into the kernel, the kernel command line is available
for configuring the watchdog:

  ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
        ipmi_watchdog.action=<action type>
        ipmi_watchdog.preaction=<preaction type>
        ipmi_watchdog.preop=<preop type>
        ipmi_watchdog.start_now=x
        ipmi_watchdog.nowayout=x

The options are the same as the module parameter options.

The watchdog will panic and start a 120 second reset timeout if it
gets a pre-action. During a panic or a reboot, the watchdog will
start a 120 second timer if it is running to make sure the reboot occurs.

Note that if you use the NMI preaction for the watchdog, you MUST
NOT use nmi watchdog mode 1. If you use the NMI watchdog, you
must use mode 2.

Once you open the watchdog timer, you must write a 'V' character to the
device to close it, or the timer will not stop. This is a new semantic
for the driver, but makes it consistent with the rest of the watchdog
drivers in Linux.


Panic Timeouts
--------------

The OpenIPMI driver supports the ability to put semi-custom and custom
events in the system event log if a panic occurs. If you enable the
'Generate a panic event to all BMCs on a panic' option, you will get
one event on a panic in a standard IPMI event format. If you enable
the 'Generate OEM events containing the panic string' option, you will
also get a bunch of OEM events holding the panic string.


The field settings of the events are:
* Generator ID: 0x21 (kernel)
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
* Sensor Type: 0x20 (OS critical stop sensor)
* Sensor #: The first byte of the panic string (0 if no panic string)
* Event Dir | Event Type: 0x6f (Assertion, sensor-specific event info)
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
* Event Data 2: second byte of panic string
* Event Data 3: third byte of panic string
See the IPMI spec for the details of the event layout. This event is
always sent to the local management controller. It will handle routing
the message to the right place.

Other OEM events have the following format:
Record ID (bytes 0-1): Set by the SEL.
Record type (byte 2): 0xf0 (OEM non-timestamped)
byte 3: The slave address of the card saving the panic
byte 4: A sequence number (starting at zero)
The rest of the bytes (11 bytes) are the panic string. If the panic string
is longer than 11 bytes, multiple messages will be sent with increasing
sequence numbers.

Because you cannot send OEM events using the standard interface, this
function will attempt to find an SEL and add the events there. It
will first query the capabilities of the local management controller.
If it has an SEL, then they will be stored in the SEL of the local
management controller. If not, and the local management controller is
an event generator, the event receiver from the local management
controller will be queried and the events sent to the SEL on that
device. Otherwise, the events go nowhere since there is nowhere to
send them.


Poweroff
--------

If the poweroff capability is selected, the IPMI driver will install
a shutdown function into the standard poweroff function pointer. This
is in the ipmi_poweroff module. When the system requests a powerdown,
it will send the proper IPMI commands to do this.
This is supported on
several platforms.

There is a module parameter named "poweroff_control" that may either be zero
(do a power down) or 2 (do a power cycle: power the system off, then power
it on in a few seconds). Setting ipmi_poweroff.poweroff_control=x will do
the same thing on the kernel command line. The parameter is also available
via the proc filesystem in /proc/ipmi/poweroff_control. Note that if the
system does not support power cycling, it will always do the power off.

Note that if you have ACPI enabled, the system will prefer using ACPI to
power off.

