Enabling Debug Statements in the Solaris Kernel

Introduction

Debug statements in the Solaris kernel are extremely useful for figuring out what is going on when controllers keep failing over. The main debug flags of interest are RdacDebug, sd_error_level, and ssd_error_level. Which flag you use depends on which driver you want debug statements from.

Solaris debug flag  Associated driver  Associated host adapter                  Default value  Debug value
RdacDebug           rdriver            pseudo/rdnexus                           0              1 (or 0xfd, see below)
sd_error_level      sd                 SCSI, JNI fibre                          4              0
ssd_error_level     ssd                Most fibre cards, including LSI, QLogic  4              0

Debugging can be turned on persistently via /etc/system, so that it takes effect every time the system boots, or temporarily via adb.

Enabling Debug with /etc/system

Edit /etc/system. At the bottom of the file, add the appropriate lines from the following list:

    set rdriver:RdacDebug=1
    set sd:sd_error_level=0
    set ssd:ssd_error_level=0

Enabling Debug with adb

Changes made to debug settings with adb are lost the next time the system is rebooted.

Start the adb kernel debugger in write mode:

    adb -kw

Turn RDAC debug information on:

    RdacDebug/W 1

Turn Solaris disk driver debug on:

    sd_error_level/W 0

To turn debug off again (restoring the default values):

    RdacDebug/W 0
    sd_error_level/W 4

Quit adb with the command dollar sign, q:

    $q

Note: the "/W" must be upper case on a 64-bit system, but lower case on a 32-bit system.

Finding Out the Current Debug Level

Start the kernel debugger in read-only mode:

    adb -k

Show the debug levels:

    sd_error_level/X
    ssd_error_level/X
    RdacDebug/X

Quit adb:

    $q

Note: the "/X" must be upper case on a 64-bit system, but lower case on a 32-bit system.
Fine-Tuning 1: sd_error_level meanings

The sd_error_level and ssd_error_level values are defined in /usr/include/sys/scsi/impl/services.h as follows:

    #define SCSI_ERR_ALL        0
    #define SCSI_ERR_UNKNOWN    1
    #define SCSI_ERR_INFO       2
    #define SCSI_ERR_RECOVERED  3
    #define SCSI_ERR_RETRYABLE  4
    #define SCSI_ERR_FATAL      5
    #define SCSI_ERR_NONE       6

For each setting of sd_error_level or ssd_error_level, only errors with a severity AT or ABOVE the current setting are displayed. So an sd_error_level of 3 displays recovered, retryable, and fatal errors, but not info, unknown, or "all" (level 0) messages. By the same rule, 0 displays everything and the default of 4 displays only retryable and fatal errors.

Fine-Tuning 2: RdacDebug=0xfd

RdacDebug started life as a simple "true" or "false" flag. Most of the debug messages in the RDAC driver simply say:

    if ( RdacDebug ) print a message

But as time went on, some code was added whose debug output we only wanted to see part of the time, so some statements treat RdacDebug as a bit vector. RdacDebug=0xfd turns on the following bits:

    0x80 + 0x40 + 0x20 + 0x10 + 0x08 + 0x04 + 0x01

Most of these bits are meaningless. The only ones that matter are 0x10, 0x04, and 0x02; of these, 0xfd sets 0x10 and 0x04 but leaves 0x02 clear. The messages printed by these bit flags are classified as follows:

RdacDebug & 0x10 prints messages from the configuration functions:
    SdAddPathProp, EditNodeType, RemoveLayeredProp, HideExistingRdacDip,
    CreatePathProperties, FindNoPseudoArray, SdCheckExposedArrays, SdChkPseudo

RdacDebug & 0x02 prints messages from the RDAC daemon timers:
    "Restart daemon wait timed out"
    "Resolution daemon wait timed out"
    "SdALTCheck: called"
    "RdacRestart Daemon Initialized"
    "RdacRestart Daemon TIMER"
and from one ioctl call: RDAC_DEBUG_ENABLED

RdacDebug & 0x04 prints messages from one ioctl call: RDAC_GET_SLICE_START

So RdacDebug=0xfd prints debug messages for newly discovered devices as they are configured, and it prints messages that help tell whether the disk's "slice" table looks valid.
If you set RdacDebug=0xff, which adds the 0x02 bit, you would also see repeated messages showing that the RDAC daemon is alive and waiting for work. In most cases 0xff is overkill, and 0xfd is the right level to use. The 0x02 bit is useful only when you suspect the RDAC daemon isn't doing what it is supposed to, that is, failovers aren't happening or are taking too long.