Fault Manager is the part of the self-healing functionality that provides fault isolation and component restart for hardware components (SMF takes care of software components). Before using it, make sure that the required packages are installed and that the fmd service is online.
# pkginfo | grep fmd
system      SUNWfmd    Fault Management Daemon and Utilities
system      SUNWfmdr   Fault Management Daemon and Utilities (Root)
# svcs fmd
STATE          STIME    FMRI
online         Jun_29   svc:/system/fmd:default
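The two checks above can be combined into one small pre-flight function. This is a minimal sketch, not an official tool: the function name `check_fmd` is made up for illustration, and the package names are simply the two shown in the `pkginfo` listing. It relies on `pkginfo -q` (exit status only) and `svcs -H -o state` (state column without a header), and it avoids `$( )` and `!` so that it should also run under the older Bourne shell.

```shell
# check_fmd: hypothetical helper that verifies the Fault Manager prerequisites.
# Returns 0 when both packages are installed and the fmd service is online.
check_fmd() {
    for pkg in SUNWfmd SUNWfmdr; do
        # pkginfo -q prints nothing; only the exit status is used
        pkginfo -q "$pkg" || {
            echo "missing package: $pkg" >&2
            return 1
        }
    done

    # -H omits the header row, -o state prints only the STATE column
    state=`svcs -H -o state fmd 2>/dev/null`
    if [ "$state" != "online" ]; then
        echo "fmd service state: ${state:-unknown}" >&2
        return 1
    fi

    echo "fmd: packages present, service online"
}
```

Calling `check_fmd` at the top of a maintenance script stops early if the daemon is not available, before any `fmadm` command is attempted.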
# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-diagnosis         1.6     active  CPU/Memory Diagnosis
cpumem-retire            1.1     active  CPU/Memory Retire Agent
eft                      1.16    active  eft diagnosis engine
fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
io-retire                1.0     active  I/O Retire Agent
sysevent-transport       1.0     active  SysEvent Transport Agent
syslog-msgs              1.0     active  Syslog Messaging Agent
zfs-diagnosis            1.0     active  ZFS Diagnosis Engine
zfs-retire               1.0     active  ZFS Retire Agent
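When the module list is long, it can be easier to print only the modules that are not active. The filter below is a sketch under the assumption that the output keeps the four-column layout shown above (MODULE, VERSION, STATUS, DESCRIPTION); the function name `inactive_modules` is hypothetical.

```shell
# inactive_modules: hypothetical filter for `fmadm config` output on stdin.
# Prints "module (status)" for every module whose STATUS column is not "active".
inactive_modules() {
    # NR > 1 skips the header row; NF >= 3 skips blank or short lines
    awk 'NR > 1 && NF >= 3 && $3 != "active" { print $1 " (" $3 ")" }'
}
```

Typical use would be `fmadm config | inactive_modules`. With the listing above it prints nothing, because every module is active.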
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 23 02:30:30 2578e639-38cd-4cd8-9c16-87e96116f41e AMD-8000-2F    Major

Fault class : fault.memory.dimm_sb
Affects     : mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0
              degraded but still in service
FRU         : "CPU 1 DIMM 3" (hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=oryx/motherboard=0
              /chip=1/memory-controller=0/dimm=3)

Description : The number of errors associated with this memory module has
              exceeded acceptable levels. Refer to
              http://sun.com/msg/AMD-8000-2F for more information.

Response    : Pages of memory associated with this memory module are being
              removed from service as errors are reported.

Impact      : Total system memory capacity will be reduced as pages are
              retired.

Action      : Schedule a repair procedure to replace the affected memory
              module. Use fmdump -v -u <EVENT_ID> to identify the module.
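The EVENT-ID in this report is the value that `fmdump -v -u` expects. As a rough sketch, the summary rows can be pulled out of the report with awk; the function name `faulty_events` is hypothetical, and the filter assumes the summary row has the six whitespace-separated fields seen above (month, day, time, EVENT-ID, MSG-ID, SEVERITY).

```shell
# faulty_events: hypothetical filter for `fmadm faulty` output on stdin.
# Prints "EVENT-ID MSG-ID SEVERITY" for each summary row in the report.
faulty_events() {
    # A summary row has six fields and a UUID-shaped fourth field
    # (five dash-separated groups of lowercase hex digits).
    awk 'NF == 6 && $4 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
        print $4, $5, $6
    }'
}
```

One possible use is to feed each ID to `fmdump`, as the Action text suggests:

```shell
fmadm faulty | faulty_events | while read id msg sev; do
    fmdump -v -u "$id"
done
```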
# fmadm repair 2578e639-38cd-4cd8-9c16-87e96116f41e
fmadm: recorded repair to 2578e639-38cd-4cd8-9c16-87e96116f41e
# fmadm reset eft
fmadm: eft module has been reset