SMF Best Practices and Troubleshooting
SMF Best Practices
Most services describe configuration policy. If the configuration you want is not implemented, modify the policy description by modifying the service. Modify the values of service properties or create new service instances with different property values. Do not disable service instances and edit configuration files that are intended to be managed by an SMF service.
Do not modify manifests and system profiles that are delivered by Oracle or third-party software vendors. These manifests and profiles might be replaced when you upgrade your system, and then your changes to these files will be lost. Instead, either create a site profile to customize the service, or use the svccfg command or the inetadm command to manipulate the properties directly. The /lib/svc/manifest/site and /var/svc/manifest/site directories are also reserved for site-specific use. Oracle Solaris does not deliver manifests into those directories.
To apply the same custom configuration to multiple systems, use the svcbundle command or the svccfg extract command to create a profile file. Customize property values in that file, and include comments about the reason for each customization. Copy the file to /etc/svc/profile/site on each system, and restart the manifest-import service on each system. See Configuring Multiple Systems.
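As a rough sketch of the copy-and-import step, the following shell function installs an already customized profile file and restarts the manifest-import service. The function name, the PROFILE_DIR override, and the example file name are illustrative assumptions and are not part of SMF.

```shell
# Hypothetical helper: install a customized site profile on this system.
# PROFILE_DIR defaults to the site profile directory; the override exists
# only so that the function can be tried against a scratch directory.
install_site_profile() {
    profile=$1
    dest=${PROFILE_DIR:-/etc/svc/profile/site}

    if [ ! -r "$profile" ]; then
        echo "cannot read profile: $profile" >&2
        return 1
    fi

    # Copy the profile into place, then have manifest-import apply it.
    cp "$profile" "$dest/" || return 1
    svcadm restart svc:/system/manifest-import:default
}
```

For example, `install_site_profile ./myapp-site.xml` would be run as a privileged user on each system that should receive the customization.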
When you create a site profile, make sure the configuration defined does not conflict with configuration defined in another site profile for the same service or service instance. When SMF finds conflicting configuration in the same layer of the service configuration repository, the affected service instance is placed in the maintenance state.
Do not use non-standard locations for manifest and profile files. See Service Bundles for manifest and profile standard locations.
When you create a service for your own use, use site at the beginning of the service name: svc:/site/service_name:instance_name.
Do not modify the configuration of the master restarter service, svc:/system/svc/restarter:default, except to configure logging levels as described in Specifying the Amount of Startup Messaging.
Before you use the svccfg delcust command, use the svccfg listcust command with the same options. The delcust subcommand can potentially remove all administrative customizations on a service. Use the listcust subcommand to verify which customizations will be deleted by the delcust subcommand.
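One way to make this review step hard to skip is to wrap both subcommands in a single function. The following is a minimal sketch. The function name and the confirmation prompt are illustrative assumptions, and only the listcust and delcust subcommands come from the text above.

```shell
# Hypothetical helper: show customizations, then delete only on "yes".
delcust_with_review() {
    fmri=$1

    # listcust shows what delcust would remove for the same service.
    customizations=$(svccfg -s "$fmri" listcust) || return 1
    if [ -z "$customizations" ]; then
        echo "no administrative customizations on $fmri"
        return 0
    fi

    printf '%s\n' "$customizations"
    printf 'Delete these customizations? [yes/no] '
    read -r answer
    if [ "$answer" != yes ]; then
        echo "nothing deleted"
        return 1
    fi
    svccfg -s "$fmri" delcust
}
```

Calling `delcust_with_review svc:/site/myapp:default` (a hypothetical service) prints the customizations and deletes them only if the answer is exactly yes.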
In scripts, use the full service instance FMRI: svc:/service_name:instance_name.
Troubleshooting Services Problems
This section discusses the following topics:
Committing configuration changes into the running snapshot
Fixing services that are reported to have problems
Manually transitioning an instance to the degraded or maintenance state
Fixing a corrupt service configuration repository
Configuring the amount of messaging to display or store on system startup
Transitioning or booting to a specified milestone
Using SMF to investigate booting problems
Converting inetd services to SMF services
Understanding Configuration Changes
In the service configuration repository, SMF stores property changes separately from properties in the running snapshot. When you change service configuration, those changes do not immediately appear in the running snapshot.
The refresh operation updates the running snapshot of the specified service instance with the values from the editing configuration.
By default, the svcprop command shows properties in the running snapshot, and the svccfg command shows properties in the editing configuration. If you have changed property values but not performed a configuration refresh, the svcprop and svccfg commands show different property values. After you perform a configuration refresh, the svcprop and svccfg commands show the same property values.
Rebooting does not change the running snapshot. The svcadm restart command does not refresh configuration. Use the svcadm refresh or svccfg refresh command to commit configuration changes into the running snapshot.
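The difference between the two views can be checked from a script. The following sketch compares the value that svcprop reports with the value that svccfg reports for one property. The function name is an illustrative assumption, and the parsing assumes a single-word property value printed by svccfg listprop as "name type value".

```shell
# Hypothetical helper: report whether a property change is still pending.
# Assumes the property holds a single-word value.
smf_pending_change() {
    fmri=$1
    prop=$2

    # svcprop reads the running snapshot by default.
    running=$(svcprop -p "$prop" "$fmri")
    # svccfg listprop reads the editing configuration and prints
    # "name type value"; keep only the value.
    editing=$(svccfg -s "$fmri" listprop "$prop" | awk '{ print $3 }')

    echo "running=$running editing=$editing"
    if [ "$running" != "$editing" ]; then
        echo "refresh needed: svcadm refresh $fmri"
        return 1
    fi
}
```

A nonzero return status indicates that the editing configuration has not yet been committed into the running snapshot.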
Repairing an Instance That Is Degraded, Offline, or in Maintenance
Use the svcs -x command with no arguments to display explanatory information about any service instances that match either of the following descriptions:
The service is enabled but is not running.
The service is preventing another enabled service from running.
The following list summarizes how to approach service problems:
Diagnose the problem, starting with viewing the service log file.
Fix the problem. If fixing the problem involves modifying service configuration, refresh the service.
Move affected services to a running state.
How to Repair an Instance That Is in Maintenance
A service instance that is in maintenance is enabled but not able to run.
Determine why the instance is in maintenance.
The instance might be transitioning through the maintenance state because an administrative action has not yet completed. If the instance is transitioning, its state should be shown as maintenance*.
In the following example, the “State” and “Reason” lines show that the pkg/depot service is in the maintenance state because its start method failed.
$ svcs -x
svc:/application/pkg/depot:default (IPS Depot)
 State: maintenance since September 11, 2013 01:30:42 PM PDT
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://support.oracle.com/msg/SMF-8000-KS
   See: pkg.depot-config(1M)
   See: /var/svc/log/application-pkg-depot:default.log
Impact: This service is not running.
Log in to the Oracle support site to view the referenced Predictive Self-Healing knowledge article. In this case, the article tells you to view the log file to determine why the start method failed. The svcs output gives the name of the log file. See Viewing Service Log Files for information about how to view the log file. In this example, the log file shows the start method invocation and the fatal error message.
[ Sep 11 13:30:42 Executing start method ("/lib/svc/method/svc-pkg-depot start"). ]
pkg.depot-config: Unable to get publisher information: The path '/export/ipsrepos/Solaris11' does not contain a valid package repository.
Fix the problem.
One or more of the following steps might be needed.
Update service configuration.
If fixing the reported problem required modifying service configuration, use the svccfg refresh or svcadm refresh command for any services whose configuration changed. Verify that the configuration is updated in the running snapshot by using the svcprop command to check property values or by other tests specific to this service.
Ensure dependencies are running.
Sometimes the “Impact” line in the svcs -x output tells you that services that depend on the service that is in the maintenance state are not running. Use the svcs -l command to check the current state of dependent services. Ensure that all required dependencies are running. Use the svcs -x command to verify that all enabled services are running.
Ensure contract processes are stopped.
If the service that is in the maintenance state is a contract service, determine whether any processes that were started by the service have not stopped. When a contract service instance is in a maintenance state, the contract ID should be blank, as shown in the following example, and all processes associated with that contract should have stopped. Use svcs -l or svcs -o ctid to check that no contract exists for a service instance in maintenance. Use svcs -p to check whether any processes associated with this service instance are still running. Any processes shown by svcs -p for a service instance in maintenance should be killed.
$ svcs -l system-repository
fmri         svc:/application/pkg/system-repository:default
name         IPS System Repository
enabled      true
state        maintenance
next_state   none
state_time   September 17, 2013 07:18:19 AM PDT
logfile      /var/svc/log/application-pkg-system-repository:default.log
restarter    svc:/system/svc/restarter:default
contract_id
manifest     /lib/svc/manifest/application/pkg/pkg-system-repository.xml
dependency   require_all/error svc:/milestone/network:default (online)
dependency   require_all/none svc:/system/filesystem/local:default (online)
dependency   optional_all/error svc:/system/filesystem/autofs:default (online)
Notify the restarter that the instance is repaired.
When the reported problem is fixed, use the svcadm clear command to return the service to the online state. For services in the maintenance state, the clear subcommand tells the restarter for that service that the service is repaired.
$ svcadm clear pkg/depot:default
If you specify the -s option, the svcadm command waits to return until the instance reaches the online state or until it determines that the instance cannot reach the online state without administrator intervention. Use the -T option with the -s option to specify an upper bound in seconds to make the transition or determine that the transition cannot be made.
Verify that the instance is repaired.
Use the svcs command to verify that the service that was in maintenance is now online. Use the svcs -x command to verify that all enabled services are running.
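The contract check and the clear step of this procedure can be combined in a small script. The following is a minimal sketch. The function name is an illustrative assumption, and the script assumes that an instance with no contract shows an empty value or a "-" placeholder in the ctid column.

```shell
# Hypothetical helper: clear an instance in maintenance only when no
# contract is left behind.
clear_when_quiesced() {
    fmri=$1

    # -H omits the header line. An instance with no contract is expected
    # to show an empty or "-" contract ID (assumption about column output).
    ctid=$(svcs -H -o ctid "$fmri" | awk '{ print $1 }')
    if [ -n "$ctid" ] && [ "$ctid" != "-" ]; then
        echo "contract $ctid still exists; check 'svcs -p $fmri'" >&2
        return 1
    fi

    # Tell the restarter that the instance is repaired, then report the state.
    svcadm clear "$fmri" || return 1
    svcs -H -o state "$fmri"
}
```

Because this sketch does not wait for the transition to complete, the reported state might still be in transition. Run the svcs command again to confirm that the instance is online.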
How to Repair an Instance That Is Offline
A service instance that is offline is enabled but not running or available to run.
Determine why the instance is offline.
The instance might be transitioning through the offline state because its dependencies are not yet satisfied. If the instance is transitioning, its state should be shown as offline*.
Fix the problem.
Enable service dependencies.
If required dependencies are disabled, enable them with the following command:
$ svcadm enable -r FMRI
Fix dependency file.
A dependency file might be missing or unreadable. You might want to use pkg fix or pkg revert to fix this type of problem. See the pkg(1) man page.
Restart the instance if necessary.
If the instance was offline because a required dependency was not satisfied, fixing or enabling the dependency might cause the offline instance to restart and come online with no further administrative action needed.
If you made some other fix to the service, then restart the instance.
$ svcadm restart FMRI
Verify that the instance is repaired.
Use the svcs command to verify that the instance that was offline is now online. Use the svcs -x command to verify that all enabled services are running.
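When an instance stays offline, the dependency lines of the svcs -l output show which dependencies are not yet online. The following sketch filters those lines. The function name is an illustrative assumption, and the filter assumes one FMRI per dependency line.

```shell
# Hypothetical helper: list dependencies of an instance that are not online.
unsatisfied_dependencies() {
    # Dependency lines look like:
    #   dependency require_all/error svc:/milestone/network:default (online)
    # Print the FMRI and state of each one whose state is not "(online)".
    svcs -l "$1" | awk '$1 == "dependency" && $NF != "(online)" { print $3, $NF }'
}
```

An empty result suggests that the dependencies are satisfied and that the cause of the offline state lies elsewhere.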
How to Repair an Instance That Is Degraded
A service instance that is degraded is enabled and running or available to run, but is functioning at a limited capacity.
Determine why the instance is degraded.
Fix the problem.
Request the restarter to online the instance.
When the reported problem is fixed, use the svcadm clear command to return the instance to the online state. For instances in the degraded state, the clear subcommand requests that the restarter for that instance transition the instance to the online state.
$ svcadm clear pkg/depot:default
Verify that the instance is repaired.
Use the svcs command to verify that the instance that was degraded is now online. Use the svcs -x command to verify that all enabled services are running.
Marking an Instance as Degraded or in Maintenance
You can mark a service instance as being in either the degraded state or the maintenance state. You might want to do this if the application is stuck in a loop or is deadlocked, for example. The information about the state change propagates to the dependencies of the marked instance, which can help debug other related instances.
Specify the -I option to request an immediate state change.
When you mark an instance as maintenance, you can specify the -t option to request a temporary state change. Temporary requests last only until reboot.
If you specify the -s option with the svcadm mark command, svcadm marks the instance and waits for the instance to enter the degraded or maintenance state before returning. Use the -T option with the -s option to specify an upper bound in seconds to make the transition or determine that the transition cannot be made.
Diagnosing and Repairing Repository Problems
On system startup, the repository daemon, svc.configd, performs an integrity check of the configuration repository stored in /etc/svc/repository.db. If the svc.configd integrity check fails, the svc.configd daemon writes a message to the console similar to the following:
svc.configd: smf(5) database integrity check of:
/etc/svc/repository.db
failed. The database might be damaged or a media error might have
prevented it from being verified. Additional information useful to
your service provider is in:
/system/volatile/db_errors
The system will not be able to boot until you have restored a working
database. svc.startd(1M) will provide a sulogin(1M) prompt for recovery
purposes. The command:
/lib/svc/bin/restore_repository
can be run to restore a backup version of your repository. See
http://support.oracle.com/msg/SMF-8000-MY for more information.
The service configuration repository can become corrupted for any of the following reasons:
Disk failure
Hardware bug
Software bug
Accidental overwrite of the file
The following procedure shows how to replace a corrupt repository with a backup copy of the repository.
How to Restore a Repository From Backup
Log in.
Using the root password, log in either remotely or at the sulogin prompt.
Run the repository restore command:
# /lib/svc/bin/restore_repository
Running this command takes you through the necessary steps to restore a non-corrupt backup. SMF automatically takes backups of the repository as described in Repository Backups.
SMF maintains persistent and non-persistent configuration data. See Service Configuration Repository for descriptions of these two repositories. The restore_repository command only restores the persistent repository. The restore_repository command also reboots the system, which destroys the non-persistent configuration data. The non-persistent data is runtime data that is not needed across system reboot.
When started, the /lib/svc/bin/restore_repository command displays a message similar to the following:
See http://support.oracle.com/msg/SMF-8000-MY for more information on the use of this script to restore backup copies of the smf(5) repository.
If there are any problems which need human intervention, this script will give instructions and then exit back to your shell.
After the root (/) file system is mounted with write permissions, or if the system is a local zone, you are prompted to select the repository backup to restore:
The following backups of /etc/svc/repository.db exists, from oldest to newest:
... list of backups ...
Backups are given names based on type and the time the backup was taken. Backups beginning with boot are completed before the first change is made to the repository after system boot. Backups beginning with manifest_import are completed after svc:/system/manifest-import:default finishes its process. The time of the backup is given in YYYYMMDD_HHMMSS format.
Enter the appropriate response.
Typically, the most recent backup option is selected.
Please enter either a specific backup repository from the above list to restore it, or one of the following choices:
CHOICE            ACTION
----------------  ----------------------------------------------
boot              restore the most recent post-boot backup
manifest_import   restore the most recent manifest_import backup
-seed-            restore the initial starting repository (All customizations will be lost, including those made by the install/upgrade process.)
-quit-            cancel script and quit
Enter response [boot]:
If you press Enter without specifying a backup to restore, the default response, enclosed in [], is selected. Selecting -quit- exits the restore_repository script, returning you to your shell prompt.
Note - Selecting -seed- restores the seed repository. This repository is designed for use during initial installation and upgrades. Using the seed repository for recovery purposes should be a last resort.
After the backup to restore has been selected, it is validated and its integrity is checked. If there are any problems, the restore_repository command prints error messages and prompts you for another selection. Once a valid backup is selected, the following information is printed, and you are prompted for final confirmation.
After confirmation, the following steps will be taken:
svc.startd(1M) and svc.configd(1M) will be quiesced, if running.
/etc/svc/repository.db
-- renamed --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS
/system/volatile/db_errors
-- copied --> /etc/svc/repository.db_old_YYYYMMDD_HHMMSS_errors
repository_to_restore
-- copied --> /etc/svc/repository.db
and the system will be rebooted with reboot(1M).
Proceed [yes/no]?
Type yes to remedy the fault.
The system reboots after the restore_repository command executes all of the listed actions.
Specifying the Amount of Startup Messaging
By default, each service that starts during system boot does not display a message on the console. Use one of the following methods to change which messages appear on the console and which are recorded only in the svc.startd log file. The value of logging-level can be one of the values shown in the table below.
When booting a SPARC system, specify the -m option to the boot command at the ok prompt. See “Messages options” in the kernel(1M) man page.
ok boot -m logging-level
When booting an x86 system, edit the GRUB menu to specify the -m option. See Adding Kernel Arguments by Editing the GRUB Menu at Boot Time in Booting and Shutting Down Oracle Solaris 11.2 Systems and “Messages options” in the kernel(1M) man page.
Prior to rebooting a system, use the svccfg command to change the value of the options/logging property. If this property has never been changed on this system, then it will not exist and you will have to add it. The following example changes to verbose messaging. The change takes effect on the next restart of the svc.startd daemon.
$ svccfg -s system/svc/restarter:default listprop options/logging
$ svccfg -s system/svc/restarter:default addpg options application
$ svccfg -s system/svc/restarter:default setprop options/logging=verbose
$ svccfg -s system/svc/restarter:default listprop options/logging
options/logging astring verbose
Table A-1 SMF Startup Message Logging Levels
Logging Level Keyword | Description |
| Display on the console any error messages that require administrative intervention. Also record these messages in |
| In addition to the messaging provided at the |
| In addition to the messaging provided at the |
Specifying the SMF Milestone to Which to Boot
When you boot a system, you can specify the SMF milestone to which to boot.
By default, all services for which the value of the general/enabled property is true are started at system boot. To change the milestone to which to boot a system, use one of the following methods. The value of milestone can be the FMRI of a milestone service or a keyword as shown in Table A–2.
When booting a SPARC system, specify the -m option to the boot command at the ok prompt. See the -m option in the kernel(1M) man page.
ok boot -m milestone=milestone
When booting an x86 system, edit the GRUB menu to specify the -m option. See Adding Kernel Arguments by Editing the GRUB Menu at Boot Time in Booting and Shutting Down Oracle Solaris 11.2 Systems and the -m option in the kernel(1M) man page.
Prior to rebooting a system, use the svcadm milestone command with the -d option. Note that with or without the -d option, this command restricts and restores running services immediately. With the -d option, the command also makes the specified milestone the default boot milestone. This new default is persistent across reboots.
$ svcadm milestone -d milestone
This command does not change the current run level of the system. To change the current run level of the system, use the init command.
If you specify the -s option, svcadm changes the milestone and then waits for the transition to the specified milestone to complete before returning. The svcadm command returns when all instances have transitioned to the state necessary to reach the specified milestone or when it determines that administrator intervention is required to make a transition. Use the -T option with the -s option to specify an upper bound in seconds to complete the milestone change operation or return.
The following table describes SMF boot milestones, including any corresponding Oracle Solaris run level. A system’s run level defines what services and resources are available to users. A system can be in only one run level at a time. For information about run levels, see How Run Levels Work in Booting and Shutting Down Oracle Solaris 11.2 Systems, the inittab(4) man page, and the /etc/init.d/README file. For more information about SMF boot milestones, see the milestone subcommand in the svcadm(1M) man page.
Table A-2 SMF Boot Milestones and Corresponding Run Levels
SMF Milestone FMRI or Keyword | Corresponding Run Level | Description |
---|---|---|
| | The The |
| | The |
| s or S | Ignore temporary enable and disable requests for |
| 2 | Ignore temporary enable and disable requests for |
| 3 | Ignore temporary enable and disable requests for |
To determine the milestone to which a system is currently booted, use the svcs command. The following example shows that the system is booted to run level 3, milestone/multi-user-server:
$ svcs 'milestone/*'
STATE STIME FMRI
online 9:08:05 svc:/milestone/unconfig:default
online 9:08:06 svc:/milestone/config:default
online 9:08:07 svc:/milestone/devices:default
online 9:08:25 svc:/milestone/network:default
online 9:08:31 svc:/milestone/single-user:default
online 9:08:51 svc:/milestone/name-services:default
online 9:09:13 svc:/milestone/self-assembly-complete:default
online 9:09:23 svc:/milestone/multi-user:default
online 9:09:24 svc:/milestone/multi-user-server:default
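The same check can be scripted. The following sketch reports the highest run-level milestone that is online. The function name is an illustrative assumption. The pairing of multi-user-server with run level 3 comes from the example above, while the pairing of single-user with run level S and multi-user with run level 2 is assumed from the usual Oracle Solaris convention. The script also assumes a POSIX awk.

```shell
# Hypothetical helper: report the highest run-level milestone that is online.
highest_milestone() {
    svcs -H -o state,fmri 'milestone/*' | awk '
        $1 == "online" && $2 ~ /milestone\/single-user:/       && level < 1 { level = 1 }
        $1 == "online" && $2 ~ /milestone\/multi-user:/        && level < 2 { level = 2 }
        $1 == "online" && $2 ~ /milestone\/multi-user-server:/ && level < 3 { level = 3 }
        END {
            if (level == 3)      print "multi-user-server (run level 3)"
            else if (level == 2) print "multi-user (run level 2)"
            else if (level == 1) print "single-user (run level S)"
            else                 print "no run-level milestone online"
        }'
}
```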
Using SMF to Investigate System Boot Problems
This section describes actions to take if your system hangs during boot or if a key service fails to start during boot.
How to Investigate Problems Starting Services at System Boot
If problems occur when starting services at system boot, sometimes the system will hang during boot. This procedure shows how to investigate services problems that occur at boot time.
Boot without starting any services.
The following command instructs the svc.startd daemon to temporarily disable all services and start sulogin on the console.
ok boot -m milestone=none
See Specifying the SMF Milestone to Which to Boot for a list of SMF milestones that you can use with the boot -m command.
Log in to the system as root.
Enable all services.
# svcadm milestone all
Determine where the boot process is hanging.
When the boot process hangs, determine which services are not running by running svcs -a. Look for error messages in the log files in /var/svc/log.
After fixing the problems, verify that all services have started.
Verify that all needed services are online.
# svcs -x
Verify that the console-login service dependencies are satisfied.
This command verifies that the login process on the console will run.
# svcs -l system/console-login:default
Continue the normal booting process.
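Because svcs -a lists every instance, filtering its output can make a hung boot easier to read. The following sketch keeps only instances that are offline, in maintenance, degraded, or still in transition. The function name is an illustrative assumption, and the script assumes a POSIX awk.

```shell
# Hypothetical helper: list instances that are enabled but not online.
services_not_running() {
    # Keep offline, maintenance, and degraded instances, plus any instance
    # shown with a trailing "*" (still in transition).
    svcs -a -H -o state,fmri | awk '
        $1 ~ /^(offline|maintenance|degraded)/ || $1 ~ /\*$/ { print $1, $2 }'
}
```

For each FMRI that is listed, the corresponding log file in /var/svc/log is the next place to look.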
How to Force Single-User Login if the Local File System Service Fails During Boot
Local file systems that are not required to boot the system are mounted by the svc:/system/filesystem/local:default service. When any of those file systems cannot be mounted, the filesystem/local service enters a maintenance state. System startup continues, and any services that do not depend on filesystem/local are started. Services that have a required dependency on the filesystem/local service are not started.
This procedure explains how to change the configuration of the system so that a sulogin prompt appears immediately after the service fails instead of allowing system startup to continue.
Modify the system/console-login service.
$ svccfg -s svc:/system/console-login
svc:/system/console-login> addpg site,filesystem-local dependency
svc:/system/console-login> setprop site,filesystem-local/entities = fmri: svc:/system/filesystem/local
svc:/system/console-login> setprop site,filesystem-local/grouping = astring: require_all
svc:/system/console-login> setprop site,filesystem-local/restart_on = astring: none
svc:/system/console-login> setprop site,filesystem-local/type = astring: service
svc:/system/console-login> end
Refresh the service.
$ svcadm refresh console-login
When a failure occurs with the system/filesystem/local:default service, use the svcs -vx command to identify the failure. After the failure has been fixed, use the following command to clear the error state and allow the system boot to continue:
$ svcadm clear filesystem/local
Converting inetd Services to SMF Services
The inetd.conf file on your system should contain no entries. The inetd.conf file should contain only comments that this is a legacy file no longer directly used. If the inetd.conf file contains any entries, follow the instructions in this section to convert these configurations to SMF services. Services that are configured in the inetd.conf file but are not configured as an SMF service are not available for use. Services that are configured in the inetd.conf file are not restarted by the inetd command directly. Rather, the inetd command is the delegated restarter for the converted services.
During initial system boot, configurations in the inetd.conf file are automatically converted to SMF services. After initial system boot, entries might be added to the inetd.conf file by installing additional software that is not delivered by Image Packaging System (IPS) packages. Software that is delivered by IPS packages includes any required SMF manifest, and that SMF manifest instantiates that service instance with the correct property values.
If the inetd.conf file on your system contains any entries, use the inetconv command to convert those configurations to SMF services. The inetconv command converts inetd.conf entries into SMF service manifest files and imports those manifests into the SMF repository to instantiate the service instances. See the inetconv(1M) man page for information about command options and to see examples of using the command.
The name of the new SMF manifest incorporates the service_name from the inetd.conf entry. The entry from the inetd.conf file is saved as a property of the new service instance. The new SMF manifest specifies property groups and properties to define the actions listed in the inetd.conf entry. After running the inetconv command, use the svcs and svcprop commands to ensure the new service instance was created and has the correct property values.
The inetd command is the delegated restarter for SMF internet services. Do not use the inetd command directly to manage these services. Use the inetadm command with no options or operands to see a list of services that are controlled by inetd. Use the inetadm, svcadm, and svccfg commands to configure and manage these converted services.
The inetconv command does not modify the input inetd.conf file. You should manually delete any entries in the inetd.conf file after successfully running inetconv.
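To confirm that no entries remain, you can list the lines of the file that are neither blank nor comments. The following is a minimal sketch. The function name and the default path /etc/inet/inetd.conf are assumptions, so pass the path of the file that you want to check. The script also assumes a POSIX grep.

```shell
# Hypothetical helper: print the non-comment, non-blank lines of inetd.conf.
# The default path is an assumption; pass the file to check as $1.
inetd_conf_entries() {
    conf=${1:-/etc/inet/inetd.conf}
    # Drop blank lines and lines whose first non-blank character is "#".
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$conf"
}
```

The function returns a nonzero status when the file contains only comments and blank lines, which is the expected condition after the entries have been converted and deleted.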
For information about configuring inetd services that are already converted to SMF services, see Modifying Services that are Controlled by inetd.