SMF Workshop
(SMF) - Workshop
Ganesh Hiregoudar
Renaud Manus
OP/N1 RPE Approachability
Sun Microsystems
Thanks!
• We would like to thank the following engineers
who participated in writing and delivering this SMF
workshop.
> Jarod Nash,
> Jason Banham,
> Lee Brooks,
> Rhodri Davies,
> Phill Hughes
Terminology
• Service
> Object (application, software state of a device, set of
other services) that can be managed and observed.
• Instance
> Child of service object.
• FMRI (Fault Management Resource Identifier)
> 3 components:
svc: /system/system-log :default
$ svcs '*print*'
$ svcs cron
$ svcs dns/client
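As a rough illustration of the three components, the sketch below splits a fully qualified FMRI with POSIX parameter expansion. The function names are made up for this example, and it assumes the full scheme:/service:instance form rather than the abbreviations that svcs accepts.

```shell
#!/bin/sh
# Split a fully qualified FMRI such as svc:/system/system-log:default
# into its three components. Function names are illustrative only.

fmri_scheme() {
    # Everything before the first ":/" (for example "svc").
    printf '%s\n' "${1%%:/*}"
}

fmri_service() {
    # Strip the "scheme:/" prefix, then the trailing ":instance".
    rest="${1#*:/}"
    printf '%s\n' "${rest%%:*}"
}

fmri_instance() {
    # Everything after the last ":" (for example "default").
    printf '%s\n' "${1##*:}"
}

fmri=svc:/system/system-log:default
echo "scheme=$(fmri_scheme "$fmri") service=$(fmri_service "$fmri") instance=$(fmri_instance "$fmri")"
# prints: scheme=svc service=system/system-log instance=default
```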
Commands: svcadm(1M)
• Enable, disable, refresh, restart service instances
• Mark in special states (maintenance)
• Synchronously wait for changes (-s)
$ grep lianep /etc/user_attr
lianep::::auths=solaris.smf.modify,solaris.smf.manage
$ svcs apache2
STATE STIME FMRI
- ? svc:/network/http:apache2
$ # create /etc/apache2/httpd.conf
$ svcadm enable apache2
$ svcs apache2
STATE STIME FMRI
online 19:19:01 svc:/network/http:apache2
$ # edit /etc/apache2/httpd.conf
$ svcadm refresh apache2
$ svcs apache2
STATE STIME FMRI
online 19:19:33 svc:/network/http:apache2
$ svcadm disable apache2
$ svcs apache2
STATE STIME FMRI
disabled 19:20:07 svc:/network/http:apache2
$ inetadm -l ftp
SCOPE NAME=VALUE
name="ftp"
endpoint_type="stream"
proto="tcp6"
isrpc=FALSE
wait=FALSE
exec="/usr/sbin/in.ftpd -a"
user="root"
...
default tcp_wrappers=FALSE
svccfg(1M)
svccfg manipulates the repository
Uses sub-commands to perform actions
Common sub-commands
select <fmri> : select a service / instance
list : show children of the selected service
listprop : display the properties for a service
setprop : change a property value for a service
delete : delete a service / instance
validate <file> : validate an XML manifest file
import <file> : import a manifest file into repository
unselect : navigate to parent of current selection
Commands: svccfg(1M)
• Import, export manifests; apply, extract profiles
• Interactive mode for modifying properties
$ svccfg -v import /var/svc/manifest/network/http-apache2.xml
svccfg: Refreshed network/http:/apache2
svccfg: Successful import.
$ svccfg
svc:> select network/http:apache2
svc:/network/http:apache2> listprop
...
general framework
general/enabled boolean false
...
start method
start/exec astring "/lib/svc/method/http-apache2 start"
start/timeout_seconds count 60
start/type astring method
svc:/network/http:apache2> editprop
[$EDITOR launches, allows direct editing of properties]
svc:/network/http:apache2> exit
Example:
Increase the file descriptor limit for lpsched
# svccfg
svc:> select application/print/server
svc:/...> listprop
svc:/...> setprop lpsched/fd_limit = 8192
svc:/...> quit
# svcadm refresh print/server:default
# svcadm restart print/server:default
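The same change can probably be scripted without the interactive prompt by using svccfg's -s option to select the service on the command line. In the sketch below, the run helper, the set_fd_limit function and the DRY_RUN variable are conventions of this example, not part of SMF. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Non-interactive version of the lpsched example above.
# DRY_RUN and run() are conventions of this sketch, not SMF features.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Show what would be executed without touching the repository.
        echo "+ $*"
    else
        "$@"
    fi
}

set_fd_limit() {
    limit=$1
    run svccfg -s application/print/server setprop lpsched/fd_limit = "$limit"
    # Refresh so the running snapshot picks up the change, then restart.
    run svcadm refresh print/server:default
    run svcadm restart print/server:default
}

set_fd_limit 8192
```

Set DRY_RUN=0 on a Solaris system, as root or with the appropriate solaris.smf authorizations, to apply the change.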
svcs - example
Checking dependencies (NFS server)
# svcs -d svc:/network/nfs/server:default
STATE STIME FMRI
online Nov_15 svc:/network/loopback:default
online Nov_15 svc:/network/physical:default
online Nov_15 svc:/network/rpc/bind:default
online Nov_15 svc:/network/rpc/keyserv:default
online Nov_15 svc:/system/filesystem/local:default
online Nov_15 svc:/network/rpc/gss:ticotsord
online Nov_15 svc:/network/nfs/mapid:default
online 10:32:25 svc:/network/nfs/nlockmgr:default
inetadm - examples
Changing properties of an inetd service (in.ftpd)
# inetadm -l svc:/network/ftp:default
SCOPE NAME=VALUE
name="ftp"
endpoint_type="stream"
proto="tcp6"
isrpc=FALSE
wait=FALSE
exec="/usr/sbin/in.ftpd -a"
user="root"
.
.
.
# inetadm -m svc:/network/ftp:default exec="/usr/sbin/in.ftpd -a -l"
Offline The service instance is not running although the configuration data has been read.
This is usually the result of a dependency that has not been met, or if there is an
error in the start method.
Online The service instance is running. All of its dependencies have been met.
Disabled A service instance is not running. This may be the default state for a service
when it is first imported into the repository, or an administrator may have
marked the service as disabled. It will require operator assistance to move out
of this state.
Degraded A limited set of failures (usually dependencies) may cause the service instance
to function in a limited capacity, e.g. if an inetd service can work with IPv4 and
IPv6 but the latter is not configured, then only IPv4 will be in use and thus the
service may be in a degraded state.
Maintenance The service instance is unavailable due to an error. There are many
reasons why, ranging from unsatisfied dependencies and failed start methods,
to more complicated reasons.
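The states above suggest a first troubleshooting step for each case. The sketch below maps a state name to the command used for that situation elsewhere in this workshop. The next_step function and its wording are assumptions of this example, not an official procedure.

```shell
#!/bin/sh
# Map an SMF state name to a first troubleshooting step.
# The suggestions summarise the labs in this workshop; they are not an official procedure.
next_step() {
    case "$1" in
        online)      echo "nothing to do" ;;
        offline)     echo "check dependencies: svcs -d <fmri>" ;;
        disabled)    echo "enable it: svcadm enable <fmri>" ;;
        degraded)    echo "explain the state: svcs -xv <fmri>" ;;
        maintenance) echo "explain the state: svcs -xv <fmri>; fix the cause, then svcadm clear <fmri>" ;;
        *)           echo "unknown state: $1" ;;
    esac
}

for state in online offline disabled degraded maintenance; do
    printf '%-12s %s\n' "$state" "$(next_step "$state")"
done
```

On a live system the state could be taken from svcs -H -o state <fmri> and passed to the function.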
Repository
• The repository is THE source for all known services
on the system
[Diagram: dependencies of svc:/network/nfs/server]
• Dependencies
> svc:/system/filesystem/local
> svc:/network/rpc/gss
> svc:/network/rpc/keyserv
> svc:/network/rpc/bind
• Dependent
> A mechanism for a service to declare itself as a dependency of another service
• Colour key: implicit relationship, explicit relationship
> restart_on
– none : required only for startup
– error : restart if dependency fails due to hw or sw error
– restart : restart if the dependency restarts for any reason
– refresh : restart if dependency restarts or is refreshed
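One way to read the restart_on values is as increasing levels, where each value also covers the events of the ones before it. The sketch below is a simplified model of that reading, not the logic svc.startd actually runs. The level and should_restart function names are invented for this example.

```shell
#!/bin/sh
# Simplified model of restart_on: each level also covers the levels before it.
# Usage: should_restart <restart_on> <event>, where event is error, restart or refresh.
# Returns 0 (true) when the service holding the dependency would be restarted.
level() {
    case "$1" in
        none)    echo 0 ;;
        error)   echo 1 ;;
        restart) echo 2 ;;
        refresh) echo 3 ;;
        *)       echo -1 ;;
    esac
}

should_restart() {
    policy=$(level "$1")
    event=$(level "$2")
    # restart_on=none and unknown values never trigger a restart.
    [ "$policy" -gt 0 ] && [ "$event" -gt 0 ] && [ "$policy" -ge "$event" ]
}

for policy in none error restart refresh; do
    for event in error restart refresh; do
        if should_restart "$policy" "$event"; then answer=yes; else answer=no; fi
        echo "restart_on=$policy event=$event restart=$answer"
    done
done
```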
Manifest: dependents
• A service manifest can specify which services are
dependent upon it.
• It is forbidden to modify the manifest of other
services
• For predictable startup behaviour, the service
manifest should specify one of the major milestones
as a dependent.
<dependent
name='nfs-client_multi-user'
grouping='optional_all'
restart_on='none'>
<service_fmri value='svc:/milestone/multi-user' />
</dependent>
Development: utmpd(1M) example
<service name='system/utmp' type='service' version='1'>
<create_default_instance enabled='true' />
<single_instance />
<dependency name='milestone' grouping='require_all'
restart_on='none' type='service'>
<service_fmri value='svc:/milestone/sysconfig'/>
</dependency>
<dependent name='utmpd_multi-user' grouping='optional_all'
restart_on='none'>
<service_fmri value='svc:/milestone/multi-user'/>
</dependent>
# nslookup webcache
;; connection timed out; no servers could be reached
# svcs svc:/network/dns/client:default
offline 17:34:25 svc:/network/dns/client:default
Lab – Unable to resolve webcache
# tail /var/svc/log/network-smtp:sendmail.log
[ Nov 22 16:17:53 executing start method ("/lib/svc/method/smtp-sendmail start") ]
[ Nov 22 16:18:03 Method or service exit timed out. Killing contract 122 ]
[ Nov 22 16:18:03 Method "start" failed due to signal Killed ]
Lab – sendmail won't start
Knowing the start method timed out we can examine the repository for this service:
Having extracted this value we can compare it with the value in the manifest:
# cd /var/svc/manifest/network
# more smtp-sendmail.xml
<?xml version="1.0"?>
...
<exec_method
type='method'
name='start'
exec='/lib/svc/method/smtp-sendmail start'
timeout_seconds='120' />
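The comparison can be scripted. The sketch below reads the repository value with svcprop and pulls the start method's timeout_seconds out of a manifest with awk. The manifest_start_timeout function is an invention of this example; its parsing is line based and assumes exec_method elements laid out as in the excerpt above, so it is not a general XML parser. The svcprop call is skipped when the command is not available.

```shell
#!/bin/sh
# Print timeout_seconds of the start method found in a manifest.
# Reads the files given as arguments, or standard input when there are none.
manifest_start_timeout() {
    awk '
        /<exec_method/ { block = ""; inblock = 1 }
        inblock        { block = block " " $0 }
        inblock && /\/>/ {
            inblock = 0
            # "." stands in for the quote character around attribute values.
            if (block ~ /name=.start./ && match(block, /timeout_seconds=.[0-9]+/)) {
                value = substr(block, RSTART, RLENGTH)
                sub(/^timeout_seconds=./, "", value)
                print value
            }
        }
    ' "$@"
}

fmri=svc:/network/smtp:sendmail
manifest=/var/svc/manifest/network/smtp-sendmail.xml

# Only query the repository on a system that actually has SMF.
if command -v svcprop >/dev/null 2>&1; then
    echo "repository: $(svcprop -p start/timeout_seconds "$fmri")"
    echo "manifest:   $(manifest_start_timeout "$manifest")"
fi
```

If the two numbers differ, the repository value has been changed since the manifest was imported.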
Lab – sendmail won't start
# svccfg
svc:> select svc:/network/smtp:sendmail
svc:/network/smtp:sendmail> setprop start/timeout_seconds = 120
svc:/network/smtp:sendmail> quit
# svcadm refresh svc:/network/smtp:sendmail
# svcadm clear svc:/network/smtp:sendmail
# svcs svc:/network/smtp:sendmail
STATE STIME FMRI
online 17:03:48 svc:/network/smtp:sendmail
Lab – sendmail won't start
# svccfg
svc:> select svc:/network/smtp:sendmail
svc:/network/smtp:sendmail> selectsnap initial
[initial]svc:/network/smtp:sendmail> revert
svc:/network/smtp:sendmail> quit
# svcadm refresh svc:/network/smtp:sendmail
# svcadm clear svc:/network/smtp:sendmail
# svcs svc:/network/smtp:sendmail
STATE STIME FMRI
online 17:03:48 svc:/network/smtp:sendmail
Lab – keyserv won't start
Run the test3.breakme script
svc.startd[7]: svc:/network/nis/client:default:
Method "/lib/svc/method/yp" failed with exit status 96.
svc.startd[7]: svc:/network/rpc/keyserv:default:
Method "/usr/sbin/keyserv" failed with exit status 96.
# svcs svc:/network/nis/client:default
STATE STIME FMRI
maintenance 10:52:40 svc:/network/nis/client:default
# svcs svc:/network/rpc/keyserv:default
STATE STIME FMRI
maintenance 10:52:40 svc:/network/rpc/keyserv:default
Lab – keyserv won't start
# svcs -xv network/rpc/keyserv:default
svc:/network/rpc/keyserv:default (RPC encryption key storage)
State: maintenance since Wed Feb 09 17:20:46 2005
Reason: Start method exited with SMF_EXIT_ERR_CONFIG.
See: http://sun.com/msg/SMF-8000-KS
See: keyserv(1M)
See: /var/svc/log/network-rpc-keyserv:default.log
Impact: This service is not running.
# tail network-nis-client:default.log
[ Feb 9 17:17:19 Stopping because service disabled. ]
[ Feb 9 17:17:20 Executing stop method (:kill) ]
[ Feb 9 17:20:45 Executing start method ("/lib/svc/method/yp") ]
/lib/svc/method/yp: domainname not set
[ Feb 9 17:20:46 Method "start" exited with status 96 ]
# ls -l /etc/defaultdomain
/etc/defaultdomain: No such file or directory
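The log and the missing file point at an unset NIS domain name. The slides do not show the repair itself, so the sketch below only checks the precondition: that the domain file exists and is not empty. The check_defaultdomain function is an invention of this example.

```shell
#!/bin/sh
# Check that the NIS domain file exists and is not empty.
# Usage: check_defaultdomain [file]; the path defaults to /etc/defaultdomain.
check_defaultdomain() {
    file=${1:-/etc/defaultdomain}
    if [ ! -s "$file" ]; then
        echo "$file is missing or empty" >&2
        return 1
    fi
    echo "NIS domain: $(head -n 1 "$file")"
}

check_defaultdomain || echo "set the domain name, then clear the services in maintenance"
```

Once the file holds the correct domain name, running svcadm clear on network/nis/client and network/rpc/keyserv should let both services retry their start methods.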
Lab – keyserv won't start
# svcs svc:/network/ndd-nettune:default
STATE STIME FMRI
maintenance 16:02:26 svc:/network/ndd-nettune:default
Lab – customer ndd script
# svcs -xv network/ndd-nettune:default
svc:/network/ndd-nettune:default (ndd network tuning)
State: maintenance since Thu Feb 10 12:25:21 2005
Reason: Restarting too quickly.
See: http://sun.com/msg/SMF-8000-L5
See: man -M /usr/share/man -s 1M ndd
See: /var/svc/log/network-ndd-nettune:default.log
Impact: This service is not running.
# tail /var/svc/log/network-ndd-nettune:default.log
[ Feb 11 15:12:14 Executing start method ("/lib/svc/method/ndd-nettune") ]
/sbin/sh: /lib/svc/method/ndd-nettune: cannot execute
[ Feb 11 15:12:14 Stopping because all processes in service exited. ]
[ Feb 11 15:12:14 Executing start method ("/lib/svc/method/ndd-nettune") ]
/sbin/sh: /lib/svc/method/ndd-nettune: cannot execute
[ Feb 11 15:12:14 Stopping because all processes in service exited. ]
[ Feb 11 15:12:14 Executing start method ("/lib/svc/method/ndd-nettune") ]
/sbin/sh: /lib/svc/method/ndd-nettune: cannot execute
[ Feb 11 15:12:14 Stopping because all processes in service exited. ]
[ Feb 11 15:12:14 Restarting too quickly, changing state to maintenance ]
Lab – customer ndd script
# ls -l /lib/svc/method/ndd-nettune
-rw-r--r-- 1 root root 477 Feb 11 15:12 /lib/svc/method/ndd-nettune
# more /var/svc/log/network-ndd-nettune:default.log
[ Feb 11 15:19:48 Leaving maintenance because clear requested. ]
[ Feb 11 15:19:49 Enabled. ]
[ Feb 11 15:19:49 Executing start method ("/lib/svc/method/ndd-nettune") ]
[ Feb 11 15:19:49 Stopping because all processes in service exited. ]
[ Feb 11 15:19:49 Executing start method ("/lib/svc/method/ndd-nettune") ]
[ Feb 11 15:19:49 Stopping because all processes in service exited. ]
[ Feb 11 15:19:49 Executing start method ("/lib/svc/method/ndd-nettune") ]
[ Feb 11 15:19:49 Stopping because all processes in service exited. ]
[ Feb 11 15:19:51 Restarting too quickly, changing state to maintenance ]
Lab – customer ndd script
# cd /lib/svc/method
# file ndd-nettune
ndd-nettune: executable /sbin/sh script
# more /lib/svc/method/ndd-nettune
#!/sbin/sh
#
# ident "@(#)ndd-nettune.xml 1.0 04/09/21 SMI"
. /lib/svc/share/smf_include.sh
. /lib/svc/share/net_include.sh
# Make sure that the libraries essential to this stage of booting can be found.
LD_LIBRARY_PATH=/lib; export LD_LIBRARY_PATH
echo "ndd ran" >> /tmp/smf.out
/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 16384
/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 16384
# Reset the library path now that we are past the critical stage
unset LD_LIBRARY_PATH
Lab – customer ndd script
A transient script should run once then exit and never be restarted.
In the log file you can see that the start method is being run repeatedly.
This suggests the service has been configured incorrectly, so you should
look at the repository:
# svccfg
svc:> select network/ndd-nettune
svc:/network/ndd-nettune> listprop startd/*
startd/duration astring child
Here you see this service has been mis-configured as a child service, so when
the process died svc.startd attempted to restart it. We know it should be
transient so:
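The slide with the actual fix is not reproduced here. One plausible way to make the change is sketched below, assuming the property is set at the service level as shown by listprop. The run helper, the make_transient function and the DRY_RUN variable are conventions of this example, not part of SMF. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Mark a service as transient, then refresh and clear its default instance.
# DRY_RUN and run() are conventions of this sketch, not SMF features.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Show what would be executed without touching the repository.
        echo "+ $*"
    else
        "$@"
    fi
}

make_transient() {
    service=$1
    run svccfg -s "$service" setprop startd/duration = astring: transient
    # Refresh so the running snapshot sees the new value, then leave maintenance.
    run svcadm refresh "$service:default"
    run svcadm clear "$service:default"
}

make_transient network/ndd-nettune
```

After applying it with DRY_RUN=0, svcs should report the instance as online even though no process remains, which is the expected behaviour for a transient service.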
# xmllint mysvc.xml
mysvc.xml:71: parser error : Opening and ending tag mismatch:
exec_method line 36 and service </service>
^
mysvc.xml:73: parser error : expected '>' </service_bundle>
^
mysvc.xml:74: parser error : Premature end of data in tag service_bundle
line 10
Write your own service
• little d : A simple daemon that listens on port 13567
> Requires a configuration file: /var/tmp/littled.conf
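A start method for such a daemon might begin like the sketch below. It refuses to start without the configuration file and returns 96, the SMF_EXIT_ERR_CONFIG status seen in the keyserv lab. The LITTLED_BIN path and the littled_start function are hypothetical, and how littled reads its configuration is not specified in the slides.

```shell
#!/bin/sh
# Sketch of a start method for littled.
# LITTLED_BIN is a hypothetical install path; LITTLED_CONF is the file named on the slide.
LITTLED_CONF=${LITTLED_CONF:-/var/tmp/littled.conf}
LITTLED_BIN=${LITTLED_BIN:-/usr/local/sbin/littled}

# Exit codes normally supplied by /lib/svc/share/smf_include.sh.
SMF_EXIT_OK=0
SMF_EXIT_ERR_CONFIG=96

littled_start() {
    if [ ! -f "$LITTLED_CONF" ]; then
        # A missing configuration file is reported as a configuration error.
        echo "littled: $LITTLED_CONF not found" >&2
        return $SMF_EXIT_ERR_CONFIG
    fi
    # Start the daemon in the background so the method itself can return.
    "$LITTLED_BIN" &
    return $SMF_EXIT_OK
}

case "$1" in
    start) littled_start; exit $? ;;
esac
```

Returning SMF_EXIT_ERR_CONFIG places the instance in maintenance straight away, so svcs -xv reports a configuration problem rather than a restart loop.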
http://blogs.sun.com/ganesh
Additional slides for lab
Lab - Unable to print
Run the test5.breakme script
# lp -dlpdummy /etc/hosts
UX:lp: ERROR: Can't establish contact with the LP print service.
TO FIX: Either the LP print service has stopped,
or all message channels are busy. If the
problem continues, get help from your
system administrator.
# svcs -p svc:/application/print/server:default
STATE STIME FMRI
disabled 11:59:47 svc:/application/print/server:default
Lab - Unable to print
# svcs -xv svc:/application/print/server:default
svc:/application/print/server:default (LP print server)
State: disabled since Fri Feb 11 16:10:37 2005
Reason: Temporarily disabled by an administrator.
See: http://sun.com/msg/SMF-8000-1S
See: man -M /usr/share/man -s 1M lpsched
See: /var/svc/log/application-print-server:default.log
Impact: 1 dependent service is not running:
svc:/application/print/ipp-listener:default
Lab - Unable to print
# man -s 1M lpsched
# file /usr/lib/lpsched
/usr/lib/lpsched: executable /bin/ksh script
# more /usr/lib/lpsched
#!/bin/ksh
#
# ident "@(#)lpsched 1.2 04/11/01 SMI"
#
# Copyright 2004 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
[ -f /usr/lib/lp/local/lpsched ] || exit 1
.
.
.
# Check to see if lpsched is already running
state=`svcprop -p restarter/state svc:/application/print/server:default`
Lab - Unable to print
If you dig deeper and spend some time looking at /usr/lib/lpsched you may
notice the following lines in the script:
if [ "$OPTS" = "" ] ; then
/usr/sbin/svcadm enable -t svc:/application/print/server:default
if [ $? = 0 ] ; then
/bin/gettext "Print services started.\n"
exit 0
else
exit 1
fi
Lab - Unable to print
Something is clearly wrong with the start method as it is causing a loop.
# cd /var/svc/manifest/application/print
# more server.xml
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
...
<exec_method
type='method'
name='start'
exec='/lib/svc/method/print-svc start'
timeout_seconds='60' />
<exec_method
type='method'
name='stop'
exec='/lib/svc/method/print-svc stop'
timeout_seconds='60' />
Lab - Unable to print
1) Using setprop
# svccfg
svc:> select application/print/server
svc:.../server> setprop start/exec = "/lib/svc/method/print-svc start"
svc:.../server> setprop stop/exec = "/lib/svc/method/print-svc stop"
svc:.../server> quit
# svcadm refresh print/server:default
# svcadm -v enable print/server
svc:/application/print/server:default enabled.
# svcs print/server
STATE STIME FMRI
online 17:29:45 svc:/application/print/server:default
Lab - Unable to print