Lecture 1
(ECC 4209)
(Introduction to Computer System
Administration)
Contents
1. What is system administration?
2. What do sysadmins do?
3. Principles and First Steps
4. Organizations and Certifications
5. Maturity and Complexity
6. Ethics
What is a system?
System: An organized collection of computers
interacting with a group of users.
[Figure: Servers and PCs joined by a Network; Services run on them and help Users to accomplish work.]
System State
• System policy
– specification of a system’s configuration and its
acceptable usage
• System state S(t)
– the current configuration (files, kernel, memory or CPU
usage) of a system
• Ideal states S*(t)
– states of the system that match the system policy. Over
time, the system state shifts away from the ideal state
• System administration
– modifying the system to bring it closer to S*(t)
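The gap between S(t) and S*(t) can be made concrete with a small comparison. The sketch below is only an illustration: it assumes a system state can be flattened into key/value settings, and the function name `drift` and all sample keys and values are hypothetical.

```python
def drift(current, ideal):
    """Return the settings where the current state S(t) differs from the ideal S*(t).

    Each result maps a key to a (current_value, ideal_value) pair; a key that is
    missing on one side shows up as None on that side.
    """
    changes = {}
    for key in sorted(set(current) | set(ideal)):
        have, want = current.get(key), ideal.get(key)
        if have != want:
            changes[key] = (have, want)
    return changes


# Hypothetical sample values, not taken from any real host.
ideal_state = {"sshd.PermitRootLogin": "no", "ntp.enabled": True, "pkg.openssl": "3.0.13"}
current_state = {"sshd.PermitRootLogin": "yes", "ntp.enabled": True, "pkg.telnetd": "0.17"}

for key, (have, want) in drift(current_state, ideal_state).items():
    print(f"{key}: have {have!r}, want {want!r}")
```

In this view, system administration is the work of driving the result of `drift` back toward an empty dictionary.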
What do sysadmins do?
• Small org: sysadmin can be entire IT staff
– Phone support
– Order and install software and hardware
– Fix anything that breaks from phones to servers
– Develop software
• Large org: sysadmin is one of many IT staff
– Specialists instead of “jack of all trades”
– Database admin, Network admin, Fileserver admin,
Help desk worker, Programmers, Logistics
Common Activities
1. Add and remove users.
2. Add and remove hardware.
3. Perform backups.
4. Install new software systems.
5. Troubleshooting.
6. System monitoring.
7. Auditing security.
8. Help users.
9. Communicate.
User Management
• Creating user accounts
– Consistency requires automation
– Startup (dot) files
• Namespace management
– Usernames and UIDs
– Multiple namespaces or Single Sign-On (SSO)?
• Removing user accounts
– Consistency requires automation
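One way to see why consistency requires automation is to let code, not a person, hand out usernames and UIDs. This is a minimal in-memory sketch, not a real account tool: the `Namespace` class, the starting UID of 1000, and the rule that UIDs of removed accounts are never reused are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    username: str
    uid: int


class Namespace:
    """A single username/UID namespace (illustrative only)."""

    def __init__(self, first_uid=1000):
        self._by_name = {}
        self._used_uids = set()
        self._next_uid = first_uid

    def add(self, username):
        if username in self._by_name:
            raise ValueError(f"username already taken: {username}")
        # Skip every UID that was ever handed out, including removed accounts.
        while self._next_uid in self._used_uids:
            self._next_uid += 1
        account = Account(username, self._next_uid)
        self._by_name[username] = account
        self._used_uids.add(account.uid)
        return account

    def remove(self, username):
        # The UID stays reserved so old files never end up owned by a new user.
        return self._by_name.pop(username)


ns = Namespace()
print(ns.add("alice"))
print(ns.add("bob"))
```

Because every account goes through the same `add` and `remove` paths, duplicate usernames and colliding UIDs are rejected in one place instead of being caught later by accident.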
Backups
• Backup strategy and policies
– Scheduling: when and how often?
– Capacity planning
– Location: on-site vs. off-site
– Strategy 2 + 1
• Monitoring backups
– Checking logs
– Verifying media
• Performing restores when requested
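Scheduling and capacity planning interact: how often you back up determines how much storage you need. The sketch below works through the arithmetic under assumptions that are not from the slides: one full backup per week, a fixed number of incrementals per week, and each incremental sized as a constant fraction of the full backup.

```python
def backup_storage_gb(full_gb, daily_change_rate, incrementals_per_week, weeks_retained):
    """Estimate storage for weekly fulls plus incrementals over a retention window.

    daily_change_rate is the assumed fraction of data that changes per day,
    so each incremental is roughly full_gb * daily_change_rate.
    """
    incremental_gb = full_gb * daily_change_rate
    per_week_gb = full_gb + incrementals_per_week * incremental_gb
    return per_week_gb * weeks_retained


# Hypothetical site: 500 GB of data, 2% daily change, 6 incrementals a week, 4 weeks kept.
estimate = backup_storage_gb(500, 0.02, 6, 4)
print(f"Estimated backup storage: {estimate:.0f} GB")
```

For these sample numbers each week needs 500 + 6 × 10 = 560 GB, so four weeks of retention needs about 2,240 GB, before compression or deduplication.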
Hardware Management
• Adding and removing hardware
– Configuration, cabling, etc.
• Purchase
– Evaluate and purchase servers + other hardware
• Capacity planning
– How many servers? How much bandwidth, storage?
• Data Center management
– Power, racks, environment (cooling, fire alarm)
• Virtualization
– When can virtual servers be used vs. physical?
Software Installation
• Automated consistent OS installs
– Desktop vs. server OS image needs.
• Installation of software
– Purchase, find, or build custom software.
• Managing software installations
– Distributing software to multiple hosts.
– Managing multiple versions of a software pkg.
• Patching and updating software
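When one package exists in several versions across many hosts, finding the hosts that still need a patch is a natural task to script. This sketch assumes versions are purely numeric and dot-separated; the host names, the inventory dictionary, and the function names are hypothetical.

```python
def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def outdated_hosts(installed, required):
    """Return the hosts whose installed version is older than the required one."""
    minimum = parse_version(required)
    return sorted(
        host for host, version in installed.items()
        if parse_version(version) < minimum
    )


# Hypothetical inventory of one package across three hosts.
inventory = {"web1": "1.9.2", "web2": "1.10.0", "db1": "1.10.1"}
print(outdated_hosts(inventory, "1.10.0"))
```

Comparing the raw strings would get this wrong, because "1.9.2" sorts after "1.10.0" as text; comparing tuples of integers avoids that.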
Troubleshooting
• Problem identification
– By user notification
– By log files or monitoring programs
• Tracking and visibility
– Ensure users know you’re working on the problem
– Provide an ETA if possible
• Finding the root cause of problems
– Provide temporary solution if necessary
– Solve the root problem to eliminate the issue permanently
System Monitoring
• Automatically monitor systems for
– Problems (disk full, error logs, security)
– Performance (CPU, mem, disk, network)
• Provides data for capacity planning
– Determine need for resources
– Establish case to bring to management
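A "disk full" check is one of the simplest monitors to automate. The sketch below uses Python's standard `shutil.disk_usage`; the 90% threshold, the function names, and the message format are assumptions chosen for illustration, not values from the lecture.

```python
import shutil


def usage_alert(name, used, total, threshold=0.90):
    """Return a warning string when used/total reaches the threshold, else None."""
    if total <= 0:
        return None  # Nothing sensible to report for an empty or pseudo filesystem.
    fraction = used / total
    if fraction >= threshold:
        return f"WARNING: {name} is {fraction:.0%} full"
    return None


def check_disk(path, threshold=0.90):
    """Measure the filesystem that holds `path` and apply usage_alert to it."""
    usage = shutil.disk_usage(path)
    return usage_alert(path, usage.used, usage.total, threshold)


print(check_disk("/") or "disk usage OK")
```

Run on a schedule and logged over time, the same measurements double as the data for capacity planning.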
Helping Users
• Request tracking/ticketing system
– Ensures that you don’t forget problems.
– Ensures users know you’re working on their
problem; reduces interruptions, status queries.
– Lets management know what you’ve done.
• User documentation and training
– Policies and procedures
• Schedule and communicate downtimes
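The core of a request tracking system is small: a record per request plus a history of status changes. The sketch below is a toy data model, not any real ticketing product; the `Ticket` fields and the status names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ticket:
    ticket_id: int
    requester: str
    summary: str
    status: str = "open"
    history: list = field(default_factory=list)

    def update(self, status, note):
        """Record a status change so users and managers can see what was done."""
        self.status = status
        self.history.append((datetime.now(timezone.utc), status, note))


def unresolved(tickets):
    """Return the tickets that still need attention, so nothing is forgotten."""
    return [t for t in tickets if t.status != "closed"]


tickets = [
    Ticket(1, "alice", "Cannot print"),
    Ticket(2, "bob", "Needs more disk quota"),
]
tickets[0].update("in progress", "Restarted print spooler; ETA 1 hour")
tickets[1].update("closed", "Quota raised")
print([t.ticket_id for t in unresolved(tickets)])
```

The `history` list is what lets the requester check status without interrupting you, and what lets management see the work that was done.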
Communicate
• Customers
– Keep customers apprised of progress
• When you’ve started working on a request, with an ETA
• When you make progress or need feedback
• When you’re finished.
– Communicate system status
• Uptime, scheduled downtimes, failures
– Meet regularly with customer managers
• Managers
– Meet regularly with your manager
– Write weekly status reports
Specialized Skills
• Heterogeneous Environments
– Integrating multiple OSes, hardware types, network
protocols, and distributed sites
• Databases
– SQL RDBMS
• Networking
– Complex routing, high speed networks, voice
• Security
– Firewalls, authentication, NIDS, cryptography
• Storage
– NAS, SANs, cloud storage
• Virtualization and Cloud Computing
– VMware, cloud architectures
Qualities of a Successful Sysadmin
• Customer oriented
– Ability to deal with interrupts, time pressure
– Communication skills
– Service provider, not system police
• Technical knowledge
– Hardware, network, and software knowledge
– Debugging and troubleshooting skills
• Time management
– Automate everything possible.
– Ability to prioritize tasks: urgency and importance.
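Prioritizing by urgency and importance can be expressed as a sort. The ordering used below (urgent and important first, then important only, then urgent only, then neither) is one common convention and an assumption here, as are the task names.

```python
def priority(task):
    """Score a task: importance counts double, urgency counts once."""
    return 2 * int(task["important"]) + int(task["urgent"])


def prioritize(tasks):
    """Return tasks from highest to lowest priority; ties keep their original order."""
    return sorted(tasks, key=priority, reverse=True)


# Hypothetical task list.
todo = [
    {"name": "Rename printer queue", "urgent": False, "important": False},
    {"name": "Mail server down", "urgent": True, "important": True},
    {"name": "Test backup restores", "urgent": False, "important": True},
    {"name": "Answer vendor call", "urgent": True, "important": False},
]

for task in prioritize(todo):
    print(task["name"])
```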
First Steps to Better SA
• Use a request system.
– Customers know what you’re doing.
– You know what you’re doing.
• Manage quick requests right
– Handle emergencies quickly.
– Use request system to avoid interruptions.
• Policies
– How do people get help?
– What is the scope of responsibility for SA team?
– What is our definition of emergency?
• Start every host in a known state.
Principles of SA
• Simplicity
– Choose the simplest solution that solves the entire problem.
– Work towards a predictable system
• Clarity
– Choose a straightforward solution that’s easy to change, maintain,
debug, and explain to other SAs
• Generality
– Choose reusable solutions that scale up; use open protocols.
• Automation
– Use software to replace human effort
• Communication
– Be sure that you’re solving the right problems and that people
know what you’re doing
• Basics First
– Solve basic infrastructure problems before advanced ones
Relevant Organizations
• USENIX: Advanced Computing Systems
Association
• LISA: Large Installation System Administration
• SAGE: System Administrators Guild
• LOPSA: League of Professional System
Administrators
Types of Sites
• Small
– 2-10 computers, 1 OS, 2-20 users
– Small staff size requires outsourcing to obtain most
specialized skills
• Midsized
– 11-100 computers, 1-3 OSes, 21-100 users
• Large
– 100+ computers, multiple OSes, 100+ users
– Outsources to reduce costs and for some specializations
Certifications
• CCNA, CCNP, CCIE (Cisco)
• cSAGE (SAGE)
• MCSA (Microsoft)
• RHCE (Red Hat)
• SCSA (Sun)
• VCP (VMware)
SAGE Job Descriptions
• Novice
– OS familiarity, help desk skills
• Junior
– Can use OS system administration tools
• Intermediate
– Understanding of distributed computing, common
servers, automate small tasks, independent action
• Senior
– Understanding of scaling issues, including capacity
planning, solve problems by addressing root cause,
higher level programming abilities, write proposals for
purchasing, data center planning, etc.
SA Maturity Model (SAMM)
1. Ad Hoc
– Ad-hoc non-repeatable solutions, firefighting
2. Repeatable
– Some repeatable processes
3. Defined
– Documented standard processes
4. Managed
– Process effectiveness measured, adapted
5. Optimized
Maturity and Complexity