Troubleshooting
Regardless of how complex your particular computer or peripheral device might be, a
dependable troubleshooting procedure can be broken down into four basic steps: define your
symptoms, identify and isolate the potential source (or location) of your problem,
replace the suspected sub-assembly, and re-test the unit thoroughly to be sure that you
have solved the problem. If you have not solved the problem, start again from Step #1.
This is a “universal” procedure that you can apply to any sort of troubleshooting—not just for
personal computer equipment.
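The four-step procedure above is essentially a loop. The toy sketch below models it in Python, with a "unit" represented as a dictionary of sub-assemblies marked good or faulty; all of the names and the pass/fail model are illustrative, not a real diagnostic API.

```python
# Sketch of the universal four-step troubleshooting loop, using a toy
# model of a unit: a dict mapping sub-assemblies to True (good) or
# False (faulty). All names here are illustrative, not a real API.

def define_symptoms(unit):
    # Step 1: define the symptoms (here, simply which parts test bad).
    return [part for part, ok in unit.items() if not ok]

def troubleshoot(unit, spares):
    replaced = []
    while True:
        symptoms = define_symptoms(unit)
        if not symptoms:                 # Step 4 passed: re-test shows no faults
            return replaced
        suspect = symptoms[0]            # Step 2: isolate one suspected source
        if suspect in replaced:          # the spare was also bad; give up here
            return replaced
        unit[suspect] = spares.get(suspect, False)  # Step 3: swap in a spare
        replaced.append(suspect)
        # If the problem persists, the loop starts again from Step 1.

pc = {"power supply": True, "RAM": False, "video adapter": False}
fixed = troubleshoot(pc, {"RAM": True, "video adapter": True})
```

Note that the re-test (Step 4) is what closes the loop: a repair is only finished when the unit tests clean.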
Changing Parts
Once a problem is isolated, technicians face another problem: the availability of spare parts.
Novice technicians often ask what kinds and quantity of spare parts they should keep on
hand. The best answer to give here is simply: none at all. The reason for this somewhat
drastic answer is best explained by the two realities of PC service.
POST
The BIOS then checks the memory location at 0000:0472h. This address contains a flag that
determines whether the initialization is a cold start (power first applied) or a warm start
(reset button or <Ctrl>+<Alt>+<Del> key combination). A value of 1234h at this address
indicates a warm start—in which case, the POST routine is skipped. If any other value is
found at that location, a cold start is assumed, and the full POST routine will be executed.
The full POST checks many of the other higher-level functions on the motherboard, memory,
keyboard, video adapter, floppy drive, math co-processor, printer port, serial port, hard drive,
and other sub-systems. Dozens of tests are performed by the POST.
When an error is encountered, the single-byte POST code is written to I/O port 80h, where it
might be read by a POST-code reader. In other cases, you might see an error message
displayed on the screen.
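The warm/cold start decision described above can be sketched in a few lines. In this toy Python model, memory is a dictionary of address-to-value mappings and the I/O port write is simulated; the flag address (0472h) and value (1234h) come from the text, while everything else is illustrative.

```python
# Sketch of the BIOS warm/cold start decision described above.
# Memory is modeled as a simple dict of address -> 16-bit value.

WARM_START_FLAG_ADDR = 0x0472   # segment:offset 0000:0472h
WARM_START_VALUE = 0x1234

def start_type(memory):
    """Return 'warm' (POST skipped) or 'cold' (full POST runs)."""
    if memory.get(WARM_START_FLAG_ADDR) == WARM_START_VALUE:
        return "warm"
    return "cold"

def post(memory):
    if start_type(memory) == "warm":
        return "POST skipped"
    # On a cold start, the full POST runs; on an error, a single-byte
    # code is written to I/O port 80h (simulated here as a dict).
    io_ports = {}
    io_ports[0x80] = 0x00   # illustrative code only, not a real BIOS code
    return "full POST executed"
```

Any value other than 1234h at the flag address (including nothing at all, as on first power-up) forces the full POST.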
BIOS
When your computer is first turned on, it automatically loads a program called the BIOS, or
Basic Input/Output System, which is stored on a special chip on your computer’s
motherboard. The BIOS is essentially a combination of software and hardware in that it
consists of software, but the contents of that software is stored in a hardware chip.
One of the first things you should see on your computer’s monitor when you start the PC is
some type of message like "Hit Esc to enter Setup," although instead of Esc it may say F2 or
F10 or any number of other keys, and instead of Setup it may say CMOS Setup or BIOS Setup
or just CMOS. Make a note of the key required to enter the Setup program because you may
need it later (some start-up problems can only be solved by changing some BIOS/CMOS
settings via the Setup program).
Diagnostic Programs
Software and hardware complement one another. A diagnostic program is used to detect both
hardware and software problems: the computer's memory can be diagnosed to determine its
size, and the processor type can also be checked. To do this well, one has to know how to
distinguish a hardware problem from a software problem.
Benchmarking
We all know that today’s personal computers are capable of astounding performance. If you
doubt that, consider any of the current 3D games, such as Quake II or Monster Truck
Madness. However, it is often important to quantify the performance of a system. Just saying
that a PC is “faster” than another system is simply not enough—we must often apply a
number to that performance to measure the improvements offered by an upgrade, or to
objectively compare the performance of various systems. Benchmarks are used to test and
report the performance of a PC by running a set of well-defined tasks on the system. A
benchmark program has several different uses in the PC industry, depending on what your
needs are:
System Comparisons: Benchmarks are often used to compare a system to one or more
competing machines (or to compare a newer system to older machines). Just flip through any
issue of PC Magazine or Byte, and you’ll see a flurry of PC ads all quoting numerical
performance numbers backed up by benchmarks. You might also run a benchmark to
establish the overall performance of a new system before making a purchase decision.
Upgrade Improvements: Benchmarks are frequently used to gauge the value of an upgrade.
By running the benchmark before and after the upgrade process, you can get a numerical
assessment of just how much that new CPU, RAM, drive, or motherboard might have
improved (or hindered) system performance.
Diagnostics: Benchmarks sometimes have a role in system diagnostics. Systems that are
performing poorly can be benchmarked as key components are checked or
reconfigured. This helps the technician isolate and correct performance problems far
more reliably than simple “visual” observations.
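In miniature, a benchmark is just a well-defined workload that is timed repeatedly and reported as a number. A minimal sketch using Python's standard timeit module (the workload itself is an arbitrary stand-in):

```python
# Minimal illustration of benchmarking: run a well-defined task,
# time it repeatedly, and report a number that can be compared
# across systems or before/after an upgrade.
import timeit

def workload():
    # A small, fixed task (summing 10,000 integers) stands in for a
    # real benchmark's processor/disk/graphics workloads.
    return sum(range(10_000))

# Best-of-5 timing of 100 runs each; the minimum is the usual figure,
# since it is least disturbed by other activity on the machine.
times = timeit.repeat(workload, number=100, repeat=5)
score = min(times)
print(f"workload score: {score:.6f} s per 100 runs")
```

Because the task is fixed, the resulting number is directly comparable between two systems, or between the same system before and after an upgrade.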
Obtaining Benchmarks
Benchmarks have been around since the earliest computers, and there are now vast arrays of
benchmark products to measure all aspects of the PC—as well as measure more specialized
issues, such as networking, real-time systems, and UNIX (or other operating system)
platforms. Table 4-1 highlights a cross-section of computer benchmarks for your reference. In
many cases, the table includes a URL or FTP site where you can obtain source code for the
benchmark, or download the complete benchmark program. Today, Ziff Davis and CMP
publish a suite of freeware benchmark utilities that have become standard tools for end users
and technicians alike.
1. BatteryMark BatteryMark uses a combination of hardware and software to measure
the battery life of notebook computers under real-world conditions (the hardware used
in BatteryMark is the same ZDigit II device required by version 1.0). BatteryMark
exercises different 32-bit software workload engines for processor, disk, and
graphics tasks. BatteryMark mixes these workloads together and adds periodic
breaks in the work that reflect the way users pause while working. BatteryMark 2.0
works with Advanced Power Management (APM) under Windows 95.
2. NetBench NetBench is the Ziff-Davis benchmark test for checking the performance of
network file servers. NetBench provides a way to measure, analyze, and predict how a file
server will handle network file I/O requests. It monitors the response of the server as
multiple clients request data, and reports the server’s total throughput. To test
application servers, you should use the ServerBench utility instead.
3. ServerBench ServerBench is the latest version of Ziff-Davis’ standard benchmark
for measuring the performance of servers in a true client/server environment.
ServerBench clients make requests of an application that runs on the server—the server’s
ability to service those requests is reported in transactions per second. ServerBench
4.0 runs on IBM’s OS/2 Warp Server, Microsoft’s Windows NT Server 4.0 (for both
Digital Alpha and x86-compatible processors), Novell’s NetWare 4.11, Sun’s Solaris
2.5 on SPARC, and SCO’s OpenServer Release 5 and UnixWare 2.1. To test network
file servers, use the NetBench utility instead.
4. WebBench WebBench is the Ziff Davis benchmark test for checking performance of
Web-server hardware and software. Standard test suites produce two overall scores
for the server: requests per second and throughput (as measured in bytes per second).
WebBench includes static testing (which involves only HTML pages), and dynamic
testing (including CGI executables, Internet Server API libraries, and Netscape Server
API dynamic link libraries).
5. JMark JMark is a suite of 11 synthetic benchmark tests for evaluating the
performance of Java virtual machines. The JMark 1.01 suite simulates a number of
important tests of Java functionality. It includes Java versions of a number of classic
benchmark test algorithms, as well as tests designed to measure graphics performance
in a GUI environment. You can download JMark 1.01 from Ziff Davis, or run the
tests online within your browser.
6. Wintune 97 Wintune for Windows 95/NT is a recent benchmark entry from CMP,
the publishers of Windows Magazine. Wintune 97 is an overall benchmark to measure
Windows 95/NT performance. It has a fast user interface that allows the program to
load much faster than the earlier Wintune 95, and will now support testing of the
latest Pentium II systems. Wintune 97 tests video systems on the fastest new
computers at full-screen resolution.
Run Scandisk
The ScanDisk utility is designed to check your drive for file problems (such as lost or cross-
linked clusters), then correct those problems. ScanDisk is also particularly useful for testing
for potential media (surface) errors on a disk. ScanDisk will report any problems and give
you the option of repairing the problems.
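The file problems ScanDisk looks for can be illustrated on a toy FAT-like model: a cross-linked cluster is one claimed by more than one file, and a lost cluster is one marked in-use on the disk but owned by no file. The structures and names below are a simplified sketch, not the real FAT layout.

```python
# Sketch of the file problems ScanDisk looks for, on a toy FAT-like
# model: each file maps to the list of clusters it claims, and
# `allocated` is the set of clusters marked in-use on the disk.

def scan(files, allocated):
    owners = {}
    cross_linked = set()
    for name, clusters in files.items():
        for c in clusters:
            if c in owners:
                cross_linked.add(c)   # claimed by more than one file
            owners.setdefault(c, name)
    lost = allocated - set(owners)    # marked in-use but owned by no file
    return cross_linked, lost

# Cluster 3 is claimed by both files; cluster 7 is allocated but unowned.
files = {"A.TXT": [2, 3], "B.TXT": [3, 4]}
allocated = {2, 3, 4, 7}
xlinked, lost = scan(files, allocated)
```

Repair then amounts to copying a cross-linked cluster so each file gets its own copy, and converting lost clusters to files (or freeing them).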
Run Defrag
Operating systems like DOS and Windows 95 segregate drive space into groups of sectors
called clusters. Clusters are used on an “as found” basis, so it is possible for the clusters that
compose a file to be scattered across a drive. This forces the drive to work harder (and take
longer) to read or write the complete file because a lot of time is wasted moving around the
drive. The Defrag utility allows related file clusters to be relocated together.
Defrag will relocate every file on the disk so that all their clusters are positioned together
(contiguous).
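A toy sketch of that relocation: model the disk as a list of cluster slots, each holding a file name or None for free space, and pack each file's clusters back down contiguously. This ignores real-world details (directory updates, FAT chains) and is purely illustrative.

```python
# Toy sketch of defragmentation: files' clusters are scattered across
# the disk; Defrag rewrites them so each file occupies a contiguous run.

def defrag(disk):
    """disk: list where each slot holds a file name or None (free)."""
    files = {}
    for slot in disk:                  # count each file's clusters, in order seen
        if slot is not None:
            files[slot] = files.get(slot, 0) + 1
    packed = []
    for name, count in files.items():  # lay files back down contiguously
        packed.extend([name] * count)
    packed.extend([None] * (len(disk) - len(packed)))
    return packed

fragmented = ["A", "B", "A", None, "B", "A", None]
contiguous = defrag(fragmented)
```

After the pass, every file occupies one unbroken run of clusters and all free space is gathered at the end, which is exactly why subsequent reads need fewer head movements.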
Memory Failure
Memory is a cornerstone of the modern PC. Memory holds the program code and data
that are processed by the CPU—and it is this intimate relationship between memory and the
CPU that forms the basis of computer performance. As larger and faster CPUs are constantly
introduced, more complex software is developed to take advantage of the added
processing power. In turn, the more complex software demands larger amounts of faster
memory. With the explosive growth of Windows (and more recently, Windows 95), the
demands made on memory performance are more acute than ever. These demands have
resulted in a proliferation of memory types that go far beyond the simple, traditional DRAM.
Cache (SRAM), fast page-mode (FPM) memory, extended data output (EDO) memory, video
memory (VRAM), synchronous DRAM (SDRAM), flash BIOS, and other exotic memory
types (such as RAMBUS) now compete for the attention of PC technicians.
Memory Speed and Wait States
The PC industry is constantly struggling with the balance between price and performance.
Higher prices usually bring higher performance, but low cost makes the PC appealing to
more people. In terms of memory, cost-cutting typically involves using cheaper (slower)
memory devices. Unfortunately, slow memory cannot deliver data to the CPU quickly
enough, so the CPU must be made to wait until memory can catch up. All memory is rated in
terms of speed—specifically, access time. Access time is the delay from the time data in
memory is successfully addressed to the point at which the data has been successfully
delivered to the data bus. For PC memory, access time is measured in nanoseconds (ns);
current memory offers access times of 50 to 60 ns, and 70-ns memory is extremely common.
The question often arises: “Can I use faster memory than the manufacturer
recommends?”
The answer to this question is almost always “Yes,” but rarely does performance benefit. As
you will see in the following sections, memory and architectures are typically tailored for
specific performance. Using memory that is faster should not hurt the memory or impair
system performance, but it costs more and will not produce a noticeable performance
improvement. The only time such a tactic would be advised is when your current system is
almost obsolete, and you want the new memory to be usable on a new, faster
motherboard if you choose to upgrade the motherboard later on.
A wait state orders the CPU to pause for one clock cycle to give memory additional time to
operate. Typical PCs use one wait state, although very old systems might require two or
three. The latest PC designs with high-end memory or aggressive caching might be able to
operate with no (zero) wait states. As you might imagine, a wait state is basically a waste of
time, so more wait states result in lower system performance. Zero wait states allow optimum
system performance.
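The relationship between access time and wait states can be put into a rough formula: if one bus clock cycle is not long enough for the memory to respond, the CPU must insert extra (wait) cycles. The sketch below is a back-of-the-envelope model that ignores real chipset timing details; the function name and the "first cycle is free" simplification are assumptions for illustration.

```python
# Back-of-the-envelope sketch relating memory access time to wait
# states: extra clock cycles are needed whenever one cycle is shorter
# than the memory's access time. Real chipset timing is more involved.
from math import ceil

def wait_states(access_ns, bus_mhz):
    cycle_ns = 1000.0 / bus_mhz            # length of one clock cycle in ns
    cycles_needed = ceil(access_ns / cycle_ns)
    return max(0, cycles_needed - 1)       # the first cycle is "free"

# 70-ns memory on a 33-MHz bus (cycle ~30 ns) needs extra cycles,
# while 60-ns memory on a 16-MHz bus (cycle ~62 ns) can run with none.
fast_bus = wait_states(70, 33)
slow_bus = wait_states(60, 16)
```

This also shows why faster memory on a slow bus buys nothing: once the memory responds within one cycle, zero wait states is already the best case.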
Hard drives draw different amounts of power depending on their operating mode:
- Seek: This is a random-access operation by the disk drive as it tries to locate the required
track for reading or writing. This demands about 8.5 to 9.0 W.
- Read/write: A seek has been completed, and data is being read from or written to the drive.
This uses about 5.0 W.
- Idle: This is a basic power-conservation mode, where the drive is spinning and all other
circuitry is powered on, but the head actuator is parked and powered off. This drops power
demands to about 4 W, yet the drive is capable of responding to read commands within 40ms.
- Standby: The spindle motor is not running (the drive “spins down”). This is the main
power-conservation mode, and it requires just 1 W. It might take up to several seconds for
the drive to leave this mode (or spin up) upon receipt of a command that requires disk
access.
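Given the mode figures above, a drive's average power draw is just a weighted sum over the fraction of time spent in each mode. The duty-cycle numbers in the example are invented for illustration; the per-mode wattages are the approximate values quoted in the list.

```python
# Sketch: average drive power draw as a weighted sum of the mode
# figures quoted above (seek ~8.75 W, read/write ~5 W, idle ~4 W,
# standby ~1 W), weighted by the fraction of time in each mode.

POWER_W = {"seek": 8.75, "read_write": 5.0, "idle": 4.0, "standby": 1.0}

def average_power(time_fractions):
    assert abs(sum(time_fractions.values()) - 1.0) < 1e-9  # fractions must total 1
    return sum(POWER_W[mode] * frac for mode, frac in time_fractions.items())

# A hypothetical drive that seeks 5% of the time, transfers 15%,
# idles 30%, and stands by 50%:
avg = average_power({"seek": 0.05, "read_write": 0.15,
                     "idle": 0.30, "standby": 0.50})
```

The arithmetic makes the value of the standby mode obvious: shifting idle time into standby cuts the average draw sharply, at the cost of spin-up delays.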
Broadly speaking, a hard disk can fail in four ways that will lead to a potential loss of data:
1. Firmware Corruption / Damage to the firmware zone
2. Electronic Failure
3. Mechanical Failure
4. Logical Failure
Combinations of these four types of failure are also possible. Whether the data on the hard
disk is recoverable or not depends on exactly what has happened to the disk and how bad the
damage is. All hard disks also develop bad sectors which can lead to data loss and drive
inaccessibility.
1. Firmware Corruption / Damage to the firmware zone
Hard disk firmware is the information used by the computer to interact correctly with the
hard disk. If the firmware of a hard disk becomes corrupted or unreadable,
the computer is often unable to interact correctly with the disk. Frequently, the data on
the disk is fully recoverable once the drive has been repaired and reprogrammed.
2. Electronic Failure
Electronic failure usually relates to problems on the controller board of the actual hard disk.
The computer may suffer a power spike or electrical surge that knocks out the controller
board on the hard disk making it undetectable to the BIOS. Usually, the data on the hard disk
has not suffered any damage and a 100% data recovery is possible.
3. Mechanical Failure
Usually worse than electronic failure, mechanical failure can quite often (especially if not
acted on early) lead to a partial and sometimes total loss of data. Mechanical failure comes in
a variety of guises such as read / write head failure and motor problems. One of the most
common mechanical failures is a head crash. Varying in severity, a head crash occurs when
the read-write heads of the hard disk come into contact, momentarily or continuously, with
the platters of the hard disk.
Head crashes can be caused by a range of factors, including physical shock, movement of the
computer, static electricity, power surges and mechanical read-write head failure. Mechanical
failure can usually be spotted by a regular clicking or crunching noise. Even if it is not a
head crash, the most important thing to do if you suspect mechanical problems is to switch
off the drive immediately, as further use will make matters worse.
4. Logical Errors
Logical errors can be both the easiest and the most difficult problems to deal with. They
range from simple things, such as an invalid entry in a file allocation table, to truly horrific
problems, such as the corruption and loss of the file system on a severely fragmented drive.
Logical errors are different from the electrical and mechanical problems above, as there is
usually nothing 'physically' wrong with the disk, just the information on it.
Remedial Procedure
Some of the steps involved in the remedy of the Hard drive failure are given below:
1) The first thing to check for is whether or not the hard disk can be seen by the hard disk
controller; usually on a true hard disk failure, the disk will not be detectable by the controller
(but this is not always the case). Assuming you have an IDE hard disk, enter the BIOS setup
program and use the IDE detection facility of the BIOS to see if the disk's parameters can be
detected. If the disk cannot be auto-detected using the BIOS's auto-detect feature, this
immediately implies some sort of hardware problem.
2) If you can see the hard disk when you auto detect, the problem is more likely to be
software than hardware. Remember that you cannot usually boot a brand-new hard disk until
it has been partitioned and formatted.
3) See if the disk will boot up. If it will not boot, then boot from a floppy boot disk and then
use the FDISK command (or other partitioning software) to see if you can see the disk.
4) If the drive will boot up, then you should be getting a more specific error message of some
sort, or a more specific failure mode that you can use for troubleshooting.
5) If the drive is
detected in the BIOS setup but cannot be booted or accessed when booting from a floppy
disk, then there is a good chance that the disk itself may be bad. If possible, try connecting
the hard disk to another system and see if the problem is present there as well.
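The five steps above form a small decision tree, which can be sketched as a function. The three boolean inputs are what a technician establishes at each step; the returned strings are illustrative diagnoses, not exhaustive conclusions.

```python
# The remedial steps above expressed as a simple decision function.
# Inputs are the facts a technician establishes at each step; the
# returned strings are illustrative summaries of the likely cause.

def diagnose(bios_detects, boots, accessible_from_floppy):
    if not bios_detects:
        # Step 1: disk not seen by the controller / BIOS auto-detect.
        return "hardware problem"
    if boots:
        # Step 4: use the specific error message or failure mode.
        return "use the specific error message"
    if accessible_from_floppy:
        # Steps 2-3: visible to FDISK, so likely partitioning/formatting.
        return "software problem: check partitioning and formatting"
    # Step 5: detected but inaccessible even from a boot floppy.
    return "disk itself may be bad: try it in another system"
```

Working down the tree in this order matters: a disk that fails the very first check (BIOS detection) makes the later software checks moot.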
Monitor Failures
If the monitor appears to be totally dead, then make sure that the monitor is plugged in and
turned on, and see if it has power. Check for a power-on indicator on the front of the monitor,
and check to see if the monitor warms up.
Select the appropriate general failure troubleshooting procedure outlined below, based on
whether or not the monitor appears to have power.
Vacuum Cleaners and Keyboards
There is an ongoing debate as to the safety of vacuum cleaners with computer equipment.
The problem is static discharge. Many vacuum cleaners—especially small, inexpensive
models—use cheap plastic and synthetic fabrics in their construction. When a fast air flow
passes over those materials, a static charge is developed (just like combing your hair with a
plastic comb). If the charged vacuum touches the keyboard, a static discharge might have
enough potential to damage the keyboard-controller IC, or even travel back into the
motherboard for more serious damage.
Avoid removing the <Space Bar> unless it is absolutely necessary because the space bar is
often much more difficult to replace than ordinary keys. If you do choose to use a vacuum for
keyboard cleaning, take these steps to prevent damage. First, be sure that the computer is
powered down and disconnect the keyboard from the computer before starting service. If a
static discharge does occur, the most that would be damaged is the keyboard itself. Second,
use a vacuum cleaner that is made for electronics work and certified as “static-safe.” Third,
try working on an anti-static mat, which is properly grounded. This will tend to “bleed-off”
static charges before they can enter the keyboard or PC.
Interrupts
Interrupts are signals sent to the CPU by external devices, normally I/O devices. They tell
the CPU to stop its current activities and execute the appropriate part of the operating system.
There are three types of interrupts:
1. Hardware Interrupts are generated by hardware devices to signal that they need
some attention from the OS. They may have just received some data (e.g., keystrokes
on the keyboard or data on the Ethernet card), or they may have just completed a task
that the operating system previously requested, such as transferring data between the
hard drive and memory.
2. Software Interrupts are generated by programs when they want to request a system
call to be performed by the operating system.
3. Traps are generated by the CPU itself to indicate that some error or condition
occurred for which assistance from the operating system is needed.
Interrupts are important because they give the user better control over the computer. Without
interrupts, a process might have to wait until it gained a higher priority over the CPU before
it was run; with interrupts, the CPU deals with the process immediately.
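The dispatch mechanism behind all three interrupt types can be sketched as a vector table: a mapping from interrupt numbers to handler routines. The interrupt numbers and handler names below are illustrative only (0x09 and 0x21 echo classic PC conventions, but this is a toy model, not real interrupt handling).

```python
# Sketch of interrupt dispatch: a vector table maps interrupt numbers
# to handler routines; the CPU suspends its current work, runs the
# handler, then resumes. All numbers and handlers are illustrative.

handled = []

def keyboard_handler():        # hardware interrupt: a keystroke arrived
    handled.append("keyboard")

def syscall_handler():         # software interrupt: program requests a service
    handled.append("syscall")

def divide_error_handler():    # trap: CPU-detected error condition
    handled.append("divide error")

VECTOR_TABLE = {0x09: keyboard_handler,     # hardware
                0x21: syscall_handler,      # software
                0x00: divide_error_handler} # trap

def interrupt(number):
    VECTOR_TABLE[number]()     # transfer control to the registered handler

for n in (0x09, 0x21, 0x00):
    interrupt(n)
```

The table is the key design point: devices and programs need only raise a number, and the operating system decides what code runs in response.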