SEVEN STEP TROUBLESHOOTING
Troubleshooting is a method of finding the cause of a problem and correcting it. The ultimate goal of troubleshooting is to get the equipment back into operation. This is a very important job because the entire production operation may depend on the troubleshooter's ability to solve the problem quickly and economically, thus returning the equipment to service. Although the actual steps the troubleshooter uses to achieve the ultimate goal may vary, there are a few general guidelines that should be followed. There are often cases where a familiar piece of equipment or system breaks down. In those cases, an abbreviated five-step troubleshooting process can be used to find the fault and get the system up and running. It is important to note that, although it is a five-step approach, the same basic guidelines of the seven-step troubleshooting method are followed. The steps are simply combined to be specific to the problem at hand. This article will briefly cover the five-step troubleshooting process, followed by a more in-depth look at the seven-step troubleshooting process.
Use a clear and logical approach
The five-step troubleshooting process consists of the following:
Within the four general guidelines previously mentioned, there are several action items that are important to the successful achievement of the goal of troubleshooting:
1. Verify that something is actually wrong.
A problem usually is indicated by a change in equipment performance or product quality. Verification will either provide the troubleshooter with indications of the cause, if a problem actually exists, or prevent wasted time and effort on "ghost" problems caused by the operator's lack of equipment understanding. Do not simply accept a report that something is wrong without personally verifying the failure. A few minutes invested up front can save a lot of time down the road.
2. Identify and locate the cause of the trouble.
Trouble is often caused by a change in the system. The more thorough the troubleshooter's understanding of the system, its modes of operation, and how those modes are supposed to work, the easier it will be to find the cause of the trouble. This knowledge allows the troubleshooter to compare normal conditions to actual conditions.
3. Correct the problem.
It is very important to correct the cause of the problem, not just the effect or the symptom. This often involves replacing or repairing a part or making adjustments. Never adjust a process or piece of equipment to compensate for a problem and consider the job finished; correct the problem!
4. Verify that the problem has been corrected.
Repeating the same check that originally indicated the problem can often do this. If the fault has been corrected, the system should operate properly.
5. Follow up to prevent further trouble.
Determine the underlying cause of the trouble. Suggest a plan to a supervisor that will prevent a future recurrence of this problem.
This basic troubleshooting philosophy is the basis for the seven-step troubleshooting method discussed later. It reflects the basic strategy for troubleshooting, though each individual facility may require a different application of the strategy specific for the equipment and policies at that facility. An important point to remember as we discuss the seven-step methodology is that we are discussing a philosophy - not a procedure. Using the seven-step philosophy, a procedure could be developed that would provide the most cost-effective and efficient means for troubleshooting a particular piece of equipment in a given facility. However, this procedure would not necessarily be effective when used with different equipment or even the same equipment installed in a different facility.
"There is no substitute for experience" is a catchy and, more often than not, true phrase. What is needed is a way to capture even a small part of that experience for future use, either by those who have not been fortunate (or unfortunate, as the case may be) enough to see something for themselves or by those who have seen too many years pass between experiences. This is the point of an equipment history, or troubleshooting log, which can tell quite a tale over the life of a piece of equipment. The troubleshooting log provides a valuable source of information from which the troubleshooter can draw on the experience of past troubleshooting efforts to quickly restore the equipment to service. Problems, symptoms, corrective actions, modifications, and preventive maintenance actions all should have entries that can be referenced at a later date. Many companies require their maintenance personnel or engineering staff to maintain historical data on equipment used within their facilities. These requirements are not intended to be a burden on the maintenance or engineering departments, nor are they meant to destroy every tree on the planet with unnecessary paperwork. The equipment history can help prevent the troubleshooter from "reinventing the wheel." It can lead the troubleshooter to the solution to a problem that has not occurred in years and that would otherwise cause troubleshooting efforts to move slowly as the troubleshooter checks every possibility. Additionally, documentation of recurring problems can provide the horsepower needed to get the right part or the engineering solution necessary to not only fix the immediate problem but also correct its cause. Without this historical data and documentation of a recurring problem and its associated costs, the arguments will often be met with the statement "if it is not written down, it did not happen." The equipment history/troubleshooting log is an ideal place to keep the records necessary to establish and maintain a common problems list.
The purpose of the common problems list is to provide the troubleshooter with a ready reference of past problems and their corrective actions. It is from this list that quick fixes can be taken. If a problem occurs on a regular or routine basis, it should be put on the common problems list. This can be referred to at the beginning of a troubleshooting problem so the quick fixes can be tried. This can save the troubleshooter valuable time when troubleshooting. Troubleshooters or technicians need to be careful of what is placed on the common problems list. If something occurs once, it is not necessarily a common problem. The problem should be listed in the history section and should not be put on the common problems list until it occurs again. This is because the tools used for troubleshooting are only as good as their application. If the common problems list is too long and cumbersome, it cannot be used effectively. Figure 1 shows an example of a troubleshooting log that could be used as a common problems list. Completing the required information on a troubleshooting log may seem tedious, but the information on the log can be very beneficial to a technician looking for the solution to a problem several months or even years later.
Figure 1: Troubleshooting Log/Common Problems List
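The relationship between the history section and the common problems list can be illustrated with a short sketch. This is only one possible arrangement, not the layout of Figure 1: the field names, the class names, the sample dates, and the two-occurrence threshold are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    date: str               # when the trouble was worked (sample values only)
    symptom: str            # what was observed
    cause: str              # what was found to be wrong
    corrective_action: str  # what was done about it


@dataclass
class EquipmentLog:
    equipment: str
    history: list = field(default_factory=list)

    def record(self, entry: LogEntry) -> None:
        """Every problem, common or not, goes into the history section."""
        self.history.append(entry)

    def common_problems(self, min_occurrences: int = 2) -> dict:
        """Return {cause: most recent corrective action} for recurring causes only."""
        counts = Counter(e.cause for e in self.history)
        common = {}
        for e in self.history:
            if counts[e.cause] >= min_occurrences:
                common[e.cause] = e.corrective_action  # later entries overwrite earlier ones
        return common


# Made-up sample entries for illustration.
log = EquipmentLog("pump #3")
log.record(LogEntry("2020-01-10", "does not start", "blown control fuse", "replaced fuse"))
log.record(LogEntry("2020-03-02", "low flow", "clogged inlet filter", "cleaned filter"))
log.record(LogEntry("2020-06-18", "does not start", "blown control fuse",
                    "replaced fuse; checked for short"))

print(log.common_problems())
```

The clogged filter stays in the history but does not reach the common problems list until it occurs a second time, which keeps the list short enough to be useful.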
At this point in our discussion, we are ready to examine a method for effective, logical troubleshooting: the seven-step troubleshooting method. The seven-step troubleshooting method consists of the following seven steps:
When necessary, each of these steps should be used in the proper order. Deciding when each is necessary is a very important part of troubleshooting. This is where a strategy is developed into a procedure. Many of the more modern designs of equipment in use today offer extensive diagnostics programs and tools as an integral part of the equipment. Some have internal troubleshooting programs that allow the equipment to "troubleshoot" itself to a large degree. These programs and tools usually check inputs and outputs against pre-programmed normal parameters. If a discrepancy is noted, that function is flagged as a potential problem. Some programs are more sophisticated and will actually check functions to a component level, but they usually are only found on very expensive and high-tech equipment. The strategy that the program uses is a simple logical input-output comparison. Systems or equipment that are designed for some form of self-troubleshooting obviously do not require implementation of every one of the seven steps. The equipment itself may perform any one or all of the steps, with the exception of failure analysis and retest requirements. All that is required of the troubleshooter is an understanding of what the equipment diagnostics are indicating and of the quickest and most effective way to clear the fault. When any troubleshooting effort is necessary, writing down or referring to the seven steps will ensure that a conscious decision is made as to what steps apply and what steps do not apply. Approaching the problem in this fashion will ensure that valuable time is not wasted back-tracking to an action or thought process that was skipped initially. Next, we will take a look at each of the seven steps individually to see what should be accomplished for each step.
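The input-output comparison used by built-in diagnostics can be sketched in a few lines. The function names, units, and normal ranges below are hypothetical values chosen for illustration; real equipment would take its limits from the manufacturer's specifications.

```python
# Hypothetical pre-programmed normal parameters: {function name: (low, high)}
NORMAL_RANGES = {
    "supply_voltage_v": (22.0, 26.0),
    "hydraulic_pressure_psi": (900.0, 1100.0),
    "motor_speed_rpm": (1700.0, 1800.0),
}


def flag_discrepancies(readings, normal_ranges=NORMAL_RANGES):
    """Return the functions whose reading is missing or outside its normal range."""
    flagged = []
    for name, (low, high) in normal_ranges.items():
        value = readings.get(name)
        if value is None or not (low <= value <= high):
            flagged.append(name)  # potential problem, not a confirmed fault
    return flagged


readings = {
    "supply_voltage_v": 24.1,
    "hydraulic_pressure_psi": 640.0,  # below the assumed normal range
    "motor_speed_rpm": 1750.0,
}
print(flag_discrepancies(readings))
```

A flagged function is only a pointer to where the discrepancy shows up. Failure analysis and retest still fall to the troubleshooter.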
This is the most fundamental step in troubleshooting. Each and every person that has ever fixed anything has accomplished it. This step asks the question "Does a failure exist?" The first step in identifying a failure is recognizing that a failure exists. This sounds ridiculously simple, and usually it is, but it is also very important. For example, a common failure can be as simple as power not being connected to a power supply. Electric motors and electrical circuits will not operate without electricity! This is very simple troubleshooting, but it can save a lot of time and potential embarrassment. The symptom recognition step is very straightforward. It requires an entry in the troubleshooting log that states what the indications of a problem are. For example, the indication might be that pump #3 does not start. Always check for additional symptoms of common problems. Unusual symptoms of common troubles occur more often than common symptoms of unusual troubles. The following list provides some guidelines for entries made during the symptom recognition step:
The symptom elaboration step is the beginning of "actual" troubleshooting. The objective of this step is to obtain as much information about the problem as possible. Symptom elaboration is where the question "What is the problem?" is asked. As its name implies, this step elaborates on the symptom written in step one. For example, perhaps the cylinder extension stroke is too slow but the retraction stroke timing is satisfactory. This step provides all of the information necessary to narrow the problem down in a logical fashion. The following points should be considered in the symptom elaboration step:
- Be aware that a large number of equipment faults can produce similar symptoms. During this step, try to differentiate as much as possible between the characteristics of the symptoms.
- There may be a possibility of improper pressures, flows, or voltages exceeding maximum design specifications.
This step is intended to narrow down the possible faulty functions based on the information obtained in steps one and two. A functional block diagram of the equipment and the troubleshooting log (steps one and two) are needed for this step. The question asked by this step is "Would failure of this function cause the symptoms I am seeing?" Again, the purpose of this step is to narrow the possibilities down to a list of probable faulty functions. Key points for this step include:
This step requires careful evaluation of each of the probable faulty functions listed in the previous step. The goal is to determine exactly which area of the system is causing or generating the problem. This is the first step that requires taking a measurement. The measurement taken may be a system pressure, operating speed, sequence, time delay, temperature, or any variable parameter that is related to the equipment operation. The purpose of this step is not to find the faulty component; it is just to isolate the problem to a circuit or function. More than one of the previously listed probable faulty functions may be contributing to the overall problem. This step is not complete until each and every listed possibility is properly checked. The following key points should be noted:
This step continues isolating the fault once the faulty function or functions have been determined. A thorough knowledge of the equipment operation, as well as individual component characteristics, is required for successful completion of this step. Schematic diagrams should be used at this point to ensure that no details go unnoticed. When localizing the trouble to a faulty component, keep in mind the following points:
This step requires the failed component(s) to be repaired or replaced and, most importantly, the cause of the failure corrected. The following key points should be noted.
Now that the equipment is operational, check all the functions that have been affected by the failure. Although the equipment has been repaired and is now functioning, all operations must be checked and verified. The information obtained in this step can also aid in troubleshooting next time by providing some baseline information. One key point to remember is:
The seven-step method and its associated important points are provided as a general guide to assist the maintenance person. Circumstances vary from task to task and may require a slightly different troubleshooting approach. Experience, together with the basic path outlined here, will allow the troubleshooter to choose an appropriate approach and solve problems more efficiently.
Because of the variety of items they are expected to maintain, troubleshooters do and use different things. Some use screwdrivers, while others use oscilloscopes, stethoscopes, voltmeters, or wiring diagrams. Some spend a great deal of time disassembling in order to gain access to test points or adjustments, and others spend none. Some require access to large amounts of documentation, while others need only a page. Whatever the nature of the equipment to be fixed, and the equipment used to accomplish that purpose, competent troubleshooters do not differ so much by WHAT they do, but rather by HOW they go about doing it. The strategy (the approach) is much the same for all equipment; only the tactics (the steps for implementing the strategy) differ. Here is what competent troubleshooters do, in the approximate order in which they do it.
Operators are the richest potential source of information about what is wrong and where the trouble is. Competent troubleshooters always talk to the operators when available. Operators are with the equipment when the trouble occurs, and they generally know what they and the equipment were doing when it happened. The operator can provide indications of the problem by describing what happened that was different from normal operation. Many times, the operator helps point the troubleshooter in the general direction:
This information may tell the competent troubleshooter a great deal about what is wrong and where. Sometimes though, the operator is not that helpful:
Under these circumstances, the troubleshooter's patience and ability to ask the right questions may result in more helpful information. Attempting to make the operator feel inferior by using highly technical terms or sarcasm in questioning will not increase the level of communication or cooperation and only serves to waste valuable time. In many cases, the operator can tell the troubleshooter exactly what and where the trouble is. When that happens, other troubleshooting steps can be avoided; the troubleshooter merely verifies the symptoms and clears the trouble.
While sometimes wrong or not too helpful, operators are still the most potent source of information available, and competent troubleshooters head for the operator as a first step.
Immediately after their interview with the operator, competent troubleshooters verify symptoms. They know that hearing or seeing a symptom is not automatic proof of a malfunction. Just because equipment does not work properly does not mean something is wrong with it. Suppose an operator forgot to turn it on or plug it in? Suppose a switch or a valve was left in the wrong position? Human error is notorious for being a source of troubles, and competent troubleshooters know it is inefficient and potentially embarrassing to break out test equipment and sophisticated analytical procedures before verifying the symptoms. Before television, there was only radio. When a radio malfunctioned, its owner took it to the radio repair shop, plopped it on the counter, and described its symptoms. Often, a customer would complain about a "hissing radio." "It doesn't work anymore," the owner would say. "It just hisses and makes crackling sounds, but it doesn't get any stations." The minute the customer said "hisses," the troubleshooter would casually turn the radio around, look at the back, and verify the trouble. Sure enough, there was nothing wrong with the radio. The operator had accidentally snapped the AM/short-wave switch to the short-wave position, making reception of AM stations impossible. From the troubleshooter's point of view, the problem was then "How do I switch the switch without making the customer feel foolish?" Generally, the solution was to take the radio to the back room and make the "fix" there or to ask the customer to return the following day. This is but one classic example of an operator-induced problem. There are many, many others, and experienced troubleshooters can describe the amount of time spent troubleshooting failures that were not there. Competent troubleshooters verify symptoms before proceeding with more involved efforts.
They determine whether the trouble is real or not to ensure they do not spend time troubleshooting when they should be instructing an operator on how to avoid the trouble in the future. When the trouble is real, symptom verification will often provide benefits in addition to actual confirmation of a problem's existence. By operating the equipment, the troubleshooter will often collect more clues about the trouble's location than were provided by the initially reported symptom. When something goes wrong, it can show up in more ways than one. Troubleshooters who can tell the difference between normal and abnormal operation will spot these additional clues. "I do a lot of troubleshooting by telephone," a highly competent troubleshooter of video equipment explained. "When a customer tells me what's wrong, I'll have them operate the system for me and tell me what happens. Lots of times I'm able to tell them what the problem is right on the phone. I don't even have to see the equipment." (Let alone rig test equipment.) No question about it. Competent troubleshooters verify symptoms before digging into the equipment itself.
Even before they have located a trouble, competent troubleshooters attempt quick fixes; that is, they attempt solutions that are fast to try, even though they may be illogical in terms of the symptoms presented. They check fuses, adjust controls, push circuit boards firmly into their sockets, clean contacts, clean filters, replace gaskets, vacuum, dust, and bang or kick interlocked doors or cabinets to make sure they are properly seated. They tighten this or reset that, adjust here or align there. Troubleshooters know that these actions will clear the trouble some of the time, and since they are rapidly accomplished, they are worth doing. If quick fixes work, time and effort have been saved. If they do not work, only a moment has been lost and the information has been gained that certain parts of the equipment, at least, are not the cause of the trouble. Often, troubleshooters engage in these rapid clearing actions while verifying symptoms and looking for other visible signs of malfunction. Auto mechanics, for example, are likely to twist or jiggle spark plug wires while looking around the engine compartment, regardless of the nature of the trouble. They know it is worth doing this since rough-running engines are sometimes caused by loose, oily, or wet contacts. In a way, quick fixes are a form of preventive maintenance. In common usage, preventive maintenance means periodic general servicing of equipment, whether it needs it or not. These actions are carried out because they will either lengthen the life of the equipment or increase the amount of time the equipment is operational; that is, they will minimize downtime. Competent troubleshooters know that many troubles are caused by inadequate preventive maintenance or a total absence of such routine care. Thus, some quick-fix actions can be thought of as belated preventive maintenance. There is another reason for attempting a quick fix or, if you prefer, for attempting solutions without first doing detailed troubleshooting.
Equipment troubles do not occur with equal probability; some are much more likely to occur than others. Competent troubleshooters know this. Troubleshooters also know which troubles are likely to occur most often and the symptoms associated with those troubles. Moreover, everybody knows that some troubles are more common than others. When the table lamp does not light, typical troubleshooting does not begin; the bulb is commonly changed first. When the car does not turn over, the battery is commonly checked. It is not always the battery, and it is not always the light bulb, but the probability is high that these are the sources of trouble. When they are, it would be inefficient to pretend that these probabilities do not exist, especially when clearing actions are quick and easy to take. It would be very costly to pretend that trouble probabilities do not exist and demand that troubleshooters always follow the same procedure for the sake of uniformity or because the prescribed procedure will eventually lead to the trouble. Efficient troubleshooting, then, requires that troubleshooters be armed with all available trouble-probability information. Unarmed, they are deprived of a potent tool for rapid trouble isolation. Regardless of what their information is called, competent troubleshooters attempt solutions that are rapid and efficient because they pay off generously either in a trouble cleared or in information gained.
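One way to put trouble-probability information to work is to rank candidate quick fixes by how likely each is to clear the trouble per minute spent trying it. The ranking rule is a simple heuristic assumed for this sketch, and the probabilities and times are invented for the table-lamp example; in practice they would come from the equipment history.

```python
def rank_quick_fixes(candidates):
    """Sort (name, probability, minutes) tuples by probability per minute, best first."""
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)


# Invented figures for a table lamp that does not light.
candidates = [
    ("rewire lamp", 0.05, 45.0),
    ("check plug and outlet", 0.15, 0.5),
    ("replace bulb", 0.70, 1.0),
    ("replace switch", 0.10, 20.0),
]

for name, probability, minutes in rank_quick_fixes(candidates):
    print(f"{name}: {probability / minutes:.3f} chance of a fix per minute")
```

With these figures the bulb is tried first and rewiring last, which matches what most people do without thinking about it.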
When troubleshooters have talked with the operator, verified symptoms, and tried quick fixes, but still have not located the fault, additional information must be collected. The troubleshooting aid is the next most efficient source of information to check out. Why? Because such aids offer some prepackaged information that troubleshooters would have to seek elsewhere if the aid were absent. Of the several types of troubleshooting aids, some are brief and not too helpful, while others are highly sophisticated or even automated. For example, consider the easy-to-interpret "idiot lights" in automobiles that indicate when the oil pressure is too low or when the alternator ceases to provide a suitable charge for the battery. The cockpit of a modern aircraft is loaded with bells, buzzers, and sirens to indicate various malfunctions and even impending malfunctions. In one fighter aircraft, for example, there is a repeating sound that changes frequency and tempo as gravity forces are built up during a turn. The higher the G-forces, the higher and more rapid the sound, telling the pilot of an approaching malfunction. The sound monitors a stress condition of the aircraft, and from listening the pilot knows whether or not a correction is needed. In this case, there is no need to talk with an operator, collect additional symptoms, or try quick fixes other than the one suggested by this troubleshooting aid. More and more modern equipment is being designed to provide direct information about troubles. Sensors detect troubles that are then reported by lights, sounds, and other forms of information display. These aids are a response to the growing complexity of some types of equipment, but reflect what is still a growing technology. Though the telltales are immensely useful, it is still possible for them to fail, making the troubleshooting task even harder than before. Therefore, the importance of providing troubleshooters with other well-designed aids is still as strong as ever.
A sometimes-overlooked troubleshooting aid is the "Caution" information attached to the equipment itself. "Caution: Remove all red tags before operating" is one example. "Caution: High voltage in this cabinet" is another. True, these aids do not help in locating troubles, but they do help to save the equipment and the troubleshooter from early death. Still another type of aid is the "If/Then" page, which typically describes symptoms on the left and suggested actions on the right. Troubleshooters may find aids like this in the owner's manual that came with an automobile and in the instructions accompanying appliances. They indicate what some common troubles are and what to do about them. Similar troubleshooting aids may be provided with more sophisticated equipment, and some maintenance people construct their own. Related to these is the troubleshooting tree, a type of flowchart that walks the troubleshooter through a series of actions and decision points, and hopefully, to the trouble. Often called fully procedural troubleshooting aids, these aids are a form of thinking prompt, or a form of prepackaged analysis intended to relieve the troubleshooter of the need to memorize all the steps to follow. Well-constructed aids of this type do indeed improve the speed and accuracy with which faults are located, even by the inexperienced troubleshooter. At least one study comparing the usefulness of procedural troubleshooting aids with more traditionally constructed maintenance manuals showed these aids to be better than the manuals. More troubles were located, and inexperienced troubleshooters made as few errors as experienced people. This should be expected, as the fully procedural troubleshooting aid is a carefully constructed and tested way of guiding the troubleshooter to the source of the problem. Troubleshooters have to know more about the system in order to make good use of traditional maintenance manuals.
In addition to knowing the geography of the system, they need more specific troubleshooting knowledge to make up for the incomplete or inaccurate information in the manual. Then there are the sophisticated diagnostics aids used in locating malfunctions in computers and similar equipment. They do not require the troubleshooter's assistance at all, except to initiate the diagnostic operation. Diagnostics are programs designed to exercise a system, note discrepancies between normal and abnormal operation, and report the nature, and often the exact source, of the trouble either on a video display or printer (for example, "Bad RAM at CY"). When troubleshooting aids exist, experienced troubleshooters use the ones that remind them of the efficient paths to follow for information collection or those that report specific troubles. They do not use aids containing information they have already memorized through practice and experience, and they do not use aids that are poorly designed.
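A troubleshooting tree of the kind described above can be represented as nested decision points that end in a suggested action. The questions and actions below are invented for illustration and are not taken from any real manual; the sketch only shows how such an aid carries the analysis so the troubleshooter does not have to memorize it.

```python
# A hypothetical tree: each decision point has a question and a "yes"/"no" branch.
# A branch is either another decision point (dict) or a suggested action (str).
TREE = {
    "question": "Is the unit plugged in?",
    "no": "Plug the unit in and retest.",
    "yes": {
        "question": "Is the fuse intact?",
        "no": "Replace the fuse, then look for the cause of the overload.",
        "yes": "Continue with a step-by-step search of the equipment.",
    },
}


def walk(tree, answers):
    """Follow yes/no answers through the tree.

    Returns (questions asked in order, suggested action).
    """
    asked = []
    node = tree
    while isinstance(node, dict):
        question = node["question"]
        asked.append(question)
        node = node["yes"] if answers[question] else node["no"]
    return asked, node


answers = {"Is the unit plugged in?": True, "Is the fuse intact?": False}
asked, action = walk(TREE, answers)
print(action)
```

An "If/Then" page is the same idea flattened to a single level: each symptom maps directly to a suggested action.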
When other sources of information fail to reveal the trouble's source, troubleshooters turn to a step-by-step search through the equipment itself. This is the last resort of competent troubleshooters, however, as it is the least time-efficient system of information gathering when compared to other information sources. This is not to say that the step-by-step search is unimportant; it is only to say that this procedure (oddly referred to as "systematic," "analytical," or "logical" troubleshooting) is used by competent people only after all other information sources fail. Several step-by-step search procedures might be used. A random search could be a way to test and replace components, and troubles would eventually be cleared. Unfortunately, since as many troubles would be located later as would be sooner, this approach is used only by the uninformed. A sequential search involves systematic testing, starting from one end of the equipment and working item by item to the other end. Although this procedure will also lead eventually to the trouble, it too is inefficient because troubles at the far end of the equipment take a long time to get to. The preferred search procedure is one that yields the most information for the least effort; that is, the most information per action, such as per test check or per trial replacement. Ideally, this search procedure is one that successively eliminates half the system as a possible trouble source. Called the split-half or half-split search, the procedure involves successively testing the system at or near its midpoint. When a test shows normal operation, then the portion of the system preceding that point is considered OK and is eliminated from suspicion. By successively eliminating approximately half of the remaining system with each test, the trouble is located more efficiently than with a random or sequential search. Four points must be made:
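The half-split search described above can be sketched as follows. The sketch assumes a linear chain of stages containing exactly one fault, so that every test point upstream of the fault reads normal and every test point from the fault onward reads abnormal. The stage names and the `output_is_normal` test function are hypothetical stand-ins for real measurements.

```python
def half_split_search(stages, output_is_normal):
    """Return (faulty stage, number of tests made).

    Assumes a single fault in a linear chain. output_is_normal(i) reports
    whether the signal measured at the output of stages[i] is normal.
    """
    low, high = 0, len(stages) - 1  # the fault lies somewhere in stages[low..high]
    tests = 0
    while low < high:
        mid = (low + high) // 2     # test at or near the midpoint
        tests += 1
        if output_is_normal(mid):
            low = mid + 1           # everything up to mid is cleared of suspicion
        else:
            high = mid              # the fault is at mid or upstream of it
    return stages[low], tests


stages = [f"stage {n}" for n in range(1, 9)]  # eight stages in signal order
faulty_index = 5                              # hypothetical fault in "stage 6"

stage, tests = half_split_search(stages, lambda i: i < faulty_index)
print(stage, tests)
```

Here the fault is found in three tests. A sequential search starting from the first stage would have needed six tests to reach the same stage, and the gap widens as the number of stages grows.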
Once a trouble is located, someone is expected to eliminate it. Trouble clearing is often done by the troubleshooter, but sometimes it is assigned to someone else. The master auto mechanic, for example, does the diagnosis, but then may assign the actual repair work (trouble clearing) to someone else. The chief engineer at a radio or TV station may be called in to troubleshoot, and then turn the trouble-clearing activity over to the on-duty engineer. Manufacturers' hotshot troubleshooters who travel to clients' locations to solve difficult problems often leave the actual trouble clearing to the local staff. Trouble clearing is different from trouble locating, and locating requires a different set of skills than clearing. This article concentrates only on locating the source of the trouble.
Preventive maintenance (PM) is the process of clearing troubles before they happen, a process that good troubleshooters perform as regularly and carefully as time and policy permit. Performing PM is more than just a ritual or just another company policy; preventive maintenance saves a great deal of time and money and reduces equipment downtime. It is appropriate to do PM on some machines even before starting to hunt for the trouble. PM usually is fast and may clear the trouble. However, for most machines, PM is carried out after the trouble has been cleared. One troubleshooter explained it this way: "Look, when the customer's machine is down and the plant has come to a grinding halt, they don't want to see my troubleshooters oiling and greasing. They want that equipment up and running! The oiling and greasing is done after the equipment is operational."
Competent troubleshooters always check to make sure the trouble is actually cleared and the system is functioning normally. They know too well how easy it is to cause a new trouble while clearing an old one. They also know how easy it is to leave something like a setscrew loose, or something unplugged or out of adjustment. Therefore, a final check of normal operation is a necessary part of the troubleshooting sequence.
Troubleshooters are not immune to the bureaucratic plea to "fill out those forms!" Even though paperwork is not troubleshooting, it is part of the troubleshooter's job. Often, the history of a machine is recorded in an equipment log. Dates of PMs, information about retrofits, and parts that have been changed are recorded at the time of service or repair. Referring to and keeping up a log are two paperwork activities that are part of the maintenance job. Sometimes troubles can be quickly located by simply reading the history in the log, often because the same trouble occurs regularly in that equipment. For this reason, the equipment log is a useful source of information, and good troubleshooters take the time to update these logs as well as to refer to them.
Once the equipment is returned to service, the user is informed of this fact. Often, operators are instructed in the proper use or care of the equipment or cautioned about peculiarities of the system. Although this activity is not strictly part of the troubleshooting procedure, it is important to the continued proper functioning of the equipment.
The next step is to review a flowchart depiction of the action and decision steps in the strategy just described. A flowchart is a graphical tool used to represent the steps of a process. The flowchart uses standard symbols to represent process steps, decisions, and other events. Figure 2 shows typical standard flowchart symbols.
Figure 2: Typical Flowchart Symbols
A flowchart depicting the typical troubleshooting process just described is shown in Figure 3. This flowchart represents the troubleshooting procedures followed by an individual at the location where the equipment trouble is noted.
Figure 3: Flowchart Model
Troubleshooters usually receive a report of trouble in the form of a symptom:
After locating the correct machine (and good troubleshooters always make sure they have the right machine), they try to interview the operator. Unless the machine is jammed or otherwise inoperable, they operate the machine and verify the symptoms collected from the operator. If the problem is operator-induced, they clear it and then instruct the operator in ways to prevent the problem from occurring again. If the problem is real, they try quick fixes (check interlocks, plugs, and cables; replace units). If any of these work, preventive maintenance may be called for and carried out. Then, final checks are made, documentation (paperwork) is completed, the area is cleaned and checked, and the area supervisor is informed. If quick fixes do not solve the problem, troubleshooters follow troubleshooting aids if they are available. If aids are not available, a half-split search procedure is used as a last resort. When troubleshooters develop a good idea about where or what the trouble is, they test their hypothesis by attempting a solution. If a solution does not work, the search is continued. If the solution does work, troubleshooters complete any preventive maintenance that is indicated and then follow the end steps already described (final check, documentation, area check, and communication).
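The half-split search mentioned above can be sketched as a binary search along a chain of stages. This is a minimal sketch under stated assumptions: the signal path is a simple series chain, there is exactly one fault, and the final output is known to be bad. The stage names and the `signal_ok_after` function are hypothetical stand-ins for a real measurement at a test point.

```python
def half_split(stages, signal_ok_after):
    """Locate the single faulty stage in a series chain.

    stages: ordered list of stage names, input first.
    signal_ok_after(i): True if the signal measured at the output of
        stages[i] is normal (stands in for a real meter reading).
    Assumes exactly one fault, so every output upstream of the faulty
    stage is good and every output from it onward is bad.
    Returns (faulty stage name, number of measurements taken).
    """
    lo, hi = 0, len(stages) - 1
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if signal_ok_after(mid):
            lo = mid + 1  # fault lies downstream of this test point
        else:
            hi = mid      # fault is at or upstream of this test point
    return stages[lo], checks


# Hypothetical eight-stage chain with the fault planted in the relay.
stages = ["power supply", "sensor", "amplifier", "controller",
          "relay", "actuator", "motor", "load"]
faulty_index = stages.index("relay")


def signal_ok_after(i):
    # Simulated measurement: outputs upstream of the fault read normal.
    return i < faulty_index


found, checks = half_split(stages, signal_ok_after)
print(found, checks)
```

Each measurement halves the section of the chain that can still contain the fault, which is why the method stays practical even when no troubleshooting aids are available.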
The seven-step troubleshooting method described previously assumes that little may be known of the process or system with a problem. Many times that is the case. The technician, electrician, or mechanic must systematically try to resolve the problem by using his or her skills and intuition. There are often cases, however, where a familiar piece of equipment or system breaks down. In those cases, an abbreviated five-step process can be used to find the fault and get the system up and running. It is important to note that, although this is a five-step approach, the same basic guidelines of the seven-step troubleshooting method are followed. The steps are simply combined to be specific to the problem at hand.
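The five-step troubleshooting process consists of the following:
1. Verify that a problem actually exists.
2. Isolate the cause of the problem.
3. Correct the cause of the problem.
4. Verify that the problem has been corrected.
5. Follow up to prevent future problems.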
Each of these steps is described next using the flowchart approach.
Next, the troubleshooter should observe the equipment or system to get a first-hand impression of the trouble. During this observation, the troubleshooter should note all abnormal symptoms. To evaluate the equipment thoroughly and elaborate on the symptoms observed, the troubleshooter will probably need to examine the equipment documentation. This includes prints, operating characteristics, and procedures. Since the equipment operator is probably most knowledgeable about the equipment, it is important to discuss the documentation with the operator. This will help to determine if any changes exist. Some examples of useful graphic documentation are:
A panel graphic is a graphic representation of the system that is mounted on an equipment or system control panel. Although the panel is intended to provide the operator with a big picture of the operations, it can be useful to the troubleshooter during this step. Figure 4 is an example of a panel graphic.
Figure 4: Panel Graphic
Figure 5: Loop Diagram
A more useful diagram for electricians and technicians is the piping and instrumentation diagram, described next.
A piping and instrumentation diagram (P&ID) shows the functional layout of a fluid system and its piping, valves, and instrumentation as clearly and accurately as possible. It is accurate to the extent that all components are connected to each other as shown in relation to flow path orientation. A P&ID does not attempt to represent the actual physical layout of equipment; for example, a valve that may appear to be right at the discharge of a pump can physically be located quite some distance from the pump and on another elevation (floor). Many times, however, a P&ID will use a broken line encircling a group of equipment to indicate that it is all in one building. Another name commonly used for P&IDs is bubble diagram, due to the use of a circle for locators and symbols. A piping and instrumentation diagram depicts all components of a particular system, including pipe sizes, flange sizes, valve sizes, flow direction, and references to other related diagrams. Rather than try to pictorially include all the valves, piping, instruments, and equipment in a fluid system, a P&ID uses standardized symbols to represent these items. A section of a simple P&ID is shown in Figure 6.
Figure 6: Simple P&ID Section
The P&ID is useful when troubleshooting entire systems or processes to find a faulty component. A P&ID shows the relationship between mechanical, electrical, and control components of the system. It does not give any details on the electrical or control circuitry. For circuit troubleshooting, the block diagram, wiring diagram, and schematic diagram may be more useful.
Figure 7: Block Diagram
Block diagrams show the parts included in the system and the electrical order in which those parts are connected. Knowing this, the system can be analyzed to determine where a fault might lie. Block diagrams are useful but have some disadvantages. They do not show the accurate physical location of the components in the system. Also, a single line represents all electrical connections, and there usually is no indication whether that line represents one cable or several cables.
Schematic diagrams (often just called schematics) are drawings that show all the components in their proper electrical positions, but not necessarily in their proper physical locations. Schematic diagrams are very useful to the technician troubleshooting an electrical or electronic circuit. Schematic diagrams usually are designed to be read from left to right and from top to bottom. There typically are standard electrical diagram symbols and device function numbers on these diagrams. The positions of the contacts and switches usually are shown as they would be in the relaxed or de-energized state. A schematic diagram is shown in Figure 8.
Figure 8: Schematic Diagram
A wiring diagram is structured so that it represents all the wires shown in the schematic diagram in their actual locations. It shows all electrical connections in an enclosure. Each wire is labeled to indicate where each end of the wire is terminated, such as a terminal board location. Using the documents described so far, a technician can accomplish a great deal toward finding the cause of the problem. During this step, the technician identifies possible faults that could result in the problem. These faults should be listed so that they can be checked and eliminated if possible. The flowchart in Figure 9 shows a block-by-block representation of this step. In the next step, we will discuss isolating the real cause of the problem.
Figure 9: Step One: Verify That a Problem Actually Exists
The second step of the five-step troubleshooting process relies heavily on the troubleshooter's technical skills and intuition. During this step, the troubleshooter is actively involved in isolating the cause of the problem. This involves physical activity, such as reading instrumentation, connecting test equipment, adjusting parameters, and possibly disassembly. It also involves mental activity, such as logic, evaluation, and reasoning. The specialized knowledge of the troubleshooter plays a key part in this step. To safely and effectively isolate the cause of the problem, keep the following in mind:
Using the appropriate documents and test equipment, the troubleshooter continues to eliminate possible causes. As each check is completed, the trouble becomes more isolated. Using techniques previously discussed, such as half-splitting and signal tracing, helps to narrow the problem down quickly. A flowchart illustrating the process used to isolate the cause of the problem is shown in Figure 10. Once the problem has been isolated to a specific component, it can be repaired. Correcting the problem is discussed next.
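Signal tracing, one of the techniques named above, can be sketched as a walk along the signal path from input to output that stops at the first stage whose output is abnormal. The stage names, readings, and normal limits below are hypothetical values chosen only to make the example run; real limits come from the equipment documentation.

```python
def trace_signal(stages, readings, limits):
    """Walk the chain from input to output and return the first stage
    whose output reading is outside its normal limits, or None if every
    reading is normal."""
    for stage in stages:
        low, high = limits[stage]
        if not (low <= readings[stage] <= high):
            return stage
    return None


# Hypothetical control loop, listed in signal-flow order.
stages = ["transmitter", "signal conditioner", "controller", "valve positioner"]

# Hypothetical normal output limits (volts) taken from documentation.
limits = {
    "transmitter": (4.5, 5.5),
    "signal conditioner": (2.3, 2.7),
    "controller": (1.0, 4.0),
    "valve positioner": (1.0, 4.0),
}

# Hypothetical readings taken with test equipment.
readings = {
    "transmitter": 5.0,
    "signal conditioner": 2.4,
    "controller": 0.1,
    "valve positioner": 0.0,
}

suspect = trace_signal(stages, readings, limits)
print(suspect)
```

The first stage with a normal input and an abnormal output is the prime suspect; stages downstream of it may read abnormal only because they are being fed a bad signal.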
Figure 10: Step Two: Isolate the Cause of the Problem
The third step of the five-step troubleshooting process is correcting the cause of the problem. This step involves performance of the repair or other activity that eliminates the problem. This can be as simple as turning a switch or adjusting a valve, or it could be as complex as re-winding a motor or overhauling a pump. To correct the cause of the problem, the troubleshooter performs both failure analysis and a retest of the equipment. This is shown in the flowchart pictured in Figure 11.
Figure 11: Step Three: Correct the Cause of the Problem
Once the corrective action is taken, the troubleshooter should verify that the trouble has been corrected. This usually involves rechecking the same indications that proved there was a problem. This time though, the checks should prove that a problem does not exist. This step should be thorough. If there are both an abbreviated procedure and an expanded procedure for checking the equipment, use the expanded procedure. This helps ensure that the problem no longer exists and did not mask another problem. During this verification, the following should be observed:
By thoroughly verifying the proper operation of the repaired equipment, the troubleshooter can be relatively sure the problem has been resolved correctly. To help ensure the problem does not reoccur, the next step in the process is performed.
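The verification described above amounts to rechecking every indication against its normal range, using the expanded set of checks rather than only the one that first revealed the trouble. The following is a minimal sketch; the indication names, ranges, and readings are hypothetical.

```python
def verify_repair(normal_ranges, readings):
    """Return the names of indications that are missing or still outside
    their normal range. An empty list means every check passed."""
    failures = []
    for name, (low, high) in normal_ranges.items():
        value = readings.get(name)
        if value is None or not (low <= value <= high):
            failures.append(name)
    return failures


# Hypothetical expanded checklist with normal ranges.
normal_ranges = {
    "discharge flow (gpm)": (95, 105),
    "motor current (A)": (10, 14),
    "bearing temperature (deg C)": (20, 70),
}

# Hypothetical readings before and after the repair.
before_repair = {"discharge flow (gpm)": 62, "motor current (A)": 17.5,
                 "bearing temperature (deg C)": 88}
after_repair = {"discharge flow (gpm)": 100, "motor current (A)": 12.1,
                "bearing temperature (deg C)": 45}

print(verify_repair(normal_ranges, before_repair))
print(verify_repair(normal_ranges, after_repair))
```

Treating a missing reading as a failure is a deliberate choice in this sketch: a check that was skipped has not proved anything.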
The final step in the five-step troubleshooting process is to follow up to prevent future problems. This step involves taking preventive measures and recommending actions that could help keep the equipment from failing. This may include the following:
Although the system retest and the preventive measures taken may not seem as urgent as fixing the problem and getting the equipment back on-line, these steps are vital to long-term productive performance. The flowchart shown in Figure 12 depicts the actions taken in these steps.
Figure 12: Step Four: Verify That the Problem Has Been Corrected; Step Five: Follow Up to Prevent Future Problems
In observing competent troubleshooters in action, one will not always see the most efficient, ideal procedure in use. Sometimes less-than-ideal tactics are used as a means of dealing with various constraints. For example, telephone maintenance people are often faced with a trouble referred to as CCIO ("Can't Call In or Out"). Once they have ruled out trouble in the central office as the cause, they are supposed to check the telephone instrument itself to verify that the trouble exists as reported. Then, they are supposed to check their way from the instrument toward the telephone exchange until they pinpoint the trouble. However, they do not always troubleshoot in this manner. At one company, they always examine the first checkpoint they come to as they are driving toward the customer's telephone, regardless of where that point is in the logical chain of test points. Why? Because the cost of operating repair trucks is so high that company policy has been set to follow a more efficient vehicle-use procedure. Gas is saved, but the procedure takes longer. Policy says that it is more important to minimize "windshield time" than it is to maximize troubleshooting efficiency.
At another plant, an observer would note that troubleshooters generally fail to verify their diagnosis with test equipment. Instead, they keep trying different solutions until the trouble appears to be cleared, which looks like changing parts at random. Why? Not because the troubleshooters are unaware of more efficient troubleshooting procedures, but because the test equipment is awkwardly located some distance away. It is easier to test guesses by changing parts than to take the time and effort needed to verify a diagnosis. Why is the test equipment kept in the tool crib instead of at a location closer to those who need it? It has always been done that way!
At a third company, troubleshooters follow a similar procedure, not because the test equipment is relatively inaccessible, but because the schematic diagrams are classified and are kept locked up. It is easier, though less efficient, to try a string of solutions than to bother signing the schematics in and out. The variations just described illustrate two types of reasons for deviating from the ideal troubleshooting strategy:
If a troubleshooting strategy deviates from the models shown on the previous pages, the deviation should be for good reasons rather than because it has always been done that way.
Now that two types of ideal troubleshooting strategies have been identified, the seven-step method and the flowchart approach, it is time to develop a specific troubleshooting procedure that fits the specific equipment and its situation. The troubleshooter will translate an ideal strategy into specific tactics appropriate to troubleshooting that equipment and create a troubleshooting tool.
When the flowchart meets the test criteria, the troubleshooter has derived the ideal troubleshooting strategy for the equipment.
Most problems a troubleshooter faces are relatively simple to analyze and repair. The equipment, or an associated component, fails and the failure is obvious. There are times, however, when the problem is not so apparent. In fact, some problems only occur sometimes. When a failure is sporadic, or it is not always present, it is called an intermittent failure.
An intermittent failure can create much aggravation and frustration for the troubleshooter. It also can create havoc within a process or system operation. Diagnosing the fault, as difficult as it is, can be accomplished using these general guidelines:
A brief description of each of these guidelines follows.
If the problem is no longer apparent and operator error has been ruled out, the system or equipment must be examined to find the fault. One of the first things a troubleshooter should try to do is recreate the problem. Using information obtained from the operator and from any equipment history or logs, make an attempt to establish operating conditions that are similar to those that existed at the time of failure. This may require placing the equipment in a state that is contrary to other equipment operation. For this reason, troubleshooting of an intermittent failure is performed off-line, usually in a maintenance shop. Three basic types of intermittent problems will be described. Most intermittent problems fall into one of the following categories:
Although other classifications could be used, an intermittent problem usually occurs only under certain circumstances. Contrary to common belief, most equipment does not have a mind of its own. Two of the most likely things to change in a system during operation are temperature and mechanical functions. For this reason, the first two categories exist. The third category, erratic failure, is a catchall for other intermittent problems. It is also the most difficult type of problem to troubleshoot.
This is just a partial list of devices available for long-range monitoring of the suspect system. Other means can be used to help diagnose the erratic failure. Once the monitoring has been performed, the results must be analyzed. Each aspect of the factors that may contribute to the failure must be assessed to determine the real cause of the problem. While using a monitoring device provides useful information concerning the symptoms of the problem, it does not identify the cause of the problem.
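One simple way to start analyzing monitoring results is to compare conditions when the fault was present against conditions when it was absent. The sketch below does this for temperature. The sample data and the idea of logging a fault flag alongside each temperature reading are assumptions made for illustration, and the function assumes the record contains at least one faulted and one normal sample.

```python
from statistics import mean

# Hypothetical monitoring record:
# (cabinet temperature in deg C, fault observed during the sample?)
samples = [(31, False), (33, False), (35, False), (44, True),
           (46, True), (38, False), (45, True), (32, False)]


def temperature_summary(samples):
    """Return (mean temperature during faults, mean temperature otherwise).

    Assumes at least one faulted and one normal sample are present.
    """
    fault_temps = [t for t, fault in samples if fault]
    normal_temps = [t for t, fault in samples if not fault]
    return mean(fault_temps), mean(normal_temps)


fault_avg, normal_avg = temperature_summary(samples)
print(f"mean temperature during faults: {fault_avg}")
print(f"mean temperature otherwise:     {normal_avg}")
```

A clear gap between the two averages suggests the failure is related to temperature, but it still describes only a symptom. The component that misbehaves when hot has to be found by further checks.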
Simply fixing a trouble does not necessarily solve a problem. Many times, a repair results in only temporary restoration of system performance. This is because the emphasis is often on getting the equipment up and running, not on fixing the problem. Consider the following example:
Maintenance mechanics were called upon to repair a circulation pump that had failed during normal operation. Upon investigation of the failure, the mechanic determined that fuses had blown in the pump controller. An electrician replaced the fuses and re-energized the pump and controller. Together, the electrician and the mechanic observed the restart of the pump. Upon hearing abnormal grinding noise and observing a noticeable deficiency in pump discharge flow rate, they de-energized the pump. The mechanic inspected the pump and found severely worn and damaged bearings. The bearings were replaced, and the pump was placed into operation. The pump discharge flow rate was normal, and no abnormal sounds were heard during the trial run of the pump, so it was returned to service. Although the normal life expectancy of shaft bearings on the pump was in excess of 3,500 hours of continuous use, the bearings once again failed after only 560 hours of continuous use. The bearing replacement was performed again, with similar results. After the third bearing failure, a thorough examination of the pump revealed a severely misaligned impeller shaft. By the time this problem was detected, the shaft was badly scored and had suffered heat damage. If the time had been taken to diagnose the root cause of the initial problem, the shaft probably could have been saved. At the very least, costly downtime and replacement bearing costs could have been avoided.
The above example may seem extreme, but there have probably been worse instances. By taking the time and using a problem-solving tool or technique, the root cause of a trouble can be determined.
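The figures in the pump example already signal that something beyond the bearings was wrong. A short calculation, sketched below, compares the actual bearing life with the expected life. The 50 percent cutoff used to flag a premature failure is an assumption chosen for illustration, not a value from the example.

```python
EXPECTED_LIFE_H = 3500   # expected bearing life from the example (hours)
ACTUAL_LIFE_H = 560      # hours the replacement bearings actually lasted

# Assumed cutoff: a part lasting under half its expected life is treated
# as a premature failure that calls for root cause analysis.
PREMATURE_FRACTION = 0.5


def life_fraction(actual_h, expected_h):
    """Fraction of the expected service life that was actually achieved."""
    return actual_h / expected_h


def is_premature(actual_h, expected_h, cutoff=PREMATURE_FRACTION):
    """True if the part failed before reaching the cutoff fraction of its life."""
    return life_fraction(actual_h, expected_h) < cutoff


fraction = life_fraction(ACTUAL_LIFE_H, EXPECTED_LIFE_H)
print(f"Bearings lasted {fraction:.0%} of their expected life")
print("Root cause review warranted:", is_premature(ACTUAL_LIFE_H, EXPECTED_LIFE_H))
```

Bearings that last 560 of an expected 3,500 hours have reached only 16 percent of their expected life, so replacing them again without asking why treats the symptom rather than the problem.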
Determining the root cause of a problem involves considering the possible causes of the effect. In this case, the trouble's symptom is the effect and the system components and operating conditions are potential causes. A cause and effect diagram is used to consider the possible causes associated with a particular problem. The cause and effect diagram is developed as necessary to help isolate the primary or root cause of the problem. This involves gathering data, considering all factors (causes) that contribute to the trouble (effect), and using a process of elimination to determine the root cause. Although the cause and effect diagram is considered a performance-improvement tool, it is worthwhile to consider it as a troubleshooting aid. Using the cause and effect diagram technique can help prevent "hunt-and-peck" troubleshooting and reduce the aggravation associated with undisciplined problem solving. To resolve a problem in the quickest manner possible and help prevent its re-occurrence, the causes of the problem must be identified. The cause and effect diagram is used to graphically show the relationship of each of these causes to the problem. The graphical method used to display this relationship is a fishbone diagram. The term "fishbone" refers to the appearance of the diagram once it has been drawn. Its shape resembles that of a fishbone. An example of a fishbone diagram is shown in Figure 13.
Figure 13: Fishbone Diagram
There are many techniques that can be used to determine the possible causes of a problem. One of the best techniques is brainstorming. Brainstorming is a group-oriented problem-solving technique. To brainstorm effectively, a group of people who are willing to work together is required. This is especially useful when troubleshooting a process with many components. It can also be useful when dealing with a piece of equipment that has mechanical, electrical, and control devices. The more diverse the equipment, system, or process, the more useful the brainstorming technique becomes. Brainstorming involves systematically listing all possible causes for a problem. This is done by giving each member of the group a turn to suggest a possible cause. Each suggestion is considered plausible and is written down for consideration. Each member suggests one cause at a time until there are no more suggestions. Prior to any discussion, all group members then review the compiled list. Once the group has reviewed the list, the suggested causes are discussed. During the discussion, suggested causes that are similar should be grouped together. For instance, "faulty circuit breaker" and "loose wiring" may be grouped under the heading "electrical" or, even more generically, "materials." Some general group headings that are useful in any troubleshooting analysis are:
Each of these areas can be broken down into smaller subgroups as required. Once the major headings are determined, the initial fishbone diagram should look like the one shown in Figure 14.
Figure 14: Major Group Headings on a Fishbone Diagram
Next, the subgroups can be determined, if necessary, and individual causes added. An example of subgroups could be "mechanical equipment" and "electrical equipment" under the "equipment" group. As the subgroups and individual causes are added, the cause and effect diagram takes on its fishbone appearance. This is shown in Figure 15.
Figure 15: Expanded Cause and Effect Diagram
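The grouping of brainstormed causes under subgroups and major headings can be sketched as a small data-structuring exercise. The first two causes and the "equipment" subgroups come from the text; the remaining causes and the function name are hypothetical.

```python
from collections import defaultdict

# Brainstormed causes, each tagged with the subgroup agreed on in discussion.
# The last two causes are hypothetical additions for illustration.
suggestions = [
    ("faulty circuit breaker", "electrical equipment"),
    ("loose wiring", "electrical equipment"),
    ("worn coupling", "mechanical equipment"),
    ("misaligned shaft", "mechanical equipment"),
]

# Each subgroup belongs to one major group heading.
subgroup_to_group = {
    "electrical equipment": "equipment",
    "mechanical equipment": "equipment",
}


def build_diagram(suggestions, subgroup_to_group):
    """Arrange causes as {major group: {subgroup: [causes]}}."""
    diagram = defaultdict(lambda: defaultdict(list))
    for cause, subgroup in suggestions:
        group = subgroup_to_group[subgroup]
        diagram[group][subgroup].append(cause)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {g: dict(subs) for g, subs in diagram.items()}


diagram = build_diagram(suggestions, subgroup_to_group)
for group, subgroups in diagram.items():
    print(group)
    for subgroup, causes in subgroups.items():
        print("  " + subgroup)
        for cause in causes:
            print("    " + cause)
```

The indented printout is a text stand-in for the fishbone: each major heading is a main bone, each subgroup a branch, and each cause a line on that branch.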
Figure 16: Placing the Problem on the Cause and Effect Diagram
Figure 17: Line Pointing to the Problem
Figure 18: Designating Major Causes
Figure 19: Designating Minor Causes on a Cause and Effect Diagram
Each minor cause on the cause and effect diagram has factors that contribute to it. Once the minor causes have been designated, the factors that contribute to these causes must be identified. These factors normally are very specific. Depending on the type of trouble initially identified in step one, the factors could be major process components, such as a tank level detector; a specific control device, such as a valve actuator solenoid; or a discrete component, such as a 220-ohm, -watt resistor. These contributing factors are designated on the cause and effect diagram by writing each one on a line that points into the minor cause with which it is associated. The resulting diagram is shown in Figure 20.
Figure 20: Designating Factors That Contribute to Minor Causes
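Once factors have been attached to the minor causes, the diagram can be flattened into a checklist and worked through by process of elimination. The sketch below assumes a nested layout of major cause, minor cause, and contributing factors. The placement of each factor, the "methods" branch, and the function names are hypothetical.

```python
# Nested layout: major cause -> minor cause -> contributing factors.
# Where each factor sits in the diagram is a hypothetical choice.
fishbone = {
    "equipment": {
        "instrumentation": ["tank level detector"],
        "control devices": ["valve actuator solenoid", "220-ohm resistor"],
    },
    "methods": {
        "startup procedure": ["valve lineup step skipped"],
    },
}


def factor_paths(diagram):
    """Flatten the diagram into (major cause, minor cause, factor) tuples."""
    return [
        (major, minor, factor)
        for major, minors in diagram.items()
        for minor, factors in minors.items()
        for factor in factors
    ]


def remaining(paths, eliminated):
    """Drop every path whose factor has been checked and ruled out."""
    return [p for p in paths if p[2] not in eliminated]


paths = factor_paths(fishbone)

# Hypothetical result of testing: two factors were checked and found good.
ruled_out = {"tank level detector", "valve lineup step skipped"}
for major, minor, factor in remaining(paths, ruled_out):
    print(f"{major} > {minor} > {factor}")
```

Keeping the full path with each factor preserves the context of the diagram, so whatever survives elimination can be traced back to the branch it came from.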