Basic Concepts of Reliability
In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
Figure 1.1 Binomial Distribution
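As a small illustration of the definition above, the probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). The sketch below assumes a hypothetical batch of 10 components, each failing independently with probability 0.1; the function name and the numbers are illustrative only, and a "success" here is counted as a failed component.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example (assumed values): 10 components, each fails with p = 0.1.
n, p = 10, 0.1
prob_no_failures = binomial_pmf(0, n, p)
prob_at_most_one = sum(binomial_pmf(k, n, p) for k in range(2))

print(f"P(0 failures)   = {prob_no_failures:.4f}")
print(f"P(<= 1 failure) = {prob_at_most_one:.4f}")
```

With n = 1 the same function reduces to the Bernoulli distribution: binomial_pmf(1, 1, p) is simply p.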
In probability theory and statistics, the Poisson distribution (or Poisson law of small numbers) is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event. (The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.)
Figure 1.2 Poisson Distribution
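For a known average rate, the probability of observing exactly k events in the interval is rate^k e^(-rate) / k!. The sketch below assumes a hypothetical machine that averages 2 failures per month; the rate and the function name are illustrative assumptions, not values from the text.

```python
from math import exp, factorial

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average count per interval is `rate`."""
    return rate**k * exp(-rate) / factorial(k)

# Hypothetical example (assumed value): on average 2 failures per month.
rate = 2.0
prob_no_failure = poisson_pmf(0, rate)
# Probability of more than 3 failures = 1 - P(0) - P(1) - P(2) - P(3).
prob_more_than_three = 1 - sum(poisson_pmf(k, rate) for k in range(4))

print(f"P(no failure in a month) = {prob_no_failure:.4f}")
print(f"P(more than 3 failures)  = {prob_more_than_three:.4f}")
```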
Mean time to recovery (MTTR) is the average time that a device will take to recover from any failure. Examples of such devices range from self-resetting fuses (where the MTTR would be very short, probably seconds), up to whole systems which have to be repaired or replaced.
The MTTR is usually part of a maintenance contract, where the user pays more for a system whose MTTR is 24 hours than for one whose MTTR is, say, 7 days. This does not mean the supplier guarantees to have the system up and running again within 24 hours (or 7 days) of being notified of the failure. It means that the average repair time will tend towards 24 hours (or 7 days). A more useful maintenance contract measure is the maximum time to recovery, which can be easily measured and for which the supplier can be held accountable.
Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a system during operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. The MTBF is typically part of a model that assumes the failed system is immediately repaired (zero elapsed time), as part of a renewal process. This is in contrast to the mean time to failure (MTTF), which measures the average time to failure under the modeling assumption that the failed system is not repaired.
The definition of MTBF depends on the definition of what is considered a system failure. For complex, repairable systems, failures are considered to be those out of design conditions which place the system out of service and into a state for repair. Failures which occur that can be left or maintained in an unrepaired condition, and do not place the system out of service, are not considered failures under this definition. In addition, units that are taken down for routine scheduled maintenance or inventory control are not considered within the definition of failure.
MTTF - Mean Time To Failure
An estimate of the average, or mean, time until the first failure of a design or component, or until a disruption occurs in the operation of the product, process, procedure, or design (external failures may be excluded). Mean time to failure assumes that the product cannot be repaired and cannot resume any of its normal operations.
MTTF is related to items such as expected and/or operating replacement even though it sometimes may be.
For many designs and components, the MTTF is very close to the MTBF (mean time between failures), which is typically slightly longer than the MTTF. This is because the MTBF includes the repair time of the design or component. If a design or component works for an extended time, then fails, is repaired in a reasonable amount of time, and then once again works for an extended time, the MTTF is the average, or mean, of the time it spends in operational condition. The MTBF is the average time between failures including the average repair time, or MTTR.
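The relationship described above (MTBF = MTTF + MTTR when repair time is included) can be sketched from an operating log. The log values below are hypothetical, and the last line uses the standard steady-state availability relation MTTF / (MTTF + MTTR), which is a common textbook result rather than something stated in the text.

```python
from statistics import mean

# Hypothetical operating log for one repairable unit, in hours (assumed data).
# Each pair is (time in operation before a failure, time taken to repair it).
cycles = [(950.0, 10.0), (1100.0, 6.0), (1020.0, 8.0)]

mttf = mean(up for up, _ in cycles)      # mean operating time before a failure
mttr = mean(down for _, down in cycles)  # mean time to recover from a failure
mtbf = mttf + mttr                       # failure-to-failure time, repair included
availability = mttf / mtbf               # standard relation, not stated in the text

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
print(f"Steady-state availability = {availability:.4f}")
```

Because the repair times are short compared with the operating times, the MTBF comes out only slightly longer than the MTTF, as the paragraph above notes.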
System Reliability Models
System Availability is calculated by modeling the system as an interconnection of parts in series and parallel. The following rules are used to decide if components should be placed in series or parallel:
• If failure of a part leads to the combination becoming inoperable, the two parts are considered to be operating in series.
• If failure of a part leads to the other part taking over the operations of the failed part, the two parts are considered to be operating in parallel.
Availability in Series
As stated above, two parts X and Y are considered to be operating in series if failure of either of the parts results in failure of the combination. The combined system is operational only if both Part X and Part Y are available. From this it follows that the combined availability (A) is the product of the availabilities of the two parts. The combined availability is shown by the equation below:
A = Ax × Ay
The implication of the above equation is that the combined availability of two components in series is never higher than the availability of either component, and is lower than both whenever each component's availability is below 1.
Availability in Parallel
As stated above, two parts are considered to be operating in parallel if the combination fails only when both parts fail. The combined system is operational if either part is available. From this it follows that the combined availability is 1 minus the probability that both parts are unavailable. The combined availability is shown by the equation below:
A = 1 - (1 - Ax)(1 - Ay)
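The series and parallel rules above can be sketched as two small functions. The part availabilities 0.99 and 0.95 are hypothetical values chosen for illustration, and both functions assume the parts fail independently of each other.

```python
from math import prod

def series_availability(*parts: float) -> float:
    """All parts must be up: multiply the individual availabilities."""
    return prod(parts)

def parallel_availability(*parts: float) -> float:
    """The system is down only if every part is down at the same time."""
    return 1.0 - prod(1.0 - a for a in parts)

# Hypothetical availabilities for parts X and Y (assumed values).
ax, ay = 0.99, 0.95
a_series = series_availability(ax, ay)
a_parallel = parallel_availability(ax, ay)

print(f"Series:   {a_series:.4f}")    # lower than either part
print(f"Parallel: {a_parallel:.4f}")  # higher than either part
```

Nested arrangements can be evaluated by composing the two functions, for example series_availability(parallel_availability(ax, ay), 0.999) for a redundant pair feeding a single downstream part.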
Markov analysis provides a means of analyzing the reliability and availability of systems whose components exhibit strong dependencies. Other systems analysis methods (such as the Kinetic Tree Theory method employed in fault tree analyses) generally assume component independence that may lead to optimistic predictions for the system availability and reliability parameters. Some typical dependencies that can be handled using Markov models are:
• Components in cold or warm standby
• Common maintenance personnel
• Common spares with a limited on-site stock
The major drawback of Markov methods is that Markov diagrams for large systems are generally very large, complicated and difficult to construct. However, Markov models may be used to analyze smaller systems with strong dependencies that require accurate evaluation. Other analysis techniques, such as fault tree analysis, may be used to evaluate large systems using simpler probabilistic calculation techniques. Large systems which exhibit strong component dependencies in isolated and critical parts of the system may be analyzed using a combination of Markov analysis and simpler quantitative models.
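To illustrate how a dependency changes the result, the sketch below models two identical units in parallel that share a single repair crew, one of the dependencies listed above. It is a minimal example under stated assumptions: constant failure and repair rates (exponential times), both units normally running, and hypothetical rate values. The states are the number of failed units (0, 1 or 2), and the steady-state probabilities follow from the balance equations of this small chain.

```python
def shared_crew_availability(failure_rate: float, repair_rate: float) -> float:
    """Two identical parallel units, one repair crew (three-state Markov chain)."""
    rho = failure_rate / repair_rate
    # Unnormalised steady-state weights for 0, 1 and 2 failed units.
    w0 = 1.0
    w1 = 2 * rho * w0  # balance: 2 * failure_rate * p0 = repair_rate * p1
    w2 = rho * w1      # balance: failure_rate * p1 = repair_rate * p2 (one repair at a time)
    return (w0 + w1) / (w0 + w1 + w2)  # system is up unless both units have failed

def independent_availability(failure_rate: float, repair_rate: float) -> float:
    """Same two units, but assuming each is repaired independently of the other."""
    a = repair_rate / (failure_rate + repair_rate)
    return 1.0 - (1.0 - a) ** 2

# Hypothetical rates per hour (assumed values).
lam, mu = 0.01, 0.1
shared = shared_crew_availability(lam, mu)
independent = independent_availability(lam, mu)

print(f"Shared repair crew:      {shared:.4f}")
print(f"Independence assumption: {independent:.4f}")
```

The independence assumption gives the higher figure, which is the optimistic bias that the text attributes to methods that ignore dependencies.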
Maintenance Concepts and Strategies
1. Breakdown maintenance
This means waiting until equipment fails and then repairing it. It can be used when an equipment failure does not significantly affect operation or production, or causes no significant loss other than the repair cost.
2. Preventive maintenance
This is routine daily or weekly maintenance (cleaning, inspecting, oiling and re-tightening) designed to keep equipment in a healthy condition and to prevent failure, both by preventing deterioration and by measuring it through periodic inspection or equipment condition diagnosis. It is further divided into periodic maintenance and predictive maintenance. Just as human life is extended by preventive medicine, equipment service life can be prolonged by preventive maintenance.
2a. Periodic maintenance (Time based maintenance - TBM)
Time based maintenance consists of periodically inspecting, servicing and cleaning equipment and replacing parts to prevent sudden failure and process problems.
2b. Predictive maintenance
This is a method in which the service life of important parts is predicted from inspection or diagnosis, so that the parts can be used to the limit of their service life. Unlike periodic maintenance, predictive maintenance is condition-based. It manages trend values by measuring and analyzing data about deterioration, and it employs a surveillance system designed to monitor conditions on-line.
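One simple way to manage a trend value is to fit a straight line to the deterioration measurements and extrapolate to the service limit. The sketch below assumes a linear wear trend, which real deterioration often does not follow; the measurements, the limit and the function name are hypothetical.

```python
def hours_until_limit(hours, readings, limit):
    """Least-squares line through (hours, readings), extrapolated to `limit`.

    Returns the estimated remaining hours after the last measurement,
    or None if the readings show no upward trend.
    """
    n = len(hours)
    mean_t = sum(hours) / n
    mean_y = sum(readings) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, readings)) / sum(
        (t - mean_t) ** 2 for t in hours
    )
    if slope <= 0:
        return None  # no deterioration trend to extrapolate
    intercept = mean_y - slope * mean_t
    time_at_limit = (limit - intercept) / slope
    return max(0.0, time_at_limit - hours[-1])

# Hypothetical wear measurements in mm, taken every 100 operating hours (assumed data).
operating_hours = [0, 100, 200, 300]
wear_mm = [1.0, 1.5, 2.0, 2.5]
remaining = hours_until_limit(operating_hours, wear_mm, limit=4.0)

print(f"Estimated remaining service life: {remaining:.0f} h")
```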
3. Corrective maintenance
This improves equipment and its components so that preventive maintenance can be carried out reliably. Equipment with design weaknesses must be redesigned to improve reliability or maintainability.
4. Maintenance prevention
This concerns the design of new equipment. Weaknesses of current machines are studied thoroughly (using on-site information that leads to failure prevention, easier maintenance, prevention of defects, safety and ease of manufacturing), and the findings are incorporated before new equipment is commissioned.
UNIT-IV Condition based maintenance (CBM)
Condition-based maintenance (CBM), briefly described, is maintenance performed when the need arises. This maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating.
Condition-based maintenance was introduced to try to maintain the correct equipment at the right time. CBM is based on using real-time data to prioritize and optimize maintenance resources. Observing the state of the system is known as condition monitoring. Such a system will determine the equipment's health, and act only when maintenance is actually necessary. Developments in recent years have allowed extensive instrumentation of equipment and, together with better tools for analyzing condition data, today's maintenance personnel are better able than ever to decide when to perform maintenance on a piece of equipment. Ideally, condition-based maintenance allows maintenance personnel to do only the right things, minimizing spare parts cost, system downtime, and time spent on maintenance.
Condition monitoring is the process of monitoring a parameter of condition in machinery, such that a significant change is indicative of a developing failure. It is a major component of predictive maintenance. The use of condition monitoring allows maintenance to be scheduled, or other actions to be taken to avoid the consequences of failure, before the failure occurs. Nevertheless, a deviation from a reference value (e.g. temperature or vibration behavior) must occur before impending damage can be identified. Predictive maintenance does not predict failure. Machines with defects are more at risk of failure than defect-free machines. Once a defect has been identified, the failure process has already commenced, and condition monitoring (CM) systems can only measure the deterioration of the condition. Intervention in the early stages of deterioration is usually much more cost-effective than allowing the machinery to fail. Condition monitoring has a unique benefit in that the actual load, and the subsequent heat dissipation that represents normal service, can be seen, and conditions that would shorten normal lifespan can be addressed before repeated failures occur. Serviceable machinery includes rotating equipment and stationary plant such as boilers and heat exchangers.
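The idea of flagging a deviation from a reference value can be sketched as follows. This is a minimal example, not a description of any particular CM system: the reference is taken as the mean of readings from a known healthy period, the tolerance as three standard deviations of those readings (an assumed rule of thumb), and the temperature values are hypothetical.

```python
from statistics import mean, stdev

def deviating_readings(baseline, readings, n_sigma=3.0):
    """Return (index, value) for readings outside the tolerance band.

    The band is the baseline mean +/- n_sigma baseline standard deviations;
    the three-sigma default is an assumption for illustration.
    """
    reference = mean(baseline)
    tolerance = n_sigma * stdev(baseline)
    return [(i, r) for i, r in enumerate(readings) if abs(r - reference) > tolerance]

# Hypothetical bearing temperatures in degrees Celsius (assumed data).
healthy_period = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2]
latest = [70.0, 70.4, 71.9, 73.5]

alerts = deviating_readings(healthy_period, latest)
for index, value in alerts:
    print(f"Reading {index}: {value} is outside the normal band")
```

In line with the text, an alert here indicates that deterioration has already begun; it signals that maintenance should be scheduled rather than predicting when failure will occur.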
UNIT-V RELIABILITY CENTERED MAINTENANCE
Reliability-centered maintenance, often known as RCM, is a process to ensure that assets continue to do what their users require in their present operating context.
It is generally used to achieve improvements in fields such as the establishment of safe minimum levels of maintenance, changes to operating procedures and strategies, and the establishment of capital maintenance regimes and plans. Successful implementation of RCM will lead to increases in cost effectiveness and machine uptime, and to a greater understanding of the level of risk that the organization is presently managing.
Total productive maintenance (TPM) has been around for almost 50 years. In the "West" it is often wrongly thought to be a new way of looking at maintenance; to the Japanese, it is an established process. Like all processes, it has a host of acronyms and buzzwords.
In TPM, the machine operator is thoroughly trained to perform much of the simple maintenance and fault-finding. Eventually, by working in "Zero Defects" teams that include a technical expert as well as operators, they can learn many more tasks - sometimes all those within the scope of an operator. Tradesmen are also trained at doing the more skilled tasks to help ensure process reliability.
This should be fully documented. Autonomous Maintenance ensures that appropriate and effective efforts are expended after the machine becomes wholly the domain of one person or team. Safety is paramount, so training must be appropriate. Operators are often capable of high standards of technical ability; this is improved through the use of "best practice" procedures and proper training in these procedures.
TPM is a critical adjunct to lean manufacturing. If machine uptime is not predictable and process capability is not sustained, the process must keep extra stocks to buffer against this uncertainty, and flow through the process will be interrupted. Unreliable uptime is caused by breakdowns or badly performed maintenance. If maintenance is done properly ("Right First Time"), uptime will improve, as will OEE (Overall Equipment Effectiveness: basically how many sellable items are actually produced as opposed to how many the machine should produce in a given time).
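Following the simplified description of OEE above (sellable items actually produced versus items the machine should produce in the given time), a minimal calculation might look like this. The shift length, ideal rate and output count are hypothetical, and the function name is an illustrative assumption.

```python
def oee(sellable_units: int, planned_minutes: float, ideal_units_per_minute: float) -> float:
    """Simplified OEE: sellable output divided by the ideal output for the planned time."""
    should_produce = planned_minutes * ideal_units_per_minute
    return sellable_units / should_produce

# Hypothetical shift (assumed values): 480 planned minutes, ideal rate 2 units/minute,
# 720 sellable units actually produced.
shift_oee = oee(sellable_units=720, planned_minutes=480, ideal_units_per_minute=2.0)

print(f"OEE = {shift_oee:.0%}")
```

Breakdowns, slow running and rejected items all reduce the sellable count, so each of them lowers this single ratio.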
One way to think of TPM is "deterioration prevention": deterioration is what happens naturally to anything that is not "taken care of". For this reason many people refer to TPM as "total productive manufacturing" or "total process management". TPM is a proactive approach that essentially aims to identify issues as soon as possible and to plan to prevent them before they occur. One motto is "zero error, zero work-related accident, and zero loss".