The Design and Implementation of a Remote Fault Reasoning Diagnosis System for Meteorological Satellites Data Acquisition

Abstract: To meet the troubleshooting requirements of FENGYUN-3 (FY-3) meteorological satellite data acquisition at domestic and overseas ground stations, a remote fault reasoning diagnosis system was developed in Java 1.6 on the Eclipse 3.6 platform. The general framework is analyzed and the workflow is introduced. The system provides remote, centralized monitoring of equipment running status at the ground stations. By parsing the equipment quality logs, it triggers automatic fault diagnosis and rule-based fault reasoning, generates trouble tickets and imports them into the expert experience database, and offers text and graphics query methods. Practical verification shows that the system, with its friendly graphical user interface, assists knowledge engineers in remote, precise and rapid fault location, boosts fault diagnosis efficiency, and enhances the remote monitoring ability of the integrated operation control system. The system has practical significance for improving the reliability of FY-3 meteorological satellite data acquisition.


Introduction
The FENGYUN-3 (FY-3) meteorological satellite data acquisition network has four domestic ground stations, in Beijing, Jiamusi, Urumqi and Guangzhou, and one overseas ground station in Kiruna, Sweden [1]. The data acquisition and transmission network was founded in 2008. As the system has grown more complex, factors such as the diversity of devices, the decentralization of ground stations and the differing skill levels of operators have seriously restricted improvements in fault diagnosis and removal efficiency. This greatly affects the equipment MTTR (Mean Time To Repair) [2] and challenges the reliability and timeliness of data acquisition. To meet these troubleshooting requirements, it is necessary to develop a fault reasoning diagnosis system that provides remote centralized monitoring and auxiliary precise fault reasoning for domestic and overseas ground stations. Such a system has practical significance for improving the data reception quality of meteorological satellites, and similar systems are lacking in China [3,4,5].

System Architecture And Function Design
Based on parsing the equipment quality logs, the system combines remote monitoring with fault reasoning. As shown in Figure 1, it consists of three subsystems: the remote monitoring subsystem of equipment status, the fault reasoning and diagnosis subsystem, and the report generation and release subsystem.

Remote Monitoring Subsystem of Equipment Status
By collecting and parsing quality logs, this subsystem automatically provides remote centralized monitoring of the status of the data receiving devices at the distributed ground stations. It consists of three modules: the quality log push module, the log format review module, and the log parsing and display module. The quality logs (.REP) automatically record all working parameters of the ground data receiving devices as text. They are the only valid and reliable data reflecting the real working state of the equipment, and they serve as the input of the fault diagnosis system. Each file contains the following information: record time (UTC), elevation axis of the antenna, azimuth axis of the antenna, working status of the antenna (standby, scan, command status, signal lock information, program tracking mode, auto tracking mode, test mode), signal level, antenna warning information (0 for normal and 1 for alarm), and the status end indication <CR>.
After every pass finishes, the quality logs (.REP) are automatically pushed to the integrated operation control center by FTP. The logs then go through format review, and only logs in the correct format are parsed and displayed.
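The format review and parsing step can be illustrated with a minimal Java sketch. The paper does not give the actual .REP layout, so the comma-separated field order below, the class name `QualityLogRecord` and its field names are all assumptions made for illustration only.

```java
/** One parsed record of a quality log. The comma-separated layout is an assumption. */
final class QualityLogRecord {
    final String utcTime;     // record time (UTC), kept as text
    final double elevation;   // elevation axis of the antenna
    final double azimuth;     // azimuth axis of the antenna
    final String status;      // working status of the antenna
    final double signalLevel; // signal level
    final boolean alarm;      // warning information: 0 = normal, 1 = alarm

    private QualityLogRecord(String utcTime, double elevation, double azimuth,
                             String status, double signalLevel, boolean alarm) {
        this.utcTime = utcTime;
        this.elevation = elevation;
        this.azimuth = azimuth;
        this.status = status;
        this.signalLevel = signalLevel;
        this.alarm = alarm;
    }

    /** Parses one line; throws IllegalArgumentException if the format review fails. */
    static QualityLogRecord parse(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        String flag = f[5].trim();
        if (!flag.equals("0") && !flag.equals("1")) {
            throw new IllegalArgumentException("warning flag must be 0 or 1: " + flag);
        }
        try {
            return new QualityLogRecord(
                    f[0].trim(),
                    Double.parseDouble(f[1].trim()),
                    Double.parseDouble(f[2].trim()),
                    f[3].trim(),
                    Double.parseDouble(f[4].trim()),
                    flag.equals("1"));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("non-numeric field in: " + line, e);
        }
    }
}
```

In this sketch a line that fails the review raises an exception, so a caller can list the offending file as incorrectly formatted and pass only valid records on to parsing and display.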

Fault Reasoning and Diagnosis Subsystem
The fault reasoning and diagnosis subsystem is designed as an expert experience system. It consists of the fault rule management module, the rule base for fault diagnosis, and the fault reasoning and diagnosis module.
The fault rule management module supports adding, deleting, modifying and querying fault rules, so that the rules can be updated dynamically based on expert knowledge and practical experience.
The rule base for fault diagnosis is a database in which expert knowledge, translated from natural language into a data representation, is stored as fault diagnosis rules.
The fault reasoning and diagnosis module reasons about the cause of a malfunction in two ways: automatic fault diagnosis and rule-based fault reasoning.
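The four rule management operations can be sketched as follows. This is only an in-memory illustration: the paper stores rules in a database, whereas this sketch swaps in a `LinkedHashMap`, and the names `FaultRule`, `FaultRuleBase` and the three rule fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A fault rule reduced to three hypothetical fields. */
final class FaultRule {
    final String id;
    final String symptom;
    final String cause;

    FaultRule(String id, String symptom, String cause) {
        this.id = id;
        this.symptom = symptom;
        this.cause = cause;
    }
}

/** In-memory stand-in for the rule base; a real system would use a database. */
final class FaultRuleBase {
    private final Map<String, FaultRule> rules = new LinkedHashMap<String, FaultRule>();

    /** Adds a new rule; returns false if the id already exists. */
    boolean add(FaultRule rule) {
        if (rules.containsKey(rule.id)) {
            return false;
        }
        rules.put(rule.id, rule);
        return true;
    }

    /** Deletes a rule by id; returns false if it was not present. */
    boolean delete(String id) {
        return rules.remove(id) != null;
    }

    /** Replaces an existing rule; returns false if the id is unknown. */
    boolean modify(FaultRule rule) {
        if (!rules.containsKey(rule.id)) {
            return false;
        }
        rules.put(rule.id, rule);
        return true;
    }

    /** Returns all rules whose symptom text contains the keyword. */
    List<FaultRule> query(String keyword) {
        List<FaultRule> hits = new ArrayList<FaultRule>();
        for (FaultRule rule : rules.values()) {
            if (rule.symptom.contains(keyword)) {
                hits.add(rule);
            }
        }
        return hits;
    }
}
```

Keeping `add` and `modify` separate makes an accidental overwrite of an expert rule visible to the caller, which matters when rules are updated dynamically.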

Report Generation and Release Subsystem
The report generation and release subsystem generates the malfunction investigation report from a standard template and releases it by SMS and e-mail.

Design of Fault Diagnosis Rule Base
The FENGYUN-3 meteorological satellite data acquisition system consists of the management and control subsystem (M&C), the antenna subsystem, the channel subsystem, the storage and transmission subsystem, and the time calibration subsystem [6]. Among them, the M&C is the control center of the system. It assigns tasks to all data receiving devices based on the receiving timetable and prior agreement (including antenna polarization mode, converter frequency, demodulator parameters, etc.) and guides data reception and transmission automatically [7].

Design and Implementation of the Reasoning Machine
According to the symptom data provided by the user, the reasoning machine adopts a certain strategy to call the corresponding knowledge in the knowledge base [8], then analyzes and diagnoses until the root cause of the malfunction is located [9].
The reasoning machine works in two ways: automatic fault diagnosis and rule-based fault diagnosis.
(1) Automatic fault diagnosis. A parent node whose characteristics resemble the actual fault phenomenon is taken as the root node. The leaf nodes in its subtree are searched by traversing the fault tree, retrieving on the relationship between the fault features of the root and the leaf nodes, until the root cause is identified and located.
(2) Rule-based fault diagnosis. This mode suits situations in which the fault-related knowledge is clear and detailed. The reasoning engine performs a comprehensive identification based on mathematical statistics and critical rules and gives a list of diagnostic results based on the available failure data. It then selects the result with the largest certainty factor, that is, the node in the knowledge base whose signature is closest to a fault tree leaf, as the result of the fault diagnosis, which improves diagnosis efficiency.
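The subtree search used in automatic fault diagnosis can be sketched as a depth-first traversal. The paper does not specify the traversal order or how features are compared, so the exact string match, the class names `FaultNode` and `FaultTreeSearch`, and the `feature` field are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** A node of the fault tree; a node without children is a leaf (candidate root cause). */
final class FaultNode {
    final String name;
    final String feature; // hypothetical fault feature attached to the node
    final List<FaultNode> children = new ArrayList<FaultNode>();

    FaultNode(String name, String feature) {
        this.name = name;
        this.feature = feature;
    }

    /** Appends a child and returns this node so that trees can be built by chaining. */
    FaultNode add(FaultNode child) {
        children.add(child);
        return this;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

final class FaultTreeSearch {
    /** Collects, depth first, every leaf under root whose feature equals the symptom. */
    static List<FaultNode> matchingLeaves(FaultNode root, String symptom) {
        List<FaultNode> hits = new ArrayList<FaultNode>();
        collect(root, symptom, hits);
        return hits;
    }

    private static void collect(FaultNode node, String symptom, List<FaultNode> hits) {
        if (node.isLeaf()) {
            if (node.feature.equals(symptom)) {
                hits.add(node);
            }
            return;
        }
        for (FaultNode child : node.children) {
            collect(child, symptom, hits);
        }
    }
}
```

A caller would pick the parent node that resembles the observed fault phenomenon, pass it as `root`, and treat the returned leaves as candidate root causes.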
For example, as shown in Figure 4, rule matching on the fault symptom leads to one of the following situations.
If the match succeeds only once, say for leaf N223, the probabilities of the nodes and leaves at each level are given by Eq. (1), and the probability of the other nodes and leaves by Eq. (2). Leaf N223 is then taken as the final result of the fault diagnosis and located as the root cause.
If the match succeeds several times, say for leaf N223, leaf N224 and leaf N231 belonging to node N23, the probabilities of the nodes and leaves at each level are given by Eqs. (3) to (5), and the probability of the other nodes and leaves by Eq. (6).
If none of the matches succeeds, the probabilities of the nodes and leaves at each level are given by Eq. (7).
After this, the matching probability results are put in the statistical table and compared, and the largest one is taken as the result of the fault diagnosis.
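The final comparison step, selecting the largest entry of the statistical table, can be sketched as below. The probabilities themselves come from the equations of the paper and are treated here as given inputs. The class name `DiagnosisTable` and the tie-breaking choice (the first entry wins) are assumptions.

```java
import java.util.Map;

final class DiagnosisTable {
    /**
     * Returns the name of the node or leaf with the largest matching probability,
     * or null if the table is empty. On ties, the entry iterated first is kept.
     */
    static String mostLikely(Map<String, Double> table) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : table.entrySet()) {
            if (entry.getValue() > bestProbability) {
                bestProbability = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Passing a `LinkedHashMap` keeps the iteration order equal to the insertion order, which makes the tie-breaking behavior of this sketch predictable.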

Flow Diagrams of Algorithm
As shown in Figure 4, the algorithm proceeds as follows:
Start the file scanning process, which monitors the FTP folder of the fault diagnosis system and determines whether device logs (.REP) have been transferred.
Check the format of the logs. Display the names of incorrectly formatted files on the GUI (Graphical User Interface).
Display the details of correctly formatted logs on the GUI, including file type, orbit number, satellite name, start time and end time.
Trigger automatic fault diagnosis and rule-based fault reasoning by parsing the equipment quality logs.
Give a list of diagnostic results based on the available failure data.
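The first step of the flow, checking the FTP folder for transferred device logs, can be sketched with the standard `java.io.File` API. This shows a single scan rather than a continuous monitoring process, and the class name `RepFolderScanner` and the case-insensitive extension match are assumptions.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class RepFolderScanner {
    /**
     * Returns the .REP files currently in the folder, sorted by path.
     * An empty list is returned if the folder is missing or unreadable.
     */
    static List<File> findRepFiles(File folder) {
        File[] found = folder.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                // Assumption: the extension is matched case-insensitively.
                return name.toUpperCase(Locale.ROOT).endsWith(".REP");
            }
        });
        List<File> result = new ArrayList<File>();
        if (found != null) { // listFiles returns null when the folder cannot be read
            Arrays.sort(found);
            result.addAll(Arrays.asList(found));
        }
        return result;
    }
}
```

A monitoring process could call `findRepFiles` periodically and hand each file it has not seen before to the format check.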

Application Example
The practical effect of the fault reasoning diagnosis system is demonstrated with application examples.
The fault diagnosis system was developed in Java 1.6 on the Eclipse 3.6 platform. In the first example, the antenna named NPSS1 at the Kiruna ground station (Sweden) broke down on March 29th, 2016. The software interface of the fault diagnosis system (located in the Beijing Integrated Operation Control Center) is shown in Figure 5. After the device log was received and its format checked, the log was parsed and displayed in two ways: text and graphics. Rule-based fault reasoning found an abnormal jump in the difference between the commanded azimuth angle and the actual running azimuth angle of the NPSS1 antenna, whereas the normal value should be 0, so the root cause was located: an antenna azimuth mechanical fault. The diagnosis report was generated by the system automatically. In our actual operations, the fault diagnosis system effectively resolves the contradiction between the increasing complexity of the distributed ground data receiving network of the FY-3 satellites and the lack of skilled operators. In particular, it has unique advantages in remote reasoning about failures at overseas ground stations: it makes up for language and time differences and effectively shortens troubleshooting time. Another example, an antenna stow caused by strong wind in Kiruna (May 5th, 2017), is shown in Figure 6 and Figure 7. After the system worked automatically, operators could start the backup mechanism immediately, which effectively avoided further deterioration.

Figure 1. Framework of Remote Fault Diagnosis System

Figure 2. The structure of fault diagnosis rule base

Figure 4. Flow diagrams of algorithm

Figure 5. Example 1 of the fault diagnosis system

Figure 6. Example 2 of the fault diagnosis system (signal level)