Data Mining of Substation Relay Data: Mythili Chaganti Badrul H. Chowdhury

Data mining of Substation Relay Data


Mythili Chaganti1, Badrul H. Chowdhury, Senior Member, IEEE

1 Mythili Chaganti (email: [email protected]) and Badrul H. Chowdhury (email: [email protected]) are with the Electrical & Computer Engineering Department, University of Missouri-Rolla, Rolla, MO 65409-0040.

Abstract: Data mining aims to make sense of data by revealing meaningful relationships. This paper discusses data mining techniques applied to data recorded by overcurrent relays at several substations. The purpose is to classify faults, verify relay settings and determine fault-induced trips per substation. One of the most demanding tasks at hand is to study the data residing in the relays on a statistical basis. The C4.5 approach to data mining (decision tree induction) is discussed in this paper.

Index terms: Data mining, decision tree, fault classification, overcurrent relays.

I. INTRODUCTION

For many years, until very recently, the traditional method of capturing fault data on power systems at many substations was the cathode-ray oscillograph. As technology improved over the last decade, tremendous changes also occurred in the area of power system measurements. One such revolutionary change is the use of digital relaying equipment, namely microprocessor-based relays, as the main source of data storage. These relays are provided with sophisticated programmable logic equations which contribute to high reliability and reduced cost of maintenance [9]. This superiority in relay functions has encouraged many electric utility companies to install digital relaying equipment. The flip side is that there is now more data at hand to analyze. For this task, data mining tools are becoming popular. Data mining is defined as the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data [7, 8]. This paper studies in detail the data obtained from a utility company and discusses some of the possible data mining techniques applied to this set of data.

II. DATA DESCRIPTION

The data examined in this paper was extracted from the microprocessor-based substation relays of a regional utility company. Data was collected from seven distribution substations covered by the utility. Among the seven stations, six are categorized as rural and one as urban.

The data stored in these relays has the following format:
• Relay Terminal Identifier
This gives information on the particular terminal feeder position.
• Date and Time
Date displays the date stored in the internal calendar when the event occurs. Time shows the time stamp of that particular event.
• Firmware Identification Data (FID)
The FID string format is as follows:
FID = [PN] – R [RN] – V [VS] – D [RD] – E [ER],
where:
[PN] = Product Name (e.g., SEL-251)
[RN] = Revision Number (e.g., R605)
[VS] = Version Specifications (e.g., V656rn1rqyb)
[RD] = Revision Date (e.g., YYMMDD = 9704030)
[ER] = EEPROM Version Specification
• Currents & Voltages
Currents and voltages are measured in amperes and volts respectively on the primary side. These are given for all 44 quarter cycles in an event and appear in the initial 7 columns of an event.
• Overcurrent elements
Three different types of overcurrent elements are mentioned:
  • Phase overcurrent elements for phase and three-phase faults: 51P/51T, 50LT, 50H and 50C come under this category
  • Negative-sequence overcurrent elements for phase-to-phase faults: 51QT/51QP, 51QT/50QP
  • Ground overcurrent elements for ground faults: 51NT/51NP, 50NLT/50NLP, 50NH
• Output & input contacts
A logical ‘1’ at the connection asserts the particular input or output contact.
• Event type
The event field provides an abbreviation of the event type which triggers the particular event and can be one of the following:
ET – External Trigger
ER – Event Report trigger
The phases involved in the event are shown, and they fall under:
  • Phase – Ground faults
  • Phase – Phase faults
  • Double phase – Ground faults
  • 3 – Phase faults
• Location
The Location column of the event report tells us that the fault locator operated successfully, and it shows the equivalent distance to a fault. In the event report these distances appear without any units, and we have the freedom to choose our unit of measure.

• Shot
This column lists the reclosing relay shot when the event report was initiated.
• Targets
Fault type front-row targets are displayed at the initiation of the event report.
• Currents
This field records the maximum phase, negative-sequence and residual currents, in primary amperes, measured near the middle of the fault, or at the time the event was triggered if no fault occurred.
• Settings
All the local and global settings which help the relay to work as per the logic equations are listed at the end of the event report. All the settings are made within their prescribed ranges.

III. RELAY DATA ANALYSIS

All the data obtained are from SEL relays [1]. These relays have different characteristics based on their identification and product name.

Among the 7 substations, data from Stations 1, 2, 3, 4 and 6 are from SEL 251 relays. Station 7 has an SEL 151 relay, which has an additional directional element, and Station 5 has an SEL 251CD relay, which does not consider multiple-shot reclosing.

Our analysis is based only on SEL 251-2, 3 relays. Microsoft Excel's packaged database is used for a preliminary analysis.

Initially, the data recorded from the relay was imported into Excel worksheets and formatted, retaining only the current and voltage values. Excel macros were used to automate the whole process for several events [6].

A. Fault Classification
The Event column of the report gives the type of fault and is used to classify the split per fault type per substation.

By observing the currents on the primary side we can determine the phases involved in a fault. Any particular event has 11 cycles, i.e., 44 quarter cycles, of pre-fault, fault and post-fault current and voltage information. Phase involvement is determined using the load-compensated and uncompensated current magnitudes; that is, for the phases not involved in the fault, the pre-fault data remains the same as the data during the fault.
• The compared currents are taken from two rows at the middle of the stored fault data. If the uncompensated current magnitudes are in large ratios between phases (4:1), the fault type becomes immediately apparent as single phase or two phase.
• If not, the same currents are load compensated by a factor of 1.5. If these load-compensated current magnitudes are in moderate ratios (1.5:1), the relay lists a single-phase or two-phase fault.
• If the currents do not fall into either of these categories, it is a three-phase fault.
These observations are verified using SEL software that produces oscillographic plots for all the events, capturing the 44 quarter cycles and also showing the relay elements in action [1].

Pre-fault data remains the same as the during-fault data for the phases not involved in the fault. Fig. 1 shows a phase-to-phase fault as observed from the SEL 5601 oscillograph. On Phase B we can see that the pre-fault data remains the same as the fault data; hence this phase is not involved and the relay records a CA fault.

Fig. 1. SEL oscillograph of a C-A fault

In Fig. 2, phases A and C have their pre-fault and fault data unchanged. Hence a BG fault is recorded.

Fig. 2. Phase to Ground Fault on B

Whenever 51N (the overcurrent element for ground faults) picks up, we should note a ground fault on one of the three phases. In our preliminary examination of the data, however, we noticed that at Station 7, a bulk sub, the relay determined a phase-to-phase fault even though the currents that picked up strongly suggest a phase-to-ground fault.
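The ratio tests of Section III-A can be illustrated with a minimal Python sketch. It is not the relay's actual algorithm: the function names are hypothetical, and load compensation is modelled here simply as subtracting the pre-fault magnitude from the fault magnitude, which is an assumption because the text does not spell out how the relay compensates for load. Only the 4:1 and 1.5:1 thresholds come from the description above.

```python
from typing import Dict, Optional, Set

LARGE_RATIO = 4.0      # uncompensated test (4:1), from the text
MODERATE_RATIO = 1.5   # load-compensated test (1.5:1), from the text


def involved_by_ratio(mags: Dict[str, float], ratio: float) -> Optional[Set[str]]:
    """Return the phases whose magnitude is within `ratio` of the largest one.

    Returns None when every phase is that close to the largest magnitude,
    i.e. when the test does not separate faulted from unfaulted phases.
    """
    largest = max(mags.values())
    if largest <= 0:
        return None
    involved = {phase for phase, mag in mags.items() if mag * ratio > largest}
    return involved if len(involved) < len(mags) else None


def classify_phases(fault: Dict[str, float], prefault: Dict[str, float]) -> Set[str]:
    """Hypothetical helper: phases involved in a fault, per the three-step rule."""
    # Step 1: uncompensated magnitudes, large ratio.
    involved = involved_by_ratio(fault, LARGE_RATIO)
    if involved is not None:
        return involved
    # Step 2: load-compensated magnitudes, moderate ratio.
    # ASSUMPTION: compensation = fault magnitude minus pre-fault magnitude.
    compensated = {phase: abs(fault[phase] - prefault[phase]) for phase in fault}
    involved = involved_by_ratio(compensated, MODERATE_RATIO)
    if involved is not None:
        return involved
    # Step 3: no separation in either test, so treat it as a three-phase fault.
    return set(fault)


# Made-up magnitudes in primary amperes: phase B stays at its load level.
load = {"A": 300.0, "B": 300.0, "C": 300.0}
print(sorted(classify_phases({"A": 2000.0, "B": 300.0, "C": 2100.0}, load)))
```

With these made-up values phase B is excluded by the 4:1 test, which corresponds to the C-A case of Fig. 1. Real event data would supply the magnitudes from the two middle rows of the fault record.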

B. Sequence of Events
The last twelve events are stored in memory with the latest event at the top. These are helpful in understanding the sequence of events. All the information about date, time, event type, location, reclosing relay shot, maximum phase current, enabled setting group and targets of the past twelve events is stored in a single file. This can also be retrieved by the ‘History’ command of the microprocessor-based relay. Table 1 shows the record of the last twelve events recorded at Station 1 feeder position 53.

TABLE 1 HISTORY OF EVENTS AT STATION 1 FEEDER 53
Event #   Date       Time       Event type   Shot
12        10/25/01   12:27:14   ABT          0
11        10/25/01   12:27:19   AB           1
10        10/25/01   12:27:24   AB           1
9         10/25/01   12:27:25   ABT          1
8         10/25/01   12:27:50   AB           2
7         10/25/01   12:27:51   AB           2
6         10/25/01   12:27:52   AB           2
5         10/25/01   12:28:52   AB           3
4         10/25/01   12:28:53   ABT          3

If the ‘79’ column of an event is in Reset state (R), it represents the beginning of an event, and for this event the shot is always zero. If the next event is then in Reclose cycle state (C), this event is a continuation of the same fault described by the previous event. The shot column gives the number of successful trips and recloses.

From Table 1, it is evident that an AB fault initiates at event 12 and continues until event 4; the whole sequence has 3 successful trips and recloses and the total time is less than 90 seconds. (The difference between the time stamps of events 12 and 4 gives the total time of the fault.)

Event 12 picks up at 12:27:14, and at the end of the 4th cycle the Trip output contact asserts. After a delay of 3 cycles the circuit breaker deasserts and the current falls to zero. If we observe the settings for this event, we can see that input 1 is assigned to the circuit breaker (52A or 52AR), and its operation is followed by the assertion and deassertion of the input.

Event 11 is in reclose cycle state, indicating that it is a continuation of the same fault. At the end of 4 cycles we see that the circuit breaker closes and the overcurrent element picks up for normal current flow. This successful trip and close is recorded in the shot column of event 11.

The same sequence of events is observed at event 10, which also records the same fault.

Event 9 continues the fault sequence, but this time a trip output contact is asserted at the end of 4 cycles and, following the circuit breaker operating time, the breaker opens after a delay of 3 cycles.

We can see the circuit breaker closing in the fourth quarter cycle of event 8, thus increasing the shot column to 2 successful trips and recloses.

We see the relay coming to reset state at the beginning of event 3, implying that until event 4 we had a continuing fault with 3 successful trips and recloses for the whole period of 90 seconds.

Under group settings we find the reclosing relay open interval times, given in seconds. If the sum total of all these times is greater than or equal to the duration of the above fault sequence, then we define that fault as a permanent fault.

In other words, if the relay is in reclose cycle state and before going to reset state it observes a successful close, then the fault falls under the permanent fault category.

Thus we can conclude that the above sequence of events results in a permanent fault at phases A and B for the particular feeder position.

Let us assume an event begins and picks up a fault. If, while the relay is in reclose cycle state, it fails to clear the fault and return to reset state within the given reset interval settings, then the relay goes to lockout state. Such faults are classified as temporary faults.

If an event occurs at a particular time and has two successful holds and recloses, and after ten minutes the same event occurs with similar fault current and similar phases involved, then we say that it is a semi-permanent fault.

With these definitions as criteria, all the data from the relays of type 251-2, 3 are analyzed.

It is seen from our analysis that at Station 1 feeder position 53 the settings as described by the relay are perfectly matched. There are 2 cases in Station 6, positions 19 and 20, where we observed inconsistency in the relay settings. Both events record a lockout state and continue to pick up faults. The instantaneous relay settings of the respective events show that reclosing is disabled. One reason for this is that this station is a bulk sub, which often covers a greater area, and it might experience multiple feeder operations. Also, sometimes due to poor tree trimming and high winds, the phase involved in the fault keeps being burnt away before it restrikes.

C. Fault Inducing Trip
If the output columns ‘Trip’ and ‘Close’ have the logical values 1 and 0 respectively, we say that a trip has occurred in the event. Each trip is designated as caused by the relay (initiated by a fault) or as caused by an external device.

When an event is triggered and pickup of overcurrent elements is recorded, a fault has occurred at the event; if, in this case, the trip output contact is also asserted, the fault has induced the trip condition. The TR equation of the SELogic equations gives information on the particular overcurrent element responsible for the trip. If the TR equation is blank, we can say that no overcurrent trip condition is present.

Often a trip event record is generated when there is no fault. These trips are caused by an external device (e.g., a control switch or SCADA).

D. Fault Level vs. Pickup of relay
The relay determines the fault level by taking values from two consecutive rows, squaring both of them, summing them and taking the square root. This is repeated for all the 44 quarter cycles of data. The magnitude is $\sqrt{(\mathrm{Value}_1)^2 + (\mathrm{Value}_2)^2}$ and the angle is $\arctan\left(\mathrm{Value}_2 / \mathrm{Value}_1\right)$.

The peak magnitude of the fault current is shown in the current column of the event report. The pickup level settings for the group of overcurrent elements are multiplied by the current transformer ratio to get the primary current in amperes. If the current in the data is greater than or equal to the calculated current, those overcurrent elements pick up.

E. Instantaneous Relay Pickup
The SEL-251 relay has a 48-bit relay word which contains information on relay elements, intermediate logic results and programmable logic variables. Six rows of eight variables each make up the 48-bit relay word.

Rows 1 and 2 hold overcurrent elements; row 3 holds reclosing relay states and input contacts; row 4 contains current thresholds and trip contact information; rows 5 and 6 are the programmable logic variables used in defining SELogic equations, which can take logical OR and AND combinations of the elements used in the previous 4 rows.

Out of the sixteen variables in the top two rows, we are interested in the pickup of the instantaneous overcurrent elements. The two instantaneous overcurrent elements are:
• 50H – Phase instantaneous overcurrent element
• 50NH – Ground/Residual instantaneous overcurrent element
For any given event, one of the programmed logic equations tells us which instantaneous setting is to be considered for the pickup of the relay. All the group settings are given on the secondary side of the transformer, and we are given the current transformer ratio to convert them to primary-side amperes and compare them with the pickup level current.

If the currents measured in amperes in an event meet the instantaneous pickup setting of the relay, they are classified as instantaneous relay pickup.

F. Breaker Opening Times
A logical 1 in the trip output column indicates that the relay has tripped. When any event is tripped, the trip output contacts remain asserted for at least the trip duration time TDUR (in cycles) of the global settings. If a trip is initiated, the overcurrent element responsible for the trip drops off, opening the circuit breaker via the programmable tripping variable ‘TR’ of the SELogic equation. The quarter cycles in an event at which the trip happens and at which the circuit breaker opens are noted; the difference between the two gives the breaker opening time.

G. I-squared T Calculations
The fault current given in the currents column of the event, in amperes, is squared and multiplied by the time in cycles for which the fault current remains considerable without dropping to zero; this is the basis for the I-squared T calculations.

IV. DATA MINING TOOLS

Data mining, the extraction of information from large databases, is a powerful technology with great potential to retrieve the most important information in a particular database. The most commonly used techniques in data mining are:
• Artificial neural networks
• Decision trees
• Genetic algorithms
• Nearest neighbor method
• Association rule induction
Even though our data set is not large enough to run a data mining tool, this paper uses the C4.5 technique to generate decision trees and rules. To get a data set of considerable size, we combine several events and execute the algorithm on the entire data set. The algorithm is used to classify the faults and to determine the number of instances in which a fault is correctly classified given the currents and voltages [3].

A. C4.5 Methodology
The entire data set is represented in rows and columns; each row is called a tuple and each column holds the value of a particular attribute. These values can be either continuous (numeric) or categorical. In our present data set, the attributes are the currents in amperes and the voltages in volts of all three phases, and the tuples are the quarter cycles in a particular event, i.e., 44. Since 44 data instances make a very small data set, as mentioned earlier, we combined 12 events to make 528 tuples per data set. We have defined our predicting attributes as the currents and voltages, but we must also define our goal or target attribute to discover any relationship between them. In our case, the target attribute is the event type, which gives the fault classification of an event.

The data set is now divided into 2 sets: the training set, on which the algorithm is executed and the C4.5 model is built, and the test set, on which the model is used to assess how well it performs after training. Accuracy is the ratio of the number of tuples in the test data classified correctly to the total number of tuples in the test data.

For the C4.5 algorithm to be executed, the data should be presented in a particular format, namely a “.data file”, where each line of the file describes one tuple, providing the values of all attributes and then the case’s target value, all separated by commas and terminated by a period.

There should also be a “.names file”, which provides names for the attributes and their types, starting with the target attribute. The attribute values in the .data file should appear in the same order as defined in the .names file [4].

The C4.5 algorithm then takes these files and generates decision trees and rules, which are explained in the next section.

V. RESULTS
The pie diagrams shown in Figs. 3 to 7 represent the breakdown of faults by type per substation.
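The tuple layout described in Section IV-A (attribute values first, target value last, comma separated and terminated by a period) can be sketched in a few lines of Python. The helper names `to_data_line` and `build_data_lines` are hypothetical, and the example values are placeholders rather than relay data; the sketch only shows how 12 events of 44 rows each would become 528 lines of a .data file.

```python
from typing import Iterable, List, Sequence, Tuple

# One tuple of the data set: (attribute values, target label).
Row = Tuple[Sequence[float], str]


def to_data_line(attributes: Sequence[float], target: str) -> str:
    """Format one tuple: attributes, then the target, ended with a period."""
    fields = [str(value) for value in attributes]
    fields.append(str(target))
    return ", ".join(fields) + "."


def build_data_lines(events: Iterable[Iterable[Row]]) -> List[str]:
    """Concatenate the rows of several events into one list of .data lines."""
    return [
        to_data_line(attributes, target)
        for event in events
        for attributes, target in event
    ]


# Placeholder input: 12 events, each with 44 identical quarter-cycle rows.
example_events = [[((0, 0, 0, 2), "3")] * 44 for _ in range(12)]
lines = build_data_lines(example_events)
print(len(lines))  # 528
print(lines[0])    # 0, 0, 0, 2, 3.
```

Keeping the target as the last field matches the description above; the order of the attribute fields would have to follow the order declared in the .names file.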

[Pie charts. Legend categories: Line - Line, Line - Ground, Double Line - Ground, 3-Phase, External Trips, No Fault.]

Fig. 3. Break down of fault at Substation 1
Fig. 4. Break down of fault at Substation 2
Fig. 5. Break down of fault at Substation 3
Fig. 6. Break down of fault at Substation 4
Fig. 7. Break down of fault at Substation 5

As mentioned in Section II, all the substations are classified under urban or rural categories. Figs. 8 and 9 show the break down of faults in these categories respectively.

Fig. 8. Break down of fault at Urban Category
Fig. 9. Break down of fault at Rural Category

The total number of events available per station and their division is shown in Table 2.

TABLE 2 STATIONS AVAILABLE AND THEIR CATEGORY
Station Number   Category   Number of events
Station 1        Rural      28
Station 2        Rural      39
Station 3        Rural      37
Station 4        Rural      24
Station 6        Urban      149

The total number of permanent and temporary faults calculated from the five different substations is shown in Table 3.

TABLE 3 PERMANENT AND TEMPORARY FAULTS PER SUBSTATION
Number of             Station 1   Station 2   Station 3   Station 4   Station 6
Permanent Faults      1           1           3           -           3
Temporary Faults      -           -           1           1           2
Fault induced trips   7           14          13          9           44

The ratios of instantaneous relay pickups to fault level pickups per station are shown in Table 4.

TABLE 4 RATIO OF INSTANTANEOUS PICK UP TO FAULT LEVEL PER STATION
Station #   Instantaneous Pick up/Fault Level Pick up
1           18/27
2           28/39
3           28/35
4           16/23
6           111/132

Average breaker opening time per substation is shown in Table 5.

TABLE 5 AVERAGE BREAKER OPENING TIMES
Station #   Average breaker opening time
1           12 cycles
2           8 cycles
3           8 cycles
4           13 cycles
6           11 cycles

A sample C4.5 ‘.names’ file is shown below:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.

IR: continuous.
IA: continuous.
IB: continuous.
IC: continuous.
The first line is the target attribute, which takes all the possible faults, and the next four lines are the attributes which are helpful in predicting the target; they all take continuous values.

A sample ‘.data’ file is shown below:
0, 0, 0, 2, 3.
-2, 0, 0, -5, 3.
2, 0, 0, 0, 3.
2, 0, -2, 5, 3.
0, 0, 2, 0, 3.
-2, 0, 2, -5, 3.
-2, 0, -2, 0, 3.
2, 0, 0, 2, 3.
0, 0, 0, 0, 3.
-2, 0, 0, 0, 3.
2, 0, 0, 0, 3.
2, 0, 0, 0, 3.
7, 44, -85, 49, 3.
212, -107, -66, 392, 3.
107, -258, 248, 114, 3.
-615, 255, 102, -978, 3.
-92, 421, -248, -265, 3.
Each row represents a different case; the first four columns are the predicting attributes and the last column is the target attribute, ended with a period. The default option to execute the C4.5 algorithm is:
C4.5 -f res.names
where res.names is the name of our .names file.

The next step is to observe the confusion matrix for the accuracy of our prediction. Table 6 shows the confusion matrix obtained by executing the C4.5 algorithm.

In the confusion matrix shown in Table 6, the diagonal elements are the cases which are correctly classified and the off-diagonal elements are those which are falsely classified. So there are 40 cases in which a fault is CG and is classified correctly, and 9 cases in which it is a CG fault but is classified as an AB fault. Accuracy is thus obtained as the ratio of the number of correctly classified faults to the total number of cases observed. In our case the accuracy is 90.45 %. This paper considers cases where the data was near perfect, but calculating the robustness would help to identify outliers in the data, such as noise.

TABLE 6 CONFUSION MATRIX OBTAINED FROM C4.5 ALGORITHM

1 2 3 4 5 6 7 8 9 10 11
1
2
3 40 4
4
5
6 79
7 9
8
9 3 5 80
10
11
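As a check on the reported accuracy, the counts in Table 6 can be summed in a short Python sketch. The column positions of the entries are not recoverable from the table as reproduced here, so the split into diagonal and off-diagonal counts below is an assumption: 40, 79 and 80 are taken as correctly classified cases and 4, 9, 3 and 5 as misclassified ones. This agrees with the text for the values 40 and 9 and reproduces the stated 90.45 %.

```python
from typing import Iterable


def accuracy(correct_counts: Iterable[int], misclassified_counts: Iterable[int]) -> float:
    """Ratio of correctly classified cases to all cases in a confusion matrix."""
    correct = sum(correct_counts)
    total = correct + sum(misclassified_counts)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


# ASSUMPTION: which Table 6 entries lie on the diagonal (see the note above).
diagonal = [40, 79, 80]
off_diagonal = [4, 9, 3, 5]

acc = accuracy(diagonal, off_diagonal)
print(f"{acc:.2%}")  # 90.45%
```

The same function applies to any confusion matrix once its diagonal and off-diagonal entries have been separated.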

VI. CONCLUSIONS
Any substation which has microprocessor-based relays generates large amounts of data in a very short time. In such situations, one is faced with large data sets but little information, and no knowledge at all. Exploration and analysis of the data by automatic means or tools such as Excel and C4.5 provide us with interesting patterns and allow the discovery of meaningful rules. With this statistical and graphical analysis tool, a utility can predict when an outage occurs at a particular substation and how well its relays are performing. Data mining helps in fault classification when the event type is unknown. It also helps one to validate relay settings. In situations where the data is not consistent, data mining proves to be highly beneficial. Data mining results can help improve system-wide performance, which eventually leads to improved customer relations.

VII. ACKNOWLEDGMENT
The authors gratefully acknowledge Schweitzer Engineering Labs of Pullman, WA for their generous donation of SEL 5601, their relay analysis software.

VIII. REFERENCES
[1] Schweitzer Engineering Labs, SEL 251-2, 3 Instruction Manual, 2000.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[3] D. St. Clair, “CS 404 Data Mining and Knowledge Discovery,” University of Missouri-Rolla, Fall 2003 Course Notes. http://web.umr.edu/~stclair/
[4] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[5] P.M. Anderson, Power System Protection, IEEE Press Power Engineering Series, 1999.
[6] Microsoft Excel, Visual Basic User’s Guide, Version 5.0, 1993.
[7] S. Madan, Son Won-Kuk, K.E. Bollinger, “Applications of data mining for power systems,” 1997 IEEE Canadian Conference on Electrical and Computer Engineering, 25-28 May 1997.
[8] J. Mullich, “Data Mining: Making Data Meaningful,” Computer, vol. 30, no. 9, Sept. 1997, pp. 18.
[9] K. Zimmerman, Microprocessor-Based Distribution Relay Applications, SEL Inc., Belleville, IL, USA.

IX. BIOGRAPHIES
Mythili Chaganti obtained her Bachelor of Technology in Electrical Engineering from Jawaharlal Nehru Technological University, India, in 2000, and she is currently pursuing her MSEE at the University of Missouri-Rolla, specializing in Power Systems Engineering under the guidance of Dr. Badrul Chowdhury. Her Master's thesis focuses on data mining algorithms for fault analysis on substation relays.

Badrul H. Chowdhury obtained his M.S. and Ph.D. degrees in Electrical Engineering from Virginia Tech, Blacksburg, VA in 1983 and 1987 respectively. He is currently a Professor in the Electrical & Computer Engineering department of the University of Missouri-Rolla. From 1987 to 1998 he was with the University of Wyoming’s Electrical Engineering department, where he reached the rank of Professor. Dr. Chowdhury’s research interests are in power system modeling, analysis and control; power electronics and drives.
