Date on Master's Thesis/Doctoral Dissertation


Document Type

Master's Thesis

Degree Name

M. Eng.


Computer Engineering and Computer Science

Committee Chair

Kantardzic, Mehmed


Electric power-plants--Automation


Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government-funded agency, exercises both legislative and judicial powers over emissions monitoring in the United States. The agency writes regulations and enforces them so that companies operate within the limits of the law, resulting in environmentally safe operation. Specifically, power companies operate electric generating facilities under guidelines drawn up and enforced by the EPA. Because of acid rain and other harmful effects, electric generating facilities must report hourly emissions recorded via a Supervisory Control and Data Acquisition (SCADA) system. SCADA is a control and reporting system, present in all power plants, that consists of sensors and control mechanisms monitoring all equipment within the plant. The data recorded by a SCADA system are collected by the EPA and allow it to enforce proper plant operation with respect to emissions. These data include many details specific to each generating unit and power plant, including hourly generation. This hourly generation (termed grossunitload by the EPA) is the actual hourly average output of the generator on a per-unit basis. The questions to be answered are: do any of these units operate in tandem, and do any units start, stop, or change operation as a result of another unit's change in generation? These questions will be answered for the period from April 2002 through April 2003 for facilities that operate pipeline natural-gas-fired generating units.

Purpose of Research: The research conducted has dual uses if fruitful. First, a local model relating generating units would be highly profitable to energy traders, who could bet that a plant will operate a unit based on another unit's current characteristics. This profitability varies with fuel type.
For instance, if the price of coal is extremely high due to shortages, knowing a semi-operating characteristic of two generating units is highly valuable. Second, this known characteristic can also be used in regulation and operational modeling, a use of great importance to government agencies. If regulatory committees are aware of past (or current) similarities between power producers, they may be able to avoid a power struggle in a region caused by greedy traders or companies. Setting profit motives aside, the Department of Energy may use something similar to build a model of power grid generation availability, based on previous data, for reliability purposes.

Type of Problem: The problem tackled within this Master's thesis is one of pattern recognition across multiple time series. This field is expansive and well studied; the research performed therefore benefits from previously known techniques. The author has chosen to experiment with conventional techniques such as correlation, principal component analysis, and k-means clustering for feature extraction and, eventually, pattern extraction. For the primary analysis performed, the author chose to use a conventional sequence discovery algorithm. The sequence discovery algorithm has no prior knowledge that would limit the search space; it therefore searches the entire space, which is expensive but complete. Prior to sequence discovery the author applies a uniform coding schema to the raw data, which is an adaptation of a coding schema presented by Keogh. This coding and discovery process is termed USD, or Uniform Sequence Discovery. The data are high-dimensional, and extremely dynamic and sporadic in magnitude. The energy market that demands power generation is driven by profit and, to some extent, by reliability.
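The coding and discovery steps described above can be illustrated with a minimal sketch. This is not the thesis's USD implementation, whose details are not given here; it assumes equal-width bins over each unit's observed load range as the "uniform" coding, a four-letter alphabet, and fixed-length contiguous subsequences. The function names `uniform_code` and `frequent_sequences` are hypothetical.

```python
from collections import Counter


def uniform_code(values, alphabet="abcd"):
    """Map each hourly load value to a symbol using equal-width bins.

    Assumption: bins span [min(values), max(values)] evenly, one bin
    per symbol in `alphabet`. A flat series maps to the first symbol.
    """
    n_bins = len(alphabet)
    lo, hi = min(values), max(values)
    if hi == lo:
        return alphabet[0] * len(values)
    width = (hi - lo) / n_bins
    symbols = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


def frequent_sequences(coded, length, min_count=2):
    """Count every contiguous subsequence of the given length.

    The search is exhaustive over all windows (complete but costly),
    and keeps subsequences seen at least `min_count` times.
    """
    windows = (coded[i:i + length] for i in range(len(coded) - length + 1))
    counts = Counter(windows)
    return {seq: c for seq, c in counts.items() if c >= min_count}


# Hypothetical hourly gross unit load for one unit (MW).
load = [0, 0, 50, 100, 0, 0, 50, 100]
coded = uniform_code(load)
patterns = frequent_sequences(coded, length=3)
```

Coding each unit's series this way puts units of very different capacity on a common symbolic scale, so repeated start-up or ramp patterns can be compared across units by comparing strings.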
The obvious factors are more reliability based; for instance, to keep system frequency at 60 Hz, units may operate in an idle state, resulting in a constant or very low value for a period of time (idle time). Also, to avoid large frequency swings on the power grid, companies are required