Ensuring Quality Data Analysis Using the Three C’s of Data Quality – Consistent, Complete and Correct
Consistent – Using the Same Nomenclatures, Measurement Units and Data Structures to Allow Comparisons
Consistency among different data sources can be improved by establishing data standards. This can include common nomenclatures for equipment components, performance metrics, and failure modes and mechanisms. Manual procedures are often required initially to translate data from various sources into the common framework. Over time, the data providers can adopt the common framework and submit data in the standard format.
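The translation step described above can be sketched in Python. This is only an illustration under stated assumptions: the synonym table, the field names (`failure_mode`, `flow_rate`, `flow_unit`) and the choice of cubic metres per day as the standard unit are hypothetical and would be replaced by whatever the agreed data standard specifies.

```python
# Hypothetical synonym table: provider-specific wording -> standard failure-mode name.
FAILURE_MODE_SYNONYMS = {
    "motor burn": "motor_electrical_failure",
    "burnt motor": "motor_electrical_failure",
    "shaft broke": "shaft_fracture",
    "broken shaft": "shaft_fracture",
}

# Conversion factors to the assumed standard unit (cubic metres per day).
# One US oil barrel is 0.158987294928 cubic metres.
FLOW_TO_M3_PER_DAY = {
    "m3/d": 1.0,
    "bbl/d": 0.158987294928,
}


def standardize(record):
    """Return a copy of a provider record translated into the common framework."""
    mode_key = record["failure_mode"].strip().lower()
    unit = record["flow_unit"].strip().lower()
    # Refuse to guess: unmapped terms go back for manual review.
    if mode_key not in FAILURE_MODE_SYNONYMS:
        raise ValueError(f"unmapped failure mode: {record['failure_mode']!r}")
    if unit not in FLOW_TO_M3_PER_DAY:
        raise ValueError(f"unmapped flow unit: {record['flow_unit']!r}")
    out = dict(record)
    out["failure_mode"] = FAILURE_MODE_SYNONYMS[mode_key]
    out["flow_rate"] = record["flow_rate"] * FLOW_TO_M3_PER_DAY[unit]
    out["flow_unit"] = "m3/d"
    return out
```

Raising an error on unmapped terms, rather than passing them through, keeps the manual translation effort visible: each new term is reviewed once, added to the table, and handled automatically afterwards.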
Complete – With a Sufficient Number of Descriptors to Characterize Equipment Configurations, Operating Conditions and Failure Modes
Complete data sets are difficult to find at first. It is important to define a minimum data set that contains the basic information required for the analysis and that is generally available from all data sources. A broader data set can also be defined to capture additional information, where available, that supports more detailed analysis. Incomplete data records may need to be excluded from the analysis until the gaps are filled, which gives data providers a strong incentive to submit complete records.
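A minimum-data-set check of this kind could look roughly like the following Python sketch. The field names in `MINIMUM_FIELDS` and `EXTENDED_FIELDS` are hypothetical placeholders; the actual lists depend on the equipment type and the analysis being performed.

```python
# Hypothetical field lists; real ones depend on the equipment type and analysis.
MINIMUM_FIELDS = ("equipment_id", "pump_model", "install_date", "failure_mode")
EXTENDED_FIELDS = ("operating_temperature", "fluid_viscosity")


def _is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_fields(record, fields):
    """List the fields from `fields` that are absent or blank in `record`."""
    return [name for name in fields if _is_missing(record.get(name))]


def split_by_completeness(records):
    """Separate records that satisfy the minimum data set from those that do not.

    Rejected records are returned with the list of missing fields so the
    gaps can be reported back to the data provider.
    """
    accepted, rejected = [], []
    for rec in records:
        missing = missing_fields(rec, MINIMUM_FIELDS)
        if missing:
            rejected.append((rec, missing))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Only the minimum fields decide whether a record is excluded. Calling `missing_fields(rec, EXTENDED_FIELDS)` on an accepted record shows which of the optional, more detailed analyses it can support.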
Correct – Accurately Representing the Equipment Configuration and Operating Conditions
Correct data is required to increase confidence in the analysis results. Correctness can be checked by comparing related fields within a data record against one another. For instance, for a pump installation in an oil well, each record can be verified by checking the following:
- Does the specified pump model fit inside the specified well casing size?
- Is the reported flow rate within the normal operating range of the specified pump?
- Is the pump pressure rating consistent with the specified pump installation depth?
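The three checks above can be sketched as a single Python function. Everything specific here is an assumption made for illustration: the catalog entry `PX-100` and its numbers are invented, the record field names are hypothetical, and the depth check uses a deliberately simple criterion (the pressure rating must be at least the hydrostatic pressure of a fresh-water column at the installation depth). A real review would use the manufacturer's data and the operator's own criteria.

```python
from dataclasses import dataclass

G = 9.80665                     # standard gravity, m/s^2
ASSUMED_FLUID_DENSITY = 1000.0  # kg/m^3, fresh water (assumption)


@dataclass(frozen=True)
class PumpSpec:
    outer_diameter_mm: float
    min_flow_m3_per_day: float
    max_flow_m3_per_day: float
    pressure_rating_mpa: float


# Invented catalog entry, for illustration only.
PUMP_CATALOG = {
    "PX-100": PumpSpec(
        outer_diameter_mm=101.6,
        min_flow_m3_per_day=50.0,
        max_flow_m3_per_day=300.0,
        pressure_rating_mpa=25.0,
    ),
}


def check_record(record, catalog=PUMP_CATALOG):
    """Return a list of correctness issues found in one installation record."""
    spec = catalog.get(record["pump_model"])
    if spec is None:
        return ["unknown pump model"]

    issues = []
    # 1. Does the pump fit inside the casing?
    if spec.outer_diameter_mm >= record["casing_inner_diameter_mm"]:
        issues.append("pump does not fit inside casing")
    # 2. Is the reported flow rate within the pump's operating range?
    flow = record["flow_rate_m3_per_day"]
    if not spec.min_flow_m3_per_day <= flow <= spec.max_flow_m3_per_day:
        issues.append("flow rate outside pump operating range")
    # 3. Is the pressure rating consistent with the installation depth?
    hydrostatic_mpa = ASSUMED_FLUID_DENSITY * G * record["install_depth_m"] / 1e6
    if spec.pressure_rating_mpa < hydrostatic_mpa:
        issues.append("pressure rating below hydrostatic pressure at depth")
    return issues
```

Returning a list of issues rather than a single pass/fail flag means every problem in a record can be reported to the data provider at once.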
We use a rigorous data review process to identify data records that do not meet the Consistent, Complete and Correct criteria. Initially the review may be manual, but over time the checking procedures can be standardized and automated to speed it up. Typically, any shortcomings are flagged so that the data provider has an opportunity to correct the data and have it included in the analysis.
Establishing an efficient data collection process from many sources can take many years. To accelerate this process, we have developed a generic data collection, checking and analysis software tool that can be adapted to a broad range of equipment types.