Drilling Rigs & Automation, January/February 2020

Combining disparate data sources powers rig performance initiatives

Contractor overcomes obstacles in data aggregation, sourcing and storage to create unified data model, mine data for value

By Alex Groh and Brian Seiler, Patterson-UTI Drilling Company

Diagram of systems used to combine data from multiple data sources, types and uses.

In recent years, drilling contractors and operators alike have devoted increasing attention to the data generated during the drilling process. However, the growing number of sensors and systems that collect data, both on- and off-location, has created challenges in converting these large volumes of data into actionable value.

Applying data analytics techniques to data sets collected from rig sensors has been shown to yield fruitful results, with real-time data processing, equipment condition monitoring and machine learning becoming more prevalent.

However, significant challenges remain in terms of collecting, aggregating and storing data in an easily digestible form so that people can extract value. Drilling contractors, in particular, face significant and specific challenges when it comes to making use of these large and diverse data sets in a unified way.

Sources of Data

Some of the largest challenges that contractors face derive from the number of types and sources of data generated by a drilling business unit. Rig sensor data from electronic drilling recorder (EDR) systems, along with the analytics that can be performed on them, have received considerable attention. However, the drilling process generates data from many other sources that hold value, yet are not easily combined in analysis of EDR sensor data. These include:

• Rig automation and control systems;

• IADC and other daily reports;

• HSE records;

• Financial data;

• Maintenance systems;

• Quality systems; and

• Personnel/HR records.

The largest source of data by volume is the rig control systems, which are typically sampled at the millisecond scale across thousands of traces, containing sensor, alarm, setpoint and other data. Control systems data alone comprise over 99.9% of the total volume generated by a drilling rig.

EDRs produce the next greatest quantity, typically sampling at 1 Hz, and come in a variety of forms. Daily reports, which often have a combination of standardized conventions and unstructured free text, may be the most relied-upon source for office personnel.

Data becomes even more specialized when considering other sources, with HSE, financial, maintenance, quality and personnel records all demonstrating substantial variations in terms of data type and frequency.

Of the sources described, control systems and EDR content present the most issues with respect to data volume and difficulty of storage. Yet, each type of data offers distinct insight into the drilling process and, when combined, can yield powerful results.

For example, it is impossible to create an equipment condition monitoring program, in which equipment life and failures are predicted based on sensor values, without utilizing both sensor data and maintenance records indicating the incidence and character of failures. Sensor data, when merged with safety incident reports, provide insight into the actions taken by the rig crew that led to the incident.

It is also difficult to optimize many key performance indicators (KPIs) that are heavily affected by crew composition if specific crew members cannot be correlated with performance metrics. Connection time is one example.

These applications demonstrate how the combination of data from disparate sources offers insight into rig performance and opportunities for performance improvement.
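The condition-monitoring case can be made concrete with a small sketch. The Python below is illustrative only and assumes that sensor samples and maintenance failure records both carry timestamps in seconds; the function name `label_pre_failure` and the `horizon_s` parameter are hypothetical and are not taken from the system described in this article. It flags each sensor sample that falls within a chosen window before a recorded failure, which is the kind of labeled data set a failure-prediction model would need.

```python
from bisect import bisect_left


def label_pre_failure(sensor_times, failure_times, horizon_s):
    """Flag sensor samples taken within horizon_s seconds before a failure.

    sensor_times  : sample timestamps in seconds (assumed, from sensor data)
    failure_times : failure timestamps in seconds (assumed, from maintenance records)
    Returns a list of booleans, one per sensor sample.
    """
    failures = sorted(failure_times)
    labels = []
    for t in sensor_times:
        i = bisect_left(failures, t)  # index of first failure at or after t
        labels.append(i < len(failures) and failures[i] - t <= horizon_s)
    return labels


# Hypothetical data: one failure at t=120 s, 60 s look-back window.
print(label_pre_failure([0, 50, 100, 150, 250], [120], 60))
# -> [False, False, True, False, False]
```

Without the maintenance records there would be nothing to label the sensor samples with, which is why neither source is sufficient on its own.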


The average rig score trend from January to September 2019.

Historically, company personnel have not had easy access to these data sets through a centralized interface. More often than not, different data sources are siloed into different systems for aggregation, storage, analysis and presentation to end users. There is typically little to no cross-accessibility between sources.

Consider personnel in operations management and the different sources of data they may need to access from day to day. All the sources described so far provide relevant data to rig management, but there may be different user interfaces, as well as data and report formats, that do not easily merge to offer concise, digestible and actionable information. This creates challenges for a manager tasked with making the best use of available data and communicating it to employees effectively. Across the fleet, different rigs may operate different EDR systems, which means separate web-based portals with their own log-in credentials, different trace names for the same sensor type from system to system, and varying levels of data quality among providers.

Further complicating matters, the EDR does not typically collect rig controls data, requiring retrieval from an entirely different system. Financial and asset-level data may be housed in an enterprise resource planning (ERP) system, with personnel and HSE records found in yet another location. All of this makes it inefficient to collectively compare all factors that should be considered when working to improve overall rig performance.

This problem suggests an obvious solution, although it is not simple to execute: collect, aggregate and analyze all data on a single, easy-to-use platform that is accessible to all company personnel. Implementing such a system presents at least three major challenges.

Operational KPIs have improved consistently since the release of the rig scorecard in April 2019. Exact values of operational KPIs have been removed for confidentiality.

First, existing systems must connect to a single centralized data store. Data from structured systems can stream in using different formats and protocols, such as WITS, WITSML, LAS and FTP. Rig control systems require several different communications protocols to transfer data, such as Modbus, FINS, Profibus and OPC UA, all of which require particular handling in order to transmit data properly.

Most importantly, all of these data sources must undergo transformation to a single shared metadata structure in order to successfully relate information from disparate systems. Further, all data should pass through layers of automated and manual validation to ensure quality.
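As a rough illustration of what a shared metadata structure involves, the Python sketch below maps provider-specific trace names onto one canonical name and unit. Everything in it is assumed for the example: the provider labels (`edr_a`, `edr_b`), the raw trace names, the canonical names and the `Sample` record are hypothetical and do not describe the contractor's actual schema.

```python
from dataclasses import dataclass

# Hypothetical mapping: (provider, raw trace name) -> (canonical name, unit, scale factor).
TRACE_MAP = {
    ("edr_a", "HKLD"): ("hook_load", "klbf", 1.0),
    ("edr_b", "HookLoad_kN"): ("hook_load", "klbf", 0.2248089),    # kN -> klbf
    ("edr_a", "BPOS"): ("block_position", "ft", 1.0),
    ("edr_b", "BlockHeight_m"): ("block_position", "ft", 3.28084),  # m -> ft
}


@dataclass(frozen=True)
class Sample:
    rig_id: str
    timestamp: float  # seconds since epoch (assumed convention)
    trace: str        # canonical trace name
    value: float
    unit: str


def normalize(provider, rig_id, timestamp, raw_name, raw_value):
    """Return a Sample in the shared schema, or None if the trace is unmapped."""
    entry = TRACE_MAP.get((provider, raw_name))
    if entry is None:
        return None  # unmapped traces would be held back for manual review
    name, unit, factor = entry
    return Sample(rig_id, timestamp, name, raw_value * factor, unit)


a = normalize("edr_a", "rig-1", 0.0, "HKLD", 250.0)
b = normalize("edr_b", "rig-2", 0.0, "HookLoad_kN", 1000.0)
print(a.trace == b.trace)  # both providers now report "hook_load"
```

Returning `None` for unmapped traces, rather than passing them through, is one way to route unknown inputs to the manual validation layer mentioned above.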

Second, a large, scalable and structured storage system must be constructed that is capable of addressing large volumes of data while allowing quick and simple querying. It is also important that this system can ingest incoming streams and expose them for technical development and analysis.

Third, this system must present effective and easily accessible interfaces operable by users of varying technical skills. Implementation through a web-based platform that is optimized for speed and efficiency can provide superior user experience, particularly for field users who may not have fast or reliable network connectivity.

Building a Unified Platform

A cloud-based architecture was constructed to serve as a strong foundation for addressing these challenges. Several different protocols communicate between the EDR provider, control systems and third-party equipment on rig locations to receive data and forward it to the cloud stream. This data is then merged with ERP data, data from other software providers, and any data provided by users (well plans, formation information, etc.) into a centralized data lake.

An ingestion process consumes the data, performs initial data quality checks and applies any interpolation or basic enrichment before moving into more permanent structured storage. This process separates relational, time series and completely unstructured data and places each into appropriate stores, with complex analytical and enrichment operations taking place in a specialized database.
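The quality-check and interpolation step might look something like the following Python sketch. It assumes a single trace sampled once per second with missing readings stored as `None`; the function name `clean_trace`, the range limits and the `max_gap` parameter are hypothetical choices for illustration, not details of the ingestion process described above.

```python
def clean_trace(samples, lo, hi, max_gap=5):
    """Range-check a 1 Hz trace and linearly interpolate short gaps.

    samples : list of floats or None, one entry per second (assumed layout)
    lo, hi  : plausible physical range; values outside it are treated as missing
    max_gap : longest run of missing samples that will be filled
    Gaps at the start or end of the list, or longer than max_gap, stay None.
    """
    vals = [v if v is not None and lo <= v <= hi else None for v in samples]
    out = list(vals)
    n = len(vals)
    i = 0
    while i < n:
        if vals[i] is not None:
            i += 1
            continue
        j = i
        while j < n and vals[j] is None:  # find the end of this gap
            j += 1
        if i > 0 and j < n and (j - i) <= max_gap:
            left, right = vals[i - 1], vals[j]
            span = j - (i - 1)
            for k in range(i, j):
                out[k] = left + (k - (i - 1)) / span * (right - left)
        i = j
    return out


# The 9999.0 reading fails the range check and is interpolated like a gap.
print(clean_trace([10.0, None, 30.0, 9999.0, 50.0], lo=0.0, hi=100.0))
# -> [10.0, 20.0, 30.0, 40.0, 50.0]
```

Capping the gap length keeps the step from inventing data across long outages; those stretches are left empty for downstream analysis to handle.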

A warehousing process periodically aggregates and stores data in a reporting warehouse optimized for access by web-based applications. Outside access to the data passes through a security layer to ensure that any consumer can only access data for which they have authorized credentials. Finally, users access the processed data through online dashboards and can use secured connections to contribute to the exposed data set with data uploads.

Case Study: Rig Scorecard

Working from this basic architecture, a scorecard was devised as a tool to standardize targets for certain KPIs and rank rigs based on performance. Six aspects of rig performance drove the selection of KPIs, to balance the priorities of internal and external stakeholders and to maintain a focus on leading indicators:

• HSE: high-potential incident rates, rates of late or incomplete reporting, rig inspection scores;

• Operational performance: connection times, tripping speeds, rig walk/move times;

• Equipment: completed work orders, equipment-related NPT;

• Financial: daily spend by rig, monthly spend on capital projects;

• Personnel: monthly rig turnover; and

• Data quality: errors on daily reports.

The system assigned each metric a weight and performance goal, with rigs earning points based on how well they performed relative to each metric’s target. Points were aggregated to form a score between 0 and 100. Stakeholders developed targets and overall scoring weights through a rigorous development process involving the departments that are affected by each metric, as well as overall executive leadership goals.
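The article does not give the scoring formula, so the Python below is only one plausible reading of a weighted, target-based score. It assumes that each metric earns a share of its weight proportional to how close the rig came to its target, capped at full credit, and that targets are positive; the metric names, weights and targets are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    weight: float                  # points available for this metric
    target: float                  # performance goal, assumed > 0
    lower_is_better: bool = True   # e.g. connection time vs. inspection score


def attainment(metric, actual):
    """Fraction of the target achieved, clamped to the range 0..1 (assumed rule)."""
    if metric.lower_is_better:
        ratio = 1.0 if actual <= 0 else metric.target / actual
    else:
        ratio = actual / metric.target
    return max(0.0, min(1.0, ratio))


def rig_score(metrics, actuals):
    """Weighted sum of attainments, scaled to a 0-100 score."""
    total_weight = sum(m.weight for m in metrics)
    points = sum(m.weight * attainment(m, actuals[m.name]) for m in metrics)
    return 100.0 * points / total_weight


# Hypothetical metrics and results for a single rig.
metrics = [
    Metric("connection_time_min", weight=60, target=5.0),
    Metric("inspection_score", weight=40, target=90.0, lower_is_better=False),
]
actuals = {"connection_time_min": 6.25, "inspection_score": 81.0}
print(round(rig_score(metrics, actuals), 1))  # -> 84.0
```

Capping each metric at full credit means a rig cannot offset a weak area by far exceeding the target in another, which is one way to keep attention spread across all metrics.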

The system incorporates a simple user interface to showcase this information, accessible to all company personnel. Scorecards were updated monthly and generated for each rig for an initial testing and acceptance period.

Shortly after this initial release, rig scorecards were further aggregated to generate scores for the managers responsible for each set of rigs. Values were calculated for all positions from rig superintendent to executive leadership.
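A roll-up of this kind can be sketched as follows. The aggregation rule is not stated in the article; this Python example assumes that each position's score is the simple mean of all rig scores beneath it in the reporting chain. The `reports_to` mapping and the position names are hypothetical.

```python
from statistics import mean


def rollup(rig_scores, reports_to):
    """Average rig scores up a reporting hierarchy.

    rig_scores : {rig_id: score}
    reports_to : {child: parent} covering both rigs and managers (assumed structure)
    Returns {manager: mean score of every rig beneath that manager}.
    """
    rigs_under = {}
    for rig, score in rig_scores.items():
        node = reports_to.get(rig)
        seen = set()  # guards against accidental cycles in the hierarchy
        while node is not None and node not in seen:
            seen.add(node)
            rigs_under.setdefault(node, []).append(score)
            node = reports_to.get(node)
    return {mgr: mean(scores) for mgr, scores in rigs_under.items()}


# Hypothetical three-rig fleet with two superintendents reporting to one VP.
rig_scores = {"rig1": 80.0, "rig2": 90.0, "rig3": 70.0}
reports_to = {
    "rig1": "supt_a", "rig2": "supt_a", "rig3": "supt_b",
    "supt_a": "vp", "supt_b": "vp",
}
print(rollup(rig_scores, reports_to))
# -> {'supt_a': 85.0, 'vp': 80.0, 'supt_b': 70.0}
```

Averaging over rigs rather than over subordinate managers weights every rig equally at each level; a real implementation might choose differently.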

Finally, a dashboard was created to view companywide trends and distributions of the raw data to help identify specific areas of improvement within regions and summarize overall company performance. Operations management was engaged to drive performance on the rig site using these new tools.

This strategy of making aggregated data available to the entire company generated an excellent response in overall scores and individual KPIs. Since the release of the scorecard system to the company in April 2019, average rig scores have improved by nearly 10% across a fleet of at least 130 rigs. Given the overall weighting and aggregated nature across different metrics, this increase indicates substantial improvement across multiple areas of performance.

Additionally, several operational KPIs have improved significantly across the fleet since the program began. KPIs that are highly correlated to crew performance – such as connection times, tripping speeds, and rig walk and move times – have all improved by 5% to 25%. It is also noteworthy that the rate of high-potential safety incidents has been substantially reduced, while consistent execution of prescribed equipment maintenance tasks has significantly improved.

This case study offers an example of what performance improvements can be captured when merging even basic data sets across many different sources, while also highlighting the performance improvement generated through competitive scoring of company personnel.

Merging Control Systems Data and Future Work

To date, rig controls data have not been fully incorporated into the system, due to their large volumes and substantial hardware requirements at the rig site. However, a small set of pilot rigs have implemented technology to sync and process rig controls information in real time. The learnings from this initial effort are guiding the development of a rig-based server infrastructure to assist in merging this data on location, rather than attempting to process it in the cloud.

Rig-based processing offers several benefits, the most substantial being that the overall processing load for the application suite is distributed across the rig fleet. Performing high-volume time series processing directly on the rig enables immediate use of the data for operations, and it also shifts processing costs from expensive cloud-based resources to more cost-efficient local resources.
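One simple form of rig-side processing is to reduce millisecond-scale control system samples to per-second summaries before anything is sent off location. The Python sketch below shows the idea under assumed conditions: samples arrive as (timestamp in milliseconds, value) pairs for a single tag, and the function name `summarize_per_second` is hypothetical rather than part of the pilot system.

```python
from collections import defaultdict


def summarize_per_second(samples):
    """Collapse high-rate samples into one summary per second.

    samples : iterable of (timestamp_ms, value) pairs for one tag (assumed layout)
    Returns {second: (min, max, mean, count)} ordered by time.
    """
    buckets = defaultdict(list)
    for ts_ms, value in samples:
        buckets[ts_ms // 1000].append(value)
    return {
        sec: (min(vals), max(vals), sum(vals) / len(vals), len(vals))
        for sec, vals in sorted(buckets.items())
    }


# Four raw samples become two per-second records.
raw = [(0, 1.0), (500, 3.0), (999, 2.0), (1000, 10.0)]
print(summarize_per_second(raw))
# -> {0: (1.0, 3.0, 2.0, 3), 1: (10.0, 10.0, 10.0, 1)}
```

Keeping the minimum and maximum alongside the mean preserves short spikes that a plain average would hide, while still cutting the volume sent to the cloud.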

In turn, a direct connection to the rig control systems enables bidirectional communication, allowing automated alerts or notifications to be transmitted directly to the driller over rig HMIs. Installing a high-spec machine at the rig site further allows immediate local execution of custom applications, such as engineering models and sophisticated automated alerts. Finally, this system will expose the thousands of control systems tags to the central cloud-based system for processing with data from other sources.


As the industry continues to digitalize, the concern for data collection, aggregation and storage has grown. Addressing these challenges, bringing in data sources beyond the typical rig-based sensors, and making analytics easily accessible to all personnel involved in the drilling process have proven fruitful in improving performance on a fleetwide scale.
