Opportunities in Applying Data Analytics in Geo Data Management

Copyright Agus Daniel @2012

Most mining and exploration companies have to manage large volumes of data for their activities. Data acquired from internal and external activities are stored in both electronic and other media. The electronic data may be categorized in different ways: as structured or unstructured; as spatial or non-spatial; as operational or administrative; or as geological, geotechnical, or geochemical. However users choose to classify them, the data are usually large in volume and provide insight into the effectiveness and performance of the whole exploration and mining operation. In addition, the data used for resource modeling are a strategic asset for the organization and will determine the dollar valuation that investors place on the entire operation.

The operationally and strategically important data should be managed effectively following best practices in geo data management. The best practices of geological data management for each process in the exploration and mining operation are accessible through consulting and training, electronic media, or knowledge transfer from specialists to apprentices. Electronic data, in both structured and unstructured formats, may be stored centrally or scattered across information islands, depending on the maturity of the organization's data management processes.

A very important thing to remember is that the cost of acquiring data is usually very large compared to the cost of managing it. It is therefore very wasteful when data acquired through costly means such as field surveys, core drilling, engineering tests and measurements, and lab analysis are managed unprofessionally. Another important but often overlooked category, readily available but mostly not managed properly, is administrative data related to operations. For example, tracking the reasons for drills' operational and non-operational statuses (detailed comments on delays, standbys and shutdowns) and their performance (drilling rate, setup and moving times, ground conditions) is important for project performance reviews and for planning future drilling programs in the same area, with the same contractors, or with the same existing infrastructure (electricity, water access, mine traffic, etc.).

In this article I am not discussing the best practices of data management themselves; I am proposing that those best practices be applied so that data utilization can be optimized. The motivation is that, as with any other asset, the company should sweat the asset and get the maximum return on the cost of acquiring and maintaining the data. One way to optimize the use of the data is to apply Data Analytics (DA) to the data sets, not only for resource modeling or operational reasons but also to identify business processes that may be improved.

DA involves sets of processes and activities to acquire and evaluate data in order to arrive at important and useful information that helps identify key risks, errors or misuse, improve business effectiveness, and influence decision-making. The Information Systems Audit and Control Association (ISACA) describes the natural progression of data analytics techniques, which depends on the maturity of the organization's data analysis culture, as running from Ad Hoc to Continuous Monitoring, the latter being used by the most advanced data analytics users.

Ad hoc DA is a one-use process, a starting point that may be used to help identify patterns or potential risk areas within a business system. It is typically used for an initial investigation, to gain a first understanding of the business processes while becoming familiar with the data. Ad hoc DA usually starts with exploratory data mining, benchmarking/trending or data quality testing. Many analyses start from specific management-directed requests from any business area or function, and result in increased knowledge of systems and processes, system improvements and cost savings.

Users of geo data take this approach most of the time, usually at the request of management or as required by projects. For example, a project may need to identify incompatible lithology and formation combinations in the existing database, which may be due to data entry errors or to a geologist's misinterpretation during logging. This exercise may help users understand the geological data and the issues that discrepancies cause in their resource models.
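The lithology and formation check described above can be sketched in a few lines of Python. This is only an illustration: the formation names, lithology codes, field names and the lookup table of valid combinations are all hypothetical, and a real check would read them from the organization's own logging codes and database.

```python
# Hypothetical lookup: lithology codes considered valid for each formation.
VALID_LITHOLOGIES = {
    "FormationA": {"SST", "SLT"},
    "FormationB": {"LST", "DOL"},
}


def find_incompatible(intervals, valid=VALID_LITHOLOGIES):
    """Return logged intervals whose lithology is not allowed for the formation."""
    flagged = []
    for row in intervals:
        allowed = valid.get(row["formation"])
        # Unknown formations are flagged too, since they cannot be validated.
        if allowed is None or row["lithology"] not in allowed:
            flagged.append(row)
    return flagged


# Made-up sample rows standing in for a drill hole logging table.
sample_intervals = [
    {"hole_id": "DH001", "from_m": 0.0, "to_m": 12.5,
     "formation": "FormationA", "lithology": "SST"},
    {"hole_id": "DH001", "from_m": 12.5, "to_m": 30.0,
     "formation": "FormationB", "lithology": "SST"},
    {"hole_id": "DH002", "from_m": 0.0, "to_m": 8.0,
     "formation": "FormationC", "lithology": "LST"},
]

for row in find_incompatible(sample_intervals):
    print(row["hole_id"], row["from_m"], row["to_m"],
          row["formation"], row["lithology"])
```

The flagged rows are only candidates for review; whether each one is a data entry error or a logging misinterpretation still has to be decided by a geologist.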

Repeatable DA is predefined and scripted, and is usually run as scheduled analysis tasks to test the same analytical objectives or the same sets of data. This approach provides consistency, efficiency and more effective corrective actions. Most analyses of this type are stored as individual files or scripts on data users' workstations or in shared folders accessible to other users. The scripts or programs usually connect from office productivity applications or geo-specific applications directly to production databases using ODBC.

MS Excel QA/QC scripts and Vulcan/MineSight scripts are examples of this type of repeatable data analysis, where the scripts become members of logic libraries that can be run repeatedly. New team members may use the scripts for training and for understanding the data and the correlations between data within the processes. As the scripts are used more and more, potential improvements arise, and the quality of analysis improves and remains consistent from run to run as the data acquisition process is partially or fully automated.
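As an illustration of what one entry in such a logic library might look like, here is a minimal Python sketch of a repeatable QA/QC test on duplicate samples. The choice of test (relative difference between an original assay and its duplicate), the 20% tolerance, and the sample IDs and values are assumptions made for this example, not taken from any particular QA/QC standard or from the tools named above.

```python
def relative_difference_pct(original, duplicate):
    """Absolute difference of a pair as a percentage of the pair mean."""
    mean = (original + duplicate) / 2.0
    if mean == 0:
        return 0.0  # both values are zero, so there is no difference to report
    return abs(original - duplicate) / mean * 100.0


def failed_duplicates(pairs, tolerance_pct=20.0):
    """Return (sample_id, original, duplicate) tuples exceeding the tolerance."""
    return [
        pair for pair in pairs
        if relative_difference_pct(pair[1], pair[2]) > tolerance_pct
    ]


# Made-up duplicate pairs: (sample_id, original assay, duplicate assay).
pairs = [
    ("S001", 1.00, 1.05),
    ("S002", 2.00, 3.00),
    ("S003", 0.0, 0.0),
]

for sample_id, original, duplicate in failed_duplicates(pairs):
    print(sample_id, original, duplicate)
```

Because the tolerance is a parameter and the logic lives in one function, the same script can be rerun on each new batch and gives the same answer for the same data, which is the point of the repeatable stage.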

The Centralized DA approach is the next step in data analytics maturity. It involves the development, storage and operation of repeatable DA in a central repository that is developed and managed for that purpose. In this stage, standard data files and standards for DA development are documented, and DA applications are set up and scheduled to run against the centralized data on a regular basis or on demand.

The centralized approach provides several advantages:

• More consistent, efficient and repeatable processes.

• More reliable and consistent analysis results.

• Reduced chance of multiple variations being scattered across individual machines.

• Minimized potential negative impact on the performance of production applications.

• Improved data security.

• Easier backups and better workstation performance.

• Better access to analytics and results for more people, increasing productivity and improving the use of supporting reference materials, analytic sample logic and source data.

An example of this is using industrial-grade systems and databases such as acQuire, web-based ArcGIS, MS SQL, Oracle and other tools to help manage the repository and data deliveries. In this stage, data stored in production systems are collected, cleansed and transformed so that pre-defined queries with user-assigned parameters may be run efficiently.
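A pre-defined query with user-assigned parameters can be sketched as follows. The example uses Python's built-in sqlite3 module with an in-memory database purely so that it runs on its own; the table name, column names and sample rows are hypothetical and do not reflect the schema of acQuire or any other product mentioned above.

```python
import sqlite3

# Pre-defined query; the ? placeholders are filled by user-assigned parameters.
ASSAY_QUERY = """
    SELECT hole_id, from_m, to_m, au_gpt
    FROM assay
    WHERE project = ? AND au_gpt >= ?
    ORDER BY hole_id, from_m
"""


def run_assay_query(conn, project, min_grade):
    """Run the stored query with the project and cut-off chosen by the user."""
    return conn.execute(ASSAY_QUERY, (project, min_grade)).fetchall()


# In-memory stand-in for the cleansed, centralized repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assay "
    "(project TEXT, hole_id TEXT, from_m REAL, to_m REAL, au_gpt REAL)"
)
conn.executemany(
    "INSERT INTO assay VALUES (?, ?, ?, ?, ?)",
    [
        ("NORTH", "DH001", 0.0, 1.0, 0.4),
        ("NORTH", "DH001", 1.0, 2.0, 2.1),
        ("SOUTH", "DH010", 0.0, 1.0, 3.5),
    ],
)

rows = run_assay_query(conn, "NORTH", 1.0)
print(rows)
```

Keeping the SQL text fixed and passing only the parameters means every user runs the same logic, and the database driver handles quoting of the supplied values.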

In practice, integration and centralization is itself a challenging, if not complex, task, especially when the centrally collected data come from operational systems with different formats and integration capabilities. With most current systems moving away from proprietary data storage formats the task is becoming easier, but designing a good data integration infrastructure still has its challenges. A geological data integration effort may take years to reach a stable state. Data from field surveys and sampling, core drilling, geotechnical measurements and geochemical analysis are usually stored in disparate information islands, and it requires support and discipline from all stakeholders to keep the data cleaned and quality controlled so that they become analysis ready and support correct decision-making.

The data may be presented as pre-formatted tables, graphics, custom views or spatial data. Reports may be run by users and provide consistent analysis logic for tactical decision-making. Strategic decision-making may also benefit from complex logic that can be served in the form of periodic scorecard reports and management pilot screens that provide a higher-level view of the mining operations.

Continuous Monitoring (CM) marks the highest point on the maturity scale. At this stage, analytics are fully automated, run at regularly scheduled intervals, and may be embedded directly into a production system. Running analytics continuously enables the immediate identification of potential exception transactions. Embedded analytics are preventive solutions that identify violations of the segregation of duties (SoD) or monitor high-risk issues such as critical geotechnical measurements or rainfall in real time. They can also act as advanced management tools that allow test runs of the workflow.

The process involves three steps:

1. Identify data transactions that match predefined criteria.

2. Copy and store the information to a database or data file.

3. Alert business unit managers or other stakeholders of data transactions matching the criteria.
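The three steps above can be sketched in Python as follows. The rainfall criterion, the 50 mm limit, the station names and the use of a CSV file as the store are all hypothetical choices for illustration; in a real system the notification callback would send an e-mail or message instead of appending to a list.

```python
import csv
import io

RAINFALL_LIMIT_MM = 50.0  # hypothetical predefined criterion


def matches_criteria(reading):
    """Step 1 helper: does this reading match the predefined criterion?"""
    return reading["rainfall_mm"] > RAINFALL_LIMIT_MM


def monitor(readings, store, notify):
    # Step 1: identify data transactions that match the predefined criteria.
    exceptions = [r for r in readings if matches_criteria(r)]

    # Step 2: copy and store the information (here, as CSV rows).
    writer = csv.DictWriter(store, fieldnames=["station", "rainfall_mm"])
    writer.writeheader()
    writer.writerows(exceptions)

    # Step 3: alert stakeholders about each matching transaction.
    for r in exceptions:
        notify(f"Rainfall alert at {r['station']}: {r['rainfall_mm']} mm")
    return exceptions


# Made-up readings; StringIO stands in for a data file, a list for the alert channel.
readings = [
    {"station": "PIT-1", "rainfall_mm": 12.0},
    {"station": "PIT-2", "rainfall_mm": 64.5},
]
store = io.StringIO()
alerts = []
found = monitor(readings, store, alerts.append)
print(alerts)
```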

Independent CM modules are detective solutions. Preconfigured rules are set up to run against data extracted after a transaction has been processed. The rules consist of extracting data from the production system to another server or computer at regular intervals and performing predefined data tests. This solution allows multiple rules to be used to reduce the likelihood of false positives (where something is flagged for review but turns out to be a valid transaction), and results can be sent to business unit managers, auditors or other stakeholders after each run.

An enterprise GIS may take advantage of the previous efforts in data centralization and integration and provide another channel for delivering analysis of data spatially. An ordinary GIS may provide attribute data, but at the centralized stage the logical steps that arrive at a visual representation of an issue may be more valuable to users. For example, alerts color-coded green, yellow and red may represent complex logic that combines geological data, ore extraction activity data, rainfall data, seismic data, and ground geotechnical data. Alerts may be sent to stakeholders to initiate an automatic response from equipment, or a manual response from a dispatcher or coordinator who can organize an effective response. This type of use may not materialize if data integration is not done appropriately and standards for data acquisition, maintenance, and operating procedures are not in place.
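A very simplified version of the color-coded alert logic might look like the Python sketch below. The indicator names, the threshold values and the rule that the overall color is the worst individual level are assumptions for illustration only; real trigger levels would come from the site's geotechnical and operational standards.

```python
# Hypothetical (yellow, red) thresholds for each monitored indicator.
THRESHOLDS = {
    "rainfall_mm_24h": (30.0, 60.0),
    "seismic_events_24h": (5, 15),
    "wall_displacement_mm": (2.0, 5.0),
}
LEVELS = ["green", "yellow", "red"]


def indicator_level(name, value):
    """Return 0 (green), 1 (yellow) or 2 (red) for a single indicator."""
    yellow, red = THRESHOLDS[name]
    if value >= red:
        return 2
    if value >= yellow:
        return 1
    return 0


def alert_colour(measurements):
    """Overall status is the worst level across all supplied indicators."""
    worst = max(indicator_level(name, value)
                for name, value in measurements.items())
    return LEVELS[worst]


# Made-up measurements for one monitored area.
print(alert_colour({
    "rainfall_mm_24h": 35.0,
    "seismic_events_24h": 2,
    "wall_displacement_mm": 0.5,
}))
```

In a GIS the returned color would be attached to the polygon or point for the monitored area, so the map shows the outcome of the combined logic and not only the raw attributes.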

References: ISACA White Paper on Data Analytics – A Practical Approach, August 2011
