ClickHouse Use Case Guide: Digital Twins
Dec 18, 2024

The children yearn for the mines. And by children I mean the Bagger 293. In 2023, the global mining market was valued at an estimated 1.1 trillion U.S. dollars, according to Global Market Estimates.

As the world becomes more tech-driven, we rely more and more on rare earth metals to power our phones, appliances, and even lifesaving devices. The mining operations to gather these minerals have grown increasingly complex and sophisticated, requiring advanced monitoring and control systems to ensure safety, efficiency, and environmental compliance.

Sensor data from many disparate sources must be collected, processed, and analyzed in real time to create accurate digital representations of mining equipment and operations. In this blog post, we’re going to explore what digital twins are and how ClickHouse can serve as a powerful backbone for digital twin data.

What the heck is a digital twin?

A digital twin is a virtual representation of a physical object, process, or system that exists in the real world. Think of it as a highly sophisticated digital mirror that reflects not just what something looks like, but how it behaves, performs, and changes over time. In the context of mining operations, digital twins can model everything from individual pieces of equipment like excavators and haul trucks to entire mining sites, complete with real-time data about operations, maintenance needs, and environmental conditions.

These digital representations serve multiple purposes: they help predict equipment failures before they occur, optimize resource allocation, and ensure compliance with safety and environmental regulations. By maintaining a real-time digital twin, mining operations can make data-driven decisions that improve efficiency while reducing risks and environmental impact.

The challenge, however, lies in managing the massive amounts of data that power these digital twins. This is where ClickHouse enters the picture.

How digital twins process data

Digital twins ingest data in a variety of ways, but one of the most common is through messaging systems: lightweight publish/subscribe protocols such as MQTT (Message Queuing Telemetry Transport) and event streaming platforms such as Apache Kafka. These systems stream data in real time from sensors and equipment to the digital twin, keeping the virtual representation synchronized with its physical counterpart. The data typically includes measurements like temperature, pressure, vibration, location, and operational status.
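As a rough illustration of that ingestion step, the sketch below parses JSON sensor messages, as they might arrive from an MQTT or Kafka topic, and applies them to an in-memory twin state. The payload fields (`equipment_id`, `ts`, `metric`, `value`) and the `Reading` and `TwinState` classes are assumptions made for this example, not a standard schema.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Reading:
    equipment_id: str
    timestamp: datetime
    metric: str
    value: float


@dataclass
class TwinState:
    """Latest known reading of each metric for one piece of equipment."""
    equipment_id: str
    latest: dict = field(default_factory=dict)  # metric name -> Reading

    def apply(self, reading: Reading) -> None:
        # Skip late-arriving messages so a metric never moves backwards in time.
        previous = self.latest.get(reading.metric)
        if previous is not None and reading.timestamp < previous.timestamp:
            return
        self.latest[reading.metric] = reading


def parse_message(payload: bytes) -> Reading:
    """Decode one JSON message body (hypothetical schema) into a Reading."""
    doc = json.loads(payload)
    return Reading(
        equipment_id=doc["equipment_id"],
        timestamp=datetime.fromtimestamp(doc["ts"], tz=timezone.utc),
        metric=doc["metric"],
        value=float(doc["value"]),
    )


twin = TwinState("excavator-01")
messages = [
    b'{"equipment_id": "excavator-01", "ts": 1734520000, "metric": "engine_temperature", "value": 92.5}',
    b'{"equipment_id": "excavator-01", "ts": 1734520005, "metric": "vibration", "value": 0.8}',
    # Older than the first message, so it is ignored.
    b'{"equipment_id": "excavator-01", "ts": 1734519990, "metric": "engine_temperature", "value": 70.0}',
]
for raw in messages:
    twin.apply(parse_message(raw))
```

In a real deployment the `messages` list would be replaced by a subscription callback or consumer loop, and each reading would also be written to the database so the history is preserved alongside the latest state.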

Once it lands in a data warehouse, this data takes the form of time series data: a sequence of measurements, each tagged with the timestamp at which it was recorded. Because the records are ordered in time, they are ideal for tracking changes and patterns. In the context of digital twins, time series data supports both real-time monitoring and historical analysis of equipment performance and operational metrics.


The Time Series Engine (Experimental)

ClickHouse has recently introduced an experimental Time Series Engine (the TimeSeries table engine) that is well suited to the kind of data digital twins generate. The engine stores time series as sets of values associated with timestamps and tags (labels), which fits applications that need both real-time monitoring and historical analysis, where the temporal relationships between data points are paramount. Because the engine is experimental, it has to be enabled explicitly with the allow_experimental_time_series_table setting, and its behavior may change between releases. Let’s look at some potential implementations in both ClickHouse and chDB.

Example A. Acme Mining Company’s ClickHouse

To demonstrate how time series data in ClickHouse can support Acme Mining Company’s digital twin, imagine we’re tracking the performance of a mining excavator in real time. The data includes metrics such as engine temperature, vibration levels, and fuel consumption, all of which are stored as time series data in a ClickHouse table named excavator_metrics.

Here’s an example of a query to analyze engine temperature trends over the last hour:

SELECT
    toStartOfMinute(timestamp) AS minute,
    avg(engine_temperature) AS avg_temp
FROM
    excavator_metrics
WHERE
    timestamp >= now() - INTERVAL 1 HOUR
GROUP BY
    minute
ORDER BY
    minute ASC;

This query groups the data by minute, calculates the average engine temperature for each minute, and orders the results chronologically. With this information, Acme Mining Company can quickly identify overheating trends or anomalies before they lead to equipment failure, enabling proactive maintenance and minimizing downtime.
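To make that follow-up step concrete, here is a minimal sketch of how per-minute averages shaped like the query result could be scanned for sustained overheating. The rows are hard-coded sample values, and the 105-degree limit and three-minute window are made-up numbers for illustration; real limits would come from the equipment manufacturer.

```python
from datetime import datetime, timedelta


def find_overheating(rows, limit, min_consecutive=3):
    """Return the start time of each run where avg_temp stays above `limit`
    for at least `min_consecutive` consecutive rows.

    `rows` is a list of (minute, avg_temp) tuples sorted by minute ascending,
    mirroring the shape of the query result. Rows are treated as consecutive;
    minutes with no data are simply absent and are not checked for gaps here.
    """
    alerts = []
    run_start = None
    run_length = 0
    for minute, avg_temp in rows:
        if avg_temp > limit:
            if run_length == 0:
                run_start = minute
            run_length += 1
            # Report each run once, at the moment it becomes long enough.
            if run_length == min_consecutive:
                alerts.append(run_start)
        else:
            run_length = 0
    return alerts


# Sample per-minute averages starting at 09:00 (illustrative values only).
start = datetime(2024, 12, 18, 9, 0)
temps = [98.0, 101.5, 106.2, 107.0, 108.3, 104.9, 106.1, 99.0]
rows = [(start + timedelta(minutes=i), t) for i, t in enumerate(temps)]

alerts = find_overheating(rows, limit=105.0)
for run_start in alerts:
    print(f"Sustained overheating began at {run_start:%H:%M}")
```

Requiring several consecutive minutes above the limit, rather than alerting on a single reading, is one simple way to avoid reacting to momentary spikes.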

ClickHouse’s columnar storage is designed to keep aggregations like this fast even across millions of data points, so these insights can be generated in near real time. That makes ClickHouse a powerful tool for managing the vast data demands of digital twins.

Example B. Tennessee Limestone Company’s chDB

Similar to our Acme Mining example, let’s look at how Tennessee Limestone Company uses chDB (an in-process SQL engine powered by ClickHouse) for their digital twin implementation. Their setup is different in that chDB runs in-memory on the sensor device itself. It monitors limestone extraction equipment and quarry conditions, with a particular focus on dust levels and equipment wear patterns, and then sends the data in batches to the central ClickHouse instance.

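import json
import time

from chdb import session as chs

# Start a chDB session. With no path argument, chDB keeps the session's
# data in temporary storage that is cleaned up when the session ends.
db = chs.Session()

# Create a database and an in-memory table for buffered sensor data
# (readings are assumed to be inserted into this table elsewhere)
db.query("CREATE DATABASE IF NOT EXISTS twin ENGINE = Atomic")
db.query("""
    CREATE TABLE IF NOT EXISTS twin.sensor_buffer (
        timestamp DateTime,
        dust_concentration Float32,
        equipment_wear_index Float32
    ) ENGINE = Memory
""")

# Example alert threshold; substitute the limit that applies to your site
DUST_ALERT_THRESHOLD = 50

# Query function to check dust levels and wear
def check_sensor_metrics():
    result = db.query(f"""
        SELECT
            toStartOfFiveMinute(timestamp) AS time_bucket,
            avg(dust_concentration) AS avg_dust,
            max(equipment_wear_index) AS max_wear
        FROM twin.sensor_buffer
        WHERE timestamp >= now() - INTERVAL 24 HOUR
        GROUP BY time_bucket
        HAVING avg_dust > {DUST_ALERT_THRESHOLD}
        ORDER BY time_bucket DESC
    """, "JSONEachRow")
    # One JSON object per line; an empty result yields an empty list
    raw = (result.bytes() or b"").decode("utf-8")
    return [json.loads(line) for line in raw.splitlines() if line]

# Placeholder: how alerts reach the central ClickHouse instance is
# deployment-specific and not covered in this post
def send_to_central_db(alerts):
    print(f"Would send {len(alerts)} alert rows to the central instance")

# Example of processing sensor data
while True:
    # Process local sensor data before sending to central DB
    alerts = check_sensor_metrics()
    if alerts:
        # Send alert data to central ClickHouse instance
        send_to_central_db(alerts)
    time.sleep(300)  # Check every 5 minutes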

This approach helps Tennessee Limestone Company maintain compliance with EPA regulations while simultaneously monitoring equipment health. Because the data is processed and analyzed in real time on the device, they can adjust their extraction processes immediately when dust levels exceed acceptable thresholds.
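One possible shape for the hand-off to the central instance is a batched insert over the ClickHouse HTTP interface, which accepts an INSERT statement in the `query` parameter and newline-delimited JSON in the request body when the format is JSONEachRow. The sketch below only builds the request and does not send it. The host name, the `dust_alerts` table, and the `build_insert_request` helper are hypothetical names chosen for this example.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(base_url, table, rows):
    """Build (but do not send) an HTTP POST that inserts `rows` into `table`.

    The INSERT statement goes in the `query` URL parameter and the body
    carries one JSON object per line (JSONEachRow). `table` is assumed to be
    a trusted identifier, not user input.
    """
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(f"{base_url}/?{params}", data=body, method="POST")


# Sample alert rows shaped like the output of the local five-minute query.
alerts = [
    {"time_bucket": "2024-12-18 09:00:00", "avg_dust": 61.4, "max_wear": 0.72},
    {"time_bucket": "2024-12-18 09:05:00", "avg_dust": 57.9, "max_wear": 0.73},
]
request = build_insert_request(
    "http://clickhouse.example.internal:8123", "dust_alerts", alerts
)
# Sending is left to the caller, for example:
# urllib.request.urlopen(request, timeout=10)
```

Sending one request per batch rather than one per row keeps network overhead low on the device and matches ClickHouse's preference for fewer, larger inserts.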

Conclusion

Digital twins powered by ClickHouse and chDB offer mining companies unprecedented visibility into their operations, equipment health, and environmental impact. By leveraging these powerful analytical tools, companies can make data-driven decisions that improve safety, efficiency, and regulatory compliance. As the mining industry continues to evolve, the combination of digital twin technology and robust time-series databases will become increasingly essential for maintaining competitive advantage in this trillion-dollar market.