Data Lake
The hopit Platform provides two Data Lakes: the Edge Data Lake, which runs on each Edge device, and the central Portal Data Lake, which provides a central source for the data from all Edge devices. The data can be displayed with the Dashboard services.
The Data Lakes store time series data in a highly storage-efficient manner. As a consequence, only floating-point numbers can be stored.
For information on how to send data to the Data Lakes, see the Insights Collector documentation.
Naming and Labeling
To structure data, each signal can be labeled. Labels are appended to the signal name in curly brackets: signal_name{label-name-1="value-1",label-name-2="value-2"}
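As an illustration of the label syntax, the following minimal Python sketch renders a signal name together with its labels. The helper name format_signal is hypothetical and not part of the hopit Platform; it only reproduces the textual format described above.

```python
def format_signal(name: str, labels: dict[str, str]) -> str:
    """Render a signal name followed by its labels in curly brackets.

    Hypothetical helper for illustration only; it mirrors the
    name{key="value",...} notation used in this documentation.
    """
    if not labels:
        # A signal without labels is written as the plain name.
        return name
    body = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{body}}}"


print(format_signal("signal_name", {"label-name-1": "value-1", "label-name-2": "value-2"}))
```

Labels are emitted in the insertion order of the dictionary, so the output matches the order in which they were added.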
All signal names sent to a Data Lake are converted to lower case and contain underscores as the only special character.
If arrays are sent to a Data Lake (e.g. with the ADS target), the array indices are added as labels. The ADS signal GVL.Line[2].Temperature[3] will be named gvl_line_temperature{index_gvl_line="2",index_gvl_line_temperature="3"}. This way, the signal can be indexed in the Dashboards, and a single dashboard can be used for multiple lines and temperatures. The signal can then be selected with Grafana Variables.
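The renaming described above can be sketched in Python as follows. This is not the platform's actual implementation: the function names are hypothetical, and the sketch assumes that every character other than a lower-case letter, digit or underscore is replaced by an underscore and that each path segment carries at most one array index.

```python
import re


def clean(text: str) -> str:
    """Lower-case a name and replace other special characters (assumed rule)."""
    return re.sub(r"[^a-z0-9_]", "_", text.lower())


def normalize_ads_signal(ads_name: str) -> str:
    """Turn an ADS signal path into a Data Lake style name with index labels."""
    parts: list[str] = []
    labels: dict[str, str] = {}
    for segment in ads_name.split("."):
        # Split "Line[2]" into the base name "Line" and the optional index "2".
        match = re.fullmatch(r"([^\[\]]+)(?:\[(\d+)\])?", segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        base, index = match.groups()
        parts.append(clean(base))
        if index is not None:
            # The label name is built from the path up to the indexed segment.
            labels["index_" + "_".join(parts)] = index
    name = "_".join(parts)
    if not labels:
        return name
    body = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{body}}}"


print(normalize_ads_signal("GVL.Line[2].Temperature[3]"))
```

For the ADS signal from the example, the sketch yields the same name and labels as stated above.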
When a signal gets sent to the Portal, it is labeled with the Edge device name, represented by the Device Name parameter in the Insights Collector service.
Edge Data Lake Configuration
The available parameters for the Data Lake are:
- Enabled: Enables or disables the service.
- RetentionTime: The time period for which data is stored. Units can be d for days, m for months or y for years.
- MaxDiskUsage: Maximum disk usage in GB. If this limit is reached before the RetentionTime has elapsed, the oldest data is deleted.
Corresponding Edge configuration and Device Twin definition to activate the Edge Data Lake service:
- Edge-UI
- Device Twin
{
  "DataLake": {
    "MaxDiskUsage": 1,
    "RetentionTime": "2y",
    "Enabled": true
  }
}
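Before applying such a definition, it can be useful to check it locally. The following Python sketch validates the three documented parameters. It is not part of the hopit Platform: the function name is hypothetical, and it assumes that RetentionTime is a positive whole number followed by one of the units d, m or y, and that MaxDiskUsage is a positive number of GB.

```python
import json
import re


def validate_data_lake_config(raw: str) -> dict:
    """Parse a Device Twin definition and check the DataLake parameters."""
    config = json.loads(raw)["DataLake"]

    enabled = config["Enabled"]
    if not isinstance(enabled, bool):
        raise ValueError("Enabled must be true or false")

    # Assumed format: digits followed by d (days), m (months) or y (years).
    match = re.fullmatch(r"(\d+)([dmy])", str(config["RetentionTime"]))
    if match is None or int(match.group(1)) == 0:
        raise ValueError("RetentionTime must look like 30d, 6m or 2y")

    max_disk = config["MaxDiskUsage"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_disk, bool) or not isinstance(max_disk, (int, float)) or max_disk <= 0:
        raise ValueError("MaxDiskUsage must be a positive number of GB")

    return {
        "enabled": enabled,
        "retention": (int(match.group(1)), match.group(2)),
        "max_disk_usage_gb": max_disk,
    }


example = '{"DataLake": {"MaxDiskUsage": 1, "RetentionTime": "2y", "Enabled": true}}'
print(validate_data_lake_config(example))
```

The check is deliberately strict about the unit suffix, so that a typo such as "2years" is reported instead of being passed on unchanged.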
To stream data to this Data Lake, use the DataLake-edge Target.
Portal Data Lake Configuration
To stream data to this Data Lake, use the DataLake-short-term or DataLake-long-term Target.
The Portal Data Lake configuration is done by HEAP Engineering GmbH.