Data Lake

The hopit Platform provides two Data Lakes: the Edge Data Lake, which runs on each Edge device, and the Portal Data Lake, which serves as a central source for the data from all Edge devices. The data can be displayed with the Dashboard services.

The Data Lakes store time series data in a highly storage-efficient manner. As a consequence, only floating-point numbers can be stored.
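
Because only floating-point numbers can be stored, values of other types have to be converted before they are sent. The following Python sketch shows one possible client-side conversion. The function name to_sample and the mapping of booleans to 0.0 and 1.0 are assumptions for illustration, not documented behavior of the platform.

```python
def to_sample(value):
    """Convert a raw value to a float sample (hypothetical helper).

    Assumption: booleans are mapped to 0.0 / 1.0 and anything that is
    not a number is rejected, because it cannot be stored as a float.
    """
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    raise TypeError(f"cannot store {type(value).__name__} as a float sample")
```

With this sketch, to_sample(3) yields 3.0, while a string such as "running" raises a TypeError and would need to be encoded as a number first.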

For information on how to send data to the Data Lakes, see the Insights Collector documentation.

Naming and Labeling

To structure data, each signal can be labeled. Labels are appended to the signal name in curly brackets: signal_name{label-name-1="value-1",label-name-2="value-2"}. All signals sent to a Data Lake are converted to lower case and contain only underscores as special characters.
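
The naming rule can be illustrated with a short Python sketch. It is not the platform's implementation: the function names are hypothetical, and replacing every run of other special characters with a single underscore is an assumption, since the text only states that the result is lower case and contains only underscores as special characters.

```python
import re


def normalize_name(raw: str) -> str:
    """Lower-case a signal name and keep only underscores as special characters.

    Assumption: each run of non-alphanumeric characters becomes one underscore.
    """
    return re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")


def format_signal(name: str, labels: dict[str, str]) -> str:
    """Build the signal_name{label="value",...} notation described above."""
    normalized = normalize_name(name)
    if not labels:
        return normalized
    pairs = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{normalized}{{{pairs}}}"
```

For example, format_signal("Plant.Motor-Speed", {"line": "2"}) produces plant_motor_speed{line="2"}.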

If arrays are sent to a Data Lake (e.g. with the ADS target), the array indices are added as labels. The ADS signal GVL.Line[2].Temperature[3] will be named gvl_line_temperature{index_gvl_line="2",index_gvl_line_temperature="3"}. This way, it can be indexed in the Dashboards and a single dashboard can be used for multiple lines and temperatures. The signal can then be selected with Grafana Variables.
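
The array handling can be reproduced with the following Python sketch. It is derived only from the single example above, so the function name ads_to_signal is hypothetical and the sketch assumes one-dimensional indices of the form Name[3]; other ADS notations are not covered.

```python
import re

# One path segment: an identifier with an optional single numeric index.
SEGMENT = re.compile(r"([A-Za-z_]\w*)(?:\[(\d+)\])?")


def ads_to_signal(ads_path: str) -> str:
    """Convert an ADS path with array indices to a labeled signal name."""
    name_parts = []
    labels = []
    for segment in ads_path.split("."):
        match = SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        base, index = match.groups()
        name_parts.append(base.lower())
        if index is not None:
            # The label name is the signal path up to and including this segment.
            prefix = "_".join(name_parts)
            labels.append(f'index_{prefix}="{index}"')
    name = "_".join(name_parts)
    if not labels:
        return name
    return name + "{" + ",".join(labels) + "}"
```

Calling ads_to_signal("GVL.Line[2].Temperature[3]") returns gvl_line_temperature{index_gvl_line="2",index_gvl_line_temperature="3"}, matching the example above.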

When a signal gets sent to the Portal, it is labeled with the Edge device name, represented by the Device Name parameter in the Insights Collector service.

Edge Data Lake Configuration

The available parameters for the Data Lake are:

  • Enabled: Enables or disables the service.
  • RetentionTime: The time period for which data is stored. Units can be d for days, m for months, or y for years.
  • MaxDiskUsage: Maximum disk usage in GB. If this limit is reached before the RetentionTime expires, the oldest data is deleted.
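
How RetentionTime and MaxDiskUsage interact can be sketched in Python as follows. This is an illustration of the described policy, not the service's code: the value format (a number followed by the unit, such as 30d), the day counts per month and year, and the function names are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed day counts per unit; the documentation only names the units d, m and y.
UNIT_DAYS = {"d": 1, "m": 30, "y": 365}


def parse_retention(value: str) -> timedelta:
    """Parse a retention string such as '30d' (assumed format: number + unit)."""
    number, unit = value[:-1], value[-1:]
    if not number.isdigit() or unit not in UNIT_DAYS:
        raise ValueError(f"invalid retention time: {value!r}")
    return timedelta(days=int(number) * UNIT_DAYS[unit])


def prune(blocks, now, retention, max_disk_gb):
    """Return the (timestamp, size_gb) blocks that survive both limits."""
    cutoff = now - retention
    # Step 1: drop everything older than the retention time.
    kept = sorted((b for b in blocks if b[0] >= cutoff), key=lambda b: b[0])
    # Step 2: if the disk limit is still exceeded, delete the oldest data first.
    total = sum(size for _, size in kept)
    while kept and total > max_disk_gb:
        total -= kept.pop(0)[1]
    return kept
```

In this sketch the disk limit can remove data that is still within the retention time, which corresponds to the case where MaxDiskUsage is reached earlier than the RetentionTime.
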

Corresponding Edge configuration and Device Twin definition to activate the Edge Data Lake service:

[Screenshot: https://localhost:5050/Settings, "ADS Router Settings"]

To stream data to this Data Lake, use the DataLake-edge Target.

Portal Data Lake Configuration

To stream data to this Data Lake, use the DataLake-short-term or DataLake-long-term Target.

The Portal Data Lake configuration is done by HEAP Engineering GmbH.