Apache IoTDB is an open-source time series database developed based on our preliminary study [1]. In this demo, we present TsQuality [2], a system for measuring data quality in IoTDB. Documentation for the four data quality functions (completeness, consistency, timeliness, and validity) is available on the Apache IoTDB product website. The corresponding code is included in the GitHub repository of the system.
Introduction
Time series data often suffer from data quality issues in dimensions such as completeness, consistency, timeliness, and validity, especially in IoT scenarios. At any stage of time series data management, from collection to storage in a time series database, problems such as sensor failures or network transmission errors may degrade data quality. Analyzing dirty data without first assessing its quality may yield misleading results.
The following figure presents a segment of a time series with four types of data quality issues: completeness, consistency, timeliness, and validity.
As shown, the data points are normally collected every minute, the preset frequency of the sensor. However, the point at time 13:02:37 is missing, which is a completeness issue. In contrast, the point at 13:06:37 is re-transmitted, resulting in a redundant point, which is a consistency issue. Moreover, a point can be delayed, e.g., the one that should appear at time 13:04:37 but does not arrive until 30 seconds later. Such an issue is measured by timeliness. The validity measure is evaluated w.r.t. a set of constraints on both time and value. For instance, the two horizontal red lines, $v_{min}$ and $v_{max}$, denote the valid range of values. The point at time 13:08:37 has an abnormal value smaller than the minimum. In addition, the two red arrows, $s_{min}$ and $s_{max}$, specify the minimum and maximum speed at which the value may change over time. The data point at time 13:01:37 has a speed of $\frac{250-115}{60} = 2.25 > 2 = s_{max}$, and thus has an abnormal value as well.
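The four checks described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by TsQuality or IoTDB: it assumes arrivals are compared against a fixed one-minute grid, and the names `timing_issues`, `validity_issues`, `POINTS`, the range `V_MIN`/`V_MAX`, the bound `S_MIN`, and every value other than 115 and 250 are hypothetical choices made for the example.

```python
from datetime import datetime, timedelta

INTERVAL = 60.0              # seconds between points (one per minute, as in the text)
S_MIN, S_MAX = -2.0, 2.0     # s_max = 2 is from the text; s_min = -2 is an assumption
V_MIN, V_MAX = 100.0, 300.0  # assumed valid value range (not given in the text)


def at(hms):
    """Parse 'HH:MM:SS' on an arbitrary fixed date."""
    return datetime.strptime("2023-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


# (arrival time, value). Only 115 and 250 come from the text; the rest are invented.
POINTS = [
    (at("13:00:37"), 115.0),
    (at("13:01:37"), 250.0),  # speed (250 - 115) / 60 = 2.25 > S_MAX
    # 13:02:37 never arrives
    (at("13:03:37"), 120.0),
    (at("13:05:07"), 125.0),  # due at 13:04:37, arrives 30 s late
    (at("13:05:37"), 130.0),
    (at("13:06:37"), 135.0),
    (at("13:06:37"), 135.0),  # re-transmitted duplicate
    (at("13:07:37"), 140.0),
    (at("13:08:37"), 90.0),   # below V_MIN
]


def timing_issues(points, interval):
    """Compare arrivals with the grid start + k * interval.

    Returns (missing, redundant, delayed) as lists of expected slot times.
    """
    if not points:
        return [], [], []
    start = points[0][0]
    counts, delayed = {}, []
    for t, _ in points:
        k, late_by = divmod((t - start).total_seconds(), interval)
        k = int(k)
        counts[k] = counts.get(k, 0) + 1
        if late_by > 0:  # arrived after its slot: a timeliness issue
            delayed.append(k)
    slot = lambda k: start + timedelta(seconds=k * interval)
    missing = [slot(k) for k in range(max(counts) + 1) if k not in counts]
    redundant = [slot(k) for k, n in sorted(counts.items()) if n > 1]
    return missing, redundant, [slot(k) for k in delayed]


def validity_issues(points, v_min, v_max, s_min, s_max):
    """Return (out_of_range, too_fast) lists of arrival times."""
    out_of_range = [t for t, v in points if not v_min <= v <= v_max]
    too_fast = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        if dt <= 0:
            continue  # duplicates have no defined speed
        if not s_min <= (v1 - v0) / dt <= s_max:
            too_fast.append(t1)
    return out_of_range, too_fast


missing, redundant, delayed = timing_issues(POINTS, INTERVAL)
out_of_range, too_fast = validity_issues(POINTS, V_MIN, V_MAX, S_MIN, S_MAX)
```

On this sample data the sketch flags 13:02:37 as missing, 13:06:37 as redundant, 13:04:37 as delayed, 13:08:37 as out of range, and 13:01:37 as changing too fast. A fixed grid keeps the example short; real sensor data would likely need a tolerance around each slot rather than exact matching.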
System architecture
TsQuality follows a three-tier architecture model: Storage, Computation, and Presentation, as shown in this figure:
Storage Design
The entity-relationship model of the data in SQLite is shown in this figure:
Demonstration
TsQuality uses the following three tools to measure and present data quality in IoTDB.
TsQuality Dashboard
Data quality overview of time series in TsQuality:
Data quality explanation of time series in TsQuality:
Apache Superset
Apache Zeppelin
Docs
- TsQuality Installation & Configuration
- Superset Installation
- Superset Configuration
- Zeppelin Installation & Configuration
- IoTDB Installation & Configuration
References
- Chen Wang, Jialin Qiao, Xiangdong Huang, Shaoxu Song, Haonan Hou, Tian Jiang, Lei Rui, Jianmin Wang, and Jiaguang Sun. 2023. Apache IoTDB: A Time Series Database for IoT Applications. Proc. ACM Manag. Data 1, 2, Article 195 (June 2023), 26 pages. https://doi.org/10.1145/3589775.
- Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, and Jianmin Wang. 2023. TsQuality: Measuring Time Series Data Quality in Apache IoTDB. International Conference on Very Large Data Bases, VLDB, 2023. [paper] [slides] [demo]