Glossary | VeloDB

IoT monitoring

IoT monitoring focuses on understanding system-level behavior across devices, gateways, and data pipelines, enabling real-time insight, anomaly detection, and operational decision-making at scale.

LLM Observability

LLM Observability is the practice of monitoring, tracking, and analyzing the behavior, performance, and outputs of Large Language Models (LLMs) throughout their entire lifecycle, from development to production. It provides real-time visibility into every layer of an LLM-based system, so organizations can understand not just what is happening with their AI models but why specific behaviors occur, and keep AI operations reliable, safe, and cost-effective.

Filebeat

Filebeat is a lightweight log shipper that forwards and centralizes log data as part of the Elastic Stack ecosystem. Developed by Elastic, Filebeat belongs to the Beats family of data shippers and is a common component of modern log management pipelines. As organizations increasingly deploy distributed systems, microservices, and cloud-native applications that generate massive volumes of log data across many servers and containers, Filebeat provides a reliable, resource-efficient way to collect, process, and forward log files to centralized destinations such as Elasticsearch, Logstash, or other data processing systems. Unlike heavyweight log collection tools, Filebeat is designed to consume minimal system resources while maintaining high reliability and performance in production environments.

Inverted Index

An inverted index maps each term to the documents that contain it, which is how databases power full-text search. The full entry covers architecture, build process, and practical use in RAG, log analytics, and hybrid search.
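
As a rough illustration of the idea, the sketch below builds a tiny in-memory inverted index and answers an AND query against it. The function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are hypothetical and chosen only for this example; a production index would add analyzers, positions, compression, and on-disk storage.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return IDs of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical log lines keyed by document ID.
docs = {
    1: "Disk usage warning on node A",
    2: "Connection timeout on node B",
    3: "Disk failure on node B",
}
index = build_inverted_index(docs)
print(search(index, "disk node"))
```

Looking up a term is a dictionary access followed by a set intersection, so the query never scans the document text itself.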

Semi-Structured Data

Semi-structured data sits between structured and unstructured data: it has some organizational properties but does not conform to a rigid schema such as that of a traditional relational database. It stays partially organized through tags, metadata, and hierarchical structures while remaining flexible enough to represent varied content. As organizations increasingly handle diverse data sources, including web content, IoT device outputs, social media feeds, and API responses, semi-structured data has become fundamental to modern data management strategies. Unlike structured data, which fits neatly into rows and columns, or unstructured data, which lacks any organizational framework, semi-structured data balances flexibility and organization, which enables efficient storage, processing, and analysis across distributed systems and cloud-native architectures.
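
To make the "partial organization" concrete, here is a minimal sketch using JSON, one common semi-structured format. The event payloads and the `flatten` helper are hypothetical; the point is only that each record carries its own field names, so records in the same stream can differ in shape.

```python
import json

# Hypothetical device events: same stream, different fields per record.
raw_events = [
    '{"device": "sensor-1", "temp": 21.5}',
    '{"device": "sensor-2", "temp": 19.0, "tags": ["lab", "north"]}',
    '{"device": "gateway-1", "status": {"uptime_s": 3600, "fw": "1.2"}}',
]


def flatten(value, prefix=""):
    """Flatten nested dicts into dotted paths; lists and scalars stay as leaf values."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
        return flat
    return {prefix.rstrip("."): value}


records = [flatten(json.loads(line)) for line in raw_events]

# The union of fields is discovered from the data rather than declared up front.
all_fields = sorted({field for record in records for field in record})
print(all_fields)
```

No schema is declared in advance: the set of fields is inferred from the records, and a field missing from one record is simply absent rather than an error.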

OpenTelemetry

OpenTelemetry is the de facto standard for observability. It defines unified specifications and provides out-of-the-box instrumentation SDKs for collecting traces, metrics, and logs.

Grafana

Grafana is an open-source analytics and monitoring platform that provides data visualization, dashboards, and alerting for observability across modern IT infrastructure. Originally developed by Torkel Ödegaard in 2014, Grafana has become a widely used tool for building interactive dashboards that bring metrics, logs, traces, and other data sources together in a single view.

Apache Doris

Apache Doris is an MPP-based real-time data warehouse known for its high query speed. It can return results for queries on large datasets in under a second, and it supports both high-concurrency point queries and high-throughput complex analysis. It can be used for report analysis, ad-hoc queries, unified data warehousing, and data lake query acceleration. On top of Apache Doris, users can build applications for user behavior analysis, A/B testing, log analysis, user profile analysis, and e-commerce order analysis.

Analytics Database

An analytics database is a specialized database management system optimized for Online Analytical Processing (OLAP), designed to handle complex queries, aggregations, and analytical workloads across large datasets. Unlike traditional transactional databases that focus on operational efficiency and data consistency, analytics databases prioritize query performance, data compression, and support for multidimensional analysis. Modern analytics databases leverage columnar storage, massively parallel processing (MPP) architectures, and vectorized execution engines to deliver sub-second response times on petabyte-scale datasets, making them essential for business intelligence, data science, and real-time decision-making applications.
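
The columnar storage idea mentioned above can be sketched in a few lines. This is a toy illustration with hypothetical order data and helper names (`to_columns`, `sum_by`), not how any particular engine is implemented; real systems add compression, vectorized execution, and parallelism on top of this layout.

```python
# Hypothetical order rows as a transactional system might store them.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_by(columns, key, measure):
    """Group-by sum that reads only the two columns it needs."""
    totals = {}
    for group, value in zip(columns[key], columns[measure]):
        totals[group] = totals.get(group, 0.0) + value
    return totals


columns = to_columns(rows)
totals = sum_by(columns, "region", "amount")
print(totals)
```

The aggregation never touches `order_id`. On wide tables, skipping the columns a query does not reference is a large part of why columnar layouts suit analytical workloads.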