
Filebeat

SelectDB · 2025/09/09

1. Introduction

Filebeat is a lightweight log shipper designed to efficiently forward and centralize log data as part of the Elastic Stack ecosystem. Originally developed by Elastic, Filebeat belongs to the Beats family of data shippers and serves as a crucial component in modern log management pipelines. As organizations increasingly deploy distributed systems, microservices, and cloud-native applications that generate massive volumes of log data across multiple servers and containers, Filebeat provides a reliable, resource-efficient solution for collecting, processing, and forwarding log files to centralized destinations like Elasticsearch, Logstash, or other data processing systems. Unlike heavy-weight log collection tools, Filebeat is specifically designed to consume minimal system resources while maintaining high reliability and performance in production environments.

2. Why Do We Need Filebeat?

Modern distributed systems and cloud infrastructure generate overwhelming amounts of log data that traditional collection methods cannot efficiently handle:

  • Distributed Log Fragmentation: Applications running across multiple servers, containers, and cloud regions create scattered log files that need centralized collection and processing
  • Resource Consumption Concerns: Traditional log collection tools often consume excessive CPU, memory, and network resources, impacting application performance
  • Reliability and Durability Requirements: Production systems need log shipping solutions that guarantee data delivery and handle network interruptions gracefully
  • Scalability Challenges: Dynamic environments with auto-scaling containers and ephemeral instances require log shippers that can adapt to changing infrastructure
  • Format Standardization Needs: Diverse log formats from different applications and systems require parsing and normalization before analysis
  • Real-time Processing Demands: Modern observability requires near real-time log ingestion for immediate alerting and incident response

Filebeat addresses these challenges by providing:

  • Lightweight Architecture with minimal resource footprint and efficient data transmission capabilities
  • Built-in Reliability Features including at-least-once delivery guarantees and persistent queue management
  • Modular Configuration System supporting diverse log sources through pre-configured modules and custom parsing rules
  • Cloud-Native Integration with automatic discovery and monitoring of containers, Kubernetes pods, and cloud services
  • Elastic Common Schema (ECS) Compliance ensuring standardized data formats for consistent analysis and visualization
  • High Availability Support through load balancing, failover mechanisms, and cluster-aware deployment patterns

3. Filebeat Architecture and Core Components

Overall Architecture

Filebeat employs a modular architecture consisting of input harvesters, processing pipelines, and output publishers. These components work together to collect log data from various sources, process it according to configured rules, and reliably deliver it to designated destinations, all while maintaining low resource overhead and high throughput.

Key Components

3.1 Input Layer

  • File Harvester: Core component that monitors and reads log files, handling file rotation, truncation, and new file detection
  • Docker Container Logs: Specialized input for collecting logs from Docker containers with automatic metadata enrichment
  • Kubernetes Integration: Native support for collecting logs from Kubernetes pods with namespace, container, and service context
  • Cloud Provider Inputs: Direct integration with cloud logging services like AWS CloudTrail, Azure Activity Logs, and Google Cloud Logging
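
For illustration, a minimal filebeat.yml input section that pairs a plain file input with a Docker container input might look like the following sketch (all paths are placeholders):

# Input sketch: host log files plus Docker container logs
filebeat.inputs:
  # Plain log files on the host
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log

  # Docker container logs, enriched with container metadata
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata: ~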

3.2 Processing Pipeline

  • Multiline Processing: Advanced pattern matching for combining multi-line log entries into single events
  • Field Extraction: Built-in processors for parsing timestamps, extracting structured fields, and enriching events with metadata
  • Data Filtering: Conditional processing rules for including, excluding, or modifying log events based on content or source
  • Scripting Support: JavaScript processor for custom data transformation and complex parsing logic
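
A rough sketch of these capabilities, assuming a timestamp-prefixed log format and illustrative field names:

# Processing sketch: multiline joining plus field extraction, filtering, and scripting
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/app.log
    # Lines that do not start with a timestamp are appended to the previous event
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

processors:
  # Extract structured fields from a fixed line layout
  - dissect:
      tokenizer: "%{level} %{component} %{msg}"
      field: "message"
  # Drop noisy debug events before they are shipped
  - drop_event:
      when.regexp:
        message: "^DEBUG"
  # Custom transformation written in JavaScript
  - script:
      lang: javascript
      source: |
        function process(event) {
          event.Put("labels.parsed", true);
        }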

3.3 Output Publishers

  • Elasticsearch Output: Direct indexing to Elasticsearch clusters with automatic index lifecycle management
  • Logstash Integration: Reliable delivery to Logstash instances for complex processing and transformation
  • Kafka Producer: High-throughput streaming to Apache Kafka topics for distributed log processing pipelines
  • Cloud Service Outputs: Native connectors for AWS CloudWatch, Azure Monitor, and other cloud logging platforms
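
Note that Filebeat enables exactly one output at a time. A configuration sketch with an Elasticsearch output and a commented-out Kafka alternative (hosts and topic names are placeholders):

# Output sketch: direct indexing into Elasticsearch
output.elasticsearch:
  hosts: ["https://es1.example.com:9200"]

# Alternative: stream to Kafka instead (enable only one output at a time)
#output.kafka:
#  hosts: ["kafka1.example.com:9092"]
#  topic: "logs"
#  compression: gzip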

3.4 Reliability and Monitoring

  • Persistent Queue: Local storage for events during network outages or downstream service unavailability
  • Registry Management: State tracking for processed log files to prevent duplicate processing after restarts
  • Health Monitoring: Built-in HTTP endpoints for health checks, metrics collection, and operational monitoring
  • Backpressure Handling: Automatic throttling and buffering to handle downstream processing bottlenecks
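
A reliability-oriented sketch that buffers events in a local disk queue and exposes the built-in HTTP stats endpoint (values are illustrative):

# Reliability sketch: spill events to disk during outages and expose local health/metrics
queue.disk:
  max_size: 10GB          # on-disk buffer used when the output is unavailable

http:
  enabled: true            # serves /stats and / for health checks and metrics collection
  host: localhost
  port: 5066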

4. Key Features and Characteristics

According to the official documentation, Filebeat is designed as a lightweight shipper for forwarding and centralizing log data, with several key characteristics that make it well suited to production environments:

4.1 Lightweight Log Shipping Architecture

Filebeat operates as a lightweight agent installed on servers to monitor specified log files or locations, collect log events, and forward them to Elasticsearch or Logstash for indexing. The architecture employs inputs that look for log data in specified locations, with each log monitored by a harvester that reads content and sends new log data to libbeat, which aggregates events and forwards them to configured outputs. This design ensures minimal resource consumption while maintaining high reliability and performance across distributed systems.

4.2 Input and Harvester Management

The core functionality revolves around inputs and harvesters working in coordination. When Filebeat starts, it initiates one or more inputs that monitor specified locations for log data. For each discovered log file, Filebeat starts a dedicated harvester that reads the file for new content, tracks file position, handles log rotation, and manages file state. This harvester-per-file approach ensures efficient resource utilization while preventing data loss during file operations and system restarts.
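
The harvester behaviour described above is tuned through per-input options. A sketch for the classic log input, with illustrative values:

# Harvester tuning sketch for the classic log input (values are illustrative)
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log
    scan_frequency: 10s      # how often the input checks for new or changed files
    close_inactive: 5m       # release the file handle after this period of inactivity
    clean_removed: true      # drop registry state for files that have been deleted
    ignore_older: 24h        # skip files that have not been modified recently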

4.3 Flexible Output and Integration Capabilities

Filebeat supports multiple output destinations including Elasticsearch for direct indexing, Logstash for additional processing and transformation, and various other systems like Kafka, Redis, and cloud services. The libbeat framework provides standardized output handling with features like load balancing across multiple destinations, automatic retry mechanisms, and persistent queuing during network outages. This flexibility allows organizations to integrate Filebeat into existing log management pipelines regardless of their architecture.
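
As a sketch, the Logstash output can spread batches across several hosts (hostnames below are placeholders):

# Load-balancing sketch for the Logstash output (hostnames are placeholders)
output.logstash:
  hosts: ["ls1.example.com:5044", "ls2.example.com:5044"]
  loadbalance: true        # distribute batches across all listed hosts
  worker: 2                # network workers per configured host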

4.4 Built-in Processing and Data Enhancement

Filebeat includes processing capabilities that enhance log data before forwarding, reducing the need for external transformation tools. Features include multiline processing for handling stack traces and complex log entries, field extraction and enrichment with metadata, filtering rules for selective log forwarding, and support for various log formats. The system also provides automatic timestamp parsing and normalization, ensuring consistent data structure across different log sources.
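
For example, Java stack traces can be folded into the event that precedes them; a sketch using a pattern along the lines of the one shown in the Filebeat documentation:

# Multiline sketch for Java stack traces: continuation lines are appended to the preceding event
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/service.log
    multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
    multiline.negate: false
    multiline.match: after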

4.5 Production-Ready Reliability and Monitoring

Enterprise deployment features include persistent state management through registry files that track processed log positions, at-least-once delivery guarantees, graceful shutdown procedures, and comprehensive monitoring capabilities. Filebeat provides built-in metrics for operational visibility, health check endpoints for load balancer integration, and structured logging for troubleshooting. Security features encompass TLS/SSL encryption, authentication mechanisms, and integration with Elasticsearch security features for access control and data protection.
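
A security-oriented sketch for the Elasticsearch output, assuming certificates and credentials are already provisioned (paths and usernames are placeholders):

# Security sketch: TLS and credentials for the Elasticsearch output
output.elasticsearch:
  hosts: ["https://es1.example.com:9200"]
  username: "filebeat_writer"
  password: "${ES_PWD}"                          # resolved from the keystore or environment
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/client.crt"
  ssl.key: "/etc/filebeat/certs/client.key"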

5. Use Cases

5.1 Centralized Log Management

Organizations use Filebeat to collect logs from distributed servers and applications, centralizing them in Elasticsearch or other log management platforms for unified analysis and troubleshooting.

5.2 Container and Kubernetes Monitoring

Development teams deploy Filebeat as DaemonSets in Kubernetes clusters to automatically collect logs from all containers and pods, providing comprehensive observability for cloud-native applications.
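
A DaemonSet deployment typically relies on Kubernetes autodiscover. A sketch, assuming NODE_NAME is injected through the pod spec:

# Autodiscover sketch for a Kubernetes DaemonSet deployment
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true              # honour co.elastic.logs/* pod annotations
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log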

5.3 Security Information and Event Management (SIEM)

Security teams leverage Filebeat to collect security-relevant logs from multiple sources, feeding them into SIEM systems for threat detection and compliance monitoring.

5.4 Application Performance Monitoring

Operations teams use Filebeat to collect application logs for performance analysis, error tracking, and capacity planning across distributed microservices architectures.

6. Practical Example

Filebeat Doris Output Plugin

Installation

Download from the Official Website

https://apache-doris-releases.oss-accelerate.aliyuncs.com/extension/filebeat-doris-2.1.1

Compile from Source Code

Execute the following commands in the extension/beats/ directory:

cd doris/extension/beats

go build -o filebeat-doris filebeat/filebeat.go
go build -o metricbeat-doris metricbeat/metricbeat.go
go build -o winlogbeat-doris winlogbeat/winlogbeat.go
go build -o packetbeat-doris packetbeat/packetbeat.go
go build -o auditbeat-doris auditbeat/auditbeat.go
go build -o heartbeat-doris heartbeat/heartbeat.go

Create Table

-- get data: wget https://data.gharchive.org/2024-01-01-15.json.gz
CREATE DATABASE log_db;
USE log_db;


CREATE TABLE github_events
(
  `created_at` DATETIME,
  `id` BIGINT,
  `type` TEXT,
  `public` BOOLEAN,
  `actor` VARIANT,
  `repo` VARIANT,
  `payload` TEXT,
  INDEX `idx_id` (`id`) USING INVERTED,
  INDEX `idx_type` (`type`) USING INVERTED,
  INDEX `idx_actor` (`actor`) USING INVERTED,
  INDEX `idx_repo` (`repo`) USING INVERTED,
  INDEX `idx_payload` (`payload`) USING INVERTED PROPERTIES("parser" = "unicode", "support_phrase" = "true")
)
ENGINE = OLAP
DUPLICATE KEY(`created_at`)
PARTITION BY RANGE(`created_at`) ()
DISTRIBUTED BY RANDOM BUCKETS 10
PROPERTIES (
"replication_num" = "1",
"compaction_policy" = "time_series",
"enable_single_replica_compaction" = "true",
"dynamic_partition.enable" = "true",
"dynamic_partition.create_history_partition" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "1",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "10",
"dynamic_partition.replication_num" = "1"
);

Filebeat Configuration

Compared with collecting plain TEXT logs, this configuration differs in the following ways:

  1. Processors are not used because no additional processing or transformation is needed.
  2. The codec_format_string in the output is simple, directly emitting the entire message, i.e. the raw line content.

# input
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /path/to/your/log

# queue and batch
queue.mem:
  events: 1000000
  flush.min_events: 100000
  flush.timeout: 10s

# output
output.doris:
  fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ]
  user: "your_username"
  password: "your_password"
  database: "your_db"
  table: "your_table"
  # output string format
  ## Directly outputting the raw message of each line from the original file. Since headers specify format: "json", Stream Load will automatically parse the JSON fields and write them into the corresponding fields of the Doris table.
  codec_format_string: '%{[message]}'
  headers:
    format: "json"
    read_json_by_line: "true"
    load_to_single_tablet: "true"

Enterprise Production Deployment

#!/bin/bash
# Production deployment script for Filebeat across multiple environments

set -e

# Configuration variables
FILEBEAT_VERSION="8.11.0"
ELASTICSEARCH_CLUSTER="https://es-cluster.company.com:9200"
KIBANA_URL="https://kibana.company.com:5601"
ENVIRONMENT=${1:-production}
DATACENTER=${2:-us-east-1}

echo "Deploying Filebeat ${FILEBEAT_VERSION} to ${ENVIRONMENT} environment in ${DATACENTER}"

# Create directory structure
sudo mkdir -p /etc/filebeat/modules.d
sudo mkdir -p /var/lib/filebeat
sudo mkdir -p /var/log/filebeat
sudo mkdir -p /opt/filebeat/certs

# Download and install Filebeat
curl -L -O "https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-${FILEBEAT_VERSION}-linux-x86_64.tar.gz"
tar xzvf "filebeat-${FILEBEAT_VERSION}-linux-x86_64.tar.gz"
sudo mv "filebeat-${FILEBEAT_VERSION}-linux-x86_64" /opt/filebeat

# Create symlinks
sudo ln -sf /opt/filebeat/filebeat /usr/local/bin/filebeat

# Generate dynamic configuration based on environment
cat > /tmp/filebeat.yml << EOF
# Dynamic Filebeat configuration for ${ENVIRONMENT}
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/syslog
      - /var/log/auth.log
      - /var/log/nginx/*.log
      - /var/log/applications/*/*.log
    fields:
      environment: ${ENVIRONMENT}
      datacenter: ${DATACENTER}
      server_role: $(hostname -s)   # expanded by the shell when the config file is generated
    fields_under_root: true
    processors:
      - add_host_metadata:
          when.not.contains.tags: forwarded
      - add_docker_metadata: ~
      - drop_event:
          when.regexp:
            message: "^DEBUG|^TRACE"

# Load balancing across Elasticsearch nodes
output.elasticsearch:
  hosts: 
    - "${ELASTICSEARCH_CLUSTER}"
  protocol: "https"
  username: "filebeat_${ENVIRONMENT}"
  password: "\${FILEBEAT_PASSWORD}"
  index: "logs-${ENVIRONMENT}-%{+yyyy.MM.dd}"

# A custom index name requires matching template settings and disabling ILM
setup.ilm.enabled: false
setup.template.name: "filebeat-${ENVIRONMENT}"
setup.template.pattern: "logs-${ENVIRONMENT}-*"

setup.kibana:
  host: "${KIBANA_URL}"

# Performance optimizations for production
queue.mem:
  events: 8192
  flush.min_events: 1024
  flush.timeout: 30s

# Monitoring configuration
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["${ELASTICSEARCH_CLUSTER}"]
  username: "filebeat_monitoring"
  password: "\${MONITORING_PASSWORD}"

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat.log
  keepfiles: 10
  permissions: 0600
  rotateeverybytes: 104857600
EOF

# Install configuration
sudo mv /tmp/filebeat.yml /etc/filebeat/filebeat.yml
sudo chown root:root /etc/filebeat/filebeat.yml
sudo chmod 600 /etc/filebeat/filebeat.yml

# Create systemd service
cat > /tmp/filebeat.service << EOF
[Unit]
Description=Filebeat Log Shipper
Documentation=https://www.elastic.co/beats/filebeat
Wants=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/filebeat/filebeat.yml

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/bin/filebeat -e -c /etc/filebeat/filebeat.yml
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=filebeat
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=30

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/var/lib/filebeat /var/log/filebeat

# Resource limits
LimitNOFILE=65536
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
EOF

sudo mv /tmp/filebeat.service /etc/systemd/system/filebeat.service

# Set up log rotation
cat > /tmp/filebeat-logrotate << EOF
/var/log/filebeat/*.log {
    daily
    missingok
    rotate 30
    compress
    notifempty
    create 0600 root root
    postrotate
        /bin/systemctl reload filebeat > /dev/null 2>&1 || true
    endscript
}
EOF

sudo mv /tmp/filebeat-logrotate /etc/logrotate.d/filebeat

# Enable and start Filebeat
sudo systemctl daemon-reload
sudo systemctl enable filebeat
sudo systemctl start filebeat

# Verify installation
echo "Verifying Filebeat installation..."
sleep 10

if sudo systemctl is-active --quiet filebeat; then
    echo "Filebeat is running successfully"
    echo "Status: $(sudo systemctl status filebeat --no-pager -l)"
else
    echo "Filebeat failed to start"
    echo "Checking logs..."
    sudo journalctl -u filebeat --no-pager -l
    exit 1
fi

# Test configuration
echo "Testing Filebeat configuration..."
sudo filebeat test config -c /etc/filebeat/filebeat.yml

echo "Testing output connectivity..."
sudo filebeat test output -c /etc/filebeat/filebeat.yml

# Set up monitoring script
cat > /tmp/filebeat-monitor.sh << 'EOF'
#!/bin/bash
# Simple monitoring script for Filebeat

SERVICE_NAME="filebeat"
LOG_FILE="/var/log/filebeat/filebeat.log"
EMAIL_ALERT="ops@company.com"

check_service() {
    if ! systemctl is-active --quiet "$SERVICE_NAME"; then
        echo "ALERT: $SERVICE_NAME is not running" | mail -s "Filebeat Alert" "$EMAIL_ALERT"
        systemctl start "$SERVICE_NAME"
    fi
}

check_errors() {
    ERROR_COUNT=$(tail -n 100 "$LOG_FILE" | grep -c "ERROR\|CRITICAL" || true)
    if [ "$ERROR_COUNT" -gt 5 ]; then
        echo "ALERT: High error count ($ERROR_COUNT) in $SERVICE_NAME logs" | mail -s "Filebeat Error Alert" "$EMAIL_ALERT"
    fi
}

check_service
check_errors
EOF

sudo mv /tmp/filebeat-monitor.sh /usr/local/bin/filebeat-monitor.sh
sudo chmod +x /usr/local/bin/filebeat-monitor.sh

# Add monitoring to cron (preserving any existing root crontab entries)
( sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/filebeat-monitor.sh" ) | sudo crontab -

echo "Filebeat deployment completed successfully!"
echo "Configuration: /etc/filebeat/filebeat.yml"
echo "Logs: /var/log/filebeat/"
echo "Monitor: sudo systemctl status filebeat"
echo "Test config: sudo filebeat test config"

# Cleanup
rm -f "filebeat-${FILEBEAT_VERSION}-linux-x86_64.tar.gz"

echo "Deployment summary:"
echo "- Environment: ${ENVIRONMENT}"
echo "- Datacenter: ${DATACENTER}"
echo "- Version: ${FILEBEAT_VERSION}"
echo "- Elasticsearch: ${ELASTICSEARCH_CLUSTER}"
echo "- Kibana: ${KIBANA_URL}"
echo "- Service status: $(systemctl is-active filebeat)"

7. Key Takeaways

  • Filebeat provides lightweight, efficient log shipping with minimal resource consumption while maintaining high reliability and performance for enterprise-scale log collection
  • Native cloud and container support enables seamless integration with Kubernetes, Docker, and cloud platforms through auto-discovery and metadata enrichment
  • Modular architecture with built-in processors allows flexible log parsing, transformation, and routing without requiring external processing systems
  • Enterprise-grade reliability features including persistent queues, at-least-once delivery guarantees, and automatic failover ensure data durability in production environments
  • Elastic Stack integration provides seamless connectivity with Elasticsearch, Logstash, and Kibana for comprehensive log management and analysis workflows

8. FAQ

Q: How does Filebeat compare to other log shippers like Fluentd or Logstash?

A: Filebeat is more lightweight and resource-efficient than Logstash, focusing specifically on log shipping. It's comparable to Fluentd but optimized for the Elastic Stack ecosystem.

Q: Can Filebeat handle log rotation and large files efficiently?

A: Yes, Filebeat has built-in support for log rotation, file truncation, and large file handling with configurable harvester settings and memory management.

Q: What happens if Elasticsearch is unavailable?

A: Filebeat uses persistent queues to store events locally during outages and resumes shipping when connectivity is restored, ensuring no data loss.

Q: How do I secure Filebeat communications?

A: Use TLS/SSL encryption for all communications, implement certificate-based authentication, and configure proper access controls for Filebeat users in Elasticsearch.

9. Additional Resources and Next Steps

Get Started

Ready to implement efficient log collection with Filebeat? Start with our installation guide and begin centralizing your log data for better observability and incident response.

Streamline Your Logs: Deploy Filebeat today and transform your distributed log collection with the industry's most efficient and reliable log shipping solution.