Complete Guide to Prometheus and Grafana Monitoring Stack on Ubuntu 24.04

Author: admineci

Set up infrastructure monitoring with the open-source duo of Prometheus and Grafana on Ubuntu 24.04 LTS. This guide takes you from a fresh server to a fully operational monitoring stack, complete with HTTPS access, email alerts, and visualization dashboards.

Introduction

In today's complex IT infrastructure, monitoring system performance and availability is crucial for maintaining reliability and preventing downtime. Prometheus and Grafana form a powerful open-source monitoring stack that provides real-time metrics collection, alerting, and beautiful visualization dashboards.

This comprehensive guide will walk you through setting up a complete monitoring solution on Ubuntu 24.04, including:

  • Prometheus server for metrics collection
  • Node Exporter for system metrics
  • Blackbox Exporter for endpoint monitoring
  • Alertmanager for notifications
  • Grafana for visualization
  • Security with HTTPS and authentication

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It features:

  • Multi-dimensional data model with time series data
  • Flexible query language (PromQL)
  • Pull-based metrics collection
  • Built-in alerting rules
  • Service discovery capabilities
  • Extensive ecosystem of exporters

Prerequisites

Before starting, ensure you have:

  • Ubuntu 24.04 LTS server with sudo privileges
  • Minimum 2GB RAM (4GB recommended)
  • 20GB available disk space
  • Basic understanding of Linux commands
  • Network connectivity for package installation

1. Installing Prometheus and Node Exporter

Step 1: Update System Packages

sudo apt update && sudo apt upgrade -y

Step 2: Install Prometheus Components

# Install Prometheus server and Node Exporter
sudo apt -y install prometheus prometheus-node-exporter

# Enable services to start on boot
sudo systemctl enable prometheus prometheus-node-exporter

# Start services
sudo systemctl start prometheus prometheus-node-exporter

Step 3: Verify Installation

# Check service status
sudo systemctl status prometheus
sudo systemctl status prometheus-node-exporter

# Check listening ports
sudo ss -tlnp | grep -E '9090|9100'

Prometheus runs on port 9090, Node Exporter on port 9100.
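
Beyond checking ports, it can help to confirm that Node Exporter actually serves metrics. The sketch below reads the plain-text exposition format, one "name value" pair per line. The helper name metric_value is made up for this example, and it only handles metrics without labels.

```shell
# Print the value of an unlabelled metric from Prometheus text exposition on stdin.
# HELP/TYPE lines never match because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage (on the monitoring server):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

An empty result would suggest the exporter is not answering or the metric name is different on your system.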

2. Understanding Prometheus Configuration

The main configuration file is located at /etc/prometheus/prometheus.yml:

# Sample config for Prometheus.
global:
  scrape_interval:     15s # How often to scrape targets
  evaluation_interval: 15s # How often to evaluate rules
  # scrape_timeout is set to the global default (10s)

  external_labels:
      monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
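
As the file grows, a quick listing of the configured jobs can be handy. This is a minimal sketch that relies on the "- job_name:" layout shown above. It is a text match, not a YAML parser, and list_jobs is a hypothetical helper name.

```shell
# List the job names declared in a Prometheus configuration file ($1).
# Strips the "- job_name:" prefix and any surrounding quotes.
list_jobs() {
  sed -n 's/^[[:space:]]*-[[:space:]]*job_name:[[:space:]]*//p' "$1" | tr -d "'\""
}

# Usage:
#   list_jobs /etc/prometheus/prometheus.yml
```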

3. Securing Prometheus with HTTPS and Authentication

Step 1: Generate SSL Certificate

For production, use a proper SSL certificate. For testing, create a self-signed certificate:

# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/prometheus/server.key \
  -out /etc/prometheus/server.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=prometheus.local"

# Set proper permissions
sudo chown prometheus:prometheus /etc/prometheus/server.{crt,key}
sudo chmod 600 /etc/prometheus/server.key

Step 2: Configure Authentication

# Install Apache utilities for password generation
sudo apt -y install apache2-utils

# Generate bcrypt password hash
htpasswd -nB admin
# Enter password when prompted
# Copy the generated hash
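
htpasswd prints a line of the form user:hash, and only the part after the colon goes into the web configuration. The helpers below are a sketch for extracting that part and checking that it has the shape of a bcrypt hash before pasting it. The function names are made up for this example, and the shape check does not prove the hash matches your password.

```shell
# Extract the hash from "user:hash" htpasswd output read on stdin.
htpasswd_hash() {
  sed -n 's/^[^:]*://p' | head -n 1
}

# Succeed when $1 looks like a bcrypt hash:
# a $2a$, $2b$ or $2y$ prefix, a two-digit cost, then 53 salt/hash characters.
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

# Usage:
#   hash=$(htpasswd -nB admin | htpasswd_hash)
#   is_bcrypt_hash "$hash" && echo "hash looks valid"
```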

Step 3: Create Web Configuration

sudo nano /etc/prometheus/web.yml

Add the following content:

# TLS configuration
tls_server_config:
  cert_file: /etc/prometheus/server.crt
  key_file: /etc/prometheus/server.key

# Basic authentication
basic_auth_users:
  admin: $2y$05$YOURGENERATEDHASHHERE

Step 4: Update Prometheus Service

sudo nano /etc/default/prometheus

Add the web config parameter:

ARGS="--web.config.file=/etc/prometheus/web.yml"

Step 5: Update Prometheus Configuration

sudo nano /etc/prometheus/prometheus.yml

Update the Prometheus job to use HTTPS:

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    scheme: https
    tls_config:
      insecure_skip_verify: true  # For self-signed certificates; no client certificate is needed
    basic_auth:
      username: 'admin'
      password: 'your_password_here'
    static_configs:
      - targets: ['localhost:9090']

Restart Prometheus:

sudo systemctl restart prometheus
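
After the restart, one quick way to confirm that both TLS and basic auth are active is to compare HTTP status codes with and without credentials. The sketch below assumes curl is installed. http_status is a hypothetical helper name and the password is a placeholder.

```shell
# Print the HTTP status code returned by a URL ($1).
# Pass "user:password" as $2 to authenticate.
# -k skips certificate verification, which a self-signed certificate requires.
http_status() {
  if [ -n "${2:-}" ]; then
    curl -k -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -k -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Usage:
#   http_status https://localhost:9090/-/healthy
#   http_status https://localhost:9090/-/healthy admin:your_password_here
```

With authentication enabled, the first call should report 401 and the second 200. A failed connection usually means Prometheus did not start with the new web configuration.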

4. Adding Monitoring Targets

Step 1: Install Node Exporter on Target Nodes

On each node you want to monitor:

# On target node (e.g., node01.example.com)
sudo apt -y install prometheus-node-exporter
sudo systemctl enable --now prometheus-node-exporter

Step 2: Configure Prometheus to Scrape New Targets

sudo nano /etc/prometheus/prometheus.yml

Add targets to the node job:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100', 'node01.example.com:9100', 'node02.example.com:9100']

  # Or create separate job groups
  - job_name: 'webservers'
    static_configs:
      - targets: ['web01.example.com:9100', 'web02.example.com:9100']
  
  - job_name: 'databases'
    static_configs:
      - targets: ['db01.example.com:9100', 'db02.example.com:9100']

Restart Prometheus:

sudo systemctl restart prometheus
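
With more than a handful of nodes, typing the targets list by hand gets error-prone. As a sketch, the helper below turns a list of hostnames (one per line) into a targets line you can paste under static_configs. The name targets_line and the default port argument are assumptions for illustration.

```shell
# Read hostnames (one per line) on stdin and print a static_configs targets line.
# $1 = exporter port (defaults to 9100, the Node Exporter port).
targets_line() {
  port="${1:-9100}"
  awk -v port="$port" '
    NF { item = "\047" $1 ":" port "\047"; list = (list == "" ? item : list ", " item) }
    END { print "      - targets: [" list "]" }
  '
}

# Usage:
#   printf 'node01.example.com\nnode02.example.com\n' | targets_line
```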

5. Setting Up Alertmanager

Step 1: Install Alertmanager

sudo apt -y install prometheus-alertmanager
sudo systemctl enable prometheus-alertmanager

Step 2: Configure Email Notifications

# Backup original configuration
sudo mv /etc/prometheus/alertmanager.yml /etc/prometheus/alertmanager.yml.bak

# Create new configuration
sudo nano /etc/prometheus/alertmanager.yml

Add email notification configuration:

global:
  # SMTP configuration
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_require_tls: true

route:
  receiver: 'email-notifications'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'admin@yourdomain.com'
    headers:
      Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'

Step 3: Create Alert Rules

sudo nano /etc/prometheus/alert_rules.yml

Add monitoring rules:

groups:
- name: system_alerts
  rules:
  # Instance down alert
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # High CPU usage
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% (current value: {{ $value }}%)"

  # High memory usage
  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 85% (current value: {{ $value }}%)"

  # Disk space low
  - alert: DiskSpaceLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Low disk space on {{ $labels.instance }}"
      description: "Disk space on root partition is below 15% (current value: {{ $value }}%)"
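
Alert rules can be unit-tested with promtool before they reach production. The sketch below writes a test file for the InstanceDown rule: it feeds a synthetic up series that stays at 0 and expects the alert to fire after the 5 minute "for" period. The helper name write_rule_test and the /tmp path are hypothetical; the test assumes the rules live in /etc/prometheus/alert_rules.yml as above.

```shell
# Write a promtool unit test for the InstanceDown rule to the path given as $1.
write_rule_test() {
  cat > "$1" <<'EOF'
rule_files:
  - /etc/prometheus/alert_rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # up stays at 0 for 10 minutes
      - series: 'up{job="node", instance="localhost:9100"}'
        values: '0+0x10'
    alert_rule_test:
      - eval_time: 6m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node
              instance: localhost:9100
            exp_annotations:
              summary: "Instance localhost:9100 down"
              description: "localhost:9100 of job node has been down for more than 5 minutes."
EOF
}

# Usage:
#   write_rule_test /tmp/alert_rules_test.yml
#   promtool test rules /tmp/alert_rules_test.yml
```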

Step 4: Update Prometheus Configuration

sudo nano /etc/prometheus/prometheus.yml

Add the alert rules file:

rule_files:
  - "alert_rules.yml"

Restart services:

sudo systemctl restart prometheus prometheus-alertmanager
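
A typo in either file can keep a service from coming back up, so it may be worth validating before restarting. The helper below is a sketch that uses promtool and amtool, which ship with the Prometheus and Alertmanager packages. validate_and_restart is a made-up name, and the paths are the ones used in this guide.

```shell
# Validate both configurations and restart only if every check passes.
# promtool check config also checks the rule files referenced in prometheus.yml.
validate_and_restart() {
  promtool check config /etc/prometheus/prometheus.yml &&
    amtool check-config /etc/prometheus/alertmanager.yml &&
    sudo systemctl restart prometheus prometheus-alertmanager
}

# Usage:
#   validate_and_restart || echo "validation failed, services left untouched"
```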

6. Installing and Configuring Grafana

Step 1: Install Grafana

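# Install required packages
sudo apt-get install -y apt-transport-https software-properties-common wget

# Add Grafana GPG key (apt-key is deprecated; store the key in a dedicated keyring)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

# Add Grafana repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list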

# Update and install Grafana
sudo apt update
sudo apt install -y grafana

# Enable and start Grafana
sudo systemctl enable --now grafana-server

Step 2: Configure Grafana

Grafana runs on port 3000. Access it at http://your-server-ip:3000

Default credentials:

  • Username: admin
  • Password: admin (you'll be prompted to change it)

Step 3: Add Prometheus Data Source

  1. Log into Grafana
  2. Navigate to Connections → Data sources (Configuration → Data Sources in older Grafana versions)
  3. Click "Add data source"
  4. Select "Prometheus"
  5. Configure:
    • Name: Prometheus
    • URL: https://localhost:9090 (if using HTTPS)
    • Auth: Toggle on "Basic auth"
    • User: admin
    • Password: your-prometheus-password
    • TLS Settings: Toggle "Skip TLS Verify" for self-signed certificates
  6. Click "Save & Test"
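
The same data source can also be declared in a provisioning file, which is easier to reproduce than clicking through the UI. The sketch below mirrors the settings from the steps above. The helper name write_grafana_datasource, the /tmp path, and the file name prometheus.yaml are assumptions; the password is a placeholder.

```shell
# Write a Grafana data source provisioning file to the path given as $1.
write_grafana_datasource() {
  cat > "$1" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: https://localhost:9090
    basicAuth: true
    basicAuthUser: admin
    jsonData:
      tlsSkipVerify: true   # only for self-signed certificates
    secureJsonData:
      basicAuthPassword: your-prometheus-password
EOF
}

# Usage (assumes the default provisioning directory of the Grafana package):
#   write_grafana_datasource /tmp/prometheus.yaml
#   sudo install -o root -g grafana -m 640 /tmp/prometheus.yaml /etc/grafana/provisioning/datasources/prometheus.yaml
#   sudo systemctl restart grafana-server
```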

Step 4: Import Dashboards

Import pre-built dashboards:

  1. Go to Dashboards → Import
  2. Popular dashboard IDs:
    • Node Exporter Full: 1860
    • Node Exporter Server Metrics: 405
    • Prometheus Stats: 2
  3. Enter the ID and click "Load"
  4. Select your Prometheus data source
  5. Click "Import"

7. Blackbox Exporter for Endpoint Monitoring

Step 1: Install Blackbox Exporter

# On monitoring node
sudo apt -y install prometheus-blackbox-exporter
sudo systemctl enable --now prometheus-blackbox-exporter

Step 2: Configure Blackbox Exporter

The default configuration at /etc/prometheus/blackbox.yml includes modules for:

  • HTTP/HTTPS monitoring
  • TCP port checks
  • ICMP ping
  • DNS queries

Step 3: Add Blackbox Jobs to Prometheus

sudo nano /etc/prometheus/prometheus.yml

Add monitoring jobs:

scrape_configs:
  # HTTP/HTTPS monitoring
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://example.com
        - https://app.example.com
        - http://internal-app.local
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # TCP port monitoring
  - job_name: 'blackbox_tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - db.example.com:3306      # MySQL
        - cache.example.com:6379   # Redis
        - mq.example.com:5672      # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # ICMP ping monitoring
  - job_name: 'blackbox_icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - gateway.example.com
        - dns1.example.com
        - critical-server.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Restart Prometheus:

sudo systemctl restart prometheus
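
The relabel_configs above move each target into the target URL parameter and point the scrape at the exporter on localhost:9115. You can reproduce that request by hand to debug a probe. The helpers below are a sketch with made-up names, and they pass the target unencoded, which is enough for simple URLs and host:port pairs.

```shell
# Build the probe URL for a target ($1) and a module ($2).
probe_url() {
  printf 'http://localhost:9115/probe?module=%s&target=%s\n' "$2" "$1"
}

# Succeed when the probe output read on stdin reports probe_success 1.
probe_ok() {
  grep -q '^probe_success 1$'
}

# Usage:
#   curl -s "$(probe_url https://example.com http_2xx)" | probe_ok && echo "probe succeeded"
```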

8. Best Practices and Optimization

Storage Optimization

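# Retention is set with a command-line flag, not in prometheus.yml
sudo nano /etc/default/prometheus

Add the retention flag to ARGS, alongside the web config parameter:

ARGS="--web.config.file=/etc/prometheus/web.yml --storage.tsdb.retention.time=30d"

This keeps data for 30 days (the default is 15 days). Restart Prometheus afterwards with sudo systemctl restart prometheus.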
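
Retention largely determines disk usage. The Prometheus documentation gives a rough estimate: retention time in seconds × ingested samples per second × bytes per sample, with 1 to 2 bytes per sample being typical. The helper below is a sketch of that arithmetic; estimate_disk_gb is a hypothetical name, and the 2-byte default is the conservative end of that range.

```shell
# Estimate TSDB disk usage in GB (decimal).
# $1 = retention in days, $2 = ingested samples per second, $3 = bytes per sample (default 2).
estimate_disk_gb() {
  awk -v d="$1" -v s="$2" -v b="${3:-2}" \
    'BEGIN { printf "%.2f\n", d * 86400 * s * b / 1000000000 }'
}

# Usage: 30 days of retention at 1000 samples per second
#   estimate_disk_gb 30 1000
```

The current ingestion rate can be read from Prometheus itself with the query rate(prometheus_tsdb_head_samples_appended_total[5m]).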

Performance Tuning

  1. Optimize scrape intervals: Don't scrape too frequently
  2. Use recording rules: Pre-compute expensive queries
  3. Limit cardinality: Avoid labels with many unique values
  4. Configure storage: Use SSD for better performance

Security Best Practices

  1. Use proper SSL certificates: Get certificates from Let's Encrypt
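  2. Implement firewall rules, opening each port only to the hosts that need it (203.0.113.10 is a placeholder for a trusted address):
    sudo ufw allow from 203.0.113.10 to any port 9090 proto tcp  # Prometheus
    sudo ufw allow from 203.0.113.10 to any port 9093 proto tcp  # Alertmanager
    sudo ufw allow from 203.0.113.10 to any port 3000 proto tcp  # Grafana
    sudo ufw allow from 203.0.113.10 to any port 9100 proto tcp  # Node Exporter (allow the Prometheus server)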
  3. Regular updates: Keep all components updated
  4. Secure passwords: Use strong, unique passwords
  5. Network isolation: Use VLANs or VPCs for monitoring infrastructure

9. Troubleshooting Common Issues

Prometheus Not Starting

# Check logs
sudo journalctl -u prometheus -f

# Validate configuration
promtool check config /etc/prometheus/prometheus.yml

# Check permissions
ls -la /etc/prometheus/

Targets Showing as Down

  1. Check firewall rules on target nodes
  2. Verify exporter is running: systemctl status prometheus-node-exporter
  3. Test connectivity: curl http://target:9100/metrics
  4. Check Prometheus logs for scrape errors

Alertmanager Not Sending Emails

  1. Verify SMTP settings
  2. Check Alertmanager logs: journalctl -u prometheus-alertmanager -f
  3. Test email configuration with amtool
  4. Ensure firewall allows outbound SMTP

Grafana Connection Issues

  1. Verify Prometheus URL in data source
  2. Check authentication credentials
  3. Test from Grafana server: curl -k -u admin:password https://localhost:9090/api/v1/labels (-k accepts the self-signed certificate)
  4. Check certificate configuration

10. Creating Custom Dashboards

Essential Grafana Panels

  1. System Overview:
    • CPU usage gauge
    • Memory usage gauge
    • Disk usage table
    • Network traffic graph
  2. Service Health:
    • Up/Down status table
    • Response time graph
    • Error rate chart
    • Request rate graph
  3. Alert Overview:
    • Active alerts table
    • Alert history graph
    • Alert frequency by severity

Example PromQL Queries

# CPU usage percentage, per instance
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage percentage
100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})

# Network receive bandwidth
rate(node_network_receive_bytes_total[5m])

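# HTTP request rate (requires an application that exposes http_requests_total)
rate(http_requests_total[5m])

# Target reachability (1 = up, 0 = down)
up

# 95th percentile response time (requires an application that exposes this histogram)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))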
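
To sanity-check a dashboard value, the memory query can be reproduced locally. Node Exporter takes node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes from /proc/meminfo, so the same formula can be applied to that file. This is a small sketch, and mem_used_percent is a made-up helper name.

```shell
# Compute (1 - MemAvailable / MemTotal) * 100 from /proc/meminfo-style input on stdin.
mem_used_percent() {
  awk '/^MemTotal:/ { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { printf "%.1f\n", (1 - a / t) * 100 }'
}

# Usage:
#   mem_used_percent < /proc/meminfo
```

The result should be close to what the memory usage query returns for the same host at the same moment.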

Conclusion

You now have a comprehensive monitoring stack with Prometheus and Grafana on Ubuntu 24.04. This setup provides:

  • Real-time system and application metrics
  • Proactive alerting for issues
  • Beautiful visualization dashboards
  • Secure access with HTTPS and authentication
  • Scalable architecture for growth

As your infrastructure grows, you can:

  • Add more exporters (MySQL, PostgreSQL, Redis, etc.)
  • Implement service discovery for dynamic environments
  • Create custom exporters for your applications
  • Set up federation for multi-site monitoring
  • Integrate with incident management systems

Remember to regularly review and update your monitoring strategy as your infrastructure evolves. Happy monitoring!
