Complete Guide to Prometheus and Grafana Monitoring Stack on Ubuntu 24.04

Introduction

Monitoring system performance and availability is essential for keeping services reliable and catching problems before they cause downtime. Prometheus and Grafana form an open-source monitoring stack that covers real-time metrics collection, alerting, and dashboards.

This comprehensive guide will walk you through setting up a complete monitoring solution on Ubuntu 24.04, including:

Prometheus server for metrics collection
Node Exporter for system metrics
Blackbox Exporter for endpoint monitoring
Alertmanager for notifications
Grafana for visualization
Security with HTTPS and authentication

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It features:

Multi-dimensional data model with time series data
Flexible query language (PromQL)
Pull-based metrics collection
Built-in alerting rules
Service discovery capabilities
Extensive ecosystem of exporters

Prerequisites

Before starting, ensure you have:

Ubuntu 24.04 LTS server with sudo privileges
Minimum 2GB RAM (4GB recommended)
20GB available disk space
Basic understanding of Linux commands
Network connectivity for package installation

1. Installing Prometheus and Node Exporter

Step 1: Update System Packages

sudo apt update && sudo apt upgrade -y

Step 2: Install Prometheus Components

# Install Prometheus server and Node Exporter
sudo apt -y install prometheus prometheus-node-exporter

# Enable services to start on boot
sudo systemctl enable prometheus prometheus-node-exporter

# Start services
sudo systemctl start prometheus prometheus-node-exporter

Step 3: Verify Installation

# Check service status
sudo systemctl status prometheus
sudo systemctl status prometheus-node-exporter

# Check listening ports
sudo ss -tlnp | grep -E '9090|9100'

Prometheus runs on port 9090, Node Exporter on port 9100.
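
A listening port does not prove that the exporter is serving useful data, so it can help to fetch the metrics page and parse a few lines. The following is a minimal sketch, assuming the default Node Exporter address shown above. The helper names parse_metrics and fetch_metrics are illustrative, and the parser only handles plain "series value" lines without trailing timestamps, not the full exposition format.

```python
import urllib.request


def parse_metrics(text):
    """Parse simple Prometheus exposition lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes lines carry no trailing timestamp, which is a simplification.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Fetch and parse a metrics page (the URL is the Node Exporter default)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_metrics() on the monitoring server and looking up a key such as "node_load1" gives a quick sanity check that real samples are being exposed.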

2. Understanding Prometheus Configuration

The main configuration file is located at /etc/prometheus/prometheus.yml:

# Sample config for Prometheus.
global:
  scrape_interval:     15s # How often to scrape targets
  evaluation_interval: 15s # How often to evaluate rules
  # scrape_timeout is set to the global default (10s)

  external_labels:
      monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
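
The configuration above mixes global and per-job settings, and the per-job values win. The sketch below illustrates that precedence with plain dictionaries standing in for the YAML. The function name effective_settings is hypothetical, and the fallback values (1m interval, 10s timeout) are Prometheus's built-in defaults as far as I know, used here only when neither section sets a value.

```python
# Built-in fallbacks, used only when neither section sets a value.
BUILTIN_DEFAULTS = {"scrape_interval": "1m", "scrape_timeout": "10s"}


def effective_settings(global_cfg, job):
    """Return the scrape interval and timeout a job actually uses.

    Job-level keys override the global section, which in turn
    overrides the built-in defaults.
    """
    merged = dict(BUILTIN_DEFAULTS)
    for section in (global_cfg, job):
        for key, value in section.items():
            if key in BUILTIN_DEFAULTS:
                merged[key] = value
    return merged


global_cfg = {"scrape_interval": "15s", "evaluation_interval": "15s"}
prometheus_job = {"job_name": "prometheus", "scrape_interval": "5s", "scrape_timeout": "5s"}
node_job = {"job_name": "node"}
```

With these inputs, the prometheus job is scraped every 5s with a 5s timeout, while the node job inherits the global 15s interval and the default 10s timeout.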

3. Securing Prometheus with HTTPS and Authentication

Step 1: Generate SSL Certificate

For production, use a proper SSL certificate. For testing, create a self-signed certificate:

# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/prometheus/server.key \
  -out /etc/prometheus/server.crt \
  -subj "/C=US/ST=State/L=City/O=Organization/CN=prometheus.local"

# Set proper permissions
sudo chown prometheus:prometheus /etc/prometheus/server.{crt,key}
sudo chmod 600 /etc/prometheus/server.key

Step 2: Configure Authentication

# Install Apache utilities for password generation
sudo apt -y install apache2-utils

# Generate bcrypt password hash
htpasswd -nB admin
# Enter password when prompted
# Copy the generated hash

Step 3: Create Web Configuration

sudo nano /etc/prometheus/web.yml

Add the following content:

# TLS configuration
tls_server_config:
  cert_file: server.crt
  key_file: server.key

# Basic authentication
basic_auth_users:
  admin: $2y$05$YOURGENERATEDHASHHERE

Step 4: Update Prometheus Service

sudo nano /etc/default/prometheus

Add the web config parameter:

ARGS="--web.config.file=/etc/prometheus/web.yml"

Step 5: Update Prometheus Configuration

sudo nano /etc/prometheus/prometheus.yml

Update the Prometheus job to use HTTPS:

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    scheme: https
    tls_config:
      # cert_file/key_file here would be client certificates (mutual TLS),
      # which this setup does not use, so they are omitted.
      insecure_skip_verify: true  # For self-signed certificates (testing only)
    basic_auth:
      username: 'admin'
      password: 'your_password_here'
    static_configs:
      - targets: ['localhost:9090']

Restart Prometheus:

sudo systemctl restart prometheus
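
After the restart, it is worth confirming that the API answers over HTTPS and rejects requests without credentials. Here is a rough sketch using only the Python standard library. It assumes the admin user and self-signed certificate from the steps above, and the function names are illustrative. Certificate verification is disabled by default to mirror insecure_skip_verify, which is only acceptable for testing.

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def query_up(base_url, username, password, verify_tls=False):
    """Run the PromQL query `up` against the /api/v1/query endpoint."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Mirrors insecure_skip_verify; use only with self-signed test certs.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/query?query=up",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(request, context=context, timeout=5) as resp:
        return json.load(resp)
```

For example, query_up("https://localhost:9090", "admin", "your_password_here") should return a JSON document whose "status" field is "success". A wrong password should raise an HTTP 401 error instead.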

4. Adding Monitoring Targets

Step 1: Install Node Exporter on Target Nodes

On each node you want to monitor:

# On target node (e.g., node01.example.com)
sudo apt -y install prometheus-node-exporter
sudo systemctl enable --now prometheus-node-exporter

Step 2: Configure Prometheus to Scrape New Targets

sudo nano /etc/prometheus/prometheus.yml

Add targets to the node job:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100', 'node01.example.com:9100', 'node02.example.com:9100']

  # Or create separate job groups
  - job_name: 'webservers'
    static_configs:
      - targets: ['web01.example.com:9100', 'web02.example.com:9100']
  
  - job_name: 'databases'
    static_configs:
      - targets: ['db01.example.com:9100', 'db02.example.com:9100']

Restart Prometheus:

sudo systemctl restart prometheus
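
Once new targets are added, the Prometheus HTTP API endpoint /api/v1/targets reports the health of each one. The sketch below filters such a response for targets that are not up. The function name down_targets and the sample payload are illustrative, and the payload is trimmed to the fields the function reads.

```python
def down_targets(payload):
    """List (job, instance, last_error) for every target not reporting 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append(
                (
                    labels.get("job", ""),
                    labels.get("instance", ""),
                    target.get("lastError", ""),
                )
            )
    return problems


# Illustrative response, shaped like the output of /api/v1/targets.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "localhost:9100"}, "health": "up", "lastError": ""},
            {
                "labels": {"job": "node", "instance": "node01.example.com:9100"},
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}
```

Here down_targets(sample) reports only node01.example.com:9100, together with the scrape error that Prometheus recorded for it.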

5. Setting Up Alertmanager

Step 1: Install Alertmanager

sudo apt -y install prometheus-alertmanager
sudo systemctl enable prometheus-alertmanager

Step 2: Configure Email Notifications

# Backup original configuration
sudo mv /etc/prometheus/alertmanager.yml /etc/prometheus/alertmanager.yml.bak

# Create new configuration
sudo nano /etc/prometheus/alertmanager.yml

Add email notification configuration:

global:
  # SMTP configuration
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_require_tls: true

route:
  receiver: 'email-notifications'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'admin@yourdomain.com'
    headers:
      Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
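
The group_by setting decides which alerts end up in the same notification. As a simplified illustration (not Alertmanager's actual code), the sketch below buckets alerts by the values of the group_by labels. The function name group_alerts and the sample alerts are made up for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts (dicts of labels) by the values of the group_by labels.

    A label that an alert lacks counts as an empty string, so alerts
    without a `cluster` label can still share a group.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "instance": "web01.example.com:9100"},
    {"alertname": "InstanceDown", "instance": "web02.example.com:9100"},
    {"alertname": "HighCPUUsage", "instance": "db01.example.com:9100"},
]
groups = group_alerts(alerts, ["alertname", "cluster", "service"])
```

Both InstanceDown alerts land in one group and would arrive as a single email, while HighCPUUsage forms its own group. Adding instance to group_by would produce one notification per host.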

Step 3: Create Alert Rules

sudo nano /etc/prometheus/alert_rules.yml

Add monitoring rules:

groups:
- name: system_alerts
  rules:
  # Instance down alert
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # High CPU usage
  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% (current value: {{ $value }}%)"

  # High memory usage
  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 85% (current value: {{ $value }}%)"

  # Disk space low
  - alert: DiskSpaceLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Low disk space on {{ $labels.instance }}"
      description: "Disk space on root partition is below 15% (current value: {{ $value }}%)"
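
Each rule above has a for: duration, which means the expression must stay true across consecutive evaluations for that long before the alert fires. Until then the alert is pending. The sketch below models that behaviour in a simplified way. The function name alert_states is hypothetical and the timestamps are plain seconds.

```python
def alert_states(evaluations, for_seconds):
    """Return the alert state after each rule evaluation.

    `evaluations` is a time-ordered list of (timestamp_seconds, is_true).
    A single false evaluation resets the pending timer.
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# A condition that flaps once before staying true for 5 minutes.
history = [(0, True), (60, True), (120, False), (180, True), (480, True)]
states = alert_states(history, for_seconds=300)
```

The brief recovery at 120 seconds resets the timer, so the alert only fires at 480 seconds, 300 seconds after the condition became true again. This is why a longer for: value filters out short spikes.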

Step 4: Update Prometheus Configuration

sudo nano /etc/prometheus/prometheus.yml

Add the alert rules file:

rule_files:
  - "alert_rules.yml"

Restart services:

sudo systemctl restart prometheus prometheus-alertmanager

6. Installing and Configuring Grafana

Step 1: Install Grafana

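# Install required packages
sudo apt-get install -y apt-transport-https software-properties-common wget

# Add the Grafana GPG key to a dedicated keyring (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

# Add Grafana repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list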

# Update and install Grafana
sudo apt update
sudo apt install -y grafana

# Enable and start Grafana
sudo systemctl enable --now grafana-server

Step 2: Configure Grafana

Grafana runs on port 3000. Access it at http://your-server-ip:3000

Default credentials:

Username: admin
Password: admin (you'll be prompted to change it)

Step 3: Add Prometheus Data Source

Log into Grafana
Navigate to Connections → Data sources (Configuration → Data Sources in older Grafana versions)
Click "Add data source"
Select "Prometheus"
Configure:
- Name: Prometheus
- URL: https://localhost:9090 (if using HTTPS)
- Auth: Toggle on "Basic auth"
- User: admin
- Password: your-prometheus-password
- TLS Settings: Toggle "Skip TLS Verify" for self-signed certificates
Click "Save & Test"
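
The same settings can also be expressed as a JSON payload, which is handy when the data source should be created by a script instead of through the UI. The sketch below only builds the payload. The field names follow the Grafana data source HTTP API as I understand it, so treat them as assumptions and check them against your Grafana version before use. The function name prometheus_datasource is illustrative.

```python
import json


def prometheus_datasource(url, user, password, skip_tls_verify=True):
    """Build a data source definition matching the UI settings above."""
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "access": "proxy",  # Grafana's backend makes the requests
        "url": url,
        "basicAuth": True,
        "basicAuthUser": user,
        # Secrets go into secureJsonData so Grafana stores them encrypted.
        "secureJsonData": {"basicAuthPassword": password},
        "jsonData": {"tlsSkipVerify": skip_tls_verify},
    }


payload = prometheus_datasource("https://localhost:9090", "admin", "your-prometheus-password")
body = json.dumps(payload, indent=2)
```

The resulting body could then be sent with a POST request to /api/datasources on the Grafana server, authenticated as a Grafana admin.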

Step 4: Import Dashboards

Import pre-built dashboards:

Go to Dashboards → Import
Popular dashboard IDs:
- Node Exporter Full: 1860
- Node Exporter Server Metrics: 405
- Prometheus Stats: 2
Enter the ID and click "Load"
Select your Prometheus data source
Click "Import"

7. Blackbox Exporter for Endpoint Monitoring

Step 1: Install Blackbox Exporter

# On monitoring node
sudo apt -y install prometheus-blackbox-exporter
sudo systemctl enable --now prometheus-blackbox-exporter

Step 2: Configure Blackbox Exporter

The default configuration at /etc/prometheus/blackbox.yml includes modules for:

HTTP/HTTPS monitoring
TCP port checks
ICMP ping
DNS queries

Step 3: Add Blackbox Jobs to Prometheus

sudo nano /etc/prometheus/prometheus.yml

Add monitoring jobs:

scrape_configs:
  # HTTP/HTTPS monitoring
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://example.com
        - https://app.example.com
        - http://internal-app.local
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # TCP port monitoring
  - job_name: 'blackbox_tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - db.example.com:3306      # MySQL
        - cache.example.com:6379   # Redis
        - mq.example.com:5672      # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # ICMP ping monitoring
  - job_name: 'blackbox_icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
        - gateway.example.com
        - dns1.example.com
        - critical-server.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Restart Prometheus:

sudo systemctl restart prometheus
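
The three relabel_configs steps in each Blackbox job are easy to misread, so here is a small sketch that replays them for a single target. It is an illustration of the label rewriting, not Prometheus code. The function name blackbox_scrape_url is made up, and localhost:9115 is the exporter address used in the jobs above.

```python
from urllib.parse import urlencode


def blackbox_scrape_url(target, module, exporter="localhost:9115"):
    """Replay the three relabel steps for one static target."""
    labels = {"__address__": target}
    # Step 1: copy the original address into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed endpoint as the visible instance label.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the Blackbox Exporter.
    labels["__address__"] = exporter
    query = urlencode({"module": module, "target": labels["__param_target"]})
    url = f"http://{labels['__address__']}/probe?{query}"
    return url, labels["instance"]


url, instance = blackbox_scrape_url("https://example.com", "http_2xx")
```

Prometheus ends up requesting http://localhost:9115/probe?module=http_2xx&target=https%3A%2F%2Fexample.com, while the stored series keep instance="https://example.com". Without the second step, every probe would show up under the exporter's own address.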

8. Best Practices and Optimization

Storage Optimization

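Retention is not a prometheus.yml option, and Prometheus rejects unknown keys in the global section. It is set with a command-line flag, which on Ubuntu goes into /etc/default/prometheus:

sudo nano /etc/default/prometheus

Append the retention flag to ARGS, alongside the web config flag from section 3:

ARGS="--web.config.file=/etc/prometheus/web.yml --storage.tsdb.retention.time=30d"

Restart Prometheus to apply the change:

sudo systemctl restart prometheus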
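
How much disk a retention period needs depends on how many samples are ingested. The Prometheus documentation gives a rule of thumb: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies that formula. The function name and the example series count are assumptions for illustration.

```python
def estimate_disk_bytes(retention_days, active_series, scrape_interval_s, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * samples/second * bytes/sample."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * 86400 * samples_per_second * bytes_per_sample


# Example: 15,000 active series scraped every 15s, kept for 30 days.
needed = estimate_disk_bytes(retention_days=30, active_series=15000, scrape_interval_s=15)
needed_gb = needed / 1e9
```

With these example numbers the estimate comes to about 5.2 GB, which fits comfortably in the 20GB from the prerequisites. Doubling the series count or halving the scrape interval doubles the requirement.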

Performance Tuning

Optimize scrape intervals: Don't scrape too frequently
Use recording rules: Pre-compute expensive queries
Limit cardinality: Avoid labels with many unique values
Configure storage: Use SSD for better performance
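
To see why high-cardinality labels are a problem, note that the worst-case number of time series for one metric is the product of the distinct values of each label. The short sketch below computes that upper bound. The label names and counts are invented for the example.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct label values."""
    return prod(label_value_counts.values())


reasonable = max_series({"instance": 50, "method": 5, "status": 10})
# Adding a label with one value per user multiplies everything.
explosive = max_series({"instance": 50, "method": 5, "status": 10, "user_id": 10000})
```

The first metric tops out at 2,500 series, while adding a user_id label raises the bound to 25 million. Values like user IDs, request IDs, or full URLs are better kept in logs than in labels.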

Security Best Practices

Use proper SSL certificates: Get certificates from Let's Encrypt

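Implement firewall rules, and where possible limit them to trusted source addresses rather than opening the ports to everyone: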

sudo ufw allow 9090/tcp  # Prometheus
sudo ufw allow 9093/tcp  # Alertmanager
sudo ufw allow 3000/tcp  # Grafana
sudo ufw allow 9100/tcp  # Node Exporter

Regular updates: Keep all components updated
Secure passwords: Use strong, unique passwords
Network isolation: Use VLANs or VPCs for monitoring infrastructure

9. Troubleshooting Common Issues

Prometheus Not Starting

# Check logs
sudo journalctl -u prometheus -f

# Validate configuration
promtool check config /etc/prometheus/prometheus.yml

# Check permissions
ls -la /etc/prometheus/

Targets Showing as Down

Check firewall rules on target nodes
Verify exporter is running: systemctl status prometheus-node-exporter
Test connectivity: curl http://target:9100/metrics
Check Prometheus logs for scrape errors
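
When curl is not available, or when many targets need checking at once, a plain TCP connection test helps separate firewall problems from exporter problems. This is a minimal sketch using the standard library. The function name can_connect is illustrative, and the hosts are the example names used earlier in this guide.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map each 'host:port' string to its reachability."""
    results = {}
    for target in targets:
        host, _, port = target.rpartition(":")
        results[target] = can_connect(host, int(port), timeout)
    return results
```

Running check_targets(["node01.example.com:9100", "node02.example.com:9100"]) from the Prometheus server shows which hosts are unreachable at the network level. If the port is reachable but the target is still down, the scrape error in the Prometheus logs is the next place to look.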

Alertmanager Not Sending Emails

Verify SMTP settings
Check Alertmanager logs: journalctl -u prometheus-alertmanager -f
Validate the configuration with amtool: amtool check-config /etc/prometheus/alertmanager.yml
Ensure firewall allows outbound SMTP

Grafana Connection Issues

Verify Prometheus URL in data source
Check authentication credentials
Test from Grafana server: curl -k -u admin:password https://localhost:9090/api/v1/labels (-k skips certificate verification, needed for the self-signed certificate)
Check certificate configuration

10. Creating Custom Dashboards

Essential Grafana Panels

System Overview:
- CPU usage gauge
- Memory usage gauge
- Disk usage table
- Network traffic graph
Service Health:
- Up/Down status table
- Response time graph
- Error rate chart
- Request rate graph
Alert Overview:
- Active alerts table
- Alert history graph
- Alert frequency by severity

Example PromQL Queries

# CPU usage percentage
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage percentage
100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})

# Network receive bandwidth
rate(node_network_receive_bytes_total[5m])

# HTTP request rate
rate(http_requests_total[5m])

# Service uptime
up

# 95th percentile response time
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
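
Several of the queries above rely on rate(), which turns an ever-growing counter into a per-second value. The sketch below shows the core idea in simplified form, including the handling of counter resets. It is not the real implementation: Prometheus also extrapolates to the edges of the window, which this version skips. The function name simple_rate is made up for the example.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a window.

    `samples` is a time-ordered list of (timestamp_seconds, value).
    A drop in value is treated as a counter reset. Unlike the real
    rate(), no extrapolation to the window boundaries is done.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new
        # value itself is the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


steady = simple_rate([(0, 100), (15, 130), (30, 160)])
with_reset = simple_rate([(0, 100), (15, 130), (30, 10), (45, 40)])
```

The steady counter grows by 60 in 30 seconds, giving 2.0 per second. In the second series the exporter restarted between samples, and the drop from 130 to 10 is counted as an increase of 10 rather than a negative jump.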

Conclusion

You now have a comprehensive monitoring stack with Prometheus and Grafana on Ubuntu 24.04. This setup provides:

Real-time system and application metrics
Proactive alerting for issues
Beautiful visualization dashboards
Secure access with HTTPS and authentication
Scalable architecture for growth

As your infrastructure grows, you can:

Add more exporters (MySQL, PostgreSQL, Redis, etc.)
Implement service discovery for dynamic environments
Create custom exporters for your applications
Set up federation for multi-site monitoring
Integrate with incident management systems

Remember to regularly review and update your monitoring strategy as your infrastructure evolves. Happy monitoring!

Nathan