Complete Guide to Prometheus and Grafana Monitoring Stack on Ubuntu 24.04
Introduction
Monitoring system performance and availability is essential for keeping infrastructure reliable and catching problems before they cause downtime. Prometheus and Grafana form an open-source monitoring stack: Prometheus collects and stores metrics and evaluates alert rules, and Grafana visualizes the results in dashboards.
This comprehensive guide will walk you through setting up a complete monitoring solution on Ubuntu 24.04, including:
- Prometheus server for metrics collection
- Node Exporter for system metrics
- Blackbox Exporter for endpoint monitoring
- Alertmanager for notifications
- Grafana for visualization
- Security with HTTPS and authentication
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It features:
- Multi-dimensional data model with time series data
- Flexible query language (PromQL)
- Pull-based metrics collection
- Built-in alerting rules
- Service discovery capabilities
- Extensive ecosystem of exporters
Prerequisites
Before starting, ensure you have:
- Ubuntu 24.04 LTS server with sudo privileges
- Minimum 2GB RAM (4GB recommended)
- 20GB available disk space
- Basic understanding of Linux commands
- Network connectivity for package installation
1. Installing Prometheus and Node Exporter
Step 1: Update System Packages
sudo apt update && sudo apt upgrade -y
Step 2: Install Prometheus Components
# Install Prometheus server and Node Exporter
sudo apt -y install prometheus prometheus-node-exporter
# Enable services to start on boot
sudo systemctl enable prometheus prometheus-node-exporter
# Start services
sudo systemctl start prometheus prometheus-node-exporter
Step 3: Verify Installation
# Check service status
sudo systemctl status prometheus
sudo systemctl status prometheus-node-exporter
# Check listening ports
sudo ss -tlnp | grep -E '9090|9100'
Prometheus runs on port 9090, Node Exporter on port 9100.
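A listening port does not prove that metrics are being served, so it can be worth pulling one value from the exporter. The helper below is a minimal sketch: `metric_value` is a hypothetical name, and it only handles samples that carry no labels.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose name is exactly NAME (label-free samples only).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Typical use on the monitoring host (not executed here):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

If the command prints a number, Node Exporter is up and Prometheus should be able to scrape it.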
2. Understanding Prometheus Configuration
The main configuration file is located at /etc/prometheus/prometheus.yml:
# Sample config for Prometheus.
global:
  scrape_interval: 15s # How often to scrape targets
  evaluation_interval: 15s # How often to evaluate rules
  # scrape_timeout is set to the global default (10s)
  external_labels:
    monitor: 'example'
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
# Load rules once and periodically evaluate them
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:9090']
  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
3. Securing Prometheus with HTTPS and Authentication
Step 1: Generate SSL Certificate
For production, use a proper SSL certificate. For testing, create a self-signed certificate:
# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/prometheus/server.key \
-out /etc/prometheus/server.crt \
-subj "/C=US/ST=State/L=City/O=Organization/CN=prometheus.local"
# Set proper permissions
sudo chown prometheus:prometheus /etc/prometheus/server.{crt,key}
sudo chmod 600 /etc/prometheus/server.key
Step 2: Configure Authentication
# Install Apache utilities for password generation
sudo apt -y install apache2-utils
# Generate bcrypt password hash
htpasswd -nB admin
# Enter password when prompted
# Copy the generated hash
Step 3: Create Web Configuration
sudo nano /etc/prometheus/web.yml
Add the following content:
# TLS configuration
tls_server_config:
  cert_file: server.crt
  key_file: server.key
# Basic authentication
basic_auth_users:
  admin: $2y$05$YOURGENERATEDHASHHERE
Step 4: Update Prometheus Service
sudo nano /etc/default/prometheus
Add the web config parameter:
ARGS="--web.config.file=/etc/prometheus/web.yml"
Step 5: Update Prometheus Configuration
sudo nano /etc/prometheus/prometheus.yml
Update the Prometheus job to use HTTPS:
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    scheme: https
    tls_config:
      # cert_file/key_file are only needed for client-certificate
      # authentication, which the web.yml above does not require
      insecure_skip_verify: true # For self-signed certificates (testing only)
    basic_auth:
      username: 'admin'
      password: 'your_password_here'
    static_configs:
      - targets: ['localhost:9090']
Restart Prometheus:
sudo systemctl restart prometheus
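A syntax error in either file keeps Prometheus from starting, so it can help to validate both before the restart. The sketch below assumes the file paths used in this guide and that `promtool` (normally installed alongside Prometheus) is on the PATH; `restart_if_valid` is a hypothetical helper name.

```shell
# restart_if_valid
# Restarts Prometheus only when the web config and the main config both pass
# promtool's checks; stops at the first failing check.
restart_if_valid() {
  promtool check web-config /etc/prometheus/web.yml &&
    promtool check config /etc/prometheus/prometheus.yml &&
    sudo systemctl restart prometheus
}

# Usage (not executed here):
#   restart_if_valid
```

Because the commands are chained with `&&`, a failed check leaves the running service untouched.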
4. Adding Monitoring Targets
Step 1: Install Node Exporter on Target Nodes
On each node you want to monitor:
# On target node (e.g., node01.example.com)
sudo apt -y install prometheus-node-exporter
sudo systemctl enable --now prometheus-node-exporter
Step 2: Configure Prometheus to Scrape New Targets
sudo nano /etc/prometheus/prometheus.yml
Add targets to the node job:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100', 'node01.example.com:9100', 'node02.example.com:9100']
  # Or create separate job groups
  - job_name: 'webservers'
    static_configs:
      - targets: ['web01.example.com:9100', 'web02.example.com:9100']
  - job_name: 'databases'
    static_configs:
      - targets: ['db01.example.com:9100', 'db02.example.com:9100']
Restart Prometheus:
sudo systemctl restart prometheus
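Before or after adding targets, it can save time to confirm from the Prometheus server that each exporter answers. This is a small sketch rather than a required step; `check_targets` is a hypothetical helper, and the host names are the placeholder names used above.

```shell
# check_targets HOST:PORT...
# Requests /metrics from every target and reports whether it answered
# within five seconds.
check_targets() {
  for target in "$@"; do
    if curl -fsS --max-time 5 -o /dev/null "http://${target}/metrics"; then
      echo "${target} ok"
    else
      echo "${target} unreachable"
    fi
  done
}

# Usage (not executed here):
#   check_targets node01.example.com:9100 node02.example.com:9100
```

A target reported as unreachable here would also show as down on the Prometheus targets page, usually because of a firewall rule or a stopped exporter.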
5. Setting Up Alertmanager
Step 1: Install Alertmanager
sudo apt -y install prometheus-alertmanager
sudo systemctl enable prometheus-alertmanager
Step 2: Configure Email Notifications
# Backup original configuration
sudo mv /etc/prometheus/alertmanager.yml /etc/prometheus/alertmanager.yml.bak
# Create new configuration
sudo nano /etc/prometheus/alertmanager.yml
Add email notification configuration:
global:
  # SMTP configuration
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@yourdomain.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_require_tls: true
route:
  receiver: 'email-notifications'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@yourdomain.com'
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
Step 3: Create Alert Rules
sudo nano /etc/prometheus/alert_rules.yml
Add monitoring rules:
groups:
  - name: system_alerts
    rules:
      # Instance down alert
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      # High CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% (current value: {{ $value }}%)"
      # Disk space low
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space on root partition is below 15% (current value: {{ $value }}%)"
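The HighMemoryUsage expression is plain arithmetic on two gauges, so it can be checked by hand. The sketch below reproduces the calculation for one pair of sample values; `mem_used_pct` is a hypothetical helper and the byte counts are made-up inputs, not measurements.

```shell
# mem_used_pct AVAILABLE_BYTES TOTAL_BYTES
# Mirrors (1 - MemAvailable / MemTotal) * 100 from the alert rule.
mem_used_pct() {
  awk -v avail="$1" -v total="$2" \
    'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# 1 GiB available out of 8 GiB total
mem_used_pct 1073741824 8589934592   # prints 87.5
```

A value of 87.5 is above the 85 threshold, so the rule would enter the pending state and fire only if the condition held for the full 10 minutes set by `for`.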
Step 4: Update Prometheus Configuration
sudo nano /etc/prometheus/prometheus.yml
Add the alert rules file:
rule_files:
  - "alert_rules.yml"
Restart services:
sudo systemctl restart prometheus prometheus-alertmanager
6. Installing and Configuring Grafana
Step 1: Install Grafana
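# Install required packages
sudo apt-get install -y apt-transport-https software-properties-common wget
# Add the Grafana GPG key to a dedicated keyring (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
# Add the Grafana repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list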
# Update and install Grafana
sudo apt update
sudo apt install -y grafana
# Enable and start Grafana
sudo systemctl enable --now grafana-server
Step 2: Configure Grafana
Grafana runs on port 3000. Access it at http://your-server-ip:3000
Default credentials:
- Username: admin
- Password: admin (you'll be prompted to change it)
Step 3: Add Prometheus Data Source
- Log into Grafana
- Navigate to Connections → Data sources (Configuration → Data Sources in older Grafana versions)
- Click "Add data source"
- Select "Prometheus"
- Configure:
- Name: Prometheus
- URL: https://localhost:9090 (if using HTTPS)
- Auth: Toggle on "Basic auth"
- User: admin
- Password: your-prometheus-password
- TLS Settings: Toggle "Skip TLS Verify" for self-signed certificates
- Click "Save & Test"
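The same data source can also be defined in a provisioning file, which makes the setup repeatable across servers. The sketch below writes such a file; `write_datasource` is a hypothetical helper, the values mirror the UI settings listed above, and the target directory in the usage comment assumes the default layout of the Grafana package.

```shell
# write_datasource DIR
# Writes a Grafana data source provisioning file for the secured Prometheus
# instance into DIR. The password is a placeholder to replace.
write_datasource() {
  cat > "$1/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: https://localhost:9090
    basicAuth: true
    basicAuthUser: admin
    jsonData:
      tlsSkipVerify: true
    secureJsonData:
      basicAuthPassword: your-prometheus-password
EOF
}

# Usage as root on the Grafana host (not executed here):
#   write_datasource /etc/grafana/provisioning/datasources
#   systemctl restart grafana-server
```

Since the file contains a password, it should be readable only by root and the grafana user.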
Step 4: Import Dashboards
Import pre-built dashboards:
- Go to Dashboards → Import
- Popular dashboard IDs:
- Node Exporter Full: 1860
- Node Exporter Server Metrics: 405
- Prometheus Stats: 2
- Enter the ID and click "Load"
- Select your Prometheus data source
- Click "Import"
7. Blackbox Exporter for Endpoint Monitoring
Step 1: Install Blackbox Exporter
# On monitoring node
sudo apt -y install prometheus-blackbox-exporter
sudo systemctl enable --now prometheus-blackbox-exporter
Step 2: Configure Blackbox Exporter
The default configuration at /etc/prometheus/blackbox.yml includes modules for:
- HTTP/HTTPS monitoring
- TCP port checks
- ICMP ping
- DNS queries
Step 3: Add Blackbox Jobs to Prometheus
sudo nano /etc/prometheus/prometheus.yml
Add monitoring jobs:
scrape_configs:
  # HTTP/HTTPS monitoring
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://app.example.com
          - http://internal-app.local
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
  # TCP port monitoring
  - job_name: 'blackbox_tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - db.example.com:3306 # MySQL
          - cache.example.com:6379 # Redis
          - mq.example.com:5672 # RabbitMQ
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
  # ICMP ping monitoring
  - job_name: 'blackbox_icmp'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - gateway.example.com
          - dns1.example.com
          - critical-server.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
Restart Prometheus:
sudo systemctl restart prometheus
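The relabeling rules in these jobs turn each listed target into a `target` query parameter and send the actual scrape to the exporter on localhost:9115. The same request can be built by hand to test a probe; the sketch below assumes the exporter address used above, and `probe_url` is a hypothetical helper that does not URL-encode its arguments.

```shell
# probe_url MODULE TARGET
# Prints the URL that Prometheus effectively requests for one probe.
probe_url() {
  printf 'http://localhost:9115/probe?module=%s&target=%s\n' "$1" "$2"
}

# Usage on the monitoring node (not executed here):
#   curl -s "$(probe_url http_2xx https://example.com)" | grep '^probe_success'
```

A `probe_success 1` line in the response means the check passed; `0` means the exporter reached a verdict of failure for that target.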
8. Best Practices and Optimization
Storage Optimization
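Retention is not a prometheus.yml setting; it is controlled by a command-line flag (the default is 15 days). Edit the service defaults:
sudo nano /etc/default/prometheus
Append the retention flag to ARGS, keeping any flags already set there:
ARGS="--web.config.file=/etc/prometheus/web.yml --storage.tsdb.retention.time=30d" # Keep data for 30 days
Restart Prometheus to apply the change:
sudo systemctl restart prometheus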
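Retention translates directly into disk usage. The Prometheus documentation gives a rough estimate of retention seconds times ingested samples per second times bytes per sample, with about 1 to 2 bytes per sample on average. The sketch below applies that formula; `tsdb_bytes` is a hypothetical helper and the ingestion rate is an example figure, not a measurement.

```shell
# tsdb_bytes DAYS SAMPLES_PER_SECOND BYTES_PER_SAMPLE
# Rough disk estimate: retention seconds * ingestion rate * bytes per sample.
tsdb_bytes() {
  echo $(( $1 * 86400 * $2 * $3 ))
}

# 30 days at 1000 samples/s, assuming 2 bytes per sample
tsdb_bytes 30 1000 2   # prints 5184000000 (about 4.8 GiB)
```

The real ingestion rate of a running server can be read with the query `rate(prometheus_tsdb_head_samples_appended_total[5m])`.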
Performance Tuning
- Optimize scrape intervals: 15s to 60s suits most targets; shorter intervals multiply storage and CPU cost
- Use recording rules: Pre-compute expensive queries
- Limit cardinality: Avoid labels with many unique values
- Configure storage: Use SSD for better performance
Security Best Practices
- Use proper SSL certificates: Get certificates from Let's Encrypt
- Implement firewall rules:
sudo ufw allow 9090/tcp # Prometheus
sudo ufw allow 9093/tcp # Alertmanager
sudo ufw allow 3000/tcp # Grafana
sudo ufw allow 9100/tcp # Node Exporter
- Regular updates: Keep all components updated
- Secure passwords: Use strong, unique passwords
- Network isolation: Use VLANs or VPCs for monitoring infrastructure
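Opening a port to every source is broader than most setups need; exporter ports in particular only have to be reachable from the Prometheus server. The sketch below prints narrower ufw rules for review instead of applying them. `allow_from` is a hypothetical helper, and 192.0.2.10 is a placeholder for the Prometheus server's address.

```shell
# allow_from SOURCE_IP PORT...
# Prints one ufw rule per port that admits TCP traffic from SOURCE_IP only.
allow_from() {
  source_ip="$1"
  shift
  for port in "$@"; do
    echo "sudo ufw allow from ${source_ip} to any port ${port} proto tcp"
  done
}

# Rule for Node Exporter on a monitored host
allow_from 192.0.2.10 9100
```

Once the printed rules look right, they can be run as-is or piped to `sh`.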
9. Troubleshooting Common Issues
Prometheus Not Starting
# Check logs
sudo journalctl -u prometheus -f
# Validate configuration
promtool check config /etc/prometheus/prometheus.yml
# Check permissions
ls -la /etc/prometheus/
Targets Showing as Down
- Check firewall rules on target nodes
- Verify exporter is running:
systemctl status prometheus-node-exporter
- Test connectivity:
curl http://target:9100/metrics
- Check Prometheus logs for scrape errors
Alertmanager Not Sending Emails
- Verify SMTP settings
- Check Alertmanager logs:
journalctl -u prometheus-alertmanager -f
- Validate the configuration with amtool:
amtool check-config /etc/prometheus/alertmanager.yml
- Ensure firewall allows outbound SMTP
Grafana Connection Issues
- Verify Prometheus URL in data source
- Check authentication credentials
- Test from Grafana server (-k skips verification of the self-signed certificate):
curl -k -u admin:password https://localhost:9090/api/v1/labels
- Check certificate configuration
10. Creating Custom Dashboards
Essential Grafana Panels
- System Overview:
- CPU usage gauge
- Memory usage gauge
- Disk usage table
- Network traffic graph
- Service Health:
- Up/Down status table
- Response time graph
- Error rate chart
- Request rate graph
- Alert Overview:
- Active alerts table
- Alert history graph
- Alert frequency by severity
Example PromQL Queries
# CPU usage percentage per instance
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk usage percentage
100 - ((node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"})
# Network receive bandwidth
rate(node_network_receive_bytes_total[5m])
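# HTTP request rate (requires an application that exports http_requests_total)
rate(http_requests_total[5m])
# Target availability (1 = up, 0 = down)
up
# 95th percentile response time across all series
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))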
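The CPU query can look opaque at first: `rate()` turns the ever-growing idle-seconds counter into the fraction of each second spent idle, and subtracting that from 100 gives the busy percentage. The sketch below works through the arithmetic for one CPU with made-up counter values; `cpu_busy_pct` is a hypothetical helper, and real `rate()` results differ slightly because Prometheus extrapolates to the window edges.

```shell
# cpu_busy_pct IDLE_START IDLE_END WINDOW_SECONDS
# Idle fraction = counter increase / window length; busy % = 100 - idle %.
cpu_busy_pct() {
  awk -v c0="$1" -v c1="$2" -v secs="$3" \
    'BEGIN { printf "%.1f\n", 100 - ((c1 - c0) / secs) * 100 }'
}

# The idle counter grew by 240 s during a 300 s (5m) window
cpu_busy_pct 1000 1240 300   # prints 20.0
```

With several cores, the query averages this per-core idle fraction across the cores of each instance before converting it to a percentage.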
Conclusion
You now have a comprehensive monitoring stack with Prometheus and Grafana on Ubuntu 24.04. This setup provides:
- Real-time system and application metrics
- Proactive alerting for issues
- Beautiful visualization dashboards
- Secure access with HTTPS and authentication
- Scalable architecture for growth
As your infrastructure grows, you can:
- Add more exporters (MySQL, PostgreSQL, Redis, etc.)
- Implement service discovery for dynamic environments
- Create custom exporters for your applications
- Set up federation for multi-site monitoring
- Integrate with incident management systems
Remember to regularly review and update your monitoring strategy as your infrastructure evolves. Happy monitoring!