Homelab Automation with Ansible: From Zero to Fully Automated in 2026
There is a moment in every homelabber’s journey where SSH-ing into six different machines to run the same three commands stops feeling productive and starts feeling like punishment. You tell yourself “I’ll document the process,” so you write a wiki page. Then the wiki page gets outdated. Then you forget which server has which version of Docker. Then you rebuild a machine and spend four hours recreating the setup from memory because your wiki page assumed you would remember what “configure the thing” meant three months later.
Ansible fixes this. Not by adding another tool to manage, but by turning your infrastructure knowledge into code that is executable, version-controlled, and repeatable. Instead of a wiki page that says “install Docker, configure the daemon, set up the compose stack,” you have a playbook that does it. Instead of remembering which servers need updates, you run one command and all of them update simultaneously.
This guide takes you from zero Ansible knowledge to a fully automated homelab. We will cover installation, inventory setup, your first playbook, Docker deployment automation, system updates, backup automation, monitoring setup, and the patterns that keep your Ansible code maintainable as your homelab grows. Every example is a complete, working playbook you can copy and adapt.
Table of Contents
- TL;DR
- Why Ansible for Homelabs
- Installing Ansible
- Core Concepts: The Five Things You Need to Know
- Setting Up Your Inventory
- Your First Playbook: System Setup
- Automating Docker Installation
- Deploying Docker Compose Stacks with Ansible
- Managing System Updates Across All Servers
- Backup Automation
- User Management and SSH Hardening
- Monitoring and Alerting Setup
- Organizing Your Ansible Project: Roles and Directory Structure
- Secrets Management with Ansible Vault
- Common Mistakes and How to Avoid Them
- Final Thoughts
TL;DR
- Ansible is agentless — it connects to your servers via SSH and runs commands. No software to install on target machines.
- Install Ansible on one control machine (your laptop or a dedicated management server). Use pipx install ansible for the cleanest setup.
- Define your servers in an inventory file. Group them by role (docker hosts, proxmox nodes, NAS, etc.).
- Write playbooks (YAML files) that describe what you want each server to look like. Ansible makes it so.
- Ansible is idempotent — running a playbook twice produces the same result. It only changes what needs changing.
- Start small: automate Docker installation and system updates. Add more as you get comfortable.
- Use Ansible Vault for secrets (passwords, API keys) so you can safely commit your playbooks to Git.
Why Ansible for Homelabs
Ansible vs. Shell Scripts
You might wonder why you need Ansible when Bash scripts work fine. Here is the difference:
| Aspect | Shell Scripts | Ansible |
|---|---|---|
| Idempotency | You build it yourself (if-then-else for every step) | Built in — modules know how to check state first |
| Multi-host | Manual loops with SSH, error handling is painful | Native — runs on multiple hosts in parallel |
| Error handling | Stops at first error (or worse, continues silently) | Structured error handling, retry, rescue blocks |
| OS differences | Different commands for apt vs dnf vs pacman | Modules abstract package managers |
| Readability | Readable only to the person who wrote it | YAML is readable to anyone |
| Secrets | Hardcoded or in env vars | Ansible Vault with encryption |
| Dry run | Hope and prayer | --check mode shows what would change |
| State tracking | None — runs every command every time | Only applies changes when state differs |
A Bash script that installs Docker, adds a user to the docker group, configures the daemon, and deploys a compose stack is 50-80 lines of fragile, OS-specific code with no error handling. The equivalent Ansible playbook is declarative, handles errors, works across distributions, and only makes changes when something is actually different.
Ansible vs. Other Tools
Terraform is for provisioning infrastructure (VMs, cloud resources). Ansible is for configuring what runs on that infrastructure. They complement each other — use Terraform to create VMs on Proxmox, use Ansible to configure them.
Puppet/Chef require agents running on every managed node. Ansible is agentless — it uses SSH. For a homelab with 3-20 machines, the agentless model is simpler.
NixOS is a different paradigm entirely (declarative OS configuration). If you are already running NixOS, you probably do not need Ansible. If you are running Debian/Ubuntu/Fedora, Ansible is the practical choice.
Installing Ansible
Ansible runs on your control node — the machine you manage your homelab from. This is typically your laptop or a dedicated management server. You do not install Ansible on the servers being managed.
The Recommended Way: pipx
# Install pipx if you don't have it
sudo apt install pipx # Debian/Ubuntu
# or
brew install pipx # macOS
# Install Ansible in an isolated environment
pipx install ansible
# Verify installation
ansible --version
Using pipx keeps Ansible and its dependencies isolated from your system Python. This avoids the dependency conflicts that plague pip install ansible.
Alternative: System Package Manager
# Debian/Ubuntu
sudo apt update
sudo apt install ansible
# Fedora
sudo dnf install ansible
# Arch
sudo pacman -S ansible
System packages are often a version or two behind. For a homelab, this rarely matters.
Verify SSH Access
Ansible connects to your servers via SSH. Before writing any playbooks, make sure you can SSH into every server without a password:
# Generate an SSH key if you don't have one
ssh-keygen -t ed25519 -C "ansible@homelab"
# Copy your key to each server
ssh-copy-id user@server1.local
ssh-copy-id user@server2.local
ssh-copy-id user@server3.local
# Test connectivity
ssh user@server1.local "hostname"
If that works, Ansible will work.
Core Concepts: The Five Things You Need to Know
1. Inventory
The inventory is a file that lists your servers and groups them. It tells Ansible what machines exist and how to connect to them.
2. Playbooks
Playbooks are YAML files that describe the desired state of your servers. They contain one or more “plays,” each targeting a group of hosts and specifying tasks to run.
3. Tasks
Tasks are individual actions: install a package, copy a file, start a service, run a command. Each task uses a module that knows how to perform the action idempotently.
4. Modules
Modules are the building blocks. apt installs packages on Debian. copy copies files. docker_compose_v2 manages Docker Compose stacks. There are thousands of modules for every common operation.
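To make this concrete, a single task invoking the apt module looks like this (a minimal standalone sketch; the package name is just an example):

- name: Ensure htop is installed
  apt:
    name: htop
    state: present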
5. Roles
Roles are reusable bundles of tasks, files, templates, and variables. They let you organize your playbooks into logical units (“docker” role, “monitoring” role, “backup” role) that you can apply to different servers.
Setting Up Your Inventory
Create a directory for your Ansible project:
mkdir -p ~/homelab-ansible
cd ~/homelab-ansible
Basic Inventory File
Create inventory/hosts.yml:
all:
children:
docker_hosts:
hosts:
docker01:
ansible_host: 192.168.1.10
docker02:
ansible_host: 192.168.1.11
docker03:
ansible_host: 192.168.1.12
proxmox_nodes:
hosts:
pve01:
ansible_host: 192.168.1.2
pve02:
ansible_host: 192.168.1.3
nas:
hosts:
truenas:
ansible_host: 192.168.1.5
vars:
ansible_user: jerry
ansible_become: true
ansible_become_method: sudo
ansible_python_interpreter: /usr/bin/python3
This inventory defines three groups: docker_hosts, proxmox_nodes, and nas. The vars section under all applies default connection settings to every host.
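Before writing any playbooks against it, ask Ansible to echo the structure back to you. The ansible-inventory command makes typos in group names obvious:

# Show the group/host tree as Ansible parsed it
ansible-inventory --graph
# Dump every host with its resolved variables as JSON
ansible-inventory --list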
Ansible Configuration File
Create ansible.cfg in your project root:
[defaults]
inventory = inventory/hosts.yml
remote_user = jerry
host_key_checking = false
retry_files_enabled = false
stdout_callback = yaml
callbacks_enabled = timer, profile_tasks
[privilege_escalation]
become = true
become_method = sudo
become_ask_pass = false
[ssh_connection]
pipelining = true
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Key settings:
- pipelining = true significantly speeds up Ansible by reducing the number of SSH operations per task.
- ControlPersist=60s keeps SSH connections open for reuse.
- stdout_callback = yaml makes output much more readable.
- profile_tasks shows how long each task takes, useful for optimization.
- host_key_checking = false is a convenience for a trusted home network; it disables SSH host key verification, so drop that line if the tradeoff worries you.
Test Your Inventory
# Ping all hosts
ansible all -m ping
# Ping just docker hosts
ansible docker_hosts -m ping
# Get facts from a specific host
ansible docker01 -m setup
If ping returns “pong” for all hosts, your inventory and SSH access are working.
Your First Playbook: System Setup
Create playbooks/base-setup.yml. This playbook configures every server with your baseline: timezone, locale, essential packages, and basic security settings.
---
- name: Base system setup for all homelab servers
hosts: all
become: true
vars:
timezone: "America/New_York"
base_packages:
- curl
- wget
- git
- htop
- tmux
- vim
- unzip
- jq
- net-tools
- dnsutils
- ncdu
- tree
- rsync
- fail2ban
- ufw
tasks:
- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
when: ansible_os_family == "Debian"
- name: Upgrade all packages
apt:
upgrade: safe
when: ansible_os_family == "Debian"
- name: Install base packages
apt:
name: "{{ base_packages }}"
state: present
when: ansible_os_family == "Debian"
- name: Set timezone
timezone:
name: "{{ timezone }}"
- name: Set hostname
hostname:
name: "{{ inventory_hostname }}"
- name: Ensure /etc/hosts has the hostname
lineinfile:
path: /etc/hosts
regexp: '^127\.0\.1\.1'
line: "127.0.1.1 {{ inventory_hostname }}"
- name: Configure fail2ban
copy:
dest: /etc/fail2ban/jail.local
content: |
[DEFAULT]
bantime = 1h
findtime = 10m
maxretry = 5
backend = systemd
[sshd]
enabled = true
port = ssh
filter = sshd
maxretry = 3
owner: root
group: root
mode: "0644"
notify: Restart fail2ban
- name: Enable and start fail2ban
systemd:
name: fail2ban
state: started
enabled: true
- name: Configure UFW defaults
ufw:
direction: "{{ item.direction }}"
policy: "{{ item.policy }}"
loop:
- { direction: incoming, policy: deny }
- { direction: outgoing, policy: allow }
- name: Allow SSH through UFW
ufw:
rule: allow
port: "22"
proto: tcp
- name: Enable UFW
ufw:
state: enabled
- name: Configure sysctl for security
sysctl:
name: "{{ item.key }}"
value: "{{ item.value }}"
sysctl_file: /etc/sysctl.d/99-homelab.conf
reload: true
loop:
- { key: "net.ipv4.conf.all.rp_filter", value: "1" }
- { key: "net.ipv4.conf.default.rp_filter", value: "1" }
- { key: "net.ipv4.icmp_echo_ignore_broadcasts", value: "1" }
- { key: "net.ipv4.conf.all.accept_redirects", value: "0" }
- { key: "net.ipv4.conf.default.accept_redirects", value: "0" }
- { key: "net.ipv4.conf.all.send_redirects", value: "0" }
- { key: "net.ipv4.conf.default.send_redirects", value: "0" }
- name: Set up automatic security updates
apt:
name:
- unattended-upgrades
- apt-listchanges
state: present
when: ansible_os_family == "Debian"
- name: Enable automatic security updates
copy:
dest: /etc/apt/apt.conf.d/20auto-upgrades
content: |
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";
owner: root
group: root
mode: "0644"
when: ansible_os_family == "Debian"
handlers:
- name: Restart fail2ban
systemd:
name: fail2ban
state: restarted
Run it:
# Dry run first (shows what would change without making changes)
ansible-playbook playbooks/base-setup.yml --check --diff
# Apply for real
ansible-playbook playbooks/base-setup.yml
# Apply to only one host
ansible-playbook playbooks/base-setup.yml --limit docker01
The --check --diff flag is your best friend. Always dry-run playbooks before applying them to production servers.
Automating Docker Installation
Create playbooks/docker-install.yml:
---
- name: Install and configure Docker on all Docker hosts
hosts: docker_hosts
become: true
vars:
docker_users:
- jerry
docker_daemon_config:
log-driver: json-file
log-opts:
max-size: "10m"
max-file: "3"
storage-driver: overlay2
default-address-pools:
- base: "172.17.0.0/12"
size: 24
live-restore: true
userland-proxy: false
tasks:
- name: Remove old Docker packages
apt:
name:
- docker
- docker-engine
- docker.io
- containerd
- runc
state: absent
    - name: Install prerequisites
      apt:
        name:
          - ca-certificates
          - curl
          - gnupg
          - lsb-release
          - python3-docker # Docker SDK for Python, needed by the community.docker network and prune modules
        state: present
        update_cache: true
- name: Create keyrings directory
file:
path: /etc/apt/keyrings
state: directory
mode: "0755"
    - name: Add Docker GPG key
      get_url:
        url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
        dest: /etc/apt/keyrings/docker.asc
        mode: "0644"
    - name: Add Docker repository
      apt_repository:
        repo: >-
          deb [arch={{ ansible_architecture | replace('x86_64', 'amd64') }}
          signed-by=/etc/apt/keyrings/docker.asc]
          https://download.docker.com/linux/{{ ansible_distribution | lower }}
          {{ ansible_distribution_release }} stable
        filename: docker
        state: present
- name: Install Docker Engine
apt:
name:
- docker-ce
- docker-ce-cli
- containerd.io
- docker-buildx-plugin
- docker-compose-plugin
state: present
update_cache: true
- name: Configure Docker daemon
copy:
content: "{{ docker_daemon_config | to_nice_json }}"
dest: /etc/docker/daemon.json
owner: root
group: root
mode: "0644"
notify: Restart Docker
- name: Add users to docker group
user:
name: "{{ item }}"
groups: docker
append: true
loop: "{{ docker_users }}"
    - name: Enable and start Docker
      systemd:
        name: docker
        state: started
        enabled: true
    - name: Enable containerd
      systemd:
        name: containerd
        state: started
        enabled: true
    - name: Create Docker network for reverse proxy
      community.docker.docker_network:
        name: proxy
        driver: bridge
    - name: Create Docker network for monitoring
      community.docker.docker_network:
        name: monitoring
        driver: bridge
    - name: Verify Docker is running
      command: docker info
      changed_when: false
    - name: Get Docker version
      command: docker --version
      register: docker_version
      changed_when: false
    - name: Display Docker version
      debug:
        msg: "{{ docker_version.stdout | trim }} is running on {{ inventory_hostname }}"
    - name: Set up Docker system prune cron job
      cron:
        name: "Docker system prune"
        minute: "0"
        hour: "3"
        weekday: "0"
        job: "docker system prune -af --filter 'until=168h' > /dev/null 2>&1"
        user: root
handlers:
- name: Restart Docker
systemd:
name: docker
state: restarted
This playbook:
- Removes any old Docker installation
- Installs Docker CE from the official repository
- Configures the Docker daemon with sane defaults (log rotation, overlay2 storage, live restore)
- Adds your user to the docker group
- Creates shared Docker networks for reverse proxy and monitoring
- Sets up a weekly cron job to clean up stopped containers and unused images older than a week (volumes are deliberately not pruned automatically, since that can delete data you still want)
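Once the playbook has run, a quick ad-hoc check confirms every host ended up in the same state (Ansible's default module for -a is command, which is fine for read-only checks like these):

ansible docker_hosts -a "docker --version"
ansible docker_hosts -a "docker network ls"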
Deploying Docker Compose Stacks with Ansible
This is where Ansible really shines for homelabs. Instead of SSH-ing into each server to manage Docker Compose stacks, you define them in Ansible and deploy everywhere with one command.
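One prerequisite: the docker_compose_v2 module lives in the community.docker collection. If you installed the full ansible package via pipx, the collection is already bundled; if you installed ansible-core only, add it yourself:

ansible-galaxy collection install community.docker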
Project Structure for Docker Stacks
~/homelab-ansible/
├── ansible.cfg
├── inventory/
│ └── hosts.yml
├── playbooks/
│ ├── base-setup.yml
│ ├── docker-install.yml
│ └── deploy-stacks.yml
├── stacks/
│ ├── traefik/
│ │ ├── docker-compose.yml
│ │ └── traefik.yml
│ ├── monitoring/
│ │ ├── docker-compose.yml
│ │ └── prometheus.yml
│ ├── gitea/
│ │ └── docker-compose.yml
│ └── media/
│ └── docker-compose.yml
└── group_vars/
└── docker_hosts.yml
The Stack Deployment Playbook
Create playbooks/deploy-stacks.yml:
---
- name: Deploy Docker Compose stacks
hosts: docker_hosts
become: true
vars:
stack_base_dir: /opt/stacks
stacks_to_deploy: "{{ docker_stacks | default([]) }}"
tasks:
- name: Create base directory for stacks
file:
path: "{{ stack_base_dir }}"
state: directory
owner: jerry
group: docker
mode: "0755"
- name: Create directory for each stack
file:
path: "{{ stack_base_dir }}/{{ item }}"
state: directory
owner: jerry
group: docker
mode: "0755"
loop: "{{ stacks_to_deploy }}"
- name: Copy Docker Compose files
copy:
src: "../stacks/{{ item }}/"
dest: "{{ stack_base_dir }}/{{ item }}/"
owner: jerry
group: docker
mode: "0644"
loop: "{{ stacks_to_deploy }}"
register: compose_files
- name: Deploy Docker Compose stacks
community.docker.docker_compose_v2:
project_src: "{{ stack_base_dir }}/{{ item.item }}"
state: present
pull: always
remove_orphans: true
loop: "{{ compose_files.results }}"
when: item.changed
register: deploy_result
- name: Show deployment results
debug:
msg: "Stack {{ item.item.item }} deployed successfully"
loop: "{{ deploy_result.results }}"
when: item is not skipped
Host-Specific Stack Assignments
Define which stacks run on which hosts in group_vars/docker_hosts.yml or in host-specific variable files. Every stack name you list must have a matching directory under stacks/ (so create stacks/nextcloud/ before assigning it below).
Create inventory/host_vars/docker01.yml:
docker_stacks:
- traefik
- monitoring
- gitea
Create inventory/host_vars/docker02.yml:
docker_stacks:
- media
- nextcloud
Now run:
# Deploy all stacks to all hosts
ansible-playbook playbooks/deploy-stacks.yml
# Deploy to just one host
ansible-playbook playbooks/deploy-stacks.yml --limit docker01
# Deploy only specific stacks (lists passed with -e must be JSON)
ansible-playbook playbooks/deploy-stacks.yml --limit docker01 -e '{"stacks_to_deploy": ["traefik"]}'
Example: Traefik Reverse Proxy Stack
Create stacks/traefik/docker-compose.yml:
services:
traefik:
image: traefik:v3.2
container_name: traefik
restart: unless-stopped
security_opt:
- no-new-privileges:true
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./traefik.yml:/etc/traefik/traefik.yml:ro
- traefik-certs:/letsencrypt
networks:
- proxy
labels:
- "traefik.enable=true"
- "traefik.http.routers.dashboard.rule=Host(`traefik.example.com`)"
- "traefik.http.routers.dashboard.service=api@internal"
- "traefik.http.routers.dashboard.middlewares=auth"
- "traefik.http.middlewares.auth.basicauth.users=admin:$$apr1$$xyz..."
networks:
proxy:
external: true
volumes:
traefik-certs:
Create stacks/traefik/traefik.yml:
api:
dashboard: true
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
websecure:
address: ":443"
certificatesResolvers:
letsencrypt:
acme:
email: you@example.com
storage: /letsencrypt/acme.json
httpChallenge:
entryPoint: web
providers:
docker:
endpoint: "unix:///var/run/docker.sock"
exposedByDefault: false
network: proxy
Example: Monitoring Stack
Create stacks/monitoring/docker-compose.yml:
services:
prometheus:
image: prom/prometheus:v2.53.0
container_name: prometheus
restart: unless-stopped
user: "1000:1000"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- prometheus-data:/prometheus
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.retention.time=30d"
- "--storage.tsdb.path=/prometheus"
networks:
- monitoring
- proxy
labels:
- "traefik.enable=true"
- "traefik.http.routers.prometheus.rule=Host(`prometheus.example.com`)"
- "traefik.http.routers.prometheus.entrypoints=websecure"
- "traefik.http.routers.prometheus.tls.certresolver=letsencrypt"
grafana:
image: grafana/grafana:11.1.0
container_name: grafana
restart: unless-stopped
user: "1000:1000"
environment:
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
- GF_INSTALL_PLUGINS=grafana-clock-panel
volumes:
- grafana-data:/var/lib/grafana
secrets:
- grafana_admin_password
networks:
- monitoring
- proxy
labels:
- "traefik.enable=true"
- "traefik.http.routers.grafana.rule=Host(`grafana.example.com`)"
- "traefik.http.routers.grafana.entrypoints=websecure"
- "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
node-exporter:
image: prom/node-exporter:v1.8.1
container_name: node-exporter
restart: unless-stopped
pid: host
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
- "--path.rootfs=/rootfs"
- "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
networks:
- monitoring
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.49.1
container_name: cadvisor
restart: unless-stopped
privileged: true
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
networks:
- monitoring
secrets:
grafana_admin_password:
file: ./secrets/grafana_admin_password.txt
networks:
monitoring:
external: true
proxy:
external: true
volumes:
prometheus-data:
grafana-data:
Managing System Updates Across All Servers
Create playbooks/update-systems.yml:
---
- name: Update all homelab servers
hosts: all
become: true
serial: 1 # Update one server at a time to maintain availability
vars:
reboot_if_required: true
tasks:
- name: Update apt cache
apt:
update_cache: true
when: ansible_os_family == "Debian"
- name: Get list of upgradable packages
command: apt list --upgradable
register: upgradable
changed_when: false
when: ansible_os_family == "Debian"
- name: Display upgradable packages
debug:
msg: "{{ upgradable.stdout_lines }}"
when:
- ansible_os_family == "Debian"
- upgradable.stdout_lines | length > 1
- name: Upgrade all packages
apt:
upgrade: safe
autoremove: true
autoclean: true
register: upgrade_result
when: ansible_os_family == "Debian"
- name: Check if reboot is required
stat:
path: /var/run/reboot-required
register: reboot_required
- name: Display reboot status
debug:
msg: "Reboot is {{ 'required' if reboot_required.stat.exists else 'not required' }} on {{ inventory_hostname }}"
- name: Reboot if required
reboot:
msg: "Ansible: Rebooting due to system updates"
connect_timeout: 5
reboot_timeout: 300
pre_reboot_delay: 5
post_reboot_delay: 30
when:
- reboot_if_required
- reboot_required.stat.exists
- name: Wait for server to come back
wait_for_connection:
delay: 10
timeout: 300
when:
- reboot_if_required
- reboot_required.stat.exists
- name: Verify Docker is running after reboot
systemd:
name: docker
state: started
when: "'docker_hosts' in group_names"
- name: Verify all containers are running after reboot
command: docker ps --format "{{ '{{' }}.Names{{ '}}' }}: {{ '{{' }}.Status{{ '}}' }}"
register: docker_status
changed_when: false
when: "'docker_hosts' in group_names"
- name: Display container status
debug:
msg: "{{ docker_status.stdout_lines }}"
when:
- "'docker_hosts' in group_names"
- docker_status.stdout_lines is defined
Key details:
- serial: 1 processes one server at a time. If you have services with redundancy, you will not lose all instances at once during updates.
- The playbook checks whether a reboot is needed and only reboots if necessary.
- After reboot, it verifies Docker is running and shows container status.
Run this weekly or monthly:
# Dry run first
ansible-playbook playbooks/update-systems.yml --check
# Apply updates
ansible-playbook playbooks/update-systems.yml
# Skip reboots (just install updates)
ansible-playbook playbooks/update-systems.yml -e "reboot_if_required=false"
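To make this routine, schedule it from the control node with cron, mirroring the pattern used for backups later in this guide (the schedule and log path here are assumptions; adjust to your environment):

# Edit the control node's crontab
crontab -e
# Add: 0 4 * * 0 cd ~/homelab-ansible && ansible-playbook playbooks/update-systems.yml >> /var/log/ansible-updates.log 2>&1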
Automating Docker Image Updates
Create playbooks/update-containers.yml:
---
- name: Update Docker containers to latest images
hosts: docker_hosts
become: true
vars:
stack_base_dir: /opt/stacks
tasks:
- name: Find all docker-compose files
find:
paths: "{{ stack_base_dir }}"
patterns: "docker-compose.yml,docker-compose.yaml,compose.yml,compose.yaml"
recurse: true
register: compose_files
- name: Pull latest images for each stack
community.docker.docker_compose_v2:
project_src: "{{ item.path | dirname }}"
state: present
pull: always
remove_orphans: true
loop: "{{ compose_files.files }}"
register: update_results
- name: Show update results
debug:
msg: "Stack at {{ item.item.path | dirname | basename }}: {{ 'updated' if item.changed else 'no changes' }}"
loop: "{{ update_results.results }}"
- name: Prune old images
community.docker.docker_prune:
images: true
images_filters:
dangling: false
Backup Automation
Backups are non-negotiable. This playbook implements a comprehensive backup strategy for Docker volumes, configuration files, and databases.
Create playbooks/backup.yml:
---
- name: Backup homelab data
hosts: docker_hosts
become: true
vars:
backup_base_dir: /opt/backups
backup_remote_dir: /mnt/nas/backups
backup_retention_days: 30
stack_base_dir: /opt/stacks
timestamp: "{{ ansible_date_time.date }}-{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}"
tasks:
- name: Create backup directories
file:
path: "{{ item }}"
state: directory
owner: root
group: root
mode: "0700"
loop:
- "{{ backup_base_dir }}"
- "{{ backup_base_dir }}/{{ timestamp }}"
- "{{ backup_base_dir }}/{{ timestamp }}/volumes"
- "{{ backup_base_dir }}/{{ timestamp }}/configs"
- "{{ backup_base_dir }}/{{ timestamp }}/databases"
# Backup Docker Compose configurations
- name: Backup stack configurations
archive:
path: "{{ stack_base_dir }}"
dest: "{{ backup_base_dir }}/{{ timestamp }}/configs/stacks.tar.gz"
format: gz
# Backup Docker volumes
- name: Get list of Docker volumes
command: docker volume ls --format "{{ '{{' }}.Name{{ '}}' }}"
register: docker_volumes
changed_when: false
- name: Backup each Docker volume
shell: |
docker run --rm \
-v {{ item }}:/source:ro \
-v {{ backup_base_dir }}/{{ timestamp }}/volumes:/backup \
alpine tar czf /backup/{{ item }}.tar.gz -C /source .
loop: "{{ docker_volumes.stdout_lines }}"
when: docker_volumes.stdout_lines | length > 0
# Backup PostgreSQL databases
- name: Find running PostgreSQL containers
command: docker ps --filter "ancestor=postgres" --format "{{ '{{' }}.Names{{ '}}' }}"
register: postgres_containers
changed_when: false
- name: Backup PostgreSQL databases
shell: |
docker exec {{ item }} pg_dumpall -U postgres | \
gzip > {{ backup_base_dir }}/{{ timestamp }}/databases/{{ item }}.sql.gz
loop: "{{ postgres_containers.stdout_lines }}"
when: postgres_containers.stdout_lines | length > 0
# Create final archive
- name: Create consolidated backup archive
archive:
path: "{{ backup_base_dir }}/{{ timestamp }}"
dest: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
format: gz
- name: Get backup file size
stat:
path: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
register: backup_stat
- name: Display backup info
debug:
msg: >
Backup completed: {{ inventory_hostname }}-{{ timestamp }}.tar.gz
({{ (backup_stat.stat.size / 1048576) | round(2) }} MB)
# Copy to remote storage
- name: Copy backup to NAS
copy:
src: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
dest: "{{ backup_remote_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
remote_src: true
when: backup_remote_dir is defined
ignore_errors: true
# Cleanup old backups
- name: Remove local backups older than retention period
find:
paths: "{{ backup_base_dir }}"
patterns: "*.tar.gz"
age: "{{ backup_retention_days }}d"
register: old_backups
- name: Delete old backups
file:
path: "{{ item.path }}"
state: absent
loop: "{{ old_backups.files }}"
- name: Clean up temporary backup directory
file:
path: "{{ backup_base_dir }}/{{ timestamp }}"
state: absent
Schedule it from the control node with a cron job, or run it manually:
# Run backup now
ansible-playbook playbooks/backup.yml
# Add to crontab on the control node
crontab -e
# Add: 0 2 * * * cd ~/homelab-ansible && ansible-playbook playbooks/backup.yml >> /var/log/ansible-backup.log 2>&1
User Management and SSH Hardening
Create playbooks/users-ssh.yml:
---
- name: Manage users and harden SSH
hosts: all
become: true
vars:
admin_users:
- name: jerry
ssh_keys:
- "ssh-ed25519 AAAAC3... jerry@workstation"
groups:
- sudo
- docker
ssh_port: 22
ssh_permit_root: false
ssh_password_auth: false
ssh_max_auth_tries: 3
tasks:
- name: Create admin users
user:
name: "{{ item.name }}"
groups: "{{ item.groups }}"
append: true
shell: /bin/bash
create_home: true
loop: "{{ admin_users }}"
- name: Set up authorized keys
authorized_key:
user: "{{ item.0.name }}"
key: "{{ item.1 }}"
state: present
exclusive: false
loop: "{{ admin_users | subelements('ssh_keys') }}"
- name: Configure SSH daemon
lineinfile:
path: /etc/ssh/sshd_config
regexp: "{{ item.regexp }}"
line: "{{ item.line }}"
validate: "sshd -t -f %s"
loop:
- { regexp: '^#?Port ', line: 'Port {{ ssh_port }}' }
- { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin {{ "no" if not ssh_permit_root else "yes" }}' }
- { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication {{ "no" if not ssh_password_auth else "yes" }}' }
- { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries {{ ssh_max_auth_tries }}' }
- { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
- { regexp: '^#?AllowAgentForwarding', line: 'AllowAgentForwarding no' }
- { regexp: '^#?ClientAliveInterval', line: 'ClientAliveInterval 300' }
- { regexp: '^#?ClientAliveCountMax', line: 'ClientAliveCountMax 2' }
notify: Restart SSH
    - name: Ensure SSH is enabled
      systemd:
        name: ssh # the unit is "ssh" on Debian/Ubuntu and "sshd" on RHEL-family systems
        state: started
        enabled: true
  handlers:
    - name: Restart SSH
      systemd:
        name: ssh
        state: restarted
Important: Be careful with SSH configuration changes. Always test with --check first and make sure you have console access (IPMI, Proxmox console) in case you lock yourself out.
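One more SSH note: if you set ssh_port to anything other than 22, Ansible itself must connect on the new port from then on. Set ansible_port in your inventory after the playbook has run (a sketch assuming you picked port 2222):

all:
  vars:
    ansible_port: 2222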
Monitoring and Alerting Setup
Create playbooks/monitoring.yml to deploy node exporters on all servers and configure Prometheus to scrape them:
---
- name: Deploy monitoring agents on all servers
hosts: all
become: true
vars:
node_exporter_version: "1.8.1"
tasks:
- name: Check if node_exporter is already installed
stat:
path: /usr/local/bin/node_exporter
register: ne_binary
- name: Get installed node_exporter version
command: /usr/local/bin/node_exporter --version
register: ne_version
changed_when: false
failed_when: false
when: ne_binary.stat.exists
- name: Download node_exporter
get_url:
url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
dest: /tmp/node_exporter.tar.gz
mode: "0644"
      when: not ne_binary.stat.exists or node_exporter_version not in ((ne_version.stdout | default('')) + (ne_version.stderr | default('')))
- name: Extract node_exporter
unarchive:
src: /tmp/node_exporter.tar.gz
dest: /tmp/
remote_src: true
      when: not ne_binary.stat.exists or node_exporter_version not in ((ne_version.stdout | default('')) + (ne_version.stderr | default('')))
- name: Install node_exporter binary
copy:
src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
dest: /usr/local/bin/node_exporter
owner: root
group: root
mode: "0755"
remote_src: true
notify: Restart node_exporter
      when: not ne_binary.stat.exists or node_exporter_version not in ((ne_version.stdout | default('')) + (ne_version.stderr | default('')))
- name: Create node_exporter user
user:
name: node_exporter
system: true
shell: /bin/false
create_home: false
- name: Create node_exporter systemd service
copy:
dest: /etc/systemd/system/node_exporter.service
content: |
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
Wants=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.systemd \
--collector.processes \
--web.listen-address=:9100
[Install]
WantedBy=multi-user.target
owner: root
group: root
mode: "0644"
notify:
- Reload systemd
- Restart node_exporter
- name: Allow node_exporter port in UFW
ufw:
rule: allow
port: "9100"
proto: tcp
from_ip: 192.168.1.0/24
comment: "Prometheus node_exporter"
- name: Enable and start node_exporter
systemd:
name: node_exporter
state: started
enabled: true
handlers:
- name: Reload systemd
systemd:
daemon_reload: true
- name: Restart node_exporter
systemd:
name: node_exporter
state: restarted
- name: Generate Prometheus configuration
hosts: docker_hosts[0]
become: true
vars:
stack_base_dir: /opt/stacks
tasks:
- name: Generate Prometheus scrape config from inventory
copy:
dest: "{{ stack_base_dir }}/monitoring/prometheus.yml"
content: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "node-exporter"
static_configs:
- targets:
{% for host in groups['all'] %}
- "{{ hostvars[host]['ansible_host'] }}:9100" # {{ host }}
{% endfor %}
- job_name: "cadvisor"
static_configs:
- targets:
{% for host in groups['docker_hosts'] %}
- "{{ hostvars[host]['ansible_host'] }}:8080" # {{ host }}
{% endfor %}
owner: jerry
group: docker
mode: "0644"
notify: Restart Prometheus
handlers:
- name: Restart Prometheus
community.docker.docker_compose_v2:
project_src: "{{ stack_base_dir }}/monitoring"
services:
- prometheus
state: restarted
This playbook installs node_exporter on every server in your inventory and then generates the Prometheus configuration automatically from your inventory. When you add a new server to your inventory, re-running this playbook adds it to monitoring automatically.
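Run it like the others, dry-run first:

# Preview changes
ansible-playbook playbooks/monitoring.yml --check --diff
# Install exporters everywhere and regenerate the Prometheus config
ansible-playbook playbooks/monitoring.yml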
Organizing Your Ansible Project: Roles and Directory Structure
As your playbooks grow, you will want to organize them into roles. Here is the recommended structure:
~/homelab-ansible/
├── ansible.cfg
├── inventory/
│ ├── hosts.yml
│ ├── group_vars/
│ │ ├── all.yml # Variables for all hosts
│ │ ├── docker_hosts.yml # Variables for Docker hosts
│ │ └── proxmox_nodes.yml
│ └── host_vars/
│ ├── docker01.yml
│ ├── docker02.yml
│ └── docker03.yml
├── playbooks/
│ ├── site.yml # Main playbook that imports everything
│ ├── base-setup.yml
│ ├── deploy-stacks.yml
│ ├── update-systems.yml
│ └── backup.yml
├── roles/
│ ├── common/
│ │ ├── tasks/
│ │ │ └── main.yml
│ │ ├── handlers/
│ │ │ └── main.yml
│ │ ├── templates/
│ │ │ └── sshd_config.j2
│ │ └── defaults/
│ │ └── main.yml
│ ├── docker/
│ │ ├── tasks/
│ │ │ └── main.yml
│ │ ├── handlers/
│ │ │ └── main.yml
│ │ └── defaults/
│ │ └── main.yml
│ ├── monitoring/
│ │ ├── tasks/
│ │ │ └── main.yml
│ │ ├── files/
│ │ │ └── node_exporter.service
│ │ └── defaults/
│ │ └── main.yml
│ └── backup/
│ ├── tasks/
│ │ └── main.yml
│ ├── templates/
│ │ └── backup.sh.j2
│ └── defaults/
│ └── main.yml
└── stacks/
├── traefik/
├── monitoring/
└── gitea/
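You do not have to build the role tree by hand. ansible-galaxy generates a standard skeleton for each role (it also creates a few extra directories such as meta/ and tests/ that you can prune if you do not need them):

cd ~/homelab-ansible
ansible-galaxy role init roles/common
ansible-galaxy role init roles/docker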
The Master Playbook
Create playbooks/site.yml that ties everything together:
---
- name: Apply base configuration to all servers
hosts: all
become: true
roles:
- common
- name: Configure Docker hosts
hosts: docker_hosts
become: true
roles:
- docker
- monitoring
- name: Set up backups
  hosts: docker_hosts
  become: true
  roles:
    - role: backup
      tags: [backup]

- name: Deploy application stacks
  import_playbook: deploy-stacks.yml
  tags: [stacks]
Run the entire setup with:
# Configure everything from scratch
ansible-playbook playbooks/site.yml
# Only run specific tags
ansible-playbook playbooks/site.yml --tags "docker,stacks"
# Skip specific tags
ansible-playbook playbooks/site.yml --skip-tags "backup"
Secrets Management with Ansible Vault
Never put passwords, API keys, or tokens in plain text in your playbooks or inventory. Ansible Vault encrypts sensitive data so you can safely commit it to Git.
Creating Encrypted Variables
# Create a new encrypted variable file
ansible-vault create inventory/group_vars/docker_hosts/vault.yml
# Edit an existing encrypted file
ansible-vault edit inventory/group_vars/docker_hosts/vault.yml
# Encrypt an existing file
ansible-vault encrypt secrets.yml
# View encrypted file contents
ansible-vault view inventory/group_vars/docker_hosts/vault.yml
Example: Encrypted Variables File
# inventory/group_vars/docker_hosts/vault.yml (encrypted)
vault_grafana_admin_password: "SuperSecretPassword123"
vault_gitea_db_password: "AnotherSecretPassword456"
vault_traefik_dashboard_password: "YetAnotherPassword789"
vault_backup_encryption_key: "EncryptionKeyForBackups"
Reference these in your regular variables:
# inventory/group_vars/docker_hosts/vars.yml (not encrypted)
grafana_admin_password: "{{ vault_grafana_admin_password }}"
gitea_db_password: "{{ vault_gitea_db_password }}"
Running Playbooks with Vault
# Prompt for vault password
ansible-playbook playbooks/site.yml --ask-vault-pass
# Use a password file (better for automation)
echo "your-vault-password" > ~/.vault_pass
chmod 600 ~/.vault_pass
ansible-playbook playbooks/site.yml --vault-password-file ~/.vault_pass
# Or set it in ansible.cfg
# [defaults]
# vault_password_file = ~/.vault_pass
Important: Add .vault_pass and any password files to your .gitignore.
Writing Secrets to Disk Securely
When deploying Docker stacks that need secrets as files:
- name: Write Grafana admin password
copy:
content: "{{ vault_grafana_admin_password }}"
dest: "{{ stack_base_dir }}/monitoring/secrets/grafana_admin_password.txt"
owner: jerry
group: docker
mode: "0600"
no_log: true # Prevents the password from appearing in Ansible output
The no_log: true directive is critical — without it, Ansible logs the full content of the task, including the decrypted password.
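One companion task worth adding just before the copy above: make sure the secrets directory exists with tight permissions, so the file never lands in a world-readable location (same paths and ownership as the example above):

- name: Ensure the secrets directory exists
  file:
    path: "{{ stack_base_dir }}/monitoring/secrets"
    state: directory
    owner: jerry
    group: docker
    mode: "0700"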
Common Mistakes and How to Avoid Them
Mistake 1: Not Using --check First
Always dry-run before applying:
ansible-playbook playbooks/site.yml --check --diff
The --diff flag shows exactly what would change in files. This catches misconfigurations before they hit production.
Mistake 2: Using shell/command When a Module Exists
Bad:
- name: Install nginx
command: apt-get install -y nginx
Good:
- name: Install nginx
apt:
name: nginx
state: present
Modules are idempotent by design. The apt module checks if nginx is already installed and skips the task if so. The command module runs every time regardless.
Mistake 3: Not Using Handlers
Bad:
- name: Update nginx config
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
- name: Restart nginx
systemd:
name: nginx
state: restarted
This restarts nginx every time the playbook runs, even if the config did not change.
Good:
- name: Update nginx config
copy:
src: nginx.conf
dest: /etc/nginx/nginx.conf
notify: Restart nginx
handlers:
- name: Restart nginx
systemd:
name: nginx
state: restarted
Handlers only run when notified by a changed task. No config change means no restart.
Mistake 4: Hardcoding Values Instead of Using Variables
Bad:
- name: Create user
user:
name: jerry
groups: sudo,docker
Good:
# In group_vars/all.yml
admin_user: jerry
admin_groups:
- sudo
- docker
# In tasks
- name: Create user
user:
name: "{{ admin_user }}"
groups: "{{ admin_groups }}"
append: true
Variables make your playbooks reusable and configurable without editing the tasks themselves.
Mistake 5: Ignoring Ansible-lint
Install and run ansible-lint to catch common issues:
pipx install ansible-lint
# Lint your playbooks
ansible-lint playbooks/
It catches things like deprecated modules, missing name fields, command tasks that should use modules, and YAML formatting issues. Set it up in your editor for real-time feedback.
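You can also tune ansible-lint per project with a .ansible-lint file in the repository root. A minimal sketch (the entries here are illustrative, not recommendations):

# .ansible-lint
exclude_paths:
  - stacks/ # plain Docker Compose files, not Ansible content
skip_list:
  - yaml[line-length]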
Final Thoughts
Ansible transforms your homelab from a collection of manually-configured machines into reproducible, documented infrastructure. The investment pays off the first time you need to rebuild a server — instead of spending a weekend recreating your setup from memory, you run one command and go make coffee.
Start small. Pick one pain point — maybe it is installing Docker on new machines, or running updates across all servers — and automate it. Once you see a playbook do in 30 seconds what used to take you 20 minutes of SSH sessions, you will want to automate everything else.
The playbooks in this guide are complete and working. Copy them, adapt the variables to your environment, and build from there. As your homelab grows, your Ansible repository grows with it, and every new server you add is configured correctly from the first boot.
The best infrastructure is the kind you do not have to think about. Ansible gets you there.