Homelab Automation with Ansible: From Zero to Fully Automated in 2026

ansible homelab, homelab automation, ansible docker, ansible tutorial 2026, self-hosting

There is a moment in every homelabber’s journey where SSH-ing into six different machines to run the same three commands stops feeling productive and starts feeling like punishment. You tell yourself “I’ll document the process,” so you write a wiki page. Then the wiki page gets outdated. Then you forget which server has which version of Docker. Then you rebuild a machine and spend four hours recreating the setup from memory because your wiki page assumed you would remember what “configure the thing” meant three months later.

Ansible fixes this. Not by adding another tool to manage, but by turning your infrastructure knowledge into code that is executable, version-controlled, and repeatable. Instead of a wiki page that says “install Docker, configure the daemon, set up the compose stack,” you have a playbook that does it. Instead of remembering which servers need updates, you run one command and all of them update simultaneously.

This guide takes you from zero Ansible knowledge to a fully automated homelab. We will cover installation, inventory setup, your first playbook, Docker deployment automation, system updates, backup automation, monitoring setup, and the patterns that keep your Ansible code maintainable as your homelab grows. Every example is a complete, working playbook you can copy and adapt.

TL;DR

  • Ansible is agentless — it connects to your servers via SSH and runs commands. No software to install on target machines.
  • Install Ansible on one control machine (your laptop or a dedicated management server). Use pipx install ansible for the cleanest setup.
  • Define your servers in an inventory file. Group them by role (docker hosts, proxmox nodes, NAS, etc.).
  • Write playbooks (YAML files) that describe what you want each server to look like. Ansible makes it so.
  • Ansible is idempotent — running a playbook twice produces the same result. It only changes what needs changing.
  • Start small: automate Docker installation and system updates. Add more as you get comfortable.
  • Use Ansible Vault for secrets (passwords, API keys) so you can safely commit your playbooks to Git.

Why Ansible for Homelabs

Ansible vs. Shell Scripts

You might wonder why you need Ansible when Bash scripts work fine. Here is the difference:

Aspect          | Shell Scripts                                        | Ansible
----------------|------------------------------------------------------|--------------------------------------------------
Idempotency     | You build it yourself (if-then-else for every step)  | Built in — modules know how to check state first
Multi-host      | Manual loops with SSH, error handling is painful     | Native — runs on multiple hosts in parallel
Error handling  | Stops at first error (or worse, continues silently)  | Structured error handling, retry, rescue blocks
OS differences  | Different commands for apt vs dnf vs pacman          | Modules abstract package managers
Readability     | Readable only to the person who wrote it             | YAML is readable to anyone
Secrets         | Hardcoded or in env vars                             | Ansible Vault with encryption
Dry run         | Hope and prayer                                      | --check mode shows what would change
State tracking  | None — runs every command every time                 | Only applies changes when state differs

A Bash script that installs Docker, adds a user to the docker group, configures the daemon, and deploys a compose stack is 50-80 lines of fragile, OS-specific code with no error handling. The equivalent Ansible playbook is declarative, handles errors, works across distributions, and only makes changes when something is actually different.
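
To make the contrast concrete, here is a minimal sketch of the declarative style. It installs the distribution's docker.io package for brevity; the full playbook later in this guide uses the official Docker repository instead:

- name: Ensure Docker is installed and running
  hosts: docker_hosts
  become: true
  tasks:
    - name: Install Docker from the distro repository
      apt:
        name: docker.io
        state: present

    - name: Ensure the Docker service is enabled and running
      systemd:
        name: docker
        state: started
        enabled: true

Run it twice: the first run installs and starts Docker, the second reports no changes because the state already matches.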

Ansible vs. Other Tools

Terraform is for provisioning infrastructure (VMs, cloud resources). Ansible is for configuring what runs on that infrastructure. They complement each other — use Terraform to create VMs on Proxmox, use Ansible to configure them.

Puppet/Chef require agents running on every managed node. Ansible is agentless — it uses SSH. For a homelab with 3-20 machines, the agentless model is simpler.

NixOS is a different paradigm entirely (declarative OS configuration). If you are already running NixOS, you probably do not need Ansible. If you are running Debian/Ubuntu/Fedora, Ansible is the practical choice.

Installing Ansible

Ansible runs on your control node — the machine you manage your homelab from. This is typically your laptop or a dedicated management server. You do not install Ansible on the servers being managed.

# Install pipx if you don't have it
sudo apt install pipx  # Debian/Ubuntu
# or
brew install pipx       # macOS

# Install Ansible in an isolated environment
pipx install ansible

# Verify installation
ansible --version

Using pipx keeps Ansible and its dependencies isolated from your system Python. This avoids the dependency conflicts that plague pip install ansible.

Alternative: System Package Manager

# Debian/Ubuntu
sudo apt update
sudo apt install ansible

# Fedora
sudo dnf install ansible

# Arch
sudo pacman -S ansible

System packages are often a version or two behind. For a homelab, this rarely matters.

Verify SSH Access

Ansible connects to your servers via SSH. Before writing any playbooks, make sure you can SSH into every server without a password:

# Generate an SSH key if you don't have one
ssh-keygen -t ed25519 -C "ansible@homelab"

# Copy your key to each server
ssh-copy-id user@server1.local
ssh-copy-id user@server2.local
ssh-copy-id user@server3.local

# Test connectivity
ssh user@server1.local "hostname"

If that works, Ansible will work.
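
Optionally, per-host entries in ~/.ssh/config keep names short and pin the key to use. Ansible connects through OpenSSH by default, so it honors this file (hostnames and addresses below are illustrative):

Host docker01
    HostName 192.168.1.10
    User jerry
    IdentityFile ~/.ssh/id_ed25519

Host docker02
    HostName 192.168.1.11
    User jerry
    IdentityFile ~/.ssh/id_ed25519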

Core Concepts: The Five Things You Need to Know

1. Inventory

The inventory is a file that lists your servers and groups them. It tells Ansible what machines exist and how to connect to them.

2. Playbooks

Playbooks are YAML files that describe the desired state of your servers. They contain one or more “plays,” each targeting a group of hosts and specifying tasks to run.

3. Tasks

Tasks are individual actions: install a package, copy a file, start a service, run a command. Each task uses a module that knows how to perform the action idempotently.

4. Modules

Modules are the building blocks. apt installs packages on Debian. copy copies files. docker_compose_v2 manages Docker Compose stacks. There are thousands of modules for every common operation.

5. Roles

Roles are reusable bundles of tasks, files, templates, and variables. They let you organize your playbooks into logical units (“docker” role, “monitoring” role, “backup” role) that you can apply to different servers.
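
When you are unsure which module to use or what options it takes, ansible-doc has you covered:

# Full documentation and examples for a module
ansible-doc apt

# List available modules matching a keyword
ansible-doc -l | grep -i docker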

Setting Up Your Inventory

Create a directory for your Ansible project:

mkdir -p ~/homelab-ansible
cd ~/homelab-ansible

Basic Inventory File

Create inventory/hosts.yml:

all:
  children:
    docker_hosts:
      hosts:
        docker01:
          ansible_host: 192.168.1.10
        docker02:
          ansible_host: 192.168.1.11
        docker03:
          ansible_host: 192.168.1.12

    proxmox_nodes:
      hosts:
        pve01:
          ansible_host: 192.168.1.2
        pve02:
          ansible_host: 192.168.1.3

    nas:
      hosts:
        truenas:
          ansible_host: 192.168.1.5

  vars:
    ansible_user: jerry
    ansible_become: true
    ansible_become_method: sudo
    ansible_python_interpreter: /usr/bin/python3

This inventory defines three groups: docker_hosts, proxmox_nodes, and nas. The vars section under all applies default connection settings to every host.
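
Before writing any playbooks against it, you can ask Ansible how it parsed the file. This only checks inventory structure; connectivity is tested in the next section:

# Show the group/host tree
ansible-inventory --graph

# Dump the full inventory, including variables, as JSON
ansible-inventory --list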

Ansible Configuration File

Create ansible.cfg in your project root:

[defaults]
inventory = inventory/hosts.yml
remote_user = jerry
host_key_checking = false
retry_files_enabled = false
stdout_callback = yaml
callbacks_enabled = timer, profile_tasks

[privilege_escalation]
become = true
become_method = sudo
become_ask_pass = false

[ssh_connection]
pipelining = true
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

Key settings:

  • pipelining = true significantly speeds up Ansible by reducing the number of SSH connections.
  • ControlPersist=60s keeps SSH connections open for reuse.
  • stdout_callback = yaml makes output much more readable.
  • profile_tasks shows how long each task takes, useful for optimization.
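
To confirm the config file is actually being picked up, print only the settings that differ from Ansible's defaults:

ansible-config dump --only-changed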

Test Your Inventory

# Ping all hosts
ansible all -m ping

# Ping just docker hosts
ansible docker_hosts -m ping

# Get facts from a specific host
ansible docker01 -m setup

If ping returns “pong” for all hosts, your inventory and SSH access are working.

Your First Playbook: System Setup

Create playbooks/base-setup.yml. This playbook configures every server with your baseline: timezone, locale, essential packages, and basic security settings.

---
- name: Base system setup for all homelab servers
  hosts: all
  become: true

  vars:
    timezone: "America/New_York"
    base_packages:
      - curl
      - wget
      - git
      - htop
      - tmux
      - vim
      - unzip
      - jq
      - net-tools
      - dnsutils
      - ncdu
      - tree
      - rsync
      - fail2ban
      - ufw

  tasks:
    - name: Update apt cache
      apt:
        update_cache: true
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Upgrade all packages
      apt:
        upgrade: safe
      when: ansible_os_family == "Debian"

    - name: Install base packages
      apt:
        name: "{{ base_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Ensure /etc/hosts has the hostname
      lineinfile:
        path: /etc/hosts
        regexp: '^127\.0\.1\.1'
        line: "127.0.1.1 {{ inventory_hostname }}"

    - name: Configure fail2ban
      copy:
        dest: /etc/fail2ban/jail.local
        content: |
          [DEFAULT]
          bantime = 1h
          findtime = 10m
          maxretry = 5
          backend = systemd

          [sshd]
          enabled = true
          port = ssh
          filter = sshd
          maxretry = 3
        owner: root
        group: root
        mode: "0644"
      notify: Restart fail2ban

    - name: Enable and start fail2ban
      systemd:
        name: fail2ban
        state: started
        enabled: true

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: incoming, policy: deny }
        - { direction: outgoing, policy: allow }

    - name: Allow SSH through UFW
      ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Enable UFW
      ufw:
        state: enabled

    - name: Configure sysctl for security
      sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        sysctl_file: /etc/sysctl.d/99-homelab.conf
        reload: true
      loop:
        - { key: "net.ipv4.conf.all.rp_filter", value: "1" }
        - { key: "net.ipv4.conf.default.rp_filter", value: "1" }
        - { key: "net.ipv4.icmp_echo_ignore_broadcasts", value: "1" }
        - { key: "net.ipv4.conf.all.accept_redirects", value: "0" }
        - { key: "net.ipv4.conf.default.accept_redirects", value: "0" }
        - { key: "net.ipv4.conf.all.send_redirects", value: "0" }
        - { key: "net.ipv4.conf.default.send_redirects", value: "0" }

    - name: Set up automatic security updates
      apt:
        name:
          - unattended-upgrades
          - apt-listchanges
        state: present
      when: ansible_os_family == "Debian"

    - name: Enable automatic security updates
      copy:
        dest: /etc/apt/apt.conf.d/20auto-upgrades
        content: |
          APT::Periodic::Update-Package-Lists "1";
          APT::Periodic::Unattended-Upgrade "1";
          APT::Periodic::AutocleanInterval "7";
        owner: root
        group: root
        mode: "0644"
      when: ansible_os_family == "Debian"

  handlers:
    - name: Restart fail2ban
      systemd:
        name: fail2ban
        state: restarted

Run it:

# Dry run first (shows what would change without making changes)
ansible-playbook playbooks/base-setup.yml --check --diff

# Apply for real
ansible-playbook playbooks/base-setup.yml

# Apply to only one host
ansible-playbook playbooks/base-setup.yml --limit docker01

The --check --diff combination is your best friend. Always dry-run playbooks before applying them to production servers.

Automating Docker Installation

Create playbooks/docker-install.yml:

---
- name: Install and configure Docker on all Docker hosts
  hosts: docker_hosts
  become: true

  vars:
    docker_users:
      - jerry
    docker_daemon_config:
      log-driver: json-file
      log-opts:
        max-size: "10m"
        max-file: "3"
      storage-driver: overlay2
      default-address-pools:
        - base: "172.17.0.0/12"
          size: 24
      live-restore: true
      userland-proxy: false

  tasks:
    - name: Remove old Docker packages
      apt:
        name:
          - docker
          - docker-engine
          - docker.io
          - containerd
          - runc
        state: absent

    - name: Install prerequisites
      apt:
        name:
          - ca-certificates
          - curl
          - gnupg
          - lsb-release
        state: present
        update_cache: true

    - name: Create keyrings directory
      file:
        path: /etc/apt/keyrings
        state: directory
        mode: "0755"

    - name: Add Docker GPG key
      apt_key:
        url: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
        keyring: /etc/apt/keyrings/docker.gpg
        state: present

    - name: Add Docker repository
      apt_repository:
        repo: >-
          deb [arch={{ ansible_architecture | replace('x86_64', 'amd64') }}
          signed-by=/etc/apt/keyrings/docker.gpg]
          https://download.docker.com/linux/{{ ansible_distribution | lower }}
          {{ ansible_distribution_release }} stable
        filename: docker
        state: present

    - name: Install Docker Engine
      apt:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
          - docker-buildx-plugin
          - docker-compose-plugin
        state: present
        update_cache: true

    - name: Configure Docker daemon
      copy:
        content: "{{ docker_daemon_config | to_nice_json }}"
        dest: /etc/docker/daemon.json
        owner: root
        group: root
        mode: "0644"
      notify: Restart Docker

    - name: Add users to docker group
      user:
        name: "{{ item }}"
        groups: docker
        append: true
      loop: "{{ docker_users }}"

    - name: Enable and start Docker
      systemd:
        name: docker
        state: started
        enabled: true

    - name: Enable containerd
      systemd:
        name: containerd
        state: started
        enabled: true

    - name: Install the Docker SDK for Python (required by the docker_network module)
      apt:
        name: python3-docker
        state: present

    - name: Create Docker network for reverse proxy
      docker_network:
        name: proxy
        driver: bridge

    - name: Create Docker network for monitoring
      docker_network:
        name: monitoring
        driver: bridge

    - name: Verify Docker is running
      command: docker info
      changed_when: false

    - name: Get Docker version
      command: docker --version
      register: docker_version
      changed_when: false

    - name: Display Docker version
      debug:
        msg: "{{ docker_version.stdout }} on {{ inventory_hostname }}"

    - name: Set up Docker system prune cron job
      cron:
        name: "Docker system prune"
        minute: "0"
        hour: "3"
        weekday: "0"
        job: "docker system prune -af --volumes --filter 'until=168h' > /dev/null 2>&1"
        user: root

  handlers:
    - name: Restart Docker
      systemd:
        name: docker
        state: restarted

This playbook:

  • Removes any old Docker installation
  • Installs Docker CE from the official repository
  • Configures the Docker daemon with sane defaults (log rotation, overlay2 storage, live restore)
  • Adds your user to the docker group
  • Creates shared Docker networks for reverse proxy and monitoring
  • Sets up a weekly cron job to clean unused images and volumes

Deploying Docker Compose Stacks with Ansible

This is where Ansible really shines for homelabs. Instead of SSH-ing into each server to manage Docker Compose stacks, you define them in Ansible and deploy everywhere with one command.

Project Structure for Docker Stacks

~/homelab-ansible/
├── ansible.cfg
├── inventory/
│   └── hosts.yml
├── playbooks/
│   ├── base-setup.yml
│   ├── docker-install.yml
│   └── deploy-stacks.yml
├── stacks/
│   ├── traefik/
│   │   ├── docker-compose.yml
│   │   └── traefik.yml
│   ├── monitoring/
│   │   ├── docker-compose.yml
│   │   └── prometheus.yml
│   ├── gitea/
│   │   └── docker-compose.yml
│   └── media/
│       └── docker-compose.yml
└── group_vars/
    └── docker_hosts.yml

The Stack Deployment Playbook

Create playbooks/deploy-stacks.yml:

---
- name: Deploy Docker Compose stacks
  hosts: docker_hosts
  become: true

  vars:
    stack_base_dir: /opt/stacks
    stacks_to_deploy: "{{ docker_stacks | default([]) }}"

  tasks:
    - name: Create base directory for stacks
      file:
        path: "{{ stack_base_dir }}"
        state: directory
        owner: jerry
        group: docker
        mode: "0755"

    - name: Create directory for each stack
      file:
        path: "{{ stack_base_dir }}/{{ item }}"
        state: directory
        owner: jerry
        group: docker
        mode: "0755"
      loop: "{{ stacks_to_deploy }}"

    - name: Copy Docker Compose files
      copy:
        src: "../stacks/{{ item }}/"
        dest: "{{ stack_base_dir }}/{{ item }}/"
        owner: jerry
        group: docker
        mode: "0644"
      loop: "{{ stacks_to_deploy }}"
      register: compose_files

    - name: Deploy Docker Compose stacks
      community.docker.docker_compose_v2:
        project_src: "{{ stack_base_dir }}/{{ item.item }}"
        state: present
        pull: always
        remove_orphans: true
      loop: "{{ compose_files.results }}"
      when: item.changed
      register: deploy_result

    - name: Show deployment results
      debug:
        msg: "Stack {{ item.item.item }} deployed successfully"
      loop: "{{ deploy_result.results }}"
      when: item is not skipped

Host-Specific Stack Assignments

Define which stacks run on which hosts in group_vars/docker_hosts.yml or in host-specific variable files.

Create inventory/host_vars/docker01.yml:

docker_stacks:
  - traefik
  - monitoring
  - gitea

Create inventory/host_vars/docker02.yml:

docker_stacks:
  - media
  - nextcloud

Now run:

# Deploy all stacks to all hosts
ansible-playbook playbooks/deploy-stacks.yml

# Deploy to just one host
ansible-playbook playbooks/deploy-stacks.yml --limit docker01

# Deploy only specific stacks by overriding the variable (JSON so it parses as a list)
ansible-playbook playbooks/deploy-stacks.yml --limit docker01 -e '{"stacks_to_deploy": ["traefik"]}'

Example: Traefik Reverse Proxy Stack

Create stacks/traefik/docker-compose.yml:

services:
  traefik:
    image: traefik:v3.2
    container_name: traefik
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik.yml:/etc/traefik/traefik.yml:ro
      - traefik-certs:/letsencrypt
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.dashboard.rule=Host(`traefik.example.com`)"
      - "traefik.http.routers.dashboard.service=api@internal"
      - "traefik.http.routers.dashboard.middlewares=auth"
      - "traefik.http.middlewares.auth.basicauth.users=admin:$$apr1$$xyz..."

networks:
  proxy:
    external: true

volumes:
  traefik-certs:

Create stacks/traefik/traefik.yml:

api:
  dashboard: true

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      email: you@example.com
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: proxy

Example: Monitoring Stack

Create stacks/monitoring/docker-compose.yml:

services:
  prometheus:
    image: prom/prometheus:v2.53.0
    container_name: prometheus
    restart: unless-stopped
    user: "1000:1000"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
      - "--storage.tsdb.path=/prometheus"
    networks:
      - monitoring
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.prometheus.rule=Host(`prometheus.example.com`)"
      - "traefik.http.routers.prometheus.entrypoints=websecure"
      - "traefik.http.routers.prometheus.tls.certresolver=letsencrypt"

  grafana:
    image: grafana/grafana:11.1.0
    container_name: grafana
    restart: unless-stopped
    user: "1000:1000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
      - GF_INSTALL_PLUGINS=grafana-clock-panel
    volumes:
      - grafana-data:/var/lib/grafana
    secrets:
      - grafana_admin_password
    networks:
      - monitoring
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.rule=Host(`grafana.example.com`)"
      - "traefik.http.routers.grafana.entrypoints=websecure"
      - "traefik.http.routers.grafana.tls.certresolver=letsencrypt"

  node-exporter:
    image: prom/node-exporter:v1.8.1
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - monitoring

secrets:
  grafana_admin_password:
    file: ./secrets/grafana_admin_password.txt

networks:
  monitoring:
    external: true
  proxy:
    external: true

volumes:
  prometheus-data:
  grafana-data:

Managing System Updates Across All Servers

Create playbooks/update-systems.yml:

---
- name: Update all homelab servers
  hosts: all
  become: true
  serial: 1  # Update one server at a time to maintain availability

  vars:
    reboot_if_required: true

  tasks:
    - name: Update apt cache
      apt:
        update_cache: true
      when: ansible_os_family == "Debian"

    - name: Get list of upgradable packages
      command: apt list --upgradable
      register: upgradable
      changed_when: false
      when: ansible_os_family == "Debian"

    - name: Display upgradable packages
      debug:
        msg: "{{ upgradable.stdout_lines }}"
      when:
        - ansible_os_family == "Debian"
        - upgradable.stdout_lines | length > 1

    - name: Upgrade all packages
      apt:
        upgrade: safe
        autoremove: true
        autoclean: true
      register: upgrade_result
      when: ansible_os_family == "Debian"

    - name: Check if reboot is required
      stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Display reboot status
      debug:
        msg: "Reboot is {{ 'required' if reboot_required.stat.exists else 'not required' }} on {{ inventory_hostname }}"

    - name: Reboot if required
      reboot:
        msg: "Ansible: Rebooting due to system updates"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 5
        post_reboot_delay: 30
      when:
        - reboot_if_required
        - reboot_required.stat.exists

    - name: Wait for server to come back
      wait_for_connection:
        delay: 10
        timeout: 300
      when:
        - reboot_if_required
        - reboot_required.stat.exists

    - name: Verify Docker is running after reboot
      systemd:
        name: docker
        state: started
      when: "'docker_hosts' in group_names"

    - name: Verify all containers are running after reboot
      command: docker ps --format "{{ '{{' }}.Names{{ '}}' }}: {{ '{{' }}.Status{{ '}}' }}"
      register: docker_status
      changed_when: false
      when: "'docker_hosts' in group_names"

    - name: Display container status
      debug:
        msg: "{{ docker_status.stdout_lines }}"
      when:
        - "'docker_hosts' in group_names"
        - docker_status.stdout_lines is defined

Key details:

  • serial: 1 processes one server at a time. If you have services with redundancy, you will not lose all instances at once during updates.
  • The playbook checks whether a reboot is needed and only reboots if necessary.
  • After reboot, it verifies Docker is running and shows container status.
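
serial: 1 is the safest choice, but as your lab grows it can make update runs slow. serial also accepts fixed batch sizes, percentages, or a list of increasing batches, so a sketch like this (values illustrative) trades some safety for speed:

- name: Update all homelab servers
  hosts: all
  become: true
  serial: "25%"   # or a fixed number (2), or increasing batches: [1, 3, "100%"]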

Run this weekly or monthly:

# Dry run first
ansible-playbook playbooks/update-systems.yml --check

# Apply updates
ansible-playbook playbooks/update-systems.yml

# Skip reboots (just install updates)
ansible-playbook playbooks/update-systems.yml -e "reboot_if_required=false"

Automating Docker Image Updates

Create playbooks/update-containers.yml:

---
- name: Update Docker containers to latest images
  hosts: docker_hosts
  become: true

  vars:
    stack_base_dir: /opt/stacks

  tasks:
    - name: Find all docker-compose files
      find:
        paths: "{{ stack_base_dir }}"
        patterns: "docker-compose.yml,docker-compose.yaml,compose.yml,compose.yaml"
        recurse: true
      register: compose_files

    - name: Pull latest images for each stack
      community.docker.docker_compose_v2:
        project_src: "{{ item.path | dirname }}"
        state: present
        pull: always
        remove_orphans: true
      loop: "{{ compose_files.files }}"
      register: update_results

    - name: Show update results
      debug:
        msg: "Stack at {{ item.item.path | dirname | basename }}: {{ 'updated' if item.changed else 'no changes' }}"
      loop: "{{ update_results.results }}"

    - name: Prune old images
      community.docker.docker_prune:
        images: true
        images_filters:
          dangling: false

Backup Automation

Backups are non-negotiable. This playbook implements a comprehensive backup strategy for Docker volumes, configuration files, and databases.

Create playbooks/backup.yml:

---
- name: Backup homelab data
  hosts: docker_hosts
  become: true

  vars:
    backup_base_dir: /opt/backups
    backup_remote_dir: /mnt/nas/backups
    backup_retention_days: 30
    stack_base_dir: /opt/stacks
    timestamp: "{{ ansible_date_time.date }}-{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}"

  tasks:
    - name: Create backup directories
      file:
        path: "{{ item }}"
        state: directory
        owner: root
        group: root
        mode: "0700"
      loop:
        - "{{ backup_base_dir }}"
        - "{{ backup_base_dir }}/{{ timestamp }}"
        - "{{ backup_base_dir }}/{{ timestamp }}/volumes"
        - "{{ backup_base_dir }}/{{ timestamp }}/configs"
        - "{{ backup_base_dir }}/{{ timestamp }}/databases"

    # Backup Docker Compose configurations
    - name: Backup stack configurations
      archive:
        path: "{{ stack_base_dir }}"
        dest: "{{ backup_base_dir }}/{{ timestamp }}/configs/stacks.tar.gz"
        format: gz

    # Backup Docker volumes
    - name: Get list of Docker volumes
      command: docker volume ls --format "{{ '{{' }}.Name{{ '}}' }}"
      register: docker_volumes
      changed_when: false

    - name: Backup each Docker volume
      shell: |
        docker run --rm \
          -v {{ item }}:/source:ro \
          -v {{ backup_base_dir }}/{{ timestamp }}/volumes:/backup \
          alpine tar czf /backup/{{ item }}.tar.gz -C /source .
      loop: "{{ docker_volumes.stdout_lines }}"
      when: docker_volumes.stdout_lines | length > 0

    # Backup PostgreSQL databases
    - name: Find running PostgreSQL containers
      command: docker ps --filter "ancestor=postgres" --format "{{ '{{' }}.Names{{ '}}' }}"
      register: postgres_containers
      changed_when: false

    - name: Backup PostgreSQL databases
      shell: |
        docker exec {{ item }} pg_dumpall -U postgres | \
          gzip > {{ backup_base_dir }}/{{ timestamp }}/databases/{{ item }}.sql.gz
      loop: "{{ postgres_containers.stdout_lines }}"
      when: postgres_containers.stdout_lines | length > 0

    # Create final archive
    - name: Create consolidated backup archive
      archive:
        path: "{{ backup_base_dir }}/{{ timestamp }}"
        dest: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
        format: gz

    - name: Get backup file size
      stat:
        path: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
      register: backup_stat

    - name: Display backup info
      debug:
        msg: >
          Backup completed: {{ inventory_hostname }}-{{ timestamp }}.tar.gz
          ({{ (backup_stat.stat.size / 1048576) | round(2) }} MB)

    # Copy to remote storage
    - name: Copy backup to NAS
      copy:
        src: "{{ backup_base_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
        dest: "{{ backup_remote_dir }}/{{ inventory_hostname }}-{{ timestamp }}.tar.gz"
        remote_src: true
      when: backup_remote_dir is defined
      ignore_errors: true

    # Cleanup old backups
    - name: Remove local backups older than retention period
      find:
        paths: "{{ backup_base_dir }}"
        patterns: "*.tar.gz"
        age: "{{ backup_retention_days }}d"
      register: old_backups

    - name: Delete old backups
      file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_backups.files }}"

    - name: Clean up temporary backup directory
      file:
        path: "{{ backup_base_dir }}/{{ timestamp }}"
        state: absent

Automate with a cron job by adding to your playbook or running manually:

# Run backup now
ansible-playbook playbooks/backup.yml

# Add to crontab on the control node
crontab -e
# Add: 0 2 * * * cd ~/homelab-ansible && ansible-playbook playbooks/backup.yml >> /var/log/ansible-backup.log 2>&1

User Management and SSH Hardening

Create playbooks/users-ssh.yml:

---
- name: Manage users and harden SSH
  hosts: all
  become: true

  vars:
    admin_users:
      - name: jerry
        ssh_keys:
          - "ssh-ed25519 AAAAC3... jerry@workstation"
        groups:
          - sudo
          - docker
    ssh_port: 22
    ssh_permit_root: false
    ssh_password_auth: false
    ssh_max_auth_tries: 3

  tasks:
    - name: Create admin users
      user:
        name: "{{ item.name }}"
        groups: "{{ item.groups }}"
        append: true
        shell: /bin/bash
        create_home: true
      loop: "{{ admin_users }}"

    - name: Set up authorized keys
      authorized_key:
        user: "{{ item.0.name }}"
        key: "{{ item.1 }}"
        state: present
        exclusive: false
      loop: "{{ admin_users | subelements('ssh_keys') }}"

    - name: Configure SSH daemon
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
        validate: "sshd -t -f %s"
      loop:
        - { regexp: '^#?Port ', line: 'Port {{ ssh_port }}' }
        - { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin {{ "no" if not ssh_permit_root else "yes" }}' }
        - { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication {{ "no" if not ssh_password_auth else "yes" }}' }
        - { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries {{ ssh_max_auth_tries }}' }
        - { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
        - { regexp: '^#?AllowAgentForwarding', line: 'AllowAgentForwarding no' }
        - { regexp: '^#?ClientAliveInterval', line: 'ClientAliveInterval 300' }
        - { regexp: '^#?ClientAliveCountMax', line: 'ClientAliveCountMax 2' }
      notify: Restart SSH

    - name: Ensure SSH is enabled
      systemd:
        name: sshd
        state: started
        enabled: true

  handlers:
    - name: Restart SSH
      systemd:
        name: sshd
        state: restarted

Important: Be careful with SSH configuration changes. Always test with --check first and make sure you have console access (IPMI, Proxmox console) in case you lock yourself out.

Monitoring and Alerting Setup

Create playbooks/monitoring.yml to deploy node exporters on all servers and configure Prometheus to scrape them:

---
- name: Deploy monitoring agents on all servers
  hosts: all
  become: true

  vars:
    node_exporter_version: "1.8.1"

  tasks:
    - name: Check if node_exporter is already installed
      stat:
        path: /usr/local/bin/node_exporter
      register: ne_binary

    - name: Get installed node_exporter version
      command: /usr/local/bin/node_exporter --version
      register: ne_version
      changed_when: false
      failed_when: false
      when: ne_binary.stat.exists

    - name: Decide whether node_exporter needs install or upgrade
      set_fact:
        ne_needs_install: "{{ not ne_binary.stat.exists or node_exporter_version not in ((ne_version.stdout | default('')) + (ne_version.stderr | default(''))) }}"

    - name: Download node_exporter
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/node_exporter.tar.gz
        mode: "0644"
      when: ne_needs_install | bool

    - name: Extract node_exporter
      unarchive:
        src: /tmp/node_exporter.tar.gz
        dest: /tmp/
        remote_src: true
      when: ne_needs_install | bool

    - name: Install node_exporter binary
      copy:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
        dest: /usr/local/bin/node_exporter
        owner: root
        group: root
        mode: "0755"
        remote_src: true
      notify: Restart node_exporter
      when: ne_needs_install | bool

    - name: Create node_exporter user
      user:
        name: node_exporter
        system: true
        shell: /bin/false
        create_home: false

    - name: Create node_exporter systemd service
      copy:
        dest: /etc/systemd/system/node_exporter.service
        content: |
          [Unit]
          Description=Prometheus Node Exporter
          After=network-online.target
          Wants=network-online.target

          [Service]
          User=node_exporter
          Group=node_exporter
          Type=simple
          ExecStart=/usr/local/bin/node_exporter \
            --collector.systemd \
            --collector.processes \
            --web.listen-address=:9100

          [Install]
          WantedBy=multi-user.target
        owner: root
        group: root
        mode: "0644"
      notify:
        - Reload systemd
        - Restart node_exporter

    - name: Allow node_exporter port in UFW
      ufw:
        rule: allow
        port: "9100"
        proto: tcp
        from_ip: 192.168.1.0/24
        comment: "Prometheus node_exporter"

    - name: Enable and start node_exporter
      systemd:
        name: node_exporter
        state: started
        enabled: true

  handlers:
    - name: Reload systemd
      systemd:
        daemon_reload: true

    - name: Restart node_exporter
      systemd:
        name: node_exporter
        state: restarted


- name: Generate Prometheus configuration
  hosts: docker_hosts[0]
  become: true

  vars:
    stack_base_dir: /opt/stacks

  tasks:
    - name: Generate Prometheus scrape config from inventory
      copy:
        dest: "{{ stack_base_dir }}/monitoring/prometheus.yml"
        content: |
          global:
            scrape_interval: 15s
            evaluation_interval: 15s

          scrape_configs:
            - job_name: "prometheus"
              static_configs:
                - targets: ["localhost:9090"]

            - job_name: "node-exporter"
              static_configs:
                - targets:
          {% for host in groups['all'] %}
                    - "{{ hostvars[host]['ansible_host'] }}:9100"  # {{ host }}
          {% endfor %}

            - job_name: "cadvisor"
              static_configs:
                - targets:
          {% for host in groups['docker_hosts'] %}
                    - "{{ hostvars[host]['ansible_host'] }}:8080"  # {{ host }}
          {% endfor %}
        owner: jerry
        group: docker
        mode: "0644"
      notify: Restart Prometheus

  handlers:
    - name: Restart Prometheus
      community.docker.docker_compose_v2:
        project_src: "{{ stack_base_dir }}/monitoring"
        services:
          - prometheus
        state: restarted

This playbook installs node_exporter on every server in your inventory and then generates the Prometheus configuration automatically from your inventory. When you add a new server to your inventory, re-running this playbook adds it to monitoring automatically.
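
To spot-check a new exporter before Prometheus picks it up, hit its metrics endpoint directly (address from the example inventory; adjust to yours):

# Should print plain-text Prometheus metrics
curl -s http://192.168.1.10:9100/metrics | head -n 5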

Organizing Your Ansible Project: Roles and Directory Structure

As your playbooks grow, you will want to organize them into roles. Here is the recommended structure:

~/homelab-ansible/
├── ansible.cfg
├── inventory/
│   ├── hosts.yml
│   ├── group_vars/
│   │   ├── all.yml          # Variables for all hosts
│   │   ├── docker_hosts.yml  # Variables for Docker hosts
│   │   └── proxmox_nodes.yml
│   └── host_vars/
│       ├── docker01.yml
│       ├── docker02.yml
│       └── docker03.yml
├── playbooks/
│   ├── site.yml              # Main playbook that imports everything
│   ├── base-setup.yml
│   ├── deploy-stacks.yml
│   ├── update-systems.yml
│   └── backup.yml
├── roles/
│   ├── common/
│   │   ├── tasks/
│   │   │   └── main.yml
│   │   ├── handlers/
│   │   │   └── main.yml
│   │   ├── templates/
│   │   │   └── sshd_config.j2
│   │   └── defaults/
│   │       └── main.yml
│   ├── docker/
│   │   ├── tasks/
│   │   │   └── main.yml
│   │   ├── handlers/
│   │   │   └── main.yml
│   │   └── defaults/
│   │       └── main.yml
│   ├── monitoring/
│   │   ├── tasks/
│   │   │   └── main.yml
│   │   ├── files/
│   │   │   └── node_exporter.service
│   │   └── defaults/
│   │       └── main.yml
│   └── backup/
│       ├── tasks/
│       │   └── main.yml
│       ├── templates/
│       │   └── backup.sh.j2
│       └── defaults/
│           └── main.yml
└── stacks/
    ├── traefik/
    ├── monitoring/
    └── gitea/

The Master Playbook

Create playbooks/site.yml that ties everything together:

---
- name: Apply base configuration to all servers
  hosts: all
  become: true
  roles:
    - common

- name: Configure Docker hosts
  hosts: docker_hosts
  become: true
  roles:
    - docker
    - monitoring

- name: Set up backup tooling on Docker hosts
  hosts: docker_hosts
  become: true
  roles:
    - role: backup
      tags: [backup]

- name: Deploy application stacks
  import_playbook: deploy-stacks.yml
  tags: [stacks]

Run the entire setup with:

# Configure everything from scratch
ansible-playbook playbooks/site.yml

# Only run specific tags
ansible-playbook playbooks/site.yml --tags "docker,stacks"

# Skip specific tags
ansible-playbook playbooks/site.yml --skip-tags "backup"

Secrets Management with Ansible Vault

Never put passwords, API keys, or tokens in plain text in your playbooks or inventory. Ansible Vault encrypts sensitive data so you can safely commit it to Git.

Creating Encrypted Variables

# Create a new encrypted variable file
ansible-vault create inventory/group_vars/docker_hosts/vault.yml

# Edit an existing encrypted file
ansible-vault edit inventory/group_vars/docker_hosts/vault.yml

# Encrypt an existing file
ansible-vault encrypt secrets.yml

# View encrypted file contents
ansible-vault view inventory/group_vars/docker_hosts/vault.yml

Example: Encrypted Variables File

# inventory/group_vars/docker_hosts/vault.yml (encrypted)
vault_grafana_admin_password: "SuperSecretPassword123"
vault_gitea_db_password: "AnotherSecretPassword456"
vault_traefik_dashboard_password: "YetAnotherPassword789"
vault_backup_encryption_key: "EncryptionKeyForBackups"

Reference these in your regular variables:

# inventory/group_vars/docker_hosts/vars.yml (not encrypted)
grafana_admin_password: "{{ vault_grafana_admin_password }}"
gitea_db_password: "{{ vault_gitea_db_password }}"

Running Playbooks with Vault

# Prompt for vault password
ansible-playbook playbooks/site.yml --ask-vault-pass

# Use a password file (better for automation)
echo "your-vault-password" > ~/.vault_pass
chmod 600 ~/.vault_pass
ansible-playbook playbooks/site.yml --vault-password-file ~/.vault_pass

# Or set it in ansible.cfg
# [defaults]
# vault_password_file = ~/.vault_pass

Important: Add .vault_pass and any password files to your .gitignore.
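
A minimal .gitignore for the project root might look like this (the secrets/ pattern is a precaution in case you ever stage plaintext secret files locally):

# .gitignore
.vault_pass
*.retry
**/secrets/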

Writing Secrets to Disk Securely

When deploying Docker stacks that need secrets as files:

- name: Write Grafana admin password
  copy:
    content: "{{ vault_grafana_admin_password }}"
    dest: "{{ stack_base_dir }}/monitoring/secrets/grafana_admin_password.txt"
    owner: jerry
    group: docker
    mode: "0600"
  no_log: true  # Prevents the password from appearing in Ansible output

The no_log: true directive is critical — without it, Ansible logs the full content of the task, including the decrypted password.

Common Mistakes and How to Avoid Them

Mistake 1: Not Using --check First

Always dry-run before applying:

ansible-playbook playbooks/site.yml --check --diff

The --diff flag shows exactly what would change in files. This catches misconfigurations before they hit production.

Mistake 2: Using shell/command When a Module Exists

Bad:

- name: Install nginx
  command: apt-get install -y nginx

Good:

- name: Install nginx
  apt:
    name: nginx
    state: present

Modules are idempotent by design. The apt module checks if nginx is already installed and skips the task if so. The command module runs every time regardless.

Mistake 3: Not Using Handlers

Bad:

- name: Update nginx config
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf

- name: Restart nginx
  systemd:
    name: nginx
    state: restarted

This restarts nginx every time the playbook runs, even if the config did not change.

Good:

- name: Update nginx config
  copy:
    src: nginx.conf
    dest: /etc/nginx/nginx.conf
  notify: Restart nginx

handlers:
  - name: Restart nginx
    systemd:
      name: nginx
      state: restarted

Handlers only run when notified by a changed task. No config change means no restart.

Mistake 4: Hardcoding Values Instead of Using Variables

Bad:

- name: Create user
  user:
    name: jerry
    groups: sudo,docker

Good:

# In group_vars/all.yml
admin_user: jerry
admin_groups:
  - sudo
  - docker

# In tasks
- name: Create user
  user:
    name: "{{ admin_user }}"
    groups: "{{ admin_groups }}"
    append: true

Variables make your playbooks reusable and configurable without editing the tasks themselves.

Mistake 5: Ignoring Ansible-lint

Install and run ansible-lint to catch common issues:

pipx install ansible-lint

# Lint your playbooks
ansible-lint playbooks/

It catches things like deprecated modules, missing name fields, command tasks that should use modules, and YAML formatting issues. Set it up in your editor for real-time feedback.
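
Project-level behavior is configured through a .ansible-lint file in the repo root. A minimal sketch (run ansible-lint -L to see all rule names):

# .ansible-lint
exclude_paths:
  - stacks/            # plain Docker Compose files, not Ansible content
skip_list:
  - yaml[line-length]  # long Traefik labels are fine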

Final Thoughts

Ansible transforms your homelab from a collection of manually-configured machines into reproducible, documented infrastructure. The investment pays off the first time you need to rebuild a server — instead of spending a weekend recreating your setup from memory, you run one command and go make coffee.

Start small. Pick one pain point — maybe it is installing Docker on new machines, or running updates across all servers — and automate it. Once you see a playbook do in 30 seconds what used to take you 20 minutes of SSH sessions, you will want to automate everything else.

The playbooks in this guide are complete and working. Copy them, adapt the variables to your environment, and build from there. As your homelab grows, your Ansible repository grows with it, and every new server you add is configured correctly from the first boot.

The best infrastructure is the kind you do not have to think about. Ansible gets you there.