# 🌐 Website Downloader CLI

[![CI – Website Downloader](https://github.com/PKHarsimran/website-downloader/actions/workflows/python-app.yml/badge.svg)](https://github.com/PKHarsimran/website-downloader/actions/workflows/python-app.yml) [![Lint & Style](https://github.com/PKHarsimran/website-downloader/actions/workflows/lint.yml/badge.svg)](https://github.com/PKHarsimran/website-downloader/actions/workflows/lint.yml) [![Automatic Dependency Submission](https://github.com/PKHarsimran/website-downloader/actions/workflows/dependency-graph/auto-submission/badge.svg)](https://github.com/PKHarsimran/website-downloader/actions/workflows/dependency-graph/auto-submission) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/) [![Code style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Website Downloader CLI is a **tiny, pure-Python** site-mirroring tool that lets you grab a complete, browsable offline copy of any publicly reachable website:

* Recursively crawls every same-origin link (including "pretty" `/about/` URLs)
* Downloads **all** assets (images, CSS, JS, …)
* Rewrites internal links so pages open flawlessly from your local disk
* Streams files concurrently with automatic retry / back-off
* Generates a clean, flat directory tree (`example_com/index.html`, `example_com/about/index.html`, …)
* Handles extremely long filenames safely via hashing and graceful fallbacks

> Perfect for web archiving, pentesting labs, long flights, or just poking around a site without an internet connection.
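The flat directory tree described above can be sketched as a small URL-to-path mapping. This is a hypothetical helper for illustration, not the tool's actual code: both the site root and "pretty" directory-style URLs are assumed to map to an `index.html`.

```python
from urllib.parse import urlparse

def local_path(url: str) -> str:
    """Map a page URL to a flat on-disk path like the tree shown above.

    Sketch only: "pretty" URLs (/about/) and the site root both become
    .../index.html so the mirror browses cleanly offline.
    """
    parsed = urlparse(url)
    root = parsed.netloc.replace(".", "_")   # example.com -> example_com
    path = parsed.path.strip("/")            # "", "about", or "blog/post.html"
    if not path or "." not in path.split("/")[-1]:
        # Directory-style URL: append index.html
        path = f"{path}/index.html" if path else "index.html"
    return f"{root}/{path}"

print(local_path("https://example.com/"))        # example_com/index.html
print(local_path("https://example.com/about/"))  # example_com/about/index.html
```

Assets with a file extension (e.g. `/css/site.css`) keep their path unchanged, so rewritten internal links stay relative and valid on disk.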
---

## 🚀 Quick Start

```bash
# 1. Grab the code
git clone https://github.com/PKHarsimran/website-downloader.git
cd website-downloader

# 2. Install dependencies (only two runtime libs!)
pip install -r requirements.txt

# 3. Mirror a site – no prompts needed
python website-downloader.py \
  --url https://harsim.ca \
  --destination harsim_ca_backup \
  --max-pages 100 \
  --threads 8
```

### System-wide install (Debian/Ubuntu, inside a virtualenv)

If Python is not installed yet:

```sh
sudo apt install python3-full python3-venv -y
```

Installation:

```sh
sudo -s
cd /usr/local
mkdir python
cd python
git clone https://forgit.patachina.it/Fabio/website-downloader.git
cd website-downloader
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python website-downloader.py \
  --url https://harsim.ca \
  --destination harsim_ca_backup \
  --max-pages 100 \
  --threads 8
deactivate
cp downloadsite.sh /usr/local/bin
exit
```

---

## 🛠️ Libraries Used

| Library | Emoji | Purpose in this project |
|---------|-------|-------------------------|
| **requests** + **urllib3.Retry** | 🌐 | High-level HTTP client with automatic retry / back-off for flaky hosts |
| **BeautifulSoup (bs4)** | 🍜 | Parses downloaded HTML and extracts every `<a>`, `<img>`, `<link>`, and `<script>` reference |
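The retry / back-off behaviour listed in the table can be approximated by mounting a `urllib3.Retry` policy on a `requests.Session`. This is a minimal sketch; the retry count, back-off factor, and status codes below are illustrative assumptions, not the tool's actual configuration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Build a Session that retries flaky hosts with exponential back-off.

    Sketch only: counts and status codes here are illustrative.
    """
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                       # ~0.5s, 1s, 2s between attempts
        status_forcelist=(429, 500, 502, 503, 504),   # retry on these responses
        allowed_methods=frozenset(["GET", "HEAD"]),   # idempotent requests only
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

One shared session also reuses TCP connections across the crawl, which matters when the downloader is streaming many assets concurrently.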