bplr 9 hours ago

A powerful, stealthy website cloner/scraper built with TypeScript that downloads entire websites for offline use. Supports HTTP proxy authentication, comprehensive asset downloading (CSS, JS, images, SVG sprites, fonts, etc.), and intelligent URL rewriting.

Features

Complete Website Cloning - Downloads HTML, CSS, JavaScript, images, fonts, and all other assets
HTTP Proxy Support - Connect through HTTP proxies with username/password authentication
SVG Sprite Support - Properly handles SVG sprites with xlink:href references
Smart URL Rewriting - Converts all URLs to relative local paths for offline browsing
Stealthy Crawling - Configurable delays, random user agents, and realistic headers
Asset Discovery - Extracts assets from:
    HTML tags (img, script, link, etc.)
    CSS files (background images, fonts, etc.)
    Inline styles
    SVG sprites and references
    srcset attributes
    Data attributes (data-src, data-lazy-src)
CSS Processing - Parses CSS files to download referenced assets
External Link Handling - Optional following of external links
Progress Tracking - Real-time statistics and detailed logging
Highly Configurable - Control depth, patterns, delays, and more
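
To make the asset discovery and URL rewriting features more concrete, here is a minimal sketch of how they might work. It is not the project's actual code: the function names (extractCssUrls, extractSrcsetUrls, toRelativeLocalPath) and the example.com URLs are hypothetical, and the sketch assumes every asset lives on the same origin as the page and is saved under a directory tree that mirrors its URL path.

```typescript
// Illustrative helpers only; names and behavior are assumptions, not the tool's API.

/** Collect url(...) references from CSS text, skipping inline data: URIs. */
function extractCssUrls(css: string): string[] {
  const urls: string[] = [];
  // Matches url(x), url('x') and url("x"); group 2 is the reference itself.
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(css)) !== null) {
    const ref = match[2].trim();
    if (ref.length > 0 && !/^data:/i.test(ref)) {
      urls.push(ref);
    }
  }
  return urls;
}

/**
 * Pull the URL out of each srcset candidate ("img.png 2x", "img.png 640w").
 * Simplification: URLs that themselves contain commas are not handled.
 */
function extractSrcsetUrls(srcset: string): string[] {
  return srcset
    .split(",")
    .map((candidate) => candidate.trim().split(/\s+/)[0] || "")
    .filter((url) => url.length > 0);
}

/**
 * Rewrite an asset reference as a path relative to the page that uses it.
 * Assumes same-origin assets; the host part of the URL is ignored.
 */
function toRelativeLocalPath(pageUrl: string, assetRef: string): string {
  const page = new URL(pageUrl);
  const asset = new URL(assetRef, page); // resolves relative references
  const fromDirs = page.pathname.split("/").slice(1, -1); // directories holding the page
  const toParts = asset.pathname.split("/").slice(1); // directories plus file name
  // Count the leading directories the two paths share.
  let common = 0;
  while (
    common < fromDirs.length &&
    common < toParts.length - 1 &&
    fromDirs[common] === toParts[common]
  ) {
    common++;
  }
  // Climb out of the page's remaining directories, then descend to the asset.
  const up = fromDirs.slice(common).map(() => "..");
  return up.concat(toParts.slice(common)).join("/");
}

const page = "https://example.com/blog/post/index.html";
console.log(extractCssUrls("a { background: url('img/bg.png'); }")); // [ 'img/bg.png' ]
console.log(extractSrcsetUrls("small.jpg 480w, large.jpg 1080w")); // [ 'small.jpg', 'large.jpg' ]
console.log(toRelativeLocalPath(page, "/static/css/site.css")); // ../../static/css/site.css
```

A real crawler would also need to handle external origins, query strings, and file name collisions, none of which this sketch attempts.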