Discord Archiving Guide
To create a self-contained archive folder with the viewer HTML file and an attachments folder:
- Generate a Save Viewer HTML file in DHT (≤ v41.1) as usual; no attachment downloading is needed in DHT. ⚠ From here, you must run the next two steps within a few hours; I waited 18h once and 31% of my URLs had expired.
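- Run this Python script to list all `cdn.discordapp.com/attachments` URLs from that HTML in a separate file, `i.txt`.

      import re

      # Read the viewer HTML and turn the escaped ampersands back into "&".
      with open('archive.html', 'r', encoding="utf8") as file:
          text = file.read().replace("\\\\u0026", "&")

      # Match each attachment URL together with its three "&"-terminated query parameters.
      res = re.findall(r"(https:\/\/cdn.discordapp.com\/attachments\/\d+\/\d+\/[^\?]+(?:[\w\?=]+&){3})", text)

      # Write one URL per line, ready for wget.
      with open('i.txt', "w+", encoding="utf-8") as f:
          f.write("\n".join(res))
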
- Run `wget -nv -x -i i.txt` in Linux/WSL to create the archive (`-nv` = non-verbose output, which makes failed downloads easy to find in a text editor afterwards; `-x` creates a folder structure mimicking the source; `-i` is the input list). I then unnested the `attachments` folder from the `cdn.discordapp.com` folder, so that it sits next to the HTML file.
- Replace all `https://cdn.discordapp.com/attachments/` URLs in the HTML file with just `attachments/`.
- Run this bash script in the `attachments` folder to get rid of the query strings in the attachment filenames.

      #!/bin/bash
      # Rename every file under $1 (default: current folder) whose name contains "?",
      # dropping everything from the first "?" onwards.
      find "${1:-.}" -type f -name '*\?*' | while IFS= read -r i
      do
          mv -- "$i" "${i%%\?*}"
      done

- On newer DHT versions, I had to replace the body of `isImageAttachment(` in the viewer HTML file with this old version to get rid of the URL validation so local addresses can be used. Add in any extra extensions you need.

      isImageAttachment(attachment) {
          const extension = attachment.url.split("?")[0].split(".").at(-1);
          return ["png", "gif", "jpg", "jpeg", "webp"].includes(extension);
      },

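
The expiry warning in the first step can be checked before running wget. The following is a minimal Python sketch, not part of DHT or of the script above. It assumes that the `ex` query parameter of each URL in `i.txt` is the expiry time as a hexadecimal Unix timestamp; if that assumption does not hold for your URLs, the result is meaningless. The names `expiry_of` and `count_expired` are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def expiry_of(url):
    """Return the time encoded in the URL's `ex` parameter, or None.

    Assumption: `ex` holds a hexadecimal Unix timestamp.
    """
    values = parse_qs(urlsplit(url).query).get("ex")
    if not values:
        return None
    try:
        return datetime.fromtimestamp(int(values[0], 16), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # Not valid hex, or not a representable timestamp.
        return None


def count_expired(urls, now=None):
    """Count URLs whose decoded expiry is at or before `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    expiries = (expiry_of(url) for url in urls)
    return sum(1 for expiry in expiries if expiry is not None and expiry <= now)
```

For example, `count_expired(open("i.txt", encoding="utf-8").read().split())` gives the number of URLs in the list that already look expired. URLs without a readable `ex` value are not counted either way.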
That’s it – the HTML file now works with a permanent local archive. It can be zipped up together with the attachments folder, or the attachments can be uploaded to a different CDN and the URLs replaced with external ones again.
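
To check that the local archive is complete, the references in the HTML file can be compared against the files on disk. This is a minimal Python sketch, assuming the layout described above (an `attachments` folder next to the HTML file, with query strings already removed from the filenames) and assuming that filenames in the HTML match the names on disk exactly. `missing_attachments` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches references such as attachments/<id>/<id>/<file name>, stopping
# before any query string, double quote, backslash, or whitespace.
LOCAL_REF = re.compile(r'attachments/\d+/\d+/[^?"\\\s]+')


def missing_attachments(html_text, archive_dir):
    """Return the sorted references in html_text that have no file under archive_dir."""
    refs = set(LOCAL_REF.findall(html_text))
    return sorted(ref for ref in refs if not (Path(archive_dir) / ref).is_file())
```

Calling `missing_attachments(Path("archive.html").read_text(encoding="utf8"), ".")` from the archive folder lists every reference without a matching file. An empty list suggests all downloads and renames went through. The pattern also matches the tail of any `https://cdn.discordapp.com/attachments/` URL that was not replaced, so those are checked against the local folder as well.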