Discord Archiving Guide

To create a self-contained archive folder with the viewer HTML file and an attachments folder:

  1. Generate a Save Viewer HTML file in DHT (≤ v41.1) as usual; DHT itself does not need to download any attachments. ⚠ From this point, run steps 2 and 3 within a few hours: I once waited 18h and 31% of my URLs had expired.
  2. Run this Python script to extract all cdn.discordapp.com/attachments URLs from that HTML file into a separate file, i.txt.
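    import re

    # Read the viewer file; turn escaped ampersands (u0026 after one or more backslashes) into "&".
    with open('archive.html', 'r', encoding="utf8") as file:
        text = re.sub(r"\\+u0026", "&", file.read())

    # Match the path up to "?" plus three "key=value&" query parameters.
    pattern = r"(https://cdn\.discordapp\.com/attachments/\d+/\d+/[^?]+(?:[\w?=]+&){3})"
    res = re.findall(pattern, text)

    with open('i.txt', "w", encoding="utf-8") as f:
        f.write("\n".join(res))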
    
  3. In Linux/WSL, run wget -nv -x -i i.txt to create the archive (-nv gives non-verbose output, which makes failed downloads easy to find in a text editor afterwards; -x creates a folder structure mirroring the source; -i reads the URLs from the input list). I then moved the attachments folder out of the cdn.discordapp.com folder so that it sits next to the HTML file.
  4. Replace all https://cdn.discordapp.com/attachments/ URLs in the HTML file with just attachments/.
  5. Run this bash script in the attachments folder to get rid of the query strings in the attachment filenames.
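    #!/bin/bash
    # Strip everything from the first "?" in each file name under $1 (default: current folder).
    find "${1:-.}" -type f -name '*\?*' -print0 |
    while IFS= read -r -d '' f
    do
        mv -- "$f" "${f%%\?*}"
    done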
    
  6. On newer DHT versions, I had to replace the body of isImageAttachment() in the viewer HTML file with this older version, which drops the URL validation so that local paths can be used. Add any extra extensions you need.
    isImageAttachment(attachment) {
     const extension = attachment.url.split("?")[0].split(".").at(-1);
     return ["png", "gif", "jpg", "jpeg", "webp"].includes(extension);
    },
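
Step 4 can be done with any editor's find-and-replace. As a minimal sketch in Python, assuming the viewer file is named archive.html as in step 2 and the attachments folder sits next to it (the names localize_urls and localize_file are made up for illustration):

```python
CDN_PREFIX = "https://cdn.discordapp.com/attachments/"
LOCAL_PREFIX = "attachments/"


def localize_urls(html: str) -> str:
    """Point every CDN attachment URL at the local attachments/ folder."""
    return html.replace(CDN_PREFIX, LOCAL_PREFIX)


def localize_file(path: str) -> None:
    """Rewrite the viewer file in place (keep a backup copy first)."""
    with open(path, "r", encoding="utf8") as f:
        html = f.read()
    with open(path, "w", encoding="utf8") as f:
        f.write(localize_urls(html))
```

Calling localize_file("archive.html") would then rewrite the file from step 1. Only the URL prefix changes; the rest of each URL is left as it is in the HTML.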
    

That’s it – the HTML file now works with a permanent local archive. It can be zipped up together with the attachments folder, or the attachments can be uploaded to a different CDN and the URLs in the HTML replaced with external ones again.
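
Before relying on the archive, it may be worth checking that every URL from i.txt actually ended up on disk. The following is a rough sketch rather than part of the original workflow: it assumes the folder layout after steps 3 and 5 (attachments/<channel id>/<attachment id>/<file name>, query strings removed), and the helper names local_path and missing_attachments are hypothetical. File names containing percent-encoded characters may be reported wrongly, since how wget stores them is not handled here.

```python
from pathlib import Path
from urllib.parse import urlsplit


def local_path(url: str, root: Path) -> Path:
    """Map a CDN attachment URL to its expected file under root (query string dropped)."""
    # URL path looks like /attachments/<channel id>/<attachment id>/<file name>
    parts = urlsplit(url).path.split("/")
    return root.joinpath(*parts[2:])


def missing_attachments(urls, root: Path):
    """Return the URLs whose expected local file does not exist."""
    return [u for u in urls if not local_path(u, root).is_file()]
```

For example, missing_attachments(Path("i.txt").read_text(encoding="utf-8").split(), Path("attachments")) would return the URLs that still need downloading, provided they have not expired in the meantime.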