Methodology
How the archive was built, from discovery to publishing.
Overview
The Orgasm Sound Archive was built by systematically recovering audio files from captures of orgasmsoundlibrary.com stored in the Internet Archive's Wayback Machine. The pipeline ran in four stages: discovery, harvest, processing, and publishing.
Pipeline
- Discovery: The Internet Archive's Wayback Machine CDX API was queried for captures of orgasmsoundlibrary.com audio files. Over 3,000 unique audio URLs were identified.
- Harvest: Each audio file was downloaded from Wayback Machine captures. SHA-256 hashes were computed for integrity verification. Download state was tracked in SQLite.
- Processing: Metadata was extracted from CSV manifests and merged with the download state. Duration, format, tags, and provenance data were normalized into a unified catalog.
- Publishing: Audio files are served from Cloudflare R2. The catalog is a static Astro site with 3,570 individual recording pages, deployed via a Cloudflare Worker.
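As an illustration of the discovery stage, the sketch below builds a Wayback Machine CDX query and parses its JSON output. It is a minimal example, not the archive's actual code: the function names, the choice of fields, and the MIME-type filter are assumptions made for illustration. The query parameters themselves (url, matchType, output, fl, filter, collapse) are standard CDX server options.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str) -> str:
    """Build a CDX query URL listing audio captures under a domain."""
    params = [
        ("url", domain),
        ("matchType", "domain"),          # the domain, its subdomains, all paths
        ("output", "json"),
        ("fl", "timestamp,original,mimetype,statuscode,digest"),
        ("filter", "statuscode:200"),     # successful captures only
        ("filter", "mimetype:audio/.*"),  # assumed filter, not confirmed by the source
        ("collapse", "urlkey"),           # one row per unique URL
    ]
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(body: str) -> list[dict[str, str]]:
    """Turn CDX JSON output (a header row followed by data rows) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]
```

Fetching the URL returned by build_cdx_query with any HTTP client and passing the response body to parse_cdx_rows yields one dictionary per unique captured URL, which is the kind of list a harvest step would iterate over.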
Integrity
Every recording has a SHA-256 checksum computed at download time. The checksum is stored in the catalog and displayed on each recording's detail page, allowing independent verification against the Wayback Machine source.
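A checksum workflow along these lines can be sketched with Python's standard library. The table name, column names, and helper functions below are hypothetical; the source only states that SHA-256 hashes were computed at download time and that download state was kept in SQLite.

```python
import hashlib
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a download-state database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "url TEXT PRIMARY KEY, sha256 TEXT, status TEXT NOT NULL)"
    )
    return conn


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def record_download(conn: sqlite3.Connection, url: str, data: bytes) -> str:
    """Hash the downloaded bytes and store the digest alongside the URL."""
    digest = sha256_hex(data)
    conn.execute(
        "INSERT OR REPLACE INTO downloads (url, sha256, status) VALUES (?, ?, ?)",
        (url, digest, "ok"),
    )
    conn.commit()
    return digest


def verify(conn: sqlite3.Connection, url: str, data: bytes) -> bool:
    """Check freshly fetched bytes against the digest recorded for that URL."""
    row = conn.execute(
        "SELECT sha256 FROM downloads WHERE url = ?", (url,)
    ).fetchone()
    return row is not None and row[0] == sha256_hex(data)
```

Anyone re-downloading a file from its Wayback Machine capture can run the same hash and compare it with the value shown on the recording's detail page; a mismatch means the bytes differ from what was archived.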
Provenance
Each file's Wayback Machine capture URL and capture timestamp are preserved in the catalog. This links every recording to the specific snapshot from which it was retrieved, providing a full chain of custody from the original site to this archive.
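For readers who want to follow a provenance link back to its snapshot, the sketch below shows how a capture URL is typically assembled from a 14-digit Wayback Machine timestamp and the original URL. The function names are hypothetical, and whether the archive stored raw ("id_") or rewritten capture URLs is an assumption; the "id_" modifier itself is a standard Wayback Machine feature that returns the capture's original bytes without injected markup.

```python
from datetime import datetime, timezone


def capture_url(timestamp: str, original: str, raw: bool = True) -> str:
    """Build a Wayback Machine URL for one capture of `original`.

    With raw=True the "id_" modifier requests the unmodified archived bytes,
    which is what a checksum comparison needs.
    """
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original}"


def capture_time(timestamp: str) -> datetime:
    """Parse a 14-digit Wayback timestamp (YYYYMMDDhhmmss, UTC)."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Keeping the timestamp and the original URL as separate catalog fields, rather than only the assembled link, makes it possible to rebuild either form of the URL later.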
Completeness
3,352 unique audio files were identified and successfully downloaded from the Wayback Machine. A small number of additional URLs appeared in the CDX index but either could not be retrieved (timeout or corrupt capture) or duplicated the hash of a file already downloaded. These are excluded from the published catalog.
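The exclusion rule can be pictured as a filter applied while manifest metadata is merged with download state. The sketch below is illustrative only: the CSV column names, the status values, and the in-memory dictionary standing in for the download-state records are all assumptions.

```python
import csv
import io


def build_catalog(manifest_csv: str, state: dict[str, dict]) -> list[dict]:
    """Merge manifest rows with download state, keeping only usable files.

    `state` maps each URL to a dict with hypothetical keys "status" and "sha256".
    """
    catalog = []
    seen_hashes = set()
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        entry = state.get(row["url"])
        if entry is None or entry["status"] != "ok":
            continue  # never downloaded, timed out, or corrupt capture
        if entry["sha256"] in seen_hashes:
            continue  # same bytes as a file already in the catalog
        seen_hashes.add(entry["sha256"])
        catalog.append({**row, "sha256": entry["sha256"]})
    return catalog
```

Under this scheme the first URL seen for a given hash wins, so the order of the manifest determines which of two identical files is published.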