Methodology

Overview

The Orgasm Sound Archive was built by systematically recovering audio files from captures of orgasmsoundlibrary.com stored in the Internet Archive's Wayback Machine. The pipeline ran in four stages: discovery, harvest, processing, and publishing.

Pipeline

Discovery

The Internet Archive's Wayback Machine CDX API was queried for captures of orgasmsoundlibrary.com audio files. Over 3,000 unique audio URLs were identified.
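
The discovery query can be sketched roughly as follows. This is a minimal illustration, not the archive's actual script: the field list, the status and MIME-type filters, and the helper names `build_cdx_query`, `fetch_rows`, and `unique_audio_urls` are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str) -> str:
    """Build a CDX query URL for audio captures under a domain.

    The filters below are assumptions; the original pipeline's exact
    parameters are not documented here.
    """
    params = [
        ("url", domain),
        ("matchType", "domain"),            # include subdomains and all paths
        ("output", "json"),                 # rows come back as a JSON array
        ("fl", "timestamp,original,mimetype,statuscode"),
        ("filter", "statuscode:200"),       # successful captures only
        ("filter", "mimetype:audio/.*"),    # regex match on the MIME type
    ]
    return CDX_ENDPOINT + "?" + urlencode(params)


def fetch_rows(query_url: str) -> list:
    """Fetch and decode the CDX response (performs a network request)."""
    with urlopen(query_url, timeout=60) as response:
        return json.load(response)


def unique_audio_urls(rows: list) -> dict:
    """Map each original URL to the timestamp of the first row that lists it.

    With output=json the first row is a header naming the columns.
    """
    if not rows:
        return {}
    header, *records = rows
    ts_i = header.index("timestamp")
    url_i = header.index("original")
    first_seen = {}
    for record in records:
        first_seen.setdefault(record[url_i], record[ts_i])
    return first_seen
```

Calling `unique_audio_urls(fetch_rows(build_cdx_query("orgasmsoundlibrary.com")))` would then yield one entry per distinct audio URL, which is the kind of list the harvest stage consumes.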

Harvest

Each audio file was downloaded from Wayback Machine captures. SHA-256 hashes were computed for integrity verification. Download state was tracked in SQLite.
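
A sketch of how hashing and state tracking might fit together is shown below. The table layout, status values, and function names are hypothetical; only the use of SHA-256 and SQLite comes from the description above.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per audio URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS downloads (
    url        TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    sha256     TEXT,
    size_bytes INTEGER
)
"""


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the download-state database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_success(conn: sqlite3.Connection, url: str, payload: bytes) -> str:
    """Hash the downloaded bytes and mark the URL as done."""
    digest = hashlib.sha256(payload).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO downloads (url, status, sha256, size_bytes) "
        "VALUES (?, 'done', ?, ?)",
        (url, digest, len(payload)),
    )
    conn.commit()
    return digest


def record_failure(conn: sqlite3.Connection, url: str, reason: str) -> None:
    """Mark the URL as failed so a later run can retry or report it."""
    conn.execute(
        "INSERT OR REPLACE INTO downloads (url, status, sha256, size_bytes) "
        "VALUES (?, ?, NULL, NULL)",
        (url, "failed:" + reason),
    )
    conn.commit()


def pending(conn: sqlite3.Connection, urls: list) -> list:
    """Return the URLs that have not been downloaded successfully yet."""
    done = {row[0] for row in conn.execute(
        "SELECT url FROM downloads WHERE status = 'done'")}
    return [u for u in urls if u not in done]
```

Keeping the state in a database rather than in memory means an interrupted harvest can resume by asking `pending` which URLs still need work.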

Processing

Metadata was extracted from CSV manifests and merged with the download state. Duration, format, tags, and provenance data were normalized into a unified catalog.
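
The merge step might look something like the sketch below. The CSV column names (`url`, `duration`, `format`, `tags`), the semicolon tag separator, and the shape of the `state` mapping are all assumptions for illustration; the real manifests may differ.

```python
import csv
import io


def build_catalog(manifest_csv: str, state: dict) -> list:
    """Join manifest rows with download state and normalize the fields.

    `state` maps a URL to a dict with hypothetical keys "sha256" and
    "timestamp". Rows without a successful download are skipped.
    """
    catalog = []
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        entry = state.get(row["url"])
        if entry is None:
            continue  # not downloaded, so not published
        tags = {t.strip().lower() for t in row["tags"].split(";") if t.strip()}
        catalog.append({
            "url": row["url"],
            "duration_s": float(row["duration"]),              # seconds
            "format": row["format"].strip().lower().lstrip("."),
            "tags": sorted(tags),                              # deduplicated
            "sha256": entry["sha256"],
            "capture_timestamp": entry["timestamp"],
        })
    return catalog
```

Normalizing case and deduplicating tags at this point keeps the published catalog consistent even when the manifests were written by hand or by different tools.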

Publishing

Audio files are served from Cloudflare R2. The catalog is a static Astro site with 3,570 individual recording pages, deployed via a Cloudflare Worker.

Integrity

Every recording has a SHA-256 checksum computed at download time. The checksum is stored in the catalog and displayed on each recording's detail page, allowing independent verification against the Wayback Machine source.
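
For readers who want to check a file themselves, a minimal verification sketch follows. The function names are illustrative, and the published checksum is assumed to be a hexadecimal SHA-256 string.

```python
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks to bound memory use."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                return
            yield chunk


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Compute the SHA-256 hex digest of a sequence of byte chunks."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def checksums_match(computed: str, published: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return computed.strip().lower() == published.strip().lower()
```

A check on a downloaded recording would then read `checksums_match(sha256_of_chunks(read_chunks("recording.mp3")), published)`, where `recording.mp3` is a placeholder path and `published` is the value copied from the detail page.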

Provenance

Each file's Wayback Machine capture URL and capture timestamp are preserved in the catalog. This links every recording to the specific snapshot from which it was retrieved, providing a full chain of custody from the original site to this archive.
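
As an illustration of how a capture timestamp and original URL identify a snapshot, the sketch below rebuilds a Wayback Machine URL and parses the 14-digit timestamp. It assumes the catalog stores timestamps in the Wayback `YYYYMMDDhhmmss` form; the function names and the choice of the `id_` modifier (which requests the capture's original bytes without the Wayback toolbar rewriting) are choices made for this example.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web/"


def capture_url(timestamp: str, original_url: str, raw: bool = True) -> str:
    """Build the Wayback Machine URL for one capture.

    With raw=True the "id_" modifier is appended to the timestamp so the
    archived payload is returned unmodified.
    """
    modifier = "id_" if raw else ""
    return f"{WAYBACK_PREFIX}{timestamp}{modifier}/{original_url}"


def capture_time(timestamp: str) -> datetime:
    """Parse a 14-digit Wayback timestamp as a UTC datetime."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Because both values are stored per recording, anyone can reconstruct the snapshot URL and compare the bytes it returns against the published checksum.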

Completeness

3,352 unique audio files were identified and successfully downloaded from the Wayback Machine. A small number of additional URLs appeared in the CDX index but were not added: some could not be retrieved (timeout or corrupt capture), and others duplicated the hash of a file already downloaded. These are excluded from the published catalog.
