Corpus Overview
The Sylvester Corpus was formally initiated at 11:47:51 PM EST on January 24, 2026, with the statement: “Thus, we TDD. Especially Ourselves.” The corpus name references the researcher’s surname and establishes a citational anchor for all subsequent research outputs.
Dataset 1: YouTube Behavioral Metadata (2015–2026)
YouTube consumption history provides timestamped, machine-readable evidence that resists retrospective bias, because each record was logged at the time of viewing rather than reconstructed from memory. When analyzed through the triple integrated perspective filters (Ascetic-INFJ, LGBTQ+ Gender Expression, and Polymath Capacity), this dataset reveals learning patterns, skill development trajectories, cognitive states, and identity exploration that the researcher could not fabricate retroactively.
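As a minimal sketch of what "timestamped, machine-readable" allows, the snippet below groups watch records by calendar month. The record shape (a `title` string and an ISO 8601 `time` string) and the sample entries are assumptions for illustration; the actual export format of the dataset is not specified here.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(records):
    """Count watch events per calendar month, keyed as YYYY-MM."""
    counts = Counter()
    for rec in records:
        # Older Python versions reject a trailing "Z", so normalize it to an offset.
        ts = datetime.fromisoformat(rec["time"].replace("Z", "+00:00"))
        counts[ts.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample records; titles and timestamps are placeholders.
sample = [
    {"title": "Example video A", "time": "2015-03-02T18:04:11Z"},
    {"title": "Example video B", "time": "2015-03-19T07:30:00Z"},
    {"title": "Example video C", "time": "2026-01-24T23:47:51-05:00"},
]

print(monthly_counts(sample))
```

Each event is bucketed by the month in its own recorded offset, so no timezone conversion is applied. The perspective filters described next would operate on records like these, but their classification logic is not sketched here.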
Triple Integrated Perspective Filters
Filter 1: Ascetic-INFJ
Maps consumption through INFJ cognitive functions (Ni-Fe-Ti-Se). Detects ascetic practice periods. Classifies 17 spiritual/cognitive states with privacy-aware interpretation.
Filter 2: LGBTQ+ Expression
Covers bisexual orientation (public) and creative expression and costuming (contextual). Tracks consumption states across privacy gradients. Maps cultural touchstones, including Rocky Horror Picture Show participation.
Filter 3: Polymath Capacity
Cross-domain learning detection, skill cluster identification, Rocky Horror security context, and integration analysis across all identity dimensions.
Dataset 2: TDDFlow LLM Corpus
This dataset comprises hundreds of thousands of lines of LLM dialogue processed through TDDFlow v2, a cognitive scaffolding system built on a hydrological metaphor:
| TDDFlow Element | Analog | Function or Examples |
|---|---|---|
| Lake | Major life domain | Ministry, QA, therapy, ranching |
| River | Work stream | Planning, research, implementation |
| Tributary | Detailed thread | Specific investigation within a stream |
| Puddle Packet | Actionable task | Smallest unit of tracked work |
| Cloud | Insight crystallized | Ready to execute |
| Rain | Implementation | Active execution of insight |
| Hurricane | Deep work sprint | Scope-locked intensive execution |
| Delta | Closure checkpoint | Archive and verify completion |
| Sediment | Version layer | Thought evolution tracked over time |
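One way to read the table is as a containment hierarchy: lakes hold rivers, rivers hold tributaries, and tributaries hold puddle packets. The sketch below models only that reading. The class names, fields, and the `delta_ready` check are hypothetical and are not taken from TDDFlow v2 itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PuddlePacket:
    """Smallest unit of tracked work."""
    description: str
    done: bool = False


@dataclass
class Tributary:
    """Detailed thread within a work stream."""
    name: str
    packets: List[PuddlePacket] = field(default_factory=list)


@dataclass
class River:
    """Work stream such as planning, research, or implementation."""
    name: str
    tributaries: List[Tributary] = field(default_factory=list)


@dataclass
class Lake:
    """Major life domain."""
    name: str
    rivers: List[River] = field(default_factory=list)

    def open_packets(self) -> List[PuddlePacket]:
        """Collect every unfinished packet anywhere under this lake."""
        return [
            p
            for r in self.rivers
            for t in r.tributaries
            for p in t.packets
            if not p.done
        ]

    def delta_ready(self) -> bool:
        """Assumed closure rule: a Delta checkpoint needs no open packets."""
        return not self.open_packets()


# Hypothetical example data.
qa = Lake("QA", rivers=[
    River("Research", tributaries=[
        Tributary("Flaky test triage", packets=[
            PuddlePacket("Reproduce failure locally", done=True),
            PuddlePacket("Write regression test"),
        ]),
    ]),
])

print(len(qa.open_packets()), qa.delta_ready())
```

Cloud, Rain, Hurricane, and Sediment describe states and events rather than containers, so they are left out of this sketch.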
Dataset 3: Professional Documentation
This dataset documents 526+ competencies across 12 domains, extracted and verified through multiple AI-assisted analyses of the researcher's professional resume and cover letter documents. The skill taxonomy includes technical skills (161), leadership/management (75), security/compliance (32), domain crossover (41), and methodological innovation (28), plus additional categories.
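A quick arithmetic check shows how much of the total the five named categories cover. It assumes the categories are disjoint and treats 526 as the lower bound implied by "526+".

```python
# Counts quoted in the text for the five named categories.
named = {
    "technical skills": 161,
    "leadership/management": 75,
    "security/compliance": 32,
    "domain crossover": 41,
    "methodological innovation": 28,
}

total_floor = 526  # lower bound implied by "526+"
named_sum = sum(named.values())
remaining_floor = total_floor - named_sum

print(named_sum, remaining_floor)
```

Under those assumptions the named categories account for 337 competencies, leaving at least 189 in the additional categories.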
Dataset 4: Distributed Filesystem (Drives D–K)
Eight storage drives (D through K) across Windows and Ubuntu dual-boot systems form the physical architecture of polymath-scale knowledge organization. The filesystem itself is a research instrument: its structure reveals organizational patterns, domain prioritization, and a knowledge architecture that together support claims of cross-domain expertise.
Infrastructure
Privacy Through Ownership
The Sylvester Corpus is processed with self-hosted Ollama running on an AMD Radeon RX 580 GPU (Docker image: mnccouk/ollama-gpu-rx580). Because all inference runs on local hardware, the data remains fully under the researcher's control: no cloud uploads, no surveillance capitalism, no third-party data processing. The researcher owns every byte of the analysis pipeline.
This infrastructure choice is not merely a technical preference but a methodological requirement: the corpus contains sacred spiritual content, therapeutic processing, and confidential ministry material that must remain under researcher control.
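The local-only requirement can be enforced in client code as well as by deployment. The sketch below builds a request for Ollama's `/api/generate` endpoint and refuses any base URL that is not a loopback address. It assumes the container exposes Ollama's default port 11434 on the host. The model name and the helper functions are hypothetical, not part of the corpus pipeline.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Loopback hosts only; anything else counts as a third party.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_request(prompt, model, base_url="http://localhost:11434"):
    """Build a POST request for Ollama's generate endpoint, local hosts only."""
    host = urlparse(base_url).hostname
    if host not in LOCAL_HOSTS:
        raise ValueError(f"refusing non-local Ollama endpoint: {base_url}")
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network; "example-model" is a placeholder.
req = build_request("Summarize this transcript segment.", "example-model")
print(req.full_url)
```

Calling `generate` requires a running local Ollama server with the named model already pulled. The loopback check is a guard against misconfiguration and does not replace network-level isolation.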