Why these are separate from Pillar I: Computer Science produces the tools. These disciplines define how meaning is extracted from data at scale. A Ph.D. in Computer Science does not require corpus annotation methodology, data provenance frameworks, or search engine ranking science. These are structurally separate doctoral programs with distinct faculty, journals, and qualifying examinations.

⚠ Conditional — Linguistics Half Not Demonstrated

14. Ph.D. in Computational Linguistics & Corpus Science

The YouTube Corpus Analyzer treats eleven years of consumption history (2015–2026) as a primary-source linguistic dataset. The 526-skill taxonomy was extracted via systematic corpus analysis (Claude Sonnet 4.5 processing of resume documents). The TDDFlow system implements machine-readable tags for cognitive state classification across hundreds of thousands of lines of LLM dialogue. The behavioral metadata framework transforms lived experience into an analyzable corpus through skill correlation, learning phase identification, and cross-domain pattern analysis. NLP ethics frameworks (data statements, datasheets for datasets, model cards) are applied to document corpus provenance.
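The tag-driven classification described above can be sketched as a small parser. The `[STATE:…]` tag format and the state names below are illustrative assumptions, since the actual TDDFlow tag vocabulary is not reproduced in this document.

```python
import re
from collections import Counter

# Hypothetical tag format for illustration; the real TDDFlow tag set differs.
TAG_PATTERN = re.compile(r"\[STATE:(?P<state>[a-z_]+)\]")

def classify_lines(dialogue_lines):
    """Count cognitive-state tags across lines of tagged LLM dialogue."""
    counts = Counter()
    for line in dialogue_lines:
        for match in TAG_PATTERN.finditer(line):
            counts[match.group("state")] += 1
    return counts

log = [
    "[STATE:exploring] trying a new test harness",
    "[STATE:focused] red-green-refactor loop on parser",
    "[STATE:focused] extracted helper, tests pass",
]
print(classify_lines(log))  # Counter({'focused': 2, 'exploring': 1})
```

The point of a machine-readable tag scheme is exactly this: state labels embedded inline become countable, correlatable data without any manual re-annotation pass.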

April 2026 audit finding: The corpus science half of this claim is legitimate and well-evidenced. The linguistics half is not demonstrated. A Computational Linguistics doctorate requires formal linguistic analysis: syntax, morphology, phonology, inter-annotator reliability, and engagement with formal grammar theory. LLM-assisted skill extraction and behavioral metadata classification are data science methods, not formal linguistic analysis. The title of the claim is broader than the evidence supports. The claim is retained but will be retitled when the site moves to a database-driven architecture — candidate title: “Ph.D. in Corpus Science & Behavioral Metadata Engineering.”

Portfolio Evidence

  • YouTube Corpus Analyzer: consumption history as primary-source linguistic data — Corpus Architecture (Dataset 1)
  • 526-skill taxonomy extraction via systematic corpus processing — Research Instruments
  • TDDFlow: machine-readable tag system for cognitive state classification — TDDFlow Help Guide
  • Behavioral metadata framework with skill correlation and learning phase identification — Study Methodology
  • Mood-skill taxonomy with privacy sanitization protocols — Ethics Framework
  • NLP data documentation: data statements, datasheets, model cards applied to personal corpora
  • Consolidated content corpus: hundreds of thousands of lines as structured dataset — Corpus Architecture (Dataset 2)

Published Corpus Evidence: The Research Corpus itself is a corpus linguistics artifact — 18 published documents with extracted metadata, auto-indexed by section. The Attributions and Acknowledgments document maps the context engineering methodology and AI development credits, demonstrating corpus provenance at the documentation level. The TDDFlow Role Profiles implement machine-readable cognitive state classification.

What would strengthen this claim: Publication of formal linguistic analysis demonstrating syntax parsing, morphological analysis, or inter-annotator reliability work. The corpus processing is genuine and substantial; the linguistic analysis layer is the gap. The corpus science evidence is strong enough to stand alone under a revised title.
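Inter-annotator reliability, named above as part of the gap, is conventionally measured with Cohen's kappa: observed agreement corrected for the agreement two annotators would reach by chance. A minimal sketch, with made-up annotation labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)       # chance agreement
    return (observed - expected) / (1 - expected)

# Example: two annotators tag five tokens with part-of-speech labels.
ann1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN"]
ann2 = ["NOUN", "VERB", "VERB", "ADJ", "NOUN"]
print(cohens_kappa(ann1, ann2))  # 0.6875
```

Reporting kappa (or Krippendorff's alpha for more than two annotators) over a doubly-annotated sample of the skill taxonomy would directly address the audit finding.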

15. Ph.D. in Data Science & Machine Learning Engineering

The MTG local model pipeline is a complete ML engineering project: Docker containerized, Jenkins automated, with corpus generation, model fine-tuning via Ollama, retrieval-augmented generation, and evaluation metrics. The IP Nexus implements a database-driven analytics system with 2,126 NAICS codes, coverage views, and file association scanning across three-million-plus files. The density framework implements calibration distributions, experiment tracking (A/B/C methodology), and scorer pipelines with modular architecture.
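The retrieval step of a retrieval-augmented generation pipeline like the one above reduces to nearest-neighbor search over embeddings. A minimal sketch, assuming precomputed embedding vectors (in the actual pipeline these would come from an Ollama embedding model; the card texts and vectors here are placeholders):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_vec, corpus, k=2):
    """Return the k corpus texts whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]

corpus = [
    {"text": "Lightning Bolt deals 3 damage", "vec": [0.9, 0.1, 0.0]},
    {"text": "Counterspell counters target spell", "vec": [0.1, 0.9, 0.0]},
    {"text": "Shock deals 2 damage", "vec": [0.8, 0.2, 0.0]},
]
print(retrieve([1.0, 0.0, 0.0], corpus, k=2))
```

The retrieved texts are then prepended to the model prompt, so the fine-tuned local model answers from corpus evidence rather than from parametric memory alone.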

Portfolio Evidence

  • MTG local model pipeline: Docker + Jenkins + Ollama with corpus/retrieval/eval modules — LAN AISec Adventure (infrastructure)
  • IP Nexus: SQLite analytics with NAICS coverage views and file scanning — IP Nexus
  • MTG density hybrid project: modular scorer/calibration/experiment architecture — Data Density Framework
  • Punchcard Compiler v7: custom data processing toolkit
  • Trinity Clock integration for temporal analysis
  • Cross-domain pattern analysis across 12 skill taxonomy domains — Research Instruments
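The NAICS coverage-view pattern from the IP Nexus bullet above can be sketched in a few lines of SQLite. Table names, codes, and paths are illustrative, not the production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE naics (code TEXT PRIMARY KEY, title TEXT);
CREATE TABLE files (path TEXT, naics_code TEXT);
-- Coverage view: how many scanned files map onto each NAICS code.
CREATE VIEW naics_coverage AS
SELECT n.code, n.title, COUNT(f.path) AS file_count
FROM naics n LEFT JOIN files f ON f.naics_code = n.code
GROUP BY n.code, n.title;
""")
conn.executemany("INSERT INTO naics VALUES (?, ?)",
                 [("541511", "Custom Computer Programming"),
                  ("541512", "Computer Systems Design")])
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [("/a.py", "541511"), ("/b.py", "541511")])
for row in conn.execute("SELECT * FROM naics_coverage ORDER BY code"):
    print(row)
```

The `LEFT JOIN` matters: codes with zero associated files still appear in the view, which is what makes it a coverage report rather than a simple tally.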

16. Ph.D. in Marketing Science & Consumer Behavior

New in the revised analysis. Two deep-research SEO analyses totaling 170+ pages demonstrate doctoral-level understanding of Google’s ranking algorithms (relevance, distance, prominence), local search systems (Google Business Profile optimization, local pack mechanics), structured data ontologies (LocalBusiness JSON-LD, FAQPage schema), crawl budget science (sitemap submission, canonical consolidation, robots.txt vs. noindex), and keyword research methodology (Semrush volume estimation vs. Search Console ground truth). The GoldHat site v4 implements comprehensive Schema.org structured data, Open Graph metadata, and user-agent detection for twenty-plus search engine crawlers.
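The LocalBusiness structured data mentioned above follows a standard shape. A minimal sketch that emits a Schema.org LocalBusiness JSON-LD block; every value is a placeholder, not the actual GoldHat site data:

```python
import json

# Placeholder business details; swap in real values before deployment.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Business",
    "url": "https://example.com",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
    },
}
markup = json.dumps(local_business, indent=2)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Embedding this in the page `<head>` gives Google an unambiguous machine-readable statement of the name/address/phone data that local ranking systems key on.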

Portfolio Evidence

  • Two deep-research SEO analyses (170+ pages combined)
  • Google ranking algorithm science: relevance, distance, prominence levers
  • Local search systems: Google Business Profile optimization, local pack mechanics
  • Structured data implementation: LocalBusiness JSON-LD, FAQPage schema
  • Crawl budget engineering: sitemap strategy, canonical consolidation, Search Console API
  • Keyword research methodology: Semrush estimation vs. GSC ground truth
  • User-agent detection for 20+ search engine crawlers
  • Core Web Vitals optimization on IONOS shared hosting
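The crawler detection bullet above reduces to case-insensitive token matching against the User-Agent header. A minimal sketch; the token list here is an illustrative subset, not the production list of 20+ engines:

```python
# Illustrative subset of crawler tokens; the production list is longer.
CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot", "yandexbot",
                  "baiduspider")

def is_search_crawler(user_agent):
    """Case-insensitive substring match against known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

print(is_search_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # True
print(is_search_crawler("Mozilla/5.0 (Windows NT 10.0) Chrome/120"))  # False
```

User-Agent strings are trivially spoofable, so serious implementations pair this check with reverse-DNS verification of the requesting IP before trusting the classification.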