⚠ Conditional — Linguistics Half Not Demonstrated
14. Ph.D. in Computational Linguistics & Corpus Science
The YouTube Corpus Analyzer treats eleven years of consumption history (2015–2026) as a primary-source linguistic dataset. The 526-skill taxonomy was extracted via systematic corpus analysis (Claude Sonnet 4.5 processing of resume documents). The TDDFlow system implements machine-readable tags for cognitive-state classification across hundreds of thousands of lines of LLM dialogue. The behavioral metadata framework transforms lived experience into an analyzable corpus through skill correlation, learning-phase identification, and cross-domain pattern analysis. NLP ethics frameworks (data statements, datasheets for datasets, model cards) are applied to document corpus provenance.
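The tag-based cognitive-state classification described above can be sketched as a simple inline-tag extraction pass over dialogue logs. The `[[state:...]]` syntax and the state names here are illustrative assumptions, not TDDFlow's actual tag vocabulary:

```python
import re
from collections import Counter

# Hypothetical machine-readable tag format, e.g. [[state:flow]] or [[state:stuck]],
# embedded inline in LLM dialogue logs. Tag names are placeholders; the real
# TDDFlow vocabulary may differ.
TAG_PATTERN = re.compile(r"\[\[state:([a-z_]+)\]\]")

def classify_lines(lines):
    """Count cognitive-state tags across a dialogue log."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_PATTERN.findall(line))
    return counts

log = [
    "[[state:flow]] refactored the scorer module in one pass",
    "[[state:stuck]] test harness keeps failing on import",
    "[[state:flow]] fix found, re-running suite",
]
print(classify_lines(log))  # Counter({'flow': 2, 'stuck': 1})
```

Frequency counts like these are the raw material for the skill-correlation and learning-phase analyses the framework describes.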
Portfolio Evidence
- YouTube Corpus Analyzer: consumption history as primary-source linguistic data — Corpus Architecture (Dataset 1)
- 526-skill taxonomy extraction via systematic corpus processing — Research Instruments
- TDDFlow: machine-readable tag system for cognitive state classification — TDDFlow Help Guide
- Behavioral metadata framework with skill correlation and learning phase identification — Study Methodology
- Mood-skill taxonomy with privacy sanitization protocols — Ethics Framework
- NLP data documentation: data statements, datasheets, model cards applied to personal corpora
- Consolidated content corpus: hundreds of thousands of lines as structured dataset — Corpus Architecture (Dataset 2)
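The data-statement documentation practice cited in the evidence list can be sketched as a structured provenance record. The field names loosely follow the data-statements proposal for NLP datasets; all values and the schema itself are illustrative, not the project's actual documentation format:

```python
from dataclasses import dataclass, field, asdict

# Minimal data-statement-style provenance record. Fields and values are
# illustrative placeholders, not the project's real schema.
@dataclass
class DataStatement:
    curation_rationale: str
    language_variety: str
    speaker_demographic: str
    annotation_process: str
    provenance: str
    sanitization: list = field(default_factory=list)

stmt = DataStatement(
    curation_rationale="Personal YouTube consumption history as a behavioral corpus",
    language_variety="en-US",
    speaker_demographic="single subject (self-study)",
    annotation_process="LLM-assisted skill tagging with manual review",
    provenance="YouTube watch-history export, 2015-2026",
    sanitization=["strip PII", "redact private channel names"],
)
print(asdict(stmt)["provenance"])
```

Keeping the record as a dataclass means it can be serialized alongside the corpus itself, which is the point of data statements and datasheets: the documentation travels with the data.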
15. Ph.D. in Data Science & Machine Learning Engineering
The MTG local model pipeline is a complete ML engineering project: containerized with Docker, automated with Jenkins, and comprising corpus generation, model fine-tuning via Ollama, retrieval-augmented generation, and evaluation metrics. The IP Nexus is a database-driven analytics system with 2,126 NAICS codes, coverage views, and file-association scanning across three-million-plus files. The density framework provides calibration distributions, experiment tracking (A/B/C methodology), and scorer pipelines in a modular architecture.
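The retrieval step of a retrieval-augmented generation pipeline like the one described can be sketched with a dependency-free bag-of-words ranker. The actual pipeline is Ollama-based and would use learned embeddings; this only illustrates the retrieve-then-generate shape, with made-up card text:

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; stands in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Rank corpus documents by similarity to the query; return the top k."""
    qv = vectorize(query)
    ranked = sorted(corpus, key=lambda doc: cosine(qv, vectorize(doc)), reverse=True)
    return ranked[:k]

cards = [
    "Lightning Bolt deals 3 damage to any target",
    "Counterspell counters target spell",
]
print(retrieve("counter target spell", cards))  # ['Counterspell counters target spell']
```

The retrieved passages would then be prepended to the model prompt before generation, and evaluation metrics scored against held-out answers.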
Portfolio Evidence
- MTG local model pipeline: Docker + Jenkins + Ollama with corpus/retrieval/eval modules — LAN AISec Adventure (infrastructure)
- IP Nexus: SQLite analytics with NAICS coverage views and file scanning — IP Nexus
- MTG density hybrid project: modular scorer/calibration/experiment architecture — Data Density Framework
- Punchcard Compiler v7: custom data processing toolkit
- Trinity Clock integration for temporal analysis
- Cross-domain pattern analysis across 12 skill taxonomy domains — Research Instruments
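A NAICS "coverage view" of the kind attributed to IP Nexus can be sketched in SQLite: a view joining the code table against file associations to show how many files back each code. Table and column names here are hypothetical, not the project's actual schema:

```python
import sqlite3

# In-memory sketch of a NAICS coverage view. Schema names are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE naics (code TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE file_assoc (path TEXT, naics_code TEXT REFERENCES naics(code));
    CREATE VIEW naics_coverage AS
        SELECT n.code, n.title, COUNT(f.path) AS file_count
        FROM naics n LEFT JOIN file_assoc f ON f.naics_code = n.code
        GROUP BY n.code;
""")
conn.executemany(
    "INSERT INTO naics VALUES (?, ?)",
    [("541511", "Custom Computer Programming"), ("541810", "Advertising Agencies")],
)
conn.execute("INSERT INTO file_assoc VALUES ('notes/pipeline.md', '541511')")
for row in conn.execute("SELECT code, file_count FROM naics_coverage ORDER BY code"):
    print(row)
# ('541511', 1)
# ('541810', 0)
```

The LEFT JOIN is the important choice: codes with zero associated files still appear, which is what makes the view a coverage report rather than just a file index.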
16. Ph.D. in Marketing Science & Consumer Behavior
New in the revised analysis: two deep-research SEO analyses totaling 170+ pages demonstrate doctoral-level understanding of Google’s ranking algorithms (relevance, distance, prominence), local search systems (Google Business Profile optimization, local pack mechanics), structured data ontologies (LocalBusiness JSON-LD, FAQPage schema), crawl budget science (sitemap submission, canonical consolidation, robots.txt vs. noindex), and keyword research methodology (Semrush volume estimation vs. Search Console ground truth). The GoldHat site v4 implements comprehensive Schema.org structured data, Open Graph metadata, and user-agent detection for 20+ search engine crawlers.
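A LocalBusiness JSON-LD payload of the kind the paragraph describes can be generated as plain structured data. The business details below are placeholders; the live site's actual markup will differ:

```python
import json

# Minimal LocalBusiness structured-data payload (Schema.org vocabulary).
# All business details are illustrative placeholders.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Studio",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
    },
    "telephone": "+1-555-0100",
}

# Embed as a JSON-LD script tag in the page head.
snippet = ('<script type="application/ld+json">'
           + json.dumps(local_business)
           + "</script>")
print(snippet)
```

Generating the dict in code and serializing it once keeps the markup valid JSON by construction, which hand-edited JSON-LD blocks frequently are not.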
Portfolio Evidence
- Two deep-research SEO analyses (170+ pages combined)
- Google ranking algorithm science: relevance, distance, prominence levers
- Local search systems: Google Business Profile optimization, local pack mechanics
- Structured data implementation: LocalBusiness JSON-LD, FAQPage schema
- Crawl budget engineering: sitemap strategy, canonical consolidation, Search Console API
- Keyword research methodology: Semrush estimation vs. GSC ground truth
- User-agent detection for 20+ search engine crawlers
- Core Web Vitals optimization on IONOS shared hosting
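The crawler detection mentioned above can be sketched as case-insensitive substring matching on the User-Agent header. Only a handful of well-known crawler tokens are shown; the production list described for the site covers 20+ crawlers:

```python
# Illustrative subset of search-engine crawler tokens; the real list is longer.
CRAWLER_TOKENS = ("Googlebot", "bingbot", "DuckDuckBot", "Baiduspider", "YandexBot")

def is_search_crawler(user_agent: str) -> bool:
    """True if the User-Agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in CRAWLER_TOKENS)

bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
human_ua = "Mozilla/5.0 (Windows NT 10.0) Chrome/120"
print(is_search_crawler(bot_ua))    # True
print(is_search_crawler(human_ua))  # False
```

Substring matching is the common first-pass approach; a stricter implementation would also verify crawler identity via reverse DNS, since User-Agent strings are trivially spoofed.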