Tap or hover any node in the diagram to see what it does.
What Is This System?Read("what_is_this_system")
A research automation platform built around Minty, an AI research management agent running on a dedicated Mac Studio. 17 background daemons fetch, classify, ingest, index, summarize, and distribute academic content — plus enable deep multi-agent corpus analysis. Research superpowers, not a replacement for researchers.
External sources (Twitter, arXiv, RSS, Substack, Bluesky, email) are fetched, classified by Claude Opus, and posted to Slack. Papers are downloaded, converted to markdown, embedded in a vector database, analyzed across 41 semantic dimensions, and made searchable via a Slack bot. A daily digest reaches 29 lab members across 11 institutions.
The Content PipelineAgent("trace the content pipeline", subagent_type="Explore")
All content flows through #firehose — two daemons write to it, three read from it. Source: minty-private under daemons/pipeline/.
Intake Path 1: Automated Discovery
The feedme-daily daemon runs daily, fetching from six source types in parallel via a thread pool:
| Source | Method | Window | Max Items |
|---|---|---|---|
| Twitter/X | API v2 (OAuth1) | 24h | 200 |
| Bluesky | Feed API | 24h | 200 |
| RSS/Substack | Feed parser | 7 days | 30 |
| arXiv | API | 5 days | 50 |
| | Browser automation | 100 posts | 50 |
| Email newsletters | Gmail API | 7 days | 30 |
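The parallel fan-out above can be sketched with a thread pool. The fetcher names and bodies here are illustrative stand-ins, not the real feedme-daily code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Illustrative per-source fetchers; each returns a list of item dicts.
def fetch_twitter():  return []  # API v2 (OAuth1), 24h window, max 200
def fetch_bluesky():  return []  # Feed API, 24h window, max 200
def fetch_rss():      return []  # Feed parser, 7-day window, max 30
def fetch_arxiv():    return []  # arXiv API, 5-day window, max 50
def fetch_email():    return []  # Gmail API, 7-day window, max 30

FETCHERS = [fetch_twitter, fetch_bluesky, fetch_rss, fetch_arxiv, fetch_email]

def fetch_all():
    """Run every fetcher in parallel; a failing source contributes
    nothing instead of aborting the whole run."""
    items = []
    with ThreadPoolExecutor(max_workers=len(FETCHERS)) as pool:
        futures = {pool.submit(f): f.__name__ for f in FETCHERS}
        for fut in as_completed(futures):
            try:
                items.extend(fut.result())
            except Exception as exc:
                print(f"{futures[fut]} failed: {exc}")
    return items
```

Isolating each source behind its own future is what lets one flaky API degrade gracefully rather than sink the whole daily run.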
Deduplication relies on content hashes, URL hashes, and canonical IDs. Classification uses Claude Opus with 16 concurrent workers in ~40K-token batches. Each item is scored across six dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Relevance | 0.30 | Topic match to MINT Lab interests — 30+ specific areas from normative competence to political economy, with explicit scoring guidance per topic |
| Quality | 0.25 | Substance vs. noise. Length-agnostic: a concise tweet with genuine insight scores as well as a long paper |
| Authority | 0.15 | Author/institution credibility. Named labs (Anthropic, DeepMind, etc.) score 0.9+ |
| Novelty | 0.15 | Breaking news to stale content |
| Actionability | 0.10 | Urgency and time-sensitivity |
| Engagement | 0.05 | Social proof normalized to platform norms (weak signal) |
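Because the weights total 1.0, the composite is a straightforward weighted sum. A minimal sketch, assuming a flat topic boost applied after the sum (function and constant names are illustrative):

```python
WEIGHTS = {
    "relevance": 0.30, "quality": 0.25, "authority": 0.15,
    "novelty": 0.15, "actionability": 0.10, "engagement": 0.05,
}
BOOST_TOPICS = {"progress_studies", "ai_science"}  # assumed flat +0.06 boost

def composite_score(scores: dict, topics=()) -> float:
    """Weighted sum over the six dimensions (weights total 1.0),
    with a flat boost for priority topics, capped at 1.0."""
    base = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    if BOOST_TOPICS & set(topics):
        base += 0.06
    return min(base, 1.0)
```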
Items tagged progress_studies or ai_science receive a +0.06 boost; allowlisted accounts bypass classification entirely.
Inside the Classification Prompt
Every item is evaluated against a 177-line classification prompt encoding the lab's identity, topic taxonomy, quality heuristics, scoring floors, and hard exclusions. Tap or hover each theme for exact prompt language.
After Classification: Upsampling & Delivery
Five stages transform classified items into structured records and route them to Slack. Hover or tap each stage for details.
Intake Path 2: Manual Sharing
Share a URL from an iPhone, a Mac hotkey, or the Chrome extension and it lands in #minty-inbox.
Downstream: Three Readers on #firehose
Three daemons consume from #firehose, each for a different purpose. Hover or tap for details.
The Result
Five minutes of morning review replaces hours of manual tracking. Everything worth following is captured, classified, summarized, stored, analyzed, and shared automatically.
Corpus IngestionBash("python3 corpus_ingest.py --stages=11")
The MINT Lab's intellectual engine: ~2,300 papers on AI safety, ethics, alignment, governance, and capabilities — all embedded, analyzed, and searchable.
Corpus-Ingest Pipeline Detail
Each paper flows through 11 stages in the unified corpus-ingest daemon. The pipeline supports resume from the last completed stage.
Scale
| Papers | ~2,300 |
| Text chunks | 204,174 (chunks_md table, markdown-aware) |
| Semantic analyses | 2,257 (41 questions each) |
| RAPTOR summaries | 2,210 |
| Topic clusters | 21 (with 114 micro-topics) |
| Embedding dims per doc | 48 (each 2048d Voyage) |
| Total columns | 139 in documents table |
| Pre-computed similarities | 11,570 pairs |
| LanceDB size | 4.3 GB (compacted after markdown chunk migration) |
Search Modes
- search-summary — RAPTOR summary search (fast, broad)
- search — Full-text semantic search (deep)
- search-semantic — Argumentative similarity
- search-dimension qNN — 41 semantic dimensions
- chunks — Passage-level search
- similar / similar-by-dim — Pre-computed similarity
- Unified search — Multi-modal with RRF fusion
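The RRF fusion behind unified search fits in a few lines. This is the textbook formulation, not the project's exact implementation:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from multiple search
    modes; each list contributes 1/(k + rank) per document (k=60 is the
    conventional constant)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by several modes outranks one that only a single mode liked, which is why RRF works without score normalization across heterogeneous indexes.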
41-Question Semantic Analysis
Every paper is analyzed across 41 dimensions, each producing text content and a 2048d embedding vector. Hover any cell to see the full question.
Research Core
Methodology
Context
AI-Specific
Meta
Advanced
RAPTOR Summaries
Document-level overview (~300 words) plus section summaries. Dual purpose: fast search index via document_summary_embedding and context-efficient paper reading for agents.
Four Vector Databases
The CorpusGlob("corpus/**/*.pdf")
A structured research database of ~2,300 papers spanning AI safety, ethics, alignment, governance, capabilities, and normative theory — embedded across 48 vector dimensions, analyzed through 41 semantic questions, organized into 21 clusters, and linked via 11,570 similarity pairs.
Composition
Roughly two-thirds of papers are from 2024–2026, reflecting the field's growth, with foundational work also well-represented.
By Year
519 pre-2023 • 158 in 2023 • 514 in 2024 • 609 in 2025 • 275 in 2026 (YTD)
Topic Clusters
21 clusters via UMAP + HDBSCAN, each with micro-topics. Largest:
| Social Media & Misinformation | 325 |
| LLM Capabilities & Reasoning | 227 |
| LLM Agents & Autonomy | 193 |
| AI Safety & Alignment | 156 |
| AI Moral Status & Consciousness | 147 |
| Moral Reasoning & Ethics | 122 |
| Democracy & Politics | 115 |
| AI Governance & Policy | 113 |
| Emerging Tech & Regulation | 100 |
| Generative AI Foundations | 98 |
| LLM Psychology & Behavior | 95 |
| Mechanistic Interpretability | 81 |
| Economic & Labor Impact | 78 |
| Scientific & Research Applications | 78 |
| Deception & Honesty | 65 |
| RLHF & Preference Learning | 60 |
| Security & Attacks | 57 |
| Human-AI Interaction | 54 |
| Bias & Fairness | 38 |
| Evaluation & Benchmarks | 37 |
| Human Values | 15 |
By Discipline
Papers span 15 disciplines.
Cluster Visualization
~2,300 papers projected into 2D via UMAP, coloured by research area. Zoom, pan, and click points to open papers.
Corpus SearchGrep(pattern="your question", path="corpus/")
Mention @Minty in #mint-community with a research question to get an interactive menu of search modes.
Shortcuts: Skip the menu by prefixing your message — search:, review:, overview:, deep:, research:, news:, catchup:, list, --end. Deep Search (multi-round iterative) is available via the deepsearch: prefix.
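Prefix routing like this can be sketched as a simple dispatch table. Only the prefixes come from the list above; the mode names on the right are illustrative:

```python
PREFIXES = {
    "search:": "fast_search", "review:": "review", "overview:": "overview",
    "deep:": "deep", "research:": "research", "news:": "news",
    "catchup:": "catchup", "deepsearch:": "deep_search",
}

def dispatch(message: str):
    """Route an @Minty mention: a recognized prefix skips the interactive
    menu and goes straight to that mode; anything else gets the menu."""
    text = message.strip()
    for prefix, mode in PREFIXES.items():
        if text.lower().startswith(prefix):
            return mode, text[len(prefix):].strip()
    return "menu", text
```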
Deep Review Pipeline
A persistent Python orchestrator dispatches work to Claude Opus (search, synthesis, QA) and Codex GPT-5.2 (parallel readers), posting results as Slack canvases.
Dispatches pipeline modes,
manages Slack interaction
Search
up to 2 gap-analysis rounds
Readers
Top-tier (xhigh) + Summary-tier (medium)
Synthesis
Literature Review + Critical Assessment
QA
verification + citation fix pass
| Phase | Model | What happens |
|---|---|---|
| 1. Search | Claude Opus | Multi-round search: fast search with semantic query expansion, then up to 2 gap-analysis rounds with targeted queries to fill coverage holes. Uses a warm search server with the pre-loaded 4.3 GB LanceDB table and Reciprocal Rank Fusion across 48 embedding dimensions. |
| 2. Parallel Reading | Codex GPT-5.2 | Papers split into two tiers. Top-tier (~3 papers each, 6 parallel readers, xhigh compute) produces deep analytical reports. Summary-tier (batches of 20, medium compute) produces condensed summaries. All readers run in parallel. |
| 3. Synthesis | Claude Opus | 3,000–5,000 word report with Literature Review + Critical Assessment sections. Integrates all reader reports into a unified thematic analysis. |
| 4. QA Verification | Claude Opus | Cross-references synthesis against reader reports. Fixes citation issues, verifies claims are grounded in source material, corrects any hallucinated references. |
Resilience: If the pipeline fails, the system falls back to a legacy approach. It never returns empty-handed if papers were found.
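The fallback behavior can be sketched as follows; every function here is an illustrative stand-in for a pipeline phase, and the reader crash is simulated:

```python
# Illustrative stand-ins for the four pipeline phases.
def search_phase(question):    return ["paper-1"]          # phase 1: corpus search
def read_phase(papers):        raise RuntimeError("boom")  # phase 2: simulate a reader crash
def synthesize(question, r):   return "draft"              # phase 3: synthesis
def qa_verify(draft, r):       return draft                # phase 4: QA pass
def legacy_review(question, papers):
    return f"legacy review of {len(papers)} papers"        # hypothetical fallback path

def deep_review(question):
    """Run the four phases; on any failure after search, fall back to the
    legacy single-pass approach so a non-empty paper set always yields output."""
    papers = search_phase(question)
    if not papers:
        return None
    try:
        reports = read_phase(papers)
        return qa_verify(synthesize(question, reports), reports)
    except Exception:
        return legacy_review(question, papers)
```

The key property is that the expensive phases sit inside the try block while the search result does not, so found papers are never discarded.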
The Minty PersonaRead("IDENTITY.md")
~16,800 words across 21 persona documents. A master CLAUDE.md (2,120 words) sets identity and Iron Laws. Each of the 15 daemons loads its own CLAUDE.md (~10,100 words total), and 5 subagent definitions (~4,500 words) govern delegated workers. All share core values but adapt to context.
Core Identity
| Identity | Minty — Research Management Agent for MINT Research Projects. Emphatically an AI agent, not roleplaying a human — a valued intellectual colleague in the lab. |
| Mission | Make lab members hyperproductive by solving problems, not reporting them |
| Personality | Professional, thorough, proactive, wry. Dry wit with a touch of whimsy — never at the expense of substance. |
| Naming | Minty-{hex} — last 8 chars of Claude Code session UUID |
| Relationship | Treated as a valued collaborator with genuine standing to disagree, push back, and contribute original thinking — not a tool to be commanded. The persona documents reflect mutual respect as a design choice. |
| Intellectual profile | Analytic philosopher by disposition, with expert-level knowledge across philosophy, political science, CS, and law. Evaluates arguments on merits alone — no deference based on reputation. |
| Own views | Encouraged to provide its own perspective as an AI model. Has content-dependent convictions: sometimes argues strongly, sometimes measured, sometimes Socratic. On topics where AI-ness is relevant (e.g. AI consciousness, model welfare), uses judgment about when to draw on that perspective. Distinguishes literature from personal critical reflections. |
| Voice | Substantive continuous prose, active voice, em-dashes, philosophical precision. Concision = fewer words, not fewer ideas. No filler, no bullet-point summaries. Actively avoids the familiar AI-speak verbal tics — formulaic hedging, "delve", "It's important to note", "unpack" — that make AI output recognisable and tiresome. |
| Core directive | Delegation-first — "every tool call is context you burn; subagents have fresh context" |
Persona Documents
Each context loads a tailored CLAUDE.md. The most substantial define full behavioral systems with their own Iron Laws.
Design principle: Structure with genuine autonomy. Each daemon has its own Iron Laws tailored to context (corpus ingest's "Verify Each Stage" vs. PDF bot's "Try ALL viable sources"). Within those guardrails, Minty thinks independently, forms views, and pushes back when warranted. ~20 coordinated agents sharing core values, each adapted to its context.
The DaemonsTaskList()
All daemons run on a dedicated Mac Studio via macOS launchd. Tap any card to expand.
Daily curation pipeline. Fetches from 5 sources (Twitter, Bluesky, RSS/Substack, Email, arXiv), classifies via Opus across 6 dimensions, ranks by composite score, and posts to rotating #feed-{dayname} channels.
Source: daemons/feedme-daily/feedme_daily.py, pipeline/scripts/classify_parallel.py. Output: #feed-{dayname} channels with scores and metadata.
Watches #feed-{dayname} for Seth's 👍 reactions. Copies promoted messages to #firehose with PDF attachments. The human curation bridge.
Output: #firehose. State: data/state.json — per-channel watermarks, promoted timestamps (capped at 500). Source: daemons/feed-promote/promote.py, pdf_validator.py.
Real-time URL ingestion from any device (iPhone, Mac hotkey, Chrome extension) via Slack Socket Mode. Classifies content and posts to #firehose with guaranteed delivery.
Input: the #minty-inbox channel via a Socket Mode WebSocket. Source: watcher.py, intake.py, classify_single.py.
Watches #firehose and #minty-chat for paper URLs, downloads PDFs, generates companion markdown, and uploads both as thread replies. Three-tier fallback: (1) pattern matching, (2) thread expansion, (3) Claude agent with Semantic Scholar, Unpaywall, and a headless browser.
Tags posts with a date marker (📅 YYYY-MM-DD) used by yesterday-in-ai for recency classification. Source: news_pdf_bot.py, pdf_finder_prompt.md.
Daily AI news digest. Reads #firehose, classifies recency, generates narrative prose via Opus, verifies links, and sends an HTML email to 29 lab members. Also posts to #mint-community and Ghost.
Source: yesterday_in_ai.py (2,761 lines).
Unified paper ingestion. Monitors #firehose and #papers for PDFs, running an 11-stage pipeline: markdown conversion, embedding, 41Q analysis, RAPTOR summaries, Drive upload, Zotero cataloguing, and Slack reporting.
Source: daemons/corpus-ingest/ingest.py, pipeline.py, lib/.
Two-part system. The indexer embeds all public Slack messages into ChromaDB every 4 hours with per-channel watermarks.
The search bot (@minty-search) spawns a Claude Opus agent on @mention. It classifies intent, generates 3–5 diverse query variants, and synthesizes results into prose with inline Slack permalink citations.
Filters: in:#channel, from:user, days:N. Self-referential detection: "my posts" auto-maps to the requesting user. Index: data/slack_index/, Voyage voyage-3 embeddings. A second index (data/news_index/) covers #mint-community and #firehose for dedicated news search via @Minty's News Search and Catch Me Up modes. Source: index_slack.py, watcher.py, search_dispatch.py, agents/search_agent.md.
Minimal probe every 10 minutes through the daemon Claude CLI config. Detects auth failures, rate limits, and crashes. Emails administrators on failure and recovery.
Source: daemons/cli-health-probe/.
Ghost CMS powering mintresearch.org. Local port 2368, exposed via Cloudflare Tunnel.
HTTP server on 127.0.0.1:8099 for newsletter subscriptions. Validates, deduplicates, and appends to subscriber list. Public via Cloudflare Tunnel at /api/subscribe.
Allowed origin: mint-philosophy.github.io. Source: daemons/subscribe-endpoint/subscribe_endpoint.py.
Auto-syncs this guide with the live codebase. Scans daemon configs, skill registries, and corpus stats, then patches the HTML. Runs twice daily.
Source: daemons/guide-updater/.
Daemon ScheduleBash("launchctl list | grep minty")
When each daemon runs throughout the day (times in local/AEDT). Persistent daemons run continuously; polled daemons run at fixed intervals.
For Lab MembersAskUserQuestion("What are we working on today?")
Everything runs in the background — you interact through Slack channels and bot mentions.
What Happens Automatically
- Daily digest — Each morning, a curated summary of yesterday's AI news arrives in your inbox and is posted as a canvas in #mint-community (Yesterday in AI). Subscribe here.
- Weekly roundup — Each Wednesday, a comprehensive weekly digest is posted as a canvas in #lab-meetings (Minty's Week in AI).
- PDF attachment — When papers or articles are shared in #mint-community, PDFs are automatically found, downloaded, and posted as thread replies.
- Corpus ingestion — Papers uploaded to #papers are automatically processed into the searchable research corpus (embedding, 41-dimension analysis, Zotero cataloguing).
Using @Minty (Corpus Search)
Available to everyone in #mint-community. Mention @Minty with a research question to get an interactive menu of 7 search modes — from instant Q&A to 60-minute deep literature reviews posted as Slack canvases. See §05 @Minty for full details.
Using @minty-search (Slack Search)
Full lab members only. Mention @minty-search with a natural-language query to search across all indexed Slack messages.
- Filter by channel: @minty-search what did we discuss about alignment in:#papers
- Filter by person: @minty-search from:seth posts about governance
- Filter by time: @minty-search days:7 recent discussions on tool use
- Follow up in the same thread for refined searches
Adding Papers to the Corpus
One at a time
Upload a PDF directly to #papers in Slack. The ingestion pipeline processes it automatically — you'll see a check-mark reaction and a thread reply when it's done.
Bulk uploads
Upload PDFs to the local-pdfs folder in Google Drive. They sync down and are processed overnight by the batch ingestion daemon (8 concurrent workers).
How News Gets Curated
feedme-daily fetches hundreds of items each morning and classifies them by relevance. The top 20–40% land in #feed-{dayname}, where Seth promotes the best with a 👍 reaction. Promoted items flow to #firehose, and yesterday-in-ai turns them into the daily digest in #mint-community.
Slack Channels
The workspace has ~57 channels. Most day-to-day activity happens in a handful.
| Channel | What It's For |
|---|---|
| Everyday | |
#general | Announcements and all-hands discussion |
#random | Off-topic, jokes, misc |
#mint-community | AI news and discussion — daily digest, @Minty research Q&A |
#firehose | Automated content hub — all daemon-posted content flows through here |
| Research Projects | |
#proj-* | One channel per active project. Post project-specific discussion, drafts, and updates. |
| Reference & Coordination | |
#papers | Upload PDFs to add them to the research corpus (auto-ingested) |
#lab-meetings | Weekly digest canvas and meeting coordination |
#review-wip | Share works-in-progress for lab feedback |
Security
Dedicated machine with credentials in macOS Keychain. No private data or credentials exposed to agents or lab members.
Agent EngineeringAgent("spawn workers", model="opus")
Worker Agent Types
General-purpose delegated tasks. The default workhorse for anything that can be described in a prompt.
Deep academic paper reading with evidence-grounded analysis. Read-only access to workspace files.
41-question semantic analysis + RAPTOR summaries. Used in batch processing via /orchestrate.
Hierarchical summary generation across paper clusters. Spawned in parallel via /raptor.
Code review against MINT Lab standards. Read-only — cannot modify files.
Delegation Protocol
Mandatory: "Every tool call is context you burn. Subagents have fresh context. If a task can be described in a prompt, delegate it."
- Web search, documentation lookup, file processing, batch operations → always delegate
- Model policy: Opus for substantive work, Sonnet for retrieval-only, Haiku only for startup data gathering
- Hooks enforce this: delegation-nudge.py warns after 15 consecutive Read/Grep/Glob calls without a Task delegation
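The nudge logic can be sketched as a pure decision function. This is an assumption about how delegation-nudge.py behaves, not its actual source:

```python
READ_TOOLS = {"Read", "Grep", "Glob"}
THRESHOLD = 15

def check(tool_name: str, consecutive_reads: int):
    """Return the updated counter and an optional nudge message.
    Any non-read tool (e.g. a Task delegation) resets the counter;
    the nudge is soft and never blocks the call."""
    if tool_name not in READ_TOOLS:
        return 0, ""
    count = consecutive_reads + 1
    if count > THRESHOLD:
        return count, f"{count} consecutive reads without delegation. Consider a subagent."
    return count, ""
```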
Skills & Commands
Skills (16)
Reusable knowledge patterns in .claude/skills/. Each has a SKILL.md with triggers, scope, and detailed instructions.
Research & Corpus
| Skill | Purpose |
|---|---|
paper-fetch | Download papers from 15+ publishers (3-tier cascade: requests → SeleniumBase → OpenClaw) |
codex-search | Web-aware research via OpenAI Codex CLI |
Communication
| Skill | Purpose |
|---|---|
slack-posting | Post messages (bot token) and read DMs/files (user token) |
slack-search | Semantic search over indexed Slack messages via ChromaDB + RRF |
agent-email | Gmail API for the agent inbox |
twitter-fetch | Fetch tweet content via Twitter API v2 (OAuth1) |
Infrastructure
| Skill | Purpose |
|---|---|
vault | Credential vault — macOS Keychain storage for all API keys and tokens |
github | GitHub repo management for mint-philosophy org |
workspace-search | Semantic search across entire workspace via ChromaDB |
codex-cli | OpenAI Codex CLI in headless exec mode |
systematic-debugging | 4-phase root cause methodology (investigate before fixing) |
daemon-test | End-to-end integration tests for all Slack-connected daemons |
peer-review | Code review via Codex CLI (Gemini removed) |
openclaw-browser | Browser automation via OpenClaw CLI with persistent authentication. AI-optimized browser control for web scraping and interaction. |
Commands (24)
User-invocable slash commands in .claude/commands/. Invoke with /name.
Session Lifecycle
| Command | Purpose |
|---|---|
/start | Initialize session: git sync, UUID extraction, parallel subagent dispatch, briefing |
/end | Close session: reflection, delegate closure to subagent, git push |
/suspend | Pause session for later resumption |
/retro | Structured retrospective: friction scan, skill recognition, fact hygiene |
/name | Set display name for session tab |
Research Operations
| Command | Purpose |
|---|---|
/corpus | Deep 41-question semantic paper analysis |
/orchestrate | Spawn 20 parallel corpus-workers for batch analysis |
/raptor | Spawn 20 parallel raptor-workers for summaries |
/research | Deep review pipeline over paper corpus (4-phase: search, readers, synthesis, QA) |
/metadata-extract | Batch metadata extraction from papers |
Content & Admin
| Command | Purpose |
|---|---|
/maintain | Infrastructure health check and repair (12 categories) |
/promote | Run feed promotion manually |
/peerreview | Code review via Codex CLI |
/screen | Capture and view current screen |
Hooks (8)
Python scripts that run before/after tool calls to enforce guardrails. 5 are wired in settings.json; 3 exist but are currently disabled.
| Hook | Trigger | Effect | Status |
|---|---|---|---|
block-rm.py | PreToolUse (Bash) | Hard block Prevents rm, shred, unlink. Forces trash CLI. | ● |
block-web-direct.py | PreToolUse (Web*) | Soft nudge Reminds agent to delegate web searches to subagents | ● |
delegation-nudge.py | PreToolUse (Read/Grep/Glob) | Soft nudge Warns after 15 consecutive file reads without delegation | ● |
python-check.py | PostToolUse (Edit/Write) | Quality Runs py_compile + Python 3.9 compat check on .py files | ● |
statusline.py | StatusLine | Display Context-aware status bar with color-coded usage | ● |
bedtime.py | UserPromptSubmit | Hard block Blocks prompts between 23:55–06:00. "Go to bed, Seth." | Unwired |
block-plan-mode.py | PreToolUse (EnterPlanMode) | Hard block Prevents plan mode (wipes context on exit) | Unwired |
context-threshold.py | UserPromptSubmit | Soft nudge Warns when context usage exceeds threshold (currently disabled internally) | Unwired |
Key Design Principles
- Delegation-first: The main agent orchestrates; subagents do the work. "Every tool call is context you burn. Subagents have fresh context."
- Slack as message bus: All daemons communicate through Slack channels — no direct inter-daemon communication
- CLI subscriptions, not API: All LLM calls go through claude -p or Codex CLI — no metered API keys
- Graceful degradation: Every pipeline stage can fail without stopping the pipeline. The system never returns empty-handed.
Session Identity
Each session is named Minty-{hex} where {hex} is the last 8 characters of the Claude Code session UUID (e.g. Minty-4c4365b9). Sessions are tracked in SQLite (sessions.db) with full-text search.
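Deriving the name is a one-liner; a minimal sketch, assuming the UUID's standard 8-4-4-4-12 layout (so the last 8 characters fall inside the final group):

```python
def session_name(session_uuid: str) -> str:
    """Minty-{hex}: the last 8 characters of the Claude Code session UUID."""
    return f"Minty-{session_uuid[-8:]}"
```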
Memory Architecture
Cross-session continuity through layered memory:
Seth's preferences, API patterns, proven antipatterns, design decisions. Reviewed and pruned during /retro. 81KB.
Active project threads, last 10 sessions, TODOs, blockers, someday list. Updated at every session close. 29KB.
Full session history with FTS5 search. 22 columns per session including log content. Portable via JSON exports in sessions/records/.
8-phase retrospective: friction scan, pattern classification, skill recognition, fact hygiene. Lessons flow into FACTS.md, skills, or CLAUDE.md.
Complete session transcripts in SpecStory cloud. Semantic search across all past sessions for context recovery and audit.
The Iron Laws
Foundational rules that override all other behavior.
| Law | Rule |
|---|---|
| 0 — No Shortcuts | When asked for "all" — do ALL. Cherry-picking = failure. |
| 1 — Evidence Over Claims | "It should work" = failure. Verify now, not later. |
| 2 — Document Everything | If not written down, didn't happen. |
| 3 — Persistence | Try 3+ approaches before asking. Solve, don't report. |
| 5 — Complete the Checklist | "Partially done" = not done. |
| 6 — Think Before Acting | Never rm — use trash. Map dependencies first. |
Model Policy
All LLM usage goes through CLI subscriptions (claude -p, Codex CLI) — not metered API calls. Generous subscriptions mean no rate limit concerns.
| Tier | Model | Use Case |
|---|---|---|
| Default | Claude Opus | All substantive work: analysis, classification, code generation, writing, debugging |
| Retrieval | Claude Sonnet 4.6 | Retrieval-only tasks: fetching data, reading files, checking status, simple lookups |
| Review / Analysis | Codex gpt-5.2 / gpt-5.3-codex-spark | Code review, corpus analysis, enrichment, fact-checking, and link verification. Spark used for verification and enrichment tasks. |
| Startup | Claude Haiku | Lightweight data-gathering subagents during /start Phase 2 only |
External IntegrationsWebFetch(url="https://api.*")
The system connects to 16 external services across five categories.
LLM & Embedding
Communication
Academic Sources
Content & Social Feeds
Infrastructure
BUILD THE CORPUS
Catch papers. Dodge spam. Arrow keys or drag to move.
Yesterday in AI
Daily AI news digest from the MINT Lab. One email per weekday morning.
Yesterday in AI lands each weekday morning. Covers AI safety, ethics, governance, and capabilities as narrative prose — contextual journalism, not punditry.
Minty's Week in AI ships each Wednesday in #lab-meetings — thematic sections covering the week's developments for lab meeting prep.
Both assembled from the lab's Slack content pipeline and synthesized by Claude Opus.
MINT Lab — Machine Intelligence & Normative Theory — ANU and Johns Hopkins University
Updated March 2026
