# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Tokyo Livehouse Events** is a full-stack web service that automatically scrapes and aggregates event information from major Tokyo live houses. Built with React Router v7 (SSR), SQLite (better-sqlite3), and Tailwind CSS, it allows users to search, filter, and view live music events across multiple venues.

### Key Stack

- **Frontend**: React 19 + React Router v7 (SSR enabled)
- **Backend**: React Router Node server + SQLite database
- **Scraping**: Cheerio (HTML parsing) + Playwright (JS-heavy sites)
- **Styling**: Tailwind CSS v4
- **Build**: Vite + React Router build system

---

## Common Development Commands

```bash
# Install dependencies
npm install

# Development server (http://localhost:5173)
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Start production server (after build)
npm start

# Run scraper (all venues)
npm run scrape

# Run scraper for specific venue
npm run scrape liquid-room

# List registered scrapers
npm run scrape -- --list
```

### Scraper CLI Details

The scraper CLI (`scripts/scrape.ts`) runs with `tsx` and supports:

- **No args**: Scrapes all registered venues
- **Venue ID arg**: Scrapes a single venue (e.g., `npm run scrape meets-otsuka`)
- **`--list` flag**: Displays all registered scrapers with IDs and names

Output includes the success count per venue, elapsed time, and an exit code (0 = all success, 1 = any failure).
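The argument dispatch the CLI performs can be sketched as follows. This is a hypothetical helper, not the actual code in `scripts/scrape.ts` — the function name `parseScrapeArgs` and the `CliCommand` shape are assumptions for illustration:

```typescript
// Hypothetical sketch of the scraper CLI's argument dispatch.
// The real logic lives in scripts/scrape.ts and may differ.
type CliCommand =
  | { kind: "list" }                     // npm run scrape -- --list
  | { kind: "all" }                      // npm run scrape
  | { kind: "single"; venueId: string }; // npm run scrape liquid-room

function parseScrapeArgs(argv: string[]): CliCommand {
  if (argv.includes("--list")) return { kind: "list" };
  // First non-flag argument is treated as a venue ID.
  const venueId = argv.find((a) => !a.startsWith("-"));
  return venueId ? { kind: "single", venueId } : { kind: "all" };
}

// The entry point would pass in process.argv.slice(2).
```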
---

## Architecture Overview

### Directory Structure

```
app/
├── lib/                          # Server-only utilities
│   ├── db.server.ts              # SQLite database layer (better-sqlite3)
│   ├── scraper-runner.server.ts  # Orchestrates scraping + Markdown generation
│   ├── playwright.server.ts      # Shared browser instance for JS-heavy sites
│   ├── markdown-writer.server.ts # Generates events/<venue-id>.md files
│   └── venue-meta.server.ts      # Server-only scraper metadata
│
├── scrapers/                     # Scraper modules (one per venue)
│   ├── base.ts                   # Scraper interface & VenueMeta type
│   ├── index.ts                  # Registry: ALL_SCRAPERS (15 venues)
│   ├── liquid-room.ts            # Example: fetch + Cheerio
│   ├── flat-nishiogikubo.ts      # Example: Playwright (Wix site)
│   ├── warp-kichijoji.ts         # fetch + Cheerio
│   ├── pitbar-nishiogikubo.ts    # Playwright (freecalend.com)
│   └── [11 other venues]
│
├── routes/                       # React Router routes (config-mapped in routes.ts)
│   ├── index.tsx                 # Redirects to /events
│   ├── events._index.tsx         # Event list with filtering
│   ├── events.$id.tsx            # Event detail page
│   ├── events.by-date.tsx        # Calendar/date-based view
│   ├── venues.tsx                # Venue list + scrape status
│   ├── api.scrape.ts             # POST/GET endpoint to trigger scraping
│   └── api.scrape-status.ts      # GET endpoint for scrape job status
│
├── components/
│   ├── EventCard.tsx             # Card view for events
│   ├── EventListRow.tsx          # Row view for events
│   └── FilterBar.tsx             # Search/filter form
│
├── root.tsx                      # Root layout + error boundary
├── routes.ts                     # React Router route config (config-based)
└── app.css                       # Tailwind/global styles

events/                           # Auto-generated Markdown files (one per venue)
events.db                         # SQLite database (created at runtime)
scripts/
└── scrape.ts                     # CLI scraper entry point
```

### Data Flow

#### Scraping Pipeline

1. User calls `npm run scrape` (CLI) or `POST /api/scrape` (HTTP)
2. `scraper-runner.server.ts` → `runAllScrapers()` or `runScraper(venueId)`
3. For each scraper in `ALL_SCRAPERS`:
   - Execute `scraper.scrape()` → returns `EventInput[]`
   - Filter events within the 35-day scraping window
   - Upsert each event to SQLite via `db.server.ts`
   - Log results to the `scrape_logs` table
4. After success, generate Markdown files in the `events/` directory
5. Close the shared Playwright browser instance
6. Return results (CLI: pretty-printed output; HTTP: JSON with `run_id`)

#### Web UI Data Access

1. Route loaders call `queryEvents()` or `getVenues()` from `db.server.ts`
2. Results rendered as React components
3. Filter form passes query params: `date_from`, `date_to`, `venue_id`, `keyword`
4. Pagination via `page` param (30 events per page)
5. View toggle: card vs. list (persisted in URL as `view` param)

---

## Scraper Implementation Pattern

Each scraper module exports:

- `venue`: `VenueMeta` object (id, name, url, area)
- `scraper`: `Scraper` object with a `scrape()` async method

### Typical Flow (Cheerio-based)

```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const res = await fetch(venue_schedule_url);
    const html = await res.text();
    const $ = cheerio.load(html);
    const events: EventInput[] = [];

    $("selector-for-event-items").each((_, el) => {
      const title = $(el).find(".title").text().trim();
      const date = parseJapaneseDate($(el).find(".date").text());
      // ... extract other fields
      events.push({ venue_id: venue.id, title, date /* , ... */ });
    });

    return events;
  },
};
```

### Playwright-based (JS-required sites like Wix)

```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const browser = await getBrowser(); // Singleton browser
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: "domcontentloaded" });
      // ... navigate, extract data via locators
    } finally {
      await page.close();
    }
    return events;
  },
};
```

### Key Utilities

- **Date parsing**: `parseJapaneseDate(str)` handles formats like "2025年06月15日", "06/15", etc. Defaults to the current year if the year is omitted.
- **URL handling**: `absoluteUrl(url, base)` converts relative URLs to absolute.
- **Event deduplication**: Filter by `date + title` if needed.
- **Scraping window**: Always ~35 days from today (`SCRAPE_WINDOW_DAYS` constant).

---

## Database Schema

### Tables

- **venues**: id (PK), name, url, area
- **events**: id (PK), venue_id (FK), title, artist, date, start_time, open_time, ticket_url, price, image_url, description, source_url, fetched_at
  - Unique constraint: (venue_id, title, date) — prevents duplicates on re-scrape
  - Indexes on date and venue_id
- **scrape_logs**: id (PK), run_id (UUID), venue_id, venue_name, status ("running" | "ok" | "error"), events_saved, error, started_at, finished_at

### Key Functions (lib/db.server.ts)

- `getDb()`: Singleton database connection
- `upsertVenue()`, `upsertEvent()`: Insert or replace records
- `queryEvents(params)`: Search with filters (date range, venue, keyword)
- `getEvent(id)`: Single event detail
- `getVenues()`: All venues with event counts
- Scrape log functions: `insertScrapeLog()`, `updateScrapeLog()`, `getLatestScrapeRun()`, `getScrapeRunById()`

---

## Adding a New Venue

### Step 1: Create Scraper File

Create `app/scrapers/<venue-id>.ts` implementing the `Scraper` interface (see pattern above).

### Step 2: Register in Index

Add an import and an entry to the `ALL_SCRAPERS` array in `app/scrapers/index.ts`.

### Step 3: Update Documentation

Add a row to the `SCRAPE_TARGETS.md` table.

### Automated Approach (Claude Code Skill)

Run the `/add-livehouse` skill for interactive guidance when adding a venue.
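As a dependency-free illustration of Step 1, here is a skeleton scraper that parses an inline HTML snippet with a regex instead of `fetch` + Cheerio, so it stays self-contained. The venue id `example-shimokitazawa`, the sample markup, and the simplified `Scraper`/`VenueMeta`/`EventInput` shapes are all assumptions — check `app/scrapers/base.ts` for the real contract, and `liquid-room.ts` for a real Cheerio-based implementation:

```typescript
// Hypothetical minimal scraper skeleton. Real scrapers fetch the venue's
// schedule page and parse it with Cheerio; the types below only approximate
// the definitions in app/scrapers/base.ts.
interface VenueMeta { id: string; name: string; url: string; area: string; }
interface EventInput { venue_id: string; title: string; date: string; }
interface Scraper { venue: VenueMeta; scrape(): Promise<EventInput[]>; }

export const venue: VenueMeta = {
  id: "example-shimokitazawa", // assumption: not a real venue in this repo
  name: "EXAMPLE Shimokitazawa",
  url: "https://example.com/",
  area: "下北沢",
};

// Stands in for the HTML a real scraper would fetch from venue.url.
const SAMPLE_HTML = `
  <li class="event"><span class="date">2025-06-15</span><span class="title">Indie Night</span></li>
  <li class="event"><span class="date">2025-06-16</span><span class="title">Acoustic Live</span></li>
`;

export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const events: EventInput[] = [];
    // Stand-in for $(".event").each(...) in a Cheerio-based scraper.
    const re = /<span class="date">(.+?)<\/span><span class="title">(.+?)<\/span>/g;
    for (const m of SAMPLE_HTML.matchAll(re)) {
      events.push({ venue_id: venue.id, title: m[2], date: m[1] });
    }
    return events;
  },
};
```

After Step 2 registers the module in `ALL_SCRAPERS`, `npm run scrape example-shimokitazawa` would pick it up by its `id`.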
---

## API Endpoints

### Scraping

- `POST /api/scrape` (form action) → Starts all scrapers in the background, redirects to the referrer, returns 202
- `GET /api/scrape?venue_id=<venue-id>` → Starts a single venue scraper, returns `{ run_id, status: "started" }` (202)
- `GET /api/scrape` (no params) → Starts all scrapers, same as POST

### Status

- `GET /api/scrape-status?run_id=<run_id>` → Returns scrape logs for a specific run
- `GET /api/scrape-status` → Returns the latest run's logs

Both return `{ running: boolean, results: ScrapeLog[] }`.

---

## React Router Specifics

### Config-based Routing

Routes are defined explicitly in `app/routes.ts` using the `index()`, `route()`, and `prefix()` helpers from `@react-router/dev/routes`. File names under `routes/` do not auto-determine paths — the mapping is set in `routes.ts`.

Use `params` in loaders/components to access dynamic segments (e.g., `:id` → `params.id`).

### SSR Configuration

Enabled in `react-router.config.ts` (`ssr: true`). All routes server-render by default.

### Loaders & Actions

- Loaders: `export async function loader({ request, params }: Route.LoaderArgs)`
- Actions: `export async function action({ request }: Route.ActionArgs)`
- Data accessed via the `useLoaderData()` hook

### Link Prefetch

Use `<Link>` from react-router for client-side navigation (no full page reload).
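The loader-side handling of the filter query params listed under "Web UI Data Access" can be sketched as follows. This is a framework-free sketch: `parseEventFilters` is a hypothetical helper, and the defaults (page 1, card view) mirror the behavior described in this document rather than the actual code in `events._index.tsx`:

```typescript
// Hypothetical helper mirroring what a route loader does with request.url:
// pull the filter and pagination params out of the query string.
interface EventFilters {
  date_from?: string;
  date_to?: string;
  venue_id?: string;
  keyword?: string;
  page: number;          // 1-based; 30 events per page
  view: "card" | "list"; // view toggle persisted in the URL
}

function parseEventFilters(requestUrl: string): EventFilters {
  const params = new URL(requestUrl).searchParams;
  return {
    date_from: params.get("date_from") ?? undefined,
    date_to: params.get("date_to") ?? undefined,
    venue_id: params.get("venue_id") ?? undefined,
    keyword: params.get("keyword") ?? undefined,
    // Fall back to page 1 on a missing or non-numeric value.
    page: Math.max(1, Number(params.get("page") ?? "1") || 1),
    view: params.get("view") === "list" ? "list" : "card",
  };
}
```

In a real loader the result would feed straight into `queryEvents(...)` from `db.server.ts`.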
---

## Important Implementation Details

### Scrape Window

- Fixed at 35 days from today
- Filters applied in `scraper-runner.server.ts` → `withinWindow(event, from, to)`
- Events outside this range are discarded before DB insert

### Shared Playwright Browser

- Singleton instance via `getBrowser()` in `playwright.server.ts`
- Only created if a scraper calls it
- Closed after each scraping run via `closeBrowser()`
- Used by Wix sites (FLAT 西荻窪) and other JS-heavy venues

### Event Deduplication

- Unique constraint `(venue_id, title, date)` prevents duplicates
- On conflict, all fields (except date/venue/title) are updated → re-scraping refreshes data
- No manual cleanup needed

### Markdown Generation

- Auto-generated as `events/<venue-id>.md` after a successful scrape
- Table format: date | artist | title | time | price | URL
- Marked as auto-generated (warns against manual editing)
- Regenerated on each scrape (overwrites the previous file)

### Date Parsing

- Handles Japanese formats: "2025年06月15日", "2025/06/15", "06/15" (infers the current year)
- Converted to ISO format (YYYY-MM-DD) for the database and API
- Converted back to Japanese for display in the UI (e.g., "2025/06/15(日)")

---

## Environment & Prerequisites

- **Node.js**: 20.12+ (required for the `styleText` API used in the scraper CLI)
- **npm**: latest
- **Optional**: Playwright browser binary (auto-installed on first `npm install`)

For Docker: see `Dockerfile` (multi-stage build; production uses node:20-alpine).
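The date-parsing behavior described under "Important Implementation Details" can be sketched like this. It is a simplified stand-in for the project's `parseJapaneseDate` utility — the real one handles more formats — with an added `now` parameter (an assumption, for testable year inference):

```typescript
// Simplified sketch of parseJapaneseDate: accepts "2025年06月15日",
// "2025/06/15", or "06/15" (year inferred from `now`) and returns
// ISO "YYYY-MM-DD", or null if nothing matches.
function parseJapaneseDate(input: string, now = new Date()): string | null {
  const pad = (n: number) => String(n).padStart(2, "0");

  // Full date: "2025年06月15日" or "2025/06/15".
  let m = input.match(/(\d{4})[年/](\d{1,2})[月/](\d{1,2})/);
  if (m) return `${m[1]}-${pad(+m[2])}-${pad(+m[3])}`;

  // Month/day only: "06/15" → current year.
  m = input.match(/(\d{1,2})\/(\d{1,2})/);
  if (m) return `${now.getFullYear()}-${pad(+m[1])}-${pad(+m[2])}`;

  return null;
}
```

Storing the ISO form in SQLite keeps date-range filters simple string comparisons; the Japanese display form (e.g. "2025/06/15(日)") is derived back from it in the UI.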
---

## Type Safety

The project uses strict TypeScript with path aliases:

- `~/*` maps to `app/*` (configured in `tsconfig.json`)
- Run `npm run typecheck` to validate before commits
- React Router auto-generates types via `@react-router/dev` (`.react-router/types/`)

---

## Key Files Not to Miss

- **`app/lib/db.server.ts`**: Database schema and query interface — essential for understanding the data layer
- **`app/lib/scraper-runner.server.ts`**: Orchestration logic — where the scraping window and dedup happen
- **`app/scrapers/base.ts`**: Scraper interface — defines the contract all venues must implement
- **`app/scrapers/liquid-room.ts`** & **`flat-nishiogikubo.ts`**: Templates for simple (Cheerio) and complex (Playwright) scrapers
- **`app/routes/events._index.tsx`**: Main event list with filters — shows how DB queries integrate with the UI
- **`SCRAPE_TARGETS.md`**: Live reference of all registered venues, their status, and scraper locations

---

## Debugging

### Scraper Issues

- Check `events.db` with a SQLite browser to inspect DB state
- Re-run a single venue: `npm run scrape <venue-id>`
- Inspect scrape logs via the `GET /api/scrape-status` endpoint in a browser
- Add `console.log` in a scraper for debugging (visible in CLI output)

### Database Queries

- Direct SQL via `getDb().prepare(...).all()` in `lib/db.server.ts`
- WAL mode enabled → check `events.db-shm` and `events.db-wal` if the DB is locked
- Foreign keys enforced → cannot delete venues that still have events

### Playwright Issues

- Headless browser logs: add `{ headless: false }` to `chromium.launch()` to watch the browser
- Timeout errors: the site structure may have changed; adjust selectors in the scraper
- Memory leaks: ensure `page.close()` is called (in a `finally` block)

---

## Permissions & Settings

The project uses `.claude/settings.json` with pre-configured allowlists for:

- `npm run` and `npx` commands
- Local dev server access (http://localhost:5173)
- Scraper endpoint testing
- Git commits
- External domain scraping (WebFetch for venue sites)

Future instances inherit these permissions. Add new patterns as needed.