author     yyamashita <yyamashita@mosquit.one>  2026-05-07 21:48:58 +0900
committer  yyamashita <yyamashita@mosquit.one>  2026-05-07 21:48:58 +0900
commit     b8537eabe94b24e8530b4c1511456dc94cf8ec4c (patch)
tree       c58223954c9ba0ff6120c170189d112d5ac9c3d8
parent     d5e975b601e70adf901c8e1eb7e61f0388941195 (diff)
Add CLAUDE.md with architecture and development guidance
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-rw-r--r--  CLAUDE.md  332
1 file changed, 332 insertions, 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e9b176a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,332 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**Tokyo Livehouse Events** is a full-stack web service that automatically scrapes and aggregates event information from major Tokyo live houses. Built with React Router v7 (SSR), SQLite (better-sqlite3), and Tailwind CSS, it allows users to search, filter, and view live music events across multiple venues.
+
+### Key Stack
+- **Frontend**: React 19 + React Router v7 (SSR enabled)
+- **Backend**: React Router Node server + SQLite database
+- **Scraping**: Cheerio (HTML parsing) + Playwright (JS-heavy sites)
+- **Styling**: Tailwind CSS v4
+- **Build**: Vite + React Router build system
+
+---
+
+## Common Development Commands
+
+```bash
+# Install dependencies
+npm install
+
+# Development server (http://localhost:5173)
+npm run dev
+
+# Type checking
+npm run typecheck
+
+# Build for production
+npm run build
+
+# Start production server (after build)
+npm start
+
+# Run scraper (all venues)
+npm run scrape
+
+# Run scraper for specific venue
+npm run scrape liquid-room
+
+# List registered scrapers
+npm run scrape -- --list
+```
+
+### Scraper CLI Details
+The scraper CLI (`scripts/scrape.ts`) runs with `tsx` and supports:
+- **No args**: Scrapes all 12 registered venues
+- **Venue ID arg**: Scrapes single venue (e.g., `npm run scrape meets-otsuka`)
+- **`--list` flag**: Displays all registered scrapers with IDs and names
+
+Output includes success count per venue, elapsed time, and exit code (0 = all success, 1 = any failure).
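+
+A minimal sketch of the dispatch logic, assuming the runner exports documented
+below (`runAllScrapers`, `runScraper`) and their result shapes; the actual
+`scripts/scrape.ts` may differ:
+
+```typescript
+// scripts/scrape.ts (illustrative sketch — assumed signatures)
+import { ALL_SCRAPERS } from "../app/scrapers";
+import { runAllScrapers, runScraper } from "../app/lib/scraper-runner.server";
+
+const arg = process.argv[2];
+
+if (arg === "--list") {
+  for (const s of ALL_SCRAPERS) console.log(`${s.venue.id}\t${s.venue.name}`);
+  process.exit(0);
+}
+
+// Single venue if an ID was given, otherwise all registered venues.
+const results = arg ? [await runScraper(arg)] : await runAllScrapers();
+process.exit(results.every((r) => r.status === "ok") ? 0 : 1);
+```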
+
+---
+
+## Architecture Overview
+
+### Directory Structure
+```
+app/
+├── lib/ # Server-only utilities
+│ ├── db.server.ts # SQLite database layer (better-sqlite3)
+│ ├── scraper-runner.server.ts # Orchestrates scraping + Markdown generation
+│ ├── playwright.server.ts # Shared browser instance for JS-heavy sites
+│ ├── markdown-writer.server.ts # Generates events/<venue-id>.md files
+│ └── venue-meta.server.ts # Server-only scraper metadata
+│
+├── scrapers/ # Scraper modules (one per venue)
+│ ├── base.ts # Scraper interface & VenueMeta type
+│ ├── index.ts # Registry: ALL_SCRAPERS (12 venues)
+│ ├── liquid-room.ts # Example: fetch + Cheerio
+│ ├── flat-nishiogikubo.ts # Example: Playwright (Wix site)
+│ ├── warp-kichijoji.ts # Implemented but NOT in ALL_SCRAPERS
+│ ├── pitbar-nishiogikubo.ts # Implemented but NOT in ALL_SCRAPERS
+│ └── [10 other venues]
+│
+├── routes/ # React Router routes (config-mapped in routes.ts)
+│ ├── index.tsx # Redirects to /events
+│ ├── events._index.tsx # Event list with filtering
+│ ├── events.$id.tsx # Event detail page
+│ ├── events.by-date.tsx # Calendar/date-based view
+│ ├── venues.tsx # Venue list + scrape status
+│ ├── api.scrape.ts # POST/GET endpoint to trigger scraping
+│ └── api.scrape-status.ts # GET endpoint for scrape job status
+│
+├── components/
+│ ├── EventCard.tsx # Card view for events
+│ ├── EventListRow.tsx # Row view for events
+│ └── FilterBar.tsx # Search/filter form
+│
+├── root.tsx # Root layout + error boundary
+├── routes.ts # React Router route config (file-based)
+└── app.css # Tailwind/global styles
+
+events/ # Auto-generated Markdown files (one per venue)
+events.db # SQLite database (created at runtime)
+scripts/
+└── scrape.ts # CLI scraper entry point
+```
+
+### Data Flow
+
+#### Scraping Pipeline
+1. User calls `npm run scrape` (CLI) or `POST /api/scrape` (HTTP)
+2. `scraper-runner.server.ts` → `runAllScrapers()` or `runScraper(venueId)`
+3. For each scraper in `ALL_SCRAPERS`:
+ - Execute `scraper.scrape()` → returns `EventInput[]`
+ - Filter events within 35-day scraping window
+ - Upsert each event to SQLite via `db.server.ts`
+ - Log results to `scrape_logs` table
+4. After success, generate Markdown files in `events/` directory
+5. Close shared Playwright browser instance
+6. Return results (CLI: pretty-printed output, HTTP: JSON with run_id)
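+
+A condensed sketch of that orchestration, assuming the function names
+documented elsewhere in this file (the exact wiring in
+`scraper-runner.server.ts` may differ):
+
+```typescript
+import { randomUUID } from "node:crypto";
+import { ALL_SCRAPERS } from "~/scrapers";
+import { insertScrapeLog, updateScrapeLog, upsertEvent } from "~/lib/db.server";
+import { closeBrowser } from "~/lib/playwright.server";
+
+const SCRAPE_WINDOW_DAYS = 35;
+const withinWindow = (e: { date: string }, from: Date, to: Date) =>
+  new Date(e.date) >= from && new Date(e.date) <= to;
+
+export async function runAllScrapers(runId = randomUUID()) {
+  const from = new Date();
+  const to = new Date(from.getTime() + SCRAPE_WINDOW_DAYS * 86_400_000);
+  const results = [];
+  for (const scraper of ALL_SCRAPERS) {
+    const { venue } = scraper;
+    const logId = insertScrapeLog(runId, venue); // status: "running"
+    try {
+      const events = (await scraper.scrape()).filter((e) => withinWindow(e, from, to));
+      for (const e of events) upsertEvent(e);
+      updateScrapeLog(logId, { status: "ok", events_saved: events.length });
+      results.push({ venue_id: venue.id, status: "ok", saved: events.length });
+    } catch (err) {
+      updateScrapeLog(logId, { status: "error", error: String(err) });
+      results.push({ venue_id: venue.id, status: "error" });
+    }
+  }
+  // (Markdown generation into events/ omitted here for brevity.)
+  await closeBrowser(); // shared Playwright instance, if one was launched
+  return results;
+}
+```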
+
+#### Web UI Data Access
+1. Route loaders call `queryEvents()` or `getVenues()` from `db.server.ts`
+2. Results rendered as React components
+3. Filter form passes query params: `date_from`, `date_to`, `venue_id`, `keyword`
+4. Pagination via `page` param (30 events per page)
+5. View toggle: card vs. list (persisted in URL as `view` param)
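+
+A sketch of the loader side, assuming `queryEvents` accepts the filter params
+listed above (the real signature in `db.server.ts` may differ):
+
+```typescript
+// app/routes/events._index.tsx (illustrative)
+import type { Route } from "./+types/events._index";
+import { queryEvents } from "~/lib/db.server";
+
+export async function loader({ request }: Route.LoaderArgs) {
+  const q = new URL(request.url).searchParams;
+  return queryEvents({
+    date_from: q.get("date_from") ?? undefined,
+    date_to: q.get("date_to") ?? undefined,
+    venue_id: q.get("venue_id") ?? undefined,
+    keyword: q.get("keyword") ?? undefined,
+    page: Number(q.get("page") ?? "1"), // 30 events per page
+  });
+}
+```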
+
+---
+
+## Scraper Implementation Pattern
+
+Each scraper module exports:
+- `venue`: VenueMeta object (id, name, url, area)
+- `scraper`: Scraper object with `scrape()` async method
+
+### Typical Flow (Cheerio-based)
+```typescript
+import * as cheerio from "cheerio";
+import type { EventInput, Scraper } from "./base";
+// parseJapaneseDate: shared date helper (see Key Utilities below)
+
+export const scraper: Scraper = {
+  venue,
+  async scrape(): Promise<EventInput[]> {
+    const res = await fetch(venue.url); // or the venue's dedicated schedule URL
+    const html = await res.text();
+    const $ = cheerio.load(html);
+    const events: EventInput[] = [];
+
+    $("selector-for-event-items").each((_, el) => {
+      const title = $(el).find(".title").text().trim();
+      const date = parseJapaneseDate($(el).find(".date").text());
+      if (!date) return; // skip rows without a parseable date
+      // ...extract other fields as needed
+      events.push({ venue_id: venue.id, title, date });
+    });
+    return events;
+  },
+};
+```
+
+### Playwright-based (JS-required sites like Wix)
+```typescript
+import type { EventInput, Scraper } from "./base";
+import { getBrowser } from "~/lib/playwright.server";
+
+export const scraper: Scraper = {
+  venue,
+  async scrape(): Promise<EventInput[]> {
+    const browser = await getBrowser(); // shared singleton instance
+    const page = await browser.newPage();
+    const events: EventInput[] = [];
+    try {
+      await page.goto(venue.url, { waitUntil: "domcontentloaded" });
+      // ...navigate and extract data via locators into `events`
+    } finally {
+      await page.close(); // always release the page, even on error
+    }
+    return events;
+  },
+};
+```
+
+### Key Utilities
+- **Date parsing**: `parseJapaneseDate(str)` handles formats like "2025年06月15日", "06/15", etc. Defaults to current year if year is omitted.
+- **URL handling**: `absoluteUrl(url, base)` converts relative to absolute URLs.
+- **Event deduplication**: Filter by `date + title` if needed.
+- **Scraping window**: Always ~35 days from today (`SCRAPE_WINDOW_DAYS` constant).
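+
+An illustrative reimplementation of the date helper's documented behavior (the
+project's actual `parseJapaneseDate` may cover more formats and edge cases):
+
+```typescript
+export function parseJapaneseDate(str: string): string | null {
+  // "2025年06月15日" or "2025/06/15" — full date with year
+  let m = str.match(/(\d{4})[年/](\d{1,2})[月/](\d{1,2})/);
+  if (m) return `${m[1]}-${m[2].padStart(2, "0")}-${m[3].padStart(2, "0")}`;
+  // "06/15" — year omitted, default to the current year
+  m = str.match(/(\d{1,2})\/(\d{1,2})/);
+  if (m) {
+    const y = new Date().getFullYear();
+    return `${y}-${m[1].padStart(2, "0")}-${m[2].padStart(2, "0")}`;
+  }
+  return null; // unparseable
+}
+```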
+
+---
+
+## Database Schema
+
+### Tables
+- **venues**: id (PK), name, url, area
+- **events**: id (PK), venue_id (FK), title, artist, date, start_time, open_time, ticket_url, price, image_url, description, source_url, fetched_at
+ - Unique constraint: (venue_id, title, date) — prevents duplicates on re-scrape
+ - Indexes on date and venue_id
+- **scrape_logs**: id (PK), run_id (UUID), venue_id, venue_name, status ("running"|"ok"|"error"), events_saved, error, started_at, finished_at
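+
+Reconstructed as better-sqlite3 DDL from the column lists above (a sketch —
+the actual statements in `db.server.ts` may differ in constraints and naming):
+
+```typescript
+import Database from "better-sqlite3";
+
+const db = new Database("events.db");
+db.exec(`
+  CREATE TABLE IF NOT EXISTS venues (
+    id TEXT PRIMARY KEY, name TEXT NOT NULL, url TEXT, area TEXT
+  );
+  CREATE TABLE IF NOT EXISTS events (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    venue_id TEXT NOT NULL REFERENCES venues(id),
+    title TEXT NOT NULL, artist TEXT, date TEXT NOT NULL,
+    start_time TEXT, open_time TEXT, ticket_url TEXT, price TEXT,
+    image_url TEXT, description TEXT, source_url TEXT, fetched_at TEXT,
+    UNIQUE (venue_id, title, date)       -- prevents duplicates on re-scrape
+  );
+  CREATE INDEX IF NOT EXISTS idx_events_date  ON events(date);
+  CREATE INDEX IF NOT EXISTS idx_events_venue ON events(venue_id);
+  CREATE TABLE IF NOT EXISTS scrape_logs (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    run_id TEXT NOT NULL, venue_id TEXT, venue_name TEXT,
+    status TEXT NOT NULL,                -- "running" | "ok" | "error"
+    events_saved INTEGER, error TEXT, started_at TEXT, finished_at TEXT
+  );
+`);
+```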
+
+### Key Functions (lib/db.server.ts)
+- `getDb()`: Singleton database connection
+- `upsertVenue()`, `upsertEvent()`: Insert or replace records
+- `queryEvents(params)`: Search with filters (date range, venue, keyword)
+- `getEvent(id)`: Single event detail
+- `getVenues()`: All venues with event counts
+- Scrape log functions: `insertScrapeLog()`, `updateScrapeLog()`, `getLatestScrapeRun()`, `getScrapeRunById()`
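+
+The upsert contract (see "Event Deduplication" below) can be expressed as an
+`ON CONFLICT ... DO UPDATE` statement; this is an assumed shape, not the
+verbatim code from `db.server.ts`:
+
+```typescript
+const stmt = getDb().prepare(`
+  INSERT INTO events (venue_id, title, date, artist, start_time, open_time,
+                      ticket_url, price, image_url, description, source_url, fetched_at)
+  VALUES (@venue_id, @title, @date, @artist, @start_time, @open_time,
+          @ticket_url, @price, @image_url, @description, @source_url, @fetched_at)
+  ON CONFLICT (venue_id, title, date) DO UPDATE SET
+    artist = excluded.artist, start_time = excluded.start_time,
+    open_time = excluded.open_time, ticket_url = excluded.ticket_url,
+    price = excluded.price, image_url = excluded.image_url,
+    description = excluded.description, source_url = excluded.source_url,
+    fetched_at = excluded.fetched_at
+`);
+
+export function upsertEvent(e: EventInput) {
+  stmt.run(e); // refreshes all non-key fields on re-scrape
+}
+```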
+
+---
+
+## Adding a New Venue
+
+### Step 1: Create Scraper File
+Create `app/scrapers/<venue-id>.ts` implementing the `Scraper` interface (see pattern above).
+
+### Step 2: Register in Index
+Add import and entry to `app/scrapers/index.ts` → `ALL_SCRAPERS` array.
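+
+A sketch of the registration, using the same `<venue-id>` placeholder as Step 1:
+
+```typescript
+// app/scrapers/index.ts
+import { scraper as myNewVenue } from "./<venue-id>";
+
+export const ALL_SCRAPERS = [
+  // ...existing scrapers
+  myNewVenue,
+];
+```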
+
+### Step 3: Update Documentation
+Add row to `SCRAPE_TARGETS.md` table.
+
+### Automated Approach (Claude Code Skill)
+Run `/add-livehouse <Venue Name> <URL>` to get interactive guidance for venue addition.
+
+---
+
+## API Endpoints
+
+### Scraping
+- `POST /api/scrape` (form action) → Starts all scrapers in background, redirects to referrer, returns 202
+- `GET /api/scrape?venue_id=<id>` → Starts single venue scraper, returns `{ run_id, status: "started" }` (202)
+- `GET /api/scrape` (no params) → Starts all scrapers, same as POST
+
+### Status
+- `GET /api/scrape-status?run_id=<uuid>` → Returns scrape logs for specific run
+- `GET /api/scrape-status` → Returns latest run's logs
+
+Both return `{ running: boolean, results: ScrapeLog[] }`.
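+
+An example client flow (illustrative — endpoint shapes as documented above):
+
+```typescript
+// Kick off a single venue, then poll until the run finishes.
+const base = "http://localhost:5173";
+const { run_id } = await (await fetch(`${base}/api/scrape?venue_id=liquid-room`)).json();
+
+let status: { running: boolean; results: unknown[] };
+do {
+  await new Promise((r) => setTimeout(r, 2000)); // poll every 2s
+  status = await (await fetch(`${base}/api/scrape-status?run_id=${run_id}`)).json();
+} while (status.running);
+```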
+
+---
+
+## React Router Specifics
+
+### Config-based Routing
+Routes are defined explicitly in `app/routes.ts` using `index()`, `route()`, and `prefix()` helpers from `@react-router/dev/routes`. File names under `routes/` do not auto-determine paths — the mapping is set in `routes.ts`. Use `params.<name>` in loaders/components to access dynamic segments (e.g., `:id` → `params.id`).
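+
+A sketch of what `app/routes.ts` looks like with these helpers (the module
+list matches the routes above; the exact path strings are an assumption):
+
+```typescript
+import { type RouteConfig, index, route } from "@react-router/dev/routes";
+
+export default [
+  index("routes/index.tsx"),                    // redirects to /events
+  route("events", "routes/events._index.tsx"),
+  route("events/by-date", "routes/events.by-date.tsx"),
+  route("events/:id", "routes/events.$id.tsx"), // params.id in loaders
+  route("venues", "routes/venues.tsx"),
+  route("api/scrape", "routes/api.scrape.ts"),
+  route("api/scrape-status", "routes/api.scrape-status.ts"),
+] satisfies RouteConfig;
+```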
+
+### SSR Configuration
+Enabled in `react-router.config.ts` (`ssr: true`). All routes server-render by default.
+
+### Loaders & Actions
+- Loaders: `export async function loader({ request, params }: Route.LoaderArgs)`
+- Actions: `export async function action({ request }: Route.ActionArgs)`
+- Data accessed via `useLoaderData<typeof loader>()` hook
+
+### Link Prefetch
+Use `<Link>` from react-router for client-side navigation (no full page reload).
+
+---
+
+## Important Implementation Details
+
+### Scrape Window
+- Fixed at 35 days from today
+- Filters applied in `scraper-runner.server.ts` → `withinWindow(event, from, to)`
+- Events outside this range are discarded before DB insert
+
+### Shared Playwright Browser
+- Singleton instance via `getBrowser()` in `playwright.server.ts`
+- Only created if a scraper calls it
+- Closed after each scraping run via `closeBrowser()`
+- Used by Wix sites (FLAT 西荻窪) and other JS-heavy venues
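+
+The singleton can be as small as this (assumed shape of `playwright.server.ts`):
+
+```typescript
+import { chromium, type Browser } from "playwright";
+
+let browser: Browser | null = null;
+
+export async function getBrowser(): Promise<Browser> {
+  // Lazy launch: runs that never touch Playwright pay no startup cost.
+  browser ??= await chromium.launch(); // pass { headless: false } to debug
+  return browser;
+}
+
+export async function closeBrowser(): Promise<void> {
+  await browser?.close();
+  browser = null;
+}
+```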
+
+### Event Deduplication
+- Unique constraint: `(venue_id, title, date)` prevents duplicates
+- On conflict: all fields (except date/venue/title) are updated → re-scrape refreshes data
+- No manual cleanup needed
+
+### Markdown Generation
+- Auto-generated in `events/<venue-id>.md` after successful scrape
+- Table format: date | artist | title | time | price | URL
+- Marked as auto-generated (warns against manual editing)
+- Regenerated on each scrape (overwrites previous)
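+
+A sketch of the writer, following the documented column order (the real
+`markdown-writer.server.ts` formatting may differ):
+
+```typescript
+import { writeFileSync } from "node:fs";
+import type { EventInput } from "~/scrapers/base"; // EventInput assumed to live in base.ts
+
+export function writeVenueMarkdown(venueId: string, events: EventInput[]) {
+  const lines = [
+    "<!-- Auto-generated by the scraper. Do not edit manually. -->",
+    "| date | artist | title | time | price | URL |",
+    "| --- | --- | --- | --- | --- | --- |",
+    ...events.map((e) =>
+      `| ${e.date} | ${e.artist ?? ""} | ${e.title} | ${e.start_time ?? ""} | ${e.price ?? ""} | ${e.source_url ?? ""} |`),
+  ];
+  writeFileSync(`events/${venueId}.md`, lines.join("\n") + "\n");
+}
+```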
+
+### Date Parsing
+- Handles Japanese format: "2025年06月15日", "2025/06/15", "06/15" (infers current year)
+- Converted to ISO format (YYYY-MM-DD) for database and API
+- Displayed in Japanese format in the UI (e.g., "2025/06/15(日)")
+
+---
+
+## Environment & Prerequisites
+
+- **Node.js**: 20.12+ (required for `styleText` API used in scraper CLI)
+- **npm**: latest
+- **Optional**: Playwright binary (auto-installed on first `npm install`)
+
+For Docker: See `Dockerfile` (multi-stage build, production uses node:20-alpine).
+
+---
+
+## Type Safety
+
+The project uses strict TypeScript with path aliases:
+- `~/*` maps to `app/*` (configured in `tsconfig.json`)
+- Run `npm run typecheck` to validate before commits
+- React Router auto-generates types via `@react-router/dev` (`.react-router/types/`)
+
+---
+
+## Key Files Not to Miss
+
+- **`app/lib/db.server.ts`**: Database schema and query interface — essential for understanding data layer
+- **`app/lib/scraper-runner.server.ts`**: Orchestration logic — where scraping window and dedup happen
+- **`app/scrapers/base.ts`**: Scraper interface — defines contract all venues must implement
+- **`app/scrapers/liquid-room.ts`** & **`flat-nishiogikubo.ts`**: Templates for simple (Cheerio) and complex (Playwright) scrapers
+- **`app/routes/events._index.tsx`**: Main event list with filters — shows how DB queries integrate with UI
+- **`SCRAPE_TARGETS.md`**: Live reference of all 12+ venues, their status, and scraper locations
+
+---
+
+## Debugging
+
+### Scraper Issues
+- Check `events.db` with SQLite browser to inspect DB state
+- Re-run single venue: `npm run scrape <venue-id>`
+- Inspect scrape logs: `GET /api/scrape-status` endpoint in browser
+- Add console.log in scraper for debugging (visible in CLI output)
+
+### Database Queries
+- Direct SQL via `getDb().prepare(...).all()` in `lib/db.server.ts`
+- WAL mode enabled → check `events.db-shm` and `events.db-wal` if DB locked
+- Foreign keys enforced → cannot delete venues with events
+
+### Playwright Issues
+- Headless browser debugging: Pass `{ headless: false }` to `chromium.launch()` to watch the browser
+- Timeout errors: Check whether the site structure has changed; adjust selectors in the scraper
+- Memory leaks: Ensure `page.close()` is called (in a `finally` block, as in the pattern above)
+
+---
+
+## Permissions & Settings
+
+Project uses `.claude/settings.json` with pre-configured allowlists for:
+- `npm run` and `npx` commands
+- Local dev server access (http://localhost:5173)
+- Scraper endpoint testing
+- Git commits
+- External domain scraping (WebFetch for venue sites)
+
+Future instances inherit these permissions. Add new patterns as needed.