| author | yyamashita <yyamashita@mosquit.one> | 2026-05-07 21:48:58 +0900 |
|---|---|---|
| committer | yyamashita <yyamashita@mosquit.one> | 2026-05-07 21:48:58 +0900 |
| commit | b8537eabe94b24e8530b4c1511456dc94cf8ec4c (patch) | |
| tree | c58223954c9ba0ff6120c170189d112d5ac9c3d8 | |
| parent | d5e975b601e70adf901c8e1eb7e61f0388941195 (diff) | |
Add CLAUDE.md with architecture and development guidance
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| -rw-r--r-- | CLAUDE.md | 332 |
1 file changed, 332 insertions, 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..e9b176a
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,332 @@

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Tokyo Livehouse Events** is a full-stack web service that automatically scrapes and aggregates event information from major Tokyo live houses. Built with React Router v7 (SSR), SQLite (better-sqlite3), and Tailwind CSS, it allows users to search, filter, and view live music events across multiple venues.

### Key Stack
- **Frontend**: React 19 + React Router v7 (SSR enabled)
- **Backend**: React Router Node server + SQLite database
- **Scraping**: Cheerio (HTML parsing) + Playwright (JS-heavy sites)
- **Styling**: Tailwind CSS v4
- **Build**: Vite + React Router build system

---

## Common Development Commands

```bash
# Install dependencies
npm install

# Development server (http://localhost:5173)
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Start production server (after build)
npm start

# Run scraper (all venues)
npm run scrape

# Run scraper for specific venue
npm run scrape liquid-room

# List registered scrapers
npm run scrape -- --list
```

### Scraper CLI Details
The scraper CLI (`scripts/scrape.ts`) runs with `tsx` and supports:
- **No args**: Scrapes all 12 registered venues
- **Venue ID arg**: Scrapes a single venue (e.g., `npm run scrape meets-otsuka`)
- **`--list` flag**: Displays all registered scrapers with IDs and names

Output includes the success count per venue, elapsed time, and an exit code (0 = all success, 1 = any failure).
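The CLI dispatch described above could be sketched roughly as follows. This is a hypothetical simplification — the real `scripts/scrape.ts` also prints per-venue results and sets the exit code, and its actual argument handling may differ:

```typescript
// Hypothetical sketch of the scripts/scrape.ts argument dispatch.
// The three modes (all venues, single venue, --list) come from the
// docs above; the Args shape here is an assumption for illustration.
type Args = { list: boolean; venueId?: string };

function parseArgs(argv: string[]): Args {
  // argv excludes "node"/"tsx" and the script path,
  // e.g. ["--list"] or ["liquid-room"] or []
  if (argv.includes("--list")) return { list: true };
  return { list: false, venueId: argv[0] };
}
```

With no positional argument, `venueId` stays `undefined`, which the runner would treat as "scrape all venues".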
---

## Architecture Overview

### Directory Structure
```
app/
├── lib/                          # Server-only utilities
│   ├── db.server.ts              # SQLite database layer (better-sqlite3)
│   ├── scraper-runner.server.ts  # Orchestrates scraping + Markdown generation
│   ├── playwright.server.ts      # Shared browser instance for JS-heavy sites
│   ├── markdown-writer.server.ts # Generates events/<venue-id>.md files
│   └── venue-meta.server.ts      # Server-only scraper metadata
│
├── scrapers/                     # Scraper modules (one per venue)
│   ├── base.ts                   # Scraper interface & VenueMeta type
│   ├── index.ts                  # Registry: ALL_SCRAPERS (12 venues)
│   ├── liquid-room.ts            # Example: fetch + Cheerio
│   ├── flat-nishiogikubo.ts      # Example: Playwright (Wix site)
│   ├── warp-kichijoji.ts         # Implemented but NOT in ALL_SCRAPERS
│   ├── pitbar-nishiogikubo.ts    # Implemented but NOT in ALL_SCRAPERS
│   └── [10 other venues]
│
├── routes/                       # React Router routes (config-mapped in routes.ts)
│   ├── index.tsx                 # Redirects to /events
│   ├── events._index.tsx         # Event list with filtering
│   ├── events.$id.tsx            # Event detail page
│   ├── events.by-date.tsx        # Calendar/date-based view
│   ├── venues.tsx                # Venue list + scrape status
│   ├── api.scrape.ts             # POST/GET endpoint to trigger scraping
│   └── api.scrape-status.ts      # GET endpoint for scrape job status
│
├── components/
│   ├── EventCard.tsx             # Card view for events
│   ├── EventListRow.tsx          # Row view for events
│   └── FilterBar.tsx             # Search/filter form
│
├── root.tsx                      # Root layout + error boundary
├── routes.ts                     # React Router route config (explicit mapping)
└── app.css                       # Tailwind/global styles

events/                           # Auto-generated Markdown files (one per venue)
events.db                         # SQLite database (created at runtime)
scripts/
└── scrape.ts                     # CLI scraper entry point
```

### Data Flow

#### Scraping Pipeline
1. User calls `npm run scrape` (CLI) or `POST /api/scrape` (HTTP)
2. `scraper-runner.server.ts` → `runAllScrapers()` or `runScraper(venueId)`
3. For each scraper in `ALL_SCRAPERS`:
   - Execute `scraper.scrape()` → returns `EventInput[]`
   - Filter events within the 35-day scraping window
   - Upsert each event to SQLite via `db.server.ts`
   - Log results to the `scrape_logs` table
4. After success, generate Markdown files in the `events/` directory
5. Close the shared Playwright browser instance
6. Return results (CLI: pretty-printed output; HTTP: JSON with `run_id`)

#### Web UI Data Access
1. Route loaders call `queryEvents()` or `getVenues()` from `db.server.ts`
2. Results are rendered as React components
3. The filter form passes query params: `date_from`, `date_to`, `venue_id`, `keyword`
4. Pagination via the `page` param (30 events per page)
5. View toggle: card vs. list (persisted in the URL as the `view` param)

---

## Scraper Implementation Pattern

Each scraper module exports:
- `venue`: a `VenueMeta` object (id, name, url, area)
- `scraper`: a `Scraper` object with an async `scrape()` method

### Typical Flow (Cheerio-based)
```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const res = await fetch(venue_schedule_url);
    const html = await res.text();
    const $ = cheerio.load(html);
    const events: EventInput[] = [];

    $("selector-for-event-items").each((_, el) => {
      const title = $(el).find(".title").text().trim();
      const date = parseJapaneseDate($(el).find(".date").text());
      // ... extract other fields
      events.push({ venue_id: venue.id, title, date, /* ...other fields */ });
    });
    return events;
  }
};
```

### Playwright-based (JS-required sites like Wix)
```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const browser = await getBrowser(); // Singleton browser
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // ... navigate, extract data via locators
    await page.close();
    return events;
  }
};
```

### Key Utilities
- **Date parsing**: `parseJapaneseDate(str)` handles formats like "2025年06月15日", "06/15", etc. Defaults to the current year if the year is omitted.
- **URL handling**: `absoluteUrl(url, base)` converts relative URLs to absolute ones.
- **Event deduplication**: Filter by `date + title` if needed.
- **Scraping window**: Always ~35 days from today (`SCRAPE_WINDOW_DAYS` constant).

---

## Database Schema

### Tables
- **venues**: id (PK), name, url, area
- **events**: id (PK), venue_id (FK), title, artist, date, start_time, open_time, ticket_url, price, image_url, description, source_url, fetched_at
  - Unique constraint: (venue_id, title, date) — prevents duplicates on re-scrape
  - Indexes on date and venue_id
- **scrape_logs**: id (PK), run_id (UUID), venue_id, venue_name, status ("running" | "ok" | "error"), events_saved, error, started_at, finished_at

### Key Functions (lib/db.server.ts)
- `getDb()`: Singleton database connection
- `upsertVenue()`, `upsertEvent()`: Insert or replace records
- `queryEvents(params)`: Search with filters (date range, venue, keyword)
- `getEvent(id)`: Single event detail
- `getVenues()`: All venues with event counts
- Scrape log functions: `insertScrapeLog()`, `updateScrapeLog()`, `getLatestScrapeRun()`, `getScrapeRunById()`

---

## Adding a New Venue

### Step 1: Create Scraper File
Create `app/scrapers/<venue-id>.ts` implementing the `Scraper` interface (see pattern above).

### Step 2: Register in Index
Add an import and an entry to the `ALL_SCRAPERS` array in `app/scrapers/index.ts`.

### Step 3: Update Documentation
Add a row to the `SCRAPE_TARGETS.md` table.

### Automated Approach (Claude Code Skill)
Run `/add-livehouse <Venue Name> <URL>` for interactive guidance through venue addition.
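The date-parsing behavior described under Key Utilities above could be re-implemented roughly like this. This is a hypothetical sketch of the assumed semantics — the real `parseJapaneseDate` in the repo may accept more formats:

```typescript
// Hypothetical re-implementation of the parseJapaneseDate behavior
// described above: "2025年06月15日", "2025/06/15", and "06/15"
// (current year inferred). Returns ISO YYYY-MM-DD, or null on failure.
function parseJapaneseDate(str: string, now = new Date()): string | null {
  const pad = (n: number) => String(n).padStart(2, "0");

  // Full date with year: "2025年06月15日" or "2025/06/15"
  let m = str.match(/(\d{4})[年\/](\d{1,2})[月\/](\d{1,2})日?/);
  if (m) return `${m[1]}-${pad(+m[2])}-${pad(+m[3])}`;

  // Month/day only: "06/15" — infer the current year
  m = str.match(/(\d{1,2})\/(\d{1,2})/);
  if (m) return `${now.getFullYear()}-${pad(+m[1])}-${pad(+m[2])}`;

  return null;
}
```

Returning the ISO string (rather than a `Date`) matches the database convention described below, where dates are stored as YYYY-MM-DD.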
---

## API Endpoints

### Scraping
- `POST /api/scrape` (form action) → Starts all scrapers in the background, redirects to the referrer, returns 202
- `GET /api/scrape?venue_id=<id>` → Starts a single venue scraper, returns `{ run_id, status: "started" }` (202)
- `GET /api/scrape` (no params) → Starts all scrapers, same as POST

### Status
- `GET /api/scrape-status?run_id=<uuid>` → Returns scrape logs for a specific run
- `GET /api/scrape-status` → Returns the latest run's logs

Both return `{ running: boolean, results: ScrapeLog[] }`.

---

## React Router Specifics

### Config-based Routing
Routes are defined explicitly in `app/routes.ts` using the `index()`, `route()`, and `prefix()` helpers from `@react-router/dev/routes`. File names under `routes/` do not auto-determine paths — the mapping is set in `routes.ts`. Use `params.<name>` in loaders/components to access dynamic segments (e.g., `:id` → `params.id`).

### SSR Configuration
Enabled in `react-router.config.ts` (`ssr: true`). All routes server-render by default.

### Loaders & Actions
- Loaders: `export async function loader({ request, params }: Route.LoaderArgs)`
- Actions: `export async function action({ request }: Route.ActionArgs)`
- Data is accessed via the `useLoaderData<typeof loader>()` hook

### Link Prefetch
Use `<Link>` from react-router for client-side navigation (no full page reload).
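Given the route modules listed in the directory tree above, the `app/routes.ts` config might look roughly like this. This is a sketch only — the route module filenames come from this document, but the URL path strings are assumptions, and the actual file may also use `prefix()`:

```typescript
// Hypothetical sketch of app/routes.ts. Path strings are assumptions
// inferred from the route filenames documented above.
import { type RouteConfig, index, route } from "@react-router/dev/routes";

export default [
  index("routes/index.tsx"),
  route("events", "routes/events._index.tsx"),
  route("events/by-date", "routes/events.by-date.tsx"),
  route("events/:id", "routes/events.$id.tsx"),
  route("venues", "routes/venues.tsx"),
  route("api/scrape", "routes/api.scrape.ts"),
  route("api/scrape-status", "routes/api.scrape-status.ts"),
] satisfies RouteConfig;
```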
---

## Important Implementation Details

### Scrape Window
- Fixed at 35 days from today
- Filters applied in `scraper-runner.server.ts` → `withinWindow(event, from, to)`
- Events outside this range are discarded before DB insert

### Shared Playwright Browser
- Singleton instance via `getBrowser()` in `playwright.server.ts`
- Only created if a scraper calls it
- Closed after each scraping run via `closeBrowser()`
- Used by Wix sites (FLAT 西荻窪) and other JS-heavy venues

### Event Deduplication
- Unique constraint `(venue_id, title, date)` prevents duplicates
- On conflict, all fields (except date/venue/title) are updated → a re-scrape refreshes data
- No manual cleanup needed

### Markdown Generation
- Auto-generated in `events/<venue-id>.md` after a successful scrape
- Table format: date | artist | title | time | price | URL
- Marked as auto-generated (warns against manual editing)
- Regenerated on each scrape (overwrites the previous file)

### Date Parsing
- Handles Japanese formats: "2025年06月15日", "2025/06/15", "06/15" (infers current year)
- Converted to ISO format (YYYY-MM-DD) for the database and API
- Converted back to Japanese style for display in the UI (e.g., "2025/06/15(日)")

---

## Environment & Prerequisites

- **Node.js**: 20.12+ (required for the `styleText` API used in the scraper CLI)
- **npm**: latest
- **Optional**: Playwright binary (auto-installed on first `npm install`)

For Docker: see the `Dockerfile` (multi-stage build; production uses node:20-alpine).
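The 35-day window filter described above could be sketched as follows. The signature is simplified here to take an ISO date string (the real `withinWindow` in `scraper-runner.server.ts` takes the event plus explicit bounds):

```typescript
// Hypothetical sketch of the scrape-window filter: keep only events
// whose ISO date falls within [today, today + SCRAPE_WINDOW_DAYS].
const SCRAPE_WINDOW_DAYS = 35;

function withinWindow(dateIso: string, today = new Date()): boolean {
  // Truncate "today" to midnight UTC so the comparison is date-only
  const from = new Date(today.toISOString().slice(0, 10));
  const to = new Date(from);
  to.setUTCDate(to.getUTCDate() + SCRAPE_WINDOW_DAYS);
  const d = new Date(dateIso);
  return d >= from && d <= to;
}
```

Events failing this check are dropped before the upsert, so the database only ever holds near-future events.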
---

## Type Safety

The project uses strict TypeScript with path aliases:
- `~/*` maps to `app/*` (configured in `tsconfig.json`)
- Run `npm run typecheck` to validate before commits
- React Router auto-generates types via `@react-router/dev` (`.react-router/types/`)

---

## Key Files Not to Miss

- **`app/lib/db.server.ts`**: Database schema and query interface — essential for understanding the data layer
- **`app/lib/scraper-runner.server.ts`**: Orchestration logic — where the scraping window and dedup happen
- **`app/scrapers/base.ts`**: Scraper interface — defines the contract all venues must implement
- **`app/scrapers/liquid-room.ts`** & **`flat-nishiogikubo.ts`**: Templates for simple (Cheerio) and complex (Playwright) scrapers
- **`app/routes/events._index.tsx`**: Main event list with filters — shows how DB queries integrate with the UI
- **`SCRAPE_TARGETS.md`**: Live reference of all 12+ venues, their status, and scraper locations

---

## Debugging

### Scraper Issues
- Check `events.db` with a SQLite browser to inspect DB state
- Re-run a single venue: `npm run scrape <venue-id>`
- Inspect scrape logs via the `GET /api/scrape-status` endpoint in a browser
- Add `console.log` in a scraper for debugging (visible in CLI output)

### Database Queries
- Direct SQL via `getDb().prepare(...).all()` in `lib/db.server.ts`
- WAL mode is enabled → check `events.db-shm` and `events.db-wal` if the DB is locked
- Foreign keys are enforced → venues with events cannot be deleted

### Playwright Issues
- Headless browser logs: add `{ headless: false }` to `chromium.launch()` to watch the browser
- Timeout errors: check whether the site structure changed; adjust selectors in the scraper
- Memory leaks: ensure `page.close()` is called (in a finally block)

---

## Permissions & Settings

The project uses `.claude/settings.json` with pre-configured allowlists for:
- `npm run` and `npx` commands
- Local dev server access (http://localhost:5173)
- Scraper endpoint testing
- Git commits
- External domain scraping (WebFetch for venue sites)

Future instances inherit these permissions. Add new patterns as needed.
