# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Tokyo Livehouse Events** is a full-stack web service that automatically scrapes and aggregates event information from major Tokyo live houses. Built with React Router v7 (SSR), SQLite (better-sqlite3), and Tailwind CSS, it allows users to search, filter, and view live music events across multiple venues.

### Key Stack

- **Frontend**: React 19 + React Router v7 (SSR enabled)
- **Backend**: React Router Node server + SQLite database
- **Scraping**: Cheerio (HTML parsing) + Playwright (JS-heavy sites)
- **Styling**: Tailwind CSS v4
- **Build**: Vite + React Router build system

---

## Common Development Commands

```bash
# Install dependencies
npm install

# Development server (http://localhost:5173)
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Start production server (after build)
npm start

# Run scraper (all venues)
npm run scrape

# Run scraper for specific venue
npm run scrape liquid-room

# List registered scrapers
npm run scrape -- --list
```

### Scraper CLI Details

The scraper CLI (`scripts/scrape.ts`) runs with `tsx` and supports:

- **No args**: Scrapes all registered venues
- **Venue ID arg**: Scrapes a single venue (e.g., `npm run scrape meets-otsuka`)
- **`--list` flag**: Displays all registered scrapers with IDs and names

Output includes the success count per venue, elapsed time, and an exit code (0 = all success, 1 = any failure).
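The argument dispatch the CLI performs can be sketched as follows. This is a hypothetical helper, not the actual code in `scripts/scrape.ts` — the function name `parseScrapeArgs` and the `CliCommand` shape are assumptions for illustration:

```typescript
// Hypothetical sketch of the scraper CLI's argument dispatch.
// The real logic lives in scripts/scrape.ts and may differ.
type CliCommand =
  | { kind: "list" }                     // npm run scrape -- --list
  | { kind: "all" }                      // npm run scrape
  | { kind: "single"; venueId: string }; // npm run scrape liquid-room

function parseScrapeArgs(argv: string[]): CliCommand {
  if (argv.includes("--list")) return { kind: "list" };
  // First non-flag argument is treated as a venue ID.
  const venueId = argv.find((a) => !a.startsWith("-"));
  return venueId ? { kind: "single", venueId } : { kind: "all" };
}

// The entry point would pass in process.argv.slice(2).
```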
---

## Architecture Overview

### Directory Structure

```
app/
├── lib/                          # Server-only utilities
│   ├── db.server.ts              # SQLite database layer (better-sqlite3)
│   ├── scraper-runner.server.ts  # Orchestrates scraping + Markdown generation
│   ├── playwright.server.ts      # Shared browser instance for JS-heavy sites
│   ├── markdown-writer.server.ts # Generates events/<venue-id>.md files
│   └── venue-meta.server.ts      # Server-only scraper metadata
│
├── scrapers/                     # Scraper modules (one per venue)
│   ├── base.ts                   # Scraper interface & VenueMeta type
│   ├── index.ts                  # Registry: ALL_SCRAPERS (15 venues)
│   ├── liquid-room.ts            # Example: fetch + Cheerio
│   ├── flat-nishiogikubo.ts      # Example: Playwright (Wix site)
│   ├── warp-kichijoji.ts         # fetch + Cheerio
│   ├── pitbar-nishiogikubo.ts    # Playwright (freecalend.com)
│   └── [11 other venues]
│
├── routes/                       # React Router routes (config-mapped in routes.ts)
│   ├── index.tsx                 # Redirects to /events
│   ├── events._index.tsx         # Event list with filtering
│   ├── events.$id.tsx            # Event detail page
│   ├── events.by-date.tsx        # Calendar/date-based view
│   ├── venues.tsx                # Venue list + scrape status
│   ├── api.scrape.ts             # POST/GET endpoint to trigger scraping
│   └── api.scrape-status.ts      # GET endpoint for scrape job status
│
├── components/
│   ├── EventCard.tsx             # Card view for events
│   ├── EventListRow.tsx          # Row view for events
│   └── FilterBar.tsx             # Search/filter form
│
├── root.tsx                      # Root layout + error boundary
├── routes.ts                     # React Router route config (config-based)
└── app.css                       # Tailwind/global styles

events/                           # Auto-generated Markdown files (one per venue)
events.db                         # SQLite database (created at runtime)
scripts/
└── scrape.ts                     # CLI scraper entry point
```

### Data Flow

#### Scraping Pipeline

1. User calls `npm run scrape` (CLI) or `POST /api/scrape` (HTTP)
2. `scraper-runner.server.ts` → `runAllScrapers()` or `runScraper(venueId)`
3. For each scraper in `ALL_SCRAPERS`:
   - Execute `scraper.scrape()` → returns `EventInput[]`
   - Filter events within the 35-day scraping window
   - Upsert each event to SQLite via `db.server.ts`
   - Log results to the `scrape_logs` table
4. After success, generate Markdown files in the `events/` directory
5. Close the shared Playwright browser instance
6. Return results (CLI: pretty-printed output; HTTP: JSON with `run_id`)

#### Web UI Data Access

1. Route loaders call `queryEvents()` or `getVenues()` from `db.server.ts`
2. Results rendered as React components
3. Filter form passes query params: `date_from`, `date_to`, `venue_id`, `keyword`
4. Pagination via `page` param (30 events per page)
5. View toggle: card vs. list (persisted in URL as `view` param)

---

## Scraper Implementation Pattern

Each scraper module exports:

- `venue`: `VenueMeta` object (id, name, url, area)
- `scraper`: `Scraper` object with a `scrape()` async method

### Typical Flow (Cheerio-based)

```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const res = await fetch(venue_schedule_url);
    const html = await res.text();
    const $ = cheerio.load(html);
    const events: EventInput[] = [];

    $("selector-for-event-items").each((_, el) => {
      const title = $(el).find(".title").text().trim();
      const date = parseJapaneseDate($(el).find(".date").text());
      // ... extract other fields
      events.push({ venue_id: venue.id, title, date /* , ... */ });
    });

    return events;
  },
};
```

### Playwright-based (JS-required sites like Wix)

```typescript
export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const browser = await getBrowser(); // Singleton browser
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: "domcontentloaded" });
      // ... navigate, extract data via locators
    } finally {
      await page.close();
    }
    return events;
  },
};
```

### Key Utilities

- **Date parsing**: `parseJapaneseDate(str)` handles formats like "2025年06月15日", "06/15", etc. Defaults to the current year if the year is omitted.
- **URL handling**: `absoluteUrl(url, base)` converts relative URLs to absolute.
- **Event deduplication**: Filter by `date + title` if needed.
- **Scraping window**: Always ~35 days from today (`SCRAPE_WINDOW_DAYS` constant).

---

## Database Schema

### Tables

- **venues**: id (PK), name, url, area
- **events**: id (PK), venue_id (FK), title, artist, date, start_time, open_time, ticket_url, price, image_url, description, source_url, fetched_at
  - Unique constraint: (venue_id, title, date) — prevents duplicates on re-scrape
  - Indexes on date and venue_id
- **scrape_logs**: id (PK), run_id (UUID), venue_id, venue_name, status ("running" | "ok" | "error"), events_saved, error, started_at, finished_at

### Key Functions (lib/db.server.ts)

- `getDb()`: Singleton database connection
- `upsertVenue()`, `upsertEvent()`: Insert or replace records
- `queryEvents(params)`: Search with filters (date range, venue, keyword)
- `getEvent(id)`: Single event detail
- `getVenues()`: All venues with event counts
- Scrape log functions: `insertScrapeLog()`, `updateScrapeLog()`, `getLatestScrapeRun()`, `getScrapeRunById()`

---

## Adding a New Venue

### Step 1: Create Scraper File

Create `app/scrapers/<venue-id>.ts` implementing the `Scraper` interface (see pattern above).

### Step 2: Register in Index

Add an import and an entry to the `ALL_SCRAPERS` array in `app/scrapers/index.ts`.

### Step 3: Update Documentation

Add a row to the `SCRAPE_TARGETS.md` table.

### Automated Approach (Claude Code Skill)

Run the `/add-livehouse` skill for interactive guidance when adding a venue.
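As a dependency-free illustration of Step 1, here is a skeleton scraper that parses an inline HTML snippet with a regex instead of `fetch` + Cheerio, so it stays self-contained. The venue id `example-shimokitazawa`, the sample markup, and the simplified `Scraper`/`VenueMeta`/`EventInput` shapes are all assumptions — check `app/scrapers/base.ts` for the real contract, and `liquid-room.ts` for a real Cheerio-based implementation:

```typescript
// Hypothetical minimal scraper skeleton. Real scrapers fetch the venue's
// schedule page and parse it with Cheerio; the types below only approximate
// the definitions in app/scrapers/base.ts.
interface VenueMeta { id: string; name: string; url: string; area: string; }
interface EventInput { venue_id: string; title: string; date: string; }
interface Scraper { venue: VenueMeta; scrape(): Promise<EventInput[]>; }

export const venue: VenueMeta = {
  id: "example-shimokitazawa", // assumption: not a real venue in this repo
  name: "EXAMPLE Shimokitazawa",
  url: "https://example.com/",
  area: "下北沢",
};

// Stands in for the HTML a real scraper would fetch from venue.url.
const SAMPLE_HTML = `
  <li class="event"><span class="date">2025-06-15</span><span class="title">Indie Night</span></li>
  <li class="event"><span class="date">2025-06-16</span><span class="title">Acoustic Live</span></li>
`;

export const scraper: Scraper = {
  venue,
  async scrape(): Promise<EventInput[]> {
    const events: EventInput[] = [];
    // Stand-in for $(".event").each(...) in a Cheerio-based scraper.
    const re = /<span class="date">(.+?)<\/span><span class="title">(.+?)<\/span>/g;
    for (const m of SAMPLE_HTML.matchAll(re)) {
      events.push({ venue_id: venue.id, title: m[2], date: m[1] });
    }
    return events;
  },
};
```

After Step 2 registers the module in `ALL_SCRAPERS`, `npm run scrape example-shimokitazawa` would pick it up by its `id`.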
---

## API Endpoints

### Scraping

- `POST /api/scrape` (form action) → Starts all scrapers in the background, redirects to the referrer, returns 202
- `GET /api/scrape?venue_id=<venue-id>` → Starts a single venue scraper, returns `{ run_id, status: "started" }` (202)
- `GET /api/scrape` (no params) → Starts all scrapers, same as POST

### Status

- `GET /api/scrape-status?run_id=<run_id>` → Returns scrape logs for a specific run
- `GET /api/scrape-status` → Returns the latest run's logs

Both return `{ running: boolean, results: ScrapeLog[] }`.

---

## React Router Specifics

### Config-based Routing

Routes are defined explicitly in `app/routes.ts` using the `index()`, `route()`, and `prefix()` helpers from `@react-router/dev/routes`. File names under `routes/` do not auto-determine paths — the mapping is set in `routes.ts`.

Use `params` in loaders/components to access dynamic segments (e.g., `:id` → `params.id`).

### SSR Configuration

Enabled in `react-router.config.ts` (`ssr: true`). All routes server-render by default.

### Loaders & Actions

- Loaders: `export async function loader({ request, params }: Route.LoaderArgs)`
- Actions: `export async function action({ request }: Route.ActionArgs)`
- Data accessed via the `useLoaderData()` hook

### Link Prefetch

Use `<Link>` from react-router for client-side navigation (no full page reload).
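The loader-side handling of the filter query params listed under "Web UI Data Access" can be sketched as follows. This is a framework-free sketch: `parseEventFilters` is a hypothetical helper, and the defaults (page 1, card view) mirror the behavior described in this document rather than the actual code in `events._index.tsx`:

```typescript
// Hypothetical helper mirroring what a route loader does with request.url:
// pull the filter and pagination params out of the query string.
interface EventFilters {
  date_from?: string;
  date_to?: string;
  venue_id?: string;
  keyword?: string;
  page: number;          // 1-based; 30 events per page
  view: "card" | "list"; // view toggle persisted in the URL
}

function parseEventFilters(requestUrl: string): EventFilters {
  const params = new URL(requestUrl).searchParams;
  return {
    date_from: params.get("date_from") ?? undefined,
    date_to: params.get("date_to") ?? undefined,
    venue_id: params.get("venue_id") ?? undefined,
    keyword: params.get("keyword") ?? undefined,
    // Fall back to page 1 on a missing or non-numeric value.
    page: Math.max(1, Number(params.get("page") ?? "1") || 1),
    view: params.get("view") === "list" ? "list" : "card",
  };
}
```

In a real loader the result would feed straight into `queryEvents(...)` from `db.server.ts`.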
---

## Important Implementation Details

### Scrape Window

- Fixed at 35 days from today
- Filters applied in `scraper-runner.server.ts` → `withinWindow(event, from, to)`
- Events outside this range are discarded before DB insert

### Shared Playwright Browser

- Singleton instance via `getBrowser()` in `playwright.server.ts`
- Only created if a scraper calls it
- Closed after each scraping run via `closeBrowser()`
- Used by Wix sites (FLAT 西荻窪) and other JS-heavy venues

### Event Deduplication

- Unique constraint `(venue_id, title, date)` prevents duplicates
- On conflict, all fields (except date/venue/title) are updated → re-scraping refreshes data
- No manual cleanup needed

### Markdown Generation

- Auto-generated as `events/<venue-id>.md` after a successful scrape
- Table format: date | artist | title | time | price | URL
- Marked as auto-generated (warns against manual editing)
- Regenerated on each scrape (overwrites the previous file)

### Date Parsing

- Handles Japanese formats: "2025年06月15日", "2025/06/15", "06/15" (infers the current year)
- Converted to ISO format (YYYY-MM-DD) for the database and API
- Converted back to Japanese for display in the UI (e.g., "2025/06/15(日)")

---

## Environment & Prerequisites

- **Node.js**: 20.12+ (required for the `styleText` API used in the scraper CLI)
- **npm**: latest
- **Optional**: Playwright browser binary (auto-installed on first `npm install`)

For Docker: see `Dockerfile` (multi-stage build; production uses node:20-alpine).
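The date-parsing behavior described under "Important Implementation Details" can be sketched like this. It is a simplified stand-in for the project's `parseJapaneseDate` utility — the real one handles more formats — with an added `now` parameter (an assumption, for testable year inference):

```typescript
// Simplified sketch of parseJapaneseDate: accepts "2025年06月15日",
// "2025/06/15", or "06/15" (year inferred from `now`) and returns
// ISO "YYYY-MM-DD", or null if nothing matches.
function parseJapaneseDate(input: string, now = new Date()): string | null {
  const pad = (n: number) => String(n).padStart(2, "0");

  // Full date: "2025年06月15日" or "2025/06/15".
  let m = input.match(/(\d{4})[年/](\d{1,2})[月/](\d{1,2})/);
  if (m) return `${m[1]}-${pad(+m[2])}-${pad(+m[3])}`;

  // Month/day only: "06/15" → current year.
  m = input.match(/(\d{1,2})\/(\d{1,2})/);
  if (m) return `${now.getFullYear()}-${pad(+m[1])}-${pad(+m[2])}`;

  return null;
}
```

Storing the ISO form in SQLite keeps date-range filters simple string comparisons; the Japanese display form (e.g. "2025/06/15(日)") is derived back from it in the UI.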
---

## Type Safety

The project uses strict TypeScript with path aliases:

- `~/*` maps to `app/*` (configured in `tsconfig.json`)
- Run `npm run typecheck` to validate before commits
- React Router auto-generates types via `@react-router/dev` (`.react-router/types/`)

---

## Key Files Not to Miss

- **`app/lib/db.server.ts`**: Database schema and query interface — essential for understanding the data layer
- **`app/lib/scraper-runner.server.ts`**: Orchestration logic — where the scraping window and dedup happen
- **`app/scrapers/base.ts`**: Scraper interface — defines the contract all venues must implement
- **`app/scrapers/liquid-room.ts`** & **`flat-nishiogikubo.ts`**: Templates for simple (Cheerio) and complex (Playwright) scrapers
- **`app/routes/events._index.tsx`**: Main event list with filters — shows how DB queries integrate with the UI
- **`SCRAPE_TARGETS.md`**: Live reference of all registered venues, their status, and scraper locations

---

## Debugging

### Scraper Issues

- Check `events.db` with a SQLite browser to inspect DB state
- Re-run a single venue: `npm run scrape <venue-id>`
- Inspect scrape logs via the `GET /api/scrape-status` endpoint in a browser
- Add `console.log` in a scraper for debugging (visible in CLI output)

### Database Queries

- Direct SQL via `getDb().prepare(...).all()` in `lib/db.server.ts`
- WAL mode enabled → check `events.db-shm` and `events.db-wal` if the DB is locked
- Foreign keys enforced → cannot delete venues that still have events

### Playwright Issues

- Headless browser logs: add `{ headless: false }` to `chromium.launch()` to watch the browser
- Timeout errors: the site structure may have changed; adjust selectors in the scraper
- Memory leaks: ensure `page.close()` is called (in a `finally` block)

---

## Permissions & Settings

The project uses `.claude/settings.json` with pre-configured allowlists for:

- `npm run` and `npx` commands
- Local dev server access (http://localhost:5173)
- Scraper endpoint testing
- Git commits
- External domain scraping (WebFetch for venue sites)

Future instances inherit these permissions. Add new patterns as needed.