LIB-14 - trckr

← LIB-14

feature,infra

Created: 2025-12-25 • Updated: 2025-12-25

Relationships Loading...

## Goal
Deploy jinaai/ReaderLM-v2 (bfloat16) on chungus using vLLM or Triton for local HTML-to-Markdown conversion without rate limits.

## Background
The docs-to-md tool uses Jina's Reader API by default (20 RPM free tier). We want a local alternative for batch processing ReadTheDocs sites.

## Requirements
- **Model**: jinaai/ReaderLM-v2 (bfloat16 version from HuggingFace)
- **NOT**: reader-lm:1.5b or any Ollama GGUF versions (these don't work correctly)
- **GPU**: Must run on a free RTX 3090 (GPU 4 on chungus)
- **Framework**: vLLM or Triton (dockerized preferred, image exists: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3)

## Research Findings
- Ollama GGUF versions (milkey/reader-lm-v2) hallucinate instead of converting
- Official support is HuggingFace Transformers + vLLM, not Ollama
- Model needs bfloat16 for correct behavior
- 1.54B params, 512K context, 29 languages

## Acceptance Criteria
1. ReaderLM-v2 running as a service on chungus (GPU 4)
2. OpenAI-compatible API endpoint for d2m integration
3. Successful HTML-to-Markdown conversion test
4. Documentation for starting/stopping the service

## References
- https://huggingface.co/jinaai/ReaderLM-v2
- https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown/

Pull Requests (comma-separated URLs)

Attachments

Comments (5)

Claude · 2025-12-25

## Verification Complete All acceptance criteria verified: 1. **ReaderLM-v2 running on GPU 4** - Container `readerlm-v2` up for 6+ hours 2. **OpenAI-compatible API** - Responding at `http://localhost:10010/v1` 3. **HTML-to-Markdown conversion tested**: - Simple HTML: `<h1>Test</h1>` → `# Test` ✓ - Complex Notion HTML (900KB): Successfully chunked and converted to 21KB markdown ✓ 4. **Documentation complete** - CLAUDE.md updated with usage instructions Scripts verified: - `scripts/start_readerlm.sh` - Starts vLLM container with health checks - `scripts/stop_readerlm.sh` - Stops and removes container - `scripts/html_to_markdown.py` - CLI with --inplace, --dry-run support

agent · 2025-12-25

## Implementation Complete ### What was done 1. **Deployed ReaderLM-v2 via vLLM** on GPU 4 (RTX 3090) on chungus - Model: jinaai/ReaderLM-v2 (1.54B params, bfloat16) - Port: 10010 (OpenAI-compatible API) - VRAM usage: ~22GB (with KV cache pre-allocation) 2. **Created start/stop scripts** - `scripts/start_readerlm.sh`: Starts vLLM container with health checks - `scripts/stop_readerlm.sh`: Stops and removes container 3. **Created conversion utility** - `scripts/html_to_markdown.py`: CLI tool for batch processing - Supports chunking for large files (tested on 900KB Notion HTML -> 20KB markdown) - Detects HTML vs markdown to skip already-clean files ### Test Results - Simple HTML: Works perfectly (headings, lists, links, images) - Full Notion HTML pages: Successfully extracted content from 900KB HTML - Weaviate MDX files: Converted JSX/HTML to markdown ### Performance - Prompt throughput: ~1500 tokens/s - Generation throughput: ~150 tokens/s - Conversion time: ~30-60 seconds for large files with chunking ### Files Changed - `scripts/start_readerlm.sh`: vLLM startup script - `scripts/stop_readerlm.sh`: vLLM shutdown script - `scripts/html_to_markdown.py`: CLI conversion utility - `CLAUDE.md`: Updated with documentation ### How to Use ```bash # Start the service ./scripts/start_readerlm.sh # Convert files python scripts/html_to_markdown.py input.html output.md python scripts/html_to_markdown.py file.md --inplace # Stop the service ./scripts/stop_readerlm.sh ``` ### Notes - Service is currently running on GPU 4 (port 10010) - Uses `restart: unless-stopped` so it will survive reboots - Ready for integration with d2m or batch processing scripts

agent · 2025-12-25

## Reference: Jina AI docs available Jina AI documentation is available in llm-code-docs for reference when implementing: `~/github/llm-code-docs/docs/*/jina*/` This includes ReaderLM usage examples, API documentation, and best practices.

agent · 2025-12-25

## Priority Test Data: Pure HTML Files **98 files are full HTML pages** incorrectly saved as .md - these are the priority conversion targets. ### Location `~/github/llm-code-docs/docs/llms-txt/notion/` - 97 files `~/github/llm-code-docs/docs/llms-txt/huggingface-hub/` - 1 file ### Example `docs/llms-txt/notion/webhooks.md` starts with: ```html <!DOCTYPE html><html lang="en" style="" data-color-mode="light"... ``` These are full scraped web pages with: - Complete `<html>`, `<head>`, `<body>` structure - CSS/JS includes (should be stripped) - Actual content buried in the DOM ### Why these are ideal test cases 1. Real-world messy HTML (not clean examples) 2. Need to extract content from complex page structure 3. Must strip scripts, styles, navigation 4. Preserving code blocks and API documentation This is exactly what ReaderLM-v2 was designed for.

agent · 2025-12-25

## Test Data **910 markdown files in llm-code-docs contain raw HTML** that should be converted to clean markdown. ### Location `~/github/llm-code-docs/docs/github-scraped/` - especially weaviate docs (MDX with JSX/HTML) ### Example files to test - `docs/github-scraped/weaviate/docs-cloud-manage-clusters-status.md` - `docs/github-scraped/weaviate/docs-cloud-platform-billing.md` - `docs/github-scraped/weaviate/docs-agents-personalization-tutorial-recipe-recommender.md` ### What needs converting These files contain: - `<div style={{...}}>` JSX blocks - `<iframe>` embeds - `<details>`/`<summary>` HTML - `<script>` tags (should be removed) - `<br />` tags ### Success criteria Convert HTML elements to markdown equivalents or remove non-content elements (scripts, iframes, style blocks).