
🕷️ Turn complete documentation into clean Markdown for AI agents in minutes
Crawling documentation sites looks simple but gets complicated quickly: nested pages, repeated navigation links, inconsistent content. Olostep handles it with one API.
🔧 The stack:
pip install olostep python-dotenv tqdm
📜 The script in 3 steps:
- Configure the crawl: start URL, max depth, max pages, include/exclude rules
- Extract as Markdown: Olostep returns content already cleaned and structured
- Save locally: each page as a .md file ready for RAG or agents
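To make the filtering and saving steps concrete, here is a minimal sketch that deliberately leaves out the Olostep call itself. It assumes the crawl has already produced a list of pages shaped like `{"url": ..., "markdown": ...}`; that shape, and the helper names `is_allowed`, `url_to_filename` and `save_pages`, are illustrative assumptions and not part of the Olostep SDK.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from urllib.parse import urlparse


def is_allowed(url, include, exclude):
    """True if the URL path matches an include pattern and no exclude pattern."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def url_to_filename(url):
    """Map a page URL to a flat, filesystem-safe .md filename."""
    path = urlparse(url).path.strip("/")
    slug = path.replace("/", "_") or "index"
    safe = "".join(c if c.isalnum() or c in "-_." else "-" for c in slug)
    return f"{safe}.md"


def save_pages(pages, out_dir, include=("/*",), exclude=()):
    """Write each allowed page's Markdown into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:  # hypothetical shape: {"url": str, "markdown": str}
        if not is_allowed(page["url"], include, exclude):
            continue
        target = out / url_to_filename(page["url"])
        target.write_text(page["markdown"], encoding="utf-8")
        written.append(target)
    return written
```

Calling `save_pages(pages, "docs_md", include=("/docs/*",), exclude=("/docs/changelog/*",))` would keep only pages under `/docs/` and skip the changelog. Flattening the URL path into the filename keeps every page in one folder, which is usually enough for a small documentation set.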
⚡ Real speed: 50 pages with depth 5 → ~50 seconds
🆚 Why not Scrapy or Selenium?
- Scrapy is a full framework and requires a lot of setup
- Selenium is for browser automation, not documentation crawling
- Olostep: search + crawl + scrape + structure in one API, with LLM-friendly output
🎛️ Bonus: The article includes a Gradio app to crawl without touching code.
💡 Explanation in a nutshell
An AI agent is only as good as the context it receives. To give it access to complete documentation (like Claude’s or FastAPI’s docs), you first need to convert those pages into clean text. Olostep automates that process: give it a URL and it returns the content ready to feed your RAG system.
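As a rough illustration of what "ready to feed your RAG system" can mean in practice, the sketch below splits one page of Markdown into heading-based chunks that could then be embedded or indexed. The function name `split_by_headings` and the chunking strategy are assumptions chosen for illustration; the original article may prepare its content differently.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def split_by_headings(markdown):
    """Split Markdown into chunks, starting a new chunk at each heading line."""
    chunks, current = [], []
    in_fence = False
    for line in markdown.splitlines():
        # Track fenced code blocks so "# comment" lines in code are not headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        is_heading = not in_fence and re.match(r"#{1,6}\s", line)
        if is_heading and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Splitting at headings keeps each chunk aligned with one topic of the documentation, which tends to suit retrieval better than fixed-size slices that cut through sections.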
More information at the link 👇
Also published on LinkedIn.

