# Async HTTP Queue Fetch URLs
Lightweight, parallel HTTP fetching library for Emacs using `url-retrieve` with configurable concurrency limits.
## Why Use This Library?
While Emacs has several HTTP libraries, `async-http-queue-fetch-urls` fills a specific need: **high-level batch HTTP fetching with controlled concurrency**.
### Comparison with Existing Solutions
**Built-in `url-queue-retrieve`**
- ❌ Low-level API: Requires manual callback management per URL
- ❌ No batch processing: Must write your own loop and aggregation
- ❌ Global configuration: Uses global variables instead of per-call parameters
- ❌ Order not preserved: Results arrive in completion order, not request order
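For contrast, here is roughly what a small batch looks like with the built-in `url-queue-retrieve` (a sketch; identifiers such as `fetched-url` are illustrative):

```el
;; Fetching two URLs with the built-in url-queue-retrieve: every request
;; needs its own callback, results must be collected and re-ordered by
;; hand, and concurrency is set via the global url-queue-parallel-processes.
(require 'url-queue)

(let ((results (make-hash-table :test #'equal)))
  (dolist (url '("https://example.com/a" "https://example.com/b"))
    (url-queue-retrieve
     url
     (lambda (status fetched-url)
       (unless (plist-get status :error)
         (goto-char url-http-end-of-headers)
         (puthash fetched-url
                  (buffer-substring (point) (point-max))
                  results)))
     (list url))))
```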
**Third-party libraries ([plz.el](https://github.com/alphapapa/plz.el), [request.el](https://github.com/tkf/emacs-request))**
- ❌ Single-request focused: Designed for one URL at a time
- ❌ No built-in queuing: Manual implementation needed for batch operations
- 🟡 External dependencies: Some require an external curl binary (in exchange for better performance)
**This library (`async-http-queue-fetch-urls`)**
- ✅ High-level batch API: One function call for multiple URLs
- ✅ Order preservation: Results vector matches input URL order
- ✅ Per-call configuration: Keyword arguments instead of global state
- ✅ Configurable parser: JSON by default, customizable or raw text
- ✅ Progress tracking: Automatic messages for large batches
- ✅ No external dependencies: Only built-in `url-retrieve`
- ✅ Clean callback pattern: Single callback with all results
### When to Use This Library
Use `async-http-queue-fetch-urls` when you need to:
- Fetch multiple URLs in parallel (API endpoints, RSS feeds, web scraping)
- Control concurrency to avoid overwhelming servers
- Maintain result order corresponding to input URLs
- Get all results in a single callback with simple error handling
- Parse responses consistently (JSON, XML, or custom formats)
For single requests or curl-based performance, consider [plz.el](https://github.com/alphapapa/plz.el) or [request.el](https://github.com/tkf/emacs-request) instead.
## Features
- Parallel downloads with configurable concurrency (default: 5)
- Automatic timeout handling (default: 10 seconds)
- Custom parser support (default: `json-parse-buffer`)
- Progress tracking for large batches
- Error handling per request
- Maintains original URL order in results
## Requirements
Emacs 28.1 or later.
## Installation
### use-package with :vc (Emacs 29+)
```el
(use-package async-http-queue-fetch-urls
  :vc (:url "https://git.andros.dev/andros/async-http-queue-fetch-urls-el"
       :rev :newest))
```
### use-package with :load-path
```el
(use-package async-http-queue-fetch-urls
  :load-path "/path/to/async-http-queue-fetch-urls-el")
```
### Manual
Clone the repository and add to your `load-path`:
```bash
git clone https://git.andros.dev/andros/async-http-queue-fetch-urls-el.git
```
Then in your config:
```el
(add-to-list 'load-path "/path/to/async-http-queue-fetch-urls-el")
(require 'async-http-queue-fetch-urls)
```
## Usage
### Basic JSON API Example
```el
(async-http-queue-fetch-urls
 '("https://api.example.com/posts/1"
   "https://api.example.com/posts/2"
   "https://api.example.com/posts/3")
 :callback (lambda (results)
             (message "Got %d results" (length results))
             (seq-doseq (result results)
               (when result
                 ;; json-parse-buffer returns hash tables by default
                 (message "Title: %s" (gethash "title" result))))))
```
### Custom Concurrency and Timeout
```el
(async-http-queue-fetch-urls
 my-url-list
 :max-concurrent 10
 :timeout 20
 :callback (lambda (results)
             (message "Fetched %d URLs" (length results))))
```
### Raw Text Instead of JSON
```el
(async-http-queue-fetch-urls
 '("https://example.com/page1.html"
   "https://example.com/page2.html")
 :parser nil ; Return raw text
 :callback (lambda (results)
             (seq-doseq (html results)
               (when html
                 (message "Page length: %d chars" (length html))))))
```
### Custom Parser
```el
(async-http-queue-fetch-urls
 '("https://example.com/data.xml")
 :parser (lambda ()
           (libxml-parse-xml-region (point) (point-max)))
 :callback (lambda (results)
             (message "Parsed XML: %S" results)))
```
### Error Handling
```el
(async-http-queue-fetch-urls
 my-urls
 :callback (lambda (results)
             (let ((successful (seq-filter #'identity results)))
               (message "Successfully fetched %d/%d URLs"
                        (length successful)
                        (length results))))
 :error-callback (lambda (url)
                   (message "Failed to fetch: %s" url)))
```
## API
### async-http-queue-fetch-urls
```
(async-http-queue-fetch-urls URLS &key CALLBACK ERROR-CALLBACK MAX-CONCURRENT TIMEOUT PARSER)
```
Fetch URLS asynchronously in parallel and call CALLBACK with results.
**Parameters:**
- `URLS` - List of URL strings to fetch
- `:callback` - Function called with a vector of results once all requests complete. Failed requests are represented as `nil`
- `:error-callback` - Optional function called for each failed URL with the URL as argument
- `:max-concurrent` - Maximum number of parallel downloads (default: 5)
- `:timeout` - Maximum time in seconds per request (default: 10)
- `:parser` - Function to parse response bodies (default: `json-parse-buffer`). Set to `nil` for raw text
**Returns:** Immediately (non-blocking). Results are delivered via callback.
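Putting every keyword together, an illustrative call (the endpoint URLs are placeholders) might look like:

```el
(async-http-queue-fetch-urls
 '("https://api.example.com/a" "https://api.example.com/b")
 :max-concurrent 3
 :timeout 15
 :parser #'json-parse-buffer
 :callback (lambda (results)
             ;; Results arrive in input order; nil marks a failed request.
             (seq-do-indexed
              (lambda (result index)
                (if result
                    (message "Result %d: %S" index result)
                  (message "Request %d failed" index)))
              results))
 :error-callback (lambda (url)
                   (message "Error fetching %s" url)))
```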
## Performance
The library uses `url-retrieve` with controlled concurrency to avoid overwhelming servers or network connections. Default settings (5 concurrent requests) work well for most APIs.
For fast, reliable APIs, you can increase concurrency:
```el
:max-concurrent 10 ; or higher
```
For rate-limited APIs, decrease concurrency:
```el
:max-concurrent 2
```
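A complete call tuned for a rate-limited API could look like this (the endpoints are placeholders; low concurrency is paired with a longer timeout to tolerate slow responses):

```el
(async-http-queue-fetch-urls
 '("https://api.example.com/limited/1"
   "https://api.example.com/limited/2")
 :max-concurrent 2
 :timeout 30
 :callback (lambda (results)
             (message "Done: %d responses" (length results))))
```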
## Contributing
Contributions are welcome! Please see the [contribution guidelines](https://git.andros.dev/andros/contribute) for instructions on how to submit issues or pull requests.
## License
Copyright (C) 2025 Andros Fenollosa
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
See LICENSE file for details.