rifle through the pockets of an upstream art station to retrieve the optimal image quality available.
arcayr e6433906a8
update readme with scraper details
bluesky doesn't use regex really, so the implementation detail is moved to each section.
2025-11-12 18:45:22 +11:00
src add bluesky upstream 2025-11-12 18:32:30 +11:00
.gitignore initial commit 2025-11-10 21:18:10 +11:00
Cargo.toml rename scrapeengine trait functions 2025-11-12 18:28:18 +11:00
env.example add env.example 2025-11-12 18:35:55 +11:00
README.md update readme with scraper details 2025-11-12 18:45:22 +11:00

pickpocket

rifle through the pockets of an upstream art station to retrieve the optimal image quality available.

installation

git clone the repository, copy env.example to .env and edit it (or set the same environment variables), then cargo run --release.

usage

deviantart

given a standard deviantart post url (e.g., https://www.deviantart.com/username/art/art-id-123123123123123), pickpocket retrieves the primary image from the post's srcset, which lists the same image at several sizes.

this scraper uses regex, which is brittle, but ultimately no worse than trying to parse the xml doctree when it comes to deviantart.
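to illustrate what "picking from a srcset" means, here is a minimal sketch that chooses the widest candidate from a srcset string. it is not pickpocket's actual code: it uses only the standard library rather than regex, the names Candidate, parse_srcset and largest_candidate are hypothetical, and it assumes width descriptors (e.g. 1200w) rather than density descriptors (e.g. 2x).

```rust
/// one srcset candidate: a url plus its width descriptor in pixels.
/// (hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Candidate {
    url: String,
    width: u32,
}

/// parse a srcset attribute value into its width-described candidates.
/// tokens are split on whitespace rather than on commas, because image
/// urls can contain commas themselves.
fn parse_srcset(srcset: &str) -> Vec<Candidate> {
    let mut candidates = Vec::new();
    let mut tokens = srcset.split_whitespace();
    while let Some(url) = tokens.next() {
        // a url token that ends in a comma carries no descriptor; skip it.
        if url.ends_with(',') {
            continue;
        }
        // the descriptor follows the url, with an optional trailing comma.
        let descriptor = match tokens.next() {
            Some(d) => d.trim_end_matches(','),
            None => break,
        };
        // keep only width descriptors such as "1200w"; anything else is ignored.
        let width = descriptor
            .strip_suffix('w')
            .and_then(|n| n.parse::<u32>().ok());
        if let Some(width) = width {
            candidates.push(Candidate {
                url: url.to_string(),
                width,
            });
        }
    }
    candidates
}

/// return the url of the widest candidate, if there is one.
fn largest_candidate(srcset: &str) -> Option<String> {
    parse_srcset(srcset)
        .into_iter()
        .max_by_key(|c| c.width)
        .map(|c| c.url)
}
```

calling largest_candidate on a srcset with 300w, 1200w and 600w entries returns the url attached to 1200w; an empty or descriptor-less srcset returns None.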

bluesky

given a standard bluesky post url (e.g., https://bsky.app/profile/username/post/post-id), pickpocket retrieves all available images in the post.

this scraper uses a combination of json parsing and regex. it uses the official bsky apis, so it is unlikely to break.
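as a rough illustration of the url handling step, the sketch below turns a bsky.app post url into the at:// uri that bluesky api endpoints such as app.bsky.feed.getPostThread take as their uri parameter. which endpoint pickpocket actually calls is not stated here, so treat that as an assumption; PostRef, parse_post_url and at_uri are hypothetical names, and the sketch uses plain string handling from the standard library instead of regex.

```rust
/// the two pieces of a bsky.app post url needed to address the post.
/// (hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct PostRef {
    /// handle or did, taken from the /profile/ segment.
    actor: String,
    /// record key, taken from the /post/ segment.
    rkey: String,
}

/// parse https://bsky.app/profile/<actor>/post/<rkey> into its parts.
/// returns None for anything that does not match that shape.
fn parse_post_url(url: &str) -> Option<PostRef> {
    let rest = url.strip_prefix("https://bsky.app/profile/")?;
    // drop any query string or fragment.
    let rest = rest.split(|c: char| c == '?' || c == '#').next()?;
    let mut parts = rest.trim_end_matches('/').split('/');
    let actor = parts.next()?;
    if parts.next()? != "post" {
        return None;
    }
    let rkey = parts.next()?;
    // reject empty segments and any trailing path components.
    if actor.is_empty() || rkey.is_empty() || parts.next().is_some() {
        return None;
    }
    Some(PostRef {
        actor: actor.to_string(),
        rkey: rkey.to_string(),
    })
}

/// build the at:// uri for the post record.
fn at_uri(post: &PostRef) -> String {
    format!("at://{}/app.bsky.feed.post/{}", post.actor, post.rkey)
}
```

for the example url above, parse_post_url yields actor "username" and rkey "post-id", and at_uri gives at://username/app.bsky.feed.post/post-id. the images would then come from the json the api returns for that uri.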

troubleshooting

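pickpocket uses tracing to provide deeper debugging information if required. simply run with the --debug flag. cargo run has no --debug flag of its own, so the flag has to go after a -- separator to reach pickpocket: cargo run -- --debug.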

license

agpl v3.0

credits