Datasets

Open datasets

Machine-readable snapshots of the content we think is most useful to AI assistants, to web crawls such as Common Crawl, and for downstream reuse. All datasets are published under the Creative Commons Attribution 4.0 International license (CC BY 4.0); please credit "Azumuta — azumuta.com" when reusing them.