While most AntWiki pages are written by individual, dedicated contributors, some pages, the "AntWiki Reports", contain information compiled and synthesised from data harvested from other AntWiki pages. These Report pages are updated periodically, usually after an editor has completed an extensive editing session.
The data used in these reports are harvested and assembled into CSV-style data files; the versions published on AntWiki are tab-delimited text. These files are available directly from AntWiki and include the following:
- List of valid genera (names in use) (tab-delimited text) (Date: 2023-03-05)
- List of invalid genera (names not in use) (tab-delimited text) (Date: 2023-03-05)
- List of valid species (names in use) (tab-delimited text) (Date: 2023-03-05)
- List of invalid species (names not in use) (tab-delimited text) (Date: 2023-03-05)
- Type specimen details (tab-delimited text) (Date: 2023-03-05)
- List of valid fossil genera (names in use) (tab-delimited text) (Date: 2022-12-16)
- List of valid fossil species (names in use) (tab-delimited text) (Date: 2023-03-05)
- World distribution:
  - Distribution based on World Distribution Data (AntWiki data; tab-delimited text) (Date: 2014-11-17)
  - Distribution based on Regional Taxon Lists (combined Regional Taxon Lists data; tab-delimited text) (Date: 2023-03-05)
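Because the files are tab-delimited rather than comma-separated, a reader has to set the delimiter explicitly. The following is a minimal Python sketch using the standard `csv` module. The column names (`genus`, `species`, `authority`) and the inline sample rows are hypothetical and stand in for a downloaded file, whose actual headers may differ. It also assumes that the files do not quote their fields.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded AntWiki file; the real
# files may use different column names and more columns.
SAMPLE = (
    "genus\tspecies\tauthority\n"
    "Formica\trufa\tLinnaeus, 1761\n"
    "Lasius\tniger\t(Linnaeus, 1758)\n"
)


def read_tab_delimited(stream):
    """Return each data row as a dict keyed by the header row."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits
    # plain tab-delimited exports (assumed here) that do not quote fields.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tab_delimited(io.StringIO(SAMPLE))
for row in rows:
    print(row["genus"], row["species"], row["authority"])
```

For a downloaded file, pass `open(path, newline="", encoding="utf-8")` in place of the `StringIO` sample. The UTF-8 encoding is an assumption.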
Data are harvested primarily from the templates on individual pages. Templates hold highly structured, and therefore predictable, data that can be parsed and interpreted easily and reliably. In a few cases data are parsed directly from the page text; this is done only where pages are relatively simple and are formatted consistently across all relevant pages.
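Template-based harvesting of this kind can be illustrated with a short Python sketch. It is not the AntWiki harvester itself. It extracts the name and parameters of simple, non-nested MediaWiki template calls such as `{{Taxon|genus=Lasius|species=niger}}`. The template name `Taxon` and its parameters are hypothetical.

```python
import re

# Matches one non-nested template call, e.g. {{Name|a=1|b=2}}.
# Limitations: nested templates and piped links such as [[A|B]] inside a
# parameter value are not handled.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*((?:\|[^{}]*)?)\}\}")


def parse_templates(wikitext):
    """Yield (name, params) for each simple template call in the wikitext."""
    for match in TEMPLATE_RE.finditer(wikitext):
        name, body = match.group(1), match.group(2)
        params = {}
        position = 0
        # body is either "" or starts with "|", so skip the empty first piece
        for part in body.split("|")[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip()] = value.strip()
            else:
                # Unnamed parameters are numbered 1, 2, ... as in MediaWiki
                position += 1
                params[str(position)] = part.strip()
        yield name, params


# Hypothetical page text containing one template call
page = """Some introductory prose.
{{Taxon|genus=Lasius|species=niger|status=valid}}
More prose."""

for name, params in parse_templates(page):
    print(name, params)
```

Working from the structured template parameters rather than the free-form prose is what makes this kind of extraction predictable.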
The resulting CSV files are imported into an SQLite relational database. A set of small C# modules, developed in Visual Studio, queries this database to generate the text of each Report. The generated text is then uploaded to AntWiki, replacing the existing pages with the latest information. The harvesting and compiling processes are automated and require minimal human intervention. While harvesting, the harvester also checks each page for errors and inconsistencies. When it finds one, it stops with an error message and does not continue until someone has corrected the error and restarted it.
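The checking, import and report-generation steps can be illustrated with a short Python sketch using the standard `sqlite3` module. It is not the AntWiki implementation, which the text describes as C# modules. The table layout, the `status` field, the `HarvestError` exception and the wiki-list output format are all hypothetical. Raising an exception stands in for the harvester stopping until a person fixes the page. The upload step is omitted.

```python
import sqlite3


class HarvestError(Exception):
    """Hypothetical error raised when a page holds inconsistent data."""


REQUIRED_FIELDS = ("page", "genus", "species", "status")


def check_record(record):
    """Raise HarvestError for missing fields or an unexpected status value."""
    page = record.get("page", "?")
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise HarvestError(f"{page}: missing {', '.join(missing)}")
    if record["status"] not in ("valid", "invalid"):
        raise HarvestError(f"{page}: unexpected status {record['status']!r}")


def build_database(records):
    """Check every record first, then load them into an in-memory table."""
    records = list(records)
    for record in records:
        check_record(record)  # halts at the first bad record
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE species (page TEXT, genus TEXT, species TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO species (page, genus, species, status) VALUES (?, ?, ?, ?)",
        [(r["page"], r["genus"], r["species"], r["status"]) for r in records],
    )
    conn.commit()
    return conn


def genus_report(conn):
    """Return wiki-style list lines: number of valid species per genus."""
    rows = conn.execute(
        "SELECT genus, COUNT(*) FROM species WHERE status = 'valid' "
        "GROUP BY genus ORDER BY genus"
    )
    return [f"* [[{genus}]]: {count} valid species" for genus, count in rows]


# Hypothetical harvested records
records = [
    {"page": "Lasius niger", "genus": "Lasius", "species": "niger", "status": "valid"},
    {"page": "Lasius flavus", "genus": "Lasius", "species": "flavus", "status": "valid"},
    {"page": "Formica rufa", "genus": "Formica", "species": "rufa", "status": "valid"},
]
conn = build_database(records)
print("\n".join(genus_report(conn)))
```

Checking every record before building the database means no report is generated from partially harvested data, which mirrors the stop-on-error behaviour described above.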
Because Report pages are updated automatically, any changes made directly to them will be overwritten at the next update. Corrections should instead be made to the source pages that hold the data, primarily species pages and regional taxon list pages, rather than to the Report page itself.