Detailed information

Software name

swissgeol-assets-dataextraction

Short description

Classification pipeline for categorising PDF pages from geological reports into document classes

Documentation

A pipeline that classifies each page of a geological PDF file into a category such as "map" or "borehole profile" and extracts relevant metadata from the file. The output can be used to automatically generate a table of contents for each document, which is how the pipeline is used for the geological documents published on the web application assets.swissgeol.ch. The repository supports two classification approaches: a feature-based XGBoost model (the default) and Pixtral Large via Amazon Bedrock.
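As a rough illustration of how per-page classes could be turned into a table of contents, the sketch below merges runs of consecutive pages that share a class into one entry. It does not use the repository's code or output format: the `(page_number, class)` tuples, the `build_toc` function, the dictionary keys, and the "text" class are all assumptions made for this example. Only "map" and "borehole profile" are class names taken from the description above.

```python
from itertools import groupby

# Hypothetical per-page predictions as (page_number, predicted_class).
# The real pipeline's output format may differ; "text" is an assumed class.
page_classes = [
    (1, "text"),
    (2, "text"),
    (3, "map"),
    (4, "borehole profile"),
    (5, "borehole profile"),
    (6, "text"),
]


def build_toc(pages):
    """Merge runs of consecutive pages with the same class into TOC entries."""
    toc = []
    # Sort by page number so that groupby only merges adjacent pages.
    ordered = sorted(pages, key=lambda page: page[0])
    for label, run in groupby(ordered, key=lambda page: page[1]):
        run = list(run)
        toc.append(
            {
                "class": label,
                "first_page": run[0][0],
                "last_page": run[-1][0],
            }
        )
    return toc


if __name__ == "__main__":
    for entry in build_toc(page_classes):
        print(f'{entry["class"]}: pages {entry["first_page"]}-{entry["last_page"]}')
```

Grouping only adjacent pages, rather than all pages of a class, keeps the entries in reading order, so a class that reappears later in the document gets its own entry.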

Software version

License

AGPL-3.0-only

publiccode.yml version

0.5.0