Published on 14/06/2026 at 10:47 on LIGHTON

LightOn Expands OCR Model to Arabic with Targeted Training

LightOn has extended its document understanding model, LightOnOCR-2, to support Arabic. The adaptation relied on targeted fine-tuning with data from a synthetic generation pipeline. The dataset comprised 12,000 synthetic pages with reference transcriptions, underscoring the model's ability to handle the challenges of Arabic script.

Arabic OCR is challenging because the script is written right to left, its letters are cursively joined, and Arabic is underrepresented in datasets compared with Latin-script languages. The release aims to ease document processing for organizations in the Middle East, offering an enterprise-grade, open-source solution under the Apache 2.0 license.

Guides to the fine-tuning process are available on LightOn's Hugging Face space, making the approach accessible to users and broadening the model's potential applications. LightOnOCR-2 remains central to LightOn's self-service offering, LightOn Console, ensuring a consistent technological foundation.

R. H.

Copyright © 2026 FinanzWire, all rights of reproduction and representation reserved.
Disclaimer: although drawn from the best sources, the information and analyses published by FinanzWire are provided for information purposes only and in no way constitute an inducement to take a position on the financial markets.

Document Automation, Open Source, LightOnOCR-2, Arabic OCR, Targeted Training
