This repository contains the code for a web-based application form created for the Multilingual Corpus of the AQUILIGN (Multilingual Aligner and Collator) project.
The application was built with Streamlit to facilitate the structured collection, organization, and storage of textual data for historical and linguistic research.
The primary goal of this application is to streamline the collection of historical multilingual texts by allowing users to enter, organize, and store textual data in a structured format.
This tool ensures that all textual sources are well-documented, traceable, and accessible for further analysis. It provides an interface for submitting texts, viewing statistical insights, and managing stored records.
📌 Links:
- Application Form (Currently private)
- Compiled Data CSV
This dataset is intended for training machine learning models for text segmentation—a key task in breaking down texts into meaningful linguistic units. This work is crucial for improving text accessibility and analysis, particularly for historical documents.
The corpus primarily consists of prose texts from the 13th to 15th centuries, with the possibility of extending into the mid-16th century. Texts have been selected based on thematic diversity to create a rich dataset for model training and research purposes.
The application consists of four main pages:
- App – The main form for text submission.
- Display TXT – View and convert XML files to plain text.
- Stats – Visualize statistical insights about the uploaded texts.
- Texts – Manage and search all submitted texts.
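As a minimal sketch, the entry point (app.py in the directory layout below) only needs to configure the app; Streamlit builds the sidebar navigation automatically from the pages/ folder. The titles and wording in this snippet are illustrative, not the project's actual ones.

```python
# app.py – entry point; Streamlit adds sidebar navigation for every script in pages/.
import streamlit as st

# Page title and labels below are illustrative assumptions, not the project's actual wording.
st.set_page_config(page_title="Multilingual corpus form", layout="wide")
st.title("Multilingual corpus – text submission")
st.write("Use the sidebar to switch between the submission form, stats, and text management pages.")
```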
App – This page allows users to input detailed metadata about each text, including:
- Source information
- File uploads (TXT and/or XML)
- Mandatory fields (marked with *)
🚨 Note: Certain fields are required before submission to ensure completeness.
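A minimal sketch of how such a submission form with mandatory fields and file uploads can be written in Streamlit; the field names and validation below are illustrative assumptions, not the actual metadata schema used by the form.

```python
import streamlit as st

st.title("Submit a text")

with st.form("text_submission"):
    # Illustrative metadata fields; the real form defines its own schema.
    title = st.text_input("Title *")               # mandatory
    language = st.text_input("Language *")         # mandatory
    source = st.text_area("Source information")    # optional
    txt_file = st.file_uploader("TXT file", type="txt")
    xml_file = st.file_uploader("XML file", type="xml")
    submitted = st.form_submit_button("Submit")

if submitted:
    # Refuse the submission until every field marked with * is filled in.
    if not title or not language:
        st.error("Please fill in all fields marked with *.")
    else:
        st.success(f"'{title}' submitted.")
```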
Display TXT – This page lets users view and convert XML files into plain text format for easier readability and processing.
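One way such a conversion can be implemented with the standard library is sketched below; the xmls/ and txts/ folder names come from the repository layout, while the function itself is hypothetical (the actual logic lives in xmls.py and may differ).

```python
from pathlib import Path
from xml.etree import ElementTree

def xml_to_plain_text(xml_path: Path) -> str:
    """Strip all markup from an XML file and return only its textual content."""
    root = ElementTree.parse(xml_path).getroot()
    # itertext() walks the whole tree and yields the text of every node in document order.
    return " ".join(chunk.strip() for chunk in root.itertext() if chunk.strip())

# Convert every XML file in xmls/ and write the plain-text version to txts/.
for xml_file in Path("xmls").glob("*.xml"):
    Path("txts", xml_file.stem + ".txt").write_text(xml_to_plain_text(xml_file), encoding="utf-8")
```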
Stats – This page provides automatically generated statistics based on the submitted corpus and helps analyze trends, distribution, and text characteristics.
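The kind of aggregation shown on such a page can be produced with pandas and Streamlit, as in the sketch below; the data.csv path comes from the repository layout, but the column names used here ("language", "century") are assumptions about its schema.

```python
import pandas as pd
import streamlit as st

# Load the compiled corpus; "language" and "century" are assumed column names.
df = pd.read_csv("data.csv")

st.metric("Texts in the corpus", len(df))
st.bar_chart(df["language"].value_counts())  # how many texts per language
st.bar_chart(df["century"].value_counts())   # how many texts per century
```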
Texts – This page allows users to:
- View all submitted texts in a searchable format.
- Download texts as CSV for external use.
- Modify or delete entries as needed.
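A sketch of a searchable listing with a CSV export is shown below, assuming a "title" column in data.csv; editing and deleting entries, which the actual page also supports, is omitted here.

```python
import pandas as pd
import streamlit as st

df = pd.read_csv("data.csv")

# Case-insensitive search over an assumed "title" column.
query = st.text_input("Search")
if query:
    df = df[df["title"].str.contains(query, case=False, na=False)]

st.dataframe(df)
st.download_button(
    "Download as CSV",
    df.to_csv(index=False).encode("utf-8"),
    file_name="texts.csv",
    mime="text/csv",
)
```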
The texts within this corpus are released under the CC BY-NC-SA license. This allows:
✅ Adaptation, remixing, and further development
✅ Non-commercial use
✅ Proper attribution to original authors
✅ Sharing under the same licensing terms
For full details and source citations, refer to the "sources" and "corpus" columns in the compiled data CSV.
Contributions are welcome! Please fork this repository and submit a pull request for any enhancements or bug fixes.
For questions or suggestions, please open an issue in this repository.
- Clone the Repository
git clone https://github.com/carolisteia/mulada.git
cd mulada
- Install dependencies
pip install -r requirements.txt
- Run the Streamlit App
streamlit run app.py
- Access the Application: open your browser and navigate to http://localhost:8501
mulada/
├── backup/ # Backup files
├── data/ # Data files
├── pages/ # Streamlit pages
├── txts/ # Text files
├── utils/ # Utility scripts
├── xmls/ # XML files
├── .gitignore # Git ignore file
├── app.py # Main application script
├── data.csv # Compiled data CSV
├── options.py # Options configuration
├── requirements.txt # Python dependencies
└── xmls.py # XML processing script