Similar to the caregiver allowance claim in social welfare law, employment law claims concern the rights of individuals who often do not have time to deal with the legal details of their circumstances. Moreover, in employment law, there is a large power imbalance between employer and employee. For this reason, it is even more important that employees and legal practitioners in the field of labor law are well aware of the legal provisions.
Unfortunately, laws are often written in a way that is hard to understand, so that the addressees of a norm often struggle to grasp its full meaning even when reading it carefully. Even well-trained lawyers can find it difficult to interpret norms correctly.
One major problem, for example, is that legal texts are not written in a standardized way. A deliberately complicated and convoluted form of expression is used, which ultimately obscures the meaning.
This form of lawmaking practice is problematic. For example, § 8 (2) Z 2 of the Collective Agreement for Guarding Bodies in the Guarding Industry states: "Working hours that exceed the limits of the maximum permissible daily working time or weekly working time (60 hours) shall be remunerated as overtime with an overtime surcharge of 50% [translated]". This clause must be a drafting mistake, since, read literally, it means that employees only work overtime once they exceed the legally permitted number of hours.
Thus, it becomes apparent that a purely textual representation of legal information is not optimal, not only for the application of law, but also for lawmaking. For this reason, this project is working on the transformation of collective agreements into a knowledge graph. Collective agreements were chosen because they contain similar information (for example, whether special payments are due or when the agreement comes into force).
There are already some systems specifically for working with collective agreements. However, these only support simple text searches. Although these are helpful, they do not save the user from having to look through the entire collective agreement.
Therefore, the goal of this project is to build a simple web application that allows users to browse the knowledge graph in a standardized way and view certain information about collective agreements without requiring the user to go through the entire collective agreement.
All commands listed below are expected to be run from the root directory of this project. Conda is assumed to be used as a package manager.
conda create -n LOntoCA python=3.10
conda activate LOntoCA
pip install -e .
The following command can take a while since it downloads all collective agreements from this page. If you have already run this command and all the collective agreements of interest are in the ./data/html/ directory, you can comment out line 8 in ./data/create_data.py to speed up execution significantly.
python data/create_data.py
After running this command, the parsed data can be found in ./data/final_csv/ as .csv files.
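To sanity-check the parsed output before building the knowledge graph, the first few rows of each file can be inspected. The following is only a minimal sketch: the function name `preview_csv_files` is hypothetical, and it assumes the files are UTF-8 encoded, comma-separated, and start with a header row, none of which is stated above.

```python
import csv
from pathlib import Path


def preview_csv_files(directory="./data/final_csv", limit=3):
    """Return the first `limit` rows of every .csv file in `directory`.

    Assumption: each file is UTF-8, comma-separated, and has a header row.
    """
    previews = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # zip with range stops after `limit` rows without reading the rest
            previews[path.name] = [row for _, row in zip(range(limit), reader)]
    return previews


if __name__ == "__main__":
    for name, rows in preview_csv_files().items():
        print(name, rows)
```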
This command uses the data from above and transforms it into a knowledge graph based on the Legal Ontology for Austrian Collective Agreements (LOntoCA).
python scripts/populate_kg.py
As a result, a new file kg.ttl will be created in the root directory. This file is the exported knowledge graph in Turtle format. This file can also be found online.
The application assumes that the SPARQL endpoint http://localhost:3030/CollectiveAgreement/sparql exists. This endpoint has to accept a POST request containing the query and return the result rows as JSON.
This can be easily achieved using Apache Jena Fuseki. Simply deploy this SPARQL server, create a dataset with the name CollectiveAgreement, and add the data from kg.ttl.
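To check that the endpoint behaves as described, a query can be sent by hand. The sketch below is not the application's own client code; it assumes the standard SPARQL 1.1 Protocol form of a POST request (a form-encoded `query` parameter) and the standard SPARQL JSON results format. The helper names `build_request`, `parse_rows`, and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in this README
ENDPOINT = "http://localhost:3030/CollectiveAgreement/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a POST request carrying the query as a form-encoded `query` parameter."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def parse_rows(payload):
    """Flatten a SPARQL JSON results document into one dict per result row."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


def run_query(query, endpoint=ENDPOINT):
    """Send the query to the endpoint and return the result rows."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return parse_rows(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running SPARQL server with the CollectiveAgreement dataset loaded
    print(run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5"))
```

If this returns rows, the dataset is loaded and the endpoint should be usable by the demo application.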
As soon as the SPARQL server is running, the demo web application can be started with
python application/app.py
The application can then be accessed at http://localhost:5000.
Screenshots of every page are included in ./demo-screenshots/.
This page lists all collective agreements that are in the knowledge graph (currently 241).
A user can search for collective agreements by entering a search string in the input field at the top.
This page displays automatically extracted information for one contract. At the top, the title, the date of coming into effect, and the contract parties are listed. Then, if a contract demands anniversary bonus pay ("Jubiläumsgeld"), the respective clauses are displayed. The same applies to bonus pay ("13. und 14. Monatsgehalt") clauses and clauses that determine the legal regular working hours ("Normalarbeitszeit") and the period over which the working hours are averaged ("Durchrechnungszeitraum").
For clauses concerning working hours, the application extracts values, which are displayed in a table next to the respective clause. Since the application cannot yet understand the complex rules spread over multiple clauses, a user has to read these clauses to fully understand which value applies when. Furthermore, at the bottom, there is a full list of the clauses of this contract. For this application, a clause is not the same as a paragraph. The application splits the contract into the smallest semantically complete units possible (this is done when parsing the documents).
The screenshot below shows what the page looks like when a contract does not demand anniversary bonus pay. A user can see that instantly and does not have to go through the whole contract.
This page lists all entities that have signed at least one collective agreement.
A user can search for contract parties by entering a search string in the input field at the top.
This page displays which collective agreements were signed by one contract party.
Since this is just a proof-of-concept, the final application is very basic. The knowledge graph could be extended by adding more attributes to the extraction process.
The final goal would be to extend the ontology to also represent the legal rules themselves and to extract these rules from the clauses, which would then allow for automatic decision-making.
- Since the original data is not standardized, there are some problems with the parsing mechanism. The HTML tags text_gr_dist and text_grtit are not handled yet, which leads to an incorrect and ugly representation of clauses that include these elements. Also, some contract titles are not extracted correctly.
- Another imperfection is that the manually added sample instances are never deleted from the knowledge graph.
- Furthermore, some collective agreements do not seem to be picked up by the download script (for example this one).
- The current application is prone to SPARQL injection.
- Some contracts have the same title, which makes it impossible for a user to distinguish between them.
- The current design is extremely basic and not practical.
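Regarding the SPARQL injection issue listed above, one common mitigation is to escape user input before embedding it in a query as a string literal. The following is a minimal sketch of that idea, not the application's actual query code: the helper names are hypothetical, and the query deliberately uses a variable `?titleProperty` as a placeholder because the real LOntoCA property names are not given here.

```python
# Characters that must be escaped inside a double-quoted SPARQL string literal
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_string_literal(value):
    """Return `value` as a quoted, escaped SPARQL string literal."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in value)
    return f'"{escaped}"'


def build_title_search(search):
    """Build a case-insensitive title search query (placeholder vocabulary)."""
    literal = sparql_string_literal(search)
    return (
        "SELECT ?agreement ?title WHERE { "
        "?agreement ?titleProperty ?title . "  # ?titleProperty is a placeholder
        f"FILTER(CONTAINS(LCASE(STR(?title)), LCASE({literal}))) "
        "}"
    )


if __name__ == "__main__":
    # The quote in the input is escaped, so it cannot close the literal early
    print(build_title_search('Handel") } #'))
```

Escaping only covers values used as string literals; input used in other positions, such as IRIs, would need separate validation.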
The knowledge graph and extracted information generated by this project are provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of correctness, accuracy, reliability, or fitness for a particular purpose. The author of this project does not guarantee the correctness of the resulting knowledge graph or the extracted information, nor does the author assume any liability for the use or interpretation of the information provided. The user assumes all responsibility and risk associated with the use of this project.