GitHub - clarissescofield/lectio: An application for data analysis and natural language processing of literature in Portuguese

License

MIT

lectio

Introduction

The Lectio app, available online, was developed to facilitate sharing the research from this project and to make the literature database available for download by the academic community.

The platform was developed in sprints, following a Scrum methodology focused on efficient development and visible results. The site is divided into two main sections, Dataset and NLP, and was built in JavaScript with the React framework. The design is based on a free template, so that the short development time could be spent on delivering the research and data analysis results.

Available online

Technologies

  • Python (scikit-learn, machine learning, data analysis)
  • React / JavaScript
  • Heroku App
  • JSON, CSV (database)

Features

Dataset

The dataset page is subdivided into Docs, Download, Table, and Graph.

Docs is a detailed description of the dataset: how it was built, details of the data provided, and other useful information for users interested in the database.

Download lets you download the database in CSV or SQL dump format.

Table is the visualization of two main tables of the database: goodreads_works.csv and goodreads_reads_infos.csv. The preview guides those who want to understand the data format before downloading.
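A preview like the one the Table page offers can be sketched locally with Python's standard `csv` module. This is only an illustration: the column names and rows in `SAMPLE_CSV` are hypothetical placeholders, since this README does not list the real columns of goodreads_works.csv, and `preview_csv` is a helper name made up for the example.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for goodreads_works.csv. The real column names and
# values are not documented in this README; these are placeholders.
SAMPLE_CSV = """work_id,title,author,avg_rating
1,Title A,Author A,4.0
2,Title B,Author B,3.5
3,Title C,Author C,4.5
"""


def preview_csv(text_stream, n_rows=2):
    """Return the header and the first n_rows data rows of a CSV stream."""
    reader = csv.DictReader(text_stream)
    rows = list(islice(reader, n_rows))  # read at most n_rows rows
    return reader.fieldnames, rows


header, rows = preview_csv(io.StringIO(SAMPLE_CSV), n_rows=2)
print(header)
for row in rows:
    print(row)
```

To preview a downloaded file instead, the same helper could be called on `open("goodreads_works.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded and has a header row.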

Graph presents the studies carried out on the data, along with insights from those studies that can guide further research.

NLP

The NLP page is geared towards natural language processing and is subdivided into Docs, Sentiment Analysis, Book Hits, and Coming Soon.

Docs gives a detailed description of the NLP research carried out and of how the sentiment analysis of the reviews was built.

Sentiment Analysis presents the results of processing the reviews with the VADER tool and, for comparison, the TextBlob tool, along with guidelines for future studies in the area.
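One simple way to compare two sentiment tools is to turn each tool's score into a label and measure how often the labels agree. The sketch below is not the project's actual method. It assumes the scores were already computed, relies on the fact that VADER's compound score and TextBlob's polarity both range from -1 to 1, and borrows VADER's conventional ±0.05 cutoff for both tools, which is an assumption for TextBlob. The score lists are made up for illustration and are not results from the Lectio dataset.

```python
def label(score, threshold=0.05):
    """Map a score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def agreement_rate(scores_a, scores_b, threshold=0.05):
    """Fraction of reviews on which two tools assign the same label."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same reviews")
    if not scores_a:
        return 0.0
    matches = sum(
        label(a, threshold) == label(b, threshold)
        for a, b in zip(scores_a, scores_b)
    )
    return matches / len(scores_a)


# Made-up scores for four reviews (illustrative only).
vader_scores = [0.80, -0.60, 0.02, 0.40]
textblob_scores = [0.50, -0.10, 0.00, -0.20]

print(agreement_rate(vader_scores, textblob_scores))  # 0.75
```

Comparing labels rather than raw scores avoids treating the two scales as directly interchangeable, at the cost of depending on the chosen threshold.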

Book Hits (success in books) is a detailed description of the different views of success in the literary field, collected from several articles on natural language processing for success prediction. These concepts can guide future studies on success prediction.

Coming Soon lists the next research steps and the new features planned for the platform as this work continues in POCII.

Build Process

Development took place in two main areas. The first focused on producing the database by collecting information on literature in Portuguese. The second applied natural language processing to generate insights and data science for comparative data analysis. Finally, the Lectio platform was developed and put into production with the help of the free tool Heroku.

Acknowledgments

Thanks to the Federal University of Minas Gerais and to the supervisor, Mirella Moro, who made the development of this work possible.
