Python tool to extract abbreviations, acronyms, and function calls from text files.
AcronymHunter scours text files to identify abbreviations, acronyms, and function calls. It uses regular expressions to search the provided text and extract every span that matches its patterns.
- Recognizes and extracts abbreviations and acronyms, both standalone and those provided with their full forms.
- Pinpoints function calls written with empty parentheses, such as `calculate()`.
- Outputs the results to a designated text file.
For acronyms and abbreviations:
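- Regex: `((?:\b[A-Z][a-z]*\b\s*)+\((\b[A-Z]{2,}\b)\))|\b[A-Z]{2,}\b`
- How it works: the first alternative matches a run of capitalized words followed by an acronym of two or more capital letters in parentheses; the second matches any whole word made of two or more capital letters.
- Sample matches:
  - Full form with acronym: `United Nations (UN)`
  - Standalone acronym: `NATO`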
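The code of `find_abbreviations` in `AcronymHunter.py` is not reproduced here, so the following is a minimal Python sketch, not the script's actual implementation, of how the acronym pattern above can be applied. The helper name `extract_acronyms` is hypothetical. It uses `re.finditer` because `re.findall` returns the two capture groups as a tuple, and both groups are empty strings for standalone matches such as `NATO`.

```python
import re

# Pattern copied from the README: alternative 1 matches "Full Form (ACRONYM)",
# alternative 2 matches a standalone word of two or more capital letters.
ACRONYM_PATTERN = re.compile(
    r"((?:\b[A-Z][a-z]*\b\s*)+\((\b[A-Z]{2,}\b)\))|\b[A-Z]{2,}\b"
)


def extract_acronyms(text):
    """Hypothetical helper: return (full_form, acronym) pairs.

    full_form is None for standalone acronyms.
    """
    results = []
    for match in ACRONYM_PATTERN.finditer(text):
        if match.group(1):
            # Full-form match: group(2) is the acronym inside the parentheses,
            # and everything before the last "(" is the spelled-out form.
            whole = match.group(1)
            full_form = whole[: whole.rindex("(")].strip()
            results.append((full_form, match.group(2)))
        else:
            # Standalone match: no group participated, so use the whole match.
            results.append((None, match.group(0)))
    return results


print(extract_acronyms("United Nations (UN) and NATO met."))
# [('United Nations', 'UN'), (None, 'NATO')]
```

Because the full-form alternative accepts any run of capitalized words, a capitalized word at the start of a sentence is swept in: `The United Nations (UN)` yields the full form `The United Nations`.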
For function calls:
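- Regex: `\b(\w+)\(\)`
- Sample match: `calculate()`. The pattern requires empty parentheses immediately after the name, so a call with arguments such as `calculate(x)` is not matched.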
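A similar sketch for the function-call pattern, again with a hypothetical helper name rather than the script's own code. With a single capture group, `re.findall` returns just the captured names.

```python
import re

# Pattern copied from the README: a word immediately followed by "()".
FUNCTION_CALL_PATTERN = re.compile(r"\b(\w+)\(\)")


def extract_function_calls(text):
    """Hypothetical helper: return the names of argument-less calls."""
    return FUNCTION_CALL_PATTERN.findall(text)


print(extract_function_calls("Call calculate() then reset(), not update(x)."))
# ['calculate', 'reset']
```

Since `\w` also matches digits and underscores, names like `load_v2()` are captured whole, and a method call such as `obj.method()` yields `method`.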
- Clone the repository: `git clone https://github.com/Spraten/AcronymHunter.git`
- Navigate to the repository directory: `cd AcronymHunter`
- (Optional) If your source document is a password-protected PDF, remove the password with qpdf: `qpdf --password=enterpasswordhere --decrypt "InputFilename.pdf" "OutputFilename.pdf"`
- Convert the decrypted PDF to plain text: `pdftotext "OutputFilename.pdf" input.txt`
- Place the text file you want to analyze in the repository directory. By default, the script reads from a file named `input.txt`.
- Run the script: `python3 AcronymHunter.py`
- The findings (abbreviations, acronyms, and function calls) are written to `out.txt`.
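If you process PDFs regularly, the optional qpdf and pdftotext steps can be chained from Python. This is a sketch under the assumption that both tools are installed and on your `PATH`; the helper names and the intermediate filename `decrypted.pdf` are hypothetical, and the flags are the same ones used in the qpdf and pdftotext commands above.

```python
import subprocess


def pdf_to_input_commands(pdf_path, password,
                          decrypted_path="decrypted.pdf", text_path="input.txt"):
    """Hypothetical helper: build the qpdf and pdftotext commands as argument lists."""
    decrypt = ["qpdf", f"--password={password}", "--decrypt", pdf_path, decrypted_path]
    convert = ["pdftotext", decrypted_path, text_path]
    return [decrypt, convert]


def prepare_input(pdf_path, password):
    """Run both commands in order; check=True raises on the first failure."""
    for command in pdf_to_input_commands(pdf_path, password):
        subprocess.run(command, check=True)
```

For example, `prepare_input("InputFilename.pdf", "enterpasswordhere")` would leave `input.txt` ready for the script. Passing each command as a list of arguments, rather than a single shell string, avoids quoting problems with filenames that contain spaces.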
To target a different input file or change the output filename:
- Open `AcronymHunter.py` in a text editor or IDE of your choice.
- Change the filename in the `find_abbreviations("input.txt")` line to your intended input filename.
- (Optional) To change the output filename, modify the `io.open("out.txt", 'w', encoding='utf-8')` line.
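As an alternative to editing hard-coded filenames, the input and output paths could be taken as parameters. The sketch below is hypothetical: `hunt` and `main` are illustrative names, not functions in `AcronymHunter.py`, and it writes one match per line, which may differ from the script's actual output format.

```python
import argparse
import io
import re

# Patterns copied from the README.
ACRONYM_PATTERN = re.compile(r"((?:\b[A-Z][a-z]*\b\s*)+\((\b[A-Z]{2,}\b)\))|\b[A-Z]{2,}\b")
FUNCTION_CALL_PATTERN = re.compile(r"\b(\w+)\(\)")


def hunt(input_path="input.txt", output_path="out.txt"):
    """Hypothetical entry point: read input_path, write all matches to output_path."""
    with io.open(input_path, "r", encoding="utf-8") as infile:
        text = infile.read()
    # group(0) is the full match for either alternative of the acronym pattern.
    acronyms = [m.group(0) for m in ACRONYM_PATTERN.finditer(text)]
    calls = [m.group(0) for m in FUNCTION_CALL_PATTERN.finditer(text)]
    with io.open(output_path, "w", encoding="utf-8") as outfile:
        for item in acronyms + calls:
            outfile.write(item + "\n")
    return acronyms, calls


def main(argv=None):
    """Parse optional input/output paths; both fall back to the README defaults."""
    parser = argparse.ArgumentParser(description="Extract acronyms and function calls.")
    parser.add_argument("input", nargs="?", default="input.txt")
    parser.add_argument("output", nargs="?", default="out.txt")
    args = parser.parse_args(argv)
    return hunt(args.input, args.output)
```

Calling `main()` from an `if __name__ == "__main__":` guard would then allow, for example, `python3 your_script.py notes.txt results.txt`, while running it with no arguments keeps the `input.txt` and `out.txt` defaults.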
Your contributions are invaluable! If you notice room for improvement, discover bugs, or come up with potential fixes, please open an issue or submit a pull request.
This project is open-sourced and distributed under the MIT License.