Computer systems face an enormous number of attacks every day, yet there is no universal, free, and openly accessible database of those attacks. This project is a dataset composed of actual network attacks. The dataset grows and is updated automatically every hour, and anyone can freely use or contribute to it. The project's aim is to assist system administrators and developers who want to learn more about attacks on their systems.
Project Motto

This project was created by cool people for cool people, because it's not cool when people try to break into your system for no good reason. Not one iota!

The heart behind this project

I hope this project inspires others to participate, based on the idea that the collected information will help spawn a whole host of applications in the realm of cyber security, making the Internet a better, safer place, with consequences enforced by anyone willing to contribute to weeding these attackers out.

Disclaimer

Beyond knowing Python, so you can read the code and trust it, you need super-user access to a system, preferably your own. This project aims at the exact opposite of being malicious, so please do not attempt to contribute using someone else's system without their consent and full understanding of what it involves. Keep in mind that the contributed material should not, and will not, contain any personally identifying information about your server, e.g. your usernames or your IP address.
There is a project on GitHub which gives you your own SSH server without giving you your own SSH server. Huh? It's a honeypot! You can use this honeypot to contribute to this project by simply submitting the honeypot log files, and I'll do the rest of the processing to get more data on the attacks and merge it into the master repo. You could also do the processing yourself with the script provided in this repo. Up to you. Anyhow, here is the link to the project so you can read all about it. Really cool!
Debian Based
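On Debian-based distros, geoiplookup ships in the `geoip-bin` package (a typical install command; adjust if your distro packages it differently):

```bash
sudo apt install geoip-bin
```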
Arch Based
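On Arch-based distros, the `geoip` package provides geoiplookup (package name assumed from the Arch repos):

```bash
sudo pacman -S geoip
```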
On RHEL/CentOS/Rocky Linux and friends, there's a small difference. You'll need to install the Extra Packages for Enterprise Linux (EPEL) repository first, then use DNF to install geoiplookup:
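A typical sequence (the `GeoIP` package name is what EPEL uses; on RHEL proper, enabling EPEL may require installing the epel-release RPM from its URL instead):

```bash
sudo dnf install epel-release
sudo dnf install GeoIP
```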
Now that you have cloned the repo, there are a few ways you can help the project. I am not a very rigid person, so I'll just explain the general idea of how to help and you can use your own creative ways to assist. I will say that the format of your data matters: a consistent format keeps this project automated so no one has to do the same work more than once or twice. Suppose you're using the lastb command like this:
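A likely form of that command (shown as an assumption): the `-F` flag prints full timestamps, including seconds and the year, which matches the columns the scripts extract, and sudo is needed because lastb reads /var/log/btmp.

```bash
sudo lastb -F
```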
The terminal output should be in this format:
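For illustration, a single failed-login line in that format might look like this (values borrowed from the sample table row further below; your output will differ):

```
wangyon  ssh:notty    165.22.62.225    Fri Mar 10 15:19:40 2023 - Fri Mar 10 15:19:40 2023  (00:00)
```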
Send the output of the command to a file
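For example, redirecting into the nottys file that the later steps build on (the filename comes from this README; where you keep it at this stage is up to you):

```bash
sudo lastb -F > nottys
```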
Please save this file inside of /repo_dir/contributors/yourfilename. This format is perfect for the setup I've already constructed: if you output this to a file as-is, we can add it to the master repo. If you can manage to contribute this but don't want to do any of the geographic processing, no problem; I can take care of that part. The included scripts in the repo take data in this format, put it into a pandas dataframe, and add two more columns. After processing, the dataframe ends up looking like the following:
| User_Name | IP_Addresses | Day | Month | D_of_M | Time_UTC | Year | LATLNG | Country |
|---|---|---|---|---|---|---|---|---|
| wangyon | 165.22.62.225 | Fri | Mar | 10 | 15:19:40 | 2023 | 1.292900, 103.854698 | SG, Singapore |
Note: Please create or modify the filter in the script so as to exclude any of your server's personal information. In fact, make sure that no details of your personal server are accidentally included; see the filter section of the do_many.py script for more info. After creating your initial lastb log file, keep in mind:
The idea now is to keep adding new entries to your nottys file. We want to use as little of our machine's processing power as possible, so we don't want to keep running the lastb command and capturing its entire output; we only want to do that once. With this in mind, let's walk through how I personally accomplish this:
Quick reference, the crontab format: Min | Hour | Day_of_Month | Month | Day_of_Week | COMMAND. (Standard cron has no year field, so an entry like `0 0 30 3 * some_command` fires at midnight on March 30 of every year; delete it after the first run if you only want it once.) This is the line in my crontab:
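A reconstruction of the job described just below (the path is a placeholder): at minute 0 of every hour, append the 300 newest failed-login entries to the nottys file. This belongs in root's crontab (`sudo crontab -e`), since lastb reads /var/log/btmp.

```bash
0 * * * * lastb -F -n 300 >> /repo_dir/contributors/nottys
```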
The above cronjob means that at the first minute of every hour, the newest 300 entries from the lastb server log are appended to the tail of the nottys file. At this point you have done enough to contribute, and you can stop here; your next steps are located in the development portion of this README. You can also choose to run the scripts I created, which greatly help in cleaning and organizing the data. I won't hold you to it and understand either way. First, modify the Python scripts to work specifically for your server: update the folder locations and change the filters to exclude any unwanted personal server data. Then you can either run the do_once.py file in a terminal
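(Invocation assumed; run it from the repo directory so any relative paths in the script resolve.)

```bash
python3 do_once.py
```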
Or
Make a non-sudo cronjob and then delete it after it has run once.
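An example one-shot entry for your user crontab (`crontab -e`); the date, time, and path are placeholders, and the entry should be deleted once it has fired:

```bash
30 14 30 3 * python3 /repo_dir/do_once.py
```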
You can check the contents of the progress file, which gets created automatically, to ensure everything finished without issues.
Running the above command organizes the bulk of the nottys file. You only need to do this once, because the next cronjob you set up will do the same thing once an hour, but for far fewer entries, simply adding them to the larger original file. Keep in mind you already have one sudo cronjob running, so give that job a few minutes to complete before your next non-sudo cronjob starts.
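An assumed hourly non-sudo entry running the incremental script named earlier in this README (do_many.py), offset a few minutes after the root lastb job so the new entries are already in place:

```bash
5 * * * * python3 /repo_dir/do_many.py
```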
- Cyber Security
- Personal Projects
- Blockchain Technology
- Data Science
Want to contribute? Great!
To fix a bug or enhance an existing module, follow these steps:
- Fork the repo
- Create a new branch (`git checkout -b added-features`)
- Make the appropriate changes in the files
- Add changes to reflect the changes made
- Commit your changes (`git commit -am 'Added features'`)
- Push to the branch (`git push origin added-features`)
- Create a Pull Request
If you find a bug in this code, please let me know.
- Python - For its flexibility and abundance of resources
- Numpy - For computation and updating specific feature data
- Pandas - For organization, processing, data visualization, and CSV and JSON file output
- GeoIP - For obtaining geographic data on each individual attacker
- Create a CLI UI so collaborators can more easily run an analysis on the data
- Create an insights section in this README discussing findings from an analysis of the dataset
- Create a section discussing SSH keys and how to use them to automate the collaborative process
- Create a section explaining how to automate git commits and pushes