Machine Learning and Exploratory Data Analysis

Machine learning is informally defined as the ability of tasking computers to perform certain jobs without being explicitly programmed to do so.

For example, a self-driving car, while being a case study in the application of machine learning and AI is at no point during its production explicitly programmed to drive. To the unsuspecting autonomous vehicle, the road is just a snapshot of greyscale images and sensor data. The vehicle predicts the best response to the input data based on tons of similar data that was used to train the vehicle. Since this consisted mostly of pictures of the road and steering angles (massive oversimplification), the vehicle is doing little more than predicting the same. The predictions are sent to the controls of the car which then runs as instructed.

From the above example, we may establish that Data plays a pivotal role in the implementation of Machine Learning. It is, in fact, the bread and butter of ML & AI applications. More data and better quality data help us make more accurate models and the information der 68BF ived from their performance scientist to develop new algorithms that help us model our systems still closer to their real world counterparts.

However, this ideal set of data, as all ideal systems, does not exist in the real world. Actual data is prone to errors, missing data points, mislabelling and incorrect formatting usually served with a plethora of human and systematic errors made during collection. Sometimes there's just not enough data.

Even there is a sufficient volume of data, it may be prone to presence of useless and leaky features that may cause our models to show erratic performance. Thus it is very important we determine what features to use and how to format the said features so that the computational cost of pre-processing and training models is reduced without compromising with model performance. This set of steps are referred to as Exploratory Data Analysis and Pre-processing

Data, what ails thee?

There could be a lot that could go wrong in a dataset. From useless features to target leakage. Sometimes its not just the errors in data but the format of data itself, take non-numeric values for example. These values often referred to as categorical values are human-readable and can't be processed by a computer without pre-processing.

In this repository, you will find the basics and examples of Data exploration and pre-processing of data. These examples cover the most of the basic cases and techniques.

Exploring the data

In this repository, we use a python library seaborn to visualize data. Seaborn is a handy python library, uses to plot beautiful and informative relations between data.

Check out the Data visualization notebooks here for examples and quickies.

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
input		input
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
URLs.md		URLs.md
URLs_models.md		URLs_models.md
_config.yml		_config.yml
categorical_data.ipynb		categorical_data.ipynb
cross_validation.ipynb		cross_validation.ipynb
data_leakge.ipynb		data_leakge.ipynb
data_vis_1.ipynb		data_vis_1.ipynb
data_vis_2.ipynb		data_vis_2.ipynb
feature_engineering_1_BaselineModel.ipynb		feature_engineering_1_BaselineModel.ipynb
feature_engineering_2_categoricalEncodings.ipynb		feature_engineering_2_categoricalEncodings.ipynb
iowa_housing.ipynb		iowa_housing.ipynb
melbourne_submissions.csv		melbourne_submissions.csv
missing_data_melb_housing_data.ipynb		missing_data_melb_housing_data.ipynb
pipelines.ipynb		pipelines.ipynb
resnet50_dog_breed_identification_demo.ipynb		resnet50_dog_breed_identification_demo.ipynb
xgboost.ipynb		xgboost.ipynb
xgboost_home_data.ipynb		xgboost_home_data.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Machine Learning and Exploratory Data Analysis

Data, what ails thee?

Exploring the data

Pre-processing the data

About

Uh oh!

Releases

Packages

Languages

License

rakshitraj/kaggle

Folders and files

Latest commit

History

Repository files navigation

Machine Learning and Exploratory Data Analysis

Data, what ails thee?

Exploring the data

Pre-processing the data

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages