Data Exploration
Lessons: 5
Videos: 2
Skill Level: All
Duration: 4 Hours
Language: English
Set Up & Introduction
If you don’t have your environment set up yet, please reference this guide. Throughout this lesson we’ll make use of multiple Python libraries. If you see a library you haven’t previously used, simply install it using pip install. For reference, here’s the pip install documentation. Download the CSV file we’ll be working with here.
Before we get started, let’s recap what we’ve gone through so far. We’ve learned how to query a database, import data, manipulate data tables, and apply other techniques involved in cleaning and aggregating data. Now that we have our DataFrames set up, we can begin exploring our variables. This initial “exploration” will give us insight into our data and help us choose and create a model later on.
Completing this course will help you:
- Gain insights from data collected
- Improve research skills
- Choose and create models
Learning Path
The first step to looking into our now “cleaned” data is to perform a univariate analysis. A univariate analysis is when we look at the characteristics and plots of each variable individually in order to gain insight into our variables as a whole.
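As a minimal sketch of what a univariate analysis can look like in pandas, the snippet below summarizes one numeric and one categorical variable separately. The DataFrame and its column names (price, city) are hypothetical stand-ins and are not taken from the course CSV.

```python
import pandas as pd

# Hypothetical data standing in for the course CSV (column names are assumptions).
df = pd.DataFrame({
    "price": [250000, 480000, 310000, 275000, 620000],
    "city": ["Austin", "Boston", "Austin", "Denver", "Boston"],
})

# Numeric variable: count, mean, standard deviation, min, quartiles, and max.
price_summary = df["price"].describe()

# Categorical variable: how often each category appears.
city_counts = df["city"].value_counts()

print(price_summary)
print(city_counts)
```

From here, each variable would typically get its own plot as well, for example a histogram for a numeric column such as price and a bar chart for a categorical column such as city.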
“Querying” a database is another way of saying you want to “retrieve information from” a database. SQL queries make use of a powerful command called SELECT, which allows us to retrieve specific information from our database.
Video 48 Min + 2 Min read to complete
The SELECT command is where we list the attributes we’re hoping to retrieve, and the FROM command signifies which table we plan on selecting those attributes from. The * means “all” attributes, and a WHERE command provides a condition that determines which specific entries from those columns we retrieve. Here’s an example of one of these queries.
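A minimal sketch of such queries, run from Python with the built-in sqlite3 module. The table name (houses), its columns, and its rows are hypothetical and exist only to illustrate SELECT, FROM, WHERE, and *.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER, city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?)",
    [(1, "Austin", 250000.0), (2, "Boston", 480000.0), (3, "Austin", 310000.0)],
)

# SELECT names the columns, FROM names the table, WHERE filters the rows.
austin_rows = conn.execute(
    "SELECT id, price FROM houses WHERE city = ? ORDER BY id", ("Austin",)
).fetchall()

# * selects every column of the rows that satisfy the condition.
all_columns = conn.execute(
    "SELECT * FROM houses WHERE price > 300000 ORDER BY id"
).fetchall()

print(austin_rows)
print(all_columns)
conn.close()
```

The first query returns only the id and price of the Austin rows, while the second returns every column for rows priced above 300000.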
Video 48 Min + 2 Min read to complete
A principal component analysis (PCA) can help us visualize our dataset when we have many features, and it can also help us with dimensionality reduction.
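To make the dimensionality-reduction idea concrete, here is a minimal sketch of PCA computed with NumPy's singular value decomposition. The data is synthetic random noise and the choice of two components is an assumption made so the result can be drawn on a 2D scatter plot. It is not the course's own implementation.

```python
import numpy as np

# Synthetic data: 100 samples with 5 features (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# PCA works on centered data, so subtract each feature's mean.
X_centered = X - X.mean(axis=0)

# The rows of Vt are the principal directions, ordered by variance explained.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first two components and project the data onto them.
n_components = 2
components = Vt[:n_components]
X_reduced = X_centered @ components.T

# Share of the total variance captured by each component.
explained_variance_ratio = S**2 / np.sum(S**2)

print(X_reduced.shape)
print(explained_variance_ratio)
```

The two columns of X_reduced can be plotted against each other to view a many-feature dataset in two dimensions. Libraries such as scikit-learn offer the same technique through sklearn.decomposition.PCA.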