Profiling the ‘pandas’ dataframe

Date: 2020-10-12

Profiling is a process that helps us understand our data, and Pandas Profiling is a Python package that does exactly that. It’s a simple and fast way to perform exploratory data analysis (EDA) on a pandas DataFrame. The pandas df.describe() and df.info() functions are normally used as a first step in the EDA process. However, they give only a very basic overview of the data and don’t help much with large datasets. Pandas Profiling, on the other hand, extends the pandas DataFrame with df.profile_report() for quick data analysis. A single line of code produces a detailed, interactive HTML report.

For a given dataset, the Pandas Profiling package computes the following statistics:

Statistics computed by the Pandas Profiling package.

Installation
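
The package is published on PyPI as pandas-profiling and imported as pandas_profiling. After installing it with pip from a shell, a short check like the sketch below can confirm that Python can find it. The helper name is_installed is made up for this example and is not part of the package.

```python
# Install from a shell first:
#     pip install pandas-profiling
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the profiling package is importable in this environment.
print("pandas_profiling available:", is_installed("pandas_profiling"))
```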

Usage

Let’s use the age-old Titanic dataset to demonstrate the capabilities of this versatile Python profiler.
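
A minimal sketch of that workflow, assuming the Titanic data is available as a local CSV and that pandas_profiling is installed. The file name titanic.csv and the wrapper function profile_titanic are placeholders for illustration. Importing the package is what attaches profile_report() to DataFrames.

```python
import pandas as pd


def profile_titanic(csv_path: str = "titanic.csv"):
    """Load the Titanic CSV and return its profiling report.

    The default file name is a placeholder; point it at your own copy.
    """
    # Importing pandas_profiling registers DataFrame.profile_report().
    import pandas_profiling  # noqa: F401

    df = pd.read_csv(csv_path)
    # The one-liner: in Jupyter, a report returned as the last
    # expression of a cell is rendered inline.
    return df.profile_report()


# In a notebook cell:
# profile_titanic("titanic.csv")
```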

This single line of code is all you need to display the data profiling report in a Jupyter notebook. The report is quite detailed and includes charts wherever necessary.

The report can also be exported into an interactive HTML file with the following code.
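
One way the export might look, as a sketch: build the report object, then write it out with to_file(). The function name export_report, the report title and the output file name are assumptions made for this example.

```python
import pandas as pd


def export_report(df: pd.DataFrame, output_file: str = "titanic_report.html") -> str:
    """Write an HTML profiling report for df and return the output path."""
    # Importing pandas_profiling registers DataFrame.profile_report().
    import pandas_profiling  # noqa: F401

    # The title is an illustrative choice; it appears at the top of the report.
    profile = df.profile_report(title="Titanic Profiling Report")
    # Saves a standalone HTML file that opens in any browser.
    profile.to_file(output_file)
    return output_file


# Example call, given a DataFrame df loaded earlier:
# export_report(df, "titanic_report.html")
```

The resulting file can be shared with people who do not have Python or Jupyter installed.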