10 simple Python tips to speed up your data analysis

Profiling the ‘pandas’ dataframe

Profiling is a process that helps us understand our data, and Pandas Profiling is a Python package that does exactly that. It’s a simple and fast way to perform exploratory data analysis (EDA) of a pandas DataFrame. The pandas df.describe() and df.info() functions are normally used as a first step in the EDA process. However, they give only a very basic overview of the data and don’t help much with large data sets. The Pandas Profiling package, on the other hand, extends the pandas DataFrame with df.profile_report() for quick data analysis. It displays a lot of information with a single line of code, in an interactive HTML report.

Statistics computed by the Pandas Profiling package.

Installation

Usage

Let’s use the age-old Titanic dataset to demonstrate the capabilities of the versatile Python profiler.
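
The original code listing did not survive here, so what follows is only a minimal sketch of the baseline that profiling improves on. The tiny DataFrame is a made-up stand-in for the Titanic data (its column names and values are assumptions), and the profiling call is left commented out because it assumes the pandas-profiling package is installed.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows; values are illustrative only.
df = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0, None],
        "fare": [7.25, 71.28, 7.92, 8.05],
        "sex": ["male", "female", "female", "male"],
    }
)

# The usual first step: summary statistics for the numeric columns ...
summary = df.describe()
print(summary)

# ... plus column dtypes and non-null counts.
df.info()

# Assuming pandas-profiling is installed, importing it is what adds
# profile_report() to DataFrames, as described above:
# import pandas_profiling
# df.profile_report()
```

Note that describe() skips the text column by default and info() only counts non-null values, which is the "very basic overview" the profiling report is meant to go beyond.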

Bringing interactivity to pandas plots

Pandas has a built-in .plot() function as part of the DataFrame class. However, the visualizations it renders aren’t interactive, which makes them less appealing. On the other hand, it is hard to give up the ease of plotting charts with pandas.DataFrame.plot(). What if we could plot interactive Plotly-like charts with pandas without making major modifications to the code? You can do exactly that with the help of the Cufflinks library.

Installation

Usage

df.iplot() vs df.plot()

A dash of magic

Magic commands are a set of convenient functions in Jupyter Notebooks that are designed to solve some of the common problems in standard data analysis. You can see all available magics with the help of %lsmagic.

List of all available magic functions

  • %matplotlib notebook
%matplotlib inline vs %matplotlib notebook

The interactive debugger is also a magic function, but I have given it a category of its own. If you get an exception while running a code cell, type %debug in a new cell and run it. This opens an interactive debugging environment that takes you to the position where the exception occurred. There you can check the values of variables assigned in the program and perform operations on them. To exit the debugger, hit q.
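
As a rough illustration of that workflow, here is a small function with a deliberate bug. The function name and inputs are hypothetical, and the %debug step is described in comments because it only works inside IPython or Jupyter.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    # Deliberately unguarded: an empty list raises ZeroDivisionError.
    return sum(values) / len(values)


print(average([2, 4, 6]))  # works for non-empty input

# In a notebook, running average([]) in a cell raises ZeroDivisionError.
# Typing %debug in the next cell then opens the debugger inside average(),
# where `p values` prints the offending argument and `q` exits.
```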

Printing can be pretty too

If you want to produce aesthetically pleasing representations of your data structures, pprint is the go-to module. It is especially useful when printing dictionaries or JSON data. Let’s have a look at an example which uses both print and pprint to display the output.
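
The original screenshots are not available, so here is a small sketch of the comparison. The nested dictionary is invented purely for illustration.

```python
from pprint import pformat, pprint

# Hypothetical nested record, similar in shape to parsed JSON.
record = {
    "name": "Example Passenger",
    "ticket": {"class": 3, "fare": 7.25, "cabin": None},
    "relatives": ["sibling", "parent", "child"],
    "ports": ["Southampton", "Cherbourg", "Queenstown"],
}

# print() puts the whole structure on one long line.
print(record)

# pprint() breaks it across lines once it no longer fits the given width.
pprint(record, width=60)

# pformat() returns the same pretty layout as a string instead of printing.
formatted = pformat(record, width=60)
```

By default pprint also sorts dictionary keys, so the pretty output may list them in a different order than print does.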

Making the notes stand out

You can use alert/note boxes in your Jupyter Notebooks to highlight something important or anything that needs to stand out. The color of the note depends on the type of alert that is specified. Just add any or all of the following codes in a cell that needs to be highlighted.
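
The original snippets are missing from this copy. As a sketch only, and assuming the notebook front end styles Bootstrap's alert classes (info, warning, success, danger), the hypothetical helper below generates the kind of markup such a box uses. This is a Python route to the markup, not the author's own listing.

```python
from html import escape

# Bootstrap alert kinds and their usual colors (assumed to be styled by
# the notebook front end).
ALERT_KINDS = {
    "info": "blue",
    "warning": "yellow",
    "success": "green",
    "danger": "red",
}


def alert_box(kind, text):
    """Return HTML for a colored note box (hypothetical helper)."""
    if kind not in ALERT_KINDS:
        raise ValueError(f"unknown alert kind: {kind!r}")
    # Escape the text so stray < or & characters do not break the markup.
    return f'<div class="alert alert-block alert-{kind}">{escape(text)}</div>'


print(alert_box("info", "Tip: use blue boxes for notes."))
```

The printed markup can be pasted into a Markdown cell, or rendered from a code cell with IPython.display.HTML.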

Consider a cell of Jupyter Notebook containing the following lines of code:

A typical way of running a Python script from the command line is: python hello.py. However, if you add -i when running the same script, e.g. python -i hello.py, you get some extra advantages. Let’s see how.
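
To make this concrete, here is a minimal sketch of what hello.py might contain. The contents of the file are an assumption, since the original listing is not shown.

```python
# hello.py (hypothetical contents)
def greet(name):
    """Build a greeting for the given name."""
    return f"Hello, {name}!"


message = greet("world")
print(message)

# Run as:  python -i hello.py
# The script prints its output and then, instead of exiting, leaves you at
# the >>> prompt with `message` and `greet` still defined, so you can
# inspect the variable or call the function again with other arguments.
```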

Ctrl/Cmd + / automatically comments out the selected lines in the cell. Hitting the combination again uncomments the same lines of code.

Have you ever accidentally deleted a cell in a Jupyter Notebook? If yes then here is a shortcut that can undo that delete action.

  • If you need to recover an entire deleted cell, hit ESC+Z or EDIT > Undo Delete Cells.

In this article, I’ve listed the main tips I have gathered while working with Python and Jupyter Notebooks. I’m sure these simple hacks will be of use to you at some point in your career. Till then, happy coding!


This article was written by Parul Pandey on Towards Data Science. You can read the original piece here
