Unleashing Data Visualization in Notebooks

University
University of California San Diego
Course
DSC 207R | Python for Data Science
Academic year

2023
Author

anon

Live Code, Data Visualization

Unlocking the Power of Data Visualization and Operations in Notebooks

Analyzing and using data requires data visualization. It lets us see patterns, trends, and relationships in data that would otherwise stay hidden. Visualization is even more important in notebooks, where it lets us present our data and analysis in a more engaging and dynamic way.

In this article, we will explore visualization and frequent data operations in notebooks. We will provide some examples to get you started and add to them as we come to newer problems and use cases in the upcoming weeks.

Matplotlib: The Plotting Library for Python

Before we dive into the examples, a quick word on Matplotlib, the plotting library for Python. Pandas uses Matplotlib underneath for its plots, so it's essential to have a good understanding of it.

To plot Matplotlib graphs inside a notebook, we need to tell Jupyter to plot inline. This is done with a line that begins with a percent sign, the prefix for a special class of Jupyter commands called magic functions. Here's an example:

    %matplotlib inline

This line instructs Jupyter to render Matplotlib graphs inline within the notebook. Let's now see how Matplotlib is used through the Pandas plotting functions.

Histograms and Boxplots with Pandas

To get started with visualization, we'll use a simple example with Pandas. First we need a data frame object, which in this case is called ratings. We'll use the hist function to plot the rating column of the data frame. You can adjust the figure size with the figsize option. Many other options, such as the number of bins, are described in the Pandas visualization documentation.

    ratings['rating'].hist(figsize=(15, 10))
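
The histogram call can be tried on its own with a small stand-in data set. The sketch below is only an illustration: the ratings data frame built here, including the userId and movieId columns and all values, is a hypothetical substitute for the real ratings data, and the bins argument and axis labels are additions not used in the original one-liner.

```python
import pandas as pd

# Hypothetical stand-in for the article's ratings data frame; the column
# names and values are assumptions made for illustration only.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3, 3, 4, 4],
    "movieId": [10, 20, 10, 30, 20, 30, 10, 40],
    "rating": [5.0, 3.5, 4.0, 2.0, 5.0, 4.5, 3.0, 5.0],
})

# Series.hist returns the Matplotlib Axes it drew on, so the plot can be
# customised further. bins sets the number of bars; figsize is in inches.
ax = ratings["rating"].hist(bins=10, figsize=(15, 10))
ax.set_xlabel("rating")
ax.set_ylabel("number of ratings")
ax.set_title("Distribution of ratings")
```

In a notebook where %matplotlib inline has been run, the figure appears below the cell. In a plain Python script, the figure would need to be shown or saved explicitly through Matplotlib.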

The call ratings['rating'].hist(figsize=(15, 10)) selects the rating column of the ratings data frame and plots its histogram, with figsize setting the size of the figure. Running this code produces a histogram.

Next, we'll use similar code with the boxplot function to generate boxplots. Pandas makes generating boxplots really easy. We call the .boxplot function on the data frame and name the column to plot:

    ratings.boxplot(column='rating', figsize=(15, 10))

As with the histogram example, we pass 'rating' as the column to plot, and the figsize option adjusts the size of the figure. When you run this code, you'll see a boxplot of the data.

Data Slicing and Column Slicing Techniques

Now that we've covered some fundamental visualization approaches, let's move on to data slicing and column slicing in Pandas. Data slicing is the process of extracting a portion of a data set based on some criteria. It's a powerful technique that helps us understand our data better and make better decisions.

Filtering is one of the most widely used data slicing methods. Filtering means choosing the portion of the data that satisfies a particular condition. Suppose, for illustration, that we wish to limit the rows of the ratings data frame to those with a rating of 5.0. The code below does this:

    ratings[ratings['rating'] == 5.0]

The inner comparison produces a True or False value for every row, and indexing the data frame with it keeps only the rows where the rating is equal to 5.0.

Column slicing is the process of selecting specific columns from a data set. For example, ratings['rating'] selects just the rating column.

Conclusion

Frequent data operations and data visualization are essential parts of data analysis, and notebooks make them simple to carry out. We can rapidly create high-quality graphics and carry out common data operations using the Pandas and Matplotlib libraries.
We trust that this article has given you the groundwork necessary to begin investigating data analysis in Jupyter notebooks. Stay tuned in the following weeks for more examples and use cases.
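
As a starting point for experimenting, the row filtering and column slicing described above can be sketched as follows. This is a minimal illustration, not the course's own code: the ratings data frame, its userId and movieId columns, and all values are hypothetical, and the .loc form at the end is one common way to combine both operations rather than something the article itself uses.

```python
import pandas as pd

# Hypothetical ratings data frame (assumed columns, illustrative values).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating": [5.0, 3.5, 4.0, 2.0, 5.0],
})

# Row filtering: the comparison builds a boolean Series, and indexing with
# it keeps only the rows where the value is True.
is_top_rated = ratings["rating"] == 5.0
top_rated = ratings[is_top_rated]

# Column slicing: a list of column names selects a smaller data frame...
movie_ratings = ratings[["movieId", "rating"]]
# ...while a single name returns one column as a Series.
rating_column = ratings["rating"]

# .loc combines both: filter rows and pick columns in one step.
top_rated_movies = ratings.loc[is_top_rated, ["movieId", "rating"]]

print(top_rated)
print(movie_ratings.head())
```

Naming the boolean mask (is_top_rated here) keeps the filter readable and lets it be reused across several selections.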