Lecture Note
University: University of California San Diego
Course: DSC 207R | Python for Data Science
Academic year: 2023
World Development Indicators

Using the World Development Indicators Dataset for Data Science: A Comprehensive Guide

The first stage in data science is a preliminary investigation of the dataset. This guide examines the World Development Indicators dataset, an open dataset on Kaggle that is a slightly modified version of the dataset available from the World Bank. Before we get started, let's keep in mind that visualizations have little to no value without context.

Importing the Dataset

Before we can begin working with the dataset, we need to import pandas, numpy, random, and matplotlib.pyplot. We then read the CSV file into a pandas data frame and display the shape of the data. Reading the file can take some time because the dataset has 5.6 million rows and six columns. Once the data is in a data frame, the head method shows what the columns contain.

Exploring the Dataset

The columns are the name of the country, the code for the country, the name of an indicator, an indicator code, a year, and a value. This is effectively a four-dimensional dataset whose dimensions are country, indicator, year, and value. Looking at the indicators, we can already see some really interesting things. For instance, as someone who's environmentally conscious, I find the CO2 emissions per capita metric pretty interesting. We'll use that metric a bit later.

How Many Countries and Indicators Are There?

Applying the unique method to a column of the data frame returns its distinct entries, and the length of that result is the number of unique values. In this instance, the dataset contains 247 unique country names. As a quick sanity check, we can apply the unique method to the country code column as well: if there are 247 countries, there should be 247 country codes, and there are. The dataset also covers 1,344 indicators, which is quite extensive.
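The loading and counting steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the column names and the file name `Indicators.csv` are assumptions (the notes describe the columns only in words), and the small sample frame is made up so that the sketch runs without the 5.6-million-row file. It uses `nunique()`, which gives the same count as `len(column.unique())` when the column has no missing values.

```python
import pandas as pd

# Assumed column names; the notes describe the six columns only in words.
COLUMNS = ["CountryName", "CountryCode", "IndicatorName",
           "IndicatorCode", "Year", "Value"]


def summarize(data: pd.DataFrame) -> dict:
    """Return the shape and the number of unique countries, codes, and indicators."""
    return {
        "shape": data.shape,
        "countries": data["CountryName"].nunique(),
        "country_codes": data["CountryCode"].nunique(),
        "indicators": data["IndicatorName"].nunique(),
    }


# Made-up placeholder rows, used only so the sketch is runnable on its own.
sample = pd.DataFrame(
    [
        ["Country A", "AAA", "Indicator X", "IND.X", 1960, 1.0],
        ["Country A", "AAA", "Indicator Y", "IND.Y", 1960, 2.0],
        ["Country B", "BBB", "Indicator X", "IND.X", 1961, 3.0],
        ["Country B", "BBB", "Indicator X", "IND.X", 1962, 4.0],
    ],
    columns=COLUMNS,
)

print(sample.head())  # first rows, to see what the columns contain
summary = summarize(sample)
print(summary)

# Sanity check from the text: one country code per country name.
codes_match = summary["countries"] == summary["country_codes"]
print("Codes match countries:", codes_match)

# With the real file (the path is an assumption), the same calls would be:
# data = pd.read_csv("Indicators.csv")
# print(data.shape)
# print(summarize(data))
```

On the full dataset, the same summary should report the 247 countries, 247 country codes, and 1,344 indicators mentioned above.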
If you want to explore the full list of indicators and more details about them, there's a link at the top of the notebook.

How Many Years of Data Do We Have?

We also need to know how many years of data we have. In this case it's 56 years, covering 1960 through 2015. Now that we have a good feel for the dataset, we can start exploring it using visualizations in matplotlib.
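The year check can be sketched along the same lines. The column name `Year` and the helper `year_span` are assumptions introduced for illustration, and the sample values are placeholders rather than real data.

```python
import pandas as pd


def year_span(data: pd.DataFrame, year_col: str = "Year"):
    """Return (first year, last year, number of distinct years)."""
    years = data[year_col].unique()
    return int(years.min()), int(years.max()), len(years)


# Placeholder frame with repeated and unordered years.
sample = pd.DataFrame({"Year": [1960, 1960, 1961, 2015, 2014]})

first, last, count = year_span(sample)
print(f"{count} distinct years, from {first} through {last}")
```

On the full dataset, a range of 1960 through 2015 with every year present gives 2015 - 1960 + 1 = 56 distinct years, which matches the figure quoted above.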
Visualizing the Dataset

In data science, visualizing the data is a crucial step. It helps reveal patterns and trends that may not be apparent when examining the raw data. Here, we can use matplotlib to build a number of visualizations that aid in data exploration.

One of the first things we may want to examine is the CO2 emissions per capita measure mentioned earlier. A scatter plot of CO2 emissions per capita against GDP per capita can help us determine whether the two indicators are correlated.

We can also create a bar chart that shows the top 10 countries by CO2 emissions per capita. This shows which countries emit the most per person, which is not necessarily the same as emitting the most in total.

Another interesting visualization is a line plot that shows the change in CO2 emissions per capita over time. This can help us see whether the situation has improved or worsened over the years.

Conclusion

This guide has shown how to use the World Development Indicators dataset for data science. We first imported the dataset, then explored it to determine how many countries, indicators, and years of data it contains, and finally outlined visualizations for exploring it further.
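As a closing illustration, the line plot described in the visualization section could be sketched as follows. This is a sketch under stated assumptions, not the notebook's code: the column names, the helper `indicator_series`, and the codes `AAA` and `IND.X` are hypothetical placeholders, and the values are made up. With the real data, the country code and the CO2 emissions per capita indicator code would be substituted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def indicator_series(data: pd.DataFrame, country_code: str,
                     indicator_code: str) -> pd.Series:
    """Select one indicator for one country, indexed by year in ascending order."""
    mask = (data["CountryCode"] == country_code) & (
        data["IndicatorCode"] == indicator_code
    )
    return data.loc[mask].sort_values("Year").set_index("Year")["Value"]


# Made-up placeholder rows (assumed column names), deliberately out of order.
sample = pd.DataFrame(
    {
        "CountryCode": ["AAA", "AAA", "BBB", "AAA", "AAA"],
        "IndicatorCode": ["IND.X", "IND.X", "IND.X", "IND.Y", "IND.X"],
        "Year": [1962, 1960, 1960, 1960, 1961],
        "Value": [3.0, 1.0, 9.0, 5.0, 2.0],
    }
)

series = indicator_series(sample, "AAA", "IND.X")

fig, ax = plt.subplots()
ax.plot(series.index, series.values, marker="o")
# Labels and a title supply the context that makes the plot meaningful.
ax.set_xlabel("Year")
ax.set_ylabel("Indicator X (placeholder units)")
ax.set_title("Indicator X over time for country AAA")
# In a notebook, drop the matplotlib.use("Agg") line and call plt.show().
```

Filtering and sorting in a small helper keeps the plotting call simple, and the same series can feed a bar chart or scatter plot.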