Lecture Note
University: University of California San Diego
Course: DSC 207R | Python for Data Science
Academic year: 2023
Pandas, Data Ingestion

In this session, we'll talk about using pandas to import data into Python. By the end of this tutorial, you will understand the quick and simple ways pandas offers for importing data into memory. We will examine different data sources that pandas can import from directly and highlight important functions like read_csv, which reads a CSV file into a DataFrame.

One of the main benefits of pandas is its flexibility to consume data from various sources in a number of data types and formats. This makes the data ingestion process simpler for all of us. Let's have a look at a handful of these data types and the functions that make it possible.

CSV

One of the most popular data formats is the Comma Separated Values (CSV) format. CSV is a simple file format used to store tabular data, such as a spreadsheet or a database table. Files in the CSV format can be ingested into Python as DataFrames using the pandas read_csv function. This function allows you to specify a range of parameters, such as the delimiter, header row, and encoding, to properly import your data.

JSON

JSON, or JavaScript Object Notation, is a format for structuring data that is frequently used for communication within web applications. We can ingest the structure and content of a JSON file as a pandas DataFrame or Series by using the pandas read_json function. Much like read_csv, read_json allows you to set parameters such as orient and dtype to ensure your data is imported appropriately.

HTML

HTML, or HyperText Markup Language, is the file format that forms the basis of every web page. The read_html function reads the tables in an HTML document and returns them as a list of pandas DataFrames. This function allows you to specify parameters such as match, flavor, and header to properly import your data.

SQL
Structured Query Language (SQL) is used to communicate with a database using queries to insert, delete, and select data of interest. The read_sql_query function in pandas provides us a way to subset and load data from a relational database into Python. Similarly, we can load a whole relational table using the pandas read_sql_table function, which returns the table as a pandas DataFrame.

In conclusion, importing data into Python was not always simple. pandas has simplified the process across a wide range of data formats and gives data scientists both the data structures to hold the ingested data and the tools to edit it. We only covered a handful of the source types that we can ingest into Python; there are many more examples if you click the link in the summary slide.

We have now covered the most popular ways to use pandas to import data into Python. Understanding these operations will make it simple for you to import data from numerous sources and data types and to use pandas to alter, examine, and display your data. These features will enable you to deal with your data more effectively whether you are a novice or an expert data scientist.
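
As a rough illustration of the functions discussed above, the sketch below ingests the same small dataset through read_csv, read_json, and read_sql_query. The sample data, the table name scores, and the variable names are hypothetical and not part of the lecture. Everything is held in memory (io.StringIO and an in-memory SQLite database) so that no files are needed. read_html and read_sql_table are left out because they depend on additional packages (an HTML parser and SQLAlchemy, respectively).

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data, held in memory so the sketch needs no files.
csv_text = "name;score\nAda;91\nGrace;88\n"
json_text = '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]'

# CSV: sep sets the delimiter, header picks the row that holds column names.
csv_df = pd.read_csv(io.StringIO(csv_text), sep=";", header=0)

# JSON: orient="records" matches a list of {column: value} objects.
json_df = pd.read_json(io.StringIO(json_text), orient="records")

# SQL: write the rows to an in-memory SQLite table, then query a subset.
conn = sqlite3.connect(":memory:")
csv_df.to_sql("scores", conn, index=False)
sql_df = pd.read_sql_query(
    "SELECT name, score FROM scores WHERE score > 90", conn
)
conn.close()

print(csv_df)
print(json_df)
print(sql_df)
```

With real files, the io.StringIO objects would typically be replaced by a file path or URL, and the SQLite connection by a connection to the database of interest.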