Lecture Note
University: University of California San Diego
Course: DSC 207R | Python for Data Science
Academic year: 2023
Relational Data Model
Interacting with Relational Databases: A Comprehensive Overview

As a data science practitioner, you are probably already familiar with Jupyter Notebook and its features. In this session, we'll look at how to connect to a relational database using Jupyter Notebook, a key tool for the data science community. Before interacting with relational data, we will first go through what we mean by a relational data model.

A relational data model is a structure that stores data in tables, also known as relations. The table is the main data structure in the relational paradigm. The table's columns represent attributes and its rows represent tuples: each row is a tuple, which stands for a single record, and each column is an attribute, which stands for a property of that record.

Let's look at a relational data model in action. Think of a toy application that keeps personnel data in a table called employees. In the employees table, each row represents a single employee, and each column represents one of that employee's attributes. The ID column in this table is designated as the primary key. This means that it is distinct for each employee, and by knowing an employee's primary key, we can uniquely identify that employee's first and last names, department, title, and salary.

As mentioned above, the presence of a primary key logically precludes duplicate records in a table, because duplicates would violate the primary key's uniqueness constraint. In practice, many systems permit duplicate tuples in a relation, but they provide mechanisms to prevent duplicate entries if the user so chooses.

Let's now create a new table named EmpSalaries that keeps track of employees' past salaries. The employee ID column, EmpID, in this table identifies the employee.
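
Before going further with EmpSalaries, the employees table and its primary key can be sketched in code. This is a minimal illustration only: it assumes SQLite through Python's built-in sqlite3 module (the lecture does not name a database system), the column names are guesses based on the attributes listed above, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database, used here purely for illustration.
conn = sqlite3.connect(":memory:")

# Assumed schema: ID is the primary key, so it must be unique per employee.
conn.execute(
    """
    CREATE TABLE employees (
        ID         INTEGER PRIMARY KEY,
        FirstName  TEXT,
        LastName   TEXT,
        Department TEXT,
        Title      TEXT,
        Salary     REAL
    )
    """
)

# Made-up sample row.
conn.execute(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    (1, "Ada", "Lopez", "Engineering", "Analyst", 85000.0),
)

# Knowing the primary key is enough to look up the rest of the record.
row = conn.execute(
    "SELECT FirstName, LastName, Department, Title, Salary "
    "FROM employees WHERE ID = ?",
    (1,),
).fetchone()
print(row)

# A second row with the same ID violates the uniqueness constraint.
try:
    conn.execute(
        "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
        (1, "Ben", "Kim", "Sales", "Manager", 90000.0),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

Declaring the key in the schema is one of the mechanisms a database offers for preventing duplicate entries: the check is done by the database itself rather than by application code.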
The EmpID values are the same IDs that were listed in the ID column of the employees table shown earlier, not new values. This is an example of a foreign key. Referencing means that a value can appear in this column only if the same value also exists in the employees table. In relational model terminology, the EmpID column of the EmpSalaries table is a foreign key that references the primary key of the employees table.

Note that the EmpID column is not a primary key of the EmpSalaries table, because that table contains several tuples with the same EmpID, each representing the employee's salary at a different point in time.

The join is one of the most widely used operations in relational databases. In a join operation, rows from two or more tables are combined based on a shared column. For example, a relational join of the first three columns of the employees table with the EmpSalaries table compares the ID column of employees and the EmpID column of EmpSalaries for equality.

Now that we are familiar with relational data models and how to work with them, let's examine how to import relational tables into Pandas DataFrames. As noted earlier, relational tables, also known as relations, can be loaded into DataFrames. Pandas is a powerful Python package for manipulating and analyzing data.

Loading a table from a database with Pandas is simple. The first step is to install a Python database connector that supports the database you're using. Next, create a connection object that represents a connection to the database. Once the connection object has been established, you can load the database table into a DataFrame using the Pandas read_sql() function.

To sum up, the ability to work with relational databases is crucial for data scientists. It enables us to organize and store data efficiently and to carry out complex data transformations with ease. In this lecture, we went over the definition of the relational data model.
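
The foreign key, the join, and the loading steps described above can be sketched end to end. This is a hedged example, not the course's own code: it assumes SQLite via Python's built-in sqlite3 module as the connector, the EffectiveDate and Salary columns of EmpSalaries are hypothetical (the lecture only names EmpID), and all rows are made up.

```python
import sqlite3

import pandas as pd

# Steps 1 and 2: pick a connector (sqlite3 here) and create a connection object.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE employees (
        ID        INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT
    );
    CREATE TABLE EmpSalaries (
        EmpID         INTEGER REFERENCES employees(ID),
        EffectiveDate TEXT,
        Salary        REAL
    );
    INSERT INTO employees VALUES (1, 'Ada', 'Lopez'), (2, 'Ben', 'Kim');
    INSERT INTO EmpSalaries VALUES
        (1, '2021-01-01', 80000),
        (1, '2022-01-01', 85000),
        (2, '2022-01-01', 90000);
    """
)

# An EmpID with no matching employees.ID violates the foreign key.
try:
    conn.execute("INSERT INTO EmpSalaries VALUES (99, '2022-01-01', 50000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Step 3: load each table into a DataFrame with read_sql().
employees = pd.read_sql("SELECT * FROM employees", conn)
salaries = pd.read_sql("SELECT * FROM EmpSalaries", conn)

# Join in Pandas: compare employees.ID with EmpSalaries.EmpID for equality.
joined = employees.merge(salaries, left_on="ID", right_on="EmpID")

# The same join expressed in SQL and loaded directly into a DataFrame.
joined_sql = pd.read_sql(
    "SELECT e.ID, e.FirstName, e.LastName, s.EmpID, s.EffectiveDate, s.Salary "
    "FROM employees AS e JOIN EmpSalaries AS s ON e.ID = s.EmpID",
    conn,
)

print(joined)
```

EmpID repeats in EmpSalaries (employee 1 has two salary rows), which is why it can serve as a foreign key but not as a primary key of that table. For other database systems, the connection object would come from a different connector; the read_sql() call itself stays the same.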