Lecture Note
University
University of California San DiegoCourse
DSC 207R | Python for Data SciencePages
2
Academic year
2023
anon
Views
10
Live Code, Linear Regression Regression enables us to predict continuous variables, as we have now learned. To predict a player's overall performance based on their attributes, wewill now use regression technique. These days, data is everything. It propels businesses, aids in decision-making, and, most signi cantly, forecasts future events. Regression isa statistical method that can be used to forecast the future using data from thepast. In the previous weeks, we learned about regression and how it enables usto forecast continuous variables. This article will demonstrate how to useregression techniques to forecast a player's overall soccer performance basedon their qualities. We'll use the soccer dataset from week one, when we conducted an overall analysis, to accomplish this. We will expand on our prior understanding and gomore into the dataset. We will run regression analysis on the dataset usingscikit-learn, a well-known Python tool for machine learning. Find the data set we'll be working with in our notebook, "European Soccer Regression Analysis with Scikit-Learn," which is located in our week's folder.We must import the libraries required for regression analysis before importingthe dataset. There are some regression techniques that we have incorporated,including the pandas sqlite tool for interacting with relational databases. Wealso have several other modules related to math and error detection. We can import the data set into a data frame after importing the required libraries. The player attributes will be chosen and loaded into a data framecalled df using the connection. The functions used to load the data into apandas data frame are typically identical once we get acquainted to dataingestion. It is generally advisable to consult sample notebooks to getacquainted with the information. Let's examine the rst ve rows of the data frame to better comprehend the data before we start cleaning it. The attributes of a player, such as their overallrating, potential, favorite foot, offensive effort rate, and defensive work rate, areevident. To determine how many characteristics a data frame has, we may alsoexamine its shape. We will now declare a list of the characteristics we plan to use to forecast the player's total rating. It's important to note that because the total rating is ourprediction target, we won't see it in the feature. We will forecast a player'snumeric overall rating value using the input data from these features. Let's now begin cleaning up the data. We will discard the null values because this is a problem with the dataset, as we have known since week one. Then, we'll
make two data frames, X and Y, with X serving as the input and Y serving as thegoal. The features we desire for our input data frame will be chosen, and thedata frame will be loaded into X without the null values. We can conduct regression analysis on the dataset after cleaning the data. To build a model that can forecast a player's total rating based on their attributes,we will employ scikit-linear learn's regression technique. Using the train testsplit method from scikit-learn, we will divide our dataset into training andtesting sets. This is a crucial stage in creating a strong model since it enables usto assess how well our model performs on unobserved data. After dividing the dataset, we can use the t method to t the model to our training set of data. The predict method can then be used to create predictionsbased on the testing data. The mean squared error metric, which calculates theaverage squared difference between the predicted and actual values, can beused to assess the effectiveness of our model. Regression analysis is a crucial method for making projections about the future based on historical data.
Linear Regression
Please or to post comments