Lecture Note
University: University of California San Diego
Course: DSC 10 | Principles of Data Science
Academic year: 2023
Author: anon
Knowing the Fundamental Steps of the Data Science Process

There is a rising need for experts who can analyze and understand complicated data, and data science has grown in importance as a result. The data science process is an organized method that helps data scientists gather, process, and analyze data efficiently. This article examines the five fundamental steps of the data science process and what each step entails.

Getting Data

Getting data is the first step in the data science process. It involves finding, gaining access to, and obtaining data from numerous sources. Authenticated access must be established to all pertinent data, and the data must be retrieved from its various sources. A geographic query, for example, subsets the data and matches it to the regions or times of interest.

Preparing Data

Preparing data is the next step in the data science process. It has two parts: exploring the data to understand it, and pre-processing the data for analysis.

To grasp the data's nature, meaning, quality, and format, data scientists first take a quick look at it. This part, often referred to as "exploration," involves performing a preliminary analysis on the data or on samples of the data.

Once they understand the data better, data scientists can move on to pre-processing. Here the data is cleaned, subset, or filtered, and put into a data model that programs can read and process. When multiple datasets are involved, this part also includes integrating data from the different sources or streams.

Analyzing Data

Analyzing the data is the third step in the data science process. It involves choosing analytical methods, building a model from the data, and evaluating the results. This step may require a data scientist to return to the first two steps to gather additional data or to package the data differently.
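The first three steps can be illustrated with a small Python sketch that uses only the standard library. It is a hypothetical example, not the course's method. The two inline CSV strings stand in for real sources, and their columns (station, region, year, temp, rainfall) are invented. The function names are also made up for illustration. A least-squares line with mean absolute error is used here as one possible choice of analytical method and evaluation; a real project might choose differently.

```python
import csv
import io
from statistics import mean

# Hypothetical sources standing in for real data feeds. In a real project these
# would be fetched from files, databases, or web APIs with authenticated access.
STATIONS_CSV = """station,region
A,north
B,south
C,north
"""

READINGS_CSV = """station,year,temp,rainfall
A,2021,15.0,80
A,2022,16.0,70
B,2022,22.0,30
C,2022,14.0,
C,2021,13.0,95
A,2023,17.0,60
C,2023,15.5,75
"""


def get_data(text):
    """Getting data: parse one CSV source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def subset_and_match(readings, stations, region, years):
    """Match readings to stations and keep only the region and years of interest."""
    in_region = {s["station"] for s in stations if s["region"] == region}
    return [r for r in readings
            if r["station"] in in_region and int(r["year"]) in years]


def explore(rows):
    """Preparing data (exploration): row count and missing values per column."""
    if not rows:
        return {"n_rows": 0, "missing": {}}
    missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}
    return {"n_rows": len(rows), "missing": missing}


def clean(rows):
    """Preparing data (pre-processing): drop incomplete rows, convert to numbers."""
    return [{"temp": float(r["temp"]), "rainfall": float(r["rainfall"])}
            for r in rows if r["temp"] != "" and r["rainfall"] != ""]


def fit_line(xs, ys):
    """Analyzing data: least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def mean_abs_error(xs, ys, intercept, slope):
    """Evaluate the model: average absolute gap between data and predictions."""
    return mean(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    stations = get_data(STATIONS_CSV)
    readings = get_data(READINGS_CSV)
    subset = subset_and_match(readings, stations, "north", {2021, 2022, 2023})
    print(explore(subset))  # one north reading is missing its rainfall value
    rows = clean(subset)
    xs = [r["temp"] for r in rows]
    ys = [r["rainfall"] for r in rows]
    intercept, slope = fit_line(xs, ys)
    print(f"slope={slope:.2f}, MAE={mean_abs_error(xs, ys, intercept, slope):.2f}")
```

In practice this would rarely be a single pass. If only five clean rows remain, for example, a data scientist would likely return to the getting-data step for more readings before trusting the fitted slope.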
The analysis step may also take several iterations on its own.

Reporting Results

Communicating the findings is the fourth step in the data science process. It involves interpreting the analytical results, visualizing them, and producing reports that evaluate the outcomes against the success criteria. Terms such as "interpret," "summarize," "visualize," and "post-process" are commonly used to describe the tasks in this step.

Implementing Insights

The last step in the data science process is bringing the insights back to the original goal of the analysis. This "act" step consists of reporting the insights and deciding on actions based on them and on the analysis's original goal.

Keep in mind that the data science process is iterative. Findings from one step may require repeating earlier steps with updated data.

Conclusion

The data science process is an organized method that helps data scientists gather, process, and analyze data efficiently. Understanding its five fundamental steps is a key part of becoming a competent data scientist. Whether you are a beginner or an experienced professional, keep these steps in mind as you work on your data science projects.
DSC10: A Study Guide to the Five Phases of Data Science