Lecture Note
University: University of California San Diego
Course: DSC 10 | Principles of Data Science
Academic year: 2023
Acquiring Data

This lecture lists several approaches and technologies for accessing and retrieving the data you need, and illustrates them with a scenario that draws data from many sources using different technologies. The first step in data acquisition is to identify the data that is already available and the sources that are relevant to your problem. It is important to consider all relevant data, because leaving some out can make the analysis less accurate. Data can come from a variety of sources, including local and remote databases, files, and websites. It can arrive at different speeds and may be structured or unstructured.

Structured Query Language

Structured Query Language (SQL), which is supported by all relational database management systems, is the preferred tool for accessing structured data in traditional relational databases. Database systems often also provide a graphical application environment through which you can access the data.

Scripting languages such as Python, JavaScript, Perl, R, Octave, and MATLAB are frequently used to extract data from files. In this course you will learn about Python's text-processing packages and functions.

Data from websites can be retrieved through web services, typically in formats such as XML or JSON. REST, which offers programmatic access to data with a focus on performance, scalability, and maintainability, is the most widely used type of web service. WebSocket services, which deliver real-time updates from websites, are also growing in popularity.

NoSQL storage systems such as Cassandra, MongoDB, and HBase manage many different types of data. Some of these platforms offer web interfaces or APIs for accessing the data.
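Two of the access methods described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the course's own code: an in-memory SQLite database stands in for a real relational database, the table `readings` and its columns are hypothetical, and the JSON string is a made-up stand-in for the body a REST service might return.

```python
import json
import sqlite3

# Structured data: query a relational database with SQL.
# An in-memory SQLite database stands in for a real sensor database;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temperature_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 31.5), ("A", 33.0), ("B", 28.0)],
)
rows = conn.execute(
    "SELECT station, AVG(temperature_c) FROM readings "
    "GROUP BY station ORDER BY station"
).fetchall()
conn.close()

# Web data: REST services commonly return JSON text.
# This string is an invented example of such a response body.
response_body = (
    '{"posts": [{"id": 1, "text": "smoke near the ridge"},'
    ' {"id": 2, "text": "all clear"}]}'
)
payload = json.loads(response_body)
texts = [post["text"] for post in payload["posts"]]

print(rows)   # one (station, average temperature) pair per station
print(texts)  # the text field of each post
```

In a real setting the connection would point at an actual database and the JSON would arrive over HTTP, but the pattern stays the same: SQL for structured tables, a parser such as `json` for web service responses.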
For instance, in a study at the San Diego Supercomputer Center, we analyzed wildfire data to forecast the direction and rate of spread of a fire. We collected data in several ways, including a WebSocket service for real-time data and SQL for historical sensor data stored in a relational database. In addition, we used Twitter's REST service to retrieve tweets about fires and assessed their sentiment. Combining the sensor data with the tweet sentiment allowed us to assess the urgency of the fire situation.

Conclusion

In summary, data can originate from a variety of sources, so it is important to locate and assess all relevant data before acquiring it. There are different ways to access data depending on its source and structure, and in the coming weeks you will learn about these access techniques in depth using Python.