CS3352
FOUNDATIONS OF DATA SCIENCE
Unit 1
Uses of Data Science
Applications of Data Science
⚫ In the healthcare industry, physicians use Data Science to analyze data from wearable trackers to ensure their patients' well-being and make vital decisions. Data Science also enables hospital managers to reduce waiting time and enhance care.
⚫ Retailers use Data Science to enhance customer experience and retention.
⚫ Data Science is widely used in the banking and finance sectors for fraud detection and personalized financial advice.
⚫ Transportation providers use Data Science to improve their customers' journeys. For instance, Transport for London maps customer journeys to offer personalized transportation details, and manages unexpected circumstances using statistical data.
⚫ Construction companies use Data Science for better decision making by tracking activities, including the average time for completing tasks, materials-based expenses, and more.
⚫ Data Science enables capturing and analyzing the massive volumes of data from manufacturing processes that have so far gone untapped.
⚫ With Data Science, one can analyze massive graphical data, temporal data, and geospatial data to draw insights. It also helps in seismic interpretation and reservoir characterization.
⚫ Data Science helps firms leverage social media content to obtain real-time media content usage patterns. This enables the firms to create target audience-specific content, measure content performance, and recommend on-demand content.
⚫ Data Science helps study utility consumption in the energy and utility domain. This study allows for better control of utility use and enhanced consumer feedback.
⚫ Financial institutions use Data Science to predict stock markets, determine the risk of lending money, and learn how to attract new clients for their services.
⚫ Many governmental organizations not only rely on internal data scientists to discover valuable information, but also share their data with the public.
⚫ Nongovernmental organizations (NGOs) are also no strangers to using data. They use it to raise money and defend their causes. The World Wildlife Fund (WWF), for instance, employs data scientists to increase the effectiveness of their fundraising efforts.
⚫ Universities use data science in their research and also to enhance the study experience of their students. The rise of massive open online courses (MOOCs) produces a lot of data, which allows universities to study how this type of learning can complement traditional classes.

Facets of Data – Way of Representation
⚫ Structured
⚫ Unstructured
⚫ Natural language
⚫ Machine-generated
⚫ Graph-based
⚫ Audio, video, and images (e.g. Gaana, YouTube, Instagram)
⚫ Streaming (e.g. live matches)
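Two of the facets listed above, structured and graph-based data, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `visits` table, its columns, the people's names, and both function names are hypothetical and do not come from the course material. The first function stores fixed-field rows in a table and queries them with SQL; the second finds a shortest path between two people in a small social graph using breadth-first search.

```python
import sqlite3
from collections import deque


def average_wait_by_department(rows):
    """Structured facet: load (department, wait_minutes) rows into a table
    and aggregate them with SQL. Table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (department TEXT, wait_minutes REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT department, AVG(wait_minutes) FROM visits "
        "GROUP BY department ORDER BY department"
    ).fetchall()
    conn.close()
    return result


def shortest_path(graph, start, goal):
    """Graph-based facet: breadth-first search over an unweighted graph given
    as {node: [neighbours]}. Returns the list of nodes on a shortest path,
    or None if the goal cannot be reached."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbour in graph.get(path[-1], ()):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    # Hypothetical hospital waiting times, in minutes.
    visits = [("cardiology", 30), ("cardiology", 50), ("radiology", 20)]
    print(average_wait_by_department(visits))

    # Hypothetical network of colleagues.
    colleagues = {
        "Asha": ["Ben"],
        "Ben": ["Asha", "Chen"],
        "Chen": ["Ben", "Dee"],
        "Dee": ["Chen"],
    }
    print(shortest_path(colleagues, "Asha", "Dee"))
```

The table works because every row has the same fixed fields, which is what makes the data structured. The graph example stores only relationships between people, so questions such as "how are two people connected?" become path searches rather than table queries.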
Structured Data
⚫ Structured data is data that depends on a data model and resides in a fixed field within a record. As such, it's often easy to store structured data in tables within databases or Excel files.
⚫ SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases.

Unstructured Data
⚫ Unstructured data is data that isn't easy to fit into a data model because the content is context-specific or varying.
⚫ The files don't have specific columns or a fixed format to identify specific things.
⚫ The thousands of different languages make this more difficult.
⚫ E.g. email, text files.

Natural Language (NLP)
⚫ Natural language is a special type of unstructured data; it's challenging to process because it requires knowledge of specific data science techniques and linguistics.
⚫ The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis.
⚫ It's ambiguous by nature. The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone upset or joyous.

Machine Generated
⚫ Machine-generated data is automatically created by a computer.
⚫ The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples of machine data are web server logs, call detail records, and network event logs.

Graph
⚫ Graph or network data is, in short, data that focuses on the relationship or adjacency of objects.
⚫ Graph structures use nodes, edges, and properties to represent and store graphical data.
⚫ Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people.
⚫ E.g. a graph of people that connects business colleagues via LinkedIn.
⚫ Graph databases are used to store graph-based data and are queried with specialized query languages such as SPARQL.

Audio, Image, and Video
⚫ Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.
⚫ High-speed cameras at stadiums capture ball and athlete movements to calculate in real time, for example, the path taken by a defender relative to two baselines (e.g. FIFA VAR).
⚫ E.g. autonomous cars.

Streaming Data
⚫ The data flows into the system when an event happens instead of being loaded into a data store in a batch.

Data Science Process