Chapter - 2
DATA SCIENCE
OVERVIEW OF DATA SCIENCE
• Activity 2.1 - Define:
• Data science?
• Data and Information
• Big data?
• What is the role of data in emerging technologies?
• Data Science is a multi-disciplinary field that uses scientific methods, processes,
algorithms, and systems to extract knowledge and insights from structured, semi-
structured and unstructured data.
• Much more than just analyzing data.
• Offers a range of roles and requires a range of skills (mathematical, programming, analytical, …)
• Example:
• Consider data involved in buying a box of cereal from the store or supermarket:
• Your data here is the planned purchase, written down somewhere (for example, in a notebook).
• When you get to the store, you use that piece of data to remind yourself what you
need to buy, then pick the item up and put it in your cart.
• At checkout, the cashier scans the barcode on your box and the cash register logs the
price.
• Back in the warehouse, a computer informs the stock manager that it is time to order
this item from the distributor because your purchase took the last box in the store.
• You may have a coupon for your purchase and the cashier scans that too, giving you a
predetermined discount.
• At the end of the week, a report of all the scanned manufacturer coupons gets uploaded
to the cereal company so they can issue a reimbursement to the grocery store for all of
the coupon discounts they have handed out to customers.
• Finally, at the end of the month, a store manager looks at a colorful collection of pie
charts showing all the different kinds of cereal that were sold and, on the basis of strong
sales of cereals, decides to offer more varieties of these on the store’s limited shelf
space next month.
• So, the small piece of information in your notebook ended up in many different places,
• notably on the desk of a manager as an aid to decision making.
• The data went through many transformations.
• In addition to the computers where the data passed through briefly or stayed for
the long term, many other pieces of hardware, such as the barcode scanner, were
involved in collecting, manipulating, transmitting, and storing the data.
• In addition, many different pieces of software were used to organize, aggregate,
visualize, and present the data.
• Finally, many different human systems were involved in working with the data.
• People decided which systems to buy and install, who should get access to what kinds
of data, and what would happen to the data after its immediate purpose was fulfilled.
• Data science has emerged as one of the most promising and in-demand career paths.
• Professionals use advanced techniques for analyzing large volumes of data.
• They are also skilled in communicating results to their non-technical counterparts.
• Skills important for data science:
• Statistics
• Linear algebra
• Programming knowledge with a focus on data warehousing, data mining, and data modeling
• Activity 2.2
• Describe in some detail the main disciplines that contribute to data science.
• Write a short report on the role of data scientists.
DATA VS INFORMATION
• Data: a representation of facts, concepts, or instructions in a formalized manner,
suitable for communication, interpretation, or processing by humans or
electronic machines.
• It can be described as unprocessed facts and figures.
• It is represented by groups of non-random symbols in the form of text, images, voice, and video,
representing quantities, actions, and objects.
• Information is the processed/interpreted data on which decisions and actions are based.
• It is data that has been processed into a form that is meaningful to the recipient and is of
real or perceived value in the current or prospective action or decision of the recipient.
• It is interpreted data; created from organized, structured, and processed data in a
particular context.
DATA PROCESSING CYCLE
• Data processing is the restructuring or reordering of data by people or machines to
increase its usefulness and add value for a particular purpose.
• Consists of the following basic steps: input, processing, and output, in that order.
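The three steps can be illustrated with a minimal Python sketch. The function names, the item names, and the idea of counting sold items are assumptions made for illustration only; they are not part of the course material.

```python
# A minimal sketch of the input -> processing -> output cycle.
# All names and sample values below are hypothetical.

def collect_input(raw_lines):
    """Input step: gather raw records, dropping blanks and normalizing text."""
    return [line.strip().lower() for line in raw_lines if line.strip()]

def process(records):
    """Processing step: count how many times each item appears."""
    counts = {}
    for item in records:
        counts[item] = counts.get(item, 0) + 1
    return counts

def produce_output(counts):
    """Output step: format the counts as report lines, sorted by item name."""
    return [f"{item}: {n} sold" for item, n in sorted(counts.items())]

# Raw data as it might be logged at a checkout (made-up values).
raw = ["Corn Flakes", "oat rings ", "", "corn flakes"]

report = produce_output(process(collect_input(raw)))
for line in report:
    print(line)
```

The raw list is data; the printed report is closer to information, because it has been organized into a form a store manager could act on.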
• Activity 2.3
• Discuss the main differences between data and information with examples.
• Can we process data manually using a pencil and paper? Discuss the differences with
data processing using the computer.
DATA TYPES AND THEIR REPRESENTATION
• Data types can be described from diverse perspectives.
1. Computer science and programming perspective:
• A data type is an attribute of data that tells the compiler or interpreter how the
programmer intends to use the data.
• Almost all programming languages explicitly include the notion of data type, though
different languages may use different terminology.
• Common data types include:
• Integers: store whole numbers.
• Booleans: store one of two values: true or false.
• Characters: store a single character (numeric, alphabetic, symbol, …).
• Floating-point numbers: store real numbers.
• Alphanumeric strings: store a combination of characters and numbers.
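As a rough illustration, the common data types listed above look like this in Python. The variable names and values are made up for the example. Note that Python has no separate character type, so a single character is simply a string of length 1.

```python
# Hypothetical values, one per common data type.
count = 42          # integer
in_stock = True     # Boolean
grade = "A"         # character (in Python: a string of length 1)
price = 3.75        # floating-point number
sku = "CRL2024"     # alphanumeric string

# Ask the interpreter which type it has assigned to each value.
for name, value in [("count", count), ("in_stock", in_stock),
                    ("grade", grade), ("price", price), ("sku", sku)]:
    print(name, "->", type(value).__name__)
```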
• A data type:
• Constrains the values that an expression (such as a variable or a function) might take.
• Defines the operations that can be performed on the data, the meaning of the data, and the way values
of that data type can be stored/represented.
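The short Python sketch below shows how a data type determines the meaning of an operation and the way a value is stored. The variable names are hypothetical, and the byte layouts shown are just one possible representation (a 2-byte big-endian integer and UTF-8 text).

```python
quantity = 3      # an integer
label = "3"       # a string that happens to contain a digit

# The same operator means different things for different types.
int_sum = quantity + quantity      # arithmetic addition
str_sum = label + label            # string concatenation
print(int_sum, str_sum)

# Mixing the two types is rejected: the type constrains valid operations.
try:
    quantity + label
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__
print(mixed_error)

# An explicit conversion makes the programmer's intent clear.
converted = int(label) + quantity
print(converted)

# The type also determines how a value is represented in memory or on disk.
as_int_bytes = (255).to_bytes(2, "big")    # the number 255 in two bytes
as_text_bytes = "255".encode("utf-8")      # the characters '2', '5', '5'
print(as_int_bytes, as_text_bytes)
```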