Assignment
Noshaba Muneer
Registration No
Math 211101076(6B)
Course:
Data Science
Data Science is a blend of various tools, algorithms, and machine learning principles with
the goal of discovering hidden patterns in raw data. But how is this different from
what statisticians have been doing for years?
Data Science is primarily used to make decisions and predictions by means of
predictive causal analytics, prescriptive analytics (predictive plus decision science), and
machine learning.
Big data refers to extremely large and diverse collections of structured, unstructured, and
semi-structured data that continue to grow exponentially over time. Let’s break down what this
means:
1. Volume: Big data encompasses a vast volume of information. It’s not just a few
gigabytes; we’re talking about terabytes, petabytes, or even exabytes of data.
2. Velocity: Data is generated at an incredibly high speed. Think of social media posts,
sensor data, financial transactions, and more pouring in rapidly.
3. Variety: Big data comes in various formats: text, images, videos, logs, sensor readings,
and more. It’s not neatly organized; it’s a mix of structured and unstructured data.
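To give a feel for the scale mentioned under Volume, here is a minimal sketch that formats a byte count in decimal units (each step is a factor of 1000). The function name `human_readable` and the sample sizes are illustrative assumptions, not something taken from the assignment.

```python
# Decimal (SI) units: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

# Hypothetical dataset sizes, chosen only for illustration.
print(human_readable(5 * 10**9))    # a few gigabytes
print(human_readable(3 * 10**15))   # petabyte scale
```

A petabyte is a million gigabytes, which is why big data usually cannot be stored or processed on a single ordinary machine.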
Social Media: Comments, posts, tweets, and interactions on platforms like Facebook,
Twitter, and Instagram.
Internet of Things (IoT): Data from connected devices, sensors, wearables, and smart
appliances.
Business Transactions: Sales records, customer orders, invoices, and financial data.
Web Logs: Information from web servers, user behavior, and website visits.
Scientific Research: Genomic data, climate models, and simulations.
Healthcare: Electronic health records, medical imaging, and patient data.
Handling Complexity: Big data is messy, and separating valuable insights from noise
can be challenging.
Storage and Processing: Storing and analyzing massive datasets require specialized
infrastructure and tools.
Data Privacy and Security: Protecting sensitive information is crucial.
Business Insights: Big data analytics can reveal patterns, trends, and correlations that
drive business decisions.
Use Cases
Data science is the computational science of extracting meaningful insights from raw data
and then effectively communicating those insights to generate value.
Data engineering, on the other hand, is an engineering domain dedicated to building
and maintaining systems that overcome data processing and data handling problems for
applications that consume, process, and store large volumes, varieties, and velocities of
data.
Data science focuses on extracting insights from data using statistical and machine
learning techniques, while data engineering involves designing and maintaining the
infrastructure to collect, store, and process data efficiently.
Both roles are crucial for successful data-driven decision-making within an organization.
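The division of labour between the two roles can be sketched in a few lines. This is only a toy illustration under stated assumptions: the records in `RAW_ROWS`, the function names `clean` and `fit_line`, and the choice of a straight-line fit are all hypothetical, picked to show one "engineering" step feeding one "science" step.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from a sales system.
RAW_ROWS = [
    {"day": "1", "sales": "100"},
    {"day": "2", "sales": "120"},
    {"day": "3", "sales": ""},      # missing value
    {"day": "4", "sales": "160"},
]

def clean(rows):
    """Data engineering step: drop incomplete rows and convert text to numbers."""
    return [(int(r["day"]), float(r["sales"]))
            for r in rows if r["day"] and r["sales"]]

def fit_line(points):
    """Data science step: least-squares line, sales = slope * day + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

points = clean(RAW_ROWS)
slope, intercept = fit_line(points)
print(f"predicted sales on day 5: {slope * 5 + intercept:.1f}")
```

The prediction is only as good as the cleaned data it is fitted on, which is one reason the two roles depend on each other.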
You
Your organization
Your employer
Anyone with a bit of understanding and training can begin using data insights to
improve their lives, their careers, and the well-being of their businesses.
Structured data is data that is categorized and stored in a file according to a particular
format description, whereas unstructured data is free-form content that comes in many forms.
Website links
Emails
Twitter responses
Product reviews
Pictures/images
Written text on various platforms
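The contrast between the two kinds of data can be illustrated with a short sketch. The order table, the review text, and all variable names below are made up for illustration. The structured data can be queried by field name straight away, while the free-form review has to be broken into words before anything can be asked of it.

```python
import csv
import io
import re

# Structured: fields follow a fixed, declared format (here, CSV with a header).
# The orders are hypothetical sample data.
structured = "order_id,customer,amount\n1001,Ayesha,250.00\n1002,Bilal,99.50\n"
orders = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in orders)

# Unstructured: a free-form product review with no schema (also hypothetical).
review = "Ordered on Monday, arrived Thursday. Great quality, but the box was damaged!"
words = re.findall(r"[a-z']+", review.lower())
mentions_damage = "damaged" in words

print(total)
print(len(words), mentions_damage)
```

Real unstructured data such as images or long documents needs far more processing than a simple word split, but the basic difference is the same: structure has to be extracted before analysis can start.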