CS213 - 04 - Data Science
Tsegamlak Molla
Previously
• Data & Information
• Data Quality
• Data Processing
• Integers (int)
• Is used to store whole numbers, mathematically known as integers
• Booleans (bool)
• Is used to represent a value restricted to one of two options: true or false
• Characters (char)
• Is used to store a single character
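As a minimal sketch, the three types above can be illustrated in Python. Python has built-in int and bool types but no separate char type, so a one-character string stands in for char here. The variable names are illustrative only.

```python
# Integer: a whole number.
student_count = 42

# Boolean: restricted to one of two values, True or False.
is_enrolled = True

# Character: Python has no dedicated char type, so a one-character
# string is used as a stand-in (an assumption made for this sketch).
grade = "A"

# Inspect the type Python assigns to each value.
print(type(student_count).__name__)
print(type(is_enrolled).__name__)
print(type(grade).__name__, len(grade))
```

In languages such as C++ or Java, the same three values would be declared with the explicit type names int, bool (boolean in Java), and char.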
• Semi-structured,
• Unstructured.
• The metadata then provides fields for dates and locations which,
by themselves, can be considered structured data.
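The point about metadata can be sketched as follows, assuming a hypothetical photo record: the content itself has no schema, while the date and location fields in the metadata have a fixed form that can be parsed and queried directly. The field names (taken_on, location) and all values are invented for illustration.

```python
import json
from datetime import date

# Hypothetical semi-structured record; every value here is invented.
record_json = """
{
  "content": "<raw image bytes would go here>",
  "metadata": {
    "taken_on": "2023-05-14",
    "location": {"lat": 9.03, "lon": 38.74}
  }
}
"""

record = json.loads(record_json)
meta = record["metadata"]

# The metadata fields parse into well-defined types...
taken_on = date.fromisoformat(meta["taken_on"])
lat = meta["location"]["lat"]
lon = meta["location"]["lon"]

# ...while the content itself has no schema to query against.
print(taken_on.year, lat, lon)
```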
• The data value chain and the data processing lifecycle are two
distinct concepts related to the management and utilization of
data.
5. Data Usage
• A key trend in the curation of big data is the use of community and
crowdsourcing approaches
• Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications.
• Volume
• Large amounts of data: zettabytes / massive datasets
• Velocity
• Data is live streaming or in motion
• Variety
• Data comes in many different forms from diverse sources
More V’s:
• Veracity
• Can we trust the data?
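For a sense of the scale behind Volume, here is a short arithmetic sketch using the SI definition of a zettabyte (10^21 bytes). The helper name to_zettabytes is hypothetical and exists only for this illustration.

```python
ZETTABYTE = 10**21  # SI definition: 1 ZB = 10^21 bytes
TERABYTE = 10**12   # SI definition: 1 TB = 10^12 bytes

def to_zettabytes(n_bytes):
    """Convert a byte count to zettabytes (hypothetical helper)."""
    return n_bytes / ZETTABYTE

# How many 1 TB drives would be needed to hold 1 ZB?
drives_per_zb = ZETTABYTE // TERABYTE
print(drives_per_zb)  # 1000000000
```

One zettabyte corresponds to a billion one-terabyte drives, which is why such volumes exceed what a single on-hand database system can handle.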
• It can store data in its native format and process any variety of
it, ignoring size limits.
• List and describe each technology or tool used in the big data
life cycle.