DV - QB - Solution
Q1. 7 Stages of Data Visualization
1. Information Overload
"Information overload" means being overwhelmed by too much information. Today,
computers are incredibly powerful and cheap, allowing us to analyse large data sets
without needing a research lab. Over the past decade, computer graphics have also
improved, thanks to gaming technology, making it easier and cheaper to create detailed
and interactive visualizations.
2. Data Collection
We’re getting better at gathering data, but we struggle with making the most of it. Much of
the data available online isn’t used effectively because it isn’t visualized well. Despite
collecting a lot of data, we often can’t answer important questions quickly. We need to
improve how we understand and communicate this information.
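import pandas as pd
import numpy as np
# Creating a DataFrame from a dictionary of columns
data = {
    'Name': ['Akash', 'Priya', 'Rithesh', 'Neha', 'Rahul'],
    'Age': [25, 23, 19, 20, 26],
    'City': ['New Delhi', 'Mumbai', 'Indore', 'Nashik', 'Jaipur']
}
df = pd.DataFrame(data)
# In a notebook, a bare variable name on the last line displays the DataFrame
df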
To classify digital data based on structure, it can be divided into three main categories:
Structured, Semi-Structured, and Unstructured data. Here is an overview of each type:
1. Structured Data
• Definition: Structured data is highly organized and formatted into rows and
columns. It follows a predefined schema, making it easily searchable and
analyzable.
• Characteristics:
o Data is stored in databases like relational databases (e.g., MySQL, Oracle).
o Fields are clearly defined with specific data types (e.g., integer, string, date).
o Querying and processing can be done using SQL.
• Examples: Spreadsheets, databases with customer information, inventory
records.
• Advantages: Easy to manage, query, and analyze using standard tools.
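As a minimal sketch of the characteristics above, the following uses Python's built-in sqlite3 module with an in-memory database. The table name customers and its rows are hypothetical, chosen only for illustration; the point is that a predefined schema with typed columns lets a SQL query answer a question directly.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Predefined schema: every column has a name and a data type.
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, city TEXT)")

# Hypothetical sample rows for illustration.
rows = [
    ("Akash", 25, "New Delhi"),
    ("Priya", 23, "Mumbai"),
    ("Rithesh", 19, "Indore"),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Because the structure is fixed, a SQL query can filter and sort directly.
older_than_20 = conn.execute(
    "SELECT name, city FROM customers WHERE age > 20 ORDER BY age"
).fetchall()
print(older_than_20)
conn.close()
```

The same question asked of free text or images would need extra processing first, which is why structured data is the easiest type to query.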
2. Semi-Structured Data
• Definition: Semi-structured data has some organization but lacks a fixed schema.
It may include tags or metadata to identify elements within the data.
• Characteristics:
o Data is stored in flexible formats such as JSON or XML, or in NoSQL databases.
o It is more adaptable than structured data and can evolve over time.
• Examples: JSON files, XML files, NoSQL databases, email messages (where the
subject and sender are structured, but the message body is not).
• Advantages: Allows flexibility and can store diverse types of information without
strict schema requirements.
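A small sketch of what "some organization but no fixed schema" can look like in practice, using Python's built-in json module. The two records and their field names are hypothetical; they are both valid JSON, yet they do not carry the same fields.

```python
import json

# Hypothetical records: both are valid JSON, but they do not share a fixed schema.
raw = """
[
  {"name": "Akash", "age": 25, "city": "New Delhi"},
  {"name": "Priya", "contact": {"email": "priya@example.com"}, "tags": ["new"]}
]
"""

records = json.loads(raw)

# Keys act as tags that identify each element, so fields can be looked up by name.
# .get() handles fields that some records simply do not have.
for record in records:
    age = record.get("age", "unknown")
    email = record.get("contact", {}).get("email", "unknown")
    print(record["name"], age, email)

# The set of keys differs from record to record.
all_keys = sorted({key for record in records for key in record})
print(all_keys)
```

Code that reads such data has to tolerate missing or nested fields, which is the price of the flexibility described above.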
3. Unstructured Data
• Definition: Unstructured data lacks any predefined format or organization. It can
come in a variety of forms and is more challenging to process.
• Characteristics:
o Includes diverse types of data such as text, images, audio, and video.
o Requires advanced techniques like machine learning or natural language
processing (NLP) to analyze.
• Examples: Images, videos, social media posts, PDFs, audio recordings.
• Advantages: Contains a wealth of valuable insights, particularly for qualitative
analysis, but is difficult to analyze without specialized tools.
Conclusion:
Structured data is highly organized and easy to manage but rigid, while semi-structured
data provides more flexibility and adaptability. Unstructured data, though the hardest to
process, is the most diverse and can hold valuable information that rows and columns do
not capture. Each type of data serves different purposes depending on the needs of an
organization.
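To illustrate why unstructured data is harder to process, here is a rough sketch that extracts word counts from free text. The posts are hypothetical, and the regular-expression tokenizer is only a crude stand-in for real NLP techniques; it is meant to show that even a simple question needs a processing step before it can be answered.

```python
import re
from collections import Counter

# Hypothetical free-text posts: no rows, columns, or tags to query.
posts = [
    "Loved the new phone, the camera is great!",
    "Battery life is poor but the camera is great.",
    "Great camera, poor battery.",
]

# Crude tokenization: lowercase the text and keep alphabetic words only.
words = []
for post in posts:
    words.extend(re.findall(r"[a-z]+", post.lower()))

# Count how often each word appears across all posts.
counts = Counter(words)
print(counts["camera"], counts["great"], counts["poor"])
```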
Q5. Reading the data from different files
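import pandas as pd
# Path to your Excel file
file_path = '/content/drive/MyDrive/sales_data.xlsx'
# Read the Excel file into a DataFrame (.xlsx files need the openpyxl package)
df = pd.read_excel(file_path)

import pandas as pd
# Path to your CSV file
file_path = '/content/drive/MyDrive/data.csv'
# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)

import pandas as pd
# Path to your JSON file
file_path = '/content/drive/MyDrive/sample.json'
# Read the JSON file into a DataFrame
df = pd.read_json(file_path)

# Display the first five rows of the DataFrame
print(df.head())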
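The Drive paths above only exist inside a Google Colab session, so the following is a self-contained sketch that first writes two small sample files to a temporary folder and then reads them back. The file contents are hypothetical. Excel is left out here because pd.read_excel depends on an extra engine such as openpyxl being installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, written to a temporary folder so the example
# runs anywhere (the Drive paths in the notes only exist inside Colab).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    json_path = Path(tmp) / "sample.json"

    csv_path.write_text("Name,Age\nAkash,25\nPriya,23\n")
    json_path.write_text('[{"Name": "Neha", "Age": 20}, {"Name": "Rahul", "Age": 26}]')

    # Each file format has its own reader; all of them return a DataFrame.
    df_csv = pd.read_csv(csv_path)
    df_json = pd.read_json(json_path)

# Display the first rows of each DataFrame
print(df_csv.head())
print(df_json.head())
```

Whatever the source format, the result is an ordinary DataFrame, so the same methods such as head() apply afterwards.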
---Harshada Patil