Chapter 2
DATA SCIENCE
ENAMUL HAQUE
All rights reserved. This book or any portion thereof may not be reproduced or used in any
manner whatsoever without the publisher's express written permission except for the use of
brief quotations in a book review or scholarly journal.
It is essential to understand precisely how to manage your data life cycle and maintain the corporate
data model. You need to know what data the company has in order to develop the right strategy for
working with it; only then will the data be valuable to the business. In parallel with the first pilots,
a roadmap for creating a data science platform is developed and refined, covering the storage
platform and the approaches to working with machine learning models.
Accumulation of data
Data should be accumulated either way, even if it is not used in existing models. It is important to
organise storage space and engage in minimal structuring; otherwise, the "lake" will turn into a
useless "swamp". In addition, the data lake needs to be linked to the company's analytical ecosystem
and to information security: data should not leak or cause problems with regulators.
Figure 1 - A data lake is a centralised repository that allows you to store all your structured and unstructured data
at any scale.
Comprehensive data work, machine learning initiatives, and process digitisation enable any
manufacturing company to become more efficient. They provide an opportunity to maximise
profits by reducing the cost of production, facilitating and accelerating specialists' work, improving
production safety, reducing overspending of raw materials, and improving the percentage of
optimisation and equipment maintenance. In the long term, they help ensure the transition to fully
autonomous production.
Approaches and Methods of Social Media Data Analysis
Social media is a good source of data, and it is vital to be able to work effectively with that data. Let's
take a look at a few features of, and approaches to, working with social media data. It is worth noting
that there is a separate direction called Social Mining: the application of data mining methods and
algorithms to find dependencies and knowledge in social networks (or in areas of knowledge where
data can be presented as networks or graphs). Its applications are wide-ranging.
In general, almost all practical tasks of analysing social media data reduce to the following
basic ones:
• Analysis of social network information flows, structure and metrics
• Analysis of the tone of messages (emotional colouring)
• Analysis and extraction of topics (as written in social networks)
• Image analysis
There are also combinations of these tasks.
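As a rough illustration of the first task (network structure and metrics), the sketch below computes degree centrality, one common network metric, for a tiny graph. The edge list and user names are invented for illustration, and the helper `degree_centrality` is a hypothetical name, not part of any library.

```python
from collections import defaultdict

# Hypothetical "who interacts with whom" edges, invented for illustration.
edges = [
    ("ana", "ben"),
    ("ana", "cara"),
    ("ana", "dev"),
    ("ben", "cara"),
    ("dev", "eli"),
]


def degree_centrality(edge_list):
    """Return each node's degree divided by (n - 1), treating edges as undirected."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


centrality = degree_centrality(edges)
# The node with the highest score is the best-connected user in this toy graph.
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Degree centrality is only one of many possible metrics; it is used here because it can be computed without any third-party graph library.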
Theme analysis
This class of methods allows you to identify the topics that are most popular in a community and
most often discussed in it (at a particular time). Tasks solved: highlighting topics (topic modelling),
assessing emotional colouring by theme, and highlighting entities related to a topic.
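The idea of highlighting frequently discussed topics can be approximated very crudely by counting how many posts mention each term. This is a simple stand-in, not a topic model; real topic modelling would use a dedicated algorithm. The posts, the stop-word list and the helper `top_terms` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical posts and stop-word list, invented for illustration.
posts = [
    "The new phone camera is amazing",
    "Battery life on the new phone is poor",
    "Camera quality beats my old phone",
    "Loving the camera and the screen",
]
STOP_WORDS = {"the", "is", "on", "my", "and", "new", "old"}


def top_terms(texts, k=2):
    """Return the k terms that appear in the largest number of posts."""
    counts = Counter()
    for text in texts:
        # Use a set so each term is counted at most once per post.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts.most_common(k)


print(top_terms(posts))
```

Restricting `posts` to a given time window would give the "at a particular time" view mentioned above.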
Image analysis
It allows you to identify what types of photo content different segments of users post. Tasks
solved: recognising the kind of object in the photo, the type of location in the image, and people's
emotions, as well as verification and identification (comparing a person found in a physical location
with their profile on the social network).
If the task is aimed at analysing a particular person, the directions are as follows:
• Personalisation of proposals
• Analysis of the structure of the social network
• Analysis of human content on the social network
Personalisation of proposals allows you to provide the user with the content that is most relevant
to them. Tasks: collecting and enriching user information; clustering and segmenting users;
classifying users based on the built model; personalised provision of information.
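The last of these tasks, personalised provision of information, can be sketched as ranking content by how well its tags overlap with a user's interests. This minimal example assumes interests and tags are already available as sets; it does not cover the clustering or classification steps. The data and the helpers `jaccard` and `recommend` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user profile and content catalogue, invented for illustration.
user_interests = {"running", "nutrition", "travel"}
items = {
    "marathon training plan": {"running", "fitness"},
    "budget city breaks": {"travel", "budget"},
    "healthy meal prep for runners": {"nutrition", "running", "cooking"},
    "stock market basics": {"finance"},
}


def recommend(interests, catalogue, k=2):
    """Return the k catalogue titles whose tags best match the interests."""
    ranked = sorted(
        catalogue,
        key=lambda title: jaccard(interests, catalogue[title]),
        reverse=True,
    )
    return ranked[:k]


print(recommend(user_interests, items))
```

Jaccard similarity is one simple choice of relevance score; a production system would more likely use a trained model.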
What does Google know about you? You can find information about yourself here:
google.com/settings/ads
Current and promising research in the field of social media analysis
• Semi-supervised learning on social media
• Social media sustainability and design
• Predicting the spread of information on social networks
• A synergy of spatial data and social media data
Data Science Project Management
As the volume of data increases day by day in all areas and industries, it is essential for any company,
industry, or domain to know what data it has and to use it appropriately in order to grow. No business
wants its growth to slow down without knowing the root of the problem or how to solve it. Often,
when we talk about data science projects, it seems that no one can provide a clear explanation of
how the whole process goes, from data collection to analysis and presentation of results. In the
previous section, we saw the data science lifecycle, and now we will apply it in a data science project.
Problem statement
The problem-statement-based data science approach has two steps: dive into the problem, then
solve it. First, you need to know whether the goal for this data is a categorical or a numerical
answer. For example, your problem statement may be whether a drug has shown the desired results
or not, whether customers are satisfied with a newly released product, or whether sales will rise or
fall in the future. Each of these has a definite, categorical answer, i.e., simply yes or no. If instead
your job is to predict a future sale price, home prices, or the dosage required, the answer is a
numerical value based on the data provided. So, first, you need to identify the problem and find the
best solution for it.
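The distinction between a categorical and a numerical answer can be sketched as a small check on the target values. This is a crude heuristic under stated assumptions (numeric targets with more than two distinct values are treated as numerical), and `problem_type` is a hypothetical helper name.

```python
def problem_type(target_values):
    """Guess whether a target column calls for a categorical or numerical answer."""
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no values")
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    # Assumption: a numeric column with only two distinct values is a yes/no flag.
    if all_numeric and len(set(values)) > 2:
        return "regression"  # numerical answer, e.g. a price or a dosage
    return "classification"  # categorical answer, e.g. yes/no


print(problem_type(["yes", "no", "yes"]))
print(problem_type([250000.0, 310000.0, 199500.0]))
```

In practice the choice is made from the problem statement itself; a check like this only helps confirm that the data matches it.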
Collecting data
Now the data work starts: the data is collected from various sources and placed in a
specific location (a database). All data required to solve the problem is collected.
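As a minimal sketch of this step, the example below gathers records from two differently formatted sources into one database table. The CSV and JSON extracts, the `sales` table and the helper `load_sources` are all invented for illustration; an in-memory SQLite database stands in for whatever storage the project actually uses.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source extracts, invented for illustration.
CSV_SOURCE = "customer_id,amount\n1,19.99\n2,5.50\n"
JSON_SOURCE = '[{"customer_id": 3, "amount": 42.0}]'


def load_sources(conn):
    """Load both sources into one table and return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, amount REAL, source TEXT)"
    )
    rows = []
    for rec in csv.DictReader(io.StringIO(CSV_SOURCE)):
        rows.append((int(rec["customer_id"]), float(rec["amount"]), "csv"))
    for rec in json.loads(JSON_SOURCE):
        rows.append((int(rec["customer_id"]), float(rec["amount"]), "json"))
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_sources(conn)
total = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded, total)
```

Recording the `source` of each row is a design choice that makes later cleansing easier, since problems can be traced back to where the data came from.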
Data cleansing
The collected data is loaded correctly and checked for missing values, anomalies and its
distribution. The data is then cleaned and fully processed.
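The checks described here can be sketched for a single numeric column as follows. The sample values are invented, `clean_column` is a hypothetical helper, and the 1.5 × IQR rule for flagging anomalies is one common convention assumed for illustration, not a method stated in the text.

```python
import statistics

# Hypothetical column with one missing value and one outlier.
raw = [12.0, 11.5, None, 12.3, 11.9, 95.0, 12.1]


def clean_column(values):
    """Count missing values, flag anomalies with the 1.5 * IQR rule, summarise the rest."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartiles of the observed values
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    anomalies = [v for v in present if v < low or v > high]
    kept = [v for v in present if low <= v <= high]
    return {
        "missing": missing,
        "anomalies": anomalies,
        "kept": kept,
        "median": statistics.median(kept),
    }


report = clean_column(raw)
print(report["missing"], report["anomalies"], report["median"])
```

Whether flagged values should be dropped, corrected or kept depends on the problem; the sketch only separates them so that the decision can be made explicitly.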
Data visualisation
Once most of the collected data has been cleaned, explored and well understood, it is presented
visually with graphs. These can be created with a plotting library such as Matplotlib in Python, in
Tableau, or in other visualisation software. In this way, insights are extracted and shown in clear
images that anyone can see and that are easy to explain.