Advanced Data Analytics Using Python - Unit II
● Different IDEs
● Advanced data acquisition methods
● APIs (Application Programming Interfaces)
● Web scraping
● Databases
● Data Cleaning and Preprocessing Techniques
Python Development IDEs
IDEs are tools that provide a comprehensive environment for software development, combining code editing,
debugging, and project management.
Example:
● Google Colab: A cloud-based environment for creating, running, and sharing collaborative Python
notebooks.
● Visual Studio Code (VS Code): A lightweight, open-source code editor with powerful features and
a large extension ecosystem.
● Jupyter Notebook: An interactive, web-based tool for data analysis, visualization, and
documentation.
● PyCharm: A comprehensive IDE with intelligent code assistance, integrated testing, and support
for various Python frameworks.
Google Colab
● Access Google Colab:
○ Visit Google Colab in your web browser.
● Sign In with Google Account:
○ Sign in with your Google account to access Google Drive integration.
● Create a New Notebook:
○ Click on File > New Notebook to create a new Python notebook.
● Write Python Code:
○ Write Python code in cells within the notebook.
○ Example code: print("Hello, Google Colab!")
What is Data Acquisition?
Data acquisition (DAQ) is the process of gathering, measuring, and recording data from different sources
or sensors in real-world scenarios. It often involves converting analog signals into digital data that
computers can process and analyze.
Data Sources
● Databases
● Files
● APIs
● Web Scraping
● Sensors and IoT Devices
Main Purpose of a Data Acquisition System (DAQ)
● Analog-to-digital conversion: convert analog signals from sensors into digital data (see the sketch after this list)
● Data recording
● Data storing
● Real-time data visualization
● Post-recording data review
● Data analysis using various mathematical and statistical calculations
● Report generation
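As a rough illustration of the analog-to-digital conversion step, the sketch below simulates sampling an analog sensor signal and quantizing it with NumPy; the signal shape, sampling rate, and bit depth are assumed values chosen only for the example.

```python
import numpy as np

# Simulate an "analog" sensor signal: a 5 Hz sine wave sampled at 100 Hz (assumed values).
sampling_rate = 100                      # samples per second
t = np.arange(0, 1, 1 / sampling_rate)
analog_signal = 2.5 + 2.0 * np.sin(2 * np.pi * 5 * t)   # volts, centred at 2.5 V

# Quantize with a hypothetical 10-bit ADC over a 0-5 V input range.
adc_bits = 10
v_min, v_max = 0.0, 5.0
levels = 2 ** adc_bits
digital_codes = np.clip(
    np.round((analog_signal - v_min) / (v_max - v_min) * (levels - 1)),
    0, levels - 1
).astype(int)

print(digital_codes[:10])                # first ten digitized samples, ready for storage or analysis
```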
What Types of Data Do Companies Collect?
Importance of Data Acquisition Systems
The importance of Data Acquisition Systems lies in their critical role in modern data-driven
applications. Here are the key reasons why they are vital:
An API integration can be used for data collection in a number of ways. One common use case is to
collect data from multiple sources through their APIs and then analyze the combined data more
efficiently and accurately.
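As a minimal sketch of collecting data through an API with the requests library, the example below fetches JSON from a placeholder endpoint and loads it into a pandas DataFrame; the URL, query parameters, and field layout are hypothetical and would be replaced by a real provider's documented endpoint.

```python
import requests
import pandas as pd

# Hypothetical REST endpoint returning a JSON list of records (replace with a real API).
url = "https://api.example.com/v1/measurements"
params = {"city": "Pune", "limit": 100}        # example query parameters

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()                     # fail loudly on HTTP errors

records = response.json()                       # assume the API returns a list of dicts
df = pd.DataFrame(records)

print(df.head())                                # inspect the collected data before analysis
```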
List of 5 data science projects using API
● Social Media Sentiment Analysis: using data from the Twitter and Facebook APIs.
● Stock Prediction: using data from the Yahoo Stock API and the Quandl API.
● Request: the actual data or service being asked for (e.g., given my current location from a game such as
Pokemon Go, return the map around that place). A request has two main parts:
● Methods: the questions you can ask, assuming you have access (this also defines the types of
responses available).
● APIs for data science are useful because they are essential building blocks.
● They are pieces of code that can be put together to enhance applications and websites.
● For example, speech recognition APIs used in chatbots improve the connection between models and
consumers.
Difference between an API and a Library
APIs and the Future of Data Science
Data science is constantly evolving, and it has the potential to grow beyond its current capabilities with the
additional help of APIs.
APIs enable different industries to innovate, improve, and become more data-driven.
Additionally, they pave the way for new business partnerships and help with app development.
APIs have the potential to make advanced analytics more understandable so that better business forecasting can
take place.
Web Scraping
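Web scraping extracts data directly from web pages when no API is available. A minimal sketch using requests and BeautifulSoup is shown below; the URL and the HTML structure (a table of products) are assumptions for illustration, and a real site's terms of use and robots.txt should always be respected.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical page listing items in an HTML table (replace with a real, scrape-friendly URL).
url = "https://example.com/products"
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("table tr")[1:]:          # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 2:
        rows.append({"name": cells[0], "price": cells[1]})

df = pd.DataFrame(rows)
print(df.head())
```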
Databases
Data science involves extracting value and insights from large volumes of data to drive
business decisions. It also involves building predictive models using historical data. Databases
facilitate effective storage, management, retrieval, and analysis of such large volumes of data.
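To make the database idea concrete, here is a small sketch using Python's built-in sqlite3 module together with pandas; the table name and columns are invented for the example.

```python
import sqlite3
import pandas as pd

# Create an in-memory SQLite database and a small example table (all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 1200.0), ("South", 950.5), ("North", 430.25)],
)
conn.commit()

# Retrieve and aggregate the data directly into a DataFrame for analysis.
df = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region", conn
)
print(df)
conn.close()
```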
Essential Database Skills for Data Science
Types of Databases
What is data preprocessing?
● Data preprocessing is the initial processing of raw data to ready it for subsequent data
analysis or machine learning tasks.
● It transforms raw data into a more suitable format for efficient processing in tasks like
data mining and machine learning.
● Data preprocessing ensures the accuracy of results by refining and organizing the input
data.
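As a small illustration of preprocessing raw data before analysis, the sketch below uses pandas to handle missing values and encode a categorical column; the column names and values are made up for the example.

```python
import pandas as pd

# Toy raw dataset with missing values and a categorical column (illustrative data).
raw = pd.DataFrame({
    "age": [25, None, 40, 35],
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
    "income": [50000, 62000, None, 58000],
})

# Fill missing numeric values with the column median.
raw["age"] = raw["age"].fillna(raw["age"].median())
raw["income"] = raw["income"].fillna(raw["income"].median())

# One-hot encode the categorical column so models can consume it.
clean = pd.get_dummies(raw, columns=["city"])
print(clean)
```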
Steps for data preprocessing
Preprocessing techniques for complex datasets
Handling complex datasets can be challenging, but Python offers various preprocessing techniques to make your
life easier. Here are some common techniques (a short sketch of two of them follows the list):
● Data Cleaning
● Data Transformation
● Feature Engineering
● Handling Text Data
● Dealing with Date/Time Data
● Handling Imbalanced Datasets
● Dimensionality Reduction
● Data Splitting
● Handling Multicollinearity
● Handling Time Series Data
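As a brief sketch of two of the techniques listed above, data splitting and dimensionality reduction, the example below uses scikit-learn's train_test_split and PCA on a synthetic dataset; the dataset shape and parameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

# Synthetic dataset: 200 samples, 10 features, binary labels (arbitrary sizes).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

# Data splitting: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Dimensionality reduction: keep the top 3 principal components.
pca = PCA(n_components=3)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

print(X_train_reduced.shape, X_test_reduced.shape)
```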
What is Data Cleansing?
Data Cleaning Cycle
Usage of Data Cleaning
Data Transformation
Data transformation is a technique used during data processing. It converts raw data into the
required format so that the subsequent steps of data processing and data modelling can be
performed efficiently.
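A minimal sketch of data transformation is shown below: a log transform to reduce skew and min-max scaling to bring values onto a common range, using scikit-learn; the sample values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative raw values with a strong right skew.
df = pd.DataFrame({"revenue": [100, 250, 900, 12000, 56000]})

# Log transform compresses the large values so downstream models behave better.
df["log_revenue"] = np.log1p(df["revenue"])

# Min-max scaling rescales a feature to the [0, 1] range.
scaler = MinMaxScaler()
df["revenue_scaled"] = scaler.fit_transform(df[["revenue"]]).ravel()

print(df)
```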
Ways of data transformation
Data Transformation Techniques
Data Transformation Process
Advantages of Data Transformation
What is Feature Engineering?
The process of identifying and extracting relevant features from raw data for a machine learning
algorithm is called feature engineering. It involves selecting the most important
characteristics (features), transforming them using mathematical operations, constructing
new variables as required, and extracting features.
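To illustrate feature engineering, the sketch below derives new variables from raw columns with pandas, extracting date parts and constructing a ratio feature; the column names and values are hypothetical.

```python
import pandas as pd

# Toy transactional data (hypothetical columns).
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-14", "2024-03-20"]),
    "total_price": [250.0, 90.0, 480.0],
    "items": [5, 2, 8],
})

# Construct new features from the raw columns.
df["order_month"] = df["order_date"].dt.month           # date part as a feature
df["order_dayofweek"] = df["order_date"].dt.dayofweek   # 0 = Monday
df["price_per_item"] = df["total_price"] / df["items"]  # ratio feature

print(df)
```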
Handling Text Data
Text Analysis Techniques
Dealing with Date/Time Data
Handling Imbalanced Datasets