
“Torture the data, and it will confess to anything”

But raw, unprocessed data isn’t of much use…


EXPLORATORY DATA ANALYSIS - EDA

Exploratory Data Analysis (EDA) is the process of visualizing and analysing data to extract insights from it.

In other words, EDA is the process of summarizing important characteristics of data in order to gain a better understanding of the dataset.

After the data has been collected, it undergoes some processing before being cleaned, and EDA is then performed. Notice that after EDA we may go back to processing and cleaning of data, i.e., this can be an iterative process.

It mainly has the following goals:

 To gain an understanding of the data and find clues from it
   o Analyse the data
   o Extract insights from it
 Preparing the proper input dataset, compatible with the machine learning algorithm requirements.
 Improving the performance of machine learning models.
Data scientists spend 80% of their time on data preparation.

The best way to achieve expertise in feature engineering is to practice different techniques on various datasets and observe their effect on model performance.

Basic Python scripts need the Pandas and NumPy libraries to run basic operations like data tables, arithmetic, logical operations, etc.

import pandas as pd
import numpy as np

Convert to a data frame:

df = pd.DataFrame(data)    # where data is a dict, list of records, or NumPy array
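Raw data is often loaded straight into a DataFrame from a file; a minimal sketch (the file name 'data.csv' is only a placeholder):

df = pd.read_csv('data.csv')    # read a CSV file into a DataFrame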

Missing values:
Missing values are one of the most common problems you can encounter when you try to prepare your data for machine learning. The reasons for missing values might be human errors, interruptions in the data flow, privacy concerns, and so on. They affect the performance of machine learning models, and most algorithms do not accept datasets with missing values and raise an error.

df.isnull()          # element-wise check for missing values
df.isnull().sum()    # number of missing values per column

We can handle missing values in many ways:

Delete: You can delete the rows with missing values or delete the whole column that has missing values.

df = df.dropna()    # see the axis and inplace parameters below
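A short illustration of those two parameters (a sketch, not part of the original notes):

df = df.dropna(axis=0)     # drop rows containing missing values (the default)
df = df.dropna(axis=1)     # drop columns containing missing values
df.dropna(inplace=True)    # modify the DataFrame in place instead of returning a copy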

Impute: Deleting data might cause a huge amount of information loss, so replacing data might be a better option than deleting it. However, the choice of what to impute for the missing values is important.

Numerical Imputation:
Consider a possible default value for the missing values in the column.

df = df.fillna(0)    # filling all missing values with 0

One standard replacement technique is to replace missing values with the average value of the entire column.

df = df.fillna(df.mean())

However, it is usually better to impute with the medians of the columns, particularly when the distribution is skewed, as the averages of the columns are sensitive to outlier values.

df = df.fillna(df.median())

Categorical Imputation:
Replacing the missing values with the most frequently occurring value (the mode) of a column is a good option for handling categorical columns.

df['column_name'].fillna(df['column_name'].value_counts().idxmax(), inplace=True)
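An equivalent way to do this uses the column mode directly (a small variant sketch, not from the original notes):

df['column_name'].fillna(df['column_name'].mode()[0], inplace=True)    # fill with the most frequent category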

Predictive filling:
Alternatively, you can choose to fill missing values through predictive filling.

df = df.interpolate(method='linear')    # fill missing values with linearly interpolated data
Predictive model:
Creating a predictive model for filling the data (a sketch follows below):
 Split the dataset into 2 datasets
 One without missing values and one with missing values
 Create a model with the dataset that has no missing values
 Run the model on the dataset with missing values
Thus the missing values are filled, although the estimated values tend to be better behaved (less variable) than the true values would be.
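A minimal sketch of this idea, assuming scikit-learn is available and a numeric DataFrame df in which only the placeholder column 'target_col' has missing values:

from sklearn.linear_model import LinearRegression

known = df[df['target_col'].notnull()]     # rows where the value is present
unknown = df[df['target_col'].isnull()]    # rows where the value is missing
features = [c for c in df.columns if c != 'target_col']

model = LinearRegression()
model.fit(known[features], known['target_col'])    # train on the complete rows

# predict and fill the missing entries
df.loc[df['target_col'].isnull(), 'target_col'] = model.predict(unknown[features])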

Handling Outliers:

The best way to detect outliers is to visualize the data. A box-and-whisker plot is the easiest method to detect outliers.

import seaborn as sns

sns.boxplot(x='Variable', y='Target variable', data=df)

An Outlier Dilemma: Drop or Cap:

Dropping the data points that are outliers is one option for handling them.

Dropping the outlier rows with percentiles:
upper_lim = df['column'].quantile(.95)
lower_lim = df['column'].quantile(.05)

df = df[(df['column'] < upper_lim) & (df['column'] > lower_lim)]

Another option for handling outliers is to cap them instead of dropping them.

Capping the outlier rows with percentiles:
upper_lim = df['column'].quantile(.95)
lower_lim = df['column'].quantile(.05)

df.loc[df['column'] > upper_lim, 'column'] = upper_lim
df.loc[df['column'] < lower_lim, 'column'] = lower_lim

Binning:

Binning can be applied to both categorical and numerical data.

Numerical Binning Example
Value      Bin
0-30    -> Low
31-70   -> Mid
71-100  -> High

Categorical Binning Example
Value    Bin
Spain  -> Europe
Italy  -> Europe
Chile  -> South America
Brazil -> South America
Numerical Binning Example
df['bin'] = pd.cut(df['value'], bins=[0,30,70,100],
labels=["Low", "Mid", "High"])

value bin
0 2 Low
1 45 Mid
2 7 Low
3 85 High
4 28 Low

Categorical Binning Example

conditions = [
df['Country'].str.contains('Spain'),
df['Country'].str.contains('Italy'),
df['Country'].str.contains('Chile'),
df['Country'].str.contains('Brazil')]

choices = ['Europe', 'Europe', 'South America', 'South America']

df['Continent'] = np.select(conditions, choices, default='Other')

Country Continent
0 Spain Europe
1 Chile South America
2 Australia Other
3 Italy Europe
4 Brazil South America

Log Transform

Logarithm transformation (or log transform) is one of the most commonly used mathematical transformations in feature engineering. What are the benefits of log transform?

 It helps to handle skewed data; after transformation, the distribution becomes closer to normal.
 In most cases the order of magnitude of the data varies within the range of the data, and the log transform normalizes these magnitude differences.
 It also decreases the effect of outliers, due to the normalization of magnitude differences, and the model becomes more robust.
Log Transform Example

df['log+1'] =(df['value']+1).transform(np.log)

Negative Values Handling

Note that the values are different.

df['log'] = (df['value'] - df['value'].min() + 1).transform(np.log)

value log(x+1) log(x-min(x)+1)
0 2 1.09861 3.25810
1 45 3.82864 4.23411
2 -23 nan 0.00000
3 85 4.45435 4.69135
4 28 3.36730 3.95124
5 2 1.09861 3.25810
6 35 3.58352 4.07754
7 -12 nan 2.48491

One-hot encoding (or) Dummy coding:

One-hot encoding is one of the most common encoding methods in machine learning. This method spreads the values in a column over multiple flag columns and assigns 0 or 1 to them. These binary values express the relationship between the grouped and encoded column. This method changes your categorical data, which is challenging for algorithms to understand, into a numerical format and enables you to group your categorical data without losing any information. If you have N distinct values in the column, it is enough to map them to N-1 binary columns, because the missing value can be deduced from the other columns.

encoded_columns = pd.get_dummies(df['column'])

df = df.join(encoded_columns).drop('column', axis=1)
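Note that pd.get_dummies creates one flag column per distinct value by default; to follow the N-1 scheme described above, the drop_first option can be used (a small illustrative variant, not in the original notes):

encoded_columns = pd.get_dummies(df['column'], drop_first=True)    # N-1 flag columns
df = df.join(encoded_columns).drop('column', axis=1)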
New variable creation:
Sometimes there is a possibility of creating a new variable by combining information from two or more columns. This can be used to find a hidden relationship between the variables and the target.
E.g., in a ticket reservation dataset, the number-of-passengers and relationship columns can be used to derive whether a passenger is travelling with family, with friends, or alone (see the sketch below).
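A minimal sketch of such a derived feature, assuming Titanic-style columns 'SibSp' (siblings/spouses aboard) and 'Parch' (parents/children aboard); these column names are assumptions, not from the original notes:

df['family_size'] = df['SibSp'] + df['Parch'] + 1        # passenger plus relatives on board
df['is_alone'] = (df['family_size'] == 1).astype(int)    # 1 if travelling alone, else 0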

Feature Split:

Splitting features is a good way to make them useful in terms of machine learning. By extracting the utilizable parts of a column into new features:

 We enable machine learning algorithms to comprehend them.
 We make it possible to bin and group them.
 We improve model performance by uncovering potential information.
String extraction example

data.title.head()

0    Toy Story (1995)
1    Jumanji (1995)
2    Grumpier Old Men (1995)
3    Waiting to Exhale (1995)
4    Father of the Bride Part II (1995)
# extract the year between the parentheses
data.title.str.split("(", n=1, expand=True)[1].str.split(")", n=1, expand=True)[0]

0 1995
1 1995
2 1995
3 1995
4 1995

Scaling:

In most cases, the numerical features of the dataset do not have a fixed range and they differ from each other. In real life, it is nonsense to expect the age and income columns to have the same range. But from the machine learning point of view, how can these two columns be compared?

Scaling solves this problem. The continuous features become identical in terms of range after a scaling process. This process is not mandatory for many algorithms, but it might still be nice to apply. However, algorithms based on distance calculations, such as k-NN or k-Means, need to have scaled continuous features as model input.

Basically, there are two common ways of scaling:

Normalization

Normalization (or min-max normalization) scales all values into a fixed range between 0 and 1. This transformation does not change the distribution of the feature, but due to the decreased standard deviation, the effect of outliers increases. Therefore, before normalization, it is recommended to handle the outliers.

data = pd.DataFrame({'value': [2, 45, -23, 85, 28, 2, 35, -12]})

data['normalized'] = (data['value'] - data['value'].min()) / (data['value'].max() - data['value'].min())

value normalized
0 2 0.23
1 45 0.63
2 -23 0.00
3 85 1.00
4 28 0.47
5 2 0.23
6 35 0.54
7 -12 0.10

Standardization

Standardization (or z-score normalization) scales the values while taking the standard deviation into account. If the standard deviations of the features differ, their ranges will also differ from each other. This reduces the effect of outliers in the features.

In the following formula for standardization, the mean is shown as μ and the standard deviation as σ:

z = (x − μ) / σ

data = pd.DataFrame({'value': [2, 45, -23, 85, 28, 2, 35, -12]})

data['standardized'] = (data['value'] - data['value'].mean()) / data['value'].std()
value standardized
0 2 -0.52
1 45 0.70
2 -23 -1.23
3 85 1.84
4 28 0.22
5 2 -0.52
6 35 0.42
7 -12 -0.92
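In practice both scalings are often done with scikit-learn's preprocessing utilities; a minimal sketch, assuming scikit-learn is installed and reusing the data DataFrame from above:

from sklearn.preprocessing import MinMaxScaler, StandardScaler

data['normalized_sk'] = MinMaxScaler().fit_transform(data[['value']]).ravel()      # min-max scaling to [0, 1]
data['standardized_sk'] = StandardScaler().fit_transform(data[['value']]).ravel()  # z-score scaling

Note that StandardScaler divides by the population standard deviation, so its output differs slightly from the pandas .std() computation above.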

“garbage in, garbage out!”

Univariate Analysis:

Check it Out:
The first step in examining your data:

df.head()        # first few rows of the data
df.info()        # column data types and non-null counts
df.describe()    # summary statistics of the numerical columns

Explore the target variable:

Find the distribution of the target variable; if the target variable is categorical, then find the success rate, using matplotlib and seaborn plots.
For a continuous variable:
import matplotlib.pyplot as plt
df['Column'].hist(density=True, stacked=True)    # histogram of the target column
df['Column'].plot(kind='density')                # kernel density estimate
plt.show()
For a categorical variable:
import seaborn as sns
sns.countplot(x='Survived', data=df)
Visualization:
Histogram:
In univariate analysis, we use histograms for analysing and visualizing the frequency distribution.

sns.distplot(df.column, kde=False)

Combination of histogram and distribution function

Box-plot
The second visualization tool used in univariate analysis is the box-plot; this type of graph is used for detecting outliers in data. It shows the distribution of continuous data and facilitates comparison between variables or across levels of a categorical variable.

sns.boxplot(x=df['categorical column'], y=df['continuous column'], data=df)

Count Plot:
A histogram of a categorical variable.

sns.countplot(x='categorical column', data=df)

Bar Plot:
Represents the central tendency of a numerical variable with the height of a solid rectangle, plus an error bar on top of it to represent the uncertainty.

sns.barplot(x='categorical variable', y='numerical variable', data=df)
Bivariate Analysis:
Pair plot:
Plots pairwise relationships in a dataset. This function creates a grid of graphs with combinations of all variables, mostly scatter plots.

sns.pairplot(df, hue='target variable')

Reg plot:
Plots the data and the best-fit line of a linear regression model in the same graph.

sns.regplot(x='Var1', y='Var2', data=df)

Joint plot:
Plots two variables with bivariate and univariate analyses in the same graph.

sns.jointplot(x='Var1', y='Var2', data=df, kind='reg')

Point plot:
Shows point estimates and confidence intervals of a continuous variable across levels of categorical variables.

sns.pointplot(x='cat var', y='cont var', hue='cat var2', data=df)

Factor plot:
Used for multiple group comparison.

sns.factorplot(data=df, x='var1', y='var2', hue='cat var')    # renamed to sns.catplot in newer seaborn versions


Strip plot:
A scatter plot of one continuous and one categorical variable.

sns.stripplot(data=df, x='cat variable', y='continuous variable')

Swarm plot:
Like a strip plot, for one continuous and one categorical variable, but with points adjusted so they do not overlap.

sns.swarmplot(data=df, x='cat variable', y='continuous variable')

Covariance:

Cov(x, y) = Σ(x − x̄)(y − ȳ) / (n − 1)

Correlation:

r = Cov(x, y) / (√Var(x) · √Var(y))

R²:

R² = (correlation)² = [Variance(mean) − Variance(line)] / Variance(mean)
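These quantities can be computed directly with pandas; a minimal sketch, assuming df has numeric columns 'x' and 'y' (placeholder names):

df['x'].cov(df['y'])                      # sample covariance between x and y
df['x'].corr(df['y'])                     # Pearson correlation coefficient
r_squared = df['x'].corr(df['y']) ** 2    # R² of a simple linear fit of y on x
df.corr()                                 # pairwise correlation matrix of all numeric columns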
