Self Introduction + Project 1
I have been working as a Python backend developer for the last 3.5 years, and I am
currently with Accenture. In my current project,
I use tools like SQL, NumPy, and Pandas to handle and organize the data,
making sure it is accurate and reliable. This means handling missing
values, outliers, and inconsistencies.
Once the data is clean, I perform exploratory data analysis (EDA). EDA is about
looking at the data from different angles to find patterns, trends, and insights.
To do this, I use visualization tools like Matplotlib and Seaborn. These tools
help create charts and graphs that make it easier to understand and share our
findings.
Overall, my goal is to ensure the data we work with is of high quality and to
generate valuable insights that can be communicated effectively to
stakeholders.
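For illustration, a minimal sketch of this overall workflow; the file name
(sales.csv) and the column name (amount) are made-up placeholders, not from
the actual project:

# A minimal sketch of the clean-then-explore workflow described above.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('sales.csv')   # hypothetical source file

# Fix missing information: fill numeric gaps with the column median
df['amount'] = df['amount'].fillna(df['amount'].median())

# Drop any rows that are still incomplete
df = df.dropna()

# Quick look at the distribution as a first EDA step
sns.histplot(df['amount'])
plt.title('Distribution of amount')
plt.show()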
Project 2) Bank of America
Previously, I worked on upgrading the Core Banking System (CBS) for Citibank
to enhance performance, security, and compliance with banking regulations.
This project involved adding new features, optimizing system performance,
and ensuring seamless data migration while maintaining high security and
scalability.
1. Pandas: Utilized the pd.read_sql() function to extract data from SQL databases into
a Pandas DataFrame for further processing.
2. NumPy and Pandas: Employed methods like np.isnan() and df.isnull() to identify
missing values.
3. Pandas: Used df.dropna() or df.fillna() to handle missing values by either dropping
or imputing them.
4. NumPy: Detected outliers using statistical methods like Z-scores
(np.abs((x - x.mean()) / x.std())) or the IQR (interquartile range) method
(np.percentile()), as illustrated in the sketch after this list.
5. Pandas: Removed outliers using boolean indexing with conditions like
df[(df['column'] > lower_bound) & (df['column'] < upper_bound)].
6. Pandas: Handled inconsistencies in the data using string methods like str.lower(),
str.strip(), or str.replace().
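A sketch of the outlier and inconsistency steps above (steps 4-6); the
'value' and 'city' columns and their contents are invented for illustration:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value': [10, 12, 11, 13, 250, 12],
    'city':  ['  Delhi', 'delhi ', 'Mumbai', 'MUMBAI', 'Pune', 'Pune'],
})

# Z-score method: flag points far from the mean in standard-deviation units
z = np.abs((df['value'] - df['value'].mean()) / df['value'].std())
df_z = df[z < 3]

# IQR method: keep values between Q1 - 1.5*IQR and Q3 + 1.5*IQR
q1, q3 = np.percentile(df['value'], [25, 75])
iqr = q3 - q1
lower_bound, upper_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_iqr = df[(df['value'] > lower_bound) & (df['value'] < upper_bound)]

# String cleanup: normalize whitespace and case
df['city'] = df['city'].str.strip().str.lower()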
1. Pandas: Calculated descriptive statistics such as mean, median, and
standard deviation using methods like df.describe().
2. Pandas and NumPy: Generated summary statistics like correlation
coefficients (df.corr()) and covariance (np.cov()).
3. Matplotlib and Seaborn: Created various types of plots, including
histograms, box plots, scatter plots, and heatmaps, to visualize
distributions, relationships, and trends in the data (see the sketch below).
4. Matplotlib and Seaborn: Customized plot aesthetics and styles using the
parameters and functions these libraries provide.
5. NumPy and Pandas: Applied mathematical functions and operations
to transform data, calculate new features, or derive insights.
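A small sketch of these EDA steps on synthetic data (the columns x and y are
made up just to show the calls):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.normal(size=100)})
df['y'] = 2 * df['x'] + rng.normal(scale=0.5, size=100)

print(df.describe())            # mean, std, quartiles (50% = median)
print(df.corr())                # pairwise correlation coefficients
print(np.cov(df['x'], df['y'])) # covariance matrix

# Visualize relationships and distributions
sns.scatterplot(data=df, x='x', y='y')
plt.show()
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()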
Data Preprocessing:
Pandas:
Pandas is an open-source Python library used for data manipulation and
analysis. It provides easy-to-use data structures, such as DataFrames and Series,
along with a wide range of functions for tasks like filtering, sorting, grouping,
and aggregating data. Pandas is particularly valuable for working with
structured data, making it a go-to tool for data cleaning,
transformation, and exploration in fields like data science, finance, and
business analytics.
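A short illustration of the filtering, grouping, and aggregation just
described; the region/sales data is made up:

import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South'],
    'sales':  [100, 150, 200, 50],
})

high = df[df['sales'] > 75]                    # filter rows by condition
totals = df.groupby('region')['sales'].sum()   # aggregate per group
print(totals)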
NumPy:
NumPy is an open-source Python library for numerical computing. It provides
fast, memory-efficient multi-dimensional arrays and a large collection of
mathematical functions that operate on them element-wise, which makes it the
foundation for libraries like Pandas and a natural fit for tasks such as
statistical calculations and outlier detection.
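A minimal NumPy example of the kind of vectorized statistics used above
(the numbers are arbitrary):

import numpy as np

values = np.array([10, 12, 11, 13, 250])
print(values.mean(), values.std())   # summary statistics in one call each
z = np.abs((values - values.mean()) / values.std())
print(z)                             # the last value stands out with the highest z-score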
Seaborn:
Seaborn is a Python data visualization library built on top of Matplotlib. It
provides a high-level interface for creating attractive and informative statistical
graphics. In simpler terms, Seaborn makes it easier to generate visually
appealing plots and charts for analyzing data, allowing users to explore
patterns, trends, and relationships in their datasets.
# Sample DataFrame for the example below
import pandas as pd

data = {'category': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C'],
        'value': [10, 15, 20, 25, 30, 35, 40, 45, 50]}
df = pd.DataFrame(data)
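Continuing with that sample DataFrame, a minimal Seaborn sketch (the choice
of a bar plot here is just illustrative):

import matplotlib.pyplot as plt
import seaborn as sns

# Uses the sample DataFrame `df` defined above;
# barplot shows the mean 'value' per 'category' with an error bar
sns.barplot(data=df, x='category', y='value')
plt.title('Mean value by category')
plt.show()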
In this context, "Led" is not an abbreviation related to data extraction; it
simply means taking the lead on, or being in charge of, the data extraction
process. It indicates responsibility for overseeing and managing the
extraction of data from various sources for further analysis.
Ensuring Data Integrity: Data integrity is crucial for accurate analysis, so
this means implementing measures to maintain the accuracy, consistency,
and reliability of the data throughout the process.
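A hedged sketch of what such integrity checks might look like in Pandas; the
account_id/balance columns and the "no negative balance" rule are assumptions
for illustration only:

import pandas as pd

df = pd.DataFrame({
    'account_id': [1, 2, 2, 3],
    'balance':    [100.0, 250.0, 250.0, -50.0],
})

duplicates = df[df.duplicated()]      # consistency: no duplicate records
invalid = df[df['balance'] < 0]       # accuracy: assumed domain rule
print(f'{len(duplicates)} duplicate rows, {len(invalid)} rule violations')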
One challenge was dealing with inconsistent data formats across different
sources. I overcame this by implementing a robust data-cleaning process in
Pandas that standardized the formats and ensured data integrity, allowing
for accurate analysis.
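A sketch of that kind of format standardization; the column names and the
mixed date/status formats are invented for illustration:

import pandas as pd

df = pd.DataFrame({
    'date':   ['2023-01-05', 'January 5, 2023', '05 Jan 2023'],
    'status': [' Active', 'ACTIVE', 'active '],
})

# Unify string casing and strip stray whitespace
df['status'] = df['status'].str.strip().str.lower()

# Parse heterogeneous date strings into a single datetime dtype
df['date'] = pd.to_datetime(df['date'], format='mixed')  # pandas >= 2.0
print(df)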