
KRISHNA
DATA SCIENTIST
saigupta1239@gmail.com | +1 (713) 478-5282

INTRODUCTION
Over 7 years of professional experience in requirement gathering, analysis, development, testing, and
implementation across the software life cycle, using approaches such as Agile, Waterfall, and Test-Driven Development (TDD).
PROFESSIONAL SUMMARY
 Profound experience as a Data Scientist, Machine Learning Engineer, Data Engineer, and Data Analyst with
excellent statistical analysis, data mining, and machine learning skills.
 Worked in the domains of Financial & Insurance Services, Healthcare and Retail.
 Expertise in managing the full life cycle of data science projects, transforming business requirements into data
collection, data cleaning, data preparation, data validation, data mining, and data visualization from structured
and unstructured data sources.
 Hands-on experience writing SQL and R queries to extract, transform, and load (ETL) data from large datasets
using data staging.
 Hands-on experience with statistical modeling techniques such as linear regression, Lasso regression, logistic
regression, ANOVA, clustering analysis, and principal component analysis.
 Hands-on experience writing User Defined Functions (UDFs) in Scala and Python to extend functionality for
data preprocessing.
 Professional working experience in Machine Learning algorithms such as LDA, Linear Regression, Logistic
Regression, SVM, Random Forest, Decision Trees, Clustering, Neural Networks and Principal Component
Analysis.
 Working knowledge of anomaly detection, recommender systems, and feature creation, with model validation
using ROC plots and K-fold cross-validation.
 Professional working experience of using programming languages and tools such as Python, R, Hive, Spark,
and PL/SQL.
 Hands-on experience with the ELK (Elasticsearch, Logstash, and Kibana) stack and with AWS, Azure, and GCP.
 Hands-on experience with Python data science libraries such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib,
Seaborn, Beautiful Soup, and NLTK.
 Working experience in RDBMS such as SQL Server 2012/2008 and Oracle 11g.
 Extensive experience with Hadoop, Hive, and NoSQL databases such as MongoDB, Cassandra, and HBase, as
well as Snowflake.
 Experience in data visualizations using Python, R, Power BI, and Tableau 9.4/9.2.
 Highly experienced in MS SQL Server with Business Intelligence in SQL Server Integration Services (SSIS), SQL
Server Analysis Services (SSAS), SAS, and SQL Server Reporting Services (SSRS).
 Familiar with conducting GAP analysis, User Acceptance Testing (UAT), SWOT analysis, cost benefit analysis
and ROI analysis.
 Deep understanding of the Software Development Life Cycle (SDLC) as well as Agile/Scrum methodology to
accelerate software development iterations.
 Experience with version control tools such as Git and SVN.
 Extensive experience in handling multiple tasks to meet deadlines and creating deliverables in fast-paced
environments and interacting with business and end users.
 Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics
with an excellent understanding of business operations and analytics tools for effective analysis of data.
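As an illustration of the validation approach mentioned above, K-fold cross-validation can be sketched in plain Python. The fold-splitting logic is the point here; the "model" (predicting the training mean) is a made-up placeholder, not a method from any project listed.

```python
# Minimal K-fold cross-validation sketch. The mean-predictor "model" is a
# hypothetical stand-in used only to show the fold mechanics.

def k_fold_splits(n, k):
    """Yield (train_indices, val_indices) pairs for k folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def cross_validate(y, k=5):
    """Average validation MSE of a mean-predictor across k folds."""
    errors = []
    for train, val in k_fold_splits(len(y), k):
        mean = sum(y[i] for i in train) / len(train)
        mse = sum((y[i] - mean) ** 2 for i in val) / len(val)
        errors.append(mse)
    return sum(errors) / len(errors)

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
score = cross_validate(y, k=5)  # mean held-out MSE across the 5 folds
```

Libraries such as scikit-learn provide the same splitting via `KFold`; the hand-rolled version just makes the mechanics explicit.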
TECHNICAL SKILLS

Programming Languages: Python, Java, R, C, C++, SAS Enterprise Miner, and SQL (Oracle & SQL Server)

Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization
(Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles (Random Forest, Gradient Boosting,
Extreme Gradient Boosting (XGBoost)), Deep Learning (Neural Networks; CNN, RNN & LSTM with Keras and
TensorFlow), Dimensionality Reduction (Principal Component Analysis (PCA) and Information Value),
Hierarchical & K-Means Clustering, K-Nearest Neighbors

Cloud Technologies: AWS, Azure, and Google Cloud

Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel, and Power BI

Text Preprocessing: Natural Language Processing, Information Retrieval, Classification, Topic Modeling,
Text Clustering, Sentiment Analysis, and Word2Vec

Big Data Tools: Spark/PySpark, Kafka, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume, Airflow, and Oozie

ETL Tools: Informatica PowerCenter, Talend Open Studio, and DataStage

Operating Systems: Linux, Unix, and Windows

Methodologies: Agile, Waterfall, OOAD, SCRUM

Version Control: SVN, CVS, Git, and ClearCase

PROFESSIONAL EXPERIENCE
Client: Allied Solutions May 2019-Present
Role: Data Scientist/ Machine Learning Engineer
Responsibilities:
 Applied machine learning and statistical modeling techniques to develop and evaluate algorithms that
improve performance, quality, data management, and accuracy.
 Implemented LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic
Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of
recommender systems.
 Gathered, analyzed, documented, and translated application requirements into data models, supported
standardization of documentation and the adoption of standards and practices related to data and applications.
 Responsible for identifying the data sets required to come up with predictive models for providing solutions for
both internal and external business problems.
 Performed data preparation, including mapping unaligned data from various formats, identifying missing
data, finding correlations, scaling, and removing junk data, to ready the data for building predictive models in
Apache Spark.
 Responsible for supervising data cleansing, validation, data classification, and data modeling activities.
 Developed algorithms in Python such as K-Means, Random Forest, linear regression, XGBoost, and SVM as
part of data analysis.
 Proficient with deep learning frameworks such as TensorFlow and Keras, and with libraries like scikit-learn.
 Selected the final model based on overall statistics, model performance, and run time, achieving accuracy,
precision, and recall in the range of 75-80% on average for the validated models.
 Calculated statistical thresholds for A/B Tests and routinely collected data for multiple tests at a time.
 Experienced in testing of ETL and message queue workflows and workloads.
 Implementation experience in machine learning and deep learning, including regression, classification,
neural networks, object tracking, and Natural Language Processing (NLP), using packages such as TensorFlow,
Keras, NLTK, spaCy, and BERT.
 Developed web services using REST APIs to send and receive data from external interfaces in JSON format.
 Understanding of data structures, data modeling and software architecture
 Constructed event pipeline around AWS to simulate and display status of trades in real time
 Worked on Azure databases; the database server is hosted on Azure and uses Microsoft credentials to log in
to the DB rather than the Windows authentication that is typically used.
 Built a streaming pipeline with Confluent Kafka on AWS Kubernetes using Python to support CI/CD.
 Utilized Spark SQL to extract and process data by parsing with Datasets or RDDs in HiveContext, applying
transformations and actions (map, flatMap, filter, reduce, reduceByKey).

Environment: Python, NumPy, Pandas, TensorFlow, Seaborn, SciPy, NLP, Matplotlib, GitHub, Spark, Sqoop, Kafka,
Hive, JavaScript, AWS, Azure, S3, Jupyter Notebook, MongoDB, Tableau, and Postman.
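The A/B-test thresholding described in this role can be sketched with a two-proportion z-test in plain Python; the conversion counts below are invented for illustration, not project data.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for comparing two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via math.erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: variant B converts at 17% vs. 12% for A.
z, p = two_proportion_z(120, 1000, 170, 1000)
significant = p < 0.05  # reject at the usual 5% threshold
```

Statistical packages (e.g. SciPy or statsmodels) offer the same test; the explicit formula shows where the threshold comes from.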

Client: General Electric Oct 2018 - Apr 2019


Role: Data scientist/ Machine Learning Engineer
Responsibilities:
 Developed the application by using AGILE-SCRUM methodology.
 Involved in Data Preprocessing Techniques for making the data useful for creating Machine Learning models.
 Owned and executed end-to-end data science projects.
 Translated product requirements into analytical requirements and specifications, and designed and developed
the required functionality.
 Involved in creating various regression and classification algorithms by using various Sklearn libraries such as
Linear Regression, Decision Trees, and Random Forest.
 Created machine learning models with hyperparameter tuning on test content, supporting better decisions
regarding the products.
 Detected objects in images using Keras, TensorFlow, OpenCV, and PyTorch.
 Worked with distributed data and computing tools, including MapReduce, Hadoop, Hive, Impala, and PySpark.
 Obtained better predictive performance of 81% accuracy using ensemble methods such as bootstrap
aggregation (bagging) and boosting (LightGBM, Gradient Boosting).
 Used F-score, Z-test, T-test, P-value, precision, and recall to evaluate model performance.
 Applied concepts of probability, distribution, and statistical inference on the given dataset to unearth interesting
findings using comparison, R-squared, P-value etc.
 Worked on Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and
MapReduce concepts.
 Worked on cloud platforms such as GCP BigQuery, AWS Lambda, Kinesis, and AWS EMR.
 Developed REST APIs to send and receive data from external interfaces in JSON format.
 Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files,
and SQL Server 2005.
Environment: PyCharm, Jupyter Notebook, Tableau, Python, NumPy, Pandas, scikit-learn, H2O, CUDA, SQL Server,
Netezza 4.2, SQL, Oracle, Hadoop (HDFS), Pig, SciPy, Matplotlib, NoSQL, and AWS/GCP.
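The evaluation metrics listed for this role (precision, recall, F-score) reduce to simple counts over the confusion matrix; here is a self-contained sketch with invented labels.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up model predictions
prec, rec, f1 = precision_recall_f1(y_true, y_pred)  # each 0.75 here
```

scikit-learn's `precision_recall_fscore_support` does the same with more options; the counts make the definitions concrete.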

Client: Arohak Inc. May 2015 - Jul 2018


Role: Data scientist/ Machine Learning Engineer
Responsibilities:
 Developed data pipelines in Python for school-based data, training, and testing.
 Worked on projects from gathering requirements through developing the entire application. Hands-on with the
Anaconda Python environment: developed, activated, and programmed in Anaconda.
 Performed Exploratory Data analysis (EDA) to find and understand interactions between different fields in the
dataset, for dimensionality reduction, to detect outliers, summarize main characteristics and extract important
variables graphically
 Responsible for data cleaning, feature engineering, and feature scaling using NumPy and Pandas.
 Extract data and actionable insights from a variety of client sources and systems, find probabilistic and
deterministic matches across second- and third-party data sources, and complete exploratory data analysis
 Wrote and read data in CSV and Excel file formats.
 Proficient in object-oriented programming, design patterns, algorithms, and data structures.
 Wrote Python routines to log into websites and fetch data for selected options.
 Implemented Python scripts to update content in databases and manipulate files.
 Used Python libraries such as Pandas, NumPy, Matplotlib, scikit-learn, and SciPy to load, summarize, and
visualize datasets, evaluate algorithms, and make predictions.
 Experience using ML techniques: clustering, regression, classification, graphical models
 Carried out Regression, K-means clustering and Decision Trees along with Data Visualization reports for the
management using R
 Implemented classification algorithms such as Logistic Regression, KNN, and Random Forests to make
predictions on school student data.
 Implemented algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
for dimensionality reduction and normalization of large datasets.
 Performed data visualization and designed dashboards using Tableau, generated reports, including charts,
summaries, and graphs to interpret the findings to the team and stakeholders
 Developed numerous MapReduce jobs in python for Data Cleansing and Analyzing Data
 Experienced in extracting and synthesizing data from Azure (Data Lake Storage (ADLS), Blob Storage, SQL DW,
SQL Server) and from legacy systems (Oracle and its companion data lake storage).
 Automated machine learning and analytics pipelines using Azure
 Administered regular user and application support for highly complex issues involving multiple components such
as Hive, Spark, Kafka, MapReduce
 Developed the full life cycle of a Data Lake and Data Warehouse with big data technologies such as Spark and
Hadoop.

Environment: Jupyter Notebook, H2O, MapReduce, Python, Scala, Spark, Kafka, Hive, MySQL, MySQL Server,
MongoDB, Azure, PyCharm, Git, Linux, NumPy, Pandas, Matplotlib, scikit-learn, SciPy, PCA, K-Means Clustering,
Decision Trees, KNN, Random Forest, AWS SageMaker, and Tableau.
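The K-means clustering used in this role can be sketched as Lloyd's algorithm in plain Python; the one-dimensional points below are synthetic, not project data.

```python
def k_means(points, k, iters=20):
    """Cluster 1-D points into k groups; returns (centroids, assignments)."""
    centroids = points[:k]          # naive initialization: first k points
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(p - centroids[c]))
                       for p in points]
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # two obvious groups
centroids, labels = k_means(points, k=2)
```

Real implementations (e.g. scikit-learn's `KMeans`) use smarter initialization such as k-means++; first-k initialization keeps the sketch short.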

Client: Capgemini July 2013- Apr 2015


Role: Data scientist
Responsibilities:
 Performed data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python
and data visualization packages such as Matplotlib and Seaborn.
 Used Python to develop a variety of models and algorithms for analytic purposes.
 Developed logistic regression models to predict subscription response rate based on customer's variables like
past transactions, promotions, response to prior mailings, demographics, interests, and hobbies, etc.
 Used Python to implement different machine learning algorithms, including Generalized Linear Model, Random
Forest, and Gradient Boosting.
 Evaluated parameters with K-Fold Cross Validation and optimized performance of models.
 Recommended and evaluated marketing approaches based on quality analytics on customer consuming
behavior.
 Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and
statistical models by leveraging best-in-class modeling techniques.
 Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and
De-normalization of database.
 Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of
machine learning methods, including classification, regression, and dimensionality reduction.
 Performed data visualization and designed dashboards with Tableau, providing complex reports, including
charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
 Identified process improvements that significantly reduce workloads or improve quality.

Environment: Python (scikit-learn/SciPy/NumPy/Pandas), Linux, Tableau, Hadoop, MapReduce, Hive, Oracle,
Windows 10/XP, JIRA.
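The logistic-regression response models described in this role can be sketched with plain gradient descent on log loss; the single feature and labels below are synthetic stand-ins (e.g. a past-transactions count), not client data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad = p - y            # dLogLoss/dLogit for one sample
            w -= lr * grad * x
            b -= lr * grad
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # synthetic feature values
ys = [0, 0, 0, 1, 1, 1]               # responded (1) or not (0)
w, b = fit_logistic(xs, ys)

def predict(x):
    return sigmoid(w * x + b)
```

In practice this would typically use a library (e.g. scikit-learn's `LogisticRegression`); the hand-rolled loop just exposes the update rule behind the model.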
