Rakesh Kumar - Data Scientist
Professional Summary:
Lead Data Scientist with extensive experience working with large-scale enterprises across multiple domains.
Expertise in driving AI innovations from conception to full-scale implementation in areas such as NLP, time
series analysis, computer vision, and AI-driven marketing and recommendation systems. Broad understanding of
software engineering projects involving microservices and data science.
Education
Master of Data Science: The University of British Columbia, Vancouver
Bachelor of Technology: Information Technology, Maharshi Dayanand University - Rohtak
Certifications
Specialized Models: Time Series and Survival Analysis (Coursera)
DeepLearning.AI TensorFlow Developer (Coursera)
Deep Learning Specialization (Coursera, Andrew Ng)
Technical Skills:
Deep Learning: Transformers
• LLMs (BERT, BART, GPT-2, GPT-3, T5, XLM-RoBERTa, Longformer, and ELECTRA)
• Prompt engineering (GPT-3.5 DaVinci & Turbo; zero- and few-shot learning, chain-of-thought, expert prompting; LangChain)
• Chatbot
• Siamese Networks
• Dialog Management
• Conversational AI (Rasa, Dialogflow, and custom chatbots)
• CNN-based models (MobileNet, ResNet)
• OCR (Tesseract, EasyOCR, and PaddleOCR)
• Object Detection (R-CNN, Fast R-CNN, Faster R-CNN, RetinaNet, YOLO, and SSD)
• Image segmentation
• Image classification
• Object localization & detection
Languages/Frameworks: Python, Java, C, C++, R, JavaScript
• TensorFlow, PyTorch, SQL, Flask
• Spring Boot, Microservices
Time Series/AI-Marketing: ARIMA, SARIMA, FBProphet, Deep Learning Forecasting
Big Data Technologies: Pig, Hive, PySpark, Hadoop
• MongoDB, BigQuery, Snowflake, Apache Airflow
Additional: Personalized & Non-Personalized Recommendation Systems
• SQL, Vector DB
• Assessment, Reporting, Presentation
• Databricks, Snowflake
• Docker, Jenkins, Kubernetes, Ansible, Chef, Terraform
• Virtualization, Azure ML, AWS, GCP
• ETL, Tableau, Kafka, Redis
• Elasticsearch
• DeepSpeed, LoRA, Distillation, Quantization, ONNX, TensorRT, XLA Compiler
Professional Experience:
Ai Intelli, Canada 2021-08 - Present
Lead Data Scientist
Responsibilities
Leading the development of an intelligent document processing tool using Python and libraries like Pandas, NumPy,
spaCy, NLTK, and others to extract information from image, PDF, and HTML files.
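A minimal sketch of such an extraction step (the file name and entity labels are illustrative; assumes pdfplumber and spaCy's en_core_web_sm model are installed):

    import pdfplumber
    import spacy

    # Small English pipeline; assumed installed via `python -m spacy download en_core_web_sm`
    nlp = spacy.load("en_core_web_sm")

    def extract_entities_from_pdf(path: str) -> list[tuple[str, str]]:
        """Pull raw text from each PDF page, then run spaCy NER over it."""
        with pdfplumber.open(path) as pdf:
            text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        doc = nlp(text)
        return [(ent.text, ent.label_) for ent in doc.ents]

    # Hypothetical usage: print(extract_entities_from_pdf("invoice.pdf"))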
Actively contributing to the implementation of YOLO for object detection, with a focus on optimizing its performance
for low latency using ONNX-TensorRT.
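A hedged sketch of the export step such an optimization typically starts from (input shape and tensor names are illustrative; assumes the detector is already loaded as a PyTorch nn.Module):

    import torch

    def export_to_onnx(model: torch.nn.Module, onnx_path: str = "yolo.onnx") -> None:
        """Trace the detector and write an ONNX graph for TensorRT to consume."""
        model.eval()
        dummy = torch.randn(1, 3, 640, 640)  # illustrative NCHW input shape
        torch.onnx.export(
            model,
            dummy,
            onnx_path,
            opset_version=13,
            input_names=["images"],
            output_names=["predictions"],
            dynamic_axes={"images": {0: "batch"}},  # allow variable batch size
        )

    # The resulting graph can then be built into a TensorRT engine, e.g.:
    #   trtexec --onnx=yolo.onnx --saveEngine=yolo.engine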
Leading the design and development of a Named Entity Recognition (NER) system using large language models
(GPT-NER, BERT, and RoBERTa); a character-based spell checker built on a sequence-to-sequence transformer
architecture; and fine-tuned BERT, RoBERTa, and word-embedding techniques (Word2Vec, GloVe, FastText) for
semantic similarity.
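A minimal sketch of the semantic-similarity piece (the base checkpoint is illustrative; in practice a fine-tuned model would be loaded; assumes the Hugging Face transformers library):

    import torch
    from transformers import AutoTokenizer, AutoModel

    MODEL = "bert-base-uncased"  # illustrative; swap in a fine-tuned checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModel.from_pretrained(MODEL)

    def embed(sentence: str) -> torch.Tensor:
        """Mean-pool the last hidden states into one sentence vector."""
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
        mask = inputs["attention_mask"].unsqueeze(-1)    # zero out padding
        return (hidden * mask).sum(1) / mask.sum(1)

    sim = torch.nn.functional.cosine_similarity(embed("refund request"), embed("money-back inquiry"))
    print(float(sim))  # closer to 1.0 means more semantically similar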
Collaborating with DevOps to establish a Continuous Integration and Continuous Deployment (CI/CD) pipeline for
machine learning model training and deployment.
Utilizing various AWS cloud services, including SageMaker, EC2 instances, and S3 buckets, for model tuning, training,
and storage.
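On the storage side, a small boto3 sketch (bucket and key names are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Upload a trained model artifact to S3 (hypothetical bucket/key)
    s3.upload_file("model.tar.gz", "my-ml-artifacts", "models/ner/model.tar.gz")

    # Pull it back down for local evaluation
    s3.download_file("my-ml-artifacts", "models/ner/model.tar.gz", "model.tar.gz")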
Conducting data visualization and analysis using Tableau and integrating it with GCP's BigQuery for enhanced data
processing.
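A minimal google-cloud-bigquery sketch of the kind of aggregation feeding such dashboards (project, dataset, and table names are hypothetical):

    from google.cloud import bigquery

    client = bigquery.Client()  # uses default GCP credentials

    query = """
        SELECT DATE(created_at) AS day, COUNT(*) AS documents_processed
        FROM `my_project.my_dataset.documents`
        GROUP BY day
        ORDER BY day
    """
    for row in client.query(query).result():
        print(row.day, row.documents_processed)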
Led the development and optimization of complex deep learning models using Keras to achieve state-of-the-art
performance on relevant business problems.
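A hedged Keras sketch of the pattern (the architecture and hyperparameters are illustrative, not the production models):

    import tensorflow as tf

    # Illustrative binary classifier; input shape is inferred from the data at fit time.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_split=0.1, epochs=10)  # hypothetical data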
Working on improving PaddleOCR and Tesseract 4 OCR accuracy by creating training datasets and fine-tuning
the models.
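An inference-side sketch running both engines on the same image (the file name is hypothetical; assumes pytesseract and the paddleocr 2.x API; fine-tuning itself happens offline with each engine's training tooling):

    import pytesseract
    from PIL import Image
    from paddleocr import PaddleOCR

    image_path = "scanned_form.png"  # hypothetical input

    # Tesseract: plain text extraction
    print(pytesseract.image_to_string(Image.open(image_path)))

    # PaddleOCR: detection + recognition with confidence scores
    ocr = PaddleOCR(lang="en")
    for box, (text, confidence) in ocr.ocr(image_path)[0]:
        print(text, confidence)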
Collaborated with solution architects and software developers to integrate microservices into existing systems,
leading the team in extensive testing and debugging.