
Bigdata Engineering Syllabus

The document outlines a comprehensive learning plan for programming languages, data structures, algorithms, databases, and cloud services, focusing on Python, SQL, and Big Data technologies. It includes time estimates for learning various topics, such as data processing, data warehousing, and orchestration with Airflow, as well as practical exercises on platforms like HackerRank and LeetCode. Each section specifies key concepts, tools, and techniques essential for mastering the respective subjects.
01. Programming Language:
    a. Python
        i. Basic Syntax
        ii. Variables
        iii. Data Types
        iv. Operators
        v. List
        vi. Tuples
        vii. Sets
        viii. Dictionaries
        ix. Conditional Statements (If..Else)
        x. Loops
        xi. Try...Except
        xii. Reading Files (CSV, JSON, TEXT, Excel)
        xiii. Writing Files
        xiv. Functions
        xv. Working with Dates
    b. Scala
    c. Java
    Practice: 10-15 easy problems on HackerRank or LeetCode
    Time for learning - 2 Weeks

02. Data Structures & Algorithms (Basic):
    a. Time Complexity and Space Complexity (Big O notation)
    b. Arrays
    c. Linked List
    d. Stack
    e. Queue
    f. Tree
    g. Graph
    h. Searching
        i. Linear Search
        ii. Binary Search
        iii. Interpolation Search
    i. Sorting
        i. Selection Sort
        ii. Insertion Sort
        iii. Merge Sort
        iv. Quick Sort
        v. Heap Sort
    Practice: 10-12 easy problems on GeeksforGeeks
    Time for learning - 1-2 Months (Depending on previous experience)

03. Database Fundamentals:
    a. DDL (CREATE, DROP, ALTER, TRUNCATE, RENAME)
    b. DCL (GRANT and REVOKE)
    c. DML (INSERT, UPDATE, DELETE)
    d. TCL (COMMIT, ROLLBACK)
    e. ...ition (MAX, MIN, FIRST, AVG, COUNT, SUM)
    f. Integrity Constraints (Primary Key, Foreign Key)
    g. Data Schema
    h. ACID Properties
    i. Views
    j. Stored Procedures
    k. ER and Relational Diagrams
    l. Indexing
    m. Hashing
    n. Normalization forms

04. SQL Scripting:
    a. Transactional Databases: MySQL, PostgreSQL
    b. Joins (Left, Inner, Outer, Full, Right)
    c. Sub Queries
    d. UNION Statement
    e. Date Function
    f. Nested Queries
    g. Group By
    h. Having
    i. CASE Statements
    j. Window Functions
    Practice: 10-15 easy problems on HackerRank or LeetCode
    Time for learning - 3-4 Weeks (sections 3 and 4)

05. BigData Fundamentals:
    a. BigData Basics and Characteristics
    b. 5 V's of BigData
    c. Vertical vs Horizontal Scaling
    d. Scaling Up and Scaling Out
    e. ETL Pipelines
    f. File formats
        i. CSV
        ii. JSON
        iii. AVRO
        iv. Parquet
        v. ORC
    g. Type of Data
        i. Structured
        ii. Unstructured
        iii. Semi-structured
    Time for learning - 1 Week (Only Theory)

06. Cluster Computing
    a. Hadoop Ecosystem
        i. HDFS
        ii. MapReduce
        iii. YARN
    b. Apache Hive
        i. How to load data in different file formats
        ii. Internal Tables
        iii. External Tables
        iv. Querying table data stored in HDFS
        v. Partitioning
        vi. Bucketing
        vii. Map-Side Join
        viii. Sorted-Merge Join
        ix. UDF in Hive
        x. SerDe in Hive

07. Apache Spark
    a. Spark Core
    b. Spark SQL
    c. Spark Streaming
    d. Difference Between Hadoop and Spark
    Time for learning - 3-4 Weeks (Hands-on and theory)

08. Data Processing
    a. Batch Processing
    b. Real-Time Processing
    c. Hybrid Processing
    Time for learning - 1-2 Weeks (Understand basic concept)

09. Data Warehousing Fundamentals:
    a. OLAP vs OLTP
    b. Dimension Tables
    c. Data Cube
    d. Extract Transform Load (ETL)
    e. E-R Modeling vs Dimensional Modeling
    f. Fact Tables
    g. Star Schema
    h. Snowflake Schema
    i. Warehouse Designing Questions
    Time for learning - 1-2 Weeks (Theory)

10. [Heading and items illegible in the source; the legible fragments mention "Columns" and "Slicing"]
    Time for learning - 1-2 Weeks (Theory and HandsOn)

11. Data Orchestration (Airflow):
    a. Intro to Airflow
    b. Implementing Airflow DAGs
    c. Maintaining and monitoring Airflow workflows
    d. Building production pipelines in Airflow
    Time for learning - 1-2 Weeks (Theory and HandsOn)

[Section number and heading missing in the source]
    a. Difference between NoSQL vs SQL
    b. Features of NoSQL
    c. Types of NoSQL database
    d. CAP Theorem
    e. Eventual Consistency
    f. Tools
        i. Cassandra
        ii. AWS DynamoDB
        iv. MongoDB
    Time for learning - 2-3 Weeks (Theory and HandsOn); learn MongoDB or Cassandra

15. Cloud Services (AWS):
    a. On-demand Machines | AWS EC2
    b. Access Management | AWS IAM
    c. Object Storage | AWS S3
    d. Transactional Database Services
        i. AWS RDS
            1. MySQL
