Internship Report
CELEBAL TECHNOLOGIES
I extend my heartfelt thanks to Jitasmita Banerjee, my mentor and guide, for providing me
with the opportunity to work on challenging projects and for his continuous support
throughout the internship. His insights and expertise in the SQL domain have been
instrumental in shaping my understanding and skills.
I am grateful to the Department of Mechanical Engineering at Indian Institute of Engineering
Science and Technology, Shibpur for allowing me to pursue an internship in the technology
domain. Special thanks to our head of the department, Dr. Sudip Ghosh, for their visionary
support. This decision broadened my horizons, exposing me to the intersection of mechanical
engineering and technology, particularly in SQL. Thank you for your trust and
encouragement, which have shaped my career aspirations.
This internship has been a transformative experience, and I am grateful for the opportunities,
guidance, and support I have received. It has not only enriched my knowledge in the SQL
domain but has also provided me with a broader perspective on the interdisciplinary nature of
engineering.
CONTENTS
1) Abstract
2) Introduction
3) SQL Data types
4) SQL Commands
i. Data Definition Language
ii. Data Query Language
iii. Data Manipulation Language
iv. Data Control Language
v. Transaction Control Language
5) SQL Joins
i. Inner Join
ii. Outer Join
6) Project
7) Conclusion
ABSTRACT
The implementation phase involved populating the database with relevant data, and SQL
queries were crafted to analyze key aspects of the soccer tournament. This internship not only
honed my SQL proficiency but also provided practical insights into the challenges and
nuances of managing large-scale sports data.
The report concludes with a reflection on the overall learning experience, highlighting key
achievements, challenges overcome, and the impact of the project on my skill development as
a data engineer. The SQL soccer database analysis project at Celebal Technologies was an
enriching endeavor that not only contributed to my academic and professional growth but
also provided tangible value in the realm of sports data management.
INTRODUCTION
What is a Database?
A database is a collection of interrelated data.
What is DBMS?
DBMS (Database Management System) is software used to create, manage, and organize
databases.
What is SQL?
SQL (Structured Query Language) is the language used to store, manipulate, and retrieve data
in an RDBMS. (It is not a database; it is a language used to interact with a database.)
Numeric Data types:
SQL Commands:
1) DQL (Data Query Language) : Used to retrieve data from databases. (SELECT).
2) DDL (Data Definition Language) : Used to create, alter, and delete database objects
like tables, indexes, etc. (CREATE, DROP, ALTER, RENAME, TRUNCATE)
3) DML (Data Manipulation Language): Used to modify the database. (INSERT,
UPDATE, DELETE)
4) DCL (Data Control Language): Used to grant & revoke permissions. (GRANT,
REVOKE)
5) TCL (Transaction Control Language): Used to manage transactions. (COMMIT,
ROLLBACK, START TRANSACTION, SAVEPOINT)
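The transaction commands listed under TCL can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for the database server (the report does not name one), and the accounts table and its values are hypothetical. SQLite spells START TRANSACTION as BEGIN.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction statements below are sent to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

conn.execute("BEGIN")  # START TRANSACTION in MySQL
conn.execute("UPDATE accounts SET balance = 80 WHERE id = 1")
conn.execute("SAVEPOINT before_mistake")
conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK TO before_mistake")  # undo only the second update
conn.execute("COMMIT")  # make the first update permanent
after_commit = conn.execute("SELECT balance FROM accounts").fetchone()[0]

conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
conn.execute("ROLLBACK")  # the delete never takes effect
after_rollback = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

print(after_commit, after_rollback)
```

In this sketch the balance ends at 80, because only the update made after the savepoint is undone, and the row survives the rolled-back DELETE.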
Data Definition Language (DDL):
CREATE TABLE:
Used to create a new table in the database. Specifies the table name, column names,
data types, constraints, and more.
Example: CREATE TABLE employees (id INT PRIMARY KEY, name
VARCHAR(50), salary DECIMAL(10, 2));
ALTER TABLE:
Used to modify the structure of an existing table. You can add, modify, or drop
columns, constraints, and more.
Example: ALTER TABLE employees ADD COLUMN email VARCHAR(100);
DROP TABLE:
Used to delete an existing table along with its data and structure.
Example: DROP TABLE employees;
CREATE INDEX:
Used to create an index on one or more columns in a table. Improves query
performance by enabling faster data retrieval.
Example: CREATE INDEX idx_employee_name ON employees (name);
DROP INDEX:
Used to remove an existing index from a table.
Example: DROP INDEX idx_employee_name;
ADD CONSTRAINT:
Used to define constraints that ensure data integrity. Constraints include PRIMARY
KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK. On an existing table, a
constraint is added with ALTER TABLE ... ADD CONSTRAINT.
Example: ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY
(customer_id) REFERENCES customers(id);
DROP CONSTRAINT:
Used to remove an existing constraint from a table.
Example: ALTER TABLE orders DROP CONSTRAINT fk_customer;
TRUNCATE TABLE:
Used to delete the data inside a table, but not the table itself.
Syntax: TRUNCATE TABLE table_name;
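The DDL statements above can be exercised end to end with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in database; the catalog queries (PRAGMA table_info, sqlite_master) are SQLite-specific. SQLite supports neither TRUNCATE TABLE nor ALTER TABLE ... ADD CONSTRAINT, so those two statements are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE TABLE: same definition as the example in the text.
conn.execute(
    "CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), salary DECIMAL(10, 2))"
)

# ALTER TABLE: add a column to the existing table.
conn.execute("ALTER TABLE employees ADD COLUMN email VARCHAR(100)")

# CREATE INDEX: index the name column.
conn.execute("CREATE INDEX idx_employee_name ON employees (name)")

# Read the column names and the index name back from SQLite's catalog.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name = 'idx_employee_name'"
    )
]
print(columns)  # ['id', 'name', 'salary', 'email']
print(indexes)  # ['idx_employee_name']

# DROP INDEX and DROP TABLE remove the objects again.
conn.execute("DROP INDEX idx_employee_name")
conn.execute("DROP TABLE employees")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'employees'"
).fetchone()[0]
print(remaining)  # 0
```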
Data Query Language (DQL):
SELECT:
The SELECT statement is used to select data from a database.
Syntax: SELECT column1, column2, ... FROM table_name;
Here, column1, column2, ... are the field names of the table.
If you want to select all the fields available in the table, use the following syntax:
SELECT * FROM table_name;
WHERE:
The WHERE clause is used to filter records.
Syntax: SELECT column1, column2, ... FROM table_name WHERE condition;
Operators used in WHERE are:
= : Equal
> : Greater than
< : Less than
>= : Greater than or equal
<= : Less than or equal
!= : Not equal.
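AND, OR, NOT:
The WHERE clause can combine several conditions using the AND, OR, and NOT operators.
Syntax: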
SELECT column1, column2, ... FROM table_name WHERE condition1 AND
condition2 AND condition3 ...;
SELECT column1, column2, ... FROM table_name WHERE condition1 OR
condition2 OR condition3 ...;
SELECT column1, column2, ... FROM table_name WHERE NOT condition;
Example:
SELECT * FROM Customers WHERE Country='India' AND City='Kolkata';
SELECT * FROM Customers WHERE Country='America' AND (City='Chicago' OR
City='Boston');
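To see how AND, OR, NOT, and parentheses change which rows are returned, here is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database. The Customers rows and the names() helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Country TEXT, City TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("Asha", "India", "Kolkata"),
        ("Ravi", "India", "Mumbai"),
        ("Mina", "Japan", "Tokyo"),
        ("Joon", "Korea", "Seoul"),
    ],
)


def names(where):
    """Return the customer names matching a WHERE condition, sorted by name."""
    sql = "SELECT Name FROM Customers WHERE " + where + " ORDER BY Name"
    return [row[0] for row in conn.execute(sql)]


both = names("Country = 'India' AND City = 'Kolkata'")    # both must hold
either = names("Country = 'Japan' OR Country = 'Korea'")  # at least one holds
negated = names("NOT Country = 'India'")                  # condition is false
grouped = names("Country = 'India' AND (City = 'Kolkata' OR City = 'Mumbai')")

print(both, either, negated, grouped)
```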
DISTINCT:
Removes duplicate rows from query results.
Syntax: SELECT DISTINCT column1, column2 FROM table_name;
LIKE:
The LIKE operator is used in a WHERE clause to search for a specified pattern in a
column. There are two wildcards often used in conjunction with the LIKE operator:
o The percent sign (%) represents zero, one, or multiple characters
o The underscore sign (_) represents one, single character
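The two wildcards can be compared side by side in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in database and a hypothetical players table. Note that LIKE in SQLite is case-insensitive for ASCII letters by default, which may differ from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_name TEXT)")
# Hypothetical names chosen to show the difference between % and _.
conn.executemany(
    "INSERT INTO players VALUES (?)",
    [("Tom",), ("Tim",), ("Thomas",), ("Matt",)],
)


def matching(pattern):
    """Return the names that match a LIKE pattern, sorted alphabetically."""
    sql = "SELECT player_name FROM players WHERE player_name LIKE ? ORDER BY player_name"
    return [row[0] for row in conn.execute(sql, (pattern,))]


starts_with_t = matching("T%")   # % matches any run of characters, including none
three_letters = matching("T_m")  # _ matches exactly one character
contains_a = matching("%a%")     # an 'a' anywhere in the name

print(starts_with_t)  # ['Thomas', 'Tim', 'Tom']
print(three_letters)  # ['Tim', 'Tom']
print(contains_a)     # ['Matt', 'Thomas']
```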
IN:
Filters results based on a list of values in the WHERE clause.
Example: SELECT * FROM products WHERE category_id IN (1, 2, 3);
BETWEEN:
Filters results within a specified range in the WHERE clause.
Example: SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01'
AND '2023-06-30';
IS NULL:
Filters results for rows in which a column has no value (NULL) in the WHERE clause.
Example: SELECT * FROM orders WHERE order_date IS NULL;
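The IN, BETWEEN, and IS NULL filters can be compared on one table with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in database; the orders rows are hypothetical, and dates are stored as ISO-formatted text so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, category_id INTEGER, order_date TEXT)")
# Hypothetical rows; order 4 has no date recorded.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, "2023-01-15"), (2, 4, "2023-06-30"), (3, 2, "2023-07-01"), (4, 3, None)],
)


def order_ids(where):
    """Return the order IDs matching a WHERE condition, in ascending order."""
    sql = "SELECT order_id FROM orders WHERE " + where + " ORDER BY order_id"
    return [row[0] for row in conn.execute(sql)]


in_list = order_ids("category_id IN (1, 2, 3)")
in_range = order_ids("order_date BETWEEN '2023-01-01' AND '2023-06-30'")  # both ends inclusive
missing_date = order_ids("order_date IS NULL")
equals_null = order_ids("order_date = NULL")  # never true, so no rows match

print(in_list, in_range, missing_date, equals_null)
```

The last query shows why IS NULL exists: comparing a column to NULL with = never evaluates to true, so it returns no rows even when NULL values are present.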
AS:
Renames columns or expressions in query results.
Example: SELECT first_name AS "First Name", last_name AS "Last Name" FROM
employees;
ORDER BY:
The ORDER BY clause allows you to sort the result set of a query based on one or
more columns.
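Basic Syntax:
The ORDER BY clause is placed at the end of the SELECT statement to sort query results. ASC
(the default) sorts in ascending order and DESC in descending order.
Syntax: SELECT column1, column2, ... FROM table_name ORDER BY column1 ASC, column2 DESC;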
Sorting by Expressions:
It's possible to sort by calculated expressions, not just column values.
Example: SELECT product_name, price, price * 0.9 AS discounted_price FROM
products ORDER BY discounted_price;
Sorting NULL Values:
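The default position of NULL values depends on the database system: MySQL, SQL Server, and
SQLite treat NULL as the smallest value (first in ascending order, last in descending order),
while PostgreSQL and Oracle treat it as the largest. Where it is supported (for example in
PostgreSQL and Oracle), you can control the sorting behaviour of NULL values using the
NULLS FIRST or NULLS LAST options.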
Example: SELECT column_name FROM table_name ORDER BY column_name
NULLS LAST;
Sorting by Position:
Instead of specifying column names, you can sort by column positions in the ORDER
BY clause.
Example: SELECT product_name, price FROM products ORDER BY 2 DESC, 1
ASC;
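The sorting variants above can be checked with a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database and a hypothetical products table. Because NULLS LAST is not available in every system or version, the last query uses a portable alternative, sorting on the "price IS NULL" flag first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
# Hypothetical rows; "bag" has no price.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 10.0), ("book", 50.0), ("bag", None), ("cap", 50.0)],
)


def product_names(order_by):
    """Return product names in the order produced by an ORDER BY clause."""
    sql = "SELECT product_name, price FROM products ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]


# SQLite treats NULL as the smallest value, so it comes first when ascending.
ascending = product_names("price ASC, product_name ASC")
# Sorting by position: column 2 (price) descending, then column 1 ascending.
by_position = product_names("2 DESC, 1 ASC")
# Portable way to push NULLs to the end: the flag is 0 for values, 1 for NULL.
nulls_last = product_names("price IS NULL, price ASC, product_name ASC")

print(ascending)    # ['bag', 'pen', 'book', 'cap']
print(by_position)  # ['book', 'cap', 'pen', 'bag']
print(nulls_last)   # ['pen', 'book', 'cap', 'bag']
```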
GROUP BY:
The GROUP BY clause in SQL is used to group rows from a table based on one or
more columns.
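Syntax:
SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1;
The GROUP BY clause follows the FROM clause (and the WHERE clause, if present) and
groups rows based on the specified columns.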
Aggregation Functions:
Aggregation functions (e.g., COUNT, SUM, AVG, MAX, MIN) are often used with
GROUP BY to calculate values for each group.
Example: SELECT department, AVG(salary) FROM employees GROUP BY
department;
HAVING Clause:
The HAVING clause is used with GROUP BY to filter groups based on aggregate
function results. It's similar to the WHERE clause but operates on grouped data.
Example: SELECT department, AVG(salary) FROM employees GROUP BY
department HAVING AVG(salary) > 50000;
Combining GROUP BY and ORDER BY:
You can use both GROUP BY and ORDER BY in the same query to control the order
of grouped results.
Example: SELECT department, COUNT(*) FROM employees GROUP BY
department ORDER BY COUNT(*) DESC;
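The three GROUP BY examples above can be run against sample data with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in database; the employees rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical employees in three departments.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("A", "Sales", 40000),
        ("B", "Sales", 50000),
        ("C", "IT", 60000),
        ("D", "IT", 70000),
        ("E", "IT", 80000),
        ("F", "HR", 45000),
    ],
)

# One output row per department, with the average salary of that group.
avg_by_dept = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY department"
).fetchall()

# HAVING filters the groups after aggregation (WHERE would filter rows before it).
high_paying = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 50000"
).fetchall()

# GROUP BY combined with ORDER BY on the aggregate.
headcount = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department ORDER BY COUNT(*) DESC, department"
).fetchall()

print(avg_by_dept)  # [('HR', 45000.0), ('IT', 70000.0), ('Sales', 45000.0)]
print(high_paying)  # [('IT', 70000.0)]
print(headcount)    # [('IT', 3), ('Sales', 2), ('HR', 1)]
```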
AGGREGATE FUNCTIONS:
These are used to perform calculations on groups of rows or entire result sets. They
provide insights into data by summarising and processing information.
Data Manipulation Language (DML):
INSERT:
The INSERT statement adds new records to a table.
Syntax: INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
Example: INSERT INTO employees (first_name, last_name, salary) VALUES
('John', 'Doe', 50000);
UPDATE:
The UPDATE statement modifies existing records in a table.
Syntax: UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition;
Example: UPDATE employees SET salary = 55000 WHERE first_name = 'John';
DELETE:
The DELETE statement removes records from a table.
Syntax: DELETE FROM table_name WHERE condition;
Example: DELETE FROM employees WHERE last_name = 'Doe';
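The INSERT, UPDATE, and DELETE examples can be run in sequence with a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database. The second employee row is hypothetical and is added only to show that DELETE with a WHERE clause leaves other rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, salary INTEGER)")

# INSERT: rowcount reports how many rows the statement affected.
inserted = conn.execute(
    "INSERT INTO employees (first_name, last_name, salary) VALUES ('John', 'Doe', 50000)"
).rowcount
conn.execute("INSERT INTO employees VALUES ('Jane', 'Roe', 60000)")  # hypothetical second row

# UPDATE: only rows matching the WHERE condition are changed.
updated = conn.execute(
    "UPDATE employees SET salary = 55000 WHERE first_name = 'John'"
).rowcount
salary = conn.execute(
    "SELECT salary FROM employees WHERE first_name = 'John'"
).fetchone()[0]

# DELETE: without a WHERE clause, every row would be removed.
deleted = conn.execute("DELETE FROM employees WHERE last_name = 'Doe'").rowcount
remaining = conn.execute("SELECT first_name FROM employees").fetchall()
conn.commit()

print(inserted, updated, salary, deleted, remaining)
```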
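Data Control Language (DCL):
There are two main DCL commands in SQL: GRANT, which gives users permissions on database
objects, and REVOKE, which withdraws them.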
SQL JOINS
In a DBMS, a join is an operation that combines rows from two or more tables based on a
related column between them. Joins are used to retrieve data from multiple tables by linking
them together using a common key or column.
Types of Joins:
1. Inner Join
2. Outer Join
3. Cross Join
4. Self Join
INNER JOIN
An inner join combines data from two or more tables based on a specified condition, known
as the join condition. The result of an inner join includes only the rows where the join
condition is met in all participating tables. It essentially filters out non-matching rows and
returns only the rows that have matching values in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
Here:
● columns refers to the specific columns you want to retrieve from the tables.
● table1 and table2 are the names of the tables you are joining.
● column is the common column used to match rows between the tables.
● The ON clause specifies the join condition, where you define how the tables are related.
OUTER JOIN
Outer joins combine data from two or more tables based on a specified condition, just like
inner joins. However, unlike inner joins, outer joins also include rows that do not have
matching values in both tables. Outer joins are particularly useful when you want to include
data from one table even if there is no corresponding match in the other table.
There are three types of outer joins: left outer join, right outer join, and full outer join.
A left outer join returns all the rows from the left table and the matching rows from the right
table. If there is no match in the right table, the result will still include the left table's row
with NULL values in the right table's columns.
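Example:
SELECT Customers.CustomerName, Orders.Product
FROM Customers
LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
A right outer join returns all the rows from the right table and the matching rows from the
left table. A full outer join returns all the rows from both tables, with NULL values in the
columns of whichever table has no match.
Example: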
SELECT Customers.CustomerName, Orders.Product
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
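The difference between an inner join and a left outer join can be seen in a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database. The Customers and Orders rows are hypothetical: one customer has no order, and one order points at a customer ID that does not exist. FULL OUTER JOIN is only available in recent SQLite versions, so it is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Product TEXT)"
)
# Hypothetical data: Ravi has no order, and order 11 has no matching customer.
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [(10, 1, "Ball"), (11, 3, "Boots")])

# Inner join: only rows with a match in both tables.
inner = conn.execute(
    "SELECT Customers.CustomerName, Orders.Product FROM Customers "
    "INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerID"
).fetchall()

# Left outer join: every customer, with NULL (None in Python) where no order matches.
left = conn.execute(
    "SELECT Customers.CustomerName, Orders.Product FROM Customers "
    "LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerID"
).fetchall()

print(inner)  # [('Asha', 'Ball')]
print(left)   # [('Asha', 'Ball'), ('Ravi', None)]
```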
PROJECT
Database:
The sample database is designed to capture and manage data related to a soccer tournament,
specifically based on the EURO CUP 2016. The structure of the database has been carefully
crafted to facilitate efficient storage and retrieval of information pertinent to a soccer
tournament. This design aims to address various queries and questions that may arise when
analysing or managing data related to the tournament.
Key components of the database likely include tables for teams, players, matches, venues,
and possibly more, each with specific fields to store relevant information. For example, the
teams table may include details about participating teams, such as team name, country, and
coach. The players table could store information about individual players, including their
name, position, and statistics.
The matches table might contain data about each game in the tournament, including match
date, teams playing, scores, and other relevant match-specific details. Additionally, a venues
table could be incorporated to store information about the locations where the matches are
held, such as stadium name, city, and capacity.
This database design is structured to provide a comprehensive and organized representation
of the EURO CUP 2016, making it easier for users to formulate and execute queries related to
team performance, player statistics, match outcomes, and other aspects of the tournament.
The thoughtful design ensures that users can gain valuable insights and answer various
questions about the soccer tournament efficiently.
Tables Description:
soccer_country:
soccer_city:
soccer_venue:
aud_capacity - the audience capacity of each venue
soccer_team:
team_id - the ID of each team. Each team represents a country and references the
country_id column of the soccer_country table
team_group - the name of the group to which the team belongs
match_played - how many matches a team played in the group stage
won - how many matches a team won
draw - how many matches a team drew
lost - how many matches a team lost
goal_for - how many goals a team scored
goal_agnst - how many goals a team conceded
goal_diff - the difference between goals scored and goals conceded
points - how many points a team earned from its group stage matches
group_position - the position in which a team finished the group stage
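The derived columns of soccer_team can be recomputed from the basic ones, as in the minimal sketch below. It assumes Python's built-in sqlite3 module as a stand-in database, that goal_for counts goals scored and goal_agnst goals conceded, and the usual football rule of 3 points for a win and 1 for a draw (the report does not state the points rule). The two team rows are hypothetical, not real tournament data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE soccer_team (team_id INTEGER, team_group TEXT, match_played INTEGER, "
    "won INTEGER, draw INTEGER, lost INTEGER, goal_for INTEGER, goal_agnst INTEGER)"
)
# Hypothetical group-stage records, not real tournament data.
conn.executemany(
    "INSERT INTO soccer_team VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, "A", 3, 2, 1, 0, 4, 1), (2, "A", 3, 1, 0, 2, 2, 4)],
)

# goal_diff = goals scored minus goals conceded.
# points = 3 per win plus 1 per draw (assumed rule).
standings = conn.execute(
    "SELECT team_id, goal_for - goal_agnst AS goal_diff, 3 * won + draw AS points "
    "FROM soccer_team ORDER BY points DESC"
).fetchall()

print(standings)  # [(1, 3, 7), (2, -2, 3)]
```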
playing_position:
player_mast:
referee_mast:
match_mast:
match_no - the unique ID of a match
play_stage - the stage of the tournament in which the match is played, i.e. G for Group
stage, R for Round of 16, Q for Quarter final, S for Semi final, and F for Final
play_date - the date on which the match was played
results - the result of the match, either win or draw
decided_by - how the result of the match was decided, either N for normal play or P for
penalty shootout
goal_score - the score of the match
venue_id - the venue where the match was played, referencing the venue_id column of the
soccer_venue table
referee_id - the ID of the referee selected for the match, referencing the referee_id
column of the referee_mast table
audence - the number of spectators who attended the match
plr_of_match - the player named player of the match, who belongs to the 23-man squad of a
team and references the player_id column of the player_mast table
stop1_sec - how much stoppage time (in seconds) was added to the 1st half of play
stop2_sec - how much stoppage time (in seconds) was added to the 2nd half of play
coach_mast:
asst_referee_mast:
ass_ref_id - the unique ID of each assistant referee, who assists the main referee
ass_ref_name - the name of the assistant referee
country_id - the country to which an assistant referee belongs, referencing the
country_id column of the soccer_country table
match_details:
decided_by - how the team's result was achieved, N for normal score or P for penalty
shootout
goal_score - how many goals the team scored
penalty_score - how many goals the team scored in the penalty shootout
ass_ref - the assistant referee who assisted the referee, referencing the ass_ref_id
column of the asst_referee_mast table
player_gk - the player who kept goal for the team, referencing the player_id column of
the player_mast table
goal_details:
penalty_shootout:
player_booked:
team_id - the ID of each team playing in the tournament, referencing the country_id
column of the soccer_country table
player_id - the ID of a player selected for a team's 23-man squad for the tournament,
referencing the player_id column of the player_mast table
booking_time - the time at which a player was booked
sent_off - a flag set to Y when a player was sent off
play_schedule - when a player was booked: NT for normal play, ST for stoppage time, or
ET for extra time
play_half - the half in which a player was booked
player_in_out:
match_captain:
team_coaches:
team_id - the ID of a team playing in the tournament, referencing the country_id column
of the soccer_country table
coach_id - the coach of the team, referencing the coach_id column of the coach_mast
table; a team may have one or more coaches
penalty_gk:
match_no - the match number, referencing the match_no column of the match_mast table
team_id - the ID of each team playing in the tournament, referencing the country_id
column of the soccer_country table
player_gk - the player who kept goal during the penalty shootout, referencing the
player_id column of the player_mast table
create table soccer_city(city_id numeric primary key, city varchar(25), country_id numeric);
create table player_mast(player_id numeric primary key, team_id numeric, jersy_no numeric,
player_name varchar,
create table match_mast(match_no numeric primary key, play_stage character(1), play_date
date, results character(5),
play_half numeric );
create table match_details(match_no numeric foreign key references match_mast(match_no),
play_stage varchar(1),
create table penalty_shootout(kick_id numeric primary key, match_no numeric foreign key
references match_mast(match_no),
team_id numeric foreign key references soccer_team(team_id), player_id numeric foreign key
references player_mast(player_id),
create table goal_details(goal_id numeric primary key, match_no numeric foreign key
references match_mast(match_no),
create table asst_referee_mast(ass_ref_id numeric, ass_ref_name varchar(40),
Analysis:
1) soccer_venue : Return the total count of venues for the EURO CUP 2016
SELECT COUNT(*)
FROM soccer_venue;
2) goal_details : Write a query to find the number of goals scored within normal play during
the EURO cup 2016
FROM goal_details
3) match_mast : Write a SQL query to find the number of matches that ended with a result.
SELECT COUNT(*)
FROM match_mast
4) match_details : Write a SQL query to find the number of matches that ended in draws.
SELECT COUNT(DISTINCT match_no)
FROM match_details
WHERE win_lose='D';
5) match_mast : Write a SQL query to find out when the Football EURO cup 2016 will end.
SELECT MAX(play_date)
FROM match_mast;
6) goal_details : Write a SQL query to find the number of own goals scored during the 2016
European Championship.
SELECT COUNT(*)
FROM goal_details
WHERE goal_type='O';
7) penalty_shootout : Write a SQL query to find the number of matches that resulted in a
penalty shootout.
SELECT COUNT(DISTINCT match_no)
FROM penalty_shootout;
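Because penalty_shootout holds one row per kick, counting matches requires counting distinct match numbers and not rows. The minimal sketch below illustrates the difference. It assumes Python's built-in sqlite3 module as a stand-in database, uses only the four columns named in the table definition above, and fills them with hypothetical kicks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE penalty_shootout (kick_id INTEGER PRIMARY KEY, match_no INTEGER, "
    "team_id INTEGER, player_id INTEGER)"
)
# Hypothetical kicks: three in match 47 and two in match 49.
conn.executemany(
    "INSERT INTO penalty_shootout VALUES (?, ?, ?, ?)",
    [(1, 47, 1, 101), (2, 47, 2, 201), (3, 47, 1, 102), (4, 49, 3, 301), (5, 49, 4, 401)],
)

# COUNT(*) counts kicks; COUNT(DISTINCT match_no) counts matches.
kicks = conn.execute("SELECT COUNT(*) FROM penalty_shootout").fetchone()[0]
matches = conn.execute(
    "SELECT COUNT(DISTINCT match_no) FROM penalty_shootout"
).fetchone()[0]

print(kicks, matches)  # 5 2
```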
8) goal_details : Write a SQL query to find the number of goals scored in every match in
extra time. Sort the result-set on match number. Return match number, number of goals in
extra time.
GROUP BY match_no
ORDER BY match_no;
9) match_mast : Write a SQL query to find the matches in which no stoppage time was
added during the first half of play. Return match no, date of play, and goal scored.
SELECT match_no, play_date, goal_score
FROM match_mast
WHERE stop1_sec = 0;
Conclusion:
In conclusion, my internship at Celebal Technologies was a transformative experience in data
engineering, specifically focused on SQL soccer database analysis for the EURO CUP. The
project emphasized the crucial role of a well-designed database in extracting meaningful
insights. This experience deepened my SQL skills and provided practical insights into sports
data management. Successfully meeting project objectives showcased the real-world
applications of data engineering.
Beyond technical skills, the internship enhanced communication and problem-solving
abilities. The mentorship received at Celebal Technologies has been instrumental in my
professional growth. The acquired knowledge serves as a robust foundation for future
endeavors in the dynamic field of data engineering and analytics. I am eager to apply these
skills to contribute meaningfully at the intersection of technology and sports data analysis.