Internship Report

The document is a project report submitted by Arnav Anand, detailing his internship experience at Celebal Technologies from May 27th to July 25th 2023. The internship involved designing an SQL database to store and analyze data from soccer tournaments like the EURO CUP. Key aspects of the database included tables for teams, players, matches, venues, and referees with relationships established through foreign keys. The implementation phase involved populating the database and using SQL queries to extract insights from the data. The report reflects on the skills and knowledge gained regarding SQL and managing large sports datasets.
PROJECT REPORT

CELEBAL TECHNOLOGIES

Project-Based Summer Internship


Submitted by
Arnav Anand

Enrollment ID: 2020MEB103

Under the guidance of


Jitasmita Banerjee

Period of internship: 27.05.2023 – 25.07.2023

Department of Mechanical Engineering

Indian Institute of Engineering Science and Technology, Shibpur


ACKNOWLEDGEMENT
I would like to express my sincere gratitude to Celebal Technologies, Jaipur, where I had the
privilege to undertake my internship as a part of the curriculum for my Bachelor's in
Mechanical Engineering. This invaluable experience has provided me with exposure to the
dynamic world of SQL and its applications in real-world scenarios.

I extend my heartfelt thanks to Jitasmita Banerjee, my mentor and guide, for providing me
with the opportunity to work on challenging projects and for his continuous support
throughout the internship. His insights and expertise in the SQL domain have been
instrumental in shaping my understanding and skills.
I am grateful to the Department of Mechanical Engineering at Indian Institute of Engineering
Science and Technology, Shibpur for allowing me to pursue an internship in the technology
domain. Special thanks to our head of the department, Dr. Sudip Ghosh for their visionary
support. This decision broadened my horizons, exposing me to the intersection of mechanical
engineering and technology, particularly in SQL. Thank you for your trust and
encouragement, shaping my career aspirations.

This internship has been a transformative experience, and I am grateful for the opportunities,
guidance, and support I have received. It has not only enriched my knowledge in the SQL
domain but has also provided me with a broader perspective on the interdisciplinary nature of
engineering.
CONTENTS
1) Abstract
2) Introduction
3) SQL Data types
4) SQL Commands
i. Data Definition Language
ii. Data Query Language
iii. Data Manipulation Language
iv. Data Control Language
v. Transaction Control Language
5) SQL Joins
i. Inner Join
ii. Outer Join
6) Project
7) Conclusion
ABSTRACT

This internship report encapsulates my immersive experience as a Data Engineer Intern at
Celebal Technologies, where I undertook a project focused on SQL database analysis for
soccer tournament data. The primary objective was to design and implement a robust
database structure tailored for the EURO CUP, facilitating efficient data storage and retrieval.
The report begins with an introduction to the project's scope, outlining the importance of a
well-structured database in the context of soccer tournaments. It delves into the
methodologies employed in the database design process, emphasizing the utilization of SQL
for creating tables to store information about teams, players, matches, venues, and referees.
Throughout the internship, significant attention was given to ensuring data integrity and
establishing relationships between tables through foreign keys. The relational database model
enabled seamless querying, empowering stakeholders to extract valuable insights into team
performances, player statistics, and match outcomes.

The implementation phase involved populating the database with relevant data, and SQL
queries were crafted to analyze key aspects of the soccer tournament. This internship not only
honed my SQL proficiency but also provided practical insights into the challenges and
nuances of managing large-scale sports data.
The report concludes with a reflection on the overall learning experience, highlighting key
achievements, challenges overcome, and the impact of the project on my skill development as
a data engineer. The SQL soccer database analysis project at Celebal Technologies was an
enriching endeavor that not only contributed to my academic and professional growth but
also provided tangible value in the realm of sports data management.

INTRODUCTION

What is a Database?
A database is a collection of interrelated data.

What is a DBMS?
A DBMS (Database Management System) is software used to create, manage, and organize
databases.

What is SQL?
SQL (Structured Query Language) is used to store, manipulate, and retrieve data in an
RDBMS. It is not a database itself; it is a language used to interact with a database.

We use SQL for CRUD operations:

● CREATE - To create databases and tables, insert tuples into tables, etc.
● READ - To read data present in the database.
● UPDATE - To modify already inserted data.
● DELETE - To delete a database, a table, or one or more specific rows (tuples).

Note - SQL keywords are NOT case sensitive.
E.g., select is the same as SELECT in SQL.
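The four CRUD operations can be sketched with Python's built-in sqlite3 module and an in-memory SQLite database. This is only an illustration: SQLite stands in for whichever RDBMS is actually used, and the students table and its rows are hypothetical. The last two queries show that keyword case does not matter.

```python
import sqlite3

# In-memory SQLite database; the "students" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a table and insert tuples (rows).
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
cur.executemany("INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
                [(1, "Asha", 82), (2, "Ravi", 74)])

# READ: fetch the stored rows.
rows_after_create = cur.execute(
    "SELECT id, name, marks FROM students ORDER BY id").fetchall()

# UPDATE: modify already inserted data.
cur.execute("UPDATE students SET marks = 90 WHERE id = 2")

# DELETE: remove one specific row.
cur.execute("DELETE FROM students WHERE id = 1")

# Keywords are not case sensitive: lowercase "select" behaves like "SELECT".
rows_lower = cur.execute("select id, name, marks from students").fetchall()
rows_upper = cur.execute("SELECT id, name, marks FROM students").fetchall()

print(rows_after_create)
print(rows_lower)
conn.close()
```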

SQL Data Types


In SQL, data types define the kind of data that can be stored in a column or variable.

String Data types:

Data type     Description                        Storage
char(n)       Fixed-width character string       Defined width
varchar(n)    Variable-width character string    2 bytes + number of chars

Numeric Data types:

Data type    Description                                            Storage
bit          Integer that can be 0, 1, or NULL
tinyint      Whole numbers from 0 to 255                            1 byte
smallint     Whole numbers from -32,768 to 32,767                   2 bytes
int          Whole numbers from -2,147,483,648 to 2,147,483,647     4 bytes
bigint       Whole numbers from -9,223,372,036,854,775,808 to       8 bytes
             9,223,372,036,854,775,807
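The integer ranges listed above follow from the storage size: a signed integer stored in n bytes covers -2^(8n-1) to 2^(8n-1) - 1, while tinyint, as listed, is unsigned and covers 0 to 2^8 - 1. A short Python sketch of that arithmetic (the helper name signed_range is illustrative only):

```python
def signed_range(num_bytes):
    """Return (min, max) for a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

# Storage sizes in bytes, taken from the numeric data types listed above.
ranges = {
    "smallint": signed_range(2),
    "int": signed_range(4),
    "bigint": signed_range(8),
}
tinyint_range = (0, 2 ** 8 - 1)  # tinyint is listed as unsigned: 0 to 255

for name, (low, high) in ranges.items():
    print(f"{name}: {low:,} to {high:,}")
```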

SQL Commands:
1) DQL (Data Query Language) : Used to retrieve data from databases. (SELECT).
2) DDL (Data Definition Language) : Used to create, alter, and delete database objects
like tables, indexes, etc. (CREATE, DROP, ALTER, RENAME, TRUNCATE)
3) DML (Data Manipulation Language): Used to modify the database. (INSERT,
UPDATE, DELETE)
4) DCL (Data Control Language): Used to grant & revoke permissions. (GRANT,
REVOKE)
5) TCL (Transaction Control Language): Used to manage transactions. (COMMIT,
ROLLBACK, START TRANSACTION, SAVEPOINT)

Data Definition Language (DDL):

Data Definition Language (DDL) is a subset of SQL (Structured Query Language)
responsible for defining and managing the structure of databases and their objects. DDL
commands enable you to create, modify, and delete database objects like tables, indexes,
constraints, and more.

Key DDL commands are:

● CREATE TABLE:
Used to create a new table in the database. Specifies the table name, column names,
data types, and constraints.
Example: CREATE TABLE employees (id INT PRIMARY KEY, name
VARCHAR(50), salary DECIMAL(10, 2));

● ALTER TABLE:
Used to modify the structure of an existing table. You can add, modify, or drop
columns and constraints.
Example: ALTER TABLE employees ADD COLUMN email VARCHAR(100);

● DROP TABLE:
Used to delete an existing table along with its data and structure.
Example: DROP TABLE employees;

● CREATE INDEX:
Used to create an index on one or more columns in a table. Improves query
performance by enabling faster data retrieval.
Example: CREATE INDEX idx_employee_name ON employees (name);

● DROP INDEX:
Used to remove an existing index from a table.
Example: DROP INDEX idx_employee_name;

● ADD CONSTRAINT:
Used within ALTER TABLE to define constraints that ensure data integrity. Constraints
include PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK.
Example: ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY
(customer_id) REFERENCES customers(id);

● DROP CONSTRAINT:
Used within ALTER TABLE to remove an existing constraint from a table.
Example: ALTER TABLE orders DROP CONSTRAINT fk_customer;

● TRUNCATE TABLE:
Used to delete all the data inside a table, but not the table itself.
Syntax: TRUNCATE TABLE table_name;
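As a rough, runnable sketch of the DDL commands above, the snippet below runs them against an in-memory SQLite database through Python's sqlite3 module. SQLite is an assumption made for convenience; it does not implement TRUNCATE TABLE or ALTER TABLE ... ADD CONSTRAINT, so those two commands are left out. PRAGMA table_info and the sqlite_master catalog are SQLite-specific ways of inspecting the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: same structure as the employees example above.
cur.execute("CREATE TABLE employees "
            "(id INT PRIMARY KEY, name VARCHAR(50), salary DECIMAL(10, 2))")

# ALTER TABLE: add a column to the existing table.
cur.execute("ALTER TABLE employees ADD COLUMN email VARCHAR(100)")
columns = [row[1] for row in cur.execute("PRAGMA table_info(employees)")]

# CREATE INDEX / DROP INDEX: the index appears in, then leaves, the catalog.
index_query = ("SELECT name FROM sqlite_master "
               "WHERE type = 'index' AND name = 'idx_employee_name'")
cur.execute("CREATE INDEX idx_employee_name ON employees (name)")
index_created = cur.execute(index_query).fetchone() is not None
cur.execute("DROP INDEX idx_employee_name")
index_dropped = cur.execute(index_query).fetchone() is None

# DROP TABLE: removes the table together with its data.
cur.execute("DROP TABLE employees")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns, index_created, index_dropped, tables_left)
conn.close()
```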

DATA QUERY/RETRIEVAL LANGUAGE (DQL or DRL):


DQL (Data Query Language) is a subset of SQL focused on retrieving data from databases.
The SELECT statement is the foundation of DQL and allows us to extract specific columns
from a table.

● SELECT:
The SELECT statement is used to select data from a database.

Syntax: SELECT column1, column2, ... FROM table_name;

Here, column1, column2, ... are the field names of the table.

If you want to select all the fields available in the table, use the following syntax:
SELECT * FROM table_name;

Example: SELECT CustomerName, City FROM Customers;

● WHERE:
The WHERE clause is used to filter records.
Syntax: SELECT column1, column2, ... FROM table_name WHERE condition;

Example: SELECT * FROM Customers WHERE Country='Mexico';

Operators used in WHERE are:

= : Equal
> : Greater than
< : Less than
>= : Greater than or equal
<= : Less than or equal
!= : Not equal (also written <>)

● AND, OR and NOT:
The WHERE clause can be combined with the AND, OR, and NOT operators. AND and OR
filter records based on more than one condition:
o The AND operator displays a record if all the conditions separated by AND are TRUE.
o The OR operator displays a record if any of the conditions separated by OR is TRUE.
o The NOT operator displays a record if the condition is NOT TRUE.

Syntax:
SELECT column1, column2, ... FROM table_name WHERE condition1 AND
condition2 AND condition3 ...;
SELECT column1, column2, ... FROM table_name WHERE condition1 OR
condition2 OR condition3 ...;
SELECT column1, column2, ... FROM table_name WHERE NOT condition;
Example:
SELECT * FROM Customers WHERE Country='India' AND City='Jaipur';
SELECT * FROM Customers WHERE Country='Japan' AND (City='Tokyo' OR
City='Osaka');
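The filters described above can be tried on a small, made-up Customers table. The sketch below assumes an in-memory SQLite database driven from Python's sqlite3 module; the rows and the names helper are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    ("Ana", "Mexico City", "Mexico"),
    ("Bruno", "Berlin", "Germany"),
    ("Chen", "Munich", "Germany"),
    ("Dev", "Jaipur", "India"),
])

def names(where_clause):
    """Return customer names matching a fixed, trusted WHERE clause."""
    sql = ("SELECT CustomerName FROM Customers "
           f"WHERE {where_clause} ORDER BY CustomerName")
    return [row[0] for row in cur.execute(sql)]

only_mexico = names("Country = 'Mexico'")
and_filter = names("Country = 'Germany' AND City = 'Berlin'")  # both must hold
or_filter = names("City = 'Berlin' OR City = 'Jaipur'")        # either may hold
not_filter = names("NOT Country = 'Germany'")                  # condition negated

print(only_mexico, and_filter, or_filter, not_filter)
conn.close()
```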

● DISTINCT:
Removes duplicate rows from query results.
Syntax: SELECT DISTINCT column1, column2 FROM table_name;

● LIKE:
The LIKE operator is used in a WHERE clause to search for a specified pattern in a
column. There are two wildcards often used in conjunction with the LIKE operator:
o The percent sign (%) represents zero, one, or multiple characters
o The underscore sign (_) represents exactly one character
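A minimal sketch of the two wildcards, assuming an in-memory SQLite database and a hypothetical list of customer names. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which may differ in other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (CustomerName TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?)",
                [("Anna",), ("Anil",), ("Brian",), ("Ann",)])

def like(pattern):
    """Return names matching a LIKE pattern, sorted alphabetically."""
    sql = ("SELECT CustomerName FROM Customers "
           "WHERE CustomerName LIKE ? ORDER BY CustomerName")
    return [row[0] for row in cur.execute(sql, (pattern,))]

starts_with_an = like("An%")    # % matches zero or more characters
contains_ia = like("%ia%")      # the pattern may appear anywhere in the value
four_letters_an = like("An__")  # each _ matches exactly one character

print(starts_with_an, contains_ia, four_letters_an)
conn.close()
```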

● IN:
Filters results based on a list of values in the WHERE clause.
Example: SELECT * FROM products WHERE category_id IN (1, 2, 3);

● BETWEEN:
Filters results within a specified range, inclusive of both bounds, in the WHERE clause.
Example: SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01'
AND '2023-06-30';

● IS NULL:
Filters rows in which a column has no value (NULL) in the WHERE clause.
Example: SELECT * FROM orders WHERE order_date IS NULL;

● AS:
Renames columns or expressions in query results.
Example: SELECT first_name AS "First Name", last_name AS "Last Name" FROM
employees;
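The operators above can be compared side by side on one small table. This is a sketch under assumptions: the orders table, its rows, and the ids helper are hypothetical, SQLite (via Python's sqlite3) is used for convenience, and dates are stored as ISO-formatted text so that BETWEEN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders "
            "(order_id INTEGER, category_id INTEGER, order_date TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, "2023-02-15"),
    (2, 5, "2023-07-01"),
    (3, 2, None),          # order date not recorded
    (4, 3, "2023-06-30"),
])

def ids(where_clause):
    """Return order IDs matching a fixed, trusted WHERE clause."""
    sql = f"SELECT order_id FROM orders WHERE {where_clause} ORDER BY order_id"
    return [row[0] for row in cur.execute(sql)]

in_list = ids("category_id IN (1, 2, 3)")
in_range = ids("order_date BETWEEN '2023-01-01' AND '2023-06-30'")  # inclusive
missing_date = ids("order_date IS NULL")

# AS renames the output column; the alias is visible in the cursor description.
cur.execute('SELECT order_id AS "Order Number" FROM orders')
alias = cur.description[0][0]

print(in_list, in_range, missing_date, alias)
conn.close()
```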

● ORDER BY:
The ORDER BY clause allows you to sort the result set of a query based on one or
more columns.

Basic Syntax:
The ORDER BY clause is placed at the end of the SELECT statement to sort query results.

Syntax: SELECT column1, column2 FROM table_name ORDER BY column1 [ASC|DESC];

Ascending and Descending Order:
By default, the ORDER BY clause sorts in ascending order (smallest to largest). You
can explicitly specify descending order using the DESC keyword.
Example: SELECT product_name, price FROM products ORDER BY price DESC;

Sorting by Multiple Columns:
You can sort by multiple columns by listing them sequentially in the ORDER BY
clause.
Rows are first sorted based on the first column, and for rows with equal values,
subsequent columns are used for further sorting.
Example: SELECT first_name, last_name FROM employees ORDER BY last_name,
first_name;

Sorting by Expressions:
It's possible to sort by calculated expressions, not just column values.
Example: SELECT product_name, price, price * 0.9 AS discounted_price FROM
products ORDER BY discounted_price;

Sorting NULL Values:
How NULL values sort depends on the DBMS. SQL Server, MySQL, and SQLite treat NULL
as the smallest value (first in ascending order, last in descending order), whereas
PostgreSQL and Oracle treat it as the largest. Where the DBMS supports them, the
NULLS FIRST and NULLS LAST options override the default.
Example: SELECT column_name FROM table_name ORDER BY column_name
NULLS LAST;

Sorting by Position:
Instead of specifying column names, you can sort by column positions in the ORDER
BY clause.
Example: SELECT product_name, price FROM products ORDER BY 2 DESC, 1
ASC;
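The sorting rules above can be checked on a small, hypothetical products table. The sketch assumes SQLite through Python's sqlite3 module, where NULL sorts first in ascending order. Because NULLS LAST is not available everywhere, the last query uses the expression price IS NULL as a leading sort key, which is one portable way to push NULLs to the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (product_name TEXT, price REAL)")
cur.executemany("INSERT INTO products VALUES (?, ?)", [
    ("Pen", 10.0), ("Bag", 250.0), ("Cap", 250.0), ("Mug", None),
])

def names(order_by):
    """Return product names sorted by a fixed, trusted ORDER BY expression."""
    sql = f"SELECT product_name, price FROM products ORDER BY {order_by}"
    return [row[0] for row in cur.execute(sql)]

# Multiple columns: price descending, then name ascending to break ties.
by_price_desc_then_name = names("price DESC, product_name ASC")
# Same ordering expressed by column position (2 = price, 1 = product_name).
by_position = names("2 DESC, 1 ASC")
# SQLite default: NULL is treated as the smallest value, so it comes first.
nulls_first_default = names("price ASC, product_name ASC")
# "price IS NULL" evaluates to 0 or 1, so rows with a NULL price sort last.
nulls_last_portable = names("price IS NULL, price ASC, product_name ASC")

print(by_price_desc_then_name, nulls_first_default, nulls_last_portable)
conn.close()
```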

● GROUP BY:
The GROUP BY clause in SQL is used to group rows from a table based on one or
more columns.

Basic Syntax:
The GROUP BY clause follows the FROM (and any WHERE) clause of the SELECT statement
and groups rows based on the specified columns.

Syntax: SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1;

Aggregation Functions:
Aggregation functions (e.g., COUNT, SUM, AVG, MAX, MIN) are often used with
GROUP BY to calculate values for each group.
Example: SELECT department, AVG(salary) FROM employees GROUP BY
department;

Grouping by Multiple Columns:
You can group by multiple columns by listing them in the GROUP BY clause. This
creates a hierarchical grouping based on the specified columns.
Example: SELECT department, gender, AVG(salary) FROM employees GROUP BY
department, gender;

HAVING Clause:
The HAVING clause is used with GROUP BY to filter groups based on aggregate
function results. It's similar to the WHERE clause but operates on grouped data.
Example: SELECT department, AVG(salary) FROM employees GROUP BY
department HAVING AVG(salary) > 50000;

Combining GROUP BY and ORDER BY:
You can use both GROUP BY and ORDER BY in the same query to control the order
of grouped results.
Example: SELECT department, COUNT(*) FROM employees GROUP BY
department ORDER BY COUNT(*) DESC;
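The grouping examples above can be run on a small, hypothetical employees table. The sketch below assumes SQLite through Python's sqlite3 module; the names and salaries are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("A", "Sales", 40000), ("B", "Sales", 50000),
    ("C", "IT", 60000), ("D", "IT", 80000), ("E", "HR", 45000),
])

# One output row per department, with the average salary of that group.
avg_by_dept = cur.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department").fetchall()

# HAVING filters whole groups after aggregation.
high_avg = cur.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 50000").fetchall()

# GROUP BY combined with ORDER BY on the aggregate (name breaks ties).
by_headcount = cur.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department ORDER BY COUNT(*) DESC, department").fetchall()

print(avg_by_dept, high_avg, by_headcount)
conn.close()
```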

● AGGREGATE FUNCTIONS:
These are used to perform calculations on groups of rows or entire result sets. They
provide insights into data by summarising and processing information.

Common Aggregate Functions:
COUNT(): Counts the number of rows in a group or result set.
SUM(): Calculates the sum of numeric values in a group or result set.
AVG(): Computes the average of numeric values in a group or result set.
MAX(): Finds the maximum value in a group or result set.
MIN(): Retrieves the minimum value in a group or result set.

DATA MANIPULATION LANGUAGE:

Data Manipulation Language (DML) in SQL encompasses commands that manipulate data
within a database. DML allows you to insert, update, and delete records, ensuring the
accuracy and currency of your data.

● INSERT:
The INSERT statement adds new records to a table.
Syntax: INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2,
...);
Example: INSERT INTO employees (first_name, last_name, salary) VALUES
('John', 'Doe', 50000);

● UPDATE:
The UPDATE statement modifies existing records in a table.
Syntax: UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE
condition;
Example: UPDATE employees SET salary = 55000 WHERE first_name =
'John';

● DELETE:
The DELETE statement removes records from a table.
Syntax: DELETE FROM table_name WHERE condition;
Example: DELETE FROM employees WHERE last_name = 'Doe';
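The three DML statements can be exercised as below, a sketch that assumes SQLite through Python's sqlite3 module and adds a second, hypothetical employee so that the effect of the WHERE clause is visible. The cursor's rowcount attribute reports how many rows each statement changed; an UPDATE or DELETE without a WHERE clause would affect every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees "
            "(first_name TEXT, last_name TEXT, salary INTEGER)")

# INSERT: add two records (the second one is hypothetical).
insert_sql = ("INSERT INTO employees (first_name, last_name, salary) "
              "VALUES (?, ?, ?)")
cur.execute(insert_sql, ("John", "Doe", 50000))
cur.execute(insert_sql, ("Jane", "Roe", 60000))

# UPDATE: only rows matching the WHERE clause are modified.
cur.execute("UPDATE employees SET salary = 55000 WHERE first_name = 'John'")
updated = cur.rowcount

# DELETE: only rows matching the WHERE clause are removed.
cur.execute("DELETE FROM employees WHERE last_name = 'Doe'")
deleted = cur.rowcount

remaining = cur.execute(
    "SELECT first_name, last_name, salary FROM employees").fetchall()
print(updated, deleted, remaining)
conn.close()
```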

Data Control Language (DCL):


Data Control Language focuses on the management of access rights, permissions, and
security-related aspects of a database system. DCL commands are used to control who can
access the data, modify the data, or perform administrative tasks within a database. DCL is an
important aspect of database security, ensuring that data remains protected and only
authorised users have the necessary privileges.

There are two main DCL commands in SQL: GRANT and REVOKE.

Transaction Control Language (TCL):


Transaction Control Language (TCL) deals with the management of transactions within a
database. TCL commands are used to control the initiation, execution, and termination of
transactions, which are sequences of one or more SQL statements that are executed as a
single unit of work. Transactions ensure data consistency, integrity, and reliability in a
database by grouping related operations together and either committing or rolling back
changes based on the success or failure of those operations.
There are three main TCL commands in SQL: COMMIT, ROLLBACK, and SAVEPOINT.
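A minimal sketch of these commands, assuming SQLite through Python's sqlite3 module and a hypothetical accounts table. Passing isolation_level=None turns off the module's implicit transaction handling, so the BEGIN, SAVEPOINT, ROLLBACK, and COMMIT statements below are exactly what the database receives (SQLite spells START TRANSACTION as BEGIN).

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 100)")

def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(cur.execute("SELECT name, balance FROM accounts"))

# ROLLBACK: both updates of the transfer are undone as a single unit.
cur.execute("BEGIN")
cur.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'A'")
cur.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'B'")
cur.execute("ROLLBACK")
after_rollback = balances()

# SAVEPOINT: undo only the work done after the savepoint, then COMMIT the rest.
cur.execute("BEGIN")
cur.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'A'")
cur.execute("SAVEPOINT before_credit")
cur.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'B'")  # mistake
cur.execute("ROLLBACK TO SAVEPOINT before_credit")
cur.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'B'")
cur.execute("COMMIT")
after_commit = balances()

print(after_rollback, after_commit)
conn.close()
```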

SQL JOINS
In a DBMS, a join is an operation that combines rows from two or more tables based on a
related column between them. Joins are used to retrieve data from multiple tables by linking
them together using a common key or column.
Types of Joins:
1. Inner Join
2. Outer Join
3. Cross Join
4. Self Join

INNER JOIN
An inner join combines data from two or more tables based on a specified condition, known
as the join condition. The result of an inner join includes only the rows where the join
condition is met in all participating tables. It essentially filters out non-matching rows and
returns only the rows that have matching values in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
Here:
● columns refers to the specific columns you want to retrieve from the tables.

● table1 and table2 are the names of the tables you are joining.

● column is the common column used to match rows between the tables.

● The ON clause specifies the join condition, where you define how the tables are related.
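The inner join syntax above can be tried on two small, hypothetical tables. The sketch assumes SQLite through Python's sqlite3 module; one order deliberately points at a customer that does not exist, and two customers have no orders, so the non-matching rows on both sides are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers "
            "(CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
cur.execute("CREATE TABLE Orders "
            "(OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Product TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(10, 1, "Ball"), (11, 1, "Boots"), (12, 4, "Jersey")])

# Only rows whose CustomerID exists in both tables are returned.
inner_rows = cur.execute(
    "SELECT Customers.CustomerName, Orders.Product "
    "FROM Customers "
    "INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Orders.OrderID").fetchall()

print(inner_rows)
conn.close()
```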

OUTER JOIN
Outer joins combine data from two or more tables based on a specified condition, just like
inner joins. However, unlike inner joins, outer joins also include rows that do not have
matching values in both tables. Outer joins are particularly useful when you want to include
data from one table even if there is no corresponding match in the other table.
There are three types of outer joins: left outer join, right outer join, and full outer join.

LEFT OUTER JOIN:

A left outer join returns all the rows from the left table and the matching rows from the right
table. If there is no match in the right table, the result will still include the left table's row
with NULL values in the right table's columns.

Example:
SELECT Customers.CustomerName, Orders.Product
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

RIGHT OUTER JOIN:


A right outer join is similar to a left outer join, but it returns all rows from the right table and
the matching rows from the left table. If there is no match in the left table, the result will still
include the right table's row with NULL values in the left table's columns.
Example:
SELECT Customers.CustomerName, Orders.Product
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

FULL OUTER JOIN:


A full outer join returns all rows from both the left and right tables, including matches and
non-matches. If there's no match, NULL values appear in columns from the table where
there's no corresponding value.

Example:
SELECT Customers.CustomerName, Orders.Product
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
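The three outer joins can be compared on the same kind of data, again as a sketch using SQLite through Python's sqlite3 module with hypothetical rows. Older SQLite versions lack RIGHT JOIN and FULL OUTER JOIN, so this example emulates them: a right join is a left join with the tables swapped, and a full outer join is the left join plus the right-side rows that found no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers "
            "(CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
cur.execute("CREATE TABLE Orders "
            "(OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Product TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(10, 1, "Ball"), (11, 1, "Boots"), (12, 4, "Jersey")])

# LEFT JOIN: every customer appears, with NULL (None) when there is no order.
left_rows = cur.execute(
    "SELECT c.CustomerName, o.Product FROM Customers c "
    "LEFT JOIN Orders o ON c.CustomerID = o.CustomerID "
    "ORDER BY c.CustomerID, o.OrderID").fetchall()

# RIGHT JOIN emulated by swapping the tables: every order appears.
right_rows = cur.execute(
    "SELECT c.CustomerName, o.Product FROM Orders o "
    "LEFT JOIN Customers c ON c.CustomerID = o.CustomerID "
    "ORDER BY o.OrderID").fetchall()

# FULL OUTER JOIN emulated: left join rows plus orders without a customer.
full_rows = cur.execute(
    "SELECT c.CustomerName, o.Product FROM Customers c "
    "LEFT JOIN Orders o ON c.CustomerID = o.CustomerID "
    "UNION ALL "
    "SELECT c.CustomerName, o.Product FROM Orders o "
    "LEFT JOIN Customers c ON c.CustomerID = o.CustomerID "
    "WHERE c.CustomerID IS NULL").fetchall()

print(left_rows, right_rows, full_rows)
conn.close()
```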

PROJECT

Database:
The sample database is designed to capture and manage data related to a soccer tournament,
specifically based on the EURO CUP 2023. The structure of the database has been carefully
crafted to facilitate efficient storage and retrieval of information pertinent to a soccer
tournament. This design aims to address various queries and questions that may arise when
analysing or managing data related to the tournament.
Key components of the database include tables for countries, cities, venues, teams, players,
referees, coaches, and matches, each with specific fields to store relevant information. For
example, the soccer_team table records each team's group, matches played, results, goals,
and points. The player_mast table stores information about individual players, including
their name, jersey number, playing position, and club.

The match_mast table contains data about each game in the tournament, including the match
date, stage, result, venue, and referee. In addition, the soccer_venue table stores information
about the locations where the matches are held: venue name, city, and audience capacity.
This database design is structured to provide a comprehensive and organized representation
of the EURO CUP 2023, making it easier for users to formulate and execute queries related to
team performance, player statistics, match outcomes, and other aspects of the tournament.
The thoughtful design ensures that users can gain valuable insights and answer various
questions about the soccer tournament efficiently.

Tables Description:
soccer_country:

● country_id - a unique ID for each country
● country_abbr - the short name (abbreviation) of each country
● country_name - the name of each country

soccer_city:

● city_id - a unique ID for each city
● city - the name of the city
● country_id - the ID of the country where the city is located; only countries present in
the soccer_country table are allowed

soccer_venue:

● venue_id - a unique ID for each venue
● venue_name - the name of the venue
● city_id - the ID of the city where the venue is located; only cities present in the
soccer_city table are allowed
● aud_capacity - the audience capacity of the venue

soccer_team:

● team_id - the ID of each team. Each team represents a country, and this column
references the country_id column of the soccer_country table
● team_group - the name of the group to which the team belongs
● match_played - how many matches the team played in the group stage
● won - how many matches the team won
● draw - how many matches the team drew
● lost - how many matches the team lost
● goal_for - how many goals the team scored
● goal_agnst - how many goals the team conceded
● goal_diff - the difference between goals scored and goals conceded
● points - how many points the team earned in its group stage matches
● group_position - the position in which the team finished the group stage

playing_position:

● position_id - a unique ID for each playing position
● position_desc - the name of the playing position

player_mast:

● player_id - a unique ID for each player
● team_id - the team for which the player played; it references the country_id column
of the soccer_country table
● jersey_no - the number on the player's jersey
● player_name - the name of the player
● posi_to_play - the position in which the player played; it references the position_id
column of the playing_position table
● dt_of_bir - the player's date of birth
● age - the player's approximate age at the time of the tournament
● playing_club - the club for which the player was playing at the time of the
tournament

referee_mast:

● referee_id - a unique ID for each referee
● referee_name - the name of the referee
● country_id - the country to which the referee belongs; it references the country_id
column of the soccer_country table

match_mast:

● match_no - a unique ID for each match
● play_stage - the stage of the match: G for Group stage, R for Round of 16, Q for
Quarter final, S for Semi final, and F for Final
● play_date - the date on which the match was played
● results - the result of the match, either win or draw
● decided_by - how the result of the match was decided: N for normal play or P for
penalty shootout
● goal_score - the score of the match
● venue_id - the venue where the match was played; it references the venue_id column
of the soccer_venue table
● referee_id - the ID of the referee selected for the match; it references the referee_id
column of the referee_mast table
● audence - the number of spectators who attended the match
● plr_of_match - the player named player of the match, who belongs to a team's
23-man squad; it references the player_id column of the player_mast table
● stop1_sec - the stoppage time (in seconds) added to the 1st half of play
● stop2_sec - the stoppage time (in seconds) added to the 2nd half of play

coach_mast:

● coach_id - a unique ID for each coach
● coach_name - the name of the coach

asst_referee_mast:

● ass_ref_id - a unique ID for each assistant referee, who assists the main referee
● ass_ref_name - the name of the assistant referee
● country_id - the country to which the assistant referee belongs; it references the
country_id column of the soccer_country table

match_details:

● match_no - the match number; it references the match_no column of the match_mast
table
● play_stage - the stage of the match: G for Group stage, R for Round of 16, Q for
Quarter final, S for Semi final, and F for Final
● team_id - one of the teams playing the match; it references the country_id column of
the soccer_country table
● win_lose - whether the team won, lost, or drew, indicated by W, L, or D
● decided_by - how the team's result was decided: N for normal play or P for penalty
shootout
● goal_score - how many goals the team scored
● penalty_score - how many goals the team scored in the penalty shootout
● ass_ref - the assistant referee who assisted the referee; it references the ass_ref_id
column of the asst_referee_mast table
● player_gk - the player who kept goal for the team; it references the player_id column
of the player_mast table

goal_details:

● goal_id - a unique ID for each goal
● match_no - the match number; it references the match_no column of the match_mast
table
● player_id - the ID of a player selected for a team's 23-man tournament squad; it
references the player_id column of the player_mast table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● goal_time - the time at which the goal was scored
● goal_type - the type of goal: N for a normal goal, O for an own goal, or P for a
penalty
● play_stage - the stage in which the goal was scored: G for Group stage, R for Round
of 16, Q for Quarter final, S for Semi final, and F for Final
● goal_schedule - when the goal was scored: NT for normal time, ST for stoppage time,
or ET for extra time
● goal_half - the half of the match in which the goal was scored

penalty_shootout:

● kick_id - a unique ID for each penalty kick
● match_no - the match number; it references the match_no column of the match_mast
table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● player_id - the ID of a player selected for a team's 23-man tournament squad; it
references the player_id column of the player_mast table
● score_goal - a flag: Y if the kick was scored, N if not
● kick_no - the sequence number of the kick within an individual match

player_booked:

● match_no - the match number; it references the match_no column of the match_mast
table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● player_id - the ID of a player selected for a team's 23-man tournament squad; it
references the player_id column of the player_mast table
● booking_time - the time at which the player was booked
● sent_off - a flag set to Y when the player was sent off
● play_schedule - when the player was booked: NT for normal time, ST for stoppage
time, or ET for extra time
● play_half - the half in which the player was booked

player_in_out:

● match_no - the match number; it references the match_no column of the match_mast
table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● player_id - the ID of a player selected for a team's 23-man tournament squad; it
references the player_id column of the player_mast table
● in_out - a flag: I when the player came onto the field, O when the player left the
field
● time_in_out - the time at which the player came onto or left the field
● play_schedule - when the player came on or went off: NT for normal time, ST for
stoppage time, or ET for extra time
● play_half - the half in which the player came on or went off

match_captain:

● match_no - the match number; it references the match_no column of the match_mast
table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● player_captain - the player who captained the team; it references the player_id
column of the player_mast table

team_coaches:

● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● coach_id - the coach of the team (a team may have one or more coaches); it references
the coach_id column of the coach_mast table

penalty_gk:

● match_no - the match number; it references the match_no column of the match_mast
table
● team_id - the ID of a team playing in the tournament; it references the country_id
column of the soccer_country table
● player_gk - the player who kept goal during the penalty shootout; it references the
player_id column of the player_mast table

Queries for creating tables:

create database euro_cup_2030;

create table playing_position(position_id varchar(2) primary key, position_desc varchar(15));

create table soccer_city(city_id numeric primary key, city varchar(25), country_id numeric);

create table player_mast(player_id numeric primary key, team_id numeric, jersy_no numeric,
    player_name varchar(40), posi_to_play varchar(2), dt_of_br date,
    age numeric, playing_club varchar(40));

create table soccer_country(country_id numeric primary key, country_abbr varchar(4),
    country_name varchar(40));

create table soccer_venue(venue_id numeric primary key, venue_name varchar(30),
    city_id numeric foreign key references soccer_city(city_id), aud_capacity numeric);

create table referee_mast(referee_id numeric primary key, referee_name varchar(40),
    country_id numeric);

create table match_mast(
    match_no numeric primary key,
    player_stage character(1),
    play_date date,
    results character(5),
    decide_by character(1),
    goal_score character(5),
    venue_id numeric foreign key references soccer_venue(venue_id),
    referee_id numeric foreign key references referee_mast(referee_id),
    audience numeric,
    plr_of_match numeric,
    stop1_sec numeric,
    stop2_sec numeric
);

create table coach_mast(coach_id numeric primary key, coach_name varchar(40));

create table soccer_team(
    team_id numeric primary key,
    team_group character(1),
    match_played numeric,
    won numeric,
    draw numeric,
    lost numeric,
    goal_for numeric,
    goal_agnst numeric,
    goal_diff numeric,
    points numeric,
    group_position numeric
);

create table player_booked(
    match_no numeric,
    team_id numeric foreign key references soccer_team(team_id),
    player_id numeric foreign key references player_mast(player_id),
    booking_time varchar(40),
    sent_off character(1),
    play_schedule character(2),
    play_half numeric
);

create table player_in_out(
    match_no numeric,
    team_id numeric foreign key references soccer_team(team_id),
    player_id numeric foreign key references player_mast(player_id),
    in_out character(1),
    play_schedule character(2),
    play_half numeric
);

create table match_details(
    match_no numeric foreign key references match_mast(match_no),
    play_stage varchar(1),
    team_id numeric foreign key references soccer_team(team_id),
    win_lose varchar(1),   -- spelled win_lose to match the column used in the analysis queries
    decided_by varchar(1),
    goal_score numeric,
    penalty_score numeric,
    ass_ref numeric,
    player_gk numeric
);

create table team_coaches(
    team_id numeric foreign key references soccer_team(team_id),
    coach_id numeric foreign key references coach_mast(coach_id)
);

create table match_captain(
    match_no numeric foreign key references match_mast(match_no),
    team_id numeric foreign key references soccer_team(team_id),
    player_captain numeric
);

create table penalty_gk(
    match_no numeric foreign key references match_mast(match_no),
    team_id numeric foreign key references soccer_team(team_id),
    player_gk numeric
);

create table penalty_shootout(
    kick_id numeric primary key,
    match_no numeric foreign key references match_mast(match_no),
    team_id numeric foreign key references soccer_team(team_id),
    player_id numeric foreign key references player_mast(player_id),
    score_goal varchar(1),
    kick_no numeric
);

create table goal_details(
    goal_id numeric primary key,
    match_no numeric foreign key references match_mast(match_no),
    player_id numeric foreign key references player_mast(player_id),
    team_id numeric foreign key references soccer_team(team_id),
    goal_time numeric,
    goal_type character(1),
    play_stage character(1),
    goal_schedule character(2),
    goal_half numeric
);

create table asst_referee_mast(
    ass_ref_id numeric,
    ass_ref_name varchar(40),
    country_id numeric foreign key references soccer_country(country_id)
);

Analysis:
1) soccer_venue : Write a SQL query to return the total count of venues for the EURO cup 2030.

SELECT COUNT(venue_id) as no_of_venues
FROM soccer_venue;

2) goal_details : Write a SQL query to find the number of goals scored within normal play
during the EURO cup 2030.

SELECT COUNT(goal_id) as no_of_goals
FROM goal_details
WHERE goal_schedule = 'NT';

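3) match_mast : Write a SQL query to find the number of matches that ended with a result.

-- Assumes results stores 'WIN' for a decided match and 'DRAW' for a drawn one;
-- filtering on "is not null" alone would also count drawn matches.
SELECT COUNT(*) as no_of_matches
FROM match_mast
WHERE results = 'WIN';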

4) match_details : Write a SQL query to find the number of matches that ended in draws.

SELECT COUNT(*)/2 as draw
FROM match_details
WHERE win_lose = 'D';
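
Dividing by two relies on match_details holding one row per team per match, so that every drawn match contributes two rows marked 'D'. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module and invented sample rows, that compares the halved row count with a distinct count of match numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down match_details with only the columns the query needs (assumption).
conn.execute(
    "create table match_details(match_no integer, team_id integer, win_lose text)"
)

# Invented sample: match 1 has a winner, matches 2 and 3 are draws.
rows = [
    (1, 10, "W"), (1, 11, "L"),
    (2, 12, "D"), (2, 13, "D"),
    (3, 10, "D"), (3, 12, "D"),
]
conn.executemany("insert into match_details values (?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


draw_rows = scalar("select count(*) from match_details where win_lose = 'D'")
halved = scalar("select count(*)/2 from match_details where win_lose = 'D'")
distinct_matches = scalar(
    "select count(distinct match_no) from match_details where win_lose = 'D'"
)
print(draw_rows, halved, distinct_matches)
```

On this sample both approaches report two drawn matches. The distinct count is the more defensive choice if a match could ever be stored with only one of its two rows.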

5) match_mast : Write a SQL query to find out when the Football EURO cup 2030 will end.

SELECT max(play_date) as finale
FROM match_mast;

6) goal_details : Write a SQL query to find the number of self-goals (own goals) scored
during the EURO cup 2030.

SELECT COUNT(*) as Self_Goals
FROM goal_details
WHERE goal_type = 'O';

7) penalty_shootout : Write a SQL query to find the number of matches that resulted in a
penalty shootout.

SELECT COUNT(DISTINCT match_no) as no_of_matches
FROM penalty_shootout;
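
The DISTINCT matters here because penalty_shootout stores one row per kick, not one row per match. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module, a cut-down table, and invented match numbers and kicks, that shows the difference between counting rows and counting matches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down penalty_shootout table (assumption: simplified types, fewer columns).
conn.execute(
    "create table penalty_shootout("
    "kick_id integer primary key, match_no integer, team_id integer, "
    "score_goal text, kick_no integer)"
)

# Invented sample: four kicks in match 47 and two kicks in match 50.
kicks = [
    (1, 47, 1, "Y", 1), (2, 47, 2, "Y", 2), (3, 47, 1, "N", 3), (4, 47, 2, "Y", 4),
    (5, 50, 3, "Y", 1), (6, 50, 4, "N", 2),
]
conn.executemany("insert into penalty_shootout values (?, ?, ?, ?, ?)", kicks)

# Counting rows gives the number of kicks taken.
total_kicks = conn.execute("select count(*) from penalty_shootout").fetchone()[0]
# Counting distinct match numbers gives the number of shootouts.
shootout_matches = conn.execute(
    "select count(distinct match_no) from penalty_shootout"
).fetchone()[0]
print(total_kicks, shootout_matches)
```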

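8) goal_details : Write a SQL query to find the number of goals scored in every match in
extra time. Sort the result set by match number. Return the match number and the number of
goals in extra time.

-- goal_schedule = 'ET' marks extra-time goals; goal_time > 90 could also match
-- goals scored in second-half stoppage time.
SELECT match_no, COUNT(*) as no_of_goals
FROM goal_details
WHERE goal_schedule = 'ET'
GROUP BY match_no
ORDER BY match_no;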
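
The grouping can be checked on a small hand-made sample. The following sketch, assuming SQLite via Python's sqlite3 module and invented goals, also compares a filter on goal_time > 90 with a filter on goal_schedule = 'ET'. It assumes that second-half stoppage-time goals are stored with a goal_time above 90 and goal_schedule 'ST'; the report does not state how such goals are recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down goal_details with only the columns the query needs (assumption).
conn.execute(
    "create table goal_details("
    "goal_id integer primary key, match_no integer, "
    "goal_time integer, goal_schedule text)"
)

# Invented sample goals: NT = normal time, ST = stoppage time, ET = extra time.
goals = [
    (1, 1, 30, "NT"), (2, 1, 92, "ST"), (3, 1, 105, "ET"),
    (4, 2, 93, "ST"),
    (5, 3, 100, "ET"), (6, 3, 118, "ET"),
]
conn.executemany("insert into goal_details values (?, ?, ?, ?)", goals)

by_time = conn.execute(
    "select match_no, count(*) from goal_details "
    "where goal_time > 90 group by match_no order by match_no"
).fetchall()

by_schedule = conn.execute(
    "select match_no, count(*) from goal_details "
    "where goal_schedule = 'ET' group by match_no order by match_no"
).fetchall()

print(by_time)      # includes the stoppage-time goals of matches 1 and 2
print(by_schedule)  # only goals flagged as extra time
```

Under that assumption the time-based filter reports match 2, which never went to extra time, so the schedule flag is the safer criterion.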

9) match_mast : Write a SQL query to find the matches in which no stoppage time was
added during the first half of play. Return the match number, date of play, and goal score.

SELECT match_no, play_date, goal_score
FROM match_mast
WHERE stop1_sec = 0;

Conclusion:
In conclusion, my internship at Celebal Technologies gave me hands-on experience in data
engineering, focused on designing and querying a SQL soccer database for the EURO CUP. The
project showed how much a well-designed database matters for extracting meaningful
insights. It deepened my SQL skills and gave me practical exposure to sports data
management. Meeting the project objectives demonstrated real-world applications of data
engineering.
Beyond technical skills, the internship strengthened my communication and problem-solving
abilities. The mentorship I received at Celebal Technologies has been instrumental in my
professional growth. The knowledge I gained is a solid foundation for future work in data
engineering and analytics, and I am eager to apply these skills at the intersection of
technology and sports data analysis.
