Progress Report: Loading of Required Dataset
S.No. | Enroll. No. | Name of the student | Branch | Mobile No. | E-mail address
1     |             |                     |        |            |
2     | 02215002816 | Karan Sehgal        | E.C.E  | 8826608475 | [email protected]
3     | 04215002816 | Rishabh Rawat       | E.C.E  | 8750644313 | [email protected]
● Preprocessing of Data.
The roadblocks encountered during the project, and the workarounds
adopted for each, are explained below.
❖ Missing values in the dataset
➢ Some important pieces of information were found to be
missing while working on the dataset, including values in the
passengers’ age column.
➢ The first workaround was to take the average of all known
passenger ages and fill it in wherever the age was NULL. This
did not give good results and was not a reliable measure to
work with.
➢ After considering various alternatives, a better approach was
implemented: computing the average age separately for each
category of passenger, i.e., kids, women, men, young boys and
young girls (all salutations present in the data were
included), and then filling each NULL value with the average
for that passenger’s category.
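The two workarounds above can be sketched as follows. This is only an illustration under assumptions, not the project's actual code: the column names `Name` and `Age`, the `"Surname, Title. Given name"` name format, and the toy rows are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the passenger data (made-up rows; column names are assumptions).
df = pd.DataFrame({
    "Name": [
        "Doe, Mr. John",
        "Roe, Mr. Richard",
        "Poe, Miss. Anna",
        "Low, Miss. Clara",
        "Ray, Master. Tom",
        "Kay, Master. Sam",
        "Fox, Mr. Peter",
        "Day, Miss. Ruth",
    ],
    "Age": [22.0, 35.0, 26.0, 4.0, 2.0, np.nan, np.nan, np.nan],
})

# First workaround: fill every missing age with the overall mean.
overall_mean = df["Age"].mean()
age_global = df["Age"].fillna(overall_mean)

# Refined workaround: take the salutation (text between the comma and the
# first period) as the passenger category ...
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# ... and fill each missing age with the mean age of that category.
title_mean = df.groupby("Title")["Age"].transform("mean")
df["Age"] = df["Age"].fillna(title_mean)

# Fallback for categories that have no known age at all.
df["Age"] = df["Age"].fillna(overall_mean)

print(df[["Name", "Title", "Age"]])
```

With the overall mean, a missing child's age is filled with the same value as a missing adult's age; the per-category mean keeps those groups apart.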
❖ Incompatible chunks of data present
➢ Many rows in the dataset had more than 90% of their values
NULL or missing, and they were not contributing to the
accuracy of the prediction.
➢ Filling those missing/NULL values was the goal at first, but
after some evaluation it became clear that dropping those
rows was the better option.
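A minimal sketch of dropping mostly empty rows is given below, assuming the data is held in a pandas DataFrame. The 12-column toy table and the variable names are hypothetical; only the 90% threshold comes from the report.

```python
import numpy as np
import pandas as pd

# Toy table with 12 columns (made-up values, hypothetical column names).
cols = [f"col{i}" for i in range(12)]
data = np.arange(36, dtype=float).reshape(3, 12)
data[1, 1:] = np.nan   # row 1: 11 of 12 values missing (about 92%)
data[2, :6] = np.nan   # row 2: 6 of 12 values missing (50%)
df = pd.DataFrame(data, columns=cols)

# Fraction of missing values in each row.
missing_frac = df.isna().mean(axis=1)

# Keep only rows where at most 90% of the values are missing.
cleaned = df.loc[missing_frac <= 0.90].reset_index(drop=True)

print(missing_frac)
print(cleaned.shape)
```

Computing the missing fraction explicitly keeps the threshold visible in the code and makes it easy to inspect which rows are about to be removed.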
❖ Redundant information in the dataset
➢ There were many columns in the dataset that contributed
nothing to the end result.
➢ Dropping those columns was the best way to reduce the
computational overhead. The dropped columns were:
● Cabin,
● name,
● survived,
● passengerId, and
● ticket
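Dropping the listed columns could look like the sketch below, assuming a pandas DataFrame whose column names are spelled exactly as in the list above. The toy rows and the extra `sex` and `age` columns are made up. Setting `survived` aside as the label before dropping it is also an assumption, on the reading that it is the value being predicted.

```python
import pandas as pd

# Toy stand-in for the dataset (made-up rows; column spellings follow the list above).
df = pd.DataFrame({
    "passengerId": [1, 2],
    "survived": [0, 1],
    "name": ["Doe, Mr. John", "Poe, Miss. Anna"],
    "sex": ["male", "female"],
    "age": [30.0, 25.0],
    "ticket": ["T1", "T2"],
    "Cabin": [None, "C85"],
})

# Columns the report lists as dropped.
to_drop = ["Cabin", "name", "survived", "passengerId", "ticket"]

# Assumption: "survived" is the prediction target, so keep a copy as the label.
labels = df["survived"]

# Feature table without the dropped columns.
features = df.drop(columns=to_drop)

print(features.columns.tolist())
```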
Percentage of work completed till date: …………………………..
Evaluation Criteria
S.No. | Name of the student | Enroll. No. | Branch | Regularity (02) | Progress of work done (06) | Timely submission of progress report (02) | Total Marks (10)
1     |                     |             |        |                 |                            |                                           |
2     |                     |             |        |                 |                            |                                           |
3     |                     |             |        |                 |                            |                                           |