ML & DS Internship Report


Objective of the Organisation

TechCiti is a comprehensive information technology services and solutions platform that digitally transforms business operations, enhances customer engagement and improves operational efficiency for its customers all over the world. TechCiti offers an integrated portfolio of products, solutions and services. It serves more than 1500 customers, ranging from Fortune 500 companies to emerging start-ups. TechCiti Technologies has evolved into one of the leading Managed Service Providers (MSPs) in the APAC region. TechCiti derives its strength from a strong leadership team focused on fostering an entrepreneurial culture steeped in delivering exceptional value to customers.

The company's portfolio consists of two companies: "TechCiti Technologies Private Limited" and "TechCiti Software Consulting Private Limited". TechCiti Technologies Private Limited is the parent company, and TechCiti Software Consulting Private Limited is its deemed subsidiary.

Through a well-defined development, support and quality framework, TechCiti consults companies on their technology roadmap and implements, supports and maintains business-critical applications and the underlying infrastructure. The company brings in-depth expertise and robust experience in IT Infrastructure Management, Digital Experience Management, Digital Networking, Automation Solutions, Cloud Services, Performance Management, Cloud Security Solutions, Global Network Software Solutions and application development.

Our Vision
Our vision is to enable people and organizations to realize their potential by reinventing how they engage in defining the future using technology.

Our Mission

Our mission is to achieve a leading position as a distinguished, absolute end-to-end information technology infrastructure and service provider, and to grow profitably through superior customer service, innovation, quality and commitment.

Services

SOFTWARE DEVELOPMENT SERVICES

Organizations today need to anticipate their business needs and constantly evolve their software product development practices. TechCiti Technologies begins with a thorough understanding and analysis of requirements. We engage with organizations to bring differentiation in user experience, development, enhancements, support, and maintenance across the complete application lifecycle and software solutions. Today, software application maintenance is a daunting task for enterprises.

The software industry is on the cusp of tectonic changes in how and where data is stored and processed. For over 30 years, the venerable relational database management system (RDBMS), running in corporate data centers, has held the bulk of the world's data. This is not sustainable. RDBMS technology can no longer keep pace with the velocity, volume, and variety of data being created and consumed. For this new world of Big Data, NoSQL databases are required.

CLOUD SERVICES

Flexibility and scalability are the future of businesses. TechCiti provides cloud solutions with the ability to meet the future needs of your business by dynamically scaling your infrastructure as per your needs. Through the strategic implementation of pure cloud-based business software, TechCiti helps improve business performance.

Give power to your applications with newer architectures designed by us and supported by industry-leading platforms, allowing for heightened scalability. TechCiti has expertise across popular platforms such as Amazon, Azure, Salesforce, and more.

Our cloud consulting practice focuses on application readiness, defining the cloud strategy and roadmap, selection of cloud types and platforms, and a wide range of security aspects.

Cloud services provide many IT services traditionally hosted in-house, including provisioning an application/database server from the cloud, replacing in-house storage/backup with cloud storage, and accessing software and applications directly from a web browser without prior installation.

There are three basic types of cloud services:

Software as a service (SaaS)

Infrastructure as a service (IaaS)

Platform as a service (PaaS)

Cloud services provide great flexibility in provisioning, duplicating and scaling resources to balance the requirements of users, hosted applications and solutions. Cloud services are built, operated and managed by a cloud service provider, which works to ensure end-to-end availability, reliability and security of the cloud.


Objective of the Internship

This internship programme aims to afford participants the opportunity to gain targeted, quality work experience in the software development field. Through directed practical experience, in small work groups led by a teacher (staff or student), and supported by intensive taught units covering key elements of the design, establishment and maintenance of systems, the student is embedded in the workplace. Opportunity for individual areas of interest and for taking responsibility for operations is offered in the project grouping, and time in the programme is given for such.

The internship aims to allow students to fast-track their careers. Whether the intention is to design or consult, teach or project manage, the internship aims to deliver taught information and to make explicit and practise the skills needed to be better equipped for these goals.

Assessment of success in the internship rests with the students themselves. We aim to make the learning opportunity a conscious one, to present PDC extension studies in a cohesive progression, to offer activities using a variety of learning styles, and to revisit and reposition skills to solidify learning. Opportunities for assessing progress lie in the learning portfolio compiled by the student (or with a facilitator as note-taker) throughout the programme, which tracks aims, competence and confidence. Designs are made up throughout the course, focusing on different circumstances with different aims for learning from the exercise, with guided peer review to broaden the learning opportunity in reflection.

The internship should help the student understand design and move towards teaching or consulting professionally. Taught elements are supported by practical involvement in the day-to-day running of the organisation.
A generalist approach is considered mandatory; opportunities for specialism and for the application of prior learning and specialist skills exist in the project stream, evidenced in the portfolio and in the final presentations. The design stream aims to develop design skills and awareness through a series of constrained design briefs, culminating in a design of the intern's choice, allowing the intern to focus on a site they have in mind, or one chosen to represent their interests from a locally defined context.
Concurrent learning streams:

• Taught elements – course extension learning on key elements of software design.

• Practical elements – workshops for skills and methods used in areas of web technologies covered by the taught elements.

• Taught specialist courses – certificated independent courses covering key professional areas of expertise.

• Interest elements – project work drawing on workshop skills, applied to areas of need or development in the software.

• Reflective learning – self-guided, with the assistance of facilitators, to track and exhibit progress through the portfolio.

Scope of the Internship

This system is designed to provide multiple techniques and optimized working simultaneously from a single interface, so that organizations have a cost-effective way of working and analysis. All of its fundamentals are properly organized, and customization provisions are provided for better work orientation. In the proposed system, all the related perceptions are provided in an inbuilt format so that the considerations can be accessed faster.
Some of the important points of the proposed system are as follows:

• Collaborative working is supported, so multiple analytical teams can associate in one place for provisional research.

• All requirements for information integration and extraction are covered, and different formats are provided which can be selected and incorporated directly.

• The associations required for convertibility are provided in a selective mode, so various options are available to users.

• The information that is extracted is modelled with the help of an inbuilt methodology, and direct formulation support is also provided.

• All related working and acknowledgements are organized in a central place, which helps maintain the security of the data; if multiple research orientations are undertaken, all the related research data is automatically backed up and saved.

• All the related representation methods are provided so that, after representation, the design can be properly published, including audio.

• Multiple charting types are provided, and a related template system is included, so various examples are provided which can be directly incorporated and used.

• Media publishing is provided, which can be used for direct transfer of the related information if required.

The problems that have been encountered have been properly understood, and a detailed system has been designed in such a way that it incorporates different types of resource references and provides different considerations for workability, supporting multiple types of organizational strategic design formations. Users can utilize one system for different levels of working, which helps establish collaborative and consolidated working.

Course Syllabus

HTML

• Introduction to HTML5

• Text Formatting Tags

• Image Tag

• Heading Tag

• Listing Tag (Ordered List and Unordered List)

• Marquee Tag

• Table Tag

• Anchor Tag

• Website Using Table Tag

CSS

• Introduction to CSS

• CSS Syntax

• CSS Colour, Background and Border


• CSS Margin, padding and outline

• CSS Links

• CSS Lists and Tables

• CSS Float, Opacity and Image Gallery

• CSS Form Tags

• CSS Gradients, Shadows, Text Effects, Rounded Corners

• CSS 2D and 3D Transforms

• Website Using CSS

• Introduction to Bootstrap

• Website Using Bootstrap Layout

JQuery

• Selectors

• Basic Selectors

• Attribute Selectors

• Form Selectors

• Hierarchy

• Filtering Elements
• Basic Filters

• Child Filters

• Content Filters

• Visibility

• Effects

• Basic

• Fading

• Slide Effects

• Handling Events

• Basic DOM

• CSS Style

Bootstrap 3

• Introduction to Bootstrap

• Bootstrap Grid System

• Bootstrap Grid System - Advanced.


• Creating Layouts with Bootstrap

• Bootstrap CSS - Understanding the CSS

• CSS Customization / Skins

• Responsive Web design with Bootstrap

• Single Page Responsive site with Bootstrap

• Introduction to LESS and SASS

• Customizing Bootstrap3 with LESS

• Bootstrap Plug-ins

• Bootstrap Layout Components

PYTHON

• Introduction to Python

• Python Basics

• Datatypes

• List

• Tuples

• Strings
• Dictionary

• Sets

• Conditional Statement

• Looping Statement

• Functions

• Python OOPS Concepts

• Exception Handling

• File Handling

• GUI concepts-Tkinter

DJANGO

• Introduction to Django

• Django Http request_GET and POST

• Dynamic Passing Data

• Jinja2 Template Exception

• Django Template Languages

• Django Static Concepts

• Django Models and Migrations


• Django admin configurations

• Django database with user methods app

• Django authentication concepts

DATA SCIENCE AND MACHINE LEARNING:

• Data Science Introduction

• Statistical Concepts

• Library Packages

• Numpy

• Pandas

• Dictionary

• Sets

• Machine Learning Introduction

• Types Of Machine Learning

• Linear Regression

• Logistic Regression

• Naïve Bayes Classifier


• Decision Tree

• Random Forest

• Support Vector Machine

• K-Nearest Neighbor

First Week Internship Course Details

HTML AND CSS:

HTML stands for Hypertext Markup Language. It allows the user to create and structure sections, paragraphs, headings, links, and blockquotes for web pages and applications.

HTML is not a programming language, meaning it doesn't have the ability to create dynamic functionality. Instead, it makes it possible to organize and format documents, similarly to Microsoft Word.

When working with HTML, we use simple code structures (tags and attributes) to mark up a website page. For example, we can create a paragraph by placing the enclosed text within a starting <p> and closing </p> tag.

Overall, HTML is a markup language that is really straightforward and easy to learn, even for complete beginners in website building.
HTML was invented by Tim Berners-Lee, a physicist at the CERN research institute in Switzerland, who came up with the idea of an Internet-based hypertext system. Hypertext means text that contains references (links) to other texts that viewers can access immediately. He published the first version of HTML in 1991, consisting of 18 HTML tags. Since then, each new version of the HTML language has come with new tags and attributes (tag modifiers).

According to the Mozilla Developer Network's HTML Element Reference, there are currently around 140 HTML tags, although some of them are already obsolete (not supported by modern browsers).

Due to its quick rise in popularity, HTML is now considered an official web standard. The HTML specifications are maintained and developed by the World Wide Web Consortium (W3C). You can check the latest state of the language anytime on the W3C's website.

The biggest upgrade of the language was the introduction of HTML5 in 2014. It added several new semantic tags to the markup that reveal the meaning of their own content, such as <article>, <header>, and <footer>.

HTML documents are files that end with a .html or .htm extension. You can view them using any web browser (such as Google Chrome, Safari, or Mozilla Firefox). The browser reads the HTML file and renders its content so that internet users can view it.

Usually, the average website includes several different HTML pages. For instance, home pages, about pages, and contact pages would all have separate HTML documents.
Each HTML page consists of a set of tags (also called elements), which you can refer to as the building blocks of web pages. They create a hierarchy that structures the content into sections, paragraphs, headings, and other content blocks.

Most HTML elements have an opening and a closing tag that use the <tag></tag> syntax.

Below, you can see a code example of how HTML elements can be structured:

<div>
  <h1>The Main Heading</h1>
  <h2>A catchy subheading</h2>
  <p>Paragraph one</p>
  <img src="/" alt="Image">
  <p>Paragraph two with a <a href="https://example.com">hyperlink</a></p>
</div>

 The outermost element is a simple division (<div></div>) you can use to mark up bigger content sections.

 It contains a heading (<h1></h1>), a subheading (<h2></h2>), two paragraphs (<p></p>), and an image (<img>).

 The second paragraph includes a link (<a></a>) with a href attribute that contains the destination URL.

 The image tag also has two attributes: src for the image path and alt for the image description.
CSS:

CSS stands for Cascading Style Sheets, with an emphasis placed on "Style." While HTML is used to structure a web document (defining things like headlines and paragraphs, and allowing you to embed images, video, and other media), CSS comes through and specifies your document's style: page layouts, colors, and fonts are all determined with CSS. Think of HTML as the foundation (every house has one), and CSS as the aesthetic choices (there's a big difference between a Victorian mansion and a mid-century modern home).

As we have mentioned before, CSS is a language for specifying how documents are presented to users: how they are styled, laid out, and so on.

A document is usually a text file structured using a markup language. HTML is the most common markup language, but you may also come across other markup languages such as SVG or XML.

Presenting a document to a user means converting it into a form usable by your audience. Browsers, like Firefox, Chrome, or Edge, are designed to present documents visually, for example, on a computer screen, projector or printer.


CSS is a rule-based language: you define rules specifying groups of styles that should be applied to particular elements or groups of elements on your web page. For example: "I want the main heading on my page to be shown as large red text."

The following code shows a very simple CSS rule that would achieve the styling described above:

h1 {
  color: red;
  font-size: 5em;
}

The rule opens with a selector. This selects the HTML element that we are going to style. In this case we are styling level one headings (<h1>).

We then have a set of curly braces { }. Inside those will be one or more declarations, which take the form of property and value pairs. Each pair specifies a property of the element(s) we are selecting, then a value that we'd like to give the property.

Before the colon, we have the property, and after the colon, the value. CSS properties have different allowable values, depending on which property is being specified. In our example, we have the color property, which can take various color values. We also have the font-size property, which can take various size units as a value.


As there are so many things that you could style using CSS, the language is broken down into modules. You'll see references to these modules as you explore MDN, and many of the documentation pages are organized around a particular module. For example, you could take a look at the MDN reference to the Backgrounds and Borders module to find out what its purpose is and what different properties and other features it contains. You will also find links to the CSS specification that defines the technology.

At this stage you don't need to worry too much about how CSS is structured; however, it can make it easier to find information if, for example, you are aware that a certain property is likely to be found among other similar things and is therefore probably in the same specification.

For a specific example, let's go back to the Backgrounds and Borders module: you might think that it makes logical sense for the background-color and border-color properties to be defined in this module. And you'd be right.

Second Week Internship Course Details

Bootstrap 4

Bootstrap is an open-source framework used to develop responsive web applications and responsive designs. Responsive means the application should also run well on smaller screens like mobile phones and tablets. Every element of the HTML document gets stacked when the page becomes smaller or is minimized. By default, Bootstrap uses a 12-column grid with equal separation of the columns, meaning every column has the same size, but you can alter the default values and make layouts and designs according to your requirements using the grid classes.

Bootstrap provides a grid system for all kinds of devices (extra small, small, medium, large and extra large), which helps the app run on every device. It also provides styled buttons, forms, tables and so on. Bootstrap 4 is the newest version, with some additional features compared to previous versions. In this project, Bootstrap 4 is used for front-end development along with the Django framework.

jQuery

jQuery is a JavaScript library that greatly simplifies JavaScript programming and is easy to learn. The purpose of jQuery is to make it much easier to use JavaScript on your website. jQuery takes a lot of common tasks that require many lines of JavaScript code to accomplish and wraps them into methods that you can call with a single line of code.

Installation

npm install jquery

Third Week Internship Course Details

PYTHON

Python is a high-level, general-purpose and very popular programming language. Python (the latest version is Python 3) is used in web development and machine learning applications, along with other cutting-edge technology in the software industry. Python is well suited to beginners and also to experienced programmers coming from other programming languages such as C++ and Java.

This course covers the Python programming language from basics to advanced topics (such as web scraping, Django and deep learning) with examples.


Below are some facts about the Python programming language:

1. Python is currently one of the most widely used multi-purpose, high-level programming languages.

2. Python allows programming in object-oriented and procedural paradigms.

3. Python programs are generally smaller than equivalent programs in other languages such as Java. Programmers have to type relatively little, and the language's indentation requirement keeps programs readable.

4. Python is used by almost all the tech giants, such as Google, Amazon, Facebook, Instagram, Dropbox, Uber, etc.

5. The biggest strength of Python is its huge collection of standard libraries, which can be used for the following:

 Machine Learning

 GUI Applications (like Kivy, Tkinter, PyQt etc. )

 Web frameworks like Django (used by YouTube, Instagram, Dropbox)

 Image processing (like OpenCV, Pillow)

 Web scraping (like Scrapy, BeautifulSoup, Selenium)

 Test frameworks

 Multimedia

 Scientific computing

 Text processing and many more

DATATYPES
STRINGS

A string is a sequence of characters. It can be declared in Python using single or double quotes. Strings are immutable, i.e., they cannot be changed.

Lists

Lists are one of the most powerful tools in Python. They are just like the arrays declared in other languages, but a list need not always be homogeneous: a single list can contain strings, integers, as well as objects. Lists can also be used for implementing stacks and queues. Lists are mutable, i.e., they can be altered once declared.

Tuples

A tuple is a sequence of immutable Python objects. Tuples are just like lists, with the exception that tuples cannot be changed once declared. Tuples are usually faster than lists.

Iterations

Iteration, or looping, can be performed in Python with 'for' and 'while' loops. Apart from iterating on a particular condition, we can also iterate over strings, lists, and tuples.

Set

A set is an unordered collection that is iterable, mutable and has no duplicate elements. The order of elements in a set is undefined. The major advantage of using a set, as opposed to a list, is that it has a highly optimized method for checking whether a specific element is contained in the set.

A dictionary in Python is an unordered collection of data values, used to store data like a map. Unlike other data types that hold only a single value per element, a dictionary holds key:value pairs; the key is used to look up the value, which makes access highly optimized.
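As a minimal sketch of these built-in types (the variable names and values below are illustrative, not taken from the course material):

text = "hello world"                        # str: immutable sequence of characters
items = [1, "two", 3.0]                     # list: mutable, need not be homogeneous
point = (10, 20)                            # tuple: immutable once declared
unique = {1, 2, 2, 3}                       # set: unordered, duplicates removed -> {1, 2, 3}
employee = {"name": "Asha", "id": 101}      # dict: key:value pairs

items.append(4)                             # lists can be altered after creation
for ch in text[:5]:                         # strings, lists and tuples are iterable
    print(ch)
print(employee["name"], unique)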
PYTHON OOPS CONCEPTS

Object-oriented programming is a way of computer programming using the idea of "objects" to represent data and methods. It is also an approach for creating neat and reusable code instead of redundant code. The program is divided into self-contained objects, or several mini-programs. Every individual object represents a different part of the application, having its own logic and data to communicate with the others.

What are Classes and Objects?

A class is a collection of objects; you can say it is a blueprint of objects defining their common attributes and behavior. How do you do that? Well, a class logically groups the data in such a way that code reusability becomes easy. To give a real-life example, think of an office-going 'employee' as a class, with all the attributes related to it, such as 'emp_name', 'emp_age', 'emp_salary' and 'emp_id', as its data.

Let us see, from the coding perspective, how to define a class and instantiate an object. A class is defined with the "class" keyword.

Example:

class Class1:   # Class1 is the name of the class

Note: Python is case-sensitive, and the class keyword is written in lowercase.

Objects:

Objects are instances of a class. An object is an entity that has state and behavior. In a nutshell, it is an instance of a class that can access the class's data.

Syntax: obj = Class1()

Here obj is the "object" of Class1.
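Putting the two ideas together, a minimal sketch (the Employee class and its attribute names are illustrative, echoing the example above):

class Employee:                                    # class: blueprint for employee objects
    def __init__(self, emp_name, emp_age, emp_id):
        self.emp_name = emp_name                   # attributes: the object's own data
        self.emp_age = emp_age
        self.emp_id = emp_id

    def details(self):                             # method: behavior attached to the object
        return f"{self.emp_id}: {self.emp_name}, age {self.emp_age}"

obj = Employee("Asha", 30, 101)                    # obj is an instance (object) of Employee
print(obj.details())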

DJANGO

Django is a web application framework written in the Python programming language. It is based on the MVT (Model View Template) design pattern. Django is in high demand because of its rapid development features: it takes less time to build an application after collecting the client requirements.

This framework uses a famous tagline: "The web framework for perfectionists with deadlines."

By using Django, we can build web applications in very little time. Django is designed in such a manner that it handles much of the configuration automatically, so we can focus on application development only.

Features of Django

o Rapid Development

o Secure

o Scalable

o Fully loaded

o Versatile

o Open Source

o Vast and Supported Community

Rapid Development

Django was designed with the intention of making a framework that takes less time to build a web application. The project implementation phase normally takes a lot of time, but Django makes it rapid.

Secure

Django takes security seriously and helps developers avoid many common security mistakes, such as SQL injection, cross-site scripting and cross-site request forgery. Its user authentication system provides a secure way to manage user accounts and passwords.

Scalable

Django is scalable in nature and has the ability to quickly and flexibly switch from a small- to a large-scale application project.

Django Installation

To install Django, first visit the official Django site (https://www.djangoproject.com) and download Django from the download section, where various options are listed.

Django requires pip to start installation. Pip is a package manager which is used to install and manage packages written in Python. For Python 3.4 and higher versions, pip3 is used to manage packages, so Django can also be installed with a command such as pip3 install django.

Django Project

We have installed Django successfully. Now, we will learn the step-by-step process to create a Django application.

To create a Django project, we can use the following command, where projectname is the name of the Django application:

$ django-admin startproject projectname

Running the Django Project

A Django project has a built-in development server which is used to run the application instantly without any external web server. This means we don't need Apache or another web server to run the application in development mode.

To run the application, we can use the following command:

$ python3 manage.py runserver


The server starts and can be accessed at localhost on port 8000. Let's access it using a browser; the default welcome page is shown.

The application is running successfully. Now we can customize it according to our requirements and develop a customized web application.

Django Admin Interface

Django provides a built-in admin module which can be used to perform CRUD operations on the models. It reads metadata from the model to provide a quick interface where the user can manage the content of the application. This built-in module is designed to handle admin-related tasks for the user.

Let's see how to activate and use Django's admin module (interface).

The admin app (django.contrib.admin) is enabled by default and is already added to the INSTALLED_APPS section of the settings file.

To access it, browse to '/admin/' on the local machine, for example localhost:8000/admin/, which shows the admin login page.

Django MVT

The MVT (Model View Template) is a software design pattern. It is a collection of three important components: Model, View and Template. The Model handles the database; it is a data access layer which handles the data.

The Template is a presentation layer which handles the user interface part completely. The View is used to execute the business logic, interact with a model to carry data, and render a template.
Although Django follows the MVC pattern, it maintains its own conventions, so control is handled by the framework itself. There is no separate controller, and the complete application is based on Model, View and Template. That's why it is called an MVT application.

The MVT-based control flow works as follows: a user requests a resource from Django, Django works as a controller and checks for the resource in the URL configuration. If a URL maps, a view is called that interacts with the model and the template and renders a template. Django then responds back to the user and sends the template as a response.

Django Model

In Django, a model is a class which contains the essential fields and methods. Each model class maps to a single table in the database.

A Django model is a subclass of django.db.models.Model, and each field of the model class represents a database field (column).
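A minimal sketch of such a model (the Employee model and its fields are illustrative, not part of the course project) could look like this in an app's models.py:

# models.py (illustrative)
from django.db import models

class Employee(models.Model):                    # maps to one database table
    name = models.CharField(max_length=100)     # each field becomes a column
    age = models.IntegerField()
    joined_on = models.DateField(auto_now_add=True)

    def __str__(self):
        return self.name

Running python3 manage.py makemigrations followed by python3 manage.py migrate then creates the corresponding table.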

Django Templates

Django provides a convenient way to generate dynamic HTML pages by using its template system. A template consists of the static parts of the desired HTML output as well as some special syntax describing how dynamic content will be inserted.

To configure the template system, we have to provide some entries in the settings.py file:

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
Django Template Simple Example

First, create a directory named templates inside the project app. After that, create a template index.html inside the created folder.


Django Views

A view is the place where we put the business logic of the application. A view is a Python function which performs some business logic and returns a response to the user. This response can be the HTML contents of a web page, a redirect, or a 404 error.

All view functions are created inside the views.py file of the Django app.

Django View Simple Example

# views.py
import datetime
from django.http import HttpResponse

# Create your views here.
def index(request):
    now = datetime.datetime.now()
    html = "<html><body><h3>Now time is %s.</h3></body></html>" % now
    return HttpResponse(html)  # returning the HTML in an HttpResponse

Let's step through the code. First, we import the datetime module, which provides a method to get the current date and time, and the HttpResponse class. Next, we define a view function index that takes an HTTP request and responds back.

The view is called when it gets mapped to a URL in urls.py, for example:

path('index/', views.index),

Output: the current date and time are shown in the browser.
Django URL Mapping

Well, till here we have learned to create a model, view, and template. Now we will learn about the routing of the application.

Since Django is a web application framework, it gets user requests by URL and responds back. To handle URLs, the django.urls module is used by the framework.

Let's open the project's urls.py file and see what it looks like:

# urls.py
from django.contrib import admin
from django.urls import path

urlpatterns = [
    path('admin/', admin.site.urls),
]

As you can see, Django has already registered a URL here for the admin. The path function takes as its first argument a route of string or regex type. The view argument is a view function which is used to return a response (template) to the user.

The django.urls module contains various functions; path(route, view, kwargs, name) is one of those, and it is used to map a URL to the specified view.
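Putting the earlier pieces together, a hedged sketch of mapping the index view from views.py adds one more entry to urlpatterns (assuming the view lives in an app named myapp, which is an illustrative name):

# urls.py (illustrative)
from django.contrib import admin
from django.urls import path
from myapp import views              # 'myapp' is an assumed app name

urlpatterns = [
    path('admin/', admin.site.urls),
    path('index/', views.index),     # maps /index/ to the index view shown earlier
]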

Fourth Week Internship Course Details


Data Science is a blend of various tools, algorithms, and machine learning principles. Most simply, it involves obtaining meaningful information or insights from structured or unstructured data through a process of analysis, programming and business skills. It is a field combining many elements such as mathematics, statistics and computer science. Those who are good at these fields, with enough knowledge of the domain in which they are willing to work, can call themselves data scientists. It is not an easy thing to do, but it is not impossible either. You need to start from the data: its visualization, programming, formulation, development, and the deployment of your model. In the future there will be great demand for data scientist jobs; keeping that in mind, be ready to prepare yourself to fit into this world.

Data science is not a one-step process that you can learn in a short time and then call yourself a data scientist. It passes through many stages, and every element is important. One should always follow the proper steps to climb the ladder. Every step has its value and counts towards your model. Buckle up and get ready to learn about those steps.

 Problem Statement: No work starts without motivation, and data science is no exception. It is really important to declare or formulate your problem statement very clearly and precisely. Your whole model and its working depend on your statement. Many scientists consider this the main and most important step of data science. So make sure of what your problem statement is and how well it can add value to the business or any other organization.

 Data Collection: After defining the problem statement, the next obvious step is to go in search of the data that you might require for your model. You must do good research and find all that you need. Data can be in any form, i.e. unstructured or structured, and it might come in various forms such as videos, spreadsheets and coded forms. You must collect data from all these kinds of sources.

 Data Cleaning: As you have formulated your motive and collected your data, the next step is cleaning. Yes, it is! Data cleaning is data scientists' favorite thing to do. Data cleaning is all about the removal of missing, redundant, unnecessary and duplicate data from your collection. There are various tools to do so with the help of programming in either R or Python; it is totally up to you to choose one of them, and various scientists have their opinions on which to choose. When it comes to the statistical part, R is often preferred over Python, as it has the privilege of more than 12,000 packages, while Python is used because it is fast and easily accessible, and we can perform the same things as in R with the help of various packages.

 Data Analysis and Exploration: This is one of the prime things to do in data science, and the time to get your inner Holmes out. It is about analyzing the structure of the data, finding hidden patterns in it, studying behaviors, visualizing the effects of one variable over others and then concluding. We can explore the data with the help of various graphs created with libraries in any programming language. In R, ggplot is one of the most famous packages, while matplotlib plays that role in Python.

 Data Modelling: Once you are done with the study you have formed from data visualization, you must start building a hypothesis model that may yield a good prediction in the future. Here you must choose a good algorithm that best fits your data. There are different kinds of algorithms, from regression to classification, SVM (Support Vector Machines), clustering, etc. Your model can be a machine learning algorithm: you train it with the training data and then test it with test data. A common approach is to split the whole dataset into two parts, training data and test data, and K-fold cross-validation extends this by rotating the held-out part. On this basis, you train your model (a minimal sketch appears after this list).

 Optimization and Deployment: You followed each and every step and hence built a model that you feel is the best fit. But how can you decide how well your model is performing? This is where optimization comes in. You test your data and find how well the model is performing by checking its accuracy. In short, you check the efficiency of the model and try to optimize it for more accurate predictions. Deployment deals with launching your model and letting people outside benefit from it. You can also obtain feedback from organizations and people to learn about their needs and then work further on your model.
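As a minimal sketch of the train/test workflow mentioned in the Data Modelling step (the height/weight values and labels below are invented), scikit-learn provides both a simple hold-out split and K-fold cross-validation:

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy data: [height, weight] with invented class labels.
X = [[160, 55], [170, 65], [180, 80], [155, 50],
     [175, 75], [165, 60], [185, 85], [150, 45]]
y = [0, 1, 1, 0, 1, 0, 1, 0]

# Hold-out evaluation: train on one part of the data, test on the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# K-fold cross-validation rotates the held-out fold over the whole dataset.
print("4-fold scores:", cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=4))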

NUMPY

NumPy is an open-source Python library used for scientific computing; it provides a host of features that allow a Python programmer to work with high-performance arrays and matrices. In addition, pandas is a package for data manipulation that brings the DataFrame objects familiar from R (as well as ideas from different R packages) into a Python environment.

PANDAS

Pandas is an open-source library exclusively designed for data analysis and data manipulation. It is built on top of Python's NumPy package, meaning that Pandas relies on NumPy to function. Essentially, Pandas includes data structures and operations for manipulating time series and numerical tables.
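A minimal sketch of the two libraries (the array values and city names below are illustrative only):

import numpy as np
import pandas as pd

a = np.array([[1, 2, 3], [4, 5, 6]])       # high-performance 2-D array
print(a.shape, a.mean(), a @ a.T)           # shape, mean, matrix product

df = pd.DataFrame({"city": ["Bengaluru", "Pune", "Delhi"],
                   "temp": [27, 31, 39]})   # tabular data, similar to an R data frame
print(df[df["temp"] > 30])                  # filter rows
print(df["temp"].describe())                # quick summary statistics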

MACHINE LEARNING

INTRODUCTION:
Machine Learning is a system that can learn from examples through self-improvement, without being explicitly coded by a programmer. The breakthrough comes with the idea that a machine can learn from data (i.e., examples) on its own to produce accurate results.

A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies use unsupervised learning to improve the user experience with personalized recommendations. Machine learning is also used for a variety of tasks such as fraud detection, predictive maintenance, portfolio optimization and task automation.

CLASSIFICATION OF MACHINE LEARNING:

 Supervised Learning

 Unsupervised Learning

 Reinforcement Learning

 Semi Supervised Learning

SUPERVISED LEARNING:

Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher. Basically, supervised learning is learning in which we teach or train the machine using data which is well labelled, meaning some data is already tagged with the correct answer. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labelled data.


(or)

 Supervised learning is like learning with a teacher

 training dataset is like a teacher

 the training dataset is used to train the machine

Example:

Based on some prior knowledge (when it's sunny, the temperature is higher; when it's cloudy, the humidity is higher, etc.), weather apps predict the parameters for a given time.

CLASSIFICATION OF SUPERVISED LEARNING

Classification:

Machine is trained to classify something into some class.

 classifying whether a patient has disease or not

 classifying whether an email is spam or not

Regression:

Machine is trained to predict some value like price, weight or height.

 predicting house/property price


 predicting stock market price

LIST OF COMMON ALGORITHMS:

 Nearest Neighbor

 Naive Bayes

 Decision Trees

 Linear Regression

 Support Vector Machines (SVM)

 Logistic Regression

UNSUPERVISED LEARNING:

Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data.

Unlike supervised learning, no teacher is provided, which means no training is given to the machine. The machine is therefore left to find the hidden structure in the unlabelled data by itself.

(or)

 it is like learning without a teacher

 the machine learns through observation and finds structures in the data

EXAMPLE:

A friend invites you to his party where you meet total strangers. You will classify them using unsupervised learning (no prior knowledge), and this classification can be on the basis of gender, age group, dressing, educational qualification or whatever way you like. Why is this learning different from supervised learning? Because you didn't use any past/prior knowledge about the people; you classified them "on the go".

CLASSIFICATION OF UNSUPERVISED LEARNING:

Clustering:

A clustering problem is where you want to discover the inherent groupings in the data

 such as grouping customers by purchasing behavior

Association:

An association rule learning problem is where you want to discover rules that describe

large portions of your data

 such as people that buy X also tend to buy Y

LIST OF COMMON ALGORITHMS:

 k-means clustering, Association Rules

REINFORCEMENT LEARNING:

Reinforcement learning is all about making decisions sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous one. In reinforcement learning, decisions are dependent, so we give labels to sequences of dependent decisions.

EXAMPLE:

Chess Game

LIST OF COMMON ALGORITHMS:

 Q-Learning

 Temporal Difference (TD)

 Deep Adversarial Networks

SEMI SUPERVISED LEARNING:

In this type of learning, the algorithm is trained upon a combination of labeled and

unlabeled data. Typically, this combination will contain a very small amount of labeled data

and a very large amount of unlabeled data.

EXAMPLE:

Speech Analysis: since labelling audio files is a very labour-intensive task, semi-supervised learning is a natural approach to this problem.

LINEAR REGRESSION:
Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task: regression models a target prediction value based on independent variables. It is mostly used for finding the relationship between variables and for forecasting.

Hypothesis function for Linear Regression: y = θ1 + θ2·x

While training the model we are given:

x: input training data (univariate, i.e. one input variable/parameter)

y: labels for the data (supervised learning)

When training the model, it fits the best line to predict the value of y for a given value of x. The model gets the best-fitting regression line by finding the best θ1 and θ2 values, where

θ1: intercept

θ2: coefficient of x

Once we find the best θ1 and θ2 values, we get the best-fit line. So when we finally use our model for prediction, it will predict the value of y for an input value of x.
ESTIMATION OF HOME PRICE:

LINEAR REGRESSION USING SKLEARN
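The original report showed this example as a screenshot. As a hedged reconstruction (the square-footage and price figures below are invented placeholders), a scikit-learn version of the home-price estimate could look like this:

import numpy as np
from sklearn.linear_model import LinearRegression

# x: area in square feet, y: price in lakhs (all values are illustrative)
area = np.array([[1000], [1500], [2000], [2500], [3000]])
price = np.array([40.0, 55.0, 72.0, 88.0, 105.0])

model = LinearRegression().fit(area, price)   # finds the best theta1 (intercept) and theta2 (slope)
print("intercept (theta1):", model.intercept_)
print("coefficient (theta2):", model.coef_[0])
print("estimated price for 2200 sq. ft:", model.predict([[2200]])[0])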


LOGISTIC REGRESSION

Logistic regression is basically a supervised classification algorithm. In a classification problem, the target variable (or output) y can take only discrete values for a given set of features (or inputs) X.

Contrary to popular belief, logistic regression IS a regression model. The model builds a regression model to predict the probability that a given data entry belongs to the category numbered "1". Just as linear regression assumes that the data follow a linear function, logistic regression models the data using the sigmoid function.


Logistic regression becomes a classification technique only when a decision threshold is brought into the picture. Setting the threshold value is a very important aspect of logistic regression and depends on the classification problem itself.

The choice of the threshold value is majorly affected by the values of precision and recall. Ideally, we want both precision and recall to be 1, but this is seldom the case. In the case of a precision-recall tradeoff, we use the following arguments to decide upon the threshold:

1. Low Precision/High Recall: In applications where we want to reduce the number of false negatives without necessarily reducing the number of false positives, we choose a decision value which has a low value of precision or a high value of recall. For example, in a cancer diagnosis application, we do not want any affected patient to be classified as not affected, without giving much heed to whether the patient is wrongfully diagnosed with cancer. This is because the absence of cancer can be detected by further medical examination, but the presence of the disease cannot be detected in an already rejected candidate.

2. High Precision/Low Recall: In applications where we want to reduce the number of false positives without necessarily reducing the number of false negatives, we choose a decision value which has a high value of precision or a low value of recall. For example, if we are classifying customers according to whether they will react positively or negatively to a personalised advertisement, we want to be absolutely sure that the customer will react positively to the advertisement, because otherwise a negative reaction can cause a loss of potential sales from the customer.

Based on the number of categories, logistic regression can be classified as:

1. binomial: the target variable can have only 2 possible types, "0" or "1", which may represent "win" vs "loss", "pass" vs "fail", "dead" vs "alive", etc.

2. multinomial: the target variable can have 3 or more possible types which are not ordered (i.e. the types have no quantitative significance), like "disease A" vs "disease B" vs "disease C".

3. ordinal: it deals with target variables with ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", "very good". Here, each category can be given a score like 0, 1, 2, 3.

First of all, we explore the simplest form of logistic regression, i.e. binomial logistic regression.
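As a minimal sketch of binomial logistic regression (the hours-studied data below is invented purely for illustration), scikit-learn fits the sigmoid-based model, and a decision threshold then turns the probability into a class:

import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # feature: hours studied (invented)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # binomial target: fail (0) / pass (1)

clf = LogisticRegression().fit(hours, passed)
p_pass = clf.predict_proba([[4.5]])[0, 1]     # sigmoid output: P(pass | 4.5 hours studied)
threshold = 0.5                               # the decision threshold discussed above
print("P(pass) =", p_pass, "->", "pass" if p_pass >= threshold else "fail")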

NAIVE BAYES

Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the simplest supervised learning algorithms. The Naive Bayes classifier is a fast, accurate and reliable algorithm, and Naive Bayes classifiers achieve high accuracy and speed on large datasets.

The Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of the other features. For example, whether a loan applicant is desirable or not depends on his/her income, previous loan and transaction history, age, and location. Even if these features are interdependent, they are still considered independently. This assumption simplifies computation, and that is why it is considered naive. This assumption is called class conditional independence.

Bayes' theorem states that P(h|D) = P(D|h) · P(h) / P(D), where:

 P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.

 P(D): the probability of the data (regardless of the hypothesis). This is known as the prior probability of the data (the evidence).

 P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.

 P(D|h): the probability of data D given that hypothesis h was true. This is known as the likelihood.

How Naive Bayes classifier works?

Let's understand the working of Naive Bayes through an example. Given an example of weather conditions and playing sports, you need to calculate the probability of playing sports: you need to classify whether players will play or not, based on the weather condition.

First Approach (in the case of a single feature)

The Naive Bayes classifier calculates the probability of an event in the following steps:

 Step 1: Calculate the prior probability for the given class labels.

 Step 2: Find the likelihood probability of each attribute for each class.

 Step 3: Put these values into Bayes' formula and calculate the posterior probability.

 Step 4: See which class has the higher probability; the input belongs to the class with the higher probability.

To simplify the prior and posterior probability calculations you can use two tables, a frequency table and likelihood tables. Both of these tables help you calculate the prior and posterior probabilities. The frequency table contains the occurrence of labels for all features. There are two likelihood tables: Likelihood Table 1 shows the prior probabilities of the labels, and Likelihood Table 2 shows the posterior probability.

Now suppose you want to calculate the probability of playing when the weather is overcast.

Probability of playing:

P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P(Overcast) .....................(1)

Calculate the prior probabilities:

P(Overcast) = 4/14 = 0.29

P(Yes) = 9/14 = 0.64

Calculate the likelihood:

P(Overcast | Yes) = 4/9 = 0.44

Put the prior and likelihood values into equation (1):

P(Yes | Overcast) = 0.44 * 0.64 / 0.29 = 0.97 (higher)

Similarly, you can calculate the probability of not playing:

Probability of not playing:

P(No | Overcast) = P(Overcast | No) P(No) / P(Overcast) .....................(2)

Calculate the prior probabilities:

P(Overcast) = 4/14 = 0.29

P(No) = 5/14 = 0.36

Calculate the likelihood:

P(Overcast | No) = 0/5 = 0

Put the prior and likelihood values into equation (2):

P(No | Overcast) = 0 * 0.36 / 0.29 = 0

The probability of the 'Yes' class is higher, so you can determine that if the weather is overcast, the players will play the sport.
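The same calculation can be reproduced in a few lines of Python; the counts (4 of 14 days overcast, 9 of 14 days played) come from the worked example above:

# Reproducing the Overcast example with plain arithmetic.
p_overcast = 4 / 14                # prior probability of Overcast
p_yes = 9 / 14                     # prior probability of playing
p_no = 5 / 14                      # prior probability of not playing
p_overcast_given_yes = 4 / 9       # likelihood of Overcast among the "Yes" days
p_overcast_given_no = 0 / 5        # likelihood of Overcast among the "No" days

p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast   # Bayes' theorem
p_no_given_overcast = p_overcast_given_no * p_no / p_overcast
# With exact fractions the first value is 1.0 (0.97 with the rounded figures above) and the
# second is 0.0, so the prediction for an overcast day is "Yes".
print(p_yes_given_overcast, p_no_given_overcast)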

Decision Tree Algorithm

A decision tree is a flowchart-like tree structure where an internal node represents a feature (or attribute), a branch represents a decision rule, and each leaf node represents the outcome. The topmost node in a decision tree is known as the root node. The tree learns to partition on the basis of the attribute values, and it partitions recursively, in a manner called recursive partitioning. This flowchart-like structure helps you in decision making. Its visualization, like a flowchart diagram, easily mimics human-level thinking; that is why decision trees are easy to understand and interpret.

A decision tree is a white-box type of ML algorithm. It exposes its internal decision-making logic, which is not available in black-box algorithms such as neural networks. Its training time is faster than that of a neural network. The time complexity of decision trees is a function of the number of records and the number of attributes in the given data. The decision tree is a distribution-free or non-parametric method which does not depend on probability distribution assumptions. Decision trees can handle high-dimensional data with good accuracy.

How does the Decision Tree algorithm work?

The basic idea behind any decision tree algorithm is as follows (a short scikit-learn sketch appears after this list):

1. Select the best attribute using an Attribute Selection Measure (ASM) to split the records.

2. Make that attribute a decision node and break the dataset into smaller subsets.

3. Build the tree by repeating this process recursively for each child until one of these conditions matches:

1. All the tuples belong to the same attribute value.

2. There are no more remaining attributes.

3. There are no more instances.
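As a brief, hedged sketch of this procedure (the toy weather-style data below is invented), scikit-learn's DecisionTreeClassifier performs exactly this recursive partitioning:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: [outlook code, temperature], label 1 = play, 0 = don't play (all invented).
X = [[0, 30], [1, 22], [2, 18], [0, 35], [1, 25], [2, 15]]
y = [0, 1, 1, 0, 1, 1]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)  # entropy = information gain
print(export_text(tree, feature_names=["outlook", "temperature"]))         # the learned decision rules
print(tree.predict([[0, 28]]))                                             # classify a new day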

Attribute Selection Measures

An attribute selection measure is a heuristic for selecting the splitting criterion that partitions data in the best possible manner. It is also known as a splitting rule because it helps us determine breakpoints for tuples on a given node. An ASM provides a rank to each feature (or attribute) by explaining the given dataset; the attribute with the best score is selected as the splitting attribute. In the case of a continuous-valued attribute, split points for branches also need to be defined. The most popular selection measures are Information Gain, Gain Ratio, and Gini Index.

Information Gain

Shannon invented the concept of entropy, which measures the impurity of the input set. In physics and mathematics, entropy refers to the randomness or impurity in a system; in information theory, it refers to the impurity in a group of examples. Information gain is the decrease in entropy: it computes the difference between the entropy before the split and the average entropy after the split of the dataset, based on the given attribute values. The ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.

Info(D) = - Σi Pi log2(Pi)

where Pi is the probability that an arbitrary tuple in D belongs to class Ci.

InfoA(D) = Σj (|Dj| / |D|) × Info(Dj)

Gain(A) = Info(D) - InfoA(D)

where:

 Info(D) is the average amount of information needed to identify the class label of a tuple in D.

 |Dj|/|D| acts as the weight of the jth partition.

 InfoA(D) is the expected information required to classify a tuple from D based on the partitioning by A.

The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.
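These formulas translate directly into a few lines of Python; the class counts below are placeholders (9 "yes" and 5 "no" tuples, split three ways by a hypothetical attribute A):

from math import log2

def entropy(counts):
    # Info(D) = -sum(p_i * log2(p_i)) over the class proportions.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

info_d = entropy([9, 5])                      # entropy of the whole dataset D
partitions = [[4, 0], [2, 3], [3, 2]]         # class counts in each partition produced by A
info_a = sum(sum(p) / 14 * entropy(p) for p in partitions)
print("Gain(A) =", info_d - info_a)           # decrease in entropy after splitting on A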

Gain Ratio

Information gain is biased towards attributes with many outcomes, meaning it prefers attributes with a large number of distinct values. For instance, an attribute with a unique identifier, such as customer_ID, has zero Info(D) because of pure partitioning; this maximizes the information gain and creates useless partitioning.

C4.5, an improvement of ID3, uses an extension of information gain known as the gain ratio. The gain ratio handles the issue of bias by normalizing the information gain using Split Info. The Java implementation of the C4.5 algorithm is known as J48 and is available in the WEKA data mining tool.

SplitInfoA(D) = - Σj (|Dj| / |D|) × log2(|Dj| / |D|)

where:

 |Dj|/|D| acts as the weight of the jth partition.

 v is the number of discrete values in attribute A (the sum runs over j = 1..v).

The gain ratio can then be defined as

GainRatio(A) = Gain(A) / SplitInfoA(D)

The attribute with the highest gain ratio is chosen as the splitting attribute.
Gini index

Another decision tree algorithm, CART (Classification and Regression Tree), uses the Gini method to create split points.

Gini(D) = 1 - Σi pi²

where pi is the probability that a tuple in D belongs to class Ci.

The Gini index considers a binary split for each attribute; you compute a weighted sum of the impurity of each partition. If a binary split on attribute A partitions data D into D1 and D2, the Gini index of D is:

GiniA(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)

In the case of a discrete-valued attribute, the subset that gives the minimum Gini index for that attribute is selected as the splitting attribute. In the case of continuous-valued attributes, the strategy is to select each pair of adjacent values as a possible split point, and the point with the smaller Gini index is chosen as the splitting point.

The attribute with the minimum Gini index is chosen as the splitting attribute.

The Random Forests Algorithm


Let’s understand the algorithm in layman’s terms. Suppose you want to go on a trip and you

would like to travel to a place which you will enjoy.

So what do you do to find a place that you will like? You can search online, read reviews on

travel blogs and portals, or you can also ask your friends.

Let’s suppose you have decided to ask your friends, and talked with them about their past

travel experience to various places. You will get some recommendations from every friend.

Now you have to make a list of those recommended places. Then, you ask them to vote (or

select one best place for the trip) from the list of recommended places you made. The place

with the highest number of votes will be your final choice for the trip.

In the above decision process, there are two parts. First, asking your friends about their

individual travel experience and getting one recommendation out of multiple places they have

visited. This part is like using the decision tree algorithm. Here, each friend makes a selection

of the places he or she has visited so far.

The second part, after collecting all the recommendations, is the voting procedure for

selecting the best place in the list of recommendations. This whole process of getting

recommendations from friends and voting on them to find the best place is known as the

random forests algorithm.

Technically, random forest is an ensemble method (based on the divide-and-conquer approach) of decision trees generated on randomly drawn subsets of the data. This collection of decision tree classifiers is also known as the forest. The individual decision trees are generated using an attribute selection indicator such as information gain, gain ratio, or the Gini index for each attribute, and each tree depends on an independent random sample. In a classification problem, each tree votes and the most popular class is chosen as the final result; in the case of regression, the average of all the tree outputs is taken as the final result. It is often simpler to use and more robust than other non-linear classification algorithms.

How does the algorithm work?

It works in four steps:

1. Select random samples from a given dataset.

2. Construct a decision tree for each sample and get a prediction result from each

decision tree.

3. Perform a vote for each predicted result.

4. Select the prediction result with the most votes as the final prediction.
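These four steps can be sketched with scikit-learn's RandomForestClassifier; the iris dataset and the parameter values below are illustrative choices, not part of the project code.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# n_estimators controls how many bootstrapped trees vote on the final prediction
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)   # each tree votes; the majority class wins
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature importances:", clf.feature_importances_)

The feature_importances_ attribute gives the relative feature importance mentioned in the advantages below.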

Advantages:

 Random forests are considered a highly accurate and robust method because of the number of decision trees participating in the process.

 They are far less prone to overfitting, because averaging the predictions of many trees reduces the variance of the model.

 The algorithm can be used in both classification and regression problems.

 Random forests can also handle missing values. There are two ways to handle these:

using median values to replace continuous variables, and computing the proximity-

weighted average of missing values.

 You can get the relative feature importance, which helps in selecting the most

contributing features for the classifier.

Disadvantages:

 Random forests are slow in generating predictions because they contain multiple decision trees. Whenever the model makes a prediction, all the trees in the forest have to make a prediction for the same input and then vote on it. This whole process is time-consuming.

 The model is difficult to interpret compared to a decision tree, where you can easily

make a decision by following the path in the tree.

Random Forests vs Decision Trees

 A random forest is a set of multiple decision trees.

 Deep decision trees may suffer from overfitting, but a random forest prevents overfitting by building its trees on random subsets of the data.

 Decision trees are computationally faster.

 A random forest is difficult to interpret, while a decision tree is easily interpretable and can be converted to rules.

SUPPORT VECTOR MACHINE

Support Vector Machines (SVM) is generally considered a classification approach, but it can be employed in both classification and regression problems. It can easily handle multiple continuous and categorical variables.

SVM constructs a hyperplane in multidimensional space to separate different classes.

SVM generates optimal hyperplane in an iterative manner, which is used to minimize an

error. The core idea of SVM is to find a maximum marginal hyperplane(MMH) that best

divides the dataset into classes.

Support Vectors
Support vectors are the data points that are closest to the hyperplane. These points define the separating line by determining the margins, and they are the most relevant points for constructing the classifier.

Hyperplane

A hyperplane is a decision plane that separates a set of objects having different class memberships.

Margin

A margin is the gap between the two lines drawn through the closest points of each class. It is calculated as the perpendicular distance from the separating line to the support vectors (the closest points). A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.

Classifier Building in Scikit-learn:
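As a minimal sketch of building an SVM classifier in scikit-learn (the breast-cancer dataset and the linear kernel are illustrative choices, not part of the project code):

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# a linear kernel looks for the maximum marginal hyperplane directly in input space
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("Support vectors per class:", clf.n_support_)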


K-Nearest Neighbors

KNN is a non-parametric and lazy learning algorithm. Non-parametric means there is no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, where most real-world datasets do not follow theoretical mathematical assumptions. Lazy means the algorithm does not build a model from the training data points; all of the training data is used in the testing phase. This makes training faster and the testing phase slower and costlier, both in time and in memory: in the worst case, KNN needs to scan all data points, and storing all the training data requires more memory.


How does the KNN algorithm work?

In KNN, K is the number of nearest neighbors, and the number of neighbors is the core deciding factor. K is generally an odd number if the number of classes is 2. When K = 1, the algorithm is known as the nearest neighbor algorithm; this is the simplest case. Suppose P1 is the point for which the label needs to be predicted. First, you find the point closest to P1, and then the label of that nearest point is assigned to P1.

For K > 1, you find the K points closest to P1 and then classify P1 by a majority vote of its K neighbors: each neighbor votes for its class, and the class with the most votes is taken as the prediction. For finding the closest points, you compute the distance between points using distance measures such as Euclidean distance, Hamming distance, Manhattan distance and Minkowski distance. KNN has the following basic steps (a minimal sketch follows the list):

1. Calculate distance

2. Find closest neighbors

3. Vote for labels
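The sketch below shows these steps with scikit-learn's KNeighborsClassifier; the wine dataset and k = 5 are illustrative choices, not part of the project code.

from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Euclidean distance is the default metric; the 5 nearest neighbours vote for the label
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)   # "training" only stores the data (lazy learner)

print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))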


Eager Vs. Lazy Learners

Eager learners construct a generalized model from the given training points before performing prediction on new points to classify. You can think of such learners as being ready, active and eager to classify unobserved data points.

Lazy learning means there is no explicit training of a model; all of the data points are used at the time of prediction. Lazy learners wait until the last minute before classifying any data point: a lazy learner merely stores the training dataset and waits until a classification needs to be performed. Only when it sees the test tuple does it perform generalization, classifying the tuple based on its similarity to the stored training tuples. Unlike eager learning methods, lazy learners do less work in the training phase and more work in the testing phase to make a classification. Lazy learners are also known as instance-based learners because they store the training points or instances, and all learning is based on those instances.

Curse of Dimensionality

KNN performs better with a small number of features than with a large number of features. As the number of features increases, more data is required, and an increase in dimensionality also leads to the problem of overfitting. To avoid overfitting, the amount of data needed grows exponentially as you increase the number of dimensions. This problem of high dimensionality is known as the curse of dimensionality.

To deal with the curse of dimensionality, you can perform principal component analysis (PCA) before applying the machine learning algorithm, or you can use a feature selection approach. Research has shown that in high dimensions Euclidean distance is no longer very useful; therefore, you may prefer other measures such as cosine similarity, which are decidedly less affected by high dimensionality.
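As an illustration, a minimal sketch of applying PCA before KNN with a scikit-learn pipeline; the digits dataset and the choice of 16 components are assumptions made only for this example.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# project the 64 pixel features onto the top principal components, then classify
model = make_pipeline(PCA(n_components=16), KNeighborsClassifier(n_neighbors=5))
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())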

How do you decide the number of neighbors in KNN?

Now that you understand how the KNN algorithm works, a question arises: how do you choose the optimal number of neighbors, and what is its effect on the classifier? The number of neighbors (K) in KNN is a hyperparameter that you need to choose at the time of model building. You can think of K as a controlling variable for the prediction model.

Research has shown that no single number of neighbors is optimal for all kinds of datasets; each dataset has its own requirements. With a small number of neighbors, noise has a higher influence on the result, while a large number of neighbors makes the computation expensive. Research has also shown that a small number of neighbors gives the most flexible fit, with low bias but high variance, whereas a large number of neighbors produces a smoother decision boundary, which means lower variance but higher bias.

Generally, data scientists choose K as an odd number if the number of classes is even. You can also check by generating the model for different values of K and comparing their performance, for example with the elbow method, as in the sketch below.
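A minimal sketch of comparing several values of K, in the spirit of the elbow method; the wine dataset and the search range 1 to 25 are illustrative assumptions.

from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

scores = {}
for k in range(1, 26, 2):   # odd values of K only
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Best K:", best_k, "with CV accuracy:", round(scores[best_k], 3))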


SCREENSHOTS
SAMPLE CODE:

# Django views for the doctor app: framework imports
from django.contrib import messages
from django.views.generic.detail import DetailView
from patient.models import PatientReg
from django.contrib.auth.models import User, auth
from django.shortcuts import render, redirect
from django.http import HttpResponse

# data handling and the decision tree classifier used for prediction
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# reportlab is used to generate the PDF report
import time
import datetime
from reportlab.platypus import Paragraph, SimpleDocTemplate, Table, TableStyle, Image, Spacer
from reportlab.lib.enums import TA_JUSTIFY
from reportlab.lib.pagesizes import letter
from reportlab.lib import colors
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch

# smtplib and the email helpers are used to mail the generated report
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
# Create your views here.

def home(request):
    # doctor landing page: also handles the login form when it is submitted
    if request.method == 'POST':
        email = request.POST['email']
        password = request.POST['password']
        user = auth.authenticate(username=email, password=password)
        if user is not None:
            auth.login(request, user)
            return render(request, "doctor/option.html")
        else:
            messages.info(request, 'invalid credentials')
            return render(request, "doctor/home.html")
    else:
        return render(request, "doctor/home.html")
    return render(request, 'doctor/home.html')


def login(request):
    # same behaviour as home(): authenticate the doctor and show the options page
    if request.method == 'POST':
        email = request.POST['email']
        password = request.POST['password']
        user = auth.authenticate(username=email, password=password)
        if user is not None:
            auth.login(request, user)
            return render(request, "doctor/option.html")
        else:
            messages.info(request, 'invalid credentials')
            return render(request, "doctor/home.html")
    else:
        return render(request, "doctor/home.html")
    # return render(request, 'doctor/home.html')


# def register(request):
#     return render(request, 'doctor/register.html')


def login2(request):
    # alternative login entry point with the same authentication logic
    if request.method == 'POST':
        email = request.POST['email']
        password = request.POST['password']
        user = auth.authenticate(username=email, password=password)
        if user is not None:
            auth.login(request, user)
            return render(request, "doctor/option.html")
        else:
            messages.info(request, 'invalid credentials')
            return render(request, "doctor/home.html")
    else:
        return render(request, "doctor/home.html")
    return render(request, 'doctor/home.html')


def register(request):
    # create a new doctor account after validating the form input
    if request.method == 'POST':
        first_name = request.POST['first_name']
        last_name = request.POST['last_name']
        email1 = request.POST['email']
        email2 = request.POST['email2']
        password1 = request.POST['password']
        password2 = request.POST['password2']
        if password1 == password2 and email1 == email2:
            if User.objects.filter(username=email1):
                # the username (email) already exists
                messages.info(request, 'Username is taken')
                return redirect('register')
            else:
                user = User.objects.create_user(username=email1, password=password1,
                                                email=email1, first_name=first_name,
                                                last_name=last_name)
                user.save()
                print("user created")
        else:
            messages.info(request, 'Password not matching or email is not matching')
            return redirect('register')
        # return HttpResponse("<script>alert('User created')</script>")
        return render(request, 'doctor/registerComplet.html')
    else:
        return render(request, 'doctor/register.html')


def rcomplete(request):
    return render(request, 'doctor/registerComplet.html')


def bipolar(request):
    return render(request, 'doctor/bipolarReport.html')


def predBipolar(request):
    # read the assessment values submitted from the bipolar report form
    a = request.POST['Age']
    b = request.POST['Right_answers']
    c = request.POST['Audio_prosody']
    d = request.POST['Combined_channel']
    e = request.POST['Face_video']
    f = request.POST['Body_video']
    g = request.POST['Positive_valence']
    h = request.POST['Negative_valence']
    i = request.POST['Dominant']
    j = request.POST['Submissive']
    pemail1 = request.POST['pemail']
    docname1 = request.POST['docname']
    reportof = request.POST['reportof']
    lists = [a, b, c, d, e, f, g, h, i, j]
    # train a decision tree on the bipolar dataset and predict the risk type
    df = pd.read_csv(r"static/database/Bipolar.csv")
    X_train = df[['Age', 'Right_answers', 'Audio_prosody', 'Combined_channel', 'Face_video',
                  'Body_video', 'Positive_valence', 'Negative_valence', 'Dominant', 'Submissive']]
    Y_train = df[['Type']]
    tree = DecisionTreeClassifier(max_leaf_nodes=6, random_state=0)
    tree.fit(X_train, Y_train)
    prediction = tree.predict([[a, b, c, d, e, f, g, h, i, j]])
    return render(request, 'doctor/predictBipolar.html',
                  {"data": prediction, "lists": lists, "a1": a, "b1": b, "c1": c, "d1": d,
                   "e1": e, "f1": f, "g1": g, "h1": h, "i1": i, "j1": j,
                   "pemail1": pemail1, "docname1": docname1, "reportof": reportof})

def bipolarSv(request):
    # read the submitted form values again so they can be stored and reported
    a1 = request.POST['Age']
    b = request.POST['Right_answers']
    c = request.POST['Audio_prosody']
    d1 = request.POST['Combined_channel']
    e = request.POST['Face_video']
    f = request.POST['Body_video']
    g = request.POST['Positive_valence']
    h = request.POST['Negative_valence']
    i = request.POST['Dominant']
    j = request.POST['Submissive']
    # k = "nothing"
    pemail1 = request.POST['pemail']
    docname1 = request.POST['docname']
    reportof1 = request.POST['reportof']
    detail = request.POST['data']
    # importing the models at runtime
    from .models import BipolarReport
    from patient.models import PatientReg
    # generating the PDF report with reportlab
    basename = "BipolarReport"
    suffix = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
    filename2 = "_".join([basename, suffix])
    loc = "static/report/" + filename2 + ".pdf"
    b3 = PatientReg.objects.get(pemail=pemail1)
    fname = b3.pname
    # the report file name is built above
    doc = SimpleDocTemplate(loc, pagesize=letter,
                            rightMargin=72, leftMargin=72,
                            topMargin=72, bottomMargin=18)
    Story = []
    logo = "static/images/seal.png"
    # building the body of the report
    formatted_time = time.ctime()
    full_name = fname
    address_parts = [pemail1]
    im = Image(logo, 2 * inch, 2 * inch)
    Story.append(im)
    styles = getSampleStyleSheet()
    styles.add(ParagraphStyle(name='Justify', alignment=TA_JUSTIFY))
    ptext = '<font size="12">%s</font>' % formatted_time
    Story.append(Paragraph(ptext, styles["Normal"]))
    Story.append(Spacer(1, 12))
    # create the return address
    ptext = '<font size="12"></font>'
    Story.append(Paragraph(ptext, styles["Normal"]))
    for part in address_parts:
        ptext = '<font size="12">%s</font>' % part.strip()
        Story.append(Paragraph(ptext, styles["Normal"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">Dear %s:</font>' % full_name.split()[0].strip()
    Story.append(Paragraph(ptext, styles["Normal"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">We have generated the report of <b>%s</b>. We found that your \
            risk of %s is <b>%s</b>. We recommend you to take care of your health, because it \
            will help you to live a happy life. We are attaching the report here.</font>' % (reportof1, reportof1, detail)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            Patient email = %s || Doctor name = %s \
            </font>' % (pemail1, docname1)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            Report of = <b>%s</b> \
            </font>' % (reportof1)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            <b>Age</b> = %s || <b>Right_answers</b> = %s || <b>Audio_prosody</b> = %s \
            </font>' % (a1, b, c)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            <b>Combined_channel</b> = %s || <b>Face_video</b> = %s || <b>Body_video</b> = %s \
            </font>' % (d1, e, f)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            <b>Positive_valence</b> = %s || <b>Negative_valence</b> = %s || <b>Dominant</b> = %s \
            </font>' % (g, h, i)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            <b>Submissive</b> = %s \
            </font>' % (j)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            Your risk about the <b>%s</b> = <b>%s</b> \
            </font>' % (reportof1, detail)
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">\
            -----------------------------------------------------------------------------------------------------------------\
            </font>'
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">Thank you very much and we look forward to serving you.</font>'
    Story.append(Paragraph(ptext, styles["Justify"]))
    Story.append(Spacer(1, 12))
    ptext = '<font size="12">Sincerely,</font>'
    Story.append(Paragraph(ptext, styles["Normal"]))
    Story.append(Spacer(1, 48))
    ptext = '<font size="12">%s</font>' % (docname1)
    Story.append(Paragraph(ptext, styles["Normal"]))
    Story.append(Spacer(1, 12))
    doc.build(Story)
    # emailing the generated report to the patient over SMTP
    fromaddr = "[email protected]"
    toaddr = pemail1
    msg = MIMEMultipart()
    msg['From'] = fromaddr
    msg['To'] = toaddr
    msg['Subject'] = "This is your report"
    body = "Kindly check the attachment"
    msg.attach(MIMEText(body, 'plain'))
    filename = filename2 + ".pdf"
    attachment = open(loc, "rb")
    p = MIMEBase('application', 'octet-stream')
    p.set_payload(attachment.read())
    encoders.encode_base64(p)
    p.add_header('Content-Disposition', "attachment; filename= %s" % filename)
    msg.attach(p)
    s = smtplib.SMTP('smtp.gmail.com', 587)
    s.starttls()
    s.login(fromaddr, "techcititech@123")
    text = msg.as_string()
    s.sendmail(fromaddr, toaddr, text)
    print("Msg sent successful")
    s.quit()
    # saving or updating the report data in the database
    if len(BipolarReport.objects.filter(patientemail=pemail1)) == 1:
        a = BipolarReport.objects.get(patientemail=pemail1)
        a.docname = docname1
        a.reportof = reportof1
        a.reportnm = filename
        a.Age = a1
        a.Right_answers = b
        a.Audio_prosody = c
        a.Combined_channel = d1
        a.Face_video = e
        a.Body_video = f
        a.Positive_valence = g
        a.Negative_valence = h
        a.Dominant = i
        a.Submissive = j
        a.riskvalue = detail
        a.save()
    else:
        d = BipolarReport(patientemail=pemail1, docname=docname1, reportof=reportof1,
                          reportnm=filename, Age=a1, Right_answers=b, Audio_prosody=c,
                          Combined_channel=d1, Face_video=e, Body_video=f, Positive_valence=g,
                          Negative_valence=h, Dominant=i, Submissive=j, riskvalue=detail)
        d.save()
    return render(request, 'doctor/sendSuccess.html')
