Ethical Hacking: Internshala
Terminology
• Hacking - Hacking is the art or technique of finding and exploiting a security loophole
in an infrastructure like a website, a software, a computer, or even a human being, and
the artist is called a hacker.
• Loopholes - In technical terms, a loophole is a part of a system which is
not properly defined or secured and hence can be exploited to cause unintended behaviour in
the system.
• Unethical hacking - When a hacker uses his knowledge to steal from or cause damage to
other people, it is known as unethical hacking. Like stealing, unethical hacking is a
crime, and a hacker who is caught can be arrested and tried in court.
• Ethical hacking - When the hacker helps organizations or individuals with finding security
loopholes and fixing them with their permission, it is referred to as ethical hacking. And
this is legal because you take permission from the system owner and your motive is not to
cause harm or steal, but to secure the system.
Cyber Crimes
The Indian cyber laws and the Indian IT Act classify cybercrimes into 2
broad categories. An activity is considered a cybercrime if:
1. A computer is being used to attack other computers. For example:
hacking, virus/worm attacks, DOS attacks, etc.
2. A computer is being used as a weapon to commit real world crimes.
For example: cyber terrorism, IPR violations, credit card frauds, EFT
frauds, pornography, etc.
3. Sending spam in bulk, flooding someone's email with junk emails, or
impersonating an email authority, are all criminal offences under the
Indian cyber law.
4. Even if you mean no harm, an organization, in this case the college
authorities, can file a case against you under the Indian cyber law if you
choose to test their website without consent. If you neither misuse
a loophole nor cause any damage while looking for it, most
organizations will appreciate your effort. But one must not take
chances where legal matters are concerned. Better safe than sorry.
5. Even if a grey hat hacker only does minor black hat activities such as
using/distributing cracks/pirated software, creating fake social media
accounts for fun or editing his/her attendance in college, he/she can be
charged with a cyber offence if caught.
6. Cyber laws not only consider gaining access to/corrupting devices/data
owned by others as a crime, but also the act of attempting to gain access,
whether you actually gain it or not, is considered a crime.
Hacker types
1. Script kiddie - In programming and hacking cultures, a script
kiddie, skiddie, or skid is an unskilled individual who uses scripts or
programs, such as a web shell, developed by others to attack computer
systems and networks and deface websites. It is generally assumed that
most script kiddies are juveniles who lack the ability to write
sophisticated programs or exploits on their own and that their objective
is to try to impress their friends or gain credit in computer-enthusiast
communities. However, the term does not relate to the actual age of
the participant. The term is considered to be derogatory.
2. Elite hackers - Members of the gifted segment of the Computer
Underground seen by their cyber colleagues to have special hacking
talent. Recently, the label “elite” has been altered to include not only
the ethical tester of virtual boundaries but also the detector of cyber
sabotage. Unlike crackers, elite hackers avoid deliberately destroying
information or otherwise damaging the computer systems they have
exploited.
3. State Sponsored hackers - State-sponsored hackers are also often
identifiable by their dedication to a specific target. Criminal hacking is
usually designed to target the largest possible number of victims in
order to increase the chances that someone will click on a malicious
link or mistakenly transfer money.
4. Hacktivist - A person who gains unauthorized access to computer files
or networks in order to further social or political ends.
5. Phreak - A person who hacks into telecommunications systems,
especially to obtain free calls.
White hat hackers can also be known as security auditors or security
experts.
No matter how well a black hat hacker clears his tracks, if the crime is
serious enough, the authorities will make every effort to trace him, such as using
data forensics, hiring white hat hackers and even using statewide
surveillance. In simple words, it depends on what the hacker has done,
and who is after him.
SIPs
• Black box testing is done from the perspective of a black hat hacker
and so no source code is provided. This is done to see what a tester can
do without any internal knowledge of the infrastructure.
• During a white box test, the complete architecture of an application is
made transparent to the hacker, and so he is able to look at even the
smallest configurations and user roles and find flaws in them; whereas
in a black box test, a lot of time is spent on proper information
gathering since not much is known about the target, and hence there is
less chance of finding many bugs.
Testing methodologies
The following methodologies are classified on the basis of the assistance provided:
• Black box testing – When no assistance is provided to the experts, it is
known as black box testing.
• Data packets – The server always sends data in small chunks,
which are known as data packets. This is done for more efficient
transfers.
NAT
• The process of transferring and translating data between internal and
external networks is known as NAT or Network address translation.
• NAT was introduced to cope with the shortage of public IP addresses: only
the router needs a public IP address, while the nodes behind it use private
IP addresses that are translated at the router.
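The translation idea can be sketched as a toy model. This assumes the common port-based form of NAT, where the router tells internal hosts apart by the public port it hands out. The class name NatRouter, the addresses and the port numbers are made-up illustrations, not something taken from the course.

```python
# Toy model of a NAT router: many private hosts share one public IP.
# All addresses and port numbers below are made-up example values.
import itertools


class NatRouter:
    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self._out = {}   # (private_ip, private_port) -> public_port
        self._back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Translate an internal source address to the shared public one."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Map a reply arriving on a public port back to the internal host."""
        return self._back.get(public_port)  # None if no mapping exists


router = NatRouter("203.0.113.5")
print(router.outbound("192.168.0.10", 51000))  # ('203.0.113.5', 40000)
print(router.outbound("192.168.0.11", 51000))  # ('203.0.113.5', 40001)
print(router.inbound(40001))                   # ('192.168.0.11', 51000)
```

Both internal machines appear to the outside world as 203.0.113.5; only the router's table knows which reply belongs to which machine.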
SIP
• For small internal networks (Like your home or small office):
192.168.0.0 to 192.168.255.255
• For large internal networks (Like large MNCs, colleges, schools):
172.16.0.0 to 172.31.255.255
• For massive internal networks (Like telecom networks, satellites):
10.0.0.0 to 10.255.255.255
• 127.0.0.1 : This is called the Loopback address and is used as the
address of your own machine.
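As a quick check of the ranges above, here is a small sketch using Python's standard ipaddress module. The function name classify and the labels are just illustrative choices; the ranges are the ones listed above, written in CIDR notation.

```python
import ipaddress

# The three private ranges listed above, written in CIDR notation.
PRIVATE_RANGES = {
    "192.168.0.0/16": "small internal network",     # 192.168.0.0 - 192.168.255.255
    "172.16.0.0/12": "large internal network",      # 172.16.0.0 - 172.31.255.255
    "10.0.0.0/8": "massive internal network",       # 10.0.0.0 - 10.255.255.255
}


def classify(address):
    """Say which of the ranges above an IPv4 address falls into."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback"
    for cidr, label in PRIVATE_RANGES.items():
        if ip in ipaddress.ip_network(cidr):
            return label
    return "public or other"


for addr in ["192.168.1.20", "172.20.5.1", "10.9.8.7", "127.0.0.1", "8.8.8.8"]:
    print(addr, "->", classify(addr))
```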
Domain name and DNS
• A domain name is a humanly understandable name of any web
application hosted on one or more servers, and it helps to connect to
them.
• A DNS or domain name system is a system of devices that helps us
find the IP address corresponding to a domain name.
Why is DNS a system of devices and not a single server?
• There is a huge number of domain names, and one server cannot store
all the domain names with their IP addresses.
• One server cannot handle all the requests per second.
• A lookup starts on our own system: the browser checks its own cache, then
the OS cache. If the address is not found there either, the query goes to the
default DNS server, whose IP is the only one our system has to remember, and that
server looks in its own cache.
• There are 13 root name server addresses (named A to M), served by many machines spread across the entire world.
• So, here are some commands that we can use to do a live DNS lookup
and to check for IP addresses of domain names.
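Besides command-line tools, a lookup can also be done from code. The sketch below simply asks the operating system's resolver through Python's standard socket module; the function name lookup is an illustrative choice, and the result depends on your own network and DNS server.

```python
import socket


def lookup(hostname):
    """Return the sorted IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example (needs network access):
# print(lookup("internshala.com"))
```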
Now let’s check what applications are running on our computers/laptops. For this we
will look at the open ports running on our machines.
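One way to see open ports from code is to try a TCP connection to each port and note which ones accept it. This is only a minimal sketch with illustrative function names; point it at your own machine only, since probing other people's systems without consent is an offence, as discussed earlier.

```python
import socket


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0  # 0 means the connection was accepted


def open_ports(host="127.0.0.1", ports=range(1, 1025)):
    """List the ports in the given range that accept connections on host."""
    return [p for p in ports if is_port_open(host, p)]


# Example: print(open_ports())  # checks ports 1-1024 on your own machine
```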
HTTPS: This is the secure version of HTTP, where s stands for secured and is used to transfer web pages in a secured way. Most
websites that we visit, like internshala, amazon, google, etc., use HTTPS and not HTTP.
The fact that it is secured means that all communications between your browser and the website you are connected to, will be
encrypted.
You can see this in the address bar located at the top of the browser.
FTP: FTP stands for File Transfer Protocol and is used while transferring files.
SMTP: SMTP stands for Simple Mail Transfer Protocol and as the name suggests, it is used to send emails from one device to
another. But when you open Gmail, or compose and send an email, does your address bar show SMTP or HTTPS? Well, try it and
find out for yourself, and look out for the reason somewhere in this topic.
VOIP: This stands for Voice Over Internet Protocol and is used for making a voice call over the internet.
OSI Model
Application Layer- This layer provides an interactive interface for the user to enter and view data. One can give inputs in the form of text, audio, images,
files, etc. A browser is a typical example of an application layer program.
Presentation Layer- After the application layer, the data passes to the presentation layer. This is where the data is converted into computer friendly format,
i.e in binary code. So, the presentation layer encodes the input, compresses it, and encrypts it if required. Then the data is sent to the next layer.
Session Layer- This layer initiates a connection and creates a session, so that some context can be provided to the communication between the two
devices.
Transport Layer- This layer establishes an application level connectivity. For this, it attaches the source and destination port numbers.
It also performs the task of error control, which means that it makes a checklist, so that it can be cross checked at the receiving end to ensure that all the
data is transferred properly and not destroyed on the way. These checklists are known as checksums.
Network Layer- At the network layer, the source and destination IP addresses are attached, for the purpose of identification of devices, and to decide the
virtual path that needs to be taken by the data packet. So, we can say that this layer does network level routing and pathing of packets.
Data Link Layer- This layer attaches the source and destination MAC addresses, which are used to identify the hardware of the device. It also calculates
checksums for error checking of the metadata that has been attached at all the previous layers, and also to manage the flow of data.
Physical Layer- This is where the data is converted to hardware friendly signals, like radio signals, light signals, or electric signals, depending on the
hardware that is being used for data transfer.
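To make the idea of a checksum concrete, here is a sketch of the 16-bit one's-complement checksum described in RFC 1071, the scheme used by TCP and UDP. The sample payload is made up, and real protocols compute the checksum over their headers as well, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (the scheme described in RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                            # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF


payload = b"hello world!"                          # even length, so words line up
checksum = internet_checksum(payload)

# The receiver recomputes over payload + checksum; intact data gives 0.
received = payload + checksum.to_bytes(2, "big")
print(internet_checksum(received) == 0)            # True

# One changed byte on the way is detected.
corrupted = b"jello world!" + checksum.to_bytes(2, "big")
print(internet_checksum(corrupted) == 0)           # False
```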
SIP
• Streaming live matches requires fast speed, and so UDP is a better fit. Also, in case
of live matches, even if some frames on the screen are lost, it’s acceptable instead
of waiting for all frames to be loaded.
• You might think that it's SMTP since we use it for sending mails, but both Gmail
and Yahoo are simply websites and hence use HTTPS.
• Even when you compose and send an email in the browser, HTTPS is used and not SMTP as you might have thought. When you
press “Send”, your email is simply sent from your browser to Google’s server using
HTTPS. Google’s internal server then sends the email using SMTP. But if you are
using software like Outlook, then you can see that it uses SMTP to send your
email.
• Whether the chunks are called packets or datagrams, all protocols break data into
chunks before transferring it.
• In case of TCP, for every packet that is sent, an acknowledgement is
expected from the receiver. So A will send P and wait for a fixed time in the
hope of an acknowledgement. If an acknowledgement is not received in this
fixed time, A will resend P and start waiting again. If B successfully receives
P and tells that to A, then A will send the next packet.
• The application layer is the main application that the user interacts with. It
acts as a medium between the user and the computer. It takes user input,
passes it down to other layers, and shows back the output.
• The Data Link layer deals with the Physical address also known as the Media
Access Control (MAC) Address.
• The session layer creates a connection between 2 devices and manages that.
Its job is to hold data specific to the time period during which the devices are
connected, and hence each packet sent while the connection is active has a
context.
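The send, wait and resend behaviour described for TCP above can be simulated in a few lines. This sketch follows the simplified one-packet-at-a-time description in these notes (real TCP keeps several packets in flight), and the set of lost attempts is an assumption chosen only to force some retransmissions.

```python
def stop_and_wait(packets, lost_attempts):
    """Simulate sender A delivering packets to receiver B one at a time.

    lost_attempts is a set of (packet_index, attempt_number) pairs whose
    transmission (or acknowledgement) is assumed lost, forcing a timeout.
    Returns the delivered packets and a log of every transmission.
    """
    delivered, log = [], []
    for index, packet in enumerate(packets):
        attempt = 1
        while True:
            log.append((packet, attempt))
            if (index, attempt) not in lost_attempts:
                delivered.append(packet)   # B received it and A got the ACK
                break
            attempt += 1                   # timeout: resend the same packet
    return delivered, log


delivered, log = stop_and_wait(["P1", "P2", "P3"], lost_attempts={(1, 1), (1, 2)})
print(delivered)  # ['P1', 'P2', 'P3']
print(log)        # [('P1', 1), ('P2', 1), ('P2', 2), ('P2', 3), ('P3', 1)]
```

P2 is sent three times before its acknowledgement arrives, and only then does A move on to P3.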
Proxy
• A proxy is used to hide devices: the receiving end never learns the
sender’s IP address. Instead, the proxy’s address is presented to the
receiving end to establish a connection.
• One of the disadvantages of a proxy server is that it stores log data, so
if anyone gets into the proxy server, they may be able to retrieve
the data.
Uses
Uses of proxy servers:
General users:
1. Obscure their IP
2. Avoid surveillance
3. Bypass browsing restrictions
4. Access resources as if from a different country
Developers:
1. Monitoring web traffic
2. Troubleshooting web applications
Network administrators:
1. To block malicious traffic
2. To balance overflowing traffic
SIP
• Upon setting up a proxy in the browser, a developer/debugger, or a pentester can see all the
traffic of a website and then analyse it and play around with it. Also as traffic goes through a
proxy, network administrators can block websites and monitor user traffic for administration.
• Although a VPN provider may promise that all data is secure and anonymous and that there are no logs, if there
is a legal request, VPN providers can reveal the original IP address or other such related information
about the user. So, a VPN cannot be used to be an untraceable malicious hacker.
• A proxy’s simple goal is to transfer web traffic through an intermediate gateway whereas the
goal of a VPN is to make sure each and every packet leaving your device regardless of the
application/protocol, is encrypted and then sent while masking your original IP address.
• DHCP is used for assigning IP addresses, ARP for conversion between IP and MAC
addresses, and ICMP is used for debugging network stability and deliverability.
Information Gathering and Reconnaissance
• Gathering as much information about the target as possible and
organizing it in a structured manner so that it can be utilized later in
the vulnerability assessment and penetration testing phase, is known
as information gathering.
• Reconnaissance is the process of analyzing all this information
gathered and utilizing it to understand the target.
• Digital footprints are the clues or traces left by a person when they
are online.
• The two most important information services used in information
gathering are WhoIs information and reverse IP lookup.
• WhoIs is a protocol that queries and receives response from the
database that stores the registration information of a domain or an IP
address.
Here are some links that can be used for the WhoIs lookup.
• https://www.whois.com/whois/
• https://whois.icann.org/en
Sometimes multiple websites are hosted on a single server. Some of these
websites may belong to a common organization and some to
different ones. Hosting websites of different organizations on the same
server is known as shared hosting.
A reverse IP lookup looks up the IP address and gives a list of all the
domains running on the same server.
Here are some links that can be used for the Reverse IP lookup:
• https://mxtoolbox.com/reverselookup.aspx
• http://viewdns.info/reverseip/
SIP
• Whois information is in most cases not cross checked and the registrant
can enter any information in the whois details while registering a
domain/IP. Hence it cannot be used as a proof of ownership.
• In white box testing, most of the information is provided by the client
itself, and although information assessment is still required, less of it is needed
than during a black box test.
• Whois shows registration details of a domain. Reverse registrant check
shows all domains registered by a single registrant. Hosting history shows
whois information of a domain over time. Reverse IP check is used to see
all other domains resolving to the same IP as the IP of a given domain.
Here are some key pieces of information that a security expert usually gathers about a website:
1. Related domains and subdomains
2. Technology and programming languages being used
3. Cached pages
4. Website history
5. Publicly indexed files on search engines
6. Default pages and login forms
7. Related IP addresses
8. Other services running on those IP addresses
9. Version of the services/software being used
10. Publicly disclosed vulnerabilities in the software being used
11. Default users
12. Default passwords
13. Valid email address and usernames
SIP
Gathering targeted information about people
1. Name - How to find out full names and their related information:
  • Social media platforms
  • Professional platforms
2. Email - How to find out the name behind an email address:
  • Forgot password
  • Services linked to that email
  • Google search
3. Mobile numbers - How to find out the name behind a phone number:
  • Login and forgot password pages
  • Google search
Gathering targeted information about organisations
1. How to find information about an organisation:
  • Social media platforms
  • Company review services
  • Organisation financial analysis services
Gathering information about websites and web servers
1. Getting an idea about the technology being used by websites and web servers:
  • www.builtwith.com
  • Important sections:
    • Frameworks: To see the programming languages used
    • Hosting providers: To see where the website is hosted
    • Webserver: To see the server software being used
2. Going through the history of a website
  • To see how the website looked in the past, its features, additions and deletions that have been made over time:
  • web.archive.org
  • Important sections:
    • Go to the year you want to see
    • Check out screenshots taken on any day, and also see the website as it was on that day
3. Finding out sub domains related to a domain
  • www.dnsdumpster.com
  • Important sections:
    • Host Records (A): To see a list of all the sub domains of any given domain.
SIP
• Google stores all the websites in a special database called the search index.
• Dorks are specific search filters that can be applied to a search engine, to make the search targeted and specific.
• For google, it’s site:.
Some of the most commonly used google dorks are:
• 1. site: <Domain>
This is the most common dork, and it restricts the results to web pages from a single website.
Eg: “site:internshala.com” lists out all the web pages on internshala.
It can also be used to search for web pages within a specific sub domain, or even for an entire TLD.
So you can search for “site:trainings.internshala.com” to search for a specific sub domain or for “site:in” to search for all the web pages
with the top level domain (TLD) ".in" in them.
• 2. inurl: <Text to find>
This keyword can be used to find URLs with specific text in them.
So if you search for “inurl:login.html” it will give a list of all URLs where the text login.html is present.
• 3. intitle: <Title text>
This dork can be used to search for web pages which have some specific keyword in the web page title.
For example, “intitle:admin login” gives a list of several admin panels.
• 4. intext : <Text>
This dork can be used to search for specific keywords in the body of the web page.
So if you type “intext:webcam login” it returns a lot of interesting results, some of which look like login pages of live webcams across the globe. Some have
weak passwords, or no passwords at all, which makes them vulnerable to attack.
• 5. filetype: <Type>
This is one of the most useful dorks, and can be used to restrict results to files of a particular type. This dork can be used to search for documents (pdf),
spreadsheets (xls), webpages (html), server pages (php), executables (exe), presentations (ppt) and much more.
A lot of students use it to quickly find pdfs related to the assignments that they are supposed to make. For example, if you do a search for “Revolt of 1857
filetype:pdf” you get a result of all pdfs on the topic.
• 6. ext: <File extension>
This is similar to the “filetype: <Type>” dork, and can be used to search for specific or uncommon file extensions.
Eg: “ext:config” returns a list of configuration files whose names have the form “filename.config”.
• 7. “Exact word”
We have already learnt how to use this dork in the previous topic.
When we search for a keyword without putting double quotes, the result includes pages which have the exact word, or synonyms, or other related material. But,
when we do a search using double quotation marks, the search is more specific, and returns only those web pages which actually contain the keyword as it is.
• 8. Negative search - (minus)
This search is used to eliminate certain types from the main search.
For example, you want to find out platforms that have the beginners guide to C++. But, you want a free version of the book, and not a paid one.
So you can simply search for “Beginner's guide to C++ -buy -order -purchase -pay” to get results of free books.
• 9. “Keyword 1” | “Keyword 2”
This search can be used to put an OR between keywords, which are in double quotes.
Eg: “admin login” | “administrator login”
• 10. The ip: dork on Bing can be used to find websites related to a specific IP address.
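Since a dork is just text typed into the search box, combining several of them can be sketched as simple string building. The helper names build_dork and search_url are illustrative, and the search URL format is an assumption about how the query ends up in the address bar. Note that there is no space between an operator and its value.

```python
from urllib.parse import quote_plus


def build_dork(keywords="", exclude=(), **operators):
    """Combine plain keywords, operator:value filters and -excluded words."""
    parts = [keywords] if keywords else []
    parts += [f"{name}:{value}" for name, value in operators.items()]
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search?q="):
    """URL-encode the query so it can be pasted into the address bar."""
    return base + quote_plus(query)


q1 = build_dork(site="internshala.com", inurl="login")
q2 = build_dork("Revolt of 1857", filetype="pdf")
q3 = build_dork("Beginner's guide to C++", exclude=["buy", "order"])
print(q1)  # site:internshala.com inurl:login
print(q2)  # Revolt of 1857 filetype:pdf
print(q3)  # Beginner's guide to C++ -buy -order
print(search_url(q1))
```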
SIP
• The GHDB holds no responsibility of what one does with the dorks
and hence, if after using a dork you end up clicking a link and
accessing critical information of a non-consented organisation, you
can get in trouble.
Web servers can be of various types. Each one has a specific function, and hence a specific
configuration. Let us read about some of the most common web servers.
• Application Server- This server executes the main business logic of the application. Whenever the
user requests for something, the application server runs the code written by the developer.
• Database Server- A database server is a system where all the data is stored. Whenever the user
requests for some data, it is fetched from the database server. The data is stored here in an efficient
and secure manner.
• Backup Server- This server helps us create backups for files, data, etc. This is done to prevent the
loss of data in case of an unexpected failure. A backup server can also act like the secondary server,
in case the primary server is down.
• DNS Server- The Domain Name Server manages the domain names and their IP addresses. The
main function of a DNS server is to map a domain name to its respective IP address.
• Mail Server- A mail server is used for sending and receiving emails. Some of the protocols used
for this transfer are SMTP, POP, IMAP, etc. The Microsoft Exchange Server is an example of a
mail server.
Depending on the size of the web application, all these servers can be present on one physical server
or on separate servers.
So, like we said, it is not necessary to have these as 5 different servers, but a combination of these can be present in one physical server,
depending on the requirement of the application.
Now, this server will have some architecture which should be appropriate for the kind of functions that the server will perform.
This architecture is called a web server architecture. It is made up of these 5 basic elements. Let’s look at each one of these.
• Server OS- Just like every computer has an operating system, similarly the computer that hosts the website also needs to have an OS.
Examples are Linux, Windows, IBM AIX, etc.
• Server Software- We know that every website needs to address the incoming requests of the users. This request could be for a web
page in the website, or for any other functionality that the website provides. For this, the server needs to run the code of the website to
generate a response for the user. But, to handle all this function, the server needs a software which is called the server software.
Examples are Apache, nginx, IIS, etc.
• Programming Language- Every website has a backend part which is basically written as lines of code, using a programming
language. So, the web server architecture includes a particular programming language that is used to write this code.
Examples are: PHP, Python, Perl, Ruby, ASP (.NET), JSP, etc.
• Database Software- Every website has users and it stores the information of these users in the database. So your login credentials,
your preferences, cart items in case of an e-commerce, or any other details that you provide while accessing a website is stored in the
database in a secure and efficient manner. And to access this data from the database, a software is required. This is known as the
database software.
Examples are: MySQL, MS SQL, MongoDB, Cassandra, PostgreSQL, etc.
• Front End Components- So, we know that every website has a frontend or a user interface, which is what the user sees on the
browser while browsing through the website. So, there needs to be a front end language to write the front end code.
Examples are: HTML, JavaScript, jQuery, CSS, Bootstrap, etc.
• Some of the most common web server architecture combinations are:
• (The front end component is not mentioned in any of these architecture combinations.)
• WAMP- WAMP stands for Windows, Apache, MySQL, PHP.
• LAMP- LAMP stands for Linux, Apache, MySQL, PHP. It is one of the most frequently used
combinations since all the components are available free of cost.
• MAMP- MAMP stands for Mac, Apache, MySQL, PHP. It is most commonly used for web development
and local testing processes by Mac OS based developers.
• XAMPP- Unlike other web server architectures, XAMPP can be used across any operating system. So the
X in XAMPP stands for cross platform. The rest of it stands for Apache, MariaDB, PHP and Perl.
• WIMSA- It is the most commonly used Windows architecture. WIMSA stands for Windows, IIS, MS
SQL, ASP.NET.
• Some of the other non abbreviated web server architectures are:
• Windows, Tomcat, JSP, PostgreSQL
• PHP, nginx, MongoDB
• Python, nginx, MongoDB
• To give you a clear picture, the most commonly used OS is Linux. Apache is the most commonly used
server software and PHP is the most commonly used server side programming language.
SIP
• HTTPS is used to make sure that everything the user enters on the
browser (like username, password, bank details etc) is encrypted and
then sent to the webserver. The same is true for the response sent by
the server. This is done so that the communication between them
cannot be sniffed. However, this only makes the communication
secure and not the website.
• In PHP, although strings have to be stored in " " like $name="jack",
for numbers it rarely makes a difference, because PHP converts numeric strings
automatically. I.e. $a=1, $a="1" and $a='1' all compare as equal with ==
(though not with the strict ===).
VAPT
• Vulnerability assessment phase is a phase where a hacker or a security
expert tries to find all the vulnerabilities in a system.
• Penetration testing phase is a phase where a hacker or a security
expert exploits a vulnerability and tests how much damage he can
cause using that vulnerability.
SIP
• OWASP has released Top 10 lists in 2010, 2013 and 2017, which help security experts and
developers to find the most common web based system vulnerabilities.
• VA helps in finding the vulnerabilities and shows that they exist, while PT helps in
exploiting them and also tells the impact these vulnerabilities can cause. PT is dependent on
VA.
• Injection Vulnerabilities allow a hacker to inject and execute his own code like PHP code or
commands like windows cmd commands on the server itself giving him complete control of
the server.
• This is a common mistake made by many developers/admins, as they keep the default
password and username for convenience.
• These flaws occur when developers/server admins use 3rd party software/applications/code
that already have known public vulnerabilities, which anyone can search for and
misuse.
SQL and Database
• SQL is structured query language used to query data from the
database.
• A database is a collection of data stored by a website in a particular
manner.
• Database software stores data in the form of tables.
• SQL is a language which is used inside server side programming languages to
communicate with database software in order to save data in databases and
retrieve it later.
• Data Definition Language (DDL):- These commands are used to define the
structure of the data, like how and where it would be stored. They are used in
creating databases and tables, defining the structure of the tables and the
columns. Examples include: Create table, Alter table, Drop table.
• Data Manipulation Language (DML):- These commands are used to
manipulate already existing data inside a table or insert new data (rows) inside
a table. They help to edit, delete, and create rows. Example commands: Insert
into <table>, Update <table> (rows) and Delete from <table> (rows).
• Data Query Language (DQL):- These commands are used to query data from
the database, i.e. fetch required data from the database. They are used to fetch data
from all the rows, fetch specific data, sort data and even calculate values inside
the rows. Examples: Select <columns> from <table>, Order by <column>.
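The three groups of commands can be tried out safely on a throwaway database. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name users, its columns and the sample rows are made-up examples, and other database software may differ slightly in syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# DDL: define the structure
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
cur.execute("ALTER TABLE users ADD COLUMN city TEXT")

# DML: insert, update and delete rows
cur.executemany(
    "INSERT INTO users (name, age, city) VALUES (?, ?, ?)",
    [("jack", 25, "Delhi"), ("asha", 31, "Pune"), ("ravi", 19, "Goa")],
)
cur.execute("UPDATE users SET age = 26 WHERE name = 'jack'")
cur.execute("DELETE FROM users WHERE name = 'ravi'")

# DQL: fetch specific columns and sort them
rows = cur.execute("SELECT name, age FROM users ORDER BY age").fetchall()
print(rows)  # [('jack', 26), ('asha', 31)]

# DDL again: remove the table
cur.execute("DROP TABLE users")
conn.close()
```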
SIP
• If the non-numeric values are not enclosed in single or double quotes,
the script will either show an error or fail to load.
• Asterisk is used to fetch all the columns from the table.
• The double hyphen (the comment marker, which MySQL only recognises when followed by a space)
is sent with a space after it at the end of the query. Generally, we cannot send a
literal space in a URL, and for this reason it is sent as --+.
• UNION based SQL injections require the same number of columns to be
present in both the SELECT statements.
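Both points can be seen on a local throwaway database. The sketch below uses Python's built-in sqlite3 module; the tables and rows are made up for illustration, and the exact error message differs between database software.

```python
import sqlite3
from urllib.parse import unquote_plus

conn = sqlite3.connect(":memory:")   # local, in-memory test database
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.execute("CREATE TABLE users (username TEXT, password TEXT, email TEXT)")
conn.execute("INSERT INTO products VALUES ('pen', 10)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret', 'a@example.com')")

# Two columns on both sides: the UNION is accepted.
ok = conn.execute(
    "SELECT name, price FROM products UNION SELECT username, password FROM users"
).fetchall()
print(ok)

# Two columns versus three: the database rejects the statement.
try:
    conn.execute("SELECT name, price FROM products UNION SELECT * FROM users")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# In a URL query string '+' decodes to a space, so "--+" arrives as "-- ".
print(repr(unquote_plus("--+")))  # '-- '
```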
SIP
• INFORMATION_SCHEMA stores all the metadata, which includes the
structural properties of all the databases. It is an inbuilt database in
most database software.
• Attempting to or trying to do SQL injection on any website without
prior consent can backfire on the tester even if the website is not
vulnerable.
• Repeater is used for manually editing and repeating a particular HTTP
request any number of times and to analyse the application responses.
This can be very helpful when we only want to play with one request
and do not want to keep on switching the intercept on/off.
• The Intercept pauses each and every request and the next request
won’t go until we pass the intercepted one. This can be helpful when
we can send a request only once, for example in case of payment or
deleting something, but if we want to try sql injections, then it
becomes tricky as every time we will need to fill the form, intercept
the request, edit it, forward it and analyse the response in the browser.
Instead we can use repeater for this.
• Any time you want to send a single request a large number of times,
each with same or some different but calculated data, like when brute
forcing mobile number, passwords or simply sending a registration
request 1000 times, you can use Burp’s intruder.
• Even if you are just checking the security of the website, tampering
the HTTP requests without prior consent is illegal.
SIP
• Client side filters are implemented in our system using client side
code like HTML, JS, CSS, etc. so if we find a way to tamper this request
before it is sent to the server there is a chance that we can bypass the
filters. Whereas in server side filters, when a request with invalid data
reaches the server, it is there in the code that the checking process is
written. So all checks happen out of our reach and hence it is not so
easy to bypass server side filters especially when we don’t even know
what the filters are.
When can improper or missing server side validation happen?
1. The developer does not anticipate the possible requests that can come
from the user and that can create a loophole in the application logic.
2. The application only implements checks and validation on the client side.
3. Server side validations are done without considering all possible
attack vectors.
In server side filters the request is sent to the server before anything, the
server checks if the data is in correct format or has the correct value and
only if this is true, does further processing take place.
This is a clear example of missing/improper server side checks. The
captcha is implemented on the client side and can be bypassed by
completely removing the captcha parameter from the request, so it is a
client side filter bypass too. On the server, the request is generally
rejected if the captcha is incorrect, hence it is a server side filter; but
if we remove the captcha parameter, the request is still accepted, which
bypasses the server side filter. This is due to missing/improper server
side checks. Hence all 3 are correct.
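The captcha case can be sketched in JavaScript as follows. This is only an illustration, not the code of any real application: the function names, the params object and the expected captcha value are all hypothetical. It contrasts a check that is skipped when the parameter is absent with one that insists on the parameter being present.

```javascript
// Hypothetical sketch: 'params' stands for the parsed request parameters
// and 'expected' for the captcha value the server generated earlier.

// Flawed check: the comparison only runs when a captcha was sent,
// so a request with the captcha parameter removed is accepted.
function flawedCaptchaCheck(params, expected) {
  if (params.captcha !== undefined && params.captcha !== expected) {
    return false; // wrong captcha is rejected
  }
  return true; // missing captcha slips through
}

// Stricter check: the parameter must be present and must match.
function strictCaptchaCheck(params, expected) {
  return typeof params.captcha === 'string' && params.captcha === expected;
}
```

With the flawed version, a request without any captcha parameter passes, while the strict version rejects it.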
Understanding response headers
So, in the previous module, we looked at some server side attacks. These attacks
are used to attack the server or to take complete control of the server. It is
important to know server side languages like PHP and SQL to carry out these
attacks, or to prevent them.
However, in this module, we will look at the client side attacks. These attacks are
used to cause harm to the users of a web application directly.
So, by carrying out these attacks, the hacker can directly attack the browser of the
victim. To understand these attacks, we need to know client side languages like
HTML and JavaScript.
To understand client side attacks, let us first understand how a web browser works.
We know that when we open a website, let's say Internshala, an HTTP request is sent to the
server. The server then processes this request and sends back an HTTP response to our
browser. Now, this HTTP response is parsed by our browser and displayed to us.
But, this HTTP response contains something called HTTP headers. These headers are the
metadata that is not shown to us.
But, if we analyze these response headers, we can learn a lot about the way HTTP responses
work.
Now, usually the HTTP response headers are very lengthy, and we are not going to look at each
and every line.
We will mainly look at 3 important parts of the HTTP response headers:
1. The first line of the response, which tells us about the nature of the response.
2. The Set-Cookie header.
3. The Content-Length header.
We will look at each one of these.
Now, this is just one type of response. There are a few more important responses
that we must know about.
30X: A response in the 300 range signifies redirection. For example, if you
requested page 1 but are being redirected to page 2, the response will say
"301 Moved Permanently" and carry the header "Location: page2".
40X: These responses depict errors caused by the client's request. The most
common one we have all come across is the 404 Not Found error. We get this
response when the page we have requested does not exist.
Another example is the 403 Forbidden response. This comes when you request
a page that you are not permitted to access.
50X: These responses occur when there has been some error on the server side. For
example, if a website is not able to connect to its database due to some server side
code error, you might see 500 Internal Server Error.
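As a rough sketch of how the first line of a response maps to these ranges, the JavaScript function below parses a status line and reports its category. The function name parseStatusLine and the category labels are made up for this illustration.

```javascript
// Hypothetical helper: parse a status line such as "HTTP/1.1 404 Not Found"
// and classify it by the first digit of the status code.
function parseStatusLine(statusLine) {
  const match = /^HTTP\/\d(?:\.\d)?\s+(\d{3})(?:\s+(.*))?$/.exec(statusLine.trim());
  if (!match) {
    return null; // not a recognisable status line
  }
  const code = Number(match[1]);
  const categories = {
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
  };
  return {
    code: code,
    reason: match[2] || '',
    category: categories[Math.floor(code / 100)] || 'other',
  };
}
```

For example, parseStatusLine('HTTP/1.1 301 Moved Permanently') gives the category 'redirection', and a 404 line gives 'client error'.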
Now, after the first line of the response headers, we see some standard HTTP
response headers. These headers basically tell the browser about the response and
how to handle it. They are like the configuration settings sent by a web server to be
stored in the browser for later usage.
Among these settings, you may choose to study some in detail. These
include Content-Security-Policy, Referrer-Policy, Access-Control-Allow-Origin,
X-Powered-By, etc. We will not be covering these in our topic, but you can
read more about them online.
Sessions and Cookies
• A session starts when one visits a website and ends when they leave the
website.
• Cookies are a client side piece of information and are stored in your browser.
Cookies are used to give you a personalized experience.
Cookies are used to identify you as a user and authenticate you automatically
at each step, so you don't have to enter your password for every action you
take.
Every cookie has an expiry time.
A cookie becomes invalid and gets deleted upon expiry.
A cookie is typically around 30-100 characters long.
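To make the cookie and its expiry time more concrete, here is a minimal JavaScript sketch that splits the value of a Set-Cookie response header into the cookie's name, its value and its attributes (such as Max-Age or Expires). The function name parseSetCookie and the sample header in the note below are assumptions for illustration; real browsers apply more rules than this.

```javascript
// Hypothetical helper: split a Set-Cookie header value into its parts.
// Attribute names are lower-cased; flag attributes (no "=") are stored as true.
function parseSetCookie(headerValue) {
  const parts = headerValue
    .split(';')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  if (parts.length === 0) {
    return null;
  }
  const first = parts[0];
  const eq = first.indexOf('=');
  if (eq === -1) {
    return null; // no name=value pair found
  }
  const cookie = {
    name: first.slice(0, eq),
    value: first.slice(eq + 1),
    attributes: {},
  };
  for (const attr of parts.slice(1)) {
    const i = attr.indexOf('=');
    const key = (i === -1 ? attr : attr.slice(0, i)).toLowerCase();
    cookie.attributes[key] = i === -1 ? true : attr.slice(i + 1);
  }
  return cookie;
}
```

Calling it on a made-up header such as 'sessionid=abc123; Max-Age=3600; HttpOnly' returns the name 'sessionid', the value 'abc123', and the attributes max-age and httponly.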
Here is a list of some commonly used event listeners.
1. onclick and ondblclick: In this case the event handler listens for a "click" (or "double click") event. If the user
clicks on the button, this event listener will get triggered and will show an alert.
Eg:
<html>
<body>
<button name="test_button" onclick="alert('clicked!')" ondblclick="alert('double clicked!')">Click me!</button>
</body>
</html>
2. Iframe onload: Another very common event listener is 'onload', which simply gets triggered when some
element (image, body, iframe, video, etc.) has finished loading.
Eg:
<html>
<body>
<iframe src="https://ipchicken.com" onload="alert('lo')"></iframe>
</body>
</html>
3. Image onerror: Here, the src attribute of the img tag points to the file at the given URL. If the URL
raises an error and cannot be accessed, we can have an onerror event listener display an appropriate message to the user.
Eg:
<html>
<body>
<p>
Example of onerror event:
</p>
<img src="x" onerror="alert('No image found');">
</body>
</html>
4. Using getElementById method: Here, we ask the user to input her name using an <input> tag. Then we access this input
using the getElementById method and prepend "Hi " to it. Then we display it in an alert box. This alert is displayed when the
user clicks on the button after giving the input.
Eg:
<html>
<body>
<input type="text" placeholder="Enter your name" id="textfield1">
<button onclick="alert('Hi '+document.getElementById('textfield1').value)">Click Me</button>
</body>
</html>
SIP
• Event listeners are simple pieces of code that wait for an event to
happen on a specific element and, when it does, carry out the code written
for it. For every DOM event, you can set up an event listener like onClick,
onMouseOver, onDblClick, onKeypress, etc. For example, to set up an
onClick listener on a button we can write this code: <button
id="button1" onClick="alert('hello')"> Click me </button>.
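The same idea, code that waits for an event and runs only when it fires, can be sketched outside the browser. The example below is only an illustration: it uses a plain EventTarget object (available as a global in browsers and in Node.js 15 or later) as a stand-in for a button, and the function name simulateClicks is made up.

```javascript
// Hypothetical sketch: a plain EventTarget stands in for a DOM button.
function simulateClicks() {
  const button = new EventTarget();
  const log = [];

  // Register a listener that runs only when a 'click' event fires.
  button.addEventListener('click', () => {
    log.push('hello');
  });

  button.dispatchEvent(new Event('click'));    // listener runs
  button.dispatchEvent(new Event('dblclick')); // no listener registered, nothing happens
  return log;
}
```

Calling simulateClicks() returns a log with a single 'hello' entry, because only the click event had a listener attached.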
XSS
• Temporary XSS - Vulnerabilities that allow hackers to insert malicious code
into the HTML of a page shown in the browser are called temporary XSS or
reflected XSS. This attack is called temporary as the injected code is not
stored within the application; rather, it affects only those users who open
the crafted links.
• Permanent XSS - Vulnerabilities that allow hackers to inject malicious client
side scripts that get permanently stored on the server, and then execute in
the browser, are called permanent XSS or stored XSS.
• HTML injection - When a hacker is not able to execute JavaScript using XSS but
is still able to cause potential harm using HTML, the vulnerability is called
HTML injection. It occurs due to improper output validation, as the website
attaches the user input to its own HTML code without proper sanitization.
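As a rough illustration of the output sanitization mentioned above, the JavaScript sketch below escapes the characters that have a special meaning in HTML before user input is attached to the page. The function names escapeHtml and renderGreeting are hypothetical, and this covers only HTML text and quoted attribute values, not input placed inside scripts or URLs.

```javascript
// Hypothetical helper: replace HTML special characters with entities so
// the browser displays user input as text instead of parsing it as markup.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, '&amp;') // must come first to avoid double escaping
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Attach the escaped input to the page's own HTML.
function renderGreeting(name) {
  return '<p>Hi ' + escapeHtml(name) + '</p>';
}
```

With this in place, an input like <b>x</b> is shown to the user as literal text rather than being rendered as bold markup.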
SIP
• Ctrl + U, or right clicking on the web page and selecting View Page
Source, gives us the HTML code of the web page.
• Ctrl + F is used for searching.
• A Proof of Concept (PoC) is used to confirm that the web page is
vulnerable to JavaScript injection.