Activation Functions
Last Updated: 22 Oct, 2024

In simple terms, an artificial neuron calculates the weighted sum of its inputs and adds a bias. The result is called the net input, as shown in the figure below.

[Figure: artificial neuron computing the net input]

Mathematically,

\text{Net Input} = \sum (\text{Weight} \times \text{Input}) + \text{Bias}

The net input can take any value from -∞ to +∞. On its own, the neuron has no way to bound this value and therefore cannot decide whether it should fire. This is why the activation function is an important part of an artificial neural network: it decides whether, and how strongly, a neuron is activated, and many activation functions (such as the step and sigmoid functions) also bound the output to a fixed range. The activation function is a non-linear transformation applied to the net input before the result is passed to the next layer of neurons or returned as the final output.

Types of Activation Functions

Several types of activation functions are used in deep learning. Some of them are explained below.

Step Function: The step function is one of the simplest activation functions. We choose a threshold value; if the net input x is greater than or equal to the threshold, the neuron is activated and outputs 1, otherwise it outputs 0. With a threshold of 0, mathematically,

f(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases}

[Figure: graph of the step function]

Sigmoid Function: The sigmoid function is a widely used activation function. It is defined as:

\sigma(x) = \frac{1}{1 + e^{-x}}

[Figure: graph of the sigmoid function]

It is a smooth, continuously differentiable function. Its biggest advantage over a linear function is that it is non-linear, and unlike the step function it has a non-zero derivative everywhere, which makes it usable with gradient-based training. Because of this non-linearity, a network of neurons that use the sigmoid as their activation function can produce outputs that are non-linear functions of its inputs. Its output lies between 0 and 1, and its graph is S-shaped.

ReLU: ReLU stands for Rectified Linear Unit.
It is one of the most widely used activation functions. It is defined as:

f(x) = \max(0, x)

[Figure: graph of the ReLU function]

The main advantage of ReLU over other activation functions is that it does not activate all the neurons at the same time. What does this mean? If the input to ReLU is negative, the output is zero, so that neuron is not activated. At any moment, only the neurons with a positive net input are active.

Leaky ReLU: The Leaky ReLU function is an improved version of ReLU. Instead of defining the function as 0 for x < 0, we define it as a small linear component of x, so negative inputs still produce a small non-zero output. It can be defined as:

f(x) = \begin{cases} ax, & x < 0 \\ x, & \text{otherwise} \end{cases}

where a is a small positive constant (for example, 0.01).

[Figure: graph of the Leaky ReLU function]

Practice Questions - Activation Functions

Q 1. Formula: \sigma(x) = \frac{1}{1 + e^{-x}}
Compute the output of the sigmoid activation function for the input values -1, 0, and 1.

Q 2. Formula: \text{ReLU}(x) = \max(0, x)
Calculate the output of the ReLU activation function for the input values -3, 0, and 3.

Q 3. Formula: \text{Leaky ReLU}(x) = \max(0.01x, x)
Evaluate the Leaky ReLU activation function for the input value 0.5 with a negative slope coefficient of 0.01.

Q 4. Formula: \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
For the input vector [1, 2, 3], compute the output of the softmax activation function.

Q 5. Formula: \text{Swish}(x) = x \cdot \sigma(x)
Compute the output of the Swish activation function for an input value of 2.

Author: Vineet Joshi
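The neuron and activation functions described in this article can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names (`net_input`, `step`, `sigmoid`, `relu`, `leaky_relu`), the default Leaky ReLU slope a = 0.01, and the example weights, inputs and bias are all assumptions chosen for the demo.

```python
import math

def net_input(weights, inputs, bias):
    """Net input: sum of (weight * input) plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step(x, threshold=0.0):
    """Binary step: 1 if x >= threshold, otherwise 0."""
    return 1 if x >= threshold else 0

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x); output lies between 0 and 1."""
    # Branch on the sign so math.exp never gets a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def relu(x):
    """max(0, x): negative inputs become 0."""
    return max(0.0, x)

def leaky_relu(x, a=0.01):
    """a * x for x < 0, x otherwise (a = 0.01 is an assumed default)."""
    return a * x if x < 0 else x

# Hypothetical neuron with three inputs (values chosen for illustration only).
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 0.5]
bias = 0.25

z = net_input(weights, inputs, bias)  # 0.5 - 2.0 + 1.0 + 0.25 = -0.25
for name, f in [("step", step), ("sigmoid", sigmoid),
                ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(f"{name:>10}: {f(z):.4f}")
# step 0, sigmoid ≈ 0.4378, relu 0, leaky_relu -0.0025
```

Because the net input here is negative, the step and ReLU neurons stay inactive, the sigmoid gives a value just below 0.5, and Leaky ReLU passes a small negative value through. The sigmoid branches on the sign of x only to avoid an `OverflowError` from `math.exp` for very negative inputs; both branches compute the same function.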
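The practice questions can be checked with a short Python sketch. The helper names are hypothetical; the softmax subtracts the largest input before exponentiating, a common numerical-stability trick that does not change the result because the common factor cancels between numerator and denominator.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """e^{x_i} / sum_j e^{x_j}, computed after shifting by max(xs)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def swish(x):
    """Swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

q1 = [sigmoid(x) for x in (-1, 0, 1)]  # ≈ [0.2689, 0.5, 0.7311]
q2 = [max(0, x) for x in (-3, 0, 3)]   # [0, 0, 3]
q3 = max(0.01 * 0.5, 0.5)              # 0.5: a positive input passes through unchanged
q4 = softmax([1, 2, 3])                # ≈ [0.0900, 0.2447, 0.6652]
q5 = swish(2)                          # ≈ 1.7616

for label, answer in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4), ("Q5", q5)]:
    print(label, answer)
```

Note that the softmax outputs sum to 1, so they can be read as probabilities over the three inputs.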