The sigmoid function is defined as σ(x) = 1 / (1 + e^(−x)); it maps any input x to a value between 0 and 1.
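A minimal NumPy sketch of that definition (the sample inputs are arbitrary):

```python
# Sigmoid: squashes any real input into the range (0, 1).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ≈ [0.0067 0.5 0.9933]
```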
Activation functions are a crucial component of neural networks, responsible for introducing non-linearity into the model. Without non-linear activation functions, a neural network would essentially behave like a linear model, regardless of the number of layers, limiting its capacity to solve complex problems. The "best" activation function is often found by quick experimentation on your specific dataset.
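To see why the non-linearity matters, here is a small NumPy sketch (with randomly generated weights, purely for illustration) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

```python
# Two linear layers without an activation are equivalent to one linear layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # example input
W1 = rng.normal(size=(3, 4))  # weights of "layer 1"
W2 = rng.normal(size=(2, 3))  # weights of "layer 2"

stacked = W2 @ (W1 @ x)       # two layers, no activation
collapsed = (W2 @ W1) @ x     # one equivalent linear layer

print(np.allclose(stacked, collapsed))  # True
```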
Standard rules of thumb (what practitioners actually do)
🔹 Hidden layers (most important choice)
| Activation | When used | Why |
|---|---|---|
| ReLU | Default choice | Simple, fast, no vanishing gradient |
| Leaky ReLU / GELU | Deeper or transformer models | Fix “dead ReLU” problem |
| tanh | Small networks | Zero-centered but vanishing gradients |
| sigmoid | Rare today | Severe vanishing gradient |
👉 Rule:
If unsure → start with ReLU (or GELU)
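For reference, a small NumPy sketch of the hidden-layer activations from the table above (GELU is shown with its common tanh approximation; the inputs are arbitrary):

```python
# Common hidden-layer activations (NumPy sketch).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps gradients alive for x < 0 (the "dead ReLU" fix).
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # Tanh approximation of GELU, widely used in transformer implementations.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), gelu(x), sep="\n")
```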
🔹 Output layer (data-dependent)
This is where the data itself drives the choice.
| Problem type | Output activation | Reason |
|---|---|---|
| Binary classification | Sigmoid | Outputs probability (0–1) |
| Multi-class (one label) | Softmax | Class probabilities sum to 1 |
| Regression (unbounded) | Linear | No restriction |
| Regression (0–1) | Sigmoid | Bounded output |
| Regression (−1 to 1) | tanh | Symmetric range |
This is the only place where data range strongly drives activation choice.
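As an illustration, here is a Keras-style sketch of how only the output layer changes with the problem type (assumes TensorFlow/Keras is installed; layer sizes and input shape are placeholders):

```python
# Only the output activation changes with the problem type (sizes are placeholders).
from tensorflow import keras
from tensorflow.keras import layers

def build_model(output_units, output_activation):
    return keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(64, activation="relu"),          # hidden layer: ReLU as the default
        layers.Dense(output_units, activation=output_activation),
    ])

binary_clf = build_model(1, "sigmoid")   # binary classification: probability in (0, 1)
multi_clf  = build_model(5, "softmax")   # multi-class: probabilities sum to 1
regressor  = build_model(1, "linear")    # unbounded regression: no restriction
```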
A quick simulation
Example of Sigmoid Curve Plotting
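The original figure is not reproduced here, so below is a minimal matplotlib sketch of a sigmoid-shaped "skill vs. practice hours" curve; the midpoint (≈45 hours) and steepness are assumed values chosen to match the interpretation that follows.

```python
# Sigmoid-shaped learning curve: skill level vs. practice hours (parameters are illustrative).
import numpy as np
import matplotlib.pyplot as plt

hours = np.linspace(0, 100, 200)
midpoint, steepness = 45.0, 0.12   # assumed: steepest growth around 45 hours
skill = 1.0 / (1.0 + np.exp(-steepness * (hours - midpoint)))

plt.plot(hours, skill)
plt.xlabel("Practice hours")
plt.ylabel("Skill level (0–1)")
plt.title("Sigmoid learning curve")
plt.show()
```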
How to interpret the graph
- Early stage (0–20 hrs): slow gains while fundamentals are forming
- Middle stage (30–60 hrs): rapid skill growth (steep slope)
- Later stage (70+ hrs): plateau as learning saturates