The sigmoid function is defined as σ(x) = 1 / (1 + e^(−x)); it maps any input x to a value between 0 and 1.
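A minimal NumPy sketch of that definition (the sample inputs are arbitrary):

```python
# Sigmoid: squashes any real input into the range (0, 1).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ≈ [0.0067 0.5 0.9933]
```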
Activation functions are a crucial component of neural networks, responsible for introducing non-linearity into the model. Without non-linear activation functions, a neural network would essentially behave like a linear model, regardless of the number of layers, limiting its capacity to solve complex problems. The "best" activation function is often found by quick experimentation on your specific dataset.
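To see why the non-linearity matters, here is a small NumPy sketch (with randomly generated weights, purely for illustration) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

```python
# Two linear layers without an activation are equivalent to one linear layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # example input
W1 = rng.normal(size=(3, 4))  # weights of "layer 1"
W2 = rng.normal(size=(2, 3))  # weights of "layer 2"

stacked = W2 @ (W1 @ x)       # two layers, no activation
collapsed = (W2 @ W1) @ x     # one equivalent linear layer

print(np.allclose(stacked, collapsed))  # True
```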
Standard rules of thumb (what practitioners actually do)
🔹 Hidden layers (most important choice)
| Activation | When used | Why |
|---|---|---|
| ReLU | Default choice | Simple, fast, no vanishing gradient |
| Leaky ReLU / GELU | Deeper or transformer models | Fix “dead ReLU” problem |
| tanh | Small networks | Zero-centered but vanishing gradients |
| sigmoid | Rare today | Severe vanishing gradient |
👉 Rule:
If unsure → start with ReLU (or GELU)
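For reference, a small NumPy sketch of the hidden-layer activations from the table above (GELU is shown with its common tanh approximation; the inputs are arbitrary):

```python
# Common hidden-layer activations (NumPy sketch).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps gradients alive for x < 0 (the "dead ReLU" fix).
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # Tanh approximation of GELU, widely used in transformer implementations.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), gelu(x), sep="\n")
```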
🔹 Output layer (data-dependent)
This is where the data itself drives the choice.
| Problem type | Output activation | Reason |
|---|---|---|
| Binary classification | Sigmoid | Outputs probability (0–1) |
| Multi-class (one label) | Softmax | Class probabilities sum to 1 |
| Regression (unbounded) | Linear | No restriction |
| Regression (0–1) | Sigmoid | Bounded output |
| Regression (−1 to 1) | tanh | Symmetric range |
This is the only place where data range strongly drives activation choice.
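As an illustration, here is a Keras-style sketch of how only the output layer changes with the problem type (assumes TensorFlow/Keras is installed; layer sizes and input shape are placeholders):

```python
# Only the output activation changes with the problem type (sizes are placeholders).
from tensorflow import keras
from tensorflow.keras import layers

def build_model(output_units, output_activation):
    return keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(64, activation="relu"),          # hidden layer: ReLU as the default
        layers.Dense(output_units, activation=output_activation),
    ])

binary_clf = build_model(1, "sigmoid")   # binary classification: probability in (0, 1)
multi_clf  = build_model(5, "softmax")   # multi-class: probabilities sum to 1
regressor  = build_model(1, "linear")    # unbounded regression: no restriction
```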
A quick simulation
Example of Sigmoid Curve Plotting
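The original figure is not reproduced here, so below is a minimal matplotlib sketch of a sigmoid-shaped "skill vs. practice hours" curve; the midpoint (≈45 hours) and steepness are assumed values chosen to match the interpretation that follows.

```python
# Sigmoid-shaped learning curve: skill level vs. practice hours (parameters are illustrative).
import numpy as np
import matplotlib.pyplot as plt

hours = np.linspace(0, 100, 200)
midpoint, steepness = 45.0, 0.12   # assumed: steepest growth around 45 hours
skill = 1.0 / (1.0 + np.exp(-steepness * (hours - midpoint)))

plt.plot(hours, skill)
plt.xlabel("Practice hours")
plt.ylabel("Skill level (0–1)")
plt.title("Sigmoid learning curve")
plt.show()
```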
How to interpret the graph
- Early stage (0–20 hrs): slow gains while fundamentals are forming
- Middle stage (30–60 hrs): rapid skill growth (steep slope)
- Later stage (70+ hrs): plateau as learning saturates