At its core, a neural network is a computational model inspired by the structure and function of the human brain. It's built from interconnected units called neurons.

1. The Artificial Neuron

An artificial neuron (or perceptron) is a simple mathematical function. It takes one or more inputs, processes them, and produces an output. Here's what happens inside:

  1. Inputs and Weights: Each input (x_i) coming into the neuron is multiplied by a corresponding weight (w_i). A weight represents the strength or importance of that input. A higher weight means the input has more influence.
  2. Summation and Bias: The neuron sums all the weighted inputs to get a single value, z=∑(w_ix_i). It then adds a bias (b) to this sum. The bias is an extra parameter that allows the neuron to shift its output up or down, making it more flexible. The full calculation is z=(∑w_ix_i)+b.
  3. Activation Function: The result, z, is then passed through an activation function, f(z), to produce the neuron's final output.
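The three steps above can be traced by hand with a couple of lines of plain Python. The numbers here are made up for illustration, and the activation is left as the identity just to isolate steps 1 and 2:

```python
# A worked example of the three steps above, with made-up numbers.
inputs = [2.0, 3.0]        # x_1, x_2
weights = [0.5, -0.25]     # w_1, w_2
bias = 0.1                 # b

# Steps 1-2: multiply each input by its weight, sum, then add the bias
z = sum(w * x for w, x in zip(weights, inputs)) + bias
# z = (0.5*2.0) + (-0.25*3.0) + 0.1 = 1.0 - 0.75 + 0.1 = 0.35

# Step 3: identity "activation" as a placeholder; a real network
# would apply a non-linear function here (see the next section)
output = z
```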

2. The Role of Activation Functions

The activation function is the most critical part of the neuron. Its purpose is to introduce non-linearity into the network.

Why is this so important? If we only used linear operations (like weighted sums), a stack of many layers would still just be a single, large linear function. It would be no more powerful than a simple linear regression model. By adding non-linearity, neural networks can learn incredibly complex, curved relationships and boundaries in the data, which is necessary for tasks like image recognition or natural language processing.
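This collapse of stacked linear layers into one can be verified numerically. The sketch below uses NumPy with arbitrary random weight matrices (the shapes are chosen only for illustration): two bias-and-weights "layers" applied in sequence, with no activation in between, produce exactly the same output as a single combined linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function: just weight matrices and biases
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Passing x through both layers...
deep_output = W2 @ (W1 @ x + b1) + b2

# ...matches a single linear layer with combined weights and bias,
# because W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
shallow_output = W_combined @ x + b_combined

print(np.allclose(deep_output, shallow_output))  # True
```

Inserting any non-linear activation between the two layers breaks this factorization, which is exactly why depth then adds expressive power.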

3. Common Activation Functions

  • Sigmoid: This function squashes any input value into a range between 0 and 1. Its formula is σ(z) = 1 / (1 + e^(−z)). It was historically popular but is less used in hidden layers today due to the "vanishing gradient" problem, which can slow down training. It's still useful in the output layer for binary classification problems.
  • Tanh (Hyperbolic Tangent): Similar to sigmoid but squashes values to a range between -1 and 1. Its formula is tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z)). It's zero-centered, which can help with training, but it also suffers from the vanishing gradient problem.
  • ReLU (Rectified Linear Unit): This is the most popular activation function for hidden layers. It's incredibly simple: if the input is positive, it returns the input; otherwise, it returns zero. The formula is f(z)=max(0,z).
    • Pros: It's computationally very efficient and helps mitigate the vanishing gradient problem.
    • Cons: It can suffer from the "dying ReLU" problem, where neurons can get stuck in a state where they always output zero.
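All three functions are one-liners in NumPy. This sketch implements them side by side so their ranges can be compared on the same sample inputs (the sample values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """Squashes z into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

def tanh(z):
    """Squashes z into the range (-1, 1); zero-centered."""
    return np.tanh(z)

def relu(z):
    """Returns z if positive, 0 otherwise."""
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # all values strictly between 0 and 1
print(tanh(z))     # all values strictly between -1 and 1
print(relu(z))     # negative inputs clipped to 0
```

Note how ReLU leaves positive inputs unchanged while sigmoid and tanh compress large values toward their bounds, which is the source of their vanishing gradients.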

Here's a simple Python implementation of a neuron using the ReLU activation function:

import numpy as np

def relu(z):
  """Rectified Linear Unit (ReLU) activation function."""
  return np.maximum(0, z)

def simple_neuron(inputs, weights, bias):
  """A single neuron that calculates its output."""
  # Ensure inputs and weights are numpy arrays for dot product
  inputs = np.array(inputs)
  weights = np.array(weights)

  # 1. Calculate the weighted sum and add bias
  z = np.dot(inputs, weights) + bias

  # 2. Pass the result through the activation function
  output = relu(z)

  return output

# Example usage
inputs = [1.0, 2.5, -0.5]
weights = [0.8, -0.2, 0.4]
bias = 0.5

neuron_output = simple_neuron(inputs, weights, bias)
print(f"The neuron's output is: {neuron_output}")
# Calculation: (1.0*0.8 + 2.5*(-0.2) + (-0.5)*0.4) + 0.5 = (0.8 - 0.5 - 0.2) + 0.5 = 0.1 + 0.5 = 0.6
# ReLU(0.6) = 0.6, so the printed value is 0.6 (up to floating-point rounding,
# e.g. 0.6000000000000001)