

Randomized Optimization

Which RO algorithm works best depends heavily on the structure of the problem. In this notebook, we compare how two commonly used RO algorithms perform on two problems with different structures.

Note: GAs and SA are both commonly used, but there is no single state-of-the-art algorithm for all optimization problems. Good RO outcomes require understanding the structure of the problem and each algorithm's strengths and weaknesses; alternatively, the structure of the problem can be inferred from how the algorithms perform. :D

Randomized Optimization Algorithms on Toy Problems

This notebook explores the performance of different randomized optimization (RO) algorithms on two classic toy problems: the Traveling Salesman Problem (TSP) and the Knapsack Problem.


Problem Overview

TSP is a classic combinatorial optimization problem where the goal is to find the shortest possible route that visits each city exactly once and returns to the origin city.

The Knapsack Problem involves selecting a subset of items with given weights and values to maximize total value without exceeding a weight limit.

Problem  | Type                | Input                         | Output                      | Challenge
TSP      | Permutation problem | Distance matrix, cities       | Best route (permutation)    | Finding optimal cyclic permutation
Knapsack | Subset selection    | Weights, values, max capacity | Selected items (0/1 vector) | Best value under weight constraint
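
As a quick illustration of the two output encodings (a standalone sketch, not part of the experiments below): a TSP candidate solution is a permutation of city indices, while a Knapsack candidate is a 0/1 inclusion vector.

import numpy as np

route = np.random.permutation(5)             # TSP candidate: a permutation such as [3 0 4 1 2]
selection = np.random.randint(0, 2, size=5)  # Knapsack candidate: a 0/1 vector such as [1 0 1 1 0]
print(route, selection)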

RO Algorithm Strengths and Weaknesses

Algorithm           | Strengths                                 | Weaknesses
Genetic Algorithm   | Good for large search spaces & multimodal | Slower, parameter tuning needed
Simulated Annealing | Simple, good global exploration           | May be slow to converge

Genetic Algorithm (GA)

Algorithm:

  • Evolve a population of solutions using selection, crossover (recombination), and mutation.
  • Make fitter individuals more likely to survive to the next generation.

What it does:

  • Searches large, complex, or multimodal spaces by mimicking natural evolution.
  • Maintains diversity and explores a wide variety of solutions.

Strengths:

  • Handles large, noisy, nonlinear, or discrete search spaces well.
  • Good at escaping local optima due to population-based search.
  • Flexible: works with both permutation (e.g. TSP) and binary/vector spaces (e.g. Knapsack).

Weaknesses:

  • Slower due to population overhead.
  • Requires careful tuning (population size, mutation rate, crossover type).
  • Can converge prematurely if diversity is lost.

Good for:

  • TSP, Knapsack, scheduling, neural architecture search, multimodal fitness landscapes.

Less good for:

  • Small search spaces, problems with fast greedy solutions, when runtime is limited.

Simulated Annealing (SA)

Algorithm:

  • Start with an initial solution.
  • At each step, randomly perturb the solution.
  • Accept worse solutions with a decreasing probability (based on temperature).
  • Gradually "cool down" the system to settle into a minimum.

What it does:

  • Emulates annealing in metallurgy — controlled cooling to minimize energy.
  • Allows occasional uphill moves to escape local optima early in the process.

Strengths:

  • Simple and memory-efficient.
  • Effective at escaping local minima in early stages.
  • No need to maintain a population.

Weaknesses:

  • Sensitive to cooling schedule.
  • May converge slowly or poorly if not tuned.
  • No memory of past states, so can revisit bad solutions.

Good for:

  • Permutation-based problems like TSP.
  • Continuous or combinatorial optimization with many local minima.

Less good for:

  • Highly rugged or deceptive search spaces.
  • Problems with tight time/accuracy constraints.

Expectations

How do we expect these RO algorithms to perform on TSP and Knapsack?

Aspect                | TSP (e.g. 10-50 cities)                    | Knapsack (e.g. 20-50 items)
Genetic Algorithm     | Learns good tours with crossover/mutation  | Good balance between value/weight
Simulated Annealing   | Can explore far and settle gradually       | Can get stuck if cooling is too fast
Runtime (small scale) | GA > SA (GA expected to be slower)         | GA > SA (GA expected to be slower)

Setup Code

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
In [2]:
import numpy as np
import matplotlib.pyplot as plt
import random
import time
import tracemalloc
In [ ]:
# --------------------------
# TSP Problem Setup
# --------------------------
def euclidean_distance(p1, p2):
    return np.linalg.norm(np.array(p1) - np.array(p2))


def total_tsp_distance(route, coords):
    return sum(
        euclidean_distance(coords[route[i]], coords[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def generate_tsp_coordinates(n_cities, seed=42):
    np.random.seed(seed)
    return np.random.rand(n_cities, 2) * 100


# --------------------------
# Knapsack Problem Setup
# --------------------------
def generate_knapsack_items(
    n_items=50, weight_range=(1, 30), value_range=(10, 100), capacity_ratio=0.4, seed=42
):
    np.random.seed(seed)
    weights = np.random.randint(*weight_range, size=n_items)
    values = np.random.randint(*value_range, size=n_items)
    capacity = int(capacity_ratio * np.sum(weights))
    return weights, values, capacity


def knapsack_value(selection, weights, values, capacity):
    total_weight = np.dot(selection, weights)
    if total_weight > capacity:
        return 0  # Penalty for infeasibility
    return np.dot(selection, values)
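
As a quick sanity check of the helpers above (a sketch only; the measured experiments use the cells further down, and the *_demo names exist only here), they can be exercised on tiny instances like this:

# Sketch: evaluate a random route and a random item selection with the setup functions above.
coords_demo = generate_tsp_coordinates(n_cities=5)
route_demo = np.random.permutation(len(coords_demo))
print("Random 5-city route length:", total_tsp_distance(route_demo, coords_demo))

weights_demo, values_demo, capacity_demo = generate_knapsack_items(n_items=10)
selection_demo = np.random.randint(0, 2, size=10)
print("Random selection value:", knapsack_value(selection_demo, weights_demo, values_demo, capacity_demo))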

Genetic Algorithm for TSP

Note: a GA requires you to define an initial population to evolve (a set of candidate solutions; here, random permutations), a fitness function (e.g., total route distance), a selection strategy, a crossover function (e.g., ordered crossover for TSP), a mutation function (e.g., a random swap of two cities), and so on.

In [4]:
def ga_tsp(coords, n_generations=300, pop_size=100, mutation_rate=0.01):
    def init_population():
        # initialize population with random routes (permutations of city indices)
        return [np.random.permutation(len(coords)) for _ in range(pop_size)]

    def crossover(p1, p2):
        start, end = sorted(random.sample(range(len(p1)), 2))
        child = [-1] * len(p1)
        child[start:end] = p1[start:end]
        pointer = 0
        for gene in p2:
            if gene not in child:
                while child[pointer] != -1:
                    pointer += 1
                child[pointer] = gene
        return np.array(child)

    def mutate(route):
        if random.random() < mutation_rate:
            a, b = random.sample(range(len(route)), 2)
            route[a], route[b] = route[b], route[a]

    population = init_population()  # initial random population...routes
    best_route = None
    best_distance = float("inf")  # init as large as possible
    history = []  # to track best distance over generations

    for _ in range(n_generations):
        # sort population by fitness (distance)
        population = sorted(population, key=lambda r: total_tsp_distance(r, coords))
        # update best route if current best in population is better
        if total_tsp_distance(population[0], coords) < best_distance:
            best_route = population[0]
            best_distance = total_tsp_distance(best_route, coords)
        # track best distance over generations
        history.append(best_distance)

        # start next generation with top 10 routes (elitism)
        next_generation = population[:10]  # Elitism
        # fill in the rest of the population with crossover and mutation
        while len(next_generation) < pop_size:
            # randomly select 2 parents from top 50 (selection pressure)
            parents = random.choices(population[:50], k=2)
            # create a child via crossover
            child = crossover(parents[0], parents[1])
            # possibly mutate the child
            mutate(child)
            # add the child to the next generation
            next_generation.append(child)
        # replace old population with the new one
        population = next_generation

    return best_route, best_distance, history
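
To make the order-preserving crossover inside ga_tsp concrete, here is a tiny standalone walk-through of the same fill-in logic on two 6-city parents (the copied slice is hard-coded here; ga_tsp picks it at random):

# Sketch of the crossover used above: copy a slice from parent 1,
# then fill the remaining slots with parent 2's cities in their original order.
p1 = [0, 1, 2, 3, 4, 5]
p2 = [5, 4, 3, 2, 1, 0]
start, end = 2, 4
child = [-1] * 6
child[start:end] = p1[start:end]   # child = [-1, -1, 2, 3, -1, -1]
pointer = 0
for gene in p2:
    if gene not in child:
        while child[pointer] != -1:
            pointer += 1
        child[pointer] = gene
print(child)                       # [5, 4, 2, 3, 1, 0]: each city appears exactly once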

Simulated Annealing for TSP

In [ ]:
# --------------------------
# Simulated Annealing for TSP
# --------------------------
def sa_tsp(coords, T_init=1000, alpha=0.995, n_iter=10000):
    # swap 2 cities in a route
    def swap(route):
        a, b = random.sample(range(len(route)), 2)
        route[a], route[b] = route[b], route[a]
        return route

    # init: start with a random route and call that the "best"
    current = np.random.permutation(len(coords))
    best = current.copy()
    current_distance = total_tsp_distance(current, coords)
    best_distance = current_distance

    # set initial temperature and history (best distances over time)
    T = T_init
    history = [current_distance]

    for _ in range(n_iter):
        candidate = swap(current.copy())  # generate a neighbor by swapping two cities
        candidate_distance = total_tsp_distance(
            candidate, coords
        )  # eval neighbors distance
        delta = (
            candidate_distance - current_distance
        )  # how much better/worse is the candidate?
        if delta < 0 or np.random.rand() < np.exp(
            -delta / T
        ):  # accept if better or with some probability
            current = candidate
            current_distance = candidate_distance
            if candidate_distance < best_distance:
                best = candidate
                best_distance = candidate_distance
        history.append(
            best_distance
        )  # record best and gradually cool the system (low acceptance probability of worse solutions)
        T *= alpha

    return best, best_distance, history
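
A useful check on a geometric cooling schedule is how cold it ends up. With the defaults above (T_init=1000, alpha=0.995, n_iter=10000) the final temperature is effectively zero, so the last stretch of the run behaves like plain hill climbing. A short sketch:

# Sketch: final temperature of the geometric schedule T <- alpha * T after n_iter steps.
T_init, alpha, n_iter = 1000, 0.995, 10000
print(f"Final temperature: {T_init * alpha ** n_iter:.2e}")  # roughly 1.7e-19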

Genetic Algorithm for Knapsack

In [ ]:
# --------------------------
# Genetic Algorithm
# --------------------------
def ga_knapsack(
    weights, values, capacity, n_generations=200, pop_size=100, mutation_rate=0.05
):
    n_items = len(weights)

    def init_population():
        return [np.random.randint(0, 2, size=n_items) for _ in range(pop_size)]

    def mutate(solution):
        if random.random() < mutation_rate:
            idx = random.randint(0, n_items - 1)
            solution[idx] = 1 - solution[idx]

    def crossover(p1, p2):
        point = random.randint(1, n_items - 1)
        return np.concatenate((p1[:point], p2[point:]))

    population = init_population()
    best_solution = None
    best_value = 0
    history = []

    for _ in range(n_generations):
        population = sorted(
            population,
            key=lambda x: knapsack_value(x, weights, values, capacity),
            reverse=True,
        )
        if knapsack_value(population[0], weights, values, capacity) > best_value:
            best_solution = population[0]
            best_value = knapsack_value(best_solution, weights, values, capacity)
        history.append(best_value)

        next_generation = population[:10]  # Elitism
        while len(next_generation) < pop_size:
            parents = random.choices(population[:50], k=2)
            child = crossover(parents[0], parents[1])
            mutate(child)
            next_generation.append(child)
        population = next_generation

    return best_solution, best_value, history

Simulated Annealing for Knapsack

In [14]:
# --------------------------
# Simulated Annealing
# --------------------------
def sa_knapsack(weights, values, capacity, T_init=100, alpha=0.98, n_iter=3000):
    n_items = len(weights)

    def flip_bit(solution):
        idx = random.randint(0, n_items - 1)
        solution[idx] = 1 - solution[idx]
        return solution

    current = np.random.randint(0, 2, size=n_items)
    best = current.copy()
    current_value = knapsack_value(current, weights, values, capacity)
    best_value = current_value
    T = T_init
    history = [current_value]

    for _ in range(n_iter):
        candidate = flip_bit(current.copy())
        candidate_value = knapsack_value(candidate, weights, values, capacity)
        delta = candidate_value - current_value
        if delta > 0 or np.random.rand() < np.exp(delta / T):
            current = candidate
            current_value = candidate_value
            if candidate_value > best_value:
                best = candidate
                best_value = candidate_value
        history.append(best_value)
        T *= alpha

    return best, best_value, history

Run Experiments

Measure runtime, memory and best score for each algorithm on both problems.

TSP

In [6]:
# generate coordinates for TSP
coords = generate_tsp_coordinates(n_cities=25)
print("Coordinates for TSP:", coords)
Coordinates for TSP: [[37.45401188 95.07143064]
 [73.19939418 59.86584842]
 [15.60186404 15.59945203]
 [ 5.80836122 86.61761458]
 [60.11150117 70.80725778]
 [ 2.05844943 96.99098522]
 [83.24426408 21.23391107]
 [18.18249672 18.34045099]
 [30.4242243  52.47564316]
 [43.19450186 29.12291402]
 [61.18528947 13.94938607]
 [29.21446485 36.63618433]
 [45.60699842 78.51759614]
 [19.96737822 51.42344384]
 [59.24145689  4.64504127]
 [60.75448519 17.05241237]
 [ 6.5051593  94.88855373]
 [96.56320331 80.83973481]
 [30.46137692  9.7672114 ]
 [68.42330265 44.01524937]
 [12.20382348 49.51769101]
 [ 3.43885211 90.93204021]
 [25.87799816 66.25222844]
 [31.17110761 52.00680212]
 [54.67102793 18.48544555]]
In [7]:
# --- Measure GA performance ---
tracemalloc.start()
start_time = time.perf_counter()
ga_route, ga_dist, ga_history = ga_tsp(coords)
end_time = time.perf_counter()
ga_time = end_time - start_time
ga_mem_current, ga_mem_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(
    f"GA TSP: Best Distance = {ga_dist}, Time = {ga_time:.2f}s, Memory = {ga_mem_current / 1024:.2f}KB (Peak: {ga_mem_peak / 1024:.2f}KB)"
)
GA TSP: Best Distance = 523.5145741618214, Time = 7.70s, Memory = 16.69KB (Peak: 77.29KB)
In [8]:
# --- Measure SA performance ---
tracemalloc.start()
start_time = time.perf_counter()
sa_route, sa_dist, sa_history = sa_tsp(coords)
end_time = time.perf_counter()
sa_time = end_time - start_time
sa_mem_current, sa_mem_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(
    f"SA TSP: Best Distance = {sa_dist}, Time = {sa_time:.2f}s, Memory = {sa_mem_current / 1024:.2f}KB (Peak: {sa_mem_peak / 1024:.2f}KB)"
)
SA TSP: Best Distance = 610.5479333111537, Time = 2.23s, Memory = 87.12KB (Peak: 97.07KB)
In [10]:
# --------------------------
# Plot Convergence
# --------------------------
plt.figure(figsize=(10, 5))
plt.plot(ga_history, label="Genetic Algorithm")
plt.plot(sa_history, label="Simulated Annealing")
plt.xlabel("Iteration")
plt.ylabel("Distance")
plt.title("TSP: Convergence Comparison")
plt.legend()
plt.grid(True)
plt.show()
In [11]:
# Plot runtime and memory comparison
algos = ["Genetic Algorithm", "Simulated Annealing"]
runtimes = [ga_time, sa_time]
mem_current = [ga_mem_current / 1024, sa_mem_current / 1024]  # in KB
mem_peak = [ga_mem_peak / 1024, sa_mem_peak / 1024]  # in KB

fig, ax1 = plt.subplots(figsize=(8, 4))

# Runtime bars (left y-axis)
ax1.bar(algos, runtimes, color="skyblue")
ax1.set_ylabel("Runtime (s)", color="skyblue")
ax1.tick_params(axis="y", labelcolor="skyblue")
ax1.set_title("Algorithm Resource Comparison")

# Memory lines (right y-axis)
ax2 = ax1.twinx()
ax2.plot(algos, mem_peak, color="orange", marker="o", label="Peak Memory (KB)")
ax2.plot(
    algos,
    mem_current,
    color="green",
    marker="s",
    linestyle="--",
    label="Current Memory (KB)",
)
ax2.set_ylabel("Memory (KB)", color="gray")
ax2.tick_params(axis="y", labelcolor="gray")

# Combine legends from both axes
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax2.legend(lines_1 + lines_2, labels_1 + labels_2, loc="upper left")

plt.grid(True, axis="y", linestyle="--", alpha=0.7)
plt.tight_layout()
plt.show()

So, GA found a better solution but took longer to run.

It also used less memory in our implementation. But wait, isn't SA supposed to be more memory efficient? In principle, yes, but our implementation stores the entire best-distance history and copies the route on every iteration. A better implementation might cap the history length or swap cities in place, for example (see the sketch below).
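
For example, a minimal sketch of capping the history (not the implementation used above) is a bounded deque; recording only every k-th iteration would reduce it further:

from collections import deque

# Sketch: a bounded history that keeps only the most recent 1,000 entries;
# older entries are discarded automatically instead of growing without bound.
history = deque(maxlen=1000)
for best_distance in [100.0, 95.0, 92.5]:  # stand-in values; sa_tsp would append its best_distance here
    history.append(best_distance)
print(len(history), list(history))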

Knapsack

Goal: maximize value without exceeding weight limit.

In [15]:
weights, values, capacity = generate_knapsack_items()
In [16]:
# --- Measure GA performance ---
tracemalloc.start()
start_ga = time.perf_counter()
ga_sol, ga_val, ga_hist = ga_knapsack(weights, values, capacity)
end_ga = time.perf_counter()
ga_time = end_ga - start_ga
ga_mem_current, ga_mem_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(
    f"GA Knapsack: Best Value = {ga_val}, Time = {ga_time:.2f}s, Memory = {ga_mem_current / 1024:.2f}KB (Peak: {ga_mem_peak / 1024:.2f}KB)"
)
GA Knapsack: Best Value = 1824, Time = 0.26s, Memory = 5.74KB (Peak: 103.04KB)
In [17]:
# --- Measure SA performance ---
tracemalloc.start()
start_sa = time.perf_counter()
sa_sol, sa_val, sa_hist = sa_knapsack(weights, values, capacity)
end_sa = time.perf_counter()
sa_time = end_sa - start_sa
sa_mem_current, sa_mem_peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(
    f"SA Knapsack: Best Value = {sa_val}, Time = {sa_time:.2f}s, Memory = {sa_mem_current / 1024:.2f}KB (Peak: {sa_mem_peak / 1024:.2f}KB)"
)
SA Knapsack: Best Value = 1342, Time = 0.06s, Memory = 28.61KB (Peak: 38.54KB)

So GA found a better solution but took longer to run.

In [18]:
# --------------------------
# Plot Convergence
# --------------------------
plt.figure(figsize=(10, 5))
plt.plot(ga_hist, label="Genetic Algorithm")
plt.plot(sa_hist, label="Simulated Annealing")
plt.xlabel("Iteration")
plt.ylabel("Value")
plt.title("Knapsack: Convergence Comparison")
plt.legend()
plt.grid(True)
plt.show()
In [19]:
# --------------------------
# Combined Runtime & Memory Plot
# --------------------------
algos = ["Genetic Algorithm", "Simulated Annealing"]
runtimes = [ga_time, sa_time]
mem_current = [ga_mem_current / 1024, sa_mem_current / 1024]
mem_peak = [ga_mem_peak / 1024, sa_mem_peak / 1024]

fig, ax1 = plt.subplots(figsize=(8, 4))
ax1.bar(algos, runtimes, color="skyblue")
ax1.set_ylabel("Runtime (s)", color="skyblue")
ax1.tick_params(axis="y", labelcolor="skyblue")
ax1.set_title("Knapsack: Resource Comparison")

ax2 = ax1.twinx()
ax2.plot(algos, mem_peak, color="orange", marker="o", label="Peak Memory (KB)")
ax2.plot(
    algos,
    mem_current,
    color="green",
    marker="s",
    linestyle="--",
    label="Current Memory (KB)",
)
ax2.set_ylabel("Memory (KB)", color="gray")
ax2.tick_params(axis="y", labelcolor="gray")

lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax2.legend(lines_1 + lines_2, labels_1 + labels_2, loc="upper left")

plt.grid(True, axis="y", linestyle="--", alpha=0.7)
plt.tight_layout()
plt.show()

Again, our implementation of SA stores the entire value history, which is not memory efficient: here its retained ("current") memory is higher than GA's, even though its peak is lower.


Summary: Experimental Results vs Expectations

Overall, our expectations aligned fairly well with the results. The Genetic Algorithm is slower but more robust for complex or multimodal spaces, while Simulated Annealing is faster but appears more sensitive to parameter tuning and problem structure, and can get stuck. We did not expect SA to use as much memory as it did, but that is mostly down to our implementation choice of storing the full history rather than a property of the algorithm itself.

Aspect | TSP Results | Knapsack Results
Genetic Algorithm | Found a better route than Simulated Annealing, with longer runtime | Achieved a significantly higher total value than Simulated Annealing, with longer runtime
Simulated Annealing | Produced a reasonable solution quickly but did not reach the best quality | Produced a lower-value solution quickly; less effective on this problem landscape
Runtime | Matched expectation: Genetic Algorithm was slower than Simulated Annealing | Matched expectation: Genetic Algorithm was slower than Simulated Annealing
Memory Usage | Simulated Annealing had higher peak and retained memory due to route copying and history storage | Simulated Annealing retained more memory at the end due to history storage, though Genetic Algorithm had the higher peak
Algorithm Behavior | Simulated Annealing explored well early, but Genetic Algorithm exploited better over time | Genetic Algorithm balanced exploration and exploitation more effectively; Simulated Annealing likely cooled too quickly

Possible Improvements

  • Tune the cooling schedule in Simulated Annealing to maintain exploration longer (see the sketch below for one way to pick a slower cooling rate)?
  • Try adaptive mutation rates in the Genetic Algorithm to preserve diversity?
  • Cap memory usage in Simulated Annealing by limiting history length?
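
For the first item, one common way to slow the cooling (a sketch; T_final is an arbitrary target you would tune, not a value used above) is to pick alpha so the temperature only reaches T_final on the last iteration:

# Sketch: solve T_init * alpha**n_iter = T_final for alpha.
T_init, T_final, n_iter = 1000, 1.0, 10000
alpha = (T_final / T_init) ** (1 / n_iter)
print(f"alpha = {alpha:.6f}")  # about 0.999309, i.e. noticeably slower cooling than 0.995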

