Experiment: Support Vector Machine Classifier on the Iris Dataset¶
⚖️ Quick Summary¶
Goal: Find the optimal decision boundary that maximizes the margin between classes.
🧠 Core Idea:¶
- SVM finds the hyperplane that best separates classes with the largest margin.
- Uses support vectors (the boundary-critical training points) to define the hyperplane (see the sketch after this list).
- Can use kernels to handle nonlinear separation.
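For intuition, here is a minimal sketch on synthetic blobs (separate from the experiment below, assuming only scikit-learn) showing that the fitted model keeps just the boundary-critical points as support vectors:
In [ ]:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Sketch: fit a linear SVM on two synthetic clusters
X_demo, y_demo = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X_demo, y_demo)

print(clf.support_vectors_.shape)  # (n_support_vectors, n_features)
print(clf.n_support_)              # support vector count per class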
🧮 Example Configuration / Hyperparams:¶
- kernel='rbf' (default; good for nonlinear boundaries)
- C=1.0 (regularization strength)
- gamma='scale' (kernel coefficient)
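In scikit-learn terms, that example configuration maps directly onto the SVC constructor; a one-line sketch (the experiment below switches to a linear kernel):
In [ ]:
from sklearn.svm import SVC

# Sketch: the example configuration above (these are also sklearn's defaults)
svc = SVC(kernel="rbf", C=1.0, gamma="scale")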
🔧 Expectations:¶
| Aspect | Expectation | Notes |
| --- | --- | --- |
| Accuracy | High | Excellent for small, clean datasets like Iris |
| Overfitting | Moderate (tunable) | Controlled via C and kernel choice |
| Training Time | Moderate | Slower than KNN or DTC; fast on small datasets |
| Interpretability | Low | Not easily visualized or interpreted |
| Kernel Choice | Critical | RBF for nonlinear data; linear for simpler separable cases |
🔑 Characteristics:¶
- Powerful for both linear and nonlinear problems.
- Sensitive to feature scaling (see the sketch after this list).
- Kernel trick makes SVM flexible, but tuning is important.
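To make the scaling sensitivity concrete, a minimal sketch (a contrived rescaling of Iris, not part of the experiment below) comparing the same RBF SVM with and without standardization:
In [ ]:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Sketch: exaggerate one feature's scale, then compare CV accuracy
X_iris, y_iris = load_iris(return_X_y=True)
X_iris = X_iris * [1, 1, 1, 100]  # blow up the last feature's scale

unscaled = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

print("unscaled:", cross_val_score(unscaled, X_iris, y_iris, cv=5).mean())
print("scaled:  ", cross_val_score(scaled, X_iris, y_iris, cv=5).mean())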
In [1]:
%reload_ext autoreload
%autoreload 2
In [2]:
# SVC-specific imports
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay
# experiment helper imports
from helpers.base_imports import *
Set up experiment with data and model¶
In [3]:
exp = Experiment(
    type="c",  # classification
    name="svc-iris-linear",
    dataset="iris-20test-shuffled-v1",
)
exp
Out[3]:
Note that sklearn has svm.SVC(kernel="linear", C=C) and svm.LinearSVC(C=C, max_iter=10000) for linear SVMs. The former is more flexible and can use other kernels, while the latter is optimized for linear kernels and a bit faster. For this small dataset we'll just use the former.
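For reference, a sketch of the two constructors side by side (not a benchmark):
In [ ]:
from sklearn.svm import SVC, LinearSVC

C = 1.0
flexible = SVC(kernel="linear", C=C)          # libsvm-based; supports any kernel
linear_only = LinearSVC(C=C, max_iter=10000)  # liblinear-based; linear only, faster at scale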
In [4]:
# add the steps to the pipeline
steps = [
    ("scaler", StandardScaler()),
    (
        "classifier",
        SVC(
            kernel="linear",  # linear, poly, rbf, etc.
            gamma="auto",  # kernel coefficient (1 / n_features)
            C=1.0,  # regularization parameter
            random_state=RANDOM_SEED,
        ),
    ),
]
exp.estimator = Pipeline(
    steps=steps,
    memory=CACHE_DIR,
)
In [5]:
exp.estimator.get_params()
Out[5]:
Get the dataset by name (EDA was already done in another notebook and the train/test split was saved, so we are working with the same data)
In [6]:
notes, X_train, X_test, y_train, y_test, target_names = get_dataset(exp.dataset)
print(notes)
print(target_names)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
Out[6]:
In [10]:
# Make sure both are 1D numpy arrays
# y_train = y_train.to_numpy().ravel()
# y_test = y_test.to_numpy().ravel()
print("y_train:", type(y_train), y_train.shape)
print("y_test:", type(y_test), y_test.shape)
In [12]:
classes, counts = np.unique(y_train, return_counts=True)
for label, count in zip(classes, counts):
    print(f"Class {label}: {count} samples")
SVMs can be biased toward the majority class, but the classes in the training data are roughly balanced, so we'll continue without any class balancing.
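Had the split been imbalanced, SVC's class_weight parameter is one way to compensate; a hypothetical sketch (not used in this experiment):
In [ ]:
# Hypothetical: weight classes inversely to their frequency to counteract
# majority-class bias (unnecessary here, since this split is balanced)
balanced_svc = SVC(kernel="linear", C=1.0, class_weight="balanced")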
In [13]:
exp.update_param("n_train_samples", X_train.shape[0])
exp.update_param("n_test_samples", X_test.shape[0])
exp.summary_df
Out[13]:
Optimize hyperparameters¶
In [14]:
# add the steps to the pipeline
steps = [
    # NOTE: unlike tree-based models, SVMs are sensitive to feature scaling,
    # so we standardize (this also keeps pipelines consistent across classifiers)
    ("scaler", StandardScaler()),
    (
        "classifier",
        SVC(
            kernel="linear",  # linear, poly, rbf, etc.
            gamma="auto",  # kernel coefficient (1 / n_features)
            C=1.0,  # regularization parameter
            random_state=RANDOM_SEED,
        ),
    ),
]
exp.estimator = Pipeline(
    steps=steps,
    memory=CACHE_DIR,
)
In [16]:
param_grid = {
    "classifier__kernel": ["linear", "rbf", "poly"],
    "classifier__C": [0.1, 1, 10, 100],
    "classifier__gamma": ["scale", "auto", 0.1, 0.01, 0.001],
}
grid = GridSearchCV(exp.estimator, param_grid, cv=3)
grid.fit(X_train, y_train)
print("Best Params:", grid.best_params_)
print("Best Cross-Validation Score:", grid.best_score_)
OK: the grid search does best with C=1 (regularization strength) and a linear kernel.
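Beyond the single best combination, the full grid can be ranked; a small sketch (assuming pandas is available via the base imports):
In [ ]:
# Sketch: inspect all grid-search candidates, not just the winner
cv_results = pd.DataFrame(grid.cv_results_)
cols = [
    "param_classifier__kernel",
    "param_classifier__C",
    "param_classifier__gamma",
    "mean_test_score",
    "rank_test_score",
]
cv_results[cols].sort_values("rank_test_score").head()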
In [18]:
grid.best_estimator_.get_params()
Out[18]:
In [19]:
my_params = {
    "classifier__kernel": "linear",
    "classifier__C": 1.0,
    # "classifier__gamma": "auto",
}
exp.estimator.set_params(**my_params)
exp.estimator.get_params()
Out[19]:
In [20]:
# fit on training data
start_time = pd.Timestamp.now()
exp.estimator.fit(X=X_train, y=y_train)
train_time = pd.Timestamp.now() - start_time
In [21]:
exp.update_param("train_time", train_time)
exp.update_param(
    "mean_accuracy",
    exp.estimator.score(X_test, y_test),
    # add_column=True
)
exp.summary_df
Out[21]:
Take a look at the trained model¶
In [22]:
# time predictions on the test set (precision/recall/F1 are computed below)
start_time = pd.Timestamp.now()
y_pred = exp.estimator.predict(X_test)
query_time = pd.Timestamp.now() - start_time
In [23]:
exp.update_param("query_time", query_time)
In [25]:
target_names_list = target_names["target_names"].tolist()
cm = confusion_matrix(
    y_true=y_test,
    y_pred=y_pred,
    # normalize="true"
)
cmd = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=target_names_list)
cmd.plot()
plt.title("Confusion Matrix")
# plt.savefig(f"{FIGS_DIR}/{exp.name}_confusion-matrix.png")
Out[25]:
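The DecisionBoundaryDisplay imported earlier can also visualize the fitted boundary. It requires a two-feature input space, so this sketch (an assumption: using only the first two Iris features) refits a clone of the pipeline:
In [ ]:
from sklearn.base import clone

# Sketch: plot the boundary on the first two features only, since
# DecisionBoundaryDisplay needs a 2D feature space (the experiment uses all 4)
X2 = np.asarray(X_train)[:, :2]
est2 = clone(exp.estimator).fit(X2, y_train)

disp = DecisionBoundaryDisplay.from_estimator(
    est2, X2, response_method="predict", alpha=0.4
)
disp.ax_.scatter(X2[:, 0], X2[:, 1], c=y_train, edgecolor="k")
plt.title("Decision boundary (first two features)")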
In [26]:
exp.update_param("confusion_matrix", np.array2string(cm))
In [27]:
cr = classification_report(y_true=y_test, y_pred=y_pred, output_dict=True)
exp.update_param("classification_report", str(cr))
cr
Out[27]:
In [29]:
# add SVC-specific hyperparameters to the summary_df
exp.update_param(
    "kernel",
    exp.estimator.named_steps["classifier"].kernel,
    add_column=True,
)
exp.update_param(
    "C",
    exp.estimator.named_steps["classifier"].C,
    add_column=True,
)
exp.update_param(
    "gamma",
    exp.estimator.named_steps["classifier"].gamma,
    add_column=True,
)
exp.summary_df
Out[29]:
In [ ]:
exp.save(overwrite_existing=False)