Problem Set

Starter ML coding exercises.

The first batch covers core implementation tasks for metrics, loss functions, data pipelines, retrieval, ranking, and debugging.

precision_at_k

Rank examples by score and compute the fraction of the top k positions that hold relevant labels.

Metrics Easy Start
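
For precision_at_k, a minimal sketch might look like the following. The function signature, the index-based tie-break, and the choice to divide by k even when fewer than k items exist are assumptions for illustration, not requirements stated by the exercise.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k scored items whose label is relevant (1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort by descending score; break ties by original index so the result is deterministic.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    top = order[:k]
    # Assumption: the denominator is k, even if fewer than k items were scored.
    return sum(labels[i] for i in top) / k
```

For example, with scores [0.9, 0.1, 0.8, 0.4] and labels [1, 0, 0, 1], the top two items are the first and third, so precision at 2 is 0.5.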

recall_at_k

Measure the fraction of all relevant items that appear in the top k results.

Metrics Easy Next

ndcg

Compute discounted cumulative gain and normalize it by the ideal ranking.

Metrics Medium Next
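
One possible shape for ndcg is sketched below. It assumes linear gain (the raw relevance grade) and a log2 position discount; some definitions use an exponential gain of 2^rel - 1 instead, and the exercise does not say which is expected. The helper name `dcg` is hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k relevance grades (linear gain)."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg(scores: Sequence[float], relevances: Sequence[float], k: int) -> float:
    """DCG of the score-induced ranking divided by the DCG of the ideal ranking."""
    if len(scores) != len(relevances):
        raise ValueError("scores and relevances must have the same length")
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranked = [relevances[i] for i in order]
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal, k)
    # Assumption: a query with no relevant items scores 0 rather than raising.
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked, k) / ideal_dcg
```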

map

Compute average precision over each query's ranked results, then take the mean across queries.

Metrics Medium Next
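
A rough sketch of map follows. It assumes each query is given as a list of 0/1 labels already in ranked order, that every relevant item for the query appears somewhere in that list, and that queries with no relevant items contribute 0; none of these conventions is fixed by the exercise text.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of average precision across queries."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(average_precision(q) for q in queries) / len(queries)
```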

auc

Compute AUC as the fraction of positive-negative pairs in which the positive example scores higher, counting ties as half.

Metrics Medium Next
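
The pairwise view of auc can be written directly as a double loop. This sketch is quadratic in the number of examples and is meant to show the definition, not an efficient implementation; a rank-based version would be the usual choice for large inputs. Labels are assumed to be 0 or 1.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```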

logloss

Compute log loss for binary probability predictions, clipping probabilities so the logarithm stays finite.

Metrics Easy Next

calibration_error

Bucket predictions and measure probability calibration error.

Metrics Medium Next
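
For calibration_error, one common choice is expected calibration error (ECE) with equal-width buckets. The sketch below assumes that metric, probabilities in [0, 1], and a default of 10 buckets; the exercise does not name a specific variant.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean prediction and observed rate per bucket."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    # Each bucket tracks [count, sum of predictions, sum of labels].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bucket
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y
    ece = 0.0
    for count, sum_p, sum_y in bins:
        if count:
            # (count / N) * |mean_p - mean_y| simplifies to |sum_p - sum_y| / N.
            ece += abs(sum_p - sum_y) / len(probs)
    return ece
```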

grouped_auc

Compute AUC per group and aggregate over uneven group sizes.

Metrics Hard Next

binary_cross_entropy

Implement numerically stable BCE with optional sample weights.

Loss Easy Next
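
A sketch of binary_cross_entropy is shown below. It assumes the inputs are raw logits rather than probabilities, which allows the stable form max(z, 0) - z*y + log(1 + exp(-|z|)), and it assumes the weighted result is normalized by the sum of weights. The function name `bce_with_logits` is hypothetical.

```python
import math
from typing import Optional, Sequence


def bce_with_logits(
    logits: Sequence[float],
    labels: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean binary cross entropy computed from logits."""
    if weights is None:
        weights = [1.0] * len(logits)
    total = 0.0
    weight_sum = 0.0
    for z, y, w in zip(logits, labels, weights):
        # exp is only ever called on a non-positive number, so it cannot overflow.
        loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        total += w * loss
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("weights must not sum to zero")
    return total / weight_sum
```

With a logit of 1000 and label 1, the naive form log(sigmoid(z)) is fragile, while this version returns 0.0 without overflow.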

softmax_cross_entropy

Use log-sum-exp to compute multiclass cross entropy safely.

Loss Medium Next
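
The log-sum-exp trick for softmax_cross_entropy can be sketched for a single example as follows. The signature (one list of logits plus an integer target index) is an assumption; a batched version would average this value over examples.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class under softmax(logits)."""
    # Subtracting the max keeps every exponent <= 0, so exp cannot overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]
```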

pairwise_ranking_loss

Compare positive and negative items with a margin-based objective.

Loss Medium Next

hinge_loss

Implement max-margin binary classification loss.

Loss Easy Next

focal_loss

Downweight easy examples for imbalanced binary classification.

Loss Medium Next

ips_weighting

Apply inverse propensity weights for counterfactual training data.

Loss Hard Next

minibatch_iterator

Yield mini-batches in a reproducible order, with optional shuffling and drop-last behavior.

Data Easy Next
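
A minimal generator for minibatch_iterator could look like this. The `seed` parameter is an assumption added here so that shuffled runs can be repeated; the exercise only mentions shuffling and drop-last.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def minibatches(
    data: Sequence[T],
    batch_size: int,
    shuffle: bool = False,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield consecutive batches of items, optionally in shuffled order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(range(len(data)))
    if shuffle:
        # A private Random instance avoids disturbing the global random state.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break  # only the final batch can be short
        yield [data[i] for i in batch]
```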

feature_normalization

Normalize numeric features while handling constants and missing values.

Data Easy Next

categorical_encoding

Map categories to stable ids with unknown and rare value handling.

Data Medium Next

sequence_padding

Pad and truncate variable-length sequences for batched model input.

Data Easy Next
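
For sequence_padding, a small sketch under two assumptions: padding is added on the right, and sequences longer than `max_len` lose their tail. Left padding or head truncation would be equally valid choices depending on the model.

```python
from typing import List, Sequence


def pad_sequences(seqs: Sequence[Sequence[int]], max_len: int, pad_value: int = 0) -> List[List[int]]:
    """Return a rectangular list of lists, each exactly max_len long."""
    if max_len < 0:
        raise ValueError("max_len must be non-negative")
    padded = []
    for seq in seqs:
        trimmed = list(seq[:max_len])  # truncate the tail if too long
        padded.append(trimmed + [pad_value] * (max_len - len(trimmed)))
    return padded
```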

time_window_aggregation

Aggregate events over rolling historical windows without leakage.

Data Medium Next

deduplication

Remove duplicate records using keys, timestamps, and deterministic tie-breaks.

Data Easy Next
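
One way to approach deduplication is sketched below. The field names `id` and `ts` are hypothetical, and the policy (keep the record with the latest timestamp, break timestamp ties in favor of the record that appears later in the input) is an assumption chosen so that the output is deterministic.

```python
from typing import Any, Dict, List, Sequence


def deduplicate(records: Sequence[Dict[str, Any]], key: str = "id", ts: str = "ts") -> List[Dict[str, Any]]:
    """Keep one record per key: latest timestamp, ties broken by input position."""
    best = {}  # key value -> ((timestamp, position), record)
    for pos, rec in enumerate(records):
        candidate = (rec[ts], pos)
        k = rec[key]
        if k not in best or candidate > best[k][0]:
            best[k] = (candidate, rec)
    # Emit survivors in the order they appeared in the input.
    return [rec for _, rec in sorted(best.values(), key=lambda item: item[0][1])]
```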

train_test_split_by_time

Split events chronologically while avoiding leakage from future data.

Data Medium Next

top_k_heap

Maintain the best k candidates from a stream using a heap.

Retrieval Easy Next
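
A sketch of top_k_heap using the standard library `heapq` module follows. It assumes the stream yields (score, item) pairs and that, among equal scores, earlier items are preferred; a sequence counter is stored in each heap entry so items themselves are never compared.

```python
import heapq
from typing import Any, Iterable, List, Tuple


def top_k(stream: Iterable[Tuple[float, Any]], k: int) -> List[Tuple[float, Any]]:
    """Return the k highest-scoring (score, item) pairs, best first."""
    if k <= 0:
        return []
    heap = []  # min-heap of (score, -arrival_index, item); heap[0] is the weakest kept entry
    for seq, (score, item) in enumerate(stream):
        entry = (score, -seq, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif score > heap[0][0]:
            # Strictly better than the weakest kept entry: swap it out in O(log k).
            heapq.heapreplace(heap, entry)
    return [(score, item) for score, _, item in sorted(heap, reverse=True)]
```

Memory stays at O(k) regardless of stream length, which is the main reason to prefer a heap over sorting the full input.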

cosine_similarity

Compute cosine similarity with zero-vector handling.

Retrieval Easy Next
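
For cosine_similarity, a short sketch. Returning 0.0 when either vector has zero norm is an assumption; raising an error or returning NaN are other defensible conventions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, or 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: similarity to a zero vector is defined as 0
    return dot / (norm_a * norm_b)
```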

brute_force_nearest_neighbor

Search all vectors and return nearest neighbors with deterministic ordering.

Retrieval Medium Next

inverted_index

Build a token-to-document index and retrieve matching candidates.

Retrieval Medium Next
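
A bare-bones inverted_index might be sketched as follows. Tokenization by lowercasing and whitespace splitting, and OR semantics at query time (a document matches if it contains any query token), are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def retrieve(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return sorted ids of documents containing at least one query token."""
    matches = set()
    for token in query.lower().split():
        # .get avoids inserting empty entries for unseen tokens.
        matches |= index.get(token, set())
    return sorted(matches)
```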

bm25

Score documents with term frequency, inverse document frequency, and length normalization.

Retrieval Hard Next
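
The three ingredients of bm25 can be seen in the sketch below. It assumes pre-tokenized documents, the common defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)); other IDF formulations exist and the exercise does not specify one.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query_tokens: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Return one BM25 score per document for the given query tokens."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores
```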

two_stage_ranker

Combine a candidate retrieval stage with a second-pass scoring stage behind a single interface.

Retrieval Hard Next

compare_two_datasets

Check row counts, schemas, duplicate keys, and changed values.

Debugging Easy Next

detect_null_drift

Compare null rates across datasets and flag suspicious feature changes.

Debugging Medium Next
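
For detect_null_drift, a sketch that treats each dataset as a list of dicts. The absolute-difference threshold of 0.1, the helper name `null_rates`, and the choice to count a missing key as null are assumptions for illustration.

```python
from typing import Any, Dict, Sequence, Tuple


def null_rates(rows: Sequence[Dict[str, Any]], columns: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows where each column is None or absent."""
    n = len(rows)
    return {
        c: (sum(1 for r in rows if r.get(c) is None) / n if n else 0.0)
        for c in columns
    }


def detect_null_drift(
    baseline: Sequence[Dict[str, Any]],
    current: Sequence[Dict[str, Any]],
    columns: Sequence[str],
    threshold: float = 0.1,
) -> Dict[str, Tuple[float, float]]:
    """Return {column: (baseline_rate, current_rate)} where the rate moved by more than threshold."""
    base = null_rates(baseline, columns)
    cur = null_rates(current, columns)
    return {c: (base[c], cur[c]) for c in columns if abs(cur[c] - base[c]) > threshold}
```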

feature_distribution_comparison

Compare per-feature distributions across two datasets and surface the features whose drift exceeds a threshold.

Debugging Medium Next

label_distribution_by_segment

Aggregate label rates by segment and highlight skewed groups.

Debugging Medium Next

calibration_by_bucket

Bucket predictions and compare average score to observed label rate.

Debugging Medium Next

confusion_matrix_by_group

Compute confusion matrices per segment for binary classifiers.

Debugging Hard Next
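
A sketch of confusion_matrix_by_group is given below. It assumes labels and predictions are already binarized to 0 or 1 and that each example carries a single group key; the dict-of-counts output format is a choice made here, not one the exercise prescribes.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def confusion_by_group(
    labels: Sequence[int], preds: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, int]]:
    """Count tp, fp, tn, fn separately for each group."""
    if not (len(labels) == len(preds) == len(groups)):
        raise ValueError("labels, preds, and groups must have the same length")
    out = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, p, g in zip(labels, preds, groups):
        if y not in (0, 1) or p not in (0, 1):
            raise ValueError("labels and predictions must be 0 or 1")
        if p == 1:
            cell = "tp" if y == 1 else "fp"
        else:
            cell = "fn" if y == 1 else "tn"
        out[g][cell] += 1
    return dict(out)
```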