recall_at_k
Measure how many relevant items were recovered in the top k results.
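A minimal sketch of this metric, assuming binary relevance labels and one score per item. The function name follows the title above. The signature, the tie-breaking rule, and the choice to return 0.0 when no item is relevant are illustrative assumptions, not part of the problem statement.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of all relevant items that appear in the top k by score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0:
        raise ValueError("k must be positive")

    total_relevant = sum(1 for y in labels if y)
    if total_relevant == 0:
        # Assumption: define recall as 0.0 when there is nothing to recover.
        return 0.0

    # Sort indices by descending score; break ties by original position
    # so the result is deterministic.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    hits = sum(1 for i in order[:k] if labels[i])
    return hits / total_relevant
```

For example, with scores [0.9, 0.2, 0.8, 0.4], labels [1, 1, 0, 0], and k=2, the top two items are at positions 0 and 2, only one of which is relevant, so the sketch returns 1 / 2 = 0.5.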
Problem Set
The first batch maps out the core implementation tasks for metrics, training loops, data pipelines, retrieval, ranking, and debugging.
Rank examples by score and compute the fraction of relevant labels in the top k positions.
Measure how many relevant items were recovered in the top k results.
Compute discounted cumulative gain and normalize it by the ideal ranking.
Average precision across ranked results and aggregate across queries.
Compute pairwise ranking quality for positive and negative examples.
Score probabilistic binary predictions with numerical stability.
Bucket predictions and measure probability calibration error.
Compute AUC per group and aggregate over uneven group sizes.
Implement numerically stable BCE with optional sample weights.
Use log-sum-exp to compute multiclass cross entropy safely.
Compare positive and negative items with a margin-based objective.
Implement max-margin binary classification loss.
Downweight easy examples for imbalanced binary classification.
Apply inverse propensity weights for counterfactual training data.
Yield stable mini-batches with optional shuffling and drop-last behavior.
Normalize numeric features while handling constants and missing values.
Map categories to stable ids with unknown and rare value handling.
Pad and truncate variable-length sequences for batched model input.
Aggregate events over rolling historical windows without leakage.
Remove duplicate records using keys, timestamps, and deterministic tie-breaks.
Split events chronologically while avoiding leakage from future data.
Maintain the best k candidates from a stream using a heap.
Compute cosine similarity with zero-vector handling.
Search all vectors and return nearest neighbors with deterministic ordering.
Build a token-to-document index and retrieve matching candidates.
Score documents with term frequency, inverse document frequency, and length normalization.
Combine candidate retrieval and a second-pass scoring interface.
Check row counts, schemas, duplicate keys, and changed values.
Compare null rates across datasets and flag suspicious feature changes.
Compare two datasets and surface meaningful feature drift.
Aggregate label rates by segment and highlight skewed groups.
Bucket predictions and compare average score to observed label rate.
Compute confusion matrices per segment for binary classifiers.
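As one illustration of the debugging tasks listed above, here is a rough sketch of per-segment confusion matrices for a binary classifier. The function name confusion_by_segment, its parameters, the 0.5 default threshold, and the rule that a score equal to the threshold counts as a positive prediction are all hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def confusion_by_segment(
    labels: Sequence[int],
    scores: Sequence[float],
    segments: Sequence[Hashable],
    threshold: float = 0.5,
) -> Dict[Hashable, Dict[str, int]]:
    """Return {segment: {"tp", "fp", "tn", "fn"}} counts per segment."""
    if not (len(labels) == len(scores) == len(segments)):
        raise ValueError("labels, scores, and segments must have the same length")

    counts: Dict[Hashable, Dict[str, int]] = defaultdict(
        lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    )
    for y, s, seg in zip(labels, scores, segments):
        # Assumption: a score equal to the threshold is a positive prediction.
        predicted_positive = s >= threshold
        if predicted_positive:
            key = "tp" if y else "fp"
        else:
            key = "fn" if y else "tn"
        counts[seg][key] += 1
    return dict(counts)
```

Keeping raw counts per segment, rather than rates, leaves it to the caller to decide how to handle small segments where a rate would be noisy.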