36 Hours of Algorithms and Sleep Deprivation: Third Place at the AlphaBit AI Datathon
AlphaBit AI Datathon 2025 - ESI SBA
Links
Connect with me: LinkedIn - Aymen Guerrouf
The AlphaBit Club at ESI SBA organized a 36-hour AI Datathon with 9 distinct challenges. Not tutorials, not guided notebooks — real problems with minimal documentation, conflicting data, and evaluation metrics designed to punish lazy solutions.
We formed a team of 5, entered with the intention of learning, and left with third place overall.
Here's what actually happened.
The Environment: 36 Hours of Controlled Chaos
The format was straightforward: 9 challenges, all running simultaneously, submit as many solutions as you want before the deadline. Final ranking based on aggregate performance across all tasks.
Most teams specialized. Pick 2-3 challenges, go deep, ignore the rest.
We decided to attack everything.
In hindsight, this was the right call. The scoring rewarded breadth. A mediocre submission on a neglected challenge often outperformed missing it entirely.
Citation Network: Graph Neural Networks from Scratch
This was one of my primary challenges. Each node represented a research paper, edges represented citations, and the goal was to predict paper topics using both node features and graph structure.
The dataset:
- `edges.csv` — citation links between papers
- `features.csv` — node feature vectors
- `labels_train.csv` — topic labels for training nodes
- `splits.csv` — explicit train/val/test masks
The task was clear: build a GNN (GCN, GraphSAGE, GAT) using PyTorch Geometric or DGL, train on masked nodes, predict on the hidden test set.
My approach:
- Graph construction — Built the adjacency structure using PyG's Data object
- Architecture — Tested GCN and GraphSAGE variants with different hidden dimensions
- Training discipline — Strict adherence to the provided masks, CrossEntropyLoss, validation-based hyperparameter tuning
The challenge was beginner-friendly by design, but execution still mattered: proper message passing, avoiding overfitting on the small labeled set, and respecting the split masks.
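The propagation step at the heart of a GCN is simple enough to show without any framework. Here's a plain-Python sketch of the symmetrically normalized neighborhood aggregation a GCN layer performs — the actual models were built with PyTorch Geometric, so this strips out the learned weights and nonlinearity and keeps only the message-passing arithmetic:

```python
import math

def gcn_aggregate(features, edges):
    """One round of GCN-style message passing in plain Python:
    each node sums its neighbours' features (self-loop included),
    scaled by 1/sqrt(deg(u) * deg(v)) as in Kipf & Welling's GCN."""
    n = len(features)
    # Add self-loops so every node retains part of its own signal.
    adj = [{i} for i in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(nbrs) for nbrs in adj]
    dim = len(features[0])
    out = []
    for u in range(n):
        acc = [0.0] * dim
        for v in adj[u]:
            norm = 1.0 / math.sqrt(deg[u] * deg[v])
            for k in range(dim):
                acc[k] += norm * features[v][k]
        out.append(acc)
    return out
```

Stacking two of these rounds (with a learned linear map and ReLU between them) is essentially what a two-layer GCN does.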
Final position: 9th place.
Solid fundamentals, nothing fancy.
AITSP: When Python Isn't Fast Enough
The scheduling challenge was where things got interesting.
PFSP (Permutation Flow Shop Scheduling Problem) is NP-hard. The search space explodes exponentially. Naive approaches stalled in local optima around a makespan of 2100 when competitive solutions needed roughly 1600.
The problem demanded algorithmic engineering, not just model selection.
The Technical Stack
Phase 1: Speed
Python's default performance was unacceptable. I rewrote the makespan computation using Numba with Taillard's acceleration technique.
Result: 1,000x speedup. From 5,000 evaluations per second to 5 million.
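For context, the quantity being evaluated millions of times per second is just a short dynamic program. Here's a plain-Python version of the makespan recurrence — in the competition this logic was compiled with Numba's `@njit` (plus Taillard's incremental evaluation for insertion moves), which is where the speedup came from:

```python
def makespan(perm, p):
    """Makespan of a job permutation in a permutation flow shop.
    p[j][m] is the processing time of job j on machine m.
    C[m] tracks the completion time on each machine as jobs are appended:
    a job starts on machine m only when it has finished on m-1 AND
    machine m has finished its previous job."""
    n_machines = len(p[0])
    C = [0] * n_machines
    for job in perm:
        C[0] += p[job][0]
        for m in range(1, n_machines):
            C[m] = max(C[m], C[m - 1]) + p[job][m]
    return C[-1]
```

Decorating this exact function with `@numba.njit` and feeding it NumPy arrays removes the interpreter overhead without changing a line of the algorithm.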
Phase 2: Search Architecture
Small instances (N ≤ 20) went to OR-Tools CP-SAT for exact solutions. Large instances got an Iterated Greedy algorithm with NEH-D initialization and path relinking to escape cycles.
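A minimal sketch of one destroy/rebuild cycle of Iterated Greedy, with a generic `eval_fn` standing in for the compiled makespan kernel (the destruction size `d` and this exact structure are illustrative — the real solver added NEH-D initialization, acceptance criteria, and path relinking on top):

```python
import random

def ig_destroy_rebuild(perm, eval_fn, d=2, seed=0):
    """One destroy/rebuild pass of Iterated Greedy:
    remove d randomly chosen jobs, then greedily re-insert each one
    at the position that minimises eval_fn over the partial sequence."""
    rng = random.Random(seed)
    perm = list(perm)
    removed = [perm.pop(rng.randrange(len(perm))) for _ in range(d)]
    for job in removed:
        # Try every insertion position and keep the best.
        candidates = [perm[:i] + [job] + perm[i:] for i in range(len(perm) + 1)]
        perm = min(candidates, key=eval_fn)
    return perm
```

Iterating this loop, accepting improvements (and occasionally worse solutions, simulated-annealing style), is what lets the search escape the local optima that trap a pure greedy construction.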
Phase 3: Parallelization
I deployed multiple search agents simultaneously:
- Forward solver
- Backward solver (reversed job order)
- Aggressive destroy/rebuild cycles
- Conservative local refinement
Four CPUs, four independent search trajectories, unified best-solution tracking.
The final engine could explore the solution space with surgical precision while competitors were still debugging their greedy implementations.
Final position: 7th place.
The algorithm punched above its weight.
RAG Challenge: Teaching Models to Detect Contradictions
This challenge split into two sub-tasks:
Legal Clerk Agent — Parse zoning laws that intentionally contain contradictions. The evaluation specifically tested whether your model could identify conflicting rules instead of hallucinating a resolution.
Fact-Checking Agent — Classify claims as True, False, or Partially True based on a knowledge base. Negation handling was critical.
Most LLM-based systems fail at contradiction detection. They're trained to be helpful, which means they try to reconcile conflicting information instead of flagging it.
My approach:
- Structured retrieval — Retrieved relevant law sections with explicit conflict markers
- Prompt engineering — Forced the model to enumerate contradictions before attempting resolution
- Confidence calibration — Tuned the system to output "conflict detected" rather than fabricating agreement
The fact-checking component required similar discipline. I built retrieval logic that surfaced negations explicitly and structured prompts that separated evidence extraction from classification.
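The "enumerate contradictions before resolving" discipline can be enforced directly in the prompt template. A sketch of the idea — the wording, section tags, and `CONFLICT DETECTED` sentinel here are illustrative, not the exact prompts used:

```python
def build_clerk_prompt(question, sections):
    """Assemble a prompt that forces the model to enumerate conflicts
    between retrieved law sections BEFORE attempting any answer.
    Sections are tagged [S1], [S2], ... so conflicts can be cited by id."""
    numbered = "\n".join(f"[S{i}] {text}" for i, text in enumerate(sections, 1))
    return (
        "You are a legal clerk. Use ONLY the sections below.\n\n"
        f"{numbered}\n\n"
        "Step 1 - List every pair of sections that contradict each other, "
        "citing their [S#] ids. If any exist, your final answer MUST be "
        "'CONFLICT DETECTED' followed by the conflicting ids.\n"
        "Step 2 - Only if Step 1 found no contradictions, answer the question.\n\n"
        f"Question: {question}"
    )
```

Making "conflict detected" a legal final answer, rather than something the model has to volunteer, is what stops a helpfulness-tuned LLM from papering over the contradiction.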
Final position: 3rd place.
The system worked because I designed it to be correct, not confident.
Find the Water: Late Discovery, Strong Finish
We noticed this segmentation challenge embarrassingly late. Hours of potential optimization time, gone.
The pragmatic response: ship something functional, fast.
I assembled a minimal pipeline:
- Lightweight U-Net architecture
- Aggressive data augmentation
- Clean RLE encoding for submission
- Short but efficient training loop
No hyperparameter tuning. No ensemble methods. Just solid fundamentals executed quickly.
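The RLE step is the part most teams get wrong under time pressure. A minimal encoder over a flattened binary mask, emitting 1-indexed "start length" pairs — the convention many segmentation leaderboards use, though the exact pixel ordering (row- vs column-major) is competition-specific and assumed here:

```python
def rle_encode(mask):
    """Run-length encode a flat binary mask as space-separated
    'start length' pairs with 1-based starts."""
    runs = []
    run_start, run_len = None, 0
    for i, v in enumerate(mask):
        if v:
            if run_start is None:
                run_start, run_len = i + 1, 1   # open a new run
            else:
                run_len += 1                    # extend the current run
        elif run_start is not None:
            runs.extend([run_start, run_len])   # close the run
            run_start = None
    if run_start is not None:
        runs.extend([run_start, run_len])       # mask ended mid-run
    return " ".join(map(str, runs))
```

Getting the off-by-one and the trailing-run cases right once, and unit-testing them, is cheaper than debugging a silently-wrong leaderboard score.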
Final position: 2nd place.
Sometimes good engineering under pressure beats elaborate solutions with more time.
Masked X-Ray Challenge: Pneumonia Detection with Minimal Labels
This challenge tested model resilience under extreme data constraints. The task: classify masked chest X-rays for pneumonia detection, evaluated on AUC.
The dataset structure was intentionally restrictive:
- `train/` — unlabeled images only, no annotations whatsoever
- `val/` — a small labeled subset for local evaluation
- `test/` — hidden labels, used for final scoring
The catch: you had to figure out how to leverage unlabeled training data with only a tiny validation set for supervision. Classic semi-supervised learning territory.
My approach:
- Pseudo-labeling — Used the validation-trained model to generate soft labels for unlabeled training data
- Progressive training — Started with high-confidence pseudo-labels, gradually included harder samples
- Augmentation strategy — Heavy augmentations to prevent overfitting on the small labeled set
- AUC optimization — Focused on ranking quality rather than raw accuracy
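The selection step in that pseudo-labeling loop is the part worth spelling out. A sketch, assuming the model outputs a pneumonia probability per unlabeled image — the threshold schedule is illustrative:

```python
def select_pseudo_labels(probs, threshold):
    """Pick high-confidence pseudo-labels from model probabilities.
    probs: list of (sample_id, p_pneumonia) pairs. A sample is kept
    only when the model is confident either way; lowering the
    threshold across rounds progressively admits harder samples."""
    selected = []
    for sample_id, p in probs:
        if p >= threshold:
            selected.append((sample_id, 1))          # confident positive
        elif p <= 1.0 - threshold:
            selected.append((sample_id, 0))          # confident negative
    return selected
```

Each round then retrains on the validation labels plus the selected pseudo-labels, and the threshold drops (say 0.95 → 0.85 → 0.75) so the labeled pool grows as the model improves.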
The evaluation metric (AUC) rewarded models that could separate classes reliably, not just predict correctly. This meant calibration mattered as much as accuracy.
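To see why ranking quality is the whole game, it helps to look at what AUC actually computes. Its probabilistic definition in a few lines:

```python
def auc_score(labels, scores):
    """AUC via its probabilistic definition: the fraction of
    (positive, negative) pairs the model ranks correctly, with ties
    counted as half — the Wilcoxon-Mann-Whitney statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))
```

Note that the absolute probability values never appear — only their ordering. A model whose scores are all compressed near 0.5 but correctly ordered gets a perfect AUC, which is exactly why calibration of the *ranking* mattered more than hitting a 0.5 decision boundary.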
Final position: 7th place.
Limited data, limited labels, still competitive.
Sentence Meaning Similarity: Understanding Beyond Words
The AISM challenge asked a deceptively simple question: do these two sentences mean the same thing?
The constraint that made it interesting: no pretrained LLMs allowed. Notebook submission mandatory.
This forced us back to fundamentals. No BERT, no sentence transformers, no fine-tuning massive models. You had to build something from scratch that could capture semantic similarity.
My approach:
- Feature engineering — Word overlap metrics, TF-IDF similarity, syntactic pattern matching
- Embedding methods — Trained lightweight embeddings on the provided corpus
- Ensemble logic — Combined multiple similarity signals into a unified classifier
- F1 optimization — Balanced precision and recall for the binary classification
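Two of those hand-built signals fit in a few lines of plain Python. This is an illustrative subset — the full feature set also included TF-IDF similarity and syntactic pattern matches feeding a downstream classifier:

```python
import math
from collections import Counter

def similarity_features(s1, s2):
    """Hand-built similarity signals for a sentence pair, no pretrained
    models: token-set Jaccard overlap and bag-of-words cosine similarity."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    set1, set2 = set(t1), set(t2)
    jaccard = len(set1 & set2) / len(set1 | set2)
    c1, c2 = Counter(t1), Counter(t2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    cosine = dot / norm if norm else 0.0
    return {"jaccard": jaccard, "cosine": cosine}
```

Individually these signals are weak — they can't see that "purchase" means "buy" — which is exactly why the corpus-trained embeddings and the ensemble on top were needed.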
The 40/60 public/private split meant leaderboard positions could shift dramatically at final evaluation. Overfitting to the public test set was a trap.
Final position: 2nd place.
No pretrained models, pure engineering.
The Molecular Classification Mess
Mass spectrometry data is notoriously ugly. This dataset was worse.
Fragment m/z values, intensity spectra, SMILES strings, precursor metadata — all with inconsistent formatting, missing values, and severe class imbalance.
Data cleaning consumed hours that should have gone to modeling.
The final pipeline:
- Binned spectral features
- Molecular fingerprint generation
- Multi-class classifier with probability calibration
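The spectral binning step is the one that turns the ugly variable-length data into something a standard classifier can eat. A minimal sketch, with an illustrative m/z range and bin count:

```python
def bin_spectrum(mz_values, intensities, mz_max=1000.0, n_bins=100):
    """Convert a variable-length mass spectrum into a fixed-length
    feature vector by summing fragment intensity into uniform m/z bins.
    Peaks outside [0, mz_max) are dropped."""
    vec = [0.0] * n_bins
    width = mz_max / n_bins
    for mz, inten in zip(mz_values, intensities):
        if 0 <= mz < mz_max:
            vec[int(mz // width)] += inten
    return vec
```

Every spectrum, whether it has 10 peaks or 10,000, comes out as the same fixed-length vector, which the fingerprint features are then concatenated onto.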
Final position: 4th place.
A solid finish on a notoriously messy domain.
Histopathology Classification: Medical Imaging at Scale
The histopathology challenge was a 7-class tissue classification problem. Each image patch came from microscopy slides, and the task was to distinguish tissue states ranging from normal and benign through grades of atypical hyperplasia up to carcinoma in situ and invasive carcinoma.
The classes:
- 0: Normal (N)
- 1: Papillary Benign (PB)
- 2: Usual Ductal Hyperplasia (UDH)
- 3: Flat Epithelial Atypia (FEA)
- 4: Atypical Ductal Hyperplasia (ADH)
- 5: Ductal Carcinoma In Situ (DCIS)
- 6: Invasive Carcinoma (IC)
The evaluation metric was Quadratic Weighted Cohen's Kappa — designed for ordinal classification where the distance between classes matters. Predicting IC when the true class is Normal carries a much heavier penalty than being off by one category.
This made the problem clinically realistic. In real diagnostics, confusing normal tissue with invasive carcinoma is catastrophic. Confusing ADH with DCIS is a much smaller error.
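The metric itself is compact enough to write out. A plain-Python version of Quadratic Weighted Kappa, which makes the squared-distance penalty explicit:

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic Weighted Cohen's Kappa: disagreement between classes
    i and j is penalised by (i - j)^2, so confusing Normal (0) with
    Invasive Carcinoma (6) costs 36x more than confusing ADH (4) with
    DCIS (5). Returns 1.0 for perfect agreement, ~0.0 for chance."""
    n = len(y_true)
    hist_t = [y_true.count(c) for c in range(n_classes)]
    hist_p = [y_pred.count(c) for c in range(n_classes)]
    # Observed weighted disagreement.
    observed = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    # Expected weighted disagreement under chance (outer product of marginals).
    expected = sum(
        hist_t[i] * hist_p[j] * (i - j) ** 2
        for i in range(n_classes) for j in range(n_classes)
    ) / n
    return 1.0 - observed / expected
```

The `(i - j)**2` term is why ordinal-aware losses paid off: a model trained with plain cross-entropy treats all mistakes equally, while this metric does not.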
The approach:
- Transfer learning — Started with pretrained CNN backbones, fine-tuned on the training set
- Data augmentation — Heavy geometric and color augmentations to handle the limited dataset
- Ordinal awareness — Loss function modifications to respect the class ordering
- Ensemble methods — Combined multiple model predictions for stability
Final position: 2nd place.
Medical imaging with proper ordinal scoring.
The Reality of Hour 30
By the final stretch:
- Someone on the team started talking to notebook cells that weren't there
- Jupyter crashed and took 2 hours of unsaved work with it
- One teammate fell asleep on the keyboard and executed random cells
- I debugged a failing submission for 30 minutes before realizing the bug was a missing comma
The scoreboard kept updating. We kept climbing.
Resilience matters more than people admit. Most competitors build solutions when conditions are optimal. The teams that win build solutions when everything is falling apart.
Final Results
| Challenge | Position | My Contribution |
|---|---|---|
| Citation Network | 9th | Lead |
| AITSP Optimization | 7th | Lead |
| RAG Legal/Fact-Check | 3rd | Lead |
| Find the Water | 2nd | Team |
| Masked X-Ray | 7th | Team |
| Sentence Similarity (AISM) | 2nd | Team |
| Histopathology | 2nd | Team |
| Molecular Classification | 4th | Team |
Overall: 3rd Place
What I Learned
Speed is a feature. The scheduling challenge proved that algorithmic performance isn't academic. A 1000x speedup translated directly to better solutions.
Correctness beats confidence. The RAG challenge rewarded systems that admitted uncertainty over systems that hallucinated answers.
Breadth matters in multi-task competitions. Submitting something on every challenge outperformed specializing in a few. We attacked all 9 problems.
Exhaustion reveals fundamentals. At hour 30, you don't have the cognitive resources for clever tricks. You fall back on solid engineering habits. Build those habits before you need them.
Competition clarifies priorities. When you have 36 hours and 9 problems, you learn very quickly what actually matters versus what feels important.
Looking Forward
This datathon reinforced something I already suspected: I enjoy problems that require both theoretical understanding and practical engineering. The scheduling challenge wasn't about knowing optimization theory — it was about implementing it efficiently. The RAG challenge wasn't about understanding LLMs — it was about constraining them to behave correctly.
Third place with a team that attacked every challenge, survived the chaos, and learned more in 36 hours than most workshops teach in a semester.
I'll be back for the next one.
Let's Connect
If you're considering entering an AI competition: Do it. The time pressure forces you to prioritize ruthlessly, the variety of challenges exposes gaps in your knowledge, and the competition reveals where your skills actually stand.
What's your approach to competition strategy? Specialize deeply or attack broadly?
Connect with me: LinkedIn
36 hours of algorithms, caffeine, and determination. Third place at AlphaBit AI Datathon 2025, ESI SBA. The optimization never stops.