Hi HN — I am the founder of Smash, a padel club management and AI coaching app launching in June 2026, primarily targeting the GCC market (Dubai, Abu Dhabi, Riyadh, Doha). This post is about the computer vision piece — the shot classification system we call SmashIQ — because it is the part I most want to get external feedback on.
What SmashIQ Does
SmashIQ takes a video recording of a padel match and returns per-shot classifications across 13 shot types (forehand drive, backhand drive, forehand volley, backhand volley, lob, bandeja, vibora, smash, chiquita, bajada, and three subcategories). For each shot, it provides a technique score (0–100) on the dimensions relevant to that shot type — contact consistency, follow-through arc, landing zone accuracy — and a ranked set of coaching focuses.
The pipeline also extracts a highlight reel of the 5–8 most technically significant moments from the match — best shots, characteristic errors, tactical patterns — and generates a text debrief that reads like a coach wrote it rather than a model.
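For concreteness, one analysed shot comes back looking roughly like this (field names are illustrative, not our exact schema):

```python
# One analysed shot, roughly. Field names are illustrative, not our exact schema.
shot = {
    "timestamp_s": 1834.2,
    "player_id": 3,
    "shot_type": "bandeja",
    "technique_score": 71,  # 0-100, dimensions weighted per shot type
    "dimensions": {
        "contact_consistency": 78,
        "follow_through_arc": 62,
        "landing_zone_accuracy": 74,
    },
    "coaching_focuses": [  # ranked, highest-impact first
        "contact point drifting behind the head on late bandejas",
        "follow-through stopping short instead of finishing across the body",
    ],
}
```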
The Technical Stack
Stage 1: Frame Extraction and Player Detection
We extract frames at 10 fps from the input video (most match recordings are 30–60 fps, so this is aggressive downsampling). Player detection uses a YOLOv8n model we fine-tuned on 15,000 manually annotated padel frames. The padel-specific fine-tuning was necessary because standard COCO-pretrained detectors struggled with the glass wall reflections specific to padel courts, regularly detecting reflected players as additional bounding boxes.
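For illustration, a minimal sketch of the decode-and-detect loop, assuming the ultralytics Python API; the checkpoint filename is hypothetical:

```python
import cv2
from ultralytics import YOLO

model = YOLO("smashiq_yolov8n_padel.pt")  # fine-tuned checkpoint (hypothetical name)

def extract_and_detect(video_path, target_fps=10):
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(src_fps / target_fps))  # e.g. 30 fps source -> every 3rd frame
    detections, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            result = model(frame, verbose=False)[0]
            boxes = result.boxes.xyxy.cpu().numpy()  # (n, 4) player boxes
            detections.append((idx / src_fps, boxes))
        idx += 1
    cap.release()
    return detections  # list of (timestamp_s, boxes)
```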
Stage 2: Court Registration
We register the court geometry from the first 60 seconds of each video — identifying the court boundary lines, service line, net, and glass wall positions. This lets us normalise player positions across different camera angles and court sizes. The registration is done with a homography-based approach calibrated against known court dimensions.
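A sketch of the homography step, assuming the four outer court corners have already been localised in the image (a padel court is 10 m wide by 20 m long):

```python
import cv2
import numpy as np

# Court corners in metres: a padel court is 10 m wide, 20 m long.
COURT_WORLD = np.array([[0, 0], [10, 0], [10, 20], [0, 20]], dtype=np.float32)

def register_court(image_corners):
    """image_corners: 4x2 pixel positions of the corners, same order as COURT_WORLD."""
    H, _ = cv2.findHomography(np.asarray(image_corners, dtype=np.float32), COURT_WORLD)
    return H

def to_court_coords(H, pixel_xy):
    """Map a player's foot position from pixels to court metres."""
    pt = cv2.perspectiveTransform(np.array([[pixel_xy]], dtype=np.float32), H)
    return pt[0, 0]  # (x_m, y_m) in court space
```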
Stage 3: Shot Detection and Classification
Shot events are detected using optical flow combined with body pose estimation (MediaPipe Holistic at reduced resolution). We use pose to identify the stroke preparation and follow-through phases, and optical flow to detect the ball contact moment and post-contact ball trajectory.
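The contact-moment cue is essentially a sharp optical-flow spike near the striking player. A rough sketch with OpenCV's Farneback flow (the spike threshold is illustrative):

```python
import cv2
import numpy as np

def mean_flow_magnitude(prev_gray, gray):
    # Dense Farneback flow between consecutive (downsampled) grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(np.linalg.norm(flow, axis=2).mean())

def contact_candidates(frames_gray, spike_ratio=3.0):
    mags = [mean_flow_magnitude(a, b) for a, b in zip(frames_gray, frames_gray[1:])]
    baseline = np.median(mags)
    # Frames where flow magnitude spikes well above the match's baseline motion.
    return [i for i, m in enumerate(mags) if m > spike_ratio * baseline]
```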
Classification uses a temporal CNN that takes a 24-frame window around the detected contact point. The model was trained on 47,000 labelled shot events from our match dataset. Accuracy on the held-out test set: 97.9% across all 13 classes.
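Conceptually the classifier is small. A toy PyTorch version of a 1D temporal CNN over a 24-frame feature window (layer sizes illustrative, not our production model):

```python
import torch
import torch.nn as nn

class ShotClassifier(nn.Module):
    """1D temporal CNN over a 24-frame window of per-frame pose features."""
    def __init__(self, feat_dim=99, n_classes=13):  # 99 = 33 pose landmarks x 3
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the 24 time steps
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):  # x: (batch, feat_dim, 24)
        return self.head(self.net(x).squeeze(-1))

logits = ShotClassifier()(torch.randn(8, 99, 24))  # -> (8, 13)
```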
Stage 4: Technique Scoring
This was the hardest part to get right. Classifying a shot as a bandeja is relatively easy. Scoring whether it was a good bandeja requires defining what "good" looks like in quantitative terms for each shot type across different player levels.
We built a hand-labelled dataset of 4,200 shots across quality levels — poor, acceptable, good, excellent — and trained a regression model on pose-derived features specific to each shot type: contact point relative to the head, elbow angle at impact, shoulder rotation completeness, follow-through arc. The model learns to weight these features differently per shot type.
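A sketch of the per-shot-type setup; gradient-boosted trees here are a stand-in for illustration, not necessarily our production model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def train_scorers(dataset):
    """dataset: {shot_type: (features (n, d), quality_labels (n,))}.
    One regressor per shot type, so feature weighting differs per type."""
    return {shot_type: GradientBoostingRegressor().fit(X, y)
            for shot_type, (X, y) in dataset.items()}

def score_shot(scorers, shot_type, features):
    pred = scorers[shot_type].predict(features.reshape(1, -1))[0]
    return float(np.clip(pred, 0, 100))  # clamp to the 0-100 scale
```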
Stage 5: Debrief Generation
We use a fine-tuned language model to generate coaching debriefs. The LLM is given the structured shot data and technique scores as context and instructed to produce positive-first, specific, actionable coaching output — no generic advice, no fake confidence.
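Roughly how the structured context reaches the model; the prompt wording below is paraphrased, not our production prompt:

```python
import json

SYSTEM = (
    "You are a padel coach writing a post-match debrief. Lead with genuine "
    "strengths, then give specific, actionable fixes tied to the shot data. "
    "No generic advice. If the data does not support a claim, do not make it."
)

def build_debrief_prompt(shots, technique_scores, highlights):
    # The LLM only ever sees structured pipeline output, never raw video.
    context = json.dumps(
        {"shots": shots, "technique_scores": technique_scores, "highlights": highlights},
        indent=2,
    )
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Match data:\n{context}\n\nWrite the debrief."},
    ]
```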
Getting the coaching voice right took more iteration than the CV pipeline. We tested 11 prompt formulations and trained against a rubric built with 3 professional padel coaches, who rated generated debriefs for accuracy, specificity, and motivational quality.
Infrastructure
The analysis pipeline runs on a Mac mini (M4 Pro, 64 GB unified memory) co-located in Dubai, exposed publicly via a Cloudflare Tunnel. Yes, this is unconventional. It is also significantly cheaper than equivalent cloud GPU capacity and has been extremely reliable over 8 months of operation. The tunnel handles the variable residential IP without requiring a static address.
Processing time: a 90-minute match recording (typical GCC game length) processes in 8–14 minutes end-to-end; a 30-minute training session takes 3–5 minutes.
What We Got Wrong
- Underestimating glass wall reflections. We spent 6 weeks hardening the detector against reflected-player false positives before landing on a court-mask approach that excludes the glass regions from the detection zones (a sketch of the mask filter follows this list).
- Not building the retry/backoff layer early enough. The Cloudflare tunnel goes down rarely, but not never. We had three months of production incidents before we built proper circuit-breaking into the dispatch layer.
- Overcomplicating court registration for fisheye cameras. Many GCC padel facilities use fisheye lenses for wider court coverage, but our initial registration assumed rectilinear projection. We now run a pre-pass that detects and undistorts fisheye footage before registration.
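For what it is worth, the court-mask fix from the first bullet is conceptually tiny. A sketch, assuming the glass regions are known polygons from court registration:

```python
import cv2
import numpy as np

def filter_reflections(boxes, glass_polygons):
    """Drop detections whose foot point lands inside a glass-wall region.

    boxes: (n, 4) xyxy player detections.
    glass_polygons: list of (k, 2) int32 arrays outlining glass regions in pixels.
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        foot = (float((x1 + x2) / 2), float(y2))  # bottom-centre of the box
        in_glass = any(cv2.pointPolygonTest(poly, foot, False) >= 0
                       for poly in glass_polygons)
        if not in_glass:
            kept.append((x1, y1, x2, y2))
    return np.array(kept)
```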
What We Would Like Feedback On
The technique scoring model uses hand-labelled quality ratings from three coaches as ground truth. We are aware this is a limited and potentially biased training set. We would welcome input on better approaches to generating ground-truth quality labels for sports technique at scale.
The Mac mini infrastructure works at our current volume, but we have no clear migration path for when we need to scale. If you have experience taking computer vision infrastructure from hobby-grade to production scale, I would love to talk.