This is a classic case of domain shift between your curated training/validation data and the live factory environment, a common failure mode in industrial defect detection systems. The model performs well on the validation set (likely drawn from clean, curated data) but fails in the real factory setup. Here’s a structured approach to diagnosing and fixing it.
🔍 Root Cause Analysis
1. Domain Shift
- Lighting conditions (harsh shadows, reflections, inconsistent exposure)
- Camera differences (resolution, lens distortion, focus, noise)
- Product variations (small differences in textures, colors, shapes)
- Different defect types or frequencies in real data
- Background clutter or motion blur
2. Overfitting to Validation Set
- Your validation set might be too similar to the training set.
- You might be testing on patches/images that are manually cleaned, centered, or normalized.
3. Class Imbalance or Labeling Drift
- Defect types might be rare or underrepresented in training.
- Labels in the real factory may not match the assumptions made during training (e.g., missing defect boundaries, new defect types).
🧪 What to Explore
A. Real-vs-Validation Distribution Gap
Use embedding visualization (e.g., t-SNE, UMAP) to compare:
# Compare backbone embeddings of factory images vs. validation images.
# `model.extract_features` is a placeholder for whatever exposes penultimate-layer features.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

real_embed = model.extract_features(factory_images)
val_embed = model.extract_features(validation_images)
embeddings = np.concatenate([real_embed, val_embed])
domain = np.array([0] * len(real_embed) + [1] * len(val_embed))  # 0 = factory, 1 = val

proj = TSNE(n_components=2).fit_transform(embeddings)
plt.scatter(proj[:, 0], proj[:, 1], c=domain, cmap="coolwarm", s=5)
plt.show()  # clearly separated clusters => large domain gap
B. Visual Error Analysis
Manually inspect the false positives and false negatives:
- What kinds of defects are missed?
- Are background artifacts confusing the model?
- Are non-defect regions being misclassified?
C. Augmentation Mismatch
Check whether your augmentations reflect the variation seen on the factory line (a short augmentation sketch follows this list):
- Random lighting
- Motion blur
- Partial occlusions
- Low contrast
- Specular highlights
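A minimal sketch of factory-flavoured augmentations using torchvision transforms; the specific transforms and magnitudes are assumptions to tune against images captured on the actual line:
from torchvision import transforms

# Rough approximations of factory conditions; tune magnitudes to what you observe.
factory_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4),        # lighting / exposure drift
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),    # defocus or mild motion blur
    transforms.RandomPerspective(distortion_scale=0.2, p=0.3),   # camera pose / lens variation
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.3, scale=(0.02, 0.1)),          # partial occlusion
])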
🛠️ How to Fix It
1. Domain Adaptation Techniques
A. Data-centric fixes
- Collect real factory data, even if unlabeled
- Use self-supervised pretraining (e.g., MoCo, SimCLR, DINO) on factory images to initialize the backbone
- Add hard samples (false positives/negatives from factory) into retraining
B. Model-centric fixes
- Fine-tune on a small labeled subset of real factory data (a short fine-tuning sketch follows this list)
- Apply unsupervised domain adaptation (e.g., CORAL, MMD loss)
- Use domain generalization or test-time adaptation methods such as MixStyle, RandAugment, or Tent
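A minimal fine-tuning sketch, assuming a torchvision ResNet classifier: freeze the backbone and adapt only the head on the small labeled factory subset (`factory_loader` and `num_defect_classes` are placeholders):
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, num_defect_classes)  # num_defect_classes: placeholder

for p in model.parameters():            # freeze the backbone
    p.requires_grad = False
for p in model.fc.parameters():         # train only the new head
    p.requires_grad = True

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in factory_loader:   # small labeled factory subset (placeholder loader)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()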
2. Rebalancing & Curriculum Learning
- Add more defect types or underrepresented classes
- Use focal loss or a class-balanced loss (a minimal focal loss sketch follows this list)
- Start training on easy examples, then introduce harder (factory) cases
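For reference, a minimal binary focal loss sketch in PyTorch (following Lin et al., 2017); for multi-class defect labels you would swap the BCE term for cross-entropy and per-class weights:
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Down-weights easy examples so rare/hard defects dominate the gradient.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                   # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()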
3. Better Data Pipelines
- Normalize images with factory-specific statistics (per-channel mean/std computed from factory images; see the sketch after this list)
- Apply domain-specific augmentations (e.g., Gaussian blur, JPEG compression)
- Use test-time augmentation (TTA) during inference
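A small sketch for estimating factory-specific normalization statistics, assuming a PyTorch DataLoader (here called `factory_loader`) that yields float image tensors in [0, 1] with shape (B, C, H, W):
import torch

def factory_stats(factory_loader):
    # Per-channel running mean and mean-of-squares over all factory batches.
    n, mean, sq = 0, torch.zeros(3), torch.zeros(3)
    for batch in factory_loader:
        b = batch.size(0)
        mean += batch.mean(dim=(0, 2, 3)) * b
        sq += (batch ** 2).mean(dim=(0, 2, 3)) * b
        n += b
    mean /= n
    std = (sq / n - mean ** 2).sqrt()
    return mean, std  # plug into transforms.Normalize(mean, std)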
📈 Evaluation Protocol Fix
- Split real factory data into pseudo-validation, calibration, and test buckets (a split sketch follows this list).
- Perform cross-domain validation (train on lab, validate on factory).
- Try cross-camera, cross-time validation to simulate shift.
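One way to build those buckets without leakage, assuming you can tag each factory image with the capture session (or shift) it came from; GroupShuffleSplit keeps whole sessions inside a single bucket (`factory_paths` and `capture_sessions` are placeholders):
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

paths, sessions = np.array(factory_paths), np.array(capture_sessions)

gss = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
pseudo_val_idx, rest_idx = next(gss.split(paths, groups=sessions))

gss2 = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
calib_rel, test_rel = next(gss2.split(paths[rest_idx], groups=sessions[rest_idx]))
calib_idx, test_idx = rest_idx[calib_rel], rest_idx[test_rel]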
✅ Summary: Actions to Take
Category | Actions |
---|---|
Data | Collect hard cases from factory, augment with real-world noise |
Model | Fine-tune on real samples, apply domain adaptation |
Validation | Redefine validation strategy to include factory images |
Augmentation | Use domain-relevant augmentations (blur, lighting, occlusion, etc.) |
Loss Function | Try focal loss, mixup, or uncertainty-aware losses |
If you can share specific details (e.g., defect types, training vs. real data samples, model type), I can tailor these recommendations further.
Certainly. Let’s go deep into Unsupervised Domain Adaptation (UDA), especially in the context of image-based defect detection. You’ll get:
- The problem setup
- Mathematical formulation
- Key UDA techniques
- Example methods with intuition
- Implementation strategies
🔧 Problem Setup
You have:
- Source domain: labeled training data \(\mathcal{D}_S = \{(x_i^S, y_i^S)\}_{i=1}^N\)
- Target domain: unlabeled test (factory) data \(\mathcal{D}_T = \{x_j^T\}_{j=1}^M\)
Both domains have different distributions:
\[P_S(x, y) \ne P_T(x, y)\]
Your goal is to train a model that performs well on the target domain, even though you don’t have target labels.
📐 Objective of UDA
Learn a model $f(x) = y$ such that:
- It performs well on source ($\mathcal{D}_S$)
- It generalizes to target ($\mathcal{D}_T$)
This is done by aligning feature distributions between source and target so that the classifier trained on source features can work on target features.
🧠 Core UDA Strategies
UDA techniques can be categorized into 3 major approaches:
1. Feature Alignment
The goal is to make the feature distributions of source and target match.
A. Domain Adversarial Training (DANN)
Idea: Use a domain classifier to distinguish source vs target features, and confuse it with a feature extractor.
Architecture:
Image -> Feature Extractor -> (1) Classifier
                           -> (2) Domain Discriminator
- Use gradient reversal layer (GRL) to reverse gradients from the domain discriminator.
- Objective:
\[\min_{F,C} \max_D \left[ L_{\text{cls}}(C(F(x^S)), y^S) - \lambda L_{\text{dom}}(D(F(x)), d) \right]\]
where $d=0$ for source and $d=1$ for target.
This forces the feature extractor $F$ to learn domain-invariant representations.
B. CORAL (CORrelation ALignment)
Match second-order statistics (covariances) between source and target:
\[\text{CORAL loss} = \| \text{Cov}(F(x^S)) - \text{Cov}(F(x^T)) \|_F^2\]
It’s simple and effective for aligning distributions.
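A minimal PyTorch sketch of the CORAL penalty on a pair of feature batches of shape (batch, dim); the 1/(4d²) scaling follows the original paper and can be folded into the loss weight:
import torch

def coral_loss(source_feats, target_feats):
    # Match second-order statistics of source and target features.
    d = source_feats.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)

    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)  # squared Frobenius norm, scaled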
C. MMD (Maximum Mean Discrepancy)
Minimize the distance between distributions in RKHS:
\[\text{MMD}^2 = \| \mu_S - \mu_T \|_{\mathcal{H}}^2\]
where $\mu_S = \mathbb{E}[F(x^S)]$ and $\mu_T = \mathbb{E}[F(x^T)]$.
Used in methods like Deep Adaptation Networks (DAN).
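As a sketch, a single-kernel RBF MMD² estimate between two feature batches; the bandwidth sigma is an assumption (DAN uses a multi-kernel variant):
import torch

def mmd_rbf(source_feats, target_feats, sigma=1.0):
    # Biased MMD^2 estimate with one Gaussian kernel.
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

    k_ss = kernel(source_feats, source_feats).mean()
    k_tt = kernel(target_feats, target_feats).mean()
    k_st = kernel(source_feats, target_feats).mean()
    return k_ss + k_tt - 2 * k_st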
2. Self-Supervised Learning on Target
Even if the target has no labels, you can pretrain or co-train the feature extractor using self-supervised tasks:
- Rotation prediction
- Jigsaw puzzles
- BYOL, SimCLR, MoCo
- DINO (for vision transformers)
This helps structure the feature space on the target domain, improving generalization.
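As one concrete pretext task, a rotation-prediction sketch on unlabeled factory images; `backbone`, `feature_dim`, and `factory_images` are placeholders for your own feature extractor and data:
import torch
import torch.nn as nn

def rotation_batch(images):
    # images: (B, C, H, W) -> four rotated copies plus rotation labels 0..3.
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotated), labels

rot_head = nn.Linear(feature_dim, 4)          # 4-way rotation classifier head
criterion = nn.CrossEntropyLoss()
x, y = rotation_batch(factory_images)         # unlabeled factory batch
loss = criterion(rot_head(backbone(x)), y)    # gradients shape the backbone on factory data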
3. Pseudo-Labeling
Assign pseudo-labels to confident target predictions and retrain the model:
- Predict $\hat{y}_j = f(x_j^T)$
- Pick confident examples: $\max(\hat{y}_j) > \tau$
- Train on $(x_j^T, \hat{y}_j)$ as if labeled
Refine pseudo-labels iteratively (self-training).
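A minimal sketch of the confidence-thresholded selection step, assuming a softmax classifier `model` and a threshold τ = 0.9 (tune it; too low a threshold amplifies noisy labels):
import torch

@torch.no_grad()
def select_pseudo_labels(model, target_images, tau=0.9):
    # Keep only target images the current model is confident about.
    probs = torch.softmax(model(target_images), dim=1)
    confidence, pseudo = probs.max(dim=1)
    keep = confidence > tau
    return target_images[keep], pseudo[keep]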
🏗 Example: DANN (Domain Adversarial Neural Network)
import torch.nn as nn
from torch.autograd import Function

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        ...  # define a CNN backbone here
    def forward(self, x):
        return features  # backbone features, e.g. self.backbone(x)

class ClassClassifier(nn.Module):
    def forward(self, features):
        return logits  # defect-class logits from a small head

class DomainClassifier(nn.Module):
    def forward(self, features):
        return domain_logits  # source-vs-target logits

class GRL(Function):
    # Gradient reversal: identity in the forward pass, negated gradient in the backward pass.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output
Training loop:
# One step on a mixed source+target batch; `source_mask` (placeholder) selects the
# labeled source images, `domain_label` is 0 for source and 1 for target.
features = extractor(input)
class_output = class_classifier(features)
domain_output = domain_classifier(GRL.apply(features))   # gradient reversal into the extractor

# Losses: classification on labeled source images, domain loss on the whole batch
loss_class = nn.CrossEntropyLoss()(class_output[source_mask], label)
loss_domain = nn.BCEWithLogitsLoss()(domain_output.squeeze(1), domain_label.float())
loss_total = loss_class + lambda_ * loss_domain          # lambda_ weights the adversarial term
📊 Evaluation Metrics for UDA
- Target accuracy (if at least a few labeled factory samples are available for evaluation)
- t-SNE plots of source and target embeddings
- Domain classifier accuracy (ideally should be ~50%)
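One quick check: fit a simple linear probe to separate source from target embeddings; accuracy near chance (~0.5) suggests the adapted features are domain-invariant (a sketch, assuming numpy arrays `source_embed` / `target_embed` of adapted features):
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.concatenate([source_embed, target_embed])
y = np.concatenate([np.zeros(len(source_embed)), np.ones(len(target_embed))])

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, X, y, cv=5, scoring="accuracy").mean()
print(f"domain probe accuracy: {acc:.2f}")  # ~0.5 => source/target hard to tell apart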
✅ Summary
Technique | Goal | Example Methods |
---|---|---|
Feature Alignment | Align feature distributions | DANN, MMD, CORAL |
Self-supervision | Structure target domain features | SimCLR, MoCo, DINO |
Pseudo-labeling | Train on confident target preds | FixMatch, Mean Teacher |
If your factory setup differs a lot (e.g., different lighting or background), DANN or CORAL + self-supervised pretraining is often the most effective combo.
Would you like code implementations or papers for a specific method like DANN or MMD?