Fish Species Classifier

Building and comparing two deep learning approaches to classify 31 species of fish from images

Posted on May 21, 2026

Overview

This project builds an image classification system that identifies 31 species of fish from a photograph. The dataset was sourced from Google Drive and restructured from scratch: the original directory contained separate train, test and validation folders, each with 31 category subdirectories, so all image paths were collected, shuffled and re-split at a 70/15/15 ratio to produce clean train, validation and test sets before any model was built.

Two completely separate deep learning pipelines were then developed on the same dataset: a Convolutional Neural Network built entirely from scratch, and a transfer learning model using EfficientNetB0 pre-trained on ImageNet. Building both models on identical data made it possible to directly compare their training behaviour, convergence speed, and test performance. This project is both a working classifier and an experiment in deep learning methodology.

The final transfer learning model is deployed as a live, interactive demo on Hugging Face Spaces. The link is accessible at the bottom of this page. Simply upload a fish image and receive a species prediction in real time.

Background 

Fish species identification has real-world applications in marine biology, fisheries management, and ecological monitoring. Manual identification requires specialist knowledge, but a classifier that works from a photograph could significantly reduce the time and cost involved in species surveys.

This project started as a learning exercise in convolutional neural networks, but quickly raised a more interesting question: how much does training from scratch actually cost compared to leveraging pre-trained weights from a large dataset like ImageNet? That question drove the decision to build both models rather than just one, and to hold the main training settings constant (same image size, same batch size, same epoch budget and same optimiser) so the comparison would be meaningful.

The specific questions guiding the project were:

  1. Can a custom CNN, with no pre-trained knowledge, learn to distinguish 31 visually similar fish species within 10 epochs?
  2. How does EfficientNetB0 with its ImageNet weights frozen compare on the same task and the same epoch budget?
  3. What do the training curves reveal about overfitting, convergence speed, and generalisation?
  4. Which model is worth deploying?

 

Tools

Tool/Library | How it was used
Python | Core language for all data work and model development
TensorFlow/Keras | Building, compiling, training, and evaluating both models
EfficientNetB0 | Pre-trained ImageNet backbone for the transfer learning model
ImageDataGenerator | Data augmentation pipeline for the scratch CNN (rotation, flips, zoom, brightness)
tf.keras.utils.image_dataset_from_directory | tf.data pipeline for the EfficientNet model
NumPy | Array manipulation and image preprocessing
Matplotlib | Plotting the training and validation accuracy/loss curves over epochs
Google Colab | Cloud GPU environment for training both models
Google Drive | Data storage and model checkpoint saving
Hugging Face Spaces | Live deployment of the final classifier as an interactive web app
Gradio | Building the prediction UI on Hugging Face Spaces
GitHub | Version control and public repository for both notebooks

 

Key Analysis Highlights

The project is structured in two parallel pipelines, each fully documented in its own Colab notebook. Both share the same dataset, image resolution, batch size, optimiser and epoch budget to ensure a fair comparison.

1. Dataset Preparation and Splits

The raw dataset contained 31 fish species across pre-divided folders. Rather than using those original splits, all image paths were collected using os.walk(), shuffled randomly, and re-divided into 70% training, 15% validation and 15% test. This produced consistent, reproducible splits that both models were trained and evaluated against.
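
The re-splitting step can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function names, the accepted file extensions, and the fixed seed are assumptions, and the label is taken from the folder that directly contains each image.

```python
import os
import random

# Assumed set of image file types; adjust to match the real dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_paths(root_dir):
    """Walk root_dir and return (path, label) pairs.

    The label is the name of the folder that directly contains the image,
    so train/Bangus/x.jpg and test/Bangus/y.jpg both get the label "Bangus".
    """
    samples = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                label = os.path.basename(dirpath)
                samples.append((os.path.join(dirpath, name), label))
    return samples


def split_samples(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle with a fixed seed and cut into train/validation/test lists."""
    shuffled = sorted(samples)  # fixed starting order, independent of os.walk
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly 15%
    return train, val, test
```

Sorting before shuffling and seeding the shuffle is one way to make the split repeatable across runs; the original notebook may have achieved this differently.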

The 31 fish species in the dataset are: 

Animal fish, Bangus, Big Head Carp, Black Spotted Barb, Catfish, Climbing Perch, Fourfinger Threadfin, Freshwater Eel, Glass Perchlet, Goby, Gold Fish, Gourami, Grass Carp, Green Spotted Puffer, Indian Carp, Indo-Pacific Tarpon, Jaguar Gapote, Janitor Fish, Knife Fish, Long-Snouted Pipefish, Mosquito Fish, Mudfish, Mullet, Pangasius, Perch, Scat Fish, Silver Barb, Silver Carp, Silver Perch, Snakehead, Tilapia.

 

2. CNN from Scratch - Architecture and Training

The scratch model was built as a Sequential network with a progressively deepening convolutional stack, going from 32 filters in the input layer to 64 in the first hidden layer and 128 in each of the remaining four hidden layers:

 

Layer | Configuration
Input Conv layer | Conv2D - 32 filters, (3x3), ReLU, same padding
Hidden Layer 1 | Conv2D - 64 filters, (3x3), ReLU → MaxPooling2D → Dropout (0.25)
Hidden Layer 2 | Conv2D - 128 filters, (3x3), ReLU → MaxPooling2D → Dropout (0.25)
Hidden Layer 3 | Conv2D - 128 filters, (3x3), ReLU → MaxPooling2D → Dropout (0.25)
Hidden Layer 4 | Conv2D - 128 filters, (3x3), ReLU → MaxPooling2D → Dropout (0.25)
Hidden Layer 5 | Conv2D - 128 filters, (3x3), ReLU → MaxPooling2D → Dropout (0.25)
Flatten | Converts 2D feature maps to a 1D vector
Dense | 128 units, ReLU
Dropout | 0.5 (final regularisation before the output layer)
Output | Dense - 31 units, Softmax
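
As a rough sketch, the layer table above maps onto Keras along these lines. The function name is hypothetical, and the padding of the hidden Conv2D layers and the default pool size are assumptions (Keras defaults), since the table only states "same" padding for the input layer.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_scratch_cnn(input_shape=(180, 180, 3), num_classes=31):
    """Six Conv2D layers (32, 64, 128 x 4) followed by a small dense head."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    ])
    # Five hidden blocks: Conv2D -> MaxPooling2D -> Dropout(0.25).
    # Assumption: default "valid" padding and 2x2 pooling in these blocks.
    for filters in (64, 128, 128, 128, 128):
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D())
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    # One-hot labels, hence categorical (not sparse) cross-entropy.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling build_scratch_cnn().summary() prints the layer shapes, which is a quick way to check that the stack matches the table.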

 

Training Configuration
Parameter | Value
Image size | 180 x 180 x 3
Batch size | 64
Max epochs | 10
Early stopping | val_loss, patience = 3, restores best weights
Optimizer | Adam (default learning rate)
Loss function | Categorical Cross-Entropy
Augmentation | Rotation ±15°, width/height shift ±20%, horizontal flip, zoom ±10%, brightness [0.8 - 1.2]

 

Augmentation was applied only to the training set through ImageDataGenerator. Validation and test images were rescaled to [0, 1] with no augmentation.

Notebook: Classification from Scratch 

 

 

3. Transfer Learning - EfficientNetB0

The second model used EfficientNetB0 loaded with weights="imagenet" and include_top=False, with the entire base frozen (base_model.trainable = False). Only the custom head on top was trained:

Layer | Configuration
EfficientNetB0 | Pre-trained ImageNet weights, fully frozen - used as a feature extractor
GlobalAveragePooling2D | Collapses spatial dimensions of the base output to a single vector
Dense | 128 units, ReLU
Dropout | 0.5
Output | Dense - 31 units, Softmax
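
A minimal sketch of this frozen-base setup is shown below. The function name and the weights argument are illustrative assumptions; weights="imagenet" downloads the pre-trained weights, while passing weights=None builds the same architecture without them (useful only for checking shapes).

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_transfer_model(num_classes=31, input_shape=(180, 180, 3),
                         weights="imagenet"):
    """Frozen EfficientNetB0 feature extractor with a small trainable head."""
    base_model = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape)
    base_model.trainable = False  # freeze every layer in the base

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps the base's BatchNormalization layers in inference mode.
    x = base_model(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    # Integer labels, hence sparse categorical cross-entropy.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With the base frozen, only the two Dense layers contribute trainable weights, which is why each epoch after the first is comparatively cheap.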

 

Training Configuration
Parameter | Value
Image size | 180 x 180 x 3
Batch size | 64
Max epochs | 10
Early stopping | val_loss, patience = 3, restores best weights
Optimizer | Adam (default learning rate)
Loss function | Sparse Categorical Cross-Entropy
Preprocessing | efficientnet.preprocess_input() applied via .map() on train and validation sets
Extra callbacks | ModelCheckpoint (save best only), BackupAndRestore (crash recovery)

 

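The dataset was loaded using tf.keras.utils.image_dataset_from_directory rather than ImageDataGenerator, providing a tf.data pipeline. EfficientNet's own preprocess_input() function was applied as a .map() step after loading so that inputs match what the ImageNet-trained base expects. In Keras this function is a pass-through, because the EfficientNet models rescale raw [0, 255] pixel values internally; the important point is that no additional 1/255 rescaling is applied before the base.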
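
The loading step might look roughly like this. It is a sketch under assumptions: the load_split name, the shuffle argument, and the prefetch call are illustrative and not taken from the notebook.

```python
import tensorflow as tf

IMAGE_SIZE = (180, 180)
BATCH_SIZE = 64


def load_split(directory, shuffle):
    """Load one split (one sub-folder per class) as a batched tf.data pipeline."""
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory,
        label_mode="int",        # integer labels for the sparse loss
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        shuffle=shuffle,
    )
    class_names = dataset.class_names  # read before .map() drops the attribute

    preprocess = tf.keras.applications.efficientnet.preprocess_input
    dataset = dataset.map(
        lambda images, labels: (preprocess(images), labels),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return dataset.prefetch(tf.data.AUTOTUNE), class_names
```

Reading class_names before mapping matters because the mapped dataset no longer exposes that attribute, and the same list is what later gets exported for deployment.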

Notebook: Classification using Pre-trained Model

 

 

4. Training Curves

Both models tracked training accuracy, validation accuracy, training loss, and validation loss across all epochs. These were plotted at the end of each notebook as a side-by-side Matplotlib figure with two subplots - accuracy on the left, loss on the right. 
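
A plotting helper along these lines produces that two-panel figure. The function name and styling are assumptions; the dictionary keys are the ones Keras records in history.history when the model is compiled with metrics=["accuracy"].

```python
import matplotlib.pyplot as plt


def plot_history(history, title):
    """Plot accuracy (left) and loss (right) for training and validation.

    `history` is a dict such as `history.history` returned by model.fit().
    """
    epochs = range(1, len(history["accuracy"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(epochs, history["accuracy"], label="Training")
    ax_acc.plot(epochs, history["val_accuracy"], label="Validation")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, history["loss"], label="Training")
    ax_loss.plot(epochs, history["val_loss"], label="Validation")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    fig.suptitle(title)
    return fig
```

In a notebook this would be called as plot_history(history.history, "CNN from scratch") after training finishes.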

CNN from Scratch -- Underfitting

The scratch model showed clear signs of underfitting throughout all 10 epochs. Training accuracy started at 13.61% and reached only 36.16% by epoch 10, while validation accuracy tracked slightly above it the entire time, ending at 38.97%. Both curves rose together at a slow steady pace with no sign of divergence between them.

Validation accuracy consistently exceeded training accuracy, and this is explained by the dropout layers. Dropout randomly deactivates neurons during training, suppressing the training accuracy score, but is switched off entirely at inference time, giving the validation pass a small but consistent advantage. This means the gap is not a sign of good generalisation; it reflects how heavily the model was regularised during training. With both curves still below 40% after 10 full epochs on a 31-class problem, the model simply did not have enough training time to learn the task well. Reducing the Dropout rates, training for more epochs, or simplifying the architecture to reduce the regularisation burden would likely improve results significantly.
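
The dropout effect can be illustrated with a toy NumPy example. This is not the project's model: it uses a single fixed linear layer on synthetic data, with all sizes chosen arbitrarily, purely to show that the same weights score lower when dropout noise is active than when it is switched off.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_features, num_classes = 2000, 64, 31

# Synthetic features and a fixed linear layer; labels are defined so that
# the layer classifies every sample correctly when no dropout is applied.
features = rng.normal(size=(num_samples, num_features))
weights = rng.normal(size=(num_features, num_classes))
labels = np.argmax(features @ weights, axis=1)


def accuracy(features, weights, labels, dropout_rate=0.0, rng=None):
    """Accuracy of the fixed layer, optionally with inverted dropout on inputs."""
    if dropout_rate > 0.0:
        keep_mask = rng.random(features.shape) >= dropout_rate
        # Inverted dropout: zero some inputs, rescale the rest.
        features = features * keep_mask / (1.0 - dropout_rate)
    predictions = np.argmax(features @ weights, axis=1)
    return float(np.mean(predictions == labels))


inference_accuracy = accuracy(features, weights, labels)
training_mode_accuracy = accuracy(features, weights, labels, 0.5, rng)
print(f"dropout off: {inference_accuracy:.2%}")
print(f"dropout on:  {training_mode_accuracy:.2%}")
```

The weights are identical in both calls; only the dropout noise differs, which mirrors why a validation pass can outscore the training pass of the same epoch.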

 

Epoch | Train Accuracy | Val Accuracy | Train Loss | Val Loss
1 | 13.61% | 13.62% | 3.3231 | 3.2806
3 | 17.86% | 19.67% | 3.1029 | 2.9411
5 | 24.59% | 29.42% | 2.7994 | 2.5549
7 | 30.30% | 38.97% | 2.5311 | 2.4543
10 | 36.16% | 38.97% | 2.2786 | 2.1589

 

Test set evaluation: Test Accuracy - 39.04% | Test Loss - 2.1835
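
For context, it helps to compare these figures with what pure guessing would give on 31 classes. The short calculation below is an added reference point, not something taken from the notebooks.

```python
import math

num_classes = 31

# A classifier that guesses uniformly at random is right 1 time in 31.
chance_accuracy = 1 / num_classes

# Cross-entropy of a uniform prediction over 31 classes is ln(31).
uniform_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.2%}")  # 3.23%
print(f"uniform loss:    {uniform_loss:.4f}")     # 3.4340
```

The epoch 1 training loss of 3.3231 sits close to ln(31), so the scratch model started out barely better than uniform guessing; its final test accuracy of 39.04% is roughly twelve times the chance level, but still far from a usable classifier.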

 

 

EfficientNetB0 -- Strong and Efficient Learning

The pre-trained model told a completely different story. By the end of epoch 1, validation accuracy had already reached 79.00%: after a single pass over the training data, the frozen EfficientNetB0 base was already producing rich, meaningful features that the classification head could use. Training accuracy started lower at 48.82% in epoch 1, partly because Dropout was active during training but disabled at inference, and partly because training accuracy is averaged over the whole epoch while the head is still learning, whereas validation accuracy is measured once at the end of the epoch.

From epoch 2 onwards both curves climbed steeply and consistently, with the model reaching 91.63% training accuracy and 94.48% validation accuracy by epoch 10. Loss dropped from 1.90 to 0.29 on the training set and from 0.84 to 0.20 on validation. The curves plateaued gradually and converged towards the end. This shows that the model was well-fitted and had learned the task without memorising it.

Note: Epoch 1 took 2,646 seconds while all subsequent epochs completed in 48 to 84 seconds. This is because the tf.data pipeline builds its internal cache on the first pass; from epoch 2 onwards the data was served from the cache, which is why the training time dropped dramatically.

 

Epoch | Train Accuracy | Val Accuracy | Train Loss | Val Loss
1 | 48.82% | 79.00% | 1.9014 | 0.8432
3 | 79.90% | 89.00% | 0.7053 | 0.4141
5 | 85.46% | 91.77% | 0.4951 | 0.3186
7 | 88.45% | 93.23% | 0.3996 | 0.2466
10 | 91.63% | 94.48% | 0.2877 | 0.2015

 

Test set evaluation: Test Accuracy - 94.43% | Test Loss - 0.2095

 

 

There is a stark contrast between the two models. After 10 epochs, the scratch CNN reached about 39% test accuracy while EfficientNetB0 reached about 94%. The difference is not primarily about architecture depth; it is about what the model already knows before training begins. The frozen EfficientNetB0 base had the visual features it learned from ImageNet already encoded in its weights. The scratch model, on the other hand, had to discover every edge, texture, and shape pattern from nothing, and 10 epochs was not enough time to do that for 31 classes.

 

5. Live Deployment

The trained EfficientNetB0 model was saved as fish_classifier_model.keras and the class names exported to class_names.json. Both files were uploaded to Hugging Face Spaces, where a Gradio interface was built to accept an uploaded image and return a species prediction.
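
On the output side, a Gradio Label component accepts a dictionary that maps class names to confidences. A helper like the one below could turn the model's softmax vector into that form; the function name and the top_k default are assumptions, and the image loading, resizing, and Gradio wiring are deliberately left out.

```python
import numpy as np


def top_predictions(probabilities, class_names, top_k=3):
    """Return the top_k classes as a {class_name: confidence} dictionary."""
    probabilities = np.asarray(probabilities, dtype=float).ravel()
    if probabilities.shape[0] != len(class_names):
        raise ValueError("probabilities and class_names differ in length")
    # Indices of the largest probabilities, highest first.
    top_indices = np.argsort(probabilities)[::-1][:top_k]
    return {class_names[i]: float(probabilities[i]) for i in top_indices}
```

In the deployed app, probabilities would come from model.predict() on one preprocessed image and class_names from class_names.json, in the same order used during training.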

Live Demo: https://johntemitope-fish-classifier.hf.space

 

Challenges and Learnings

  • Dataset restructuring before training: The original dataset was split into train/test/validation folders, but the splits were not verified for balance. Consolidating all images with os.walk() across nested category subdirectories, shuffling the full list, and re-splitting at 70/15/15 added work up front but gave confidence that both models were evaluated on the same, fairly distributed data. Path management across the nested folder structure also required care to avoid assigning images to the wrong category label.
  • Diagnosing underfitting in the scratch model: The scratch model's training curves showed both training and validation accuracy below 40% after 10 epochs, with validation accuracy sitting consistently above training accuracy. This was identified as underfitting rather than the more common overfitting, and it required reading the curves carefully. The heavy Dropout regularisation suppressed training accuracy below what the model was actually capable of at inference time, and the still-rising curves indicated that the model needed more epochs to learn 31 classes from scratch.
  • Colab session crashes during EfficientNet training: The EfficientNetB0 notebook included both ModelCheckpoint and BackupAndRestore callbacks specifically because Colab GPU sessions can disconnect mid-training without warning. A recovery cell was also written to reload the best saved checkpoint and resume training. This kind of safeguard is essential in any workflow that trains over multiple epochs.
  • Two different data pipeline approaches: The scratch model used ImageDataGenerator.flow_from_directory(), while the EfficientNet model used tf.keras.utils.image_dataset_from_directory() with a .map() preprocessing step. These handle pixel scaling differently: ImageDataGenerator rescales with rescale=1./255 inline, while the tf.data pipeline required a separate application of efficientnet.preprocess_input(), which leaves pixel values in the [0, 255] range because Keras' EfficientNet models rescale their inputs internally. Mixing these up (for example, feeding [0, 1] inputs to EfficientNetB0) would produce silently poor results with no obvious error message.
  • Matching preprocessing at inference time: When deploying to Hugging Face Spaces, the same preprocess_input() step applied during training had to be replicated exactly in the Gradio inference function. A mismatch here produces confident but wrong predictions, a subtle bug that required careful comparison of the training pipeline and the deployed code before the demo worked correctly.
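
The crash-recovery setup mentioned in the list above could be wired up roughly as follows. The function name, file name, and directory layout are hypothetical; in Colab the directory would typically point at a mounted Google Drive folder so that the files survive a disconnected session.

```python
import os
import tensorflow as tf


def build_recovery_callbacks(checkpoint_dir):
    """Callbacks that keep the best model and allow resuming after a crash."""
    # Saves the full model whenever val_loss improves on the best seen so far.
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(checkpoint_dir, "best_model.keras"),
        monitor="val_loss",
        save_best_only=True,
    )
    # Stores training state each epoch; re-running model.fit() with the same
    # backup_dir after an interruption resumes from the last completed epoch.
    backup = tf.keras.callbacks.BackupAndRestore(
        backup_dir=os.path.join(checkpoint_dir, "backup"),
    )
    return [checkpoint, backup]
```

These would be passed to model.fit(..., callbacks=build_recovery_callbacks(path)) alongside the EarlyStopping callback.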

 

Less Challenging Activities

  • Building the model architectures: The Keras API made both architectures straightforward to assemble. Each layer reads almost like a written specification, which made reviewing the design and adjusting it between experiments quick and clear.
  • Setting up EarlyStopping: Once the callback pattern was understood, adding patience=3 and restore_best_weights=True removed the need to manually decide when to stop training and ensured the saved model was always the best performing checkpoint rather than the final epoch.
  • Plotting training curves: Pulling history.history["accuracy"] and history.history["val_accuracy"] into a Matplotlib figure took only a few lines and made the behavioural difference between the two models visible. 
  • Saving and exporting the model: Keras' model.save() to .keras format and the accompanying class_names.json export were clean and reliable operations. Having both files made the Hugging Face deployment straightforward.
  • Data augmentation setup: ImageDataGenerator with rotation, shift, flip, zoom, and brightness parameters was simple to configure and required no custom code beyond setting the parameter values.
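
To make the EarlyStopping behaviour mentioned in the list above concrete, the patience rule can be simulated in plain Python. This is a simplified sketch of the logic (monitoring a loss with no minimum delta), not Keras' implementation, and the function name is made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch val_loss values.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; with restore_best_weights=True the model would be
    rolled back to the weights from best_epoch. Epochs are 1-indexed.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Loss improves until epoch 2, then stalls for three epochs.
print(simulate_early_stopping([1.0, 0.8, 0.9, 0.85, 0.95, 0.7]))  # (5, 2)
```

In both models here validation loss kept improving through epoch 10 in the reported tables, so a rule like this would not be expected to cut training short.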

 

Result / Outcome

Both models were capable of classifying fish species, but the results were not close. On the test set, the EfficientNetB0 transfer learning model achieved 94.43% accuracy (loss: 0.2095) in 10 epochs; the scratch CNN achieved only 39.04% accuracy (loss: 2.1835) in the same number of epochs and showed clear signs of underfitting throughout. These test results closely matched the validation results from training, confirming that both models generalised consistently to truly unseen data. The pre-trained model converged faster, used its epoch budget far more efficiently, and is the clear choice for deployment.

Working on the scratch model meant building a six-layer convolutional architecture from first principles, diagnosing underfitting from training curves, and understanding why validation accuracy can sit above training accuracy even when the model is struggling. These lessons come most directly from training from scratch rather than from using a pre-trained base. The main lesson was direct and observable: with limited data and a fixed epoch budget, starting from random weights puts a CNN at a severe disadvantage against a model that already knows how to see. Beyond the classification results, the project developed a full end-to-end pipeline covering raw dataset restructuring, augmented training, training curve analysis, model checkpointing, crash recovery and live deployment. The side-by-side comparison of models was the most valuable structural decision.

 

Live Demo: https://johntemitope-fish-classifier.hf.space

Hugging Face Space: https://huggingface.co/spaces/Johntemitope/Fish_Classifier

Code GitHub: https://github.com/John-Temitope/fish_classifier.git

Dataset: Google Drive