Performance and Profiling

This section presents the results of performance benchmarking conducted on the fivedreg package. The benchmarks evaluate training time, memory consumption, and model accuracy across varying dataset sizes.

Benchmark Methodology

The profiling was performed using synthetic 5-dimensional polynomial data with the following configuration:

  • Model Architecture: 3 hidden layers with 64, 32, and 16 neurons respectively

  • Learning Rate: 0.001

  • Max Iterations: 500 epochs

  • Dataset Sizes: 100, 1,000, 2,500, 5,000, 7,500, and 10,000 samples

  • Train/Test Split: 80/20

The synthetic target function used was a polynomial:

\[y = x_0^2 + 2 x_1 x_2 + x_3^2 + x_4 + 0.5 x_0 x_4 + \epsilon\]

where \(\epsilon \sim \mathcal{N}(0, 0.1)\) represents Gaussian noise.

Benchmark Results

Performance Metrics by Dataset Size

Size

Epochs

Train Time (s)

Train Mem (MiB)

Pred Mem (MiB)

MSE

100

1

0.51

25.70

1.09

5.53

-10.52

1,000

1

0.48

8.48

1.58

2.35

-2.97

2,500

1

0.51

5.89

1.27

0.20

0.64

5,000

1

0.51

3.97

2.03

0.17

0.69

7,500

1

0.55

4.34

1.64

0.13

0.77

10,000

1

0.55

4.16

1.61

0.03

0.94

Note

The “Epochs” column shows the actual epochs run during memory profiling (limited to 1 for profiling efficiency). Training time measurements were taken with the full 500 epochs to capture realistic training performance.

Visualizations

Performance benchmark plots showing training time, memory usage, MSE, and R² score vs dataset size

Key Findings

Training Time Performance

Training time remains remarkably constant across all dataset sizes, averaging approximately 0.5 seconds:

  • 100 samples: ~0.51 seconds

  • 10,000 samples: ~0.55 seconds

This near-constant training time demonstrates excellent scalability of the LightweightNN implementation, with TensorFlow efficiently handling batch operations regardless of dataset size within this range.

Memory Usage Patterns

Memory consumption shows an interesting pattern:

  • Training memory: Higher for small datasets (25.7 MiB at 100 samples), decreasing and stabilizing at 4–6 MiB for larger datasets (2,500+ samples)

  • Prediction memory: Consistent at approximately 1–2 MiB across all dataset sizes

The elevated memory usage for small datasets is likely due to fixed TensorFlow overhead representing a larger proportion of total memory. As dataset size increases, this overhead is amortized, resulting in more efficient memory utilization.

Model Accuracy

Model performance improves significantly with larger datasets:

  • Small datasets (100–1,000 samples): Poor fit with negative R² scores (-10.52 to -2.97), indicating the model underperforms compared to a mean baseline. This is expected given the complexity of the polynomial target function and insufficient training data.

  • Medium datasets (2,500–5,000 samples): Acceptable fit with R² between 0.64–0.69.

  • Large datasets (7,500–10,000 samples): Strong fit with R² reaching 0.94 and MSE dropping to 0.03.

Recommendations

Based on the profiling results:

  1. Dataset Size: For reliable predictions, use at least 7,500+ samples to achieve R² > 0.75. For production applications targeting R² > 0.90, aim for 10,000+ samples.

  2. Memory Planning: Expect approximately 4–6 MiB for training with datasets of 2,500+ samples. Smaller datasets may require up to 25 MiB due to fixed overhead.

  3. Training Time Budget: Training completes in approximately 0.5 seconds regardless of dataset size (up to 10,000 samples), making the model highly efficient for iterative development.

  4. Early Stopping: For production use, enable early stopping to potentially reduce training time further while maintaining accuracy.

Reproducing the Benchmarks

The benchmarking code is available in the fivedreg_profiling/ directory:

cd fivedreg_profiling
pip install -r requirements.txt
jupyter notebook profiling.ipynb

Run all cells to regenerate the benchmark results and plots.