Performance and Profiling
=========================

This section presents the results of performance benchmarking conducted on the ``fivedreg`` package.
The benchmarks evaluate training time, memory consumption, and model accuracy across varying dataset sizes.

Benchmark Methodology
---------------------

The profiling was performed using synthetic 5-dimensional polynomial data with the following configuration:

- **Model Architecture**: 3 hidden layers with 64, 32, and 16 neurons respectively
- **Learning Rate**: 0.001
- **Max Iterations**: 500 epochs
- **Dataset Sizes**: 100, 1,000, 2,500, 5,000, 7,500, and 10,000 samples
- **Train/Test Split**: 80/20

The synthetic target function used was a polynomial:

.. math::

   y = x_0^2 + 2 x_1 x_2 + x_3^2 + x_4 + 0.5 x_0 x_4 + \epsilon

where :math:`\epsilon \sim \mathcal{N}(0, 0.1)` represents Gaussian noise.
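
A minimal sketch of how such data could be generated with NumPy is shown below. It is not the code used for the benchmarks: the function names, the uniform input range of [-1, 1], and the reading of 0.1 as the noise standard deviation (rather than the variance) are all assumptions made for illustration.

.. code-block:: python

   import numpy as np


   def make_synthetic_data(n_samples, noise_scale=0.1, seed=0):
       """Generate 5-D inputs and the noisy polynomial target defined above."""
       rng = np.random.default_rng(seed)
       # ASSUMPTION: inputs are drawn uniformly from [-1, 1].
       X = rng.uniform(-1.0, 1.0, size=(n_samples, 5))
       x0, x1, x2, x3, x4 = X.T
       y = x0**2 + 2 * x1 * x2 + x3**2 + x4 + 0.5 * x0 * x4
       # ASSUMPTION: 0.1 is treated as the standard deviation of the noise.
       y = y + rng.normal(0.0, noise_scale, size=n_samples)
       return X, y


   def split_80_20(X, y, seed=0):
       """Shuffle the samples and return an 80/20 train/test split."""
       rng = np.random.default_rng(seed)
       idx = rng.permutation(len(X))
       n_train = (len(X) * 4) // 5
       train, test = idx[:n_train], idx[n_train:]
       return X[train], X[test], y[train], y[test]


   X, y = make_synthetic_data(1000)
   X_train, X_test, y_train, y_test = split_80_20(X, y)
   print(X_train.shape, X_test.shape)  # (800, 5) (200, 5)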

Benchmark Results
-----------------

.. list-table:: Performance Metrics by Dataset Size
   :widths: 12 12 18 18 18 12 12
   :header-rows: 1

   * - Size
     - Epochs
     - Train Time (s)
     - Train Mem (MiB)
     - Pred Mem (MiB)
     - MSE
     - R²
   * - 100
     - 1
     - 0.51
     - 25.70
     - 1.09
     - 5.53
     - -10.52
   * - 1,000
     - 1
     - 0.48
     - 8.48
     - 1.58
     - 2.35
     - -2.97
   * - 2,500
     - 1
     - 0.51
     - 5.89
     - 1.27
     - 0.20
     - 0.64
   * - 5,000
     - 1
     - 0.51
     - 3.97
     - 2.03
     - 0.17
     - 0.69
   * - 7,500
     - 1
     - 0.55
     - 4.34
     - 1.64
     - 0.13
     - 0.77
   * - 10,000
     - 1
     - 0.55
     - 4.16
     - 1.61
     - 0.03
     - 0.94

.. note::

   The "Epochs" column shows the actual epochs run during memory profiling (limited to 1 for profiling efficiency).
   Training time measurements were taken with the full 500 epochs to capture realistic training performance.
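
Taking time and memory measurements in separate runs, as described in the note, can be sketched with the Python standard library. This is an illustration only: the profiling tool used for the benchmarks is not named in this section, ``tracemalloc`` tracks allocations made through Python's allocators and may miss memory allocated natively by TensorFlow, and both helper names are hypothetical.

.. code-block:: python

   import time
   import tracemalloc


   def time_call(fn, *args, **kwargs):
       """Return (result, elapsed_seconds) for a single call, without tracing."""
       start = time.perf_counter()
       result = fn(*args, **kwargs)
       return result, time.perf_counter() - start


   def peak_memory_mib(fn, *args, **kwargs):
       """Return (result, peak_MiB) of Python-level allocations during a call."""
       tracemalloc.start()
       try:
           result = fn(*args, **kwargs)
           _, peak_bytes = tracemalloc.get_traced_memory()
       finally:
           tracemalloc.stop()
       return result, peak_bytes / 2**20


   # Stand-in workload; in practice this would be the model's training call.
   def workload():
       return [0] * 1_000_000


   _, seconds = time_call(workload)
   _, peak = peak_memory_mib(workload)
   print(f"{seconds:.4f} s, peak {peak:.2f} MiB")

Keeping the two measurements apart avoids the slowdown that memory tracing adds to the timed run.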

Visualizations
--------------

.. image:: _static/benchmark_plots.png
   :alt: Performance benchmark plots showing training time, memory usage, MSE, and R² score vs dataset size
   :align: center
   :width: 100%

Key Findings
------------

Training Time Performance
^^^^^^^^^^^^^^^^^^^^^^^^^

Training time is nearly constant across all dataset sizes, ranging from **0.48 to 0.55 seconds**:

- 100 samples: 0.51 seconds
- 10,000 samples: 0.55 seconds

A 100-fold increase in data adds less than 0.1 seconds, which suggests that fixed per-run overhead dominates training time in this range.
The ``LightweightNN`` implementation therefore scales well up to 10,000 samples, with TensorFlow handling batch operations efficiently; these results do not show how training time behaves beyond that size.

Memory Usage Patterns
^^^^^^^^^^^^^^^^^^^^^

Training and prediction memory behave differently as dataset size grows:

- **Training memory**: Highest for the smallest dataset (25.7 MiB at 100 samples), falling to 8.48 MiB at 1,000 samples and stabilizing at 4–6 MiB for 2,500 samples and above
- **Prediction memory**: Consistently 1–2 MiB across all dataset sizes

The elevated training memory for small datasets is likely due to fixed TensorFlow overhead (for example, one-time initialization and buffer allocation) rather than the data itself: 100 samples with five features occupy only a few kilobytes.
For datasets of 2,500 samples and above, measured training memory stays within a narrow 4–6 MiB band.

Model Accuracy
^^^^^^^^^^^^^^

Model performance improves significantly with larger datasets:

- **Small datasets (100–1,000 samples)**: Poor fit with negative R² scores (-10.52 to -2.97), indicating the model underperforms compared to a mean baseline.
  This is expected given the complexity of the polynomial target function and insufficient training data.

- **Medium datasets (2,500–5,000 samples)**: Acceptable fit with R² between 0.64 and 0.69.

- **Large datasets (7,500–10,000 samples)**: Strong fit, with R² of 0.77 at 7,500 samples rising to 0.94 at 10,000 samples, where MSE drops to 0.03.
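
To make the meaning of a negative R² concrete, the sketch below computes MSE and R² from their standard definitions with NumPy. The helper names and the toy values are illustrative only and are not taken from the benchmark data.

.. code-block:: python

   import numpy as np


   def mse(y_true, y_pred):
       """Mean squared error."""
       y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
       return float(np.mean((y_true - y_pred) ** 2))


   def r2_score(y_true, y_pred):
       """Coefficient of determination: 1 - SS_res / SS_tot."""
       y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
       ss_res = np.sum((y_true - y_pred) ** 2)
       ss_tot = np.sum((y_true - y_true.mean()) ** 2)
       return float(1.0 - ss_res / ss_tot)


   y_true = [1.0, 2.0, 3.0, 4.0]

   # Always predicting the mean gives R² = 0 by construction.
   print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0

   # Predictions worse than the mean baseline give a negative R².
   print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0
   print(mse(y_true, [4.0, 3.0, 2.0, 1.0]))       # 5.0

Because R² compares the residual error against the variance of the targets, it has no lower bound, which is how values such as -10.52 can arise for a poorly fitted model.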

Recommendations
---------------

Based on the profiling results:

1. **Dataset Size**: For reliable predictions, use at least 7,500 samples, which achieved R² > 0.75 in these benchmarks. For production applications targeting R² > 0.90, aim for 10,000 or more samples.

2. **Memory Planning**: Expect approximately 4–6 MiB for training with datasets of 2,500 or more samples. Smaller datasets may require up to about 26 MiB due to fixed overhead.

3. **Training Time Budget**: Training completes in approximately **0.5 seconds** regardless of dataset size (up to 10,000 samples), making the model highly efficient for iterative development.

4. **Early Stopping**: For production use, enable early stopping to potentially reduce training time further while maintaining accuracy.
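
The early-stopping idea in the last recommendation can be sketched independently of any framework. The class below is a hypothetical illustration of patience-based stopping on a validation loss; it is not the ``fivedreg`` API, and how the package exposes early stopping is not covered in this section.

.. code-block:: python

   class EarlyStopper:
       """Signal a stop when validation loss fails to improve for `patience` epochs."""

       def __init__(self, patience=10, min_delta=0.0):
           self.patience = patience
           self.min_delta = min_delta
           self.best = float("inf")
           self.bad_epochs = 0

       def should_stop(self, val_loss):
           if val_loss < self.best - self.min_delta:
               # Improvement: record it and reset the counter.
               self.best = val_loss
               self.bad_epochs = 0
           else:
               self.bad_epochs += 1
           return self.bad_epochs >= self.patience


   # Made-up validation losses, for illustration only.
   losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]
   stopper = EarlyStopper(patience=3)
   stopped_at = None
   for epoch, loss in enumerate(losses):
       if stopper.should_stop(loss):
           stopped_at = epoch
           break
   print(stopped_at, stopper.best)  # 5 0.7

With a patience of 3, training in this toy run halts at epoch 5, after three consecutive epochs without improvement on the best loss of 0.7.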

Reproducing the Benchmarks
--------------------------

The benchmarking code is available in the ``fivedreg_profiling/`` directory:

.. code-block:: bash

   cd fivedreg_profiling
   pip install -r requirements.txt
   jupyter notebook profiling.ipynb

Run all cells to regenerate the benchmark results and plots.