hi-folks/statistics
===================

PHP package that provides functions for calculating mathematical statistics of numeric data.

Latest version: v1.5.0. License: MIT. Requires PHP ^8.2|^8.3|^8.4|8.5.

 [![PHP package for Statistics](https://repository-images.githubusercontent.com/445609326/e2539776-0f8f-4556-be1d-887ea2368813)](https://repository-images.githubusercontent.com/445609326/e2539776-0f8f-4556-be1d-887ea2368813)

 Statistics PHP package
========================

 [ ![Latest Version on Packagist](https://camo.githubusercontent.com/377901666648f560460dde5b76051515e45e47a300855b5dbb7edf6384da2c0b/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f68692d666f6c6b732f737461746973746963732e7376673f7374796c653d666f722d7468652d6261646765) ](https://packagist.org/packages/hi-folks/statistics) [ ![Total Downloads](https://camo.githubusercontent.com/42505c451c2126b6f8ec6f8c3e3bc3b7493ac64c4a5940951941f18fccb95cf3/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f68692d666f6c6b732f737461746973746963732e7376673f7374796c653d666f722d7468652d6261646765) ](https://packagist.org/packages/hi-folks/statistics)
 [ ![Static Code analysis](https://camo.githubusercontent.com/b530001ce5fa01a64adb4686abe5ca9deab65652b9d4195bdbcd5261406a6784/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5048505374616e2d6c6576656c253230382d627269676874677265656e2e7376673f7374796c653d666f722d7468652d6261646765) ](https://github.com/Hi-Folks/statistics/blob/main/.github/workflows/static-code-analysis.yml) [![Packagist License](https://camo.githubusercontent.com/4f9fee033dacc9a439a6511125288be0af997caab560d67f15184be238c48e2e/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765)](https://camo.githubusercontent.com/4f9fee033dacc9a439a6511125288be0af997caab560d67f15184be238c48e2e/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765)
 [![Packagist PHP Version Support](https://camo.githubusercontent.com/f01f090a482a8f572e047d21db3192c99cff662678347f6dd08459c1f9a6ea09/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765)](https://camo.githubusercontent.com/f01f090a482a8f572e047d21db3192c99cff662678347f6dd08459c1f9a6ea09/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765) [![GitHub last commit](https://camo.githubusercontent.com/fd33088caa03cb44ed0ae4b7b53c65b5c9f52bebadc5c457ceb2bf10263a5048/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6173742d636f6d6d69742f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765)](https://camo.githubusercontent.com/fd33088caa03cb44ed0ae4b7b53c65b5c9f52bebadc5c457ceb2bf10263a5048/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6173742d636f6d6d69742f68692d666f6c6b732f737461746973746963733f7374796c653d666f722d7468652d6261646765)

 [ ![Tests](https://github.com/hi-folks/statistics/actions/workflows/run-tests.yml/badge.svg?branch=main&style=for-the-badge) ](https://github.com/hi-folks/statistics/actions/workflows/run-tests.yml)

  *A PHP package for descriptive statistics, normal distribution, outlier detection, and streaming analytics on numeric data.*

This package provides a comprehensive set of statistical functions for PHP: descriptive statistics (mean, median, mode, standard deviation, variance, quantiles), robust measures (trimmed mean, weighted median, median absolute deviation), distribution modelling (normal distribution with PDF, CDF, and inverse CDF), outlier detection (z-score and IQR-based), z-scores, percentiles, coefficient of variation, frequency tables, correlation, regression (linear, logarithmic, power, and exponential), kernel density estimation, and O(1) memory streaming statistics.

It works with any numeric dataset — from sports telemetry and sensor data to race results, survey responses, and financial time series.

**Articles and resources:**

- [Exploring Olympic Downhill Results with PHP Statistics](https://dev.to/robertobutti/exploring-olympic-downhill-results-with-php-statistics-3eo1) — a step-by-step analysis of 2026 Olympic downhill race data
- [Statistics with PHP](https://dev.to/robertobutti/statistics-with-php-4pfp) — introduction to the package and its core functions
- [PHP Statistics on Laravel News](https://laravel-news.com/php-statistics)

> This package is inspired by the [Python statistics module](https://docs.python.org/3/library/statistics.html)

Installation
------------

You can install the package via composer:

```
composer require hi-folks/statistics
```

Usage
-----

### Stat class

The `Stat` class provides methods for calculating mathematical statistics of numeric data, such as an average or typical value from a population or sample. The available methods are listed below:

| Mathematical Statistic | Description |
| --- | --- |
| `mean()` | arithmetic mean or "average" of data |
| `fmean()` | floating-point arithmetic mean, with optional weighting and precision |
| `trimmedMean()` | trimmed (truncated) mean: mean after removing outliers from each side |
| `median()` | median or "middle value" of data |
| `weightedMedian()` | weighted median: median with weights, where each value has a different importance |
| `medianLow()` | low median of data |
| `medianHigh()` | high median of data |
| `medianGrouped()` | median of grouped data, using interpolation |
| `mode()` | single mode (most common value) of discrete or nominal data |
| `multimode()` | list of modes (most common values) of discrete or nominal data |
| `quantiles()` | cut points dividing the range of a probability distribution into continuous intervals with equal probabilities (supports `exclusive` and `inclusive` methods) |
| `thirdQuartile()` | third quartile: the value below which 75 percent of the data falls |
| `firstQuartile()` | first quartile: the value below which 25 percent of the data falls |
| `percentile()` | value at any percentile (0–100) with linear interpolation |
| `pstdev()` | population standard deviation |
| `stdev()` | sample standard deviation |
| `sem()` | standard error of the mean (SEM): measures precision of the sample mean |
| `meanAbsoluteDeviation()` | mean absolute deviation (MAD): average distance from the mean |
| `medianAbsoluteDeviation()` | median absolute deviation: median distance from the median, robust to outliers |
| `pvariance()` | variance for a population (supports pre-computed mean via `mu`) |
| `variance()` | variance for a sample (supports pre-computed mean via `xbar`) |
| `skewness()` | adjusted Fisher-Pearson sample skewness |
| `pskewness()` | population (biased) skewness |
| `kurtosis()` | excess kurtosis (sample formula, 0 for normal distribution) |
| `coefficientOfVariation()` | coefficient of variation (CV%), relative dispersion as percentage |
| `zscores()` | z-scores for each value: how many standard deviations from the mean |
| `outliers()` | outlier detection based on z-score threshold |
| `iqrOutliers()` | outlier detection based on IQR method (box plot whiskers), robust for skewed data |
| `geometricMean()` | geometric mean |
| `harmonicMean()` | harmonic mean |
| `correlation()` | Pearson's or Spearman's rank correlation coefficient for two inputs |
| `covariance()` | the sample covariance of two inputs |
| `linearRegression()` | return the slope and intercept of simple linear regression parameters estimated using ordinary least squares (supports `proportional: true` for regression through the origin) |
| `logarithmicRegression()` | logarithmic regression: fits `y = a × ln(x) + b`, ideal for diminishing returns patterns (e.g., athletic improvement, learning curves) |
| `powerRegression()` | power regression: fits `y = a × x^b`, useful for power law relationships |
| `exponentialRegression()` | exponential regression: fits `y = a × e^(b×x)`, useful for exponential growth or decay |
| `rSquared()` | coefficient of determination (R²): proportion of variance explained by linear regression |
| `confidenceInterval()` | confidence interval for the mean using the normal (z) distribution |
| `zTest()` | one-sample Z-test: tests whether the sample mean differs significantly from a hypothesized population mean |
| `tTest()` | one-sample t-test: like z-test but appropriate for small samples where the population standard deviation is unknown |
| `tTestTwoSample()` | two-sample independent t-test (Welch's): compares the means of two independent groups without assuming equal variances |
| `tTestPaired()` | paired t-test: tests whether the mean difference between paired observations is significantly different from zero |
| `kde()` | kernel density estimation: returns a closure that estimates the probability density (or CDF) at any point |
| `kdeRandom()` | random sampling from a kernel density estimate: returns a closure that generates random floats from the KDE distribution |

#### Stat::mean( array $data )

Return the sample arithmetic mean of the array *$data*. The arithmetic mean is the sum of the data divided by the number of data points. It is commonly called “the average”, although it is only one of many mathematical averages. It is a measure of the central location of the data.

```
use HiFolks\Statistics\Stat;
$mean = Stat::mean([1, 2, 3, 4, 4]);
// 2.8
$mean = Stat::mean([-1.0, 2.5, 3.25, 5.75]);
// 2.625
```

#### Stat::fmean( array $data, array|null $weights = null, int|null $precision = null )

Return the arithmetic mean of the array `$data` as a float, with optional weights and precision control. This function behaves like `mean()` but always returns a floating-point result and supports weighted datasets. If `$weights` is provided, it computes the weighted average. The result is rounded to `$precision` decimal places. If `$precision` is null, no rounding is applied, which may produce results with long or unexpected decimal expansions because of floating-point arithmetic in PHP. Rounding helps ensure cleaner, more predictable output.

```
use HiFolks\Statistics\Stat;

// Unweighted mean (same as mean but always float)
$fmean = Stat::fmean([3.5, 4.0, 5.25]);
// 4.25

// Weighted mean
$fmean = Stat::fmean([3.5, 4.0, 5.25], [1, 2, 1]);
// 4.1875

// Custom precision
$fmean = Stat::fmean([3.5, 4.0, 5.25], null, 2);
// 4.25

$fmean = Stat::fmean([3.5, 4.0, 5.25], [1, 2, 1], 3);
// 4.188
```

If the input is empty, or weights are invalid (e.g., length mismatch or sum is zero), an exception is thrown. Use this function when you need floating-point accuracy or to apply custom weighting and rounding to your average.
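
The weighted case can be written out by hand to see what is being computed. The following is a minimal plain-PHP sketch, not the package's implementation; `weightedMeanSketch()` is a hypothetical helper introduced only for illustration.

```php
<?php
// Hypothetical helper: weighted arithmetic mean = sum(x_i * w_i) / sum(w_i).
function weightedMeanSketch(array $data, array $weights): float
{
    if ($data === [] || count($data) !== count($weights)) {
        throw new InvalidArgumentException('Data and weights must be non-empty and have the same length.');
    }
    $weightSum = array_sum($weights);
    if ($weightSum == 0) {
        throw new InvalidArgumentException('The sum of the weights must not be zero.');
    }
    // Multiply each value by its weight, position by position.
    $products = array_map(fn ($x, $w) => $x * $w, $data, $weights);

    return array_sum($products) / $weightSum;
}

$weighted = weightedMeanSketch([3.5, 4.0, 5.25], [1, 2, 1]);
// (3.5 * 1 + 4.0 * 2 + 5.25 * 1) / 4 = 4.1875
$rounded = round($weighted, 3);
// 4.188
```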

#### Stat::trimmedMean( array $data, float $proportionToCut = 0.1, ?int $round = null )

Return the trimmed (truncated) mean of the data. Computes the mean after removing the lowest and highest fraction of values. This is a robust measure of central tendency, less sensitive to outliers than the regular mean.

The `$proportionToCut` parameter specifies the fraction to trim from **each** side (must be in the range `[0, 0.5)`). For example, `0.1` removes the bottom 10% and top 10%.

```
use HiFolks\Statistics\Stat;
$mean = Stat::trimmedMean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.1);
// 5.5 (outlier 100 and lowest value 1 removed)

$mean = Stat::trimmedMean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.2);
// 5.5 (removes 2 values from each side)

$mean = Stat::trimmedMean([1, 2, 3, 4, 5], 0.0);
// 3.0 (no trimming, same as regular mean)
```
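
To make the trimming step concrete, here is a minimal plain-PHP sketch. It assumes the number of values removed from each side is `floor(n * proportionToCut)`; that rounding rule and the helper name `trimmedMeanSketch()` are assumptions for illustration, not taken from the package source.

```php
<?php
// Hypothetical helper: drop floor(n * proportion) values from each end, then average the rest.
function trimmedMeanSketch(array $data, float $proportionToCut = 0.1): float
{
    if ($data === [] || $proportionToCut < 0 || $proportionToCut >= 0.5) {
        throw new InvalidArgumentException('Need data and a proportion in [0, 0.5).');
    }
    sort($data);
    $n = count($data);
    // Assumption: the number of values cut per side is rounded down.
    $cut = (int) floor($n * $proportionToCut);
    $kept = array_slice($data, $cut, $n - 2 * $cut);

    return array_sum($kept) / count($kept);
}

$t1 = trimmedMeanSketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 0.1); // keeps 2..9, mean 5.5
$t2 = trimmedMeanSketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.2);  // keeps 3..8, mean 5.5
$t3 = trimmedMeanSketch([1, 2, 3, 4, 5], 0.0);                  // no trimming, mean 3.0
```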

#### Stat::geometricMean( array $data )

The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean which uses their sum).

```
use HiFolks\Statistics\Stat;
$mean = Stat::geometricMean([54, 24, 36], 1);
// 36.0 (the second argument rounds the result to 1 decimal place)
```
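
As an illustration of "using the product of the values", the geometric mean can be computed as the exponential of the mean of the logarithms, which equals the n-th root of the product. This is a plain-PHP sketch with a hypothetical helper name, not the package's implementation.

```php
<?php
// Hypothetical helper: geometric mean as exp(mean(ln(x))).
function geometricMeanSketch(array $data): float
{
    if ($data === []) {
        throw new InvalidArgumentException('Data must not be empty.');
    }
    foreach ($data as $value) {
        if ($value <= 0) {
            throw new InvalidArgumentException('This sketch only handles positive values.');
        }
    }
    $logSum = array_sum(array_map(fn ($x) => log($x), $data));

    return exp($logSum / count($data));
}

$gm = geometricMeanSketch([54, 24, 36]);
// approximately 36.0, because 54 * 24 * 36 = 46656 = 36 ** 3
```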

#### Stat::harmonicMean( array $data )

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data. For example, the harmonic mean of three values a, b, and c is 3/(1/a + 1/b + 1/c). If one of the values is zero, the result is zero.

```
use HiFolks\Statistics\Stat;
$mean = Stat::harmonicMean([40, 60], null, 1);
// 48.0
```

You can also calculate the weighted harmonic mean. Suppose a car travels at 40 km/h for 5 km and, when traffic clears, speeds up to 60 km/h for the remaining 30 km of the journey. What is the average speed?

```
use HiFolks\Statistics\Stat;
Stat::harmonicMean([40, 60], [5, 30], 1);
// 56.0
```

where:

- `[40, 60]` are the elements (the speeds)
- `[5, 30]` are the weights, one per element in the same order (the distances)
- `1` is the number of decimal places to round the result to
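
The arithmetic behind the two results above can be checked with a short plain-PHP sketch. `weightedHarmonicMeanSketch()` is a hypothetical helper, not part of the package; it divides the total weight by the sum of weight/value ratios.

```php
<?php
// Hypothetical helper: weighted harmonic mean = sum(w_i) / sum(w_i / x_i).
function weightedHarmonicMeanSketch(array $data, ?array $weights = null): float
{
    if ($data === []) {
        throw new InvalidArgumentException('Data must not be empty.');
    }
    // Without weights, every value counts once.
    $weights ??= array_fill(0, count($data), 1);
    if (count($data) !== count($weights)) {
        throw new InvalidArgumentException('Data and weights must have the same length.');
    }
    if (in_array(0, $data)) {
        return 0.0; // a zero value forces the harmonic mean to zero
    }
    $ratios = array_map(fn ($x, $w) => $w / $x, $data, $weights);

    return array_sum($weights) / array_sum($ratios);
}

$speed = weightedHarmonicMeanSketch([40, 60], [5, 30]);
// (5 + 30) / (5 / 40 + 30 / 60) = 35 / 0.625 = 56.0
$plain = weightedHarmonicMeanSketch([40, 60]);
// 2 / (1 / 40 + 1 / 60), about 48.0
```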

#### Stat::median( array $data )

Return the median (middle value) of numeric data, using the common “mean of middle two” method.

```
use HiFolks\Statistics\Stat;
$median = Stat::median([1, 3, 5]);
// 3
$median = Stat::median([1, 3, 5, 7]);
// 4
```

#### Stat::weightedMedian( array $data, array $weights, ?int $round = null )

Return the weighted median of the data. The weighted median is the value where the cumulative weight reaches 50% of the total weight. This is useful for survey data, financial analysis, or any dataset where observations have different importance.

All weights must be positive numbers and the weights array must have the same length as the data array.

```
use HiFolks\Statistics\Stat;
$median = Stat::weightedMedian([1, 2, 3], [1, 1, 1]);
// 2.0 (equal weights, same as regular median)

$median = Stat::weightedMedian([1, 2, 3], [1, 1, 10]);
// 3.0 (heavy weight on 3 pulls the median)

$median = Stat::weightedMedian([1, 2, 3, 4], [1, 1, 1, 1]);
// 2.5 (equal weights, even count — averages the two middle values)
```

#### Stat::medianLow( array $data )

Return the low median of numeric data. The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.

```
use HiFolks\Statistics\Stat;
$median = Stat::medianLow([1, 3, 5]);
// 3
$median = Stat::medianLow([1, 3, 5, 7]);
// 3
```

#### Stat::medianHigh( array $data )

Return the high median of data. The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.

```
use HiFolks\Statistics\Stat;
$median = Stat::medianHigh([1, 3, 5]);
// 3
$median = Stat::medianHigh([1, 3, 5, 7]);
// 5
```

#### Stat::medianGrouped( array $data, float $interval = 1.0 )

Estimate the median for numeric data that has been grouped or binned around the midpoints of consecutive, fixed-width intervals. The `$interval` parameter specifies the width of each bin (default `1.0`). This function uses interpolation within the median interval, assuming values are evenly distributed across each bin.

```
use HiFolks\Statistics\Stat;
$median = Stat::medianGrouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]);
// 3.7
$median = Stat::medianGrouped([1, 3, 3, 5, 7]);
// 3.25
$median = Stat::medianGrouped([1, 3, 3, 5, 7], 2);
// 3.5
```

For example, demographic data summarized into ten-year age groups:

```
use HiFolks\Statistics\Stat;
// 172 people aged 20-30, 484 aged 30-40, 387 aged 40-50, etc.
$data = array_merge(
    array_fill(0, 172, 25),
    array_fill(0, 484, 35),
    array_fill(0, 387, 45),
    array_fill(0, 22, 55),
    array_fill(0, 6, 65),
);
round(Stat::medianGrouped($data, 10), 1);
// 37.5
```
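
The interpolation can be sketched in plain PHP as `L + interval * (n/2 - cf) / f`, where `L` is the lower limit of the median bin, `cf` the number of values below that bin, and `f` the number of values in it. This follows the approach of Python's `statistics.median_grouped`, which inspired the package; whether the package computes it exactly this way is an assumption, and `medianGroupedSketch()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: grouped median via interpolation inside the median bin.
function medianGroupedSketch(array $data, float $interval = 1.0): float
{
    if ($data === []) {
        throw new InvalidArgumentException('Data must not be empty.');
    }
    sort($data);
    $n = count($data);
    $midpoint = $data[intdiv($n, 2)];      // midpoint of the bin holding the median
    $lower = $midpoint - $interval / 2;    // lower limit L of that bin
    $below = count(array_filter($data, fn ($v) => $v < $midpoint));  // cf
    $inBin = count(array_filter($data, fn ($v) => $v == $midpoint)); // f

    return $lower + $interval * ($n / 2 - $below) / $inBin;
}

$g1 = medianGroupedSketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]);
// 3.5 + 1 * (5 - 4) / 5 = 3.7
$g2 = medianGroupedSketch([1, 3, 3, 5, 7], 2);
// 2 + 2 * (2.5 - 1) / 2 = 3.5
```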

#### Stat::quantiles( array $data, $n=4, $round=null, $method='exclusive' )

Divide data into n continuous intervals with equal probability. Returns a list of n - 1 cut points separating the intervals. Set n to 4 for quartiles (the default), 10 for deciles, or 100 for percentiles, which gives the 99 cut points that separate the data into 100 equal-sized groups.

The `$method` parameter controls the interpolation method:

- `'exclusive'` (default): uses `m = count + 1`. Suitable for sampled data that may have more extreme values beyond the sample.
- `'inclusive'`: uses `m = count - 1`. Suitable for population data or samples known to include the most extreme values. The minimum value is treated as the 0th percentile and the maximum as the 100th percentile.

```
use HiFolks\Statistics\Stat;
$quantiles = Stat::quantiles([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
// [ 55.0, 88.0, 92.0 ]
$quantiles = Stat::quantiles([
    105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110, 100, 75, 105, 103, 109, 76,
    119, 99, 91, 103, 129, 106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86, 111,
    75, 87, 102, 121, 111, 88, 89, 101, 106, 95, 103, 107, 101, 81, 109, 104,
], 10);
// [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]

// Inclusive method
$quantiles = Stat::quantiles([1, 2, 3, 4, 5], method: 'inclusive');
// [2.0, 3.0, 4.0]
```
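
The difference between the two methods is easier to see in code. The sketch below mirrors the integer-arithmetic interpolation used by Python's `statistics.quantiles` (the module that inspired this package); that the package follows the same steps is an assumption, and `quantilesSketch()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: cut points for $n intervals using m = count + 1 or m = count - 1.
function quantilesSketch(array $data, int $n = 4, string $method = 'exclusive'): array
{
    sort($data);
    $count = count($data);
    if ($count < 2 || $n < 1) {
        throw new InvalidArgumentException('Need at least two data points and n >= 1.');
    }
    $m = $method === 'inclusive' ? $count - 1 : $count + 1;
    $cuts = [];
    for ($i = 1; $i < $n; $i++) {
        $j = intdiv($i * $m, $n);
        if ($method === 'inclusive') {
            $delta = $i * $m - $j * $n;
            $cuts[] = ($data[$j] * ($n - $delta) + $data[$j + 1] * $delta) / $n;
        } else {
            $j = max(1, min($count - 1, $j)); // clamp so both neighbours exist
            $delta = $i * $m - $j * $n;
            $cuts[] = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
        }
    }

    return $cuts;
}

$exclusive = quantilesSketch([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
// [55, 88, 92]
$inclusive = quantilesSketch([1, 2, 3, 4, 5], 4, 'inclusive');
// [2, 3, 4]
```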

#### Stat::firstQuartile( array $data, $round=null )

The lower quartile, or first quartile (Q1), is the value under which 25% of data points are found when they are arranged in increasing order.

```
use HiFolks\Statistics\Stat;
$percentile = Stat::firstQuartile([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
// 55.0
```

#### Stat::thirdQuartile( array $data, $round=null )

The upper quartile, or third quartile (Q3), is the value under which 75% of data points are found when arranged in increasing order.

```
use HiFolks\Statistics\Stat;
$percentile = Stat::thirdQuartile([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
// 92.0
```

#### Stat::percentile( array $data, float $p, ?int $round = null )

Return the value at the given percentile of the data, using linear interpolation between adjacent data points (exclusive method, consistent with `quantiles()`).

The percentile `$p` must be between 0 and 100. Requires at least 2 data points.

```
use HiFolks\Statistics\Stat;
$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 50);
// 55.0 (median)

$value = Stat::percentile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 90);
// 91.0
```

#### Stat::pstdev( array $data )

Return the **Population** Standard Deviation, a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

```
use HiFolks\Statistics\Stat;
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 0.986893273527251
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 0.9869
```

#### Stat::stdev( array $data )

Return the **Sample** Standard Deviation, a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

```
use HiFolks\Statistics\Stat;
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]);
// 1.0810874155219827
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 1.0811
```
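
The only difference between the population and sample versions is the divisor: `n` for a population, `n - 1` for a sample. A plain-PHP sketch (the helper name is hypothetical and this is not the package's code):

```php
<?php
// Hypothetical helper: sum of squared deviations from the mean.
function squaredDeviationSum(array $data): float
{
    $mean = array_sum($data) / count($data);

    return array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $data));
}

$data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75];
$n = count($data);
$ss = squaredDeviationSum($data);

$populationSd = sqrt($ss / $n);       // divides by n
$sampleSd = sqrt($ss / ($n - 1));     // divides by n - 1 (Bessel's correction)
// $populationSd is about 0.9869, $sampleSd is about 1.0811
```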

#### Stat::sem( array $data, ?int $round = null )

Return the standard error of the mean (SEM). SEM measures how precisely the sample mean estimates the population mean. It decreases as the sample size grows.

Formula: `stdev / sqrt(n)`

Requires at least 2 data points.

```
use HiFolks\Statistics\Stat;
$sem = Stat::sem([2, 4, 4, 4, 5, 5, 7, 9]);
// 0.7559...

$sem = Stat::sem([2, 4, 4, 4, 5, 5, 7, 9], 4);
// 0.7559
```
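
The formula can be verified by hand with a few lines of plain PHP. `semSketch()` is a hypothetical helper for illustration only.

```php
<?php
// Hypothetical helper: standard error of the mean = sample stdev / sqrt(n).
function semSketch(array $data): float
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least 2 data points are required.');
    }
    $mean = array_sum($data) / $n;
    $ss = array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $data));
    $sampleSd = sqrt($ss / ($n - 1));

    return $sampleSd / sqrt($n);
}

$semCheck = semSketch([2, 4, 4, 4, 5, 5, 7, 9]);
// sqrt(32 / 7) / sqrt(8), about 0.7559
```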

#### Stat::meanAbsoluteDeviation( array $data, ?int $round = null )

Return the mean absolute deviation (MAD) — the average of the absolute deviations from the mean.

MAD is a simple, intuitive measure of dispersion: it tells you "on average, how far values are from the mean". Unlike standard deviation, it does not square the differences, making it easier to interpret and somewhat less sensitive to outliers.

Use MAD when you want a straightforward, interpretable measure of spread, especially for reporting to non-technical audiences.

```
use HiFolks\Statistics\Stat;
$mad = Stat::meanAbsoluteDeviation([1, 2, 3, 4, 5]);
// 1.2

$mad = Stat::meanAbsoluteDeviation([1, 2, 3, 4, 5], 1);
// 1.2
```

#### Stat::medianAbsoluteDeviation( array $data, ?int $round = null )

Return the median absolute deviation — the median of the absolute deviations from the median.

This is one of the most **robust measures of dispersion** available. Because it uses the median (not the mean) as the center and takes the median (not the mean) of deviations, it is highly resistant to outliers. Even if almost half the data points are extreme, the median absolute deviation remains stable.

Use it when your data may contain outliers, when you need a robust alternative to standard deviation, or for outlier detection (values far from the median in units of MAD are likely outliers).

```
use HiFolks\Statistics\Stat;
$mad = Stat::medianAbsoluteDeviation([1, 2, 3, 4, 5]);
// 1.0

// Robust to outliers — the outlier 1000 does not affect the result:
$mad = Stat::medianAbsoluteDeviation([1, 2, 3, 4, 1000]);
// 1.0
```
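
The "median of absolute deviations from the median" definition translates directly into code. This plain-PHP sketch uses hypothetical helper names and is not the package's implementation.

```php
<?php
// Hypothetical helper: median using the "mean of middle two" rule.
function medianSketch(array $values): float
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);

    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Hypothetical helper: median of the absolute deviations from the median.
function medianAbsoluteDeviationSketch(array $data): float
{
    $median = medianSketch($data);

    return medianSketch(array_map(fn ($x) => abs($x - $median), $data));
}

$plainMad = medianAbsoluteDeviationSketch([1, 2, 3, 4, 5]);
// deviations from 3 are [2, 1, 0, 1, 2], whose median is 1.0
$robustMad = medianAbsoluteDeviationSketch([1, 2, 3, 4, 1000]);
// deviations from 3 are [2, 1, 0, 1, 997], whose median is still 1.0
```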

#### Stat::variance ( array $data, ?int $round = null, int|float|null $xbar = null)

Variance is a measure of dispersion of data points from the mean. Low variance indicates that data points are generally similar and do not vary widely from the mean. High variance indicates that data values have greater variability and are more widely dispersed from the mean.

To calculate the variance from a *sample*:

```
use HiFolks\Statistics\Stat;
$variance = Stat::variance([2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]);
// 1.3720238095238095
```

If you have already computed the mean, you can pass it via `xbar` to avoid recalculation:

```
use HiFolks\Statistics\Stat;
$data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5];
$mean = Stat::mean($data);
$variance = Stat::variance($data, xbar: $mean);
```

To calculate the variance of a whole population rather than a sample, use the `pvariance()` method. You can optionally pass the population mean via `mu`:

```
use HiFolks\Statistics\Stat;
$variance = Stat::pvariance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]);
// 1.25

// With pre-computed mean:
$data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25];
$mu = Stat::mean($data);
$variance = Stat::pvariance($data, mu: $mu);
```

#### Stat::skewness ( array $data, ?int $round = null )

Skewness is a measure of the asymmetry of a distribution. The adjusted Fisher-Pearson formula is used, which is the same as Excel's `SKEW()` and Python's `scipy.stats.skew(bias=False)`.

A positive skewness indicates a right-skewed distribution (tail extends to the right), while a negative skewness indicates a left-skewed distribution. A symmetric distribution has a skewness of 0.

Requires at least 3 data points.

```
use HiFolks\Statistics\Stat;
$skewness = Stat::skewness([1, 2, 3, 4, 5]);
// 0.0 (symmetric)

$skewness = Stat::skewness([1, 1, 1, 1, 1, 10]);
// positive (right-skewed)
```

If you need the population (biased) skewness instead of the sample skewness, use `pskewness()`. This is equivalent to `scipy.stats.skew(bias=True)`:

```
use HiFolks\Statistics\Stat;
$pskewness = Stat::pskewness([1, 1, 1, 1, 1, 10]);
```

#### Stat::kurtosis ( array $data, ?int $round = null )

Kurtosis measures the "tailedness" of a distribution — how much data lives in the extreme tails compared to a normal distribution. This method returns the **excess kurtosis** using the sample formula, which is the same as Excel's `KURT()` and Python's `scipy.stats.kurtosis(bias=False)`.

A normal distribution has excess kurtosis of 0. Positive values (leptokurtic) indicate heavier tails and more outliers. Negative values (platykurtic) indicate lighter tails and fewer outliers.

Requires at least 4 data points.

```
use HiFolks\Statistics\Stat;
$kurtosis = Stat::kurtosis([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
// negative (platykurtic, lighter tails than normal)

$kurtosis = Stat::kurtosis([1, 2, 2, 2, 2, 2, 2, 2, 2, 50]);
// positive (leptokurtic, heavier tails due to outlier)
```

#### Stat::coefficientOfVariation( array $data, ?int $round = null, bool $population = false )

The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage. It measures relative variability and is useful for comparing dispersion across datasets with different units or scales.

By default it uses the sample standard deviation. Pass `population: true` to use the population standard deviation instead.

Requires at least 2 data points (sample) or 1 (population). Throws if the mean is zero.

```
use HiFolks\Statistics\Stat;
$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50]);
// ~52.70 (sample)

$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], round: 2);
// 52.7

$cv = Stat::coefficientOfVariation([10, 20, 30, 40, 50], population: true);
// ~47.14 (population)
```
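
The two results above differ only in which standard deviation is divided by the mean. A plain-PHP sketch of the calculation (`cvSketch()` is a hypothetical helper, not the package's code):

```php
<?php
// Hypothetical helper: coefficient of variation = stdev / mean * 100.
function cvSketch(array $data, bool $population = false): float
{
    $n = count($data);
    if ($n < ($population ? 1 : 2)) {
        throw new InvalidArgumentException('Not enough data points.');
    }
    $mean = array_sum($data) / $n;
    if ($mean == 0) {
        throw new InvalidArgumentException('The mean must not be zero.');
    }
    $ss = array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $data));
    $sd = sqrt($ss / ($population ? $n : $n - 1));

    return $sd / $mean * 100;
}

$sampleCv = cvSketch([10, 20, 30, 40, 50]);
// sqrt(250) / 30 * 100, about 52.70
$populationCv = cvSketch([10, 20, 30, 40, 50], true);
// sqrt(200) / 30 * 100, about 47.14
```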

#### Stat::zscores( array $data, ?int $round = null )

Return the z-score for each value in the dataset. A z-score indicates how many standard deviations a value is from the mean. Z-scores are useful for standardizing data, comparing values from different distributions, and identifying outliers.

The z-scores of any dataset always sum to zero, and values beyond ±2 or ±3 are typically considered unusual or outliers.

Requires at least 2 data points and non-zero standard deviation.

```
use HiFolks\Statistics\Stat;
$zscores = Stat::zscores([2, 4, 4, 4, 5, 5, 7, 9]);
// array of z-scores, one per value

$zscores = Stat::zscores([2, 4, 4, 4, 5, 5, 7, 9], 2);
// z-scores rounded to 2 decimal places
```
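
A plain-PHP sketch of the calculation, which also shows that the z-scores sum to zero. It assumes the sample standard deviation is used; that choice and the helper name `zscoresSketch()` are assumptions for illustration.

```php
<?php
// Hypothetical helper: z = (x - mean) / stdev for every value.
function zscoresSketch(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('At least 2 data points are required.');
    }
    $mean = array_sum($data) / $n;
    $ss = array_sum(array_map(fn ($x) => ($x - $mean) ** 2, $data));
    $sd = sqrt($ss / ($n - 1)); // assumption: sample standard deviation
    if ($sd == 0.0) {
        throw new InvalidArgumentException('Standard deviation must not be zero.');
    }

    return array_map(fn ($x) => ($x - $mean) / $sd, $data);
}

$z = zscoresSketch([2, 4, 4, 4, 5, 5, 7, 9]);
// the last value, 9, is (9 - 5) / 2.138..., about 1.87 standard deviations above the mean
$total = array_sum($z);
// 0 up to floating-point rounding
```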

#### Stat::outliers( array $data, float $threshold = 3.0 )

Return values from the dataset that are outliers based on z-score threshold. A value is considered an outlier if its absolute z-score exceeds the threshold.

The default threshold of 3.0 is a widely used convention — in a normal distribution, about 99.7% of values fall within 3 standard deviations of the mean, so values beyond that are rare. Use a lower threshold (e.g. 2.0) for stricter detection, or a higher one for more lenient filtering.

```
use HiFolks\Statistics\Stat;
$outliers = Stat::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 2.0);
// [100] (with only 10 values, no z-score can exceed the default threshold of 3.0)

$outliers = Stat::outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1.0);
// values more than 1 stdev from the mean
```

#### Stat::iqrOutliers( array $data, float $factor = 1.5 )

Return values that are outliers based on the Interquartile Range (IQR) method. A value is an outlier if it falls below `Q1 - factor * IQR` or above `Q3 + factor * IQR`. This is the same method used for box plot whiskers.

Unlike z-score based detection, the IQR method is **robust** — it does not assume a normal distribution and is not influenced by extreme values themselves. This makes it the preferred choice for skewed data or when the dataset may already contain outliers that would distort the mean and standard deviation.

Use `factor: 1.5` (default) for mild outliers, or `factor: 3.0` for extreme outliers only.

**Example: Ski downhill race times**

In a ski downhill race, most athletes finish between 108–116 seconds. A time of 200s (e.g. a crash/DNF) or 50s (e.g. a timing error) would be flagged as outliers:

```
use HiFolks\Statistics\Stat;
$times = [110.2, 112.5, 108.9, 115.3, 111.7, 114.0, 109.8, 113.6, 200.0, 50.0];
$outliers = Stat::iqrOutliers($times);
// [200.0, 50.0] — the crash and the timing error are detected

$extremeOnly = Stat::iqrOutliers($times, 3.0);
// only the most extreme values
```
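
The fences for the race data can be reproduced with a short plain-PHP sketch. It assumes Q1 and Q3 come from the exclusive quantile method described for `quantiles()`; that assumption and the helper names are illustrative, not taken from the package source.

```php
<?php
// Hypothetical helper: [Q1, Q3] of already sorted data, exclusive method (m = count + 1).
function exclusiveQuartilesSketch(array $sorted): array
{
    $count = count($sorted);
    $m = $count + 1;
    $result = [];
    foreach ([1, 3] as $i) {
        $j = max(1, min($count - 1, intdiv($i * $m, 4)));
        $delta = $i * $m - $j * 4;
        $result[] = ($sorted[$j - 1] * (4 - $delta) + $sorted[$j] * $delta) / 4;
    }

    return $result;
}

// Hypothetical helper: values outside [Q1 - factor * IQR, Q3 + factor * IQR].
function iqrOutliersSketch(array $data, float $factor = 1.5): array
{
    $sorted = $data;
    sort($sorted);
    [$q1, $q3] = exclusiveQuartilesSketch($sorted);
    $iqr = $q3 - $q1;
    $low = $q1 - $factor * $iqr;
    $high = $q3 + $factor * $iqr;

    // Keep the original order of the input values.
    return array_values(array_filter($data, fn ($v) => $v < $low || $v > $high));
}

$times = [110.2, 112.5, 108.9, 115.3, 111.7, 114.0, 109.8, 113.6, 200.0, 50.0];
$flagged = iqrOutliersSketch($times);
// Q1 = 109.575, Q3 = 114.325, IQR = 4.75, fences at 102.45 and 121.45
// $flagged is [200.0, 50.0]
```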

#### Stat::covariance ( array $x , array $y )

Return the sample covariance of two inputs *$x* and *$y*. Covariance is a measure of the joint variability of two inputs.

```
use HiFolks\Statistics\Stat;
$covariance = Stat::covariance(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 1, 2, 3, 1, 2, 3]
);
// 0.75
```

```
$covariance = Stat::covariance(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -7.5
```
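
Both results can be checked against the textbook definition of sample covariance, the sum of products of deviations divided by `n - 1`. A plain-PHP sketch with a hypothetical helper name:

```php
<?php
// Hypothetical helper: sample covariance = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1).
function covarianceSketch(array $x, array $y): float
{
    $n = count($x);
    if ($n < 2 || $n !== count($y)) {
        throw new InvalidArgumentException('Inputs must have the same length, at least 2.');
    }
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $products = array_map(
        fn ($xi, $yi) => ($xi - $meanX) * ($yi - $meanY),
        $x,
        $y
    );

    return array_sum($products) / ($n - 1);
}

$cov1 = covarianceSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 1, 2, 3, 1, 2, 3]);
// 6 / 8 = 0.75
$cov2 = covarianceSketch([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1]);
// -60 / 8 = -7.5
```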

#### Stat::correlation ( array $x , array $y, string $method = 'linear' )

Return Pearson's correlation coefficient for two inputs. Pearson's correlation coefficient r takes values between -1 and +1. It measures the strength and direction of the linear relationship, where +1 means a very strong positive linear relationship, -1 a very strong negative linear relationship, and 0 no linear relationship.

Use `$method = 'ranked'` for Spearman's rank correlation, which measures monotonic relationships (not just linear). Spearman's correlation is computed by applying Pearson's formula to the ranks of the data.

```
use HiFolks\Statistics\Stat;
$correlation = Stat::correlation(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
);
// 1.0
```

```
$correlation = Stat::correlation(
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1]
);
// -1.0
```

Spearman’s rank correlation (non-linear but monotonic relationship):

```
$correlation = Stat::correlation(
    [1, 2, 3, 4, 5],
    [1, 4, 9, 16, 25],
    'ranked'
);
// 1.0
```
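
"Pearson's formula applied to the ranks" can be sketched in plain PHP as follows. The helper names are hypothetical, and giving tied values their average rank is an assumption about tie handling rather than something stated above.

```php
<?php
// Hypothetical helper: 1-based ranks, with tied values sharing their average rank.
function averageRanksSketch(array $values): array
{
    $sorted = $values;
    sort($sorted);
    $ranks = [];
    foreach ($values as $value) {
        $positions = array_keys($sorted, $value); // 0-based positions of equal values
        $ranks[] = array_sum($positions) / count($positions) + 1;
    }

    return $ranks;
}

// Hypothetical helper: Pearson's r from centered sums.
function pearsonSketch(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $dx = array_map(fn ($v) => $v - $meanX, $x);
    $dy = array_map(fn ($v) => $v - $meanY, $y);
    $sxy = array_sum(array_map(fn ($a, $b) => $a * $b, $dx, $dy));
    $sxx = array_sum(array_map(fn ($a) => $a * $a, $dx));
    $syy = array_sum(array_map(fn ($b) => $b * $b, $dy));

    return $sxy / sqrt($sxx * $syy);
}

$x = [1, 2, 3, 4, 5];
$y = [1, 4, 9, 16, 25];
$pearson = pearsonSketch($x, $y);
// below 1, because y = x^2 is not a straight line
$spearman = pearsonSketch(averageRanksSketch($x), averageRanksSketch($y));
// 1.0, because both inputs have the ranks [1, 2, 3, 4, 5]
```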

#### Stat::linearRegression ( array $x , array $y , bool $proportional = false )

Return the slope and intercept of simple linear regression parameters estimated using ordinary least squares. Simple linear regression describes the relationship between an independent variable *$x* and a dependent variable *$y* in terms of a linear function.

```
use HiFolks\Statistics\Stat;
$years = [1971, 1975, 1979, 1982, 1983];
$films_total = [1, 2, 3, 4, 5];
list($slope, $intercept) = Stat::linearRegression(
    $years,
    $films_total
);
// 0.31
// -610.18
```

Use the fitted slope and intercept to predict the value for 2022:

```
round($slope * 2022 + $intercept);
// 17.0
```

When `proportional` is `true`, the regression line is forced through the origin (intercept = 0). This is useful when the relationship between *$x* and *$y* is known to be proportional:

```
list($slope, $intercept) = Stat::linearRegression(
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    proportional: true,
);
// $slope = 2.0
// $intercept = 0.0
```

#### Stat::logarithmicRegression( array $x, array $y )


Fit a logarithmic model **y = a × ln(x) + b**. Returns `[a, b]`.

This model naturally captures diminishing returns — fast initial change that gradually flattens. It is useful for data where early gains are large but improvement slows over time, such as athletic performance trends, learning curves, or market saturation.

All x values must be positive (you cannot take the logarithm of zero or negative numbers).

Internally, this transforms x to ln(x) and applies linear regression, so it leverages the same robust ordinary least squares implementation.

```
use HiFolks\Statistics\Stat;

// Simulated weekly running paces (seconds/km) — diminishing improvement
$weeks = [1, 2, 3, 4, 5, 6, 7, 8];
$paces = [350, 342, 337, 333, 330, 328, 326, 325];

[$a, $b] = Stat::logarithmicRegression($weeks, $paces);
// $a = -12.33 (pace drops by 12.33 sec per unit of ln(week))
// $b = 350.2

// Predict pace at week 12:
$predicted = $a * log(12) + $b;
// ~320 seconds = 5:20/km
```

Compare with linear regression to see which fits better:

```
// R² for logarithmic model (transform x first)
$logWeeks = array_map(fn($v) => log($v), $weeks);
$r2Log = Stat::rSquared($logWeeks, $paces);
// 0.9987

// R² for linear model
$r2Linear = Stat::rSquared($weeks, $paces);
// 0.9176

// Logarithmic wins — the data has diminishing returns
```

#### Stat::powerRegression( array $x, array $y )


Fit a power model **y = a × x^b**. Returns `[a, b]`.

Power regression is useful for data following power law relationships (e.g., scaling laws, allometric relationships). Both x and y values must be positive.

Internally, this linearizes as ln(y) = ln(a) + b × ln(x) and applies linear regression.

```
use HiFolks\Statistics\Stat;

// Data following y = 3 * x^2
$x = [1, 2, 3, 4, 5];
$y = [3, 12, 27, 48, 75];

[$a, $b] = Stat::powerRegression($x, $y);
// $a = 3.0
// $b = 2.0 (the exponent)
```
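The linearization step can be reproduced by hand: take logarithms of both variables, fit a straight line, and transform the intercept back with `exp()`. This is a sketch in plain PHP under that assumption; `sketchFitLine()` is a hypothetical helper and not the package's internal method.

```php
<?php

/** Least-squares line through (x, y); returns [slope, intercept]. */
function sketchFitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$slope, $meanY - $slope * $meanX];
}

// Data following y = 3 * x^2
$x = [1, 2, 3, 4, 5];
$y = [3, 12, 27, 48, 75];

// ln(y) = ln(a) + b * ln(x): regress ln(y) on ln(x)
$logX = array_map(fn($v) => log($v), $x);
$logY = array_map(fn($v) => log($v), $y);

[$b, $lnA] = sketchFitLine($logX, $logY);
$a = exp($lnA); // undo the logarithm on the intercept

printf("a = %.4f, b = %.4f\n", $a, $b); // a = 3.0000, b = 2.0000
```

The same helper covers the other two models by changing which side is transformed: only x for the logarithmic model, only y for the exponential one.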

#### Stat::exponentialRegression( array $x, array $y )


Fit an exponential model **y = a × e^(b×x)**. Returns `[a, b]`.

Exponential regression is useful for data with exponential growth (positive b) or decay (negative b), such as population growth, compound interest, or radioactive decay. All y values must be positive.

Internally, this linearizes as ln(y) = ln(a) + b × x and applies linear regression.

```
use HiFolks\Statistics\Stat;

// Data following y = 2 * e^(0.5*x)
$x = [1, 2, 3, 4, 5];
$y = [3.30, 5.44, 8.96, 14.78, 24.36];

[$a, $b] = Stat::exponentialRegression($x, $y);
// $a ≈ 2.0
// $b ≈ 0.5
```

#### Stat::rSquared( array $x, array $y, bool $proportional = false, ?int $round = null )


Return the coefficient of determination (R²) — the proportion of variance in the dependent variable explained by the linear regression model. Values range from 0 (no explanatory power) to 1 (perfect fit).

Requires at least 2 data points and arrays of the same length.

```
use HiFolks\Statistics\Stat;
$r2 = Stat::rSquared([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
// 1.0 (perfect linear relationship)

$r2 = Stat::rSquared(
    [1971, 1975, 1979, 1982, 1983],
    [1, 2, 3, 4, 5],
    round: 2,
);
// 0.96
```

With proportional regression (through the origin):

```
$r2 = Stat::rSquared(
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    proportional: true,
);
// 1.0
```

To compute R² for non-linear models, transform the data the same way the regression method does:

```
// R² for logarithmic regression
$logX = array_map(fn($v) => log($v), $x);
$r2 = Stat::rSquared($logX, $y);

// R² for power regression
$logX = array_map(fn($v) => log($v), $x);
$logY = array_map(fn($v) => log($v), $y);
$r2 = Stat::rSquared($logX, $logY);

// R² for exponential regression
$logY = array_map(fn($v) => log($v), $y);
$r2 = Stat::rSquared($x, $logY);
```

#### Stat::confidenceInterval( array $data, float $confidenceLevel = 0.95, ?int $round = null )


Return the confidence interval for the mean using the normal (z) distribution.

Computes: `mean ± z * (stdev / √n)`, where the z-critical value is derived from the inverse normal CDF.

Requires at least 2 data points. The confidence level must be between 0 and 1 exclusive.

```
use HiFolks\Statistics\Stat;
[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9]);
// 95% CI: [3.52, 6.48] (approximately)

[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9], confidenceLevel: 0.99);
// 99% CI: wider interval

[$lower, $upper] = Stat::confidenceInterval([2, 4, 4, 4, 5, 5, 7, 9], round: 2);
// [3.52, 6.48]
```
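The formula `mean ± z * (stdev / √n)` can be checked with a few lines of plain PHP. In this sketch the z-critical value for 95% is hard-coded as 1.959964 (an assumption made here for brevity; the library derives it from the inverse normal CDF), and the sample standard deviation is used, which reproduces the interval shown above.

```php
<?php

$data = [2, 4, 4, 4, 5, 5, 7, 9];

$n = count($data);
$mean = array_sum($data) / $n;

// Sample standard deviation (divides by n - 1)
$sumSquares = 0.0;
foreach ($data as $value) {
    $sumSquares += ($value - $mean) ** 2;
}
$stdev = sqrt($sumSquares / ($n - 1));

$z = 1.959964; // assumed z-critical value for a 95% two-sided interval
$margin = $z * $stdev / sqrt($n);

$lower = $mean - $margin;
$upper = $mean + $margin;

printf("[%.2f, %.2f]\n", $lower, $upper); // [3.52, 6.48]
```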

#### Stat::zTest( array $data, float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null )


Perform a one-sample Z-test for the mean. Tests whether the sample mean differs significantly from a hypothesized population mean using the normal distribution.

Returns an associative array with `zScore` and `pValue`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.

Requires at least 2 data points.

```
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;

$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0);
// ['zScore' => 2.6457..., 'pValue' => 0.0081...]

$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, alternative: Alternative::Greater);
// one-tailed test: is the sample mean greater than 3?

$result = Stat::zTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
// ['zScore' => 2.6458, 'pValue' => 0.0081]
```

#### Stat::tTest( array $data, float $populationMean, Alternative $alternative = Alternative::TwoSided, ?int $round = null )


Perform a one-sample t-test for the mean. Tests whether the sample mean differs significantly from a hypothesized population mean using the Student's t-distribution. Unlike the z-test, the t-test is appropriate for small samples where the population standard deviation is unknown.

Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.

Requires at least 2 data points.

```
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;

$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0);
// ['tStatistic' => 2.6457..., 'pValue' => 0.0331..., 'degreesOfFreedom' => 7]

$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, alternative: Alternative::Greater);
// one-tailed test: is the sample mean greater than 3?

$result = Stat::tTest([2, 4, 4, 4, 5, 5, 7, 9], populationMean: 3.0, round: 4);
// ['tStatistic' => 2.6458, 'pValue' => 0.0331, 'degreesOfFreedom' => 7]
```
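The test statistic itself is simple to verify by hand: it is the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below uses plain PHP and computes only the statistic and the degrees of freedom; the p-value needs the t-distribution CDF and is left to the library.

```php
<?php

$data = [2, 4, 4, 4, 5, 5, 7, 9];
$populationMean = 3.0;

$n = count($data);
$mean = array_sum($data) / $n;

// Sample standard deviation (divides by n - 1)
$sumSquares = 0.0;
foreach ($data as $value) {
    $sumSquares += ($value - $mean) ** 2;
}
$stdev = sqrt($sumSquares / ($n - 1));

// t = (sample mean - hypothesized mean) / standard error
$tStatistic = ($mean - $populationMean) / ($stdev / sqrt($n));
$degreesOfFreedom = $n - 1;

printf("t = %.4f, df = %d\n", $tStatistic, $degreesOfFreedom);
// t = 2.6458, df = 7
```

The z-test example earlier uses the same data and yields the same statistic; the two tests differ in the distribution used to turn it into a p-value.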

#### Stat::tTestTwoSample( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )


Perform a two-sample independent t-test (Welch's t-test). Compares the means of two independent groups without assuming equal variances. Uses the Welch–Satterthwaite approximation for degrees of freedom.

Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. The alternative hypothesis can be `TwoSided` (default), `Greater`, or `Less`.

Requires at least 2 data points in each sample.

```
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;

// Compare two groups
$group1 = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99];
$group2 = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98];
$result = Stat::tTestTwoSample($group1, $group2);
// ['tStatistic' => 1.959..., 'pValue' => 0.0907..., 'degreesOfFreedom' => 7.03...]

// One-tailed test: is group1 mean greater than group2 mean?
$result = Stat::tTestTwoSample($group1, $group2, alternative: Alternative::Greater);

// Groups can have different sizes
$result = Stat::tTestTwoSample([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 5], round: 4);
```

#### Stat::tTestPaired( array $data1, array $data2, Alternative $alternative = Alternative::TwoSided, ?int $round = null )


Perform a paired t-test. Tests whether the mean difference between paired observations (e.g. before/after measurements on the same subjects) is significantly different from zero.

Returns an associative array with `tStatistic`, `pValue`, and `degreesOfFreedom`. Both arrays must have the same length.

Requires at least 2 paired observations.

```
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\Alternative;

// Before and after treatment measurements
$before = [200, 190, 210, 220, 215, 205, 195, 225];
$after  = [192, 186, 198, 212, 208, 198, 188, 215];
$result = Stat::tTestPaired($before, $after);
// ['tStatistic' => 9.451..., 'pValue' => 0.00003..., 'degreesOfFreedom' => 7]

// One-tailed: did the treatment decrease the values?
$result = Stat::tTestPaired($before, $after, alternative: Alternative::Greater);

$result = Stat::tTestPaired($before, $after, round: 4);
```

#### Stat::kde ( array $data , float $h , KdeKernel $kernel = KdeKernel::Normal , bool $cumulative = false )


Create a continuous probability density function (or cumulative distribution function) from discrete sample data using Kernel Density Estimation. Returns a `Closure` that can be called with any point to estimate the density (or CDF value).

Supported kernels: `KdeKernel::Normal` (alias `KdeKernel::Gauss`), `KdeKernel::Logistic`, `KdeKernel::Sigmoid`, `KdeKernel::Rectangular` (alias `KdeKernel::Uniform`), `KdeKernel::Triangular`, `KdeKernel::Parabolic` (alias `KdeKernel::Epanechnikov`), `KdeKernel::Quartic` (alias `KdeKernel::Biweight`), `KdeKernel::Triweight`, `KdeKernel::Cosine`.

```
use HiFolks\Statistics\Stat;
use HiFolks\Statistics\Enums\KdeKernel;

$data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2];
$f = Stat::kde($data, h: 1.5);
$f(2.5);
// estimated density at x = 2.5
```
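To make the returned `Closure` less of a black box, here is a minimal sketch of a Gaussian kernel density estimate in plain PHP. `sketchGaussianKde()` is a hypothetical helper that follows the textbook definition (the average of normal kernels centred on each sample, scaled by the bandwidth `h`); it is not taken from the library.

```php
<?php

/** Returns a closure estimating the density at any point x. */
function sketchGaussianKde(array $data, float $h): Closure
{
    $n = count($data);

    return function (float $x) use ($data, $h, $n): float {
        $sum = 0.0;
        foreach ($data as $xi) {
            $u = ($x - $xi) / $h;
            // Standard normal kernel evaluated at the scaled distance
            $sum += exp(-0.5 * $u * $u) / sqrt(2 * M_PI);
        }
        return $sum / ($n * $h);
    };
}

$f = sketchGaussianKde([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2], 1.5);
echo $f(2.5), "\n"; // estimated density at x = 2.5
```

A larger `h` smooths the estimate, while a smaller `h` follows the individual samples more closely.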

Using a different kernel:

```
$f = Stat::kde($data, h: 1.5, kernel: KdeKernel::Triangular);
$f(2.5);
```

Cumulative distribution function:

```
$F = Stat::kde($data, h: 1.5, cumulative: true);
$F(2.5);
// estimated CDF at x = 2.5 (probability that a value is <= 2.5)
```

The frequency table for the `$fruits` data:

```
Array
(
    [🍈] => 3
    [🍉] => 5
    [🍌] => 1
)

```

#### Freq::relativeFrequencies( array $data )

You can retrieve the frequency table in relative format (percentages). In the example below, the second argument rounds the percentages to 2 decimal places:

```
$freqTable = Freq::relativeFrequencies($fruits, 2);
print_r($freqTable);
```

You can see the frequency table as an array with percentage of the occurrences:

```
Array
(
    [🍈] => 33.33
    [🍉] => 55.56
    [🍌] => 11.11
)

```

#### Freq::frequencyTableBySize( array $data , $size)

To build a frequency table grouped into classes (ranges of values), use `frequencyTableBySize()`. The first parameter is the data array, and the second one is the size of each class.

The following example calculates the frequency table with a class size of 4:

```
$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTableBySize($data, 4);
print_r($result);
/*
Array
(
    [1] => 5
    [5] => 8
    [9] => 11
    [13] => 9
    [17] => 5
)
 */
```
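The grouping can be sketched in plain PHP as follows. `sketchFrequencyBySize()` is a hypothetical helper that assumes integer data and classes that start at the minimum value; those assumptions reproduce the table above, but the library's exact boundary rules are not stated here.

```php
<?php

/** Count integer values in consecutive classes of width $size. */
function sketchFrequencyBySize(array $data, int $size): array
{
    $min = min($data);
    $table = [];
    foreach ($data as $value) {
        // Lower bound of the class that contains $value
        $classStart = $min + intdiv($value - $min, $size) * $size;
        $table[$classStart] = ($table[$classStart] ?? 0) + 1;
    }
    ksort($table);
    return $table;
}

$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18];

$table = sketchFrequencyBySize($data, 4);
print_r($table);
// keys are class lower bounds: 1 => 5, 5 => 8, 9 => 11, 13 => 9, 17 => 5
```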

#### Freq::frequencyTable()

To build a frequency table grouped into classes (ranges of values), you can also use `frequencyTable()`. The first parameter is the data array, and the second one is the number of classes.

The following example calculates the frequency table with 5 classes:

```
$data = [1,1,1,4,4,5,5,5,6,7,8,8,8,9,9,9,9,9,9,10,10,11,12,12,
    13,14,14,15,15,16,16,16,16,17,17,17,18,18, ];
$result = \HiFolks\Statistics\Freq::frequencyTable($data, 5);
print_r($result);
/*
Array
(
    [1] => 5
    [5] => 8
    [9] => 11
    [13] => 9
    [17] => 5
)
 */
```

### Statistics class

The methods provided by the `Freq` and `Stat` classes are mainly **static** methods. If you prefer to work with an object, create an instance of the `Statistics` class and call the statistics methods on it.

For example, to calculate the mean, obtain a `Statistics` object via the static `make()` method and then call methods on the resulting `$stat` object:

```
$stat = HiFolks\Statistics\Statistics::make(
    [3,5,4,7,5,2]
);
echo $stat->valuesToString(5) . PHP_EOL;
// 2,3,4,5,5
echo "Mean              : " . $stat->mean() . PHP_EOL;
// Mean              : 4.3333333333333
echo "Count             : " . $stat->count() . PHP_EOL;
// Count             : 6
echo "Median            : " . $stat->median() . PHP_EOL;
// Median            : 4.5
echo "First Quartile  : " . $stat->firstQuartile() . PHP_EOL;
// First Quartile  : 2.5
echo "Third Quartile : " . $stat->thirdQuartile() . PHP_EOL;
// Third Quartile : 5
echo "Mode              : " . $stat->mode() . PHP_EOL;
// Mode              : 5
```

#### Calculate Frequency Table

The `Statistics` class has methods for generating frequency tables:

- `frequencies()`: the number of times each value occurs in the data;
- `relativeFrequencies()`: the ratio of the number of times each value occurs to the total number of outcomes, expressed as a percentage;
- `cumulativeFrequencies()`: the running total of the frequencies;
- `cumulativeRelativeFrequencies()`: the running total of the relative frequencies.

```
use HiFolks\Statistics\Statistics;

$s = Statistics::make(
    [98, 90, 70,18,92,92,55,83,45,95,88,76]
);
$a = $s->frequencies();
print_r($a);
/*
Array
(
    [18] => 1
    [45] => 1
    [55] => 1
    [70] => 1
    [76] => 1
    [83] => 1
    [88] => 1
    [90] => 1
    [92] => 2
    [95] => 1
    [98] => 1
)
 */

$a = $s->relativeFrequencies();
print_r($a);
/*
Array
(
    [18] => 8.3333333333333
    [45] => 8.3333333333333
    [55] => 8.3333333333333
    [70] => 8.3333333333333
    [76] => 8.3333333333333
    [83] => 8.3333333333333
    [88] => 8.3333333333333
    [90] => 8.3333333333333
    [92] => 16.666666666667
    [95] => 8.3333333333333
    [98] => 8.3333333333333
)
 */
```
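The cumulative variants are running totals over the sorted frequency table. The sketch below illustrates the idea in plain PHP, using only built-in functions rather than the `Statistics` class, and expresses the relative values as percentages to match the output above.

```php
<?php

$data = [98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88, 76];

// Frequency of each distinct value, ordered by value
$frequencies = array_count_values($data);
ksort($frequencies);

$total = count($data);
$running = 0;
$cumulative = [];
$cumulativeRelative = [];

foreach ($frequencies as $value => $count) {
    $running += $count;
    $cumulative[$value] = $running;                         // running count
    $cumulativeRelative[$value] = $running / $total * 100;  // running percentage
}

echo $cumulative[92], "\n";         // 10
echo $cumulative[98], "\n";         // 12
echo $cumulativeRelative[98], "\n"; // 100
```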

`NormalDist` class
------------------

The `NormalDist` class provides an easy way to work with normal distributions in PHP. It allows you to calculate probabilities and densities for a given mean (μ) and standard deviation (σ).

### Key features

- Define a normal distribution with mean (μ) and standard deviation (σ).
- Calculate the **Probability Density Function (PDF)** to evaluate the relative likelihood of a value.
- Calculate the **Cumulative Distribution Function (CDF)** to determine the probability of a value or lower.
- Calculate the **Inverse Cumulative Distribution Function** (`invCdf`) to find the value for a given probability.

---

### Class constructor


```
$normalDist = new NormalDist(float $mu = 0.0, float $sigma = 1.0);
```

- `$mu`: The mean (default = `0.0`).
- `$sigma`: The standard deviation (default = `1.0`).
- Throws an exception if `$sigma` is non-positive.

---

### Methods


#### Properties: mean, sigma, and variance


You can access the distribution parameters via getter methods:

```
$normalDist = new NormalDist(100, 15);
$normalDist->getMean();             // 100.0
$normalDist->getSigma();            // 15.0
$normalDist->getMedian();           // 100.0 (equals mean for normal dist)
$normalDist->getMode();             // 100.0 (equals mean for normal dist)
$normalDist->getVariance();         // 225.0 (sigma squared)
$normalDist->getVarianceRounded(2); // 225.0
```

From samples:

```
$normalDist = NormalDist::fromSamples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]);
$normalDist->getVarianceRounded(5); // 0.25767
```

---

#### Creating a normal distribution instance from sample data


The `fromSamples()` static method creates a normal distribution instance with mu and sigma parameters estimated from the sample data.

Example:

```
$samples = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5];
$normalDist = NormalDist::fromSamples($samples);
$normalDist->getMeanRounded(5); // 2.71667
$normalDist->getSigmaRounded(5); // 0.50761
```

#### Generate random samples `samples($n, $seed)`


Generates `$n` random samples from the normal distribution using the Box-Muller transform. An optional `$seed` parameter allows reproducible results.

```
$normalDist = new NormalDist(100, 15);

// Generate 5 random samples
$samples = $normalDist->samples(5);
// e.g. [98.3, 112.7, 89.1, 105.4, 101.2]

// Reproducible results with a seed
$samples = $normalDist->samples(1000, seed: 42);
```
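For reference, the basic form of the Box-Muller transform turns two uniform random numbers into one normally distributed value. The sketch below is plain PHP; `sketchNormalSamples()` is a hypothetical helper, and its use of `mt_srand()` / `mt_rand()` for seeding is an assumption made here, so its output will not match the library's samples for the same seed.

```php
<?php

/** Draw $n values from N(mu, sigma) with the basic Box-Muller transform. */
function sketchNormalSamples(int $n, float $mu, float $sigma, ?int $seed = null): array
{
    if ($seed !== null) {
        mt_srand($seed); // same seed, same sequence
    }

    $samples = [];
    while (count($samples) < $n) {
        // Two uniform draws in (0, 1); the offsets avoid log(0)
        $u1 = (mt_rand() + 1) / (mt_getrandmax() + 2);
        $u2 = (mt_rand() + 1) / (mt_getrandmax() + 2);

        // Standard normal value, then shift and scale
        $z = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);
        $samples[] = $mu + $sigma * $z;
    }

    return $samples;
}

$a = sketchNormalSamples(1000, 100.0, 15.0, 42);
$b = sketchNormalSamples(1000, 100.0, 15.0, 42);

var_dump($a === $b); // bool(true)
```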

---

#### Z-score `zscore($x)`


Computes the standard score describing `$x` in terms of the number of standard deviations above or below the mean: `(x - mu) / sigma`.

```
$normalDist = new NormalDist(100, 15);
echo $normalDist->zscore(130);          // 2.0 (two std devs above mean)
echo $normalDist->zscore(85);           // -1.0 (one std dev below mean)
echo $normalDist->zscoreRounded(114, 3); // 0.933
```

---

#### Probability Density Function `pdf($x)`

Calculates the **Probability Density Function** at a given value `$x`:

```
$normalDist->pdf(float $x): float
```

- Input: the value `$x` at which to evaluate the PDF.
- Output: the relative likelihood of `$x` in the distribution.

Example:

```
$normalDist = new NormalDist(10.0, 2.0);
echo $normalDist->pdf(12.0); // Output: 0.12098536225957168
```

---

#### Cumulative Distribution Function `cdf($x)`


Calculates the **Cumulative Distribution Function** at a given value `$x`:

```
$normalDist->cdf(float $x): float
```

- Input: the value `$x` at which to evaluate the CDF.
- Output: the probability that a random variable X is less than or equal to `$x`.

Example:

```
$normalDist = new NormalDist(10.0, 2.0);
echo $normalDist->cdf(12.0); // Output: 0.8413447460685429
```

Calculating both the PDF and the CDF:

```
$normalDist = new NormalDist(10.0, 2.0);

// Calculate PDF at x = 12
$pdf = $normalDist->pdf(12.0);
echo "PDF at x = 12: $pdf\n"; // Output: 0.12098536225957168

// Calculate CDF at x = 12
$cdf = $normalDist->cdf(12.0);
echo "CDF at x = 12: $cdf\n"; // Output: 0.8413447460685429
```

---

#### Inverse Cumulative Distribution Function `invCdf($p)`


Computes the **Inverse Cumulative Distribution Function** (also known as the quantile function or percent-point function). Given a probability `$p`, it finds the value `$x` such that `cdf($x) = $p`.

```
$normalDist->invCdf(float $p): float
```

- Input: a probability `$p` in the range (0, 1) exclusive.
- Output: the value `$x` where `cdf($x) = $p`.
- Throws an exception if `$p` is not in (0, 1).

Example:

```
$normalDist = new NormalDist(0.0, 1.0);

// Find the value at the 95th percentile of a standard normal distribution
echo $normalDist->invCdfRounded(0.95, 5); // Output: 1.64485

// The median of a standard normal distribution
echo $normalDist->invCdf(0.5); // Output: 0.0
```

The `invCdf()` method is useful for:

- **Confidence intervals**: find critical values for a given confidence level.
- **Hypothesis testing**: determine thresholds for statistical significance.
- **Percentile calculations**: find the value corresponding to a specific percentile.

Round-trip example with `cdf()`:

```
$normalDist = new NormalDist(100, 15);

// invCdf(0.5) equals the mean
echo $normalDist->invCdf(0.5); // Output: 100.0

// Round-trip: cdf(invCdf(p)) ≈ p
echo $normalDist->cdfRounded($normalDist->invCdf(0.25), 2); // Output: 0.25
```

---

#### Quantiles `quantiles($n)`


Divides the normal distribution into `$n` continuous intervals with equal probability. Returns a list of `$n - 1` cut points separating the intervals. Set `$n` to 4 for quartiles (the default), `$n` to 10 for deciles, or `$n` to 100 for percentiles.

```
$normalDist = new NormalDist(0.0, 1.0);

// Quartiles (default)
$normalDist->quantiles();    // [-0.6745, 0.0, 0.6745]

// Deciles
$normalDist->quantiles(10);  // 9 cut points

// Percentiles
$normalDist->quantiles(100); // 99 cut points
```

---

#### Overlapping coefficient `overlap($other)`


Computes the overlapping coefficient (OVL) between two normal distributions. Measures the agreement between two normal probability distributions. Returns a value between 0.0 and 1.0 giving the overlapping area in the two underlying probability density functions.

```
$n1 = new NormalDist(2.4, 1.6);
$n2 = new NormalDist(3.2, 2.0);
echo $n1->overlapRounded($n2, 4); // 0.8035

// Identical distributions overlap completely
$n3 = new NormalDist(0, 1);
echo $n3->overlap($n3); // 1.0
```

---

#### Combining a normal distribution via `add()` method


The `add()` method allows you to combine a NormalDist instance with either a constant or another NormalDist object. This operation supports mathematical transformations and the combination of distributions.

The use cases are:

- Shifting a distribution: add a constant to shift the mean, useful in translating data.
- Combining distributions: combine independent or jointly normally distributed variables, commonly used in statistics and probability.

```
$birth_weights = NormalDist::fromSamples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]);
$drug_effects = new NormalDist(0.4, 0.15);
$combined = $birth_weights->add($drug_effects);

$combined->getMeanRounded(1); // 3.1
$combined->getSigmaRounded(1); // 0.5

$birth_weights->getMeanRounded(5); // 2.71667
$birth_weights->getSigmaRounded(5); // 0.50761
```
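The combined values can be checked by hand. For two independent normal variables the means add and the variances add, so the combined sigma is the square root of the summed variances. The following sketch in plain PHP assumes that rule and recomputes the example's numbers from the raw samples.

```php
<?php

$samples = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5];

// Estimate mean and sample variance of the birth weights
$n = count($samples);
$mu1 = array_sum($samples) / $n;
$sumSquares = 0.0;
foreach ($samples as $value) {
    $sumSquares += ($value - $mu1) ** 2;
}
$variance1 = $sumSquares / ($n - 1);

// Drug effect distribution from the example
$mu2 = 0.4;
$sigma2 = 0.15;

// Sum of independent normals: means add, variances add
$mu = $mu1 + $mu2;
$sigma = sqrt($variance1 + $sigma2 ** 2);

printf("%.1f %.1f\n", $mu, $sigma); // 3.1 0.5
```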

#### Scaling a normal distribution by a constant via `multiply()` method


The `multiply()` method for NormalDist multiplies both the mean (mu) and standard deviation (sigma) by a constant. This method is useful for rescaling distributions, such as when changing measurement units. The standard deviation is scaled by the absolute value of the constant to ensure it remains non-negative.

The method does not modify the existing object but instead returns a new NormalDist instance with the updated values.

Use Cases:

- Rescaling distributions: useful when changing units (e.g., from meters to kilometers, or Celsius to Fahrenheit).
- Transforming data: apply proportional scaling to statistical data.

```
$tempFebruaryCelsius = new NormalDist(5, 2.5); # Celsius
$tempFebFahrenheit = $tempFebruaryCelsius->multiply(9 / 5)->add(32); # Fahrenheit
$tempFebFahrenheit->getMeanRounded(1); // 41.0
$tempFebFahrenheit->getSigmaRounded(1); // 4.5
```

#### Subtracting from a normal distribution via `subtract()` method


The `subtract()` method is the counterpart to `add()`. It subtracts a constant or another NormalDist instance from this distribution.

- A constant (float): shifts the mean down, leaving sigma unchanged.
- A NormalDist instance: subtracts the means and combines the variances.

```
$nd = new NormalDist(100, 15);
$shifted = $nd->subtract(32);
$shifted->getMean();  // 68.0
$shifted->getSigma(); // 15.0 (unchanged)
```

#### Dividing a normal distribution by a constant via `divide()` method


The `divide()` method is the counterpart to `multiply()`. It divides both the mean (mu) and standard deviation (sigma) by a constant.

```
// Convert Fahrenheit back to Celsius: (F - 32) / (9/5)
$tempFahrenheit = new NormalDist(41, 4.5);
$tempCelsius = $tempFahrenheit->subtract(32)->divide(9 / 5);
$tempCelsius->getMeanRounded(1);  // 5.0
$tempCelsius->getSigmaRounded(1); // 2.5
```

---

### References for NormalDist


This class is inspired by Python’s `statistics.NormalDist` and aims to provide similar functionality for PHP users. (Work in Progress)

`StudentT` class
----------------


The `StudentT` class represents the Student’s t-distribution, which is used for hypothesis testing and confidence intervals when the population standard deviation is unknown, especially with small sample sizes. As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.

### Creating a StudentT instance


```
use HiFolks\Statistics\StudentT;

$t = new StudentT(df: 10); // 10 degrees of freedom
```

### Probability Density Function (PDF)


```
$t = new StudentT(5);
$t->pdf(0);        // ≈ 0.37961 (peak of the distribution)
$t->pdf(2.0);      // density at t=2
$t->pdfRounded(0); // 0.38
```

### Cumulative Distribution Function (CDF)


```
$t = new StudentT(5);
$t->cdf(0);    // 0.5 (symmetric around zero)
$t->cdf(2.0);  // ≈ 0.94874
$t->cdfRounded(2.0); // 0.949
```

### Inverse CDF (Quantile Function)


```
$t = new StudentT(10);
$t->invCdf(0.975);  // ≈ 2.228 (critical value for 95% two-sided test)
$t->invCdf(0.5);    // 0.0 (median)
$t->invCdfRounded(0.975, 3); // 2.228
```

StreamingStat (Experimental)
----------------------------

> **Note**: `StreamingStat` is experimental in version 1.x. It will be released as stable in version 2. If you want to provide feedback, we are happy to hear from you; please open an issue in the project's GitHub repository.

`StreamingStat` computes descriptive statistics in a single pass with O(1) memory, ideal for large datasets or generator-based streams.

```
use HiFolks\Statistics\StreamingStat;

$s = new StreamingStat();
$s->add(1)->add(2)->add(3)->add(4)->add(5);

$s->count();     // 5
$s->sum();       // 15.0
$s->min();       // 1.0
$s->max();       // 5.0
$s->mean();      // 3.0
$s->variance();  // 2.5
$s->stdev();     // 1.5811...
$s->skewness();  // 0.0
$s->kurtosis();  // -1.2
```
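One well-known way to obtain single-pass, O(1)-memory statistics is Welford's online algorithm, which updates a running mean and a running sum of squared deviations for each new value. The sketch below shows that technique in plain PHP; the class `SketchRunningStats` is hypothetical, and whether `StreamingStat` uses this exact algorithm internally is an assumption.

```php
<?php

final class SketchRunningStats
{
    private int $n = 0;
    private float $mean = 0.0;
    private float $m2 = 0.0; // running sum of squared deviations from the mean

    public function add(float $x): self
    {
        $this->n++;
        $delta = $x - $this->mean;
        $this->mean += $delta / $this->n;
        // Uses the old and the updated mean, which keeps the update stable
        $this->m2 += $delta * ($x - $this->mean);

        return $this;
    }

    public function mean(): float
    {
        return $this->mean;
    }

    /** Sample variance; needs at least 2 values. */
    public function variance(): float
    {
        return $this->m2 / ($this->n - 1);
    }
}

$s = new SketchRunningStats();
foreach ([1, 2, 3, 4, 5] as $value) {
    $s->add($value);
}

printf("%.1f %.1f\n", $s->mean(), $s->variance()); // 3.0 2.5
```

Only three numbers are stored regardless of how many values are added, which is what makes the approach suitable for generators and large streams.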

| Method | Description | Min n |
| --- | --- | --- |
| `count()` | Number of values added | 0 |
| `sum()` | Sum of all values | 1 |
| `min()` | Minimum value | 1 |
| `max()` | Maximum value | 1 |
| `mean(?int $round = null)` | Arithmetic mean | 1 |
| `variance(?int $round = null)` | Sample variance | 2 |
| `pvariance(?int $round = null)` | Population variance | 1 |
| `stdev(?int $round = null)` | Sample standard deviation | 2 |
| `pstdev(?int $round = null)` | Population standard deviation | 1 |
| `skewness(?int $round = null)` | Sample skewness (adjusted Fisher-Pearson) | 3 |
| `pskewness(?int $round = null)` | Population skewness | 3 |
| `kurtosis(?int $round = null)` | Excess kurtosis (sample) | 4 |

All methods throw `InvalidDataInputException` when insufficient data is available.

Utility classes
---------------


The package includes utility classes under `HiFolks\Statistics\Utils` for common array and formatting operations.

### `Arr` — array helpers


```
use HiFolks\Statistics\Utils\Arr;
```

#### Arr::extract( array $data, array $columns )


Extract one or more columns from an array of associative arrays. Returns one array per requested column.

```
$runners = [
    ['name' => 'Alice', 'age' => 30, 'score' => 95],
    ['name' => 'Bob',   'age' => 25, 'score' => 87],
];

[$ages, $scores] = Arr::extract($runners, ['age', 'score']);
// $ages = [30, 25], $scores = [95, 87]
```

#### Arr::partition( array $data, string $field, string $operator, mixed $value )

Split an array of associative arrays into `[$matching, $nonMatching]` groups based on a condition. Supported operators: `==`, `!=`, `>`, `<`, `>=`, `<=`.

```
// ['hours' => 1, 'minutes' => 20, 'seconds' => 45]
```

#### Format::hmsToSeconds( int $hours, int $minutes, int $seconds )


Convert hours, minutes, and seconds to total seconds.

```
Format::hmsToSeconds(1, 20, 45);  // 4845
```

Testing
-------


```
composer run test           Runs the test script
composer run test-coverage  Runs the test-coverage script
composer run format         Runs the format script
composer run static-code    Runs the static-code script
composer run all-check      Runs the all-check script
```

Changelog
---------


Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

Contributing
------------


Please see [CONTRIBUTING](.github/CONTRIBUTING.md) for details.

Security Vulnerabilities
------------------------


Please review [our security policy](../../security/policy) on how to report security vulnerabilities.

Credits
-------


- [Roberto B.](https://github.com/roberto-butti)
- [All Contributors](../../contributors)

License
-------


The MIT License (MIT). Please see [License File](LICENSE.md) for more information.

