
zeeml/dataset
=============

A multi-purpose DataSet for Machine Learning algorithms training


[Source](https://github.com/Zeeml/Dataset) · [Packagist](https://packagist.org/packages/zeeml/dataset)

[![build](https://camo.githubusercontent.com/a5ddf9e3972d945bfa9b475bfe48dcffde03bceeaa969d747a04002180639cd3/68747470733a2f2f7472617669732d63692e6f72672f5a65656d6c2f446174617365742e7376673f6272616e63683d6d6173746572)](https://camo.githubusercontent.com/a5ddf9e3972d945bfa9b475bfe48dcffde03bceeaa969d747a04002180639cd3/68747470733a2f2f7472617669732d63692e6f72672f5a65656d6c2f446174617365742e7376673f6272616e63683d6d6173746572)

Dataset
=======


A multi-purpose dataSet for Machine Learning algorithms training.

Create a DataSet
----------------

To create a DataSet for use with Zeeml Machine Learning, specify a source: either a CSV file or an array.

### Create a dataSet from a csv file

```
$dataSet = DataSetFactory::create('/path/to/csv', ['name', 'Gender'], ['Height']);

```

The keys set in the header (the first row of the CSV file) are used as the keys of the dataSet.
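As a rough illustration of what "header keys become dataSet keys" means, the following plain-PHP sketch turns CSV text into rows keyed by the header. It does not use or reproduce the library's code; `csvToKeyedRows()` is a hypothetical helper and the inline CSV is made-up sample data.

```php
<?php
// Illustrative only: plain PHP, not the zeeml/dataset implementation.
// csvToKeyedRows() is a hypothetical helper invented for this sketch.

/**
 * Turn CSV text into a list of rows keyed by the header row.
 * Naive line splitting: quoted fields containing newlines are not supported.
 */
function csvToKeyedRows(string $csv): array
{
    $csv = trim($csv);
    if ($csv === '') {
        return [];
    }

    $lines = preg_split('/\R/', $csv);
    // The first line is the header; its cells become the keys of every row.
    $header = str_getcsv(array_shift($lines), ',', '"', '\\');

    $rows = [];
    foreach ($lines as $line) {
        $values = str_getcsv($line, ',', '"', '\\');
        if (count($values) !== count($header)) {
            throw new InvalidArgumentException('Row width does not match the header');
        }
        $rows[] = array_combine($header, $values);
    }

    return $rows;
}

$rows = csvToKeyedRows("name,gender,height\nZac,Male,180\nEmily,Female,177\n");
print_r($rows[0]);
```

Note that values parsed this way are strings (`'180'`, not `180`); whether the library casts them is not stated in this README.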

### Create a dataSet from an Array


```
$dataSet =  DataSetFactory::create(
    [
        ['name' => 'Zac',    'gender' => 'Male',    'height' => 180],
        ['name' => 'Emily',  'gender' => 'Female',  'height' => 177],
        ['name' => 'Edward', 'gender' => 'Male',    'height' => 175],
        ['name' => 'Mark',   'gender' => 'Male',    'height' => 183],
        ['name' => 'Lesly',  'gender' => 'Female',  'height' => 170],
    ]
);

```

Any other array format will throw an exception.

Specify inputs and outputs
==========================


**The prepare method must be called prior to any other call or an exception will be thrown.**

```
$mapper = new Mapper(['name', 'gender'], ['height']);
$dataSet->prepare($mapper);

```

Here **\['name', 'gender'\]** are the indexes to use as inputs and **\['height'\]** are the indexes to use as outputs.

There is no limit on the number of inputs and outputs that can be picked from each entry.

**If a key does not exist, an exception is thrown.**
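To make the input/output split concrete, here is a minimal plain-PHP sketch of the idea: each row is divided into an input part and an output part, and an unknown key raises an exception. `splitRows()` is a hypothetical helper, and the shape of the returned arrays is an assumption; the library's `Mapper` and `prepare()` may behave differently in detail.

```php
<?php
// Illustrative only: a plain-PHP picture of "inputs" and "outputs".
// splitRows() is a hypothetical helper, not part of zeeml/dataset.

function splitRows(array $rows, array $inputKeys, array $outputKeys): array
{
    $inputs = [];
    $outputs = [];

    foreach ($rows as $row) {
        foreach (array_merge($inputKeys, $outputKeys) as $key) {
            // array_key_exists() also accepts keys whose value is null.
            if (!array_key_exists($key, $row)) {
                throw new InvalidArgumentException("Unknown key: $key");
            }
        }
        // Keep only the requested columns, preserving their keys.
        $inputs[] = array_intersect_key($row, array_flip($inputKeys));
        $outputs[] = array_intersect_key($row, array_flip($outputKeys));
    }

    return [$inputs, $outputs];
}

$rows = [
    ['name' => 'Zac',   'gender' => 'Male',   'height' => 180],
    ['name' => 'Emily', 'gender' => 'Female', 'height' => 177],
];

[$inputs, $outputs] = splitRows($rows, ['name', 'gender'], ['height']);

// A misspelled key is rejected instead of being silently ignored.
try {
    splitRows($rows, ['name', 'gendre'], ['height']);
    $threw = false;
} catch (InvalidArgumentException $e) {
    $threw = true;
}
```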

Manipulating the dataSet
========================

To manipulate and change the values of the dataSet (cleaning, renaming ...), you can apply a "Policy".

Policies are declared when creating the Mapper. Each column can define multiple Policies:

```
$dataSet = DataSetFactory::create(
      [
          [180, 'Male'],
          [177, 'Female'],
          [170, ''],
          [183, 'Male'],
      ]
);
$mapper = new Mapper(
    [
        0 => [Policy::replaceWithAvg(), Policy::rename('height')],
    ],
    [
        1 => [Policy::skip()]
    ]
);
$dataSet->prepare($mapper);

```

### Supported policies

- **Policy::skip()** : If the value at the corresponding index is empty (NULL, false, ''), the whole row is skipped.

    Example :

    ```
    $data = [
        [1, 2, 3],
        [4, null, 5],
        [6, 7, null],
        [null, 8, 9],
    ];

    $dataSet =  DataSetFactory::create($data);
    $mapper = new Mapper([0, 1 => Policy::skip()], [2 => Policy::skip()]);
    $dataSet->prepare($mapper);

    will use the following Inputs/Outputs :

    Inputs:
    [
        [1, 2],
        [null, 8], //No policy applied on 0
    ]

    Outputs:
    [
        [3],
        [9],
    ]

    ```
- **Policy::replaceWith()** : If the value at the corresponding index is empty (NULL, false, ''), it is replaced with the given value.

    Example :

    ```
    $data = [
        [1, 2, 3],
        [4, null, 5],
        [6, 7, null],
        [null, 8, 9],
    ];

    $dataSet =  DataSetFactory::create($data);
    $mapper = new Mapper([0, 1 => Policy::replaceWith('Unknown')], [2 => Policy::replaceWith(-1)]);
    $dataSet->prepare($mapper);

    will use the following Inputs/Outputs :

    Inputs:
    [
        [1, 2],
        [4, 'Unknown'],
        [6, 7],
        [null, 8], //No policy applied on 0
    ]

    Outputs:
    [
        [3],
        [5],
        [-1],
        [9]
    ]

    ```
- **Policy::replaceWithAvg()** : The empty values will be replaced with the average value of that column calculated from the original DataSet.

    Example :

    ```
    $data = [
        [1, 2, 3],
        [4, null, 5],
        [6, 7, null],
        [null, 8, 9],
    ];

    $dataSet =  DataSetFactory::create($data);
    $mapper = new Mapper([0 => Policy::replaceWithAvg(), 1 => Policy::skip()], [2 => Policy::replaceWithAvg()]);
    $dataSet->prepare($mapper);

    will use the following Inputs/Outputs :

    Inputs:
    [
        [1, 2],
        [6, 7],
        [2.75, 8], // Avg(0) = (1 + 4 + 6 + 0) / 4 = 2.75
    ]

    Outputs:
    [
        [3],
        [4.25], // Avg(2) = (3 + 5 + 0 + 9) / 4 = 4.25
        [9],
    ]

    ```
- **Policy::replaceWithMostCommon()** : The empty values will be replaced with the most common value (the value that occurs the most). If multiple values have the same frequency, one is picked at random.

    Example :

    ```
    $data = [
        [1, 2, 3],
        [1, null, 5],
        [6, 7, null],
        [null, 8, 9],
    ];

    $dataSet =  DataSetFactory::create($data);
    $mapper = new Mapper([0=> Policy::replaceWithMostCommon(), 1 => Policy::skip()], [2]);
    $dataSet->prepare($mapper);

    will use the following Inputs/Outputs :

    Inputs:
    [
        [1, 2],
        [6, 7],
        [1, 8],
    ]

    Outputs:
    [
        [3],
        [null],
        [9],
    ]

    ```
- **Policy::custom()** : Create your own Policy.

    The callable is only called when the value is empty. The callable must :

    - Take in a first parameter by reference which corresponds to the value of the column upon each iteration
    - Take in a second parameter which corresponds to the line
    - Return true to keep the line, false to skip it

    Example :

    ```
    $data = [
        [180, 'Male'],
        [177, 'Female'],
        [170, ''],
        [183, 'Male'],
    ];

    $dataSet =  DataSetFactory::create($data);

    $genderCleaner = function(&$value, $line) {
        if ($line[0] > 175) {
            $value = 'Male' ;
        } else {
            $value = 'Female';
        }

        return true;
    };

    $mapper = new Mapper([0], [1 => Policy::custom($genderCleaner)]);
    $dataSet->prepare($mapper);

    will use the following Inputs/Outputs :

    Inputs:
    [
        [180],
        [177],
        [170],
        [183],
    ]

    Outputs:
    [
        ['Male'],
        ['Female'],
        ['Female'],
        ['Male'],
    ]

    ```
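The arithmetic behind the policies above can be reproduced in a few lines of plain PHP. This is only a sketch of the behaviour the examples describe, not the library's code: `isEmptyValue()`, `columnAverage()` and `columnMostCommon()` are hypothetical helpers, and counting empty cells as 0 in the average is an assumption taken from the `2.75` figure in the `replaceWithAvg()` example.

```php
<?php
// Illustrative only: plain PHP mirroring the README examples, not the library's code.

// "Empty" as the README defines it: null, false or ''. Note that 0 is NOT empty here.
function isEmptyValue($value): bool
{
    return $value === null || $value === false || $value === '';
}

// Average of one column, counting empty cells as 0 (assumption based on the 2.75 example).
function columnAverage(array $rows, $column): float
{
    $sum = 0;
    foreach ($rows as $row) {
        $sum += isEmptyValue($row[$column]) ? 0 : $row[$column];
    }
    return $sum / count($rows);
}

// Most frequent non-empty value of one column; tie-breaking is unspecified in this sketch.
function columnMostCommon(array $rows, $column)
{
    $values = array_filter(array_column($rows, $column), fn ($v) => !isEmptyValue($v));
    $counts = array_count_values($values);
    arsort($counts);
    return array_key_first($counts);
}

$data = [
    [1, 2, 3],
    [4, null, 5],
    [6, 7, null],
    [null, 8, 9],
];

// Like Policy::skip() on column 1: drop rows whose column 1 is empty.
$kept = array_values(array_filter($data, fn ($row) => !isEmptyValue($row[1])));

// Like Policy::replaceWithAvg() on column 0: the average uses the original data,
// including the row that was skipped.
$avg0 = columnAverage($data, 0);
foreach ($kept as &$row) {
    if (isEmptyValue($row[0])) {
        $row[0] = $avg0;
    }
}
unset($row); // break the reference left by foreach

// Inputs are columns 0 and 1 of the remaining rows: [[1, 2], [6, 7], [2.75, 8]]
$inputs = array_map(fn ($row) => [$row[0], $row[1]], $kept);
```

The strict emptiness check matters: PHP's built-in `empty()` would also treat `0` and `'0'` as empty, which would wrongly discard legitimate zero measurements.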

Renaming keys of dataSet
------------------------


You can rename the dataSet keys :

```
$data = [
    ['Zac',    'Male',    180],
    ['Emily',  'Female',  177],
    ['Edward', 'Male',    175],
    ['Mark',   'Male',    183],
    ['Lesly',  'Female',  170],
];

$dataSet =  DataSetFactory::create($data);

$mapper = new Mapper([0, 1], [2]);
$dataSet->prepare($mapper);

$dataSet->rename([0 => 'Name', 1 => 'Gender', 2 => 'Height']);

and the inputs/outputs matrices used are :

Inputs :
[
    ['Name' => 'Zac',    'Gender' => 'Male'],
    ['Name' => 'Emily',  'Gender' => 'Female'],
    ['Name' => 'Edward', 'Gender' => 'Male'],
    ['Name' => 'Mark',   'Gender' => 'Male'],
    ['Name' => 'Lesly',  'Gender' => 'Female'],
]

Outputs :
[
    ['Height' => 180],
    ['Height' => 177],
    ['Height' => 175],
    ['Height' => 183],
    ['Height' => 170],
]

```
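For readers who want to see the effect of a key rename outside the library, the same transformation can be sketched in plain PHP. `renameKeys()` is a hypothetical helper written for this illustration; it says nothing about how `$dataSet->rename()` is implemented.

```php
<?php
// Illustrative only: renaming row keys in plain PHP.
// renameKeys() is a hypothetical helper, not part of zeeml/dataset.

function renameKeys(array $rows, array $names): array
{
    return array_map(function (array $row) use ($names) {
        $renamed = [];
        foreach ($row as $key => $value) {
            // Keys without a new name are kept unchanged.
            $renamed[$names[$key] ?? $key] = $value;
        }
        return $renamed;
    }, $rows);
}

$renamed = renameKeys(
    [
        ['Zac',   'Male',   180],
        ['Emily', 'Female', 177],
    ],
    [0 => 'Name', 1 => 'Gender', 2 => 'Height']
);
```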


### Community

Maintainer: [Olivier Lépine](https://github.com/crapougnax) (@crapougnax)

Top contributors: [crapougnax](https://github.com/crapougnax) (16 commits), [elkaadka](https://github.com/elkaadka) (13 commits)
