
imonroe/corpora
===============

A PHP-friendly version of the dariusk/corpora JavaScript library. It provides "\[a\] collection of small corpuses of interesting data for the creation of bots and similar stuff".

Latest version: 1.0.3. License: CC0-1.0. Requires PHP ^7.0|^8.0.

[Source](https://github.com/imonroe/corpora) | [Packagist](https://packagist.org/packages/imonroe/corpora)

`imonroe/corpora`
=================

This is a PHP- and Composer-friendly fork of the `dariusk/corpora` package, designed for easy use with PHP projects.

Check the files in the `/data` directory to find out all the things in the corpora. Each JSON file is an array with about a thousand examples of whatever you're asking for.

Installation
------------

`composer require imonroe/corpora`

Usage
-----

```
use imonroe\corpora\Corpora;

$corpora = new Corpora;

// Returns an array of available categories

$categories = $corpora->getCategories();

// Returns an array of subcategories

$subcategories = $corpora->getCategories('architecture');

// Return just the description of a given data file, if one is available.

$description = $corpora->getDescription('words.nouns');
// $description == "A list of English nouns."

// Returns an array of data from the corpora.
// Specify the file you want in the form of "dirname.dirname.filename"
// Do not include the .json extension.
// Available files are included in the /data directory of this repo.

// for instance, if you wanted the contents of the ./data/words/nouns.json file, you'd
// request it like this:

$nouns = $corpora->getDataFile('words.nouns');

// If you want ./data/music/genres.json, you'd call it like:

$genres = $corpora->getDataFile('music.genres');

// If you want ./data/societies_and_groups/fraternities/fraternities.json, you'd call it like:

$fraternities = $corpora->getDataFile('societies_and_groups.fraternities.fraternities');

// You can inspect any of these arrays in the usual way to find out what they contain.

```
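The dotted names used above correspond to paths under `/data`. The following is a minimal sketch of that naming convention only. `corporaPathFor()` is a hypothetical helper that is not part of this package, and the library's own lookup logic may differ; the allowed-character rule is also an assumption.

```php
<?php

/**
 * Hypothetical helper: translate a dotted corpus name such as
 * "words.nouns" into a JSON path such as "data/words/nouns.json".
 * This only illustrates the naming convention; it is not the library's code.
 */
function corporaPathFor(string $name, string $baseDir = 'data'): string
{
    $segments = explode('.', $name);
    foreach ($segments as $segment) {
        // Assumption: each segment is a plain file or directory name.
        // This also rejects empty segments and path traversal such as "..".
        if (!preg_match('/\A[A-Za-z0-9_-]+\z/', $segment)) {
            throw new InvalidArgumentException("Invalid corpus name: {$name}");
        }
    }

    return $baseDir . '/' . implode('/', $segments) . '.json';
}

echo corporaPathFor('words.nouns'), PHP_EOL;
// data/words/nouns.json
echo corporaPathFor('societies_and_groups.fraternities.fraternities'), PHP_EOL;
// data/societies_and_groups/fraternities/fraternities.json
```

Note that the `.json` extension is appended by the helper, which mirrors the instruction above not to include it in the name you pass in.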

Testing
-------

`composer test`

Styling
-------

`composer check-style` and `composer fix-style`

Original `dariusk/corpora` README:

Corpora
=======

This project is a collection of static corpora (plural of "corpus") that are potentially useful in the creation of weird internet stuff. I've found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I've been copy/pasting an `adjs.json` file from project to project. This is kind of awful, so I'm hoping that this project will at least help me keep everything in one place.

I would like this to help with rapid prototyping of projects. For example: you might use `nouns.json` to start with, just to see if an idea you had was any good. Once you've built the project quickly around the nouns collection, you can then rip it out and replace it with a more complex or exhaustive data source.
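As a rough illustration of that kind of quick prototype, here is a sketch in PHP that decodes a small noun list and picks a random entry. The inline JSON is made-up sample data, the object shape (a `description` field plus a named list) is an assumption, and `pickRandom()` is a hypothetical helper.

```php
<?php

// Inline sample standing in for a corpus file. The shape shown here
// ("description" plus a named list) is an assumption for illustration.
$json = '{"description": "A list of English nouns.", '
      . '"nouns": ["anchor", "bridge", "candle", "meadow"]}';

$corpus = json_decode($json, true);
if (!is_array($corpus) || !isset($corpus['nouns']) || !is_array($corpus['nouns'])) {
    throw new RuntimeException('Unexpected corpus shape');
}

/**
 * Hypothetical helper: pick one random entry from a non-empty list.
 */
function pickRandom(array $items)
{
    if ($items === []) {
        throw new InvalidArgumentException('Cannot pick from an empty list');
    }

    // array_rand() returns a random key, which we use to fetch the value.
    return $items[array_rand($items)];
}

$noun = pickRandom($corpus['nouns']);
echo "Today's prototype is about a {$noun}.", PHP_EOL;
```

Because the data sits behind a plain array, swapping in a larger or more exhaustive source later only requires changing where `$corpus` comes from.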

I'm also hoping that this can be used as a teaching tool: maybe someone has three hours to teach how to make Twitter bots. That doesn't give the student much time to find/scrape/clean/parse interesting data. My hope is that students can be pointed to this project and they can pick and choose different interesting data sources to meld together for the creation of prototypes.

License
-------

Since Corpora is more data than code, I have chosen to CC0 license this (rather than MIT license or similar).

[![](https://camo.githubusercontent.com/42a392c9ce97407913c025e81b79bb27bd1fc94e80bd88b73a6b2271219e13dc/687474703a2f2f692e6372656174697665636f6d6d6f6e732e6f72672f702f7a65726f2f312e302f38387833312e706e67)](http://creativecommons.org/publicdomain/zero/1.0/)

To the extent possible under law, [Darius Kazemi](http://tinysubversions.com) has waived all copyright and related or neighboring rights to Corpora. This work is published from: United States.

What is Corpora NOT?
--------------------

This project is not meant to replace exhaustive APIs -- if you want nouns, and you want every noun in the English language, replete with metadata, consider [Wordnik](http://developer.wordnik.com/docs). If you want the title of every Wikipedia article, use [the MediaWiki API](http://www.mediawiki.org/wiki/API:Main_page).

What is Corpora?
----------------

- Corpora is a repository of JSON files, meant to be language-neutral. If you want to create an NPM repo or whatever based on this, be my guest, but this repository will remain a collection of data files that can be interpreted by any language that can parse JSON.
- Corpora is a collection of *small* files. It is not meant to be an exhaustive source of anything: a list of resources should contain somewhere in the vicinity of 1000 items.
    - For example, Corpora will not contain any complete "dictionary" style files. Instead we host a sampling of 1000 common nouns, adjectives, and verbs.
    - Some lists are small enough by nature that we may contain a complete list of things in their category. For example, a list of heavily populated U.S. cities may only have 75 cities and be considered complete.

List of Corpora-related tools
-----------------------------

- [corpora-project](https://www.npmjs.com/package/corpora-project), a Node.js NPM package for accessing corpora data offline.
- [pycorpora](https://github.com/aparrish/pycorpora), a simple Python interface for corpora
- [corpora-api](https://github.com/coleww/corpora-api), a Node.js server that offers up the corpora as a JSON API (now live at )

I have some data, how do I submit?
----------------------------------

We accept pull requests to this repository. Some guidelines:

- BY SUBMITTING DATA AS A PULL REQUEST, YOU AGREE TO OUR APPLYING A [CC0](http://creativecommons.org/publicdomain/zero/1.0/) FREE CULTURE LICENSE TO THE DATA, MEANING THAT ANYONE CAN USE THE DATA FOR ANY REASON WITHOUT ATTRIBUTION IN PERPETUITY.
- Please submit all data as JSON format in a file with a `.json` extension, and please [JSONLint](http://jsonlint.com/) your files before submitting -- also, thanks to [Matt Rothenberg](https://github.com/mroth) we have Travis-CI testing, which will jsonlint your pull request automatically. If you see a test failure notification in your PR after you submit, there's a problem with your JSON!
- Keep individual files to about 1000 "things" maximum. Fewer than 1000 is fine, too.
- If you'd like attribution, I'm happy to include your name in this Readme file. Just remember that nobody who uses this data is obligated to include attribution in their own projects.
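If you would rather check a file locally before opening a pull request, something like the following PHP sketch could cover the JSON-validity and size guidelines above. `checkCorpusJson()` is a hypothetical helper, not part of the project's tooling, and treating each top-level list as the set of "things" to count is an assumption about how to apply the 1000-item guideline.

```php
<?php

/**
 * Hypothetical pre-submission check: returns a list of problems found in a
 * JSON document, or an empty array if none were found.
 * Assumes the "things" to count are the entries of each top-level list.
 */
function checkCorpusJson(string $json, int $maxItems = 1000): array
{
    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['Invalid JSON: ' . json_last_error_msg()];
    }
    if (!is_array($data)) {
        return ['Top-level value should be an object or array'];
    }

    $problems = [];
    // A bare top-level list is checked as one list; otherwise check each field.
    $lists = array_values($data) === $data ? ['(top level)' => $data] : $data;
    foreach ($lists as $key => $value) {
        if (is_array($value) && count($value) > $maxItems) {
            $problems[] = "{$key} has " . count($value) . " items (limit {$maxItems})";
        }
    }

    return $problems;
}

// Well-formed and small: no problems expected.
print_r(checkCorpusJson('{"nouns": ["a", "b"]}'));
// Trailing comma is not valid JSON, so a syntax problem is reported.
print_r(checkCorpusJson('{"nouns": ["a", "b",]}'));
```

This does not replace JSONLint or the project's CI check; it is only a quick local sanity test.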

Contributors
------------

By [Darius Kazemi and Many Wonderful Contributors](https://github.com/dariusk/corpora/graphs/contributors).

