
gidlov/copycat
==============

A universal scraping tool for all kinds of data collection. You decide where to collect data from and what to extract, all with regular expressions. More info on the GitHub page.

v1.0.5 | License: LGPL | Requires PHP >= 5.3.0


[Source](https://github.com/gidlov/copycat) | [Packagist](https://packagist.org/packages/gidlov/copycat)


Copycat - A PHP Scraping Class
==============================

[![Latest Stable Version](https://camo.githubusercontent.com/929ddbb9a8af4533fa8c6d1b312e8f83815fd188ba564a60522c99823b0acba7/68747470733a2f2f706f7365722e707567782e6f72672f6769646c6f762f636f70796361742f762f737461626c652e737667)](https://github.com/gidlov/copycat/releases)[![Total Downloads](https://camo.githubusercontent.com/ebdd64cd794576b74a6e0e60baed9e9140fe8314a632dfcbad81ff2dad82b1af/68747470733a2f2f706f7365722e707567782e6f72672f6769646c6f762f636f70796361742f646f776e6c6f616473)](https://packagist.org/packages/gidlov/copycat)[![Monthly Downloads](https://camo.githubusercontent.com/b26ef028bedef2142f0e14e606239e17b7aff071d9101b4d3db66c592222d164/68747470733a2f2f706f7365722e707567782e6f72672f6769646c6f762f636f70796361742f642f6d6f6e74686c79)](https://packagist.org/packages/gidlov/copycat)[![Reference Status](https://camo.githubusercontent.com/49a49d66b9b6ffd04a2f000f45960a850c888a7dea77f3928f49ad71e5abf6e7/68747470733a2f2f7777772e76657273696f6e6579652e636f6d2f7068702f6769646c6f763a636f70796361742f7265666572656e63655f62616467652e7376673f7374796c653d666c6174)](https://www.versioneye.com/php/gidlov:copycat/references)[![Software License](https://camo.githubusercontent.com/d9a7e43dcb8e5eca23ecaaa0d5ed7f39bf47e2ebf28c6bdc21f3f474f2a03d87/68747470733a2f2f706f7365722e707567782e6f72672f6769646c6f762f636f70796361742f6c6963656e7365)](LICENSE.txt)

More information is available at [gidlov.com/en/code/copycat](https://gidlov.com/en/code/copycat).

### For Laravel 5/4 Developers

Add the following to the `require` key of your `composer.json` file:

```
"gidlov/copycat": "1.*"
```

Then run `composer update`.

#### For Laravel 5 Developers

Add the service provider to the `providers` array in `config/app.php`:

```
Gidlov\Copycat\CopycatServiceProvider::class,
```

Then add the alias to the `aliases` array in the same file:

```
'Copycat' => Gidlov\Copycat\Copycat::class,
```

#### For Laravel 4 Developers

Add the service provider to the `providers` array in `app/config/app.php`:

```
'Gidlov\Copycat\CopycatServiceProvider',
```

Then add the alias to the `aliases` array in the same file:

```
'Copycat' => 'Gidlov\Copycat\Copycat',
```

Yet another scraping class
--------------------------

I didn’t do much research before writing this class, so there are probably similar tools out there, and certainly more polished solutions. *A Python version of this class is under development*.

Still, I needed a class that could pick out selected pieces of a web page using regular expressions and then display or save them. I also needed to save files and/or images, and to either specify a file name or complete an existing one.
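The core idea of picking named pieces out of a page with regular expressions, and choosing a file name for a download, can be sketched in plain PHP. This is a minimal illustration assuming PHP 7.1+, not Copycat's actual implementation: `extractFields()` and `targetFileName()` are hypothetical helpers, and the HTML is passed in as a string so no network access is needed.

```php
<?php
// Hypothetical helper (not Copycat's API): applies a map of
// name => regular expression to an HTML string and returns the first
// capture group of each pattern, or null when the pattern does not match.
function extractFields(string $html, array $patterns): array
{
    $result = array();
    foreach ($patterns as $name => $regex) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        $result[$name] = (preg_match($regex, $html, $m) === 1 && isset($m[1]))
            ? $m[1]
            : null;
    }
    return $result;
}

// Hypothetical helper: picks a local file name for a downloaded resource,
// either a custom name (keeping the original extension) or the original
// name completed with a prefix.
function targetFileName(string $url, ?string $name = null, string $prefix = ''): string
{
    // parse_url() may return null/false; the cast turns those into ''.
    $original = basename((string) parse_url($url, PHP_URL_PATH));
    if ($name !== null) {
        $ext = pathinfo($original, PATHINFO_EXTENSION);
        return $ext === '' ? $name : $name . '.' . $ext;
    }
    return $prefix . $original;
}

$fields = extractFields(
    '<span itemprop="ratingValue">8.0</span>',
    array('score' => '/itemprop="ratingValue">(.*?)</')
);
// $fields === array('score' => '8.0')

$custom = targetFileName('http://example.com/media/poster.jpg', 'donnie-darko');
// "donnie-darko.jpg"
$prefixed = targetFileName('http://example.com/media/poster.jpg', null, 'tt0246578_');
// "tt0246578_poster.jpg"
```

Returning `null` for a missing field, rather than dropping the key, keeps the result shape stable when a page lacks one of the requested pieces.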

You can also use a search engine to look up the address to extract data from, provided you have defined an expression for that particular page.
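Looking up an address through a search engine boils down to requesting a result page and taking the first link that matches a known address pattern. A minimal sketch, assuming PHP 7.1+: `searchUrl()` and `firstResultUrl()` are hypothetical helpers, the endpoint `https://search.example/search` and its `q` parameter are placeholders, and the result page is a hard-coded sample because real result markup differs between engines.

```php
<?php
// Hypothetical helper: builds a result-page URL for a query.
// The endpoint and the "q" parameter name are placeholders.
function searchUrl(string $endpoint, string $query): string
{
    return $endpoint . '?' . http_build_query(array('q' => $query));
}

// Hypothetical helper: returns the first href on a result page that matches
// $addressPattern (e.g. an IMDb title URL), or null when nothing matches.
function firstResultUrl(string $resultsHtml, string $addressPattern): ?string
{
    // Collect every href="..." value in document order; 0 or false means none.
    if (!preg_match_all('/href="([^"]+)"/i', $resultsHtml, $m)) {
        return null;
    }
    foreach ($m[1] as $href) {
        // Result pages often HTML-encode links (&amp; and similar).
        $href = html_entity_decode($href, ENT_QUOTES);
        if (preg_match($addressPattern, $href) === 1) {
            return $href;
        }
    }
    return null;
}

$url = searchUrl('https://search.example/search', 'Donnie Darko imdb');
// "https://search.example/search?q=Donnie+Darko+imdb"

$sample = '<a href="https://ads.example/x">Ad</a>'
        . '<a href="http://www.imdb.com/title/tt0246578/">Donnie Darko (2001)</a>';
$address = firstResultUrl($sample, '#^https?://www\.imdb\.com/title/tt\d+/#');
// "http://www.imdb.com/title/tt0246578/"
```

The address pattern is what ties the lookup to "an expression for that particular page": only links shaped like the page you wrote patterns for are accepted.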

Briefly
-------

- Uses regular expressions to match either a single occurrence or all occurrences.
- Can download files and save them under custom file names.
- Can process anything from a single page to several tens of thousands of pages in sequence.
- Can use search engines to find the right page.
- Can apply callback functions to all matched items.
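The "match one or all" and callback features correspond to familiar PCRE calls. A minimal sketch using only `preg_match()`, `preg_match_all()` and `array_map()`; it illustrates the idea and does not use Copycat's own methods, whose exact behaviour may differ.

```php
<?php
$html = '<li>  Drama </li><li>Sci-Fi</li><li> Thriller</li>';

// First occurrence only: preg_match() stops after the first match.
preg_match('/<li>(.*?)<\/li>/', $html, $one);
$first = $one[1];                    // "  Drama "

// Every occurrence: preg_match_all() collects each capture group.
preg_match_all('/<li>(.*?)<\/li>/', $html, $all);
$items = $all[1];                    // ["  Drama ", "Sci-Fi", " Thriller"]

// Apply a callback to every captured item, here trimming whitespace.
$clean = array_map('trim', $items);  // ["Drama", "Sci-Fi", "Thriller"]
```

Any callable works in place of `'trim'`, for example `'strip_tags'` or a closure that converts a rating string to a float.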

How to use this class
---------------------

Include the class and instantiate your object, optionally with custom [cURL parameters](http://php.net/manual/en/function.curl-setopt.php):

```
require_once('copycat.php');
$cc = new Copycat;
// Optional: custom cURL options, given as option => value pairs.
$cc->setCURL(array(
  CURLOPT_RETURNTRANSFER => 1,
  CURLOPT_CONNECTTIMEOUT => 5,
  // CURLOPT_HTTPHEADER expects an array of header strings.
  CURLOPT_HTTPHEADER => array('Content-Type: text/html; charset=iso-8859-1'),
  CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17',
));
```
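An options array like the one passed to `setCURL()` has the same shape that plain cURL accepts through `curl_setopt_array()`. A minimal sketch, assuming the `curl` extension and PHP 7.1+: `fetchWithOptions()` is a hypothetical helper, not part of Copycat, and it makes no claim about how `setCURL()` works internally.

```php
<?php
// Hypothetical helper (not part of Copycat): fetches $url with an array of
// CURLOPT_* => value pairs, the same shape curl_setopt_array() expects.
function fetchWithOptions(string $url, array $options): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }
    // Force the body to be returned as a string rather than printed.
    $options[CURLOPT_RETURNTRANSFER] = true;
    if (!curl_setopt_array($ch, $options)) {
        throw new RuntimeException('Could not set cURL options: ' . curl_error($ch));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}

// Usage (performs a real request):
// $html = fetchWithOptions('http://www.imdb.com/title/tt0246578/', array(
//     CURLOPT_CONNECTTIMEOUT => 5,
//     CURLOPT_HTTPHEADER => array('Accept-Language: en'),
// ));
```

Throwing on failure, instead of returning `false`, keeps a broken request from being silently matched against as if it were an empty page.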

**These examples use [IMDb](http://imdb.com/) as the target source.**

Say we want to retrieve a particular film's rating. For simplicity, assume we already know the address of the film, [Donnie Darko](http://www.imdb.com/title/tt0246578/). The code could look like this:

```
$cc->match(array(
    'score' => '/itemprop="ratingValue">(.*?)(.*?)(.*?)(.*?)(.*?).*?>(.*?)(.*?)
