merzilla/inm-googlesitemap
==========================

Google sitemap.xml extension for TYPO3 projects

Version 0.5.4 · License GPL-2.0+ · Requires PHP ^7.2

[Source](https://github.com/merzilla/inm_googlesitemap) · [Packagist](https://packagist.org/packages/merzilla/inm-googlesitemap) · [Issues](https://github.com/merzilla/inm_googlesitemap/issues) · [Docs](https://www.plan-net.ch)

What does it do?
================

This is a TYPO3 CMS extension. It provides an Extbase Command Controller task that generates a sitemap.xml using the PHPCrawl library: it fetches the given URL, finds all links in the HTML, and follows them. So it works like a frontend crawler.

Usage
=====

After installing the extension (activating it in the Extension Manager), create a new Scheduler Task using the `Extbase CommandController Task`. Select `InmGooglesitemap Sitemap: generateSitemap` as the command; you will then get the following options (besides TYPO3's default cron options).

You can get this extension via Git clone or Composer into your preferred destination.

Scheduler Task *Options / Arguments* for the crawling process
=============================================================


*url*: The URL entry point for crawling.
----------------------------------------

This is the entry point for crawling, the first URL that will be requested.

*sitemapFileName*: File name of the XML file. Default is "sitemap.xml".
-----------------------------------------------------------------------

*sitemap.xml* - The file is saved in your webroot, so the sitemap will be reachable at `http://example.com/sitemap.xml`.

*regexFileEndings*: Regular expression for file endings to skip
---------------------------------------------------------------

`#\.(jpg|jpeg|gif|png|mp3|mp4|gz|ico)$# i` - by default, URLs with one of these file endings are skipped.

*regexDirectoryExclude*: Regular expression for directories to skip.
--------------------------------------------------------------------

`#\/(typo3conf|fileadmin|uploads)\/.*$# i` - by default, these paths are skipped when found in a URL.
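As an illustrative sketch (not the extension's actual code), the two default skip patterns above can be applied with `preg_match()`. Note that plain PCRE expects the `i` modifier directly after the closing `#`, without the space shown in the option defaults:

```php
<?php
// Hypothetical helper demonstrating the two default skip rules.
// Patterns are the option defaults, with the "i" modifier moved
// directly after the delimiter as preg_match() requires.
$regexFileEndings      = '#\.(jpg|jpeg|gif|png|mp3|mp4|gz|ico)$#i';
$regexDirectoryExclude = '#\/(typo3conf|fileadmin|uploads)\/.*$#i';

function isSkipped(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $url)) {
            return true; // URL matches a skip rule and is not crawled
        }
    }
    return false;
}

var_dump(isSkipped('http://example.com/logo.PNG', [$regexFileEndings]));             // true (case-insensitive)
var_dump(isSkipped('http://example.com/fileadmin/x.pdf', [$regexDirectoryExclude])); // true
var_dump(isSkipped('http://example.com/news.html', [$regexFileEndings, $regexDirectoryExclude])); // false
```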

*obeyRobotsTxt*: Check to obey rules from robots.txt
----------------------------------------------------

Check this if you want the crawler to obey the rules in robots.txt.

*requestLimit*: Max number of URLs to crawl.
--------------------------------------------

*0* - Default is "0", which means `no limit`. Enter a number > 0 to set a limit.

*countOnlyProcessed*: Check if only fetched URLs should count for $requestLimit.
--------------------------------------------------------------------------------

Checkbox to fine-tune the limit: if checked, only URLs that were actually fetched count toward `requestLimit`.

*phpTimeLimit*: Value in seconds for setting time limit. Default = 10000.
-------------------------------------------------------------------------


*10000* - is the default value.

*htmlSuffix*: Default true: will only allow .htm|.html endings. Will also exclude query strings
-----------------------------------------------------------------------------------------------

Checkbox to tell the crawler that a URL must end with `.htm` or `.html`.

*linkExtractionTags*: By default the crawler searches for links in the following html-tags:
-------------------------------------------------------------------------------------------

href, src, url, location, codebase, background, data, profile, action and open. You can change this comma-separated list.
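A rough sketch of this kind of attribute-based link extraction (PHPCrawl's real implementation is more involved; the regex here is illustrative only):

```php
<?php
// Build one alternation from the default linkExtractionTags list and
// pull every double-quoted attribute value out of an HTML snippet.
$linkExtractionTags = ['href', 'src', 'url', 'location', 'codebase',
                       'background', 'data', 'profile', 'action', 'open'];

$html = '<a href="/news.html">News</a> <img src="/logo.png">';

$pattern = '#(?:' . implode('|', $linkExtractionTags) . ')\s*=\s*"([^"]+)"#i';
preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // ["/news.html", "/logo.png"]
```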

*useTransferProtocol*: Enter transfer protocol to use: http (=default) or https. URLs with wrong protocol will not be written.
------------------------------------------------------------------------------------------------------------------------------

*http* - if you use a proxy, for example, you may have to set the protocol that gets prepended to the URLs.

*requestDelay*: float or string / time in seconds (float, e.g. 0.5, or a string like "60/100" for 100 requests per minute). Sets a delay for every HTTP request the crawler executes.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

*2* - the default value, meaning 2 seconds.
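For example, the "60/100" notation works out to 60 s / 100 requests = 0.6 s between requests. A hypothetical helper (not part of the extension) that converts both accepted forms into seconds:

```php
<?php
// parseRequestDelay() is a hypothetical name; it converts a requestDelay
// setting into the number of seconds to wait between requests.
function parseRequestDelay(string $value): float
{
    if (strpos($value, '/') !== false) {
        [$seconds, $requests] = array_map('floatval', explode('/', $value, 2));
        return $seconds / $requests; // "60/100" -> 0.6 s -> 100 requests/minute
    }
    return (float)$value;
}

echo parseRequestDelay('60/100'), "\n"; // 0.6
echo parseRequestDelay('2'), "\n";      // 2 (the default)
```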

*username*: HTTP Auth username
------------------------------


default: empty. Must be at least 2 chars long.

*password*: HTTP Auth password
------------------------------


default: empty. Must be at least 2 chars long.

*urlRegexHttpAuth*: URL to send authentication information to, e.g. `#http://www\.foo\.com/protected_path/#`
------------------------------------------------------------------------------------------------------------

default: empty. With the given example, the auth data would be added to the request for every URL within the path "protected_path".

Big thanks to Uwe Hunfeld for the GPL-licensed PHPCrawl library
===============================================================

PHPCrawl is completely free, open-source software and is licensed under the GNU General Public License v2.

More to know
============

The PHPCrawl library offers the possibility to use multiple processes, but this has a few requirements that not every webserver meets.

For the moment, the extension does not implement multi-process crawling. It is planned to make this activatable in the Scheduler Task settings, too.

A temporary file
----------------

While the process runs, it generates a file named `_temporary_sitemap.xml`, which is renamed to `sitemap.xml` (or the name given in the settings) after the Scheduler Task has run successfully.
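This write-then-rename step can be sketched as follows (using the system temp directory as a stand-in for the webroot). `rename()` is atomic on the same filesystem, so a failed run never leaves a half-written `sitemap.xml`:

```php
<?php
// Stand-in for the webroot; the extension writes into the real webroot.
$webroot  = sys_get_temp_dir();
$tempFile = $webroot . '/_temporary_sitemap.xml';
$target   = $webroot . '/sitemap.xml';

// The crawl writes everything to the temporary file first ...
file_put_contents($tempFile, '<?xml version="1.0" encoding="UTF-8"?>' . "\n<urlset/>\n");

// ... and only replaces the live sitemap after a successful run.
rename($tempFile, $target);

var_dump(file_exists($target));   // true
var_dump(file_exists($tempFile)); // false
```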

The generated sitemap.xml
-------------------------

The `sitemap.xml` only contains the URLs that the crawling process has found, which is the minimum requirement for an XML sitemap. This means we do not extend entries with fields like `priority` or dates. We think that's OK, as Google does a good job without them.
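A minimal URL-only sitemap as described above looks like this (`example.com` is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/news.html</loc>
  </url>
</urlset>
```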

### Why another sitemap extension?

We think that crawling from the frontend gives better results than trying to collect all URLs from within the backend, where you would have to write your own sitemap providers and so on. With `inm_googlesitemap`, all links are found from the frontend's point of view: just like a crawler (which it indeed is), a link checker, or a bot such as Googlebot.
