The Guardian’s code repository landscape is diverse, with nearly 200
repositories spanning various business domains and technical functions.
Overall, core customer-facing platforms and reader revenue projects are
highly active and large, reflecting ongoing investment,
while a number of smaller utilities and one-off projects show
little recent activity.
Naming conventions are mostly
consistent and descriptive, though a few outliers use unconventional
formats. Several potential risk areas emerge: a subset of repositories
appear unmaintained (no updates in over a year), some
codebases are very large (hundreds of thousands of
lines), which may pose maintainability challenges, and 36
repositories have no test code, indicating possible
quality risks.
Additionally, many programming languages and frameworks are in use;
this diversity could lead to fragmentation or specialized skill
needs. Addressing inactive projects, improving test coverage, and
standardizing naming can help reduce technical debt and improve overall
platform health.
Based on repository naming, the codebase can be grouped into key business and technical domains:
editorial-collaboration, facia-tool (fronts management tool),
story-packages (grouping articles), and rich text editor libraries
like prosemirror-... packages. These support journalists and editors
in producing and packaging content.
frontend (legacy website code) and dotcom-rendering (the newer web
rendering service). Also included are related monitoring or Lambda
services (e.g. frontend-lambda) and dotcom- prefixed utilities.
support-frontend (the supporter sign-up site) and its associated
services (support-admin-console, support-analytics,
support-reminders, support-service-lambdas) manage subscriptions,
payments, and membership. Repos like members-data-api and
memsub-promotions also fall here.
mobile- prefix relate to the Guardian’s mobile applications. This
includes mobile-apps-api-models and mobile-apps-article-templates
(shared models and templates for apps), as well as supporting
utilities like mobile-notifications-content, mobile-save-for-later,
mobile-purchases (in-app purchases), and platform-specific tools
(bridget-android, bridget-swift for bridging app and web content).
grid, grid-cerebro, grid-cli, grid-feeds). Pluto appears to be a
project for video/media workflow, with many repos prefixed pluto-
(e.g. pluto-core, pluto-mediabrowser, pluto-deliverables) covering
storage, browsing, and deliverable management of media.
ophan-housekeeper, ophan-geoip-db-refresher,
ophan-google-search-index-checker). Other data/analytics tools
include contributions-ticker-calculator (likely for contribution
metrics), newsletters-nx (newsletter platform), newswires (ingesting
newswire content), and various content data pipelines (content-api-*
clients and tests, recommendations).
commercial, commercial-shared, commercial-templates (ad templates or
logic), and integration with third parties like google-admanager-api.
Also, braze-components (related to the Braze marketing platform) and
consent-management-platform (for user consents/CMP) fall in this
group.
identity-processes (user identity workflows),
pan-domain-authentication and pan-domain-node (single sign-on across
“*.gutools” domains), and login.gutools (the login app for internal
tools) cover identity. Security-focused projects include security-hq
(security dashboard), secure-contact and the SecureDrop
client/workstation for whistleblowing (securedrop-client,
securedrop-workstation).
riff-raff) is the deployment pipeline, Prism (prism) is an AWS
inventory service, Amigo (amigo) manages AMIs, Amiable monitors AMIs,
and Anghammarad is the repo for alerts. Repos prefixed with actions-
(e.g. actions-riff-raff, actions-npm-dependencies) are custom GitHub
Actions for CI/CD. cdk and cdk-playground relate to the AWS Cloud
Development Kit. Other infra utilities include
cloudwatch-logs-management, fastly-cache-purger (and
mobile-fastly-cache-purger), s3-upload, and
elasticsearch-node-rotation. The gateway and gatehouse repos may
handle dev environments or proxies.
content-api-scala-client (Guardian Open Platform client),
targeting-client (perhaps for personalization or ad targeting),
play-googleauth (Play framework module for Google Auth),
simple-configuration (configuration library), thrift-serializer and
french-thrift (Thrift models for content/analytics, one notably for
French content tracking).
The Source design system appears in source-apps. There are also
design/interactive tooling repos such as ai2html and
chloropleth_map_maker (graphics tools), and interactive-* repos which
seem to be templates or assets for interactive articles (e.g.,
interactive-atom-thrasher-template, interactive-now-and-then-embed).
workflow-frontend (likely an editorial workflow UI), flexible-* repos
(perhaps related to an older CMS called “Flexible”, e.g.
flexible-octopus-converter, flexible-restorer for content migration
or restoration), and editorial-tools-user-telemetry-service (captures
usage telemetry of editorial tools). editors-picks-uploader also fits
here (managing Editors’ Picks content).
national-delivery-fulfilment (likely print home delivery management)
and invoicing-api or payment-failure-comms (communication on payment
failures). The zuora- prefixed repos (zuora-full-export,
zuora-invoice-write-offs) deal with data export and adjustments in
the Zuora billing system (used for subscriptions).
discussion-avatar (comment system avatar service), pressreader
(possibly integration with PressReader for digital newspapers),
archivehunter (archiving tool), recipes-backend (perhaps for a
recipes section), hackday-ever-elusive-kudo (a hackday project), and
internal documentation sites like guardian-engineering-site.
These conceptual groupings show that the repositories span content creation and delivery, audience analytics, revenue products, internal tooling, and infrastructure.
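The name-based grouping described above could be approximated with a simple prefix lookup. The following is a minimal sketch, not the method actually used for this analysis: the PREFIX_DOMAINS table and its domain labels are hypothetical, chosen only to mirror the prefixes mentioned in the text.

```python
from collections import defaultdict

# Hypothetical prefix-to-domain table; the labels are illustrative and
# are not taken from the dataset.
PREFIX_DOMAINS = {
    "support-": "reader revenue",
    "mobile-": "mobile apps",
    "ophan-": "analytics",
    "pluto-": "media workflow",
    "commercial-": "commercial",
    "interactive-": "interactives",
    "actions-": "GitHub Actions",
}


def group_by_prefix(repo_names):
    """Map each domain label to the sorted repo names whose prefix matches."""
    groups = defaultdict(list)
    for name in repo_names:
        # First matching prefix wins; names with no known prefix are
        # collected under "unclassified" for manual review.
        domain = next(
            (label for prefix, label in PREFIX_DOMAINS.items()
             if name.startswith(prefix)),
            "unclassified",
        )
        groups[domain].append(name)
    return {domain: sorted(names) for domain, names in groups.items()}
```

A prefix lookup only goes so far: codenamed repos such as riff-raff, and a repo named exactly commercial (no trailing hyphen), end up in "unclassified" and still need manual assignment.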
Common Naming Patterns: Guardian repositories largely follow a consistent naming scheme:
support-admin-console and content-api-scala-client use hyphens to
separate words and all-lowercase text.
editorial- for editorial tools, mobile- for mobile app related
projects, ophan- for analytics, pluto- for media pipeline, support-
for supporter revenue, commercial- for ad tools, etc. This helps
quickly identify a repo’s context.
-service (indicating a service/microservice, e.g.
editorial-tools-user-telemetry-service), -client (client library, as
in content-api-scala-client), -frontend or -backend (to distinguish
UI vs server components, e.g. support-frontend), -lambda (AWS Lambda
functions, e.g. podcasts-analytics-lambda), and platform indicators
such as -android or -swift for platform-specific code.
grid-cli, fastly-cache-purger) or use known project codenames within
Guardian (e.g., “riff-raff”, “prism”, “amiable”). In general, the
naming leans toward clarity about the repository’s purpose or the
system it belongs to.
Inconsistencies and Odd Patterns: Despite general consistency, there are a few naming irregularities:
chloropleth_map_maker is one of the few names using underscores, and
login.gutools contains a period in its name. These stand out against
the predominantly hyphenated names.
VaultDoor (capitalized CamelCase) and CDS-K8s (contains uppercase
“CDS” and “K8s”). Most other repos are all-lowercase, so these are
exceptions likely due to specific naming needs (e.g., “CDS” might be
an acronym for a system).
login.gutools) and the underscore example above. This inconsistency
could lead to slight confusion or extra effort when searching for
repos (e.g., one might expect login-gutools for consistency).
support- consistently refers to supporter revenue projects, but
frontend appears both as a standalone repo name and as a suffix in
others like workflow-frontend. Similarly, the term “front” appears in
different contexts (frontend vs. front-press-monitor). This is minor,
but occasionally a suffix doesn’t fully clarify context (e.g.,
manage-frontend and workflow-frontend belong to different domains
despite both ending in “-frontend”).
prout, giant, marley are short or code-named and require prior
knowledge to understand their purpose. While many of these (Prism,
Riff-Raff, etc.) are known internal tool names, to newcomers they can
be unclear. In some cases, there are redundant or varying references
to the same name (e.g., a contributor list might show both guardian
email and GitHub username variations of the same person, but that’s
more about data than naming).
CDS in CDS-K8s), but others are lowercase or mixed (the gutools in
login.gutools is lowercase, which is fine, but Gutools isn’t used).
The inconsistency is minimal overall but present in a handful of
names.
In summary, the naming conventions are largely systematic
(hyphenated, contextual names), with just a handful of outliers (use
of underscores, periods, or CamelCase) that break the pattern. These
inconsistencies, though few, could slightly hinder discoverability or
violate the principle of least surprise for developers navigating the
repos.
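The outliers described above can be detected mechanically. The sketch below assumes the convention is "lowercase letters and digits, with words joined by single hyphens"; that rule is inferred from the observations in this section rather than from any published Guardian standard, and the function name naming_issues is hypothetical.

```python
import re

# Assumed convention: lowercase letters and digits, with words joined
# by single hyphens (inferred from the report, not an official rule).
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def naming_issues(name):
    """Return the reasons a repository name deviates from the convention."""
    issues = []
    if "_" in name:
        issues.append("underscore")
    if "." in name:
        issues.append("period")
    if any(ch.isupper() for ch in name):
        issues.append("uppercase")
    # Catch anything else the pattern rejects, such as doubled hyphens.
    if not issues and not NAME_PATTERN.fullmatch(name):
        issues.append("other")
    return issues
```

Applied to the names discussed here, chloropleth_map_maker is flagged for its underscores, login.gutools for its period, and VaultDoor and CDS-K8s for uppercase letters, while names such as support-admin-console pass with no issues.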
Analyzing lines of code, contributor counts, and recent commit activity reveals significant differences across the identified groups:
frontend over its lifetime (reflecting its age and open-source
history), and dotcom-rendering similarly has a broad contributor base
(the JSON shows dozens of recent contributors). High activity in
these repos indicates they are critical and regularly updated.
workflow-frontend and editorial-collaboration are smaller in LOC but
have continuous updates (these tools evolve with newsroom needs,
though typically with fewer contributors than the public-facing
products).
bridget-android has ~31k LOC). Activity in mobile repos varies: core
shared models (mobile-apps-api-models) and templates see periodic
updates, but others are relatively quiet. Recent commit data shows
some mobile services (like mobile-purchases or mobile-notifications)
have had commits this year, but the volume is lower compared to web
projects. The mobile group overall has fewer active contributors at
any given time, reflecting a smaller team focus.
pluto-core (~38k LOC) and pluto-mediabrowser (~6k LOC) are not
extremely large, and their recent activity levels are relatively low
– some Pluto repos show 0 commits in the last quarter. This could
indicate that the Pluto system as a whole is either mature or
possibly being phased out. The number of contributors on each Pluto
repo is modest (often just a handful of people have worked on each),
implying specialized teams.
ophan-housekeeper, ophan-geoip-db-refresher etc., might each have
just a few hundred lines to a few thousand lines of code. They have
low commit frequency recently (some had 0 commits in last 30/90 days)
– likely because these are simple utilities that don’t need frequent
changes. Contributor counts on these are also low (often the same 2–3
data engineers appear across multiple analytics repos). One exception
in size is french-thrift, which is a large repo (~215k LOC) but this
appears to be an outlier (possibly auto-generated code or a forked
library for Thrift models). Its activity is low (last significant
update in 2024) and despite its huge LOC, it likely doesn’t represent
ongoing development work.
commercial, commercial-shared, braze-components) tend to be moderate
in size (a few thousand LOC each) and have a moderate number of
contributors (the core commercial engineering team). Activity in the
last year on these has not been as high as the reader revenue or
platform teams – for instance, commercial-shared had its latest
commit in late 2024 and 0 commits in recent months, suggesting a
stable library. Some commercial repos like
consent-management-platform (CMP) might see occasional bursts of work
(e.g., when legal requirements change).
pan-domain-authentication, janus-app) and Open Platform/API
(content-api-* clients) show moderate activity. The identity and
security tools have a steady trickle of commits (for upkeep with
security patches or new integrations). The content API client
libraries are open-source and widely used, which is reflected in
relatively high contributor counts (the Scala client has 100+
contributors over time) and ongoing maintenance commits (though not
high volume, just consistent over years to support API changes and
Scala version bumps).
At the extremes, the largest codebases are in the Web Platform and
Support domains (tens to hundreds of thousands of LOC, with
correspondingly large teams and high commit rates), whereas the
smallest are utility scripts or legacy interactives (often <1k LOC,
sometimes single-maintainer, and dormant). The most contributors tend
to be on long-lived, widely used projects (frontend,
dotcom-rendering, facia-tool, support-frontend, grid, and the content
API client), each accumulating dozens of contributors over time. The
highest recent commit activity is concentrated in
support/contributions and the new website platform, which indicates
strategic focus areas, while areas like Pluto, older interactives, or
certain infrastructure tools have few or no recent commits,
indicating stability or de-prioritization. This variance shows where
development effort is currently focused and which parts of the
codebase might be candidates for cleanup or archival.
Using criteria such as “no commits in the last year” and/or a last commit date over 18 months ago, we can identify a handful of repositories that appear potentially inactive or in maintenance-only mode:
interactive-boot-scripts repository is a clear example of an
abandoned project. It hasn’t been updated since May 2016. With only
24 total commits and none in recent years, this repo is essentially
dormant.
oz-bsky-test repo (probably a test integration with Bluesky or an
experiment) had its last commit in November 2023. In roughly 18
months since, no further commits have occurred (0 in last 90 days).
With only 12 commits total, this looks like a short-lived experiment
that has since been shelved.
interactive-gaza-damage (an interactive graphic, last commit January
2024) and email-mvt (an email multivariate test project, last commit
March 2024) each have had no commits in over a year. They are
approaching the 18-month mark for inactivity. They likely served a
one-time purpose (news coverage or a specific test) and then
development ceased.
chloropleth_map_maker (a data visualization tool) shows only 2
contributors and a recent commit in early 2025 made by an automated
process – it may not be actively developed feature-wise. Repos like
example-typescript-lambda or oz-2022-cpi-explorer were probably
experimental or tutorial in nature and have seen no meaningful
updates lately.
In summary, only a relatively small fraction of Guardian’s repositories appear truly inactive by the “>1 year no commits” definition – on the order of 5–10 repositories stand out as likely unmaintained. These are often either very old (e.g. 2016-era) or very niche. It’s also worth noting many others have low activity but at least one commit within the last year (possibly maintenance like dependency bumps or automated security fixes), which keeps them just out of the “completely inactive” category. The above examples (interactive-boot-scripts, interactive templates, etc.) are those that clearly meet the criteria of having had no meaningful changes for a long time.
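The inactivity criteria above can be expressed as a small classifier over last-commit dates. This is a sketch under stated assumptions: the function names and status labels are hypothetical, the analysis date must be supplied by the caller (the report does not state one), and the thresholds of 12 and 18 months simply mirror the two criteria named in this section.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # A month only counts once the day-of-month has been reached.
    if later.day < earlier.day:
        months -= 1
    return months


def activity_status(last_commit, today):
    """Classify a repository by the number of months since its last commit."""
    idle = months_between(last_commit, today)
    if idle >= 18:
        return "inactive"      # last commit at least 18 months ago
    if idle >= 12:
        return "borderline"    # no commits in the last year
    return "active"
```

For example, with a hypothetical analysis date of `date(2025, 5, 15)` and the first of the month standing in for unknown commit days, a last commit in November 2023 is classified as inactive and one in March 2024 as borderline. Because automated dependency bumps count as commits, a repo can read as active here while receiving no feature work; filtering bot commits first would tighten the result.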
Based on the dataset analysis – considering repository sizes, activity levels, testing coverage, contributor counts, naming, and diversity – several potential risk areas emerge:
Technical Debt & Lack of Maintenance: The
presence of repositories with no recent commits for extended
periods suggests pockets of unmaintained code. These could
accumulate technical debt (outdated libraries, unpatched
vulnerabilities) and knowledge loss. For example, a very old repo like
interactive-boot-scripts
(last updated 2016) is likely
running with years-old dependencies. Inactive but still deployed
services could pose reliability and security risks if they are not kept
up to date. It may be unclear if such repos are still in use; if they
are, they need attention, and if not, they might be candidates for
archiving to reduce clutter. Generally, code in maintenance mode can
become brittle – the team should periodically review whether to invest
in updates or decommission those components.
Key Person Dependency: Many repositories have
very low contributor counts, often just two or three
people ever contributing. This implies that knowledge of those codebases
is concentrated in a few individuals. If those individuals leave or move
to other teams, the project could be left without expertise. For
instance, oz-bsky-test
has only 2 contributors listed, and
climate-data-cli
also shows just 2 contributors in its
history (one Guardian dev and one external) – meaning only one person
might really understand each. A single-maintainer scenario is risky;
there’s limited code review, and bus factor is low. Ensuring multiple
developers are familiar with each critical repo, or documenting them
well, would mitigate this risk. It may also be worth examining if any
critical systems are in the hands of only one or two people (though most
mission-critical ones like frontend have many contributors, some medium
importance tools might not).
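A first pass at spotting key person dependency is to count distinct human contributors per repository. The sketch below is illustrative only: the input shape (repo name mapped to a list of contributor logins), the function name, the threshold of two, and the bot-detection rule based on login suffixes are all assumptions rather than features of the dataset.

```python
def low_contributor_repos(contributors_by_repo, threshold=2):
    """Return repos whose distinct human contributor count is <= threshold.

    Logins ending in "-bot" or "[bot]" are treated as automation and
    excluded, so that bots do not inflate the count.
    """
    flagged = {}
    for repo, contributors in contributors_by_repo.items():
        humans = {
            login for login in contributors
            if not login.endswith("-bot") and not login.endswith("[bot]")
        }
        if len(humans) <= threshold:
            flagged[repo] = len(humans)
    return flagged
```

Counting logins overstates the bus factor when one person appears under several identities (for example an email address and a GitHub username), so identities would need to be merged before trusting the numbers.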
Code Complexity & Maintainability: The
very large repositories (well over 100,000 LOC)
present maintainability challenges. A codebase like
dotcom-rendering (~198k LOC) or frontend
(~156k LOC) is inherently complex. They consist of hundreds or thousands
of files and likely implement a wide range of features built up over
years. Such size can slow down onboarding (new developers need to grasp
a huge codebase), increase the chance of bugs (more surface area), and
make big refactors risky. While these large projects are actively worked
on (mitigating the risk somewhat through continuous refactoring and
improvement), their sheer scale means technical debt needs to be
carefully managed. Regular modularization, cleanup of dead code, and
up-to-date documentation are necessary to keep them healthy. There’s
also a risk that some large older codebases (like the
frontend
Play application) contain legacy patterns or
outdated frameworks that are hard to modernize – which can become a drag
on productivity or deployment (e.g., if tied to older versions of
Scala/Play). It’s worth noting positively that some large repos do have
substantial test suites (e.g., frontend
has ~38k lines of
test code, which helps with maintainability by catching
regressions).
Insufficient Testing: A notable number of
repositories have little or no test code, raising
quality and reliability concerns. According to the data, 36 repositories
have 0 test LOC. For example, ophan-thrift-swift
(a Swift
codebase for analytics models) shows 0 test files or
lines, and bridget-android
(over 31k lines of Android
bridging code) also has no tests at all. This pattern
is especially worrying in larger repos: a non-trivial codebase with no
automated tests means any change could inadvertently break functionality
without detection. It also suggests those repos might not have undergone
rigorous TDD or QA, possibly due to being quick prototypes or relying on
manual testing. Repositories related to interactives or one-off projects
often lacked tests (which might be acceptable if they’re throwaway), but
if any of these no-test repos are in production use or expected to be
maintained long-term, that’s a risk. On the flip side, some teams have
excellent testing (support services have more test LOC than main code,
indicating an emphasis on quality), so the risk is uneven – it’s
concentrated in specific repos. The organization should review which
important services lack tests and consider backfilling tests or
refactoring to make them testable. Lack of tests also ties into key
person risk – if only the original author knows how it’s supposed to
work (with no tests as living documentation), maintenance by others
becomes hard.
Technology Stack Fragmentation: The dataset
suggests a broad mix of programming languages and
frameworks across the organization – likely Scala,
JavaScript/TypeScript, Python, Swift, Kotlin/Java Android, etc. For
example, the presence of ophan-thrift-swift (Swift code)
alongside bridget-android (Android/Kotlin) and many
Scala-based services (identity-processes,
pan-domain-authentication) and Node/React apps
(dotcom-rendering, support-admin-console)
shows a wide tech spread. While using the right tool for the job is
sensible, such diversity can pose challenges. It requires hiring and
retaining expertise in multiple tech stacks, and context-switching for
engineers moving between projects. It can also mean duplicated effort or
inconsistent approaches – for instance, one team might solve a problem
in Scala while another solves a similar problem in Node, leading to two
different implementations to maintain. Additionally, some
languages/frameworks might fall out of favor or lose community support
(for example, if any project is still on Play Framework and others are
on Node, balancing effort between them is tough). The variety (including
less common internal languages like Swift for a backend model) could
indicate some siloing of technology per team. This
fragmentation risk is about consistency and
interoperability – if not managed, it can slow down cross-team
development and increase DevOps overhead (different build pipelines,
testing tools for each stack). A mitigative strategy might be to
converge on a smaller set of core technologies for new projects, and
have clear ownership for those that are unique outliers.
Security Risks in Old/Unmaintained Code:
Repositories that haven’t been updated in a long time may not have
received important security patches. For example, an old service last
touched in 2016 likely has outdated dependencies with known
vulnerabilities. Even medium-term inactivity (2–3 years) can be enough
for vulnerabilities to emerge (e.g., an old version of a library with a
newly discovered CVE). If any such repos are still deployed in
production (or their code is reused), they could be an entry point for
security issues. Additionally, some repos (like
login.gutools
) handle authentication – if their naming or
code suggests they are critical for access control, ensuring they are
up-to-date is vital. Another security aspect is that inconsistent naming
or organization can lead to oversights – e.g., a
repository not clearly identified might be forgotten in security scans
or not included in regular maintenance rotations. It’s also worth noting
the presence of a snyk-bot
in contributor lists of some
repos indicates automated security fixes were attempted in some
codebases; however, if those PRs weren’t merged or the bot isn’t run
everywhere, some repos might lag behind. Overall, conducting regular
dependency health checks on the low-activity repos is recommended to
catch security issues.
Naming/Discoverability Issues: While relatively
minor compared to the above, the inconsistent naming
conventions could pose a productivity risk. For instance, a
developer searching for “fastly purger” might not immediately find
mobile-fastly-cache-purger
if they expect all
Fastly-related tools to be prefixed uniformly. The one-off use of
underscores or dots might also break automation scripts that assume repo
names are hyphenated. In an ecosystem as large as Guardian’s, having a
clean, predictable naming scheme helps new developers and cross-team
collaboration. The few deviations (chloropleth_map_maker,
VaultDoor, etc.) might just be historical artefacts, but if
they proliferated, it could lead to confusion. Ensuring new repositories
follow a standard (all-lowercase, hyphens, meaningful prefixes) is a
low-effort way to maintain order. Additionally, clear naming signals
ownership – for instance, braze-components
clearly is about
Braze (marketing), which likely involves marketing or CX teams. If
naming were inconsistent, that signal is lost and could hinder quick
identification of who might maintain a given repo.
In conclusion, the Guardian’s repository ecosystem is robust and
covers a wide range of needs, but it is not without areas of concern.
Active management of legacy projects, fostering shared ownership of
smaller projects, enforcing good testing practices, streamlining tech
choices, and consistent conventions will all help reduce these risks.
By addressing the highlighted issues – such as injecting life into or
retiring stale repos, adding tests to critical low-test code, and
auditing security on older code – the organization can lower the chances
of outages, security incidents, or team friction in the future.