Vol. III / Folio 2026 / Dhaka to Global / Edition 947251c
NewsForge

Dateline / Dhaka / 2026-05-17
Autonomous newsroom

The press release goes in.
The newspaper comes out.

NewsForge is the AI newsroom behind khoboria.com. An eleven-stage pipeline reads 500+ global wires, dedupes by content DNA, rewrites as original journalism in fifty-plus languages, fact-verifies against multiple sources, picks the hero image from Professional Vault, and auto-publishes. No copy-paste, no template farm, no human in the critical path.

Sources: 500+
Languages: 50+
Stages: 11
Live at: khoboria.com
Live wire / Stage 11 / publishing
  1. EN / Tech

    Indo-Pacific data centre buildout outpaces grid capacity, operators warn

    Reuters, Nikkei, TechCrunch

    3-source consensus

  2. BN / Tech

    চিপ রপ্তানি নিয়ন্ত্রণ: ঢাকার সরবরাহ চক্রে নতুন চাপ (Chip export controls: new pressure on Dhaka's supply chain)

    Reuters, BSS, Prothom Alo

    4-source consensus

  3. EN / Markets

    Ringgit and rupiah firm as Fed minutes flag a softer landing

    Bloomberg, FT, AP

    3-source consensus

Output: WordPress + Next.js / p95 latency ~2s

What is NewsForge

NewsForge is KaritKarma's autonomous newsroom infrastructure: a .NET 10 backend, four Python services, an eleven-stage AI pipeline in which Groq Llama 70B runs Master Intelligence in a single pass, and a Next.js reader site with a reporter dashboard. It runs in production at khoboria.com and publishes into WordPress, Drupal, or its own native portal. Of the four CMSs newsrooms usually evaluate (Arc XP, Brightspot, WordPress VIP, and Drupal), none generates the journalism, dedupes story DNA across languages, or runs a multi-source credibility check before publication; NewsForge does all three. Pricing is per portal, not the six-figure enterprise contracts the incumbents quote.

002 / What it does

Four jobs of an autonomous newsroom.

Each pillar maps to a real folder in the codebase. The product is the integration of these four, not a list of feature ticks.

  1. 500+ wires, watched in real time.

    01 / Source

    The Python crawler keeps a live list of newsroom-grade sources, refreshed every minute. It ingests through RSS, sitemap, and HTML fallbacks. IPv6 rotation and Cloudflare bypass keep ingestion alive on sites that would otherwise refuse automated requests.

    Source / crawler/

  2. Eleven stages, deterministic.

    02 / Pipeline

    Each article passes the same eleven stages in order: source monitoring, universal extraction, content DNA dedup, Master Intelligence rewrite, credibility scoring, fact verification, copyright filter, story merge, visual selection, SEO polish, and auto-publish. No stage can be skipped. The audit trail is the schema.

    Source / ai-pipeline/

  3. Multi-source consensus, never silent.

    03 / Verify

    An 85% consensus threshold across credible sources gates publication. Disagreements queue for human review with the conflicting passages highlighted, not auto-resolved by a hallucinating model.

    Source / ai-pipeline/stage6_credibility.py

  4. Original journalism in 50+ languages.

    04 / Localise

    Master Intelligence rewrites, not translates. A dedicated Bengali prompt corrects transliteration of named entities, the failure mode most multilingual AI ships with. Hindi, Arabic, Spanish, Indonesian, Tagalog covered in the same pass.

    Source / ai-pipeline/stage2_master_intelligence.py
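
The consensus gate described under pillar 3 can be sketched as follows. This is a minimal illustration, not the code in stage6_credibility.py. It assumes consensus is simply the fraction of sources whose reporting agrees with a claim. SourceReport, gate, and the agrees flag are hypothetical names. Only the 85% threshold comes from this page.

```python
from dataclasses import dataclass

CONSENSUS_THRESHOLD = 0.85  # the 85% threshold stated on this page


@dataclass
class SourceReport:
    outlet: str    # e.g. "Reuters"
    agrees: bool   # hypothetical flag: does this source support the claim?
    passage: str   # the passage that was compared against the claim


def gate(reports: list[SourceReport]) -> dict:
    """Decide whether one claim may publish or must go to human review."""
    if not reports:
        # No evidence at all: hold for review rather than publish silently.
        return {"decision": "review", "consensus": 0.0, "conflicts": []}
    agreeing = sum(1 for r in reports if r.agrees)
    consensus = agreeing / len(reports)
    # Keep the dissenting passages so a reviewer can see them highlighted.
    conflicts = [(r.outlet, r.passage) for r in reports if not r.agrees]
    decision = "publish" if consensus >= CONSENSUS_THRESHOLD else "review"
    return {"decision": decision, "consensus": consensus, "conflicts": conflicts}
```

Under this assumed definition, a single dissent among three sources drops consensus to about 67%, so the story is held for review with the conflicting passage attached.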

003 / The pipeline

Eleven stages. One press.

Every article moves through the same eleven stages in order. The schedule is deterministic. Stage 2 fires every 40 minutes for Master Intelligence, stages 5 to 11 run every minute. The audit trail is the database, not a log file.

  1. Stage 01

    Source monitor

    Crawler watches 500+ RSS, sitemap, and HTML sources with IPv6 rotation and Cloudflare bypass.

  2. Stage 02

    Universal extract

    Beautiful Soup, Playwright, and a vision-language fallback parse the article body, byline, dateline, and lead image.

  3. Stage 03

    Content DNA

    pgvector embeddings cluster duplicates across languages so the same story never gets rewritten twice.

  4. Stage 04

    Master Intelligence

    Groq Llama 70B runs translation, categorisation, and the first rewrite in one pass. 80% cost reduction vs the original 7-call pipeline.

  5. Stage 05

    Credibility score

    Multi-source consensus check, 85% threshold. Disagreements are flagged for human review, never published silently.

  6. Stage 06

    Fact verification

    Named entities, quotes, and numerics are pinned back to the source URLs before the article can advance.

  7. Stage 07

    Copyright filter

    Quotes, opinion, and expressive prose are stripped or attributed. Facts are not copyrighted; expression is.

  8. Stage 08

    Story merge

    Updates over a 40-minute window are consolidated into a single canonical article, not a stream of duplicates.

  9. Stage 09

    Visual selection

    Semantic image search picks the lead photo from Professional Vault. Auto card generation if no licensed photo exists.

  10. Stage 10

    SEO + Bengali polish

    Bengali transliteration of named entities is corrected by a dedicated prompt. SEO title, slug, meta, and schema are emitted.

  11. Stage 11

    Auto publish

    Pushes to WordPress, the native portal, or your CMS. Reporter celebrations, share cards, and audit trail close the loop.
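
The Content DNA stage above clusters duplicates by embedding similarity. The sketch below shows the idea in plain Python, under stated assumptions: the 0.9 cut-off, the greedy clustering strategy, and the function names are illustrative, and the production system is described as using pgvector rather than in-memory lists.

```python
import math

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off; this page does not state one


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(embeddings: dict[str, list[float]]) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for article_id, vec in embeddings.items():
        for members in clusters:
            if cosine(vec, embeddings[members[0]]) >= SIMILARITY_THRESHOLD:
                members.append(article_id)
                break
        else:
            # No existing cluster matched, so this article starts a new story.
            clusters.append([article_id])
    return clusters
```

Because the comparison runs on embeddings rather than text, an English and a Bengali report of the same event can land in the same cluster, provided the embedding model places them close together. In pgvector the same comparison is typically expressed in SQL with the cosine distance operator `<=>`.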

Stage 2 / Every 40 minutes / Master Intelligence rewrite
Stages 5 to 7 / Every 1 minute / Credibility + verification
Stage 11 / Every 1 minute / Publishing
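
The cadence above can be sketched as a simple tick function. This is an illustration only: it assumes the orchestrator counts whole minutes from a fixed start and fires a job whenever the minute count is a multiple of its interval. The job names and jobs_due are hypothetical. Only the 40-minute and 1-minute intervals come from this page.

```python
# Interval in minutes for each scheduled job, taken from the cadence cards above.
CADENCE_MINUTES = {
    "master_intelligence": 40,
    "credibility_and_verification": 1,
    "publishing": 1,
}


def jobs_due(minute: int) -> list[str]:
    """Return the jobs that should fire at this minute (assumed modulo scheduling)."""
    return [job for job, every in CADENCE_MINUTES.items() if minute % every == 0]
```

Under these assumptions, minute 40 fires all three jobs, while minute 41 fires only the two one-minute jobs.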

004 / Architecture

Six services. One newsroom.

The whole platform is a .NET 10 backend, four Python 3.11 services, and one Next.js 15 frontend, behind a Traefik proxy with Let's Encrypt. Postgres 18 with pgvector underneath.

Postgres 18 + pgvector / Redis 8 / MinIO S3 / Groq Llama 70B / Wenme OAuth
Service | Role | Stack | Port
Frontend (frontend/) | Reader site + reporter dashboard | Next.js 15, React 19, Tailwind 4 | 3000
Backend API (backend/) | REST + auth + content store | .NET 10 Minimal APIs, Postgres 18 + pgvector | 8080
Crawler (crawler/) | Source ingestion | Python 3.11, FastAPI, Playwright, BeautifulSoup | 8001
AI Pipeline (ai-pipeline/) | 11-stage processor | Python 3.11, Groq Llama 70B, asyncpg | 8002
Customer Portal AI (customer-portal-ai-processing/) | Per-tenant cards + images | Python 3.11, PIL, MinIO | 8003
Orchestrator (orchestrator/) | Stage scheduler | Python 3.11 async, httpx | n/a
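
For orientation, the service table can be written down as a small registry. This is a sketch and not a file from the repository: the Service type, base_url, and the localhost default are assumptions. The names, folders, and ports are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str
    path: str             # folder in the codebase
    port: Optional[int]   # None where the table lists "n/a"


SERVICES = [
    Service("Frontend", "frontend/", 3000),
    Service("Backend API", "backend/", 8080),
    Service("Crawler", "crawler/", 8001),
    Service("AI Pipeline", "ai-pipeline/", 8002),
    Service("Customer Portal AI", "customer-portal-ai-processing/", 8003),
    Service("Orchestrator", "orchestrator/", None),
]


def base_url(service: Service, host: str = "localhost") -> Optional[str]:
    """Build a base URL for a service, or None if it exposes no port."""
    if service.port is None:
        return None
    return f"http://{host}:{service.port}"
```

The orchestrator has no port in the table, which fits its role as a scheduler that calls the other services rather than serving requests itself.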

005 / Humans, when it matters

Autonomous, with a reporter desk bolted on.

Field reporters file straight into NewsForge through a dedicated dashboard. Submissions skip the 40-minute build-up window and run on the priority lane. When the story publishes, a full celebration screen confirms the byline and shows live share links.

The hero image for every published story is selected from Professional Vault, KaritKarma's central DAM. Faces are matched against one shared person directory, so the same public figure resolves to the same entity across every article.

Reporter dashboard

File text and photos against a completeness score, with real-time status from draft to published.

frontend/src/app/reporter/

Editor review queue

Failures of the 85% credibility check land here with the conflicting passages highlighted.

backend / review

Hero from Professional Vault

Semantic image search against the shared library. Person-aware matching.

ai-pipeline / stage8 + PV

WordPress auto-publisher

Publishes into existing WordPress sites alongside the native Next.js portal.

portal-auto-publisher/

006 / Comparison

NewsForge vs the CMS incumbents.

Compiled from the public product pages of Arc XP, Brightspot, WordPress VIP, and Drupal as of May 2026. Attributable capabilities only, no marketing claims. NewsForge sits above these as a content production engine that can publish into them or replace them entirely with its own native portal.

Columns: Capability / NewsForge / Arc XP / Brightspot / WP VIP / Drupal

Autonomous content generation, not just CMS: Bolt-on
Multi-source credibility scoring built in
Original-rewrite localisation in 50+ languages: Arc XP, translation only; Brightspot, translation only; WP VIP, plugin; Drupal, module
Bengali named-entity transliteration handled in-engine
pgvector DNA dedup across languages
Reporter app with live publish celebration: Arc XP, editor only; Brightspot, editor only; WP VIP, WP Admin; Drupal, Admin UI
Self-hostable, owned hardware: On-prem available, six-figure licence
Pricing model: NewsForge, per portal (KaritKarma SaaS); Arc XP, enterprise contract; Brightspot, enterprise contract; WP VIP, from USD 25,000/mo; Drupal, free + integrator

Sources / arcxp.com / brightspot.com / wpvip.com / drupal.org / May 2026

007 / In the wild

khoboria.com runs on this exact stack.

A Bengali-first news portal publishing daily without a manual editorial pipeline. Every article on the homepage has passed the eleven stages above, has a multi-source consensus, and links back to its sources in the audit log.

008 / Questions

Frequently asked.

Mirrored in FAQPage JSON-LD so search and answer engines can lift these verbatim.
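
As an illustration of that mirroring, the sketch below builds FAQPage markup from question and answer pairs using the schema.org FAQPage, Question, and Answer types. The faq_jsonld helper and the shortened sample answer are assumptions, not the site's actual generator.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Sample pair; the answer is shortened here for illustration.
pairs = [
    ("What is NewsForge?",
     "NewsForge is KaritKarma's autonomous newsroom infrastructure."),
]
# ensure_ascii=False keeps Bengali and other non-ASCII text readable in the output.
markup = json.dumps(faq_jsonld(pairs), indent=2, ensure_ascii=False)
```

The resulting string would normally be embedded in the page inside a script tag of type application/ld+json.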

What is NewsForge?

NewsForge is KaritKarma's autonomous newsroom infrastructure. An 11-stage AI pipeline ingests 500+ global news sources, deduplicates stories by content DNA across languages, rewrites them as original journalism in 50+ languages, runs a multi-source credibility check, picks a hero image from Professional Vault, and auto-publishes to WordPress or a native portal. It runs in production today at khoboria.com.

Does NewsForge replace my CMS, or sit alongside it?

Both. NewsForge ships a native Next.js reader site and reporter dashboard, but it can also auto-publish into an existing WordPress installation through the portal-auto-publisher service. Customers who already run a CMS keep it; NewsForge becomes the content engine behind it, not the front door.

How does the 11-stage pipeline work, and is it truly autonomous?

Each article passes the same eleven stages in order: source monitoring, universal extraction, content DNA dedup, Master Intelligence rewrite, credibility scoring, fact verification, copyright filter, story merge, visual selection, SEO polish, and auto-publish. The pipeline is autonomous by default, but disagreements that fail the 85% credibility threshold queue for human review with the conflicting passages highlighted instead of being silently published.

Does NewsForge handle Bengali content properly?

Yes. Master Intelligence runs a dedicated Bengali rewrite path with a separate transliteration-correction prompt for named entities, which is the failure mode most multilingual AI tools ship with. Bengali is treated as a first-class output language alongside English, Hindi, Arabic, Spanish, Indonesian, and Tagalog. The reference customer khoboria.com publishes in Bengali daily.

How does NewsForge stay copyright-compliant?

A dedicated copyright filter sits between rewriting and publishing. Facts are not copyrighted, but expression is. Quotes, opinion, and expressive prose are either stripped or attributed back to the original outlet. The Master Intelligence prompt is also constrained to write original sentences rather than paraphrase the source line-by-line.

How does NewsForge compare to Arc XP, Brightspot, WordPress VIP, and Drupal?

Arc XP, Brightspot, WordPress VIP, and Drupal are content management systems. NewsForge is a content production engine that publishes into a CMS, including itself. None of the four incumbents generate journalism, deduplicate stories by vector DNA, or run a multi-source credibility check. NewsForge is also self-hostable on KaritKarma's own infrastructure, with pricing per portal rather than the six-figure enterprise contracts the incumbents quote.

Launch your newsroom

One stack. Every wire. Tomorrow's edition, already on press.

Brief us on the language, the desk, and the publish target. You get a NewsForge tenant with the eleven-stage pipeline live, a reporter dashboard, a hero-image pool from Professional Vault, and a Bengali transliteration prompt that has shipped in production.

11-stage pipeline
50+ languages, Bengali-first
85% consensus or hold
WordPress + native portal