Norconex Committer is an open-source Java library and command-line application that routes, maps, and saves crawled data into target repositories. It acts as the final "load" stage of an Extract, Transform, Load (ETL) pipeline for Norconex crawlers (such as the Norconex Web Crawler or Filesystem Collector).

Core Functionality
When a Norconex crawler gathers files, web pages, or metadata, the Committer takes over to handle the following operations:
Target Insertion: It pushes text, binary data, and extracted metadata directly into search engines, databases, or local files.
Document Management: It adds new records, updates existing entries, and deletes documents that no longer exist at the source.
Data Mapping: It maps original document metadata and content fields into the format required by your specific target repository.
Queueing & Batching: It queues documents locally and sends them to target endpoints in configurable batches, which reduces network overhead and improves performance.

Available Committers
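Whatever the target, a committer does the same basic work described above: it maps fields, queues additions and deletions, and sends them in batches. The following is a minimal sketch of that idea only, not the Norconex Committer API. The class `BatchingCommitterSketch`, its methods, and the field mapping `title` to `doc_title` are all hypothetical, and "sending" a batch here just records it in an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Norconex.
public class BatchingCommitterSketch {

    public enum Op { UPSERT, DELETE }

    // One queued operation: what to do, for which document, with which fields.
    public static final class Request {
        public final Op op;
        public final String reference;
        public final Map<String, String> fields;

        Request(Op op, String reference, Map<String, String> fields) {
            this.op = op;
            this.reference = reference;
            this.fields = fields;
        }
    }

    private final int batchSize;
    private final Map<String, String> fieldMappings; // source name -> target name
    private final List<Request> queue = new ArrayList<>();

    // Stand-in for the target repository: every "sent" batch is kept here.
    public final List<List<Request>> sentBatches = new ArrayList<>();

    public BatchingCommitterSketch(int batchSize, Map<String, String> fieldMappings) {
        this.batchSize = batchSize;
        this.fieldMappings = fieldMappings;
    }

    // Data mapping: rename source fields to the names the target expects.
    public void upsert(String reference, Map<String, String> sourceFields) {
        Map<String, String> mapped = new LinkedHashMap<>();
        sourceFields.forEach((name, value) ->
                mapped.put(fieldMappings.getOrDefault(name, name), value));
        enqueue(new Request(Op.UPSERT, reference, mapped));
    }

    // Document management: a deletion needs only the document reference.
    public void delete(String reference) {
        enqueue(new Request(Op.DELETE, reference, Map.of()));
    }

    // Queueing: hold requests locally until a full batch is available.
    private void enqueue(Request request) {
        queue.add(request);
        if (queue.size() >= batchSize) {
            flush();
        }
    }

    // Batching: send everything queued so far as one unit, then reset.
    public void flush() {
        if (queue.isEmpty()) {
            return;
        }
        sentBatches.add(List.copyOf(queue));
        queue.clear();
    }

    public static void main(String[] args) {
        BatchingCommitterSketch committer =
                new BatchingCommitterSketch(2, Map.of("title", "doc_title"));
        committer.upsert("https://example.com/a", Map.of("title", "Page A"));
        committer.delete("https://example.com/b");   // fills the first batch
        committer.upsert("https://example.com/c", Map.of("title", "Page C"));
        committer.flush();                           // sends the remainder
        System.out.println(committer.sentBatches.size() + " batches sent");
        // prints: 2 batches sent
    }
}
```

With a batch size of 2, three requests result in two round trips instead of three. A real committer would also have to handle failed batches and persist its queue, which this sketch leaves out.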
The framework is split into a Core Library (which provides baseline functionality) and targeted Add-ons built for specific repositories:

1. Core Committers (Out-of-the-Box)