Collector Data Model#

The reverge collector uses a structured data model to represent all discovered network assets, vulnerabilities, and metadata. Every object produced by a scanning tool is serialized as a JSON record and sent to the reverge server for ingestion via the /import API endpoint.

Record Structure#

All data model objects share a common JSON envelope:

{
  "id": "<hex-uuid>",
  "type": "<record-type>",
  "parent": {
    "type": "<parent-record-type>",
    "id": "<parent-hex-uuid>"
  },
  "collection_tool_instance_id": "<tool-instance-id>",
  "data": { ... }
}

Field	Description
`id`	Unique hex-encoded UUID for the record
`type`	Lowercase class name identifying the record type (e.g., `host`, `port`, `vuln`)
`parent`	Optional reference to the parent record; `null` for root-level objects
`collection_tool_instance_id`	ID of the scheduled tool run that produced this record
`data`	Record-type-specific payload (see each type below)
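As a rough illustration of the envelope, the following Python sketch builds two records and serializes one of them. The `Record` class and the `new_id` helper are hypothetical names introduced for this example; only the `to_jsonable()` method name appears in this document, and its actual signature in the collector may differ.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


def new_id() -> str:
    """Return a hex-encoded UUID (hypothetical helper)."""
    return uuid.uuid4().hex


@dataclass
class Record:
    """Hypothetical in-memory form of the common JSON envelope."""
    type: str
    collection_tool_instance_id: str
    data: dict[str, Any]
    parent: Optional["Record"] = None
    id: str = field(default_factory=new_id)

    def to_jsonable(self) -> dict[str, Any]:
        # Root-level records serialize their parent as null.
        parent_ref = None
        if self.parent is not None:
            parent_ref = {"type": self.parent.type, "id": self.parent.id}
        return {
            "id": self.id,
            "type": self.type,
            "parent": parent_ref,
            "collection_tool_instance_id": self.collection_tool_instance_id,
            "data": self.data,
        }


host = Record("host", "abc123", {"ipv4_addr": "192.168.1.100"})
port = Record("port", "abc123", {"port": "443", "proto": 6}, parent=host)
print(json.dumps(port.to_jsonable(), indent=2))
```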

Record Tags#

Records are internally tagged to indicate their origin and scope during processing. Tags are not included in the JSON sent to the server but affect how the collector filters and deduplicates records in memory.

Tag	Value	Description
`LOCAL`	`1`	Collected directly by a local scanner on the collector
`REMOTE`	`2`	Obtained from a remote API source (e.g., Shodan, IP THC)
`SCOPE`	`3`	Falls within the defined scan scope provided by the server
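The tag values above could be modeled as an enumeration. This is a minimal sketch only: the `RecordTag` name, the `filter_by_tag` helper, and the assumption that a record can carry a set of tags are all hypothetical, since the collector's in-memory layout is not described here.

```python
from enum import IntEnum


class RecordTag(IntEnum):
    """Tag values from the table above; the class name is hypothetical."""
    LOCAL = 1
    REMOTE = 2
    SCOPE = 3


def filter_by_tag(records, tag):
    """Return the payloads of records that carry the given tag.

    Assumes each entry is a (payload, tags) pair where tags is a set of
    RecordTag values. This pairing is an assumption for illustration.
    """
    return [payload for payload, tags in records if tag in tags]


records = [
    ({"type": "host", "id": "1a2b"}, {RecordTag.LOCAL, RecordTag.SCOPE}),
    ({"type": "host", "id": "3c4d"}, {RecordTag.REMOTE}),
]
in_scope = filter_by_tag(records, RecordTag.SCOPE)
```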

Record Types#

`host`#

Represents a network host (IP address). This is the root object for most data hierarchies.

Parent: none

{
  "id": "1a2b3c4d...",
  "type": "host",
  "parent": null,
  "collection_tool_instance_id": "abc123",
  "data": {
    "ipv4_addr": "192.168.1.100"
  }
}

Data Field	Type	Required	Description
`ipv4_addr`	string	yes*	IPv4 address of the host
`ipv6_addr`	string	yes*	IPv6 address of the host. *Exactly one of `ipv4_addr` or `ipv6_addr` must be set; the two are mutually exclusive
`credential`	object	no	Credential reference if authenticated access is available (`{"credential_id": "<id>"}`)
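Because the two address fields are mutually exclusive, a producer has to pick the right key for each address. The sketch below does this with Python's standard `ipaddress` module; `host_data` is a hypothetical helper name, not part of the collector.

```python
import ipaddress


def host_data(address: str) -> dict:
    """Build the `data` payload for a host record from an IP string."""
    ip = ipaddress.ip_address(address)  # raises ValueError on invalid input
    key = "ipv4_addr" if ip.version == 4 else "ipv6_addr"
    return {key: str(ip)}


v4 = host_data("192.168.1.100")
v6 = host_data("2001:db8::1")
```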

`subnet`#

Represents a network subnet, typically used as a scan scope input.

Parent: none

{
  "id": "5e6f7a8b...",
  "type": "subnet",
  "parent": null,
  "collection_tool_instance_id": "abc123",
  "data": {
    "subnet": "10.0.0.0",
    "mask": "24"
  }
}

Data Field	Type	Required	Description
`subnet`	string	yes	Network address (e.g., `10.0.0.0`)
`mask`	string	yes	CIDR prefix length (e.g., `24`)
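Since the network address and prefix length are stored as two separate strings, a consumer typically joins them before use. A minimal sketch with the standard `ipaddress` module follows; `subnet_to_network` is a hypothetical helper name.

```python
import ipaddress


def subnet_to_network(data: dict):
    """Combine the `subnet` and `mask` fields into a network object.

    Raises ValueError if the address has host bits set for the given prefix.
    """
    return ipaddress.ip_network(f"{data['subnet']}/{data['mask']}")


net = subnet_to_network({"subnet": "10.0.0.0", "mask": "24"})
contains_host = ipaddress.ip_address("10.0.0.42") in net
```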

`port`#

Represents an open network port on a host. Port records are children of host records.

Parent: host

{
  "id": "9c0d1e2f...",
  "type": "port",
  "parent": {
    "type": "host",
    "id": "1a2b3c4d..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "port": "443",
    "proto": 6,
    "secure": true
  }
}

Data Field	Type	Required	Description
`port`	string	yes	Port number as a string
`proto`	int	yes	IP protocol number (`6` = TCP, `17` = UDP)
`secure`	bool	no	Whether the port uses TLS/SSL (`true` = HTTPS, SMTPS, etc.)
`credential_id`	string	no	Credential ID used for authenticated access to this port
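Note that `port` is a string while `proto` is an integer. The following sketch shows one way a consumer might normalize both into a readable label; `describe_port` and `PROTO_NAMES` are hypothetical names introduced here.

```python
import socket

# IANA protocol numbers exposed by the standard library (6 and 17).
PROTO_NAMES = {socket.IPPROTO_TCP: "tcp", socket.IPPROTO_UDP: "udp"}


def describe_port(data: dict) -> str:
    """Render a port payload as e.g. '443/tcp (TLS)'."""
    number = int(data["port"])  # the port is stored as a string
    if not 0 < number < 65536:
        raise ValueError(f"port out of range: {number}")
    proto = PROTO_NAMES.get(data["proto"], str(data["proto"]))
    suffix = " (TLS)" if data.get("secure") else ""
    return f"{number}/{proto}{suffix}"


label = describe_port({"port": "443", "proto": 6, "secure": True})
```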

`domain`#

Represents a domain name resolved to or associated with a host. Domain records are children of host records.

Parent: host

{
  "id": "3a4b5c6d...",
  "type": "domain",
  "parent": {
    "type": "host",
    "id": "1a2b3c4d..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "name": "www.example.com"
  }
}

Data Field	Type	Required	Description
`name`	string	yes	The fully-qualified domain name
`credential_id`	string	no	Credential ID associated with this domain

`listitem`#

Represents a discovered web path or URI and serves as a deduplication key for HTTP endpoints. `listitem` records are standalone (no parent) and are referenced by `httpendpoint` records via the `web_path_id` field.

Parent: none

{
  "id": "7e8f9a0b...",
  "type": "listitem",
  "parent": null,
  "collection_tool_instance_id": "abc123",
  "data": {
    "path": "/admin/login",
    "path_hash": "<sha1-hex-digest-of-path>"
  }
}

Data Field	Type	Required	Description
`path`	string	yes	Web path URI (e.g., `/admin`, `/api/v1/users`). Defaults to `/` if null
`path_hash`	string	yes	SHA-1 hex digest of the path string, used for deduplication
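The two fields can be derived together. This is a minimal sketch that assumes the path is UTF-8 encoded before hashing, which this document does not specify; `list_item_data` is a hypothetical helper name.

```python
import hashlib
from typing import Optional


def list_item_data(path: Optional[str]) -> dict:
    """Build a listitem payload, defaulting a null path to '/'."""
    path = path if path is not None else "/"
    # Assumption: the digest is taken over the UTF-8 bytes of the path.
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    return {"path": path, "path_hash": digest}


item = list_item_data("/admin/login")
root = list_item_data(None)
```

Two records with the same `path` produce the same `path_hash`, which is what makes the hash usable as a deduplication key.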

`httpendpoint`#

Represents an HTTP endpoint (a specific path on a port's web service). HTTP endpoint records are children of port records.

Parent: port

{
  "id": "b1c2d3e4...",
  "type": "httpendpoint",
  "parent": {
    "type": "port",
    "id": "9c0d1e2f..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "web_path_id": "7e8f9a0b..."
  }
}

Data Field	Type	Required	Description
`web_path_id`	string	yes	ID of the associated `ListItem` record representing the path

`httpendpointdata`#

Stores HTTP response metadata for an HTTP endpoint. Multiple `httpendpointdata` records can exist per endpoint (e.g., IP-based vs. domain-based responses). These records are children of `httpendpoint` records.

Parent: httpendpoint

{
  "id": "f5a6b7c8...",
  "type": "httpendpointdata",
  "parent": {
    "type": "httpendpoint",
    "id": "b1c2d3e4..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "title": "Admin Login Panel",
    "status": "200",
    "domain_id": "3a4b5c6d...",
    "screenshot_id": "d9e0f1a2...",
    "last_modified": "Wed, 01 Jan 2025 00:00:00 GMT",
    "fav_icon_hash": "-1234567890",
    "content_length": "4096"
  }
}

Data Field	Type	Required	Description
`title`	string	no	HTML `<title>` of the response page
`status`	string	no	HTTP response status code (e.g., `"200"`, `"301"`)
`domain_id`	string	no	ID of the `Domain` record if this response was fetched using a domain name (rather than IP)
`screenshot_id`	string	no	ID of the associated `Screenshot` record
`last_modified`	string	no	Value of the `Last-Modified` HTTP response header
`fav_icon_hash`	string	no	MurmurHash3 (mmh3) hash of the favicon, used for fingerprinting (Shodan-compatible format)
`content_length`	string	no	Size of the response body in bytes
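Most of these fields are strings even when they carry numeric or date values. The sketch below shows how a consumer might convert them to typed values using only the standard library; `parse_endpoint_data` is a hypothetical helper, and every field is treated as optional, as in the table above.

```python
from email.utils import parsedate_to_datetime


def parse_endpoint_data(data: dict) -> dict:
    """Convert string-typed response metadata into typed values."""
    parsed = {}
    if data.get("status") is not None:
        parsed["status"] = int(data["status"])
    if data.get("content_length") is not None:
        parsed["content_length"] = int(data["content_length"])
    if data.get("last_modified"):
        # Last-Modified uses the HTTP-date format, e.g. "... GMT".
        parsed["last_modified"] = parsedate_to_datetime(data["last_modified"])
    return parsed


parsed = parse_endpoint_data({
    "status": "200",
    "content_length": "4096",
    "last_modified": "Wed, 01 Jan 2025 00:00:00 GMT",
})
```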

`screenshot`#

Stores a base64-encoded screenshot of a web page, captured by Pyshot or Webcap. Screenshot records are standalone (no parent) and are referenced by `httpendpointdata` records via `screenshot_id`.

Parent: none

{
  "id": "d9e0f1a2...",
  "type": "screenshot",
  "parent": null,
  "collection_tool_instance_id": "abc123",
  "data": {
    "screenshot": "<base64-encoded-image-data>",
    "image_hash": "a1b2c3d4e5f6..."
  }
}

Data Field	Type	Required	Description
`screenshot`	string	no	Base64-encoded image data (JPEG or PNG)
`image_hash`	string	no	Hash of the image data used for deduplication

`webcomponent`#

Represents a detected web technology or software component (e.g., web server, framework, CMS). Web component records are children of port records.

Parent: port

{
  "id": "c3d4e5f6...",
  "type": "webcomponent",
  "parent": {
    "type": "port",
    "id": "9c0d1e2f..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "name": "nginx",
    "version": "1.18.0"
  }
}

Data Field	Type	Required	Description
`name`	string	yes	Name of the detected technology (e.g., `nginx`, `WordPress`, `Apache`)
`version`	string	no	Detected version string if available

`vuln`#

Represents a security vulnerability or finding detected on a port. Vuln records are children of port records.

Parent: port

{
  "id": "e7f8a9b0...",
  "type": "vuln",
  "parent": {
    "type": "port",
    "id": "9c0d1e2f..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "name": "CVE-2021-44228",
    "vuln_details": "Log4Shell - JNDI injection vulnerability in Apache Log4j 2.x",
    "endpoint_id": "b1c2d3e4..."
  }
}

Data Field	Type	Required	Description
`name`	string	yes	Vulnerability identifier (CVE ID, template name, or descriptive label)
`vuln_details`	string	no	Extended description or evidence of the vulnerability
`endpoint_id`	string	no	ID of the `HttpEndpoint` where the vulnerability was found

`certificate`#

Represents an SSL/TLS certificate discovered on a port. Certificate records are children of port records.

Parent: port

{
  "id": "1b2c3d4e...",
  "type": "certificate",
  "parent": {
    "type": "port",
    "id": "9c0d1e2f..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "issuer": "Let's Encrypt Authority X3",
    "issued": 1609459200,
    "expires": 1640995200,
    "fingerprint_hash": "AA:BB:CC:DD:EE:FF:00:11:22:33:44:55:66:77:88:99:AA:BB:CC:DD",
    "domain_id_list": ["3a4b5c6d...", "5e6f7a8b..."]
  }
}

Data Field	Type	Required	Description
`issuer`	string	yes	Certificate issuer common name or organization
`issued`	int	yes	Unix epoch timestamp when the certificate was issued
`expires`	int	yes	Unix epoch timestamp when the certificate expires
`fingerprint_hash`	string	yes	Certificate fingerprint (SHA-1 or SHA-256 hex string)
`domain_id_list`	array	yes	List of `Domain` record IDs for all Subject Alternative Names (SANs) on the certificate
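The `issued` and `expires` fields are Unix epoch integers, so validity checks reduce to timestamp arithmetic. A minimal sketch follows; `cert_validity` is a hypothetical helper name, and the timestamps are assumed to be in UTC seconds.

```python
from datetime import datetime, timezone


def cert_validity(data: dict, now=None) -> dict:
    """Summarize a certificate's validity window from its epoch fields."""
    issued = datetime.fromtimestamp(data["issued"], tz=timezone.utc)
    expires = datetime.fromtimestamp(data["expires"], tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return {
        "issued": issued,
        "expires": expires,
        "expired": now >= expires,
        "lifetime_days": (expires - issued).days,
    }


info = cert_validity(
    {"issued": 1609459200, "expires": 1640995200},
    now=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
```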

`collectionmodule`#

Represents a named scanning module invocation, grouping the output from a specific tool module run. Children of a Tool record (parent.type will be the tool name). Used by tools like Nmap (NSE scripts), Netexec (protocol modules), and Metasploit (auxiliary modules).

Parent: tool (virtual parent referencing the tool instance)

{
  "id": "5f6a7b8c...",
  "type": "collectionmodule",
  "parent": {
    "type": "tool",
    "id": "<tool-instance-id>"
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "name": "smb-security-mode",
    "description": "Determines the message signing configuration of an SMB server",
    "args": ""
  }
}

Data Field	Type	Required	Description
`name`	string	yes	Module name (e.g., NSE script name, Netexec module name)
`description`	string	no	Human-readable description of the module
`args`	string	yes	Arguments passed to the module

`collectionmoduleoutput`#

Stores the raw text output produced by a `collectionmodule` for a specific port. These records are children of `collectionmodule` records.

Parent: collectionmodule

{
  "id": "9d0e1f2a...",
  "type": "collectionmoduleoutput",
  "parent": {
    "type": "collectionmodule",
    "id": "5f6a7b8c..."
  },
  "collection_tool_instance_id": "abc123",
  "data": {
    "output": "message signing: disabled",
    "port_id": "9c0d1e2f..."
  }
}

Data Field	Type	Required	Description
`output`	string	yes	Raw text output from the module for this target
`port_id`	string	yes	ID of the `Port` record this output is associated with
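Because each output record points at a port through `port_id` rather than through its parent, a consumer that wants per-port results has to group on that field. A minimal sketch, with `outputs_by_port` as a hypothetical helper name:

```python
from collections import defaultdict


def outputs_by_port(records: list) -> dict:
    """Group module output text by the port it refers to."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["type"] == "collectionmoduleoutput":
            grouped[rec["data"]["port_id"]].append(rec["data"]["output"])
    return dict(grouped)


records = [
    {"type": "collectionmoduleoutput",
     "data": {"output": "message signing: disabled", "port_id": "9c0d"}},
    {"type": "collectionmoduleoutput",
     "data": {"output": "smb version: 3.1.1", "port_id": "9c0d"}},
    {"type": "port", "data": {"port": "445", "proto": 6}},
]
by_port = outputs_by_port(records)
```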

`credential`#

Represents a set of authentication credentials discovered or used during scanning.

Parent: none

{
  "id": "3b4c5d6e...",
  "type": "credential",
  "parent": null,
  "collection_tool_instance_id": "abc123",
  "data": {
    "username": "admin",
    "password": "Password123!",
    "privileged": false
  }
}

Data Field	Type	Required	Description
`username`	string	yes	Account username
`password`	string	yes	Account password
`privileged`	bool	no	Whether the credential has elevated/admin privileges (defaults to `false`)

Object Hierarchy#

The following diagram shows the parent-child relationships between data model objects:

Subnet (no parent)
Host (no parent)
├── Port
│   ├── WebComponent
│   ├── Vuln
│   ├── Certificate
│   │   └── → (references Domain records)
│   └── HttpEndpoint  ─── references → ListItem
│       └── HttpEndpointData
│           ├── → (references Domain)
│           └── → (references Screenshot)
└── Domain

ListItem (no parent)
Screenshot (no parent)
Credential (no parent)
CollectionModule (parent: Tool/virtual)
└── CollectionModuleOutput
    └── → (references Port)
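The hierarchy above can be expressed as a lookup table and used to sanity-check records before import. This is a sketch only: `ALLOWED_PARENT` and `parent_is_valid` are hypothetical names, and the `"tool"` parent type for `collectionmodule` follows the JSON example in this document (the actual value may be the tool name).

```python
# Expected parent type for each record type; None means root-level.
ALLOWED_PARENT = {
    "subnet": None,
    "host": None,
    "listitem": None,
    "screenshot": None,
    "credential": None,
    "port": "host",
    "domain": "host",
    "webcomponent": "port",
    "vuln": "port",
    "certificate": "port",
    "httpendpoint": "port",
    "httpendpointdata": "httpendpoint",
    "collectionmodule": "tool",  # assumption, see lead-in
    "collectionmoduleoutput": "collectionmodule",
}


def parent_is_valid(record: dict) -> bool:
    """Check a record's parent reference against the hierarchy."""
    expected = ALLOWED_PARENT[record["type"]]
    parent = record.get("parent")
    if expected is None:
        return parent is None
    return parent is not None and parent.get("type") == expected


ok = parent_is_valid({"type": "port", "parent": {"type": "host", "id": "1a2b"}})
```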

Data Ingestion Flow#

1. A scanning tool runs and produces raw output (XML, JSON, stdout).
2. The tool's `import_func` parses the output and constructs data model objects.
3. Objects are serialized via `.to_jsonable()` and sent to the server's `/import` endpoint.
4. The server assigns database IDs and returns an `orig_id` → `db_id` mapping.
5. The collector updates all local records with the server-assigned IDs and writes the final JSON to a `tool_import_json` file.
6. The scan scope (`ScanData`) is updated with the new records so subsequent tools can use them as inputs.
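The ID-remapping step described above might look roughly like the following. This is a sketch under several assumptions: the mapping is treated as a plain dict of string IDs, reference fields are recognized by an `_id` or `_id_list` suffix inside `data`, and `apply_id_mapping` is a hypothetical name. The collector's real logic may differ.

```python
import copy


def apply_id_mapping(records: list, id_map: dict) -> list:
    """Return copies of records with collector IDs replaced by server IDs."""
    def remap(value):
        # IDs missing from the mapping are left unchanged.
        return id_map.get(value, value)

    updated = []
    for record in records:
        rec = copy.deepcopy(record)
        rec["id"] = remap(rec["id"])
        if rec.get("parent"):
            rec["parent"]["id"] = remap(rec["parent"]["id"])
        for key, value in list(rec["data"].items()):
            if key.endswith("_id") and isinstance(value, str):
                rec["data"][key] = remap(value)
            elif key.endswith("_id_list") and isinstance(value, list):
                rec["data"][key] = [remap(v) for v in value]
        updated.append(rec)
    return updated


records = [
    {"id": "h1", "type": "host", "parent": None,
     "data": {"ipv4_addr": "192.168.1.100"}},
    {"id": "p1", "type": "port", "parent": {"type": "host", "id": "h1"},
     "data": {"port": "443", "proto": 6}},
    {"id": "o1", "type": "collectionmoduleoutput",
     "parent": {"type": "collectionmodule", "id": "m1"},
     "data": {"output": "ok", "port_id": "p1"}},
]
id_map = {"h1": "101", "p1": "202", "o1": "303"}
updated = apply_id_mapping(records, id_map)
```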
