
Limitations and Recommendations

This page summarizes the scope, limitations, and recommendations for using CxArchive.

Scope and Limitations

This section summarizes the scope and known limitations of CxSAST archiving (using the CxArchive tool):

  • Data is de-normalized; therefore, additional space is required when archiving it.

  • Built-in scheduling is not available. To control execution times, adjust the iteration interval or use Windows Task Scheduler.

  • Access control is not defined at this stage. Access is provided when running the service.

  • CxArchive scope is limited to exporting partial data only. Importing data back into CxSAST is not supported.

  • Data retention is a separate process and is therefore not a part of the CxArchive scope.

  • CxArchive access to Elasticsearch will be available using username and password authentication.

Affected Services

CxSAST: CxArchive consumes data from the CxSAST database, which might cause regular CxSAST operations to take longer while CxArchive is exporting data.

Recommendations

This section lists requirements and recommendations for hardware and configurations.

Hardware Requirements and Recommendations

The following hardware configurations are required or recommended.

Minimum Requirements

  • 32 GB RAM

  • 8 CPU cores

Recommendations

The following is recommended.

2x Disk Space on the ELK Host

Data is de-normalized in the ELK database, so the ELK host needs at least twice the disk space of the source database to hold the exported data.

2x Disk Space on the Service Machine

Exporting creates large amounts of data on the service file system and in the ELK database, so the service machine needs at least twice the disk space of the source database.
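The 2x rule can be checked before a run with a short script. The following is a minimal sketch, not part of CxArchive: the function names and the 500 GiB database size are hypothetical, and the factor of 2 is the minimum stated here.

```python
import shutil

GIB = 1024 ** 3


def required_bytes(database_bytes: int, factor: float = 2.0) -> int:
    """Return the minimum free space suggested for a database of this size."""
    return int(database_bytes * factor)


def has_enough_space(path: str, database_bytes: int, factor: float = 2.0) -> bool:
    """Check whether the volume that holds `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(database_bytes, factor)


# Hypothetical example: a 500 GiB source database needs at least 1000 GiB free.
needed_gib = required_bytes(500 * GIB) // GIB
print(f"Free space needed: {needed_gib} GiB")
```

Run the check on both the ELK host and the service machine, pointing `path` at the volume that receives the exported data.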

Software Requirements and Recommendations

The following software configurations are required or recommended.

ELK Virtual Machine (VM)

It is highly recommended to run the virtual machine under Linux (CentOS or Ubuntu), as Linux performs faster and consumes less memory. Information on running the virtual machine on Windows is available at the following site:

https://www.reddit.com/r/elasticsearch/comments/9i4tdh/elk_stack_on_windows/

Distributed Installations versus Centralized Installations (AIO)

The CxArchive Service should be installed on the same host as the database and/or SourceFiles to ensure faster connections. If this is not possible, the export of data takes longer.

Configuration Recommendations

Tackle small batches

Ideally, to avoid resource constraints during intense activity, archiving should be executed in small batches (100 000 scans) and during down time. For example, if there are 3 years of data to back up, archive in batches of 6 months on the first runs.
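The 6-month batching can be planned ahead of time. The following is a minimal sketch that only computes date windows; the function names are hypothetical, and how a window is passed to CxArchive is not covered here.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def batch_windows(start: date, end: date, months: int = 6):
    """Yield (window_start, window_end) pairs that cover [start, end)."""
    if months < 1:
        raise ValueError("months must be at least 1")
    current = start
    while current < end:
        nxt = min(add_months(current, months), end)
        yield current, nxt
        current = nxt


# Three years of data split into 6-month batches.
windows = list(batch_windows(date(2020, 1, 1), date(2023, 1, 1)))
for window_start, window_end in windows:
    print(window_start, "to", window_end)
```

Each window is half-open: a scan dated exactly on a boundary belongs to the later batch.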

Avoid Large Documents for ELK

Elasticsearch is designed as a search engine and benefits from small data sets/documents. It is recommended to keep the number of results per document low for each exported scan (NumberOfResultsPerDocument<5).

For additional information, refer to

https://www.elastic.co/guide/en/elasticsearch/reference/current/general-recommendations.html
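To see the effect of a low NumberOfResultsPerDocument value, consider the following minimal sketch. It only illustrates the idea of splitting scan results into small documents; the function name and the sample data are hypothetical, and the way CxArchive builds its documents internally is not described here.

```python
def split_results(results, results_per_document=4):
    """Group scan results into documents of at most `results_per_document` items."""
    if results_per_document < 1:
        raise ValueError("results_per_document must be at least 1")
    return [
        results[i:i + results_per_document]
        for i in range(0, len(results), results_per_document)
    ]


# Hypothetical scan with 10 results and a limit of 4 results per document.
scan_results = [f"result-{n}" for n in range(10)]
documents = split_results(scan_results, results_per_document=4)
print([len(doc) for doc in documents])  # [4, 4, 2]
```

A lower limit produces more documents per scan, but each document stays small.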

Lower the Number of Concurrent Tasks

If scans/projects share the same source folder, keep NumberOfScansToProcessInParallel to a minimum as multiple concurrent access attempts may cause failure and you may have to repeat the export.
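The effect of a low parallelism setting can be illustrated with a bounded worker pool. This is a minimal sketch under stated assumptions: `export_with_limit` and `export_scan` are hypothetical names, and the pool only mimics what a setting such as NumberOfScansToProcessInParallel controls. It is not how CxArchive is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def export_with_limit(scan_ids, export_scan, parallel=1):
    """Export every scan with at most `parallel` workers; return peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker(scan_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            export_scan(scan_id)
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # Consume the iterator so that worker exceptions are raised here.
        list(pool.map(worker, scan_ids))
    return peak


exported = []
peak = export_with_limit(range(5), exported.append, parallel=1)
print("peak concurrent exports:", peak)
```

With `parallel=1`, no two exports touch a shared source folder at the same time, at the cost of a longer total run.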

Keeping the Interval Long Enough

The service runs in a loop and remains idle for ExportIntervalInMinutes minutes between iterations, which implements a backoff-retry system. If an export attempt fails due to lack of access to a source folder or too many requests on ELK, the system attempts the export again in the next iteration. The interval gives the system time to recover. It is recommended to set ExportIntervalInMinutes to at least 60.
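The loop described above can be sketched as follows. This is an illustration of the backoff-retry idea only, not the service's actual code: `run_iterations`, `export_scan`, and `max_iterations` are hypothetical, and the sleep function is injectable so that the example runs without waiting.

```python
import time


def run_iterations(pending, export_scan, interval_minutes=60,
                   sleep=time.sleep, max_iterations=10):
    """Retry failed exports on later iterations, idling between iterations."""
    pending = list(pending)
    iterations = 0
    while pending and iterations < max_iterations:
        failed = []
        for scan_id in pending:
            try:
                export_scan(scan_id)
            except Exception:
                # Keep the scan for the next iteration instead of retrying at once.
                failed.append(scan_id)
        pending = failed
        iterations += 1
        if pending:
            sleep(interval_minutes * 60)  # idle period that lets the system recover
    return pending, iterations


# Hypothetical export that fails once for scan 2, then succeeds.
attempts = {}


def flaky_export(scan_id):
    attempts[scan_id] = attempts.get(scan_id, 0) + 1
    if scan_id == 2 and attempts[scan_id] == 1:
        raise RuntimeError("source folder busy")


slept = []
remaining, iterations = run_iterations([1, 2, 3], flaky_export, sleep=slept.append)
print(remaining, iterations, slept)
```

A short interval would retry against a source folder or ELK instance that is still overloaded, which is why a longer idle period is preferred.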