Source Pulling Performance Improvement - Cloud/NAS

Overview

During source pulling, the source repository is cloned to a temporary CxSrc folder, where JobsManager processes it to compute data such as LOC (lines of code), exclusions, and deletions.

When CxSrc is configured on NAS (Network Attached Storage), for HA (High Availability) environments or for DR (Disaster Recovery) purposes, the cloning and JobsManager processing run directly on the NAS. Running these operations over the NAS consumes a large amount of network bandwidth and disk IOPS (input/output operations per second). This reduces performance, sometimes increasing processing time by a factor of four, and can become a bottleneck for systems that use NAS storage.

For example, when using the AWS FSx service (Amazon FSx for Windows File Server, the Windows file servers provided by Amazon Web Services), you receive only 3,000 IOPS per 1 TB (terabyte) of SSD storage. JobsManager can typically consume all of these IOPS when preparing multiple simultaneous scans. One way to obtain more disk IOPS is to over-provision the storage, since the more terabytes of storage you purchase, the more IOPS you receive. However, over-provisioning storage in FSx is expensive, increasing the cost by approximately a factor of four.
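Because IOPS grow with provisioned storage, buying more IOPS means buying more terabytes. The arithmetic can be illustrated with a minimal Python sketch. It assumes the 3,000 IOPS per TB baseline quoted above and that IOPS scale linearly with storage; the function names are hypothetical and not part of any Checkmarx or AWS API.

```python
import math

# Baseline quoted above for FSx SSD storage: about 3,000 IOPS per 1 TB.
IOPS_PER_TB = 3_000

def baseline_iops(storage_tb: float) -> int:
    """Baseline disk IOPS for a given amount of provisioned SSD storage (TB)."""
    return int(storage_tb * IOPS_PER_TB)

def storage_needed_tb(target_iops: int) -> int:
    """Whole terabytes to provision to reach target_iops (rounded up)."""
    return math.ceil(target_iops / IOPS_PER_TB)

if __name__ == "__main__":
    print(baseline_iops(1))           # 3000
    # Quadrupling the available IOPS means provisioning four times the storage.
    print(storage_needed_tb(12_000))  # 4
```

Under this linear model, the storage (and therefore the storage cost) needed to reach a target IOPS level grows in direct proportion to that target, regardless of how much data is actually stored.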

To reduce these bottlenecks and expenses, the new Source Pulling Performance Improvement feature provides an optimized way to execute the source control cloning and processing steps locally, instead of over the NAS.

Implementation

Cloning and processing of the repository are performed locally, in a temporary folder named after the ScanId. After the files have been processed, they are copied to the NAS storage, and the local temporary files are deleted.

The temporary directory naming convention is as follows:

<SourcePullingTemporaryPath>\<ScanId>.tmp\<Files..>
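The clone, process, copy, and delete sequence described above can be sketched in Python as follows. This is a hedged illustration, not the JobsManager implementation: `pull_sources`, `temp_scan_dir`, and the `clone`/`process` hooks are hypothetical names, the caller supplies the scan's destination under CxSrc (its layout is not described here), and removing the temporary folder on failure as well as on success is an assumption.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

def temp_scan_dir(source_pulling_temp_path: str, scan_id: int) -> Path:
    # Mirrors the convention <SourcePullingTemporaryPath>\<ScanId>.tmp
    return Path(source_pulling_temp_path) / f"{scan_id}.tmp"

def pull_sources(
    scan_id: int,
    source_pulling_temp_path: str,
    cxsrc_destination: str,
    clone: Callable[[Path], None],
    process: Callable[[Path], None],
) -> Path:
    """Clone and process locally, copy the result to CxSrc, then delete the temp folder."""
    work_dir = temp_scan_dir(source_pulling_temp_path, scan_id)
    work_dir.mkdir(parents=True, exist_ok=True)
    try:
        clone(work_dir)    # hypothetical hook: clone / unzip / copy sources locally
        process(work_dir)  # hypothetical hook: LOC, exclusions, deletions
        destination = Path(cxsrc_destination)
        # Copy the processed tree to the (NAS-backed) CxSrc location.
        shutil.copytree(work_dir, destination, dirs_exist_ok=True)
        return destination
    finally:
        # Assumption: the local copy is removed whether or not the steps succeeded;
        # folders left behind by crashes are handled by the periodic cleanup job.
        shutil.rmtree(work_dir, ignore_errors=True)

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())

    def fake_clone(work_dir: Path) -> None:
        (work_dir / "main.c").write_text("int main(void) { return 0; }\n")

    def fake_process(work_dir: Path) -> None:
        pass  # a real implementation would count LOC, apply exclusions, etc.

    dest = pull_sources(42, str(root / "local"), str(root / "CxSrc" / "42"),
                        fake_clone, fake_process)
    print(sorted(p.name for p in dest.iterdir()))  # ['main.c']
    print((root / "local" / "42.tmp").exists())    # False
```

All disk-intensive work happens in the local `<ScanId>.tmp` folder; the NAS sees only a single sequential copy of the finished result.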

This feature is used for the following source pulling scenarios:

  • Source Control

    • TFS

    • SVN

    • Git

    • Perforce

  • Shared (Network Path)

  • Local (Zip Upload)

The following services are affected:

  • JobsManager

    • CxSourceAnalyzerManager.SourceControlManager.dll (Source Pulling)

  • SystemManager

    • Checkmarx.CxSystemManager.dll (Cleanup Job)

Configuration

The new behavior is configured with the SourcePullingTemporaryPath key, located in the dbo.CxComponentConfiguration table. By default, the feature is disabled and the key is set to an empty value (‘’). To enable the feature, set the key to the local folder path to be used for temporary sources.

The maximum age of unused temporary folders is configured with the SourcePullingTemporaryFoldersMaxAge key, also located in the dbo.CxComponentConfiguration table. The value is specified in minutes, and all temporary folders older than this value are removed. The minimum value is 12 hours (720 minutes), to allow abnormally long cloning processes to finish.
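How the two keys combine can be shown with a small Python sketch. It only interprets raw key values (reading them from dbo.CxComponentConfiguration is not shown); `SourcePullingSettings`, `resolve_settings`, and the path `D:\CxTemp` are hypothetical. Raising a too-small max age to the 12-hour minimum is an assumption: the product may instead reject such a value.

```python
from dataclasses import dataclass
from typing import Optional

# SourcePullingTemporaryFoldersMaxAge is expressed in minutes; 12 hours = 720 minutes.
MIN_MAX_AGE_MINUTES = 12 * 60

@dataclass(frozen=True)
class SourcePullingSettings:
    temporary_path: Optional[str]  # None means the feature is disabled
    max_age_minutes: int

def resolve_settings(temp_path_value: str, max_age_value: str) -> SourcePullingSettings:
    """Interpret the raw key values read from dbo.CxComponentConfiguration."""
    # An empty SourcePullingTemporaryPath (the default) leaves the feature off.
    path = temp_path_value.strip() or None
    max_age = int(max_age_value)
    # Assumption: a value below the 12-hour minimum is raised to the minimum.
    max_age = max(max_age, MIN_MAX_AGE_MINUTES)
    return SourcePullingSettings(temporary_path=path, max_age_minutes=max_age)

if __name__ == "__main__":
    print(resolve_settings("", "1440"))
    # SourcePullingSettings(temporary_path=None, max_age_minutes=1440)
    print(resolve_settings(r"D:\CxTemp", "60"))
    # SourcePullingSettings(temporary_path='D:\\CxTemp', max_age_minutes=720)
```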

Cleanup

Since the solution is based on a temporary path, the cloned files are removed from the temporary location after processing and copying are completed.

The temporary folder is also cleaned up periodically by SystemManager, to handle unexpected process termination and system crashes. The cleanup job removes only folders whose names have a “.tmp” suffix and that are older than the configured maximum age; all other folders are skipped.
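The cleanup filter can be sketched in Python as below. This is an illustration of the rule, not the SystemManager cleanup job: `cleanup_temp_folders` is a hypothetical name, and measuring a folder's age from its last-modified time is an assumption (a directory's modification time changes only when entries directly inside it are added, removed, or renamed).

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List, Optional

def cleanup_temp_folders(temp_root: str, max_age_minutes: int,
                         now: Optional[float] = None) -> List[str]:
    """Remove stale '<ScanId>.tmp' folders under temp_root and return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    root = Path(temp_root)
    removed: List[str] = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        # Skip anything that is not a folder with the ".tmp" suffix.
        if not entry.is_dir() or not entry.name.endswith(".tmp"):
            continue
        # Skip folders still within the maximum age (a scan may still be using them).
        # Assumption: age is measured from the folder's last-modified time.
        if entry.stat().st_mtime > cutoff:
            continue
        shutil.rmtree(entry, ignore_errors=True)
        removed.append(entry.name)
    return sorted(removed)

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    for name in ("1001.tmp", "1002.tmp", "not-a-scan"):
        (root / name).mkdir()
    old = time.time() - 13 * 3600  # 13 hours ago
    os.utime(root / "1001.tmp", (old, old))
    os.utime(root / "not-a-scan", (old, old))
    print(cleanup_temp_folders(str(root), 720))  # ['1001.tmp']
```

Only the old folder with the “.tmp” suffix is removed: the recent `1002.tmp` is still within the maximum age, and `not-a-scan` lacks the suffix even though it is old.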

Logging and Errors

If cloning or processing fails in the temporary path, one of the following error messages is issued:

  • Out of disk space (generic framework message)

  • Insufficient permissions (generic framework message)

The logs also record, per scan, the temporary folder used, the copying of files to the CxSrc location, and the cleanup of the temporary files.

Limitations

Non-cloud environments

Enabling this solution adds another I/O stage to the “Source Pulling” process, since files are first written to the configured temporary path, then copied to the CxSrc location, and finally deleted from the temporary path. Due to this limitation, enabling this feature is neither recommended nor beneficial for non-cloud environments, in which the sources location (default C:\CxSrc) is already configured as a local path.

High Availability Scenario

Since multiple JobsManager services can run concurrently, it is possible that more than one job might attempt to process the same scan and write the same file to the same location at the same time.

The feature handles such scenarios because the copying process is permissive: it writes files one by one and skips any file that already exists in the CxSrc folder.
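A permissive, file-by-file copy of this kind can be sketched in Python using exclusive-create mode (`"xb"`), which fails with `FileExistsError` when the target already exists instead of overwriting it. This is a hedged sketch, not the JobsManager code; `copy_permissive` is a hypothetical name, and the approach relies on the file system honoring exclusive creation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple

def copy_permissive(src_dir: str, dst_dir: str) -> Tuple[int, int]:
    """Copy files one by one, skipping any that already exist; return (copied, skipped)."""
    copied = skipped = 0
    src_root = Path(src_dir)
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = Path(dst_dir) / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            # "xb" creates the file only if it does not exist yet, so a file
            # already written by another JobsManager instance is never overwritten.
            with src.open("rb") as fin, dst.open("xb") as fout:
                shutil.copyfileobj(fin, fout)
            copied += 1
        except FileExistsError:
            skipped += 1  # another copier got there first; leave its file alone
    return copied, skipped

if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    (root / "src" / "sub").mkdir(parents=True)
    (root / "src" / "x.c").write_text("x")
    (root / "src" / "sub" / "y.c").write_text("y")
    (root / "CxSrc").mkdir()
    (root / "CxSrc" / "x.c").write_text("already here")
    print(copy_permissive(str(root / "src"), str(root / "CxSrc")))  # (1, 1)
```

Checking for existence and creating the file in one operation avoids the race in a separate "exists, then copy" check. Note that this sketch does not detect a file that another instance has created but not yet finished writing.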

Backwards Compatibility

The new Source Pulling feature supersedes the older EnableUnzipLocalDrive feature, which is being maintained for backwards compatibility only.

In general, the new Source Pulling feature serves the same purpose but handles all supported source pulling providers, including zip files. When the Source Pulling feature is enabled, the EnableUnzipLocalDrive setting is ignored, regardless of its value.
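The precedence between the two settings can be expressed in a few lines of Python. `LocalStagingMode` and `staging_mode` are hypothetical names used only for illustration. The sketch encodes just the rule stated above: a non-empty SourcePullingTemporaryPath takes priority, and EnableUnzipLocalDrive is consulted only when the new feature is off.

```python
from enum import Enum

class LocalStagingMode(Enum):
    NONE = "none"                      # sources are handled directly in CxSrc
    SOURCE_PULLING_TEMP_PATH = "temp"  # new feature, all source pulling providers
    UNZIP_LOCAL_DRIVE = "unzip"        # legacy EnableUnzipLocalDrive behavior

def staging_mode(source_pulling_temp_path: str,
                 enable_unzip_local_drive: bool) -> LocalStagingMode:
    # A non-empty SourcePullingTemporaryPath enables the new feature and
    # overrides EnableUnzipLocalDrive, whatever its value.
    if source_pulling_temp_path.strip():
        return LocalStagingMode.SOURCE_PULLING_TEMP_PATH
    if enable_unzip_local_drive:
        return LocalStagingMode.UNZIP_LOCAL_DRIVE
    return LocalStagingMode.NONE

if __name__ == "__main__":
    print(staging_mode(r"D:\CxTemp", True).name)  # SOURCE_PULLING_TEMP_PATH
    print(staging_mode("", True).name)            # UNZIP_LOCAL_DRIVE
```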