
Generic Symbol table - Type inference plugins

Type inference and SAST

Most modern programming languages simplify development with shorthand syntax and less boilerplate code. One example of this simplification is the growing support for type inference in statically typed languages. Take, for instance, the following piece of Java code:

HashMap<User, List<String>> map = createUserHashMap();

Java 10 introduced the var keyword for local variables, which makes the same declaration more concise:

var map = createUserHashMap();

From a static code analysis perspective, this is problematic because it becomes harder to filter object instances by their types: the type no longer appears in the declaration. The general idea behind type inference is that the compiler deduces the type from the surrounding context, here the return type of createUserHashMap(). However, to make this deduction, a compiler needs the complete tree of dependencies, which is not always available to a static analyzer.
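The following minimal Java program (Java 10 or later) makes the example compilable so the problem can be seen concretely. The User class and the body of createUserHashMap() are hypothetical placeholders; only the method's name and return type come from the example. The declaration of map never names its type: the compiler takes it from the declared return type of createUserHashMap(). If that method lived in a dependency whose source was not scanned, nothing in this file would reveal the type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class VarInferenceDemo {
    // Hypothetical stand-in for the User type in the example.
    static final class User {
        final String name;
        User(String name) { this.name = name; }
    }

    // The only place where HashMap<User, List<String>> is spelled out.
    static HashMap<User, List<String>> createUserHashMap() {
        HashMap<User, List<String>> result = new HashMap<>();
        result.put(new User("alice"), new ArrayList<>(List.of("admin", "auditor")));
        return result;
    }

    static int countEntries() {
        // The compiler infers HashMap<User, List<String>> for map from the
        // declared return type above; this line does not mention the type.
        var map = createUserHashMap();
        int total = 0;
        for (List<String> values : map.values()) {
            total += values.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countEntries()); // prints 2
    }
}
```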

Type inference plugins

To compensate for this missing context, Checkmarx created a feature called type inference (TI) plugins. The engine is configured to scan source code for the sole purpose of producing a persisted symbol table. Regular scans can then load a collection of these tables and infer types that are defined not in the scanned files themselves but in dependencies whose symbol tables were previously persisted. As a result, scans produce more accurate results, and faster, than they would by scanning the source code of the application's entire dependency tree.
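To make the idea of a persisted symbol table concrete, here is a minimal Java sketch of the two phases: a generation step that writes a table to a binary file, and a regular-scan step that loads a collection of such files into one lookup. The table shape (method name to declared return type), the file and class names, and the use of Java serialization are all assumptions for illustration; the actual format of the engine's plugin files is not documented here.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SymbolTableStore {
    // Generation step: writes a hypothetical symbol table
    // (method name -> declared return type) to a binary file.
    static void save(Map<String, String> table, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new HashMap<>(table));
        }
    }

    // Reads one persisted table back.
    @SuppressWarnings("unchecked")
    static Map<String, String> load(Path file) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, String>) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException("Unexpected content in " + file, e);
        }
    }

    // Regular-scan step: merges a collection of tables; later files win on conflicts.
    static Map<String, String> loadAll(List<Path> files) throws IOException {
        Map<String, String> merged = new HashMap<>();
        for (Path file : files) {
            merged.putAll(load(file));
        }
        return merged;
    }

    // Persists tables for two hypothetical dependencies, then loads both.
    static Map<String, String> demo() {
        try {
            Path dir = Files.createTempDirectory("ti-plugins");
            Path libA = dir.resolve("LibA.bin");
            Path libB = dir.resolve("LibB.bin");
            save(Map.of("liba.Users.createUserHashMap",
                    "java.util.HashMap<liba.User, java.util.List<java.lang.String>>"), libA);
            save(Map.of("libb.Reports.build", "libb.Report"), libB);
            return loadAll(List.of(libA, libB));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().get("libb.Reports.build")); // prints libb.Report
    }
}
```

Persisting once and loading many times is what lets a regular scan avoid re-parsing the dependencies' sources.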

How to generate a TI plugin

As of version 9.4, each language that benefits from this feature has its own engine configuration flag, SERIALIZE_[LANGUAGE]_SYMBOL_TABLE_TO_BINARY_FILE. The supported languages are Go, Java, Kotlin, Swift, and Scala.

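For example, to enable the flag for Java, run the following script against CxDB. It updates the key's metadata in the CxEngine configuration if the key already exists, and inserts the key otherwise, in both cases with the default value set to true: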

-- Update the key's metadata if it already exists; otherwise insert it.
IF EXISTS (SELECT * FROM [Config].[CxEngineConfigurationKeysMeta] WHERE [KeyName] = N'SERIALIZE_JAVA_SYMBOL_TABLE_TO_BINARY_FILE')
BEGIN
    UPDATE [Config].[CxEngineConfigurationKeysMeta]
    SET [GroupId] = (SELECT Id FROM [Config].[CxEngineConfigurationKeyGroups] WHERE Name = N'Parsing')
        ,[KeyType] = N'bool'
        ,[DefaultValue] = N'true'  -- turns on plugin generation mode
        ,[ValidationExpression] = N'^(?:tru|fals)e$'
        ,[IsDebug] = 'true'
        ,[Description] = N'Sets the engine into type inference plugin generation mode.'
    WHERE [KeyName] = N'SERIALIZE_JAVA_SYMBOL_TABLE_TO_BINARY_FILE'
END
ELSE
BEGIN
    INSERT INTO [Config].[CxEngineConfigurationKeysMeta]
        ([GroupId], [KeyName], [KeyType], [DefaultValue], [ValidationExpression], [IsDebug], [Description])
    VALUES
        ((SELECT Id FROM [Config].[CxEngineConfigurationKeyGroups] WHERE Name = N'Parsing')
        ,N'SERIALIZE_JAVA_SYMBOL_TABLE_TO_BINARY_FILE'
        ,N'bool'
        ,N'true'  -- turns on plugin generation mode
        ,N'^(?:tru|fals)e$'
        ,'true'
        ,N'Sets the engine into type inference plugin generation mode.')
END
GO
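The ValidationExpression in the script, ^(?:tru|fals)e$, is meant to accept exactly the strings true and false. The following Java snippet checks this with java.util.regex. Note that Java patterns are case-sensitive by default; how the engine itself applies the expression (for example, whether it ignores case) is not stated in this article.

```java
import java.util.regex.Pattern;

class FlagValueCheck {
    // Same expression as the ValidationExpression column in the script.
    private static final Pattern VALID_BOOL = Pattern.compile("^(?:tru|fals)e$");

    // True only if the whole value is "true" or "false".
    static boolean isValid(String value) {
        return VALID_BOOL.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"true", "false", "True", "yes", "truee"}) {
            System.out.println(v + " -> " + isValid(v));
        }
    }
}
```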

Then run a regular scan of the project from the SAST Web Portal. Although the scan report will show a failed status, the generated plugin file can be found in the following folder:

…\Checkmarx\Checkmarx Engine Service\Engine Server\TypeInference\Java

The folder now contains a new *.bin file named after the scanned project. From now on, this project's symbol table is loaded every time Java sources are scanned.
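Because each plugin is a single *.bin file named after its project, a quick way to check which symbol tables will be loaded is to list that folder. A minimal Java sketch, assuming that the file name is exactly the project name followed by .bin (the article only says the file is named after the project); the folder path is whatever your engine installation uses and is passed in as an argument:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PluginFolderListing {
    private static final String EXTENSION = ".bin";

    // Returns the sorted project names of all *.bin plugin files in the folder.
    static List<String> pluginProjects(Path folder) {
        List<String> projects = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*" + EXTENSION)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subfolders whose names happen to end in .bin
                }
                String name = file.getFileName().toString();
                // Assumption: file name = project name + ".bin".
                projects.add(name.substring(0, name.length() - EXTENSION.length()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(projects);
        return projects;
    }

    public static void main(String[] args) {
        // Pass the TypeInference\Java folder of your engine installation.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        pluginProjects(folder).forEach(System.out::println);
    }
}
```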

Remember to set the flag's default value back to false to return the engine to regular scan mode:

UPDATE [Config].[CxEngineConfigurationKeysMeta] 
SET [DefaultValue] = N'false'
WHERE [KeyName] = N'SERIALIZE_JAVA_SYMBOL_TABLE_TO_BINARY_FILE';

Example of using TI plugins

As an example, suppose you want to scan module C of a typical application, where module C depends on modules A and B, as shown in the following diagram:

[Diagram: module C depends on modules A and B; the languages' core libraries are shown in blue.]

Because the default TI plugins already cover most of the core libraries of the languages listed above, the blue part of the diagram needs no extra work.

Assume that the application is split into modules that you can audit individually. When auditing module C, TI plugins are particularly useful: by first scanning modules A and B in plugin generation mode, you add the context needed to infer the types that C uses from them.

To do this, set the engine configuration flag SERIALIZE_[LANGUAGE]_SYMBOL_TABLE_TO_BINARY_FILE to true, then scan the sources of A and then of B. This requires access to their source code (for example, because they are open source), since SAST analyzes source code only, not binaries. A plugin file is created for each module.

Next, set SERIALIZE_[LANGUAGE]_SYMBOL_TABLE_TO_BINARY_FILE back to false and scan module C. Because the plugin files for modules A and B provide the missing context, the DOM contains more type information, so the query results are more accurate than they would be without the plugins.
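The effect on module C can be pictured with a toy lookup. In the following Java sketch, the two maps stand in for the symbol tables that the generation-mode scans of A and B might produce; the class and method names in them are invented for illustration, and the lookup is a simplification, not the engine's actual algorithm. Without plugins, the type of a var initialized from a call into A or B stays unknown; with the plugins loaded, it resolves.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ModuleScanSimulation {
    // Hypothetical plugin tables for modules A and B: method name -> declared return type.
    static final Map<String, String> MODULE_A = Map.of("a.Accounts.load", "a.Account");
    static final Map<String, String> MODULE_B = Map.of("b.Reports.build", "b.Report");

    // Infers the type of `var x = <calledMethod>()` in module C from the loaded plugins.
    static Optional<String> inferVarType(String calledMethod, List<Map<String, String>> plugins) {
        for (Map<String, String> table : plugins) {
            String type = table.get(calledMethod);
            if (type != null) {
                return Optional.of(type);
            }
        }
        return Optional.empty(); // unresolved: the analysis cannot filter by this type
    }

    public static void main(String[] args) {
        List<Map<String, String>> none = List.of();
        List<Map<String, String>> withPlugins = List.of(MODULE_A, MODULE_B);
        System.out.println(inferVarType("b.Reports.build", none));        // Optional.empty
        System.out.println(inferVarType("b.Reports.build", withPlugins)); // Optional[b.Report]
    }
}
```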