Creating a File data set record for files on repositories

To enable a parallel load from multiple CSV or JSON files located in remote repositories or the local file system, create a File data set that references a repository. This feature enables remote files to function as data sources for Pega Platform data sets.

Before you begin: Create a File data set rule instance. See Creating a File data set rule.
  1. In the New tab, in the Data Source section, click Files on repositories.
  2. In the Connection section, select the source repository:
    • To select one of the predefined repositories, click the Repository configuration field, press the Down Arrow key, and choose a repository.
    • To create a repository, click Open to the right of the Repository configuration field, and then follow the steps in Creating a repository for continuous development.
  3. In the File configuration section, in the File path field, enter the file location.
    To match multiple files in a folder, use an asterisk (*).
    For example: /folder/part-r-*
    Result: If the specified file exists, additional details appear in the Parser configuration section. Otherwise, configure the settings manually by performing step 4.
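The wildcard behavior can be approximated outside Pega Platform to check which files a pattern is likely to select. The following is a minimal sketch that uses Python's standard fnmatch module; the file listing and the match_files helper are hypothetical, and the way Pega Platform expands the asterisk may differ (for example, fnmatch lets the asterisk also match a slash).

```python
from fnmatch import fnmatchcase

# Hypothetical repository listing; these names are illustrative only.
repository_files = [
    "/folder/part-r-00000",
    "/folder/part-r-00001",
    "/folder/summary.csv",
    "/other/part-r-00000",
]


def match_files(pattern, paths):
    """Return the paths that match a shell-style wildcard pattern.

    Assumption: the asterisk behaves like a shell wildcard. With fnmatch,
    the asterisk matches any characters, including '/'.
    """
    return [path for path in paths if fnmatchcase(path, pattern)]


matched = match_files("/folder/part-r-*", repository_files)
print(matched)
```

Running a check like this against a known file listing before entering the pattern in the File path field can help catch patterns that match more or fewer files than intended.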
  4. Optional: In the Parser configuration section, update the settings for the selected file:
    1. From the File type drop-down list, select the defined file type.
    2. Optional: If the file is compressed, select the File is compressed check box and choose the Compression type.
      The supported compression types are ZIP (.zip) and GZIP (.gz).
    3. For CSV files, specify whether the file contains a header row by selecting the File contains header check box.
    4. For CSV files, in the Delimiter character list, select the character that separates the fields in the selected file.
    5. For CSV files, in the Supported quotation marks list, select the quotation mark type used for string values in the selected file.
    6. In the Date Time format field, enter the pattern representing date and time stamps in the selected file.
      The default pattern is: yyyy-MM-dd HH:mm:ss
    7. In the Date format field, enter the pattern representing date stamps in the selected file.
      The default pattern is: yyyy-MM-dd
    8. In the Time Of Day format field, enter the pattern representing time stamps in the selected file.
      The default pattern is: HH:mm:ss
      Note: Time properties in the selected file can be in a different time zone than the one that Pega Platform uses. To avoid confusion, specify the time zone in the time properties of the file, and enter a pattern in the settings that includes the time zone.
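To check that sample values from a file fit the default patterns before you save the data set, you can parse them in a small script. The sketch below assumes that the default patterns use Java-style pattern letters and pairs each one with what is believed to be the equivalent Python strptime format. The pairing, the parse_value helper, and the sample values are assumptions for illustration, not part of Pega Platform.

```python
from datetime import datetime, timezone

# Default patterns from the steps above, paired with assumed
# Python strptime equivalents.
DEFAULT_FORMATS = {
    "Date Time": ("yyyy-MM-dd HH:mm:ss", "%Y-%m-%d %H:%M:%S"),
    "Date": ("yyyy-MM-dd", "%Y-%m-%d"),
    "Time Of Day": ("HH:mm:ss", "%H:%M:%S"),
}


def parse_value(kind, text):
    """Parse text with the strptime equivalent of the named default pattern.

    Raises ValueError if the text does not fit the pattern.
    """
    _, strptime_format = DEFAULT_FORMATS[kind]
    return datetime.strptime(text, strptime_format)


parsed_date_time = parse_value("Date Time", "2024-03-15 13:45:30")
parsed_date = parse_value("Date", "2024-03-15").date()
parsed_time = parse_value("Time Of Day", "13:45:30").time()

# A value that carries an explicit UTC offset stays unambiguous when the
# file and the consuming system are in different time zones.
with_zone = datetime.strptime("2024-03-15 13:45:30+0200", "%Y-%m-%d %H:%M:%S%z")
in_utc = with_zone.astimezone(timezone.utc)
print(parsed_date_time, parsed_date, parsed_time, in_utc)
```

The last two statements illustrate the note about time zones: a value with an explicit offset converts to a single point in time regardless of where it is read.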
  5. Optional: Preview the file by clicking Preview file.
  6. For CSV files, in the Mapping tab, modify the number of mapped columns:
    • To add a CSV file column, click Add mapping.
    • To remove a CSV file column and the associated property mapping, click Delete mapping for the applicable row.

    For CSV files with a header row, the Column entry in a new mapping instance must match the column name in the file.

  7. For CSV files, in the Mapping tab, check and complete the mapping between the columns in the CSV file and the corresponding properties in Pega Platform:
    • To map an existing property to a CSV file column, in the Property column, press the Down Arrow key and choose the applicable item from the list.
    • For CSV files with a header row, to automatically create properties that are not in Pega Platform and map them to CSV file columns, click Create missing properties. Confirm the additional mapping by clicking Create.
    • To manually create properties that are not in Pega Platform and map them to CSV file columns, in the Property column, enter a property name that matches the Column entry, click Open, and configure the new property. For more information, see Creating a property.

    For JSON files, the Mapping tab is empty, because the system automatically maps the fields, and no manual mapping is available.
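Because each Column entry must match a column name in the header row, it can help to compare the header with the planned mapping before you configure the Mapping tab. The following is a minimal sketch that uses Python's csv module; the sample data, the property names, and the check_mapping helper are hypothetical and only illustrate the matching rule.

```python
import csv
import io

# Hypothetical CSV content with a header row.
sample_csv = 'CustomerID,FirstName,SignupDate\n"C-1","Ana","2024-03-15"\n'

# Hypothetical mapping from Column entries to property names.
column_to_property = {
    "CustomerID": ".CustomerID",
    "FirstName": ".FirstName",
    "SignupDate": ".SignupDate",
}


def check_mapping(csv_text, mapping, delimiter=",", quotechar='"'):
    """Compare the CSV header row with the Column entries of a mapping.

    Returns two lists: header columns that have no mapping, and mapping
    entries that do not appear in the header. The comparison is exact,
    so a difference in letter case counts as a mismatch.
    """
    reader = csv.reader(
        io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar
    )
    header = next(reader, [])
    unmapped_columns = [name for name in header if name not in mapping]
    unknown_entries = [name for name in mapping if name not in header]
    return unmapped_columns, unknown_entries


unmapped, unknown = check_mapping(sample_csv, column_to_property)
print(unmapped, unknown)
```

Two empty lists mean that every header column has a mapping entry and every entry corresponds to a column in the file.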

  8. Confirm the new File data set configuration by clicking Save.
    Result: If a CSV or JSON file is not valid, an error message displays the reason for the error and the line number at which the error occurs in the file.
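To find problems before you save, you can run a similar check on a CSV file yourself. The sketch below reports each row whose field count differs from the header, together with its line number. It is an illustration only: the find_csv_errors helper is hypothetical, and the validation rules that Pega Platform applies are not described here and may differ.

```python
import csv
import io


def find_csv_errors(csv_text, delimiter=","):
    """Return (line_number, reason) pairs for rows that do not fit the header.

    Assumption: a row is invalid when its number of fields differs from
    the number of columns in the header row.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    errors = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of source lines read so far.
            reason = f"expected {len(header)} fields, found {len(row)}"
            errors.append((reader.line_num, reason))
    return errors


# Hypothetical file content: line 3 is too short, line 4 is too long.
sample = "id,name\n1,Ana\n2\n3,Bo,extra\n"
errors = find_csv_errors(sample)
for line_number, reason in errors:
    print(f"line {line_number}: {reason}")
```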