Presidio Action

GitHub Action that analyzes text for PII entities with Microsoft's Presidio framework.

Author

Insights Engineering

Inputs

  • path:

    Description: Path to verify

    Required: false

    Default: "."

  • configuration-file:

    Description: Path to custom configuration file

    Required: false

    Default: "default"

  • configuration-data:

    Description: Configuration data as an inline YAML configuration

    Required: false

    Default: ""

  • output:

    Description: Format of output

    Required: false

    Default: "auto"

  • publish:

    Description: Publish result as a PR comment

    Required: false

    Default: "true"

  • upload:

    Description: Upload results as an artifact

    Required: false

    Default: "true"

  • presidio-cli-version:

    Description: Presidio CLI version

    Required: false

    Default: "latest"

  • lang-models:

    Description: List of additional language models to install

    Required: false

    Default: ""

  • only-changed-files:

    Description: Only run checks for changed files

    Required: false

    Default: false

Outputs

The format of the result depends on the output input parameter.

The default format is auto.

Available formats:

  • standard - standard output format

tests/conftest.py
  34:58     0.85     PERSON
  37:33     0.85     PERSON

  • github - similar to the diff view on GitHub

::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::

  • colored - standard output format, but with colors

  • parsable - easy to parse automatically

{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}

  • auto - default format, switches automatically between two modes:
    • github, if run on GitHub, that is, when the environment variables GITHUB_ACTIONS and GITHUB_WORKFLOW are set
    • colored, otherwise
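
The selection rule for the auto format can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the code used by presidio-cli; the function name `resolve_output_format` is hypothetical, and treating an empty variable as "not set" is an assumption.

```python
import os


def resolve_output_format(requested="auto", env=None):
    """Return the concrete format for a requested `output` value.

    Hypothetical helper: it mirrors the documented rule for `auto`,
    not the actual presidio-cli implementation.
    """
    env = os.environ if env is None else env
    if requested != "auto":
        # Any explicit format (standard, github, colored, parsable) is kept as-is.
        return requested
    # Assumption: a variable counts as "set" only when it is present and non-empty.
    on_github = bool(env.get("GITHUB_ACTIONS")) and bool(env.get("GITHUB_WORKFLOW"))
    return "github" if on_github else "colored"


if __name__ == "__main__":
    # Uses the real process environment when no mapping is passed in.
    print(resolve_output_format("auto"))
```

Passing a plain dictionary as `env` makes the rule easy to check locally without running inside a workflow.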

How it works

Presidio action uses presidio-cli, which is built on presidio-analyzer from the Microsoft Presidio framework, to check an application's code for undesirable types of data such as EMAIL_ADDRESS or PHONE_NUMBER.

For more information, see the full list of supported entities.

Usage

Example usage:

---
name: Presidio check

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  presidio-action:
    runs-on: ubuntu-latest
    name: Presidio check

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
        with:
          # fetch-depth: 0 is needed if you set `only-changed-files` to true
          # and configure this check to run on push events
          fetch-depth: 0

      - name: Produce the presidio report
        uses: insightsengineering/presidio-action@v1
        # all parameters below are optional
        with:
          # path - path to the project.
          # if the project does not live in a specific folder such as 'my-project',
          # the default value '.' (current folder) is used
          path: "my-project"
          # configuration-file - path to a file with a custom configuration,
          # or the name of one of the predefined files:
          #   - default - `conf/default.yaml` file from the action repository, checks the default list of entities
          #                and ignores the content of the `.git` folder
          #   - limited - `conf/limited.yaml` file from the action repository, checks only PERSON, EMAIL_ADDRESS and CREDIT_CARD
          #                and ignores the `.git` folder and *.cfg files
          configuration-file: "my-project/conf/my-presidio-config.yaml"
          # configuration-data - content of the configuration in raw YAML format.
          # Lets you provide your own configuration without adding a file to the project.
          # Any value in this field blocks the use of the configuration file
          configuration-data: |
            entities:
              - PERSON
            threshold: 0.9
          # output - specify one of output formats
          output: "parsable"
          # only-changed-files - only run the check for files that were changed
          # NOTE: You must set fetch-depth: 0 in the actions/checkout@v3 step
          # for push events while this parameter is set to true
          only-changed-files: true
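
With output set to parsable, as in the workflow above, each finding appears to be emitted as one JSON object per line. The following Python sketch shows how such output could be consumed in a later step. It assumes the line format shown in the Outputs section; the function names `parse_parsable` and `filter_findings` are hypothetical helpers and not part of presidio-cli or this action.

```python
import json


def parse_parsable(text):
    """Parse `parsable` output into a list of dicts, one per finding.

    Assumption: every finding is a single-line JSON object; any other
    line (for example a file name or a blank line) is skipped.
    """
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        findings.append(json.loads(line))
    return findings


def filter_findings(findings, min_score=0.0, entity_types=None):
    """Keep findings at or above `min_score`, optionally limited to given types."""
    return [
        f
        for f in findings
        if f["score"] >= min_score
        and (entity_types is None or f["entity_type"] in entity_types)
    ]


# Sample lines copied from the Outputs section of this README.
SAMPLE = """\
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
"""

if __name__ == "__main__":
    for finding in filter_findings(parse_parsable(SAMPLE), min_score=0.8):
        print(finding["entity_type"], finding["start"], finding["end"], finding["score"])
```

Filtering after the fact is shown only for illustration; setting `threshold` and `entities` in the configuration, as in the workflow above, limits the findings at the source.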

Example of comment added to the PR:

Screenshot with PR comment example
