All notable changes to this project will be documented in this file.
- Recognizer for Spanish Foreigners Identity Code (NIE, Numero de Identificacion de Extranjeros).
- Recognizer for Finnish Personal Identity Codes (Henkilötunnus).
2.2.353 - March 31st 2024
- Support 'M' prefix in SG_NRIC_FIN Recognizer and expand tests (#1304) (Thanks @miltonsim)
- Add Bech32 and Bech32m Bitcoin Address Validation in Crypto Recognizer and expand tests (#1307) (Thanks @miltonsim)
- Predefined pattern recognizer: IN_VEHICLE_REGISTRATION (#1288) (Thanks @devopam)
- Addition of leniency parameter in predefined PhoneRecognizer (#1311) (Thanks @VMD7)
- Add Singapore UEN Recognizer (#1315) (Thanks @miltonsim)
- Update spacy_stanza.md (#1325) (Thanks @AndreasThinks)
- Adding Span Marker Recognizer Sample (#1321) (Thanks @VMD7)
- Cache compiled regexes in analyzer (#1335) (Thanks @Edward-Upton)
- Added pseudonymization sample (#1296)
- Added tesseract to installation (#1312)
- Analysis builder improvements (#1295) (Thanks @ebotiab)
- Implement user-defined entity selection strategies in Presidio Structured (#1319) (Thanks @miltonsim)
- Fix for incorrectly referenced recognizer in analysis_explanation using PhoneRecognizer (#1330) (Thanks @egillv021)
- Fix bug where "bank" and "check" wouldn't work (#1333) (Thanks @usr-ein and @Samuel Prevost)
- Bugfix in tutorial (#1310)
- Changed default aggregation_strategy to max (#1342)
- Fixed wrong condition for DICOM metadata (#1347)
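The Bech32 and Bech32m entry (#1307) refers to the address checksums defined in BIP-173 and BIP-350. As an illustration only, here is a sketch of checksum verification adapted from the BIP reference algorithm, not from Presidio's recognizer. `bech32_checksum_kind` is a hypothetical name. The sketch checks the character set, case, and checksum, but not SegWit witness-version or program-length rules.

```python
from typing import List, Optional

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1            # checksum constant from BIP-173
BECH32M_CONST = 0x2BC830A3  # checksum constant from BIP-350
GENERATORS = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def _polymod(values: List[int]) -> int:
    """BCH checksum over 5-bit values, as in the BIP-173 reference code."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i, gen in enumerate(GENERATORS):
            if (top >> i) & 1:
                chk ^= gen
    return chk


def _hrp_expand(hrp: str) -> List[int]:
    """Expand the human-readable part into the values fed to the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_checksum_kind(address: str) -> Optional[str]:
    """Return "bech32", "bech32m", or None (hypothetical helper, checksum only)."""
    if any(ord(c) < 33 or ord(c) > 126 for c in address):
        return None
    if address.lower() != address and address.upper() != address:
        return None  # mixed case is not allowed
    address = address.lower()
    sep = address.rfind("1")
    if sep < 1 or sep + 7 > len(address) or len(address) > 90:
        return None  # need a non-empty HRP and at least 6 checksum characters
    data_part = address[sep + 1:]
    if any(c not in CHARSET for c in data_part):
        return None
    data = [CHARSET.index(c) for c in data_part]
    const = _polymod(_hrp_expand(address[:sep]) + data)
    if const == BECH32_CONST:
        return "bech32"
    if const == BECH32M_CONST:
        return "bech32m"
    return None
```

With the BIP test vectors, `bech32_checksum_kind("A12UEL5L")` returns "bech32" and `bech32_checksum_kind("A1LQFN3A")` returns "bech32m". The two encodings differ only in the final constant the checksum must equal.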
2.2.353 - Feb 12th 2024
- Add predefined_recognizer: IN_AADHAAR (#1256)
- Added the option to add custom operators + pseudonymization sample (#1284)
- Fix failing test due to optional package (#1258)
- Update publish-to-pypi.yml (#1259)
- Allow local Spacy Models to be loaded in NLP Engine (#1269)
- Upgrade pip in windows containers (#1272)
- Bugfix in ImageAnalyzerEngine (#1274)
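The last digit of an Aadhaar number is a Verhoeff check digit. Below is a self-contained sketch of the Verhoeff scheme, written for illustration and not taken from the IN_AADHAAR recognizer. The function names are hypothetical, and Aadhaar-specific rules such as the 12-digit length are not checked.

```python
# Verhoeff tables: D is multiplication in the dihedral group D5,
# P holds the position-dependent permutations, INV the group inverses.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_is_valid(number: str) -> bool:
    """True if the digit string (including its check digit) passes Verhoeff."""
    if not number.isdigit():
        return False
    c = 0
    for i, digit in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(digit)]]
    return c == 0


def verhoeff_check_digit(payload: str) -> str:
    """Compute the check digit to append to `payload`."""
    c = 0
    for i, digit in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(digit)]]
    return str(INV[c])
```

For example, `verhoeff_check_digit("236")` gives "3", so "2363" validates and "2364" does not.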
2.2.352 - Jan 22nd 2024
- Added an alpha of presidio-structured, a library that reuses logic from existing Presidio components to allow anonymization of (semi-)structured data. (#1192)
- Add PL PESEL recognizer (#1209)
- Azure AI language recognizer (#1228)
- Add_conf_to_package_data (#1243)
- Add keep operator as deanonymizer (#1255)
- Update anonymize_list type hints and document that sometimes items will be ignored. (#1252)
- Add Dockerfile for Windows containers (#1194)
- Drop WA driver license number (#1214)
- Change ner_model_configuration from list to map (#1222)
- Bugfix in SpacyRecognizer (#1221)
- Bugfix in NerModelConfiguration (#1230)
- Improved the logic of conflict handling in AnonymizerEngine (#1196)
- Change default score threshold in image redactor (#1210)
- Fixed bug #1227 (#1231)
- Added missing dependencies for opencv-python and azure forms recognizer (#1257)
- Remove inclusive-lint step (#1207)
- Updates to demo website with new NLP Engine (#1181)
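A PESEL number has 11 digits. The last one is a check digit computed from the first ten with the repeating weights 1, 3, 7, 9. Here is a minimal sketch of that check, written independently of the PL PESEL recognizer. The function name is hypothetical, and the birth date encoded in the first six digits is not validated.

```python
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(pesel: str) -> bool:
    """Return True if an 11-digit PESEL string has a consistent check digit."""
    if len(pesel) != 11 or not pesel.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(pesel[:10], PESEL_WEIGHTS))
    # The check digit is (10 - last digit of the weighted sum) mod 10.
    return (10 - total % 10) % 10 == int(pesel[10])
```

For "44051401359" the weighted sum is 101, so the expected check digit is 9, which matches.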
2.2.351 - Nov. 6th 2023
- Hotfix for NerModelConfiguration not created correctly (#1208)
2.2.350 - Nov. 2nd 2023
- Hotfix: default.yaml is not parsed correctly (#1202)
2.2.35 - Nov. 2nd 2023
- Put org in ignore as it has many FPs (#1200)
2.2.34 - Oct. 30th 2023
- New Predefined Recognizer: IN_PAN (#1100)
- Anonymizer - Pass bytes key to Encrypt / Decrypt (#1147)
- DICOM redactor improvement: Enabling more photometric interpretations (#1103)
- DICOM redactor improvement: Adding exceptions for when DICOM file does not have pixel data (#1104)
- Small reordering of kwargs as prereq for allow list functionality (#1110)
- DICOM redactor improvement: Preventing distortion when multiple sets of pixels are in one instance (#1109)
- DICOM redactor improvement: Enabling compatibility with compressed images (#1105)
- DICOM redactor improvement: Enable return of redacted bboxes (#1111)
- DICOM redactor improvement: Enable selection of redact approach (#1113)
- Enable toggle of printing output location after redacting from file (#1144)
- Changing test exception type check (#1148)
- Enabling allow list approach with all image redaction (#1145)
- Improve process names method in DICOM image redactor (#1150)
- Adding examples of toggling metadata usage and saving bboxes (#1158)
- Updating verification engines to include latest updates to redactor engines (#1162)
- Improved bbox processor (#1163)
- Updating verification engines and enable plotting of custom bboxes (#1164)
- Added image processing class to preprocess the image before running OCR (#1166)
- Added support for Microsoft's Document Intelligence OCR
- Refactored the NlpEngine and NER recognizers (SpacyRecognizer, TransformersRecognizer, StanzaRecognizer) to allow simpler integration of Hugging Face and transformers models (#1159). This includes:
  - Changes in how NER results flow through Presidio (see docs)
  - The NER model is now defined using a conf file or a NerModelConfiguration object.
  - Integrated spacy-huggingface-pipelines for a more robust integration of Hugging Face models.
  - As a result, SpacyRecognizer logic has changed; please see #1159. Some fields within the class are now deprecated.
- Updated type checks (#1175)
- Enabled regex flags manipulation (#1193)
- Initial logic check for merging 2 entities (#1092)
- Fix Sphinx warning in OperatorConfig (#1143)
- Fix type mismatch in check_label_groups parameter in spacy_recognizer (#1130)
- anonymize_list return type hint fix (#1178)
- We no longer use Pipenv.lock. Locking happens as part of the CI. (#1152)
- Changed the ACR instance (#1089)
- Updated to Cred Scan V3 (#1154)
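An Indian PAN has ten characters: five letters, four digits, and a final letter. The fourth letter encodes the holder type, for example P for an individual or C for a company. The sketch below is a simplified format check for illustration. The regex and the function name are assumptions, not the pattern set or scoring that the IN_PAN recognizer ships.

```python
import re
from typing import List

# Assumed simplified format: 5 uppercase letters, 4 digits, 1 uppercase letter.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def find_pan_candidates(text: str) -> List[str]:
    """Return substrings shaped like a PAN (format only, no context or scoring)."""
    return PAN_PATTERN.findall(text)
```

For example, in "PAN: ABCPE1234F, ref ABCDE12345" only the made-up value "ABCPE1234F" matches. The second token has five digits, so it fails the format.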
2.2.33 - June 1st 2023
- Added keep, a no-op anonymizer that allows preserving some types of PII while keeping track of its position in anonymized output. (#1062)
- Added BatchAnonymizerEngine to complement the BatchAnalyzerEngine for lists and dicts (#993)
- Drop support for Python 3.7
- Add support for Python 3.11
- New demo app for Presidio, based on Streamlit (#1054)
- GPT based synthetic data generation (#1051)
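The keep operator is described as a no-op that still records where the PII sits in the output. The toy sketch below illustrates that idea in plain Python. The span format, operator names, and the `anonymize` function are hypothetical and are not Presidio's AnonymizerEngine API. Spans are assumed not to overlap.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[str, int, int]  # hypothetical (entity_type, start, end)

OPERATORS: Dict[str, Callable[[str, str], str]] = {
    "replace": lambda value, entity: f"<{entity}>",
    "keep": lambda value, entity: value,  # no-op: original text is preserved
}


def anonymize(text: str, spans: List[Span], ops: Dict[str, str]):
    """Apply a per-entity operator; return new text and output positions."""
    out: List[str] = []
    items: List[Span] = []
    cursor = 0
    for entity, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(text[cursor:start])
        op = OPERATORS[ops.get(entity, "replace")]
        new_value = op(text[start:end], entity)
        new_start = sum(len(part) for part in out)  # position in the output text
        out.append(new_value)
        items.append((entity, new_start, new_start + len(new_value)))
        cursor = end
    out.append(text[cursor:])
    return "".join(out), items
```

With `anonymize("Call Ana in Paris", [("PERSON", 5, 8), ("LOCATION", 12, 17)], {"LOCATION": "keep"})` the text becomes "Call <PERSON> in Paris". The kept location is still reported at its new position (17, 22).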
2.2.32 - 25.01.2023
- Updated dependencies
- Fixed exception on whitespace in AU recognizers
- Updated API version for Text Analytics in sample
- Fixed merge entity from the same type
- Modified ImagePiiVerifyEngine to allow passing of kwargs
- Updated template for building image redactor yaml
- Updated all image redactor engines and OCR classes to allow passing of an OCR confidence threshold and other OCR parameters
- Moved general bounding box operations to new class BboxProcessor
- Updated presidio-image-redactor version from 0.0.45 to 0.0.46
- Added revised example for transformer recognizer
- Added evaluation code for the DICOM image redaction capabilities
- REST API to support web applications payload
- Updated documentation to include instructions on using DICOM evaluation code
- Updated documentation to mention OCR thresholding
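Thresholding OCR output by confidence means dropping low-confidence or empty words before they reach the analyzer. Below is a minimal sketch of that step. It assumes a dict-of-parallel-lists layout similar to what pytesseract's `image_to_data(..., output_type=Output.DICT)` returns, with keys such as `text` and `conf`. The function name and default threshold are hypothetical, not the redactor engines' actual parameters.

```python
from typing import Dict, List


def filter_ocr_by_confidence(ocr: Dict[str, list], threshold: float = 0.0) -> Dict[str, list]:
    """Keep entries with non-empty text and confidence strictly above `threshold`.

    Assumes every list in `ocr` has the same length and lines up by index.
    """
    keep: List[int] = [
        i
        for i, (word, conf) in enumerate(zip(ocr["text"], ocr["conf"]))
        if word.strip() and float(conf) > threshold
    ]
    # Slice every parallel list (text, conf, left, ...) with the same indices.
    return {key: [values[i] for i in keep] for key, values in ocr.items()}
```

Filtering all parallel lists with the same indices keeps the bounding-box fields aligned with the surviving words.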
2.2.31 - 14.12.2022
- Added DICOM image redaction capabilities (DicomImageRedactorEngine class and tests)
- Updated setup.py to include new required packages for DICOM capabilities
- Updated Pipfile and Pipfile.lock
- Updated presidio-image-redactor version from 0.0.44 to 0.0.45
- Updated the ImagePiiVerifyEngine class to allow use of custom analyzer engines
- Updated NOTICE to include licenses of added packages
- Updated docs with getting started code for new DicomImageRedactorEngine
2.2.30 - 25.10.2022
- Added Italian fiscal code recognizer
- Added Italian driver license recognizer
- Added Italian identity card recognizer
- Added Italian passport recognizer
- Added TransformersNlpEngine to support transformer-based NER models within spaCy pipelines
- Added pattern for next gen US passport in presidio-analyzer/presidio_analyzer/predefined_recognizers/us_passport_recognizer.py
- Improved MEDICAL_LICENSE pattern and fixed checksum verification
- Bugfix for context handling by aligning results to recognizers using a unique identifier and not recognizer name
- Updated Pipfile.lock
- Removed constraint on empty texts
- Updated pipenv version
- Updated black and flake8 in pre-commit scripts
- Updated docs for NLP engine
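The MEDICAL_LICENSE checksum fix is easier to follow with the rule written out. Assuming the entity targets US DEA registration numbers (two letters followed by seven digits), the standard DEA check is as follows. Add digits 1, 3, and 5. Add twice the sum of digits 2, 4, and 6. The last digit of the total must equal the seventh digit. This is an illustrative sketch with a hypothetical function name, not the recognizer's code.

```python
import re

# Assumed DEA-style format: two letters, then seven digits.
DEA_FORMAT = re.compile(r"[A-Za-z]{2}[0-9]{7}")


def dea_checksum_ok(number: str) -> bool:
    """Return True if a DEA-style number has a consistent check digit."""
    if not DEA_FORMAT.fullmatch(number):
        return False
    d = [int(c) for c in number[2:]]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]
```

For "AB1234563": 1 + 3 + 5 = 9, 2 × (2 + 4 + 6) = 24, and the total 33 ends in 3, matching the last digit.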
2.2.29 - 12.07.2022
- Added Presidio to OSSF (Open Source Security Foundation)
- Added CodeQL scanning
- Introduced BatchAnalyzerEngine
- Added allow-list functionality to ignore specific strings
- Added notebook on anonymizing known values
- Added sample for using transformers models in Presidio
- Bug fix for getting the text before anonymizing (microsoft#890)
- Deps update
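Allow-list filtering removes detections whose exact text is explicitly permitted. Here is a minimal sketch, assuming results are plain (entity_type, start, end, score) tuples. The tuple layout and the function name are hypothetical and are not the AnalyzerEngine or BatchAnalyzerEngine API.

```python
from typing import Iterable, List, Tuple

Result = Tuple[str, int, int, float]  # hypothetical (entity_type, start, end, score)


def apply_allow_list(text: str, results: Iterable[Result], allow_list: Iterable[str]) -> List[Result]:
    """Drop every result whose matched text appears verbatim in the allow list."""
    allowed = set(allow_list)
    return [r for r in results if text[r[1]:r[2]] not in allowed]
```

Using a set makes each lookup constant-time, which matters when the same allow list is applied across a batch of texts.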
2.2.28 - 04.05.2022
- Improved deny-list regex and customizability
- Added documentation for existing spaCy models
- Bugfix in analysis explanation scores
- PIL version updated to 9.0.1
- Recognizers can be loaded from YAML
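A deny list can be compiled into one regex. Escape each term, then require a non-word character or the string edge on both sides, so terms are not matched inside longer words. The sketch below shows one way to build such a pattern. It is an assumption about the approach, not necessarily the exact regex Presidio generates, and `deny_list_to_regex` is a hypothetical name.

```python
import re
from typing import List


def deny_list_to_regex(deny_list: List[str]) -> str:
    """Build an alternation of escaped terms bounded by non-word chars or string edges."""
    escaped = "|".join(re.escape(term) for term in deny_list)
    # Start: string start or preceded by \W. End: followed by \W or string end.
    return rf"(?:^|(?<=\W))({escaped})(?:(?=\W)|$)"
```

With the deny list ["Mr", "Mrs"], `re.findall` on "Mrs Smith met Mr Jones, not Mrsmith" returns ["Mrs", "Mr"]. The boundary lookahead makes the engine backtrack from "Mr" to "Mrs" and rejects "Mrsmith" entirely.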
2.2.27 - 08.03.2022
- Improved context mechanisms to support recognizer-level context enhancement and cross-entity context support
2.2.26 - 23.02.2022
- Bug fix in context support
2.2.25 - 21.02.2022
- Added a URL recognizer
- Added a new capability for creating new logic for context detection. See ContextAwareEnhancer and LemmaContextAwareEnhancer. Documentation will be added in a future release.
  Furthermore, it is now possible to pass context words through the analyze method (or via API), and they are taken into account for context enhancement.
- Bug fix for entities at the end of a sentence.
- Formatted (black/flake8) the Python examples.
- Removed the DOMAIN_NAME recognizer. This change means that the DOMAIN_NAME entity is no longer returned by Presidio. URL is returned instead and catches full addresses rather than just domain names (https://www.microsoft.com/a/b.html and not just www.microsoft.com)
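The idea behind context-aware enhancement is to raise a result's score when a context word appears shortly before it. Below is a simplified sketch of that idea. The window size, boost, and score floor are illustrative assumptions, not confirmed Presidio defaults. The sketch also matches exact lowercase words, whereas LemmaContextAwareEnhancer compares lemmas. `enhance_score` is a hypothetical name.

```python
import re
from typing import List


def enhance_score(text: str, start: int, score: float, context_words: List[str],
                  prefix_words: int = 5, boost: float = 0.35, floor: float = 0.4) -> float:
    """Boost `score` if a context word occurs among the last few words before `start`.

    prefix_words, boost, and floor are illustrative values, not library defaults.
    """
    preceding = re.findall(r"\w+", text[:start].lower())[-prefix_words:]
    if any(word.lower() in preceding for word in context_words):
        # Add the boost, never go below the floor, and cap the score at 1.0.
        return min(1.0, max(score + boost, floor))
    return score
```

In "My phone number is 425 555 0100", a low-confidence match on "425 555 0100" with context word "phone" is boosted from 0.3 to about 0.65. Outside the five-word window it is left unchanged.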
2.2.24 - 23.01.2022
- Fixed an issue where an IBAN followed by all-caps text could not be recognized
- Updated dependencies in Pipfile.lock
- Removed official Python 3.6 support and added support for 3.10
- Added docs for creating a streamlit app
- Added docs for using Flair
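IBAN recognition is usually paired with the ISO 13616 mod-97 check. Move the first four characters to the end and replace letters with numbers (A=10 through Z=35). The resulting integer modulo 97 must equal 1. Here is a minimal sketch of that check. The function name is hypothetical, and it validates only the general shape and checksum, not per-country lengths.

```python
import re

# Country code, two check digits, then 11 to 30 alphanumerics (15 to 34 total).
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def iban_is_valid(iban: str) -> bool:
    """Return True if `iban` passes the ISO 13616 mod-97 check."""
    compact = iban.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(compact):
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

The commonly used example "GB82 WEST 1234 5698 7654 32" passes. Changing its last digit makes the check fail.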
2.2.23 - 16.11.2021
- Added multi-regional phone number recognizer.
- Fixed duplicated entities removal.
- Added sample for structured / semi-structured data in batch.
- Dependencies version bumps.
- Added sample for getting an identified entity value using a custom Operator.
- Changed packages/imports.
- Added repr to classes.
- Added encryption and decryption samples.
- Remove AnonymizerResult in favor of OperatorResult, for an easier anonymization-deanonymization.
- Anonymizer and Deanonymizer now return operator_name instead of operator in OperatorResult.
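The custom-operator sample for getting an identified entity value relies on an operator that records its input and returns it unchanged. Here is a plain-Python sketch of that trick. `apply_operator` is a hypothetical stand-in for the anonymizer and `capture` is a hypothetical operator. Neither is part of Presidio's API.

```python
from typing import Callable, List, Tuple


def apply_operator(text: str, spans: List[Tuple[int, int]], op: Callable[[str], str]) -> str:
    """Replace each (start, end) span with op(original_value), working right to left.

    Right-to-left order keeps earlier offsets valid even if op changes lengths.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + op(text[start:end]) + text[end:]
    return text


captured: List[str] = []


def capture(value: str) -> str:
    captured.append(value)  # record the detected value
    return value            # return it unchanged, so the text is not altered
```

Running `apply_operator("Ana met Ben", [(0, 3), (8, 11)], capture)` leaves the text as is and fills `captured` with the detected values, in right-to-left order.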
2.2.2 - 09.06.2021
- Databricks based template in Azure Data Factory docs
- Adding ORGANIZATION recognizer docs
- Bumped pydantic from 1.7.3 to 1.7.4
- Updated call to stanza via spacy-stanza
- Added DATE_TIME recognizer
- Added Medical License recognizer
- Bumped spacy from 3.0.5 to 3.0.6
2.2.1 - 10.05.2021
- Create CODE_OF_CONDUCT
- ADF templates docs
- Fix spark sample to run presidio in broadcast
- Ad-hoc recognizers
- Text Analytics Integration Sample
- Documentation update and samples validation
- Adding tagger to the spaCy model pipeline
- Sample notebook for remote recognizer (using Text Analytics)
- Add matplotlib to image-redactor
- Added custom lambda anonymizer
- Added pii_verify_engine to the image-redactor
- Upgraded Analyzer spaCy version to 3.0.5
- Request entity AnonymizerConfig renamed to OperatorConfig
- In OperatorConfig: anonymizer_name -> operator_name
- Response entity AnonymizerResult renamed to EngineResult
- In EngineResult: List[AnonymizedEntity] -> List[OperatorResult]
- In OperatorResult:
  - anonymizer -> operator
  - anonymized_text -> text
- Response entity anonymizer renamed to operator.
- Response entity anonymizer_text renamed to text.
- New endpoint for deanonymizing entities encrypted by the anonymizer.