All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- To speed URL parsing, we no longer parse URLs with userinfo ("@") in the authority (see URL syntax guide for more details)
  - Our reasoning is that userinfo is rarely present
  - If you have concerns about this change or would like to see it added back in (it could be optionally enabled), please raise an issue
- Improved URL boundary detection to better respect the conventions of human language regarding quotation marks and parentheses (#130)
- Update required version of ioc-fanger which fixes issues with non-http(s) URL schemes (#255)
- Removed poorly designed grammars which were significantly slowing down this project (#250)
  - 🎉 This update improves mean run-times by ≈70%!
  - Thanks to @ptmcg for his contribution!
- Removed duplicative function calls
- Possible breaking change: Update required pyparsing version to v3
  - Although there are no public API changes associated with this version, this may be a breaking change if you are using ioc-finder and have pyparsing pinned to a version less than v3
  - I've chosen to release this as a new minor version because I think requirement version updates with no API changes and no system requirement changes constitute a minor version change
- Updated parsing of Google Analytics Tracker IDs so that matches must be all lower-cased or all upper-cased (e.g. `ua-...` and `UA-...` will be matched, but `uA-...` will not). This makes the parsing consistent with how Google Adsense Publisher IDs are parsed
- `included_ioc_types` option to only parse specified IOC types (#218)
- Imphashes are no longer parsed as md5s even when `parse_imphashes` is False (#231)
- Authentihashes are no longer parsed as sha256s even when `parse_authentihashes` is False (#231)
- Support for Python 3.10 (#188)
- ASN grammar improved to reduce false positives by not matching on lower-case "as " (#136)
- Made all boolean arguments keyword-only arguments (#108)
- Converting data from lists to tuples (#110)
- Made `_prepare_text` function public (`prepare_text`) (#114)
- Renamed `no_urls_without_schemes` to `parse_urls_without_scheme` (#109)
- Moved from MIT License to GNU Lesser General Public License v3.0 (#113)
- Unquoting URLs appropriately (#104)
- Pinned specific ioc-fanger version (this prevents an error where ioc-fanger was removing a URL in the query parameter of another URL - see #104)
- Updating library such that CIDR ranges are not detected as URLs when `parse_urls_without_scheme=True` (see #91)
- Parse observables from URL path when `parse_domain_from_url=False` and `parse_from_url_path=True` (see #90)
- Improved word boundary (specifically of MAC address and IP address grammars)
- Concurrency (through the use of concurrent.futures)
- Added parsing Monero addresses (see #94)
- Simplifying `_remove_url_paths` (a function used behind the scenes by the ioc finder - see #70)
- Created a function to update top level domains (see #10)
- Updating top level domains (which are used in grammars to find network observables)
- You can now ingest text using the CLI. For example, this now works: `cat foo.text | ioc-finder`.
- We now have 100% code coverage!!!
- Adding more keywords so this package is easier to find in pypi
- We are now parsing observables from URL paths by default (see #87). If you would like to disable this functionality, you may do so by setting the `parse_from_url_path` keyword argument to `False` when calling the `find_iocs` function (e.g. `parse_from_url_path=False`).
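To make the effect of the `parse_from_url_path` option more concrete, here is a minimal sketch. It is not ioc-finder's implementation: `find_domains_sketch`, `URL_RE`, and `DOMAIN_RE` are hypothetical stand-ins built on the standard library, and the regular expressions are deliberately simplistic. The option is declared keyword-only, mirroring the change log entry about boolean arguments becoming keyword-only.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified patterns for illustration only (not ioc-finder's grammars).
URL_RE = re.compile(r"https?://\S+")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def find_domains_sketch(text, *, parse_from_url_path=True):
    """Return a sorted list of domains seen in the URLs found in `text`.

    The host of each URL is always reported. Domain-like strings in the
    URL path are reported only when `parse_from_url_path` is True.
    """
    found = set()
    for url in URL_RE.findall(text):
        parts = urlsplit(url)
        if parts.hostname:
            found.add(parts.hostname)
        if parse_from_url_path:
            found.update(match.lower() for match in DOMAIN_RE.findall(parts.path))
    return sorted(found)


text = "see https://example.com/redirect/evil.org/index"
with_paths = find_domains_sketch(text)
without_paths = find_domains_sketch(text, parse_from_url_path=False)
print(with_paths)
print(without_paths)
```

With the real library, the equivalent switch is the `parse_from_url_path=False` keyword argument to `find_iocs` described in the entry above. Because the argument is keyword-only in this sketch, passing `False` positionally raises a `TypeError`.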
The change log was added for version 3.1.2
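The case rule for Google Analytics Tracker IDs noted earlier in this change log (all lower-case or all upper-case prefixes match, mixed case does not) can be sketched with a plain regular expression. This is only an illustration under assumptions: `GA_ID_RE` and `find_tracker_ids` are hypothetical names, the ID shape `UA-<digits>-<digits>` is assumed, and ioc-finder itself uses pyparsing grammars rather than this pattern.

```python
import re

# Assumed ID shape: a "UA" or "ua" prefix, then two digit groups separated by hyphens.
# Listing only the two accepted prefixes rejects mixed-case forms such as "uA".
GA_ID_RE = re.compile(r"\b(?:UA|ua)-\d+-\d+\b")


def find_tracker_ids(text):
    """Return tracker-ID-like strings whose prefix is entirely upper or lower case."""
    return GA_ID_RE.findall(text)


sample = "UA-12345-1 ua-67890-2 uA-11111-3 Ua-22222-4"
matches = find_tracker_ids(sample)
print(matches)
```

Enumerating the accepted prefixes, rather than using a case-insensitive flag, is what keeps mixed-case prefixes from matching.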