pp_cmd_str_match

Postprocessing command str_match

Description

str_match selects rows whose column values match a string or a regular expression. If several columns are passed, the resulting dataframe includes every row that matches in any of them.

Arguments

  • vals - any number of keyword arguments, where each key is a column name and each value is a string (a literal character sequence or a regular expression)
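The row selection described above can be sketched in plain Python. This is a minimal sketch, not the command's actual implementation. It assumes that each pattern is applied with `re.match`, which is anchored at the start of the value like pandas `Series.str.match`, because that is consistent with the usage examples. It also assumes that a row is kept when any column matches. The dataframe is represented as a list of `(index, row)` pairs, and the function name `str_match_rows` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

Row = Dict[str, str]


def str_match_rows(rows: List[Tuple[int, Row]], **vals: str) -> List[Tuple[int, Row]]:
    """Keep rows where at least one given column matches its pattern (logical OR).

    Hypothetical sketch: patterns are applied with re.match, i.e. anchored at
    the start of the cell value (an assumption based on the README examples).
    """
    compiled = {col: re.compile(pattern) for col, pattern in vals.items()}
    result = []
    for index, row in rows:
        # A row is kept if ANY column pattern matches; missing columns never match.
        if any(col in row and regex.match(str(row[col])) for col, regex in compiled.items()):
            result.append((index, row))
    return result


# Input dataframe from the usage example, as (index, row) pairs.
rows = [
    (0, {"a": "ab11c", "b": "dfag", "c": "bbbl33asdd"}),
    (1, {"a": "zdf", "b": "fffa22fff", "c": "znowaty"}),
    (2, {"a": "zd111f", "b": "fffa22fff", "c": "zno33waty"}),
]

print([i for i, _ in str_match_rows(rows, c="zno33")])           # [2]
print([i for i, _ in str_match_rows(rows, a=".*11.*")])          # [0, 2]
print([i for i, _ in str_match_rows(rows, a=".*c", c="^zno3")])  # [0, 2]
```

The OR across columns mirrors the third usage example, where each row is kept by a different pattern.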

Usage example

Input dataframe:

        a          b           c
0   ab11c       dfag  bbbl33asdd
1     zdf  fffa22fff     znowaty
2  zd111f  fffa22fff   zno33waty

The row at index 2 has a c value that starts with zno33:

query: readFile test.csv | str_match c="zno33"
        a          b          c
2  zd111f  fffa22fff  zno33waty

The rows at indexes 0 and 2 have a values that match the regular expression:

query: readFile test.csv | str_match a=".*11.*"
        a          b           c
0   ab11c       dfag  bbbl33asdd
2  zd111f  fffa22fff   zno33waty

The row at index 0 is matched by the first regular expression, and the row at index 2 by the second:

query: readFile test.csv | str_match a=".*c", c="^zno3"
        a          b           c
0   ab11c       dfag  bbbl33asdd
2  zd111f  fffa22fff   zno33waty
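The regular expressions in these examples start with `.*`. One plausible reason is that the command matches from the beginning of each value, as Python's `re.match` and pandas `Series.str.match` do. In that case a bare `11` would only match values that start with 11. The sketch below contrasts anchored and unanchored matching on column a. It illustrates Python's `re` module only and does not confirm how str_match is implemented. The helper `matching_indexes` is hypothetical.

```python
import re

# Values of column a from the input dataframe.
column_a = ["ab11c", "zdf", "zd111f"]


def matching_indexes(values, pattern, anchored=True):
    """Return the indexes of values matching pattern.

    anchored=True uses re.match (the match must start at position 0);
    anchored=False uses re.search (the match may start anywhere).
    """
    find = re.match if anchored else re.search
    return [i for i, value in enumerate(values) if find(pattern, value)]


print(matching_indexes(column_a, "11"))                  # [] (no value starts with 11)
print(matching_indexes(column_a, "11", anchored=False))  # [0, 2]
print(matching_indexes(column_a, ".*11.*"))              # [0, 2]
```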

Getting started

Installing

  1. Create a virtual environment with the postprocessing SDK:
    make dev

That command:

  • downloads Miniconda
  • creates a Python virtual environment with postprocessing_sdk
  • creates a link to the current command in the postprocessing_sdk pp_cmd directory
  2. Configure the otl_v1 command, for example:
    vi ./venv/lib/python3.9/site-packages/postprocessing_sdk/pp_cmd/otl_v1/config.ini

Config example:

[spark]
base_address = http://localhost
username = admin
password = 12345678

[caching]
# 24 hours in seconds
login_cache_ttl = 86400
# Command syntax defaults
default_request_cache_ttl = 100
default_job_timeout = 100
  3. Configure storages for the readFile and writeFile commands:
    vi ./venv/lib/python3.9/site-packages/postprocessing_sdk/pp_cmd/readFile/config.ini

Config example:

[storages]
lookups = /opt/otp/lookups
pp_shared = /opt/otp/shared_storage/persistent
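Both config files use INI syntax, so you can check them with Python's standard `configparser` module. The sketch below parses the two config examples from strings. In practice you would call `read()` with the config.ini paths above. This illustrates the file format only. How postprocessing_sdk itself loads these files is not described here.

```python
import configparser

# otl_v1 config example from above.
OTL_V1_CONFIG = """
[spark]
base_address = http://localhost
username = admin
password = 12345678

[caching]
# 24 hours in seconds
login_cache_ttl = 86400
# Command syntax defaults
default_request_cache_ttl = 100
default_job_timeout = 100
"""

# readFile / writeFile storages config example from above.
READFILE_CONFIG = """
[storages]
lookups = /opt/otp/lookups
pp_shared = /opt/otp/shared_storage/persistent
"""

otl_config = configparser.ConfigParser()
otl_config.read_string(OTL_V1_CONFIG)

storage_config = configparser.ConfigParser()
storage_config.read_string(READFILE_CONFIG)

# Values are strings by default; getint() converts numeric settings.
base_address = otl_config.get("spark", "base_address")
login_ttl = otl_config.getint("caching", "login_cache_ttl")

# Each key under [storages] names a storage and maps to a filesystem path.
storages = dict(storage_config["storages"])

print(base_address, login_ttl)
print(storages)
```

Full-line `#` comments, as used in the [caching] section, are ignored by `configparser` by default.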

Run str_match

Use pp to run the str_match command:

pp
Storage directory is /tmp/pp_cmd_test/storage
Commmands directory is /tmp/pp_cmd_test/pp_cmd
query: | otl_v1 <# makeresults count=100 #> |  str_match 

Deploy

Unpack the pp_cmd_str_match archive into the postprocessing commands directory.
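Deployment is a plain archive extraction. Here is a minimal Python sketch. It assumes the archive uses a format that `shutil.unpack_archive` recognizes, such as .zip or .tar.gz. The helper `deploy_command` and all paths are hypothetical placeholders for your own archive and commands directory. The usage part builds a throwaway archive in a temporary directory so that the snippet runs on its own.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_command(archive_path: Path, commands_dir: Path) -> Path:
    """Unpack a command archive into the postprocessing commands directory.

    Hypothetical helper: shutil.unpack_archive infers the archive format
    from the file extension.
    """
    commands_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(commands_dir))
    return commands_dir


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Build a stand-in archive containing a pp_cmd_str_match directory.
    build_dir = tmp_path / "build"
    source = build_dir / "pp_cmd_str_match"
    source.mkdir(parents=True)
    (source / "README.md").write_text("str_match\n")
    archive = shutil.make_archive(str(tmp_path / "pp_cmd_str_match"), "zip",
                                  root_dir=str(build_dir))
    # Unpack into a hypothetical commands directory.
    target = deploy_command(Path(archive), tmp_path / "pp_cmd")
    deployed = (target / "pp_cmd_str_match" / "README.md").exists()
    print(deployed)
```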

Test

Run make test; all tests run in a Docker container. Turn on the VPN so that all OTL dependencies can be downloaded.