
add functions to get cpu usage from /proc/stat #49

Merged
jkool702 merged 24 commits into main from forkrun_testing on Sep 6, 2024
Conversation

@jkool702 jkool702 commented Aug 9, 2024

Summary by Sourcery

Add functions to compute CPU usage from /proc/stat and dynamically adjust coprocess count based on system load. Introduce support for reading input from a specified file descriptor. Enhance performance by using lseek for efficient data reading on x86_64 platforms. Update documentation to reflect these changes.

New Features:

  • Introduce a function to compute a smoothed average system CPU load using information from /proc/stat, allowing dynamic adjustment of coprocess count based on CPU usage and read queue status.
  • Add support for reading input from a file descriptor other than stdin using the new '-u' flag, enhancing input flexibility.

Enhancements:

  • Improve the efficiency of reading data from stdin by using a custom bash loadable builtin to call lseek on x86_64 platforms, significantly increasing the speed of processing lines.
  • Refactor the coprocess spawning logic to dynamically adjust the number of workers based on system load and queue depth, optimizing resource usage.

Documentation:

  • Update the README to reflect the new features in version 1.4, including dynamic coprocess spawning, file descriptor input, and performance improvements.
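The /proc/stat approach the summary describes can be illustrated with a minimal standalone sketch (this is not forkrun's actual implementation; the 1-second sampling window and variable names are illustrative):

```shell
# Sample /proc/stat twice and compute integer percent CPU load over
# the interval. The first field is the "cpu" label; the remaining
# fields are jiffy counters (user nice system idle iowait irq softirq
# steal, plus guest fields on newer kernels, caught by the final _).
read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
busy0=$(( user + nice + system + irq + softirq + steal ))
total0=$(( busy0 + idle + iowait ))

sleep 1

read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
busy1=$(( user + nice + system + irq + softirq + steal ))
total1=$(( busy1 + idle + iowait ))

echo "load: $(( 100 * (busy1 - busy0) / (total1 - total0) ))%"
```

forkrun's _forkrun_get_load additionally smooths this value over time and scales it to loadMaxVal rather than to 100.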

Contributor

sourcery-ai bot commented Aug 9, 2024

Reviewer's Guide by Sourcery

This pull request adds functions to measure and report CPU usage by reading from /proc/stat, enabling the recording and calculation of average CPU load over time. It also introduces dynamic worker coproc spawning based on CPU usage and read queue length, adds the ability to read input from a file descriptor other than stdin, and implements a custom bash loadable builtin for more efficient stdin reading on x86_64 platforms.

File-Level Changes

Added functions to measure and report CPU usage (forkrun.bash):
  • Introduced load_tic function to set the start time for recording average CPU load
  • Introduced load_toc function to print the average CPU load since load_tic was last run
  • Implemented _forkrun_get_load function to compute smoothed average system CPU load

Implemented dynamic worker coproc spawning based on CPU usage and read queue length (forkrun.bash):
  • Modified the -j|-P flag to accept negative numbers for dynamic coproc count
  • Added logic to spawn new worker coprocs based on CPU load and read queue length
  • Introduced pLOAD variable to represent the current average load level estimate

Added the ability to read input from a file descriptor other than stdin (forkrun.bash):
  • Introduced -u flag to specify the input file descriptor
  • Modified input reading logic to use the specified file descriptor

Implemented a custom bash loadable builtin for efficient stdin reading on x86_64 platforms (forkrun.bash):
  • Added logic to use lseek for efficient stdin reading on x86_64 platforms
  • Removed dd from the required dependencies list when using NULL-delimited input

Updated documentation and benchmarking scripts (README.md, hyperfine_benchmark/forkrun.speedtest.hyperfine.bash):
  • Updated README.md with new version information and feature descriptions
  • Modified the hyperfine_benchmark script to accommodate new features and improve testing

Made minor API changes and improvements (forkrun.bash):
  • Changed the -u flag for unescaping to -U or --UNESCAPE
  • Improved error handling and default value setting
  • Added new command-line options to the help text
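The new '-u' flag builds on the standard bash pattern of opening input on a spare file descriptor. A sketch of that pattern, using xargs as a stand-in consumer (fd 7 and the word list are arbitrary, and forkrun itself is not invoked here):

```shell
# Open an input list on fd 7, consume it from that fd, then close it.
# 'xargs' stands in for 'forkrun -u 7 ...' purely for illustration.
tmpf=$(mktemp)
printf '%s\n' alpha beta gamma > "$tmpf"
exec 7< "$tmpf"
result=$(xargs -n1 echo <&7)
exec 7<&-
rm -f "$tmpf"
echo "$result"
```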

    Tips
    • Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
    • Continue your discussion with Sourcery by replying directly to review comments.
    • You can change your review settings at any time by accessing your dashboard:
      • Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
      • Change the review language;
    • You can always contact us if you have any questions or feedback.

Contributor

@sourcery-ai sourcery-ai bot left a comment

    Hey @jkool702 - I've reviewed your changes - here's some feedback:

    Overall Comments:

    • Consider adding error handling for file operations and arithmetic calculations to improve robustness.
    • Minimize the use of global variables and consider passing state between functions to improve modularity and testability.
    • Be cautious of potential integer overflow in calculations, especially when dealing with cumulative values over time.
    Here's what I looked at during the review
    • 🟡 General issues: 4 issues found
    • 🟢 Security: all looks good
    • 🟢 Testing: all looks good
    • 🟢 Complexity: all looks good
    • 🟢 Documentation: all looks good

    Sourcery is free for open source - if you like our reviews please consider sharing them ✨
    Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.

    forkrun.bash Outdated

    }

    load_toc() {
Contributor

    suggestion: Add error handling for potential failures when reading from /proc/stat

    Consider adding checks to ensure the read operation succeeds and provide appropriate error messages if it fails.

Suggested change (add a readability check at the top of load_toc):

    load_toc() {
        if ! [[ -r /proc/stat ]]; then
            echo "Error: Unable to read /proc/stat" >&2
            return 1
        fi

    forkrun.bash Outdated
    tLOAD1=$(( cpu_LOAD1 - cpu_LOAD0 ))
    tALL1=$(( cpu_ALL1 - cpu_ALL0 ))

    (( tALL00 > ( 10 * tALL1 ) )) && tALL00=$(( 10 * tALL1 ))
Contributor

    question: Clarify the purpose and logic behind the tALL00 update

    The reasoning behind multiplying by 10 and the overall purpose of this operation is not clear. Consider adding a comment explaining the logic or refining the approach if possible.

    forkrun.bash Outdated
    # 0 --> no load 1000000 --> 100% load
    # optional: pass argument to use a different number for 100% load

    local -i loadMaxVal="${1:-1000000}"
Contributor

    suggestion: Clarify the usage of loadMaxVal parameter

    The comment describes loadMaxVal as representing 100% load, but it's used as a scaling factor. Consider updating the comment or renaming the variable to better reflect its purpose.

    local -i loadScaleFactor="${1:-1000000}"
    

    forkrun.bash Outdated

    (( tALL00 > ( 10 * tALL1 ) )) && tALL00=$(( 10 * tALL1 ))

    pLOAD=$(( ( loadMaxVal * tLOAD1 ) / tALL1 ))
Contributor

    suggestion (bug_risk): Consider adding checks for potential integer overflow

    The arithmetic operations could potentially lead to integer overflow. Consider adding checks or using a different approach to handle large values safely.

Suggested change

    if (( tALL1 != 0 )); then
        # guard tLOAD1 before using it as a divisor in the overflow check
        if (( tLOAD1 > 0 )) && (( loadMaxVal > (2**63-1) / tLOAD1 )); then
            pLOAD=$((2**63-1))
        else
            pLOAD=$(( ( loadMaxVal * tLOAD1 ) / tALL1 ))
        fi
    else
        pLOAD=0
    fi


    jkool702 commented Sep 6, 2024

    @sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

    Hey @jkool702 - I've reviewed your changes and they look great!

    Here's what I looked at during the review
    • 🟡 General issues: 4 issues found
    • 🟢 Security: all looks good
    • 🟢 Testing: all looks good
    • 🟡 Complexity: 1 issue found
    • 🟡 Documentation: 5 issues found

    Sourcery is free for open source - if you like our reviews please consider sharing them ✨
    Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.


    : "${pLOAD_max:=9500}" "${nProcsMax:=$((2*${nCPU}))}" "${nQueueLastCountGoal:=5}"

    mapfile -t pLOADA < <(_forkrun_get_load -i)
Contributor

    suggestion (performance): Evaluate performance impact of _forkrun_get_load function

    The new _forkrun_get_load function adds complexity to the worker spawning logic. It's important to ensure this doesn't introduce any significant performance overhead. Consider profiling the script with and without this change to quantify its impact.

Suggested change

    local pLOADA
    if command -v _forkrun_get_load_cached >/dev/null 2>&1; then
        pLOADA=($(_forkrun_get_load_cached -i))
    else
        mapfile -t pLOADA < <(_forkrun_get_load -i)
    fi


    export -fp _forkrun_get_load &>/dev/null && export -nf _forkrun_get_load

    _forkrun_get_load() (
Contributor

    suggestion (performance): Consider optimizing _forkrun_get_load function for efficiency

    The _forkrun_get_load function is called frequently to determine when to spawn new workers. Given its importance, it's crucial to ensure it's as efficient as possible. Consider reviewing the function for any potential optimizations, particularly in how it reads and processes data from /proc/stat.

    _forkrun_get_load() {
        local -a cpu_stats
        local -i total=0 t
        read -ra cpu_stats < /proc/stat
        # cpu_stats[0] is the "cpu" label; field 4 is the idle counter
        for t in "${cpu_stats[@]:1}"; do (( total += t )); done
        awk -v idle="${cpu_stats[4]}" -v total="$total" \
            'BEGIN {printf "%.2f", (1 - idle / total) * 100}'
    }
    

Owner Author

The efficiency of _forkrun_get_load is less critical than you might think. It doesn't even come into play until some initial worker coprocs have been forked, and (since it is called from a coproc itself) the only thing "slowed down" by waiting for it is how rapidly additional coprocs are forked. Additionally, dynamically determining the coproc count is more about reducing CPU usage than reducing runtime, so slowing down how rapidly new worker coprocs are forked is actually good for efficiency (reducing CPU usage) and helps ensure not too many are forked (since once they are forked they run until the end).

Contributor

    Thank you for the clarification, @jkool702. It's great to hear that the dynamic coprocess count adjustment is more focused on optimizing CPU usage rather than just speed. Your explanation makes sense, especially since the function is called from a coprocess and the goal is to manage resource usage efficiently. If the current implementation meets the performance and efficiency goals of the project, then it sounds like it's well-suited for your needs. However, if you ever notice any performance bottlenecks in the future, revisiting this function for potential optimizations could be beneficial. Let me know if there's anything else you'd like to discuss or if you need further assistance!

    @@ -72,7 +92,7 @@ C1[5]=' >/dev/null'
    mkdir -p "${hfdir0}"/file_lists

    nArgs=(1024 4096 16384 65536 262144 1048576)

    cksumAlgsA=(sha1sum sha256sum sha512sum sha224sum sha384sum md5sum "sum -s" "sum -r" cksum b2sum "cksum -a sm3" xxhsum "xxhsum -H3")
Contributor

    suggestion: Consider parameterizing the list of checksum algorithms

    To make the benchmark script more flexible, consider allowing the list of checksum algorithms to be passed as a parameter or read from a configuration file. This would make it easier to add or remove algorithms without modifying the script.

    # Read checksum algorithms from a config file (if present and readable)
    cksumAlgsFile="checksum_algorithms.txt"
    [[ -r "$cksumAlgsFile" ]] && mapfile -t cksumAlgsA < "$cksumAlgsFile"

    # Fall back to default algorithms if the file is empty or doesn't exist
    if [ ${#cksumAlgsA[@]} -eq 0 ]; then
        cksumAlgsA=(sha1sum sha256sum sha512sum sha224sum sha384sum md5sum "sum -s" "sum -r" cksum b2sum "cksum -a sm3" xxhsum "xxhsum -H3")
    fi

    @@ -3,18 +3,23 @@
    `forkrun` is an *extremely* fast pure-bash function that leverages bash coprocs to efficiently run several commands simultaniously in parallel (i.e., it's a "loop parallelizer").

    `forkrun` is used in much the same way that `xargs` or `parallel` are, but is faster (see the `hyperfine_benchmark` subdirectory for benchmarks) while still being full-featured and only requires having a fairly recent `bash` version (4.0+) to run<sup>1</sup>. `forkrun`:
    * offers more features than `xargsd` and is mildly faster than it's fastest invocation (`forkrun` without any flags is functionally equivilant to `xargs -P $*(nproc) -d $'\n'`)
Contributor

    suggestion (documentation): Fix typo: 'equivilant' should be 'equivalent'

    Suggested change
    * offers more features than `xargsd` and is mildly faster than it's fastest invocation (`forkrun` without any flags is functionally equivilant to `xargs -P $*(nproc) -d $'\n'`)
    * offers more features than `xargsd` and is mildly faster than it's fastest invocation (`forkrun` without any flags is functionally equivalent to `xargs -P $*(nproc) -d $'\n'`),

    : >"${tmpDir}"/.spawned

    (( ${verboseLevel} > 1 )) && printf '\n\n%s WORKER COPROCS FORKED\n\n' "${nProcs}" >&${fd_stderr}

    # setup dynamically coproc to spawn new workers based on read queue length
Contributor

    issue (complexity): Consider simplifying the dynamic worker spawning implementation to reduce code complexity and improve maintainability.

While the introduction of dynamic worker spawning could potentially improve resource utilization, the current implementation adds significant complexity to the code. Consider the following suggestions to simplify and improve maintainability:

1. Extract the load calculation logic into a separate, well-documented function or file. This will keep the main forkrun function cleaner and easier to understand.

    # In a separate file, e.g., load_calculator.sh
    calculate_system_load() {
        # Implementation of _forkrun_get_load goes here
    }

    # In the main forkrun file
    source load_calculator.sh

    # Use the function in the pQueue coproc
    calculate_system_load "${pLOADA[@]}"

2. Consider using a simpler heuristic for worker spawning. The current implementation might be overly complex for most use cases. For example:

    spawn_new_worker() {
        local current_workers=$1
        local queue_depth=$2
        local max_workers=$3

        if ((current_workers < max_workers && queue_depth > 1)); then
            return 0  # Spawn new worker
        else
            return 1  # Don't spawn
        fi
    }

    # Use in the pQueue coproc
    if spawn_new_worker "${kkProcs}" "${nQueue}" "${nProcsMax}"; then
        source /proc/self/fd/0 <<<"${coprocSrcCode//'{<#>}'/"${kkProcs}"}"
        ((kkProcs++))
    fi

3. Simplify the pQueue coproc by breaking it into smaller, more focused functions:

    handle_queue_input() {
        # Handle input from fd_nQueue
    }

    manage_worker_count() {
        # Logic for spawning/removing workers
    }

    # In the pQueue coproc
    while true; do
        handle_queue_input
        manage_worker_count
    done

    These changes would maintain the new dynamic scaling functionality while significantly reducing the complexity and improving the readability of the code.

Owner Author

1st suggestion: I'm trying to keep forkrun "self-contained" in a single file as much as possible. Also, sourcing the function that computes load when forkrun itself is sourced means it is already available when forkrun runs, helping to (slightly) reduce runtime.

2nd suggestion: the simpler spawning protocol doesn't work well in practice. The issue is that once a coproc is spawned it is active until the end; there is no "un-spawning" or removing it. So we need to be more selective about when to spawn them, otherwise every forkrun run just ends up spawning "$max_workers" worker coprocs, which defeats the point of dynamically adjusting how many workers to spawn, since "always spawn $max_workers" isn't dynamic.

3rd suggestion: for reasons related to ensuring there aren't any processes left running when the end condition is met and forkrun exits, it is (in my opinion) somewhat preferable to spawn new coprocs directly from the forkrun function (or one of its coproc children) and not from another function.

Contributor

    Thanks for sharing your thoughts, @jkool702. I understand your perspective on keeping forkrun self-contained and the challenges with dynamically managing coprocesses. Here are some additional thoughts:

    1. Self-Contained File: It's perfectly reasonable to want to keep everything in a single file for ease of distribution and usage. If you decide to keep the load calculation within the same file, consider using a dedicated section or comments to clearly separate and document this logic. This can help maintain readability without splitting the code into multiple files.

    2. Dynamic Worker Spawning: Given the constraints of not being able to "un-spawn" workers, your approach to being selective about when to spawn them makes sense. It might be helpful to document this behavior clearly in the code comments or documentation, so future maintainers understand the rationale behind the complexity.

    3. Direct Coproc Management: Your reasoning for managing coprocesses directly within forkrun to ensure clean exits is valid. If breaking it into smaller functions isn't feasible, consider using inline comments to explain the flow and any critical logic. This can help others (or even yourself in the future) quickly grasp the purpose and flow of the code.

    Overall, it sounds like you've thought through these challenges carefully. If you have any specific parts of the code that you're still unsure about or need further feedback on, feel free to share!

Owner Author

In a future update I might even add a stochastic element to spawning the coprocs: e.g., when all the existing conditions are met, draw a random number and spawn the coproc only if it is above some cutoff. The cutoff would change such that initially you are almost 100% guaranteed to draw a number above it, but as you get closer and closer to having "$max_workers" coprocs the chance of spawning another decreases more and more.

It's surprisingly tricky to get the number of additional workers dynamically spawned to not always end up at either 0 or $max_workers. It is really hard to find criteria that stabilize this process so you end up somewhere in the middle (after potentially tens of thousands of iterations, if not more) in a way that makes sense for dynamically adjusting the number of coprocs.
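The stochastic idea described above might be sketched like this (hypothetical, not part of the PR; maybe_spawn, the linear cutoff, and the use of bash's RANDOM are all illustrative):

```shell
# Spawn probability decreases linearly as the current worker count
# approaches max: the cutoff grows from 0 up to 32768, and RANDOM is
# uniform on 0..32767, so at current==max a spawn never happens.
maybe_spawn() {
    local -i current=$1 max=$2
    local -i cutoff=$(( 32768 * current / max ))
    (( RANDOM >= cutoff ))   # exit 0 => spawn, exit 1 => skip
}

maybe_spawn 0 8 && echo "spawn (guaranteed at 0/8 workers)"
maybe_spawn 8 8 || echo "skip (guaranteed at 8/8 workers)"
```

A nonlinear cutoff (e.g. quadratic in current/max) would make spawning taper off more sharply near the limit.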

Contributor

    That sounds like an interesting approach, @jkool702! Introducing a stochastic element could indeed help in achieving a more balanced distribution of coprocesses, avoiding the extremes of either 0 or max workers. It could add a layer of flexibility and adaptability to the system, especially in environments with fluctuating loads.

    When implementing this, you might want to consider:

    1. Tuning the Probability Function: Carefully design the probability function to ensure it adapts well to different load conditions. You might want to experiment with different functions or thresholds to find the right balance.

    2. Logging and Monitoring: Implement logging to track how often new workers are spawned and under what conditions. This can provide valuable insights into how well the stochastic approach is working and help in fine-tuning the parameters.

    3. Testing: Conduct thorough testing under various load scenarios to ensure that the stochastic element behaves as expected and doesn't introduce any unintended side effects.

    4. Documentation: Clearly document this new behavior, including the rationale and how it affects the spawning logic. This will be helpful for anyone else working with the code in the future.

    If you decide to pursue this, it could be a great enhancement to the dynamic worker management system. Feel free to reach out if you need any further feedback or assistance as you develop this feature!

    jkool702 commented Sep 6, 2024

    @sourcery-ai can you combine the various "typo correction" edits you just suggested into a single commit?

    Contributor

    sourcery-ai bot commented Sep 6, 2024

    I'm sorry, I don't understand the command @sourcery-ai can you combine the various "typo correction" edits you just suggested into a single commit?

    Please use @sourcery-ai review to request a review

    jkool702 and others added 5 commits September 6, 2024 15:02
    Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
    Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
    Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
    Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
    @jkool702 jkool702 merged commit ecc8619 into main Sep 6, 2024
    1 check failed
    @jkool702 jkool702 deleted the forkrun_testing branch September 6, 2024 19:32