alibnaqvi/textSplitter

Text Splitter

This project contains a Python script that splits a large text file into smaller chunks, ensuring that no words are cut off in the middle. This can be useful for processing large documents or preparing data for tasks that require smaller text segments.

Features

  • Splits a large .txt file into smaller chunks based on a maximum character length.
  • Ensures words are not split between chunks.
  • Saves each chunk to a separate file in a specified output directory.
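The word-safe splitting described above can be sketched as a greedy packer. This is only an illustration, not the script's actual code: the function name split_into_chunks is hypothetical, and the sketch assumes words are separated by whitespace, that runs of whitespace (including newlines) may be collapsed to single spaces, and that a single word longer than the limit is kept whole in its own oversized chunk.

```python
def split_into_chunks(text, max_length=30000):
    """Greedily pack whitespace-separated words into chunks.

    Hypothetical helper for illustration; the real script may differ.
    Each chunk is at most max_length characters, unless a single word
    is itself longer than max_length (it is then kept whole).
    """
    chunks = []
    current = []      # words collected for the chunk being built
    current_len = 0   # length of " ".join(current)

    for word in text.split():
        # Cost of adding this word: its length plus one joining space
        # when the chunk already holds at least one word.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_length:
            # The word does not fit: close the chunk and start a new one.
            chunks.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len += extra

    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("the quick brown fox", max_length=10):
        print(repr(chunk))
```

Because the split happens only between words, every chunk boundary falls on whitespace. The trade-off in this sketch is that the original line breaks are not preserved.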

Requirements

  • Python 3 or later.
  • The os module from the Python standard library (no extra packages to install).

Usage

  1. Ensure you have Python installed on your system.
  2. Create a text file named input.txt in the same directory as the script, or update the script to point to your text file.
  3. Run the script to split the text file into chunks.

Command Line

You can run the script from the command line as follows:

python text_splitter.py

Example

By default, the script looks for input.txt in the same directory and splits the text into chunks of up to 30,000 characters. The chunks are saved in the output_chunks directory.

You can modify the input_file and output_dir variables in the script to use different input and output paths.

if __name__ == "__main__":
    input_file = 'path/to/your/input.txt'  # Replace with your input file path
    output_dir = 'path/to/output_directory'  # Replace with your desired output directory
    split_text_file(input_file, output_dir)
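As a rough illustration of how chunks might be saved to the output directory, here is a minimal sketch. The helper name write_chunks and the chunk_1.txt, chunk_2.txt file naming are assumptions made for this example; the script's own naming scheme may differ.

```python
import os


def write_chunks(chunks, output_dir):
    """Write each chunk to its own numbered .txt file and return the paths.

    Hypothetical helper for illustration; file names are an assumption.
    """
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)

    paths = []
    for index, chunk in enumerate(chunks, start=1):
        path = os.path.join(output_dir, f"chunk_{index}.txt")  # assumed naming
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(chunk)
        paths.append(path)
    return paths
```

Using os.makedirs with exist_ok=True means the call succeeds whether or not the directory is already there, so the script can be re-run without manual cleanup. Note that existing files with the same names would be overwritten.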

Customization

You can customize the maximum length of each chunk (in characters) by passing the max_length argument to the split_text_file function:

split_text_file(input_file, output_dir, max_length=50000)

Contributing

If you have any problems, suggestions, or improvements, feel free to submit a pull request or open an issue.

License

This project is licensed under the MIT License.
