What is Minigrep?
MiniGrep is a Rust-based command-line tool, with a (current) size of 588KB that lets users search files for a given query string and shows matching lines with their line numbers. In this post, we'll examine MiniGrep's implementation in detail, going over its architecture, features, and the methods it uses to search strings effectively.
Crate can be found in crates.io minigrep.
Usage
minigrep [query] [file path(s)] [-i]
query: The text string to search for within the file.
file path(s): The path(s) to the file(s) in which the search will be performed separated by a comma. You can also use "." to search within the current directory.
-i (Optional): Performs a case-insensitive search. If provided, the search will ignore the case of the query string.
Examples
Case-Sensitive Search
# Search for the word "hello" in the file "sample.txt"
minigrep hello sample.txt
# Search for the word "hello" in multiple files
minigrep hello sample.txt sample2.txt
# Search for the word "hello" in all files in the current directory
minigrep hello .
Case-Sensitive Search
# Search for the word "world" in the file "sample.txt" (case-insensitive)
minigrep world sample.txt -i
Output Redirection
# Redirect the output to a new file. Error messages are not captured in the output file.
minigrep world sample.txt > output.txt
How Does It Work?
The utility is structured into several components, each serving a specific purpose:
Command-line interface (main): Handles command-line arguments, parses configurations, and initiates the search process.
Search functions (search and search_case_insensitive): Implement string searching algorithms, providing efficient and accurate results.
Configuration (Config): Represents the user's input configuration, such as the query, file paths, and search mode.
For efficient and performant searches, MiniGrep implements the Boyer-Moore string searching method. This approach is especially useful for huge text datasets.
Time Complexity of the search algorithm, following Boyer-Moore:
- ◆ Worst Case: O(n * m), where n is the length of the contents and m is the length of the query.
- ◆ Best Case: O(m / n).
Note: If we did a naive search, BOTH the worst case and best case would be O(n * m)
Space Complexity:
- ◆ O(k), where k is the space used for storing the matches.
Multi-Threading
MiniGrep makes use of multi-threading to improve performance. Rust's std::thread::scope is used to execute each file search in parallel. This enables the search process to be efficiently parallelized.
Benchmark
In order to give you an idea of how MiniGrep performs, let's look at an example where there are 5 files totaling 15k lines. With multithreading and the Boyer-Moore algorithm, the search procedure takes an average of about 0.003 seconds!
Check out the repo here provided with a README.