Inverted Index
A visualization of how full-text search engines organize and retrieve data
Step 1: Documents & Tokenization
Each document is split into individual tokens (words). Punctuation is stripped, tokens are lowercased, and stop words (very common words such as "the", "a", and "of") are typically removed.
Document 1
The quick brown fox jumps over the lazy dog. This is a classic pangram.
Tokenization Process
Raw tokens (14):
the quick brown fox jumps over the lazy dog this is a classic pangram
After removing 5 stop words:
quick brown fox jumps over lazy dog classic pangram
Document 2
A journey of a thousand miles begins with a single step.
Tokenization Process
Raw tokens (11):
a journey of a thousand miles begins with a single step
After removing 5 stop words:
journey thousand miles begins single step
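The tokenization of the two documents above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the visualization. The names `tokenize`, `remove_stop_words`, and `STOP_WORDS` are hypothetical, and the stop-word list is an assumption chosen only to reproduce the counts shown here (note that "over" is kept). Real search engines ship much longer, configurable lists.

```python
import re

# Assumed stop-word list: just enough to reproduce the two examples above.
STOP_WORDS = {"a", "is", "of", "the", "this", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words while preserving the order of the remaining tokens."""
    return [t for t in tokens if t not in stop_words]


doc1 = "The quick brown fox jumps over the lazy dog. This is a classic pangram."
doc2 = "A journey of a thousand miles begins with a single step."

for doc in (doc1, doc2):
    raw = tokenize(doc)
    kept = remove_stop_words(raw)
    # Print raw token count, number of stop words removed, and surviving tokens.
    print(len(raw), len(raw) - len(kept), kept)
```

Splitting on runs of letters and digits removes punctuation as a side effect, so "dog." and "step." become "dog" and "step" without a separate cleanup pass.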