Inverted Index

A visualization of how full-text search engines organize and retrieve data

Step 1: Documents & Tokenization

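Each document is lowercased and split into individual tokens (words). Punctuation is stripped, and stop words (very common words such as "the", "a", and "of") are typically removed because they add little value to a search.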

Document 1

The quick brown fox jumps over the lazy dog. This is a classic pangram.

Tokenization Process
Raw tokens (14):
the, quick, brown, fox, jumps, over, the, lazy, dog, this, is, a, classic, pangram
After removing 5 stop words:
quick, brown, fox, jumps, over, lazy, dog, classic, pangram
Document 2

A journey of a thousand miles begins with a single step.

Tokenization Process
Raw tokens (11):
a, journey, of, a, thousand, miles, begins, with, a, single, step
After removing 5 stop words:
journey, thousand, miles, begins, single, step
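
The tokenization shown for the two documents can be reproduced with a few lines of Python. This is a minimal sketch, not the visualization's actual code: the function names `tokenize` and `remove_stop_words` are hypothetical, and the `STOP_WORDS` set is an assumption containing only the words needed to match the counts shown here. Real search engines use longer, language-specific lists.

```python
import re

# Assumed stop-word list: just enough to reproduce the counts shown above.
STOP_WORDS = frozenset({"a", "is", "of", "the", "this", "with"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits.

    Punctuation and whitespace act as separators and are discarded.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in STOP_WORDS, preserving their order."""
    return [t for t in tokens if t not in STOP_WORDS]


doc1 = "The quick brown fox jumps over the lazy dog. This is a classic pangram."
raw = tokenize(doc1)
kept = remove_stop_words(raw)

print(len(raw), len(raw) - len(kept))  # 14 5
print(kept)
```

Running the same two functions on Document 2 yields 11 raw tokens and 6 tokens after stop-word removal. The surviving tokens are what an inverted index would then map back to the documents that contain them.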