Inverted Index

A visualization of how full-text search engines organize and retrieve data

Step 1: Documents & Tokenization

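Each document is split into individual tokens (words) and lowercased. Punctuation is stripped, and stop words (very common words such as "the", "is", and "a" that carry little search value) are typically removed.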

Document 1

The quick brown fox jumps over the lazy dog. This is a classic pangram.

Tokenization Process

Raw tokens (14):

the, quick, brown, fox, jumps, over, the, lazy, dog, this, is, a, classic, pangram

After removing 5 stop words:

quick, brown, fox, jumps, over, lazy, dog, classic, pangram

Document 2

A journey of a thousand miles begins with a single step.

Tokenization Process

Raw tokens (11):

a, journey, of, a, thousand, miles, begins, with, a, single, step

After removing 5 stop words:

journey, thousand, miles, begins, single, step
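
The tokenization shown for the two documents above can be reproduced with a short Python sketch. This is only an illustration, not the visualization's actual implementation: the function names `tokenize` and `remove_stop_words` are hypothetical, and the stop-word list is an assumption chosen to be just large enough to match the two examples (real search engines ship much longer, configurable lists).

```python
import re

# Assumed stop-word list: only the words removed in the two examples above.
STOP_WORDS = {"a", "is", "of", "the", "this", "with"}


def tokenize(text):
    """Lowercase the text and return each run of letters or digits as a token.

    Punctuation and whitespace act as separators and are discarded.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words, preserving their order."""
    return [t for t in tokens if t not in stop_words]


doc1 = "The quick brown fox jumps over the lazy dog. This is a classic pangram."
doc2 = "A journey of a thousand miles begins with a single step."

for doc in (doc1, doc2):
    raw = tokenize(doc)
    kept = remove_stop_words(raw)
    print(f"{len(raw)} raw tokens, {len(kept)} after stop-word removal: {kept}")
```

Note that "over" survives here because the example above keeps it; many stop-word lists would drop it, which is one reason the list is usually treated as a tunable setting rather than a fixed rule.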