Sarah Eisenach, John Bost, Jenny Zhong
Faculty Sponsor: Dr. Mendes
An inverted index is a system in which text documents are scanned and the frequency and location of each word they contain are stored in a database, keyed by word. This approach is particularly useful for search engines, which can scan a large number of webpages, build inverted indices for them, score each page for its relevance to a query, and return the pages in order of score. While a plain inverted index is simple to build, a useful search engine requires several additional processing steps. These steps include removing “stop words” (e.g. “the”, “is”, “of”), reducing words to their roots (e.g. “cats” -> “cat”, “fighting” -> “fight”), and recognizing synonyms and related words (e.g. “awesome” -> “amazing”, “great”, “cool”, “good”). For our project, we implement an inverted index and expose it through a front-end website in which users can query words and/or phrases and retrieve a list of documents, ordered by relevance (similarity to the query).
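As a rough illustration of the pipeline described above, the following Python sketch builds a small inverted index with stop-word removal and a crude suffix-stripping step, then ranks documents by how often the query terms occur. It is not the project's actual implementation: the names `STOP_WORDS`, `naive_stem`, `tokenize`, `build_index`, and `search` are hypothetical, the stop-word list is a tiny sample, the stemmer is a stand-in for a real one, the frequency-sum score is only one possible relevance measure, and synonym handling is omitted.

```python
import re
from collections import defaultdict

# Assumption: a tiny sample stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "is", "of", "a", "an", "and", "on"}


def naive_stem(word):
    """Crude suffix stripping, illustrative only (not a real stemming algorithm)."""
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, drop stop words, and reduce each word to a rough root."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Map each term to {doc_id: [positions]}.

    Positions count tokens after stop-word removal, not raw word offsets.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def search(index, query):
    """Rank documents by the summed frequency of the query terms they contain."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, positions in index.get(term, {}).items():
            scores[doc_id] += len(positions)
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Given a dictionary of documents such as `{"d1": "The cats are fighting the cat", "d2": "A cat is on the mat"}`, calling `search(build_index(docs), "cats")` ranks `d1` ahead of `d2`, because the stemmed term “cat” occurs twice in `d1` and once in `d2`.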