In our project, we are building a large language model (LLM) that will work within a text editor to find tokens semantically related to any given token and highlight them according to how close they are to that target token.

  • Semantically related keywords are words or phrases conceptually linked to a main keyword or topic, enriching its context and meaning. They go beyond simple synonyms to include terms that are related in concept and help provide a more comprehensive understanding of the subject.
  • Essentially, we are using an ML model to find words that are conceptually connected to the keywords in a text and that add depth and context to them.

We plan to start with semantic search techniques such as vector encoding and similarity scoring. From there we will likely branch into data processing, natural language processing (NLP), and more advanced deep learning techniques. This could include tokenization, word embeddings, and transformer models, and potentially some user interface design. By the end of the semester, we aim to have most of the backend up and running in some form: given a token, it should be able to find other semantically similar tokens. We will expand upon this and add a front-end component in the spring.
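
To make the starting point concrete, here is a minimal sketch of vector encoding plus similarity ranking. It is not our implementation: the three-dimensional vectors in TOY_EMBEDDINGS are made up for illustration, and the names cosine_similarity and rank_related are hypothetical. A real backend would get its vectors from a trained embedding model, and we are assuming cosine similarity as the closeness measure, which is a common choice but not one the plan above commits to.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real system would obtain these vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "bank": [0.5, 0.5, 0.2],
    "loan": [0.1, 0.9, 0.3],
    "guitar": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_related(target, embeddings, top_k=3):
    """Return the top_k tokens closest to target as (token, score) pairs."""
    target_vec = embeddings[target]
    scored = [
        (token, cosine_similarity(target_vec, vec))
        for token, vec in embeddings.items()
        if token != target  # do not report the target as its own neighbor
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


ranked = rank_related("river", TOY_EMBEDDINGS)
for token, score in ranked:
    print(f"{token}: {score:.3f}")
```

With these toy vectors, "stream" ranks closest to "river", followed by "bank" and then "loan". The scores could later drive highlight intensity in the editor, with higher scores mapped to stronger highlighting.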

Project Leads