Large Scale Information Storage and Retrieval: DS 4300 – Spring 2025

Notes and Handouts

Homeworks and Practicals

Homeworks

  • Homework 1 – Lists and Trees
    • Handout
    • EC Due Date: Jan 12 @ 11:59 pm
    • Regular Due Date: Jan 14 @ 11:59 pm
    • Submit to Gradescope
  • Homework 2 – AVL Trees, Hash Tables, and B+ Trees
    • Handout
    • EC Due Date: Mon Jan 27 @ 11:59 pm
    • Regular Due Date: Wed Jan 29 @ 11:59 pm
    • Submit to Gradescope
  • Homework 3 – MongoDB + PyMongo
    • Jupyter Notebook
    • EC Due Date: Sun Feb 16 @ 11:59 pm
    • Regular Due Date: Tue Feb 18 @ 11:59 pm
  • Homework 4 & Sample Exam Questions

Practicals

  • Practical 01 – Index Builder
    • EC Due Date: Last commit before Feb 2, 2025 @ 11:59 pm
    • Regular Due Date: Last commit before Feb 4, 2025 @ 11:59 pm
    • Verification Dataset
      • A small collection of 18 JSON documents from the original dataset, modified to include specific additional words so you can confirm that searching produces correct results.
      • You can also use this dataset to characterize how each data structure your team is implementing is organized after indexing.
      • Search set (from Final Deliverable Step 3): Northeastern, Beanpot, Husky
    • Analysis Report Template
  • Practical 02 – Vector DBs & LLMs
    • The project must be functional by the March 24 exam.
    • Final deliverables are due April 2, 2025 @ 11:59 pm
    • Slide Deck Template

Midterm

March 24, 2025. You will use your RAG LLM as your cheat sheet.

Course Project

More information coming soon!

Additional References