Large Scale Information Storage and Retrieval: DS 4300 – Spring 2025

Notes and Handouts

Homeworks and Practicals

Homeworks

  • Homework 1 – Lists and Trees
    • Handout
    • EC Due Date: Jan 12 @ 11:59 pm
    • Regular Due Date: Jan 14 @ 11:59 pm
    • Submit to Gradescope
  • Homework 2 – AVL Trees, Hash Tables, and B+ Trees
    • Handout
    • EC Due Date: Mon Jan 27 @ 11:59 pm
    • Regular Due Date: Wed Jan 29 @ 11:59 pm
    • Submit to Gradescope
  • Homework 3 – MongoDB + PyMongo
    • Jupyter Notebook
    • EC Due Date: Sun Feb 16 @ 11:59 pm
    • Regular Due Date: Tue Feb 18 @ 11:59 pm

Practicals

  • Practical 01 – Index Builder
    • EC Due Date: Last Commit before Feb 2, 2025 @ 11:59 pm
    • Regular Due Date: Last Commit before Feb 4, 2025 @ 11:59 pm
    • Verification Dataset: > here <
      • This is a small collection of 18 JSON documents from the original dataset, modified to include additional specific words so you can confirm that searching produces correct results.
      • You can also use this dataset to characterize how each data structure your team is implementing is organized after indexing.
      • Search Set (from Final Deliverable Step 3): Northeastern, Beanpot, Husky
    • Template Analysis Report
  • Practical 02 – Document DBs and Caching

Midterm

More information coming soon!

Course Project

More information coming soon!

Additional References