Readings, Notes, & Schedule

We will read and discuss about two papers each week from major conferences and journals relevant to Internet-Scale Data Management. Our goal is to develop an understanding of the current lay of the land, identify interesting new research avenues, and encourage and promote each student's research agenda.

This course schedule is subject to change. Background reading materials, which you may find helpful, are marked with an asterisk [*]. Through the first half of the semester, the paper presentations and in-class lectures will be tightly synchronized. In the second half of the semester, we will revisit some of the earlier topics.

Class 1: (Aug 27) Class Overview and Administration [Notes: pdf]

Class 2: (Aug 29) What is Research? Web Basics [Notes: pdf]

Class 3: (Aug 31) Web Basics and Search [Notes: pdf]

Class 4: (Sept 3) Web Crawling and Link Analysis [Notes: pdf]


Class 5: (Sept 5) Retroactive Answering of Search Queries, WWW 2006, Paper presenter: James Caverlee

Class 6: (Sept 7) Automatic Identification of User Interest For Personalized Search, WWW 2006, Paper presenter: Jaime Perez Chung [Slides: ppt]

Class 7: (Sept 10) Social Networks, Social Media, and Web 2.0 [Notes: pdf]


Class 8: (Sept 12) Seeking Stable Clusters in the Blogosphere, VLDB 2007, Paper presenter: Shaik Moulaali [Slides: ppt]

Class 9: (Sept 14) Exploring Social Annotations for the Semantic Web, WWW 2006, Paper presenter: Sashikanth Damaraju [Slides: ppt]

Class 10: (Sept 17) Privacy and Digital Identity [Notes: pdf]


Class 11: (Sept 19) Privacy-Preserving Indexing of Documents on the Network, VLDB 2003, Paper presenter: Robert Graham [Slides: pdf]

Class 12: (Sept 21) You Are What You Say: Privacy Risks of Public Mentions, SIGIR 2006, Paper presenter: Dustin Talk [Slides: pdf]

Class 13: (Sept 24) Recommendation Systems and Collaborative Filtering [Notes: pdf]


Class 14: (Sept 26) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, TKDE 2005, Paper presenter: Brian Eoff [Slides: pdf]

Class 15: (Sept 28) Google News Personalization: Scalable Online Collaborative Filtering, WWW 2007, Paper presenter: Vijay Sirohi [Slides: ppt]


Class 16: (Oct 1) Spam and Trust [Notes: pdf]


Class 17: (Oct 3) Detecting Spam Web Pages through Content Analysis, WWW 2006, Paper presenter: Videsh Sadafal [Slides: ppt]

Class 18: (Oct 5) Visualizing Tags over Time, WWW 2006, Paper presenter: Jaya Palli [Slides: ppt]

Class 19: (Oct 8) Mining Social and Email Networks [Notes: pdf]


Class 20: (Oct 10) Summarizing Email Conversations with Clue Words, WWW 2007, Paper presenter: Chiao-fang Hsu [Slides: ppt]

Class 21: (Oct 12) 6-Minute Madness: Proposal Presentations

Class 22: (Oct 15) Communities from Seed Sets, WWW 2006, Paper presenter: Paul Davis [Slides: ppt]

Class 23: (Oct 17) Topics in Peer-to-Peer [Notes: pdf]


Class 24: (Oct 19) Routing Indices For Peer-to-Peer Systems, ICDCS 2002, Paper presenter: Ananda Man Shrestha [Slides: ppt]

Class 25: (Oct 22) Exploiting BitTorrent For Fun (But Not Profit), IPTPS 2006, Paper presenter: Keerthi Deconda [Slides: ppt]

Class 26: (Oct 24) Part 2: Social Networks, Social Media, and Web 2.0 [Notes: pdf]


Class 27: (Oct 26) Anti-Aliasing on the Web, WWW 2004, Paper presenter: Gazal Sahai [Slides: ppt]

Class 28: (Oct 29) Geographically Focused Collaborative Crawling, WWW 2006, Paper presenter: Megha Ulavapalle [Slides: pdf]

Class 29: (Oct 31) Part 2: Spam and Trust


Class 30: (Nov 2) A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs, WWW 2006, Paper presenter: James Caverlee [Slides: ppt]

Class 31: (Nov 5) Community Information Management


Class 32: (Nov 7) Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, WWW 2007, Paper presenter: James Caverlee [Notes: pdf] plus an example of EM [pdf]

Class 33: (Nov 9) Shilling Recommender Systems for Fun and Profit, WWW 2004, Paper presenter: Omar Alvarez [Slides: ppt]

Class 34: (Nov 12) Mobile and Location-Based Services: Intro and Spatial Alarms

Class 35: (Nov 14) Mobile and Location-Based Services: Privacy


Class 36: (Nov 16) Mobile and Location-Based Services: Positioning


Class 37: (Nov 19) Emerging Research Topics

Workshop Day 1 (Nov 21)

Thanksgiving Break: (Nov 23) NO CLASS

CLASS CANCELED (Nov 26)

Workshop Day 2 (Thursday, Nov 29) [Note: we are meeting on Thursday, not Wednesday]

Workshop Day 3 (Nov 30)

Demos (Nov 30 - Dec 4)

Final Project Deliverable (Dec 4)