We will read and discuss about two papers each week from major conferences and journals relevant to Internet-Scale Data Management. Our goal will be to develop an understanding of the current lay of the land, identify interesting new research avenues, and encourage and promote each student's research agenda.
This course schedule is subject to change. Background reading materials, which you may find helpful, are marked with an asterisk [*]. During the first half of the semester, the paper presentations and in-class lectures will be tightly synchronized. In the second half of the semester, we will revisit some of the topics from earlier in the semester.
Class 1: (Aug 27) Class Overview and Administration [Notes: pdf]
Class 2: (Aug 29) What is Research? Web Basics [Notes: pdf]
Class 3: (Aug 31) Web Basics and Search [Notes: pdf]
Class 4: (Sept 3) Web Crawling and Link Analysis [Notes: pdf]
- [*] The Anatomy of a Large-Scale Hypertextual Web Search Engine
- [*] The PageRank Citation Ranking: Bringing Order to the Web
- [*] Wikipedia: Web Search Engine
- [*] Accelerated Focused Crawling through Online Relevance Feedback
- [*] Deeper Inside PageRank
Class 5: (Sept 5) Retroactive Answering of Search Queries, WWW 2006, Paper presenter: James Caverlee
Class 6: (Sept 7) Automatic Identification of User Interest For Personalized Search, WWW 2006, Paper presenter: Jaime Perez Chung [Slides: ppt]
Class 7: (Sept 10) Social Networks, Social Media, and Web 2.0 [Notes: pdf]
- [*] Data Management Issues in Social Sciences
- [*] Paul Graham on Web 2.0
- [*] O'Reilly on Web 2.0
- [*] Wikipedia: Web 2.0
- [*] Wikipedia: Social Media
- [*] Wikipedia: Social Network
Class 8: (Sept 12) Seeking Stable Clusters in the Blogosphere, VLDB 2007, Paper presenter: Shaik Moulaali [Slides: ppt]
Class 9: (Sept 14) Exploring Social Annotations for the Semantic Web, WWW 2006, Paper presenter: Sashikanth Damaraju [Slides: ppt]
Class 10: (Sept 17) Privacy and Digital Identity [Notes: pdf]
- [*] Identity and Deception in the Virtual Community
- [*] Privacy and Databases Research @ Stanford
- [*] Data Privacy and Security
Class 11: (Sept 19) Privacy-Preserving Indexing of Documents on the Network, VLDB 2003, Paper presenter: Robert Graham [Slides: pdf]
Class 12: (Sept 21) You Are What You Say: Privacy Risks of Public Mentions, SIGIR 2006, Paper presenter: Dustin Talk [Slides: pdf]
Class 13: (Sept 24) Recommendation Systems and Collaborative Filtering [Notes: pdf]
- [*] Recommender Systems (Collection of Links)
- [*] GroupLens at UMinnesota
- [*] Wikipedia: Recommender System
- [*] Recommender Systems in E-Commerce
- [*] Evaluating Collaborative Filtering Recommender Systems
- [*] Wikipedia: Collaborative Filtering
Class 14: (Sept 26) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, TKDE 2005, Paper presenter: Brian Eoff [Slides: pdf]
Class 15: (Sept 28) Google News Personalization: Scalable Online Collaborative Filtering, WWW 2007, Paper presenter: Vijay Sirohi [Slides: ppt]
- Project Proposals Due by 11:59pm today
Class 16: (Oct 1) Spam and Trust [Notes: pdf]
- [*] Combating Web Spam with TrustRank
- [*] Wikipedia: Spam (electronic)
- [*] Wikipedia: Spamdexing
- [*] AIRWeb: Adversarial Information Retrieval on the Web
Class 17: (Oct 3) Detecting Spam Web Pages through Content Analysis, WWW 2006, Paper presenter: Videsh Sadafal [Slides: ppt]
Class 18: (Oct 5) Visualizing Tags over Time, WWW 2006, Paper presenter: Jaya Palli [Slides: ppt]
Class 19: (Oct 8) Mining Social and Email Networks [Notes: pdf]
- [*] Wikipedia: Text Mining
- [*] Enron Email Dataset
- [*] POLYPHONET: An Advanced Social Network Extraction System from the Web
Class 20: (Oct 10) Summarizing Email Conversations with Clue Words, WWW 2007, Paper presenter: Chiao-fang Hsu [Slides: ppt]
Class 21: (Oct 12) 6-Minute Madness: Proposal Presentations
Class 22: (Oct 15) Communities from Seed Sets, WWW 2006, Paper presenter: Paul Davis [Slides: ppt]
Class 23: (Oct 17) Topics in Peer-to-Peer [Notes: pdf]
- [*] Wikipedia: Peer-to-peer
- [*] Open Problems in Data-sharing Peer-to-peer Systems
- [*] Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- [*] Improving Search in Peer-to-Peer Systems
Class 24: (Oct 19) Routing Indices For Peer-to-Peer Systems, ICDCS 2002, Paper presenter: Ananda Man Shrestha [Slides: ppt]
Class 25: (Oct 22) Exploiting BitTorrent For Fun (But Not Profit), IPTPS 2006, Paper presenter: Keerthi Deconda [Slides: ppt]
Class 26: (Oct 24) Part 2: Social Networks, Social Media, and Web 2.0 [Notes: pdf]
- [*] The Dynamics of Viral Marketing
- [*] Group Formation in Large Social Networks: Membership, Growth, and Evolution
Class 27: (Oct 26) Anti-Aliasing on the Web, WWW 2004, Paper presenter: Gazal Sahai [Slides: ppt]
Class 28: (Oct 29) Geographically Focused Collaborative Crawling, WWW 2006, Paper presenter: Megha Ulavapalle [Slides: pdf]
Class 29: (Oct 31) Part 2: Spam and Trust
Class 30: (Nov 2) A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs, WWW 2006, Paper presenter: James Caverlee [Slides: ppt]
Class 31: (Nov 5) Community Information Management
Class 32: (Nov 7) Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, WWW 2007, Paper presenter: James Caverlee [Notes: pdf] plus an example of EM [pdf]
Class 33: (Nov 9) Shilling Recommender Systems for Fun and Profit, WWW 2004, Paper presenter: Omar Alvarez [Slides: ppt]
Class 34: (Nov 12) Mobile and Location-Based Services: Intro and Spatial Alarms
Class 35: (Nov 14) Mobile and Location-Based Services: Privacy
- [*] Putting People in their Place: An Anonymous and Privacy-Sensitive Approach to Collecting Sensed Data in Location-Based Applications
- [*] Improving Wireless Positioning with Look-ahead Map-Matching
- [*] Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms
Class 36: (Nov 16) Mobile and Location-Based Services: Positioning
Class 37: (Nov 19) Emerging Research Topics
Workshop Day 1 (Nov 21)
Thanksgiving Break: (Nov 23) NO CLASS
CLASS CANCELED (Nov 26)
Workshop Day 2 (Thursday, Nov 29) Note: this week we meet on Thursday, not Wednesday.
Workshop Day 3 (Nov 30)
Demos (Nov 30 - Dec 4)
Final Project Deliverable (Dec 4)