Course Overview
This course is an introduction to advanced research topics in Internet-scale data management. We will address a number of challenges, and proposed solutions to them, drawn from a wide spectrum of areas, including (i) large-scale distributed information management; (ii) data and text mining techniques and algorithms; and (iii) data privacy and security issues in large-scale Internet systems. We will cover the relevant theoretical foundations, methods, and tools. Specific topics that may be covered include:
- Hot topics in web search and web-enabled databases (Deep Web)
- Peer-to-peer content discovery and resource management
- Mobile and location-based data management
- Association rules, classification, and clustering
- Recommendation systems and collaborative filtering
- Text mining applications on the Web, email, IM, and social networks
- Adversarial information retrieval
- Spam, deception, and phishing on the Web and online social networks
- Privacy-preserving data integration
The course is restricted to students with graduate standing. While there are no official prerequisites, students will benefit from previous exposure to databases, information retrieval, and basic probability theory.
Class Format
We will meet three times a week (MWF) from 9:10 am to 10:00 am in HRBB 126. During most weeks, I will present an overview of the week's topics on Monday. Students will then lead the discussion of specific papers on Wednesday and Friday.
Grading
Course grading is divided between class participation (30%) and a research project (70%). Class participation consists of in-class discussion and a paper presentation. The research project consists of a proposal, the final deliverable (code, report, etc.), and an in-class project presentation. The breakdown is as follows:
- In-class discussion 20%
- Paper presentation 10%
- Project proposal 15%
- Final project deliverable 35%
- Project presentation 20%
Class Participation
Each week, you will be expected to read about two papers and to participate in an in-class discussion of them. Before a paper is presented in class, you must email me two to four interesting and probing questions about it. These questions will serve as the basis for the discussion in class. Your questions are due by midnight the day before the scheduled presentation. Include the phrase "CPSC 689 QUESTIONS" in the subject line of your email.
In addition to reading papers, you will be expected to lead the in-class discussion of at least one paper. Your presentation should describe the problem setting and the solution presented in the paper. I will expect you to identify several strengths and weaknesses of the paper, along with any ideas you developed while reading it. Although you will be the primary leader of the discussion, I expect the rest of the class to participate actively.
The Project
The course project is a great opportunity for you to develop your research skills and make an impact. You are free to work on any topic you choose (with my approval), as long as it is interesting, significant, and relevant to Internet-scale data management. In principle, you can propose anything you wish, including implementation, benchmarking, evaluation, novel Internet applications, etc.
You may choose to work alone or in pairs. We can work together to identify a good project idea, but I encourage each of you to brainstorm on your own first.
Americans with Disabilities Act (ADA) Policy Statement
The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact the Department of Student Life, Services for Students with Disabilities, in Cain Hall, or call 845-1637.
Academic Integrity Statements
AGGIE HONOR CODE
"An Aggie does not lie, cheat, or steal or tolerate those who do."
Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.
For additional information, please visit: http://www.tamu.edu/aggiehonor/