Distributed Systems Course (fall 2010/2011)

Lecturer: Konrad Iwanicki
Assistants: none
Lectures: Wednesday, 2:15 PM - 3:45 PM, Room 2070
Excercise classes: Wednesday, 4:15 PM - 5:45 PM, Room 2100
Midterm exam: Wednesday, December 1, 2:15 PM - 5:45 PM, Room 2070
Final exam: Thursday, February 3, 1:30 PM - 5:00 PM, Room 2070

This pilot course consists of two components: lectures and excercise classes. The lectures cover the principles, advanced concepts, and technologies of distributed systems, including communication, replication, fault tolerance, and security. The excercise part, in turn, consists of a series of student presentations of an educational value and recent papers from top systems conferences and journals. The course is recommended for graduate students attending the distributed systems seminar and other students interested in computer systems. The course is given in English.

Passing Rules

To pass the course, a student has to score at least 60 out of a total of 100+ points. The points can be scored for:

  • a presentation during the excercise classes: up to 50 points
  • questions asked during other people's presentations: up to 1 point per question
  • a written exam at the end of the semester: up to 50 points
The final grade is calculated as follows:
Points 0-51 52-59 60-67 68-75 76-83 84-91 92-...
Grade 2 (fail) 2+ (fail) 3 3+ 4 4.5 5

Presentation Rules

A presentation has to be given in English. The strict time limit of a single talk is 30 minutes; the presenting student will be interrupted after this period. During the talk, other students discouraged from asking questions. After the talk, there is a 15-minute questions-and-answers session, during which the presenter answers question posed by the lecturer and other students. The objective of the questions could be, for instance, to clarify some aspects of the paper or to learn the presenter's opinion on a problem related to the paper.

During his/her presentation of a paper, a student is obliged to display PowerPoint/PDF slides for the paper. They have to be in English. The student has to prepare the slides on his/her own. If some slides for the paper already exist on the Internet, the concents of those slides can be re-used by the student preparing his/her own slides only if re-using the contents does not violate any copyrights, especially when the student's presentation is made available online. Moreover, the student has to acknowledge using somebody else's slides.

Tips:

  • Read your paper well in advance to understand it and to later be able to answer other students' questions.
  • Practice your talk to fit in the 30-minute time limit.
  • Try to briefly go over the related work cited in the paper as this can give you some valuable input on the problem the paper is solving.
  • Try to find any follow-ups on the paper because this can be rewarding as well.
  • Ask the presenter questions that, rather than proving the presenter doesn't know something, lead to interesting discussions. You are not awarded points for mean or stupid questions.
  • If you have read and understood the presented paper, and if you have practiced your talk, relax during your presentation: you will surely be able to answer all questions.

Exam Rules

The exact rules will be given later. In summary, the exam will cover the lecture topics as well as some issues raised during the student presentations.

Lecture Topics and Schedule

Since this is the pilot edition of the course, this year's lectures will be given based on a book by my PhD adviser and the head of my former research group: Maarten van Steen and Andrew S. Tanenbaum, “Distributed Systems: Principles and Paradigms,” Second Edition, Prentice Hall, 2007, 702 pages, ISBN 9780132392273. Purchasing the book is not necessary as the lecture slides will be available here.

Date Topics Slides
October 6, 2010 Introduction:
goals of distributed systems, common types of distributed systems
lecture 01
October 13, 2010 Architectures:
architectural styles, system architectures, self-management
lecture 02
October 20, 2010 Processes:
threads, virtualization, clients & servers, code migration
lecture 03
October 27, 2010 Communication:
fundamentals, remote procedure call, message-oriented communication, stream-oriented communication, multicast communication
lecture 04-05
November 3, 2010
November 10, 2010 Naming:
basic terms and definitions, flat naming, structured naming, attribute-based naming
lecture 06-07
November 17, 2010
November 24, 2010 Synchronization (1):
clock synchronization, logical clocks, totally-ordered multicast, causally-ordered multicast
lecture 08-09
December 1, 2010 Midterm Exam  
December 8, 2010 Synchronization (2):
mutual exclusion, global positioning of nodes, leader election
lecture 08-09
December 15, 2010 Consistency and Replication:
data-centric and client-centric consistency models, replica management, consistency protocols
lecture 10
January 5, 2011 Lecture cancelled  
January 12, 2011 Fault Tolerance:
failure models, failure masking, agreement in faulty systems, failure detection, reliable client-server communication, distributed commit, recovery (NO atomic multicast for the exam)
lecture 11
January 19, 2011 Two presentations instead of the lecture  

Presentation Topics

The papers to be presented by students during the practical work classes are listed below. They have been chosen based on their age and educational value (in the prejudiced opinion of the lecturer). The papers are available for download from the university computers. Should you have problems accessing them, please contact the lecturer.

  1. D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan: “FAWN: A Fast Array of Wimpy Nodes,” in Proceedings ACM SOSP 2009, Big Sky, MT, USA, October 2009.
  2. C. Cadar, D. Dunbar, and D. Engler: “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs,” in Proceedings USENIX OSDI 2008, San Diego, CA, USA, December 2008.
  3. Y. Chen, O. Gnawali, M. Kazandjieva, P. Levis, J. Regehr: “Surviving Sensor Network Software Faults,” in Proceedings ACM SOSP 2009, Big Sky, MT, USA, October 2009.
  4. A. Clement, M. Kapritsos, S. Lee, Y. Wang, L. Alvisi, M. Dahlin, and T. Riche: “UpRight Cluster Services,” in Proceedings ACM SOSP 2009, Big Sky, MT, USA, October 2009.
  5. A. Clement, M. Marchetti, E. Wong, L. Alvisi, and M. Dahlin: “Making Byzantine Fault-Tolerant Systems Tolerate Byzantine Faults,” in Proceedings USENIX NSDI 2009, Boston, MA, USA, October 2009.
  6. J. Dean and S. Ghemawat: “MapReduce: A Flexible Data Processing Tool,” Communications of the ACM, vol. 53, no. 1, January 2010.
  7. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels: “Dynamo: Amazon's highly available key-value store,” in Proceedings SOSP 2007, Stevenson, WA, USA, October 2007.
  8. M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, S. Ratnasamy: “RouteBricks: Exploiting Parallelism to Scale Software Routers,” in Proceedings ACM SOSP 2009, Big Sky, MT, USA, October 2009.
  9. R. Fonseca, P. Dutta, P. Levis, and I. Stoica: “Quanto: Tracking Energy in Networked Embedded Systems,” in Proceedings USENIX OSDI 2008, San Diego, CA, USA, December 2008.
  10. S. Ghemawat, H. Gobioff, and S.-T. Leung: “The Google file system,” in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003.
  11. S. Gilbert and N. Lynch: “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services,” ACM SIGACT News, vol. 33, no. 2, June 2002.
  12. R. Gummadi, H. Balakrishnan, P. Maniatis, and S. Ratnasamy: “Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks,” in Proceedings USENIX NSDI 2009, Boston, MA, USA, October 2009.
  13. D. Gupta, S. Lee, M. Vrable, S. Savage, A. C. Snoeren, G. Varghese, G. M. Voelker, and A. Vahdat: “Difference Engine: Harnessing Memory Redundancy in Virtual Machines,” in Proceedings USENIX OSDI 2008, San Diego, CA, USA, December 2008.
  14. B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, and S. Banerjee, and Nick McKeown: “ElasticTree: Saving Energy in Data Center Networks,” in Proceedings USENIX NSDI 2010, San Jose, CA, USA, April 2010.
  15. J. Hill, R. Szewcyk, A. Woo, D. Culler, S. Hollar, K. Pister: “System Architecture Directions for Networked Sensors,” in Proceedings ACM ASPLOS 2000, Cambridge, MA, USA, November 2000.
  16. J. W. Hui and D. Culler: “IP is dead, long live IP for wireless sensor networks,” in Proceedings ACM SenSys 2008, Raleigh, NC, USA, November 2008.
  17. M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. van Steen: “Gossip-based peer sampling,” ACM Transactions on Computer Systems, vol. 25, no. 3, August 2007, article no. 8.
  18. G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, M. Norrish, R. Kolanski, T. Sewell, H. Tuch, S. Winwood: “seL4: Formal Verification of an OS Kernel,” in Proceedings ACM SOSP 2009, Big Sky, MT, USA, October 2009.
  19. K. Klues, V. Handziski, C. Lu, A. Wolisz, D. Culler, D. Gay, and P. Levis: “Integrating Concurrency Control and Energy Management in Device Drivers ,” in Proceedings SOSP 2007, Stevenson, WA, USA, October 2007.
  20. L. Lamport: “The part-time parliament,” ACM Transactions on Computer Systems, vol. 16, no. 2, May 1998.
  21. H. C. Li, A. Clement, E. Wong, J. Napper, L. Alvisi, and M. Dahlin: “BAR Gossip,” in Proceedings USENIX OSDI 2006, Seattle, WA, USA, November 2006.
  22. J. Mudigonda, P. Yalagandula, M. Al-Fares, J. C. Mogul: “SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies,” in Proceedings USENIX NSDI 2010, San Jose, CA, USA, April 2010.
  23. R. van Renesse, K. P. Birman, and W. Vogels: “Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining,” ACM Transactions on Computer Systems, vol. 21, no. 2, May 2003.
  24. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey: “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language,” in Proceedings USENIX OSDI 2008, San Diego, CA, USA, December 2008.
  25. Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, E. Gillum: “BotGraph: Large Scale Spamming Botnet Detection,” in Proceedings USENIX NSDI 2009, Boston, MA, USA, October 2009.
  26. D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel: “Finding a Needle in Haystack: Facebook's Photo Storage,” in Proceedings USENIX OSDI 2010, Vancouver, Canada, October 2010.
  27. P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish: “Depot: Clould Storage with Minimal Trust,” in Proceedings USENIX OSDI 2010, Vancouver, Canada, October 2010.
For his/her own presentation, a student can also propose a paper not present in the above list. Such a paper, however, has to be accepted earlier by the lecturer.

Presentation Schedule

The schedule of the student presentation is as follows:

Date Presenter(s) Theme Paper(s)
October 6, 2010 K. Iwanicki Introduction 23: “Astrolabe: ...”
October 13, 2010 J. Swietlicka
M. Walas
The Big Ones 10: “The Google file system”
7: “Dynamo: Amazon's...”
October 20, 2010 M. Fedoryszak
A. Jurkowski
Gossiping: Theory and Practice 17: “Gossip-based peer sampling”
21: “BAR Gossip”
October 27, 2010 M. Jastrzebski
P. Horban
Sensornets: Past & Present 15: System architecture directions ...”
16: “IP is dead ...”
November 3, 2010 P. Nowojski
P. Swigon
BotNets 25: “BotGraph: ...”
12: “Not-a-bot: ...”
November 10, 2010 A. Panasiuk
K. Kanski
A Little Bit of Theory 11: “Brewer's conjecture ...”
20: “The part-time parliament”
November 17, 2010 M. Mieten
L. Zubkowicz
OS Energy Management 19: “Integrating concurrency”
9: “Quanto: ...”
November 24, 2010 M. Adamczyk
M. Murasik?
Parallel Programming 6: “MapReduce: ...”
24: “DryadLINQ: ...”
December 1, 2010 Midterm Exam
December 8, 2010 H. Jaworski
K. Ruszczyk
Byzantine Failures 5: “Making Byzantine ...”
4: “UpRight ...”
December 15, 2010 M. Sapota
M. Roj
K. Goluchowski
Software Verification and Performance 18: “seL4: ...”
2: “KLEE: ...”
13: “Difference Engine: ...”
January 5, 2011 Presentations moved to January 12th
January 12, 2011 P. Bedynski
J. Migdal
T. Dubrownik
More on Routing and Tiny Devices 8: “RouteBricks: ...”
3: “Surviving Sensor Network ...”
1: “FAWN: ...”
January 19, 2011 M. Derezinski
R. Pudelkiewicz
M. Smolenski
K. Strzelecki
Data Centers
OSDI 2010
14: “ElasticTree: ...”
22: “SPAIN: ...”
27: “Depot: ...”
26: “Finding a Needle ...”

Past Exams

Below, you can find the questions from past exams:

Year Exam Set Participants Points
Course Exam % Available Min Avg Med Max
2010/2011 Part II 26 21 80.8 25 3.75 16.27 13.5 24.25
2010/2011 Late Part I 26 11 42.3 25 13.75 21.6 21.25 24.75
2010/2011 Early Part I 26 17 65.4 25 9.25 14.9 13.5 22