Feb 27, 2018
A reading list for a week at MIT
Something unexpected has happened with my classes this semester. Up until this point, my homework has been pretty predictable: read some lecture notes, do a pset, repeat.
This semester, none of my classes have regular psets, but almost all of my classes have assigned reading!
Recently a big chunk of my time on weekday nights has been dedicated to studying scientific papers, pieces of literature, or system specifications. Even though a lot of it is technical, it’s a really sharp break from the math-and-coding-heavy schedules I’m used to. In a way, it’s really nice: I’m being regularly exposed to challenging, cutting-edge ideas and being asked to reason about them.
So I thought I’d take the opportunity of doing something I never thought I’d do at MIT—giving you an intro to my classes, literary style.
6.835: Intelligent Multimodal User Interfaces
This is a graduate-level UI class about designing systems that communicate through multiple modalities—speech, key input, gesture, drawing, etc.
The class is mainly about two things: how to make interfaces that are intuitive and easy to use, and how to develop technology that makes that possible.
Before every class meeting we read three papers about multimodal input detection or systems that use it. For instance, the second week, we focused on sketch interpretation. Recently we’ve been talking about body pose and gesture recognition.
The first day of class was more of a survey of UI principles than an investigation of a specific modality. I actually found the reading pretty interesting, because it took a technical look at evaluating user interfaces. We read a couple of papers published in 1990 (!) by researchers at the Technical University of Denmark, both having to do with the question of how to evaluate and troubleshoot user interfaces. The one that stood out to me was called "Heuristic Evaluation of User Interfaces". In the paper, they gave their test subjects a specification for some type of UI, and the subjects were asked to identify as many flaws as possible in the design. Their responses were compared against a master list compiled by the authors. There were two takeaways from the paper that made it memorable:
- People are bad at this. The average test subject found between 20 and 50 percent of known UI problems. This means that if you are the person sitting behind the keyboard troubleshooting your interface, you are peacefully oblivious to at least half of the problems with it.
- Groups of people are pretty good at this. The authors graphed the number of problems found as a function of how many people reviewed the UI:
From "Heuristic Evaluation of User Interfaces" by Jakob Nielsen and Rolf Molich
Pooling people’s feedback allowed the subjects to locate between 80 and 100% of UI problems.
Basically, this paper gives a scientific justification for well-planned user studies! It’s practically impossible for one person to optimize a UI, but a sufficiently large group of casual users can do pretty well. This type of evaluation is both meaningful and necessary.
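The pooling effect is easy to see with a toy calculation. The sketch below uses made-up data (the problem numbers and each evaluator’s findings are hypothetical, not taken from the paper) and simply takes the union of what every evaluator spotted:

```python
def pooled_coverage(findings, total_problems):
    """Fraction of known problems found by at least one evaluator."""
    found = set().union(*findings)
    return len(found) / total_problems


# Hypothetical data: 10 known problems, numbered 0-9.
# Each set holds the problems one evaluator spotted.
evaluators = [
    {0, 1, 2},
    {2, 3, 4, 5},
    {0, 5, 6},
    {1, 7, 8},
]

# Coverage of each evaluator working alone, then of the whole group.
individual = [pooled_coverage([e], 10) for e in evaluators]
pooled = pooled_coverage(evaluators, 10)
```

With these invented numbers, no single evaluator finds more than 40% of the problems, but the four of them together cover 90%.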
A more technical example of a paper we read is the Microsoft Research paper that explains how they recognize people’s body positions using a Kinect. This is basically the algorithm that makes the Kinect work, and based on how often it comes up in class discussions and final project proposals, it seems to be one of the best widely available methods for pose detection.
The algorithm consumes depth information from the Kinect (no color information!) and is able to identify parts of the user’s body. It does this in real time and can handle multiple users. The authors do this by classifying each pixel of the input depth map as belonging to one of 31 body parts. A set of features is generated for each pixel by taking the difference in depth between that pixel and a different pixel specified by a depth-invariant offset. These offsets are chosen by training a “decision tree forest” to find a set of offsets that allows for the best differentiation between different body parts. In order to get enough data to evaluate their feature sets, the authors generated their own synthetic data by rendering models of people with different body types in a variety of positions. Once a person’s body is correctly segmented, the authors can estimate the location of the subject’s joints, thus returning a condensed summary of a subject’s pose that can be used in a myriad of applications.
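To make the per-pixel feature a bit more concrete, here is a minimal sketch of the depth-difference idea as I described it above. It is a simplified, single-offset version, not the paper’s exact formulation or training procedure; the function name, the `BACKGROUND` constant, and the tiny depth map are all my own assumptions for illustration.

```python
BACKGROUND = 1e6  # assumed large depth for probes that fall outside the image


def depth_feature(depth, x, y, offset):
    """Depth difference between pixel (x, y) and a probe pixel.

    The offset is divided by the depth at (x, y), so the probe covers
    roughly the same real-world distance whether the person is near or far.
    Assumes the depth at (x, y) is nonzero.
    """
    d = depth[y][x]
    px = x + int(round(offset[0] / d))
    py = y + int(round(offset[1] / d))
    in_bounds = 0 <= py < len(depth) and 0 <= px < len(depth[0])
    probe = depth[py][px] if in_bounds else BACKGROUND
    return probe - d


# Hypothetical 3x4 depth map (rows are y, columns are x).
depth_map = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 1.0, 1.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
]

near_pixel = depth_feature(depth_map, 1, 1, (2, 0))  # probe lands 2 pixels right
far_pixel = depth_feature(depth_map, 0, 0, (2, 0))   # same offset, only 1 pixel right
```

The same offset reaches two pixels away from the nearby pixel but only one pixel away from the pixel that is twice as deep, which is the point of scaling by depth. A classifier would then look at many such features per pixel to decide which body part it belongs to.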
As you can see, the papers we read in the class range from generic UI concerns to highly technical domain-specific multimodal applications. They also run the gamut from theoretical to applied. I actually think this is a good way to lay out a UI course. I think designing easy, user-friendly interfaces is important and nontrivial; nevertheless, people sometimes stay away from UI courses because the stuff they teach you can seem kind of self-evident. 835 interleaves important soft skills with challenging algorithmic and hardware questions, which keeps it interesting.
Plus, the final project involves building a multimodal interface using fun toys (like Kinects). So it’s definitely not all reading!
21G.346: Contemporary Francophone Africa
This class is my favorite French class I’ve taken so far!
This is the first advanced French class I’ve taken, and it’s the first one where learning French is not the main objective. The class discusses the history and culture of Africa, focusing on countries where French is an official or predominant language. That’s to say, the class deals with substantial historical, political, and moral questions, but it’s taught entirely in French. The class is so good because we’re not just using language for the sake of hearing ourselves talk, but rather using it to express significant ideas. It’s been improving my speaking a lot because it’s forcing me to think about how to frame complex concepts. Also, there are only 4 people in the class, so there really is time and space for everyone to participate in a substantial debate.
But the focus on actual content means that mon Dieu, there’s a lot of reading. So far we’ve been focusing on the history of francophone Africa, with an emphasis on the colonial period and the discord and subjugation it brought. We’ve read several chapters of a book on Africa’s history (Petite Histoire de l’Afrique by Catherine Coquery-Vidrovitch). We read the first half of a classic of African literature, L’Aventure Ambiguë by Cheikh Hamidou Kane, about a young African boy who must abandon his religious and cultural traditions in favor of a European education. The novel is really compelling in the stark way it frames the choice faced by many African peoples in the wake of European colonization: hang on to the customs that have defined their identity for centuries and face eradication, or accept the practical know-how and cultural erosion of a materially superior power. We’ve read primary-source French government documents, dating from the 1880s to the 1920s, that put on display French imperial attitudes towards the African colonies and their system of colonial education. This week, we’re starting to read speeches by African leaders of decolonization calling for independence.
The reading is dense, textually and emotionally. Given the amount of text, and the fact that it’s in French, I’m pretty easily spending more time on this class than on any other humanities class I’ve taken at MIT. But it’s worth it.
6.033: Designing Software Systems
This is a required class for Computer Science majors. It’s a special kind of class called a CI-M—communication intensive in your major—which means it deals with technical content, but also teaches a lot of communication skills—like critical reading. During recitations, we come prepared having read a paper, a textbook excerpt, or some other kind of article, and then discuss the design choices presented therein.
Honestly, the course so far has been all over the map. The subject material is extremely wide-ranging (“systems” is a HUGE field comprising everything from operating systems to databases to the internet), and with the added focus on communication, the corpus for this class has felt a little incoherent so far. For instance, for the first recitation we read a sensationalist non-technical article about how buggy software is a deadly public safety threat, and then discussed how the fatalities cited therein were fewer than the number of people who die annually in the US from lightning strikes.
The part of the class that is pretty cool is the way we discuss tradeoffs in design choices for computer systems. For instance, last week, we read a paper by the creators of the Unix operating system talking about some of the main design choices of the OS, such as how the filesystem is organized and what the command line actually does. We discussed whether the decision to prevent circular references in the directory structure (not letting a folder contain a link to itself) is justified, and considered alternatives for how the command prompt could process user commands.
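To get a feel for what “preventing circular references” involves, here is a toy sketch. It is not how Unix actually enforces the rule; the `creates_cycle` function and the dictionary-based directory tree are hypothetical, just a way to show the kind of check a system would need if it allowed arbitrary links between folders.

```python
def creates_cycle(children, parent, target):
    """Return True if linking `target` inside `parent` would create a loop.

    `children` maps each directory name to the set of directories it contains.
    A loop appears when `parent` is `target` itself or is already reachable
    from `target`.
    """
    stack, seen = [target], set()
    while stack:
        current = stack.pop()
        if current == parent:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, ()))
    return False


# Hypothetical tree: / contains /home, which contains /home/alice, and so on.
tree = {
    "/": {"/home"},
    "/home": {"/home/alice"},
    "/home/alice": {"/home/alice/docs"},
}

loops_back = creates_cycle(tree, "/home/alice/docs", "/home")  # docs -> home -> ... -> docs
harmless = creates_cycle(tree, "/home", "/home/alice/docs")    # docs has no subfolders
```

The tradeoff is visible even here: forbidding such links outright keeps every traversal of the tree guaranteed to terminate, while allowing them means every tool that walks directories needs a check like this one.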
Although the reading can be interesting, the fact that I have literally no idea what next recitation’s reading is about is throwing me off a little. Are we going to talk about locking? Naming schemes? Distributed computing? It’s like playing reading roulette.
6.s081: Dynamic Computer Language Engineering (the only class without assigned reading!)
The semester-long assignment for this class is cut and dried: implement a dynamic programming language. No required reading!
Well, maybe I’m getting ahead of myself. Even though there are no assigned texts for the class, implementing a computer language is a complicated task…which means that there’s a lot of documentation in my future! Most notably:
- A few weekends ago, I spent a solid day perusing the pages of LearnCpp.com, trying to learn C++, which I have never used before, in time to implement the first assignment, which was writing a parser for our language. This happened to be a project that involved some of the most complicated features of C++, like memory management and complicated inheritance patterns. After a somewhat stressful couple of days of working on it nonstop and liberally consulting StackOverflow, I finished the parser…AND it passed all the test cases!!! So I’m off to a good start.
- The next assignment is basically a reading assignment in itself. This is the spec for our interpreter, which defines what commands in the language actually do. It’s 9 pages long.
So that’s my schedule this semester, reading-list style! I’m actually really excited about this schedule; it’s the first semester where I’ve been able to choose a lot of higher-level classes, and I’ve got good lecturers and interesting subject material to look forward to. Pen out, reading glasses on!