Friday, June 20, 2014

See You in Prague

For those of you going to SPAA this coming week, I'll see you there.  I'll be giving the last two talks at the conference, to what I expect (based on the timing) will be a nearly empty room.  That just means there will be no pressure.

If you want to hear more about the papers, you can go to Abstract Talk, where I discuss them in podcast form.  Here is the link for Balanced Allocations and Double Hashing, and the link for Parallel Peeling Algorithms.  I haven't done podcasts for Abstract Talk before, so be forgiving if you listen.  It seems like a cool idea;  what do people think of it in practice?

For those who expect to be sightseeing in Prague during the final session (or who just aren't going to SPAA), here's the brief overview. 

For Balanced Allocations and Double Hashing:
In the well-known balanced allocations paradigm, balls are hashed sequentially into bins, where each ball gets d random choices from the hash functions and is placed in the least loaded of its chosen bins.  With double hashing, we replace the d random choices with d choices of the form a, a+b, a+2b, ..., a+(d-1)b, where a and b are random values (determined by hashing).  That is, we build the d choices from 2 random numbers instead of using d random numbers.  (The numbers are taken mod the size of the hash table, and b should be relatively prime to the hash table size... let's stop worrying about details.)  We find empirically that this makes no difference, in a very strong sense:  the fraction of bins with load j appears the same for every value of j in both systems, so you can't really tell them apart.  We provide a theoretical explanation, based on fluid limit models, for why this happens.
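(If you want to see the effect for yourself, it's a quick experiment.  Below is a minimal Python sketch -- my own toy simulation, not code from the paper -- that tabulates how many bins end up with each load under both schemes;  it handles the relative-primality detail by simply rejecting bad values of b.)

```python
import random
from math import gcd
from collections import Counter

def load_counts(n, d=4, double_hash=False, seed=1):
    """Throw n balls into n bins with d choices each; return {load: #bins}."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        if double_hash:
            a = rng.randrange(n)
            b = rng.randrange(1, n)
            while gcd(b, n) != 1:            # keep b relatively prime to n
                b = rng.randrange(1, n)
            choices = [(a + i * b) % n for i in range(d)]
        else:
            choices = [rng.randrange(n) for _ in range(d)]
        bins[min(choices, key=lambda c: bins[c])] += 1   # least loaded wins
    return Counter(bins)

print(sorted(load_counts(100000).items()))                    # fully random
print(sorted(load_counts(100000, double_hash=True).items()))  # double hashing
```

Run it with, say, a hundred thousand balls, and the two load distributions should come out essentially indistinguishable.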

For Parallel Peeling Algorithms:
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are repeatedly removed until no vertex of degree less than k remains.  The remaining hypergraph is known as the k-core.  We analyze parallel peeling processes, where in each round, all vertices of degree less than k are removed simultaneously.  It is known that, below a specific edge density threshold, the k-core is empty with high probability.  We show that, with high probability, below this threshold, only O(log log n) rounds of peeling are needed to obtain the empty k-core for r-uniform hypergraphs.  Interestingly, above this threshold, Ω(log n) rounds of peeling are required to find the non-empty k-core.  Since most algorithms and data structures aim to peel to an empty k-core, this asymmetry appears fortunate;  nature is on our side.  We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs).  Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.
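(The peeling process itself is also easy to play with.  Here's a small Python sketch -- again my own toy version, nothing like our GPU implementation -- that counts the number of parallel peeling rounds on a random r-uniform hypergraph.  For r = 3 and k = 2 the threshold is around 0.818 edges per vertex, so try edge densities on either side of that.)

```python
import random

def peeling_rounds(n, m, r=3, k=2, seed=1):
    """Each round, remove all vertices of degree < k (and their edges);
    return how many rounds pass until no such vertex remains."""
    rng = random.Random(seed)
    edges = [rng.sample(range(n), r) for _ in range(m)]
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    alive = [True] * m
    rounds = 0
    while True:
        peel = {v for v in range(n) if 0 < deg[v] < k}
        if not peel:
            return rounds
        rounds += 1
        for i, e in enumerate(edges):
            if alive[i] and peel.intersection(e):
                alive[i] = False          # edge dies with its peeled vertex
                for v in e:
                    deg[v] -= 1

# n = 100000 vertices; the r=3, k=2 threshold is around 0.818 edges per vertex.
print(peeling_rounds(100000, 75000))  # below threshold: empties in few rounds
print(peeling_rounds(100000, 90000))  # above threshold: more rounds, core remains
```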

The Parallel Peeling Algorithms paper was, to my surprise, awarded Best Paper.  Maybe I'll write up more about the surprise at some point, but I'd certainly like to thank the committee for the honor, and pass the credit to where it is due, with my co-authors Jiayang Jiang and Justin Thaler. 


NSF Thanks for the Year

It's that time of year where, as a background process, I have to do my annual reviews for the NSF.  It's generally not a very exciting task, and their online forms remain, I think, unpleasantly designed.  (I think they fixed a problem I had last year, so at least now they more clearly point out what you haven't filled out.)

That being said, this year, even more than others, I find myself gratefully filling them out.  The NSF provides the money for my (and my students') research.  And research is fun.  In my few years playing administrator, I was trying to keep up with my research, but inevitably my available time and corresponding output started to decline.  This year, I've been able to "get back into it", and it has made me realize how much I enjoy it.  Sure, it's often frustrating.  And writing it all down (and dealing with conferences, reviews, etc.) can also be frustrating.  But overall the creative process of research, including all the frustrating parts, is the most enjoyable part of the job*, and I'm glad that the NSF is there to support it.

Thanks NSF.  I'll try to have those reports in well before the nominal deadline....

[* Yes, I like teaching and interacting with students too -- otherwise I'd look for work in one of the many great research labs -- that's the other most enjoyable part of the job.]  

Monday, June 16, 2014

Sad News: Berthold Vöcking

I have just seen the news that Berthold Vöcking passed away.  For those who didn't know him, Berthold was an outstanding researcher in algorithms.  We had several shared interests and were of a similar age;  I always enjoyed his work, and very much enjoyed the few times that I worked with him.  His paper How Asymmetry Helps Load Balancing still amazes me, and is one of those wonderful papers I wish I had written.  When I first saw it I simply didn't believe the main result*;  I stopped my other work, wrote up a simulation, and found it matched his analysis.  I then had to spend the next few days understanding it.  Once I understood it, I wondered how he had seen it, and how I had missed it.  It is a truly inspired paper.

Of course he has other great papers, including the well known Tight Bounds for Worst-Case Equilibria, and maybe the less well known but very interesting (particularly to me) work on Balanced Allocations:  The Heavily Loaded Case.  He had just won a best paper award at ICALP for work on Online Independent Sets.  He was one of the editors of Algorithms Unplugged, a wonderful book project.

Berthold was regularly on my list of people to invite to workshops, because he always had very interesting work to present and was interesting to talk to.  I can't believe I won't have another chance to talk with him.  His passing is a loss to our community.  

* For those who care, his result concerns hashing with the "power of two choices."  In the standard scheme, each new item makes two random choices over the whole table and is placed in the least loaded of the two.  In his variation, you split the table into two equal halves (call them left and right), have each item make one random choice from each half, and again place the item in the least loaded, except that you always break ties toward the left half.  There wouldn't seem to be any difference between the two schemes, but there is;  the split-and-break-ties approach works significantly better.
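(Don't believe it either?  The simulation takes only a few lines.  Here's a quick Python sketch of the comparison -- my own throwaway version, along the lines of what I wrote when I first checked his analysis.  With large n, the split scheme should win consistently across seeds.)

```python
import random

def max_load(n, go_left=False, seed=1):
    """Maximum load after throwing n balls into n bins, two choices each."""
    rng = random.Random(seed)
    bins = [0] * n
    half = n // 2
    for _ in range(n):
        if go_left:
            i = rng.randrange(half)                # one choice from the left half
            j = half + rng.randrange(n - half)     # one choice from the right half
            pick = i if bins[i] <= bins[j] else j  # break ties to the left
        else:
            i, j = rng.randrange(n), rng.randrange(n)
            if bins[i] == bins[j]:
                pick = rng.choice((i, j))          # break ties at random
            else:
                pick = i if bins[i] < bins[j] else j
        bins[pick] += 1
    return max(bins)

print(max_load(10**6))                # standard two random choices
print(max_load(10**6, go_left=True))  # split halves, ties to the left
```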


Saturday, May 31, 2014

Child Geniuses and Other Articles

Stuck on a flight home, I passed time reading a hotel copy of the Wall Street Journal, to find more interesting things than I would have thought.

What first caught my eye is an interesting piece by Jordan Ellenberg, a math professor who also writes a lot about mathematics for popular consumption, about The Wrong Way to Treat Child Geniuses.  Especially for any readers who did any of those science/math programs as a child (notably Johns Hopkins/SMPY) -- but not exclusively -- it's definitely worth a read.

Then there was also an interesting article about Ray Kurzweil, and an amusing one entitled Ticket to Dine: The Restaurant Reservation Revolution.  The latter described a business model that is simultaneously disturbing and why-didn't-I-think-of-it -- using bots to snap up reservations on OpenTable for popular restaurants, and then selling/scalping the reservations.  Evil, or genius?  You decide....



Wednesday, May 28, 2014

ITCS 2015

I was asked to announce the call for ITCS.  Here is a link to the call for papers, with the most important info:
ITCS (previously known as ICS) seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept or model, opening a new line of inquiry within traditional or cross-interdisciplinary areas, or introducing new techniques or new applications of known techniques). ITCS welcomes all submissions, whether aligned with current theory of computation research directions or deviating from them.
 
Important Dates
Paper Submission Deadline: Friday, August 8, 2014, 5PM PDT
Apparently in some fit of weakness I agreed to be on the PC.  
 
I think an ongoing question about ITCS is how well it lives up to its "mission statement".  Part of the question is whether ITCS is necessary -- do FOCS/STOC/SODA not do a sufficiently good job of accepting "conceptual" papers? -- and whether it is sufficient -- is it really focused on accepting conceptual papers, or is it just the same old thing?  So I guess I'll be able to look under the hood to see how it's going.
 
My current/upcoming PCs have been SIGCOMM and coNEXT (2014), and ITCS and ICALP part C (2015).  I'm feeling happily diverse in what I get to read.  
 

Wednesday, May 21, 2014

What's Important in Algorithms

I saw this interesting article up, 20 Questions with Don Knuth, worth reading just for the fun of it.  But the following question on work in algorithm design and analysis particularly interested me, and I found myself nodding especially hard at this statement:

Thus I think the present state of research in algorithm design misunderstands the true nature of efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to be published.
 
But here's the whole question/answer for context.

15. Robert Tarjan, Princeton: What do you see as the most promising directions for future work in algorithm design and analysis? What interesting and important open problems do you see?
Don Knuth: My current draft about satisfiability already mentions 25 research problems, most of which are not yet well known to the theory community. Hence many of them might well be answered before Volume 4B is ready. Open problems pop up everywhere and often. But your question is, of course, really intended to be much more general.
In general I'm looking for more focus on algorithms that work fast with respect to problems whose size, n, is feasible. Most of today's literature is devoted to algorithms that are asymptotically great, but they are helpful only when n exceeds the size of the universe.
In one sense such literature makes my life easier, because I don't have to discuss those methods in TAOCP. I'm emphatically not against pure research, which significantly sharpens our abilities to deal with practical problems and which is interesting in its own right. So I sometimes play asymptotic games. But I sure wouldn't mind seeing a lot more algorithms that I could also use.
For instance, I've been reading about algorithms that decide whether or not a given graph G belongs to a certain class. Is G, say, chordal? You and others discovered some great algorithms for the chordality and minimum fillin problems, early on, and an enormous number of extremely ingenious procedures have subsequently been developed for characterizing the graphs of other classes. But I've been surprised to discover that very few of these newer algorithms have actually been implemented. They exist only on paper, and often with details only sketched.
Two years ago I needed an algorithm to decide whether G is a so-called comparability graph, and was disappointed by what had been published. I believe that all of the supposedly "most efficient" algorithms for that problem are too complicated to be trustworthy, even if I had a year to implement one of them.
Thus I think the present state of research in algorithm design misunderstands the true nature of efficiency. The literature exhibits a dangerous trend in contemporary views of what deserves to be published.
Another issue, when we come down to earth, is the efficiency of algorithms on real computers. As part of the Stanford GraphBase project I implemented four algorithms to compute minimum spanning trees of graphs, one of which was the very pretty method that you developed with Cheriton and Karp. Although I was expecting your method to be the winner, because it examines much of the data only half as often as the others, it actually came out two to three times worse than Kruskal's venerable method. Part of the reason was poor cache interaction, but the main cause was a large constant factor hidden by O notation.
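(As an aside, part of what makes Kruskal's method hard to beat is just how little it does:  sort the edges, then add each edge that joins two different components, tracked with a union-find structure.  Here's a minimal Python sketch -- mine, for flavor, not Knuth's GraphBase code.  The sort dominates the running time, and the memory access pattern is simple, which is exactly the kind of constant-factor and cache behavior Knuth is pointing at.)

```python
def kruskal(n, edges):
    """Minimum spanning forest of an n-vertex graph; edges are (weight, u, v)."""
    parent = list(range(n))

    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):         # the sort dominates the running time
        ru, rv = find(u), find(v)
        if ru != rv:                      # u and v are in different components
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# -> [(1, 0, 1), (2, 1, 2), (4, 2, 3)]
```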

Friday, May 02, 2014

Reviewing Question: What's Important

Nick Feamster on Google+ recently shared (and gave me permission to blog about) the following review comment, which he described as "super enlightening":

"I think there is a general problem with the overall goal of this study and area of research. This seems to be making data usage a critical thing people should be paying attention to. People have  real issues and I am not at all sure that this one deserves attention. They have real concerns like, are they going to lose their job, where should their children go to college, should they encourage their elderly parents to move into a retirement center."
It's so cringeworthy, it's funny.  You could substitute "data usage" with pretty much any research topic you feel like, and you have a review that's almost certainly accurate and justifies rejection.  Which is what makes it such a wonderful, terrible review.  

Without wanting in any way to excuse this reviewer, I do want to say that this review highlights an ever-increasing problem:  I believe review decisions are more and more dominated by subjective judgments about what topics are "important".  I realize some may say it has ever been thus, and I acknowledge that the importance of the underlying problem has always been a factor in judging a paper.  But I think the subjective judgment has become more significant in both systems and theory over the years, for multiple reasons.  As the field has expanded, there's less underlying agreement and common understanding of what's important.  Many times the reviewer may not know the area well enough to judge the importance, and there is ever-growing potential for area bias:  the problems in my area are (more) important.  Further, there are far too many papers for the available slots, so reasons to reject have to be found.  As the above comment suggests, one can always call into question the importance of the problem the paper aims to solve.

But finally, I think, it fits in with an issue that keeps coming up for me:  reviewers are too arrogant.  If they don't see why the problem is important, then the issue must be with the research or the writing;  it couldn't possibly be with their own reading or understanding of the problem space.  Reviewers will have opinions regarding the importance of the works they read -- no getting around that -- and they should, where possible, give advice to authors on how best to present their results.  But they could often be a bit more judicious in recognizing that they are expressing opinions and offering advice;  they are not, in the end, the final arbiters of a paper's eventual, actual importance.

I don't see subjective decisions in "importance" going away.  But I think they could be given a bit more care, both in how they are used in the final decision-making, and in how reviewers express their opinions on "importance" to authors.

If you'll excuse me now, I don't have time to blog further, I have to go plan where my children should go to college and where I'll eventually ship off my aging parents.  (Thank goodness I have a tenured position.)           

Thursday, April 03, 2014

Postdocs in Copenhagen

I apologize for the short notice, but my occasional co-author Rasmus Pagh is looking for postdocs for a big data project he recently had funded, with an application deadline on April 14th. 

For more information, you can see this page, which starts with:

The Scalable Similarity Search (SSS) project led by Professor Rasmus Pagh is seeking 3 post-docs with a strong background in algorithms theory, combinatorics, or statistics. The project is funded by the European Research Council (ERC), runs in the years 2014-19, and will include a total of 3 PhD and 3 post-doc positions. The aim of the project is to improve theory and practice of algorithms for high-dimensional similarity search on big data, and to extend similarity search algorithms to work in settings where data is distributed (using a communication complexity perspective) or uncertain (using a statistical perspective). A post-doc position may include a long-term visit to a project partner (at Berkeley, Harvard, MIT, Stanford, or Tsinghua) if all parties find the visit beneficial.

Or you can see this nice video Rasmus recently put together.   

And yes, I'm self-interested in this matter, in that as someone who works with Rasmus, the potential "long-term visit" to Harvard described above would involve me if it worked out.  Also, Copenhagen is a wonderful place. 

Wednesday, March 19, 2014

Hashing Summer School in Copenhagen

Of the many, many interesting things happening in Copenhagen this summer (SEA, SWAT, ICALP) we'd like to add one more:  a Hashing Summer School at the University of Copenhagen.  Here's the web site.  This was the brainchild of Mikkel Thorup, who knows a thing or two or three about hashing, and is co-organized by me and Rasmus Pagh.  We've got a great set of speakers, and we expect a mix of lectures, problem-solving exercises, a poster session, and other such fun and learning.  The registration deadline is May 15th;  check the web site for details.  If you're coming out to Copenhagen for other activities, or just want a good reason to visit the beautiful city, take a look.  I hope to see you there.


Tuesday, March 18, 2014

Leslie Lamport Wins Turing Award

In another sign that these Turing Award committees really know what they're doing, Leslie Lamport has won the Turing Award.  There's a very nice writeup, including the history of his work, up on an official Microsoft blog post.

While I know there are arguably many people deserving of a Turing Award, Leslie Lamport is an amazingly obvious and absolutely fantastic choice.  His body of work is truly inspiring, and as the above links show, his work has had a huge effect on us all.