On Tuesday 08 December 2009 04:25:07 pm Seebs wrote:
> On 2009-12-08, David Masover <ninja / slaphack.com> wrote:
> > Compare any of these to C. You probably could write a web app in C. You
> > probably could be about as efficient with it. You could be disciplined
> > enough to never do pointer arithmetic,
>
> This is hardly necessary. Pointer arithmetic can certainly be done safely.

Can be. However, the fact that it exists opens the door to a whole class of weird and hard-to-pin-down crashes (and possible vulnerabilities) that simply don't happen if you don't (or can't) do it.

But that wasn't my point. My point was that if you consider lack of pointer arithmetic, garbage collection, and other features to be a selling point of higher-level languages, you can do all that in C, and you can make it _almost_ automatic in C++.

> > Think about that for a moment. In languages like Ruby and PHP, a buffer
> > overflow is actually not possible. You might get it in a third-party
> > library written in another language (like C), but you can't do it
> > yourself. But in C, it's not only possible, it's a very easy mistake to
> > make, and a hard one to avoid.
>
> I'm not sold on this. I don't think I've had any buffer overflows in my
> code in years. It's pretty easy -- if I'm about to use a buffer, I make
> sure I know what I'm using it for and that I cap any copies and/or report
> failure if there's not enough space.

My favorite example is here:

http://joelonsoftware.com/articles/fog0000000319.html

I like this both for the ludicrous example, when he finally decides to figure out how much to allocate:

    char* bigString;
    int i = 0;
    i = strlen("John, ")
        + strlen("Paul, ")
        + strlen("George, ")
        + strlen("Joel ");
    bigString = (char*) malloc (i + 1);

...and for the ludicrous inefficiency. He's going to scan through each string at least twice, and that's with a customized strcat -- it gets much worse with the real strcat.

And remember, his next step is:

    char *p = bigString;
    bigString[0] = '\0';
    p = mystrcat(p,"John, ");
    p = mystrcat(p,"Paul, ");
    p = mystrcat(p,"George, ");
    p = mystrcat(p,"Joel ");

It's still a bit sloppy -- that initial null assignment makes me cringe -- but think about this. Even if you ignore the fact that we've got each string duplicated here -- let's say they're variables:

    int i = 0;
    i = strlen(a) + strlen(b) + strlen(c) + strlen(d);
    bigString = (char*) malloc (i+1);
    char *p = bigString;
    bigString[0] = '\0';
    p = mystrcat(p,a);
    p = mystrcat(p,b);
    p = mystrcat(p,c);
    p = mystrcat(p,d);

Now suppose you add a string to that, or remove it. If you add it to one place and not the other, or remove it from one place and not the other, you're either wasting RAM or hitting a buffer overrun every time.

Then again, this kind of malloc is probably inefficient, as the article points out. Instead, you probably want to allocate some power of 2 -- at which point, you want to make sure you've always allocated a power of two that's more than you need, not less than you need.

Are you sure you never make a mistake here?

Because this is the kind of thing that I don't have to think about. Yes, it's less efficient, but if I have a bunch of strings in Ruby, I can just do this:

    big_string = a + b + c + d

There are other, more efficient ways, like:

    big_string = a.dup << b << c << d

or

    big_string = "#{a}#{b}#{c}#{d}"

The point is, though, while these have varying degrees of efficiency, none of them have the possibility that I'll forget something and open myself up to a vulnerability or a crash. Worst case, I waste a bit of RAM, and 100% of the RAM I waste here can be garbage-collected later, whereas in C, if I waste it, it's wasted, possibly even leaked.

So not only is it ridiculously easier, it's also safer.

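
To make the three Ruby forms concrete, here is a minimal sketch. The values of a, b, c and d are just the strings from the C example, picked for illustration. It only checks that all three forms produce the same text and that << grows the copy in place while + hands back a new object each time. It says nothing about which one is faster.

```ruby
a, b, c, d = 'John, ', 'Paul, ', 'George, ', 'Joel '

plus   = a + b + c + d          # each + builds a brand-new String
append = a.dup << b << c << d   # << grows the copy of a in place
interp = "#{a}#{b}#{c}#{d}"     # interpolation builds one new String

same_text = (plus == append) && (append == interp)
puts same_text   # true: all three hold the same text
puts a           # a itself is untouched, because we appended to a.dup

buffer = a.dup
appended = buffer << b
same_object = appended.equal?(buffer)   # << returns the receiver itself
new_object  = !(a + b).equal?(a)        # + returns a different object
puts same_object
puts new_object
```

Nowhere in there do I compute a length or size a buffer, so there is nothing to get out of sync when a string is added or removed.
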
It's also possibly faster, because since it's a higher-level abstraction, the runtime might (in theory; I bet Ruby doesn't) notice that these are all strings and that you're just concatenating them, so it could use some sort of StringBuilder automatically. Even if it doesn't, it still has the option of storing the length of a string separately, rather than using null-terminated strings -- thus saving you at least half your time in an operation like ("a" + "b").

Am I being unrealistic? Is this the kind of thing you'd never do?

> I agree that it requires actual effort, as opposed to being implicit.

The point here is that the implicit version also implicitly handles all the safety for you.

Another example might be SQL manipulation. To keep myself sane, let's do this with Ruby:

    execute "select hashed_password from users where username = '#{name}'"

The problem with that code should be blindingly obvious. Of course, I should probably be doing something like this:

    execute "select hashed_password from users where username = '#{escape name}'"

The problem is, this requires me to always, always remember to do it. This is how a lot of PHP stuff is written, though I'm told it's changing, and those in the know use libraries that allow you to do it the Right Way.

How would the Right Way look?

    execute 'select hashed_password from users where username = ?', name

Can you see why that's safer? I can develop a much easier to maintain habit of using only single-quoted strings as my queries. Since the actual values are always passed separately, they are always escaped -- I don't have to remember anything special to make that work. So I can develop a very, very simple habit (use single-quoted strings) that I can almost unconsciously apply everywhere, and I will never be subject to a SQL injection attack.

Or I can try to develop a habit of manually escaping -- the problem is that sooner or later, mistakes WILL happen.

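
As a rough sketch of why the placeholder form puts the escaping in exactly one place: bind and quote below are hypothetical helpers I made up for illustration, not any real library's API. I'm also assuming standard SQL string literals, where a single quote is escaped by doubling it. Real database libraries usually go further and send the values to the server separately instead of pasting them into the query text at all.

```ruby
# Hypothetical helper (an assumption, not a real library call): wraps a
# value in single quotes and doubles any single quote inside it.
def quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: fills each "?" in the template with a quoted value.
# A sketch only: it would also replace a "?" sitting inside a literal.
def bind(template, *values)
  remaining = values.dup
  sql = template.gsub('?') do
    raise ArgumentError, 'too few values' if remaining.empty?
    quote(remaining.shift)
  end
  raise ArgumentError, 'too many values' unless remaining.empty?
  sql
end

query = 'select hashed_password from users where username = ?'
puts bind(query, "O'Harris")
puts bind(query, "x' or '1'='1")
```

Every value goes through quote because there is no other way in, so the hostile second name ends up as one harmless string literal. With manual escaping, by contrast, every single call site has to get it right on its own.
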
Best case, I develop such muscle memory of doing it this way that I end up accidentally doing this:

    puts "Hello, #{escape name}!"

That way, worst case, it goes unnoticed for months until someone named O'Harris signs up and wonders why the system thinks their name is O''Harris or O\'Harris.

The point is that higher levels of abstraction do allow us to abstract away opportunities to screw things up. This is true in the language itself, and in the libraries.

And if I've convinced you of that, don't worry, low-level skill is still needed. Another of my favorite articles:

http://joelonsoftware.com/articles/LeakyAbstractions.html

It helps to understand what's going on at the C level, even if I never want to actually touch it, because that might give me some insight as to why "Hello, #{name}!" is more efficient than 'Hello, '+name+'!'

Try it yourself:

    require 'benchmark'
    name = 'steve'
    Benchmark.bm do |x|
      x.report { 10000000.times { "Hello, #{name}!" }}
      x.report { 10000000.times { 'Hello, '+name+'!' }}
    end

My results:

          user     system      total        real
      6.010000   0.020000   6.030000 (  6.104799)
      7.500000   0.010000   7.510000 (  7.505193)

It only gets better, the more interpolated values you have. a+b is more efficient than "#{a}#{b}", but a+b+c+d is less efficient than "#{a}#{b}#{c}#{d}".

This was very surprising to me. Then I went back and read that article, and thought a bit about the concept of a string builder. Now it makes sense, even though it's still a bit counterintuitive.

So I'm glad I sort of know C, and I'm just as glad I don't have to use it much.

> The killer for me was
> discovering that there was a thing like a function pointer which could be
> used only for user-defined functions, not built-in functions.

I could live with that, but I'm guessing it might've been the last straw...

For me, I'm spoiled by blocks now. I can fake them in Javascript, and even (though less effectively) in Java, but not in PHP, that I know of.