Why software crashes

j_e_f_f_g · Post by **j_e_f_f_g** » Mon Jun 10, 2013 12:48 am

Here's a test to see which programmers here have been taught proper error handling.

In preparation for an article I may write about creating an LV2 host, I decided to look at LV2 host libraries. First up is Lilv. As is typical of OSS documentation, the docs consist of a simple list of APIs with almost no explanation of usage, and a couple of "example" apps which of course are almost completely uncommented. In other words, if a programmer wants to use this lib, he must examine the (uncommented -- surprise, huh?) source code of the lib itself. (And Linux end users wonder why commercial devs won't put any effort into supporting Linux?)

Ok so I download the lilv sources, figure out that lilv_world_new() is probably the first function an app will call. I load "world.c" into gedit, and literally within 5 seconds I see:


LilvWorld* world = (LilvWorld*)malloc(sizeof(LilvWorld));
world->world = sord_world_new();

Pressing the Page Down key, I see:


LilvSpec* spec = (LilvSpec*)malloc(sizeof(LilvSpec));
spec->spec  = sord_node_copy(specification_node);

And then I see:


LilvDynManifest* desc = malloc(sizeof(LilvDynManifest));
desc->bundle = lilv_node_new_from_node(world, bundle_node);

Incidentally, this is the same sort of thing you find all over Pulse Audio's sources too.

So the question is: What have these programmers either not been taught, or failed to learn?

In another thread, an enduser wondered why oss seemed so unstable and prone to crash. The answer is because of things like the above.

j_e_f_f_g · Post by **j_e_f_f_g** » Mon Jun 10, 2013 10:42 am

falkTX wrote:lilv is... on par with other libs I've used myself too.

I hope not. My point is that, due to very basic, missing error-checking, it's unsafe code that can crash any app that uses it.

Checking the return of malloc() should be one of the first things a Computer Sci student learns. I would never hire a programmer who doesn't know to do this.
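For anyone who wants the pattern spelled out, here is a minimal sketch of a constructor that checks every allocation and unwinds cleanly on failure. The `World` and `Node` types, `world_new()`, `node_new()`, `world_free()` and the `alloc_fn` parameter are hypothetical stand-ins made up for illustration; they are not lilv's real types or API. The allocator is passed in only so the failure path can be exercised, and it is assumed to return memory that `free()` accepts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT lilv's real types or functions. */
typedef void *(*alloc_fn)(size_t);

typedef struct {
    char *name;   /* owned copy of the caller's string */
    int   id;
} Node;

typedef struct {
    Node *node;   /* owned */
} World;

/* Allocate a Node. Returns NULL, leaking nothing, if any allocation fails. */
Node *node_new(alloc_fn alloc, const char *name, int id)
{
    Node *n = alloc(sizeof *n);
    if (n == NULL)
        return NULL;

    size_t len = strlen(name) + 1;
    n->name = alloc(len);
    if (n->name == NULL) {
        free(n);              /* undo the first allocation */
        return NULL;
    }
    memcpy(n->name, name, len);
    n->id = id;
    return n;
}

/* Allocate a World with one Node. Returns NULL on any failure. */
World *world_new(alloc_fn alloc)
{
    World *w = alloc(sizeof *w);
    if (w == NULL)
        return NULL;          /* never dereference an unchecked pointer */

    w->node = node_new(alloc, "root", 0);
    if (w->node == NULL) {
        free(w);
        return NULL;
    }
    return w;
}

/* Release a World created by world_new(); accepts NULL. */
void world_free(World *w)
{
    if (w == NULL)
        return;
    free(w->node->name);
    free(w->node);
    free(w);
}

/* Test allocator that simulates an out-of-memory condition. */
void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}
```

Calling `world_new(malloc)` gives the normal behaviour, while `world_new(always_fail)` returns NULL instead of crashing, which leaves the decision about what to do next with the caller.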

j_e_f_f_g · Post by **j_e_f_f_g** » Mon Jun 10, 2013 10:57 am

P.S. I see you're using the C++ new operator. I hope you're handling a bad_alloc exception.

Post by **raboof** » Mon Jun 10, 2013 12:24 pm

j_e_f_f_g wrote:In another thread, an enduser wondered why oss seemed so unstable and prone to crash. The answer is because of things like the above.

Only if you mean 'things like this' rather broadly.

A typical Linux installation will overcommit on memory, so a malloc() of such small structures is highly unlikely to return NULL even in an OOM situation. You've got bigger problems (processes getting killed randomly) at that point.
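To see which policy a given machine is using, something like the following sketch could be used. It assumes a Linux system that exposes the setting in `/proc/sys/vm/overcommit_memory`; the function names `read_overcommit_mode()` and `describe_overcommit_mode()` are made up for this example.

```c
#include <stdio.h>

/* Read the kernel's overcommit policy from the given file, normally
 * /proc/sys/vm/overcommit_memory on Linux. Returns 0, 1 or 2 on
 * success and -1 if the file is missing or cannot be parsed. */
int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2)
        mode = -1;
    fclose(f);
    return mode;
}

/* Map the numeric mode to a short human-readable description. */
const char *describe_overcommit_mode(int mode)
{
    switch (mode) {
    case 0:  return "heuristic overcommit (the usual default)";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting, no overcommit";
    default: return "unknown";
    }
}
```

Printing `describe_overcommit_mode(read_overcommit_mode("/proc/sys/vm/overcommit_memory"))` shows the current policy. In mode 2 the kernel refuses allocations beyond its commit limit, so malloc() returning NULL becomes considerably more likely than in modes 0 or 1.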

You could argue it would still be useful to do error-checking here, but mostly because that is simply "how it should be done" (which might sound dogmatic but actually has some advantages).
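One cheap way to get the check without repeating it at every call site is a small wrapper. This is only a sketch of that idea, not something lilv does: `xmalloc` is a conventional name for such a helper, and aborting on failure is an assumed policy that suits an application better than a library, since a library usually should report the error to its caller instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate or terminate with a diagnostic.
 * A failed allocation then stops the program at a known place with a
 * message, rather than at some later NULL dereference. */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    /* malloc(0) may legitimately return NULL, so only treat NULL as
     * an error for non-zero requests. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        abort();
    }
    return p;
}
```

Code that uses `xmalloc(sizeof(Something))` never sees NULL for a non-zero size, so the individual call sites stay as short as the unchecked versions.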

It'd be interesting to see what most actual instabilities stem from. A tool like "apport" might be nice for that, though I haven't looked at it in detail myself so I can't really recommend it yet. In any case, the availability of such a diagnostic tool cannot be an excuse for not getting it right the first time, but it might give some insight into where things typically go wrong.

male · Post by **male** » Mon Jun 10, 2013 4:08 pm

Wrong. This is the least likely cause you could imagine for a segfault. I suppose next you're going to tell us that C programs crash because of 'incorrect' indentation.

I seem to recall just about everyone who tried your midiview and edrummer programs reporting an immediate crash. Why don't you begin your instruction by showing examples of your own bugs?

j_e_f_f_g · Post by **j_e_f_f_g** » Mon Jun 10, 2013 6:18 pm

male wrote:Wrong. This is the least likely cause you could imagine for a segfault.

The issue with the OOM (ie, Out Of Memory) Manager is as usual another example of you arguing with your own straw man. It's a total fallacy programmers have that malloc() won't return 0 due to the OOM.

male wrote:I seem to recall just about everyone who tried your midiview and edrummer programs reporting an immediate crash.

As usual, you "recall" incorrectly. I'm sure you're thinking about your own software instead.

http://linuxmusicians.com/viewtopic.php ... 891#p39928
http://linuxmusicians.com/viewtopic.php?f=1&t=11008
http://linuxmusicians.com/viewtopic.php?f=1&t=10970

male · Post by **male** » Mon Jun 10, 2013 6:35 pm

j_e_f_f_g wrote:
male wrote:Wrong. This is the least likely cause you could imagine for a segfault.
The issue with the OOM (ie, Out Of Memory) Manager is as usual another example of you arguing with your own straw man. It's a total fallacy programmers have that malloc() won't return 0 due to the OOM.

male wrote:I seem to recall just about everyone who tried your midiview and edrummer programs reporting an immediate crash.
As usual, you "recall" incorrectly. I'm sure you're thinking about your own software instead.

http://linuxmusicians.com/viewtopic.php ... 891#p39928
http://linuxmusicians.com/viewtopic.php?f=1&t=11008
http://linuxmusicians.com/viewtopic.php?f=1&t=10970

When did I even mention the OOM killer? All you're doing here, Jeff, is proving that you're out of touch and don't know anything about that which you criticise. Do you offer some mechanism to prevent crashes? No. Do you really think that the people who wrote that code don't know that malloc() could possibly return NULL? If you do, then you're just once more proving how out of touch you are with reality. Again, why don't you analyse why your own software crashes and post that? Oh, wait, I know, because software crashes are a completely general problem and have nothing to do with Linux Audio or even Linux. Maybe C, but that's about as specific as the problem gets. You're doing nothing here but displaying your own foolishness for everyone to see and laugh at. And while I enjoy a bit of entertainment as much as the next guy, I grow tired of your lame old gag.

Post by **raboof** » Mon Jun 10, 2013 9:27 pm

Guys, I'm going to leave the above posts alone, but be careful. If this is going to turn into a 'your code is shittier than mine'-contest I'll moderate.

Of course if you want to take specific bugs (your own or others') and explore what exactly caused the problems and how such issues could be prevented that's all good.

nils · Post by **nils** » Mon Jun 10, 2013 9:48 pm

Yes, quote code excerpts and comment them, please.

male · Post by **male** » Mon Jun 10, 2013 9:51 pm

raboof wrote:Guys, I'm going to leave the above posts alone, but be careful. If this is going to turn into a 'your code is shittier than mine'-contest I'll moderate.

Of course if you want to take specific bugs (your own or others') and explore what exactly caused the problems and how such issues could be prevented that's all good.

Fair enough. IMHO, it's quicker and more productive to fix bugs than complain about them in a public forum that the author doesn't even read, so you won't find me line-by-line auditing anyone's code here.

Post by **raboof** » Mon Jun 10, 2013 10:59 pm

j_e_f_f_g wrote:
male wrote:Wrong. This is the least likely cause you could imagine for a segfault.
The issue with the OOM (ie, Out Of Memory) Manager is as usual another example of you arguing with your own straw man.

You seem to be confusing me and male - I was the one who brought up memory overcommit and OOM.

I'm not sure which 'straw man' you claim I'm arguing. You claimed the code was buggy because it didn't check for malloc() returning NULL, and I put it into perspective by claiming that 1) the only situation where that would do any good would be in an OOM situation, and 2) it wouldn't do much good in an OOM situation on a typical system.

j_e_f_f_g wrote:It's a total fallacy programmers have that malloc() won't return 0 due to the OOM.

Uh, no. (I'll ignore the 0-versus-NULL debate to try and keep this on-topic)

On Linux, due to memory overcommit, malloc() might not return NULL even if there's insufficient memory to back your malloc(). A simple example program can demonstrate this: the program below tries to allocate 12 gigs of memory. On my configuration, all these malloc() calls return a non-NULL value. Obviously, since my machine doesn't have 12 gigs of memory (and I don't use swap), this can't work, and indeed if I try to actually use the memory (in this case by writing some 'y' characters into it), the process grows too big and gets killed by the OOM killer.


#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

int main(void) {
  const size_t gig_in_bytes = (size_t)1024 * 1024 * 1024;

  // This example assumes an architecture where the smallest addressable unit
  // is a byte, and the maximum size of size_t is at least a gig (i.e. any
  // modern system with a 32-bit or wider architecture)
  assert(sizeof(unsigned char) == 1);
  assert(gig_in_bytes < SIZE_MAX);

  int gigs_to_alloc = 12;
  unsigned char * allocated_chunks[gigs_to_alloc];
  int i;
  size_t j;

  // first allocate a generous amount of memory. Depending on overcommit
  // settings, this might not return NULL even if this allocates more than
  // the physically available amount of memory.
  for (i = 0; i < gigs_to_alloc; i++) {
    allocated_chunks[i] = (unsigned char*) malloc(gig_in_bytes);
    assert(allocated_chunks[i] != NULL);
    // i is zero-based, so i + 1 chunks have been allocated at this point
    printf("Malloc'ed %d gig in total now\n", i + 1);
  }

  // Now actually use the memory (by writing into it). Touching one byte
  // every 10000 bytes is enough to force the kernel to back every page.
  for (i = 0; i < gigs_to_alloc; i++) {
    for (j = 0; j < gig_in_bytes; j += 10000)
      allocated_chunks[i][j] = 'y';
  }

  printf("Done\n");

  return 0;
}

So, this example shows even allocating huge 1-gig chunks of memory on a machine that doesn't have them available doesn't always make malloc() return NULL - which j_e_f_f_g above claimed was a 'total fallacy'. Therefore I stand by my earlier claim that checking the return value of malloc() for a small number of small allocations is unlikely to improve the stability of your application when running on a typical Linux system.

Of course this doesn't necessarily mean checking the return value of malloc() is useless. It's not that hard to think of scenarios where it could be a good idea. Nonetheless, I hope this does put j_e_f_f_g's bold claims above into some perspective.

Post by **raboof** » Mon Jun 10, 2013 11:00 pm

male wrote:IMHO, it's quicker and more productive to fix bugs than complain about them in a public forum that the author doesn't even read.

Well, if we can learn something from it that makes it 'productive' in my book. I hope that'll happen.

j_e_f_f_g · Post by **j_e_f_f_g** » Mon Jun 10, 2013 11:12 pm

male wrote:When did I even mention the OOM killer?

In order for my statement of "My point is that, due to very basic, missing error-checking (ie not checking malloc returning 0), it's unsafe code that can crash any app that uses it." to be "wrong" (as your reply erroneously contends), then the following assumptions must be made:

1) malloc will never return 0 due to over-committing.
2) A reference to over-committed mem will result in the OOM Manager "safely" recovering enough memory to satisfy the app's reference, such that the OOM Manager won't abruptly terminate the app. (ie, The app essentially "crashes").

Both of the above are incorrect assumptions. I merely pointed out the first incorrect assumption as evidence that your contention about me being "wrong" is, as usual, wrong.

I see now that your contention wasn't even based upon one of the above mis-assumptions, but rather yet another example of you engaging in mere "truth by proclamation". Do you ever back up any of your statements with facts, or do you always resort to useless ad hominem accusations that the other person allegedly "doesn't know what (he's) talking about" (just because you say so), which is the entire content of the remainder of your reply. Typical.

male · Post by **male** » Mon Jun 10, 2013 11:30 pm

j_e_f_f_g wrote:
male wrote:When did I even mention the OOM killer?
In order for my statement of "My point is that, due to very basic, missing error-checking (ie not checking malloc returning 0), it's unsafe code that can crash any app that uses it." to be "wrong" (as your reply erroneously contends), then the following assumptions must be made:

1) malloc will never return 0 due to over-committing.
2) A reference to over-committed mem will result in the OOM Manager "safely" recovering enough memory to satisfy the app's reference, such that the OOM Manager won't abruptly terminate the app. (ie, The app essentially "crashes").

Both of the above are incorrect assumptions. I merely pointed out the first incorrect assumption as evidence that your contention about me being "wrong" is, as usual, wrong.

I see now that your contention wasn't even based upon one of the above mis-assumptions, but rather yet another example of you engaging in mere "truth by proclamation". Do you ever back up any of your statements with facts, or do you always resort to useless ad hominem accusations that the other person allegedly "doesn't know what (he's) talking about" (just because you say so), which is the entire content of the remainder of your reply. Typical.

Let's try this, genius: Why don't you sift through the bug database of any of the many large projects that have a policy of never checking the return value of malloc() for NULL and tell us how many bugs you find that are attributable to the fact? You have asserted that not checking the return value of malloc() for NULL is why software crashes, and you are quite and thoroughly wrong. That is not the reason. Do whatever it takes to convince yourself of this, but don't waste this forum's time with your foolishness and misinformation.

j_e_f_f_g · Post by **j_e_f_f_g** » Tue Jun 11, 2013 12:25 am

j_e_f_f_g wrote:It's a total fallacy programmers have that malloc() won't return 0 due to the OOM.

raboof wrote:Uh, no.

Yes. It's a fallacy that malloc won't return 0. If malloc determines that it cannot fulfill a request, for example due to memory fragmentation, exceeding a ulimit setting (or other memory management settings, such as overcommit_memory), etc., then malloc will return 0.

http://stackoverflow.com/questions/2248 ... -uses-over
http://voices.canonical.com/jussi.pakka ... -and-linux
http://compgroups.net/comp.unix.program ... ull/471850
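The ulimit case is easy to try out. The sketch below temporarily lowers the process's address-space limit with `setrlimit(RLIMIT_AS, ...)` and then asks malloc() for more than the limit allows. The function name `malloc_fails_under_limit()` is made up for this example, and the behaviour described is what I would expect on a typical Linux/glibc system; other platforms may not enforce this limit.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Lower the soft address-space limit to limit_bytes, attempt one
 * malloc(request_bytes), then restore the previous soft limit.
 * Returns 1 if malloc returned NULL, 0 if it succeeded, and -1 if the
 * limit could not be read or changed. */
int malloc_fails_under_limit(size_t limit_bytes, size_t request_bytes)
{
    struct rlimit old_limit;
    if (getrlimit(RLIMIT_AS, &old_limit) != 0)
        return -1;

    struct rlimit new_limit = old_limit;
    new_limit.rlim_cur = (rlim_t)limit_bytes;
    /* The soft limit may not exceed the hard limit. */
    if (old_limit.rlim_max != RLIM_INFINITY &&
        new_limit.rlim_cur > old_limit.rlim_max)
        new_limit.rlim_cur = old_limit.rlim_max;
    if (setrlimit(RLIMIT_AS, &new_limit) != 0)
        return -1;

    /* Call through a volatile pointer so an optimizing compiler cannot
     * remove the malloc/free pair and assume the allocation succeeded. */
    void *(*volatile alloc)(size_t) = malloc;
    void *p = alloc(request_bytes);
    int failed = (p == NULL);
    free(p);

    /* Put the original soft limit back for the rest of the program. */
    setrlimit(RLIMIT_AS, &old_limit);
    return failed;
}
```

With a 256 MB limit and a 1 GB request, i.e. `malloc_fails_under_limit((size_t)256 << 20, (size_t)1 << 30)`, the allocation is refused up front and malloc() returns NULL regardless of the overcommit setting, because the request exceeds what the process is allowed to map.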

raboof wrote:On Linux, due to memory overcommit, malloc() might not return NULL even if there's insufficient memory to back your malloc().

The key word being "might". You do realize that you're tacitly admitting that my above statement is true?

raboof wrote:allocating chunks of memory on a machine that doesn't have them available doesn't always make malloc() return NULL - which j_e_f_f_g above claimed was a 'total fallacy'.

That is not what I wrote. Reread my text which you quoted.

raboof wrote:Therefore I stand by my earlier claim that checking the return value of malloc() for a small number of small allocations is unlikely to improve the stability of your application when running on a typical Linux system.

And I stand by my claim that it should always be done, and that assumptions it's safe/pointless not to do it are incorrect.
