Determining if a number is prime in 115 characters of C code
main(c,v)char**v;{return((c=strtol(v[1],0,10))>255)?2:("Y~U17qO=U=7qMmUA/-O}-A5qM}-A7-/mU17m-"[c/7]-45>>(c%7))&1;}
Recently this was posted on a work chat:
Implement an algorithm that:
1. Receives an unsigned int 'n'
2. Returns 'true' if 'n' is prime
3. Runs in O(1)
Taking in consideration that 'n' is an unsigned int we can assume that it is finite. Usually it should be 32 bits but anything above 16 bit is possible1.
This way a possible solution to the stated problem is to use a lookup table of integers where each position indicates if a number is prime or not.
I decided to implement a code golf2 version in C with a few modifications to the requirements:
- unsigned int is going to be 8 bit.
- The program will return 1 for a prime number and 0 for non primes. 2 indicates errors.
Generating the lookup table
The following listing specifies a C program that implements the Sieve of Eratosthenes3:
#include <stdio.h>
int main() {
unsigned char is_prime[256] = { 0 };
for (int i=2;i<256;i++) {
is_prime[i] = 1;
}
for (int i=2; i<256; i++) {
for (int j=2; j<16; j++) {
unsigned product = i*j;
if (product > 255) {
break;
}
is_prime[product] = 0;
}
}
for (int i=0; i<255;i++) {
printf("%d,", is_prime[i]);
}
printf("%d\n", is_prime[255]);
return 0;
}
Compiling and running it yields the table:
0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0
We can then write a very simple program that returns the value found in the lookup table.
#include <stdlib.h>
int main(char argc, char** argv)
{
unsigned char is_prime[256] = {
0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0
};
int r = strtol(argv[1], 0, 10);
if (r > 255) {
return 2;
} else {
return is_prime[r];
}
return 0;
}
Golfing it
The original goal was to write this program as short as possible. The previous listing has much to improve:
- Remove any unnecessary whitespace.
- Remove the stdlib include.
- Convert the main declaration using K&R style function declarations:
int main(char argc, char** argv)
->main(c, v)char**v
. - Use
argc
as a variable. - Omit type declarations. Variables get defaulted to int.
- Reduce all variables to a single letter.
Applying these technicques results in:
t[]={0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0};main(c, v)char**v;{return((c=strtol(v[1],0,10))>255)?2:t[c];}
A 50% reduction from the original 1092 characters to 580.
Still too big
Now, looking at the previous listing it is pretty obvious that the encoding of the lookup table is very inefficient. It currently uses 514 characters.
A possible solution would be to encode it as an array of 32 bit integers. Each integer would encode 32 elements of the lookup table. We would need 8 integer that encoded using hexadecimal require 10 characters, plus 7 commas and 2 brackets, totaling of 89 characters.
However we have a better option. We can encode the lookup table as a string, at 8 bit per character we require 32 characters plus 2 quotation marks (thanks João Lourenço).
Unfortunatelly ASCII only defines 128 code points so the full 8 bits of a char are not usable. We will need to use 7 bits of each char increasing the size of the string to 37 characters.
Furthermore chars 0 to 31 cannot be encoded in an efficient way. The whole table needs to be shifted upwards so that this range is not used.
This means adding an offset to each character in the string. If the used offset is 32 the string would be :
LqH$*dB0H0*d@`H4" Bp 4(d@p 4* "`H$*`
This string contains quotes that need escaping. Changing the offset to 45 results in:
Y~U17qO=U=7qMmUA/-O}-A5qM}-A7-/mU17m-
Fortunatelly the largest char that needs to be encoded is 'R' 4.
This all can be summed up in the script that generates the lookup table as a string:
int main(char c, char **v) {
unsigned char is_prime[256] = { 0 };
for (int i=2;i<256;i++) {
is_prime[i] = 1;
}
for (int i=2; i<256; i++) {
for (int j=2; j<16; j++) {
unsigned product = i*j;
if (product > 255) {
break;
}
is_prime[product] = 0;
}
}
char string[37] = {0};
for (int i=0; i<256; i++) {
string[i/7] |= is_prime[i]<<i%7;
}
for (int i=0; i<37; i++) {
printf("%c", string[i] + 45);
}
printf("\n");
return 0;
}
And finally:
main(c,v)char**v;{return((c=strtol(v[1],0,10))>255)?2:("Y~U17qO=U=7qMmUA/-O}-A5qM}-A7-/mU17m-"[c/7]-45>>(c%7))&1;}
[1]: https://en.wikipedia.org/wiki/C_data_types
[2]: https://en.wikipedia.org/wiki/Code_golf
[3]: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
[4]: Proof is left as an exercise to the reader