Linux pids/tgids from target namespace

2021-04-06

Problem

A new eBPF helper is needed for the following use case: a program uses eBPF to instrument a process that lives in a different namespace, and the program's own namespace is privileged to instrument that other one. The use case was described like this:

I’ll try and describe the use case I have in mind. I expect folks would like to use bpftrace in container X to trace events in container Y, where X may or may not be Y, provided the bpftrace process has the required access to Y’s namespace. For symbolization to work, bpftrace needs to get the pid of processes in container Y but in the namespace of X. For example X could be a parent cgroup to multiple workloads including X, and the owner of these workloads has access to X but not the host itself. I don’t see it as trying to get pid/tgid from another namespace, it’s actually to get the pid in the namespace where it can be acted upon (i.e. X). Looks like your suggestion is for bpftrace to use bpf_get_pid_tgid() when it’s running in the root namespace (i.e. the host), and bpf_get_ns_current_pid_tgid() when it’s running in another namespace (i.e. a container). I think this would be fine in the short term, even though it doesn’t cover all the cases (see above). That said, I’d say the intelligence belongs more in a bpf helper than in user space.

The reply to that suggestion: using bpf_get_pid_tgid() should work if you have host access. But if you only have a privileged namespace, this won’t work, and a new helper is indeed needed.

In a nutshell:

From container X, get the pid/tgid of processes in container Y, where X may or may not be Y.
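
To make the requirement concrete, here is a toy Python model (not kernel code) of how one task carries a different pid number in each pid namespace it is visible from. The names PidNamespace, Task and pid_in_ns, and all the numbers, are made up for illustration; the only assumption taken from the use case is that Y is nested under X.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PidNamespace:
    """Toy stand-in for a pid namespace, identified by its inode number."""
    inum: int
    parent: Optional["PidNamespace"] = None


@dataclass
class Task:
    """Toy task: one pid number per namespace it is visible in."""
    comm: str
    numbers: Dict[int, int] = field(default_factory=dict)  # ns inum -> pid


def pid_in_ns(task: Task, ns: PidNamespace) -> int:
    """Return the task's pid as seen from ns, or 0 if it is not visible there."""
    return task.numbers.get(ns.inum, 0)


host = PidNamespace(inum=1)
x = PidNamespace(inum=2, parent=host)  # container X
y = PidNamespace(inum=3, parent=x)     # container Y, nested under X

# A process started in Y is visible in Y, X and the host, with a different
# number in each; a process started in X is not visible from Y.
worker = Task("worker", {host.inum: 4242, x.inum: 57, y.inum: 1})
tracer = Task("bpftrace", {host.inum: 4100, x.inum: 12})

print(pid_in_ns(worker, x))  # 57: the pid that bpftrace in X can act upon
print(pid_in_ns(tracer, y))  # 0: the tracer in X does not exist inside Y
```

The value bpftrace wants in this picture is the first one printed: the worker's pid as numbered in X, not the host number and not the number Y uses internally.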

Proposed helper

This helper should return the pid/tgid as seen from the target namespace; the process in question may or may not be the current task.

Solution

This helper does not look up the current task; it only needs the pid/tgid of a task in the target namespace. So we walk the tasks, compare each task's namespace against the requested dev/ino, and return the first pid/tgid that matches.

Problems

There is no dev_t in struct ns_common, only inum, so should we match on the inode number alone? In the future, pid namespaces could live on different devices, which is why the match is meant to be on the (dev_t, ino) pair.
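
For context, user space sees a namespace as a file under /proc/<pid>/ns/, so a plain stat() yields both a device number and an inode number for it. The sketch below reads that pair; ns_dev_ino is a hypothetical helper name, and it assumes a Linux system with /proc mounted. As far as I can tell, st_ino here corresponds to the inum stored in ns_common, while the device number belongs to the filesystem that backs the namespace files.

```python
import os
from typing import Tuple


def ns_dev_ino(path: str = "/proc/self/ns/pid") -> Tuple[int, int]:
    """Return (st_dev, st_ino) for a namespace file such as /proc/<pid>/ns/pid."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


if __name__ == "__main__":
    try:
        dev, ino = ns_dev_ino()
        print(f"pid namespace: dev={dev} ino={ino}")
    except OSError as err:
        # Not on Linux, or /proc is not mounted.
        print(f"cannot stat namespace file: {err}")
```

This is the same (dev, ino) pair that bpf_get_ns_current_pid_tgid() takes as its first two arguments, so a tracer would typically collect it once in user space and pass it to the BPF program.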

Algorithm

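def find_pid_nr_ns(tasklist, ns):
    # Return the (pid, tgid) of the first task whose namespace matches
    # the requested one on both inode number and device.
    for t in tasklist:
        if t.ns.inum == ns.inum and t.ns.dev == ns.dev:
            return t.pid, t.tgid
    # No task lives in the requested namespace.
    return None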

This function from kernel/pid.c looks like it could help:

struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns);
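
Note the direction of find_task_by_pid_ns(): it takes a pid number and a namespace and returns the task, which is the reverse of translating a known task into a namespace. A toy Python model of that kind of lookup, assuming a hypothetical index keyed by (namespace inode number, pid number); lookup_task and the sample data are illustrative only, and the kernel's real data structures differ:

```python
from typing import Dict, Optional, Tuple

# Hypothetical index: (namespace inode number, pid number) -> task name.
PidIndex = Dict[Tuple[int, int], str]


def lookup_task(index: PidIndex, nr: int, ns_inum: int) -> Optional[str]:
    """Return the task known as pid `nr` inside namespace `ns_inum`, if any."""
    return index.get((ns_inum, nr))


HOST, X = 1, 2  # toy namespace inode numbers
index: PidIndex = {
    (HOST, 4242): "worker",
    (X, 57): "worker",       # same task, different number inside X
    (HOST, 4100): "bpftrace",
}

print(lookup_task(index, 57, X))     # worker
print(lookup_task(index, 57, HOST))  # None: no task has pid 57 on the toy host
```

The same number resolves to a task in one namespace and to nothing in another, which is why the helper has to be told which namespace the caller cares about.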

References: https://gs0510.github.io/PIDAllocationLinuxKernel/