Matthieu Baerts edited this page Jan 24, 2025 · 89 revisions

Linux MPTCP Upstream Project


Overview

The goal of this community is to develop, maintain, and improve the Multipath TCP (MPTCP) protocol (v1 / RFC 8684) in the upstream Linux kernel.

Programs built to use TCP will keep using regular TCP connections when running on an MPTCP-enabled kernel, unless the programmer, user, or admin opts in to using Multipath TCP.
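
As a minimal sketch of what opting in can look like for an application, the Python snippet below asks for an MPTCP socket and falls back to regular TCP when that fails. The helper name `open_stream_socket` is hypothetical. The protocol number 262 is the Linux `IPPROTO_MPTCP` value, used here only as a fallback for Python versions older than 3.10, where `socket.IPPROTO_MPTCP` is not defined.

```python
import socket

# Linux value of IPPROTO_MPTCP; Python >= 3.10 exposes it as socket.IPPROTO_MPTCP.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, is_mptcp): try MPTCP first, fall back to regular TCP.

    Hypothetical helper for illustration only.
    """
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # The kernel has no MPTCP support, or MPTCP has been disabled:
        # explicitly request a regular TCP socket instead.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False


if __name__ == "__main__":
    sock, is_mptcp = open_stream_socket()
    print("MPTCP socket" if is_mptcp else "regular TCP socket")
    sock.close()
```

The rest of the application (`connect()`, `send()`, `recv()`, and so on) stays the same as with a TCP socket. Note that a socket created with MPTCP can still fall back to TCP later, for example when the peer does not support MPTCP.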

Please visit our website for more details about MPTCP, how to use and configure it, and what app developers can do to support it natively.

How to use MPTCP?

Please visit our Setup page.

Talks and Articles

Mostly linked to the kernel development

In English:

In Chinese (中文):

ChangeLog

  • v5.6: Create MPTCPv1 sockets with a single subflow
    • Prerequisites: modifications in TCP and Socket API
    • Single subflow & RFC8684 support
    • Selftests for single subflow
  • v5.7: Create multiple subflows but use them one at a time and get stats
    • Multiple subflows: one subflow is used at a time
    • Path management: global, controlled via Netlink
    • MIB counters
    • Subflow context exposed via inet_diag
    • Selftests for multiple-subflow operation
    • Selftests for the Netlink path manager interface
    • Bug-fix
  • v5.8: Stabilisation and support for more of the MPTCPv1 spec
    • Shared receive window across multiple subflows
    • A few optimisations
    • Bug-fix
  • v5.9: Stabilisation and support for more of the MPTCPv1 spec
    • Token refactoring
    • KUnit tests
    • Receive buffer auto-tuning
    • diag features (list MPTCP connections using ss)
    • Full DATA FIN support
    • MPTCP SYN Cookie support
    • A few optimisations
    • Bug-fix
  • v5.10 (LTS): Send over multiple subflows at the same time and support for more of the MPTCPv1 spec
    • Multiple xmit: possibility to send data over multiple subflows
    • ADD_ADDR support with echo-bit
    • REMOVE_ADDR support
    • A few optimisations
    • Bug-fix
  • v5.11: Performance and support for more of the MPTCPv1 spec
    • Refine receive buffer auto-tuning
    • Improve GRO and RX coalescing with MPTCP skbs
    • Improve multiple xmit streams support
    • ADD_ADDR IPv6 support
    • Support for sending ADD_ADDR with a port
    • Incoming MP_FASTCLOSE support
    • A few optimisations
    • Bug-fix
  • v5.12: Good performance, PM events and support for more of the MPTCPv1 spec
    • Support accepting MP_JOIN on another port (after having sent an ADD_ADDR with this port)
    • MP_PRIO support
    • Per connection netlink PM events
    • "Delegated actions" framework to improve communications between MPTCP socket and subflows
    • Support IPv4-mapped in IPv6 for additional subflows
    • Performance improvements
    • A few optimisations
    • Bug-fix
  • v5.13: Supporting more options and items from the protocol
    • Outgoing MP_FASTCLOSE support
    • MP_TCPRST support
    • RM_ADDR: address list support
    • Switch to next available address when a subflow creation fails
    • Support removing subflows with ID 0
    • New MIB counters: active MPC, token creation fallback
    • socket options:
      • only admit explicitly supported ones
      • support new ones: SO_KEEPALIVE, SO_PRIORITY, SO_RCVBUF/SO_SNDBUF, SO_BINDTODEVICE/SO_BINDTOIFINDEX, SO_LINGER, SO_MARK, SO_INCOMING_CPU, SO_DEBUG, TCP_CONGESTION and TCP_INFO (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT have been added in backports)
    • debug: new tracepoints support
    • Retransmit DATA_FIN support
    • MSG_TRUNC and MSG_PEEK support
    • A few optimisations/cleanup
    • Bug-fix
  • v5.14: Supporting more options and items from the protocol
    • Checksum support
    • MP_CAPABLE C flag support
    • Receive path cmsg support (e.g. timestamp)
    • MIB counters for invalid mapping
    • A few optimisations/cleanup (that might affect performance)
    • Bug-fix
  • v5.15 (LTS): Supporting more options and usability improvements
    • MP_FAIL support (without TCP fallback / infinite mapping)
    • Packet scheduler improvements (especially with backup subflows)
    • Full mesh path management support
    • Refactoring of ADD_ADDR and ECHO handling
    • Memory and execution optimisation of option header transmit and receive
    • Bug-fix and small optimisations
  • v5.16: Supporting more socket options
    • Support for MPTCP_INFO socket option (similar to TCP_INFO) on the SOL_MPTCP (284) level + MPTCP_TCPINFO and MPTCP_SUBFLOW_ADDRS to get info per subflow
    • Default max additional subflows for the in-kernel PM is now set to 2
    • Batch SNMP operations
    • Bug-fix and optimisations
  • v5.17: Even more socket options
    • Support for new ioctls: SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD
    • Support for new socket options: IP_TOS, IP_FREEBIND, IPV6_FREEBIND, IP_TRANSPARENT, IPV6_TRANSPARENT, TCP_CORK and TCP_NODELAY
    • Support for cmsgs: TCP_INQ
    • PM: Support changing the "backup" bit via Netlink (ip mptcp)
    • PM: Do not block subflows creation on errors
    • Packet scheduler improvement: better HoL-blocking estimation, which improves stability
    • Support sending MP_FASTCLOSE option (quick shutdown of the full MPTCP connection, similar to TCP RST in regular TCP)
    • Bug-fix and optimisations
  • v5.18: Stabilisation
    • Support dynamic change of the Fullmesh PM flag
    • Support for new socket options: SO_SNDTIMEO
    • Code cleanup:
      • Clarify when MPTCP options can be used together
      • Constify a bunch of helpers
      • Make some OPS structure Read-Only
    • Add MIBs for MP_FASTCLOSE and MP_TCPRST
    • Add tracepoint in mptcp_sendmsg_frag()
    • Restrict RM_ADDR generation to addresses that were previously explicitly announced
    • Send ADD_ADDR echo before creating subflows
  • v5.19: Userspace control and fallbacks
    • Support for MPTCP path manager in user space
    • Add MPTCP support for fallback to regular TCP for connections that have never connected additional subflows or transmitted out-of-sequence data (partial support for RFC8684 fallback)
    • Fallback or reset MPTCP connections in case of checksum issues (MP_FAIL and infinite mapping support)
    • Avoid races in MPTCP-level window tracking, stabilize and improve throughput
    • Make 'ss -Ml' show MPTCP listen sockets
    • BPF: Add BPF access to mptcp_sock structures and their metadata
  • v6.0: Initial subflow as Backup and memory optimisations
    • Support changes to initial subflow priority (set the initial subflow as backup)
    • Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one
  • v6.1 (LTS): User namespace and TFO sender support, send MP_FASTCLOSE like TCP RST
    • Allow privileged Netlink operations from user namespaces
    • TCP_FASTOPEN_CONNECT support for a client to initiate MPTCP + TFO connections (data in the SYN). Note that the server support is still being developed
    • MP_FASTCLOSE is now sent in case of errors (equivalent to a TCP RST) and in more edge scenarios to mimic TCP behaviour
  • v6.2: TFO receiver support
    • TFO receiver support
    • Support for the MSG_FASTOPEN sendmsg() flag
    • Support of more socket options: TCP_FASTOPEN, TCP_FASTOPEN_KEY, TCP_FASTOPEN_NO_COOKIE
    • Cleaner messages in case of error when creating endpoint
    • Add Path Manager "listener" Netlink events for the userspace path manager
  • v6.3: ProcFS info and mix v4/v6 subflows
    • Add statistics for MPTCP sockets in use in /proc/net/protocols
    • Path-Manager: in-kernel: allow using mixed IPv4 and IPv6 addresses
    • Some clean-up and small improvements (MPTCP and selftests)
  • v6.4: Improvement around the reception of connection requests
    • Refactoring around the reception of MPC/MPJ connection requests
    • getsockopt(SOL_MPTCP, MPTCP_INFO) and Netlink (ss -M): do not fill info not used by the PM in use
    • Move first subflow allocation at MPC access time
  • v6.5: More exposed info
    • LSM/SELinux: correctly inherit labels on MPTCP subflows
    • New ADD_ADDR (+ echo) transmission MIB counters
    • New aggregated data counters exposed via Netlink and getsockopt(SOL_MPTCP, MPTCP_INFO)
    • New getsockopt(SOL_MPTCP, MPTCP_FULL_INFO) aggregating MPTCP and subflows info (with ID)
    • Some clean-up and small improvements (MPTCP and selftests + support of old kernels)
  • v6.6: Forcing using MPTCP with BPF
    • Allow forcing using MPTCP with BPF (example)
    • Refactoring to get rid of msk->subflow
    • Preparation for future extension of the packet scheduler
    • Some improvements in the selftests: TAP for subtests, uniformity, colours
  • v6.7: MPTCP YNL and packet scheduler improvements
    • Convert Netlink code to use YAML spec for better API validation and documentation, see YNL
    • New sysctl for make after break timeout: net.mptcp.close_timeout
    • Support SO_RCVLOWAT socket option (instead of ignoring it)
    • Ignore net.ipv4.tcp_notsent_lowat at the subflow level so that it does not disturb the packet scheduler
    • Reduce overhead on transmit part
    • Refactor sndbuf auto-tuning to improve the situation when being limited by the send buffer
    • Some clean-up in MPTCP code and selftests
  • v6.8: New counters
    • New MPTCP_INFO and Netlink (ss -M) counter: subflows_total, taking into account the initial subflow (compared to subflows which only looks at additional subflows)
    • New Current Established (CurrEstab) MPTCP MIB counter, showing the current number of established MPTCP connections, visible with nstat for example.
    • Support IP_LOCAL_PORT_RANGE and IP_BIND_ADDRESS_NO_PORT socket options.
    • Some code refactoring and sharing resources in selftests scripts.
  • v6.9: TCP_NOTSENT_LOWAT, new Netlink commands, selftests
    • Support TCP_NOTSENT_LOWAT socket option.
    • Userspace Path-Manager: support new dump addrs and get addr Netlink commands
    • Some improvements in the selftests: colours, less duplicated code, shellcheck compliant, uniformity
    • Debug: check the protocol in (mp)tcp_sk() with DEBUG_NET, annotate lockless accesses, clean-ups
    • Improved CI support: BPF, KVM support, ShellCheck, notifications.
  • v6.10: New info, sockopt, tracing
    • "Last time" info in getsockopt(SOL_MPTCP, MPTCP_INFO) and Netlink Diag (ss -M)
    • getsockopt(SOL_TCP, TCP_IS_MPTCP) to check for MPTCP fallback to TCP
    • Optimisation if getsockopt(SOL_MPTCP, MPTCP_INFO) is used with no buffer (optlen == 0)
    • net.mptcp.available_schedulers new sysctl knob
    • Possibility to trace reset reasons
    • Some improvements in the selftests: less duplicated code, options to use IPRoute2 for all tests
    • Improved CI support: Virtme NG, Tracking regressions
    • Dev tools: CLang and VSCode support in MPTCP Upstream Virtme Docker.
  • v6.11: Fixes
    • Mainly some fixes that have been / are being backported to stable versions.
    • Some small improvements to reduce duplicated code
  • v6.12: Cache Fallback
    • Fallback to TCP in case of MPTCP blackhole
    • Cache the info about the blackholes, to fall back directly to TCP for 1h+ (see the blackhole_timeout sysctl)
    • New MIB counters for sent MP_JOIN
    • CI: Code coverage
  • v6.13: Notif on non-stale subflows, more lockless operations
    • Send ACK notifications (addresses and priority) on non-stale subflows
    • In case of missing DSS, tell the other peer there was a "middlebox interference" in the RST
    • Switch to a lockless way to list packet schedulers and to dump MPTCP endpoints
    • BPF selftests (1 2) to show how to set socket options per subflow
  • v6.14: Control SYN + MPC retransmissions
    • New syn_retrans_before_tcp_fallback sysctl to control the number of SYN + MP_CAPABLE retransmissions before falling back to TCP.
    • Selftests: more stats info in case of errors.
    • PM: preparation work to make it more modular
  • v6.15: TODO
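
The v6.10 entry above mentions `getsockopt(SOL_TCP, TCP_IS_MPTCP)` as a way to check for a fallback to TCP. A minimal Python sketch of such a check could look like the following. The helper name `is_mptcp` is hypothetical, and the numeric value 43 is an assumption taken from the kernel's `include/uapi/linux/tcp.h`, since the Python `socket` module may not expose this constant. The snippet only makes sense on Linux.

```python
import socket

# Assumed value, from include/uapi/linux/tcp.h (option available from v6.10).
TCP_IS_MPTCP = 43


def is_mptcp(sock):
    """Return True or False when the kernel answers, None when it cannot tell.

    Hypothetical helper for illustration only. None typically means that the
    kernel is too old to know this socket option.
    """
    try:
        return bool(sock.getsockopt(socket.IPPROTO_TCP, TCP_IS_MPTCP))
    except OSError:
        return None


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
        print("regular TCP socket:", is_mptcp(tcp_sock))
```

For a socket created with `IPPROTO_MPTCP`, the check is most useful once the connection is established, because the fallback decision is only known at that point.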

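Several entries above introduce sysctl knobs: net.mptcp.close_timeout, net.mptcp.available_schedulers, blackhole_timeout and syn_retrans_before_tcp_fallback. As a rough sketch, their current values can be read from procfs, assuming the usual mapping of `net.mptcp.<name>` to `/proc/sys/net/mptcp/<name>`. The helper name `read_mptcp_sysctls` is hypothetical, and which knobs exist depends on the running kernel version.

```python
from pathlib import Path

# Assumed procfs location of the net.mptcp.* sysctl knobs.
SYSCTL_DIR = Path("/proc/sys/net/mptcp")

# Knobs named in the ChangeLog; older kernels will not have all of them.
KNOBS = (
    "close_timeout",
    "available_schedulers",
    "blackhole_timeout",
    "syn_retrans_before_tcp_fallback",
)


def read_mptcp_sysctls(base=SYSCTL_DIR, names=KNOBS):
    """Return {knob: value as a string, or None if the knob cannot be read}.

    Hypothetical helper for illustration only.
    """
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Missing file: kernel too old, MPTCP not built in, or not Linux.
            values[name] = None
    return values


if __name__ == "__main__":
    for knob, value in read_mptcp_sysctls().items():
        print(f"net.mptcp.{knob} = {value}")
```

A knob reported as `None` is simply not available on the running kernel, which gives a quick hint about which of the features listed above can be expected.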
Resources

Please visit our website for more details about MPTCP, a FAQ, and more.
