C++ Threads: Multithreading with std::thread & jthread Guide 2026

Why Multithreading Matters

Modern CPUs have multiple cores, but a single-threaded program uses only one of them. If your program needs to download files, process data, and update a UI, doing these tasks one after another wastes time and leaves cores idle. Multithreading lets you run multiple tasks concurrently and put all available cores to work.

Before C++11, there was no standard threading — you had to use platform-specific APIs (POSIX threads on Linux, Win32 threads on Windows). C++11 introduced std::thread in the <thread> header, giving C++ portable, standard multithreading. C++20 improved it further with std::jthread, which handles cleanup automatically.

Threading is powerful but dangerous. Shared mutable state between threads causes data races — one of the most difficult bugs to debug. This lesson teaches you how to create and manage threads. The next lesson on mutexes teaches you how to share data safely between them.

Creating Your First Thread

#include <iostream>
#include <thread>

void say_hello() {
    std::cout << "Hello from thread!\n";
}

void count_to(int n) {
    for (int i = 1; i <= n; ++i) {
        std::cout << "Count: " << i << "\n";
    }
}

int main() {
    // Create a thread that runs say_hello()
    std::thread t1(say_hello);

    // Create a thread with arguments
    std::thread t2(count_to, 5);

    std::cout << "Hello from main!\n";

    // Wait for threads to finish
    t1.join();
    t2.join();

    // Output order is non-deterministic!
    // Main, t1, and t2 run concurrently
}

The key concept: once you create a std::thread, it starts running immediately. The constructor takes a callable (function, lambda, functor) and its arguments. The thread runs concurrently with the creating thread — you cannot predict which thread’s output appears first.
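As a minimal sketch of the three callable kinds mentioned above, the block below starts one thread from a plain function, one from a function object, and one from a lambda. The names `free_function`, `Functor`, and `run_each_callable_kind` are hypothetical and exist only for this illustration. Each thread is joined before the next one starts, so the shared vector is never touched by two threads at once.

```cpp
#include <thread>
#include <vector>

// Hypothetical names for illustration; none of these appear in the lesson.
void free_function(std::vector<int>* log) { log->push_back(1); }

struct Functor {
    std::vector<int>* log;
    void operator()() const { log->push_back(2); }
};

// Starts one thread per callable kind, joining each before starting the next
// so the vector is only ever used by one thread at a time.
std::vector<int> run_each_callable_kind() {
    std::vector<int> log;

    std::thread a(free_function, &log);           // plain function + argument
    a.join();

    std::thread b(Functor{&log});                 // function object (functor)
    b.join();

    std::thread c([&log] { log.push_back(3); });  // lambda
    c.join();

    return log;
}
```

Calling `run_each_callable_kind()` from `main` returns the values in launch order only because of the intermediate joins. Without them the three threads would race on `log`.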

Passing Arguments to Threads

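#include <functional>  // std::ref
#include <iostream>
#include <memory>      // std::unique_ptr, std::make_unique
#include <string>
#include <thread>
#include <utility>     // std::move

void greet(const std::string& name, int times) {
    for (int i = 0; i < times; ++i) {
        std::cout << "Hello, " << name << "!\n";
    }
}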

void modify(int& value) {
    value = 42;
}

int main() {
    // Arguments are COPIED by default
    std::string name = "Alice";
    std::thread t1(greet, name, 3);

    // To pass by reference, use std::ref
    int x = 0;
    std::thread t2(modify, std::ref(x));

    t1.join();
    t2.join();

    std::cout << "x = " << x << "\n";  // 42

    // Move-only types: use std::move
    auto data = std::make_unique<int>(100);
    std::thread t3([](std::unique_ptr<int> p) {
        std::cout << "Got: " << *p << "\n";
    }, std::move(data));
    t3.join();
}

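Important: std::thread copies (or moves) its arguments into storage owned by the new thread, even if the function signature takes a reference. To pass by reference you must wrap the argument in std::ref() (or std::cref() for a const reference). For a non-const reference parameter such as modify(int&), omitting std::ref is a compile error rather than a silent copy. This is a safety feature: it prevents accidental dangling references if the calling thread's variable goes out of scope before the new thread uses it. Once you opt in with std::ref, keeping the variable alive until the thread finishes is your responsibility.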
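The difference between the default copy and std::ref can be sketched as follows. The helper names `append_mark`, `append_mark_to_copy`, and `copy_versus_ref` are made up for this example. Both strings are read only after the threads have been joined.

```cpp
#include <functional>  // std::ref
#include <string>
#include <thread>
#include <utility>     // std::pair

// Hypothetical helper: appends a marker to the caller's string.
void append_mark(std::string& s) { s += "!"; }

// Hypothetical helper: takes its argument by value, so it edits a private copy.
void append_mark_to_copy(std::string s) { s += "!"; }

std::pair<std::string, std::string> copy_versus_ref() {
    std::string copied = "copy";
    std::string referenced = "ref";

    std::thread t1(append_mark_to_copy, copied);         // thread works on a copy
    std::thread t2(append_mark, std::ref(referenced));   // thread works on the original
    // std::thread t3(append_mark, referenced);          // would not compile
    t1.join();
    t2.join();

    return {copied, referenced};  // {"copy", "ref!"}
}
```

Only the string passed through std::ref is changed. The other one is untouched because the thread modified its own copy.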

join vs detach

#include <iostream>
#include <thread>
#include <chrono>

void long_task() {
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << "Task complete!\n";
}

void background_task() {
    for (int i = 0; i < 5; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    // This runs independently — may not finish before main exits
}

int main() {
    // join(): wait for thread to finish
    std::thread t1(long_task);
    std::cout << "Waiting for t1...\n";
    t1.join();  // Blocks until long_task completes
    std::cout << "t1 done\n";

    // detach() — let thread run independently
    std::thread t2(background_task);
    t2.detach();  // t2 runs on its own, main doesn't wait
    // WARNING: if main exits before t2 finishes, t2 is killed

    // CRITICAL RULE: You MUST call join() or detach() before
    // a std::thread object is destroyed. Otherwise, std::terminate
    // is called and your program crashes.

    // Check if joinable
    std::thread t3(long_task);
    if (t3.joinable()) {
        t3.join();
    }
}

The rule is simple but strict: every std::thread must be either joined or detached before it’s destroyed. If you forget, the destructor calls std::terminate() — an intentional crash. This is C++’s way of forcing you to think about thread lifecycle.
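One common way to make the join-or-detach rule hard to forget is a small RAII wrapper that joins in its destructor. The sketch below assumes a hypothetical class named `JoinGuard`, which is not a standard type. It only illustrates the idea and leaves out features a real wrapper might want, such as move support.

```cpp
#include <thread>
#include <utility>

// Hypothetical RAII wrapper (not part of the standard library):
// joins the owned thread when the guard goes out of scope.
class JoinGuard {
    std::thread t_;
public:
    explicit JoinGuard(std::thread t) : t_(std::move(t)) {}
    ~JoinGuard() {
        if (t_.joinable()) t_.join();
    }
    JoinGuard(const JoinGuard&) = delete;
    JoinGuard& operator=(const JoinGuard&) = delete;
};

int guarded_work() {
    int value = 0;
    {
        JoinGuard guard{std::thread([&value] { value = 7; })};
        // No explicit join(): leaving this scope joins the thread,
        // including when the scope is left because of an exception.
    }
    return value;  // safe to read: the thread finished when guard was destroyed
}
```

With this shape, an early return or a thrown exception between thread creation and the end of the scope no longer leads to std::terminate().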

Thread with Lambda Functions

#include <iostream>
#include <thread>
#include <vector>
#include <numeric>

int main() {
    // Simple lambda thread
    std::thread t1([]() {
        std::cout << "Lambda thread running\n";
    });
    t1.join();

    // Lambda with captures
    int result = 0;
    std::thread t2([&result]() {
        result = 42;  // Careful: data race if main reads concurrently
    });
    t2.join();  // Safe to read result after join
    std::cout << "Result: " << result << "\n";

    // Parallel sum with multiple threads
    std::vector<int> data(1000, 1);
    int num_threads = 4;
    int chunk = data.size() / num_threads;
    std::vector<int> partial_sums(num_threads, 0);
    std::vector<std::thread> threads;

    for (int i = 0; i < num_threads; ++i) {
        int start = i * chunk;
        int end = (i == num_threads - 1) ? data.size() : start + chunk;
        threads.emplace_back([&data, &partial_sums, i, start, end]() {
            for (int j = start; j < end; ++j) {
                partial_sums[i] += data[j];
            }
        });
    }

    for (auto& t : threads) t.join();

    int total = std::accumulate(partial_sums.begin(), partial_sums.end(), 0);
    std::cout << "Total: " << total << "\n";  // 1000
}

Thread with Class Methods

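#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

class Worker {
public:
    void process(int id) {
        std::cout << "Worker processing task " << id << "\n";
    }

    void run() {
        // Launch thread on member function
        std::thread t(&Worker::process, this, 42);
        t.join();
    }
};

class BackgroundService {
    // Atomic because stop() writes it while loop() reads it on another thread;
    // a plain bool here would be a data race
    std::atomic<bool> running_{true};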
    std::thread worker_;

public:
    void start() {
        worker_ = std::thread(&BackgroundService::loop, this);
    }

    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

    ~BackgroundService() { stop(); }

private:
    void loop() {
        int count = 0;
        while (running_ && count < 5) {
            std::cout << "Service tick " << ++count << "\n";
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }
};

int main() {
    Worker w;
    w.run();

    BackgroundService svc;
    svc.start();
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
    svc.stop();
}

Hardware Concurrency

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    // How many threads can run truly in parallel?
    // This counts hardware threads (logical cores), not physical cores
    unsigned int cores = std::thread::hardware_concurrency();
    std::cout << "Hardware threads: " << cores << "\n";

    // Returns 0 if it can't determine (rare)
    // Typical values: 4, 8, 12, 16, 32

    // Rule of thumb:
    // CPU-bound work: use hardware_concurrency() threads
    // I/O-bound work: can use more threads (they spend time waiting)
    // Don't create thousands of threads — use a thread pool instead

    // Get current thread's ID
    std::cout << "Main thread ID: " << std::this_thread::get_id() << "\n";

    std::thread t([]() {
        std::cout << "Worker thread ID: "
                  << std::this_thread::get_id() << "\n";
    });
    t.join();

    // Yield: hint to scheduler to let other threads run
    std::this_thread::yield();

    // Sleep
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::this_thread::sleep_until(
        std::chrono::steady_clock::now() + std::chrono::seconds(1)
    );
}
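The rule of thumb in the comments above can be turned into a small helper. This is only a sketch: the function name `pick_thread_count` and the fallback of 2 threads are assumptions made for illustration, not values taken from the lesson. It handles the case where hardware_concurrency() returns 0 and never asks for more threads than there are work items.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: the name and the default fallback of 2 are assumptions.
unsigned pick_thread_count(std::size_t work_items, unsigned fallback = 2) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = fallback;        // 0 means the value could not be determined
    if (work_items == 0) return 1;     // nothing to split; one thread is enough
    // Never start more threads than there are items to process.
    return static_cast<unsigned>(std::min<std::size_t>(hw, work_items));
}
```

For CPU-bound work, a caller might size its thread vector with `pick_thread_count(data.size())`. I/O-bound work can reasonably use a larger number.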

Data Races — The Fundamental Problem

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    // DATA RACE: undefined behavior!
    int counter = 0;

    auto increment = [&counter]() {
        for (int i = 0; i < 100000; ++i) {
            ++counter;  // Not atomic: read, modify, write
        }
    };

    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();

    // Expected: 200000
    // Actual: unpredictable (might be 150000, 180000, etc.)
    std::cout << "Counter: " << counter << "\n";

    // WHY: ++counter is three steps:
    // 1. Read counter (say, 5)
    // 2. Add 1 (6)
    // 3. Write back (6)
    // If two threads read 5 simultaneously, both write 6, so one increment is lost

    // Quick fix: std::atomic (for simple types); needs #include <atomic> at the top
    std::atomic<int> safe_counter{0};
    auto safe_inc = [&safe_counter]() {
        for (int i = 0; i < 100000; ++i) {
            ++safe_counter;  // Atomic — guaranteed correct
        }
    };

    std::thread t3(safe_inc);
    std::thread t4(safe_inc);
    t3.join();
    t4.join();
    std::cout << "Safe counter: " << safe_counter << "\n";  // Always 200000
}

Data races are undefined behavior in C++. Not “wrong results” — undefined behavior means the compiler can do anything, including optimizing away your synchronization or producing impossible results. The next lesson on mutexes covers the full solution for protecting complex shared state.
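A variation on the atomic fix, shown here as a sketch, keeps the hot loop free of shared state: each thread counts in a private local variable and publishes its total with a single atomic addition. The function name `count_with_local_totals` and its parameters are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical function: each thread counts privately, then publishes once.
int count_with_local_totals(int num_threads, int increments_per_thread) {
    std::atomic<int> total{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&total, increments_per_thread] {
            int local = 0;  // private to this thread, so no race is possible
            for (int i = 0; i < increments_per_thread; ++i) {
                ++local;
            }
            total += local;  // one atomic update per thread
        });
    }
    for (auto& th : threads) th.join();
    return total.load();
}
```

The result is the same as incrementing the atomic directly, but the threads touch the shared variable once each instead of once per iteration.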

C++20 jthread — The Better Thread

#include <chrono>
#include <iostream>
#include <stop_token>  // std::stop_token, std::stop_callback
#include <thread>

int main() {
    // jthread automatically joins on destruction; no need to call join()
    {
        std::jthread t([]() {
            std::cout << "jthread runs\n";
        });
        // t.join() NOT needed — destructor handles it
    }  // t destroyed here, automatically joined

    // jthread supports cooperative cancellation via stop_token
    std::jthread worker([](std::stop_token stoken) {
        int i = 0;
        while (!stoken.stop_requested()) {
            std::cout << "Working... " << ++i << "\n";
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
        std::cout << "Stop requested, cleaning up\n";
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(350));
    worker.request_stop();  // Politely ask thread to stop
    // worker joins automatically when destroyed

    // stop_callback: register action when stop is requested
    std::jthread t2([](std::stop_token st) {
        std::stop_callback cb(st, []() {
            std::cout << "Callback: stop was requested!\n";
        });
        std::this_thread::sleep_for(std::chrono::seconds(5));
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    t2.request_stop();
    // The callback runs right away, on the thread that called request_stop()
    // (main here). sleep_for is not interrupted, so t2 still sleeps the full
    // 5 seconds before it can be joined.
}

std::jthread fixes the two biggest pain points of std::thread: forgetting to join (which crashes) and having no standard way to ask a thread to stop. In new code, prefer jthread over thread unless you need C++17 compatibility.

Thread-Local Storage

#include <iostream>
#include <string>
#include <thread>

// Each thread gets its own copy of this variable
thread_local int tl_counter = 0;

void count_up(const std::string& name) {
    for (int i = 0; i < 3; ++i) {
        ++tl_counter;
        std::cout << name << ": tl_counter = " << tl_counter << "\n";
    }
}

int main() {
    std::thread t1(count_up, "Thread1");
    std::thread t2(count_up, "Thread2");
    t1.join();
    t2.join();

    // Each thread counts 1, 2, 3 independently
    // No interference, no data race
    std::cout << "Main: tl_counter = " << tl_counter << "\n";  // 0
}

Returning Values from Threads

#include <functional>  // std::ref
#include <future>
#include <iostream>
#include <thread>
#include <utility>     // std::move

int main() {
    // Method 1: Output parameter with std::ref
    int result1 = 0;
    std::thread t1([](int& out) { out = 42; }, std::ref(result1));
    t1.join();
    std::cout << "Result1: " << result1 << "\n";

    // Method 2: std::promise/future (covered in async lesson)
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    std::thread t2([](std::promise<int> p) {
        p.set_value(100);
    }, std::move(promise));

    std::cout << "Result2: " << future.get() << "\n";  // 100
    t2.join();

    // Method 3: std::async (simplest — see async lesson)
    auto future3 = std::async(std::launch::async, []() {
        return 200;
    });
    std::cout << "Result3: " << future3.get() << "\n";
}

Common Mistakes

#include <thread>

// MISTAKE 1: Forgetting to join or detach
void leak_thread() {
    std::thread t([]() { /* work */ });
    // t destroyed without join or detach → std::terminate()!
}

// MISTAKE 2: Accessing local variables after detach
void dangling_reference() {
    int local = 42;
    std::thread t([&local]() {
        // local might be destroyed before thread reads it!
        // Capture by value ([local]) instead, or join before local goes out of scope
    });
    t.detach();
}  // local destroyed here, thread may still be running

// MISTAKE 3: Cout interleaving
// With the default sync_with_stdio(true), concurrent writes to std::cout are
// not a data race, but each << is a separate call, so output from two
// threads can interleave mid-line. Use a mutex to serialize output.

// MISTAKE 4: Too many threads
void too_many() {
    // Creating 10,000 threads is a bad idea
    // Each thread reserves its own stack (defaults are commonly around
    // 1-8 MB, depending on the platform) and adds scheduling overhead
    // Use a thread pool for many short tasks
}
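For Mistake 3, serializing output with a mutex might look like the sketch below. The class `LinePrinter` and the function `print_from_threads` are hypothetical names. A std::ostringstream stands in for std::cout here so the combined output can be inspected. A string stream has no built-in synchronization, so the mutex is what makes the concurrent writes safe.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: one mutex guards every write to the shared stream.
class LinePrinter {
    std::ostream& out_;
    std::mutex m_;
public:
    explicit LinePrinter(std::ostream& out) : out_(out) {}
    void print(const std::string& line) {
        std::lock_guard<std::mutex> lock(m_);
        out_ << line << '\n';  // the whole line is written while holding the lock
    }
};

// Starts n threads that each print per_thread lines; returns everything written.
std::string print_from_threads(int n, int per_thread) {
    std::ostringstream sink;  // stand-in for std::cout
    LinePrinter printer(sink);
    std::vector<std::thread> threads;
    for (int t = 0; t < n; ++t) {
        threads.emplace_back([&printer, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                printer.print("thread " + std::to_string(t));
            }
        });
    }
    for (auto& th : threads) th.join();
    return sink.str();
}
```

The order of lines from different threads is still unpredictable, but each line arrives intact. Passing std::cout to `LinePrinter` instead of the string stream gives the same guarantee for console output.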

Practice Exercises

Exercise 1: Write a program that creates 4 threads, each computing the sum of one quarter of a large array. Combine the results in the main thread.

Exercise 2: Create a std::jthread that prints a message every second until stop is requested. Have main request stop after 5 seconds.

Exercise 3: Write a parallel file downloader simulation: create N threads that each “download” a file (sleep for a random duration) and print when done. Track how many are complete using std::atomic<int>.

Exercise 4: Demonstrate the data race problem: create a shared counter incremented by two threads without synchronization, then fix it with std::atomic. Show the different results.

Threads give you parallelism, but sharing data between them requires careful synchronization. The next lesson covers mutexes and synchronization primitives — the tools that make shared state safe.
