Check out NimbleEdge engineering team member Arpit Saxena's latest blog on hardware memory models 🗒️ In the blog, Arpit breaks down hardware memory models and the complexities of relaxed concurrency, focusing mostly on ARM and IBM POWER architectures, while also motivating the C++ memory model 💡 Ideal for developers looking to deepen their understanding of low-level memory synchronization, this blog offers valuable insights into ensuring correctness while squeezing out performance! https://lnkd.in/gS9_TqX7
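Not code from the blog itself, but a minimal sketch of what "relaxed concurrency" can mean in practice: the classic store-buffering litmus test, written with C++ relaxed atomics. Both threads can end up reading 0, an outcome no simple interleaving of the two threads could produce.

```cpp
// Store-buffering litmus test (a sketch, not from the linked blog): with
// relaxed ordering, each thread's load may be satisfied before its own store
// becomes visible, so r1 == 0 && r2 == 0 is a legal outcome.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

void thread1() {
    x.store(1, std::memory_order_relaxed);   // W x = 1
    r1 = y.load(std::memory_order_relaxed);  // R y
}

void thread2() {
    y.store(1, std::memory_order_relaxed);   // W y = 1
    r2 = x.load(std::memory_order_relaxed);  // R x
}

int main() {
    for (int i = 0; i < 100000; ++i) {
        x = 0;
        y = 0;
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {             // both loads "overtook" the stores
            std::printf("reordering observed on iteration %d\n", i);
        }
    }
}
```

Switching the four operations to std::memory_order_seq_cst (the default) rules that outcome out, which is exactly the kind of guarantee the C++ memory model lets you opt into, at some cost on ARM and POWER.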
-
What do you think about this: Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood (https://lnkd.in/gTJJmjhE)
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood
blog.codingconfessions.com
-
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood https://lnkd.in/eQTs42Mu
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood
blog.codingconfessions.com
-
Caught up with this today: Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood (https://lnkd.in/gVPrEtEy)
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood
blog.codingconfessions.com
-
C++ Memory Models and Atomic Operations
Modern applications frequently involve concurrency to leverage multicore processors for faster performance. However, concurrency introduces challenges in memory management, data access synchronization, and consistency. This article explores the C++ memory model and atomic operations, which are essential for writing efficient, safe concurrent programs. We'll cover concepts such as memory ordering, the C++ memory model's rules, atomic operations, and practical examples to illustrate the correct usage of these concepts. https://lnkd.in/gfYUzmns Read more ... https://lnkd.in/gAPu8y69
Understanding Atomics and Memory Ordering
dev.to
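Not code from the linked article, but a minimal sketch of the acquire/release idiom such articles typically build up to: a release store publishes data that an acquire load in another thread is then guaranteed to observe.

```cpp
// Acquire/release message passing (a sketch, not from the linked article):
// the release store on `ready` makes the earlier write to `payload` visible
// to any thread that observes ready == true with an acquire load.
#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;                // plain data, published via the flag
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                      // 1. write the data
    ready.store(true, std::memory_order_release);      // 2. publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // 3. wait for the flag
    std::printf("payload = %d\n", payload);            // 4. guaranteed to print 42
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}
```

The pairing of the release store with the acquire load is what creates the happens-before edge; demoting both to memory_order_relaxed would remove the guarantee in step 4.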
-
The `volatile` keyword serves several critical purposes in embedded systems:

1. Prevents Optimization

```c
volatile uint8_t status_register;

while (status_register == 0) {
    // Wait for status change
}
```

Without `volatile`, the compiler might optimize this to an infinite loop, assuming the variable can't change. With `volatile`, it rereads the register each time.

2. Key Use Cases:
- Hardware Registers: Memory-mapped registers that can change independently of program flow

```c
volatile uint32_t* UART_STATUS = (volatile uint32_t*)0x40001000;
```

- ISR (Interrupt Service Routine) Shared Variables

```c
volatile bool flag_from_interrupt = false;

void ISR_Handler(void) {
    flag_from_interrupt = true;
}
```

- Memory shared between multiple threads/processes

```c
volatile uint32_t shared_counter = 0;
```

3. Common Issues It Prevents:
- Race conditions between interrupts and main code
- Missed hardware status changes
- Incorrect optimization in timing-critical code

4. Important Considerations:
- `volatile` doesn't guarantee atomic operations (see the sketch below)
- It can impact performance due to forced memory reads/writes
- Should only be used when necessary, as it prevents certain optimizations
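To complement point 4 above, here is a hedged sketch (not part of the original post) of the usual fix when a counter is updated from both an interrupt and the main loop: an atomic type instead of `volatile`.

```cpp
// A sketch of point 4: a volatile counter updated from an ISR and from main
// code can still lose increments, because read-modify-write is not atomic.
// std::atomic (or C11 <stdatomic.h>) closes that gap.
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> shared_counter{0};   // replacement for `volatile uint32_t`

// Called from an ISR-like context in this sketch.
void on_interrupt() {
    shared_counter.fetch_add(1, std::memory_order_relaxed);  // atomic increment
}

// Called from the main loop.
uint32_t read_and_reset() {
    return shared_counter.exchange(0, std::memory_order_relaxed);  // atomic swap
}
```

On a small single-core Cortex-M part, briefly disabling interrupts around the read-modify-write is a common alternative; either way, `volatile` on its own does not make the increment atomic.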
-
JEP 491 (https://lnkd.in/gNsqsNGY) is here. More than anything, this JEP lets you know that the JDK community is taking the pinning issue seriously. In the section "The reason for pinning", Alan has explained the reason behind pinning in layman's terms.

Even for a person who hasn't gone through the JDK runtime source code, it isn't that difficult to guess why synchronization has a pinning problem. synchronized blocks must in some way be aware of control flow jumping out of the block. Sequential execution doesn't pose any challenge, whereas non-sequential exits like return or an exception do need the synchronization mechanism to be aware of the native stack frames in some way (think of setjmp and longjmp). This is the likely reason for pinning.

Let's do a thought experiment where there is no pinning for synchronized blocks. Say a task is running in virtual thread VT1, which is mounted on platform thread PT1. The task enters a synchronized scope. The synchronization mechanism needs to remember the native stack frame, so it stores something tied to PT1. The task then blocks on some I/O, and the runtime unmounts VT1 from PT1. After some time the I/O completes and the task is ready to run. The runtime happens to mount VT1 on a different platform thread, PT2, since PT1 is busy. However, the native frame of PT1 is not relevant in PT2, and we end up with a memory access violation. This is why the runtime avoids unmounting a virtual thread while it is inside a synchronized block, and the VT ends up getting pinned. As you can see, it is not just synchronization: any mechanism that relies on the native stack frame will have the pinning problem. There are other cases that use native frames, but they are nowhere near as important or prevalent as synchronized blocks.

The fix? Making sure that the synchronization mechanism (in other words, object monitor management) is independent of the platform thread. Easier said than done. Watch out for changes in the src/hotspot/share/runtime files. Since native frames are involved, changes are likely in CPU-specific files too (src/hotspot/cpu).

Till the JEP is delivered, let us use this opportunity to get rid of the synchronized keyword. The ReentrantLock and Condition APIs make the code much more expressive and robust. Of course, older code is much more difficult to migrate. Hopefully with JEP 491, we won't be worrying about pinning anymore.

Disclaimer: Opinion is strictly mine.
JEP 491: Synchronize Virtual Threads without Pinning
openjdk.org
-
How to Retrieve System Information Using The CPUID Instruction
When developing a bootloader/kernel, understanding the underlying architecture is crucial for optimizing performance and compatibility between software and hardware. One important yet sometimes overlooked tool available to engineers for querying and ... Read more on the following blog post!
How to Retrieve System Information Using The CPUID Instruction
freecodecamp.org
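As a rough illustration of the kind of query the article covers, and assuming a GCC or Clang toolchain on x86/x86-64 (whose <cpuid.h> provides the __get_cpuid helper), reading the vendor string from leaf 0 looks like this:

```cpp
// Reading the CPU vendor string with CPUID leaf 0 (a sketch assuming
// GCC/Clang on x86/x86-64, using the compiler-provided <cpuid.h> helper).
#include <cpuid.h>
#include <cstdio>
#include <cstring>

int main() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        std::puts("CPUID not supported");
        return 1;
    }
    // The 12-byte vendor string is returned in EBX, EDX, ECX (in that order).
    char vendor[13] = {0};
    std::memcpy(vendor + 0, &ebx, 4);
    std::memcpy(vendor + 4, &edx, 4);
    std::memcpy(vendor + 8, &ecx, 4);
    std::printf("max basic leaf: %u, vendor: %s\n", eax, vendor);
    return 0;
}
```

MSVC exposes the same data through its __cpuid intrinsic in <intrin.h>; bare-metal code can also issue the instruction directly with inline assembly.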
-
How Much Stack Memory Do I Need for My ARM Cortex-M Applications?

For embedded developers, determining the right amount of stack memory is crucial for the performance and reliability of your application. Too little can lead to stack overflows and system crashes, while too much wastes precious RAM resources.

💡 Here's a quick breakdown of key considerations:
- Stack Memory Usage: Function calls, local variables, and interrupts all consume stack space.
- Challenges in Sizing: Interrupt nesting, recursion, and dynamic function calls make it hard to predict usage.
- Approaches: Static analysis to estimate usage. Runtime monitoring tools like Arm's Keil MDK to track actual stack usage (a minimal stack-painting sketch follows below the link). Testing under peak conditions for worst-case estimation.
- Tools: Profiling and analysis tools help optimize and ensure you are using just the right amount of stack space without wasting RAM.

Accurate stack sizing leads to better efficiency and performance in your ARM Cortex-M applications. Start small, monitor, and adjust as needed! ⚙️

Want to dive deeper? Check out the full post on Arm Community Blog: https://lnkd.in/dpHChS5U

#EmbeddedSystems #ARM #CortexM #StackMemory #RTOS #TechLearning
How much Stack Memory do Cortex-M applications need
community.arm.com
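As a sketch of the runtime-monitoring idea mentioned in the breakdown above: paint the stack with a known pattern early at boot, then scan for the high-water mark while the system runs. The linker symbols __stack_limit and __stack_top are assumptions made for illustration; use whatever symbols your linker script actually defines.

```cpp
// Stack painting sketch (assumed linker symbols, simplified pointer math):
// fill the unused part of the stack with a pattern at boot, then later count
// how many words were never overwritten to find the high-water mark.
#include <cstddef>
#include <cstdint>

extern "C" uint32_t __stack_limit[];  // lowest address of the stack region (assumed symbol)
extern "C" uint32_t __stack_top[];    // highest address of the stack region (assumed symbol)

constexpr uint32_t kPaintPattern = 0xDEADBEEF;

// Call very early at boot, before deep call chains use the painted region.
void paint_unused_stack() {
    uint32_t margin;                       // a local, so its address tracks the current SP
    uint32_t* p = __stack_limit;
    // Paint upward from the stack limit, stopping a few words short of the
    // current stack pointer (a deliberate simplification for this sketch).
    while (p < &margin - 16) {
        *p++ = kPaintPattern;
    }
}

// Later (e.g. periodically or on a debug command), report untouched bytes.
size_t unused_stack_bytes() {
    const uint32_t* p = __stack_limit;
    while (p < __stack_top && *p == kPaintPattern) {
        ++p;
    }
    return static_cast<size_t>(p - __stack_limit) * sizeof(uint32_t);
}
```

Run the worst-case scenarios the post describes, read the watermark, then trim the stack size with a safety margin on top.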
-
As I delved into perf, I noticed that much of its documentation is scattered across the internet, making it challenging to grasp the full scope of its capabilities. What I found lacking was a connection between the high-level software perspective and the low-level architecture that perf provides insights into. For learning and sharing purposes, I will start a series about using perf. It will go from the low-level overview we need all the way to step-by-step commands to profile and analyze with perf. In this first post, I will introduce an overview of modern CPU architecture, which I believe every developer who cares about performance should learn. https://lnkd.in/gFMm2agn
Introduction to Modern Processor Architecture (Part 1)
http://phamlh.wordpress.com