The `volatile` keyword serves several critical purposes in embedded systems:

1. Prevents Optimization

```c
volatile uint8_t status_register;

while (status_register == 0) {
    // Wait for status change
}
```

Without `volatile`, the compiler might optimize this into an infinite loop (or hoist the read out of the loop), assuming the variable can't change. With `volatile`, it rereads the register on every iteration.

2. Key Use Cases:

- Hardware registers: memory-mapped registers that can change independently of program flow

```c
volatile uint32_t* UART_STATUS = (volatile uint32_t*)0x40001000;
```

- Variables shared with an ISR (Interrupt Service Routine)

```c
volatile bool flag_from_interrupt = false;

void ISR_Handler(void) {
    flag_from_interrupt = true;
}
```

- Memory shared between multiple threads/processes

```c
volatile uint32_t shared_counter = 0;
```

3. Common Issues It Prevents:
- Stale reads of data shared between interrupts and main code
- Missed hardware status changes
- Incorrect optimization in timing-critical code

4. Important Considerations:
- `volatile` doesn't guarantee atomic operations or ordering; it only forces the access to happen
- It can impact performance due to forced memory reads/writes
- It should only be used when necessary, as it prevents certain optimizations
Nagesh B’s Post
-
#day16 #embeddedsystems
DSA:
Reference Variable: `int &a = b;`
👉🏻 Same memory location as the variable it references, but a different name.
👉🏻 Need: to pass arguments to a function by reference.
👉🏻 Why returning a reference/pointer to a local variable is bad practice: that variable's scope is the function itself, so when the function returns, that memory block may later hold a different value.
👉🏻 Why sizing an array at runtime (user input) is bad practice: when the array size is known at compile time, the compiler can reserve enough stack even for a big array; if the size only arrives at runtime, a large value can overflow the stack.
👉🏻 Solution: for runtime sizes, use dynamic memory allocation, which uses the heap and returns the address of the first element of the array/variable.
👉🏻 Pitfall:

```cpp
while (1) {
    int a = 1;                  // stored on the stack
    int *p = new int[1000000];  // p on the stack, the 10^6-int array on the heap
}
```

Here, a and p go out of scope every iteration and are reclaimed automatically, but the heap block must be freed manually with the `delete[]` operator; otherwise the program leaks memory each iteration and will eventually fail to allocate.

ARM Cortex M4: Stack memory:
👉🏻 It is Full and Descending: the stack starts at a higher memory address and grows toward lower addresses, SP points at the last stored item, a push decrements SP and then stores, and a pop loads and then increments SP.
👉🏻 Thread Mode: all application code executes in Thread mode (also called user mode).
👉🏻 Handler Mode: all exception handlers execute in Handler mode.
👉🏻 Privileged vs. non-privileged execution: Handler mode is always privileged.
👉🏻 MSP (Main Stack Pointer): the default stack pointer after reset, used for all exception/interrupt handlers and for code that runs in Thread mode.
👉🏻 PSP (Process Stack Pointer): an alternative stack pointer that can only be used in Thread mode; it is usually used for application tasks in embedded systems and embedded OSes.
👉🏻 Understood the AAPCS standard and the caller/callee function responsibilities.
Understood what happens on the stack when an interrupt occurs. 👉🏻 Wrote inline assembly and generated an exception to verify whether SP changes from PSP to MSP, as shown in the attached image:
-
JEP 491 (https://lnkd.in/gNsqsNGY) is here. More than anything, this JEP shows that the JDK community is taking the pinning issue seriously. In the section "The reason for pinning", Alan explains the reason behind pinning in layman's terms. Even for someone who hasn't gone through the JDK runtime source code, it isn't that difficult to guess why synchronization has a pinning problem: synchronized blocks must somehow be aware of control flow jumping out of the block. Sequential execution poses no challenge, but non-sequential exits like return or an exception require the synchronization mechanism to be aware of the native stack frames in some way (think setjmp and longjmp). This is the likely reason for pinning. Let's do a thought experiment in which synchronized blocks cause no pinning. Say a task is running on virtual thread VT1, which is mounted on platform thread PT1. The task enters a synchronized scope; the synchronization mechanism needs to remember the native stack frame, so it stores something from PT1. The task then blocks on some I/O, and the runtime unmounts VT1 from PT1. After some time the I/O unblocks and the task is ready to run. The runtime happens to mount VT1 on a different platform thread, PT2, since PT1 is busy. However, the native frame of PT1 is not valid on PT2, and we end up with a memory access violation. This is why the runtime avoids unmounting a virtual thread while it is inside a synchronized block, and the VT ends up pinned. As you can see, not just synchronization: any mechanism that uses native stack frames will have a pinning problem. There are other cases that use native frames, but none is anywhere near synchronized blocks in importance or prevalence. The fix? Making sure that the synchronization mechanism, in other words object monitor management, is independent of the platform thread. Easier said than done. Watch out for changes under src/hotspot/share/runtime.
Since native frames are involved, changes are likely in CPU-specific files too (src/hotspot/cpu). Until the JEP is delivered, let us use this opportunity to get rid of the synchronized keyword: the ReentrantLock and Condition APIs make the code much more expressive and robust. Of course, older code is much harder to migrate. Hopefully with JEP 491 we won't need to worry about pinning anymore. Disclaimer: opinion is strictly mine.
-
Check out NimbleEdge engineering team member Arpit Saxena's latest blog on hardware memory models 🗒️ In the blog, Arpit breaks down hardware memory models and the complexities of relaxed concurrency, focusing mostly on ARM and IBM POWER architectures, while also motivating the C++ memory model 💡 Ideal for developers looking to deepen their understanding of low-level memory synchronization, this blog offers valuable insights into ensuring correctness while squeezing out performance! https://lnkd.in/gS9_TqX7
Hardware Memory Models
arpit-saxena.com
-
RISC (Reduced Instruction Set Computer):
- Simplified instruction set: fewer instructions, typically uniform in length.
- Execution speed: most instructions execute in a single clock cycle due to their simplicity.
- Hardware complexity: simple, allowing for more efficient pipelining.
- Focus on software: more emphasis on the compiler to optimize the instruction stream.

CISC (Complex Instruction Set Computer):
- Complex instruction set: a larger and more versatile set of instructions, some of which are multi-step operations.
- Direct memory access: operations can be performed directly on memory, reducing the need for intermediate load/store operations.
- Hardware complexity: more complex, making pipelining difficult but providing a wide variety of instructions.
- Focus on hardware: the complexity is managed at the hardware level, potentially reducing the need for extensive compiler optimization.
-
It is hard to develop a new architecture and get software to fit it. ARM and x86 dominate, so newcomers face real hurdles: the usual OSes and software won't simply run on a new architecture, and that is the hard part. To run existing software on a new architecture without extensive manual adaptation, you can consider these approaches:
1. Emulation and virtualization: emulators can mimic other architectures, allowing existing software to run, though often with performance trade-offs (e.g., Apple's Rosetta 2 for ARM-based Macs).
2. Cross-compilation tools: compilers like LLVM can help build code for different architectures, though they may require tweaks for optimization.
3. Binary translation: this dynamic technique converts executable code from one architecture to another at runtime, improving compatibility but possibly impacting speed.
So we have some partial solutions, but none is fully efficient; is there any other solution? [email protected]
-
What do you think about this: Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood (https://lnkd.in/gTJJmjhE)
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood
blog.codingconfessions.com
-
Get up and running with system performance optimization at a high level.
1. What do performance problems look like?
2. How do we measure performance?
3. What principles does the performance of a system depend on?
4. How do we design for, or improve, the performance of a system?
5. How do we minimize the latency of a system?
6. How do we maximize CPU utilization?
7. How do we minimize memory-related latency?
8. How do we minimize network-related latency while transferring data?
9. How do we improve disk utilization and minimize disk latency?
10. How do we improve the throughput of a system?
11. What place do locking and coherence have in improving or degrading the performance of a system?
12. What are the two types of locking?
13. What is pessimistic locking?
14. What is optimistic locking?
15. Where do we use them to improve the performance of a system?
16. How do we cache static data?
17. How do we cache dynamic data?
18. What are the challenges involved in caching data?
19. How can caching help improve the performance of a system?
#coding #systemdesign #programming
-
Cache this, cache that... I kept hearing this term in system design and beyond, so I dove into a rabbit hole to understand why it is required at all. Well, to start with, caches are used almost everywhere, whether we use them explicitly or not:
- The CPU has its own L1, L2, and L3 caches
- The same goes for storage devices like hard disks
- File data is generally written to the file cache (in memory)
- Browsers and many applications cache static data like multimedia (have you ever noticed websites working smoothly after the first load? That's because static assets are mostly cached by the browser)
- Those practicing or having some experience with #dsa might have heard of dynamic programming, which is caching of subproblem results
Caches are mainly used for the following reasons:
- To store "frequently" accessed data
- To store data "closer" to the request, reducing the latency of fetching it
Releasing a video later, diving deeper into the usage of caches across various levels of a system and beyond. Meanwhile, I leave you with this image as a trailer: #software #systemdesign
-
Technical detail on what caused #CrowdStrike's woes is hard to find as yet, but the available info suggests an invalid pointer was dereferenced deep inside the kernel, and page faults inside the kernel have absolutely nowhere to go other than the blue screen. This shows, yet again, that memory safety is too important to leave to software to fix. It also suggests that some of the more complex parts of modern processors' memory subsystems might be just too complex to use inside the OS kernel. Hardware-based memory safety approaches (e.g. #CHERI, VyperCore) can solve challenges that Rust software can't reach. But new memory abstractions might be needed to avoid the possibility of something inside the kernel going AWOL in the first place. Initial thinking suggests this is something that #VyperCore's new memory abstraction can help with. We'll write more once we've delved deeper.
-
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood https://lnkd.in/eQTs42Mu
Two Threads, One Core: How Simultaneous Multithreading Works Under the Hood
blog.codingconfessions.com