What is the ForEach activity in Azure Data Factory? The ForEach activity defines a repeating control flow in an Azure Data Factory or Synapse pipeline. It is used to iterate over a collection and execute the specified activities in a loop, much like the foreach looping structure in programming languages. #dataanalyst #dataengineering #Azure #ADF
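The post doesn't include a sample definition, so purely for illustration here is a minimal sketch of a ForEach activity expressed as a Python dict that mirrors the pipeline JSON; the parameter name (fileList), the inner Copy activity, and the dataset references are hypothetical placeholders, and the Copy activity's source/sink typeProperties are omitted for brevity.

```python
# Sketch of a ForEach activity as a Python dict mirroring ADF pipeline JSON.
# Names (fileList, CopyOneFile, SourceDS, SinkDS) are hypothetical placeholders.
foreach_activity = {
    "name": "IterateOverFiles",
    "type": "ForEach",
    "typeProperties": {
        # The collection to iterate over, supplied here as a pipeline parameter
        "items": {"value": "@pipeline().parameters.fileList", "type": "Expression"},
        "isSequential": False,   # set True to force one-at-a-time execution
        "batchCount": 10,        # max parallel iterations when isSequential is False
        # Activities executed once per item; @item() refers to the current element
        "activities": [
            {
                "name": "CopyOneFile",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceDS", "type": "DatasetReference",
                            "parameters": {"fileName": "@item()"}}],
                "outputs": [{"referenceName": "SinkDS", "type": "DatasetReference"}],
                # source/sink typeProperties omitted for brevity
            }
        ],
    },
}
```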
More Relevant Posts
-
Apache Spark offers data engineers a powerful platform for transforming and processing data. The ability to include Spark notebooks in a pipeline enables you to automate Spark processing and integrate it into a data integration workflow. Data Engineering milestone completed: https://lnkd.in/dTc2STV6
-
🌟 Exciting Learning Journey! 🌟 I have expanded my skill set with Apache Spark and Azure Synapse Analytics.
📣 Here's what I've achieved:
📝 Describe Notebook and Pipeline Integration: Understanding how to seamlessly connect notebooks with data pipelines for streamlined data processing workflows.
📝 Use a Synapse Notebook Activity in a Pipeline: Implementing notebooks as activities within Azure Synapse Analytics pipelines to execute complex data processing tasks efficiently.
📝 Use Parameters with a Notebook Activity: Leveraging parameters to make notebook activities more dynamic and reusable, enhancing the flexibility and scalability of data workflows (a rough sketch follows below).
These new skills empower me to leverage scalable, distributed data processing platforms and seamlessly integrate them into Azure Synapse Analytics pipelines.
#DataEngineering #ApacheSpark #AzureSynapse #BigData #DataScience #DataAnalytics #MachineLearning #DataProcessing #CloudComputing #Azure #ETL #DataIntegration #DistributedComputing #PipelineIntegration #NotebookActivity #SynapseAnalytics #AzureDataFactory #SQL #Python #MicrosoftAzure #TechSkills #CareerGrowth #ProfessionalDevelopment #DataDriven
Use Spark Notebooks in an Azure Synapse Pipeline (learn.microsoft.com)
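To make the notebook-parameters idea above concrete, here is a minimal, hedged PySpark sketch of a Synapse notebook whose defaults get overridden by the pipeline's Notebook activity; the parameter names, storage account, folder layout, and target schema are all hypothetical.

```python
# Cell toggled as the "Parameters" cell in the Synapse notebook.
# The defaults below are overridden by the Base parameters configured on the
# pipeline's Notebook activity. All names and paths here are hypothetical.
source_folder = "raw/sales"
run_date = "2024-01-01"

# Subsequent cells use the injected values like ordinary variables
# (`spark` is predefined in a Synapse notebook session).
path = f"abfss://data@mystorageaccount.dfs.core.windows.net/{source_folder}/{run_date}/"
df = spark.read.parquet(path)

df_clean = df.dropDuplicates()
df_clean.write.mode("overwrite").saveAsTable(f"staging.sales_{run_date.replace('-', '_')}")

# Optionally return a value to the pipeline, readable from the activity output.
from notebookutils import mssparkutils  # available in the Synapse Spark runtime
mssparkutils.notebook.exit(str(df_clean.count()))
```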
-
More and more people are migrating their Databricks workloads to Unity Catalog. This is great news! 🧱 However, if your team has been using Databricks for a long time, this can be a VERY painful migration. I know from experience and witness it every time. Why, you ask? Because you need to use a different cluster with a different security model. Unity Catalog needs a cluster with Shared access mode, and most people are still using a No Isolation Shared (job) cluster. This security model will enforce some refactoring of "old" logic.
Here are a couple of things to keep in mind:
🧱 Tables no longer need to be created in the hive_metastore catalog but in the right (UC) catalog. Setting a default catalog in your cluster settings greatly helps here.
🧱 You can't directly interact with Delta's files anymore. Technically you still can, but it will not trigger data lineage or query insights. Make sure you reference the catalog, schema and table in your logic, not the storage location.
🧱 If you want to work with external data, you'll need to create a volume. Reading from DBFS mounts is no longer supported, as it interferes with the new UC security model, unless you use a single-user cluster.
🧱 Directly querying JDBC or creating JDBC federated tables no longer works in Unity Catalog. You need to use Lakehouse Federation and create connections for external databases.
🧱 Some legacy Spark classes and/or Scala UDFs no longer work. For example, spark.catalog._jcatalog doesn't work anymore; you'll need to use the information_schema schemas instead.
Also make sure to check out Databricks UCX, Databricks' companion for upgrading to Unity Catalog. Link here: https://lnkd.in/eWrx-4s9. It will definitely help if you've got a lot of migration work to do. Databricks also has a more clickity way of upgrading data assets in the workspace manually. Any specific questions, feel free to hit me up 🤙 #databricks #dataengineering #unitycatalog
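To make the "reference the catalog/schema/table instead of the storage path" point concrete, here is a small, hedged before/after sketch in PySpark; the catalog, schema, volume, and table names are made-up placeholders.

```python
# Pre-Unity Catalog style: hive_metastore tables and direct DBFS mount access.
# (Kept as comments only; this is the pattern that typically needs refactoring.)
# df = spark.read.format("delta").load("/mnt/raw/sales")
# df.write.mode("overwrite").saveAsTable("sales_clean")   # lands in hive_metastore

# Unity Catalog style: three-level namespace plus a Volume for external files.
# Catalog/schema/volume/table names below are hypothetical.
spark.sql("USE CATALOG analytics")

df = (spark.read
      .option("header", "true")
      .csv("/Volumes/analytics/raw/landing/sales/"))   # UC Volume instead of a DBFS mount

(df.write
   .mode("overwrite")
   .saveAsTable("analytics.gold.sales_clean"))          # lineage and query insights now apply
```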
-
We’re thrilled to announce a significant breakthrough in data transmission within the financial services sector, powered by our deep knowledge of distributed computing (#ApacheSpark) and Azure Data Engineering. Our project, now live after a rigorous testing phase, involves a robust process flow that begins with reading JSON data from Azure Data Lake Storage. The data is then processed in an Azure Synapse Analytics notebook and finally written to a SQL warehouse (responses). The core transmission script, written in Python, uses PySpark to process large amounts of data in parallel. It handles token expiration and sends large volumes of data using asynchronous requests and multi-threading. With this setup, we’ve successfully transmitted over 10 million rows of data in under 3 hours via our API endpoint. Proud of this achievement and excited for what’s next! #DataTransmission #FinancialServices #DistributedComputing #AzureSynapse #ADF #AzureDataEngineering #Innovation
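The post doesn't share its code, but as a rough, hedged sketch of the pattern described (parallel PySpark processing plus multi-threaded HTTP calls per partition), something along these lines is typical; the endpoint, paths, batch size, and token handling are hypothetical simplifications, not the team's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import requests  # must be installed on the Spark executors

API_URL = "https://example.com/api/responses"   # hypothetical endpoint

def get_api_token():
    # Placeholder: in a real flow this would acquire/refresh an OAuth token
    # before it expires; a dummy value keeps the sketch self-contained.
    return "<token>"

def post_batch(batch, token):
    # One HTTP call per batch of rows; retries and token refresh omitted for brevity.
    resp = requests.post(
        API_URL,
        json=batch,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()

def send_partition(rows):
    token = get_api_token()
    batches, batch = [], []
    for row in rows:
        batch.append(row.asDict())
        if len(batch) == 500:          # arbitrary batch size for illustration
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)
    # Multi-threaded sends within each Spark partition
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda b: post_batch(b, token), batches))

# Runs inside a Synapse notebook where `spark` already exists; the path is hypothetical.
df = spark.read.json("abfss://data@mystorageaccount.dfs.core.windows.net/raw/responses/")
df.foreachPartition(send_partition)
```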
-
Preparation for DP-600 Certification 😋
Getting Data Into Fabric
Here's a breakdown of how data gets into #MicrosoftFabric:
#DataIngestion (ETL/ELT): Fabric supports various data ingestion methods such as:
- Dataflow: Prepares data using Power Query and transforms it for use in reports and datasets.
- Data Pipeline: Orchestrates complex data integration processes.
- Notebook: Leverages code (like Python or Spark) to handle large-scale data processing (a short sketch follows this post).
- Eventstream: Real-time ingestion of event data.
#Shortcuts: Speed up access to your external and internal data by creating shortcuts to:
- External sources (Amazon S3, ADLS, Google Storage, Dataverse).
- Internal sources (Lakehouse, Warehouse, and KQL tables).
#DatabaseMirroring: Easily mirror your data from external databases such as:
- Snowflake
- CosmosDB
- Azure SQL
Thanks to https://lnkd.in/dZYTftVZ
#DataEngineering #MicrosoftFabric #DataIntegration #CloudComputing
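As a small, hedged illustration of the Notebook ingestion path, assuming a Fabric notebook attached to a default lakehouse with a shortcut already created; the shortcut, file, column, and table names are hypothetical.

```python
# In a Fabric notebook attached to a default lakehouse, shortcuts surface under
# Files/ and managed Delta tables under Tables/. Names below are hypothetical.
orders = (spark.read
          .option("header", "true")
          .csv("Files/s3_sales_shortcut/orders/"))      # data exposed via an S3 shortcut

cleaned = orders.dropDuplicates(["order_id"]).filter("order_total > 0")

# Write as a managed Delta table so it is queryable from the SQL endpoint / Power BI
cleaned.write.mode("overwrite").format("delta").saveAsTable("orders_clean")
```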
-
😃 Excited to share another very cool project I have completed: "𝗔𝘇𝘂𝗿𝗲-𝗔𝗱𝘃𝗲𝗻𝘁𝘂𝗿𝗲-𝗪𝗼𝗿𝗸𝘀-𝗘𝗻𝗱_𝘁𝗼_𝗘𝗻𝗱-𝗗𝗮𝘁𝗮-𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲-𝗣𝗿𝗼𝗷𝗲𝗰𝘁".
🙇🏽♂️ Firstly, a big thanks to Mr. K Talks Tech for the fantastic demonstration video on YouTube.
🔹 𝗣𝗿𝗼𝗷𝗲𝗰𝘁 𝗢𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲: The objective was to create an end-to-end pipeline to ingest data from an on-prem server to the cloud at specific intervals.
🔹 𝗧𝗼𝗼𝗹𝘀 𝘂𝘀𝗲𝗱:
- Ingestion: Azure Integration Runtime & ADF (Azure Data Factory)
- Transformation: Azure Databricks
- Loading: Azure Synapse Analytics
- Reporting: Power BI
🔹 𝗔𝗯𝗼𝘂𝘁 𝗖𝗜/𝗖𝗗: Whenever code is pushed to the main branch, it is automatically deployed to the production environment.
- Dev environment: Continuous Integration
- Prod environment: Continuous Deployment
🔗 GitHub link: https://lnkd.in/gVvBrdsM
Links:
- Project: https://lnkd.in/g96Vp-yx
- CI/CD: https://lnkd.in/gB3xy9hD
💬 𝘿𝙧𝙤𝙥 𝙖 𝙘𝙤𝙢𝙢𝙚𝙣𝙩 𝙗𝙚𝙡𝙤𝙬 𝙖𝙣𝙙 𝙧𝙚𝙥𝙤𝙨𝙩 𝙞𝙛 𝙮𝙤𝙪 𝙛𝙞𝙣𝙙 𝙩𝙝𝙞𝙨 𝙪𝙨𝙚𝙛𝙪𝙡!😊
#AzureDataEngineering #DataAnalytics #CloudComputing #MicrosoftAzure #dataengineering #powerbi #databricks #ADF #DATA #CICD #PYTHON #PYSPARK
-
Data orchestration tools are crucial for managing data from ingestion to machine learning and analysis preparation. Recently, I explored Azure Data Factory (ADF) and Apache Airflow and wanted to note some of the key differences in a post:
🛠️ Azure Data Factory (ADF):
- Seamless Azure integration
- User-friendly drag-and-drop interface
- Fully managed service by Microsoft
- Extensive library of built-in connectors
⚙️ Apache Airflow:
- Highly customizable and flexible
- Open-source with a strong community
- Code-based workflows in Python (example DAG below)
- Self-hosted, providing full control
Choose ADF for an easy-to-use, managed solution within Azure, or Airflow for a flexible, open-source option for complex workflows.
#DataOrchestration #AzureDataFactory #ApacheAirflow #DataEngineering
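To show what "code-based workflows in Python" looks like in practice, here is a minimal Airflow DAG sketch, assuming Airflow 2.x; the DAG id, schedule, and task logic are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")   # placeholder logic

def load():
    print("write data to the warehouse")         # placeholder logic

with DAG(
    dag_id="example_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task   # run extract, then load
```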
-
Through completing this lab, I've learned to use Delta Lake with Apache Spark in Azure Synapse Analytics. This enabled me to perform data operations with relational semantics on top of a data lake. Exploring Delta Lake's functionality, I discovered its capability to process both batch and streaming data, paving the way for building a lakehouse architecture with Spark. https://lnkd.in/eGJ9xpNM #Azure #DataEngineering #DeltaLake
Use Delta Lake with Spark in Azure Synapse Analytics (microsoftlearning.github.io)
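A brief, hedged PySpark sketch of the batch-plus-streaming idea from the lab, assuming a Synapse notebook where `spark` is available; the storage paths and table name are hypothetical placeholders.

```python
delta_path = "abfss://data@mystorageaccount.dfs.core.windows.net/delta/products"  # hypothetical

# Batch: create a Delta table on top of the data lake
raw = spark.read.option("header", "true").csv(
    "abfss://data@mystorageaccount.dfs.core.windows.net/raw/products/")
raw.write.format("delta").mode("overwrite").save(delta_path)

# Relational semantics: register the files as a table and query it with SQL
spark.sql(f"CREATE TABLE IF NOT EXISTS products USING DELTA LOCATION '{delta_path}'")
spark.sql("SELECT COUNT(*) AS product_count FROM products").show()

# Streaming: the same Delta location can also be consumed as a stream
stream = spark.readStream.format("delta").load(delta_path)
query = (stream.writeStream
         .format("delta")
         .option("checkpointLocation", f"{delta_path}_checkpoint")
         .start(f"{delta_path}_copy"))   # stop later with query.stop()
```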
-
Here are the top 3 ways of sharing Databricks assets with data consumers. All have their pros and cons.
𝐔𝐧𝐢𝐭𝐲 𝐂𝐚𝐭𝐚𝐥𝐨𝐠
Pros:
+ Sharing is the click of a button, or a Terraform grant.
+ Comes with things like column-level data lineage and query insights.
+ UC really helps with getting to know your downstream consumers.
Cons:
- To get all the goodies, consumers must use Databricks.
The future is looking really bright for UC, with lots of development happening.
𝐃𝐚𝐭𝐚𝐛𝐫𝐢𝐜𝐤𝐬 / 𝐒𝐩𝐚𝐫𝐤 𝐂𝐨𝐧𝐧𝐞𝐜𝐭
Pros:
+ Users can query using Go, Rust, Python, Scala and Java (in Spark 4.0).
Cons:
- Setting it up has some prerequisites, such as configuring the right U2M or M2M authentication.
- The data publisher pays the bill.
- It's easy to lose track of consumers.
I see this as a data integration solution rather than a data sharing solution.
𝐃𝐞𝐥𝐭𝐚 𝐒𝐡𝐚𝐫𝐢𝐧𝐠
Pros:
+ Easy to set up: just share a link with a consumer.
+ Very easy integration with tools such as Power BI, Tableau, pandas.
+ You can also share AI models and notebooks.
Cons:
- Hard to manage many consumers at scale.
- Secured cloud storage might pose a challenge.
- Tabular data needs to be in Delta format.
Fun fact: they are all open source.
#databricks #dataengineering #datasharing
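As an example of how lightweight the Delta Sharing consumer side is, here is a hedged sketch using the open-source delta-sharing Python client; the profile file path and the share/schema/table names are hypothetical.

```python
# pip install delta-sharing
import delta_sharing

# Credential ("profile") file that the data provider sends to the consumer
profile_file = "/path/to/config.share"

# Address format: <profile>#<share>.<schema>.<table>  (names are hypothetical)
table_url = f"{profile_file}#sales_share.finance.transactions"

# Load straight into pandas -- no Databricks workspace needed on the consumer side
df = delta_sharing.load_as_pandas(table_url)
print(df.head())

# Or, with a Spark session that has the delta-sharing connector available:
# df_spark = delta_sharing.load_as_spark(table_url)
```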
-
Today I earned my "Introduction to Azure Data Lake Storage Gen2" badge! I’m so proud to be celebrating this achievement and hope this inspires you to start your own @MicrosoftLearn journey! #python #pythondays #shuffling #datascience #dataengineering #datalake #databricks #dataanalytics
Introduction to Azure Data Lake Storage Gen2 (learn.microsoft.com)