Shashank Mishra 🇮🇳

Bengaluru, Karnataka, India

176K followers 500+ connections

Prophecy

Motilal Nehru National Institute Of Technology

About

Experienced Data Engineer with a demonstrated history of solving complex data problems…

Articles by Shashank

Data Engineer

Apr 10, 2020

Many times people have asked me, how can we move into Data Engineering? What are the roles and responsibilities of a…

2 Comments

Contributions

What skills will future data engineers need?

The field of Data Engineering is expanding at an unprecedented pace. Relying solely on foundational tools like Python, SQL, and Spark will no longer suffice. Below is a comprehensive list of skills essential for anyone aspiring to be an elite data engineer in today's world 🌟
1.) Foundational Skills: SQL, Python, BigData Fundamentals
2.) Cloud: AWS, GCP, Azure
3.) Databases: PostgreSQL, MongoDB
4.) Processing: Apache Spark, Flink, Beam, Databricks
5.) Orchestration: Apache Airflow, Mage, Kestra
6.) CI/CD: Git, Jenkins, Docker
7.) Cluster Management: Kubernetes
8.) ETL Tools: Informatica, DBT, AirByte
9.) Monitoring: Grafana, Splunk
10.) Streaming: Apache Kafka
11.) Lakes: AWS Lake Formation, Delta Lake
12.) Warehousing: Snowflake, BigQuery

Shashank Mishra 🇮🇳 contributed 1 year ago 17 Upvotes

Activity

Databricks has raised $10 billion in a funding round and now valued at $62 billion 🤯🤯🤯 Itni funding bhi milti hai kya (Someone can really get this…

Liked by Shashank Mishra 🇮🇳
Databricks has raised $10 billion in a funding round and now valued at $62 billion 🤯🤯🤯 Itni funding bhi milti hai kya (Someone can really get this…

Posted by Shashank Mishra 🇮🇳
10 Red Flags to Watch Out for in a Data Engineer 🚩 The role of a Data Engineer is pivotal, but not all engineers are created equal. Here are 10 red…

Liked by Shashank Mishra 🇮🇳

Experience

Prophecy

Bengaluru, Karnataka, India

Gurugram, Haryana, India

Bengaluru, Karnataka, India

Gurgaon, Haryana, India

Noida Area, India

Noida Area, India

Noida Area, India

Noida

Noida Area, India

Education

Motilal Nehru National Institute Of Technology

2014 - 2017

Activities and Societies:
• Coordinator, Computer Club Classes, NIT Allahabad, 2016
• Dance Coordinator, SWAGAT (Cultural festival) of NIT Allahabad, 2015
• Volunteer, Pahal (NGO); personally conducted 100+ classes for 150+ slum kids
2011 - 2014

Projects

Salesforce to Redshift Ingestion - Migration from Informatica to Native AWS

Feb 2021 - Nov 2021

-> Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
- Crafted a generic, scalable native AWS solution for Salesforce to Redshift ingestion
- Moved ingestion pipelines off the third-party tool Informatica, saving its heavy license fee
- The generic framework helped other business units smoothly ingest newly onboarded Salesforce objects into the Redshift data lake
Incremental Ingestion pipeline – Employee Benefits Data

May 2020 - Feb 2021

-> Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
- Built a generic, optimized ingestion pipeline for highly critical and confidential Employee Benefits data
- Designed the pipeline to handle gigabytes of daily and weekly data together for use cases such as Audit, Payroll, Reimbursement, and Education Reimbursement
- Took complete ownership and worked closely with business teams to understand the requirements and deliver informative dashboards
Automated Alerting System for Job Monitoring

Mar 2020 - May 2020

-> Tech Stack – Python, AWS CLI, QuickSight
- Created an automated alerting system for Redshift load metrics and job monitoring
- Saved each team member 1.5 hours/day of manual effort spent monitoring jobs and preparing the Daily Job Status
Feature Development For Telecom Data

Dec 2019 - Mar 2020

-> Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
- Created large-scale, optimized pipelines for Telecom data using PySpark and the Kedro framework
- Worked closely with the client to gather business requirements
- Implemented business logic to prepare clean, aggregated data for Customer Churn Analysis
GG VMN migration

Sep 2019 - Nov 2019

Tech Stack – PySpark, Hive, Azkaban, Jenkins

- Migrated all Facts/Olaps written in Hive into PySpark
- Created job flows in Azkaban
Data Ingestion & Sync Process

Jun 2019 - Aug 2019

Tech Stack – Python, Hive, ElasticSearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins

- Crafted data-sync logic that prioritizes datasets (High/Medium/Low tag) by criticality to meet the SLO
- Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
- Designed a REST API in data ingestion for retention of GA data in order to optimize cluster space
- Added exception handling scenarios in the data-sync logic to fix multiple bugs
- Fixed missing PG data from Kafka for the UMP panel: created a new pipeline to ingest missing data from HDFS into ElasticSearch in case of cluster failure
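
The prioritization and preemption bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the `SyncScheduler` class, the fixed slot count, and the dataset names are all hypothetical.

```python
from dataclasses import dataclass, field

# Lower number = more critical; mirrors the High/Medium/Low tags.
PRIORITY = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class SyncScheduler:
    """Toy scheduler with a fixed number of concurrent sync slots (assumed)."""
    slots: int
    running: list = field(default_factory=list)  # (priority, dataset) pairs
    waiting: list = field(default_factory=list)  # (priority, dataset) pairs

    def submit(self, dataset: str, tag: str) -> None:
        job = (PRIORITY[tag], dataset)
        if len(self.running) < self.slots:
            self.running.append(job)
            return
        # All slots are busy: pick the least critical running job.
        victim = max(self.running)
        if victim[0] > job[0]:
            # Preempt it; the victim goes back to the waiting list.
            self.running.remove(victim)
            self.waiting.append(victim)
            self.running.append(job)
        else:
            self.waiting.append(job)

    def finish(self, dataset: str) -> None:
        """Mark a sync as done and promote the most critical waiting job."""
        self.running = [j for j in self.running if j[1] != dataset]
        if self.waiting and len(self.running) < self.slots:
            nxt = min(self.waiting)
            self.waiting.remove(nxt)
            self.running.append(nxt)
```

With two slots occupied by Low-tagged syncs, submitting a High-tagged dataset pushes one Low sync back to the waiting list; it resumes once a slot frees up.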
Near Real Time Data Pipeline - POC

Apr 2019 - Jun 2019

Tech Stack – Java, Spark, Kafka, Datastax Cassandra, Datastax studio, Zookeeper, Maven

- Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on production MySQL. The objective was to move business users off production to address data leaks and security issues
- Interacted with different business users to learn their use cases, ingestion tables, and PII data, and built data models accordingly for faster inserts and updates
- Set up the DataStax Studio web interface so users can query real-time data from Cassandra using LDAP authentication
Dehleez - Report Scheduling Tool

Feb 2019 - Apr 2019

Tech Stack – Python, JavaScript, Django, Azkaban, Docker, Hive, Ajax, Bootstrap, REST API, DataDog

- Enhanced Paytm's proprietary report scheduling tool, which business users working on data analysis use to schedule reports by writing Hive/MySQL/Cassandra queries, with report output in various formats
- Diff Checker - Admins can check the difference between queries before approving reports
- Time slot picker to schedule a report - Users can see the reports scheduled for the next 4 hours from the intended schedule time and pick a slot accordingly
- Dump report output into S3 bucket - Users can dump report output into an AWS S3 bucket
- Cassandra Connector - Users can schedule reports with Cassandra query panels in addition to Hive/MySQL
Hive Query Parser

Feb 2019 - Mar 2019

Tech Stack – Django, Django RestFramework, Python, NGINX

- Query Validator and Optimization Engine - Created a Django web application to parse and validate users' Hive queries. For a bad query (missing partition columns or unbalanced joins), it also suggests improvements
- PII detector – Built a Django web application to detect all running Hive queries that fetch PII data.
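
One of the checks mentioned above, flagging a query that does not filter on partition columns, might look something like the sketch below. It is an assumption-laden illustration rather than the tool's real logic: the `PARTITIONS` catalog and the `missing_partition_filters` helper are hypothetical, and the regex scan stands in for a proper Hive parser.

```python
import re

# Hypothetical catalog: table name -> partition columns (assumed for illustration).
PARTITIONS = {"orders": ["dt"], "events": ["dt", "hour"]}


def missing_partition_filters(query: str, partitions=PARTITIONS) -> dict:
    """Return {table: [partition columns not mentioned in the WHERE clause]}."""
    q = query.lower()
    # Tables named after FROM or JOIN (a naive scan, not a real SQL parser).
    tables = re.findall(r"\b(?:from|join)\s+([a-z_][\w.]*)", q)
    # Everything after the first WHERE keyword, or "" if there is none.
    parts = re.split(r"\bwhere\b", q, maxsplit=1)
    where = parts[1] if len(parts) > 1 else ""
    problems = {}
    for table in tables:
        # Strip an optional database prefix such as "db.orders".
        cols = partitions.get(table.split(".")[-1], [])
        # Naive: any mention of the column name counts, regardless of table alias.
        missing = [c for c in cols if not re.search(rf"\b{re.escape(c)}\b", where)]
        if missing:
            problems[table] = missing
    return problems
```

A non-empty result could drive a suggestion such as "add a filter on `dt`". Because the check ignores aliases and subqueries, it is only a first-pass heuristic.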
Procurement Spend Optimizer

Jul 2018 - Jan 2019

o Developed CXO-level insights engine to manage USD 60Bn; engine enabled cost optimization using smart categorisation, benchmarking and anomaly detection
o Crafted a Big Data based solution; organised structured & unstructured data
o Built solution using Hadoop Ecosystem (HDFS, YARN), Spark and Python
o Built a Google translator API based solution to automate legacy translation engine; improved record aggregation accuracy by 50% and saved team 120 hours/month

Technologies Used : Hadoop Framework, Spark
Languages Used : Java, Python 2.7
Tools Used : Signal Hub ( Opera’s proprietary development framework ), Signal Hub Manager ( SHM )
Version Control : SVN
Trip Narrative

Jan 2017 - Jun 2018

o Deployed an end-to-end solution for a leading US airline; aggregated a 360-degree view of the customer's engagement throughout the life cycle of the trip
o Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
o The solution powers a web application used by 1000+ CSRs and decision makers
o Built the application on RESTful APIs using the Hadoop Ecosystem (HDFS, YARN), DataRush Applications (Distributed Processing Engine), SQL and Python

Technologies Used : Hadoop Framework, REST, Ingres DB, NGINX
Languages Used : Java, Python 2.7, YAML, SQL
Tools Used : Signal Hub ( Opera’s proprietary development framework ), Signal Hub Manager ( SHM )
Version Control : SVN
BlueChat

Feb 2016 - Mar 2016
BlueChat is an Android chat application which leverages the Bluetooth stack to send text, images and contacts. Text messages are also encrypted and decrypted on the fly.

Technologies Used: Android
Language Used: Java, XML
Tools Used: Android Studio 2.0

Honors & Awards

Opera Cool Ovation Award - 2017

Opera Solution

Oct 2017

Received the Opera Cool Ovation Award 2017 for excellent contribution to the Trip Narrative project.
Geek Of The Month , September 2016

GeeksForGeeks

Sep 2016

Received this honour for extraordinary contribution to article writing for GeeksForGeeks.
http://www.geeksforgeeks.org/geek-of-the-month/

Languages

English

Full professional proficiency

Recommendations received

2 people have recommended Shashank: Yusuf Hassan and Tejas Nagdulikar

More activity by Shashank

10 Red Flags to Watch Out for in a Data Engineer 🚩 The role of a Data Engineer is pivotal, but not all engineers are created equal. Here are 10 red…

Shared by Shashank Mishra 🇮🇳
