Shashank Mishra 🇮🇳

Bengaluru, Karnataka, India
176K followers 500+ connections

About

Experienced Data Engineer with a demonstrated history of solving complex data problems…

Articles by Shashank

  • Data Engineer

    Many times people have asked me, how can we move into Data Engineering? What are the roles and responsibilities of a…

Experience

  • Prophecy

    Bengaluru, Karnataka, India

  • -

    Gurugram, Haryana, India

  • -

    Bengaluru, Karnataka, India

  • -

    Gurgaon, Haryana, India

  • -

    Noida Area, India

  • -

    Noida Area, India

  • -

    Noida Area, India

  • -

    Noida, India

  • -

    Noida Area, India

Education

  • Motilal Nehru National Institute of Technology

    -

    Activities and Societies:
    • Coordinator, Computer Club Classes, NIT Allahabad, 2016
    • Dance Coordinator, SWAGAT (cultural festival) of NIT Allahabad, 2015
    • Volunteer, Pahal (NGO); personally conducted 100+ classes for 150+ children from slum communities

  • -

Projects

  • Salesforce to Redshift Ingestion - Migration from Informatica to Native AWS

    -

    -> Tech Stack – Salesforce, Informatica, S3, Lambda, Glue, AppFlow, Redshift, SNS
    - Crafted a generic, scalable native AWS solution for Salesforce-to-Redshift ingestion
    - Moved ingestion pipelines off the third-party tool Informatica, saving its heavy license fees
    - The generic framework helped other business units smoothly ingest newly onboarded Salesforce objects into the Redshift data lake

  • Incremental Ingestion pipeline – Employee Benefits Data

    -

    -> Tech Stack – Shell Scripting, AWS CLI, S3, EMR, Glue, Redshift, SNS, QuickSight, PySpark
    - Built a generic, optimized ingestion pipeline for highly critical and confidential Employee Benefits data
    - Designed the pipeline to handle gigabytes of daily and weekly data together for use cases such as Audit, Payroll, Reimbursement and Education Reimbursement
    - Took complete ownership and worked closely with business teams to understand requirements and deliver rich dashboards

  • Automated Alerting System for Job Monitoring

    -

    -> Tech Stack – Python, AWS CLI, QuickSight
    - Created an automated alerting system for Redshift load metrics and job monitoring
    - Saved each team member 1.5 hours/day of manual effort previously spent monitoring jobs and preparing the daily job status

  • Feature Development For Telecom Data

    -

    -> Tech Stack – PySpark, Kedro, Azure Cloud, Databricks
    - Created large-scale, optimized pipelines for telecom data using PySpark and the Kedro framework
    - Worked closely with the client to gather business requirements
    - Implemented business logic to prepare clean, aggregated data for customer churn analysis

  • GG VMN migration

    -

    Tech Stack – PySpark, Hive, Azkaban, Jenkins

    - Migrated all facts/OLAPs written in Hive to PySpark
    - Created job flows in Azkaban

  • Data Ingestion & Sync Process

    -

    Tech Stack – Python, Hive, Elasticsearch, Scala Play Framework, SBT, EMR, Lambda, DynamoDB, Azkaban, Jenkins

    - Crafted data-sync logic that prioritizes datasets (High/Medium/Low tag) by criticality to meet SLOs
    - Built preemption logic to prioritize highly critical datasets when multiple low-priority sync processes are running
    - Designed a REST API in data ingestion for retention of GA data to optimize cluster space
    - Added exception handling to the data-sync logic, fixing multiple bugs
    - Fix for missing PG data from Kafka for the UMP panel - Created a new pipeline to ingest missing data from HDFS into Elasticsearch in case of cluster failure

  • Near Real Time Data Pipeline - POC

    -

    Tech Stack – Java, Spark, Kafka, DataStax Cassandra, DataStax Studio, ZooKeeper, Maven

    - Crafted a Cassandra-based real-time ingestion pipeline for marketplace data to help the DWH team reduce request load on production MySQL. The objective was to move business users off production to overcome data-leak and security issues
    - Interacted with different business users to learn their use cases, ingestion tables and PII data, and built data models accordingly for faster insertion/update of data
    - Set up the DataStax Studio web interface for users to query real-time data from Cassandra with LDAP authentication

  • Dehleez - Report Scheduling Tool

    -

    Tech Stack – Python, JavaScript, Django, Azkaban, Docker, Hive, Ajax, Bootstrap, REST API, Datadog

    - Enhanced Paytm's proprietary report scheduling tool, which lets business users doing data analysis schedule reports written as Hive/MySQL/Cassandra queries and receive the output in various formats
    - Diff Checker - Admins can check the difference between queries before approving reports
    - Time slot picker to schedule a report - Users can see the reports scheduled in the 4 hours after their intended schedule time and pick a slot accordingly
    - Dump report output into S3 bucket - Users can dump report output into an AWS S3 bucket
    - Cassandra Connector - Users can schedule reports with Cassandra query panels in addition to Hive/MySQL

  • Hive Query Parser

    -

    Tech Stack – Django, Django REST Framework, Python, NGINX

    - Query Validator and Optimization Engine - Created a Django web application to parse and validate users' Hive queries. For a bad query (missing partition columns, unbalanced joins), it also suggests how to improve the query
    - PII Detector - Built a Django web application to detect running Hive queries that fetch PII data

  • Procurement Spend Optimizer

    -

    o Developed a CXO-level insights engine to manage USD 60Bn of procurement spend; the engine enabled cost optimization using smart categorisation, benchmarking and anomaly detection
    o Crafted a big data solution organising structured and unstructured data
    o Built the solution using the Hadoop ecosystem (HDFS, YARN), Spark and Python
    o Built a Google Translate API based solution to automate the legacy translation engine; improved record aggregation accuracy by 50% and saved the team 120 hours/month

    Technologies Used: Hadoop Framework, Spark
    Languages Used: Java, Python 2.7
    Tools Used: Signal Hub (Opera’s proprietary development framework), Signal Hub Manager (SHM)
    Version Control: SVN

  • Trip Narrative

    -

    o Deployed an end-to-end solution for a leading US airline; aggregated a 360-degree view of customer engagement throughout the life cycle of the trip
    o Developed data pipelines from scratch; optimised data aggregation from 10+ independent sources and automated the ETL process to roll out the solution
    o The solution powers a web application used by 1000+ CSRs and decision makers
    o Built the application on RESTful APIs using the Hadoop ecosystem (HDFS, YARN), DataRush applications (distributed processing engine), SQL and Python

    Technologies Used: Hadoop Framework, REST, Ingres DB, NGINX
    Languages Used: Java, Python 2.7, YAML, SQL
    Tools Used: Signal Hub (Opera’s proprietary development framework), Signal Hub Manager (SHM)
    Version Control: SVN

  • BlueChat

    -

    BlueChat is an Android chat application that uses the Bluetooth stack to send text, images and contacts. Text messages are encrypted and decrypted on the fly.

    Technologies Used: Android
    Language Used: Java, XML
    Tools Used: Android Studio 2.0

Honors & Awards

  • Opera Cool Ovation Award - 2017

    Opera Solutions

    Received the Opera Cool Ovation Award 2017 for excellent contribution to the Trip Narrative project.

  • Geek of the Month, September 2016

    GeeksForGeeks

    Received this honour for an extraordinary contribution to article writing for GeeksForGeeks.
    http://www.geeksforgeeks.org/geek-of-the-month/

Languages

  • English

    Full professional proficiency
