
Ankit Anand | Data Engineer

Mobile: 9810481176 | Email: [email protected] | Location: Delhi, IN

SUMMARY

Dedicated and highly skilled Data Engineer with 7 years of experience designing, implementing, and optimizing data pipelines and systems. Proficient in data integration, ETL processes, and data warehousing. Adept at leveraging technologies such as Spark, HDFS, SQL, Python, Hadoop, and the AWS stack.

KEY SKILLS

• Data Security and Governance
• Performance Tuning and Optimization
• Data Integration and Migration
• Data Warehousing and Architecture
• ETL
• Project Management
• Data Quality Assurance
• Data Modelling

TECHNICAL SKILLS

• Python, Scala, Java
• Apache Hive, Presto
• SQL, NoSQL
• CI/CD: Git, Bitbucket, Bamboo, Macquarie Arturo
• HDFS, Amazon S3
• Oozie and Airflow
• Apache Sqoop
• Apache NiFi
• Spark Core, SQL, Structured Streaming
• Elasticsearch, Logstash, Kibana
• Apache Kafka

PROFESSIONAL EXPERIENCE

Data Engineer | Genpact, Gurugram, India [July 2021 – Present]

• Built Spark-based ingestion pipelines.
• Built a query-based data quality check framework (a sketch follows this list).
• Migrated Impala-based queries to Presto.
• Migrated Oozie-based applications to Apache Airflow and Spark on EKS.
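
For illustration only, a minimal PySpark sketch of a query-based data quality framework of the kind described above; the table names, check names, and queries are hypothetical placeholders, not the production implementation.

    from pyspark.sql import SparkSession

    # Hypothetical config: each check is a SQL query expected to return zero rows.
    # Table and column names are illustrative; the real framework's checks differ.
    DQ_CHECKS = [
        ("null_primary_keys", "SELECT id FROM orders WHERE id IS NULL"),
        ("negative_amounts",  "SELECT id FROM orders WHERE amount < 0"),
        ("duplicate_keys",
         "SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1"),
    ]

    def run_dq_checks(spark):
        """Run each check query; a check passes when it returns no rows."""
        failures = []
        for name, query in DQ_CHECKS:
            bad_rows = spark.sql(query).count()  # assumes tables exist in the metastore
            if bad_rows > 0:
                failures.append((name, bad_rows))
        return failures

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("dq-checks").getOrCreate()
        for name, count in run_dq_checks(spark):
            print(f"DQ check failed: {name} ({count} offending rows)")
        spark.stop()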

Data Engineer | Absolutdata Analytics, Gurugram, India [Jan 2021 – July 2021]

• Performed data modelling for a healthcare organisation, involving 20+ tables.
• Migrated the database from SQL Server to Azure SQL Database.
• Used Alteryx to automate ingestion of data arriving in different file formats (XLSX, CSV, ad-hoc master data).
• Built a data ingestion framework in Hive for daily, weekly, and quarterly runs, handling over 70M records/day across multiple tables (see the sketch after this list).
• Wrote data curation scripts in shell and Hive to extract data and make it readily available for analytics.
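
As a rough sketch of how such a parameterized Hive ingestion run could be driven from Python, assuming loads land into frequency- and date-partitioned staging tables; the database, table, and path names are invented for illustration.

    import subprocess
    from datetime import date

    # Hypothetical partition layout; the real framework's table names and
    # scheduling logic are not shown in this resume.
    LOAD_TEMPLATE = """
    LOAD DATA INPATH '{src}'
    INTO TABLE staging_db.{table}
    PARTITION (run_freq='{freq}', run_date='{run_date}');
    """

    def ingest(table, src_path, freq):
        """Build the HiveQL for one run and submit it via the hive CLI."""
        hql = LOAD_TEMPLATE.format(src=src_path, table=table, freq=freq,
                                   run_date=date.today().isoformat())
        subprocess.run(["hive", "-e", hql], check=True)

    if __name__ == "__main__":
        # Example: a daily run for one (hypothetical) source table.
        ingest("patient_claims", "/landing/patient_claims/", "daily")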

Data Engineer | Valiance Analytics, Noida, India [June 2020 – Jan 2021]

• Built a Spark SFTP-based pipeline to encrypt data on HDFS, partitioned by year/month/day, with the encryption module residing on a secure Unix box (a sketch of the partitioned layout follows this list).
• Replaced the above pipeline with a NiFi-based pipeline that encrypts on the fly by integrating the encryption module as a custom processor.
• Saved terabytes of storage by enabling compression and choosing appropriate file formats.
• Used the NiFi schema registry to take control over CSV schemas.
• Implemented ELK stack dashboards to monitor logs from the daily automated data ingestion runs.
• Built a Spark-based solution to verify data correctness and consistency after ingestion.
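
A minimal PySpark sketch of the year/month/day-partitioned layout described above, assuming CSV input with a timestamp column; encrypt_udf below is a stand-in only, since the actual encryption module was a custom, secured component.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("encrypt-partitioned-ingest").getOrCreate()

    @F.udf("string")
    def encrypt_udf(value):
        # Placeholder transform; the real pipeline called a secure encryption module.
        return None if value is None else value[::-1]

    # The input path and the event_ts / sensitive_col columns are assumptions.
    df = (spark.read.option("header", "true").csv("hdfs:///landing/input/")
          .withColumn("event_date", F.to_date("event_ts"))
          .withColumn("year", F.year("event_date"))
          .withColumn("month", F.month("event_date"))
          .withColumn("day", F.dayofmonth("event_date"))
          .withColumn("sensitive_col", encrypt_udf(F.col("sensitive_col"))))

    # Write the encrypted output partitioned by year/month/day.
    (df.write.mode("append")
       .partitionBy("year", "month", "day")
       .parquet("hdfs:///secure/encrypted/"))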

Data Engineer | Tata Consultancy Services, Gurgaon, India [Dec 2016 – June 2020]

• Implemented a lambda architecture (real-time & batch) pipeline framework using Spark, Kafka, and HBase for the ingestion of finance data (a simplified sketch follows this list).
• Developed generic Sqoop jobs to handle almost 200 GB of data, leading to major cost savings for the customer.
• Used ORC-format tables with UTF-8 encoding so that special characters are handled correctly even in Hive tables.
• Independently designed a Spark-based solution, scheduled by Apache Oozie, to parse raw data, populate staging tables, and store processed data in the data lake.
• Worked on a production application to ingest multi-line JSON and XML files.
• Wrote complex SQL queries using joins and subqueries to retrieve data from the databases involved.
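
A simplified sketch of a lambda-style layout in PySpark, assuming the spark-sql-kafka package is on the classpath; the broker, topic, path, and column names are illustrative, and parquet stands in here for the HBase serving layer used in the real pipeline.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("lambda-ingest").getOrCreate()

    # Speed layer: consume finance events from Kafka and land them continuously.
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "finance-events")
              .load()
              .select(F.col("value").cast("string").alias("payload")))

    query = (stream.writeStream
             .format("parquet")
             .option("path", "hdfs:///lake/speed/finance/")
             .option("checkpointLocation", "hdfs:///chk/finance/")
             .start())

    # Batch layer: periodic full recompute over the raw history.
    batch = spark.read.json("hdfs:///lake/raw/finance/")
    daily = (batch.groupBy("trade_date")
             .agg(F.sum("amount").alias("total_amount")))
    daily.write.mode("overwrite").parquet("hdfs:///lake/batch/finance_daily/")

    query.awaitTermination()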

EDUCATION

B.Tech – CSE, Jamia Hamdard | May 2012 – May 2016
