About
Data Scientist with 6 years of experience in product, retail, manufacturing and finance…
Activity
-
Son of Anjani, bestower of great strength, the Lord is ever the helper of saints. Entrusted with the task and sent by Raghunath, he burned Lanka and brought back news of Sita. Meaning: Shri Hanuman Ji, son of Mother Anjani, being mighty…
Liked by Uddipto Dutta
-
🌟 Dream Big 🌟 Today, as I take the next step in my career and join Google as a Data Scientist, it still feels surreal. My journey from Mu Sigma…
Liked by Uddipto Dutta
-
Meditation is essentially about connecting with a timeless divine reality that resides and presides beyond the mundane reality, which is constantly…
Liked by Uddipto Dutta
Experience
Education
-
Indian Statistical Institute, Kolkata
-
Activities and Societies: Actively participated in cricket, football and table tennis tournaments
Master's degree in the application of probability and statistics in real life problems. Course included relevant subjects on machine learning, statistical modeling, multivariate data analysis, probability and distributions, stochastic processes, reliability and statistical quality control.
-
-
Activities and Societies: Part of the department football team
Course included relevant subjects in Electronics and Communication Engineering, such as signal analysis using a variety of techniques (Laplace transforms, Fourier transforms, etc.). Also covered Java and Data Structures.
-
-
Activities and Societies: Playing cricket and football
Majored in Maths, Physics, Chemistry and Computer Science with particular interest in Calculus, Inequalities and Coordinate Geometry.
-
-
Activities and Societies: Playing cricket, football and participating in plays
Consistent performer in both science and literature based subjects, especially in Mathematics.
Licenses & Certifications
Publications
-
From Pixels to Words: A Scalable Journey of Text Information from Product Images to Retail Catalog
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
Extracting texts of various shapes, sizes, and orientations from images containing multiple objects is an important problem in many contexts, especially, in connection to E-commerce. In the context of the scale at which Walmart operates, the text from an image can be a richer and more accurate source of data than human inputs and can be used in several applications such as Attribute Extraction, Offensive Text Classification, Product Matching among others. The motivation of this particular work has come from different business requirements such as flagging products whose images contain words that are non-compliant with organizational policies and building an efficient automated system to identify similar products by comparing the information contained in their respective product images and many others. Existing methods fail to address domain specific challenges like high entropy, different orientations, and small texts in product images adequately. In this work, we provide a solution that not only addresses these challenges but is also proven to work at a million image scale for various retail business units within Walmart. Extensive experimentation revealed that our proposed solution has been able to save around 30% computational cost in both the training and the inference stages.
-
A mean shift adjusted relative error metric to forecast accuracy measurements
24th International Conference on Computational Statistics (COMPSTAT 2021)
Measuring forecast accuracy in an appropriate and robust manner is one of the key components of time series forecasting. Time series data often contain a series of uncharacteristically low or uncharacteristically high values, also known as mean shifts, arising due to some external factors. Measures like MAE, MAPE, MASE etc. are not able to consider the presence of such mean shifts in the validation period and whether the impact of such mean shifts is likely to persist beyond the validation period. We attempted to address these shortcomings of the existing measures by proposing a new measure which is able to account for such shifts in the overall pattern of the data by giving appropriate weight to the relative absolute difference between the predicted and actual value for each observation in the validation period. The proposed measure is also - (1) scale independent, and hence can be compared across series, (2) immune to producing infinite values in the presence of zero values in the validation period. Experimental results obtained by calculating the measure on data with such mean shifts, show that the proposed measure is a much closer reflection of model performance compared to the existing measures.
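The limitations of the existing measures named in the abstract can be illustrated with a minimal sketch. This is not the proposed mean shift adjusted measure, whose formula is not given here. It only computes MAE, MAPE and MASE on hypothetical data with a zero value and an upward mean shift in the validation period. All function names and numbers are assumptions made for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error (scale dependent)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    The ratio is undefined when an actual value is zero; this sketch
    reports math.inf in that case.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            return math.inf
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


def mase(train, actual, predicted):
    """MAE scaled by the in-sample MAE of a one-step naive forecast."""
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    return mae(actual, predicted) / naive_mae


# Hypothetical data: the validation period contains a zero and a mean shift.
train = [10, 12, 11, 13, 12]
actual = [12, 0, 30, 31]
predicted = [12, 11, 12, 13]

print(mae(actual, predicted))          # scale dependent
print(mape(actual, predicted))         # infinite because of the zero actual
print(mase(train, actual, predicted))  # finite, but blind to the shift itself
```

None of the three values indicates whether the jump to around 30 is a temporary shift or one that will persist, which is the gap the abstract describes.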
-
Contextual Transformation of Short Text for Improved Classifiability
https://vixra.org/
Text classification is the task of automatically sorting a set of documents into a predefined set of categories. This task has several applications, including separating positive and negative product reviews by customers, automated indexing of scientific articles, spam filtering and many more. What lies at the core of this problem is extracting features from text data which can be used for classification. One of the common techniques to address this problem is to represent text data as low dimensional continuous vectors such that semantically unrelated data are well separated from each other. However, sometimes the variability along various dimensions of these vectors is irrelevant, as they are dominated by various global factors which are not specific to the classes we are interested in. This irrelevant variability often causes difficulty in classification. In this paper, we propose a technique which takes the initial vectorized representation of the text data through a process of transformation which amplifies relevant variability and suppresses irrelevant variability, and then employs a classifier on the transformed data for the classification task. The results show that the same classifier exhibits better accuracy on the transformed data than on the initial vectorized representation of the text data.
-
Does noise hinder or boost time-series forecasting: A theoretical and empirical analysis
24th International Conference on Computational Statistics (COMPSTAT 2021)
Accurate time-series forecasting is at the core of many important practical applications. Although extensive research has been carried out in this domain, different aspects of this problem are still left to be analysed from many different perspectives. In this paper, we focused on analysing the impact of noise in time-series forecasting. It is a common notion that noise in training data deteriorates the accuracy of forecasting models. However, our theoretical analysis shows that injection of noise into the original time-series data, with certain constraints on the mean and variance of the noise-distribution, helps to improve the accuracy of the forecasting models. The accuracy of the model starts worsening when values of the noise-parameters lie outside these constraints. We also carried out analyses to estimate an approximate theoretical bound on various parameters of the noise distribution in order to identify the most optimal operating region of the forecasting model. We explored different types of neural-network based forecasting models on time-series data obtained from different practical applications in our empirical study. The observed behaviour of all these forecasting models on different kinds of time-series data are in agreement with our theoretical findings.
-
Drift-adjusted and arbitrated ensemble framework for time series forecasting
International Symposium on Forecasting (ISF)
Time-series forecasting is at the core of many practical applications, such as sales forecasting for businesses. Though this problem has been studied extensively for years, it is still considered challenging due to the complex and evolving nature of time-series data. Typical methods proposed for time-series forecasting model linear or non-linear dependencies between observations. However, it is generally accepted that no single method is universally effective for all kinds of time-series data. Attempts have been made to use dynamic, weighted combinations of heterogeneous and independent forecasting models, and this has been found to be a promising direction. The approach assumes that different forecasters have different specializations and varying performance across data distributions, and weights are dynamically assigned to the forecasters accordingly. However, in many practical time-series datasets, the distribution of the data slowly evolves with time. We propose a re-sampling based method that adjusts the weights assigned to the forecasters to account for such distribution drift. Exhaustive testing was performed on time-series data from several real-world applications. Experimental results show the competitiveness of this method against state-of-the-art approaches for combining forecasters.
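The general idea of a dynamically weighted combination of forecasters can be sketched as follows. This is not the re-sampling based adjustment proposed in the paper. As a stand-in, it simply weights each forecaster by the inverse of its mean absolute error over a recent window, so that a shorter window reacts faster to drift. The function names, the forecaster names and the data are all hypothetical.

```python
def inverse_error_weights(actual, forecasts, window):
    """Weight each forecaster by the inverse of its recent mean absolute error.

    actual    : list of observed values
    forecasts : dict mapping forecaster name -> list of past predictions,
                aligned with `actual`
    window    : positive number of most recent points used to score forecasters
    """
    recent_actual = actual[-window:]
    inverse_errors = {}
    for name, preds in forecasts.items():
        recent_preds = preds[-window:]
        err = sum(
            abs(a - p) for a, p in zip(recent_actual, recent_preds)
        ) / len(recent_actual)
        # Small constant avoids division by zero for a perfect forecaster.
        inverse_errors[name] = 1.0 / (err + 1e-9)
    total = sum(inverse_errors.values())
    return {name: v / total for name, v in inverse_errors.items()}


def combine(next_forecasts, weights):
    """Weighted average of the forecasters' next-step predictions."""
    return sum(weights[name] * value for name, value in next_forecasts.items())


# Hypothetical history: the level shifts from 10 to 20 halfway through.
actual = [10, 10, 10, 20, 20, 20]
forecasts = {
    "slow": [10, 10, 10, 10, 10, 10],  # good before the shift, poor after
    "fast": [12, 12, 12, 20, 20, 20],  # poor before the shift, good after
}

w_all = inverse_error_weights(actual, forecasts, window=6)
w_recent = inverse_error_weights(actual, forecasts, window=3)
print(w_all)
print(w_recent)
print(combine({"slow": 10, "fast": 20}, w_recent))
```

Scoring on the full history still gives the "slow" forecaster a noticeable share of the weight, while scoring on the recent window moves almost all of the weight to the forecaster that matches the current distribution.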
-
Surge-Adjusted Forecasting in Temporal Data Containing Extreme Observations
7th International Conference on Big Data Analysis and Data Mining
Forecasting in time-series data is at the core of various business decision-making activities. One key characteristic of many practical time series of business metrics, such as orders and revenue, is the presence of irregular yet moderately frequent spikes of very high intensity, called extreme observations. Forecasting such spikes accurately is crucial for business activities such as workforce planning, financial planning and inventory planning. Traditional time series forecasting methods such as ARIMA and BSTS are not very accurate in forecasting extreme spikes. Deep learning techniques such as variants of LSTM tend to perform only marginally better than these traditional techniques. The underlying assumption of a thin-tailed data distribution is one of the primary reasons such models falter on extreme spikes, as moderately frequent extreme spikes result in a heavy-tailed distribution. On the other hand, the literature proposing methods to forecast extreme events in time series has focused mostly on the extreme events themselves and ignored overall forecasting accuracy. We attempted to address both of these problems by proposing a technique in which a time series signal with extreme spikes is treated as the superposition of two independent signals: (1) a stationary time series signal without extreme spikes, and (2) a shock signal consisting of near-zero values most of the time along with a few spikes of high intensity. We modelled the two signals independently to forecast values for the original time series signal. Experimental results show that the proposed technique outperforms existing techniques in forecasting both normal and extreme events.
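The superposition idea can be illustrated with a minimal sketch that splits a series into a baseline signal and a shock signal. How the paper performs this separation is not stated here. The rule below (flag values more than `k` median absolute deviations above the median) is an assumption chosen for simplicity, as are the function name and the data.

```python
from statistics import median


def split_baseline_and_shock(series, k=5.0):
    """Split a series into baseline + shock so that baseline[i] + shock[i] == series[i].

    A point is treated as an extreme spike when it exceeds
    median + k * MAD (median absolute deviation). This threshold rule is
    an illustrative assumption, not the method from the paper.
    If the MAD is zero, every value above the median is flagged.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series)
    threshold = med + k * mad

    baseline, shock = [], []
    for x in series:
        if x > threshold:
            baseline.append(med)    # replace the spike with a typical level
            shock.append(x - med)   # the excess goes to the shock signal
        else:
            baseline.append(x)
            shock.append(0.0)       # shock is zero outside spikes
    return baseline, shock


# Hypothetical daily order counts with two extreme spikes.
orders = [100, 102, 98, 101, 400, 99, 100, 103, 350, 97]
baseline, shock = split_baseline_and_shock(orders)
print(baseline)
print(shock)
```

Each component could then be handed to its own model, and the two forecasts added back together to forecast the original series.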
Projects
-
Data Imputation in Time Series containing Extreme Observations
- Present
Imputed data in time series containing a lot of extreme observations using a bi-directional LSTM with an overall accuracy of 80%, offering an improvement of nearly 40% on the existing techniques like TS Impute, KNN, MICE etc.
-
China Walmart Orders Forecasting for Workforce Management
- Present
Forecasted orders for China Walmart to aid in efficient workforce management using a number of classical and deep learning based models like Regression ARIMA, Bayesian Structural Time Series, Neural Net etc., with an innovative additional adjustment made to account for abrupt upward fluctuations in the data. An overall accuracy of 95% was achieved across 600 different stores with roughly 100k orders being placed on a daily basis.
-
Text Extraction from Images for Product Segmentation
-
Worked on a text recognition model to identify inverted texts in images with an overall accuracy of 90% to segment products without labels. The features were extracted from the images using a CNN model based mainly on VGG-16 and then the extracted features were processed using an LSTM network to find out the text.
-
Cost Estimation based on Resource Usage by Users
-
Created a statistical model to estimate cost on the basis of resource usage by customers for running a number of queries with varying degrees of complexity. A constrained polynomial regression model was built for the purpose and achieved an overall accuracy of 85%.
-
Creation of a Robust Measure of Forecast Accuracy Measurement
-
Created a robust measure of forecast accuracy to quantify performance of time series models much more appropriately.
-
Regional Water Level Forecasting
-
Forecasted regional water level using time series modelling techniques like ARIMA, TBATS, neural network, theta forecasting etc. with an overall 82% accuracy as a part of a Corporate Social Responsibility project to help the concerned authorities plan accordingly.
-
Drivers' Health Survey
-
Worked on extracting and visually representing critical information from a drivers' health survey to help create improved safety guidelines and suggestions for a better lifestyle for them.
-
Prediction of NPS Scores
-
Predicted NPS scores using relevant ML models like Random Forests, XGBoost, MARS and SVR to gauge customer feedback better.
-
Cycle Index for Economic Situation Forecasting
-
Created a cycle index based on the forecasted values from the VAR model, using HP Filter and Principal Component Analysis, to predict potential economic stress scenarios.
-
Vector Autoregressive Model for Macroeconomic Factors Forecasting
-
Built a Vector Autoregressive Model to forecast interconnected economic factors like GDP, HPI and Unemployment Rate.
Languages
-
English
Full professional proficiency
-
Hindi
Full professional proficiency
-
Bengali
Native or bilingual proficiency
More activity by Uddipto
-
"One should learn the essence of the scriptures from the Guru and then practice Sadhana. If one rightly follows spiritual discipline, then one…
Liked by Uddipto Dutta
-
Just like we have our home and love our home then we may need to go to various places for various purposes, but as soon as the work is done, we go…
Liked by Uddipto Dutta
-
Words! Drona managed to release an arrow that broke Dhrishtadyumna’s bow. Without wavering, Dhrishta picked up another bow & deeply pierced…
Liked by Uddipto Dutta
-
Today's divine darshan of Shri Balaji Maharaj from Siddhapeeth Shri Salasar Balaji Dham, 04-10-2024, Ashwin Shukla Paksha, Friday. May the grace of Shri Balaji Maharaj be upon you and your…
Liked by Uddipto Dutta
-
To hear about Kṛṣṇa from Vedic literatures, or to hear from Him directly through the Bhagavad-gītā, is itself righteous activity. And for one who…
Liked by Uddipto Dutta
-
Fight in divine consciousness! As Drona & Arjun waged an incomparable battle against each other, demigods, celestial sages, Gandharvas, Siddhas…
Liked by Uddipto Dutta
-
Victory to Hanuman, ocean of wisdom and virtue. Victory to the lord of the monkeys, who illuminates the three worlds. 🙏⛳🌺 #JaiHanuMan #JaiShreeRam 🙏⛳🌺
Liked by Uddipto Dutta
-
Divine protection! As Ghatotkaca attacked & Kaurav soldiers fled, Karna did not waver & continued sending a steady stream of arrows into the sky while…
Liked by Uddipto Dutta
-
Downward Spiral! After Krpā’s stern reply to Karna, he smilingly said: O Krpā, I agree with you that Sri Krsna & Arjun are ordinarily incapable of…
Liked by Uddipto Dutta
-
Fault lies within if we see the same outside! As Dhrishtadyumna fought with Drona, Karna, Asvattāma, Salya & Duhsasana came and surrounded him…
Liked by Uddipto Dutta
-
Everyone has their own #story .🌟 Everyone knows their own #pain.❤️🩹 Never allow anyone to judge your path because only you know how much…
Liked by Uddipto Dutta
-
Free from the Material world! Dhrishtadyumna confronted Asvattāma, challenging: O son of Drona, I will not kill you yet. Tomorrow I will kill your…
Liked by Uddipto Dutta
-
The problem with distraction is that it leads to a frustrating non-presence in the present. The present moment is an opportunity for us to both…
Liked by Uddipto Dutta