Uddipto Dutta

Bangalore Urban district, India
4K followers 500+ connections

About

Data Scientist with 6 years of experience in product, retail, manufacturing and finance…

Experience

  • Mashreq

    Bengaluru, Karnataka, India

  • -

    Bengaluru, Karnataka, India

  • -

    Bengaluru, Karnataka, India

  • -

    Bengaluru, Karnataka, India

  • -

    Bengaluru, Karnataka, India

  • -

    Pune Area, India

  • -

    Bengaluru, Karnataka, India

  • -

    Kolkata Area, India

Education

  • Indian Statistical Institute, Kolkata

    -

    Activities and Societies: Actively participated in cricket, football and table tennis tournaments

    Master's degree focused on applying probability and statistics to real-life problems. The coursework included machine learning, statistical modeling, multivariate data analysis, probability and distributions, stochastic processes, reliability and statistical quality control.

  • -

    Activities and Societies: Part of the department football team

    The coursework covered Electronics and Communication Engineering subjects such as signal analysis using a variety of techniques, including Laplace and Fourier transforms, as well as Java and data structures.

  • -

    Activities and Societies: Playing cricket and football

    Majored in Maths, Physics, Chemistry and Computer Science with particular interest in Calculus, Inequalities and Coordinate Geometry.

  • -

    Activities and Societies: Playing cricket, football and participating in plays

    Consistent performer in both science and literature-based subjects, especially Mathematics.

Licenses & Certifications

Publications

  • From Pixels to Words: A Scalable Journey of Text Information from Product Images to Retail Catalog

    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

    Extracting text of various shapes, sizes and orientations from images containing multiple objects is an important problem in many contexts, especially in e-commerce. At the scale at which Walmart operates, the text in an image can be a richer and more accurate source of data than human inputs, and it can be used in several applications such as Attribute Extraction, Offensive Text Classification and Product Matching. This work was motivated by several business requirements, such as flagging products whose images contain words that are non-compliant with organizational policies, and building an efficient automated system to identify similar products by comparing the information contained in their product images. Existing methods do not adequately address domain-specific challenges such as high entropy, varied orientations and small text in product images. In this work, we provide a solution that not only addresses these challenges but has also been proven to work at the scale of a million images for various retail business units within Walmart. Extensive experimentation showed that the proposed solution saves around 30% of the computational cost in both the training and the inference stages.

  • A mean shift adjusted relative error metric to forecast accuracy measurements

    24th International Conference on Computational Statistics (COMPSTAT 2021)

    Measuring forecast accuracy in an appropriate and robust manner is one of the key components of time series forecasting. Time series data often contain a series of uncharacteristically low or uncharacteristically high values, known as mean shifts, arising from external factors. Measures such as MAE, MAPE and MASE do not account for the presence of such mean shifts in the validation period, or for whether their impact is likely to persist beyond it. We attempted to address these shortcomings by proposing a new measure that accounts for such shifts in the overall pattern of the data by giving appropriate weight to the relative absolute difference between the predicted and actual value for each observation in the validation period. The proposed measure is also (1) scale-independent, and hence comparable across series, and (2) immune to producing infinite values when the validation period contains zeros. Experiments on data containing such mean shifts show that the proposed measure reflects model performance much more closely than the existing measures.

  • Contextual Transformation of Short Text for Improved Classifiability

    https://vixra.org/

    Text classification is the task of automatically sorting a set of documents into a predefined set of categories. It has several applications, including separating positive and negative product reviews by customers, automated indexing of scientific articles and spam filtering. At the core of this problem is extracting features from text data that can be used for classification. A common technique is to represent text data as low-dimensional continuous vectors such that semantically unrelated data are well separated from each other. However, the variability along some dimensions of these vectors can be irrelevant, because it is dominated by global factors that are not specific to the classes of interest. This irrelevant variability often makes classification difficult. In this paper, we propose a technique that passes the initial vectorized representation of the text data through a transformation which amplifies relevant variability and suppresses irrelevant variability, and then applies a classifier to the transformed data. The results show that the same classifier achieves better accuracy on the transformed data than on the initial vectorized representation of the text data.

  • Does noise hinder or boost time-series forecasting: A theoretical and empirical analysis

    24th International Conference on Computational Statistics (COMPSTAT 2021)

    Accurate time-series forecasting is at the core of many important practical applications. Although extensive research has been carried out in this domain, several aspects of the problem remain to be analysed from different perspectives. In this paper, we analyse the impact of noise in time-series forecasting. It is a common notion that noise in training data deteriorates the accuracy of forecasting models. However, our theoretical analysis shows that injecting noise into the original time-series data, with certain constraints on the mean and variance of the noise distribution, helps to improve the accuracy of the forecasting models. The accuracy of the model starts worsening when the values of the noise parameters lie outside these constraints. We also estimated approximate theoretical bounds on various parameters of the noise distribution in order to identify the optimal operating region of the forecasting model. In our empirical study, we explored several types of neural-network-based forecasting models on time-series data obtained from different practical applications. The observed behaviour of all these forecasting models on different kinds of time-series data is in agreement with our theoretical findings.

  • Drift-adjusted and arbitrated ensemble framework for time series forecasting

    International Symposium on Forecasting (ISF)

    Time-series forecasting is at the core of many practical applications, such as sales forecasting for business. Though this problem has been extensively studied for years, it is still considered challenging due to the complex and evolving nature of time-series data. Typical methods proposed for time-series forecasting model linear or non-linear dependencies between data observations. However, it is generally accepted that no single method is universally effective for all kinds of time-series data. Dynamic, weighted combinations of heterogeneous and independent forecasting models have been found to be a promising direction for tackling this problem. This approach assumes that different forecasters have different specializations and perform differently on different data distributions, and it assigns weights to the forecasters dynamically. However, in many practical time-series datasets, the distribution of the data evolves slowly over time. We propose a re-sampling based method that adjusts the weights assigned to the various forecasters to account for such distribution drift. Exhaustive testing was performed on time-series data from several real-world applications. Experimental results show that this method is competitive with state-of-the-art approaches for combining forecasters.

  • Surge-Adjusted Forecasting in Temporal Data Containing Extreme Observations

    7th International Conference on Big Data Analysis and Data Mining

    Forecasting in time-series data is at the core of various business decision-making activities. One key characteristic of many practical time series of business metrics, such as orders and revenue, is the presence of irregular yet moderately frequent spikes of very high intensity, called extreme observations. Forecasting such spikes accurately is crucial for business activities such as workforce planning, financial planning and inventory planning. Traditional time series forecasting methods such as ARIMA and BSTS are not very accurate in forecasting extreme spikes, and deep learning techniques such as variants of LSTM tend to perform only marginally better. One of the primary reasons these models falter on extreme spikes is their underlying assumption of a thin-tailed data distribution, whereas moderately frequent extreme spikes produce a heavy-tailed distribution. On the other hand, the literature proposing methods to forecast extreme events in time series has focused mostly on the extreme events and ignored overall forecasting accuracy. We attempted to address both problems by treating a time series signal with extreme spikes as the superposition of two independent signals: (1) a stationary time series signal without extreme spikes and (2) a shock signal consisting of near-zero values most of the time along with a few spikes of high intensity. We modelled these two signals independently to forecast values for the original time series signal. Experimental results show that the proposed technique outperforms existing techniques in forecasting both normal and extreme events.

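The signal decomposition described in the last abstract can be illustrated with a minimal sketch. It is not the paper's method: spikes are flagged with a hypothetical median/MAD threshold (upward spikes only), the baseline is forecast with a naive moving average, and the shock signal is forecast as its historical mean, which equals the spike rate times the average spike size. The helpers `split_surges` and `forecast_next` and the parameters `k` and `window` are hypothetical.

```python
from statistics import median


def split_surges(series, k=3.0):
    """Split a series into a baseline signal and a shock signal (hypothetical rule).

    A point is treated as a spike when it exceeds the series median by more than
    k times the median absolute deviation (MAD). Only upward spikes are handled.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series) or 1e-9  # avoid a zero threshold scale
    baseline, shock = [], []
    for x in series:
        if x - med > k * mad:
            baseline.append(med)       # baseline: spike replaced by the median level
            shock.append(x - med)      # shock: the excess above that level
        else:
            baseline.append(x)
            shock.append(0.0)          # shock signal is zero outside spikes
    return baseline, shock


def forecast_next(series, window=7, k=3.0):
    """One-step forecast as the sum of two independently modelled components."""
    baseline, shock = split_surges(series, k)
    recent = baseline[-window:]
    base_fc = sum(recent) / len(recent)   # naive moving-average baseline model
    shock_fc = sum(shock) / len(shock)    # expected shock = spike rate x mean spike size
    return base_fc + shock_fc
```

Because the two components add back up to the original series, either stand-in model could be replaced by a stronger one (for example an ARIMA for the baseline) without changing the recombination step.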

Projects

  • Data Imputation in Time Series containing Extreme Observations

    - Present

    Imputed missing values in time series containing many extreme observations using a bidirectional LSTM, achieving an overall accuracy of 80%, an improvement of nearly 40% over existing techniques such as TS Impute, KNN and MICE.

  • China Walmart Orders Forecasting for Workforce Management

    - Present

    Forecasted orders for China Walmart to support efficient workforce management using several classical and deep learning based models, such as Regression ARIMA, Bayesian Structural Time Series and Neural Nets, with an innovative additional adjustment to account for abrupt upward fluctuations in the data. An overall accuracy of 95% was achieved across 600 stores, with roughly 100k orders placed daily.

  • Text Extraction from Images for Product Segmentation

    -

    Worked on a text recognition model that identifies inverted text in images with an overall accuracy of 90%, used to segment products without labels. Features were extracted from the images with a CNN based mainly on VGG-16 and then processed by an LSTM network to recognise the text.

  • Cost Estimation based on Resource Usage by Users

    -

    Created a statistical model to estimate cost from customers' resource usage when running queries of varying complexity. A constrained polynomial regression model was built for this purpose and achieved an overall accuracy of 85%.

  • Analysis of Effect of Noise on Accuracy of Neural Networks

    -

    Analysed the effect of Gaussian noise on the accuracy of deep learning based forecasting models and quantified how adding noise can help build a better neural network model.

  • Creation of a Robust Measure of Forecast Accuracy Measurement

    -

    Created a robust measure of forecast accuracy to quantify the performance of time series models more appropriately.

  • Contextual Vectorisation of Text for Improved Classifiability

    -

    Computed context-based embeddings for text data to facilitate clustering of the data and to gauge customer sentiment better.

  • Automated Data Pre-Processing

    -

    Used PySpark to create modules that automatically pre-process data and perform a number of EDA operations, including statistical tests, helping users gain detailed insight into the data.

  • Categorical Feature Extraction

    -

    Extracted latent variables from categorical data to facilitate clustering of such data in order to obtain better models.

  • Regional Water Level Forecasting

    -

    Forecasted regional water levels using time series modelling techniques such as ARIMA, TBATS, neural networks and theta forecasting, with an overall accuracy of 82%, as part of a Corporate Social Responsibility project to help the relevant authorities plan accordingly.

  • Drivers' Health Survey

    -

    Worked on extracting and visually representing critical information from a drivers' health survey to help create improved safety guidelines and lifestyle suggestions for drivers.

  • Prediction of NPS Scores

    -

    Predicted NPS scores using relevant ML models like Random Forests, XGBoost, MARS and SVR to gauge customer feedback better.

  • Cycle Index for Economic Situation Forecasting

    -

    Created a cycle index based on the forecasted values from the VAR model, using HP Filter and Principal Component Analysis, to predict potential economic stress scenarios.

  • Vector Autoregressive Model for Macroeconomic Factors Forecasting

    -

    Built a Vector Autoregressive Model to forecast interconnected economic factors like GDP, HPI and Unemployment Rate.
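The vector autoregressive model mentioned in the last project can be illustrated with a minimal VAR(1) sketch, fitted by ordinary least squares one equation at a time in plain Python. It is not the project's actual model: the lag order of one is an assumption, and the helpers `solve`, `fit_var1` and `forecast_var1` are hypothetical names chosen for illustration.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_var1(data):
    """Fit y_t = c + A y_{t-1} + e_t by OLS.

    data: list of observations, each a list of k variable values (e.g. GDP, HPI, unemployment).
    Returns one coefficient row per variable: [c_v, A_v1, ..., A_vk].
    """
    X = [[1.0] + list(row) for row in data[:-1]]  # intercept + lagged values
    Y = data[1:]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    coefs = []
    for v in range(len(data[0])):
        xty = [sum(x[i] * y[v] for x, y in zip(X, Y)) for i in range(p)]
        coefs.append(solve(xtx, xty))  # normal equations for equation v
    return coefs


def forecast_var1(coefs, last, steps):
    """Iterate the fitted VAR(1) forward from the last observation."""
    out, cur = [], list(last)
    for _ in range(steps):
        cur = [b[0] + sum(bj * yj for bj, yj in zip(b[1:], cur)) for b in coefs]
        out.append(cur)
    return out
```

Every variable depends on the lagged values of all variables, which is what lets a VAR capture interconnected factors. For real macroeconomic series, a library implementation such as the `VAR` class in statsmodels, which also offers lag-order selection, would normally be preferable to this hand-rolled version.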

Languages

  • English

    Full professional proficiency

  • Hindi

    Full professional proficiency

  • Bengali

    Native or bilingual proficiency
