Casmir Anyaegbu’s Post


Data Scientist | Data Analyst | Sales Analyst | Python | Pandas | Seaborn | Machine Learning | R | SQL | Power BI | Tableau | Looker Studio | Excel | STATA | EViews | Dashboard | Researcher

#Boom #My_Models have been #trained

#Project_title: Sentiment Analysis for Customer Feedback: Product and Service Improvements with Precision

The processing times and speeds give insight into the efficiency of the data preprocessing stage for these datasets. Here's a breakdown of what the figures imply:

#Training_Dataset (3.6 Million Reviews)
- Total Time Taken: 41 minutes and 31 seconds
- Processing Speed: 1444.68 iterations per second (it/s)

#Test_Dataset (400,000 Reviews)
- Total Time Taken: 4 minutes and 28 seconds
- Processing Speed: 1488.19 iterations per second (it/s)

#Observations_and_Implications:
1. Processing Speed: The processing speed, measured in iterations per second, is nearly identical across the training and test datasets. The slight variation could be due to differences in system resources, data structure, or concurrent processes running during preprocessing.
2. Scalability: Handling a large dataset of 3.6 million reviews within a reasonable timeframe (under 42 minutes) demonstrates good scalability and efficiency in the preprocessing step. This matters for large-scale projects, where preprocessing can become a bottleneck.
3. Progress Feedback: Using tqdm for progress feedback is particularly valuable here. It not only estimates the time remaining but also helps in monitoring the run for potential issues.
4. Optimization Opportunities: While the current processing speed is efficient, there may still be room for improvement, such as parallel processing, more efficient stopword-removal algorithms, or hardware upgrades.
5. Practical Considerations: Total processing time and speed are important metrics for planning and resource allocation, especially when preprocessing is a recurring task or many datasets must be processed frequently.
Overall, these metrics are a positive indication of the system's ability to handle extensive preprocessing tasks, an essential step in preparing data for subsequent analysis or machine learning modeling. The code is in the comment section. AMDARI #NLP #DataScience #PredictiveModelling #SentimentAnalysis #CustomerFeedback #ServiceImprovements Omowumi Victoria Samson
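Point 4 above mentions parallel processing as one optimization avenue. A minimal sketch of that idea, splitting the text column across CPU cores with multiprocessing (this is an illustration, not the project's code; `parallel_clean`, `clean_chunk`, and the inline stopword list are placeholders assumed for the example):

```python
from multiprocessing import Pool, cpu_count

import numpy as np
import pandas as pd

# Placeholder stopword list for illustration only; a real run would use a
# full list such as NLTK's stopwords.words("english").
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "and", "it", "of", "to"}

def remove_stopwords(text: str) -> str:
    # Keep word order, drop stopwords (case-insensitive match).
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

def clean_chunk(chunk: pd.Series) -> pd.Series:
    # Worker function: clean one slice of the text column.
    return chunk.apply(remove_stopwords)

def parallel_clean(texts: pd.Series, workers: int = cpu_count()) -> pd.Series:
    # Split the column into one chunk per worker and clean them in parallel.
    chunks = np.array_split(texts, workers)
    with Pool(workers) as pool:
        return pd.concat(pool.map(clean_chunk, chunks))
```

The trade-off: process start-up and data serialization add overhead, so the speedup only pays off on large columns like the 3.6M-review training set, not on small ones.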

Casmir Anyaegbu


4mo

from tqdm import tqdm

total_rows = len(test_dataset)
tqdm.pandas(total=total_rows)
test_dataset['stop words'] = test_dataset['text'].progress_apply(remove_stopwords)
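The `remove_stopwords` helper isn't shown in the post. A minimal sketch of what it might look like, assuming a simple token-level filter (the inline stopword list here is a placeholder; NLTK's `stopwords.words("english")` would be the usual choice):

```python
# Placeholder stopword list for illustration; swap in a full list for real use.
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "and", "it", "of", "to"}

def remove_stopwords(text: str) -> str:
    """Drop stopwords from a review while preserving word order."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)
```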

Casmir Anyaegbu


4mo

# I would love to see a progress bar when we process all 3.6 million reviews
from tqdm import tqdm

total_rows = len(train_dataset)
tqdm.pandas(total=total_rows)
train_dataset['stop words'] = train_dataset['text'].progress_apply(remove_stopwords)
