Module 1 Cheatsheet - Data Science and Generative AI



Popular GenAI tools
Tool | Usage | Link
DataRobot | A simple tool for data analysis and model building operations | https://www.datarobot.com/
Mostly.AI | Synthetic data generation | https://mostly.ai/
ChatGPT | GPT-based model used for text and code generation from natural language queries | https://openai.com/chatgpt
DB Sensei | Generates SQL queries for databases from natural language queries | https://dbsensei.com/

Important prompts for data preparation


Task: Read a CSV data file and load it into a data frame.
Prompt: Write Python code that can perform the following task: Read the CSV file, located at a given file path, into a pandas data frame, assuming that the first row of the file contains the headers for the data.
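
The kind of code this prompt is likely to produce can be sketched as follows. This is only an illustration: the helper name `load_csv` and the sample data are hypothetical, and an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd


def load_csv(file_path):
    """Read a CSV file into a data frame, treating the first row as headers."""
    # header=0 tells pandas that row 0 holds the column names
    return pd.read_csv(file_path, header=0)


# Hypothetical sample data; in practice, pass a path such as "data.csv"
sample = io.StringIO("make,price\naudi,13950\nbmw,16430\n")
df = load_csv(sample)
print(df.head())
```

`header=0` is already the default for `pd.read_csv`, so stating it only makes the assumption about the header row explicit.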

Task: Data cleaning. Identify and replace missing values per the following guidelines.
1. Replace the missing entries in columns containing categorical values with the most frequent entries.
2. Replace the missing entries in columns with continuous data with the mean value of the column.
3. If a value is missing in the target column, you may need to drop that row.
Prompt: Write Python code to perform the following tasks:
1. Identify the attributes with missing values.
2. Segregate these attributes into categorical and continuous valued attributes.
3. Drop the entire row if the value is missing in the target variable.
4. If the value is missing in a categorical attribute, replace the missing values with the most frequent value in the column.
5. If the value is missing in a continuous valued attribute, replace the missing values with the mean value of the entries in the column.
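
One way the five steps in this prompt might translate into pandas is sketched below. The function name `clean_missing`, the column names, and the sample values are hypothetical. The sketch also assumes that numeric columns are the continuous ones, that all other columns are categorical, and that every column keeps at least one non-missing value.

```python
import numpy as np
import pandas as pd


def clean_missing(df, target):
    """Handle missing values following the five steps of the prompt."""
    # Step 1: identify the attributes that contain missing values
    missing_cols = [col for col in df.columns if df[col].isna().any()]

    # Step 3: drop every row whose target value is missing
    df = df.dropna(subset=[target]).reset_index(drop=True)

    for col in missing_cols:
        if col == target:
            continue
        # Step 2: treat numeric columns as continuous, the rest as categorical
        if pd.api.types.is_numeric_dtype(df[col]):
            # Step 5: continuous attribute, fill with the column mean
            df[col] = df[col].fillna(df[col].mean())
        else:
            # Step 4: categorical attribute, fill with the most frequent value
            df[col] = df[col].fillna(df[col].mode()[0])
    return df


# Hypothetical sample data with one gap in each column
raw = pd.DataFrame({
    "fuel": ["gas", None, "gas", "diesel", "gas"],
    "horsepower": [100.0, 120.0, np.nan, 140.0, 90.0],
    "price": [10000.0, 12000.0, 15000.0, np.nan, 9000.0],
})
clean = clean_missing(raw, target="price")
print(clean)
```

Here the mean and the most frequent value are computed after the rows with a missing target have been dropped. Computing them beforehand is an equally plausible reading of the prompt and would give slightly different fill values.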

Task: Data normalization. Normalize an attribute to its maximum value.
Prompt: Write Python code to normalize the content under a given attribute in a data frame df to its maximum value. Make changes to the original data, and do not create a new attribute.

Task: Convert a categorical variable into indicator variables.
Prompt: Write Python code to perform the following tasks:
1. Convert a data frame df attribute into indicator variables, saved as df1, with the naming convention "Name_<unique value of the attribute>".
2. Append df1 to the original data frame df.
3. Drop the original attribute from the data frame df.
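
The three steps above could be sketched with `pd.get_dummies` as shown below. The column names and values are hypothetical, and the sketch assumes that "Name" in the naming convention stands for the name of the attribute being converted.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas"],
    "price": [10000, 15000, 9000],
})
attribute = "fuel"

# Step 1: one indicator column per unique value, named "<attribute>_<value>"
df1 = pd.get_dummies(df[attribute], prefix=attribute, dtype=int)

# Step 2: append the indicator columns to the original data frame
df = pd.concat([df, df1], axis=1)

# Step 3: drop the original categorical attribute
df = df.drop(columns=[attribute])
print(df)
```

Passing `dtype=int` keeps the indicators as 0/1 integers. Without it, recent pandas versions return boolean columns.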

Author(s)
Abhishek Gagneja
