

Bachelor of Technology in Computer Science and Engineering

Report
On
Data Analysis in Big Data Using Python

Name: Tushar Verma
Admission No: 21SCSE1310012

Under the Guidance of Dr. K Suresh

Introduction:

Big data analytics is the process of collecting, examining, and analyzing large amounts of data to discover market trends, insights, and patterns that can help companies make better business decisions. Because this information becomes available quickly and efficiently, companies can be agile in crafting plans to maintain their competitive advantage.

Objective:

Big data analytics describes the process of uncovering trends, patterns, and correlations in large amounts of raw data to help make data-informed decisions. These processes use familiar statistical analysis techniques, such as clustering and regression, and apply them to more extensive datasets with the help of newer tools.
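For example, a regression of the kind mentioned above can be run at scale with PySpark's MLlib. The sketch below assumes a Spark DataFrame named data (like the one loaded in the Source Code section) with hypothetical columns feature1, feature2, and label:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Assemble the (hypothetical) feature columns into a single vector column
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
train_data = assembler.transform(data).select("features", "label")

# Fit an ordinary least-squares linear regression on the assembled features
lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(train_data)
print("Coefficients:", model.coefficients)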

Technologies Used:

Python programming language

PySpark (the Python API for Apache Spark) for distributed data processing

An Integrated Development Environment (IDE) such as PyCharm or Jupyter Notebook

Implementation Details:

a. Setting up the Environment:

Install Python and PySpark (for example, with pip install pyspark).
Create a new Python script or project in your preferred IDE.
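A quick check that the installation works, assuming PySpark was installed with pip:

# Confirm that PySpark is importable and print its version
import pyspark
print(pyspark.__version__)
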
b. Importing Required Libraries:

Import the necessary libraries, including PySpark's SparkSession and, for the machine learning step, the pyspark.ml modules.


c. Initializing the SparkSession:

Create a SparkSession and set the application name and any other configuration options.
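A minimal sketch of this step; the local[*] master and the memory setting are assumptions for a single-machine run, not requirements:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("BigDataAnalysis") \
    .master("local[*]") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()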


d. Loading the Data:

Read the large CSV file into a Spark DataFrame, using the header row for column names and letting Spark infer the schema.
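As an alternative to inferSchema, which costs Spark an extra pass over a large file, the schema can be declared up front. The column names and types below are hypothetical:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("column_name", StringType(), True),
    StructField("numeric_column", DoubleType(), True),
])
data = spark.read.csv("path/to/bigdata.csv", header=True, schema=schema)
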
e. Exploring the Data:

Count the number of rows in the DataFrame.
Inspect the schema and a sample of rows to verify that the data loaded correctly.
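For example, with standard DataFrame methods:

data.printSchema()        # column names and types
data.show(5)              # first five rows
data.describe().show()    # summary statistics for numeric columns
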
f. Aggregating the Data:

Group the DataFrame by a key column and compute aggregates, such as the sum of a numeric column, for each group.
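Besides the dictionary form used in the Source Code section, the same aggregation can be written with pyspark.sql.functions, which also allows naming the result columns; the column names are placeholders:

from pyspark.sql import functions as F

agg_result = data.groupBy("column_name").agg(
    F.sum("numeric_column").alias("total"),
    F.count("*").alias("row_count"),
)
agg_result.show()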

g. Filtering the Data:

Apply filter conditions to keep only the rows of interest, for example rows where a numeric column exceeds a threshold.
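Conditions can be combined with & and |, with each condition wrapped in parentheses; the values here are placeholders:

filtered_data = data.filter(
    (data["numeric_column"] > 100) & (data["column_name"] == "some_value")
)
filtered_data.show()
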
h. Joining Datasets:

Join the DataFrame with a second DataFrame on a common column to combine related records.
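When both DataFrames use the same column name, joining on the name itself (rather than an expression) avoids a duplicated common_column in the result; another_data is assumed to be a second DataFrame loaded like data:

joined_data = data.join(another_data, on="common_column", how="inner")
joined_data.show()
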
i. Machine Learning:

Assemble the feature columns into a single vector column.
Fit a clustering model such as KMeans and attach the cluster predictions to the data.
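One way to sanity-check the clustering produced in the Source Code section is the silhouette score; a sketch assuming the predictions DataFrame created there:

from pyspark.ml.evaluation import ClusteringEvaluator

evaluator = ClusteringEvaluator(featuresCol="features")
silhouette = evaluator.evaluate(predictions)
print("Silhouette score:", silhouette)
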
j. Stopping the Session:

Stop the SparkSession once the analysis is complete to release its resources.
Challenges Faced:

During the development of the data analysis project, the following challenges were encountered:

Configuring the Spark environment and session correctly.
Handling schema inference and data-quality issues when loading large CSV files.
Managing the complexity of aggregations and joins across large datasets without introducing errors from mismatched column names or types.
Keeping repeated operations over the same large DataFrame performant.
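One common mitigation for the last point, sketched under the assumption that the DataFrame fits in cluster memory:

# Cache the DataFrame so the aggregations, filters, and joins that follow
# reuse the in-memory copy instead of re-reading the CSV each time
data.cache()
data.count()   # an action that materializes the cache
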
Conclusion:

The data analysis project successfully demonstrates a simple yet complete big data workflow using Python and PySpark. By following the implementation details outlined in this report, users can build their own version of the analysis and extend it with additional features and functionality. The project provides a solid foundation for understanding big data concepts and Python programming techniques.

Source Code:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("BigDataAnalysis") \
    .getOrCreate()

# Load the big data file into a DataFrame
data = spark.read.csv("path/to/bigdata.csv", header=True, inferSchema=True)

# Perform data analysis operations

# Example 1: Count the number of rows in the DataFrame
row_count = data.count()
print("Number of rows:", row_count)

# Example 2: Perform aggregations
agg_result = data.groupBy("column_name").agg({"numeric_column": "sum"})
agg_result.show()

# Example 3: Apply filters
filtered_data = data.filter(data["column_name"] > 100)
filtered_data.show()

# Example 4: Perform joins
# (another_data is a second DataFrame, loaded the same way as `data`;
# the path below is a placeholder)
another_data = spark.read.csv("path/to/other_data.csv", header=True, inferSchema=True)
joined_data = data.join(another_data, data["common_column"] == another_data["common_column"], "inner")
joined_data.show()

# Example 5: Perform machine learning tasks (e.g., clustering, classification)
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

# Prepare features for clustering
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
features_data = assembler.transform(data)

# Apply KMeans clustering
kmeans = KMeans(k=2, seed=0)
model = kmeans.fit(features_data)

# Get cluster predictions
predictions = model.transform(features_data)
predictions.show()

# Stop the SparkSession
spark.stop()
Note that this code assumes a working Spark installation (a cluster, or local mode on a single machine) and that PySpark is installed. You may also need to modify the code to match your specific data and analysis requirements.
