

Bachelor of Technology in Computer Science and Engineering

Report
On
Data Analysis in Big Data Using Python

Name: Tushar Verma
Admission No: 21SCSE1310012

Under the Guidance of Dr. K Suresh

Introduction:

Big data analytics is the process of collecting, examining, and analyzing large amounts of data to discover market trends, insights, and patterns that can help companies make better business decisions. Because this information becomes available quickly and efficiently, companies can be agile in crafting plans to maintain their competitive advantage.

Objective:

Big data analytics describes the process of uncovering trends, patterns, and correlations in large amounts of raw data to help make data-informed decisions. These processes use familiar statistical analysis techniques, such as clustering and regression, and apply them to more extensive datasets with the help of newer tools.
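For example, a regression of the kind mentioned above can be run at scale with PySpark's MLlib. The sketch below assumes a Spark DataFrame named data (like the one loaded in the Source Code section) with hypothetical columns feature1, feature2, and label:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Assemble the (hypothetical) feature columns into a single vector column
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
train_data = assembler.transform(data).select("features", "label")

# Fit an ordinary least-squares linear regression on the assembled features
lr = LinearRegression(featuresCol="features", labelCol="label")
model = lr.fit(train_data)
print("Coefficients:", model.coefficients)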

Technologies Used:

Python programming language

PySpark (the Python API for Apache Spark) for distributed data processing

An Integrated Development Environment (IDE) such as PyCharm or Jupyter Notebook

Implementation Details:

a. Setting up the Environment:

Install Python and PySpark (for example, with pip install pyspark).
Create a new Python script or project in your preferred IDE.
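A quick check that the installation works, assuming PySpark was installed with pip:

# Confirm that PySpark is importable and print its version
import pyspark
print(pyspark.__version__)
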
b. Importing Required Libraries:

Import the necessary libraries, including PySpark's SparkSession and, for the machine learning step, the pyspark.ml modules.


c. Initializing the SparkSession:

Create a SparkSession and set the application name and any other configuration options.
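A minimal sketch of this step; the local[*] master and the memory setting are assumptions for a single-machine run, not requirements:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("BigDataAnalysis") \
    .master("local[*]") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()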


d. Loading the Data:

Read the large CSV file into a Spark DataFrame, using the header row for column names and letting Spark infer the schema.
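As an alternative to inferSchema, which costs Spark an extra pass over a large file, the schema can be declared up front. The column names and types below are hypothetical:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

schema = StructType([
    StructField("column_name", StringType(), True),
    StructField("numeric_column", DoubleType(), True),
])
data = spark.read.csv("path/to/bigdata.csv", header=True, schema=schema)
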
e. Exploring the Data:

Count the number of rows in the DataFrame.
Inspect the schema and a sample of rows to verify that the data loaded correctly.
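For example, with standard DataFrame methods:

data.printSchema()        # column names and types
data.show(5)              # first five rows
data.describe().show()    # summary statistics for numeric columns
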
f. Aggregating the Data:

Group the DataFrame by a key column and compute aggregates, such as the sum of a numeric column, for each group.
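Besides the dictionary form used in the Source Code section, the same aggregation can be written with pyspark.sql.functions, which also allows naming the result columns; the column names are placeholders:

from pyspark.sql import functions as F

agg_result = data.groupBy("column_name").agg(
    F.sum("numeric_column").alias("total"),
    F.count("*").alias("row_count"),
)
agg_result.show()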

g. Filtering the Data:

Apply filter conditions to keep only the rows of interest, for example rows where a numeric column exceeds a threshold.
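Conditions can be combined with & and |, with each condition wrapped in parentheses; the values here are placeholders:

filtered_data = data.filter(
    (data["numeric_column"] > 100) & (data["column_name"] == "some_value")
)
filtered_data.show()
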
h. Joining Datasets:

Join the DataFrame with a second DataFrame on a common column to combine related records.
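When both DataFrames use the same column name, joining on the name itself (rather than an expression) avoids a duplicated common_column in the result; another_data is assumed to be a second DataFrame loaded like data:

joined_data = data.join(another_data, on="common_column", how="inner")
joined_data.show()
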
i. Machine Learning:

Assemble the feature columns into a single vector column.
Fit a clustering model such as KMeans and attach the cluster predictions to the data.
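One way to sanity-check the clustering produced in the Source Code section is the silhouette score; a sketch assuming the predictions DataFrame created there:

from pyspark.ml.evaluation import ClusteringEvaluator

evaluator = ClusteringEvaluator(featuresCol="features")
silhouette = evaluator.evaluate(predictions)
print("Silhouette score:", silhouette)
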
j. Stopping the Session:

Stop the SparkSession once the analysis is complete to release its resources.
Challenges Faced:

During the development of the data analysis project, the following challenges were encountered:

Configuring the Spark environment and session correctly.
Handling schema inference and data-quality issues when loading large CSV files.
Managing the complexity of aggregations and joins across large datasets without introducing errors from mismatched column names or types.
Keeping repeated operations over the same large DataFrame performant.
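One common mitigation for the last point, sketched under the assumption that the DataFrame fits in cluster memory:

# Cache the DataFrame so the aggregations, filters, and joins that follow
# reuse the in-memory copy instead of re-reading the CSV each time
data.cache()
data.count()   # an action that materializes the cache
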
Conclusion:

The data analysis project successfully demonstrates a simple yet complete big data workflow using Python and PySpark. By following the implementation details outlined in this report, users can build their own version of the analysis and extend it with additional features and functionality. The project provides a solid foundation for understanding big data concepts and Python programming techniques.

Source Code:

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("BigDataAnalysis") \
    .getOrCreate()

# Load the big data file into a DataFrame
data = spark.read.csv("path/to/bigdata.csv", header=True, inferSchema=True)

# Perform data analysis operations

# Example 1: Count the number of rows in the DataFrame
row_count = data.count()
print("Number of rows:", row_count)

# Example 2: Perform aggregations
agg_result = data.groupBy("column_name").agg({"numeric_column": "sum"})
agg_result.show()

# Example 3: Apply filters
filtered_data = data.filter(data["column_name"] > 100)
filtered_data.show()

# Example 4: Perform joins
# (another_data is a second DataFrame, loaded the same way as `data`;
# the path below is a placeholder)
another_data = spark.read.csv("path/to/other_data.csv", header=True, inferSchema=True)
joined_data = data.join(another_data, data["common_column"] == another_data["common_column"], "inner")
joined_data.show()

# Example 5: Perform machine learning tasks (e.g., clustering, classification)
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

# Prepare features for clustering
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
features_data = assembler.transform(data)

# Apply KMeans clustering
kmeans = KMeans(k=2, seed=0)
model = kmeans.fit(features_data)

# Get cluster predictions
predictions = model.transform(features_data)
predictions.show()

# Stop the SparkSession
spark.stop()
Note that this code assumes a working Spark installation (a cluster, or local mode on a single machine) and that PySpark is installed. You may also need to modify the code to match your specific data and analysis requirements.
