Project Description
Project Description
Project Description
Overview
In this project, you will apply the business forecasting models and techniques you learned in class to solve
real-world problems. You will be provided a dataset related to S&P500 stock data and its associated news
information. You are required to submit a written report for grade assessment, together with your code
for reference. You must submit the project (zipped file containing the written report and code) no later
than the deadline. Please email me ([email protected]) if you have any questions about the project.
Data: Stock data (“SP500Panel2019.csv”) are collected from the Thomson Reuters platform covering the
period from January 2019 to December 2019. Text-related variables are constructed using NLP techniques
and deep learning models beforehand.
15. Litigious: Daily average of litigious score of financial news, measured based on the LM lexicon
16. StrongModal: Daily average of strong modal score of financial news, measured based on the LM
lexicon
17. WeakModal: Daily average of weak modal score of financial news, measured based on the LM
lexicon
18. Constraining: Daily average of constraining score of financial news, measured based on the LM
lexicon
Evaluation Criteria
In general, you should record your step-by-step progress. The possible efforts include data cleaning, missing
data handling, outlier detection, feature engineering/selection, learning algorithm selection, parameter
tuning, etc. Clarity and organization of your written report are important when evaluating your project.
Please justify why you believe the problem addressed in your project is important and describe the
techniques you used to tackle the problem and the rationale behind your approaches.
Please refer to the last section, “Report Structure,” for more details on the score breakdown.
Deduct 2 points for each extra page going beyond the 10-page limit.
Deduct 10 points for each day going beyond the submission deadline.
Final submission
In the final submission, you must submit a final written report and your Python code for plagiarism check.
The report should follow the specific structure, as indicated in the next section. The report must be in 12-
point Times New Roman font, single- or double-spaced, and left-justified. Figures, tables, and exhibits can
be smaller than 12-point Times New Roman font sizes. Text in figures, tables, and exhibits may be single-
spaced. The report MUST be no more than ten pages (in Word or PDF format), including references and
appendices. Please note that the report length will NOT be used as a criterion for grading. The written
3
News-Induced Market Signaling
report MUST be self-contained, in which you should include all necessary details and information you find
important in the report only rather than in the code or other files.
Report Structure
1. Introduction (20 points)
Describe the business problem you are going to tackle. You may want to put your business problem in a
larger context and motivate the importance of the issue addressed in your project.
Describe the dataset. You may consider the following aspects: the number of data records; the number of
features and a brief description of their meanings, attribute type, range, mean, skewness; outliers; class
imbalance.
It would help if you also considered data preprocessing, such as missing values and feature normalization.
You must implement ONE deep learning model and ONE conventional time series model as the benchmark.
For each model built, indicate the parameter values and describe the conclusions you can draw from it. You
may dedicate a specific subsection to each model used.
Some additional efforts you can consider to improve model performance: e.g., feature normalization,
feature discretization, feature selection, and parameter tuning. Provide a logical explanation of why you
made such an effort.
• Describe the benchmark (time series model) briefly and justify your choice.
• Describe the model architectures and parameters.
• Describe the deep learning model briefly and justify your choice.
• Describe the model architectures and parameters.
4
News-Induced Market Signaling
• Hypothesize TWO improvements that can be applied to the deep learning model with reasonings
and propose the corresponding changes.
• Implement the proposed changes.
Indicate the performance measures (e.g., accuracy, ROC, 𝑅 2) you have chosen to evaluate the performance
of the models built. You may want to summarize the performance of the built models using the chosen
performance measures in a table. In this way, it is easy to compare the performance of different models.
You also need to provide some reasoning on why and how models differ from each other in terms of
performance.
Summarize the problem to be addressed and how the conclusions drawn from the built models help you to
tackle the problem. List any potential problems as future work.