
Scraping Data Using Python
A Python application designed to generate a histogram depicting the frequency of articles published on Google News in 2022 concerning '@celebjets'.

Background
Created in 2020 by then-teenager Jack Sweeney, @celebjets (now suspended) was a Twitter account that tracked the location of celebrities' jets. The account gained worldwide notoriety through 2021 and 2022, mainly due to Jack Sweeney's altercation with Elon Musk over the privacy and safety concerns of tracking Elon's jet. More importantly, the posts from the account sparked conversations about the 'vanity-filled' lifestyle of celebrities and the significant CO2 footprints they leave with their obnoxious use of private jets.
Problem Formulation, Decomposition and Abstraction
With the given prompt at hand, we need to understand the problem space thoroughly in order to move efficiently and effectively from the undesired to the desired state of affairs. The problem is not monolithic, so we need to break it down before conceptualizing any solutions.
Breaking down the problem requires the Computational Thinking concept known as Decomposition. Separating the problem into sub-problems makes the task more approachable, because one can quickly see how possible conceptual frameworks (in the form of existing Python commands) can be employed and knitted together to solve it.
However, before diving into decomposition, we need to understand that the prompt does not encapsulate the entire problem space. Key components of the problem space concern the nature of the file news-celebjets.txt. How is the data organized? Where are the dates stated? What is the format of the dates? How can we work with this format? Thus, I ran the HTML code and manually viewed a sample (the first ten) of the articles to get a brief idea of the nature of the data. My findings were as follows:
The data is not primary but secondary data: some analysis has already been made.
Data is very well structured: Article Cover Picture; Logo and name of Publisher; Title of Article (Hyperlinked); The Date
List of Articles appears to consistently follow the structure stated above.
The dates appear to have the same format throughout: month and day, for example: Dec 14
Intuitively, in making these findings, I had already applied the Computational Thinking concept of Patterns and Generalizations. Having identified the repeated structure of the list of articles, I wondered: can loops or some other iterative command help me extract the dates?
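To illustrate how the repeated "month and day" pattern could be exploited, here is a minimal sketch using a regular expression. It is not the code from main.py; the name find_dates and the sample string are hypothetical, and the pattern assumes dates always appear as a three-letter English month abbreviation followed by a day number.

```python
import re

# Assumed date format: three-letter month abbreviation, a space, then the day ("Dec 14").
DATE_PATTERN = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (\d{1,2})\b"
)


def find_dates(text):
    """Return every 'Mon D' style date found in the text, in order of appearance."""
    return [f"{month} {day}" for month, day in DATE_PATTERN.findall(text)]


# Made-up sample standing in for a fragment of the scraped page.
sample = "<a>Article title</a><span>Dec 14</span><a>Another title</a><span>Nov 3</span>"
print(find_dates(sample))
```

The regular expression does the iteration implicitly: findall scans the whole text and collects each match, so no explicit loop over articles is needed for this step.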
With these insights at hand, I attained a greater understanding of the problem space. Consequently, I employed the General to Specific decomposition technique. This technique involves breaking down a problem from a general perspective and then adding specific and more detailed components. As the given problem is not open-ended and came with specific requirements, I found this technique to be the most appropriate.
The results of my decomposition are as follows:
General Problem: Analyze news articles and create a histogram representing the number of articles published per week
Listed below with the letters a, b, c, d and e are the "definitions of the desired characteristics of the solution", that is, the subproblems. To address these characteristics/subproblems, we need to get specific. Hence, below each subproblem, listed in Roman numerals, are the specifications written in pseudocode. Note that, with the exception of subproblems "a" and "c", I relied heavily on ChatGPT to write out the specifics for the other subproblems, as I had zero experience with the commands required.
a. Read the scraped data from the text file (news-celebjets.txt).
b. Find the publishing dates of the news articles.
c. Sort the publishing dates.
e. Plot a histogram to represent the number of articles per week.
Figure 1: Abstraction
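As a rough illustration of how the sorting and per-week counting subproblems could fit together, the sketch below uses only the standard library. It is not the solution in main.py: the function names and the sample labels are hypothetical, the year is assumed to be 2022, weeks are taken to be ISO weeks, and a text bar chart stands in for the plotted histogram.

```python
from collections import Counter
from datetime import datetime


def parse_date(label, year=2022):
    """Turn a label such as 'Dec 14' into a date (assumes English month names and the given year)."""
    return datetime.strptime(f"{label} {year}", "%b %d %Y").date()


def weekly_counts(labels, year=2022):
    """Sort the dates chronologically, then count how many fall in each ISO week."""
    dates = sorted(parse_date(label, year) for label in labels)
    return Counter(d.isocalendar()[1] for d in dates)


# Made-up labels standing in for the dates extracted from the scraped file.
labels = ["Dec 14", "Dec 12", "Nov 3", "Dec 20"]
counts = weekly_counts(labels)

# Text stand-in for the histogram: one '#' per article in that week.
for week in sorted(counts):
    print(f"week {week:2d}: {'#' * counts[week]}")
```

One caveat of the ISO-week assumption: January 1 and 2, 2022 belong to the last ISO week of 2021, so articles from those two days would be counted in week 52 rather than week 1.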

Algorithmic Solution and the Agile Process
An algorithm is a well-defined sequence of instructions that takes one or more input values and produces output values. Per the abstraction above, we have an idea of the desired solution's input, output and sequence of instructions.
In the attached main.py file, I have written a solution whose output is pictured below in Figure 2. Following the specifics from the decomposition phase, I have commented extensively throughout the code on my reasoning and methods, which I shall not repeat here. Instead, in this section, I will comment on the role of ChatGPT in my Agile solution creation process.
Figure 2: My Final Histogram
