This is the homestretch! All my focus is on writing the final paper, which is due about a week from now! I’m fighting on and wishing for success!
Week 10
This week, I wrote more of my report. I also thought more about how to organize the paper, how to integrate figures into it, and how to state my findings concisely. As a result, the Analytic Methods and Results sections have been written and/or edited. The Results section is not finished yet, but I will add more over time.
Thank you for reading!
Week 9
This week, I discovered that the latitude and longitude for each terrorist attack weren’t as accurate as I initially thought. The coordinates are very precise, as they both have 4 or more decimal places, so I had assumed that the recorded location was the exact place where each attack occurred. It turns out that the latitude and longitude for every attack in a given town are the same, so the data makes it look as if all of a town’s attacks occurred at one specific location, no matter the timespan. I will write this up as a source of error or a limitation of the data.
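One way to check this kind of thing is to count how many distinct coordinate pairs each town has. The snippet below is only a minimal sketch: the field names `city`, `latitude`, and `longitude` are assumed, and the rows are made up for illustration rather than taken from the GTD.

```python
from collections import defaultdict


def distinct_coordinates_per_city(attacks):
    """Count the distinct (latitude, longitude) pairs recorded for each city."""
    points = defaultdict(set)
    for attack in attacks:
        points[attack["city"]].add((attack["latitude"], attack["longitude"]))
    return {city: len(coords) for city, coords in points.items()}


# Made-up rows for illustration only; these are not real GTD records.
attacks = [
    {"city": "Exampleville", "latitude": 10.1234, "longitude": 20.5678},
    {"city": "Exampleville", "latitude": 10.1234, "longitude": 20.5678},
    {"city": "Sampletown", "latitude": 11.0001, "longitude": 21.0002},
    {"city": "Sampletown", "latitude": 11.0450, "longitude": 21.0310},
]

counts = distinct_coordinates_per_city(attacks)
```

A city with many attacks but a count of 1 is probably being recorded at a single reference point instead of at the actual attack sites.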
Thank you for reading!
Week 8
This week, I have been making the figures that will go into the report and the appendix. I have focused on making all the graphics the same size so that the text size is consistent across figures. The figures in the report will include bar graphs, line graphs, geographic maps, and circle graphs. There are over 20 figures so far, but this may change in the future.
Thank you for reading!
Week 7
For the month of March, I’m focusing on gathering everything I have done and formalizing it into a report. A lot has been done, such as generating geographic histograms for all the subsets of data I have, which are: 1970-2017, 1970-2009, and 2010-2017 for the United States, and 1970-2017, 1970-2002, and 2003-2017 for Iraq. Most of the graphs produced by Tableau have been put into a formal presentation, data has been spliced and modified for analytic purposes, and code has been written to cross-reference the two datasets to find the population information for each terrorist attack. Almost everything I have done so far will be in the midterm report. The report is already about 9 pages, not including the title page and references, with only the Methods section written, so everything will be included in due time. The report, as well as the appropriate files, will be posted when it is due.
Thanks for reading!
Week 6
Here are graphs of the terrorist attacks in Iraq and the USA from 1981 to 2017. One interesting observation is that some of the attacks aren’t located in the country the dataset says they took place in. For example, a couple of attacks are physically located in China, but the dataset states that they were directed towards the US. I’m still deciding whether to include these data points in the analysis.
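A quick way to flag points like these is to compare each attack’s coordinates against a rough bounding box for the country. This is only a sketch under stated assumptions: the field names are assumed, the rows are made up, and the box values are a coarse approximation of the contiguous US that leaves out Alaska, Hawaii, and the territories.

```python
def outside_bounding_box(attacks, box):
    """Return the attacks whose coordinates fall outside the given box.

    box is (south, north, west, east) in decimal degrees.
    """
    south, north, west, east = box
    return [
        attack
        for attack in attacks
        if not (
            south <= attack["latitude"] <= north
            and west <= attack["longitude"] <= east
        )
    ]


# Assumption: a coarse box around the contiguous US, for illustration only.
CONTIGUOUS_US_BOX = (24.0, 50.0, -125.0, -66.0)

# Made-up rows for illustration only; these are not real GTD records.
attacks = [
    {"id": "a", "latitude": 40.0, "longitude": -100.0},
    {"id": "b", "latitude": 39.9, "longitude": 116.4},
]

flagged = outside_bounding_box(attacks, CONTIGUOUS_US_BOX)
```

A box is a blunt tool, since parts of Canada and Mexico also fall inside it, so flagged and unflagged rows would both still need a manual look on the map.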
Again, it seems that terrorist attacks were less common in the 1900s than in the 2000s, so I need to find background information about exactly why. Another thing I quickly noticed was that a significant number of the terrorist attacks in Italy were from the 1900s. The opposite is true for Yemen and neighboring countries. These are also possible options for analysis.
Week 5
This week, I wanted to identify the focus of my analysis of the GTD. This will most likely change in the future, but it will serve as a good start.
USA
- Determine if a relationship exists between population and number of attacks. Unsure how to statistically find this relationship yet (linear / logistic regression?).
  - Gather population of city for year of attack through Population data first
  - Map the cities and use marker(s) to denote population number and/or number of attacks
- Determine if a relationship exists between population and severity of attacks. Unsure how to statistically find this relationship yet (linear / logistic regression?).
  - Gather population of city for year of attack through Population data first
  - Map the cities and use marker(s) to denote population number and/or severity of attack
  - Define what severity of attack is (deaths, injuries, destruction, etc.)
- Form clusters to see areas where attacks happen
  - Through mapping the cities, it seems there are patches of the USA where attacks haven’t occurred. Possibly research background info into why.
Iraq
- Study the difference in number of attacks for the years 1970-2017
  - Research background info on why the number of attacks increased substantially in the 2000s
- Study severity of attacks in the 1900s and 2000s
  - Is there a difference? If so, research possibilities on why
- Find where terrorist groups in the country live
  - Correlation between terrorist group location and number / severity of attacks is possible
- Form clusters to see where attacks happen
  - Through mapping the attacks, there seem to be patches where attacks haven’t occurred. Possibly research background info into why.
- If I find population info, then do the same population analysis as for the USA
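For the population questions above, one simple first step before any regression is a Pearson correlation coefficient between city population and number of attacks. The snippet below is a minimal sketch, not the method used in the analysis: the function name is hypothetical and the numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up numbers for illustration only; these are not real figures.
populations = [10_000, 20_000, 30_000, 40_000]
attack_counts = [1, 2, 2, 5]

r = pearson_r(populations, attack_counts)
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. The same function could be reused for severity once a severity measure has been defined.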
Week 4
The data I’m exploring is the Global Terrorism Database (GTD), which can be found here: https://www.kaggle.com/START-UMD/gtd. So far, I have used Tableau to map all terrorist attacks with the latitude and longitude information and color coded them by year. I also made a bar graph of the number of terrorist attacks by country and by year.
One interesting observation I’ve made so far is that although Iraq has by far the most terrorist attacks, a significant share of them (I’m estimating 95% or more) occurred in or after 2003, even though the data covers terrorist attacks from 1970 to 2017. I’m thinking of exploring why, using background information, in the future.
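To replace the rough estimate with an exact figure, the share can be computed directly from the year of each attack. This is a minimal sketch with a hypothetical function name and made-up years, so the result here says nothing about the real Iraq data.

```python
def share_on_or_after(years, cutoff):
    """Fraction of the given years that fall in or after the cutoff year."""
    if not years:
        raise ValueError("years must not be empty")
    return sum(1 for year in years if year >= cutoff) / len(years)


# Made-up years for illustration only; these are not real GTD records.
sample_years = [1985, 2004, 2007, 2014]

share = share_on_or_after(sample_years, 2003)
```

With the real data, `years` would be the year column of the GTD rows filtered to Iraq.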
I began trying to see if population influences the number or severity of terrorist acts, but difficulties have arisen. Since the GTD doesn’t include population information, I used a different dataset to get the population of each US city from 2010 to 2019 (https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-cities-and-towns.html#tables, Incorporated Places: 2010 to 2019 United States Dataset). I made a revised version of the GTD that only includes information from the US from 2010 to 2017. I have also already made revised datasets so that the two can form a “relationship” on the city information, but I’m unable to make a relationship on the year information. As a result, I can’t easily integrate the data to get the population of the city in the year when a terrorist attack occurred.
I haven’t been able to find a simple solution to this problem other than coding. I haven’t coded this yet, since the code could be complicated and I’m unsure what language to use, but I do have an idea of how to get this information without manually inputting it. Below is pseudocode for what I think a solution could be:
create new column in Terrorist US Data called Population
for all rows in Terrorist US Data (i = 1:n)
    for all rows in Population US Data
        if city names from both datasets match
            capture row value in Population US Data where cities match (r)
            stop for loop
        else
            go to next iteration / row in Population US Data
    for certain columns (ones for year) in Population US Data
        if years from both datasets match
            capture column value in Population US Data where years match (c)
            stop for loop
        else
            go to next iteration / column in Population US Data
    Terrorist US Data [i, Population] = Population US Data [r,c]
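The pseudocode above could be sketched in Python along the following lines. Instead of nested loops, this version builds a dictionary keyed by (city, year) once and then looks each attack up in it. It is only a sketch under assumptions: the function names, field names, and year column names are hypothetical, and the rows are made up for illustration.

```python
def build_population_lookup(population_rows, year_columns):
    """Return a dict mapping (city, year) to population.

    population_rows: one dict per city, with one column per year.
    year_columns: maps a year to the name of the column for that year.
    """
    lookup = {}
    for row in population_rows:
        city = row["city"].strip().lower()
        for year, column in year_columns.items():
            lookup[(city, year)] = row[column]
    return lookup


def add_population(attack_rows, lookup):
    """Return copies of the attack rows with a 'population' field added."""
    result = []
    for attack in attack_rows:
        key = (attack["city"].strip().lower(), attack["year"])
        # lookup.get returns None when the city/year pair has no match
        result.append({**attack, "population": lookup.get(key)})
    return result


# Made-up rows for illustration only; these are not real figures.
population_rows = [
    {"city": "Exampleville", "pop_2010": 1000, "pop_2011": 1100},
    {"city": "Sampletown", "pop_2010": 500, "pop_2011": 490},
]
year_columns = {2010: "pop_2010", 2011: "pop_2011"}
attack_rows = [
    {"city": "exampleville", "year": 2011},
    {"city": "Nowhere", "year": 2010},
]

lookup = build_population_lookup(population_rows, year_columns)
merged = add_population(attack_rows, lookup)
```

Matching on city name alone can mix up cities that share a name in different states, so adding the state to the key would likely be safer. City names may also be formatted differently in the two datasets and need cleaning before they match.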
If you have any advice or suggestions, I would appreciate it if you let me know.
Thank you!
Week 3
This week, I will be receiving the dataset I’ll be working on for the rest of the semester. I’m not sure what topics the data will cover, but I’m looking forward to finding background information about the data and experimenting with it!
Week 2
This week, I looked into possible datasets to use throughout the semester. Some of the documentation for the data is either quite complex or lacking crucial details, so I have to look at the data manually to see if I can use them for the project. I will be looking into these datasets manually starting next week and picking a dataset before the deadline. All the datasets I’m considering are below.
~ Miya Spinella
UFO Sightings
https://datarepository.wolframcloud.com/resources/UFO-Sightings-2015
Suicide Rate by County
https://datarepository.wolframcloud.com/resources/USSuicideRatesbyCounty
Suicide Rate By Year
https://datarepository.wolframcloud.com/resources/US-County-Suicide-Data-1999-2013
UPS Facilities
https://datarepository.wolframcloud.com/resources/UPS-Facilities
1918 Spanish Flu
https://datarepository.wolframcloud.com/resources/1918-Spanish-Flu-Pandemic-In-Chicago
Uber Trips in NYC
https://data.world/data-society/uber-pickups-in-nyc
SIDS in NC
https://geodacenter.github.io/data-and-lab//sids2/
Heat Deaths in CA
https://catalog.data.gov/dataset/heat-related-deaths-among-california-residents-may-september-2000-2009
Chicago Crime
https://catalog.data.gov/dataset/crimes-2001-to-present-398a4
CT Crime
https://crime-data-explorer.fr.cloud.gov/explorer/state/connecticut/crime