The data I’m exploring is the Global Terrorism Database (GTD), which can be found here: https://www.kaggle.com/START-UMD/gtd. So far, I have used Tableau to map all terrorist attacks using their latitude and longitude and color-coded them by year. I also made bar graphs of the number of terrorist attacks by country and by year.
One interesting observation I’ve made so far is that although Iraq has by far the most terrorist attacks, the vast majority of them (I’m approximating 95% or more) occurred in or after 2003, even though the data covers terrorist attacks from 1970 to 2017. I’m thinking of exploring why, with background information, in the future.
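To check that estimate rather than eyeball it, the share could be computed directly. Below is a minimal Python sketch. The column names `country_txt` and `iyear` are my assumption about how the GTD export labels country and year, the function name `share_on_or_after` is hypothetical, and the rows are made-up toy data, not real GTD records.

```python
# Toy rows standing in for GTD records (made up for illustration only).
# "country_txt" and "iyear" are assumed column names.
toy_rows = [
    {"country_txt": "Iraq", "iyear": 1995},
    {"country_txt": "Iraq", "iyear": 2004},
    {"country_txt": "Iraq", "iyear": 2010},
    {"country_txt": "Iraq", "iyear": 2016},
    {"country_txt": "Peru", "iyear": 1985},
]


def share_on_or_after(rows, country, year):
    """Return the fraction of a country's attacks dated in or after `year`.

    Returns None when the country has no rows at all.
    """
    years = [row["iyear"] for row in rows if row["country_txt"] == country]
    if not years:
        return None
    return sum(1 for y in years if y >= year) / len(years)


# With the toy rows, 3 of the 4 Iraq records fall in or after 2003.
iraq_share = share_on_or_after(toy_rows, "Iraq", 2003)
print(iraq_share)
```

On the real data, the rows would come from the GTD file instead of the toy list, and the result would replace my rough 95% guess.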
I began trying to see if population influences the number or severity of terrorist acts, but difficulties have arisen. Since GTD doesn’t include population information, I used a different dataset to get the population of each US city from 2010 to 2019 (https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-cities-and-towns.html#tables, Incorporated Places: 2010 to 2019 United States Dataset). I made a revised version of the GTD dataset that only includes information from the US from 2010 to 2017. I have also already revised both datasets so that the two can form a “relationship” on the city information, but I’m unable to make a relationship on the year information. As a result, I can’t easily combine the data to get the population of the city in the year when a terrorist attack occurred.
I haven’t been able to find a simple solution to this problem other than coding. I haven’t coded this yet, since the code could be complicated and I’m unsure what language to use, but I do have an idea of how to get this information without manually inputting it. Below is pseudocode for what I think a solution could be:
create new column in Terrorist US Data called Population
for all rows in Terrorist US Data (i = 1:n)
    for all rows in Population US Data
        if city names from both datasets match
            capture row value in Population US Data where cities match (r)
            stop for loop
        else
            go to next iteration / row in Population US Data
    for certain columns (ones for year) in Population US Data
        if years from both datasets match
            capture column value in Population US Data where years match (c)
            stop for loop
        else
            go to next iteration / column in Population US Data
    Terrorist US Data [i, Population] = Population US Data [r,c]
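The pseudocode above could be sketched in Python roughly as follows. This is only a minimal sketch under assumptions, not a tested solution for the real files. Instead of scanning the population table once per attack, it first builds a dictionary keyed by (city, year) and then looks each attack up in it. The field names `city` and `year`, the `pop_YYYY` column naming, the function names, and all rows are hypothetical stand-ins for the two revised datasets.

```python
# Toy rows standing in for the revised GTD data (made up for illustration).
attacks = [
    {"city": "Springfield", "year": 2012},
    {"city": "Springfield", "year": 2015},
    {"city": "Riverton", "year": 2016},
    {"city": "Lakeview", "year": 2011},  # no matching city in the population table
]

# Population table in "wide" form: one row per city, one column per year.
# The "pop_YYYY" column names are an assumption, not the real Census headers.
population_wide = [
    {"city": "Springfield", "pop_2012": 1000, "pop_2015": 1100, "pop_2016": 1150},
    {"city": "Riverton", "pop_2012": 400, "pop_2015": 450, "pop_2016": 500},
]


def build_population_lookup(rows, year_prefix="pop_"):
    """Turn wide rows into a dict keyed by (city, year)."""
    lookup = {}
    for row in rows:
        for column, value in row.items():
            if column.startswith(year_prefix):
                year = int(column[len(year_prefix):])
                lookup[(row["city"], year)] = value
    return lookup


def add_population(attack_rows, lookup):
    """Return copies of the attack rows with a 'population' field added.

    Attacks with no matching (city, year) get None instead of a number.
    """
    return [
        {**row, "population": lookup.get((row["city"], row["year"]))}
        for row in attack_rows
    ]


lookup = build_population_lookup(population_wide)
merged = add_population(attacks, lookup)
for row in merged:
    print(row)
```

The dictionary lookup replaces both inner loops, so each attack row is matched in one step. If city names repeat across states, the state would presumably need to be part of the key as well.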
If you have any advice or suggestions, I would appreciate it if you let me know.
Thank you!