Data Analytics Using Python
Thanks to “https://suvenconsultants.com/” for providing us with such an awesome task and giving us an opportunity to earn a certificate…
Well, the task in this project is to perform analysis on the given dataset and either prove or reject the given hypothesis.
The given Null Hypothesis is “Have the Apparent Temperature and Humidity, compared monthly across 10 years of the data, indicated an increase due to Global warming?”.
We will use Python for the analysis of the data.
Exploration of Dataset
Let's start by importing the required modules and libraries.
Next, I load the data file, which is located in the current working directory. You can download the file from “https://www.kaggle.com/muthuj7/weather-dataset”.
The read_csv() function takes a CSV file as input and creates a pandas dataframe. Here the dataframe is named df.
Let's take a quick look at the data. For that, the head() method is used.
head() returns the first 5 rows of the dataframe (if no argument is passed).
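The code for these steps is not reproduced in the text, so here is a minimal sketch of what loading and previewing the data could look like. To keep it runnable on its own, it reads a small made-up stand-in from memory instead of the downloaded file; the sample values and the file name mentioned in the comment are assumptions.

```python
import io
import pandas as pd

# Stand-in for the downloaded file so the sketch runs on its own.
# The rows are made up; with the real data you would pass the file
# path instead, e.g. pd.read_csv("weatherHistory.csv") (assumed name).
csv_text = "Formatted Date,Apparent Temperature (C),Humidity\n" + "".join(
    f"2006-04-01 {hour:02d}:00:00.000 +0200,{7.0 + hour * 0.1:.2f},0.85\n"
    for hour in range(6)
)

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.head())  # first 5 rows by default
```

Passing a number, as in df.head(2), returns that many rows instead of 5.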
Here, we have several columns that describe different features of the weather at a given time. Readings are recorded hourly.
For our analysis, we only need three columns, i.e., “Formatted Date”, “Apparent Temperature (C)” and “Humidity”. So, we will keep those columns and drop the rest.
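One way to keep only those three columns is to index the dataframe with a list of column names. The sketch below uses a tiny made-up frame; the extra “Temperature (C)” column is included only to show a column being dropped.

```python
import pandas as pd

# Made-up rows for illustration; the real frame has more columns.
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
    ],
    "Temperature (C)": [9.47, 9.36],
    "Apparent Temperature (C)": [7.39, 7.23],
    "Humidity": [0.89, 0.86],
})

columns_needed = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]

# Indexing with a list of names returns a frame with just those columns.
df = df[columns_needed]
print(df.columns.tolist())
```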
Now, our data looks something like this.
Before moving forward, let us check for missing values in the data.
isnull() returns True for every missing value and False otherwise.
sum() then adds up those True values column by column, which gives the number of missing values in each column.
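As a small illustration of how the two calls combine, the toy frame below has one value removed on purpose so that the count is not zero everywhere. The numbers are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing temperature reading.
df = pd.DataFrame({
    "Apparent Temperature (C)": [7.39, np.nan, 5.94],
    "Humidity": [0.89, 0.86, 0.83],
})

# isnull() gives a frame of True/False; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```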
We can see that there are no missing values present in the dataset. So we can move forward.
Resampling
The entries in the given dataset were recorded hourly, so the dataset shows the weather status for every hour. But we want to compare the features of one month with the same month of the next year, and so on across the 10 years. So, we have to resample our data and convert the entries from hourly to monthly.
For resampling, we can use the resample() method of the pandas library. By default it works on a datetime index, so the “Formatted Date” column must first be converted from text to datetime values (here in UTC) and then set as the index of the dataframe. To accomplish these tasks, we can use the to_datetime() and set_index() methods.
to_datetime() converts date strings into pandas datetime values.
set_index() sets the given column as the index of the dataframe.
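A minimal sketch of these two steps, assuming the dates are stored as text with a UTC offset at the end (the two sample rows below are made up). Passing utc=True makes to_datetime() convert every timestamp to UTC, which gives a single consistent time zone even when the offsets in the text differ.

```python
import pandas as pd

# Made-up rows; note the two different UTC offsets in the text.
df = pd.DataFrame({
    "Formatted Date": [
        "2006-03-26 00:00:00.000 +0100",
        "2006-03-26 04:00:00.000 +0200",
    ],
    "Humidity": [0.89, 0.86],
})

# Parse the text into timezone-aware datetimes, normalised to UTC.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# Use the parsed dates as the index so that resample() can work on them.
df = df.set_index("Formatted Date")
print(df.index)
```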
Here is how our data looks so far.
Now, we can easily resample our data as per our requirements.
The resample() method groups the data into time intervals of a new length. “M” here stands for “Month” (month end) and tells resample() to group the hourly entries by month. In pandas 2.2 and later, “ME” is the preferred spelling for the same month-end frequency.
The mean() method then takes the average of each column within a month, so every month is represented by its average Apparent Temperature and average Humidity.
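The resampling step can be sketched on a tiny made-up frame as follows. Note one deliberate difference from the text: the sketch uses the “MS” (month start) alias, which labels each month by its first day rather than its last. The monthly averages are the same either way, and “MS” is spelled the same across pandas versions.

```python
import pandas as pd

# Four made-up readings: two in January, two in February.
timestamps = pd.to_datetime(
    ["2006-01-10 00:00", "2006-01-20 00:00", "2006-02-05 00:00", "2006-02-15 00:00"],
    utc=True,
)
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [-2.0, 0.0, 1.0, 3.0],
        "Humidity": [0.90, 0.80, 0.70, 0.90],
    },
    index=timestamps,
)

# Group the rows by calendar month and average each column within a month.
monthly = df.resample("MS").mean()
print(monthly)
```

Each row of monthly now stands for one month, with the index holding the first day of that month.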
This is our dataframe now.
Separating And Classifying Data
Now, I will separate the entries of different months into different dataframes. That means I will store all the entries belonging to the month of January in a single dataframe.
This is done for every month.
Then, I will store these dataframes in a list so that I can loop over them whenever I need to. I will also create a list with the names of all the months. This will help me when I create the visualizations.
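One possible way to do this split, shown on a small made-up monthly frame, is to filter on the month number of the datetime index. The variable names month_frames and month_names are illustrative, not taken from the original code.

```python
import pandas as pd

# Made-up monthly averages for two Januaries and two Februaries.
index = pd.to_datetime(
    ["2006-01-01", "2006-02-01", "2007-01-01", "2007-02-01"], utc=True
)
monthly = pd.DataFrame({"Humidity": [0.85, 0.80, 0.83, 0.78]}, index=index)

# One dataframe per calendar month: position 0 is January, 11 is December.
month_frames = [monthly[monthly.index.month == m] for m in range(1, 13)]

# Matching list of names, handy for plot titles later.
month_names = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

for name, frame in zip(month_names, month_frames):
    print(name, len(frame))
```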
Creating Visualization for Analysis
At last, we have arrived at the final and most important part of the project. Now, we have to create appropriate graphs and plots to visualize and hence analyse the data.
This is the code to plot line graphs of the Apparent Temperature of the same month over 10 years.
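The plotting code itself is not reproduced in the text. A minimal sketch for a single month, assuming matplotlib as the plotting library (the text does not name one) and using made-up January values, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up January averages for three years.
index = pd.to_datetime(["2006-01-01", "2007-01-01", "2008-01-01"], utc=True)
january = pd.DataFrame({"Apparent Temperature (C)": [-4.2, 1.1, -0.6]}, index=index)

# One line: the year on the x-axis, that year's January average on the y-axis.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(january.index.year, january["Apparent Temperature (C)"], marker="o")
ax.set_title("Apparent Temperature in January")
ax.set_xlabel("Year")
ax.set_ylabel("Apparent Temperature (C)")
# In an interactive session, plt.show() would display the figure.
```

Looping over the list of monthly dataframes and the list of month names would produce one such plot per month.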
From the above plots, we can see that there are sudden changes in the Apparent Temperature of the same month across different years.
Now, we have to do the same thing for Humidity.
Again, we can see sudden changes in the Humidity of the same month over 10 years. But the change in Humidity is not as large as that in Apparent Temperature. To visualize this, we can plot the percent change of Apparent Temperature and Humidity. Also, since we are working with percentages, we can plot Apparent Temperature and Humidity on the same graph. This also helps us compare the changes in both features.
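The text does not say how the percent change was computed. One plausible reading is the change relative to the same month of the previous year, which pandas offers as pct_change(). The sketch below applies it to made-up January values.

```python
import pandas as pd

# Made-up January averages, indexed by year.
january = pd.DataFrame(
    {
        "Apparent Temperature (C)": [2.0, 3.0, 1.5],
        "Humidity": [0.80, 0.84, 0.84],
    },
    index=[2006, 2007, 2008],
)

# pct_change() gives (current - previous) / previous as a fraction;
# multiplying by 100 turns it into a percentage. The first row has
# no previous year, so it is NaN.
percent_change = january.pct_change() * 100
print(percent_change.round(2))
```

Because the change is relative to the previous value, a temperature close to 0 °C in the previous year produces a very large percentage, which is worth keeping in mind when comparing the two features on one graph.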
From the above graphs, we can see that Humidity is almost constant for most of the months, and for a few months the Humidity level changes by at most 20 percent. But for Apparent Temperature, the changes are large, sudden and bidirectional. That means that in some years the Apparent Temperature fell drastically, and in others it rose surprisingly.
Conclusion
From our data analysis, we conclude that this data does not show a clear relation between Global warming and Apparent Temperature/Humidity. Humidity has remained almost constant, while Apparent Temperature has shown both increases and decreases rather than a steady rise.