Data Analysis using Python 3

Data analyst is now a lucrative role in the tech world. With the right skills and some patience, a data analyst can process large volumes of data in a short span of time, work that would otherwise be impossible or take far longer.

Early in January this year I took PH526x: Using Python for Research (Harvard University) via edX, which taught me how to analyze .csv data files using Python 3. I scored well enough to pass the course, so here I am presenting what I learned.

It was fun and exciting to work through the course and learn new things. My repository (Github_repo) holds the code for the case studies from the course. Professor Jukka-Pekka “JP” Onnela is an excellent teacher whose style makes the material easy to learn. It was an awesome experience.

So, let’s analyze one of the case studies: Bird Migration Analysis (repo)

Aim: Track the movement of three gulls namely – Eric, Nico & Sanne

Dataset: https://inbo.carto.com/u/lifewatch/datasets ; used dataset – csv

Summary: One fascinating area of research uses GPS to track movements of animals. It is now possible to manufacture a small GPS device that is solar charged, so you don’t need to change batteries and use it to track flight patterns of birds. The data for this case study comes from the LifeWatch INBO project. Several data sets have been released as part of this project. We will use a small data set that consists of migration data for three gulls named Eric, Nico, and Sanne. The csv file contains eight columns and includes variables like latitude, longitude, altitude, and time stamps. In this case study, we will first load the data, visualize some simple flight trajectories, track flight speed, learn about daytime and much, much more.

Dependencies :

  • Matplotlib
  • Pandas
  • Numpy
  • Cartopy
  • Shapely

We will divide our case study into five parts:

  1. Latitude and Longitude
  2. 2D speed vs. Frequency
  3. Time and Date
  4. Daily Mean Speed
  5. Cartographic View

PART (1/5): Latitude and Longitude


In this part, we visualize the location of the birds. We plot longitude along the x-axis and latitude along the y-axis to visualize the data in the csv file.

The code:

bird_migration_speed

In the code, we first import the modules pandas, matplotlib, and numpy. Then we read the csv file from the current working directory into the variable birddata (check the current directory with pwd, or change to the directory holding the csv file with cd directory_address).

♦ bird_names = pd.unique(birddata.bird_name) finds all the unique bird names in the csv file and saves them to the variable bird_names, using the pandas unique() function.
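
A minimal sketch of this loading step follows. The real csv file is not reproduced here, so the example reads a tiny made-up sample from memory instead; the rows and values are assumptions, and only the column names come from the code in this post.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real csv file;
# the values are made up, only the column names follow the article.
sample_csv = io.StringIO(
    "bird_name,longitude,latitude,speed_2d\n"
    "Eric,2.1,49.4,0.15\n"
    "Nico,3.0,51.3,1.20\n"
    "Eric,2.2,49.5,2.50\n"
)

# With the real file, the path to the csv file would be passed instead.
birddata = pd.read_csv(sample_csv)

# unique() returns the distinct names in order of first appearance.
bird_names = pd.unique(birddata.bird_name)
print(list(bird_names))  # ['Eric', 'Nico']
```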

Next, we select the latitude and longitude data for the gull named “Eric” only. We code,

♦ ix = birddata.bird_name == "Eric"
   x, y = birddata.longitude[ix], birddata.latitude[ix]
   plt.figure(figsize=(7, 7))
   plt.plot(x, y, "b.")

Here, ix is a Boolean mask that is True for every row whose “bird_name” column equals “Eric”. Using it, x holds Eric's longitude data and y holds his latitude data. We use the matplotlib function figure() to set the figure size to 7 x 7 inches and draw the points with the plot() function (learn matplotlib). The arguments to plot(), i.e. x, y and "b.", put longitude on the x-axis and latitude on the y-axis; in "b.", b means blue and . means a point marker.
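
To see what the mask does on its own, here is a small sketch with made-up rows (the coordinates are illustrative assumptions, not values from the data set):

```python
import pandas as pd

# Made-up rows for illustration; only the column names follow the article.
birddata = pd.DataFrame({
    "bird_name": ["Eric", "Nico", "Eric", "Sanne"],
    "longitude": [2.1, 3.0, 2.2, 3.4],
    "latitude": [49.4, 51.3, 49.5, 51.0],
})

# Comparing a column with a value yields one Boolean per row.
ix = birddata.bird_name == "Eric"
print(ix.tolist())  # [True, False, True, False]

# Indexing with the mask keeps only the rows where it is True.
x, y = birddata.longitude[ix], birddata.latitude[ix]
print(x.tolist(), y.tolist())  # [2.1, 2.2] [49.4, 49.5]
```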

Output:

figure_1_Eric's_trajectory

Now, to look at all the birds’ trajectories, we plot each bird in the same figure. We code,

♦  plt.figure(figsize=(7, 7))
    for bird_name in bird_names:
        ix = birddata.bird_name == bird_name
        x, y = birddata.longitude[ix], birddata.latitude[ix]
        # plot inside the loop so that every bird is drawn, not only the last one
        plt.plot(x, y, ".", label=bird_name)
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.legend(loc="lower right")
    plt.show()

Here, we plot the locations of all three gulls, Eric, Nico and Sanne. We create a 7 x 7 figure using plt.figure(figsize = (7,7)). We then go over every name in bird_names using a for loop: for each bird, ix selects that bird's rows, x and y hold its longitude and latitude data respectively, and plot() draws the points with "." markers, labelled with the bird's name. We label the x and y axes Longitude and Latitude using the xlabel() and ylabel() functions. legend() places the legend in the plot, here at the lower right. Finally, we use the show() function to display the trajectories of all three gulls.

Output:

figure_2_bird_trajectories

PART (2/5): 2D Speed Vs Frequency


In this second part of the case study, we visualize 2D speed vs. frequency for the gull named “Eric”.

The Code:

bird_migration_trajectories_lat.long

♦ ix = birddata.bird_name == "Eric"
  speed = birddata.speed_2d[ix]

Here, ix is a Boolean mask that selects the rows for the gull “Eric”, and speed holds the 2D speed data for those rows.

♦ plt.figure(figsize=(8, 4))
   ind = np.isnan(speed)
   # density=True replaces normed=True, which newer Matplotlib releases no longer accept
   plt.hist(speed[~ind], bins=np.linspace(0, 30, 20), density=True)
   plt.xlabel("2D speed (m/s)")
   plt.ylabel("Frequency")
   plt.show()

We create an 8 x 4 figure. The isnan() function returns a Boolean array, stored in ind, that is True wherever a speed entry is NaN (not a number, i.e. missing) and False elsewhere. Next, we plot a histogram using the hist() function. The argument speed[~ind] keeps only those entries for which ind is False, and bins=np.linspace(0,30,20) places 20 linearly spaced bin edges from 0 to 30, which gives 19 bins. The normalization flag (normed=True in older Matplotlib releases, density=True in current ones) scales the bars so that the total area of the histogram is 1. Lastly, we label the x-axis 2D speed (m/s) and the y-axis Frequency using the xlabel() and ylabel() functions respectively and display the plot using plt.show().
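
The masking and normalization can be checked without drawing anything. The sketch below uses made-up speed values and NumPy's histogram() function, which bins and normalizes the same way as the plot when density=True is passed; it is an illustration of the idea, not the course code.

```python
import numpy as np

# Made-up speeds in m/s, with NaN marking missing readings.
speed = np.array([0.5, np.nan, 2.0, 7.5, np.nan, 12.0])

# isnan() is True where a value is missing; ~ flips the mask.
ind = np.isnan(speed)
n_missing = int(ind.sum())
valid = speed[~ind]
print(n_missing, len(valid))  # 2 4

# 20 evenly spaced edges from 0 to 30 define 19 bins.
edges = np.linspace(0, 30, 20)
density, _ = np.histogram(valid, bins=edges, density=True)

# With density=True the bar areas (height * width) add up to 1,
# up to floating-point rounding.
total_area = float(np.sum(density * np.diff(edges)))
print(total_area)
```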

Output:

figure_3_speed

PART (3/5): Time and Date


The third part deals with date and time. We visualize how much time (in days) has elapsed at each of Eric's observations. If the observations were recorded at regular time intervals, the curve will be a straight line.

bird_migration_date.time

We import the libraries matplotlib, pandas, and datetime.

∇  timestamps = []
    for k in range(len(birddata)):
        timestamps.append(datetime.datetime.strptime(birddata.date_time.iloc[k][:-3], "%Y-%m-%d %H:%M:%S"))

We create an empty list called timestamps and append each row's date-time string, converted to a datetime object, to it.

''' >>>datetime.datetime.today() #returns the current date (yyyy-mm-dd) & time (h:m:s).

    >>>date_str[:-3] #slices off the trailing "+00" UTC offset from the time stamp.

    >>>datetime.datetime.strptime(date_str[:-3], "%Y-%m-%d %H:%M:%S") #converts the time-stamp string in date_str to a datetime object that we can work with. "%Y-%m-%d %H:%M:%S" is the Year-Month-Day and Hour-Minute-Second format. '''
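
A short worked example of the parsing step, using a hypothetical time-stamp string in the format described above (the specific dates are made up):

```python
import datetime

# Hypothetical timestamp string: date, time, then a trailing "+00" offset.
date_str = "2013-08-15 00:18:08+00"

# Drop the last three characters ("+00") before parsing.
parsed = datetime.datetime.strptime(date_str[:-3], "%Y-%m-%d %H:%M:%S")
print(parsed)  # 2013-08-15 00:18:08

# Subtracting two datetime objects gives a timedelta.
later = datetime.datetime.strptime("2013-08-16 12:18:08", "%Y-%m-%d %H:%M:%S")
gap = later - parsed
print(gap)  # 1 day, 12:00:00
```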

The next step is to construct a pandas Series object from the timestamps in our Python list. We can then add the Series as a new column in the birddata data frame.

∇ birddata["timestamp"] = pd.Series(timestamps, index=birddata.index)
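
As an alternative to the explicit loop, pandas can parse a whole column in one call. This is a different technique from the one used in the course code, sketched here on two made-up time-stamp strings:

```python
import pandas as pd

# Made-up timestamp strings; the trailing "+00" mimics the UTC offset
# that the article slices off.
birddata = pd.DataFrame({
    "date_time": ["2013-08-15 00:18:08+00", "2013-08-15 00:48:07+00"],
})

# .str[:-3] trims every string at once; to_datetime parses the column.
birddata["timestamp"] = pd.to_datetime(
    birddata.date_time.str[:-3], format="%Y-%m-%d %H:%M:%S"
)

gap = birddata.timestamp.iloc[1] - birddata.timestamp.iloc[0]
print(gap)  # 0 days 00:29:59
```

The result is a datetime column directly, so no intermediate Python list is needed.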

What we’d like to do next is to create a list that captures the amount of time that has elapsed since the beginning of data collection.

∇ times = birddata.timestamp[birddata.bird_name == "Eric"]
   # iloc[0] takes the first selected row regardless of its index label
   elapsed_time = [time - times.iloc[0] for time in times]

We have now calculated the elapsed time for each of Eric's observations.

∇ plt.plot(np.array(elapsed_time) / datetime.timedelta(days=1))
   plt.xlabel("Observation")
   plt.ylabel("Elapsed time (days)")
   plt.show()

We plot the observation number along the x-axis against the elapsed time (in days) along the y-axis. Dividing by datetime.timedelta(days=1) converts each elapsed time into a number of days. We label the x and y axes Observation and Elapsed time (days) using xlabel() and ylabel() respectively, and then observe the curve.
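
The conversion to days can be tried on its own with a few made-up observation times:

```python
import datetime

# Made-up observation times, for illustration only.
times = [
    datetime.datetime(2013, 8, 15, 0, 0, 0),
    datetime.datetime(2013, 8, 15, 12, 0, 0),
    datetime.datetime(2013, 8, 17, 6, 0, 0),
]

# Time elapsed since the first observation, as timedelta objects.
elapsed_time = [t - times[0] for t in times]

# Dividing one timedelta by another yields a plain float.
elapsed_days = [e / datetime.timedelta(days=1) for e in elapsed_time]
print(elapsed_days)  # [0.0, 0.5, 2.25]
```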

Output:

figure_4_time_stamp

PART (4/5): Daily Mean Speed


We visualize the daily mean speed of the gull named “Eric” over all the days of recorded flight.

bird_migration_daily_mean_speed.JPG

Up to line 16, we borrowed the code from part (3/5).

Next, we enumerate elapsed_days, unpacking each returned tuple of an index and an elapsed time into i and t respectively. As long as the elapsed time has not reached the next day, we append the index to the list inds, which starts out empty.

Otherwise, once we reach the next day, we append the mean speed of the collected observations to daily_mean_speed and increment next_day by 1.
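
The loop described above might look like the sketch below. The input values are made up, and two details are assumptions rather than something stated in this post: inds is restarted at each day boundary, and the observation that crosses the boundary is counted toward the new day.

```python
import numpy as np

# Made-up inputs: elapsed time in days and speed in m/s per observation.
elapsed_days = np.array([0.1, 0.4, 0.9, 1.2, 1.7, 2.3])
speed = np.array([2.0, 4.0, 6.0, 1.0, 3.0, 5.0])

next_day = 1
inds = []                # indices of observations in the current day
daily_mean_speed = []

for i, t in enumerate(elapsed_days):
    if t < next_day:
        inds.append(i)
    else:
        # The day is over: store its mean, then start the next day
        # with the current observation (an assumption, see above).
        daily_mean_speed.append(float(np.mean(speed[inds])))
        next_day += 1
        inds = [i]

print(daily_mean_speed)  # [4.0, 2.0]
```

Note that the final, incomplete day is never appended, and a gap of more than one day between observations would need extra handling.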

Lastly, we plot the figure of size 8 x 6

∇  plt.plot(daily_mean_speed, "rs-")
    plt.xlabel("Day")
    plt.ylabel("Mean Speed (m/s)")
    plt.show()

In the plot code, r represents red, s represents square markers (one at each data point) and - represents a solid line joining them. Next, we label the x and y axes “Day” and “Mean Speed (m/s)” using xlabel() and ylabel() respectively. Lastly, we show() the final plot.

Output:

figure_5_mean.avg.speed_perday

PART (5/5): Cartographic View


In this last part, we track the birds over a political map.

bird_migration_cartographic

We import the cartopy and matplotlib modules, including the cartopy submodules behind the ccrs and cfeature names used below (cartopy.crs and cartopy.feature).

∇ proj = ccrs.Mercator()

To move forward, we need to specify the projection we want to use. Here we create a cartopy Mercator() projection and assign it to proj.

∇  plt.figure(figsize=(10, 10))
    # the keyword is projection (singular)
    ax = plt.axes(projection=proj)
    ax.set_extent((-25.0, 20.0, 52.0, 10.0))
    ax.add_feature(cfeature.LAND)
    ax.add_feature(cfeature.OCEAN)
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.BORDERS, linestyle=':')

We create a 10 x 10 figure and a set of axes that uses the projection stored in proj. set_extent() restricts the map to the region we are interested in. Next, we add features such as land, ocean, coastlines and borders to our plot. cartopy draws these shapes automatically for the region covered by the map, so they line up with the GPS locations (i.e. latitude and longitude) in our data.

∇  for name in bird_names:
        ix = birddata['bird_name'] == name
        x, y = birddata.longitude[ix], birddata.latitude[ix]
        ax.plot(x, y, '.', transform=ccrs.Geodetic(), label=name)

    plt.legend(loc="upper left")
    plt.show()

For every gull we plot its longitude and latitude data, passing transform=ccrs.Geodetic() so that the coordinates are interpreted as longitude and latitude and placed correctly on the projected map.

Lastly, we visualize the mapped data of the gulls.

Output:

figure_6_bird_cartographic
