{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Group numeric values into categories (bins)\n", "\n", "We often have a column with many numeric values and we want into group them to bins or buckets, such as age groups or value tiers. In Excel we can do it using and of the following options: \n", "\n", "- Data Menu -> Data Analysis -> Histogram or Rank and Percentile\n", "- _VLOOKUP_ (`=LOOKUP(A1,{0,7,14,31,90,180,360},{\"0-6\",\"7-13\",\"14-30\",\"31-89\",\"90-179\",\"180-359\",\">360\"})`, for example), or \n", "- _IF_ (`=if(b2>30,\"large\",if(A1>20,\"medium\",if(A1>=10,\"small\",if(A1<10,\"tiny\",\"\"))))`, for example)\n", "- _INDEX_ (`=INDEX({\"Small\",\"Medium\",\"Large\"},LARGE(IF(A1>{0,11,21},{1,2,3}),1))`, for example)\n", "\n", "\n", "[](https://studiolab.sagemaker.aws/import/github/aiola-lab/from-excel-to-pandas/blob/master/notebooks/02.08_group_values.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cut Function\n", "\n", "Since this is a common need, Pandas has built-in functions _cut_ and _qcut_ that make it more flexible and accurate. We start with importing pandas." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real-life example\n", "\n", "Let's analyze data regarding the impact of weatehr information on numbers of bike rentals. We know that when it is too cold or too hot, people rent less bike. But what is the cutoff temperature value? When we listen to the next 10 days weather forecast, how should we prepare our bike rental fleet to meet the expected demand?\n", "\n", "### Loading Data\n", "\n", "As usual, let's load a dataset to work on. We will use the dataset that is used before, \"Bike Share\". This is a data set about the demand of bike share service in Seoul. Please note that we need to modify the default encoding of _read_csv_ from 'UTF-8' to 'latin1'." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Date | \n", "Rented Bike Count | \n", "Hour | \n", "Temperature(°C) | \n", "Humidity(%) | \n", "Wind speed (m/s) | \n", "Visibility (10m) | \n", "Dew point temperature(°C) | \n", "Solar Radiation (MJ/m2) | \n", "Rainfall(mm) | \n", "Snowfall (cm) | \n", "Seasons | \n", "Holiday | \n", "Functioning Day | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "01/12/2017 | \n", "254 | \n", "0 | \n", "-5.2 | \n", "37 | \n", "2.2 | \n", "2000 | \n", "-17.6 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Winter | \n", "No Holiday | \n", "Yes | \n", "
1 | \n", "01/12/2017 | \n", "204 | \n", "1 | \n", "-5.5 | \n", "38 | \n", "0.8 | \n", "2000 | \n", "-17.6 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Winter | \n", "No Holiday | \n", "Yes | \n", "
2 | \n", "01/12/2017 | \n", "173 | \n", "2 | \n", "-6.0 | \n", "39 | \n", "1.0 | \n", "2000 | \n", "-17.7 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Winter | \n", "No Holiday | \n", "Yes | \n", "
3 | \n", "01/12/2017 | \n", "107 | \n", "3 | \n", "-6.2 | \n", "40 | \n", "0.9 | \n", "2000 | \n", "-17.6 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Winter | \n", "No Holiday | \n", "Yes | \n", "
4 | \n", "01/12/2017 | \n", "78 | \n", "4 | \n", "-6.0 | \n", "36 | \n", "2.3 | \n", "2000 | \n", "-18.6 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Winter | \n", "No Holiday | \n", "Yes | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
8755 | \n", "30/11/2018 | \n", "1003 | \n", "19 | \n", "4.2 | \n", "34 | \n", "2.6 | \n", "1894 | \n", "-10.3 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Autumn | \n", "No Holiday | \n", "Yes | \n", "
8756 | \n", "30/11/2018 | \n", "764 | \n", "20 | \n", "3.4 | \n", "37 | \n", "2.3 | \n", "2000 | \n", "-9.9 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Autumn | \n", "No Holiday | \n", "Yes | \n", "
8757 | \n", "30/11/2018 | \n", "694 | \n", "21 | \n", "2.6 | \n", "39 | \n", "0.3 | \n", "1968 | \n", "-9.9 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Autumn | \n", "No Holiday | \n", "Yes | \n", "
8758 | \n", "30/11/2018 | \n", "712 | \n", "22 | \n", "2.1 | \n", "41 | \n", "1.0 | \n", "1859 | \n", "-9.8 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Autumn | \n", "No Holiday | \n", "Yes | \n", "
8759 | \n", "30/11/2018 | \n", "584 | \n", "23 | \n", "1.9 | \n", "43 | \n", "1.3 | \n", "1909 | \n", "-9.3 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "Autumn | \n", "No Holiday | \n", "Yes | \n", "
8760 rows × 14 columns
\n", "