Curated Library APIs for LLMs

Pandas API Reference Documentation with Examples

Complete API reference with examples for pandas library

Extracted from: https://pandas.pydata.org/docs/reference/


Table Of Contents

  • Input/output
  • General functions
  • Series
  • DataFrame
  • pandas arrays, scalars, and data types
  • Index objects
  • Date offsets
  • Window
  • GroupBy
  • Resampling
  • Style
  • Plotting

Input/output

Pickling

Function Description
read_pickle(filepath_or_buffer[, ...]) Load pickled pandas object (or any object) from file and return unpickled object.
DataFrame.to_pickle(path, *[, compression, ...]) Pickle (serialize) object to file.

Flat file

Function Description
read_table(filepath_or_buffer, *[, sep, ...]) Read general delimited file into DataFrame.
read_csv(filepath_or_buffer, *[, sep, ...]) Read a comma-separated values (csv) file into DataFrame.
DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) Write object to a comma-separated values (csv) file.
read_fwf(filepath_or_buffer, *[, colspecs, ...]) Read a table of fixed-width formatted lines into DataFrame.

Clipboard

Function Description
read_clipboard([sep, dtype_backend]) Read text from clipboard and pass to read_csv().
DataFrame.to_clipboard(*[, excel, sep]) Copy object to the system clipboard.

Excel

Function Description
read_excel(io[, sheet_name, header, names, ...]) Read an Excel file into a DataFrame.
DataFrame.to_excel(excel_writer, *[, ...]) Write object to an Excel sheet.
ExcelFile(path_or_buffer[, engine, ...]) Class for parsing tabular Excel sheets into DataFrame objects.
ExcelFile.book Gets the Excel workbook.
ExcelFile.sheet_names Names of the sheets in the document.
ExcelFile.parse([sheet_name, header, names, ...]) Parse specified sheet(s) into a DataFrame.
Styler.to_excel(excel_writer[, sheet_name, ...]) Write Styler to an Excel sheet.
ExcelWriter(path[, engine, date_format, ...]) Class for writing DataFrame objects into Excel sheets.

JSON

Function Description
read_json(path_or_buf, *[, orient, typ, ...]) Convert a JSON string to pandas object.
json_normalize(data[, record_path, meta, ...]) Normalize semi-structured JSON data into a flat table.
DataFrame.to_json([path_or_buf, orient, ...]) Convert the object to a JSON string.
build_table_schema(data[, index, ...]) Create a Table schema from data.

HTML

Function Description
read_html(io, *[, match, flavor, header, ...]) Read HTML tables into a list of DataFrame objects.
DataFrame.to_html([buf, columns, col_space, ...]) Render a DataFrame as an HTML table.
Styler.to_html([buf, table_uuid, ...]) Write Styler to a file, buffer or string in HTML-CSS format.
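
A minimal round-trip sketch (the frame is illustrative; read_html needs an HTML parser such as lxml or beautifulsoup4 installed):

import io
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
html = df.to_html(index=False)            # render the frame as an HTML table string
tables = pd.read_html(io.StringIO(html))  # list of DataFrames, one per <table> found
first = tables[0]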

XML

Function Description
read_xml(path_or_buffer, *[, xpath, ...]) Read XML document into a DataFrame object.
DataFrame.to_xml([path_or_buffer, index, ...]) Render a DataFrame to an XML document.
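
A minimal round-trip sketch, assuming lxml is installed (the file name "books.xml" is illustrative):

import pandas as pd

df = pd.DataFrame({"title": ["A", "B"], "pages": [100, 200]})
df.to_xml("books.xml", index=False)  # writes <data><row>...</row></data> by default
back = pd.read_xml("books.xml")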

Latex

Function Description
DataFrame.to_latex([buf, columns, header, ...]) Render object to a LaTeX tabular, longtable, or nested table.
Styler.to_latex([buf, column_format, ...]) Write Styler to a file, buffer or string in LaTeX format.
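
A minimal sketch; with no buffer argument, to_latex returns the LaTeX source as a string (the frame is illustrative):

import pandas as pd

df = pd.DataFrame({"name": ["Raphael", "Donatello"], "mask": ["red", "purple"]})
print(df.to_latex(index=False))  # LaTeX tabular source, one row per record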

HDFStore: PyTables (HDF5)

Function Description
read_hdf(path_or_buf[, key, mode, errors, ...]) Read from the store, close it if we opened it.
HDFStore.put(key, value[, format, index, ...]) Store object in HDFStore.
HDFStore.append(key, value[, format, axes, ...]) Append to Table in file.
HDFStore.get(key) Retrieve pandas object stored in file.
HDFStore.select(key[, where, start, stop, ...]) Retrieve pandas object stored in file, optionally based on where criteria.
HDFStore.info() Print detailed information on the store.
HDFStore.keys([include]) Return a list of keys corresponding to objects stored in HDFStore.
HDFStore.groups() Return a list of all the top-level nodes.
HDFStore.walk([where]) Walk the pytables group hierarchy for pandas objects.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
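
A minimal sketch, assuming the optional PyTables dependency (tables) is installed; the file name "store.h5" is illustrative:

import pandas as pd

df = pd.DataFrame({"a": range(5)})
df.to_hdf("store.h5", key="df", mode="w", format="table")
subset = pd.read_hdf("store.h5", key="df", where="index < 3")  # where requires format="table"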

Feather

Function Description
read_feather(path[, columns, use_threads, ...]) Load a feather-format object from the file path.
DataFrame.to_feather(path, **kwargs) Write a DataFrame to the binary Feather format.
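
A minimal round-trip sketch, assuming pyarrow is installed (the file name is illustrative):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_feather("data.feather")
back = pd.read_feather("data.feather", columns=["a"])  # read only the listed columns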

Parquet

Function Description
read_parquet(path[, engine, columns, ...]) Load a parquet object from the file path, returning a DataFrame.
DataFrame.to_parquet([path, engine, ...]) Write a DataFrame to the binary parquet format.
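
A minimal round-trip sketch, assuming pyarrow or fastparquet is installed (file name and compression choice are illustrative):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_parquet("data.parquet", compression="snappy")
back = pd.read_parquet("data.parquet", columns=["b"])  # column pruning at read time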

Iceberg

Function Description
read_iceberg(table_identifier[, ...]) Read an Apache Iceberg table into a pandas DataFrame.
DataFrame.to_iceberg(table_identifier[, ...]) Write a DataFrame to an Apache Iceberg table.

Warning

read_iceberg is experimental and may change without warning.

ORC

Function Description
read_orc(path[, columns, dtype_backend, ...]) Load an ORC object from the file path, returning a DataFrame.
DataFrame.to_orc([path, engine, index, ...]) Write a DataFrame to the Optimized Row Columnar (ORC) format.

SAS

Function Description
read_sas(filepath_or_buffer, *[, format, ...]) Read SAS files stored as either XPORT or SAS7BDAT format files.

SPSS

Function Description
read_spss(path[, usecols, ...]) Load an SPSS file from the file path, returning a DataFrame.

SQL

Function Description
read_sql_table(table_name, con[, schema, ...]) Read SQL database table into a DataFrame.
read_sql_query(sql, con[, index_col, ...]) Read SQL query into a DataFrame.
read_sql(sql, con[, index_col, ...]) Read SQL query or database table into a DataFrame.
DataFrame.to_sql(name, con, *[, schema, ...]) Write records stored in a DataFrame to a SQL database.
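
A minimal sketch using the standard-library sqlite3 driver (the table and column names are illustrative):

import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})
df.to_sql("items", con, index=False)  # create the table and insert the rows
back = pd.read_sql("SELECT * FROM items WHERE value > 1", con)
con.close()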

STATA

Function Description
read_stata(filepath_or_buffer, *[, ...]) Read Stata file into DataFrame.
DataFrame.to_stata(path, *[, convert_dates, ...]) Export DataFrame object to Stata dta format.
StataReader.data_label Return data label of Stata file.
StataReader.value_labels() Return a nested dict associating each variable name to its value and label.
StataReader.variable_labels() Return a dict associating each variable name with corresponding label.
StataWriter.write_file() Export DataFrame object to Stata dta format.

Examples for pandas.read_pickle

>>> original_df = pd.DataFrame(
...     {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> pd.to_pickle(original_df, "./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

Examples for pandas.DataFrame.to_pickle

>>> original_df = pd.DataFrame(
...     {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9

Examples for pandas.read_table

>>> pd.read_table("data.csv")
   Name  Value
0   foo      1
1   bar      2
2  #baz      3

Index and header can be specified via the index_col and header arguments.

>>> pd.read_table("data.csv", header=None)
      0      1
0  Name  Value
1   foo      1
2   bar      2
3  #baz      3
>>> pd.read_table("data.csv", index_col="Value")
       Name
Value
1       foo
2       bar
3      #baz

Column types are inferred but can be explicitly specified using the dtype argument.

>>> pd.read_table("data.csv", dtype={{"Value": float}})
   Name  Value
0   foo    1.0
1   bar    2.0
2  #baz    3.0

True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!

>>> pd.read_table("data.csv", na_values=["foo", "bar"])
   Name  Value
0   NaN      1
1   NaN      2
2  #baz      3

Comment lines in the input file can be skipped using the comment argument.

>>> pd.read_table("data.csv", comment="#")
  Name  Value
0  foo      1
1  bar      2

By default, columns with dates will be read as object rather than datetime.

>>> df = pd.read_table("tmp.csv")
>>> df
   col 1       col 2            col 3
0     10  10/04/2018  Sun 15 Jan 2023
1     20  15/04/2018  Fri 12 May 2023
>>> df.dtypes
col 1     int64
col 2    object
col 3    object
dtype: object

Specific columns can be parsed as dates by using the parse_dates and date_format arguments.

>>> df = pd.read_table(
...     "tmp.csv",
...     parse_dates=[1, 2],
...     date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1             int64
col 2    datetime64[ns]
col 3    datetime64[ns]
dtype: object

Examples for pandas.read_csv

>>> pd.read_csv("data.csv")
   Name  Value
0   foo      1
1   bar      2
2  #baz      3

Index and header can be specified via the index_col and header arguments.

>>> pd.read_csv("data.csv", header=None)
      0      1
0  Name  Value
1   foo      1
2   bar      2
3  #baz      3
>>> pd.read_csv("data.csv", index_col="Value")
       Name
Value
1       foo
2       bar
3      #baz

Column types are inferred but can be explicitly specified using the dtype argument.

>>> pd.read_csv("data.csv", dtype={{"Value": float}})
   Name  Value
0   foo    1.0
1   bar    2.0
2  #baz    3.0

True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!

>>> pd.read_csv("data.csv", na_values=["foo", "bar"])
   Name  Value
0   NaN      1
1   NaN      2
2  #baz      3

Comment lines in the input file can be skipped using the comment argument.

>>> pd.read_csv("data.csv", comment="#")
  Name  Value
0  foo      1
1  bar      2

By default, columns with dates will be read as object rather than datetime.

>>> df = pd.read_csv("tmp.csv")
>>> df
   col 1       col 2            col 3
0     10  10/04/2018  Sun 15 Jan 2023
1     20  15/04/2018  Fri 12 May 2023
>>> df.dtypes
col 1     int64
col 2    object
col 3    object
dtype: object

Specific columns can be parsed as dates by using the parse_dates and date_format arguments.

>>> df = pd.read_csv(
...     "tmp.csv",
...     parse_dates=[1, 2],
...     date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1             int64
col 2    datetime64[ns]
col 3    datetime64[ns]
dtype: object

Examples for pandas.DataFrame.to_csv

Create ‘out.csv’ containing ‘df’ without indices

>>> df = pd.DataFrame(
...     [["Raphael", "red", "sai"], ["Donatello", "purple", "bo staff"]],
...     columns=["name", "mask", "weapon"],
... )
>>> df.to_csv("out.csv", index=False)

Create ‘out.zip’ containing ‘out.csv’

>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(
...     method="zip", archive_name="out.csv"
... )
>>> df.to_csv(
...     "out.zip", index=False, compression=compression_opts
... )

To write a csv file to a new folder or nested folder you will first need to create it using either Pathlib or os:

>>> from pathlib import Path
>>> filepath = Path("folder/subfolder/out.csv")
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs("folder/subfolder", exist_ok=True)
>>> df.to_csv("folder/subfolder/out.csv")

Format floats to two decimal places:

>>> df.to_csv("out1.csv", float_format="%.2f")

Format floats using scientific notation:

>>> df.to_csv("out2.csv", float_format="{{:.2e}}".format)

Examples for pandas.read_fwf

>>> pd.read_fwf("data.csv")
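
A slightly fuller sketch for a fixed-width file, assuming the layout commented below (the file name and column widths are illustrative):

# data.txt (illustrative contents):
# 001  widget     10
# 002  gadget      5
df = pd.read_fwf("data.txt", widths=[5, 11, 3], header=None, names=["id", "name", "qty"])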

Examples for pandas.read_clipboard

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard()
>>> pd.read_clipboard()
     A  B  C
0    1  2  3
1    4  5  6

Examples for pandas.DataFrame.to_clipboard

Copy the contents of a DataFrame to the clipboard.

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard(sep=",")
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6

We can omit the index by passing the keyword index and setting it to False.

>>> df.to_clipboard(sep=",", index=False)
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6

Use the underlying pyperclip package directly for any string output format:

import pyperclip

html = df.style.to_html()
pyperclip.copy(html)

Examples for pandas.read_excel

The file can be read using the file name as string or an open file object:

>>> pd.read_excel("tmp.xlsx", index_col=0)
       Name  Value
0   string1      1
1   string2      2
2  #Comment      3
>>> pd.read_excel(open("tmp.xlsx", "rb"), sheet_name="Sheet3")
   Unnamed: 0      Name  Value
0           0   string1      1
1           1   string2      2
2           2  #Comment      3

Index and header can be specified via the index_col and header arguments

>>> pd.read_excel("tmp.xlsx", index_col=None, header=None)
     0         1      2
0  NaN      Name  Value
1  0.0   string1      1
2  1.0   string2      2
3  2.0  #Comment      3

Column types are inferred but can be explicitly specified

>>> pd.read_excel(
...     "tmp.xlsx", index_col=0, dtype={"Name": str, "Value": float}
... )
       Name  Value
0   string1    1.0
1   string2    2.0
2  #Comment    3.0

True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!

>>> pd.read_excel(
...     "tmp.xlsx", index_col=0, na_values=["string1", "string2"]
... )
       Name  Value
0       NaN      1
1       NaN      2
2  #Comment      3

Comment lines in the excel input file can be skipped using the comment kwarg.

>>> pd.read_excel("tmp.xlsx", index_col=0, comment="#")
      Name  Value
0  string1    1.0
1  string2    2.0
2     None    NaN

Examples for pandas.DataFrame.to_excel

Create, write to and save a workbook:

>>> df1 = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> df1.to_excel("output.xlsx")

To specify the sheet name:

>>> df1.to_excel("output.xlsx", sheet_name="Sheet_name_1")

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> df2 = df1.copy()
>>> with pd.ExcelWriter("output.xlsx") as writer:
...     df1.to_excel(writer, sheet_name="Sheet_name_1")
...     df2.to_excel(writer, sheet_name="Sheet_name_2")

ExcelWriter can also be used to append to an existing Excel file:

>>> with pd.ExcelWriter("output.xlsx", mode="a") as writer:
...     df1.to_excel(writer, sheet_name="Sheet_name_3")

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel("output1.xlsx", engine="xlsxwriter")

Examples for pandas.ExcelFile

>>> file = pd.ExcelFile("myfile.xlsx")
>>> with pd.ExcelFile("myfile.xls") as xls:
...     df1 = pd.read_excel(xls, "Sheet1")

Examples for pandas.ExcelFile.book

>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.book
<openpyxl.workbook.workbook.Workbook object at 0x11eb5ad70>
>>> file.book.path
'/xl/workbook.xml'
>>> file.book.active
<openpyxl.worksheet._read_only.ReadOnlyWorksheet object at 0x11eb5b370>
>>> file.book.sheetnames
['Sheet1', 'Sheet2']

Examples for pandas.ExcelFile.sheet_names

>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.sheet_names
["Sheet1", "Sheet2"]

Examples for pandas.ExcelFile.parse

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_excel("myfile.xlsx")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.parse()

Examples for pandas.io.formats.style.Styler.to_excel

Create, write to and save a workbook:

>>> df1 = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> df1.to_excel("output.xlsx")

To specify the sheet name:

>>> df1.to_excel("output.xlsx", sheet_name="Sheet_name_1")

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> df2 = df1.copy()
>>> with pd.ExcelWriter("output.xlsx") as writer:
...     df1.to_excel(writer, sheet_name="Sheet_name_1")
...     df2.to_excel(writer, sheet_name="Sheet_name_2")

ExcelWriter can also be used to append to an existing Excel file:

>>> with pd.ExcelWriter("output.xlsx", mode="a") as writer:
...     df1.to_excel(writer, sheet_name="Sheet_name_3")

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel("output1.xlsx", engine="xlsxwriter")

Examples for pandas.ExcelWriter

Default usage:

>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
...     df.to_excel(writer)

To write to separate sheets in a single file:

>>> df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"])
>>> df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
...     df1.to_excel(writer, sheet_name="Sheet1")
...     df2.to_excel(writer, sheet_name="Sheet2")

You can set the date format or datetime format:

>>> from datetime import date, datetime
>>> df = pd.DataFrame(
...     [
...         [date(2014, 1, 31), date(1999, 9, 24)],
...         [datetime(1998, 5, 26, 23, 33, 4), datetime(2014, 2, 28, 13, 5, 13)],
...     ],
...     index=["Date", "Datetime"],
...     columns=["X", "Y"],
... )
>>> with pd.ExcelWriter(
...     "path_to_file.xlsx",
...     date_format="YYYY-MM-DD",
...     datetime_format="YYYY-MM-DD HH:MM:SS",
... ) as writer:
...     df.to_excel(writer)

You can also append to an existing Excel file:

>>> with pd.ExcelWriter("path_to_file.xlsx", mode="a", engine="openpyxl") as writer:
...     df.to_excel(writer, sheet_name="Sheet3")

Here, the if_sheet_exists parameter can be set to replace a sheet if it already exists:

>>> with pd.ExcelWriter(
...     "path_to_file.xlsx",
...     mode="a",
...     engine="openpyxl",
...     if_sheet_exists="replace",
... ) as writer:
...     df.to_excel(writer, sheet_name="Sheet1")

You can also write multiple DataFrames to a single sheet. Note that the if_sheet_exists parameter needs to be set to overlay:

>>> with pd.ExcelWriter(
...     "path_to_file.xlsx",
...     mode="a",
...     engine="openpyxl",
...     if_sheet_exists="overlay",
... ) as writer:
...     df1.to_excel(writer, sheet_name="Sheet1")
...     df2.to_excel(writer, sheet_name="Sheet1", startcol=3)

You can store the Excel file in RAM:

>>> import io
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> buffer = io.BytesIO()
>>> with pd.ExcelWriter(buffer) as writer:
...     df.to_excel(writer)

You can pack the Excel file into a zip archive:

>>> import zipfile
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with zipfile.ZipFile("path_to_file.zip", "w") as zf:
...     with zf.open("filename.xlsx", "w") as buffer:
...         with pd.ExcelWriter(buffer) as writer:
...             df.to_excel(writer)

You can specify additional arguments to the underlying engine:

>>> with pd.ExcelWriter(
...     "path_to_file.xlsx",
...     engine="xlsxwriter",
...     engine_kwargs={{"options": {{"nan_inf_to_errors": True}}}},
... ) as writer:
...     df.to_excel(writer)

In append mode, engine_kwargs are passed through to openpyxl’s load_workbook:

>>> with pd.ExcelWriter(
...     "path_to_file.xlsx",
...     engine="openpyxl",
...     mode="a",
...     engine_kwargs={{"keep_vba": True}},
... ) as writer:
...     df.to_excel(writer, sheet_name="Sheet2")

Examples for pandas.read_json

>>> from io import StringIO
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )

Encoding/decoding a Dataframe using 'split' formatted JSON:

>>> df.to_json(orient="split")
'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'
>>> pd.read_json(StringIO(_), orient="split")  # noqa: F821
      col 1 col 2
row 1     a     b
row 2     c     d

Encoding/decoding a Dataframe using 'index' formatted JSON:

>>> df.to_json(orient="index")
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
>>> pd.read_json(StringIO(_), orient="index")  # noqa: F821
      col 1 col 2
row 1     a     b
row 2     c     d

Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> df.to_json(orient="records")
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> pd.read_json(StringIO(_), orient="records")  # noqa: F821
  col 1 col 2
0     a     b
1     c     d

Encoding with Table Schema

>>> df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string","extDtype":"str"},{"name":"col 1","type":"string","extDtype":"str"},{"name":"col 2","type":"string","extDtype":"str"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":"row 1","col 1":"a","col 2":"b"},{"index":"row 2","col 1":"c","col 2":"d"}]}'

The following example uses dtype_backend="numpy_nullable"

>>> data = '''{"index": {"0": 0, "1": 1},
...        "a": {"0": 1, "1": null},
...        "b": {"0": 2.5, "1": 4.5},
...        "c": {"0": true, "1": false},
...        "d": {"0": "a", "1": "b"},
...        "e": {"0": 1577.2, "1": 1577.1}}'''
>>> pd.read_json(StringIO(data), dtype_backend="numpy_nullable")
   index     a    b      c  d       e
0      0     1  2.5   True  a  1577.2
1      1  <NA>  4.5  False  b  1577.1

Examples for pandas.json_normalize

>>> data = [
...     {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
...     {"name": {"given": "Mark", "family": "Regner"}},
...     {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)
    id name.first name.last name.given name.family        name
0  1.0     Coleen      Volk        NaN         NaN         NaN
1  NaN        NaN       NaN       Mark      Regner         NaN
2  2.0        NaN       NaN        NaN         NaN  Faye Raker
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=0)
    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}

Normalizes nested data up to level 1.

>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=1)
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height': 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> series = pd.Series(data, index=pd.Index(["a", "b", "c"]))
>>> pd.json_normalize(series)
    id        name  fitness.height  fitness.weight
a  1.0   Cole Volk             130              60
b  NaN    Mark Reg             130              60
c  2.0  Faye Raker             130              60
>>> data = [
...     {
...         "state": "Florida",
...         "shortname": "FL",
...         "info": {"governor": "Rick Scott"},
...         "counties": [
...             {"name": "Dade", "population": 12345},
...             {"name": "Broward", "population": 40000},
...             {"name": "Palm Beach", "population": 60000},
...         ],
...     },
...     {
...         "state": "Ohio",
...         "shortname": "OH",
...         "info": {"governor": "John Kasich"},
...         "counties": [
...             {"name": "Summit", "population": 1234},
...             {"name": "Cuyahoga", "population": 1337},
...         ],
...     },
... ]
>>> result = pd.json_normalize(
...     data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
    Prefix.0
0          1
1          2

Returns normalized data with columns prefixed with the given string.

Examples for pandas.DataFrame.to_json

>>> from json import loads, dumps
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}

Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> result = df.to_json(orient="records")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]

Encoding/decoding a Dataframe using 'index' formatted JSON:

>>> result = df.to_json(orient="index")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}

Encoding/decoding a Dataframe using 'columns' formatted JSON:

>>> result = df.to_json(orient="columns")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}

Encoding/decoding a Dataframe using 'values' formatted JSON:

>>> result = df.to_json(orient="values")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]

Encoding with Table Schema:

>>> result = df.to_json(orient="table")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}

Examples for pandas.io.json.build_table_schema

>>> from pandas.io.json._table_schema import build_table_schema
>>> df = pd.DataFrame(
...     {'A': [1, 2, 3],
...      'B': ['a', 'b', 'c'],
...      'C': pd.date_range('2016-01-01', freq='D', periods=3),
...      }, index=pd.Index(range(3), name='idx'))
>>> build_table_schema(df)
{'fields': [{'name': 'idx', 'type': 'integer'}, {'name': 'A', 'type': 'integer'}, {'name': 'B', 'type': 'string', 'extDtype': 'str'}, {'name': 'C', 'type': 'datetime'}], 'primaryKey': ['idx'], 'pandas_version': '1.4.0'}

General functions

Data manipulations

Function Description
melt(frame[, id_vars, value_vars, var_name, ...]) Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
pivot(data, *, columns[, index, values]) Return reshaped DataFrame organized by given index / column values.
pivot_table(data[, values, index, columns, ...]) Create a spreadsheet-style pivot table as a DataFrame.
crosstab(index, columns[, values, rownames, ...]) Compute a simple cross tabulation of two (or more) factors.
cut(x, bins[, right, labels, retbins, ...]) Bin values into discrete intervals.
qcut(x, q[, labels, retbins, precision, ...]) Quantile-based discretization function.
merge(left, right[, how, on, left_on, ...]) Merge DataFrame or named Series objects with a database-style join.
merge_ordered(left, right[, on, left_on, ...]) Perform a merge for ordered data with optional filling/interpolation.
merge_asof(left, right[, on, left_on, ...]) Perform a merge by key distance.
concat(objs, *[, axis, join, ignore_index, ...]) Concatenate pandas objects along a particular axis.
get_dummies(data[, prefix, prefix_sep, ...]) Convert categorical variable into dummy/indicator variables.
from_dummies(data[, sep, default_category]) Create a categorical DataFrame from a DataFrame of dummy variables.
factorize(values[, sort, use_na_sentinel, ...]) Encode the object as an enumerated type or categorical variable.
unique(values) Return unique values based on a hash table.
lreshape(data, groups[, dropna]) Reshape wide-format data to long.
wide_to_long(df, stubnames, i, j[, sep, suffix]) Unpivot a DataFrame from wide to long format.

Top-level missing data

Function Description
isna(obj) Detect missing values for an array-like object.
isnull(obj) Detect missing values for an array-like object.
notna(obj) Detect non-missing values for an array-like object.
notnull(obj) Detect non-missing values for an array-like object.
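
A minimal sketch; isnull and notnull are aliases of isna and notna:

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
mask = pd.isna(s)   # [False, True, False]
keep = pd.notna(s)  # [True, False, True]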

Top-level dealing with numeric data

Function Description
to_numeric(arg[, errors, downcast, ...]) Convert argument to a numeric type.

Top-level dealing with datetimelike data

Function Description
to_datetime(arg[, errors, dayfirst, ...]) Convert argument to datetime.
to_timedelta(arg[, unit, errors]) Convert argument to timedelta.
date_range([start, end, periods, freq, tz, ...]) Return a fixed frequency DatetimeIndex.
bdate_range([start, end, periods, freq, tz, ...]) Return a fixed frequency DatetimeIndex with business day as the default.
period_range([start, end, periods, freq, name]) Return a fixed frequency PeriodIndex.
timedelta_range([start, end, periods, freq, ...]) Return a fixed frequency TimedeltaIndex with day as the default.
infer_freq(index) Infer the most likely frequency given the input index.
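
A minimal sketch of the conversion and range helpers (the dates are illustrative):

import pandas as pd

ts = pd.to_datetime(["2023-01-15", "2023-02-01"])       # DatetimeIndex
td = pd.to_timedelta(["1 days", "2 hours"])             # TimedeltaIndex
rng = pd.date_range("2023-01-01", periods=3, freq="D")  # daily DatetimeIndex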

Top-level dealing with Interval data

Function Description
interval_range([start, end, periods, freq, ...]) Return a fixed frequency IntervalIndex.

Top-level evaluation

Function Description
col(col_name) Generate deferred object representing a column of a DataFrame.
eval(expr[, parser, engine, local_dict, ...]) Evaluate a Python expression as a string using various backends.
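
A minimal sketch of top-level eval; the expression string can reference local frames by name (the frame is illustrative):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = pd.eval("df.a + df.b")  # element-wise sum evaluated from the expression string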

Datetime formats

Function Description
tseries.api.guess_datetime_format(dt_str[, ...]) Guess the datetime format of a given datetime string.

Hashing

Function Description
util.hash_array(vals[, encoding, hash_key, ...]) Given a 1d array, return an array of deterministic integers.
util.hash_pandas_object(obj[, index, ...]) Return a data hash of the Index/Series/DataFrame.

Importing from other DataFrame libraries

Function Description
api.interchange.from_dataframe(df[, allow_copy]) Build a pd.DataFrame from any DataFrame supporting the interchange protocol.

Examples for pandas.melt

>>> df = pd.DataFrame(
...     {
...         "A": {0: "a", 1: "b", 2: "c"},
...         "B": {0: 1, 1: 3, 2: 5},
...         "C": {0: 2, 1: 4, 2: 6},
...     }
... )
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
>>> pd.melt(df, id_vars=["A"], value_vars=["B"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of ‘variable’ and ‘value’ columns can be customized:

>>> pd.melt(
...     df,
...     id_vars=["A"],
...     value_vars=["B"],
...     var_name="myVarname",
...     value_name="myValname",
... )
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

Original index values can be kept around:

>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"], ignore_index=False)
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
0  a        C      2
1  b        C      4
2  c        C      6

If you have multi-index columns:

>>> df.columns = [list("ABC"), list("DEF")]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
>>> pd.melt(df, col_level=0, id_vars=["A"], value_vars=["B"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
>>> pd.melt(df, id_vars=[("A", "D")], value_vars=[("B", "E")])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

Examples for pandas.pivot

>>> df = pd.DataFrame(
...     {
...         "foo": ["one", "one", "one", "two", "two", "two"],
...         "bar": ["A", "B", "C", "A", "B", "C"],
...         "baz": [1, 2, 3, 4, 5, 6],
...         "zoo": ["x", "y", "z", "q", "w", "t"],
...     }
... )
>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t
>>> df.pivot(index="foo", columns="bar", values="baz")
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index="foo", columns="bar")["baz"]
bar  A   B   C
foo
one  1   2   3
two  4   5   6
>>> df.pivot(index="foo", columns="bar", values=["baz", "zoo"])
      baz       zoo
bar   A  B  C   A  B  C
foo
one   1  2  3   x  y  z
two   4  5  6   q  w  t

You could also assign a list of column names or a list of index names.

>>> df = pd.DataFrame(
...     {
...         "lev1": [1, 1, 1, 2, 2, 2],
...         "lev2": [1, 1, 2, 1, 1, 2],
...         "lev3": [1, 2, 1, 2, 1, 2],
...         "lev4": [1, 2, 3, 4, 5, 6],
...         "values": [0, 1, 2, 3, 4, 5],
...     }
... )
>>> df
    lev1 lev2 lev3 lev4 values
0   1    1    1    1    0
1   1    1    2    2    1
2   1    2    1    3    2
3   2    1    2    4    3
4   2    1    1    5    4
5   2    2    2    6    5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
lev2    1         2
lev3    1    2    1    2
lev1
1     0.0  1.0  2.0  NaN
2     4.0  3.0  NaN  5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
      lev3    1    2
lev1  lev2
   1     1  0.0  1.0
         2  2.0  NaN
   2     1  4.0  3.0
         2  NaN  5.0

A ValueError is raised if there are any duplicates.

>>> df = pd.DataFrame(
...     {
...         "foo": ["one", "one", "two", "two"],
...         "bar": ["A", "A", "B", "C"],
...         "baz": [1, 2, 3, 4],
...     }
... )
>>> df
   foo bar  baz
0  one   A    1
1  one   A    2
2  two   B    3
3  two   C    4

Notice that the first two rows are the same for our index and columns arguments.

>>> df.pivot(index="foo", columns="bar", values="baz")
Traceback (most recent call last):
   ...
ValueError: Index contains duplicate entries, cannot reshape

Examples for pandas.pivot_table

>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
...         "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
...         "C": [
...             "small",
...             "large",
...             "large",
...             "small",
...             "small",
...             "large",
...             "small",
...             "small",
...             "large",
...         ],
...         "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...         "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
...     }
... )
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(
...     df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum"
... )
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(
...     df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum", fill_value=0
... )
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(
...     df, values=["D", "E"], index=["A", "C"], aggfunc={"D": "mean", "E": "mean"}
... )
>>> table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(
...     df,
...     values=["D", "E"],
...     index=["A", "C"],
...     aggfunc={"D": "mean", "E": ["min", "max", "mean"]},
... )
>>> table
                  D   E
               mean max      mean  min
A   C
bar large  5.500000   9  7.500000    6
    small  5.500000   9  8.500000    8
foo large  2.000000   5  4.500000    4
    small  2.333333   6  4.333333    2

Examples for pandas.crosstab

>>> a = np.array(
...     [
...         "foo",
...         "foo",
...         "foo",
...         "foo",
...         "bar",
...         "bar",
...         "bar",
...         "bar",
...         "foo",
...         "foo",
...         "foo",
...     ],
...     dtype=object,
... )
>>> b = np.array(
...     [
...         "one",
...         "one",
...         "one",
...         "two",
...         "one",
...         "one",
...         "one",
...         "two",
...         "two",
...         "two",
...         "one",
...     ],
...     dtype=object,
... )
>>> c = np.array(
...     [
...         "dull",
...         "dull",
...         "shiny",
...         "dull",
...         "dull",
...         "shiny",
...         "shiny",
...         "dull",
...         "shiny",
...         "shiny",
...         "shiny",
...     ],
...     dtype=object,
... )
>>> pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2

Here ‘c’ and ‘f’ are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.

>>> foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
>>> bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
>>> pd.crosstab(foo, bar, dropna=False)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0

Examples for pandas.cut

Discretize into three equal-sized bins.

>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
...
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
...
([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
array([0.994, 3.   , 5.   , 7.   ]))

Discover the same bins, but assign them specific labels. Notice that the returned Categorical's categories are the labels and that it is ordered.

>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["bad", "medium", "good"])
['bad', 'good', 'medium', 'medium', 'good', 'bad']
Categories (3, str): ['bad' < 'medium' < 'good']

ordered=False will result in unordered categories when labels are passed. This parameter can be used to allow non-unique labels:

>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["B", "A", "B"], ordered=False)
['B', 'B', 'A', 'A', 'B', 'B']
Categories (2, str): ['A', 'B']

labels=False implies you just want the bins back.

>>> pd.cut([0, 1, 1, 2], bins=4, labels=False)
array([0, 1, 1, 3])

Passing a Series as an input returns a Series with categorical dtype:

>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, 3)
...
a    (1.992, 4.667]
b    (1.992, 4.667]
c    (4.667, 7.333]
d     (7.333, 10.0]
e     (7.333, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(1.992, 4.667] < (4.667, ...

Passing a Series as input returns a Series of the mapped values: each value is mapped numerically to its interval based on bins.

>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, [0, 2, 4, 6, 8, 10], labels=False, retbins=True, right=False)
...
(a    1.0
 b    2.0
 c    3.0
 d    4.0
 e    NaN
 dtype: float64,
 array([ 0,  2,  4,  6,  8, 10]))

Use the duplicates="drop" option when the bins are not unique:

>>> pd.cut(
...     s,
...     [0, 2, 4, 6, 10, 10],
...     labels=False,
...     retbins=True,
...     right=False,
...     duplicates="drop",
... )
...
(a    1.0
 b    2.0
 c    3.0
 d    3.0
 e    NaN
 dtype: float64,
 array([ 0,  2,  4,  6, 10]))

Passing an IntervalIndex for bins results in those categories exactly. Notice that values not covered by the IntervalIndex are set to NaN. 0 is to the left of the first bin (which is closed on the right), and 1.5 falls between two bins.

>>> bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
>>> pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
[NaN, (0.0, 1.0], NaN, (2.0, 3.0], (4.0, 5.0]]
Categories (3, interval[int64, right]): [(0, 1] < (2, 3] < (4, 5]]

Using np.histogram_bin_edges with cut

>>> pd.cut(
...     np.array([1, 7, 5, 4]),
...     bins=np.histogram_bin_edges(np.array([1, 7, 5, 4]), bins="auto"),
... )
...
[NaN, (5.0, 7.0], (3.0, 5.0], (3.0, 5.0]]
Categories (3, interval[float64, right]): [(1.0, 3.0] < (3.0, 5.0] < (5.0, 7.0]]

Examples for pandas.qcut

>>> pd.qcut(range(5), 4)
...
[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
Categories (4, interval[float64, right]): [(-0.001, 1.0] < (1.0, 2.0] ...
>>> pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
...
[good, good, medium, bad, bad]
Categories (3, str): [good < medium < bad]
>>> pd.qcut(range(5), 4, labels=False)
array([0, 0, 1, 2, 3])

Examples for pandas.merge

>>> df1 = pd.DataFrame(
...     {"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 5]}
... )
>>> df2 = pd.DataFrame(
...     {"rkey": ["foo", "bar", "baz", "foo"], "value": [5, 6, 7, 8]}
... )
>>> df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
>>> df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.

>>> df1.merge(df2, left_on="lkey", right_on="rkey")
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  bar        2  bar        6
3  baz        3  baz        7
4  foo        5  foo        5
5  foo        5  foo        8

Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.

>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=("_left", "_right"))
  lkey  value_left rkey  value_right
0  foo           1  foo            5
1  foo           1  foo            8
2  bar           2  bar            6
3  baz           3  baz            7
4  foo           5  foo            5
5  foo           5  foo            8

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.

>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
    Index(['value'], dtype='str')
>>> df1 = pd.DataFrame({"a": ["foo", "bar"], "b": [1, 2]})
>>> df2 = pd.DataFrame({"a": ["foo", "baz"], "c": [3, 4]})
>>> df1
      a  b
0   foo  1
1   bar  2
>>> df2
      a  c
0   foo  3
1   baz  4
>>> df1.merge(df2, how="inner", on="a")
      a  b  c
0   foo  1  3
>>> df1.merge(df2, how="left", on="a")
      a  b  c
0   foo  1  3.0
1   bar  2  NaN
>>> df1 = pd.DataFrame({"left": ["foo", "bar"]})
>>> df2 = pd.DataFrame({"right": [7, 8]})
>>> df1
    left
0   foo
1   bar
>>> df2
    right
0   7
1   8
>>> df1.merge(df2, how="cross")
   left  right
0   foo      7
1   foo      8
2   bar      7
3   bar      8

Examples for pandas.merge_ordered

>>> from pandas import merge_ordered
>>> df1 = pd.DataFrame(
...     {
...         "key": ["a", "c", "e", "a", "c", "e"],
...         "lvalue": [1, 2, 3, 1, 2, 3],
...         "group": ["a", "a", "a", "b", "b", "b"],
...     }
... )
>>> df1
  key  lvalue group
0   a       1     a
1   c       2     a
2   e       3     a
3   a       1     b
4   c       2     b
5   e       3     b
>>> df2 = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [1, 2, 3]})
>>> df2
  key  rvalue
0   b       1
1   c       2
2   d       3
>>> merge_ordered(df1, df2, fill_method="ffill", left_by="group")
  key  lvalue group  rvalue
0   a       1     a     NaN
1   b       1     a     1.0
2   c       2     a     2.0
3   d       2     a     3.0
4   e       3     a     3.0
5   a       1     b     NaN
6   b       1     b     1.0
7   c       2     b     2.0
8   d       2     b     3.0
9   e       3     b     3.0

Examples for pandas.merge_asof

>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
    a left_val
0   1        a
1   5        b
2  10        c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7
>>> pd.merge_asof(left, right, on="a")
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
    a left_val  right_val
0   1        a        NaN
1   5        b        3.0
2  10        c        7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
    a left_val  right_val
0   1        a        1.0
1   5        b        6.0
2  10        c        NaN
>>> pd.merge_asof(left, right, on="a", direction="nearest")
    a left_val  right_val
0   1        a          1
1   5        b          6
2  10        c          7

We can use indexed DataFrames as well.

>>> left = pd.DataFrame({"left_val": ["a", "b", "c"]}, index=[1, 5, 10])
>>> left
   left_val
1         a
5         b
10        c
>>> right = pd.DataFrame({"right_val": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
>>> right
   right_val
1          1
2          2
3          3
6          6
7          7
>>> pd.merge_asof(left, right, left_index=True, right_index=True)
   left_val  right_val
1         a          1
5         b          3
10        c          7

Here is a real-world times-series example

>>> quotes = pd.DataFrame(
...     {
...         "time": [
...             pd.Timestamp("2016-05-25 13:30:00.023"),
...             pd.Timestamp("2016-05-25 13:30:00.023"),
...             pd.Timestamp("2016-05-25 13:30:00.030"),
...             pd.Timestamp("2016-05-25 13:30:00.041"),
...             pd.Timestamp("2016-05-25 13:30:00.048"),
...             pd.Timestamp("2016-05-25 13:30:00.049"),
...             pd.Timestamp("2016-05-25 13:30:00.072"),
...             pd.Timestamp("2016-05-25 13:30:00.075"),
...         ],
...         "ticker": [
...             "GOOG",
...             "MSFT",
...             "MSFT",
...             "MSFT",
...             "GOOG",
...             "AAPL",
...             "GOOG",
...             "MSFT",
...         ],
...         "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...         "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
...     }
... )
>>> quotes
                     time ticker     bid     ask
0 2016-05-25 13:30:00.023   GOOG  720.50  720.93
1 2016-05-25 13:30:00.023   MSFT   51.95   51.96
2 2016-05-25 13:30:00.030   MSFT   51.97   51.98
3 2016-05-25 13:30:00.041   MSFT   51.99   52.00
4 2016-05-25 13:30:00.048   GOOG  720.50  720.93
5 2016-05-25 13:30:00.049   AAPL   97.99   98.01
6 2016-05-25 13:30:00.072   GOOG  720.50  720.88
7 2016-05-25 13:30:00.075   MSFT   52.01   52.03
>>> trades = pd.DataFrame(
...     {
...         "time": [
...             pd.Timestamp("2016-05-25 13:30:00.023"),
...             pd.Timestamp("2016-05-25 13:30:00.038"),
...             pd.Timestamp("2016-05-25 13:30:00.048"),
...             pd.Timestamp("2016-05-25 13:30:00.048"),
...             pd.Timestamp("2016-05-25 13:30:00.048"),
...         ],
...         "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
...         "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...         "quantity": [75, 155, 100, 100, 100],
...     }
... )
>>> trades
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   GOOG  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100

By default we are taking the asof of the quotes

>>> pd.merge_asof(trades, quotes, on="time", by="ticker")
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 2ms between the quote time and the trade time

>>> pd.merge_asof(
...     trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms")
... )
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155     NaN     NaN
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

We only asof within 10ms between the quote time and the trade time, and we exclude exact matches on time. However, prior data will propagate forward:

>>> pd.merge_asof(
...     trades,
...     quotes,
...     on="time",
...     by="ticker",
...     tolerance=pd.Timedelta("10ms"),
...     allow_exact_matches=False,
... )
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75     NaN     NaN
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100     NaN     NaN
3 2016-05-25 13:30:00.048   GOOG  720.92       100     NaN     NaN
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN

Examples for pandas.concat

Combine two Series.

>>> s1 = pd.Series(["a", "b"])
>>> s2 = pd.Series(["c", "d"])
>>> pd.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: str

Clear the existing index and reset it in the result by setting the ignore_index option to True.

>>> pd.concat([s1, s2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: str

Add a hierarchical index at the outermost level of the data with the keys option.

>>> pd.concat([s1, s2], keys=["s1", "s2"])
s1  0    a
    1    b
s2  0    c
    1    d
dtype: str

Label the index keys you create with the names option.

>>> pd.concat([s1, s2], keys=["s1", "s2"], names=["Series name", "Row ID"])
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: str

Combine two DataFrame objects with identical columns.

>>> df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = pd.DataFrame([["c", 3], ["d", 4]], columns=["letter", "number"])
>>> df2
  letter  number
0      c       3
1      d       4
>>> pd.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with NaN values.

>>> df3 = pd.DataFrame(
...     [["c", 3, "cat"], ["d", 4, "dog"]], columns=["letter", "number", "animal"]
... )
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([df1, df3], sort=False)
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog

Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.

>>> pd.concat([df1, df3], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame objects horizontally along the x axis by passing in axis=1.

>>> df4 = pd.DataFrame(
...     [["bird", "polly"], ["monkey", "george"]], columns=["animal", "name"]
... )
>>> pd.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george

Prevent the result from including duplicate index values with the verify_integrity option.

>>> df5 = pd.DataFrame([1], index=["a"])
>>> df5
   0
a  1
>>> df6 = pd.DataFrame([2], index=["a"])
>>> df6
   0
a  2
>>> pd.concat([df5, df6], verify_integrity=True)
Traceback (most recent call last):
    ...
ValueError: Indexes have overlapping values: ['a']

Append a single row to the end of a DataFrame object.

>>> df7 = pd.DataFrame({"a": 1, "b": 2}, index=[0])
>>> df7
    a   b
0   1   2
>>> new_row = pd.Series({"a": 3, "b": 4})
>>> new_row
a    3
b    4
dtype: int64
>>> pd.concat([df7, new_row.to_frame().T], ignore_index=True)
    a   b
0   1   2
1   3   4

Examples for pandas.get_dummies

>>> s = pd.Series(list("abca"))
>>> pd.get_dummies(s)
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
>>> s1 = ["a", "b", np.nan]
>>> pd.get_dummies(s1)
       a      b
0   True  False
1  False   True
2  False  False
>>> pd.get_dummies(s1, dummy_na=True)
       a      b    NaN
0   True  False  False
1  False   True  False
2  False  False   True
>>> df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})
>>> pd.get_dummies(df, prefix=["col1", "col2"])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1    True   False   False    True   False
1  2   False    True    True   False   False
2  3    True   False   False   False    True
>>> pd.get_dummies(pd.Series(list("abcaa")))
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
4   True  False  False
>>> pd.get_dummies(pd.Series(list("abcaa")), drop_first=True)
       b      c
0  False  False
1   True  False
2  False   True
3  False  False
4  False  False
>>> pd.get_dummies(pd.Series(list("abc")), dtype=float)
     a    b    c
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0

Examples for pandas.from_dummies

>>> df = pd.DataFrame({"a": [1, 0, 0, 1], "b": [0, 1, 0, 0], "c": [0, 0, 1, 0]})
>>> df
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
>>> pd.from_dummies(df)
0     a
1     b
2     c
3     a
>>> df = pd.DataFrame(
...     {
...         "col1_a": [1, 0, 1],
...         "col1_b": [0, 1, 0],
...         "col2_a": [0, 1, 0],
...         "col2_b": [1, 0, 0],
...         "col2_c": [0, 0, 1],
...     }
... )
>>> df
      col1_a  col1_b  col2_a  col2_b  col2_c
0       1       0       0       1       0
1       0       1       1       0       0
2       1       0       0       0       1
>>> pd.from_dummies(df, sep="_")
    col1    col2
0    a       b
1    b       a
2    a       c
>>> df = pd.DataFrame(
...     {
...         "col1_a": [1, 0, 0],
...         "col1_b": [0, 1, 0],
...         "col2_a": [0, 1, 0],
...         "col2_b": [1, 0, 0],
...         "col2_c": [0, 0, 0],
...     }
... )
>>> df
      col1_a  col1_b  col2_a  col2_b  col2_c
0       1       0       0       1       0
1       0       1       1       0       0
2       0       0       0       0       0
>>> pd.from_dummies(df, sep="_", default_category={"col1": "d", "col2": "e"})
    col1    col2
0    a       b
1    b       a
2    d       e

Examples for pandas.factorize

These examples all show factorize as a top-level method like pd.factorize(values). The results are identical for methods like Series.factorize().

>>> codes, uniques = pd.factorize(np.array(["b", "b", "a", "c", "b"], dtype="O"))
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
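
The method form gives the same codes (a quick sketch; with a Series input, uniques comes back as an Index, whose dtype repr varies by pandas version):

>>> codes, uniques = pd.Series(["b", "b", "a", "c", "b"]).factorize()
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
Index(['b', 'a', 'c'], dtype='str')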

With sort=True, the uniques will be sorted, and codes will be shuffled so that the relationship is maintained.

>>> codes, uniques = pd.factorize(
...     np.array(["b", "b", "a", "c", "b"], dtype="O"), sort=True
... )
>>> codes
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)

When use_na_sentinel=True (the default), missing values are indicated in the codes with the sentinel value -1 and missing values are not included in uniques.

>>> codes, uniques = pd.factorize(np.array(["b", None, "a", "c", "b"], dtype="O"))
>>> codes
array([ 0, -1,  1,  2,  0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)

Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.

>>> cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
['a', 'c']
Categories (3, str): ['a', 'b', 'c']

Notice that 'b' is in uniques.categories, despite not being present in cat.values.

For all other pandas objects, an Index of the appropriate type is returned.

>>> cat = pd.Series(["a", "a", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='str')

If NaN is in the values and we want to include it in the uniques, set use_na_sentinel=False.

>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values)  # default: use_na_sentinel=True
>>> codes
array([ 0,  1,  0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, use_na_sentinel=False)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1.,  2., nan])

Examples for pandas.unique

>>> pd.unique(pd.Series([2, 1, 3, 3]))
array([2, 1, 3])
>>> pd.unique(pd.Series([2] + [1] * 5))
array([2, 1])
>>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
array(['2016-01-01T00:00:00.000000'], dtype='datetime64[us]')
>>> pd.unique(
...     pd.Series(
...         [
...             pd.Timestamp("20160101", tz="US/Eastern"),
...             pd.Timestamp("20160101", tz="US/Eastern"),
...         ],
...         dtype="M8[ns, US/Eastern]",
...     )
... )
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]
>>> pd.unique(
...     pd.Index(
...         [
...             pd.Timestamp("20160101", tz="US/Eastern"),
...             pd.Timestamp("20160101", tz="US/Eastern"),
...         ],
...         dtype="M8[ns, US/Eastern]",
...     )
... )
DatetimeIndex(['2016-01-01 00:00:00-05:00'],
        dtype='datetime64[ns, US/Eastern]',
        freq=None)
>>> pd.unique(np.array(list("baabc"), dtype="O"))
array(['b', 'a', 'c'], dtype=object)

An unordered Categorical will return categories in the order of appearance.

>>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']
>>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']

An ordered Categorical preserves the category ordering.

>>> pd.unique(
...     pd.Series(
...         pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
...     )
... )
['b', 'a', 'c']
Categories (3, str): ['a' < 'b' < 'c']

An array of tuples

>>> pd.unique(pd.Series([("a", "b"), ("b", "a"), ("a", "c"), ("b", "a")]).values)
array([('a', 'b'), ('b', 'a'), ('a', 'c')], dtype=object)

A NumpyExtensionArray of complex

>>> pd.unique(pd.array([1 + 1j, 2, 3]))
<NumpyExtensionArray>
[(1+1j), (2+0j), (3+0j)]
Length: 3, dtype: complex128

Examples for pandas.lreshape

>>> data = pd.DataFrame(
...     {
...         "hr1": [514, 573],
...         "hr2": [545, 526],
...         "team": ["Red Sox", "Yankees"],
...         "year1": [2007, 2007],
...         "year2": [2008, 2008],
...     }
... )
>>> data
   hr1  hr2     team  year1  year2
0  514  545  Red Sox   2007   2008
1  573  526  Yankees   2007   2008
>>> pd.lreshape(data, {"year": ["year1", "year2"], "hr": ["hr1", "hr2"]})
      team  year   hr
0  Red Sox  2007  514
1  Yankees  2007  573
2  Red Sox  2008  545
3  Yankees  2008  526

Examples for pandas.wide_to_long

>>> np.random.seed(123)
>>> df = pd.DataFrame(
...     {
...         "A1970": {0: "a", 1: "b", 2: "c"},
...         "A1980": {0: "d", 1: "e", 2: "f"},
...         "B1970": {0: 2.5, 1: 1.2, 2: 0.7},
...         "B1980": {0: 3.2, 1: 1.3, 2: 0.1},
...         "X": dict(zip(range(3), np.random.randn(3), strict=True)),
...     }
... )
>>> df["id"] = df.index
>>> df
  A1970 A1980  B1970  B1980         X  id
0     a     d    2.5    3.2 -1.085631   0
1     b     e    1.2    1.3  0.997345   1
2     c     f    0.7    0.1  0.282978   2
>>> pd.wide_to_long(df, ["A", "B"], i="id", j="year")
                X  A    B
id year
0  1970 -1.085631  a  2.5
1  1970  0.997345  b  1.2
2  1970  0.282978  c  0.7
0  1980 -1.085631  d  3.2
1  1980  0.997345  e  1.3
2  1980  0.282978  f  0.1

With multiple id columns

>>> df = pd.DataFrame(
...     {
...         "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
...         "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
...         "ht1": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
...         "ht2": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
...     }
... )
>>> df
   famid  birth  ht1  ht2
0      1      1  2.8  3.4
1      1      2  2.9  3.8
2      1      3  2.2  2.9
3      2      1  2.0  3.2
4      2      2  1.8  2.8
5      2      3  1.9  2.4
6      3      1  2.2  3.3
7      3      2  2.3  3.4
8      3      3  2.1  2.9
>>> long_format = pd.wide_to_long(df, stubnames="ht", i=["famid", "birth"], j="age")
>>> long_format
                  ht
famid birth age
1     1     1    2.8
            2    3.4
      2     1    2.9
            2    3.8
      3     1    2.2
            2    2.9
2     1     1    2.0
            2    3.2
      2     1    1.8
            2    2.8
      3     1    1.9
            2    2.4
3     1     1    2.2
            2    3.3
      2     1    2.3
            2    3.4
      3     1    2.1
            2    2.9

Going from long back to wide just takes some creative use of unstack

>>> wide_format = long_format.unstack()
>>> wide_format.columns = wide_format.columns.map("{0[0]}{0[1]}".format)
>>> wide_format.reset_index()
   famid  birth  ht1  ht2
0      1      1  2.8  3.4
1      1      2  2.9  3.8
2      1      3  2.2  2.9
3      2      1  2.0  3.2
4      2      2  1.8  2.8
5      2      3  1.9  2.4
6      3      1  2.2  3.3
7      3      2  2.3  3.4
8      3      3  2.1  2.9

Less wieldy column names are also handled

>>> np.random.seed(0)
>>> df = pd.DataFrame(
...     {
...         "A(weekly)-2010": np.random.rand(3),
...         "A(weekly)-2011": np.random.rand(3),
...         "B(weekly)-2010": np.random.rand(3),
...         "B(weekly)-2011": np.random.rand(3),
...         "X": np.random.randint(3, size=3),
...     }
... )
>>> df["id"] = df.index
>>> df
   A(weekly)-2010  A(weekly)-2011  B(weekly)-2010  B(weekly)-2011  X  id
0        0.548814        0.544883        0.437587        0.383442  0   0
1        0.715189        0.423655        0.891773        0.791725  1   1
2        0.602763        0.645894        0.963663        0.528895  1   2
>>> pd.wide_to_long(df, ["A(weekly)", "B(weekly)"], i="id", j="year", sep="-")
         X  A(weekly)  B(weekly)
id year
0  2010  0   0.548814   0.437587
1  2010  1   0.715189   0.891773
2  2010  1   0.602763   0.963663
0  2011  0   0.544883   0.383442
1  2011  1   0.423655   0.791725
2  2011  1   0.645894   0.528895

If we have many columns, we could also use a regex to find our stubnames and pass that list on to wide_to_long

>>> stubnames = sorted(
...     set(
...         [
...             match[0]
...             for match in df.columns.str.findall(r"[A-B]\(.*\)").values
...             if match != []
...         ]
...     )
... )
>>> list(stubnames)
['A(weekly)', 'B(weekly)']

All of the above examples have integers as suffixes. It is possible to have non-integers as suffixes.

>>> df = pd.DataFrame(
...     {
...         "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
...         "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
...         "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
...         "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
...     }
... )
>>> df
   famid  birth  ht_one  ht_two
0      1      1     2.8     3.4
1      1      2     2.9     3.8
2      1      3     2.2     2.9
3      2      1     2.0     3.2
4      2      2     1.8     2.8
5      2      3     1.9     2.4
6      3      1     2.2     3.3
7      3      2     2.3     3.4
8      3      3     2.1     2.9
>>> long_format = pd.wide_to_long(
...     df, stubnames="ht", i=["famid", "birth"], j="age", sep="_", suffix=r"\w+"
... )
>>> long_format
                  ht
famid birth age
1     1     one  2.8
            two  3.4
      2     one  2.9
            two  3.8
      3     one  2.2
            two  2.9
2     1     one  2.0
            two  3.2
      2     one  1.8
            two  2.8
      3     one  1.9
            two  2.4
3     1     one  2.2
            two  3.3
      2     one  2.3
            two  3.4
      3     one  2.1
            two  2.9

Examples for pandas.isna

Scalar arguments (including strings) result in a scalar boolean.

>>> pd.isna("dog")
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(np.nan)
True

ndarrays result in an ndarray of booleans.

>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan,  3.],
       [ 4.,  5., nan]])
>>> pd.isna(array)
array([[False,  True, False],
       [False, False,  True]])

For indexes, an ndarray of booleans is returned.

>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
              dtype='datetime64[us]', freq=None)
>>> pd.isna(index)
array([False, False,  True, False])

For Series and DataFrame, the same type is returned, containing booleans.

>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
     0    1    2
0  ant  bee  cat
1  dog  NaN  fly
>>> pd.isna(df)
       0      1      2
0  False  False  False
1  False   True  False
>>> pd.isna(df[1])
0    False
1     True
Name: 1, dtype: bool

Examples for pandas.isnull

Scalar arguments (including strings) result in a scalar boolean.

>>> pd.isna("dog")
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(np.nan)
True

ndarrays result in an ndarray of booleans.

>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan,  3.],
       [ 4.,  5., nan]])
>>> pd.isna(array)
array([[False,  True, False],
       [False, False,  True]])

For indexes, an ndarray of booleans is returned.

>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
              dtype='datetime64[us]', freq=None)
>>> pd.isna(index)
array([False, False,  True, False])

For Series and DataFrame, the same type is returned, containing booleans.

>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
     0    1    2
0  ant  bee  cat
1  dog  NaN  fly
>>> pd.isna(df)
       0      1      2
0  False  False  False
1  False   True  False
>>> pd.isna(df[1])
0    False
1     True
Name: 1, dtype: bool

Examples for pandas.notna

Scalar arguments (including strings) result in a scalar boolean.

>>> pd.notna("dog")
True
>>> pd.notna(pd.NA)
False
>>> pd.notna(np.nan)
False

ndarrays result in an ndarray of booleans.

>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan,  3.],
       [ 4.,  5., nan]])
>>> pd.notna(array)
array([[ True, False,  True],
       [ True,  True, False]])

For indexes, an ndarray of booleans is returned.

>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
              dtype='datetime64[us]', freq=None)
>>> pd.notna(index)
array([ True,  True, False,  True])

For Series and DataFrame, the same type is returned, containing booleans.

>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
     0    1    2
0  ant  bee  cat
1  dog  NaN  fly
>>> pd.notna(df)
      0      1     2
0  True   True  True
1  True  False  True
>>> pd.notna(df[1])
0     True
1    False
Name: 1, dtype: bool

Examples for pandas.notnull

Scalar arguments (including strings) result in a scalar boolean.

>>> pd.notna("dog")
True
>>> pd.notna(pd.NA)
False
>>> pd.notna(np.nan)
False

ndarrays result in an ndarray of booleans.

>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan,  3.],
       [ 4.,  5., nan]])
>>> pd.notna(array)
array([[ True, False,  True],
       [ True,  True, False]])

For indexes, an ndarray of booleans is returned.

>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
              dtype='datetime64[us]', freq=None)
>>> pd.notna(index)
array([ True,  True, False,  True])

For Series and DataFrame, the same type is returned, containing booleans.

>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
     0    1    2
0  ant  bee  cat
1  dog  NaN  fly
>>> pd.notna(df)
      0      1     2
0  True   True  True
1  True  False  True
>>> pd.notna(df[1])
0     True
1    False
Name: 1, dtype: bool

Series

Constructor

Function Description
Series([data, index, dtype, name, copy]) One-dimensional ndarray with axis labels (including time series).

Attributes

Axes

Function Description
Series.index The index (axis labels) of the Series.
Series.array The ExtensionArray of the data backing this Series or Index.
Series.values Return Series as ndarray or ndarray-like depending on the dtype.
Series.dtype Return the dtype object of the underlying data.
Series.info([verbose, buf, max_cols, ...]) Print a concise summary of a Series.
Series.shape Return a tuple of the shape of the underlying data.
Series.nbytes Return the number of bytes in the underlying data.
Series.ndim Number of dimensions of the underlying data, by definition 1.
Series.size Return the number of elements in the underlying data.
Series.T Return the transpose, which is by definition self.
Series.memory_usage([index, deep]) Return the memory usage of the Series.
Series.hasnans Return True if there are any NaNs.
Series.empty Indicator whether Index is empty.
Series.dtypes Return the dtype object of the underlying data.
Series.name Return the name of the Series.
Series.flags Get the properties associated with this pandas object.
Series.set_flags(*[, copy, ...]) Return a new object with updated flags.

Conversion

Function Description
Series.astype(dtype[, copy, errors]) Cast a pandas object to a specified dtype dtype.
Series.convert_dtypes([infer_objects, ...]) Convert columns from numpy dtypes to the best dtypes that support pd.NA.
Series.infer_objects([copy]) Attempt to infer better dtypes for object columns.
Series.copy([deep]) Make a copy of this object's indices and data.
Series.to_numpy([dtype, copy, na_value]) A NumPy ndarray representing the values in this Series or Index.
Series.to_period([freq, copy]) Convert Series from DatetimeIndex to PeriodIndex.
Series.to_timestamp([freq, how, copy]) Cast to DatetimeIndex of Timestamps, at beginning of period.
Series.to_list() Return a list of the values.
Series.__array__([dtype, copy]) Return the values as a NumPy array.

Indexing, iteration

Function Description
Series.get(key[, default]) Get item from object for given key (ex: DataFrame column).
Series.at Access a single value for a row/column label pair.
Series.iat Access a single value for a row/column pair by integer position.
Series.loc Access a group of rows and columns by label(s) or a boolean array.
Series.iloc Purely integer-location based indexing for selection by position.
Series.__iter__() Return an iterator of the values.
Series.items() Lazily iterate over (index, value) tuples.
Series.keys() Return alias for index.
Series.pop(item) Return item and drops from series.
Series.item() Return the first element of the underlying data as a Python scalar.
Series.xs(key[, axis, level, drop_level]) Return cross-section from the Series/DataFrame.

For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
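
A minimal sketch of label-based versus position-based selection (note that .loc label slices include both endpoints):

>>> s = pd.Series([10, 20, 30], index=["a", "b", "c"])
>>> s.loc["a":"b"]
a    10
b    20
dtype: int64
>>> s.iloc[[0, 2]]
a    10
c    30
dtype: int64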

Binary operator functions

Function Description
Series.add(other[, level, fill_value, axis]) Return Addition of series and other, element-wise (binary operator add).
Series.sub(other[, level, fill_value, axis]) Return Subtraction of series and other, element-wise (binary operator sub).
Series.mul(other[, level, fill_value, axis]) Return Multiplication of series and other, element-wise (binary operator mul).
Series.div(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator truediv).
Series.truediv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator truediv).
Series.floordiv(other[, level, fill_value, axis]) Return Integer division of series and other, element-wise (binary operator floordiv).
Series.mod(other[, level, fill_value, axis]) Return Modulo of series and other, element-wise (binary operator mod).
Series.pow(other[, level, fill_value, axis]) Return Exponential power of series and other, element-wise (binary operator pow).
Series.radd(other[, level, fill_value, axis]) Return Addition of series and other, element-wise (binary operator radd).
Series.rsub(other[, level, fill_value, axis]) Return Subtraction of series and other, element-wise (binary operator rsub).
Series.rmul(other[, level, fill_value, axis]) Return Multiplication of series and other, element-wise (binary operator rmul).
Series.rdiv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator rtruediv).
Series.rtruediv(other[, level, fill_value, axis]) Return Floating division of series and other, element-wise (binary operator rtruediv).
Series.rfloordiv(other[, level, fill_value, ...]) Return Integer division of series and other, element-wise (binary operator rfloordiv).
Series.rmod(other[, level, fill_value, axis]) Return Modulo of series and other, element-wise (binary operator rmod).
Series.rpow(other[, level, fill_value, axis]) Return Exponential power of series and other, element-wise (binary operator rpow).
Series.combine(other, func[, fill_value]) Combine the Series with a Series or scalar according to func.
Series.combine_first(other) Update null elements with value in the same location in 'other'.
Series.round([decimals]) Round each value in a Series to the given number of decimals.
Series.lt(other[, level, fill_value, axis]) Return Less than of series and other, element-wise (binary operator lt).
Series.gt(other[, level, fill_value, axis]) Return Greater than of series and other, element-wise (binary operator gt).
Series.le(other[, level, fill_value, axis]) Return Less than or equal to of series and other, element-wise (binary operator le).
Series.ge(other[, level, fill_value, axis]) Return Greater than or equal to of series and other, element-wise (binary operator ge).
Series.ne(other[, level, fill_value, axis]) Return Not equal to of series and other, element-wise (binary operator ne).
Series.eq(other[, level, fill_value, axis]) Return Equal to of series and other, element-wise (binary operator eq).
Series.product(*[, axis, skipna, ...]) Return the product of the values over the requested axis.
Series.dot(other) Compute the dot product between the Series and the columns of other.
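
A minimal sketch of how fill_value interacts with index alignment in these operators (indexes are unioned first; missing positions take fill_value before the operation):

>>> a = pd.Series([1, 2], index=["x", "y"])
>>> b = pd.Series([10, 20], index=["y", "z"])
>>> a.add(b, fill_value=0)
x     1.0
y    12.0
z    20.0
dtype: float64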

Function application, GroupBy & window

Function Description
Series.apply(func[, args, by_row]) Invoke function on values of Series.
Series.agg([func, axis]) Aggregate using one or more operations over the specified axis.
Series.aggregate([func, axis]) Aggregate using one or more operations over the specified axis.
Series.transform(func[, axis]) Call func on self producing a Series with the same axis shape as self.
Series.map([func, na_action, engine]) Map values of Series according to an input mapping or function.
Series.groupby([by, level, as_index, sort, ...]) Group Series using a mapper or by a Series of columns.
Series.rolling(window[, min_periods, ...]) Provide rolling window calculations.
Series.expanding([min_periods, method]) Provide expanding window calculations.
Series.ewm([com, span, halflife, alpha, ...]) Provide exponentially weighted (EW) calculations.
Series.pipe(func, *args, **kwargs) Apply chainable functions that expect Series or DataFrames.
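
A minimal sketch of a rolling window and a groupby over a derived key (the grouping key s % 2 is just an illustrative choice):

>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(window=2).sum()
0    NaN
1    3.0
2    5.0
3    7.0
dtype: float64
>>> s.groupby(s % 2).sum()
0    6
1    4
dtype: int64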

Computations / descriptive stats

Function Description
Series.abs() Return a Series/DataFrame with absolute numeric value of each element.
Series.all(*[, axis, bool_only, skipna]) Return whether all elements are True, potentially over an axis.
Series.any(*[, axis, bool_only, skipna]) Return whether any element is True, potentially over an axis.
Series.autocorr([lag]) Compute the lag-N autocorrelation.
Series.between(left, right[, inclusive]) Return boolean Series equivalent to left <= series <= right.
Series.clip([lower, upper, axis, inplace]) Trim values at input threshold(s).
Series.corr(other[, method, min_periods]) Compute correlation with other Series, excluding missing values.
Series.count() Return number of non-NA/null observations in the Series.
Series.cov(other[, min_periods, ddof]) Compute covariance with Series, excluding missing values.
Series.cummax([axis, skipna]) Return cumulative maximum over a Series.
Series.cummin([axis, skipna]) Return cumulative minimum over a Series.
Series.cumprod([axis, skipna]) Return cumulative product over a Series.
Series.cumsum([axis, skipna]) Return cumulative sum over a Series.
Series.describe([percentiles, include, exclude]) Generate descriptive statistics.
Series.diff([periods]) First discrete difference of Series elements.
Series.factorize([sort, use_na_sentinel]) Encode the object as an enumerated type or categorical variable.
Series.kurt(*[, axis, skipna, numeric_only]) Return unbiased kurtosis over requested axis.
Series.max(*[, axis, skipna, numeric_only]) Return the maximum of the values over the requested axis.
Series.mean(*[, axis, skipna, numeric_only]) Return the mean of the values over the requested axis.
Series.median(*[, axis, skipna, numeric_only]) Return the median of the values over the requested axis.
Series.min(*[, axis, skipna, numeric_only]) Return the minimum of the values over the requested axis.
Series.mode([dropna]) Return the mode(s) of the Series.
Series.nlargest([n, keep]) Return the largest n elements.
Series.nsmallest([n, keep]) Return the smallest n elements.
Series.pct_change([periods, fill_method, freq]) Fractional change between the current and a prior element.
Series.prod(*[, axis, skipna, numeric_only, ...]) Return the product of the values over the requested axis.
Series.quantile([q, interpolation]) Return value at the given quantile.
Series.rank([axis, method, numeric_only, ...]) Compute numerical data ranks (1 through n) along axis.
Series.sem(*[, axis, skipna, ddof, numeric_only]) Return unbiased standard error of the mean over requested axis.
Series.skew(*[, axis, skipna, numeric_only]) Return unbiased skew over requested axis.
Series.std(*[, axis, skipna, ddof, numeric_only]) Return sample standard deviation.
Series.sum(*[, axis, skipna, numeric_only, ...]) Return the sum of the values over the requested axis.
Series.var(*[, axis, skipna, ddof, numeric_only]) Return unbiased variance over requested axis.
Series.kurtosis(*[, axis, skipna, numeric_only]) Return unbiased kurtosis over requested axis.
Series.unique() Return unique values of Series object.
Series.nunique([dropna]) Return number of unique elements in the object.
Series.is_unique Return True if values in the object are unique.
Series.is_monotonic_increasing Return True if values in the object are monotonically increasing.
Series.is_monotonic_decreasing Return True if values in the object are monotonically decreasing.
Series.value_counts([normalize, sort, ...]) Return a Series containing counts of unique values.
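
A minimal sketch of two common descriptive operations (value counts and a running total):

>>> s = pd.Series([3, 1, 2, 3])
>>> s.value_counts()
3    2
1    1
2    1
Name: count, dtype: int64
>>> s.cumsum()
0    3
1    4
2    6
3    9
dtype: int64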

Reindexing / selection / label manipulation

Function Description
Series.align(other[, join, axis, level, ...]) Align two objects on their axes with the specified join method.
Series.case_when(caselist) Replace values where the conditions are True.
Series.drop([labels, axis, index, columns, ...]) Return Series with specified index labels removed.
Series.droplevel(level[, axis]) Return Series/DataFrame with requested index / column level(s) removed.
Series.drop_duplicates(*[, keep, inplace, ...]) Return Series with duplicate values removed.
Series.duplicated([keep]) Indicate duplicate Series values.
Series.equals(other) Test whether two objects contain the same elements.
Series.head([n]) Return the first n rows.
Series.idxmax([axis, skipna]) Return the row label of the maximum value.
Series.idxmin([axis, skipna]) Return the row label of the minimum value.
Series.isin(values) Whether elements in Series are contained in values.
Series.reindex([index, axis, method, copy, ...]) Conform Series to new index with optional filling logic.
Series.reindex_like(other[, method, copy, ...]) Return an object with matching indices as other object.
Series.rename([index, axis, copy, inplace, ...]) Alter Series index labels or name.
Series.rename_axis([mapper, index, axis, ...]) Set the name of the axis for the index.
Series.reset_index([level, drop, name, ...]) Generate a new DataFrame or Series with the index reset.
Series.sample([n, frac, replace, weights, ...]) Return a random sample of items from an axis of object.
Series.set_axis(labels, *[, axis, copy]) (DEPRECATED) Assign desired index to given axis.
Series.take(indices[, axis]) Return the elements in the given positional indices along an axis.
Series.tail([n]) Return the last n rows.
Series.truncate([before, after, axis, copy]) Truncate a Series or DataFrame before and after some index value.
Series.where(cond[, other, inplace, axis, level]) Replace values where the condition is False.
Series.mask(cond[, other, inplace, axis, level]) Replace values where the condition is True.
Series.add_prefix(prefix[, axis]) Prefix labels with string prefix.
Series.add_suffix(suffix[, axis]) Suffix labels with string suffix.
Series.filter([items, like, regex, axis]) Subset the DataFrame or Series according to the specified index labels.
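
A minimal sketch of reindexing and renaming (reindexing introduces NaN for new labels, which upcasts to float64):

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s.reindex(["b", "c"])
b    2.0
c    NaN
dtype: float64
>>> s.rename("values")
a    1
b    2
Name: values, dtype: int64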

Missing data handling

Function Description
Series.bfill(*[, axis, inplace, limit, ...]) Fill NA/NaN values by using the next valid observation to fill the gap.
Series.dropna(*[, axis, inplace, how, ...]) Return a new Series with missing values removed.
Series.ffill(*[, axis, inplace, limit, ...]) Fill NA/NaN values by propagating the last valid observation to next valid.
Series.fillna(value, *[, axis, inplace, limit]) Fill NA/NaN values with value.
Series.interpolate([method, axis, limit, ...]) Fill NaN values using an interpolation method.
Series.isna() Detect missing values.
Series.isnull() Series.isnull is an alias for Series.isna.
Series.notna() Detect existing (non-missing) values.
Series.notnull() Series.notnull is an alias for Series.notna.
Series.replace([to_replace, value, inplace, ...]) Replace values given in to_replace with value.
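
A minimal sketch of the two most common missing-data fixes (fill versus drop):

>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s.fillna(0)
0    1.0
1    0.0
2    3.0
dtype: float64
>>> s.dropna()
0    1.0
2    3.0
dtype: float64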

Reshaping, sorting

Function Description
Series.argsort([axis, kind, order, stable]) Return the integer indices that would sort the Series values.
Series.argmin([axis, skipna]) Return int position of the smallest value in the Series.
Series.argmax([axis, skipna]) Return int position of the largest value in the Series.
Series.reorder_levels(order) Rearrange index levels using input order.
Series.sort_values(*[, axis, ascending, ...]) Sort by the values.
Series.sort_index(*[, axis, level, ...]) Sort Series by index labels.
Series.swaplevel([i, j, copy]) Swap levels i and j in a MultiIndex.
Series.unstack([level, fill_value, sort]) Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
Series.explode([ignore_index]) Transform each element of a list-like to a row.
Series.searchsorted(value[, side, sorter]) Find indices where elements should be inserted to maintain order.
Series.repeat(repeats[, axis]) Repeat elements of a Series.
Series.squeeze([axis]) Squeeze 1 dimensional axis objects into scalars.
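
A minimal sketch of sorting and exploding (sort_values keeps the original index labels; explode repeats the index label for each list element):

>>> s = pd.Series([3, 1, 2])
>>> s.sort_values()
1    1
2    2
0    3
dtype: int64
>>> pd.Series([[1, 2], [3]]).explode()
0    1
0    2
1    3
dtype: object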

Combining / comparing / joining / merging

Function Description
Series.compare(other[, align_axis, ...]) Compare to another Series and show the differences.
Series.update(other) Modify Series in place using values from passed Series.
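
A minimal sketch of compare, which returns only the positions where the two Series differ:

>>> s1 = pd.Series(["a", "b", "c"])
>>> s2 = pd.Series(["a", "x", "c"])
>>> s1.compare(s2)
  self other
1    b     x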

Time Series-related

Function Description
Series.asfreq(freq[, method, how, ...]) Convert time series to specified frequency.
Series.asof(where[, subset]) Return the last row(s) without any NaNs before where.
Series.shift([periods, freq, axis, ...]) Shift index by desired number of periods with an optional time freq.
Series.first_valid_index() Return index for first non-missing value or None, if no value is found.
Series.last_valid_index() Return index for last non-missing value or None, if no value is found.
Series.resample(rule[, closed, label, ...]) Resample time-series data.
Series.tz_convert(tz[, axis, level, copy]) Convert tz-aware axis to target time zone.
Series.tz_localize(tz[, axis, level, copy, ...]) Localize time zone naive index of a Series or DataFrame to target time zone.
Series.at_time(time[, asof, axis]) Select values at particular time of day (e.g., 9:30AM).
Series.between_time(start_time, end_time[, ...]) Select values between particular times of the day (e.g., 9:00-9:30 AM).
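
A minimal sketch of downsampling an hourly series to two-hour bins (the lowercase 'h'/'2h' frequency aliases assume a recent pandas; older versions used 'H'):

>>> idx = pd.date_range("2024-01-01", periods=4, freq="h")
>>> s = pd.Series([1, 2, 3, 4], index=idx)
>>> s.resample("2h").sum()
2024-01-01 00:00:00    3
2024-01-01 02:00:00    7
Freq: 2h, dtype: int64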

Accessors

pandas provides dtype-specific methods under various accessors. These are separate namespaces within Series that only apply to specific data types.

Function Description
Series.str alias of StringMethods
Series.cat alias of CategoricalAccessor
Series.dt alias of CombinedDatetimelikeProperties
Series.sparse alias of SparseAccessor
DataFrame.sparse alias of SparseFrameAccessor
Index.str alias of StringMethods
Data Type Accessor
Datetime, Timedelta, Period dt
String str
Categorical cat
Sparse sparse

Datetimelike properties

Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.
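
For example (a minimal sketch):

>>> s = pd.Series(pd.date_range("2024-01-01", periods=3))
>>> s.dt.day
0    1
1    2
2    3
dtype: int32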

Datetime properties

Function Description
Series.dt.date Returns numpy array of python datetime.date objects.
Series.dt.time Returns numpy array of datetime.time objects.
Series.dt.timetz Returns numpy array of datetime.time objects with timezones.
Series.dt.year The year of the datetime.
Series.dt.month The month as January=1, December=12.
Series.dt.day The day of the datetime.
Series.dt.hour The hours of the datetime.
Series.dt.minute The minutes of the datetime.
Series.dt.second The seconds of the datetime.
Series.dt.microsecond The microseconds of the datetime.
Series.dt.nanosecond The nanoseconds of the datetime.
Series.dt.dayofweek The day of the week with Monday=0, Sunday=6.
Series.dt.day_of_week The day of the week with Monday=0, Sunday=6.
Series.dt.weekday The day of the week with Monday=0, Sunday=6.
Series.dt.dayofyear The ordinal day of the year.
Series.dt.day_of_year The ordinal day of the year.
Series.dt.days_in_month The number of days in the month.
Series.dt.quarter The quarter of the date.
Series.dt.is_month_start Indicates whether the date is the first day of the month.
Series.dt.is_month_end Indicates whether the date is the last day of the month.
Series.dt.is_quarter_start Indicator for whether the date is the first day of a quarter.
Series.dt.is_quarter_end Indicator for whether the date is the last day of a quarter.
Series.dt.is_year_start Indicate whether the date is the first day of a year.
Series.dt.is_year_end Indicate whether the date is the last day of the year.
Series.dt.is_leap_year Boolean indicator if the date belongs to a leap year.
Series.dt.daysinmonth The number of days in the month.
Series.dt.days_in_month The number of days in the month.
Series.dt.tz Return the timezone.
Series.dt.freq Tries to return a string representing a frequency generated by infer_freq.
Series.dt.unit The precision unit of the datetime data.

Datetime methods

Function Description
Series.dt.isocalendar() Calculate year, week, and day according to the ISO 8601 standard.
Series.dt.to_period([freq]) Cast to PeriodArray/PeriodIndex at a particular frequency.
Series.dt.to_pydatetime() Return the data as a Series of datetime.datetime objects.
Series.dt.tz_localize(tz[, ambiguous, ...]) Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.
Series.dt.tz_convert(tz) Convert tz-aware Datetime Array/Index from one time zone to another.
Series.dt.normalize() Convert times to midnight.
Series.dt.strftime(date_format) Convert to Index using specified date_format.
Series.dt.round(freq[, ambiguous, nonexistent]) Perform round operation on the data to the specified freq.
Series.dt.floor(freq[, ambiguous, nonexistent]) Perform floor operation on the data to the specified freq.
Series.dt.ceil(freq[, ambiguous, nonexistent]) Perform ceil operation on the data to the specified freq.
Series.dt.month_name([locale]) Return the month names with specified locale.
Series.dt.day_name([locale]) Return the day names with specified locale.
Series.dt.as_unit(unit[, round_ok]) Convert to a dtype with the given unit resolution.

Period properties

Function Description
Series.dt.qyear Fiscal year the Period lies in according to its starting-quarter.
Series.dt.start_time Get the Timestamp for the start of the period.
Series.dt.end_time Get the Timestamp for the end of the period.

Timedelta properties

Function Description
Series.dt.days Number of days for each element.
Series.dt.seconds Number of seconds (>= 0 and less than 1 day) for each element.
Series.dt.microseconds Number of microseconds (>= 0 and less than 1 second) for each element.
Series.dt.nanoseconds Number of nanoseconds (>= 0 and less than 1 microsecond) for each element.
Series.dt.components Return a Dataframe of the components of the Timedeltas.
Series.dt.unit The precision unit of the datetime data.

Timedelta methods

Function Description
Series.dt.to_pytimedelta() Return an array of native datetime.timedelta objects.
Series.dt.total_seconds() Return total duration of each element expressed in seconds.
Series.dt.as_unit(unit[, round_ok]) Convert to a dtype with the given unit resolution.

String handling

Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.
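
For example (a minimal sketch; the str dtype repr assumes a recent pandas, older versions show object):

>>> s = pd.Series(["apple", "banana"])
>>> s.str.upper()
0     APPLE
1    BANANA
dtype: str
>>> s.str.contains("an")
0    False
1     True
dtype: bool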

Function Description
Series.str.capitalize() Convert strings in the Series/Index to be capitalized.
Series.str.casefold() Convert strings in the Series/Index to be casefolded.
Series.str.cat([others, sep, na_rep, join]) Concatenate strings in the Series/Index with given separator.
Series.str.center(width[, fillchar]) Pad left and right side of strings in the Series/Index.
Series.str.contains(pat[, case, flags, na, ...]) Test if pattern or regex is contained within a string of a Series or Index.
Series.str.count(pat[, flags]) Count occurrences of pattern in each string of the Series/Index.
Series.str.decode(encoding[, errors, dtype]) Decode character string in the Series/Index using indicated encoding.
Series.str.encode(encoding[, errors]) Encode character string in the Series/Index using indicated encoding.
Series.str.endswith(pat[, na]) Test if the end of each string element matches a pattern.
Series.str.extract(pat[, flags, expand]) Extract capture groups in the regex pat as columns in a DataFrame.
Series.str.extractall(pat[, flags]) Extract capture groups in the regex pat as columns in DataFrame.
Series.str.find(sub[, start, end]) Return lowest indexes in each strings in the Series/Index.
Series.str.findall(pat[, flags]) Find all occurrences of pattern or regular expression in the Series/Index.
Series.str.fullmatch(pat[, case, flags, na]) Determine if each string entirely matches a regular expression.
Series.str.get(i) Extract element from each component at specified position or with specified key.
Series.str.index(sub[, start, end]) Return lowest indexes in each string in Series/Index.
Series.str.isascii() Check whether all characters in each string are ascii.
Series.str.join(sep) Join lists contained as elements in the Series/Index with passed delimiter.
Series.str.len() Compute the length of each element in the Series/Index.
Series.str.ljust(width[, fillchar]) Pad right side of strings in the Series/Index.
Series.str.lower() Convert strings in the Series/Index to lowercase.
Series.str.lstrip([to_strip]) Remove leading characters.
Series.str.match(pat[, case, flags, na]) Determine if each string starts with a match of a regular expression.
Series.str.normalize(form) Return the Unicode normal form for the strings in the Series/Index.
Series.str.pad(width[, side, fillchar]) Pad strings in the Series/Index up to width.
Series.str.partition([sep, expand]) Split the string at the first occurrence of sep.
Series.str.removeprefix(prefix) Remove a prefix from an object series.
Series.str.removesuffix(suffix) Remove a suffix from an object series.
Series.str.repeat(repeats) Duplicate each string in the Series or Index.
Series.str.replace(pat[, repl, n, case, ...]) Replace each occurrence of pattern/regex in the Series/Index.
Series.str.rfind(sub[, start, end]) Return highest indexes in each strings in the Series/Index.
Series.str.rindex(sub[, start, end]) Return highest indexes in each string in Series/Index.
Series.str.rjust(width[, fillchar]) Pad left side of strings in the Series/Index.
Series.str.rpartition([sep, expand]) Split the string at the last occurrence of sep.
Series.str.rstrip([to_strip]) Remove trailing characters.
Series.str.slice([start, stop, step]) Slice substrings from each element in the Series or Index.
Series.str.slice_replace([start, stop, repl]) Replace a positional slice of a string with another value.
Series.str.split([pat, n, expand, regex]) Split strings around given separator/delimiter.
Series.str.rsplit([pat, n, expand]) Split strings around given separator/delimiter.
Series.str.startswith(pat[, na]) Test if the start of each string element matches a pattern.
Series.str.strip([to_strip]) Remove leading and trailing characters.
Series.str.swapcase() Convert strings in the Series/Index to be swapcased.
Series.str.title() Convert strings in the Series/Index to titlecase.
Series.str.translate(table) Map all characters in the string through the given mapping table.
Series.str.upper() Convert strings in the Series/Index to uppercase.
Series.str.wrap(width[, expand_tabs, ...]) Wrap strings in Series/Index at specified line width.
Series.str.zfill(width) Pad strings in the Series/Index by prepending '0' characters.
Series.str.isalnum() Check whether all characters in each string are alphanumeric.
Series.str.isalpha() Check whether all characters in each string are alphabetic.
Series.str.isdigit() Check whether all characters in each string are digits.
Series.str.isspace() Check whether all characters in each string are whitespace.
Series.str.islower() Check whether all characters in each string are lowercase.
Series.str.isupper() Check whether all characters in each string are uppercase.
Series.str.istitle() Check whether all characters in each string are titlecase.
Series.str.isnumeric() Check whether all characters in each string are numeric.
Series.str.isdecimal() Check whether all characters in each string are decimal.
Series.str.get_dummies([sep, dtype]) Return DataFrame of dummy/indicator variables for Series.

Categorical accessor

Categorical-dtype specific methods and attributes are available under the Series.cat accessor.

Function Description
Series.cat.categories The categories of this categorical.
Series.cat.ordered Whether the categories have an ordered relationship.
Series.cat.codes Return Series of codes as well as the index.
Function Description
Series.cat.rename_categories(new_categories) Rename categories.
Series.cat.reorder_categories(new_categories) Reorder categories as specified in new_categories.
Series.cat.add_categories(new_categories) Add new categories.
Series.cat.remove_categories(removals) Remove the specified categories.
Series.cat.remove_unused_categories() Remove categories which are not used.
Series.cat.set_categories(new_categories[, ...]) Set the categories to the specified new categories.
Series.cat.as_ordered() Set the Categorical to be ordered.
Series.cat.as_unordered() Set the Categorical to be unordered.
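
A minimal sketch of the accessor on a small categorical Series (the categories dtype repr varies by pandas version):

>>> s = pd.Series(["a", "b", "a"], dtype="category")
>>> s.cat.categories
Index(['a', 'b'], dtype='str')
>>> s.cat.codes
0    0
1    1
2    0
dtype: int8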

Sparse accessor

Sparse-dtype specific methods and attributes are provided under the Series.sparse accessor.

Function Description
Series.sparse.npoints The number of non-fill_value points.
Series.sparse.density The percent of non-fill_value points, as a decimal.
Series.sparse.fill_value Elements in data that are fill_value are not stored.
Series.sparse.sp_values An ndarray containing the non-fill_value values.
Function Description
Series.sparse.from_coo(A[, dense_index]) Create a Series with sparse values from a scipy.sparse.coo_matrix.
Series.sparse.to_coo([row_levels, ...]) Create a scipy.sparse.coo_matrix from a Series with MultiIndex.
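
A minimal sketch (density is the fraction of stored, non-fill_value points):

>>> s = pd.Series(pd.arrays.SparseArray([0, 0, 1, 2]))
>>> s.sparse.density
0.5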

List accessor

Arrow list-dtype specific methods and attributes are provided under the Series.list accessor.

Function Description
Series.list.flatten() Flatten list values.
Series.list.len() Return the length of each list in the Series.
Series.list.__getitem__(key) Index or slice lists in the Series.

Struct accessor

Arrow struct-dtype specific methods and attributes are provided under the Series.struct accessor.

Function Description
Series.struct.dtypes Return the dtype object of each child field of the struct.
Function Description
Series.struct.field(name_or_index) Extract a child field of a struct as a Series.
Series.struct.explode() Extract all child fields of a struct as a DataFrame.

Flags

Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in Series.attrs.

Function Description
Flags(obj, *, allows_duplicate_labels) Flags that apply to pandas objects.

Metadata

Series.attrs is a dictionary for storing global metadata for this Series.

Warning

Series.attrs is considered experimental and may change without warning.

Function Description
Series.attrs Dictionary of global attributes of this dataset.
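
A minimal sketch of attaching metadata (the key and value are arbitrary illustrations; attrs is experimental, per the warning above):

>>> s = pd.Series([1, 2, 3])
>>> s.attrs["source"] = "sensor A"
>>> s.attrs
{'source': 'sensor A'}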

Plotting

Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.

Function Description
Series.plot([kind, ax, figsize, ...]) Series plotting accessor and method.
Function Description
Series.plot.area([x, y, stacked]) Draw a stacked area plot.
Series.plot.bar([x, y, color]) Vertical bar plot.
Series.plot.barh([x, y, color]) Make a horizontal bar plot.
Series.plot.box([by]) Make a box plot of the DataFrame columns.
Series.plot.density([bw_method, ind, weights]) Generate Kernel Density Estimate plot using Gaussian kernels.
Series.plot.hist([by, bins]) Draw one histogram of the DataFrame's columns.
Series.plot.kde([bw_method, ind, weights]) Generate Kernel Density Estimate plot using Gaussian kernels.
Series.plot.line([x, y, color]) Plot Series or DataFrame as lines.
Series.plot.pie([y]) Generate a pie plot.
Function Description
Series.hist([by, ax, grid, xlabelsize, ...]) Draw histogram of the input series using matplotlib.
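
Both forms below draw the same line chart (a minimal sketch; requires matplotlib, and the returned Axes is assigned to suppress its repr):

>>> s = pd.Series([1, 3, 2])
>>> ax = s.plot(kind="line")
>>> ax = s.plot.line()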

Serialization / IO / conversion

Function Description
Series.from_arrow(data) Construct a Series from an array-like Arrow object.
Series.to_pickle(path, *[, compression, ...]) Pickle (serialize) object to file.
Series.to_csv([path_or_buf, sep, na_rep, ...]) Write object to a comma-separated values (csv) file.
Series.to_dict(*[, into]) Convert Series to {label -> value} dict or dict-like object.
Series.to_excel(excel_writer, *[, ...]) Write object to an Excel sheet.
Series.to_frame([name]) Convert Series to DataFrame.
Series.to_xarray() Return an xarray object from the pandas object.
Series.to_hdf(path_or_buf, *, key[, mode, ...]) Write the contained data to an HDF5 file using HDFStore.
Series.to_sql(name, con, *[, schema, ...]) Write records stored in a DataFrame to a SQL database.
Series.to_json([path_or_buf, orient, ...]) Convert the object to a JSON string.
Series.to_string([buf, na_rep, ...]) Render a string representation of the Series.
Series.to_clipboard(*[, excel, sep]) Copy object to the system clipboard.
Series.to_latex([buf, columns, header, ...]) Render object to a LaTeX tabular, longtable, or nested table.
Series.to_markdown([buf, mode, index, ...]) Print Series in Markdown-friendly format.
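
A minimal sketch of two lightweight conversions:

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s.to_dict()
{'a': 1, 'b': 2}
>>> s.to_json()
'{"a":1,"b":2}'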

Examples for pandas.Series

Constructing Series from a dictionary with an Index specified

>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["a", "b", "c"])
>>> ser
a   1
b   2
c   3
dtype: int64

The keys of the dictionary match the Index values, hence the Index values have no effect.

>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["x", "y", "z"])
>>> ser
x   NaN
y   NaN
z   NaN
dtype: float64

Note that the Index is first built with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.

Constructing Series from a list with copy=False.

>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0    999
1      2
dtype: int64

Due to the input data type, the Series has a copy of the original data even though copy=False, so the original data is unchanged.

Constructing Series from a 1d ndarray with copy=False.

>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999,   2])
>>> ser
0    999
1      2
dtype: int64

Due to the input data type, the Series has a view on the original data, so the original data is changed as well.

Examples for pandas.Series.index

To create a Series with a custom index and view the index labels:

>>> cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
>>> populations = [14.85, 2.71, 2.93, 0.51]
>>> city_series = pd.Series(populations, index=cities)
>>> city_series.index
Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')

To change the index labels of an existing Series:

>>> city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
>>> city_series.index
Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')

Examples for pandas.Series.array

For regular NumPy types like int and float, a NumpyExtensionArray is returned.

>>> pd.Series([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64

For extension types, like Categorical, the actual ExtensionArray is returned

>>> ser = pd.Series(pd.Categorical(["a", "b", "a"]))
>>> ser.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']

Examples for pandas.Series.values

>>> pd.Series([1, 2, 3]).values
array([1, 2, 3])
>>> pd.Series(list("aabc")).values
<ArrowStringArray>
['a', 'a', 'b', 'c']
Length: 4, dtype: str
>>> pd.Series(list("aabc")).astype("category").values
['a', 'a', 'b', 'c']
Categories (3, str): ['a', 'b', 'c']

Timezone aware datetime data is converted to UTC:

>>> pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern")).values
array(['2013-01-01T05:00:00.000000',
       '2013-01-02T05:00:00.000000',
       '2013-01-03T05:00:00.000000'], dtype='datetime64[us]')

Examples for pandas.Series.dtype

>>> s = pd.Series([1, 2, 3])
>>> s.dtype
dtype('int64')

Examples for pandas.Series.info

>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
Series name: None
Non-Null Count  Dtype
--------------  -----
5 non-null      str
dtypes: str(1)
memory usage: 106.0 bytes

Prints a summary excluding information about its values:

>>> s.info(verbose=False)
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
dtypes: str(1)
memory usage: 106.0 bytes

Pipe the output of Series.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
...     f.write(s)
260

The memory_usage parameter allows deep introspection mode, especially useful for big Series and for fine-tuning memory optimization:

>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> s = pd.Series(np.random.choice(["a", "b", "c"], 10**6))
>>> s.info()
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  str
dtypes: str(1)
memory usage: 8.6 MB
>>> s.info(memory_usage="deep")
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count    Dtype
--------------    -----
1000000 non-null  str
dtypes: str(1)
memory usage: 8.6 MB

Examples for pandas.Series.shape

>>> s = pd.Series([1, 2, 3])
>>> s.shape
(3,)

Examples for pandas.Series.nbytes

For Series:

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.nbytes
34

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24

Examples for pandas.Series.ndim

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.ndim
1

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1

Examples for pandas.Series.size

For Series:

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.size
3

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3

Examples for pandas.Series.T

For Series:

>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.T
0     Ant
1    Bear
2     Cow
dtype: str

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')

Examples for pandas.Series.memory_usage

>>> s = pd.Series(range(3))
>>> s.memory_usage()
156

Not including the index gives the size of the rest of the data, which is necessarily smaller:

>>> s.memory_usage(index=False)
24

The memory footprint of object values is ignored by default:

>>> s = pd.Series(["a", "b"])
>>> s.values
<ArrowStringArray>
['a', 'b']
Length: 2, dtype: str
>>> s.memory_usage()
150
>>> s.memory_usage(deep=True)
150

Examples for pandas.Series.hasnans

>>> s = pd.Series([1, 2, 3, None])
>>> s
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64
>>> s.hasnans
True

Examples for pandas.Series.empty

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True

If we only have NaNs in our Index, it is not considered empty!

>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False

Examples for pandas.Series.dtypes

>>> s = pd.Series([1, 2, 3])
>>> s.dtypes
dtype('int64')

Examples for pandas.Series.name

The Series name can be set initially when calling the constructor.

>>> s = pd.Series([1, 2, 3], dtype=np.int64, name="Numbers")
>>> s
0    1
1    2
2    3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0    1
1    2
2    3
Name: Integers, dtype: int64

The name of a Series within a DataFrame is its column name.

>>> df = pd.DataFrame(
...     [[1, 2], [3, 4], [5, 6]], columns=["Odd Numbers", "Even Numbers"]
... )
>>> df
   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6
>>> df["Even Numbers"].name
'Even Numbers'

Examples for pandas.Series.flags

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>

Flags can be read or set using attribute access.

>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False

Or by slicing with a key

>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True

Examples for pandas.Series.set_flags

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False

Examples for pandas.Series.astype

Create a DataFrame:

>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype("int32").dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({"col1": "int32"}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype("int64")
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype("category")
0    1
1    2
dtype: category
Categories (2, int32): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[us]

Examples for pandas.Series.convert_dtypes

>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: str

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string

DataFrame

Constructor

Function Description
DataFrame([data, index, columns, dtype, copy]) Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Attributes and underlying data

Axes

Function Description
DataFrame.index The index (row labels) of the DataFrame.
DataFrame.columns The column labels of the DataFrame.
Function Description
DataFrame.dtypes Return the dtypes in the DataFrame.
DataFrame.info([verbose, buf, max_cols, ...]) Print a concise summary of a DataFrame.
DataFrame.select_dtypes([include, exclude]) Return a subset of the DataFrame's columns based on the column dtypes.
DataFrame.values Return a Numpy representation of the DataFrame.
DataFrame.axes Return a list representing the axes of the DataFrame.
DataFrame.ndim Return an int representing the number of axes / array dimensions.
DataFrame.size Return an int representing the number of elements in this object.
DataFrame.shape Return a tuple representing the dimensionality of the DataFrame.
DataFrame.memory_usage([index, deep]) Return the memory usage of each column in bytes.
DataFrame.empty Indicator whether Series/DataFrame is empty.
DataFrame.set_flags(*[, copy, ...]) Return a new object with updated flags.

Conversion

Function Description
DataFrame.astype(dtype[, copy, errors]) Cast a pandas object to a specified dtype dtype.
DataFrame.convert_dtypes([infer_objects, ...]) Convert columns from numpy dtypes to the best dtypes that support pd.NA.
DataFrame.infer_objects([copy]) Attempt to infer better dtypes for object columns.
DataFrame.copy([deep]) Make a copy of this object's indices and data.
DataFrame.to_numpy([dtype, copy, na_value]) Convert the DataFrame to a NumPy array.

Indexing, iteration

Function Description
DataFrame.head([n]) Return the first n rows.
DataFrame.at Access a single value for a row/column label pair.
DataFrame.iat Access a single value for a row/column pair by integer position.
DataFrame.loc Access a group of rows and columns by label(s) or a boolean array.
DataFrame.iloc Purely integer-location based indexing for selection by position.
DataFrame.insert(loc, column, value[, ...]) Insert column into DataFrame at specified location.
DataFrame.__iter__() Iterate over info axis.
DataFrame.items() Iterate over (column name, Series) pairs.
DataFrame.keys() Get the 'info axis' (see Indexing for more).
DataFrame.iterrows() Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples([index, name]) Iterate over DataFrame rows as namedtuples.
DataFrame.pop(item) Return item and drop it from DataFrame.
DataFrame.tail([n]) Return the last n rows.
DataFrame.xs(key[, axis, level, drop_level]) Return cross-section from the Series/DataFrame.
DataFrame.get(key[, default]) Get item from object for given key (ex: DataFrame column).
DataFrame.isin(values) Whether each element in the DataFrame is contained in values.
DataFrame.where(cond[, other, inplace, ...]) Replace values where the condition is False.
DataFrame.mask(cond[, other, inplace, axis, ...]) Replace values where the condition is True.
DataFrame.query(expr, *[, parser, engine, ...]) Query the columns of a DataFrame with a boolean expression.
DataFrame.isetitem(loc, value) Set the given value in the column with position loc.

For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
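
As an illustrative sketch of these accessors (hypothetical data; outputs shown for a recent pandas version):

>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
>>> df.loc["x"]
a    1
b    3
Name: x, dtype: int64
>>> df.iloc[1]
a    2
b    4
Name: y, dtype: int64
>>> df.query("a > 1")
   a  b
y  2  4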

Binary operator functions

Function Description
DataFrame.add(other[, axis, level, fill_value]) Get Addition of dataframe and other, element-wise (binary operator add).
DataFrame.sub(other[, axis, level, fill_value]) Get Subtraction of dataframe and other, element-wise (binary operator sub).
DataFrame.mul(other[, axis, level, fill_value]) Get Multiplication of dataframe and other, element-wise (binary operator mul).
DataFrame.div(other[, axis, level, fill_value]) Get Floating division of dataframe and other, element-wise (binary operator truediv).
DataFrame.truediv(other[, axis, level, ...]) Get Floating division of dataframe and other, element-wise (binary operator truediv).
DataFrame.floordiv(other[, axis, level, ...]) Get Integer division of dataframe and other, element-wise (binary operator floordiv).
DataFrame.mod(other[, axis, level, fill_value]) Get Modulo of dataframe and other, element-wise (binary operator mod).
DataFrame.pow(other[, axis, level, fill_value]) Get Exponential power of dataframe and other, element-wise (binary operator pow).
DataFrame.dot(other) Compute the matrix multiplication between the DataFrame and other.
DataFrame.radd(other[, axis, level, fill_value]) Get Addition of dataframe and other, element-wise (binary operator radd).
DataFrame.rsub(other[, axis, level, fill_value]) Get Subtraction of dataframe and other, element-wise (binary operator rsub).
DataFrame.rmul(other[, axis, level, fill_value]) Get Multiplication of dataframe and other, element-wise (binary operator rmul).
DataFrame.rdiv(other[, axis, level, fill_value]) Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
DataFrame.rtruediv(other[, axis, level, ...]) Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
DataFrame.rfloordiv(other[, axis, level, ...]) Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
DataFrame.rmod(other[, axis, level, fill_value]) Get Modulo of dataframe and other, element-wise (binary operator rmod).
DataFrame.rpow(other[, axis, level, fill_value]) Get Exponential power of dataframe and other, element-wise (binary operator rpow).
DataFrame.lt(other[, axis, level]) Get Less than of dataframe and other, element-wise (binary operator lt).
DataFrame.gt(other[, axis, level]) Get Greater than of dataframe and other, element-wise (binary operator gt).
DataFrame.le(other[, axis, level]) Get Less than or equal to of dataframe and other, element-wise (binary operator le).
DataFrame.ge(other[, axis, level]) Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
DataFrame.ne(other[, axis, level]) Get Not equal to of dataframe and other, element-wise (binary operator ne).
DataFrame.eq(other[, axis, level]) Get Equal to of dataframe and other, element-wise (binary operator eq).
DataFrame.combine(other, func[, fill_value, ...]) Perform column-wise combine with another DataFrame.
DataFrame.combine_first(other) Update null elements with value in the same location in other.
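
A brief illustrative sketch of the element-wise operators (hypothetical data; outputs shown for a recent pandas version):

>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df.sub([1, 2], axis="columns")
   a  b
0  0  1
1  1  2
>>> df.gt(1)
       a      b
0  False   True
1   True   True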

Function application, GroupBy & window

Function Description
DataFrame.apply(func[, axis, raw, ...]) Apply a function along an axis of the DataFrame.
DataFrame.map(func[, na_action]) Apply a function to a Dataframe elementwise.
DataFrame.pipe(func, *args, **kwargs) Apply chainable functions that expect Series or DataFrames.
DataFrame.agg([func, axis]) Aggregate using one or more operations over the specified axis.
DataFrame.aggregate([func, axis]) Aggregate using one or more operations over the specified axis.
DataFrame.transform(func[, axis]) Call func on self producing a DataFrame with the same axis shape as self.
DataFrame.groupby([by, level, as_index, ...]) Group DataFrame using a mapper or by a Series of columns.
DataFrame.rolling(window[, min_periods, ...]) Provide rolling window calculations.
DataFrame.expanding([min_periods, method]) Provide expanding window calculations.
DataFrame.ewm([com, span, halflife, alpha, ...]) Provide exponentially weighted (EW) calculations.
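
A short illustrative sketch combining groupby aggregation with a rolling window (hypothetical data):

>>> df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
>>> df.groupby("key")["val"].sum()
key
a    3
b    3
Name: val, dtype: int64
>>> df["val"].rolling(window=2).sum()
0    NaN
1    3.0
2    5.0
Name: val, dtype: float64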

Computations / descriptive stats

Function Description
DataFrame.abs() Return a Series/DataFrame with absolute numeric value of each element.
DataFrame.all(*[, axis, bool_only, skipna]) Return whether all elements are True, potentially over an axis.
DataFrame.any(*[, axis, bool_only, skipna]) Return whether any element is True, potentially over an axis.
DataFrame.clip([lower, upper, axis, inplace]) Trim values at input threshold(s).
DataFrame.corr([method, min_periods, ...]) Compute pairwise correlation of columns, excluding NA/null values.
DataFrame.corrwith(other[, axis, drop, ...]) Compute pairwise correlation.
DataFrame.count([axis, numeric_only]) Count non-NA cells for each column or row.
DataFrame.cov([min_periods, ddof, numeric_only]) Compute pairwise covariance of columns, excluding NA/null values.
DataFrame.cummax([axis, skipna, numeric_only]) Return cumulative maximum over a DataFrame or Series axis.
DataFrame.cummin([axis, skipna, numeric_only]) Return cumulative minimum over a DataFrame or Series axis.
DataFrame.cumprod([axis, skipna, numeric_only]) Return cumulative product over a DataFrame or Series axis.
DataFrame.cumsum([axis, skipna, numeric_only]) Return cumulative sum over a DataFrame or Series axis.
DataFrame.describe([percentiles, include, ...]) Generate descriptive statistics.
DataFrame.diff([periods, axis]) First discrete difference of element.
DataFrame.eval(expr, *[, inplace]) Evaluate a string describing operations on DataFrame columns.
DataFrame.kurt(*[, axis, skipna, numeric_only]) Return unbiased kurtosis over requested axis.
DataFrame.kurtosis(*[, axis, skipna, ...]) Return unbiased kurtosis over requested axis.
DataFrame.max(*[, axis, skipna, numeric_only]) Return the maximum of the values over the requested axis.
DataFrame.mean(*[, axis, skipna, numeric_only]) Return the mean of the values over the requested axis.
DataFrame.median(*[, axis, skipna, numeric_only]) Return the median of the values over the requested axis.
DataFrame.min(*[, axis, skipna, numeric_only]) Return the minimum of the values over the requested axis.
DataFrame.mode([axis, numeric_only, dropna]) Get the mode(s) of each element along the selected axis.
DataFrame.pct_change([periods, fill_method, ...]) Fractional change between the current and a prior element.
DataFrame.prod(*[, axis, skipna, ...]) Return the product of the values over the requested axis.
DataFrame.product(*[, axis, skipna, ...]) Return the product of the values over the requested axis.
DataFrame.quantile([q, axis, numeric_only, ...]) Return values at the given quantile over requested axis.
DataFrame.rank([axis, method, numeric_only, ...]) Compute numerical data ranks (1 through n) along axis.
DataFrame.round([decimals]) Round numeric columns in a DataFrame to a variable number of decimal places.
DataFrame.sem(*[, axis, skipna, ddof, ...]) Return unbiased standard error of the mean over requested axis.
DataFrame.skew(*[, axis, skipna, numeric_only]) Return unbiased skew over requested axis.
DataFrame.sum(*[, axis, skipna, ...]) Return the sum of the values over the requested axis.
DataFrame.std(*[, axis, skipna, ddof, ...]) Return sample standard deviation over requested axis.
DataFrame.var(*[, axis, skipna, ddof, ...]) Return unbiased variance over requested axis.
DataFrame.nunique([axis, dropna]) Count number of distinct elements in specified axis.
DataFrame.value_counts([subset, normalize, ...]) Return a Series containing the frequency of each distinct row in the DataFrame.
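
For instance (hypothetical data):

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [2.0, 4.0, 6.0]})
>>> df.mean()
a    2.0
b    4.0
dtype: float64
>>> df.cumsum()
   a     b
0  1   2.0
1  3   6.0
2  6  12.0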

Reindexing / selection / label manipulation

Function Description
DataFrame.add_prefix(prefix[, axis]) Prefix labels with string prefix.
DataFrame.add_suffix(suffix[, axis]) Suffix labels with string suffix.
DataFrame.align(other[, join, axis, level, ...]) Align two objects on their axes with the specified join method.
DataFrame.at_time(time[, asof, axis]) Select values at particular time of day (e.g., 9:30AM).
DataFrame.between_time(start_time, end_time) Select values between particular times of the day (e.g., 9:00-9:30 AM).
DataFrame.drop([labels, axis, index, ...]) Drop specified labels from rows or columns.
DataFrame.drop_duplicates([subset, keep, ...]) Return DataFrame with duplicate rows removed.
DataFrame.duplicated([subset, keep]) Return boolean Series denoting duplicate rows.
DataFrame.equals(other) Test whether two objects contain the same elements.
DataFrame.filter([items, like, regex, axis]) Subset the DataFrame or Series according to the specified index labels.
DataFrame.idxmax([axis, skipna, numeric_only]) Return index of first occurrence of maximum over requested axis.
DataFrame.idxmin([axis, skipna, numeric_only]) Return index of first occurrence of minimum over requested axis.
DataFrame.reindex([labels, index, columns, ...]) Conform DataFrame to new index with optional filling logic.
DataFrame.reindex_like(other[, method, ...]) Return an object with matching indices as other object.
DataFrame.rename([mapper, index, columns, ...]) Rename columns or index labels.
DataFrame.rename_axis([mapper, index, ...]) Set the name of the axis for the index or columns.
DataFrame.reset_index([level, drop, ...]) Reset the index, or a level of it.
DataFrame.sample([n, frac, replace, ...]) Return a random sample of items from an axis of object.
DataFrame.set_axis(labels, *[, axis, copy]) Assign desired index to given axis.
DataFrame.set_index(keys, *[, drop, append, ...]) Set the DataFrame index using existing columns.
DataFrame.take(indices[, axis]) Return the elements in the given positional indices along an axis.
DataFrame.truncate([before, after, axis, copy]) Truncate a Series or DataFrame before and after some index value.
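
An illustrative sketch of reindexing and renaming (hypothetical data; reindex introduces NaN for labels not present):

>>> df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
>>> df.reindex(["a", "b", "c"])
     x
a  1.0
b  2.0
c  NaN
>>> df.rename(columns={"x": "value"})
   value
a      1
b      2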

Missing data handling

Function Description
DataFrame.bfill(*[, axis, inplace, limit, ...]) Fill NA/NaN values by using the next valid observation to fill the gap.
DataFrame.dropna(*[, axis, how, thresh, ...]) Remove missing values.
DataFrame.ffill(*[, axis, inplace, limit, ...]) Fill NA/NaN values by propagating the last valid observation to next valid.
DataFrame.fillna(value, *[, axis, inplace, ...]) Fill NA/NaN values with value.
DataFrame.interpolate([method, axis, limit, ...]) Fill NaN values using an interpolation method.
DataFrame.isna() Detect missing values.
DataFrame.isnull() DataFrame.isnull is an alias for DataFrame.isna.
DataFrame.notna() Detect existing (non-missing) values.
DataFrame.notnull() DataFrame.notnull is an alias for DataFrame.notna.
DataFrame.replace([to_replace, value, ...]) Replace values given in to_replace with value.
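
For example (hypothetical data):

>>> df = pd.DataFrame({"a": [1.0, None, 3.0]})
>>> df.fillna(0)
     a
0  1.0
1  0.0
2  3.0
>>> df.ffill()
     a
0  1.0
1  1.0
2  3.0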

Reshaping, sorting, transposing

Function Description
DataFrame.droplevel(level[, axis]) Return Series/DataFrame with requested index / column level(s) removed.
DataFrame.pivot(*, columns[, index, values]) Return reshaped DataFrame organized by given index / column values.
DataFrame.pivot_table([values, index, ...]) Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.reorder_levels(order[, axis]) Rearrange index or column levels using input order.
DataFrame.sort_values(by, *[, axis, ...]) Sort by the values along either axis.
DataFrame.sort_index(*[, axis, level, ...]) Sort object by labels (along an axis).
DataFrame.nlargest(n, columns[, keep]) Return the first n rows ordered by columns in descending order.
DataFrame.nsmallest(n, columns[, keep]) Return the first n rows ordered by columns in ascending order.
DataFrame.swaplevel([i, j, axis]) Swap levels i and j in a MultiIndex.
DataFrame.stack([level, dropna, sort, ...]) Stack the prescribed level(s) from columns to index.
DataFrame.unstack([level, fill_value, sort]) Pivot a level of the (necessarily hierarchical) index labels.
DataFrame.melt([id_vars, value_vars, ...]) Unpivot DataFrame from wide to long format, optionally leaving identifiers set.
DataFrame.explode(column[, ignore_index]) Transform each element of a list-like to a row, replicating index values.
DataFrame.squeeze([axis]) Squeeze 1 dimensional axis objects into scalars.
DataFrame.to_xarray() Return an xarray object from the pandas object.
DataFrame.T The transpose of the DataFrame.
DataFrame.transpose(*args[, copy]) Transpose index and columns.
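
A minimal wide-to-long sketch with melt (hypothetical data):

>>> df = pd.DataFrame({"A": ["a", "b"], "B": [1, 3], "C": [2, 4]})
>>> df.melt(id_vars=["A"], value_vars=["B", "C"])
   A variable  value
0  a        B      1
1  b        B      3
2  a        C      2
3  b        C      4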

Combining / comparing / joining / merging

Function Description
DataFrame.assign(**kwargs) Assign new columns to a DataFrame.
DataFrame.compare(other[, align_axis, ...]) Compare to another DataFrame and show the differences.
DataFrame.join(other[, on, how, lsuffix, ...]) Join columns of another DataFrame.
DataFrame.merge(right[, how, on, left_on, ...]) Merge DataFrame or named Series objects with a database-style join.
DataFrame.update(other[, join, overwrite, ...]) Modify in place using non-NA values from another DataFrame.
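
A minimal merge sketch (hypothetical data; merge defaults to an inner join on the shared key):

>>> left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
>>> right = pd.DataFrame({"key": ["bar", "baz"], "rval": [3, 4]})
>>> left.merge(right, on="key")
   key  lval  rval
0  bar     2     3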

Time Series-related

Function Description
DataFrame.asfreq(freq[, method, how, ...]) Convert time series to specified frequency.
DataFrame.asof(where[, subset]) Return the last row(s) without any NaNs before where.
DataFrame.shift([periods, freq, axis, ...]) Shift index by desired number of periods with an optional time freq.
DataFrame.first_valid_index() Return index for first non-missing value or None, if no value is found.
DataFrame.last_valid_index() Return index for last non-missing value or None, if no value is found.
DataFrame.resample(rule[, closed, label, ...]) Resample time-series data.
DataFrame.to_period([freq, axis, copy]) Convert DataFrame from DatetimeIndex to PeriodIndex.
DataFrame.to_timestamp([freq, how, axis, copy]) Cast PeriodIndex to DatetimeIndex of timestamps, at beginning of period.
DataFrame.tz_convert(tz[, axis, level, copy]) Convert tz-aware axis to target time zone.
DataFrame.tz_localize(tz[, axis, level, ...]) Localize time zone naive index of a Series or DataFrame to target time zone.
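
A minimal resampling sketch (hypothetical data):

>>> idx = pd.date_range("2023-01-01", periods=4, freq="D")
>>> ser = pd.Series([1, 2, 3, 4], index=idx)
>>> ser.resample("2D").sum()
2023-01-01    3
2023-01-03    7
Freq: 2D, dtype: int64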

Flags

Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in DataFrame.attrs.

Function Description
Flags(obj, *, allows_duplicate_labels) Flags that apply to pandas objects.

Metadata

DataFrame.attrs is a dictionary for storing global metadata for this DataFrame.

Warning

DataFrame.attrs is considered experimental and may change without warning.

Function Description
DataFrame.attrs Dictionary of global attributes of this dataset.
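
A minimal sketch (the "source" key is a made-up example; attrs accepts any hashable keys):

>>> df = pd.DataFrame({"a": [1, 2]})
>>> df.attrs["source"] = "sensor_42"
>>> df.attrs
{'source': 'sensor_42'}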

Plotting

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.

Function Description
DataFrame.plot([x, y, kind, ax, ...]) DataFrame plotting accessor and method.
Function Description
DataFrame.plot.area([x, y, stacked]) Draw a stacked area plot.
DataFrame.plot.bar([x, y, color]) Vertical bar plot.
DataFrame.plot.barh([x, y, color]) Make a horizontal bar plot.
DataFrame.plot.box([by]) Make a box plot of the DataFrame columns.
DataFrame.plot.density([bw_method, ind, weights]) Generate Kernel Density Estimate plot using Gaussian kernels.
DataFrame.plot.hexbin(x, y[, C, ...]) Generate a hexagonal binning plot.
DataFrame.plot.hist([by, bins]) Draw one histogram of the DataFrame's columns.
DataFrame.plot.kde([bw_method, ind, weights]) Generate Kernel Density Estimate plot using Gaussian kernels.
DataFrame.plot.line([x, y, color]) Plot Series or DataFrame as lines.
DataFrame.plot.pie([y]) Generate a pie plot.
DataFrame.plot.scatter(x, y[, s, c]) Create a scatter plot with varying marker point size and color.
Function Description
DataFrame.boxplot([column, by, ax, ...]) Make a box plot from DataFrame columns.
DataFrame.hist([column, by, grid, ...]) Make a histogram of the DataFrame's columns.
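
A minimal sketch of the accessor form (hypothetical data; requires matplotlib, and the plot methods return a matplotlib Axes):

>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 1]})
>>> ax = df.plot.line(x="x", y="y")  # ax is a matplotlib Axes object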

Sparse accessor

Sparse-dtype specific methods and attributes are provided under the DataFrame.sparse accessor.

Function Description
DataFrame.sparse.density Ratio of non-sparse points to total (dense) data points.
Function Description
DataFrame.sparse.from_spmatrix(data[, ...]) Create a new DataFrame from a scipy sparse matrix.
DataFrame.sparse.to_coo() Return the contents of the frame as a sparse SciPy COO matrix.
DataFrame.sparse.to_dense() Convert a DataFrame with sparse values to dense.
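
For example (hypothetical data; density is the fraction of points that differ from the fill value):

>>> df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1, 0])})
>>> df.sparse.density
0.25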

Serialization / IO / conversion

Function Description
DataFrame.from_arrow(data) Construct a DataFrame from a tabular Arrow object.
DataFrame.from_dict(data[, orient, dtype, ...]) Construct DataFrame from dict of array-like or dicts.
DataFrame.from_records(data[, index, ...]) Convert structured or record ndarray to DataFrame.
DataFrame.to_orc([path, engine, index, ...]) Write a DataFrame to the Optimized Row Columnar (ORC) format.
DataFrame.to_parquet([path, engine, ...]) Write a DataFrame to the binary parquet format.
DataFrame.to_pickle(path, *[, compression, ...]) Pickle (serialize) object to file.
DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) Write object to a comma-separated values (csv) file.
DataFrame.to_hdf(path_or_buf, *, key[, ...]) Write the contained data to an HDF5 file using HDFStore.
DataFrame.to_sql(name, con, *[, schema, ...]) Write records stored in a DataFrame to a SQL database.
DataFrame.to_dict([orient, into, index]) Convert the DataFrame to a dictionary.
DataFrame.to_excel(excel_writer, *[, ...]) Write object to an Excel sheet.
DataFrame.to_json([path_or_buf, orient, ...]) Convert the object to a JSON string.
DataFrame.to_html([buf, columns, col_space, ...]) Render a DataFrame as an HTML table.
DataFrame.to_feather(path, **kwargs) Write a DataFrame to the binary Feather format.
DataFrame.to_latex([buf, columns, header, ...]) Render object to a LaTeX tabular, longtable, or nested table.
DataFrame.to_stata(path, *[, convert_dates, ...]) Export DataFrame object to Stata dta format.
DataFrame.to_records([index, column_dtypes, ...]) Convert DataFrame to a NumPy record array.
DataFrame.to_string([buf, columns, ...]) Render a DataFrame to a console-friendly tabular output.
DataFrame.to_clipboard(*[, excel, sep]) Copy object to the system clipboard.
DataFrame.to_markdown([buf, mode, index, ...]) Print DataFrame in Markdown-friendly format.
DataFrame.style Returns a Styler object.
DataFrame.__dataframe__([nan_as_null, ...]) (DEPRECATED) Return the dataframe interchange object implementing the interchange protocol.
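
A small round-trip sketch with to_dict and from_dict (hypothetical data):

>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df.to_dict(orient="list")
{'a': [1, 2], 'b': [3, 4]}
>>> pd.DataFrame.from_dict({"a": [1, 2]})
   a
0  1
1  2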

Examples for pandas.DataFrame

Constructing DataFrame from a dictionary.

>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object

Constructing DataFrame from a dictionary including Series:

>>> d = {"col1": [0, 1, 2, 3], "col2": pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(
...     np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["a", "b", "c"]
... )
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array(
...     [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...     dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")],
... )
>>> df3 = pd.DataFrame(data, columns=["c", "a"])
>>> df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3

Constructing DataFrame from Series/DataFrame:

>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
   0
a  1
c  3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
   x
a  1
c  3

Examples for pandas.DataFrame.index

>>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
...                    'Age': [25, 30, 35],
...                    'Location': ['Seattle', 'New York', 'Kona']},
...                   index=([10, 20, 30]))
>>> df.index
Index([10, 20, 30], dtype='int64')

In this example, we create a DataFrame with 3 rows and 3 columns, including Name, Age, and Location information. We set the index labels to be the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.

>>> df.index = [100, 200, 300]
>>> df
       Name  Age  Location
100   Alice   25   Seattle
200     Bob   30  New York
300  Aritra   35      Kona

In this example, we modify the index labels of the DataFrame by assigning a new list of labels to the index attribute. The DataFrame is then updated with the new labels, and the output shows the modified DataFrame.

Examples for pandas.DataFrame.columns

>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df
   A  B
0  1  3
1  2  4
>>> df.columns
Index(['A', 'B'], dtype='str')

Examples for pandas.DataFrame.dtypes

>>> df = pd.DataFrame(
...     {
...         "float": [1.0],
...         "int": [1],
...         "datetime": [pd.Timestamp("20180310")],
...         "string": ["foo"],
...     }
... )
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[us]
string                 str
dtype: object

Examples for pandas.DataFrame.info

>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame(
...     {
...         "int_col": int_values,
...         "text_col": text_values,
...         "float_col": float_values,
...     }
... )
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

Prints information of all columns:

>>> df.info(verbose=True)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      str
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes

Prints a summary of columns count and its dtypes but not per column information:

>>> df.info(verbose=False)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes

Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
...     f.write(s)
260

The memory_usage parameter allows deep introspection mode, especially useful for big DataFrames and fine-tuning memory optimization:

>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> df = pd.DataFrame(
...     {
...         "column_1": np.random.choice(["a", "b", "c"], 10**6),
...         "column_2": np.random.choice(["a", "b", "c"], 10**6),
...         "column_3": np.random.choice(["a", "b", "c"], 10**6),
...     }
... )
>>> df.info()
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  str
 1   column_2  1000000 non-null  str
 2   column_3  1000000 non-null  str
dtypes: str(3)
memory usage: 25.7 MB
>>> df.info(memory_usage="deep")
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  str
 1   column_2  1000000 non-null  str
 2   column_3  1000000 non-null  str
dtypes: str(3)
memory usage: 25.7 MB

Examples for pandas.DataFrame.select_dtypes

>>> df = pd.DataFrame(
...     {"a": [1, 2] * 3, "b": [True, False] * 3, "c": [1.0, 2.0] * 3}
... )
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0
>>> df.select_dtypes(include="bool")
       b
0   True
1  False
2   True
3  False
4   True
5  False
>>> df.select_dtypes(include=["float64"])
     c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0
>>> df.select_dtypes(exclude=["int64"])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0

Examples for pandas.DataFrame.values

A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

>>> df = pd.DataFrame(
...     {"age": [3, 29], "height": [94, 170], "weight": [31, 115]}
... )
>>> df
   age  height  weight
0    3      94      31
1   29     170     115
>>> df.dtypes
age       int64
height    int64
weight    int64
dtype: object
>>> df.values
array([[  3,  94,  31],
       [ 29, 170, 115]])

A DataFrame with mixed-type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).

>>> df2 = pd.DataFrame(
...     [
...         ("parrot", 24.0, "second"),
...         ("lion", 80.5, 1),
...         ("monkey", np.nan, None),
...     ],
...     columns=("name", "max_speed", "rank"),
... )
>>> df2.dtypes
name             str
max_speed    float64
rank          object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
       ['lion', 80.5, 1],
       ['monkey', nan, None]], dtype=object)

Examples for pandas.DataFrame.axes

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]

Examples for pandas.DataFrame.ndim

>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.ndim
1
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.ndim
2

Examples for pandas.DataFrame.size

>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.size
3
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.size
4

Examples for pandas.DataFrame.shape

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
>>> df.shape
(2, 3)

Examples for pandas.DataFrame.memory_usage

>>> dtypes = ["int64", "float64", "complex128", "object", "bool"]
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
   int64  float64          complex128  object  bool
0      1      1.0  1.000000+0.000000j       1  True
1      1      1.0  1.000000+0.000000j       1  True
2      1      1.0  1.000000+0.000000j       1  True
3      1      1.0  1.000000+0.000000j       1  True
4      1      1.0  1.000000+0.000000j       1  True
>>> df.memory_usage()
Index           132
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64

The memory footprint of object dtype columns is ignored by default:

>>> df.memory_usage(deep=True)
Index            132
int64          40000
float64        40000
complex128     80000
object        180000
bool            5000
dtype: int64

Use a Categorical for efficient storage of an object-dtype column with many repeated values.

>>> df["object"].astype("category").memory_usage(deep=True)
5140

Examples for pandas.DataFrame.empty

An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({"A": []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({"A": [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
>>> ser_empty = pd.Series({"A": []})
>>> ser_empty
A    []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True

Examples for pandas.DataFrame.set_flags

>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False

Examples for pandas.DataFrame.astype

Create a DataFrame:

>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object

Cast all columns to int32:

>>> df.astype("int32").dtypes
col1    int32
col2    int32
dtype: object

Cast col1 to int32 using a dictionary:

>>> df.astype({"col1": "int32"}).dtypes
col1    int32
col2    int64
dtype: object

Create a series:

>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype("int64")
0    1
1    2
dtype: int64

Convert to categorical type:

>>> ser.astype("category")
0    1
1    2
dtype: category
Categories (2, int32): [1, 2]

Convert to ordered categorical type with custom ordering:

>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

Create a series of dates:

>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[us]

Examples for pandas.DataFrame.convert_dtypes

>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0
>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: str

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string

Examples for pandas.DataFrame.infer_objects

>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object

Examples for pandas.DataFrame.copy

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy(deep=True)
>>> s_copy
a    1
b    2
dtype: int64

Due to Copy-on-Write, shallow copies are still protected from data modifications; note that shallow below is not modified when s changes.

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> shallow = s.copy(deep=False)
>>> s.iloc[1] = 200
>>> shallow
a    1
b    2
dtype: int64

When the data has object dtype, even a deep copy does not copy the underlying Python objects. Updating a nested data object will be reflected in the deep copy.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object

Examples for pandas.DataFrame.to_numpy

>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])

With heterogeneous data, the lowest common type will have to be used.

>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])

For a mix of numeric and non-numeric types, the output array will have object dtype.

>>> df["C"] = pd.date_range("2000", periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)

Examples for pandas.DataFrame.head

>>> df = pd.DataFrame(
...     {
...         "animal": [
...             "alligator",
...             "bee",
...             "falcon",
...             "lion",
...             "monkey",
...             "parrot",
...             "shark",
...             "whale",
...             "zebra",
...         ]
...     }
... )
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra

Viewing the first 5 lines

>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon

For negative values of n

>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot

pandas arrays, scalars, and data types

Objects

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.

Kind of Data pandas Data Type Scalar Array
TZ-aware datetime DatetimeTZDtype Timestamp Datetimes
Timedeltas (none) Timedelta Timedeltas
Period (time spans) PeriodDtype Period Periods
Intervals IntervalDtype Interval Intervals
Nullable Integer Int64Dtype, … (none) Nullable integer
Nullable Float Float64Dtype, … (none) Nullable float
Categorical CategoricalDtype (none) Categoricals
Sparse SparseDtype (none) Sparse
Strings StringDtype str Strings
Nullable Boolean BooleanDtype bool Nullable Boolean
PyArrow ArrowDtype Python Scalars or NA PyArrow

pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.

Function Description
array(data[, dtype, copy]) Create an array.

PyArrow

Warning

This feature is experimental, and the API can change in a future release without warning.

The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.

PyArrow provides array and data type support similar to NumPy, including first-class nullability support for all data types, immutability, and more.

The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).

PyArrow type pandas extension type NumPy type
pyarrow.bool_() BooleanDtype np.bool_
pyarrow.int8() Int8Dtype np.int8
pyarrow.int16() Int16Dtype np.int16
pyarrow.int32() Int32Dtype np.int32
pyarrow.int64() Int64Dtype np.int64
pyarrow.uint8() UInt8Dtype np.uint8
pyarrow.uint16() UInt16Dtype np.uint16
pyarrow.uint32() UInt32Dtype np.uint32
pyarrow.uint64() UInt64Dtype np.uint64
pyarrow.float32() Float32Dtype np.float32
pyarrow.float64() Float64Dtype np.float64
pyarrow.time32() (none) (none)
pyarrow.time64() (none) (none)
pyarrow.timestamp() DatetimeTZDtype np.datetime64
pyarrow.date32() (none) (none)
pyarrow.date64() (none) (none)
pyarrow.duration() (none) np.timedelta64
pyarrow.binary() (none) (none)
pyarrow.string() StringDtype np.str_
pyarrow.decimal128() (none) (none)
pyarrow.list_() (none) (none)
pyarrow.map_() (none) (none)
pyarrow.dictionary() CategoricalDtype (none)

Note

Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.

While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.
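
A minimal sketch (requires pyarrow to be installed; output shown for a recent pandas version):

>>> import pyarrow as pa
>>> ser = pd.Series([1, 2, None], dtype=pd.ArrowDtype(pa.int64()))
>>> ser
0       1
1       2
2    <NA>
dtype: int64[pyarrow]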

Function Description
arrays.ArrowExtensionArray(values) Pandas ExtensionArray backed by a PyArrow ChunkedArray.
Function Description
ArrowDtype(pyarrow_dtype) An ExtensionDtype for PyArrow data types.

For more information, please see the PyArrow user guide.

Datetimes

NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.

Function Description
Timestamp([ts_input, year, month, day, ...]) Pandas replacement for python datetime.datetime object.

Properties

Property Description
Timestamp.asm8 Return numpy datetime64 format with same precision.
Timestamp.day Return the day of the Timestamp.
Timestamp.dayofweek Return day of the week.
Timestamp.day_of_week Return day of the week.
Timestamp.dayofyear Return the day of the year.
Timestamp.day_of_year Return the day of the year.
Timestamp.days_in_month Return the number of days in the month.
Timestamp.daysinmonth Return the number of days in the month.
Timestamp.fold Return the fold value of the Timestamp.
Timestamp.hour Return the hour of the Timestamp.
Timestamp.is_leap_year Return True if year is a leap year.
Timestamp.is_month_end Check if the date is the last day of the month.
Timestamp.is_month_start Check if the date is the first day of the month.
Timestamp.is_quarter_end Check if date is last day of the quarter.
Timestamp.is_quarter_start Check if the date is the first day of the quarter.
Timestamp.is_year_end Return True if date is last day of the year.
Timestamp.is_year_start Return True if date is first day of the year.
Timestamp.max The maximum representable Timestamp.
Timestamp.microsecond Return the microsecond of the Timestamp.
Timestamp.min The minimum representable Timestamp.
Timestamp.minute Return the minute of the Timestamp.
Timestamp.month Return the month of the Timestamp.
Timestamp.nanosecond Return the nanosecond of the Timestamp.
Timestamp.quarter Return the quarter of the year for the Timestamp.
Timestamp.resolution The smallest possible difference between non-equal Timestamp objects (one nanosecond).
Timestamp.second Return the second of the Timestamp.
Timestamp.tz Alias for tzinfo.
Timestamp.tzinfo Returns the timezone info of the Timestamp.
Timestamp.unit The abbreviation associated with self._creso.
Timestamp.value Return the value of the Timestamp.
Timestamp.week Return the week number of the year.
Timestamp.weekofyear Return the week number of the year.
Timestamp.year Return the year of the Timestamp.

Methods

Method Description
Timestamp.as_unit(unit[, round_ok]) Convert the underlying int64 representation to the given unit.
Timestamp.astimezone(tz) Convert timezone-aware Timestamp to another time zone.
Timestamp.ceil(freq[, ambiguous, nonexistent]) Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time) Combine a date and time into a single Timestamp object.
Timestamp.ctime() Return a ctime() style string representing the Timestamp.
Timestamp.date() Returns datetime.date with the same year, month, and day.
Timestamp.day_name([locale]) Return the day name of the Timestamp with specified locale.
Timestamp.dst() Return the daylight saving time (DST) adjustment.
Timestamp.floor(freq[, ambiguous, nonexistent]) Return a new Timestamp floored to this resolution.
Timestamp.fromordinal(ordinal[, tz]) Construct a timestamp from a proleptic Gregorian ordinal.
Timestamp.fromtimestamp(ts[, tz]) Create a Timestamp object from a POSIX timestamp.
Timestamp.isocalendar() Return a named tuple containing ISO year, week number, and weekday.
Timestamp.isoformat([sep, timespec]) Return the time formatted according to ISO 8601.
Timestamp.isoweekday() Return the day of the week represented by the date.
Timestamp.month_name([locale]) Return the month name of the Timestamp with specified locale.
Timestamp.normalize() Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]) Return new Timestamp object representing current time local to tz.
Timestamp.replace([year, month, day, hour, ...]) Implements datetime.replace, handles nanoseconds.
Timestamp.round(freq[, ambiguous, nonexistent]) Round the Timestamp to the specified resolution.
Timestamp.strftime(format) Return a formatted string of the Timestamp.
Timestamp.strptime(date_string, format) Convert string argument to datetime.
Timestamp.time() Return time object with same time but with tzinfo=None.
Timestamp.timestamp() Return POSIX timestamp as float.
Timestamp.timetuple() Return time tuple, compatible with time.localtime().
Timestamp.timetz() Return time object with same time and tzinfo.
Timestamp.to_datetime64() Return a NumPy datetime64 object with same precision.
Timestamp.to_numpy([dtype, copy]) Convert the Timestamp to a NumPy datetime64.
Timestamp.to_julian_date() Convert TimeStamp to a Julian Date.
Timestamp.to_period([freq]) Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime([warn]) Convert a Timestamp object to a native Python datetime object.
Timestamp.today([tz]) Return the current time in the local timezone.
Timestamp.toordinal() Return proleptic Gregorian ordinal.
Timestamp.tz_convert(tz) Convert timezone-aware Timestamp to another time zone.
Timestamp.tz_localize(tz[, ambiguous, ...]) Localize the Timestamp to a timezone.
Timestamp.tzname() Return time zone name.
Timestamp.utcfromtimestamp(ts) Construct a timezone-aware UTC datetime from a POSIX timestamp.
Timestamp.utcnow() Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset() Return utc offset.
Timestamp.utctimetuple() Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday() Return the day of the week represented by the date.
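
A short illustrative sketch (the date is made up):

>>> ts = pd.Timestamp("2023-01-01 12:30", tz="UTC")
>>> ts.day_name()
'Sunday'
>>> ts.floor("h")
Timestamp('2023-01-01 12:00:00+0000', tz='UTC')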

A collection of timestamps may be stored in a arrays.DatetimeArray. For timezone-aware data, the .dtype of a arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are timezone-aware, then every value in the array must have the same timezone.

Function Description
arrays.DatetimeArray(data[, dtype, freq, copy]) Pandas ExtensionArray for tz-naive or tz-aware datetime data.
Function Description
DatetimeTZDtype([unit, tz]) An ExtensionDtype for timezone-aware datetime data.

Timedeltas

NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.

Function Description
Timedelta([value, unit]) Represents a duration, the difference between two dates or times.

Properties

Property Description
Timedelta.asm8 Return a numpy timedelta64 array scalar view.
Timedelta.components Return a components namedtuple-like.
Timedelta.days Returns the days of the timedelta.
Timedelta.max The maximum representable Timedelta.
Timedelta.microseconds Return the number of microseconds (n), where 0 <= n < 1 millisecond.
Timedelta.min The minimum representable Timedelta.
Timedelta.nanoseconds Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution The smallest possible difference between non-equal Timedelta objects (one nanosecond).
Timedelta.seconds Return the total hours, minutes, and seconds of the timedelta as seconds.
Timedelta.unit Return the unit of Timedelta object.
Timedelta.value Return the value of Timedelta object in nanoseconds.
Timedelta.view(dtype) Array view compatibility.

Methods

Method Description
Timedelta.as_unit(unit[, round_ok]) Convert the underlying int64 representation to the given unit.
Timedelta.ceil(freq) Return a new Timedelta ceiled to this resolution.
Timedelta.floor(freq) Return a new Timedelta floored to this resolution.
Timedelta.isoformat() Format the Timedelta as ISO 8601 Duration.
Timedelta.round(freq) Round the Timedelta to the specified resolution.
Timedelta.to_pytimedelta() Convert a pandas Timedelta object into a python datetime.timedelta object.
Timedelta.to_timedelta64() Return a numpy.timedelta64 object with 'ns' precision.
Timedelta.to_numpy([dtype, copy]) Convert the Timedelta to a NumPy timedelta64.
Timedelta.total_seconds() Total seconds in the duration.
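
An illustrative sketch (the duration is made up):

>>> td = pd.Timedelta("1 days 2 hours")
>>> td.total_seconds()
93600.0
>>> td.components
Components(days=1, hours=2, minutes=0, seconds=0, milliseconds=0, microseconds=0, nanoseconds=0)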

A collection of Timedelta may be stored in a TimedeltaArray.

Function Description
arrays.TimedeltaArray(data[, dtype, freq, copy]) Pandas ExtensionArray for timedelta data.

Periods

pandas represents spans of times as Period objects.

Period

Function Description
Period([value, freq, ordinal, year, month, ...]) Represents a period of time.

Properties

Property Description
Period.day Get day of the month that a Period falls on.
Period.dayofweek Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.day_of_week Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear Return the day of the year.
Period.day_of_year Return the day of the year.
Period.days_in_month Get the total number of days in the month that this period falls on.
Period.daysinmonth Get the total number of days of the month that this period falls on.
Period.end_time Get the Timestamp for the end of the period.
Period.freq Return the frequency object for this Period.
Period.freqstr Return a string representation of the frequency.
Period.hour Get the hour of the day component of the Period.
Period.is_leap_year Return True if the period's year is in a leap year.
Period.minute Get minute of the hour component of the Period.
Period.month Return the month this Period falls on.
Period.ordinal Return the integer ordinal for this Period.
Period.quarter Return the quarter this Period falls on.
Period.qyear Fiscal year the Period lies in according to its starting-quarter.
Period.second Get the second component of the Period.
Period.start_time Get the Timestamp for the start of the period.
Period.week Get the week of the year on the given Period.
Period.weekday Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.weekofyear Get the week of the year on the given Period.
Period.year Return the year this Period falls on.

Methods

Method Description
Period.asfreq(freq[, how]) Convert Period to desired frequency, at the start or end of the interval.
Period.now(freq) Return the period of now's date.
Period.strftime(fmt) Returns a formatted string representation of the Period.
Period.to_timestamp([freq, how]) Return the Timestamp representation of the Period.
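
An illustrative sketch (the period is made up):

>>> p = pd.Period("2023-03", freq="M")
>>> p.start_time
Timestamp('2023-03-01 00:00:00')
>>> p.asfreq("D", how="end")
Period('2023-03-31', 'D')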

A collection of Period may be stored in a arrays.PeriodArray. Every period in a arrays.PeriodArray must have the same freq.

Function Description
arrays.PeriodArray(values[, dtype, copy]) Pandas ExtensionArray for storing Period data.
Function Description
PeriodDtype(freq) An ExtensionDtype for Period data.

Intervals

Arbitrary intervals can be represented as Interval objects.

Function Description
Interval Immutable object implementing an Interval, a bounded slice-like interval.

Properties

Property Description
Interval.closed String describing the inclusive side the intervals.
Interval.closed_left Check if the interval is closed on the left side.
Interval.closed_right Check if the interval is closed on the right side.
Interval.is_empty Indicates if an interval is empty, meaning it contains no points.
Interval.left Left bound for the interval.
Interval.length Return the length of the Interval.
Interval.mid Return the midpoint of the Interval.
Interval.open_left Check if the interval is open on the left side.
Interval.open_right Check if the interval is open on the right side.
Interval.overlaps(other) Check whether two Interval objects overlap.
Interval.right Right bound for the interval.
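
A minimal sketch (hypothetical bounds):

>>> iv = pd.Interval(0, 5, closed="right")
>>> 2.5 in iv
True
>>> iv.mid
2.5
>>> iv.length
5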

A collection of intervals may be stored in an arrays.IntervalArray.

Function Description
arrays.IntervalArray(data[, closed, dtype, ...]) Pandas array for interval data that are closed on the same side.
Function Description
IntervalDtype([subtype, closed]) An ExtensionDtype for Interval data.

Nullable integer

numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.

Function Description
arrays.IntegerArray(values, mask[, copy]) Array of integer (optional missing) values.
Function Description
Int8Dtype() An ExtensionDtype for int8 integer data.
Int16Dtype() An ExtensionDtype for int16 integer data.
Int32Dtype() An ExtensionDtype for int32 integer data.
Int64Dtype() An ExtensionDtype for int64 integer data.
UInt8Dtype() An ExtensionDtype for uint8 integer data.
UInt16Dtype() An ExtensionDtype for uint16 integer data.
UInt32Dtype() An ExtensionDtype for uint32 integer data.
UInt64Dtype() An ExtensionDtype for uint64 integer data.
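
For example, using the "Int64" string alias (illustrative; missing values display as <NA>):

>>> s = pd.Series([1, 2, None], dtype="Int64")
>>> s
0       1
1       2
2    <NA>
dtype: Int64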

Nullable float

Function Description
arrays.FloatingArray(values, mask[, copy]) Array of floating (optional missing) values.
Function Description
Float32Dtype() An ExtensionDtype for float32 data.
Float64Dtype() An ExtensionDtype for float64 data.

Categoricals

pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.

Function Description
CategoricalDtype([categories, ordered]) Type for categorical data with the categories and orderedness.
Property Description
CategoricalDtype.categories An Index containing the unique categories allowed.
CategoricalDtype.ordered Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical:

Function Description
Categorical(values[, categories, ordered, ...]) Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Function Description
Categorical.from_codes(codes[, categories, ...]) Make a Categorical type from codes and categories or dtype.

The dtype information is available on the Categorical

Property Description
Categorical.dtype The CategoricalDtype for this instance.
Categorical.categories The categories of this categorical.
Categorical.ordered Whether the categories have an ordered relationship.
Categorical.codes The category codes of this categorical index.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information is not preserved!

Function Description
Categorical.__array__([dtype, copy]) The numpy array interface.

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either

  • the string 'category'
  • an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.

More methods are available on Categorical:

Method Description
Categorical.as_ordered() Set the Categorical to be ordered.
Categorical.as_unordered() Set the Categorical to be unordered.
Categorical.set_categories(new_categories[, ...]) Set the categories to the specified new categories.
Categorical.rename_categories(new_categories) Rename categories.
Categorical.reorder_categories(new_categories) Reorder categories as specified in new_categories.
Categorical.add_categories(new_categories) Add new categories.
Categorical.remove_categories(removals) Remove the specified categories.
Categorical.remove_unused_categories() Remove categories which are not used.
Categorical.map(mapper[, na_action]) Map categories using an input mapping or function.
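
A short illustrative sketch (hypothetical categories):

>>> cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"], ordered=True)
>>> cat.codes
array([0, 1, 0], dtype=int8)
>>> cat.as_unordered().ordered
False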

Sparse

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a arrays.SparseArray.

Function Description
arrays.SparseArray(data[, sparse_index, ...]) An ExtensionArray for storing sparse data.
Function Description
SparseDtype([dtype, fill_value]) Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
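
A minimal sketch (hypothetical data):

>>> s = pd.Series([0, 0, 1, 0], dtype=pd.SparseDtype("int64", fill_value=0))
>>> s.dtype
Sparse[int64, 0]
>>> s.sparse.density
0.25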

Strings

When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").

Function Description
arrays.StringArray(values, *[, dtype, copy]) Extension array for string data.
arrays.ArrowStringArray(values, *[, dtype]) Extension array for string data in a pyarrow.ChunkedArray.
Function Description
StringDtype([storage, na_value]) Extension dtype for string data.

The Series.str accessor is available for Series backed by a arrays.StringArray. See String handling for more.
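
For example (illustrative; output shown for a recent pandas version):

>>> s = pd.Series(["a", "b", None], dtype="string")
>>> s.str.upper()
0       A
1       B
2    <NA>
dtype: string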

Nullable Boolean

The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.

Function Description
arrays.BooleanArray(values, mask[, copy]) Array of boolean (True/False) data with missing values.
Function Description
BooleanDtype() Extension dtype for boolean data.
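
A minimal sketch (illustrative):

>>> s = pd.Series([True, False, None], dtype="boolean")
>>> s
0     True
1    False
2     <NA>
dtype: boolean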

Utilities

Constructors

Function Description
api.types.union_categoricals(to_union[, ...]) Combine list-like of Categorical-like, unioning categories.
api.types.infer_dtype(value[, skipna]) Return a string label of the type of the elements in a list-like input.
api.types.pandas_dtype(dtype) Convert input into a pandas only dtype object or a numpy dtype object.
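
A few illustrative calls (hypothetical inputs):

>>> from pandas.api.types import infer_dtype, pandas_dtype
>>> infer_dtype(["a", "b"])
'string'
>>> infer_dtype([1, 2, 3])
'integer'
>>> pandas_dtype("Int64")
Int64Dtype()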

Data type introspection

Function Description
api.types.is_any_real_numeric_dtype(arr_or_dtype) Check whether the provided array or dtype is of a real number dtype.
api.types.is_bool_dtype(arr_or_dtype) Check whether the provided array or dtype is of a boolean dtype.
api.types.is_categorical_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype.
api.types.is_complex_dtype(arr_or_dtype) Check whether the provided array or dtype is of a complex dtype.
api.types.is_datetime64_any_dtype(arr_or_dtype) Check whether the provided array or dtype is of the datetime64 dtype.
api.types.is_datetime64_dtype(arr_or_dtype) Check whether an array-like or dtype is of the datetime64 dtype.
api.types.is_datetime64_ns_dtype(arr_or_dtype) Check whether the provided array or dtype is of the datetime64[ns] dtype.
api.types.is_datetime64tz_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype.
api.types.is_dtype_equal(source, target) Check if two dtypes are equal.
api.types.is_extension_array_dtype(arr_or_dtype) Check if an object is a pandas extension array type.
api.types.is_float_dtype(arr_or_dtype) Check whether the provided array or dtype is of a float dtype.
api.types.is_int64_dtype(arr_or_dtype) (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype.
api.types.is_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of an integer dtype.
api.types.is_interval_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype.
api.types.is_numeric_dtype(arr_or_dtype) Check whether the provided array or dtype is of a numeric dtype.
api.types.is_object_dtype(arr_or_dtype) Check whether an array-like or dtype is of the object dtype.
api.types.is_period_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Period dtype.
api.types.is_signed_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of a signed integer dtype.
api.types.is_string_dtype(arr_or_dtype) Check whether the provided array or dtype is of the string dtype.
api.types.is_timedelta64_dtype(arr_or_dtype) Check whether an array-like or dtype is of the timedelta64 dtype.
api.types.is_timedelta64_ns_dtype(arr_or_dtype) Check whether the provided array or dtype is of the timedelta64[ns] dtype.
api.types.is_unsigned_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of an unsigned integer dtype.
api.types.is_sparse(arr) (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array.
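
For example (each checker accepts an array, Series, Index, or dtype object):

>>> from pandas.api.types import is_integer_dtype, is_datetime64_any_dtype
>>> is_integer_dtype(pd.Series([1, 2]))
True
>>> is_datetime64_any_dtype(pd.DatetimeIndex(["2020-01-01"]))
True
>>> is_integer_dtype("float64")
False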

Iterable introspection

Function Description
api.types.is_dict_like(obj) Check if the object is dict-like.
api.types.is_file_like(obj) Check if the object is a file-like object.
api.types.is_list_like(obj[, allow_sets]) Check if the object is list-like.
api.types.is_named_tuple(obj) Check if the object is a named tuple.
api.types.is_iterator(obj) Check if the object is an iterator.
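
For example (note that strings are deliberately not treated as list-like):

>>> pd.api.types.is_list_like([1, 2, 3])
True
>>> pd.api.types.is_list_like("abc")
False
>>> pd.api.types.is_dict_like({"a": 1})
True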

Scalar introspection

Function Description
api.types.is_bool(obj) Return True if given object is boolean.
api.types.is_complex(obj) Return True if given object is complex.
api.types.is_float(obj) Return True if given object is float.
api.types.is_hashable(obj[, allow_slice]) Return True if hash(obj) will succeed, False otherwise.
api.types.is_integer(obj) Return True if given object is integer.
api.types.is_number(obj) Check if the object is a number.
api.types.is_re(obj) Check if the object is a regex pattern instance.
api.types.is_re_compilable(obj) Check if the object can be compiled into a regex pattern instance.
api.types.is_scalar(val) Return True if given object is scalar.
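
For example:

>>> pd.api.types.is_number(5.5)
True
>>> pd.api.types.is_re_compilable(".*")
True
>>> pd.api.types.is_scalar([1, 2])
False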

Examples for pandas.array

If a dtype is not specified, pandas will infer the best dtype from the values. See the description of dtype for the types pandas infers.

>>> pd.array([1, 2])
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
>>> pd.array([1, 2, np.nan])
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
>>> pd.array([1.1, 2.2])
<FloatingArray>
[1.1, 2.2]
Length: 2, dtype: Float64
>>> pd.array(["a", None, "c"])
<ArrowStringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> with pd.option_context("string_storage", "python"):
...     arr = pd.array(["a", None, "c"])
>>> arr
<StringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> pd.array([pd.Period("2000", freq="D"), pd.Period("2000", freq="D")])
<PeriodArray>
['2000-01-01', '2000-01-01']
Length: 2, dtype: period[D]

You can use the string alias for dtype

>>> pd.array(["a", "b", "a"], dtype="category")
['a', 'b', 'a']
Categories (2, str): ['a', 'b']

Or specify the actual dtype

>>> pd.array(
...     ["a", "b", "a"], dtype=pd.CategoricalDtype(["a", "b", "c"], ordered=True)
... )
['a', 'b', 'a']
Categories (3, str): ['a' < 'b' < 'c']

If pandas does not infer a dedicated extension type, an arrays.NumpyExtensionArray is returned.

>>> pd.array([1 + 1j, 3 + 2j])
<NumpyExtensionArray>
[(1+1j), (3+2j)]
Length: 2, dtype: complex128

As mentioned in the “Notes” section, new extension types may be added in the future (by pandas or 3rd party libraries), causing the return value to no longer be an arrays.NumpyExtensionArray. Specify the dtype as a NumPy dtype if you need to ensure there's no future change in behavior.

>>> pd.array([1, 2], dtype=np.dtype("int32"))
<NumpyExtensionArray>
[1, 2]
Length: 2, dtype: int32

data must be 1-dimensional. A ValueError is raised when the input has the wrong dimensionality.

>>> pd.array(1)
Traceback (most recent call last):
  ...
ValueError: Cannot pass scalar '1' to 'pandas.array'.

Examples for pandas.arrays.ArrowExtensionArray

Create an ArrowExtensionArray with pandas.array():

>>> pd.array([1, 1, None], dtype="int64[pyarrow]")
<ArrowExtensionArray>
[1, 1, <NA>]
Length: 3, dtype: int64[pyarrow]

Examples for pandas.ArrowDtype

>>> import pyarrow as pa
>>> pd.ArrowDtype(pa.int64())
int64[pyarrow]

Types with parameters must be constructed with ArrowDtype.

>>> pd.ArrowDtype(pa.timestamp("s", tz="America/New_York"))
timestamp[s, tz=America/New_York][pyarrow]
>>> pd.ArrowDtype(pa.list_(pa.int64()))
list<item: int64>[pyarrow]

Examples for pandas.Timestamp

Using the primary calling convention:

This converts a datetime-like string

>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')

This converts a float representing a Unix epoch in units of seconds

>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')

This converts an int representing a Unix epoch in units of weeks

>>> pd.Timestamp(1535, unit='W')
Timestamp('1999-06-03 00:00:00')

This converts an int representing a Unix epoch in units of seconds, for a particular timezone

>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')

Using the other two forms that mimic the API for datetime.datetime:

>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00')

Examples for pandas.Timestamp.asm8

>>> ts = pd.Timestamp(2020, 3, 14, 15)
>>> ts.asm8
numpy.datetime64('2020-03-14T15:00:00.000000')

Examples for pandas.Timestamp.day

>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.day
31

Examples for pandas.Timestamp.dayofweek

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5

Examples for pandas.Timestamp.day_of_week

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5

Examples for pandas.Timestamp.dayofyear

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74

Examples for pandas.Timestamp.day_of_year

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74

Examples for pandas.Timestamp.days_in_month

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31

Examples for pandas.Timestamp.daysinmonth

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31

Examples for pandas.Timestamp.fold

>>> ts = pd.Timestamp("2024-11-03 01:30:00")
>>> ts.fold
0

Examples for pandas.Timestamp.hour

>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.hour
16

Examples for pandas.Timestamp.is_leap_year

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_leap_year
True

Examples for pandas.Timestamp.is_month_end

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_month_end
True

Examples for pandas.Timestamp.is_month_start

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_start
False
>>> ts = pd.Timestamp(2020, 1, 1)
>>> ts.is_month_start
True

Examples for pandas.Timestamp.is_quarter_end

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_end
False
>>> ts = pd.Timestamp(2020, 3, 31)
>>> ts.is_quarter_end
True

Examples for pandas.Timestamp.is_quarter_start

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_start
False
>>> ts = pd.Timestamp(2020, 4, 1)
>>> ts.is_quarter_start
True

Examples for pandas.Timestamp.is_year_end

>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_year_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_year_end
True

Index objects

Index

Many of these methods or variants thereof are available on the objects that contain an index (Series/DataFrame), and those should usually be preferred over calling these methods directly.

Function Description
Index([data, dtype, copy, name, tupleize_cols]) Immutable sequence used for indexing and alignment.

Properties

Function Description
Index.values Return an array representing the data in the Index.
Index.is_monotonic_increasing Return a boolean if the values are equal or increasing.
Index.is_monotonic_decreasing Return a boolean if the values are equal or decreasing.
Index.is_unique Return if the index has unique values.
Index.has_duplicates Check if the Index has duplicate values.
Index.hasnans Return True if there are any NaNs.
Index.dtype Return the dtype object of the underlying data.
Index.inferred_type Return a string of the type inferred from the values.
Index.shape Return a tuple of the shape of the underlying data.
Index.name Return Index or MultiIndex name.
Index.names Get names on index.
Index.nbytes Return the number of bytes in the underlying data.
Index.ndim Number of dimensions of the underlying data, by definition 1.
Index.size Return the number of elements in the underlying data.
Index.empty Indicator whether Index is empty.
Index.T Return the transpose, which is by definition self.
Index.memory_usage([deep]) Memory usage of the values.
Index.array The ExtensionArray of the data backing this Index.

Modifying and computations

Function Description
Index.all(*args, **kwargs) Return whether all elements are Truthy.
Index.any(*args, **kwargs) Return whether any element is Truthy.
Index.argmin([axis, skipna]) Return int position of the smallest value in the Index.
Index.argmax([axis, skipna]) Return int position of the largest value in the Index.
Index.copy([name, deep]) Make a copy of this object.
Index.delete(loc) Make new Index with passed location(-s) deleted.
Index.drop(labels[, errors]) Make new Index with passed list of labels deleted.
Index.drop_duplicates(*[, keep]) Return Index with duplicate values removed.
Index.duplicated([keep]) Indicate duplicate index values.
Index.equals(other) Determine if two Index objects are equal.
Index.factorize([sort, use_na_sentinel]) Encode the object as an enumerated type or categorical variable.
Index.identical(other) Similar to equals, but checks that object attributes and types are also equal.
Index.insert(loc, item) Make new Index inserting new item at location.
Index.is_(other) More flexible, faster check like is but that works through views.
Index.min([axis, skipna]) Return the minimum value of the Index.
Index.max([axis, skipna]) Return the maximum value of the Index.
Index.reindex(target[, method, level, ...]) Create index with target's values.
Index.rename(name, *[, inplace]) Alter Index or MultiIndex name.
Index.repeat(repeats[, axis]) Repeat elements of an Index.
Index.where(cond[, other]) Replace values where the condition is False.
Index.take(indices[, axis, allow_fill, ...]) Return a new Index of the values selected by the indices.
Index.putmask(mask, value) Return a new Index of the values set with the mask.
Index.unique([level]) Return unique values in the index.
Index.nunique([dropna]) Return number of unique elements in the object.
Index.value_counts([normalize, sort, ...]) Return a Series containing counts of unique values.

Compatibility with MultiIndex

Function Description
Index.set_names(names, *[, level, inplace]) Set Index or MultiIndex name.
Index.droplevel([level]) Return index with requested level(s) removed.

Missing values

Function Description
Index.fillna(value) Fill NA/NaN values with the specified value.
Index.dropna([how]) Return Index without NA/NaN values.
Index.isna() Detect missing values.
Index.notna() Detect existing (non-missing) values.

Conversion

Function Description
Index.astype(dtype[, copy]) Create an Index with values cast to dtypes.
Index.infer_objects([copy]) If we have an object dtype, try to infer a non-object dtype.
Index.item() Return the first element of the underlying data as a Python scalar.
Index.map(mapper[, na_action]) Map values using an input mapping or function.
Index.ravel([order]) Return a view on self.
Index.to_list() Return a list of the values.
Index.to_series([index, name]) Create a Series with both index and values equal to the index keys.
Index.to_frame([index, name]) Create a DataFrame with a column containing the Index.
Index.to_numpy([dtype, copy, na_value]) A NumPy ndarray representing the values in this Series or Index.
Index.view([cls]) Return a view of the Index with the specified dtype or a new Index instance.

Sorting

Function Description
Index.argsort(*args, **kwargs) Return the integer indices that would sort the index.
Index.searchsorted(value[, side, sorter]) Find indices where elements should be inserted to maintain order.
Index.sort_values(*[, return_indexer, ...]) Return a sorted copy of the index.

Time-specific operations

Function Description
Index.shift([periods, freq]) Shift index by desired number of time frequency increments.

Combining / joining / set operations

Function Description
Index.append(other) Append a collection of Index options together.
Index.join(other, *[, how, level, ...]) Compute join_index and indexers to conform data structures to the new index.
Index.intersection(other[, sort]) Form the intersection of two Index objects.
Index.union(other[, sort]) Form the union of two Index objects.
Index.difference(other[, sort]) Return a new Index with elements of index not in other.
Index.symmetric_difference(other[, ...]) Compute the symmetric difference of two Index objects.
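
For example:

>>> pd.Index([1, 2, 3]).union(pd.Index([2, 3, 4]))
Index([1, 2, 3, 4], dtype='int64')
>>> pd.Index([1, 2, 3]).intersection(pd.Index([2, 3, 4]))
Index([2, 3], dtype='int64')
>>> pd.Index([1, 2, 3]).difference(pd.Index([2, 3, 4]))
Index([1], dtype='int64')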

Selecting

Function Description
Index.asof(label) Return the label from the index, or, if not present, the previous one.
Index.asof_locs(where, mask) Return the locations (indices) of labels in the index.
Index.get_indexer(target[, method, limit, ...]) Compute indexer and mask for new index given the current index.
Index.get_indexer_for(target) Guaranteed return of an indexer even when non-unique.
Index.get_indexer_non_unique(target) Compute indexer and mask for new index given the current index.
Index.get_level_values(level) Return an Index of values for requested level.
Index.get_loc(key) Get integer location, slice or boolean mask for requested label.
Index.get_slice_bound(label, side) Calculate slice bound that corresponds to given label.
Index.isin(values[, level]) Return a boolean array where the index values are in values.
Index.slice_indexer([start, end, step]) Compute the slice indexer for input labels and step.
Index.slice_locs([start, end, step]) Compute slice locations for input labels.
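
For example:

>>> idx = pd.Index(["a", "b", "c"])
>>> idx.get_loc("b")
1
>>> idx.isin(["a", "c"])
array([ True, False,  True])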

Numeric Index

Function Description
RangeIndex([start, stop, step, dtype, copy, ...]) Immutable Index implementing a monotonic integer range.
Function Description
RangeIndex.start The value of the start parameter (0 if this was not supplied).
RangeIndex.stop The value of the stop parameter.
RangeIndex.step The value of the step parameter (1 if this was not supplied).
RangeIndex.from_range(data[, name, dtype]) Create pandas.RangeIndex from a range object.
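
For example (a RangeIndex stores only start, stop, and step, so it is memory-cheap):

>>> idx = pd.RangeIndex(start=0, stop=6, step=2)
>>> idx
RangeIndex(start=0, stop=6, step=2)
>>> list(idx)
[0, 2, 4]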

CategoricalIndex

Function Description
CategoricalIndex([data, categories, ...]) Index based on an underlying Categorical.

Categorical components

Function Description
CategoricalIndex.append(other) Append a collection of Index options together.
CategoricalIndex.codes The category codes of this categorical index.
CategoricalIndex.categories The categories of this categorical.
CategoricalIndex.ordered Whether the categories have an ordered relationship.
CategoricalIndex.rename_categories(...) Rename categories.
CategoricalIndex.reorder_categories(...[, ...]) Reorder categories as specified in new_categories.
CategoricalIndex.add_categories(new_categories) Add new categories.
CategoricalIndex.remove_categories(removals) Remove the specified categories.
CategoricalIndex.remove_unused_categories() Remove categories which are not used.
CategoricalIndex.set_categories(new_categories) Set the categories to the specified new categories.
CategoricalIndex.as_ordered() Set the Categorical to be ordered.
CategoricalIndex.as_unordered() Set the Categorical to be unordered.

Modifying and computations

Function Description
CategoricalIndex.map(mapper[, na_action]) Map values using an input mapping or function.
CategoricalIndex.equals(other) Determine if two CategoricalIndex objects contain the same elements.

IntervalIndex

Function Description
IntervalIndex(data[, closed, dtype, copy, ...]) Immutable index of intervals that are closed on the same side.

IntervalIndex components

Function Description
IntervalIndex.from_arrays(left, right[, ...]) Construct from two arrays defining the left and right bounds.
IntervalIndex.from_tuples(data[, closed, ...]) Construct an IntervalIndex from an array-like of tuples.
IntervalIndex.from_breaks(breaks[, closed, ...]) Construct an IntervalIndex from an array of splits.
IntervalIndex.left Return left bounds of the intervals in the IntervalIndex.
IntervalIndex.right Return right bounds of the intervals in the IntervalIndex.
IntervalIndex.mid Return the midpoint of each interval in the IntervalIndex as an Index.
IntervalIndex.closed String describing the inclusive side of the intervals.
IntervalIndex.length Calculate the length of each interval in the IntervalIndex.
IntervalIndex.values Return an array representing the data in the Index.
IntervalIndex.is_empty Indicates if an interval is empty, meaning it contains no points.
IntervalIndex.is_non_overlapping_monotonic Return a boolean whether the IntervalArray/IntervalIndex is non-overlapping and monotonic.
IntervalIndex.is_overlapping Return True if the IntervalIndex has overlapping intervals, else False.
IntervalIndex.get_loc(key) Get integer location, slice or boolean mask for requested label.
IntervalIndex.get_indexer(target[, method, ...]) Compute indexer and mask for new index given the current index.
IntervalIndex.set_closed(closed) Return an identical IntervalArray closed on the specified side.
IntervalIndex.contains(other) Check elementwise if the Intervals contain the value.
IntervalIndex.overlaps(other) Check elementwise if an Interval overlaps the values in the IntervalArray.
IntervalIndex.to_tuples([na_tuple]) Return an ndarray (if self is IntervalArray) or Index (if self is IntervalIndex) of tuples of the form (left, right).
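
For example:

>>> idx = pd.IntervalIndex.from_breaks([0, 1, 2])
>>> idx
IntervalIndex([(0, 1], (1, 2]], dtype='interval[int64, right]')
>>> idx.contains(0.5)
array([ True, False])
>>> idx.mid
Index([0.5, 1.5], dtype='float64')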

MultiIndex

Function Description
MultiIndex([levels, codes, sortorder, ...]) A multi-level, or hierarchical, index object for pandas objects.

MultiIndex constructors

Function Description
MultiIndex.from_arrays(arrays[, sortorder, ...]) Convert arrays to MultiIndex.
MultiIndex.from_tuples(tuples[, sortorder, ...]) Convert list of tuples to MultiIndex.
MultiIndex.from_product(iterables[, ...]) Make a MultiIndex from the cartesian product of multiple iterables.
MultiIndex.from_frame(df[, sortorder, names]) Make a MultiIndex from a DataFrame.
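
For example:

>>> pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])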

MultiIndex properties

Function Description
MultiIndex.names Names of levels in MultiIndex.
MultiIndex.levels Levels of the MultiIndex.
MultiIndex.codes Codes of the MultiIndex.
MultiIndex.nlevels Integer number of levels in this MultiIndex.
MultiIndex.levshape A tuple representing the length of each level in the MultiIndex.
MultiIndex.dtypes Return the dtypes as a Series for the underlying MultiIndex.

MultiIndex components

Function Description
MultiIndex.set_levels(levels, *[, level, ...]) Set new levels on MultiIndex.
MultiIndex.set_codes(codes, *[, level, ...]) Set new codes on MultiIndex.
MultiIndex.to_flat_index() Convert a MultiIndex to an Index of Tuples containing the level values.
MultiIndex.to_frame([index, name, ...]) Create a DataFrame with the levels of the MultiIndex as columns.
MultiIndex.sortlevel([level, ascending, ...]) Sort MultiIndex at the requested level.
MultiIndex.droplevel([level]) Return index with requested level(s) removed.
MultiIndex.swaplevel([i, j]) Swap level i with level j.
MultiIndex.reorder_levels(order) Rearrange levels using input order.
MultiIndex.remove_unused_levels() Create new MultiIndex from current that removes unused levels.
MultiIndex.drop(codes[, level, errors]) Make a new pandas.MultiIndex with the passed list of codes deleted.
MultiIndex.copy([names, deep, name]) Make a copy of this object.
MultiIndex.append(other) Append a collection of Index options together.
MultiIndex.truncate([before, after]) Slice index between two labels / tuples, return new MultiIndex.

MultiIndex selecting

Function Description
MultiIndex.get_loc(key) Get location for a label or a tuple of labels.
MultiIndex.get_locs(seq) Get location for a sequence of labels.
MultiIndex.get_loc_level(key[, level, ...]) Get location and sliced index for requested label(s)/level(s).
MultiIndex.get_indexer(target[, method, ...]) Compute indexer and mask for new index given the current index.
MultiIndex.get_level_values(level) Return vector of label values for requested level.
Function Description
IndexSlice Create an object to more easily perform multi-index slicing.
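
For example (a minimal sketch; the dtype label of string levels varies across pandas versions):

>>> midx = pd.MultiIndex.from_arrays([["a", "a", "b"], [1, 2, 1]])
>>> midx.get_loc(("a", 2))
1
>>> midx.get_level_values(0)
Index(['a', 'a', 'b'], dtype='str')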

DatetimeIndex

Function Description
DatetimeIndex([data, freq, tz, ambiguous, ...]) Immutable ndarray-like of datetime64 data.

Time/date components

Function Description
DatetimeIndex.year The year of the datetime.
DatetimeIndex.month The month as January=1, December=12.
DatetimeIndex.day The day of the datetime.
DatetimeIndex.hour The hours of the datetime.
DatetimeIndex.minute The minutes of the datetime.
DatetimeIndex.second The seconds of the datetime.
DatetimeIndex.microsecond The microseconds of the datetime.
DatetimeIndex.nanosecond The nanoseconds of the datetime.
DatetimeIndex.date Returns numpy array of python datetime.date objects.
DatetimeIndex.time Returns numpy array of datetime.time objects.
DatetimeIndex.timetz Returns numpy array of datetime.time objects with timezones.
DatetimeIndex.dayofyear The ordinal day of the year.
DatetimeIndex.day_of_year The ordinal day of the year.
DatetimeIndex.dayofweek The day of the week with Monday=0, Sunday=6.
DatetimeIndex.day_of_week The day of the week with Monday=0, Sunday=6.
DatetimeIndex.weekday The day of the week with Monday=0, Sunday=6.
DatetimeIndex.quarter The quarter of the date.
DatetimeIndex.tz Return the timezone.
DatetimeIndex.freq Return the frequency object if it is set, otherwise None.
DatetimeIndex.freqstr Return the frequency object as a string if it's set, otherwise None.
DatetimeIndex.is_month_start Indicates whether the date is the first day of the month.
DatetimeIndex.is_month_end Indicates whether the date is the last day of the month.
DatetimeIndex.is_quarter_start Indicator for whether the date is the first day of a quarter.
DatetimeIndex.is_quarter_end Indicator for whether the date is the last day of a quarter.
DatetimeIndex.is_year_start Indicate whether the date is the first day of a year.
DatetimeIndex.is_year_end Indicate whether the date is the last day of the year.
DatetimeIndex.is_leap_year Boolean indicator if the date belongs to a leap year.
DatetimeIndex.inferred_freq Return the inferred frequency of the index.
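
For example:

>>> idx = pd.DatetimeIndex(["2020-01-31", "2020-02-01"])
>>> idx.month
Index([1, 2], dtype='int32')
>>> idx.is_month_end
array([ True, False])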

Selecting

Function Description
DatetimeIndex.indexer_at_time(time[, asof]) Return index locations of values at particular time of day.
DatetimeIndex.indexer_between_time(...[, ...]) Return index locations of values between particular times of day.

Time-specific operations

Function Description
DatetimeIndex.normalize() Convert times to midnight.
DatetimeIndex.strftime(date_format) Convert to Index using specified date_format.
DatetimeIndex.snap([freq]) Snap time stamps to nearest occurring frequency.
DatetimeIndex.tz_convert(tz) Convert tz-aware Datetime Array/Index from one time zone to another.
DatetimeIndex.tz_localize(tz[, ambiguous, ...]) Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.
DatetimeIndex.round(freq[, ambiguous, ...]) Perform round operation on the data to the specified freq.
DatetimeIndex.floor(freq[, ambiguous, ...]) Perform floor operation on the data to the specified freq.
DatetimeIndex.ceil(freq[, ambiguous, ...]) Perform ceil operation on the data to the specified freq.
DatetimeIndex.month_name([locale]) Return the month names with specified locale.
DatetimeIndex.day_name([locale]) Return the day names with specified locale.
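
For example (using tolist() to sidestep version-dependent dtype reprs):

>>> idx = pd.DatetimeIndex(["2020-01-01 10:30", "2020-01-01 11:45"])
>>> idx.floor("h").tolist()
[Timestamp('2020-01-01 10:00:00'), Timestamp('2020-01-01 11:00:00')]
>>> idx.month_name().tolist()
['January', 'January']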

Conversion

Function Description
DatetimeIndex.as_unit(unit[, round_ok]) Convert to a dtype with the given unit resolution.
DatetimeIndex.to_period([freq]) Cast to PeriodArray/PeriodIndex at a particular frequency.
DatetimeIndex.to_pydatetime() Return an ndarray of datetime.datetime objects.
DatetimeIndex.to_series([index, name]) Create a Series with both index and values equal to the index keys.
DatetimeIndex.to_frame([index, name]) Create a DataFrame with a column containing the Index.
DatetimeIndex.to_julian_date() Convert Timestamps to Julian Dates.

Methods

Function Description
DatetimeIndex.mean(*[, skipna, axis]) Return the mean value of the Array.
DatetimeIndex.std([axis, dtype, out, ddof, ...]) Return sample standard deviation over requested axis.

TimedeltaIndex

Function Description
TimedeltaIndex([data, freq, dtype, copy, name]) Immutable Index of timedelta64 data.

Components

Function Description
TimedeltaIndex.days Number of days for each element.
TimedeltaIndex.seconds Number of seconds (>= 0 and less than 1 day) for each element.
TimedeltaIndex.microseconds Number of microseconds (>= 0 and less than 1 second) for each element.
TimedeltaIndex.nanoseconds Number of nanoseconds (>= 0 and less than 1 microsecond) for each element.
TimedeltaIndex.components Return a DataFrame of the individual resolution components of the Timedeltas.
TimedeltaIndex.inferred_freq Return the inferred frequency of the index.
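
For example:

>>> tdi = pd.to_timedelta(["1 days 2 hours", "3 days"])
>>> tdi.days
Index([1, 3], dtype='int64')
>>> tdi.components.hours.tolist()
[2, 0]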

Conversion

Function Description
TimedeltaIndex.as_unit(unit) Convert to a dtype with the given unit resolution.
TimedeltaIndex.to_pytimedelta() Return an ndarray of datetime.timedelta objects.
TimedeltaIndex.to_series([index, name]) Create a Series with both index and values equal to the index keys.
TimedeltaIndex.round(freq[, ambiguous, ...]) Perform round operation on the data to the specified freq.
TimedeltaIndex.floor(freq[, ambiguous, ...]) Perform floor operation on the data to the specified freq.
TimedeltaIndex.ceil(freq[, ambiguous, ...]) Perform ceil operation on the data to the specified freq.
TimedeltaIndex.to_frame([index, name]) Create a DataFrame with a column containing the Index.

Methods

Function Description
TimedeltaIndex.mean(*[, skipna, axis]) Return the mean value of the Array.

PeriodIndex

Function Description
PeriodIndex([data, freq, dtype, copy, name]) Immutable ndarray holding ordinal values indicating regular periods in time.

Properties

Function Description
PeriodIndex.day The days of the period.
PeriodIndex.dayofweek The day of the week with Monday=0, Sunday=6.
PeriodIndex.day_of_week The day of the week with Monday=0, Sunday=6.
PeriodIndex.dayofyear The ordinal day of the year.
PeriodIndex.day_of_year The ordinal day of the year.
PeriodIndex.days_in_month The number of days in the month.
PeriodIndex.daysinmonth The number of days in the month.
PeriodIndex.end_time Get the Timestamp for the end of the period.
PeriodIndex.freq Return the frequency object if it is set, otherwise None.
PeriodIndex.freqstr Return the frequency object as a string if it's set, otherwise None.
PeriodIndex.hour The hour of the period.
PeriodIndex.is_leap_year Logical indicating if the date belongs to a leap year.
PeriodIndex.minute The minute of the period.
PeriodIndex.month The month as January=1, December=12.
PeriodIndex.quarter The quarter of the date.
PeriodIndex.qyear Fiscal year the Period lies in according to its starting-quarter.
PeriodIndex.second The second of the period.
PeriodIndex.start_time Get the Timestamp for the start of the period.
PeriodIndex.week The week ordinal of the year.
PeriodIndex.weekday The day of the week with Monday=0, Sunday=6.
PeriodIndex.weekofyear The week ordinal of the year.
PeriodIndex.year The year of the period.

Methods

Function Description
PeriodIndex.asfreq([freq, how]) Convert the PeriodArray to the specified frequency freq.
PeriodIndex.strftime(date_format) Convert to Index using specified date_format.
PeriodIndex.to_timestamp([freq, how]) Cast to DatetimeArray/Index.
PeriodIndex.from_fields(*[, year, quarter, ...]) Construct a PeriodIndex from fields (year, month, day, etc.).
PeriodIndex.from_ordinals(ordinals, *, freq) Construct a PeriodIndex from ordinals.
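
For example (freq reprs shown for recent pandas):

>>> pidx = pd.period_range("2023-01", periods=3, freq="M")
>>> pidx
PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
>>> pidx.to_timestamp()
DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], dtype='datetime64[ns]', freq='MS')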

Examples for pandas.Index

>>> pd.Index([1, 2, 3])
Index([1, 2, 3], dtype='int64')
>>> pd.Index(list("abc"))
Index(['a', 'b', 'c'], dtype='str')
>>> pd.Index([1, 2, 3], dtype="uint8")
Index([1, 2, 3], dtype='uint8')

Examples for pandas.Index.values

For pandas.Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.values
array([1, 2, 3])

For pandas.IntervalIndex:

>>> idx = pd.interval_range(start=0, end=5)
>>> idx.values
<IntervalArray>
[(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]]
Length: 5, dtype: interval[int64, right]

Examples for pandas.Index.is_monotonic_increasing

>>> pd.Index([1, 2, 3]).is_monotonic_increasing
True
>>> pd.Index([1, 2, 2]).is_monotonic_increasing
True
>>> pd.Index([1, 3, 2]).is_monotonic_increasing
False

Examples for pandas.Index.is_monotonic_decreasing

>>> pd.Index([3, 2, 1]).is_monotonic_decreasing
True
>>> pd.Index([3, 2, 2]).is_monotonic_decreasing
True
>>> pd.Index([3, 1, 2]).is_monotonic_decreasing
False

Examples for pandas.Index.is_unique

>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.is_unique
False
>>> idx = pd.Index([1, 5, 7])
>>> idx.is_unique
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
...     "category"
... )
>>> idx.is_unique
False
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.is_unique
True

Examples for pandas.Index.has_duplicates

>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
...     "category"
... )
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.has_duplicates
False

Examples for pandas.Index.hasnans

>>> s = pd.Series([1, 2, 3], index=["a", "b", None])
>>> s
a       1
b       2
None    3
dtype: int64
>>> s.index.hasnans
True

Examples for pandas.Index.dtype

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.dtype
dtype('int64')

Examples for pandas.Index.inferred_type

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.inferred_type
'integer'

Examples for pandas.Index.shape

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.shape
(3,)

Examples for pandas.Index.name

>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx
Index([1, 2, 3], dtype='int64', name='x')
>>> idx.name
'x'

Examples for pandas.Index.names

>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx.names
FrozenList(['x'])
>>> idx = pd.Index([1, 2, 3], name=("x", "y"))
>>> idx.names
FrozenList([('x', 'y')])

If the index does not have a name set:

>>> idx = pd.Index([1, 2, 3])
>>> idx.names
FrozenList([None])

Examples for pandas.Index.nbytes

For Series:

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.nbytes
34

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24

Examples for pandas.Index.ndim

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.ndim
1

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1

Examples for pandas.Index.size

For Series:

>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.size
3

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3

Examples for pandas.Index.empty

>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True

If the Index contains only NaNs, it is still not considered empty!

>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False

Examples for pandas.Index.T

For Series:

>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0     Ant
1    Bear
2     Cow
dtype: str
>>> s.T
0     Ant
1    Bear
2     Cow
dtype: str

For Index:

>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')

Examples for pandas.Index.memory_usage

>>> idx = pd.Index([1, 2, 3])
>>> idx.memory_usage()
24

Examples for pandas.Index.array

For regular NumPy types like int and float, a NumpyExtensionArray is returned.

>>> pd.Index([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64

For extension types, like Categorical, the actual ExtensionArray is returned.

>>> idx = pd.Index(pd.Categorical(["a", "b", "a"]))
>>> idx.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']

Examples for pandas.Index.all

True, because nonzero integers are considered True.

>>> pd.Index([1, 2, 3]).all()
True

False, because 0 is considered False.

>>> pd.Index([0, 1, 2]).all()
False

GroupBy

pandas.api.typing.DataFrameGroupBy and pandas.api.typing.SeriesGroupBy instances are returned by calls to pandas.DataFrame.groupby() and pandas.Series.groupby(), respectively.

Indexing, iteration

Function Description
DataFrameGroupBy.__iter__() Groupby iterator.
SeriesGroupBy.__iter__() Groupby iterator.
DataFrameGroupBy.groups Dict {group name -> group labels}.
SeriesGroupBy.groups Dict {group name -> group labels}.
DataFrameGroupBy.indices Dict {group name -> group indices}.
SeriesGroupBy.indices Dict {group name -> group indices}.
DataFrameGroupBy.get_group(name) Construct DataFrame from group with provided name.
SeriesGroupBy.get_group(name) Construct Series from group with provided name.
Function Description
Grouper(*args, **kwargs) A Grouper allows the user to specify a groupby instruction for an object.

Function application helper

Function Description
NamedAgg(column, aggfunc, *args, **kwargs) Helper for column specific aggregation with control over output column names.

Function application

Function Description
SeriesGroupBy.apply(func, *args, **kwargs) Apply function func group-wise and combine the results together.
DataFrameGroupBy.apply(func, *args[, ...]) Apply function func group-wise and combine the results together.
SeriesGroupBy.agg([func, engine, engine_kwargs]) Aggregate using one or more operations.
DataFrameGroupBy.agg([func, engine, ...]) Aggregate using one or more operations.
SeriesGroupBy.aggregate([func, engine, ...]) Aggregate using one or more operations.
DataFrameGroupBy.aggregate([func, engine, ...]) Aggregate using one or more operations.
SeriesGroupBy.transform(func, *args[, ...]) Call function producing a same-indexed Series on each group.
DataFrameGroupBy.transform(func, *args[, ...]) Call function producing a same-indexed DataFrame on each group.
SeriesGroupBy.pipe(func, *args, **kwargs) Apply a func with arguments to this GroupBy object and return its result.
DataFrameGroupBy.pipe(func, *args, **kwargs) Apply a func with arguments to this GroupBy object and return its result.
DataFrameGroupBy.filter(func[, dropna]) Filter elements from groups that don't satisfy a criterion.
SeriesGroupBy.filter(func[, dropna]) Filter elements from groups that don't satisfy a criterion.
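
A short sketch contrasting agg (one row per group) with transform (same index as the input):

>>> df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
>>> df.groupby("key")["val"].agg(["sum", "mean"])
     sum  mean
key
a      3   1.5
b      3   3.0
>>> df.groupby("key")["val"].transform("sum")
0    3
1    3
2    3
Name: val, dtype: int64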

DataFrameGroupBy computations / descriptive stats

Function Description
DataFrameGroupBy.all([skipna]) Return True if all values in the group are truthful, else False.
DataFrameGroupBy.any([skipna]) Return True if any value in the group is truthful, else False.
DataFrameGroupBy.bfill([limit]) Backward fill the values.
DataFrameGroupBy.corr([method, min_periods, ...]) Compute pairwise correlation of columns, excluding NA/null values.
DataFrameGroupBy.corrwith(other[, drop, ...]) (DEPRECATED) Compute pairwise correlation.
DataFrameGroupBy.count() Compute count of group, excluding missing values.
DataFrameGroupBy.cov([min_periods, ddof, ...]) Compute pairwise covariance of columns, excluding NA/null values.
DataFrameGroupBy.cumcount([ascending]) Number each item in each group from 0 to the length of that group - 1.
DataFrameGroupBy.cummax([numeric_only]) Cumulative max for each group.
DataFrameGroupBy.cummin([numeric_only]) Cumulative min for each group.
DataFrameGroupBy.cumprod([numeric_only]) Cumulative product for each group.
DataFrameGroupBy.cumsum([numeric_only]) Cumulative sum for each group.
DataFrameGroupBy.describe([percentiles, ...]) Generate descriptive statistics.
DataFrameGroupBy.diff([periods]) First discrete difference of element.
DataFrameGroupBy.ewm([com, span, halflife, ...]) Return an ewm grouper, providing ewm functionality per group.
DataFrameGroupBy.expanding([min_periods, method]) Return an expanding grouper, providing expanding functionality per group.
DataFrameGroupBy.ffill([limit]) Forward fill the values.
DataFrameGroupBy.first([numeric_only, ...]) Compute the first entry of each column within each group.
DataFrameGroupBy.head([n]) Return first n rows of each group.
DataFrameGroupBy.idxmax([skipna, numeric_only]) Return index of first occurrence of maximum in each group.
DataFrameGroupBy.idxmin([skipna, numeric_only]) Return index of first occurrence of minimum in each group.
DataFrameGroupBy.last([numeric_only, ...]) Compute the last entry of each column within each group.
DataFrameGroupBy.max([numeric_only, ...]) Compute max of group values.
DataFrameGroupBy.mean([numeric_only, ...]) Compute mean of groups, excluding missing values.
DataFrameGroupBy.median([numeric_only, skipna]) Compute median of groups, excluding missing values.
DataFrameGroupBy.min([numeric_only, ...]) Compute min of group values.
DataFrameGroupBy.ngroup([ascending]) Number each group from 0 to the number of groups - 1.
DataFrameGroupBy.nth Take the nth row from each group if n is an int, otherwise a subset of rows.
DataFrameGroupBy.nunique([dropna]) Return DataFrame with counts of unique elements in each position.
DataFrameGroupBy.ohlc() Compute open, high, low and close values of a group, excluding missing values.
DataFrameGroupBy.pct_change([periods, ...]) Calculate pct_change of each value to previous entry in group.
DataFrameGroupBy.prod([numeric_only, ...]) Compute prod of group values.
DataFrameGroupBy.quantile([q, ...]) Return group values at the given quantile, a la numpy.percentile.
DataFrameGroupBy.rank([method, ascending, ...]) Provide the rank of values within each group.
DataFrameGroupBy.resample(rule, *args[, ...]) Provide resampling when using a TimeGrouper.
DataFrameGroupBy.rolling(window[, ...]) Return a rolling grouper, providing rolling functionality per group.
DataFrameGroupBy.sample([n, frac, replace, ...]) Return a random sample of items from each group.
DataFrameGroupBy.sem([ddof, numeric_only, ...]) Compute standard error of the mean of groups, excluding missing values.
DataFrameGroupBy.shift([periods, freq, ...]) Shift each group by periods observations.
DataFrameGroupBy.size() Compute group sizes.
DataFrameGroupBy.skew([skipna, numeric_only]) Return unbiased skew within groups.
DataFrameGroupBy.kurt([skipna, numeric_only]) Return unbiased kurtosis within groups.
DataFrameGroupBy.std([ddof, engine, ...]) Compute standard deviation of groups, excluding missing values.
DataFrameGroupBy.sum([numeric_only, ...]) Compute sum of group values.
DataFrameGroupBy.var([ddof, engine, ...]) Compute variance of groups, excluding missing values.
DataFrameGroupBy.tail([n]) Return last n rows of each group.
DataFrameGroupBy.take(indices, **kwargs) Return the elements in the given positional indices in each group.
DataFrameGroupBy.value_counts([subset, ...]) Return a Series or DataFrame containing counts of unique rows.

SeriesGroupBy computations / descriptive stats

Function Description
SeriesGroupBy.all([skipna]) Return True if all values in the group are truthful, else False.
SeriesGroupBy.any([skipna]) Return True if any value in the group is truthful, else False.
SeriesGroupBy.bfill([limit]) Backward fill the values.
SeriesGroupBy.corr(other[, method, min_periods]) Compute correlation between each group and another Series.
SeriesGroupBy.count() Compute count of group, excluding missing values.
SeriesGroupBy.cov(other[, min_periods, ddof]) Compute covariance between each group and another Series.
SeriesGroupBy.cumcount([ascending]) Number each item in each group from 0 to the length of that group - 1.
SeriesGroupBy.cummax([numeric_only]) Cumulative max for each group.
SeriesGroupBy.cummin([numeric_only]) Cumulative min for each group.
SeriesGroupBy.cumprod([numeric_only]) Cumulative product for each group.
SeriesGroupBy.cumsum([numeric_only]) Cumulative sum for each group.
SeriesGroupBy.describe([percentiles, ...]) Generate descriptive statistics.
SeriesGroupBy.diff([periods]) First discrete difference of element.
SeriesGroupBy.ewm([com, span, halflife, ...]) Return an ewm grouper, providing ewm functionality per group.
SeriesGroupBy.expanding([min_periods, method]) Return an expanding grouper, providing expanding functionality per group.
SeriesGroupBy.ffill([limit]) Forward fill the values.
SeriesGroupBy.first([numeric_only, ...]) Compute the first entry of each column within each group.
SeriesGroupBy.head([n]) Return first n rows of each group.
SeriesGroupBy.last([numeric_only, ...]) Compute the last entry of each column within each group.
SeriesGroupBy.idxmax([skipna]) Return the row label of the maximum value.
SeriesGroupBy.idxmin([skipna]) Return the row label of the minimum value.
SeriesGroupBy.is_monotonic_increasing Return whether each group's values are monotonically increasing.
SeriesGroupBy.is_monotonic_decreasing Return whether each group's values are monotonically decreasing.
SeriesGroupBy.max([numeric_only, min_count, ...]) Compute max of group values.
SeriesGroupBy.mean([numeric_only, skipna, ...]) Compute mean of groups, excluding missing values.
SeriesGroupBy.median([numeric_only, skipna]) Compute median of groups, excluding missing values.
SeriesGroupBy.min([numeric_only, min_count, ...]) Compute min of group values.
SeriesGroupBy.ngroup([ascending]) Number each group from 0 to the number of groups - 1.
SeriesGroupBy.nlargest([n, keep]) Return the largest n elements.
SeriesGroupBy.nsmallest([n, keep]) Return the smallest n elements.
SeriesGroupBy.nth Take the nth row from each group if n is an int, otherwise a subset of rows.
SeriesGroupBy.nunique([dropna]) Return number of unique elements in the group.
SeriesGroupBy.unique() Return unique values for each group.
SeriesGroupBy.ohlc() Compute open, high, low and close values of a group, excluding missing values.
SeriesGroupBy.pct_change([periods, ...]) Calculate pct_change of each value to previous entry in group.
SeriesGroupBy.prod([numeric_only, ...]) Compute prod of group values.
SeriesGroupBy.quantile([q, interpolation, ...]) Return group values at the given quantile, a la numpy.percentile.
SeriesGroupBy.rank([method, ascending, ...]) Provide the rank of values within each group.
SeriesGroupBy.resample(rule, *args[, ...]) Provide resampling when using a TimeGrouper.
SeriesGroupBy.rolling(window[, min_periods, ...]) Return a rolling grouper, providing rolling functionality per group.
SeriesGroupBy.sample([n, frac, replace, ...]) Return a random sample of items from each group.
SeriesGroupBy.sem([ddof, numeric_only, skipna]) Compute standard error of the mean of groups, excluding missing values.
SeriesGroupBy.shift([periods, freq, ...]) Shift each group by periods observations.
SeriesGroupBy.size() Compute group sizes.
SeriesGroupBy.skew([skipna, numeric_only]) Return unbiased skew within groups.
SeriesGroupBy.kurt([skipna, numeric_only]) Return unbiased kurtosis within groups.
SeriesGroupBy.std([ddof, engine, ...]) Compute standard deviation of groups, excluding missing values.
SeriesGroupBy.sum([numeric_only, min_count, ...]) Compute sum of group values.
SeriesGroupBy.var([ddof, engine, ...]) Compute variance of groups, excluding missing values.
SeriesGroupBy.tail([n]) Return last n rows of each group.
SeriesGroupBy.take(indices, **kwargs) Return the elements in the given positional indices in each group.
SeriesGroupBy.value_counts([normalize, ...]) Return a Series or DataFrame containing counts of unique rows.

Plotting and visualization

Function Description
DataFrameGroupBy.boxplot([subplots, column, ...]) Make box plots from DataFrameGroupBy data.
DataFrameGroupBy.hist([column, by, grid, ...]) Make a histogram of the DataFrame's columns.
SeriesGroupBy.hist([by, ax, grid, ...]) Draw histogram for each group's values using Series.hist() API.
DataFrameGroupBy.plot Make plots of groups from a DataFrame.
SeriesGroupBy.plot Make plots of groups from a Series.

Examples for pandas.api.typing.DataFrameGroupBy.__iter__

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> for x, y in ser.groupby(level=0):
...     print(f"{x}\n{y}\n")
a
a    1
a    2
dtype: int64
b
b    3
dtype: int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
>>> for x, y in df.groupby(by=["a"]):
...     print(f"{x}\n{y}\n")
(1,)
   a  b  c
0  1  2  3
1  1  5  6
(7,)
   a  b  c
2  7  8  9

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> for x, y in ser.resample("MS"):
...     print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01    1
2023-01-15    2
dtype: int64
2023-02-01 00:00:00
2023-02-01    3
2023-02-15    4
dtype: int64

Examples for pandas.api.typing.SeriesGroupBy.__iter__

Identical to the examples for pandas.api.typing.DataFrameGroupBy.__iter__ above.

Examples for pandas.api.typing.DataFrameGroupBy.groups

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
 Timestamp('2023-02-01 00:00:00'): np.int64(4)}

Examples for pandas.api.typing.SeriesGroupBy.groups

Identical to the examples for pandas.api.typing.DataFrameGroupBy.groups above.

Examples for pandas.api.typing.DataFrameGroupBy.indices

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
...     data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})

Examples for pandas.api.typing.SeriesGroupBy.indices

Identical to the examples for pandas.api.typing.DataFrameGroupBy.indices above.

Examples for pandas.api.typing.DataFrameGroupBy.get_group

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a    1
a    2
dtype: int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
...     data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
>>> df.groupby(by=["a"]).get_group((1,))
        a  b  c
owl     1  2  3
toucan  1  5  6

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01    1
2023-01-15    2
dtype: int64

Examples for pandas.api.typing.SeriesGroupBy.get_group

Identical to the examples for pandas.api.typing.DataFrameGroupBy.get_group above.

Examples for pandas.Grouper

df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')

>>> df = pd.DataFrame(
...     {
...         "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
...         "Speed": [100, 5, 200, 300, 15],
...     }
... )
>>> df
   Animal  Speed
0  Falcon    100
1  Parrot      5
2  Falcon    200
3  Falcon    300
4  Parrot     15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
        Speed
Animal
Falcon  200.0
Parrot   10.0

Specify a resample operation on the column 'Publish date'

>>> df = pd.DataFrame(
...     {
...         "Publish date": [
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-02"),
...             pd.Timestamp("2000-01-09"),
...             pd.Timestamp("2000-01-16"),
...         ],
...         "ID": [0, 1, 2, 3],
...         "Price": [10, 20, 30, 40],
...     }
... )
>>> df
  Publish date  ID  Price
0   2000-01-02   0     10
1   2000-01-02   1     20
2   2000-01-09   2     30
3   2000-01-16   3     40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
               ID  Price
Publish date
2000-01-02    0.5   15.0
2000-01-09    2.0   30.0
2000-01-16    3.0   40.0

If you want to adjust the start of the bins based on a fixed timestamp:

>>> start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
>>> rng = pd.date_range(start, end, freq="7min")
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min")).sum()
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum()
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="2000-01-01")).sum()
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the following two lines are equivalent:

>>> ts.groupby(pd.Grouper(freq="17min", origin="start")).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum()
2000-10-01 23:30:00     9
2000-10-01 23:47:00    21
2000-10-02 00:04:00    54
2000-10-02 00:21:00    24
Freq: 17min, dtype: int64

To replace the use of the deprecated base argument, you can now use offset; in this example it is equivalent to base=2:

>>> ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum()
2000-10-01 23:16:00     0
2000-10-01 23:33:00     9
2000-10-01 23:50:00    36
2000-10-02 00:07:00    39
2000-10-02 00:24:00    24
Freq: 17min, dtype: int64

Examples for pandas.NamedAgg

>>> df = pd.DataFrame({"key": [1, 1, 2], "a": [-1, 0, 1], 1: [10, 11, 12]})
>>> agg_a = pd.NamedAgg(column="a", aggfunc="min")
>>> agg_1 = pd.NamedAgg(column=1, aggfunc=lambda x: np.mean(x))
>>> df.groupby("key").agg(result_a=agg_a, result_1=agg_1)
    result_a  result_1
key
1          -1      10.5
2           1      12.0
>>> def n_between(ser, low, high, **kwargs):
...     return ser.between(low, high, **kwargs).sum()
>>> agg_between = pd.NamedAgg("a", n_between, 0, 1)
>>> df.groupby("key").agg(count_between=agg_between)
    count_between
key
1               1
2               1
>>> agg_between_kw = pd.NamedAgg("a", n_between, 0, 1, inclusive="both")
>>> df.groupby("key").agg(count_between_kw=agg_between_kw)
    count_between_kw
key
1                   1
2                   1
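
Because NamedAgg is a namedtuple, a plain (column, aggfunc) tuple is accepted in the same positions; a minimal sketch using the df above:

>>> df.groupby("key").agg(result_a=("a", "min"))  # plain tuple instead of NamedAgg
     result_a
key
1          -1
2           1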

Examples for pandas.api.typing.SeriesGroupBy.apply

>>> s = pd.Series([0, 1, 2], index="a a b".split())
>>> g1 = s.groupby(s.index, group_keys=False)
>>> g2 = s.groupby(s.index, group_keys=True)

From s above we can see that g1 and g2 each have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:

Example 1: The function passed to apply takes a Series as its argument and returns a Series. apply combines the result for each group together into a new Series.

The resulting dtype will reflect the return value of the passed func.

>>> g1.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a    0.0
a    2.0
b    1.0
dtype: float64

In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:

>>> g2.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a  a    0.0
   a    2.0
b  b    1.0
dtype: float64

Example 2: The function passed to apply takes a Series as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:

>>> g1.apply(lambda x: x.max() - x.min())
a    1
b    0
dtype: int64

The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.

>>> g2.apply(lambda x: x.max() - x.min())
a    1
b    0
dtype: int64

Examples for pandas.api.typing.DataFrameGroupBy.apply

>>> df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
>>> g1 = df.groupby("A", group_keys=False)
>>> g2 = df.groupby("A", group_keys=True)

Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:

Example 1: The function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:

>>> g1[["B", "C"]].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:

>>> g2[["B", "C"]].apply(lambda x: x / x.sum())
            B    C
A
a 0  0.333333  0.4
  1  0.666667  0.6
b 2  1.000000  1.0

Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.

The resulting dtype will reflect the return value of the passed func.

>>> g1[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0
>>> g2[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0

The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.

Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:

>>> g1.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64

Example 4: The function passed to apply returns None for one of the groups. That group is filtered out of the result:

>>> g1.apply(lambda x: None if x.iloc[0, 0] == 3 else x)
   B  C
0  1  4
1  2  6

Examples for pandas.api.typing.SeriesGroupBy.agg

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.groupby([1, 1, 2, 2]).min()
1    1
2    3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg("min")
1    1
2    3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg(["min", "max"])
   min  max
1    1    2
2    3    4

The output column names can be controlled by passing the desired column names and aggregations as keyword arguments.

>>> s.groupby([1, 1, 2, 2]).agg(
...     minimum="min",
...     maximum="max",
... )
   minimum  maximum
1        1        2
2        3        4

The resulting dtype will reflect the return value of the aggregating function.

>>> s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
1    1.0
2    3.0
dtype: float64

Examples for pandas.api.typing.DataFrameGroupBy.agg

>>> data = {
...     "A": [1, 1, 2, 2],
...     "B": [1, 2, 3, 4],
...     "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
>>> df = pd.DataFrame(data)
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

The aggregation is for each column.

>>> df.groupby("A").agg("min")
   B         C
A
1  1  0.227877
2  3 -0.562860

Multiple aggregations

>>> df.groupby("A").agg(["min", "max"])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a column for aggregation

>>> df.groupby("A").B.agg(["min", "max"])
   min  max
A
1    1    2
2    3    4

User-defined function for aggregation

>>> df.groupby("A").agg(lambda x: sum(x) + 2)
   B         C
A
1  5  2.590715
2  9  2.704907

Different aggregations per column

>>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907

To control the output names with different aggregations per column, pandas supports “named aggregation”

>>> df.groupby("A").agg(
...     b_min=pd.NamedAgg(column="B", aggfunc="min"),
...     c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
... )
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907
  • The keywords are the output column names.
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

See Named aggregation for more.

The resulting dtype will reflect the return value of the aggregating function.

>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
      B
A
1   1.0
2   3.0

Examples for pandas.api.typing.SeriesGroupBy.aggregate

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.groupby([1, 1, 2, 2]).min()
1    1
2    3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg("min")
1    1
2    3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg(["min", "max"])
   min  max
1    1    2
2    3    4

The output column names can be controlled by passing the desired column names and aggregations as keyword arguments.

>>> s.groupby([1, 1, 2, 2]).agg(
...     minimum="min",
...     maximum="max",
... )
   minimum  maximum
1        1        2
2        3        4

The resulting dtype will reflect the return value of the aggregating function.

>>> s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
1    1.0
2    3.0
dtype: float64

Examples for pandas.api.typing.DataFrameGroupBy.aggregate

>>> data = {
...     "A": [1, 1, 2, 2],
...     "B": [1, 2, 3, 4],
...     "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
>>> df = pd.DataFrame(data)
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

The aggregation is for each column.

>>> df.groupby("A").agg("min")
   B         C
A
1  1  0.227877
2  3 -0.562860

Multiple aggregations

>>> df.groupby("A").agg(["min", "max"])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a column for aggregation

>>> df.groupby("A").B.agg(["min", "max"])
   min  max
A
1    1    2
2    3    4

User-defined function for aggregation

>>> df.groupby("A").agg(lambda x: sum(x) + 2)
   B         C
A
1  5  2.590715
2  9  2.704907

Different aggregations per column

>>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907

To control the output names with different aggregations per column, pandas supports “named aggregation”

>>> df.groupby("A").agg(
...     b_min=pd.NamedAgg(column="B", aggfunc="min"),
...     c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
... )
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907
  • The keywords are the output column names.
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

See Named aggregation for more.

The resulting dtype will reflect the return value of the aggregating function.

>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
      B
A
1   1.0
2   3.0

Examples for pandas.api.typing.SeriesGroupBy.transform

>>> ser = pd.Series(
...     [390.0, 350.0, 30.0, 20.0],
...     index=["Falcon", "Falcon", "Parrot", "Parrot"],
...     name="Max Speed",
... )
>>> grouped = ser.groupby([1, 1, 2, 2])
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
Falcon    0.707107
Falcon   -0.707107
Parrot    0.707107
Parrot   -0.707107
Name: Max Speed, dtype: float64

Broadcast result of the transformation

>>> grouped.transform(lambda x: x.max() - x.min())
Falcon    40.0
Falcon    40.0
Parrot    10.0
Parrot    10.0
Name: Max Speed, dtype: float64
>>> grouped.transform("mean")
Falcon    370.0
Falcon    370.0
Parrot     25.0
Parrot     25.0
Name: Max Speed, dtype: float64

The resulting dtype will reflect the return value of the passed func, for example:

>>> grouped.transform(lambda x: x.astype(int).max())
Falcon    390
Falcon    390
Parrot     30
Parrot     30
Name: Max Speed, dtype: int64

Examples for pandas.api.typing.DataFrameGroupBy.transform

>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
...         "B": ["one", "one", "two", "three", "two", "two"],
...         "C": [1, 5, 5, 2, 5, 5],
...         "D": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
...     }
... )
>>> grouped = df.groupby("A")[["C", "D"]]
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
          C         D
0 -1.154701 -0.577350
1  0.577350  0.000000
2  0.577350  1.154701
3 -1.154701 -1.000000
4  0.577350 -0.577350
5  0.577350  1.000000

Broadcast result of the transformation

>>> grouped.transform(lambda x: x.max() - x.min())
     C    D
0  4.0  6.0
1  3.0  8.0
2  4.0  6.0
3  3.0  8.0
4  4.0  6.0
5  3.0  8.0
>>> grouped.transform("mean")
          C    D
0  3.666667  4.0
1  4.000000  5.0
2  3.666667  4.0
3  4.000000  5.0
4  3.666667  4.0
5  4.000000  5.0

The resulting dtype will reflect the return value of the passed func, for example:

>>> grouped.transform(lambda x: x.astype(int).max())
   C  D
0  5  8
1  5  9
2  5  8
3  5  9
4  5  8
5  5  9

Examples for pandas.api.typing.SeriesGroupBy.pipe

>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
   A  B
0  a  1
1  b  2
2  a  3
3  b  4

To get the difference between each group's maximum and minimum value in one pass, you can do

>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
   B
A
a  2
b  2

Examples for pandas.api.typing.DataFrameGroupBy.pipe

>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
   A  B
0  a  1
1  b  2
2  a  3
3  b  4

To get the difference between each group's maximum and minimum value in one pass, you can do

>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
   B
A
a  2
b  2

Window

pandas.api.typing.Rolling instances are returned by .rolling calls: pandas.DataFrame.rolling() and pandas.Series.rolling(). pandas.api.typing.Expanding instances are returned by .expanding calls: pandas.DataFrame.expanding() and pandas.Series.expanding(). pandas.api.typing.ExponentialMovingWindow instances are returned by .ewm calls: pandas.DataFrame.ewm() and pandas.Series.ewm().

Rolling window functions

Function Description
Rolling.count([numeric_only]) Calculate the rolling count of non-NaN observations.
Rolling.sum([numeric_only, engine, ...]) Calculate the rolling sum.
Rolling.mean([numeric_only, engine, ...]) Calculate the rolling mean.
Rolling.median([numeric_only, engine, ...]) Calculate the rolling median.
Rolling.var([ddof, numeric_only, engine, ...]) Calculate the rolling variance.
Rolling.std([ddof, numeric_only, engine, ...]) Calculate the rolling standard deviation.
Rolling.min([numeric_only, engine, ...]) Calculate the rolling minimum.
Rolling.max([numeric_only, engine, ...]) Calculate the rolling maximum.
Rolling.first([numeric_only]) Calculate the rolling first (left-most) element of the window.
Rolling.last([numeric_only]) Calculate the rolling last (right-most) element of the window.
Rolling.corr([other, pairwise, ddof, ...]) Calculate the rolling correlation.
Rolling.cov([other, pairwise, ddof, ...]) Calculate the rolling sample covariance.
Rolling.skew([numeric_only]) Calculate the rolling unbiased skewness.
Rolling.kurt([numeric_only]) Calculate the rolling Fisher's definition of kurtosis without bias.
Rolling.apply(func[, raw, engine, ...]) Calculate the rolling custom aggregation function.
Rolling.pipe(func, *args, **kwargs) Apply a func with arguments to this Rolling object and return its result.
Rolling.aggregate([func]) Aggregate using one or more operations over the specified axis.
Rolling.quantile(q[, interpolation, ...]) Calculate the rolling quantile.
Rolling.sem([ddof, numeric_only]) Calculate the rolling standard error of mean.
Rolling.rank([method, ascending, pct, ...]) Calculate the rolling rank.
Rolling.nunique([numeric_only]) Calculate the rolling nunique.

Weighted window functions

Function Description
Window.mean([numeric_only]) Calculate the rolling weighted window mean.
Window.sum([numeric_only]) Calculate the rolling weighted window sum.
Window.var([ddof, numeric_only]) Calculate the rolling weighted window variance.
Window.std([ddof, numeric_only]) Calculate the rolling weighted window standard deviation.
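
Weighted windows require scipy, which supplies the window shapes named by win_type. A minimal sketch, assuming scipy is installed: with the rectangular "boxcar" window the weights are uniform, so the weighted sum matches the plain rolling sum:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2, win_type="boxcar").sum()  # requires scipy; boxcar = uniform weights
0    NaN
1    3.0
2    5.0
3    7.0
dtype: float64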

Expanding window functions

Function Description
Expanding.count([numeric_only]) Calculate the expanding count of non-NaN observations.
Expanding.sum([numeric_only, engine, ...]) Calculate the expanding sum.
Expanding.mean([numeric_only, engine, ...]) Calculate the expanding mean.
Expanding.median([numeric_only, engine, ...]) Calculate the expanding median.
Expanding.var([ddof, numeric_only, engine, ...]) Calculate the expanding variance.
Expanding.std([ddof, numeric_only, engine, ...]) Calculate the expanding standard deviation.
Expanding.min([numeric_only, engine, ...]) Calculate the expanding minimum.
Expanding.max([numeric_only, engine, ...]) Calculate the expanding maximum.
Expanding.first([numeric_only]) Calculate the expanding first (left-most) element of the window.
Expanding.last([numeric_only]) Calculate the expanding last (right-most) element of the window.
Expanding.corr([other, pairwise, ddof, ...]) Calculate the expanding correlation.
Expanding.cov([other, pairwise, ddof, ...]) Calculate the expanding sample covariance.
Expanding.skew([numeric_only]) Calculate the expanding unbiased skewness.
Expanding.kurt([numeric_only]) Calculate the expanding Fisher's definition of kurtosis without bias.
Expanding.apply(func[, raw, engine, ...]) Calculate the expanding custom aggregation function.
Expanding.pipe(func, *args, **kwargs) Apply a func with arguments to this Expanding object and return its result.
Expanding.aggregate([func]) Aggregate using one or more operations over the specified axis.
Expanding.quantile(q[, interpolation, ...]) Calculate the expanding quantile.
Expanding.sem([ddof, numeric_only]) Calculate the expanding standard error of mean.
Expanding.rank([method, ascending, pct, ...]) Calculate the expanding rank.
Expanding.nunique([numeric_only]) Calculate the expanding nunique.
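
Expanding windows grow from the start of the data instead of sliding; a minimal sketch with a minimum of two observations per window:

>>> s = pd.Series([1, 2, 3, 4])
>>> s.expanding(2).sum()  # min_periods=2
0     NaN
1     3.0
2     6.0
3    10.0
dtype: float64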

Exponentially-weighted window functions

Function Description
ExponentialMovingWindow.mean([numeric_only, ...]) Calculate the ewm (exponential weighted moment) mean.
ExponentialMovingWindow.sum([numeric_only, ...]) Calculate the ewm (exponential weighted moment) sum.
ExponentialMovingWindow.std([bias, numeric_only]) Calculate the ewm (exponential weighted moment) standard deviation.
ExponentialMovingWindow.var([bias, numeric_only]) Calculate the ewm (exponential weighted moment) variance.
ExponentialMovingWindow.corr([other, ...]) Calculate the ewm (exponential weighted moment) sample correlation.
ExponentialMovingWindow.cov([other, ...]) Calculate the ewm (exponential weighted moment) sample covariance.
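
A minimal sketch of an exponentially-weighted mean with center of mass 0.5; missing values are skipped by default:

>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df.ewm(com=0.5).mean()  # com=0.5 means alpha = 2/3
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213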

Window indexer

Base class for defining custom window boundaries.

Function Description
api.indexers.BaseIndexer([index_array, ...]) Base class for window bounds calculations.
api.indexers.FixedForwardWindowIndexer([...]) Creates window boundaries for fixed-length windows that include the current row.
api.indexers.VariableOffsetWindowIndexer([...]) Calculate window boundaries based on a non-fixed offset such as a BusinessDay.
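
A minimal sketch of a forward-looking window built with FixedForwardWindowIndexer:

>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()  # each window spans the current row and the next
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0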

Examples for pandas.api.typing.Rolling.count

>>> s = pd.Series([2, 3, np.nan, 10])
>>> s.rolling(2).count()
0    NaN
1    2.0
2    1.0
3    1.0
dtype: float64
>>> s.rolling(3).count()
0    NaN
1    NaN
2    2.0
3    2.0
dtype: float64
>>> s.rolling(4).count()
0    NaN
1    NaN
2    NaN
3    3.0
dtype: float64

Examples for pandas.api.typing.Rolling.sum

>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s
0    1
1    2
2    3
3    4
4    5
dtype: int64
>>> s.rolling(3).sum()
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
dtype: float64
>>> s.rolling(3, center=True).sum()
0     NaN
1     6.0
2     9.0
3    12.0
4     NaN
dtype: float64

For DataFrame, each sum is computed column-wise.

>>> df = pd.DataFrame({"A": s, "B": s**2})
>>> df
   A   B
0  1   1
1  2   4
2  3   9
3  4  16
4  5  25
>>> df.rolling(3).sum()
     A     B
0   NaN   NaN
1   NaN   NaN
2   6.0  14.0
3   9.0  29.0
4  12.0  50.0

Examples for pandas.api.typing.Rolling.mean

The below examples will show rolling mean calculations with window sizes of two and three, respectively.

>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).mean()
0    NaN
1    1.5
2    2.5
3    3.5
dtype: float64
>>> s.rolling(3).mean()
0    NaN
1    NaN
2    2.0
3    3.0
dtype: float64

Examples for pandas.api.typing.Rolling.median

Compute the rolling median of a series with a window size of 3.

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.rolling(3).median()
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
dtype: float64

Examples for pandas.api.typing.Rolling.var

>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).var()
0         NaN
1         NaN
2    0.333333
3    1.000000
4    1.000000
5    1.333333
6    0.000000
dtype: float64

Examples for pandas.api.typing.Rolling.std

>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).std()
0         NaN
1         NaN
2    0.577350
3    1.000000
4    1.000000
5    1.154701
6    0.000000
dtype: float64

Examples for pandas.api.typing.Rolling.min

Performing a rolling minimum with a window size of 3.

>>> s = pd.Series([4, 3, 5, 2, 6])
>>> s.rolling(3).min()
0    NaN
1    NaN
2    3.0
3    2.0
4    2.0
dtype: float64

Examples for pandas.api.typing.Rolling.max

>>> ser = pd.Series([1, 2, 3, 4])
>>> ser.rolling(2).max()
0    NaN
1    2.0
2    3.0
3    4.0
dtype: float64

Examples for pandas.api.typing.Rolling.first

The example below will show a rolling calculation with a window size of three.

>>> s = pd.Series(range(5))
>>> s.rolling(3).first()
0         NaN
1         NaN
2         0.0
3         1.0
4         2.0
dtype: float64

Examples for pandas.api.typing.Rolling.last

The example below will show a rolling calculation with a window size of three.

>>> s = pd.Series(range(5))
>>> s.rolling(3).last()
0         NaN
1         NaN
2         2.0
3         3.0
4         4.0
dtype: float64

Examples for pandas.api.typing.Rolling.corr

The below example shows a rolling calculation with a window size of four matching the equivalent function call using numpy.corrcoef().

>>> v1 = [3, 3, 3, 5, 8]
>>> v2 = [3, 4, 4, 4, 8]
>>> np.corrcoef(v1[:-1], v2[:-1])
array([[1.        , 0.33333333],
       [0.33333333, 1.        ]])
>>> np.corrcoef(v1[1:], v2[1:])
array([[1.       , 0.9169493],
       [0.9169493, 1.       ]])
>>> s1 = pd.Series(v1)
>>> s2 = pd.Series(v2)
>>> s1.rolling(4).corr(s2)
0         NaN
1         NaN
2         NaN
3    0.333333
4    0.916949
dtype: float64

The below example shows a similar rolling calculation on a DataFrame using the pairwise option.

>>> matrix = np.array(
...     [[51.0, 35.0], [49.0, 30.0], [47.0, 32.0], [46.0, 31.0], [50.0, 36.0]]
... )
>>> np.corrcoef(matrix[:-1, 0], matrix[:-1, 1])
array([[1.       , 0.6263001],
       [0.6263001, 1.       ]])
>>> np.corrcoef(matrix[1:, 0], matrix[1:, 1])
array([[1.        , 0.55536811],
       [0.55536811, 1.        ]])
>>> df = pd.DataFrame(matrix, columns=["X", "Y"])
>>> df
      X     Y
0  51.0  35.0
1  49.0  30.0
2  47.0  32.0
3  46.0  31.0
4  50.0  36.0
>>> df.rolling(4).corr(pairwise=True)
            X         Y
0 X        NaN       NaN
  Y        NaN       NaN
1 X        NaN       NaN
  Y        NaN       NaN
2 X        NaN       NaN
  Y        NaN       NaN
3 X   1.000000  0.626300
  Y   0.626300  1.000000
4 X   1.000000  0.555368
  Y   0.555368  1.000000

Examples for pandas.api.typing.Rolling.cov

>>> ser1 = pd.Series([1, 2, 3, 4])
>>> ser2 = pd.Series([1, 4, 5, 8])
>>> ser1.rolling(2).cov(ser2)
0    NaN
1    1.5
2    0.5
3    1.5
dtype: float64

Examples for pandas.api.typing.Rolling.skew

>>> ser = pd.Series([1, 5, 2, 7, 15, 6])
>>> ser.rolling(3).skew().round(6)
0         NaN
1         NaN
2    1.293343
3   -0.585583
4    0.670284
5    1.652317
dtype: float64

Examples for pandas.api.typing.Rolling.kurt

The example below will show a rolling calculation with a window size of four matching the equivalent function call using scipy.stats.

>>> arr = [1, 2, 3, 4, 999]
>>> import scipy.stats
>>> print(f"{scipy.stats.kurtosis(arr[:-1], bias=False):.6f}")
-1.200000
>>> print(f"{scipy.stats.kurtosis(arr[1:], bias=False):.6f}")
3.999946
>>> s = pd.Series(arr)
>>> s.rolling(4).kurt()
0         NaN
1         NaN
2         NaN
3   -1.200000
4    3.999946
dtype: float64

Examples for pandas.api.typing.Rolling.apply

>>> ser = pd.Series([1, 6, 5, 4])
>>> ser.rolling(2).apply(lambda s: s.sum() - s.min())
0    NaN
1    6.0
2    6.0
3    5.0
dtype: float64

Examples for pandas.api.typing.Rolling.pipe

>>> df = pd.DataFrame(
...     {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
            A
2012-08-02  1
2012-08-03  2
2012-08-04  3
2012-08-05  4

To get the difference between each rolling 2-day window’s maximum and minimum value in one pass, you can do

>>> df.rolling("2D").pipe(lambda x: x.max() - x.min())
            A
2012-08-02  0.0
2012-08-03  1.0
2012-08-04  1.0
2012-08-05  1.0

Examples for pandas.api.typing.Rolling.aggregate

>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
>>> df
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
>>> df.rolling(2).sum()
    A     B     C
0  NaN   NaN   NaN
1  3.0   9.0  15.0
2  5.0  11.0  17.0
>>> df.rolling(2).agg({"A": "sum", "B": "min"})
    A    B
0  NaN  NaN
1  3.0  4.0
2  5.0  5.0

Examples for pandas.api.typing.Rolling.quantile

>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).quantile(0.4, interpolation="lower")
0    NaN
1    1.0
2    2.0
3    3.0
dtype: float64
>>> s.rolling(2).quantile(0.4, interpolation="midpoint")
0    NaN
1    1.5
2    2.5
3    3.5
dtype: float64

Examples for pandas.api.typing.Rolling.sem

>>> s = pd.Series([0, 1, 2, 3])
>>> s.rolling(2, min_periods=1).sem()
0    NaN
1    0.5
2    0.5
3    0.5
dtype: float64

Examples for pandas.api.typing.Rolling.rank

>>> s = pd.Series([1, 4, 2, 3, 5, 3])
>>> s.rolling(3).rank()
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
dtype: float64
>>> s.rolling(3).rank(method="max")
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
dtype: float64
>>> s.rolling(3).rank(method="min")
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.0
dtype: float64
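
Rolling.nunique, listed in the table above, counts the distinct values in each window; a minimal sketch:

>>> s = pd.Series([1, 4, 2, 2, 5])
>>> s.rolling(3).nunique()  # distinct values per 3-wide window
0    NaN
1    NaN
2    3.0
3    2.0
4    2.0
dtype: float64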

Resampling

pandas.api.typing.Resampler instances are returned by resample calls: pandas.DataFrame.resample(), pandas.Series.resample().

Indexing, iteration

Function Description
Resampler.__iter__() Groupby iterator.
Resampler.groups Dict {group name -> group labels}.
Resampler.indices Dict {group name -> group indices}.
Resampler.get_group(name) Construct DataFrame from group with provided name.

Function application

Function Description
Resampler.apply([func]) Aggregate using one or more operations over the specified axis.
Resampler.aggregate([func]) Aggregate using one or more operations over the specified axis.
Resampler.transform(arg, *args, **kwargs) Call function producing a like-indexed Series on each group.
Resampler.pipe(func, *args, **kwargs) Apply a func with arguments to this Resampler object and return its result.

Upsampling

Function Description
Resampler.ffill([limit]) Forward fill the values.
Resampler.bfill([limit]) Backward fill the new missing values in the resampled data.
Resampler.nearest([limit]) Resample by using the nearest value.
Resampler.asfreq([fill_value]) Return the values at the new freq, essentially a reindex.
Resampler.interpolate([method, axis, limit, ...]) Interpolate values between target timestamps according to different methods.

Computations / descriptive stats

Function Description
Resampler.count() Compute count of group, excluding missing values.
Resampler.nunique() Return number of unique elements in the group.
Resampler.first([numeric_only, min_count, ...]) Compute the first non-null entry of each column.
Resampler.last([numeric_only, min_count, skipna]) Compute the last non-null entry of each column.
Resampler.max([numeric_only, min_count]) Compute max value of group.
Resampler.mean([numeric_only]) Compute mean of groups, excluding missing values.
Resampler.median([numeric_only]) Compute median of groups, excluding missing values.
Resampler.min([numeric_only, min_count]) Compute min value of group.
Resampler.ohlc() Compute open, high, low and close values of a group, excluding missing values.
Resampler.prod([numeric_only, min_count]) Compute prod of group values.
Resampler.size() Compute group sizes.
Resampler.sem([ddof, numeric_only]) Compute standard error of the mean of groups, excluding missing values.
Resampler.std([ddof, numeric_only]) Compute standard deviation of groups, excluding missing values.
Resampler.sum([numeric_only, min_count]) Compute sum of group values.
Resampler.var([ddof, numeric_only]) Compute variance of groups, excluding missing values.
Resampler.quantile([q]) Return value at the given quantile.

Examples for pandas.api.typing.Resampler.__iter__

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> for x, y in ser.groupby(level=0):
...     print(f"{x}\n{y}\n")
a
a    1
a    2
dtype: int64
b
b    3
dtype: int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
>>> for x, y in df.groupby(by=["a"]):
...     print(f"{x}\n{y}\n")
(1,)
   a  b  c
0  1  2  3
1  1  5  6
(7,)
   a  b  c
2  7  8  9

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> for x, y in ser.resample("MS"):
...     print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01    1
2023-01-15    2
dtype: int64
2023-02-01 00:00:00
2023-02-01    3
2023-02-15    4
dtype: int64

Examples for pandas.api.typing.Resampler.groups

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
 Timestamp('2023-02-01 00:00:00'): np.int64(4)}

Examples for pandas.api.typing.Resampler.indices

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
...     data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})

Examples for pandas.api.typing.Resampler.get_group

For SeriesGroupBy:

>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a    1
a    2
b    3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a    1
a    2
dtype: int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
...     data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
>>> df.groupby(by=["a"]).get_group((1,))
        a  b  c
owl     1  2  3
toucan  1  5  6

For Resampler:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01    1
2023-01-15    2
dtype: int64

Examples for pandas.api.typing.Resampler.apply

>>> s = pd.Series(
...     [1, 2, 3, 4, 5], index=pd.date_range("20130101", periods=5, freq="s")
... )
>>> s
2013-01-01 00:00:00    1
2013-01-01 00:00:01    2
2013-01-01 00:00:02    3
2013-01-01 00:00:03    4
2013-01-01 00:00:04    5
Freq: s, dtype: int64
>>> r = s.resample("2s")
>>> r.agg("sum")
2013-01-01 00:00:00    3
2013-01-01 00:00:02    7
2013-01-01 00:00:04    5
Freq: 2s, dtype: int64
>>> r.agg(["sum", "mean", "max"])
                    sum  mean  max
2013-01-01 00:00:00    3   1.5    2
2013-01-01 00:00:02    7   3.5    4
2013-01-01 00:00:04    5   5.0    5
>>> r.agg({"result": lambda x: x.mean() / x.std(), "total": "sum"})
                    result  total
2013-01-01 00:00:00  2.121320      3
2013-01-01 00:00:02  4.949747      7
2013-01-01 00:00:04       NaN      5
>>> r.agg(average="mean", total="sum")
                     average  total
2013-01-01 00:00:00      1.5      3
2013-01-01 00:00:02      3.5      7
2013-01-01 00:00:04      5.0      5

Examples for pandas.api.typing.Resampler.aggregate

>>> s = pd.Series(
...     [1, 2, 3, 4, 5], index=pd.date_range("20130101", periods=5, freq="s")
... )
>>> s
2013-01-01 00:00:00    1
2013-01-01 00:00:01    2
2013-01-01 00:00:02    3
2013-01-01 00:00:03    4
2013-01-01 00:00:04    5
Freq: s, dtype: int64
>>> r = s.resample("2s")
>>> r.agg("sum")
2013-01-01 00:00:00    3
2013-01-01 00:00:02    7
2013-01-01 00:00:04    5
Freq: 2s, dtype: int64
>>> r.agg(["sum", "mean", "max"])
                    sum  mean  max
2013-01-01 00:00:00    3   1.5    2
2013-01-01 00:00:02    7   3.5    4
2013-01-01 00:00:04    5   5.0    5
>>> r.agg({"result": lambda x: x.mean() / x.std(), "total": "sum"})
                    result  total
2013-01-01 00:00:00  2.121320      3
2013-01-01 00:00:02  4.949747      7
2013-01-01 00:00:04       NaN      5
>>> r.agg(average="mean", total="sum")
                     average  total
2013-01-01 00:00:00      1.5      3
2013-01-01 00:00:02      3.5      7
2013-01-01 00:00:04      5.0      5

Examples for pandas.api.typing.Resampler.transform

>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
Freq: h, dtype: int64
>>> resampled = s.resample("15min")
>>> resampled.transform(lambda x: (x - x.mean()) / x.std())
2018-01-01 00:00:00   NaN
2018-01-01 01:00:00   NaN
Freq: h, dtype: float64

Examples for pandas.api.typing.Resampler.pipe

>>> df = pd.DataFrame(
...     {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
            A
2012-08-02  1
2012-08-03  2
2012-08-04  3
2012-08-05  4

To get the difference between each 2-day period’s maximum and minimum value in one pass, you can do

>>> df.resample("2D").pipe(lambda x: x.max() - x.min())
            A
2012-08-02  1
2012-08-04  1

Examples for pandas.api.typing.Resampler.ffill

Here we only create a Series.

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64

Example for ffill with downsampling (we have fewer dates after resampling):

>>> ser.resample("MS").ffill()
2023-01-01    1
2023-02-01    3
Freq: MS, dtype: int64

Example for ffill with upsampling (fill the new dates with the previous value):

>>> ser.resample("W").ffill()
2023-01-01    1
2023-01-08    1
2023-01-15    2
2023-01-22    2
2023-01-29    2
2023-02-05    3
2023-02-12    3
2023-02-19    4
Freq: W-SUN, dtype: int64

With upsampling and limiting (only fill the first new date with the previous value):

>>> ser.resample("W").ffill(limit=1)
2023-01-01    1.0
2023-01-08    1.0
2023-01-15    2.0
2023-01-22    2.0
2023-01-29    NaN
2023-02-05    3.0
2023-02-12    NaN
2023-02-19    4.0
Freq: W-SUN, dtype: float64

Examples for pandas.api.typing.Resampler.bfill

Resampling a Series:

>>> s = pd.Series(
...     [1, 2, 3], index=pd.date_range("20180101", periods=3, freq="h")
... )
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: h, dtype: int64
>>> s.resample("30min").bfill()
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30min, dtype: int64
>>> s.resample("15min").bfill(limit=2)
2018-01-01 00:00:00    1.0
2018-01-01 00:15:00    NaN
2018-01-01 00:30:00    2.0
2018-01-01 00:45:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:15:00    NaN
2018-01-01 01:30:00    3.0
2018-01-01 01:45:00    3.0
2018-01-01 02:00:00    3.0
Freq: 15min, dtype: float64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame(
...     {"a": [2, np.nan, 6], "b": [1, 3, 5]},
...     index=pd.date_range("20180101", periods=3, freq="h"),
... )
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5
>>> df.resample("30min").bfill()
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  6.0  5
2018-01-01 02:00:00  6.0  5
>>> df.resample("15min").bfill(limit=2)
                       a    b
2018-01-01 00:00:00  2.0  1.0
2018-01-01 00:15:00  NaN  NaN
2018-01-01 00:30:00  NaN  3.0
2018-01-01 00:45:00  NaN  3.0
2018-01-01 01:00:00  NaN  3.0
2018-01-01 01:15:00  NaN  NaN
2018-01-01 01:30:00  6.0  5.0
2018-01-01 01:45:00  6.0  5.0
2018-01-01 02:00:00  6.0  5.0

Examples for pandas.api.typing.Resampler.nearest

>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
Freq: h, dtype: int64
>>> s.resample("15min").nearest()
2018-01-01 00:00:00    1
2018-01-01 00:15:00    1
2018-01-01 00:30:00    2
2018-01-01 00:45:00    2
2018-01-01 01:00:00    2
Freq: 15min, dtype: int64

Limit the number of upsampled values imputed by the nearest:

>>> s.resample("15min").nearest(limit=1)
2018-01-01 00:00:00    1.0
2018-01-01 00:15:00    1.0
2018-01-01 00:30:00    NaN
2018-01-01 00:45:00    2.0
2018-01-01 01:00:00    2.0
Freq: 15min, dtype: float64

Examples for pandas.api.typing.Resampler.asfreq

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-31", "2023-02-01", "2023-02-28"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-31    2
2023-02-01    3
2023-02-28    4
dtype: int64
>>> ser.resample("MS").asfreq()
2023-01-01    1
2023-02-01    3
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.interpolate

>>> start = "2023-03-01T07:00:00"
>>> timesteps = pd.date_range(start, periods=5, freq="s")
>>> series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)
>>> series
2023-03-01 07:00:00    1
2023-03-01 07:00:01   -1
2023-03-01 07:00:02    2
2023-03-01 07:00:03    1
2023-03-01 07:00:04    3
Freq: s, dtype: int64

Downsample the series to 0.5 Hz by providing the period time of 2s.

>>> series.resample("2s").interpolate("linear")
2023-03-01 07:00:00    1
2023-03-01 07:00:02    2
2023-03-01 07:00:04    3
Freq: 2s, dtype: int64

Upsample the series to 2 Hz by providing the period time of 500ms.

>>> series.resample("500ms").interpolate("linear")
2023-03-01 07:00:00.000    1.0
2023-03-01 07:00:00.500    0.0
2023-03-01 07:00:01.000   -1.0
2023-03-01 07:00:01.500    0.5
2023-03-01 07:00:02.000    2.0
2023-03-01 07:00:02.500    1.5
2023-03-01 07:00:03.000    1.0
2023-03-01 07:00:03.500    2.0
2023-03-01 07:00:04.000    3.0
Freq: 500ms, dtype: float64

Internally, the series is reindexed with asfreq() prior to interpolation, so the interpolated timeseries is computed on the basis of the reindexed timestamps (anchors). All available datapoints from the original series are guaranteed to become anchors, so this also works for resampling cases that lead to non-aligned timestamps, as in the following example:

>>> series.resample("400ms").interpolate("linear")
2023-03-01 07:00:00.000    1.000000
2023-03-01 07:00:00.400    0.333333
2023-03-01 07:00:00.800   -0.333333
2023-03-01 07:00:01.200    0.000000
2023-03-01 07:00:01.600    1.000000
2023-03-01 07:00:02.000    2.000000
2023-03-01 07:00:02.400    1.666667
2023-03-01 07:00:02.800    1.333333
2023-03-01 07:00:03.200    1.666667
2023-03-01 07:00:03.600    2.333333
2023-03-01 07:00:04.000    3.000000
Freq: 400ms, dtype: float64

Note that the series correctly decreases between two anchors 07:00:00 and 07:00:02.

Examples for pandas.api.typing.Resampler.count

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").count()
2023-01-01    2
2023-02-01    2
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.nunique

>>> ser = pd.Series(
...     [1, 2, 3, 3],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    3
dtype: int64
>>> ser.resample("MS").nunique()
2023-01-01    2
2023-02-01    1
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.first

>>> s = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> s
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> s.resample("MS").first()
2023-01-01    1
2023-02-01    3
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.last

>>> s = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> s
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> s.resample("MS").last()
2023-01-01    2
2023-02-01    4
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.max

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").max()
2023-01-01    2
2023-02-01    4
Freq: MS, dtype: int64

Examples for pandas.api.typing.Resampler.mean

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser
2023-01-01    1
2023-01-15    2
2023-02-01    3
2023-02-15    4
dtype: int64
>>> ser.resample("MS").mean()
2023-01-01    1.5
2023-02-01    3.5
Freq: MS, dtype: float64

Examples for pandas.api.typing.Resampler.median

>>> ser = pd.Series(
...     [1, 2, 3, 3, 4, 5],
...     index=pd.DatetimeIndex(
...         [
...             "2023-01-01",
...             "2023-01-10",
...             "2023-01-15",
...             "2023-02-01",
...             "2023-02-10",
...             "2023-02-15",
...         ]
...     ),
... )
>>> ser.resample("MS").median()
2023-01-01    2.0
2023-02-01    4.0
Freq: MS, dtype: float64
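
The remaining descriptive statistics follow the same pattern; a minimal sketch for Resampler.sum on the same kind of monthly data:

>>> ser = pd.Series(
...     [1, 2, 3, 4],
...     index=pd.DatetimeIndex(
...         ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
...     ),
... )
>>> ser.resample("MS").sum()  # monthly bins anchored at month start
2023-01-01    3
2023-02-01    7
Freq: MS, dtype: int64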

Date offsets

DateOffset

Function Description
DateOffset Standard kind of date increment used for a date range.

Properties

Function Description
DateOffset.freqstr Return a string representing the frequency.
DateOffset.kwds Return a dict of extra parameters for the offset.
DateOffset.name Return a string representing the base frequency.
DateOffset.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
DateOffset.normalize Return boolean whether the frequency can align with midnight.
DateOffset.rule_code Return a string representing the base frequency.
DateOffset.n Return the count of the number of periods.

Methods

Function Description
DateOffset.copy() Return a copy of the frequency.
DateOffset.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
DateOffset.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
DateOffset.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
DateOffset.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
DateOffset.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
DateOffset.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
DateOffset.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
DateOffset.rollback(dt) Roll provided date backward to next offset only if not on offset.
DateOffset.rollforward(dt) Roll provided date forward to next offset only if not on offset.
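
A minimal usage sketch: adding a DateOffset performs calendar-aware arithmetic on a Timestamp:

>>> ts = pd.Timestamp("2017-01-01 09:10:11")
>>> ts + pd.DateOffset(months=3)
Timestamp('2017-04-01 09:10:11')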

BusinessDay

Function Description
BusinessDay DateOffset subclass representing possibly n business days.

Alias:

Function Description
BDay alias of BusinessDay

Properties

Function Description
BusinessDay.freqstr Return a string representing the frequency.
BusinessDay.kwds Return a dict of extra parameters for the offset.
BusinessDay.name Return a string representing the base frequency.
BusinessDay.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BusinessDay.normalize Return boolean whether the frequency can align with midnight.
BusinessDay.rule_code Return a string representing the base frequency.
BusinessDay.n Return the count of the number of periods.
BusinessDay.weekmask Return the weekmask used for custom business day calculations.
BusinessDay.holidays Return the holidays used for custom business day calculations.
BusinessDay.calendar Return the calendar used for business day calculations.

Methods

Function Description
BusinessDay.copy() Return a copy of the frequency.
BusinessDay.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BusinessDay.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BusinessDay.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BusinessDay.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BusinessDay.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BusinessDay.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BusinessDay.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
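
A minimal sketch: 2022-08-05 is a Friday, so adding one business day lands on the following Monday:

>>> ts = pd.Timestamp(2022, 8, 5)
>>> ts + pd.offsets.BusinessDay(1)  # Friday + 1 business day
Timestamp('2022-08-08 00:00:00')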

BusinessHour

Function Description
BusinessHour DateOffset subclass representing possibly n business hours.

Properties

Function Description
BusinessHour.freqstr Return a string representing the frequency.
BusinessHour.kwds Return a dict of extra parameters for the offset.
BusinessHour.name Return a string representing the base frequency.
BusinessHour.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BusinessHour.normalize Return boolean whether the frequency can align with midnight.
BusinessHour.rule_code Return a string representing the base frequency.
BusinessHour.n Return the count of the number of periods.
BusinessHour.start Return the start time(s) of the business hour.
BusinessHour.end Return the end time(s) of the business hour.
BusinessHour.weekmask Return the weekmask used for custom business day calculations.
BusinessHour.holidays Return the holidays used for custom business day calculations.
BusinessHour.calendar Return the calendar used for business day calculations.

Methods

Function Description
BusinessHour.copy() Return a copy of the frequency.
BusinessHour.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BusinessHour.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BusinessHour.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BusinessHour.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BusinessHour.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BusinessHour.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BusinessHour.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
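
A minimal sketch: with the default 09:00-17:00 business hours, adding one business hour to Friday 16:00 rolls past the close to Monday 09:00:

>>> ts = pd.Timestamp(2022, 8, 5, 16)
>>> ts + pd.offsets.BusinessHour()  # rolls over the 17:00 close and the weekend
Timestamp('2022-08-08 09:00:00')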

CustomBusinessDay

Function Description
CustomBusinessDay DateOffset subclass representing possibly n custom business days.

Alias:

Function Description
CDay alias of CustomBusinessDay

Properties

Function Description
CustomBusinessDay.freqstr Return a string representing the frequency.
CustomBusinessDay.kwds Return a dict of extra parameters for the offset.
CustomBusinessDay.name Return a string representing the base frequency.
CustomBusinessDay.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
CustomBusinessDay.normalize Return boolean whether the frequency can align with midnight.
CustomBusinessDay.rule_code Return a string representing the base frequency.
CustomBusinessDay.n Return the count of the number of periods.
CustomBusinessDay.weekmask Return the weekmask used for custom business day calculations.
CustomBusinessDay.calendar Return the calendar used for business day calculations.
CustomBusinessDay.holidays Return the holidays used for custom business day calculations.

Methods

Function Description
CustomBusinessDay.copy() Return a copy of the frequency.
CustomBusinessDay.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
CustomBusinessDay.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
CustomBusinessDay.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
CustomBusinessDay.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
CustomBusinessDay.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
CustomBusinessDay.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
CustomBusinessDay.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
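
A minimal sketch: with the default weekmask and no holidays, CustomBusinessDay behaves like BusinessDay, so Friday plus one custom business day is Monday:

>>> ts = pd.Timestamp(2022, 8, 5, 16)
>>> ts + pd.offsets.CustomBusinessDay(1)  # default weekmask, no holidays
Timestamp('2022-08-08 16:00:00')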

CustomBusinessHour

Function Description
CustomBusinessHour DateOffset subclass representing possibly n custom business hours.

Properties

Function Description
CustomBusinessHour.freqstr Return a string representing the frequency.
CustomBusinessHour.kwds Return a dict of extra parameters for the offset.
CustomBusinessHour.name Return a string representing the base frequency.
CustomBusinessHour.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
CustomBusinessHour.normalize Return boolean whether the frequency can align with midnight.
CustomBusinessHour.rule_code Return a string representing the base frequency.
CustomBusinessHour.n Return the count of the number of periods.
CustomBusinessHour.weekmask Return the weekmask used for custom business day calculations.
CustomBusinessHour.calendar Return the calendar used for business day calculations.
CustomBusinessHour.holidays Return the holidays used for custom business day calculations.
CustomBusinessHour.start Return the start time(s) of the business hour.
CustomBusinessHour.end Return the end time(s) of the business hour.

Methods

Function Description
CustomBusinessHour.copy() Return a copy of the frequency.
CustomBusinessHour.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
CustomBusinessHour.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
CustomBusinessHour.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
CustomBusinessHour.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
CustomBusinessHour.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
CustomBusinessHour.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
CustomBusinessHour.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
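
A minimal sketch: with the default settings, CustomBusinessHour behaves like BusinessHour, rolling Friday 16:00 plus one business hour to Monday 09:00:

>>> ts = pd.Timestamp(2022, 8, 5, 16)
>>> ts + pd.offsets.CustomBusinessHour()  # default 09:00-17:00 hours, no holidays
Timestamp('2022-08-08 09:00:00')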

MonthEnd

Function Description
MonthEnd DateOffset of one month end.

Properties

Function Description
MonthEnd.freqstr Return a string representing the frequency.
MonthEnd.kwds Return a dict of extra parameters for the offset.
MonthEnd.name Return a string representing the base frequency.
MonthEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
MonthEnd.normalize Return boolean whether the frequency can align with midnight.
MonthEnd.rule_code Return a string representing the base frequency.
MonthEnd.n Return the count of the number of periods.

Methods

Function Description
MonthEnd.copy() Return a copy of the frequency.
MonthEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
MonthEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
MonthEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
MonthEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
MonthEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
MonthEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
MonthEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
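
A minimal sketch: adding MonthEnd rolls forward to the next month end, including when the timestamp is already on a month end:

>>> ts = pd.Timestamp(2022, 1, 30)
>>> ts + pd.offsets.MonthEnd()
Timestamp('2022-01-31 00:00:00')
>>> ts = pd.Timestamp(2022, 1, 31)
>>> ts + pd.offsets.MonthEnd()  # already on a month end, so advance a full month
Timestamp('2022-02-28 00:00:00')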

MonthBegin

Function Description
MonthBegin DateOffset of one month at beginning.

Properties

Function Description
MonthBegin.freqstr Return a string representing the frequency.
MonthBegin.kwds Return a dict of extra parameters for the offset.
MonthBegin.name Return a string representing the base frequency.
MonthBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
MonthBegin.normalize Return boolean whether the frequency can align with midnight.
MonthBegin.rule_code Return a string representing the base frequency.
MonthBegin.n Return the count of the number of periods.

Methods

Function Description
MonthBegin.copy() Return a copy of the frequency.
MonthBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
MonthBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
MonthBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
MonthBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
MonthBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
MonthBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
MonthBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
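
A minimal sketch: adding MonthBegin rolls forward to the next month start, including when the timestamp is already on a month start:

>>> ts = pd.Timestamp(2022, 11, 30)
>>> ts + pd.offsets.MonthBegin()
Timestamp('2022-12-01 00:00:00')
>>> ts = pd.Timestamp(2022, 12, 1)
>>> ts + pd.offsets.MonthBegin()  # already on a month start, so advance a full month
Timestamp('2023-01-01 00:00:00')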

BusinessMonthEnd

Function Description
BusinessMonthEnd DateOffset increments between the last business day of the month.

Alias:

Function Description
BMonthEnd alias of BusinessMonthEnd

Properties

Function Description
BusinessMonthEnd.freqstr Return a string representing the frequency.
BusinessMonthEnd.kwds Return a dict of extra parameters for the offset.
BusinessMonthEnd.name Return a string representing the base frequency.
BusinessMonthEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BusinessMonthEnd.normalize Return boolean whether the frequency can align with midnight.
BusinessMonthEnd.rule_code Return a string representing the base frequency.
BusinessMonthEnd.n Return the count of the number of periods.

Methods

Function Description
BusinessMonthEnd.copy() Return a copy of the frequency.
BusinessMonthEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BusinessMonthEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BusinessMonthEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BusinessMonthEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BusinessMonthEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BusinessMonthEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BusinessMonthEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
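
A minimal sketch: the last business day of November 2022 is Wednesday the 30th:

>>> ts = pd.Timestamp(2022, 11, 29)
>>> ts + pd.offsets.BMonthEnd()  # alias of BusinessMonthEnd
Timestamp('2022-11-30 00:00:00')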

BusinessMonthBegin

Function Description
BusinessMonthBegin DateOffset of one month at the first business day.

Alias:

Function Description
BMonthBegin alias of BusinessMonthBegin

Properties

Function Description
BusinessMonthBegin.freqstr Return a string representing the frequency.
BusinessMonthBegin.kwds Return a dict of extra parameters for the offset.
BusinessMonthBegin.name Return a string representing the base frequency.
BusinessMonthBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BusinessMonthBegin.normalize Return boolean whether the frequency can align with midnight.
BusinessMonthBegin.rule_code Return a string representing the base frequency.
BusinessMonthBegin.n Return the count of the number of periods.

Methods

Function Description
BusinessMonthBegin.copy() Return a copy of the frequency.
BusinessMonthBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BusinessMonthBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BusinessMonthBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BusinessMonthBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BusinessMonthBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BusinessMonthBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BusinessMonthBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
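
A minimal sketch: 2023-01-01 falls on a Sunday, so the first business day of January 2023 is Monday the 2nd:

>>> ts = pd.Timestamp(2022, 12, 8)
>>> ts + pd.offsets.BMonthBegin()  # alias of BusinessMonthBegin
Timestamp('2023-01-02 00:00:00')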

CustomBusinessMonthEnd

Function Description
CustomBusinessMonthEnd DateOffset subclass representing custom business month(s).

Alias:

Function Description
CBMonthEnd alias of CustomBusinessMonthEnd

Properties

Function Description
CustomBusinessMonthEnd.freqstr Return a string representing the frequency.
CustomBusinessMonthEnd.kwds Return a dict of extra parameters for the offset.
CustomBusinessMonthEnd.m_offset Return a MonthBegin or MonthEnd offset.
CustomBusinessMonthEnd.name Return a string representing the base frequency.
CustomBusinessMonthEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
CustomBusinessMonthEnd.normalize Return boolean whether the frequency can align with midnight.
CustomBusinessMonthEnd.rule_code Return a string representing the base frequency.
CustomBusinessMonthEnd.n Return the count of the number of periods.
CustomBusinessMonthEnd.weekmask Return the weekmask used for custom business day calculations.
CustomBusinessMonthEnd.calendar Return the calendar used for business day calculations.
CustomBusinessMonthEnd.holidays Return the holidays used for custom business day calculations.

Methods

Function Description
CustomBusinessMonthEnd.copy() Return a copy of the frequency.
CustomBusinessMonthEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
CustomBusinessMonthEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
CustomBusinessMonthEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
CustomBusinessMonthEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
CustomBusinessMonthEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
CustomBusinessMonthEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
CustomBusinessMonthEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
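
An illustrative sketch (pandas as pd). Restricting business days to Wednesdays and Thursdays via weekmask, the custom month end for August 2022 is Wednesday, August 31:

>>> ts = pd.Timestamp(2022, 8, 5)
>>> ts + pd.offsets.CBMonthEnd(weekmask="Wed Thu")
Timestamp('2022-08-31 00:00:00')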

CustomBusinessMonthBegin

Function Description
CustomBusinessMonthBegin DateOffset subclass representing custom business month(s).

Alias:

Function Description
CBMonthBegin alias of CustomBusinessMonthBegin

Properties

Function Description
CustomBusinessMonthBegin.freqstr Return a string representing the frequency.
CustomBusinessMonthBegin.kwds Return a dict of extra parameters for the offset.
CustomBusinessMonthBegin.m_offset Return a MonthBegin or MonthEnd offset.
CustomBusinessMonthBegin.name Return a string representing the base frequency.
CustomBusinessMonthBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
CustomBusinessMonthBegin.normalize Return boolean whether the frequency can align with midnight.
CustomBusinessMonthBegin.rule_code Return a string representing the base frequency.
CustomBusinessMonthBegin.n Return the count of the number of periods.
CustomBusinessMonthBegin.weekmask Return the weekmask used for custom business day calculations.
CustomBusinessMonthBegin.calendar Return the calendar used for business day calculations.
CustomBusinessMonthBegin.holidays Return the holidays used for custom business day calculations.

Methods

Function Description
CustomBusinessMonthBegin.copy() Return a copy of the frequency.
CustomBusinessMonthBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
CustomBusinessMonthBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
CustomBusinessMonthBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
CustomBusinessMonthBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
CustomBusinessMonthBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
CustomBusinessMonthBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
CustomBusinessMonthBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

SemiMonthEnd

Function Description
SemiMonthEnd Two DateOffsets per month, repeating on the last day of the month and on day_of_month.

Properties

Function Description
SemiMonthEnd.freqstr Return a string representing the frequency.
SemiMonthEnd.kwds Return a dict of extra parameters for the offset.
SemiMonthEnd.name Return a string representing the base frequency.
SemiMonthEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
SemiMonthEnd.normalize Return boolean whether the frequency can align with midnight.
SemiMonthEnd.rule_code Return a string representing the base frequency.
SemiMonthEnd.n Return the count of the number of periods.
SemiMonthEnd.day_of_month Return the day of the month for the semi-monthly offset.

Methods

Function Description
SemiMonthEnd.copy() Return a copy of the frequency.
SemiMonthEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
SemiMonthEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
SemiMonthEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
SemiMonthEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
SemiMonthEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
SemiMonthEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
SemiMonthEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
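
A minimal example (pandas as pd). With the default day_of_month=15, the offset anchors on the 15th and the last day of each month:

>>> ts = pd.Timestamp(2022, 1, 14)
>>> ts + pd.offsets.SemiMonthEnd()
Timestamp('2022-01-15 00:00:00')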

SemiMonthBegin

Function Description
SemiMonthBegin Two DateOffsets per month, repeating on the first day of the month and on day_of_month.

Properties

Function Description
SemiMonthBegin.freqstr Return a string representing the frequency.
SemiMonthBegin.kwds Return a dict of extra parameters for the offset.
SemiMonthBegin.name Return a string representing the base frequency.
SemiMonthBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
SemiMonthBegin.normalize Return boolean whether the frequency can align with midnight.
SemiMonthBegin.rule_code Return a string representing the base frequency.
SemiMonthBegin.n Return the count of the number of periods.
SemiMonthBegin.day_of_month Return the day of the month for the semi-monthly offset.

Methods

Function Description
SemiMonthBegin.copy() Return a copy of the frequency.
SemiMonthBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
SemiMonthBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
SemiMonthBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
SemiMonthBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
SemiMonthBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
SemiMonthBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
SemiMonthBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
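
A minimal example (pandas as pd). The offset anchors on the 1st and on day_of_month (default 15):

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.SemiMonthBegin()
Timestamp('2022-01-15 00:00:00')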

Week

Function Description
Week Weekly offset.

Properties

Function Description
Week.freqstr Return a string representing the frequency.
Week.kwds Return a dict of extra parameters for the offset.
Week.name Return a string representing the base frequency.
Week.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
Week.normalize Return boolean whether the frequency can align with midnight.
Week.rule_code Return a string representing the base frequency.
Week.n Return the count of the number of periods.
Week.weekday Return the day of the week on which the offset is applied.

Methods

Function Description
Week.copy() Return a copy of the frequency.
Week.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Week.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Week.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Week.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Week.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Week.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Week.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
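
An illustrative sketch (pandas as pd). Without an anchor, Week shifts by whole weeks; with weekday set, it rolls to that day of the week (0 = Monday):

>>> ts = pd.Timestamp(2022, 8, 5)
>>> ts + pd.offsets.Week()
Timestamp('2022-08-12 00:00:00')
>>> ts + pd.offsets.Week(weekday=0)
Timestamp('2022-08-08 00:00:00')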

WeekOfMonth

Function Description
WeekOfMonth Describes monthly dates like "the Tuesday of the 2nd week of each month".

Properties

Function Description
WeekOfMonth.freqstr Return a string representing the frequency.
WeekOfMonth.kwds Return a dict of extra parameters for the offset.
WeekOfMonth.name Return a string representing the base frequency.
WeekOfMonth.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
WeekOfMonth.normalize Return boolean whether the frequency can align with midnight.
WeekOfMonth.rule_code Return a string representing the base frequency.
WeekOfMonth.n Return the count of the number of periods.
WeekOfMonth.week Return the week of the month (0-indexed) targeted by the offset.

Methods

Function Description
WeekOfMonth.copy() Return a copy of the frequency.
WeekOfMonth.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
WeekOfMonth.weekday Return the day of the week on which the offset is applied.
WeekOfMonth.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
WeekOfMonth.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
WeekOfMonth.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
WeekOfMonth.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
WeekOfMonth.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
WeekOfMonth.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
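
A short sketch (pandas as pd). With the defaults week=0 and weekday=0, the offset targets the first Monday of the month:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.WeekOfMonth()
Timestamp('2022-01-03 00:00:00')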

LastWeekOfMonth

Function Description
LastWeekOfMonth Describes monthly dates in the last week of the month, like "the last Tuesday of each month".

Properties

Function Description
LastWeekOfMonth.freqstr Return a string representing the frequency.
LastWeekOfMonth.kwds Return a dict of extra parameters for the offset.
LastWeekOfMonth.name Return a string representing the base frequency.
LastWeekOfMonth.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
LastWeekOfMonth.normalize Return boolean whether the frequency can align with midnight.
LastWeekOfMonth.rule_code Return a string representing the base frequency.
LastWeekOfMonth.n Return the count of the number of periods.
LastWeekOfMonth.weekday Return the day of the week on which the offset is applied.
LastWeekOfMonth.week Return the week of the month targeted by the offset.

Methods

Function Description
LastWeekOfMonth.copy() Return a copy of the frequency.
LastWeekOfMonth.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
LastWeekOfMonth.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
LastWeekOfMonth.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
LastWeekOfMonth.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
LastWeekOfMonth.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
LastWeekOfMonth.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
LastWeekOfMonth.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
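
A short sketch (pandas as pd). With the default weekday=0, the offset targets the last Monday of the month:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.LastWeekOfMonth()
Timestamp('2022-01-31 00:00:00')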

BQuarterEnd

Function Description
BQuarterEnd DateOffset increments between the last business day of each Quarter.

Properties

Function Description
BQuarterEnd.freqstr Return a string representing the frequency.
BQuarterEnd.kwds Return a dict of extra parameters for the offset.
BQuarterEnd.name Return a string representing the base frequency.
BQuarterEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BQuarterEnd.normalize Return boolean whether the frequency can align with midnight.
BQuarterEnd.rule_code Return a string representing the frequency with month suffix.
BQuarterEnd.n Return the count of the number of periods.
BQuarterEnd.startingMonth Return the month of the year from which quarters start.

Methods

Function Description
BQuarterEnd.copy() Return a copy of the frequency.
BQuarterEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BQuarterEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BQuarterEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BQuarterEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BQuarterEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BQuarterEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BQuarterEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

BQuarterBegin

Function Description
BQuarterBegin DateOffset increments between the first business day of each Quarter.

Properties

Function Description
BQuarterBegin.freqstr Return a string representing the frequency.
BQuarterBegin.kwds Return a dict of extra parameters for the offset.
BQuarterBegin.name Return a string representing the base frequency.
BQuarterBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BQuarterBegin.normalize Return boolean whether the frequency can align with midnight.
BQuarterBegin.rule_code Return a string representing the frequency with month suffix.
BQuarterBegin.n Return the count of the number of periods.
BQuarterBegin.startingMonth Return the month of the year from which quarters start.

Methods

Function Description
BQuarterBegin.copy() Return a copy of the frequency.
BQuarterBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BQuarterBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BQuarterBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BQuarterBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BQuarterBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BQuarterBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BQuarterBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

QuarterEnd

Function Description
QuarterEnd DateOffset increments between Quarter end dates.

Properties

Function Description
QuarterEnd.freqstr Return a string representing the frequency.
QuarterEnd.kwds Return a dict of extra parameters for the offset.
QuarterEnd.name Return a string representing the base frequency.
QuarterEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
QuarterEnd.normalize Return boolean whether the frequency can align with midnight.
QuarterEnd.rule_code Return a string representing the frequency with month suffix.
QuarterEnd.n Return the count of the number of periods.
QuarterEnd.startingMonth Return the month of the year from which quarters start.

Methods

Function Description
QuarterEnd.copy() Return a copy of the frequency.
QuarterEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
QuarterEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
QuarterEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
QuarterEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
QuarterEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
QuarterEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
QuarterEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
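
A minimal example (pandas as pd). With the default startingMonth=3, quarters end in March, June, September, and December:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.QuarterEnd()
Timestamp('2022-03-31 00:00:00')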

QuarterBegin

Function Description
QuarterBegin DateOffset increments between Quarter start dates.

Properties

Function Description
QuarterBegin.freqstr Return a string representing the frequency.
QuarterBegin.kwds Return a dict of extra parameters for the offset.
QuarterBegin.name Return a string representing the base frequency.
QuarterBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
QuarterBegin.normalize Return boolean whether the frequency can align with midnight.
QuarterBegin.rule_code Return a string representing the frequency with month suffix.
QuarterBegin.n Return the count of the number of periods.
QuarterBegin.startingMonth Return the month of the year from which quarters start.

Methods

Function Description
QuarterBegin.copy() Return a copy of the frequency.
QuarterBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
QuarterBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
QuarterBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
QuarterBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
QuarterBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
QuarterBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
QuarterBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

BHalfYearEnd

Function Description
BHalfYearEnd DateOffset increments between the last business day of each half-year.

Properties

Function Description
BHalfYearEnd.freqstr Return a string representing the frequency.
BHalfYearEnd.kwds Return a dict of extra parameters for the offset.
BHalfYearEnd.name Return a string representing the base frequency.
BHalfYearEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BHalfYearEnd.normalize Return boolean whether the frequency can align with midnight.
BHalfYearEnd.rule_code Return a string representing the frequency with month suffix.
BHalfYearEnd.n Return the count of the number of periods.
BHalfYearEnd.startingMonth Return the month of the year from which half-years start.

Methods

Function Description
BHalfYearEnd.copy() Return a copy of the frequency.
BHalfYearEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BHalfYearEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BHalfYearEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BHalfYearEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BHalfYearEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BHalfYearEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BHalfYearEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

BHalfYearBegin

Function Description
BHalfYearBegin DateOffset increments between the first business day of each half-year.

Properties

Function Description
BHalfYearBegin.freqstr Return a string representing the frequency.
BHalfYearBegin.kwds Return a dict of extra parameters for the offset.
BHalfYearBegin.name Return a string representing the base frequency.
BHalfYearBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BHalfYearBegin.normalize Return boolean whether the frequency can align with midnight.
BHalfYearBegin.rule_code Return a string representing the frequency with month suffix.
BHalfYearBegin.n Return the count of the number of periods.
BHalfYearBegin.startingMonth Return the month of the year from which half-years start.

Methods

Function Description
BHalfYearBegin.copy() Return a copy of the frequency.
BHalfYearBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BHalfYearBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BHalfYearBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BHalfYearBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BHalfYearBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BHalfYearBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BHalfYearBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

HalfYearEnd

Function Description
HalfYearEnd DateOffset increments between half-year end dates.

Properties

Function Description
HalfYearEnd.freqstr Return a string representing the frequency.
HalfYearEnd.kwds Return a dict of extra parameters for the offset.
HalfYearEnd.name Return a string representing the base frequency.
HalfYearEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
HalfYearEnd.normalize Return boolean whether the frequency can align with midnight.
HalfYearEnd.rule_code Return a string representing the frequency with month suffix.
HalfYearEnd.n Return the count of the number of periods.
HalfYearEnd.startingMonth Return the month of the year from which half-years start.

Methods

Function Description
HalfYearEnd.copy() Return a copy of the frequency.
HalfYearEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
HalfYearEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
HalfYearEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
HalfYearEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
HalfYearEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
HalfYearEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
HalfYearEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
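
An illustrative sketch, assuming the default anchoring on June and December half-year ends (pandas as pd):

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.HalfYearEnd()
Timestamp('2022-06-30 00:00:00')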

HalfYearBegin

Function Description
HalfYearBegin DateOffset increments between half-year start dates.

Properties

Function Description
HalfYearBegin.freqstr Return a string representing the frequency.
HalfYearBegin.kwds Return a dict of extra parameters for the offset.
HalfYearBegin.name Return a string representing the base frequency.
HalfYearBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
HalfYearBegin.normalize Return boolean whether the frequency can align with midnight.
HalfYearBegin.rule_code Return a string representing the frequency with month suffix.
HalfYearBegin.n Return the count of the number of periods.
HalfYearBegin.startingMonth Return the month of the year from which half-years start.

Methods

Function Description
HalfYearBegin.copy() Return a copy of the frequency.
HalfYearBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
HalfYearBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
HalfYearBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
HalfYearBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
HalfYearBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
HalfYearBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
HalfYearBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

BYearEnd

Function Description
BYearEnd DateOffset increments between the last business day of the year.

Properties

Function Description
BYearEnd.freqstr Return a string representing the frequency.
BYearEnd.kwds Return a dict of extra parameters for the offset.
BYearEnd.name Return a string representing the base frequency.
BYearEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BYearEnd.normalize Return boolean whether the frequency can align with midnight.
BYearEnd.rule_code Return a string representing the base frequency.
BYearEnd.n Return the count of the number of periods.
BYearEnd.month Return the month of the year on which this offset applies.

Methods

Function Description
BYearEnd.copy() Return a copy of the frequency.
BYearEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BYearEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BYearEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BYearEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BYearEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BYearEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BYearEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

BYearBegin

Function Description
BYearBegin DateOffset increments between the first business day of the year.

Properties

Function Description
BYearBegin.freqstr Return a string representing the frequency.
BYearBegin.kwds Return a dict of extra parameters for the offset.
BYearBegin.name Return a string representing the base frequency.
BYearBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
BYearBegin.normalize Return boolean whether the frequency can align with midnight.
BYearBegin.rule_code Return a string representing the base frequency.
BYearBegin.n Return the count of the number of periods.
BYearBegin.month Return the month of the year on which this offset applies.

Methods

Function Description
BYearBegin.copy() Return a copy of the frequency.
BYearBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
BYearBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
BYearBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
BYearBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
BYearBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
BYearBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
BYearBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

YearEnd

Function Description
YearEnd([n, normalize, month]) DateOffset increments between calendar year end dates.

Properties

Function Description
YearEnd.freqstr Return a string representing the frequency.
YearEnd.kwds Return a dict of extra parameters for the offset.
YearEnd.name Return a string representing the base frequency.
YearEnd.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
YearEnd.normalize Return boolean whether the frequency can align with midnight.
YearEnd.rule_code Return a string representing the base frequency.
YearEnd.n Return the count of the number of periods.
YearEnd.month Return the month of the year on which this offset applies.

Methods

Function Description
YearEnd.copy() Return a copy of the frequency.
YearEnd.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
YearEnd.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
YearEnd.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
YearEnd.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
YearEnd.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
YearEnd.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
YearEnd.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
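
A minimal example (pandas as pd). With the default month=12, the offset rolls to December 31:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.YearEnd()
Timestamp('2022-12-31 00:00:00')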

YearBegin

Function Description
YearBegin DateOffset increments between calendar year begin dates.

Properties

Function Description
YearBegin.freqstr Return a string representing the frequency.
YearBegin.kwds Return a dict of extra parameters for the offset.
YearBegin.name Return a string representing the base frequency.
YearBegin.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
YearBegin.normalize Return boolean whether the frequency can align with midnight.
YearBegin.rule_code Return a string representing the base frequency.
YearBegin.n Return the count of the number of periods.
YearBegin.month Return the month of the year on which this offset applies.

Methods

Function Description
YearBegin.copy() Return a copy of the frequency.
YearBegin.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
YearBegin.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
YearBegin.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
YearBegin.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
YearBegin.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
YearBegin.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
YearBegin.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

FY5253

Function Description
FY5253 Describes a 52-53 week fiscal year (also known as a 4-4-5 calendar).

Properties

Function Description
FY5253.freqstr Return a string representing the frequency.
FY5253.kwds Return a dict of extra parameters for the offset.
FY5253.name Return a string representing the base frequency.
FY5253.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
FY5253.normalize Return boolean whether the frequency can align with midnight.
FY5253.rule_code Return a string representing the frequency with its suffix.
FY5253.n Return the count of the number of periods.
FY5253.startingMonth Return the month in which the fiscal year ends.
FY5253.variation Return the year-end variation, either 'nearest' or 'last'.
FY5253.weekday Return the weekday used by the fiscal year.

Methods

Function Description
FY5253.copy() Return a copy of the frequency.
FY5253.get_rule_code_suffix() Return the suffix component of the rule code.
FY5253.get_year_end(dt) Return the year-end date of the fiscal year containing dt.
FY5253.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
FY5253.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
FY5253.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
FY5253.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
FY5253.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
FY5253.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
FY5253.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
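
An illustrative sketch (pandas as pd). With the defaults (weekday=0, startingMonth=1, variation='nearest'), the fiscal year ends on the Monday nearest to January 31; in 2022 that is January 31 itself:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.FY5253()
Timestamp('2022-01-31 00:00:00')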

FY5253Quarter

Function Description
FY5253Quarter DateOffset increments between business quarter dates for a 52-53 week fiscal year.

Properties

Function Description
FY5253Quarter.freqstr Return a string representing the frequency.
FY5253Quarter.kwds Return a dict of extra parameters for the offset.
FY5253Quarter.name Return a string representing the base frequency.
FY5253Quarter.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
FY5253Quarter.normalize Return boolean whether the frequency can align with midnight.
FY5253Quarter.rule_code Return a string representing the frequency with its suffix.
FY5253Quarter.n Return the count of the number of periods.
FY5253Quarter.qtr_with_extra_week Return the quarter number that contains the leap week when needed.
FY5253Quarter.startingMonth Return the month in which the fiscal year ends.
FY5253Quarter.variation Return the year-end variation, either 'nearest' or 'last'.
FY5253Quarter.weekday Return the weekday used by the fiscal year.

Methods

Function Description
FY5253Quarter.copy() Return a copy of the frequency.
FY5253Quarter.get_rule_code_suffix() Return the suffix component of the rule code.
FY5253Quarter.get_weeks(dt) Return a list of the number of weeks in each quarter of the fiscal year containing dt.
FY5253Quarter.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
FY5253Quarter.year_has_extra_week(dt) Return boolean whether the fiscal year containing dt has an extra week.
FY5253Quarter.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
FY5253Quarter.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
FY5253Quarter.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
FY5253Quarter.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
FY5253Quarter.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
FY5253Quarter.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Easter

Function Description
Easter DateOffset for the Easter holiday using logic defined in dateutil.

Properties

Function Description
Easter.freqstr Return a string representing the frequency.
Easter.kwds Return a dict of extra parameters for the offset.
Easter.name Return a string representing the base frequency.
Easter.nanos Returns an integer of the total number of nanoseconds for fixed frequencies.
Easter.normalize Return boolean whether the frequency can align with midnight.
Easter.rule_code Return a string representing the base frequency.
Easter.n Return the count of the number of periods.

Methods

Function Description
Easter.copy() Return a copy of the frequency.
Easter.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Easter.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Easter.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Easter.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Easter.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Easter.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Easter.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
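
A minimal example (pandas as pd). Easter Sunday in 2022 fell on April 17:

>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.Easter()
Timestamp('2022-04-17 00:00:00')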

Tick

Function Description
Tick Base class for fixed frequency offsets (Day, Hour, Minute, Second, Milli, Micro, Nano).

Properties

Function Description
Tick.freqstr Return a string representing the frequency.
Tick.kwds Return a dict of extra parameters for the offset.
Tick.name Return a string representing the base frequency.
Tick.nanos Returns an integer of the total number of nanoseconds.
Tick.normalize Return boolean whether the frequency can align with midnight.
Tick.rule_code Return a string representing the base frequency.
Tick.n Return the count of the number of periods.

Methods

Function Description
Tick.copy() Return a copy of the frequency.
Tick.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Tick.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Tick.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Tick.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Tick.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Tick.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Tick.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Day

Function Description
Day Offset n days.

Properties

Function Description
Day.freqstr Return a string representing the frequency.
Day.kwds Return a dict of extra parameters for the offset.
Day.name Return a string representing the base frequency.
Day.nanos Returns an integer of the total number of nanoseconds.
Day.normalize Return boolean whether the frequency can align with midnight.
Day.rule_code Return a string representing the base frequency.
Day.n Return the count of the number of periods.

Methods

Function Description
Day.copy() Return a copy of the frequency.
Day.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Day.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Day.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Day.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Day.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Day.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Day.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Hour

Function Description
Hour Offset n hours.

Properties

Function Description
Hour.freqstr Return a string representing the frequency.
Hour.kwds Return a dict of extra parameters for the offset.
Hour.name Return a string representing the base frequency.
Hour.nanos Returns an integer of the total number of nanoseconds.
Hour.normalize Return boolean whether the frequency can align with midnight.
Hour.rule_code Return a string representing the base frequency.
Hour.n Return the count of the number of periods.

Methods

Function Description
Hour.copy() Return a copy of the frequency.
Hour.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Hour.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Hour.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Hour.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Hour.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Hour.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Hour.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.
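
A short sketch (pandas as pd). Tick offsets such as Hour perform plain fixed-duration arithmetic:

>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.Hour(n=10)
Timestamp('2022-12-10 01:00:00')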

Minute

Function Description
Minute Offset n minutes.

Properties

Function Description
Minute.freqstr Return a string representing the frequency.
Minute.kwds Return a dict of extra parameters for the offset.
Minute.name Return a string representing the base frequency.
Minute.nanos Returns an integer of the total number of nanoseconds.
Minute.normalize Return boolean whether the frequency can align with midnight.
Minute.rule_code Return a string representing the base frequency.
Minute.n Return the count of the number of periods.

Methods

Function Description
Minute.copy() Return a copy of the frequency.
Minute.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Minute.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Minute.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Minute.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Minute.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Minute.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Minute.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Second

Function Description
Second Offset n seconds.

Properties

Function Description
Second.freqstr Return a string representing the frequency.
Second.kwds Return a dict of extra parameters for the offset.
Second.name Return a string representing the base frequency.
Second.nanos Returns an integer of the total number of nanoseconds.
Second.normalize Return boolean whether the frequency can align with midnight.
Second.rule_code Return a string representing the base frequency.
Second.n Return the count of the number of periods.

Methods

Function Description
Second.copy() Return a copy of the frequency.
Second.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Second.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Second.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Second.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Second.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Second.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Second.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Milli

Function Description
Milli Offset n milliseconds.

Properties

Function Description
Milli.freqstr Return a string representing the frequency.
Milli.kwds Return a dict of extra parameters for the offset.
Milli.name Return a string representing the base frequency.
Milli.nanos Returns an integer of the total number of nanoseconds.
Milli.normalize Return boolean whether the frequency can align with midnight.
Milli.rule_code Return a string representing the base frequency.
Milli.n Return the count of the number of periods.

Methods

Function Description
Milli.copy() Return a copy of the frequency.
Milli.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Milli.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Milli.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Milli.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Milli.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Milli.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Milli.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Micro

Function Description
Micro Offset n microseconds.

Properties

Function Description
Micro.freqstr Return a string representing the frequency.
Micro.kwds Return a dict of extra parameters for the offset.
Micro.name Return a string representing the base frequency.
Micro.nanos Returns an integer of the total number of nanoseconds.
Micro.normalize Return boolean whether the frequency can align with midnight.
Micro.rule_code Return a string representing the base frequency.
Micro.n Return the count of the number of periods.

Methods

Function Description
Micro.copy() Return a copy of the frequency.
Micro.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Micro.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Micro.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Micro.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Micro.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Micro.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Micro.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Nano

Function Description
Nano Offset n nanoseconds.

Properties

Function Description
Nano.freqstr Return a string representing the frequency.
Nano.kwds Return a dict of extra parameters for the offset.
Nano.name Return a string representing the base frequency.
Nano.nanos Returns an integer of the total number of nanoseconds.
Nano.normalize Return boolean whether the frequency can align with midnight.
Nano.rule_code Return a string representing the base frequency.
Nano.n Return the count of the number of periods.

Methods

Function Description
Nano.copy() Return a copy of the frequency.
Nano.is_on_offset(dt) Return boolean whether a timestamp intersects with this frequency.
Nano.is_month_start(ts) Return boolean whether a timestamp occurs on the month start.
Nano.is_month_end(ts) Return boolean whether a timestamp occurs on the month end.
Nano.is_quarter_start(ts) Return boolean whether a timestamp occurs on the quarter start.
Nano.is_quarter_end(ts) Return boolean whether a timestamp occurs on the quarter end.
Nano.is_year_start(ts) Return boolean whether a timestamp occurs on the year start.
Nano.is_year_end(ts) Return boolean whether a timestamp occurs on the year end.

Frequencies

Function Description
to_offset(freq[, is_period]) Return DateOffset object from string or datetime.timedelta object.
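
An illustrative example (pandas as pd). Frequency strings and timedeltas both resolve to DateOffset objects:

>>> pd.tseries.frequencies.to_offset("5min")
<5 * Minutes>
>>> pd.tseries.frequencies.to_offset("1D1h")
<25 * Hours>
>>> pd.tseries.frequencies.to_offset(pd.Timedelta(days=2))
<2 * Days>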

Examples for pandas.tseries.offsets.DateOffset

>>> import pandas as pd
>>> from pandas.tseries.offsets import DateOffset
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=3)
Timestamp('2017-04-01 09:10:11')

Plural arguments (e.g. months=2) shift the timestamp, while singular arguments (e.g. day=31, hour=8) replace the corresponding field:

>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=2)
Timestamp('2017-03-01 09:10:11')
>>> ts + DateOffset(day=31)
Timestamp('2017-01-31 09:10:11')

>>> ts + pd.DateOffset(hour=8)
Timestamp('2017-01-01 08:10:11')

Examples for pandas.tseries.offsets.DateOffset.freqstr

>>> pd.DateOffset(5).freqstr
'<5 * DateOffsets>'

>>> pd.offsets.BusinessHour(2).freqstr
'2bh'

>>> pd.offsets.Nano().freqstr
'ns'

>>> pd.offsets.Nano(-3).freqstr
'-3ns'

Examples for pandas.tseries.offsets.DateOffset.kwds

>>> pd.DateOffset(5).kwds
{}

>>> pd.offsets.FY5253Quarter().kwds
{'weekday': 0,
 'startingMonth': 1,
 'qtr_with_extra_week': 1,
 'variation': 'nearest'}

Examples for pandas.tseries.offsets.DateOffset.name

>>> pd.offsets.Hour().name
'h'

>>> pd.offsets.Hour(5).name
'h'

Examples for pandas.tseries.offsets.DateOffset.nanos

Accessing nanos on a non-fixed frequency raises an error:

>>> pd.offsets.Week(n=1).nanos
ValueError: Week: weekday=None is a non-fixed frequency

Examples for pandas.tseries.offsets.DateOffset.normalize

>>> pd.offsets.Hour(5).normalize
False

>>> pd.offsets.Day(5).normalize
False

Examples for pandas.tseries.offsets.DateOffset.rule_code

>>> pd.offsets.Hour().rule_code
'h'

>>> pd.offsets.Week(5).rule_code
'W'

Examples for pandas.tseries.offsets.DateOffset.n

>>> pd.offsets.Hour(5).n
5

>>> pd.offsets.Day(3).n
3

Examples for pandas.tseries.offsets.DateOffset.copy

>>> freq = pd.DateOffset(1)
>>> freq_copy = freq.copy()
>>> freq is freq_copy
False

Examples for pandas.tseries.offsets.DateOffset.is_on_offset

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Day(1)
>>> freq.is_on_offset(ts)
True

>>> ts = pd.Timestamp(2022, 8, 6)
>>> ts.day_name()
'Saturday'
>>> freq = pd.offsets.BusinessDay(1)
>>> freq.is_on_offset(ts)
False

Examples for pandas.tseries.offsets.DateOffset.is_month_start

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_start(ts)
True

Examples for pandas.tseries.offsets.DateOffset.is_month_end

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_end(ts)
False

Examples for pandas.tseries.offsets.DateOffset.is_quarter_start

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_start(ts)
True

Examples for pandas.tseries.offsets.DateOffset.is_quarter_end

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_end(ts)
False

Examples for pandas.tseries.offsets.DateOffset.is_year_start

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_start(ts)
True

Examples for pandas.tseries.offsets.DateOffset.is_year_end

>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_end(ts)
False

Examples for pandas.tseries.offsets.DateOffset.rollback

>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()

Timestamp is not on the offset (not a month end), so it rolls backward:

>>> offset.rollback(ts)
Timestamp('2024-12-31 00:00:00')

If the timestamp is already on the offset, it remains unchanged:

>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollback(ts_on_offset)
Timestamp('2025-01-31 00:00:00')

Examples for pandas.tseries.offsets.DateOffset.rollforward

>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()

Timestamp is not on the offset (not a month end), so it rolls forward:

>>> offset.rollforward(ts)
Timestamp('2025-01-31 00:00:00')

If the timestamp is already on the offset, it remains unchanged:

>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollforward(ts_on_offset)
Timestamp('2025-01-31 00:00:00')

Examples for pandas.tseries.offsets.BusinessDay

You can use the parameter n to represent a shift of n business days.

>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts.strftime('%a %d %b %Y %H:%M')
'Fri 09 Dec 2022 15:00'
>>> (ts + pd.offsets.BusinessDay(n=5)).strftime('%a %d %b %Y %H:%M')
'Fri 16 Dec 2022 15:00'

Passing normalize=True normalizes the result to midnight of the next business day.

>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.BusinessDay(normalize=True)
Timestamp('2022-12-12 00:00:00')