Complete API reference with examples for the pandas library
Extracted from: https://pandas.pydata.org/docs/reference/
- Input/output
- General functions
- Series
- DataFrame
- pandas arrays, scalars, and data types
- Index objects
- Date offsets
- Window
- GroupBy
- Resampling
- Style
- Plotting
| Function | Description |
|---|---|
| read_pickle(filepath_or_buffer[, ...]) | Load pickled pandas object (or any object) from file and return unpickled object. |
| DataFrame.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| Function | Description |
|---|---|
| read_table(filepath_or_buffer, *[, sep, ...]) | Read general delimited file into DataFrame. |
| read_csv(filepath_or_buffer, *[, sep, ...]) | Read a comma-separated values (csv) file into DataFrame. |
| DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| read_fwf(filepath_or_buffer, *[, colspecs, ...]) | Read a table of fixed-width formatted lines into DataFrame. |
| Function | Description |
|---|---|
| read_clipboard([sep, dtype_backend]) | Read text from clipboard and pass to read_csv(). |
| DataFrame.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| Function | Description |
|---|---|
| read_excel(io[, sheet_name, header, names, ...]) | Read an Excel file into a DataFrame. |
| DataFrame.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| ExcelFile(path_or_buffer[, engine, ...]) | Class for parsing tabular Excel sheets into DataFrame objects. |
| ExcelFile.book | Gets the Excel workbook. |
| ExcelFile.sheet_names | Names of the sheets in the document. |
| ExcelFile.parse([sheet_name, header, names, ...]) | Parse specified sheet(s) into a DataFrame. |
| Function | Description |
|---|---|
| Styler.to_excel(excel_writer[, sheet_name, ...]) | Write Styler to an Excel sheet. |
| Function | Description |
|---|---|
| ExcelWriter(path[, engine, date_format, ...]) | Class for writing DataFrame objects into Excel sheets. |
| Function | Description |
|---|---|
| read_json(path_or_buf, *[, orient, typ, ...]) | Convert a JSON string to pandas object. |
| json_normalize(data[, record_path, meta, ...]) | Normalize semi-structured JSON data into a flat table. |
| DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| Function | Description |
|---|---|
| build_table_schema(data[, index, ...]) | Create a Table schema from data. |
| Function | Description |
|---|---|
| read_html(io, *[, match, flavor, header, ...]) | Read HTML tables into a list of DataFrame objects. |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
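As a quick illustration, a hedged round-trip sketch (read_html needs an HTML parser such as lxml or beautifulsoup4 installed; the DataFrame contents are illustrative):
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
html = df.to_html(index=False)            # returns an HTML string when buf is None
tables = pd.read_html(io.StringIO(html))  # parses every <table> into a list of DataFrames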
| Function | Description |
|---|---|
| Styler.to_html([buf, table_uuid, ...]) | Write Styler to a file, buffer or string in HTML-CSS format. |
| Function | Description |
|---|---|
| read_xml(path_or_buffer, *[, xpath, ...]) | Read XML document into a DataFrame object. |
| DataFrame.to_xml([path_or_buffer, index, ...]) | Render a DataFrame to an XML document. |
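A hedged round-trip sketch using the stdlib etree parser (lxml is the default backend; parser="etree" avoids that dependency):
import io
import pandas as pd

df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, 0]})
xml = df.to_xml(parser="etree")                       # XML string when path_or_buffer is None
back = pd.read_xml(io.StringIO(xml), parser="etree")  # parse row elements back to a DataFrame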
| Function | Description |
|---|---|
| DataFrame.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
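A minimal sketch (recent pandas renders to_latex via Styler, which needs jinja2 installed; the DataFrame is illustrative):
import pandas as pd

df = pd.DataFrame({"name": ["Raphael", "Donatello"], "mask": ["red", "purple"]})
print(df.to_latex(index=False))  # emits a LaTeX tabular environment as a string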
| Function | Description |
|---|---|
| Styler.to_latex([buf, column_format, ...]) | Write Styler to a file, buffer or string in LaTeX format. |
| Function | Description |
|---|---|
| read_hdf(path_or_buf[, key, mode, errors, ...]) | Read from the store, close it if we opened it. |
| HDFStore.put(key, value[, format, index, ...]) | Store object in HDFStore. |
| HDFStore.append(key, value[, format, axes, ...]) | Append to Table in file. |
| HDFStore.get(key) | Retrieve pandas object stored in file. |
| HDFStore.select(key[, where, start, stop, ...]) | Retrieve pandas object stored in file, optionally based on where criteria. |
| HDFStore.info() | Print detailed information on the store. |
| HDFStore.keys([include]) | Return a list of keys corresponding to objects stored in HDFStore. |
| HDFStore.groups() | Return a list of all the top-level nodes. |
| HDFStore.walk([where]) | Walk the pytables group hierarchy for pandas objects. |
Warning
One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
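A minimal round-trip sketch, assuming the optional PyTables dependency is installed (the file name and data are illustrative):
import pandas as pd

df = pd.DataFrame({"a": range(3), "b": list("xyz")})

# format="table" plus data_columns=True makes columns queryable via where expressions.
with pd.HDFStore("store.h5", mode="w") as store:
    store.put("df", df, format="table", data_columns=True)
    store.append("df", df)                      # append more rows under the same key
    subset = store.select("df", where="a > 0")  # filter on a data column
    print(store.keys())                         # ['/df']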
| Function | Description |
|---|---|
| read_feather(path[, columns, use_threads, ...]) | Load a feather-format object from the file path. |
| DataFrame.to_feather(path, **kwargs) | Write a DataFrame to the binary Feather format. |
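A hedged round-trip sketch, assuming pyarrow is installed (the path is illustrative):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_feather("data.feather")                          # write a binary Feather file
back = pd.read_feather("data.feather", columns=["a"])  # read back a column subset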
| Function | Description |
|---|---|
| read_parquet(path[, engine, columns, ...]) | Load a parquet object from the file path, returning a DataFrame. |
| DataFrame.to_parquet([path, engine, ...]) | Write a DataFrame to the binary parquet format. |
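A hedged round-trip sketch, assuming pyarrow (or fastparquet) is installed:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_parquet("data.parquet", compression="snappy")      # engine is chosen automatically
subset = pd.read_parquet("data.parquet", columns=["b"])  # column-pruned read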
| Function | Description |
|---|---|
| read_iceberg(table_identifier[, ...]) | Read an Apache Iceberg table into a pandas DataFrame. |
| DataFrame.to_iceberg(table_identifier[, ...]) | Write a DataFrame to an Apache Iceberg table. |
Warning
read_iceberg is experimental and may change without warning.
| Function | Description |
|---|---|
| read_orc(path[, columns, dtype_backend, ...]) | Load an ORC object from the file path, returning a DataFrame. |
| DataFrame.to_orc([path, engine, index, ...]) | Write a DataFrame to the Optimized Row Columnar (ORC) format. |
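A hedged round-trip sketch, assuming pyarrow is installed (ORC support is pyarrow-only):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_orc("data.orc")                          # write an Optimized Row Columnar file
back = pd.read_orc("data.orc", columns=["a"])  # read back selected columns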
| Function | Description |
|---|---|
| read_sas(filepath_or_buffer, *[, format, ...]) | Read SAS files stored as either XPORT or SAS7BDAT format files. |
| Function | Description |
|---|---|
| read_spss(path[, usecols, ...]) | Load an SPSS file from the file path, returning a DataFrame. |
| Function | Description |
|---|---|
| read_sql_table(table_name, con[, schema, ...]) | Read SQL database table into a DataFrame. |
| read_sql_query(sql, con[, index_col, ...]) | Read SQL query into a DataFrame. |
| read_sql(sql, con[, index_col, ...]) | Read SQL query or database table into a DataFrame. |
| DataFrame.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
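A minimal sketch using the stdlib sqlite3 driver (an in-memory database, so nothing touches disk; the table and column names are illustrative):
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["foo", "bar"], "value": [1, 2]})
df.to_sql("my_table", conn, index=False)  # write records to a new table
back = pd.read_sql("SELECT * FROM my_table WHERE value > 1", conn)
conn.close()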
| Function | Description |
|---|---|
| read_stata(filepath_or_buffer, *[, ...]) | Read Stata file into DataFrame. |
| DataFrame.to_stata(path, *[, convert_dates, ...]) | Export DataFrame object to Stata dta format. |
| Function | Description |
|---|---|
| StataReader.data_label | Return data label of Stata file. |
| StataReader.value_labels() | Return a nested dict associating each variable name to its value and label. |
| StataReader.variable_labels() | Return a dict associating each variable name with corresponding label. |
| StataWriter.write_file() | Export DataFrame object to Stata dta format. |
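A hedged round-trip sketch (the file name is illustrative):
import pandas as pd

df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350, 18]})
df.to_stata("animals.dta", write_index=False)  # export to Stata .dta format
back = pd.read_stata("animals.dta")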
>>> original_df = pd.DataFrame(
... {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> pd.to_pickle(original_df, "./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df = pd.DataFrame(
... {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> pd.read_table("data.csv")
Name Value
0 foo 1
1 bar 2
2 #baz 3
Index and header can be specified via the index_col and header arguments.
>>> pd.read_table("data.csv", header=None)
0 1
0 Name Value
1 foo 1
2 bar 2
3 #baz 3
>>> pd.read_table("data.csv", index_col="Value")
Name
Value
1 foo
2 bar
3 #baz
Column types are inferred but can be explicitly specified using the dtype argument.
>>> pd.read_table("data.csv", dtype={{"Value": float}})
Name Value
0 foo 1.0
1 bar 2.0
2 #baz 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_table("data.csv", na_values=["foo", "bar"])
Name Value
0 NaN 1
1 NaN 2
2 #baz 3
Comment lines in the input file can be skipped using the comment argument.
>>> pd.read_table("data.csv", comment="#")
Name Value
0 foo 1
1 bar 2
By default, columns with dates will be read as object rather than datetime.
>>> df = pd.read_table("tmp.csv")
>>> df
col 1 col 2 col 3
0 10 10/04/2018 Sun 15 Jan 2023
1 20 15/04/2018 Fri 12 May 2023
>>> df.dtypes
col 1 int64
col 2 object
col 3 object
dtype: object
Specific columns can be parsed as dates by using the parse_dates and date_format arguments.
>>> df = pd.read_table(
... "tmp.csv",
... parse_dates=[1, 2],
... date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1 int64
col 2 datetime64[ns]
col 3 datetime64[ns]
dtype: object
>>> pd.read_csv("data.csv")
Name Value
0 foo 1
1 bar 2
2 #baz 3
Index and header can be specified via the index_col and header arguments.
>>> pd.read_csv("data.csv", header=None)
0 1
0 Name Value
1 foo 1
2 bar 2
3 #baz 3
>>> pd.read_csv("data.csv", index_col="Value")
Name
Value
1 foo
2 bar
3 #baz
Column types are inferred but can be explicitly specified using the dtype argument.
>>> pd.read_csv("data.csv", dtype={{"Value": float}})
Name Value
0 foo 1.0
1 bar 2.0
2 #baz 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_csv("data.csv", na_values=["foo", "bar"])
Name Value
0 NaN 1
1 NaN 2
2 #baz 3
Comment lines in the input file can be skipped using the comment argument.
>>> pd.read_csv("data.csv", comment="#")
Name Value
0 foo 1
1 bar 2
By default, columns with dates will be read as object rather than datetime.
>>> df = pd.read_csv("tmp.csv")
>>> df
col 1 col 2 col 3
0 10 10/04/2018 Sun 15 Jan 2023
1 20 15/04/2018 Fri 12 May 2023
>>> df.dtypes
col 1 int64
col 2 object
col 3 object
dtype: object
Specific columns can be parsed as dates by using the parse_dates and date_format arguments.
>>> df = pd.read_csv(
... "tmp.csv",
... parse_dates=[1, 2],
... date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1 int64
col 2 datetime64[ns]
col 3 datetime64[ns]
dtype: object
Create ‘out.csv’ containing ‘df’ without indices
>>> df = pd.DataFrame(
... [["Raphael", "red", "sai"], ["Donatello", "purple", "bo staff"]],
... columns=["name", "mask", "weapon"],
... )
>>> df.to_csv("out.csv", index=False)
Create ‘out.zip’ containing ‘out.csv’ (calling to_csv without a path first returns the CSV as a string):
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(
... method="zip", archive_name="out.csv"
... )
>>> df.to_csv(
... "out.zip", index=False, compression=compression_opts
... )
To write a csv file to a new folder or nested folder you will first need to create it using either pathlib or os:
>>> from pathlib import Path
>>> filepath = Path("folder/subfolder/out.csv")
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs("folder/subfolder", exist_ok=True)
>>> df.to_csv("folder/subfolder/out.csv")
Format floats to two decimal places:
>>> df.to_csv("out1.csv", float_format="%.2f")
Format floats using scientific notation:
>>> df.to_csv("out2.csv", float_format="{{:.2e}}".format)
>>> pd.read_fwf("data.csv")
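Since the example above shows no output, here is a hedged, self-contained sketch with explicit column boundaries (the inline data is illustrative):
import io
import pandas as pd

data = "id  name\n1   foo\n2   bar\n"
# colspecs gives half-open (start, end) intervals for each fixed-width field
df = pd.read_fwf(io.StringIO(data), colspecs=[(0, 2), (4, 8)])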
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard()
>>> pd.read_clipboard()
A B C
0 1 2 3
1 4 5 6
Copy the contents of a DataFrame to the clipboard.
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard(sep=",")
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
We can omit the index by passing the keyword index and setting it to false.
>>> df.to_clipboard(sep=",", index=False)
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
Using the original pyperclip package for any string output format.
import pyperclip
html = df.style.to_html()
pyperclip.copy(html)
The file can be read using the file name as string or an open file object:
>>> pd.read_excel("tmp.xlsx", index_col=0)
Name Value
0 string1 1
1 string2 2
2 #Comment 3
>>> pd.read_excel(open("tmp.xlsx", "rb"), sheet_name="Sheet3")
Unnamed: 0 Name Value
0 0 string1 1
1 1 string2 2
2 2 #Comment 3
Index and header can be specified via the index_col and header arguments
>>> pd.read_excel("tmp.xlsx", index_col=None, header=None)
0 1 2
0 NaN Name Value
1 0.0 string1 1
2 1.0 string2 2
3 2.0 #Comment 3
Column types are inferred but can be explicitly specified
>>> pd.read_excel(
... "tmp.xlsx", index_col=0, dtype={"Name": str, "Value": float}
... )
Name Value
0 string1 1.0
1 string2 2.0
2 #Comment 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_excel(
... "tmp.xlsx", index_col=0, na_values=["string1", "string2"]
... )
Name Value
0 NaN 1
1 NaN 2
2 #Comment 3
Comment lines in the Excel input file can be skipped using the comment kwarg.
>>> pd.read_excel("tmp.xlsx", index_col=0, comment="#")
Name Value
0 string1 1.0
1 string2 2.0
2 None NaN
Create, write to and save a workbook:
>>> df1 = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> df1.to_excel("output.xlsx")
To specify the sheet name:
>>> df1.to_excel("output.xlsx", sheet_name="Sheet_name_1")
If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:
>>> df2 = df1.copy()
>>> with pd.ExcelWriter("output.xlsx") as writer:
... df1.to_excel(writer, sheet_name="Sheet_name_1")
... df2.to_excel(writer, sheet_name="Sheet_name_2")
ExcelWriter can also be used to append to an existing Excel file:
>>> with pd.ExcelWriter("output.xlsx", mode="a") as writer:
... df1.to_excel(writer, sheet_name="Sheet_name_3")
To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):
>>> df1.to_excel("output1.xlsx", engine="xlsxwriter")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> with pd.ExcelFile("myfile.xls") as xls:
... df1 = pd.read_excel(xls, "Sheet1")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.book
<openpyxl.workbook.workbook.Workbook object at 0x11eb5ad70>
>>> file.book.path
'/xl/workbook.xml'
>>> file.book.active
<openpyxl.worksheet._read_only.ReadOnlyWorksheet object at 0x11eb5b370>
>>> file.book.sheetnames
['Sheet1', 'Sheet2']
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.sheet_names
["Sheet1", "Sheet2"]
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_excel("myfile.xlsx")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.parse()
Styler.to_excel shares the same usage as DataFrame.to_excel; see the workbook examples above.
Default usage:
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df.to_excel(writer)
To write to separate sheets in a single file:
>>> df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"])
>>> df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df1.to_excel(writer, sheet_name="Sheet1")
... df2.to_excel(writer, sheet_name="Sheet2")
You can set the date format or datetime format:
>>> from datetime import date, datetime
>>> df = pd.DataFrame(
... [
... [date(2014, 1, 31), date(1999, 9, 24)],
... [datetime(1998, 5, 26, 23, 33, 4), datetime(2014, 2, 28, 13, 5, 13)],
... ],
... index=["Date", "Datetime"],
... columns=["X", "Y"],
... )
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... date_format="YYYY-MM-DD",
... datetime_format="YYYY-MM-DD HH:MM:SS",
... ) as writer:
... df.to_excel(writer)
You can also append to an existing Excel file:
>>> with pd.ExcelWriter("path_to_file.xlsx", mode="a", engine="openpyxl") as writer:
... df.to_excel(writer, sheet_name="Sheet3")
Here, the if_sheet_exists parameter can be set to replace a sheet if it already exists:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="replace",
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet1")
You can also write multiple DataFrames to a single sheet. Note that the if_sheet_exists parameter needs to be set to overlay:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="overlay",
... ) as writer:
... df1.to_excel(writer, sheet_name="Sheet1")
... df2.to_excel(writer, sheet_name="Sheet1", startcol=3)
You can store the Excel file in RAM:
>>> import io
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> buffer = io.BytesIO()
>>> with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can pack the Excel file into a zip archive:
>>> import zipfile
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with zipfile.ZipFile("path_to_file.zip", "w") as zf:
... with zf.open("filename.xlsx", "w") as buffer:
... with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can specify additional arguments to the underlying engine:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="xlsxwriter",
... engine_kwargs={{"options": {{"nan_inf_to_errors": True}}}},
... ) as writer:
... df.to_excel(writer)
In append mode, engine_kwargs are passed through to openpyxl’s load_workbook:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="openpyxl",
... mode="a",
... engine_kwargs={{"keep_vba": True}},
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet2")
>>> from io import StringIO
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
Encoding/decoding a Dataframe using 'split' formatted JSON:
>>> df.to_json(orient="split")
'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'
>>> pd.read_json(StringIO(_), orient="split") # noqa: F821
col 1 col 2
row 1 a b
row 2 c d
Encoding/decoding a Dataframe using 'index' formatted JSON:
>>> df.to_json(orient="index")
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
>>> pd.read_json(StringIO(_), orient="index") # noqa: F821
col 1 col 2
row 1 a b
row 2 c d
Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.
>>> df.to_json(orient="records")
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> pd.read_json(StringIO(_), orient="records") # noqa: F821
col 1 col 2
0 a b
1 c d
Encoding with Table Schema
>>> df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string","extDtype":"str"},{"name":"col 1","type":"string","extDtype":"str"},{"name":"col 2","type":"string","extDtype":"str"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":"row 1","col 1":"a","col 2":"b"},{"index":"row 2","col 1":"c","col 2":"d"}]}'
The following example uses dtype_backend="numpy_nullable"
>>> data = '''{"index": {"0": 0, "1": 1},
... "a": {"0": 1, "1": null},
... "b": {"0": 2.5, "1": 4.5},
... "c": {"0": true, "1": false},
... "d": {"0": "a", "1": "b"},
... "e": {"0": 1577.2, "1": 1577.1}}'''
>>> pd.read_json(StringIO(data), dtype_backend="numpy_nullable")
index a b c d e
0 0 1 2.5 True a 1577.2
1 1 <NA> 4.5 False b 1577.1
>>> data = [
... {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
... {"name": {"given": "Mark", "family": "Regner"}},
... {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)
id name.first name.last name.given name.family name
0 1.0 Coleen Volk NaN NaN NaN
1 NaN NaN NaN Mark Regner NaN
2 2.0 NaN NaN NaN NaN Faye Raker
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=0)
id name fitness
0 1.0 Cole Volk {'height': 130, 'weight': 60}
1 NaN Mark Reg {'height': 130, 'weight': 60}
2 2.0 Faye Raker {'height': 130, 'weight': 60}
Normalizes nested data up to level 1.
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=1)
id name fitness.height fitness.weight
0 1.0 Cole Volk 130 60
1 NaN Mark Reg 130 60
2 2.0 Faye Raker 130 60
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height': 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> series = pd.Series(data, index=pd.Index(["a", "b", "c"]))
>>> pd.json_normalize(series)
id name fitness.height fitness.weight
a 1.0 Cole Volk 130 60
b NaN Mark Reg 130 60
c 2.0 Faye Raker 130 60
>>> data = [
... {
... "state": "Florida",
... "shortname": "FL",
... "info": {"governor": "Rick Scott"},
... "counties": [
... {"name": "Dade", "population": 12345},
... {"name": "Broward", "population": 40000},
... {"name": "Palm Beach", "population": 60000},
... ],
... },
... {
... "state": "Ohio",
... "shortname": "OH",
... "info": {"governor": "John Kasich"},
... "counties": [
... {"name": "Summit", "population": 1234},
... {"name": "Cuyahoga", "population": 1337},
... ],
... },
... ]
>>> result = pd.json_normalize(
... data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
Prefix.0
0 1
1 2
Returns normalized data with columns prefixed with the given string.
>>> from json import loads, dumps
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}
Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]
Encoding/decoding a Dataframe using 'index' formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}
Encoding/decoding a Dataframe using 'columns' formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}
Encoding/decoding a Dataframe using 'values' formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}
>>> from pandas.io.json._table_schema import build_table_schema
>>> df = pd.DataFrame(
... {'A': [1, 2, 3],
... 'B': ['a', 'b', 'c'],
... 'C': pd.date_range('2016-01-01', freq='D', periods=3),
... }, index=pd.Index(range(3), name='idx'))
>>> build_table_schema(df)
{'fields': [{'name': 'idx', 'type': 'integer'}, {'name': 'A', 'type': 'integer'}, {'name': 'B', 'type': 'string', 'extDtype': 'str'}, {'name': 'C', 'type': 'datetime'}], 'primaryKey': ['idx'], 'pandas_version': '1.4.0'}
| Function | Description |
|---|---|
| melt(frame[, id_vars, value_vars, var_name, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
| pivot(data, *, columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| pivot_table(data[, values, index, columns, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| crosstab(index, columns[, values, rownames, ...]) | Compute a simple cross tabulation of two (or more) factors. |
| cut(x, bins[, right, labels, retbins, ...]) | Bin values into discrete intervals. |
| qcut(x, q[, labels, retbins, precision, ...]) | Quantile-based discretization function. |
| merge(left, right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| merge_ordered(left, right[, on, left_on, ...]) | Perform a merge for ordered data with optional filling/interpolation. |
| merge_asof(left, right[, on, left_on, ...]) | Perform a merge by key distance. |
| concat(objs, *[, axis, join, ignore_index, ...]) | Concatenate pandas objects along a particular axis. |
| get_dummies(data[, prefix, prefix_sep, ...]) | Convert categorical variable into dummy/indicator variables. |
| from_dummies(data[, sep, default_category]) | Create a categorical DataFrame from a DataFrame of dummy variables. |
| factorize(values[, sort, use_na_sentinel, ...]) | Encode the object as an enumerated type or categorical variable. |
| unique(values) | Return unique values based on a hash table. |
| lreshape(data, groups[, dropna]) | Reshape wide-format data to long. |
| wide_to_long(df, stubnames, i, j[, sep, suffix]) | Unpivot a DataFrame from wide to long format. |
| Function | Description |
|---|---|
| isna(obj) | Detect missing values for an array-like object. |
| isnull(obj) | Detect missing values for an array-like object. |
| notna(obj) | Detect non-missing values for an array-like object. |
| notnull(obj) | Detect non-missing values for an array-like object. |
| Function | Description |
|---|---|
| to_numeric(arg[, errors, downcast, ...]) | Convert argument to a numeric type. |
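A short sketch of the common error-handling and downcasting options:
import pandas as pd

s = pd.Series(["1.0", "2", "-3", "apple"])
pd.to_numeric(s, errors="coerce")                        # unparseable values become NaN
pd.to_numeric(pd.Series([1, 2, 3]), downcast="integer")  # smallest sufficient integer dtype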
| Function | Description |
|---|---|
| to_datetime(arg[, errors, dayfirst, ...]) | Convert argument to datetime. |
| to_timedelta(arg[, unit, errors]) | Convert argument to timedelta. |
| date_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex. |
| bdate_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex with business day as the default. |
| period_range([start, end, periods, freq, name]) | Return a fixed frequency PeriodIndex. |
| timedelta_range([start, end, periods, freq, ...]) | Return a fixed frequency TimedeltaIndex with day as the default. |
| infer_freq(index) | Infer the most likely frequency given the input index. |
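A few representative calls (outputs elided; the dates are illustrative):
import pandas as pd

pd.to_datetime(["2023-01-15", "2023-02-20"])  # DatetimeIndex
pd.to_timedelta(["1 days", "2 hours"])        # TimedeltaIndex
idx = pd.date_range(start="2023-01-01", periods=5, freq="D")
pd.infer_freq(idx)                            # 'D'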
| Function | Description |
|---|---|
| interval_range([start, end, periods, freq, ...]) | Return a fixed frequency IntervalIndex. |
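A hedged sketch of numeric and datetime interval ranges:
import pandas as pd

pd.interval_range(start=0, end=5)  # five right-closed integer intervals (0, 1] ... (4, 5]
pd.interval_range(start=pd.Timestamp("2023-01-01"), periods=3, freq="MS")  # month-start steps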
| Function | Description |
|---|---|
| col(col_name) | Generate deferred object representing a column of a DataFrame. |
| eval(expr[, parser, engine, local_dict, ...]) | Evaluate a Python expression as a string using various backends. |
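A minimal sketch of pd.eval with explicit variable bindings (the local_dict keys are illustrative names):
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
# Evaluates the string expression elementwise against the supplied Series.
result = pd.eval("a + b", local_dict={"a": df["a"], "b": df["b"]})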
| Function | Description |
|---|---|
| tseries.api.guess_datetime_format(dt_str[, ...]) | Guess the datetime format of a given datetime string. |
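A short sketch (the format strings in the comments are what such inputs are expected to yield):
import pandas as pd

pd.tseries.api.guess_datetime_format("2023-01-15")                 # '%Y-%m-%d'
pd.tseries.api.guess_datetime_format("15/01/2023", dayfirst=True)  # '%d/%m/%Y'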
| Function | Description |
|---|---|
| util.hash_array(vals[, encoding, hash_key, ...]) | Given a 1d array, return an array of deterministic integers. |
| util.hash_pandas_object(obj[, index, ...]) | Return a data hash of the Index/Series/DataFrame. |
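A short sketch; both helpers return deterministic uint64 values, one per element or row:
import numpy as np
import pandas as pd

pd.util.hash_array(np.array([1, 2, 3]))     # per-element hashes of a 1d array
s = pd.Series(["a", "b", "c"])
pd.util.hash_pandas_object(s)               # the index is hashed in by default
pd.util.hash_pandas_object(s, index=False)  # hash the values only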
| Function | Description |
|---|---|
| api.interchange.from_dataframe(df[, allow_copy]) | Build a pd.DataFrame from any DataFrame supporting the interchange protocol. |
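A minimal sketch; any object implementing __dataframe__ can be passed, and here a pandas DataFrame itself stands in for a third-party one:
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
result = pd.api.interchange.from_dataframe(df, allow_copy=True)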
>>> df = pd.DataFrame(
... {
... "A": {0: "a", 1: "b", 2: "c"},
... "B": {0: 1, 1: 3, 2: 5},
... "C": {0: 2, 1: 4, 2: 6},
... }
... )
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=["A"], value_vars=["B"])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of ‘variable’ and ‘value’ columns can be customized:
>>> pd.melt(
... df,
... id_vars=["A"],
... value_vars=["B"],
... var_name="myVarname",
... value_name="myValname",
... )
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
Original index values can be kept around:
>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"], ignore_index=False)
A variable value
0 a B 1
1 b B 3
2 c B 5
0 a C 2
1 b C 4
2 c C 6
If you have multi-index columns:
>>> df.columns = [list("ABC"), list("DEF")]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, col_level=0, id_vars=["A"], value_vars=["B"])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=[("A", "D")], value_vars=[("B", "E")])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
>>> df = pd.DataFrame(
... {
... "foo": ["one", "one", "one", "two", "two", "two"],
... "bar": ["A", "B", "C", "A", "B", "C"],
... "baz": [1, 2, 3, 4, 5, 6],
... "zoo": ["x", "y", "z", "q", "w", "t"],
... }
... )
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index="foo", columns="bar", values="baz")
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index="foo", columns="bar")["baz"]
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index="foo", columns="bar", values=["baz", "zoo"])
baz zoo
bar A B C A B C
foo
one 1 2 3 x y z
two 4 5 6 q w t
You could also assign a list of column names or a list of index names.
>>> df = pd.DataFrame(
... {
... "lev1": [1, 1, 1, 2, 2, 2],
... "lev2": [1, 1, 2, 1, 1, 2],
... "lev3": [1, 2, 1, 2, 1, 2],
... "lev4": [1, 2, 3, 4, 5, 6],
... "values": [0, 1, 2, 3, 4, 5],
... }
... )
>>> df
lev1 lev2 lev3 lev4 values
0 1 1 1 1 0
1 1 1 2 2 1
2 1 2 1 3 2
3 2 1 2 4 3
4 2 1 1 5 4
5 2 2 2 6 5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
lev2 1 2
lev3 1 2 1 2
lev1
1 0.0 1.0 2.0 NaN
2 4.0 3.0 NaN 5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
lev3 1 2
lev1 lev2
1 1 0.0 1.0
2 2.0 NaN
2 1 4.0 3.0
2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame(
... {
... "foo": ["one", "one", "two", "two"],
... "bar": ["A", "A", "B", "C"],
... "baz": [1, 2, 3, 4],
... }
... )
>>> df
foo bar baz
0 one A 1
1 one A 2
2 two B 3
3 two C 4
Notice that the first two rows are the same for our index and columns arguments.
>>> df.pivot(index="foo", columns="bar", values="baz")
Traceback (most recent call last):
...
ValueError: Index contains duplicate entries, cannot reshape
>>> df = pd.DataFrame(
... {
... "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
... "C": [
... "small",
... "large",
... "large",
... "small",
... "small",
... "large",
... "small",
... "small",
... "large",
... ],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
... }
... )
>>> df
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(
... df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum"
... )
>>> table
C large small
A B
bar one 4.0 5.0
two 7.0 6.0
foo one 4.0 1.0
two NaN 6.0
We can also fill missing values using the fill_value parameter.
>>> table = pd.pivot_table(
... df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum", fill_value=0
... )
>>> table
C large small
A B
bar one 4 5
two 7 6
foo one 4 1
two 0 6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(
... df, values=["D", "E"], index=["A", "C"], aggfunc={"D": "mean", "E": "mean"}
... )
>>> table
D E
A C
bar large 5.500000 7.500000
small 5.500000 8.500000
foo large 2.000000 4.500000
small 2.333333 4.333333
We can also calculate multiple types of aggregations for any given value column.
>>> table = pd.pivot_table(
... df,
... values=["D", "E"],
... index=["A", "C"],
... aggfunc={"D": "mean", "E": ["min", "max", "mean"]},
... )
>>> table
D E
mean max mean min
A C
bar large 5.500000 9 7.500000 6
small 5.500000 9 8.500000 8
foo large 2.000000 5 4.500000 4
small 2.333333 6 4.333333 2
>>> a = np.array(
... [
... "foo",
... "foo",
... "foo",
... "foo",
... "bar",
... "bar",
... "bar",
... "bar",
... "foo",
... "foo",
... "foo",
... ],
... dtype=object,
... )
>>> b = np.array(
... [
... "one",
... "one",
... "one",
... "two",
... "one",
... "one",
... "one",
... "two",
... "two",
... "two",
... "one",
... ],
... dtype=object,
... )
>>> c = np.array(
... [
... "dull",
... "dull",
... "shiny",
... "dull",
... "dull",
... "shiny",
... "shiny",
... "dull",
... "shiny",
... "shiny",
... "shiny",
... ],
... dtype=object,
... )
>>> pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"])
b one two
c dull shiny dull shiny
a
bar 1 2 1 0
foo 2 2 1 2
Here ‘c’ and ‘f’ are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.
>>> foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
>>> bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])
>>> pd.crosstab(foo, bar)
col_0 d e
row_0
a 1 0
b 0 1
>>> pd.crosstab(foo, bar, dropna=False)
col_0 d e f
row_0
a 1 0 0
b 0 1 0
c 0 0 0
Discretize into three equal-sized bins.
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
...
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
...
([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
array([0.994, 3. , 5. , 7. ]))
Discovers the same bins, but assigns them specific labels. Notice that the returned Categorical's categories are the labels, and it is ordered.
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["bad", "medium", "good"])
['bad', 'good', 'medium', 'medium', 'good', 'bad']
Categories (3, str): ['bad' < 'medium' < 'good']
ordered=False will result in unordered categories when labels are passed. This parameter can be used to allow non-unique labels:
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["B", "A", "B"], ordered=False)
['B', 'B', 'A', 'A', 'B', 'B']
Categories (2, str): ['A', 'B']
labels=False implies you just want the bins back.
>>> pd.cut([0, 1, 1, 2], bins=4, labels=False)
array([0, 1, 1, 3])
Passing a Series as an input returns a Series with categorical dtype:
>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, 3)
...
a (1.992, 4.667]
b (1.992, 4.667]
c (4.667, 7.333]
d (7.333, 10.0]
e (7.333, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(1.992, 4.667] < (4.667, ...
Passing a Series as input returns a Series with the mapped values. It can be used to map values numerically to intervals based on bins.
>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, [0, 2, 4, 6, 8, 10], labels=False, retbins=True, right=False)
...
(a 1.0
b 2.0
c 3.0
d 4.0
e NaN
dtype: float64,
array([ 0, 2, 4, 6, 8, 10]))
Use the duplicates="drop" option when the bins are not unique:
>>> pd.cut(
... s,
... [0, 2, 4, 6, 10, 10],
... labels=False,
... retbins=True,
... right=False,
... duplicates="drop",
... )
...
(a 1.0
b 2.0
c 3.0
d 3.0
e NaN
dtype: float64,
array([ 0, 2, 4, 6, 10]))
Passing an IntervalIndex for bins results in those categories exactly. Notice that values not covered by the IntervalIndex are set to NaN. 0 is to the left of the first bin (which is closed on the right), and 1.5 falls between two bins.
>>> bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
>>> pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
[NaN, (0.0, 1.0], NaN, (2.0, 3.0], (4.0, 5.0]]
Categories (3, interval[int64, right]): [(0, 1] < (2, 3] < (4, 5]]
Using np.histogram_bin_edges with cut
>>> pd.cut(
... np.array([1, 7, 5, 4]),
... bins=np.histogram_bin_edges(np.array([1, 7, 5, 4]), bins="auto"),
... )
...
[NaN, (5.0, 7.0], (3.0, 5.0], (3.0, 5.0]]
Categories (3, interval[float64, right]): [(1.0, 3.0] < (3.0, 5.0] < (5.0, 7.0]]
>>> pd.qcut(range(5), 4)
...
[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
Categories (4, interval[float64, right]): [(-0.001, 1.0] < (1.0, 2.0] ...
>>> pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
...
[good, good, medium, bad, bad]
Categories (3, str): [good < medium < bad]
>>> pd.qcut(range(5), 4, labels=False)
array([0, 0, 1, 2, 3])
>>> df1 = pd.DataFrame(
... {"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 5]}
... )
>>> df2 = pd.DataFrame(
... {"rkey": ["foo", "bar", "baz", "foo"], "value": [5, 6, 7, 8]}
... )
>>> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1.merge(df2, left_on="lkey", right_on="rkey")
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 bar 2 bar 6
3 baz 3 baz 7
4 foo 5 foo 5
5 foo 5 foo 8
Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.
>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=("_left", "_right"))
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 bar 2 bar 6
3 baz 3 baz 7
4 foo 5 foo 5
5 foo 5 foo 8
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.
>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='str')
>>> df1 = pd.DataFrame({"a": ["foo", "bar"], "b": [1, 2]})
>>> df2 = pd.DataFrame({"a": ["foo", "baz"], "c": [3, 4]})
>>> df1
a b
0 foo 1
1 bar 2
>>> df2
a c
0 foo 3
1 baz 4
>>> df1.merge(df2, how="inner", on="a")
a b c
0 foo 1 3
>>> df1.merge(df2, how="left", on="a")
a b c
0 foo 1 3.0
1 bar 2 NaN
>>> df1 = pd.DataFrame({"left": ["foo", "bar"]})
>>> df2 = pd.DataFrame({"right": [7, 8]})
>>> df1
left
0 foo
1 bar
>>> df2
right
0 7
1 8
>>> df1.merge(df2, how="cross")
left right
0 foo 7
1 foo 8
2 bar 7
3 bar 8
>>> from pandas import merge_ordered
>>> df1 = pd.DataFrame(
... {
... "key": ["a", "c", "e", "a", "c", "e"],
... "lvalue": [1, 2, 3, 1, 2, 3],
... "group": ["a", "a", "a", "b", "b", "b"],
... }
... )
>>> df1
key lvalue group
0 a 1 a
1 c 2 a
2 e 3 a
3 a 1 b
4 c 2 b
5 e 3 b
>>> df2 = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [1, 2, 3]})
>>> df2
key rvalue
0 b 1
1 c 2
2 d 3
>>> merge_ordered(df1, df2, fill_method="ffill", left_by="group")
key lvalue group rvalue
0 a 1 a NaN
1 b 1 a 1.0
2 c 2 a 2.0
3 d 2 a 3.0
4 e 3 a 3.0
5 a 1 b NaN
6 b 1 b 1.0
7 c 2 b 2.0
8 d 2 b 3.0
9 e 3 b 3.0
>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
a left_val
0 1 a
1 5 b
2 10 c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
a right_val
0 1 1
1 2 2
2 3 3
3 6 6
4 7 7
>>> pd.merge_asof(left, right, on="a")
a left_val right_val
0 1 a 1
1 5 b 3
2 10 c 7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
a left_val right_val
0 1 a NaN
1 5 b 3.0
2 10 c 7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
a left_val right_val
0 1 a 1.0
1 5 b 6.0
2 10 c NaN
>>> pd.merge_asof(left, right, on="a", direction="nearest")
a left_val right_val
0 1 a 1
1 5 b 6
2 10 c 7
We can use indexed DataFrames as well.
>>> left = pd.DataFrame({"left_val": ["a", "b", "c"]}, index=[1, 5, 10])
>>> left
left_val
1 a
5 b
10 c
>>> right = pd.DataFrame({"right_val": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
>>> right
right_val
1 1
2 2
3 3
6 6
7 7
>>> pd.merge_asof(left, right, left_index=True, right_index=True)
left_val right_val
1 a 1
5 b 3
10 c 7
Here is a real-world time-series example:
>>> quotes = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.030"),
... pd.Timestamp("2016-05-25 13:30:00.041"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.049"),
... pd.Timestamp("2016-05-25 13:30:00.072"),
... pd.Timestamp("2016-05-25 13:30:00.075"),
... ],
... "ticker": [
... "GOOG",
... "MSFT",
... "MSFT",
... "MSFT",
... "GOOG",
... "AAPL",
... "GOOG",
... "MSFT",
... ],
... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
... )
>>> quotes
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> trades = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.038"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... ],
... "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
... "price": [51.95, 51.95, 720.77, 720.92, 98.0],
... "quantity": [75, 155, 100, 100, 100],
... }
... )
>>> trades
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we take the asof of the quotes:
>>> pd.merge_asof(trades, quotes, on="time", by="ticker")
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only merge asof within 2ms between the quote time and the trade time:
>>> pd.merge_asof(
... trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms")
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only merge asof within 10ms between the quote time and the trade time, and we exclude exact matches on time. However, prior data will propagate forward:
>>> pd.merge_asof(
... trades,
... quotes,
... on="time",
... by="ticker",
... tolerance=pd.Timedelta("10ms"),
... allow_exact_matches=False,
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN
3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Combine two Series.
>>> s1 = pd.Series(["a", "b"])
>>> s2 = pd.Series(["c", "d"])
>>> pd.concat([s1, s2])
0 a
1 b
0 c
1 d
dtype: str
Clear the existing index and reset it in the result by setting the ignore_index option to True.
>>> pd.concat([s1, s2], ignore_index=True)
0 a
1 b
2 c
3 d
dtype: str
Add a hierarchical index at the outermost level of the data with the keys option.
>>> pd.concat([s1, s2], keys=["s1", "s2"])
s1 0 a
1 b
s2 0 c
1 d
dtype: str
Label the index keys you create with the names option.
>>> pd.concat([s1, s2], keys=["s1", "s2"], names=["Series name", "Row ID"])
Series name Row ID
s1 0 a
1 b
s2 0 c
1 d
dtype: str
Combine two DataFrame objects with identical columns.
>>> df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
>>> df1
letter number
0 a 1
1 b 2
>>> df2 = pd.DataFrame([["c", 3], ["d", 4]], columns=["letter", "number"])
>>> df2
letter number
0 c 3
1 d 4
>>> pd.concat([df1, df2])
letter number
0 a 1
1 b 2
0 c 3
1 d 4
Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with NaN values.
>>> df3 = pd.DataFrame(
... [["c", 3, "cat"], ["d", 4, "dog"]], columns=["letter", "number", "animal"]
... )
>>> df3
letter number animal
0 c 3 cat
1 d 4 dog
>>> pd.concat([df1, df3], sort=False)
letter number animal
0 a 1 NaN
1 b 2 NaN
0 c 3 cat
1 d 4 dog
Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.
>>> pd.concat([df1, df3], join="inner")
letter number
0 a 1
1 b 2
0 c 3
1 d 4
Combine DataFrame objects horizontally along the x axis by passing in axis=1.
>>> df4 = pd.DataFrame(
... [["bird", "polly"], ["monkey", "george"]], columns=["animal", "name"]
... )
>>> pd.concat([df1, df4], axis=1)
letter number animal name
0 a 1 bird polly
1 b 2 monkey george
Prevent the result from including duplicate index values with the verify_integrity option.
>>> df5 = pd.DataFrame([1], index=["a"])
>>> df5
0
a 1
>>> df6 = pd.DataFrame([2], index=["a"])
>>> df6
0
a 2
>>> pd.concat([df5, df6], verify_integrity=True)
Traceback (most recent call last):
...
ValueError: Indexes have overlapping values: ['a']
Append a single row to the end of a DataFrame object.
>>> df7 = pd.DataFrame({"a": 1, "b": 2}, index=[0])
>>> df7
a b
0 1 2
>>> new_row = pd.Series({"a": 3, "b": 4})
>>> new_row
a 3
b 4
dtype: int64
>>> pd.concat([df7, new_row.to_frame().T], ignore_index=True)
a b
0 1 2
1 3 4
>>> s = pd.Series(list("abca"))
>>> pd.get_dummies(s)
a b c
0 True False False
1 False True False
2 False False True
3 True False False
>>> s1 = ["a", "b", np.nan]
>>> pd.get_dummies(s1)
a b
0 True False
1 False True
2 False False
>>> pd.get_dummies(s1, dummy_na=True)
a b NaN
0 True False False
1 False True False
2 False False True
>>> df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})
>>> pd.get_dummies(df, prefix=["col1", "col2"])
C col1_a col1_b col2_a col2_b col2_c
0 1 True False False True False
1 2 False True True False False
2 3 True False False False True
>>> pd.get_dummies(pd.Series(list("abcaa")))
a b c
0 True False False
1 False True False
2 False False True
3 True False False
4 True False False
>>> pd.get_dummies(pd.Series(list("abcaa")), drop_first=True)
b c
0 False False
1 True False
2 False True
3 False False
4 False False
>>> pd.get_dummies(pd.Series(list("abc")), dtype=float)
a b c
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
>>> df = pd.DataFrame({"a": [1, 0, 0, 1], "b": [0, 1, 0, 0], "c": [0, 0, 1, 0]})
>>> df
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
>>> pd.from_dummies(df)
0 a
1 b
2 c
3 a
>>> df = pd.DataFrame(
... {
... "col1_a": [1, 0, 1],
... "col1_b": [0, 1, 0],
... "col2_a": [0, 1, 0],
... "col2_b": [1, 0, 0],
... "col2_c": [0, 0, 1],
... }
... )
>>> df
col1_a col1_b col2_a col2_b col2_c
0 1 0 0 1 0
1 0 1 1 0 0
2 1 0 0 0 1
>>> pd.from_dummies(df, sep="_")
col1 col2
0 a b
1 b a
2 a c
>>> df = pd.DataFrame(
... {
... "col1_a": [1, 0, 0],
... "col1_b": [0, 1, 0],
... "col2_a": [0, 1, 0],
... "col2_b": [1, 0, 0],
... "col2_c": [0, 0, 0],
... }
... )
>>> df
col1_a col1_b col2_a col2_b col2_c
0 1 0 0 1 0
1 0 1 1 0 0
2 0 0 0 0 0
>>> pd.from_dummies(df, sep="_", default_category={"col1": "d", "col2": "e"})
col1 col2
0 a b
1 b a
2 d e
These examples all show factorize as a top-level method like pd.factorize(values). The results are identical for methods like Series.factorize().
>>> codes, uniques = pd.factorize(np.array(["b", "b", "a", "c", "b"], dtype="O"))
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With sort=True, the uniques will be sorted, and codes will be shuffled so that the relationship is maintained.
>>> codes, uniques = pd.factorize(
... np.array(["b", "b", "a", "c", "b"], dtype="O"), sort=True
... )
>>> codes
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)
When use_na_sentinel=True (the default), missing values are indicated in the codes with the sentinel value -1 and missing values are not included in uniques.
>>> codes, uniques = pd.factorize(np.array(["b", None, "a", "c", "b"], dtype="O"))
>>> codes
array([ 0, -1, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.
>>> cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
['a', 'c']
Categories (3, str): ['a', 'b', 'c']
Notice that 'b' is in uniques.categories, despite not being present in cat.values.
For all other pandas objects, an Index of the appropriate type is returned.
>>> cat = pd.Series(["a", "a", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='str')
If NaN is in the values, and we want to include NaN in the uniques of the values, it can be achieved by setting use_na_sentinel=False.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: use_na_sentinel=True
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, use_na_sentinel=False)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
>>> pd.unique(pd.Series([2, 1, 3, 3]))
array([2, 1, 3])
>>> pd.unique(pd.Series([2] + [1] * 5))
array([2, 1])
>>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
array(['2016-01-01T00:00:00.000000'], dtype='datetime64[us]')
>>> pd.unique(
... pd.Series(
... [
... pd.Timestamp("20160101", tz="US/Eastern"),
... pd.Timestamp("20160101", tz="US/Eastern"),
... ],
... dtype="M8[ns, US/Eastern]",
... )
... )
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]
>>> pd.unique(
... pd.Index(
... [
... pd.Timestamp("20160101", tz="US/Eastern"),
... pd.Timestamp("20160101", tz="US/Eastern"),
... ],
... dtype="M8[ns, US/Eastern]",
... )
... )
DatetimeIndex(['2016-01-01 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]',
freq=None)
>>> pd.unique(np.array(list("baabc"), dtype="O"))
array(['b', 'a', 'c'], dtype=object)
An unordered Categorical will return categories in the order of appearance.
>>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']
>>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']
An ordered Categorical preserves the category ordering.
>>> pd.unique(
... pd.Series(
... pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
... )
... )
['b', 'a', 'c']
Categories (3, str): ['a' < 'b' < 'c']
An array of tuples
>>> pd.unique(pd.Series([("a", "b"), ("b", "a"), ("a", "c"), ("b", "a")]).values)
array([('a', 'b'), ('b', 'a'), ('a', 'c')], dtype=object)
A NumpyExtensionArray of complex
>>> pd.unique(pd.array([1 + 1j, 2, 3]))
<NumpyExtensionArray>
[(1+1j), (2+0j), (3+0j)]
Length: 3, dtype: complex128
>>> data = pd.DataFrame(
... {
... "hr1": [514, 573],
... "hr2": [545, 526],
... "team": ["Red Sox", "Yankees"],
... "year1": [2007, 2007],
... "year2": [2008, 2008],
... }
... )
>>> data
hr1 hr2 team year1 year2
0 514 545 Red Sox 2007 2008
1 573 526 Yankees 2007 2008
>>> pd.lreshape(data, {"year": ["year1", "year2"], "hr": ["hr1", "hr2"]})
team year hr
0 Red Sox 2007 514
1 Yankees 2007 573
2 Red Sox 2008 545
3 Yankees 2008 526
>>> np.random.seed(123)
>>> df = pd.DataFrame(
... {
... "A1970": {0: "a", 1: "b", 2: "c"},
... "A1980": {0: "d", 1: "e", 2: "f"},
... "B1970": {0: 2.5, 1: 1.2, 2: 0.7},
... "B1980": {0: 3.2, 1: 1.3, 2: 0.1},
... "X": dict(zip(range(3), np.random.randn(3), strict=True)),
... }
... )
>>> df["id"] = df.index
>>> df
A1970 A1980 B1970 B1980 X id
0 a d 2.5 3.2 -1.085631 0
1 b e 1.2 1.3 0.997345 1
2 c f 0.7 0.1 0.282978 2
>>> pd.wide_to_long(df, ["A", "B"], i="id", j="year")
...
X A B
id year
0 1970 -1.085631 a 2.5
1 1970 0.997345 b 1.2
2 1970 0.282978 c 0.7
0 1980 -1.085631 d 3.2
1 1980 0.997345 e 1.3
2 1980 0.282978 f 0.1
With multiple id columns
>>> df = pd.DataFrame(
... {
... "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
... "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
... "ht1": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
... "ht2": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
... }
... )
>>> df
famid birth ht1 ht2
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
>>> long_format = pd.wide_to_long(df, stubnames="ht", i=["famid", "birth"], j="age")
>>> long_format
...
ht
famid birth age
1 1 1 2.8
2 3.4
2 1 2.9
2 3.8
3 1 2.2
2 2.9
2 1 1 2.0
2 3.2
2 1 1.8
2 2.8
3 1 1.9
2 2.4
3 1 1 2.2
2 3.3
2 1 2.3
2 3.4
3 1 2.1
2 2.9
Going from long back to wide just takes some creative use of unstack
>>> wide_format = long_format.unstack()
>>> wide_format.columns = wide_format.columns.map("{0[0]}{0[1]}".format)
>>> wide_format.reset_index()
famid birth ht1 ht2
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
Less wieldy column names are also handled
>>> np.random.seed(0)
>>> df = pd.DataFrame(
... {
... "A(weekly)-2010": np.random.rand(3),
... "A(weekly)-2011": np.random.rand(3),
... "B(weekly)-2010": np.random.rand(3),
... "B(weekly)-2011": np.random.rand(3),
... "X": np.random.randint(3, size=3),
... }
... )
>>> df["id"] = df.index
>>> df
A(weekly)-2010 A(weekly)-2011 B(weekly)-2010 B(weekly)-2011 X id
0 0.548814 0.544883 0.437587 0.383442 0 0
1 0.715189 0.423655 0.891773 0.791725 1 1
2 0.602763 0.645894 0.963663 0.528895 1 2
>>> pd.wide_to_long(df, ["A(weekly)", "B(weekly)"], i="id", j="year", sep="-")
...
X A(weekly) B(weekly)
id year
0 2010 0 0.548814 0.437587
1 2010 1 0.715189 0.891773
2 2010 1 0.602763 0.963663
0 2011 0 0.544883 0.383442
1 2011 1 0.423655 0.791725
2 2011 1 0.645894 0.528895
If we have many columns, we could also use a regex to find our stubnames and pass that list on to wide_to_long
>>> stubnames = sorted(
... set(
... [
... match[0]
... for match in df.columns.str.findall(r"[A-B]\(.*\)").values
... if match != []
... ]
... )
... )
>>> list(stubnames)
['A(weekly)', 'B(weekly)']
All of the above examples have integers as suffixes. It is possible to have non-integers as suffixes.
>>> df = pd.DataFrame(
... {
... "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
... "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
... "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
... "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
... }
... )
>>> df
famid birth ht_one ht_two
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
>>> long_format = pd.wide_to_long(
... df, stubnames="ht", i=["famid", "birth"], j="age", sep="_", suffix=r"\w+"
... )
>>> long_format
...
ht
famid birth age
1 1 one 2.8
two 3.4
2 one 2.9
two 3.8
3 one 2.2
two 2.9
2 1 one 2.0
two 3.2
2 one 1.8
two 2.8
3 one 1.9
two 2.4
3 1 one 2.2
two 3.3
2 one 2.3
two 3.4
3 one 2.1
two 2.9
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.isna("dog")
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(np.nan)
True
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.isna(array)
array([[False, True, False],
[False, False, True]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[us]', freq=None)
>>> pd.isna(index)
array([False, False, True, False])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
0 1 2
0 ant bee cat
1 dog NaN fly
>>> pd.isna(df)
0 1 2
0 False False False
1 False True False
>>> pd.isna(df[1])
0 False
1 True
Name: 1, dtype: bool
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.notna("dog")
True
>>> pd.notna(pd.NA)
False
>>> pd.notna(np.nan)
False
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.notna(array)
array([[ True, False, True],
[ True, True, False]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[us]', freq=None)
>>> pd.notna(index)
array([ True, True, False, True])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
0 1 2
0 ant bee cat
1 dog NaN fly
>>> pd.notna(df)
0 1 2
0 True True True
1 True False True
>>> pd.notna(df[1])
0 True
1 False
Name: 1, dtype: bool
| Function | Description |
|---|---|
| Series([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
Axes
| Function | Description |
|---|---|
| Series.index | The index (axis labels) of the Series. |
| Series.array | The ExtensionArray of the data backing this Series or Index. |
| Series.values | Return Series as ndarray or ndarray-like depending on the dtype. |
| Series.dtype | Return the dtype object of the underlying data. |
| Series.info([verbose, buf, max_cols, ...]) | Print a concise summary of a Series. |
| Series.shape | Return a tuple of the shape of the underlying data. |
| Series.nbytes | Return the number of bytes in the underlying data. |
| Series.ndim | Number of dimensions of the underlying data, by definition 1. |
| Series.size | Return the number of elements in the underlying data. |
| Series.T | Return the transpose, which is by definition self. |
| Series.memory_usage([index, deep]) | Return the memory usage of the Series. |
| Series.hasnans | Return True if there are any NaNs. |
| Series.empty | Indicator whether Series/DataFrame is empty. |
| Series.dtypes | Return the dtype object of the underlying data. |
| Series.name | Return the name of the Series. |
| Series.flags | Get the properties associated with this pandas object. |
| Series.set_flags(*[, copy, ...]) | Return a new object with updated flags. |
| Function | Description |
|---|---|
| Series.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype dtype. |
| Series.convert_dtypes([infer_objects, ...]) | Convert columns from numpy dtypes to the best dtypes that support pd.NA. |
| Series.infer_objects([copy]) | Attempt to infer better dtypes for object columns. |
| Series.copy([deep]) | Make a copy of this object's indices and data. |
| Series.to_numpy([dtype, copy, na_value]) | A NumPy ndarray representing the values in this Series or Index. |
| Series.to_period([freq, copy]) | Convert Series from DatetimeIndex to PeriodIndex. |
| Series.to_timestamp([freq, how, copy]) | Cast to DatetimeIndex of Timestamps, at beginning of period. |
| Series.to_list() | Return a list of the values. |
| Series.__array__([dtype, copy]) | Return the values as a NumPy array. |
| Function | Description |
|---|---|
| Series.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
| Series.at | Access a single value for a row/column label pair. |
| Series.iat | Access a single value for a row/column pair by integer position. |
| Series.loc | Access a group of rows and columns by label(s) or a boolean array. |
| Series.iloc | Purely integer-location based indexing for selection by position. |
| Series.__iter__() | Return an iterator of the values. |
| Series.items() | Lazily iterate over (index, value) tuples. |
| Series.keys() | Return alias for index. |
| Series.pop(item) | Return item and drop it from the series. |
| Series.item() | Return the first element of the underlying data as a Python scalar. |
| Series.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
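As a quick illustrative sketch of label-based versus position-based access (the Series and its values are invented for this example; exact scalar formatting can vary slightly across pandas/NumPy versions):
>>> s = pd.Series([10, 20, 30], index=["a", "b", "c"])
>>> s.loc["b"]  # label-based
20
>>> s.iloc[-1]  # position-based
30
>>> s.at["a"]  # fast scalar access by label
10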
| Function | Description |
|---|---|
| Series.add(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator add). |
| Series.sub(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator sub). |
| Series.mul(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator mul). |
| Series.div(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| Series.truediv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| Series.floordiv(other[, level, fill_value, axis]) | Return Integer division of series and other, element-wise (binary operator floordiv). |
| Series.mod(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator mod). |
| Series.pow(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator pow). |
| Series.radd(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator radd). |
| Series.rsub(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator rsub). |
| Series.rmul(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator rmul). |
| Series.rdiv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| Series.rtruediv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| Series.rfloordiv(other[, level, fill_value, ...]) | Return Integer division of series and other, element-wise (binary operator rfloordiv). |
| Series.rmod(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator rmod). |
| Series.rpow(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator rpow). |
| Series.combine(other, func[, fill_value]) | Combine the Series with a Series or scalar according to func. |
| Series.combine_first(other) | Update null elements with value in the same location in 'other'. |
| Series.round([decimals]) | Round each value in a Series to the given number of decimals. |
| Series.lt(other[, level, fill_value, axis]) | Return Less than of series and other, element-wise (binary operator lt). |
| Series.gt(other[, level, fill_value, axis]) | Return Greater than of series and other, element-wise (binary operator gt). |
| Series.le(other[, level, fill_value, axis]) | Return Less than or equal to of series and other, element-wise (binary operator le). |
| Series.ge(other[, level, fill_value, axis]) | Return Greater than or equal to of series and other, element-wise (binary operator ge). |
| Series.ne(other[, level, fill_value, axis]) | Return Not equal to of series and other, element-wise (binary operator ne). |
| Series.eq(other[, level, fill_value, axis]) | Return Equal to of series and other, element-wise (binary operator eq). |
| Series.product(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| Series.dot(other) | Compute the dot product between the Series and the columns of other. |
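A minimal sketch of how fill_value changes aligned arithmetic (invented data; a missing position on one side is treated as 0 before adding, while positions missing on both sides stay missing):
>>> a = pd.Series([1, 2, np.nan], index=["x", "y", "z"])
>>> b = pd.Series([10, np.nan, 30], index=["x", "y", "z"])
>>> a.add(b, fill_value=0)
x    11.0
y     2.0
z    30.0
dtype: float64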
| Function | Description |
|---|---|
| Series.apply(func[, args, by_row]) | Invoke function on values of Series. |
| Series.agg([func, axis]) | Aggregate using one or more operations over the specified axis. |
| Series.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. |
| Series.transform(func[, axis]) | Call func on self producing a Series with the same axis shape as self. |
| Series.map([func, na_action, engine]) | Map values of Series according to an input mapping or function. |
| Series.groupby([by, level, as_index, sort, ...]) | Group Series using a mapper or by a Series of columns. |
| Series.rolling(window[, min_periods, ...]) | Provide rolling window calculations. |
| Series.expanding([min_periods, method]) | Provide expanding window calculations. |
| Series.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. |
| Series.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. |
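A short sketch contrasting apply and map on invented data; both accept a callable over the values:
>>> s = pd.Series([1, 2, 3])
>>> s.apply(lambda x: x ** 2)
0    1
1    4
2    9
dtype: int64
>>> s.map(lambda x: x * 10)
0    10
1    20
2    30
dtype: int64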
| Function | Description |
|---|---|
| Series.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
| Series.all(*[, axis, bool_only, skipna]) | Return whether all elements are True, potentially over an axis. |
| Series.any(*[, axis, bool_only, skipna]) | Return whether any element is True, potentially over an axis. |
| Series.autocorr([lag]) | Compute the lag-N autocorrelation. |
| Series.between(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
| Series.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| Series.corr(other[, method, min_periods]) | Compute correlation with other Series, excluding missing values. |
| Series.count() | Return number of non-NA/null observations in the Series. |
| Series.cov(other[, min_periods, ddof]) | Compute covariance with Series, excluding missing values. |
| Series.cummax([axis, skipna]) | Return cumulative maximum over a Series. |
| Series.cummin([axis, skipna]) | Return cumulative minimum over a Series. |
| Series.cumprod([axis, skipna]) | Return cumulative product over a Series. |
| Series.cumsum([axis, skipna]) | Return cumulative sum over a Series. |
| Series.describe([percentiles, include, exclude]) | Generate descriptive statistics. |
| Series.diff([periods]) | First discrete difference of Series elements. |
| Series.factorize([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| Series.kurt(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| Series.max(*[, axis, skipna, numeric_only]) | Return the maximum of the values over the requested axis. |
| Series.mean(*[, axis, skipna, numeric_only]) | Return the mean of the values over the requested axis. |
| Series.median(*[, axis, skipna, numeric_only]) | Return the median of the values over the requested axis. |
| Series.min(*[, axis, skipna, numeric_only]) | Return the minimum of the values over the requested axis. |
| Series.mode([dropna]) | Return the mode(s) of the Series. |
| Series.nlargest([n, keep]) | Return the largest n elements. |
| Series.nsmallest([n, keep]) | Return the smallest n elements. |
| Series.pct_change([periods, fill_method, freq]) | Fractional change between the current and a prior element. |
| Series.prod(*[, axis, skipna, numeric_only, ...]) | Return the product of the values over the requested axis. |
| Series.quantile([q, interpolation]) | Return value at the given quantile. |
| Series.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| Series.sem(*[, axis, skipna, ddof, numeric_only]) | Return unbiased standard error of the mean over requested axis. |
| Series.skew(*[, axis, skipna, numeric_only]) | Return unbiased skew over requested axis. |
| Series.std(*[, axis, skipna, ddof, numeric_only]) | Return sample standard deviation. |
| Series.sum(*[, axis, skipna, numeric_only, ...]) | Return the sum of the values over the requested axis. |
| Series.var(*[, axis, skipna, ddof, numeric_only]) | Return unbiased variance over requested axis. |
| Series.kurtosis(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| Series.unique() | Return unique values of Series object. |
| Series.nunique([dropna]) | Return number of unique elements in the object. |
| Series.is_unique | Return True if values in the object are unique. |
| Series.is_monotonic_increasing | Return True if values in the object are monotonically increasing. |
| Series.is_monotonic_decreasing | Return True if values in the object are monotonically decreasing. |
| Series.value_counts([normalize, sort, ...]) | Return a Series containing counts of unique values. |
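For illustration, value_counts on an invented Series (since pandas 2.0 the resulting Series is named "count"):
>>> s = pd.Series(["a", "b", "a", "c", "a"])
>>> s.value_counts()
a    3
b    1
c    1
Name: count, dtype: int64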
| Function | Description |
|---|---|
| Series.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| Series.case_when(caselist) | Replace values where the conditions are True. |
| Series.drop([labels, axis, index, columns, ...]) | Return Series with specified index labels removed. |
| Series.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| Series.drop_duplicates(*[, keep, inplace, ...]) | Return Series with duplicate values removed. |
| Series.duplicated([keep]) | Indicate duplicate Series values. |
| Series.equals(other) | Test whether two objects contain the same elements. |
| Series.head([n]) | Return the first n rows. |
| Series.idxmax([axis, skipna]) | Return the row label of the maximum value. |
| Series.idxmin([axis, skipna]) | Return the row label of the minimum value. |
| Series.isin(values) | Whether elements in Series are contained in values. |
| Series.reindex([index, axis, method, copy, ...]) | Conform Series to new index with optional filling logic. |
| Series.reindex_like(other[, method, copy, ...]) | Return an object with matching indices as other object. |
| Series.rename([index, axis, copy, inplace, ...]) | Alter Series index labels or name. |
| Series.rename_axis([mapper, index, axis, ...]) | Set the name of the axis for the index. |
| Series.reset_index([level, drop, name, ...]) | Generate a new DataFrame or Series with the index reset. |
| Series.sample([n, frac, replace, weights, ...]) | Return a random sample of items from an axis of object. |
| Series.set_axis(labels, *[, axis, copy]) | (DEPRECATED) Assign desired index to given axis. |
| Series.take(indices[, axis]) | Return the elements in the given positional indices along an axis. |
| Series.tail([n]) | Return the last n rows. |
| Series.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
| Series.where(cond[, other, inplace, axis, level]) | Replace values where the condition is False. |
| Series.mask(cond[, other, inplace, axis, level]) | Replace values where the condition is True. |
| Series.add_prefix(prefix[, axis]) | Prefix labels with string prefix. |
| Series.add_suffix(suffix[, axis]) | Suffix labels with string suffix. |
| Series.filter([items, like, regex, axis]) | Subset the DataFrame or Series according to the specified index labels. |
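A minimal reindexing sketch (invented labels; a label absent from the original is introduced as NaN, which upcasts the dtype to float64):
>>> s = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> s.reindex(["c", "b", "d"])
c    3.0
b    2.0
d    NaN
dtype: float64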
| Function | Description |
|---|---|
| Series.bfill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by using the next valid observation to fill the gap. |
| Series.dropna(*[, axis, inplace, how, ...]) | Return a new Series with missing values removed. |
| Series.ffill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by propagating the last valid observation to next valid. |
| Series.fillna(value, *[, axis, inplace, limit]) | Fill NA/NaN values with value. |
| Series.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. |
| Series.isna() | Detect missing values. |
| Series.isnull() | Series.isnull is an alias for Series.isna. |
| Series.notna() | Detect existing (non-missing) values. |
| Series.notnull() | Series.notnull is an alias for Series.notna. |
| Series.replace([to_replace, value, inplace, ...]) | Replace values given in to_replace with value. |
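A small sketch of value-based versus propagation-based filling on invented data:
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s.fillna(0)
0    1.0
1    0.0
2    3.0
dtype: float64
>>> s.ffill()
0    1.0
1    1.0
2    3.0
dtype: float64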
| Function | Description |
|---|---|
| Series.argsort([axis, kind, order, stable]) | Return the integer indices that would sort the Series values. |
| Series.argmin([axis, skipna]) | Return int position of the smallest value in the Series. |
| Series.argmax([axis, skipna]) | Return int position of the largest value in the Series. |
| Series.reorder_levels(order) | Rearrange index levels using input order. |
| Series.sort_values(*[, axis, ascending, ...]) | Sort by the values. |
| Series.sort_index(*[, axis, level, ...]) | Sort Series by index labels. |
| Series.swaplevel([i, j, copy]) | Swap levels i and j in a MultiIndex. |
| Series.unstack([level, fill_value, sort]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
| Series.explode([ignore_index]) | Transform each element of a list-like to a row. |
| Series.searchsorted(value[, side, sorter]) | Find indices where elements should be inserted to maintain order. |
| Series.repeat(repeats[, axis]) | Repeat elements of a Series. |
| Series.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. |
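As an illustrative sketch of explode on invented data (empty lists become NaN and index labels repeat for each list element):
>>> s = pd.Series([[1, 2], [3], []])
>>> s.explode()
0      1
0      2
1      3
2    NaN
dtype: object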
| Function | Description |
|---|---|
| Series.compare(other[, align_axis, ...]) | Compare to another Series and show the differences. |
| Series.update(other) | Modify Series in place using values from passed Series. |
| Function | Description |
|---|---|
| Series.asfreq(freq[, method, how, ...]) | Convert time series to specified frequency. |
| Series.asof(where[, subset]) | Return the last row(s) without any NaNs before where. |
| Series.shift([periods, freq, axis, ...]) | Shift index by desired number of periods with an optional time freq. |
| Series.first_valid_index() | Return index for first non-missing value or None, if no value is found. |
| Series.last_valid_index() | Return index for last non-missing value or None, if no value is found. |
| Series.resample(rule[, closed, label, ...]) | Resample time-series data. |
| Series.tz_convert(tz[, axis, level, copy]) | Convert tz-aware axis to target time zone. |
| Series.tz_localize(tz[, axis, level, copy, ...]) | Localize time zone naive index of a Series or DataFrame to target time zone. |
| Series.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| Series.between_time(start_time, end_time[, ...]) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
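A minimal time-series sketch using shift on an invented daily index (shifting introduces NaN, upcasting to float64):
>>> s = pd.Series([1, 2, 3], index=pd.date_range("2024-01-01", periods=3))
>>> s.shift(1)
2024-01-01    NaN
2024-01-02    1.0
2024-01-03    2.0
Freq: D, dtype: float64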
pandas provides dtype-specific methods under various accessors. These are separate namespaces within Series that only apply to specific data types.
| Function | Description |
|---|---|
| Series.str | alias of StringMethods |
| Series.cat | alias of CategoricalAccessor |
| Series.dt | alias of CombinedDatetimelikeProperties |
| Series.sparse | alias of SparseAccessor |
| DataFrame.sparse | alias of SparseFrameAccessor |
| Index.str | alias of StringMethods |
| Data Type | Accessor |
|---|---|
| Datetime, Timedelta, Period | dt |
| String | str |
| Categorical | cat |
| Sparse | sparse |
Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.
| Function | Description |
|---|---|
| Series.dt.date | Returns numpy array of python datetime.date objects. |
| Series.dt.time | Returns numpy array of datetime.time objects. |
| Series.dt.timetz | Returns numpy array of datetime.time objects with timezones. |
| Series.dt.year | The year of the datetime. |
| Series.dt.month | The month as January=1, December=12. |
| Series.dt.day | The day of the datetime. |
| Series.dt.hour | The hours of the datetime. |
| Series.dt.minute | The minutes of the datetime. |
| Series.dt.second | The seconds of the datetime. |
| Series.dt.microsecond | The microseconds of the datetime. |
| Series.dt.nanosecond | The nanoseconds of the datetime. |
| Series.dt.dayofweek | The day of the week with Monday=0, Sunday=6. |
| Series.dt.day_of_week | The day of the week with Monday=0, Sunday=6. |
| Series.dt.weekday | The day of the week with Monday=0, Sunday=6. |
| Series.dt.dayofyear | The ordinal day of the year. |
| Series.dt.day_of_year | The ordinal day of the year. |
| Series.dt.days_in_month | The number of days in the month. |
| Series.dt.quarter | The quarter of the date. |
| Series.dt.is_month_start | Indicates whether the date is the first day of the month. |
| Series.dt.is_month_end | Indicates whether the date is the last day of the month. |
| Series.dt.is_quarter_start | Indicator for whether the date is the first day of a quarter. |
| Series.dt.is_quarter_end | Indicator for whether the date is the last day of a quarter. |
| Series.dt.is_year_start | Indicate whether the date is the first day of a year. |
| Series.dt.is_year_end | Indicate whether the date is the last day of the year. |
| Series.dt.is_leap_year | Boolean indicator if the date belongs to a leap year. |
| Series.dt.daysinmonth | The number of days in the month. |
| Series.dt.days_in_month | The number of days in the month. |
| Series.dt.tz | Return the timezone. |
| Series.dt.freq | Tries to return a string representing a frequency generated by infer_freq. |
| Series.dt.unit | The precision unit of the datetime data. |
| Function | Description |
|---|---|
| Series.dt.isocalendar() | Calculate year, week, and day according to the ISO 8601 standard. |
| Series.dt.to_period([freq]) | Cast to PeriodArray/PeriodIndex at a particular frequency. |
| Series.dt.to_pydatetime() | Return the data as a Series of datetime.datetime objects. |
| Series.dt.tz_localize(tz[, ambiguous, ...]) | Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index. |
| Series.dt.tz_convert(tz) | Convert tz-aware Datetime Array/Index from one time zone to another. |
| Series.dt.normalize() | Convert times to midnight. |
| Series.dt.strftime(date_format) | Convert to Index using specified date_format. |
| Series.dt.round(freq[, ambiguous, nonexistent]) | Perform round operation on the data to the specified freq. |
| Series.dt.floor(freq[, ambiguous, nonexistent]) | Perform floor operation on the data to the specified freq. |
| Series.dt.ceil(freq[, ambiguous, nonexistent]) | Perform ceil operation on the data to the specified freq. |
| Series.dt.month_name([locale]) | Return the month names with specified locale. |
| Series.dt.day_name([locale]) | Return the day names with specified locale. |
| Series.dt.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
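A short sketch of the datetime accessor on invented dates (in recent pandas versions the integer dt properties come back as int32):
>>> s = pd.Series(pd.to_datetime(["2024-01-15", "2024-06-30"]))
>>> s.dt.year
0    2024
1    2024
dtype: int32
>>> s.dt.is_leap_year
0    True
1    True
dtype: bool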
| Function | Description |
|---|---|
| Series.dt.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| Series.dt.start_time | Get the Timestamp for the start of the period. |
| Series.dt.end_time | Get the Timestamp for the end of the period. |
| Function | Description |
|---|---|
| Series.dt.days | Number of days for each element. |
| Series.dt.seconds | Number of seconds (>= 0 and less than 1 day) for each element. |
| Series.dt.microseconds | Number of microseconds (>= 0 and less than 1 second) for each element. |
| Series.dt.nanoseconds | Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
| Series.dt.components | Return a Dataframe of the components of the Timedeltas. |
| Series.dt.unit | The precision unit of the datetime data. |
| Function | Description |
|---|---|
| Series.dt.to_pytimedelta() | Return an array of native datetime.timedelta objects. |
| Series.dt.total_seconds() | Return total duration of each element expressed in seconds. |
| Series.dt.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.
| Function | Description |
|---|---|
| Series.str.capitalize() | Convert strings in the Series/Index to be capitalized. |
| Series.str.casefold() | Convert strings in the Series/Index to be casefolded. |
| Series.str.cat([others, sep, na_rep, join]) | Concatenate strings in the Series/Index with given separator. |
| Series.str.center(width[, fillchar]) | Pad left and right side of strings in the Series/Index. |
| Series.str.contains(pat[, case, flags, na, ...]) | Test if pattern or regex is contained within a string of a Series or Index. |
| Series.str.count(pat[, flags]) | Count occurrences of pattern in each string of the Series/Index. |
| Series.str.decode(encoding[, errors, dtype]) | Decode character string in the Series/Index using indicated encoding. |
| Series.str.encode(encoding[, errors]) | Encode character string in the Series/Index using indicated encoding. |
| Series.str.endswith(pat[, na]) | Test if the end of each string element matches a pattern. |
| Series.str.extract(pat[, flags, expand]) | Extract capture groups in the regex pat as columns in a DataFrame. |
| Series.str.extractall(pat[, flags]) | Extract capture groups in the regex pat as columns in DataFrame. |
| Series.str.find(sub[, start, end]) | Return lowest indexes in each strings in the Series/Index. |
| Series.str.findall(pat[, flags]) | Find all occurrences of pattern or regular expression in the Series/Index. |
| Series.str.fullmatch(pat[, case, flags, na]) | Determine if each string entirely matches a regular expression. |
| Series.str.get(i) | Extract element from each component at specified position or with specified key. |
| Series.str.index(sub[, start, end]) | Return lowest indexes in each string in Series/Index. |
| Series.str.isascii() | Check whether all characters in each string are ascii. |
| Series.str.join(sep) | Join lists contained as elements in the Series/Index with passed delimiter. |
| Series.str.len() | Compute the length of each element in the Series/Index. |
| Series.str.ljust(width[, fillchar]) | Pad right side of strings in the Series/Index. |
| Series.str.lower() | Convert strings in the Series/Index to lowercase. |
| Series.str.lstrip([to_strip]) | Remove leading characters. |
| Series.str.match(pat[, case, flags, na]) | Determine if each string starts with a match of a regular expression. |
| Series.str.normalize(form) | Return the Unicode normal form for the strings in the Series/Index. |
| Series.str.pad(width[, side, fillchar]) | Pad strings in the Series/Index up to width. |
| Series.str.partition([sep, expand]) | Split the string at the first occurrence of sep. |
| Series.str.removeprefix(prefix) | Remove a prefix from an object series. |
| Series.str.removesuffix(suffix) | Remove a suffix from an object series. |
| Series.str.repeat(repeats) | Duplicate each string in the Series or Index. |
| Series.str.replace(pat[, repl, n, case, ...]) | Replace each occurrence of pattern/regex in the Series/Index. |
| Series.str.rfind(sub[, start, end]) | Return highest indexes in each strings in the Series/Index. |
| Series.str.rindex(sub[, start, end]) | Return highest indexes in each string in Series/Index. |
| Series.str.rjust(width[, fillchar]) | Pad left side of strings in the Series/Index. |
| Series.str.rpartition([sep, expand]) | Split the string at the last occurrence of sep. |
| Series.str.rstrip([to_strip]) | Remove trailing characters. |
| Series.str.slice([start, stop, step]) | Slice substrings from each element in the Series or Index. |
| Series.str.slice_replace([start, stop, repl]) | Replace a positional slice of a string with another value. |
| Series.str.split([pat, n, expand, regex]) | Split strings around given separator/delimiter. |
| Series.str.rsplit([pat, n, expand]) | Split strings around given separator/delimiter. |
| Series.str.startswith(pat[, na]) | Test if the start of each string element matches a pattern. |
| Series.str.strip([to_strip]) | Remove leading and trailing characters. |
| Series.str.swapcase() | Convert strings in the Series/Index to be swapcased. |
| Series.str.title() | Convert strings in the Series/Index to titlecase. |
| Series.str.translate(table) | Map all characters in the string through the given mapping table. |
| Series.str.upper() | Convert strings in the Series/Index to uppercase. |
| Series.str.wrap(width[, expand_tabs, ...]) | Wrap strings in Series/Index at specified line width. |
| Series.str.zfill(width) | Pad strings in the Series/Index by prepending '0' characters. |
| Series.str.isalnum() | Check whether all characters in each string are alphanumeric. |
| Series.str.isalpha() | Check whether all characters in each string are alphabetic. |
| Series.str.isdigit() | Check whether all characters in each string are digits. |
| Series.str.isspace() | Check whether all characters in each string are whitespace. |
| Series.str.islower() | Check whether all characters in each string are lowercase. |
| Series.str.isupper() | Check whether all characters in each string are uppercase. |
| Series.str.istitle() | Check whether all characters in each string are titlecase. |
| Series.str.isnumeric() | Check whether all characters in each string are numeric. |
| Series.str.isdecimal() | Check whether all characters in each string are decimal. |
| Series.str.get_dummies([sep, dtype]) | Return DataFrame of dummy/indicator variables for Series. |
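A minimal string-accessor sketch on invented data (the dtype: str repr follows the conventions used elsewhere in this reference; older pandas versions print object):
>>> s = pd.Series(["apple pie", "banana split"])
>>> s.str.upper()
0       APPLE PIE
1    BANANA SPLIT
dtype: str
>>> s.str.contains("pie")
0     True
1    False
dtype: bool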
Categorical-dtype specific methods and attributes are available under the Series.cat accessor.
| Function | Description |
|---|---|
| Series.cat.categories | The categories of this categorical. |
| Series.cat.ordered | Whether the categories have an ordered relationship. |
| Series.cat.codes | Return Series of codes as well as the index. |
| Function | Description |
|---|---|
| Series.cat.rename_categories(new_categories) | Rename categories. |
| Series.cat.reorder_categories(new_categories) | Reorder categories as specified in new_categories. |
| Series.cat.add_categories(new_categories) | Add new categories. |
| Series.cat.remove_categories(removals) | Remove the specified categories. |
| Series.cat.remove_unused_categories() | Remove categories which are not used. |
| Series.cat.set_categories(new_categories[, ...]) | Set the categories to the specified new categories. |
| Series.cat.as_ordered() | Set the Categorical to be ordered. |
| Series.cat.as_unordered() | Set the Categorical to be unordered. |
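For illustration, a small categorical sketch on invented data (codes are the int8 positions of each value within the categories):
>>> s = pd.Series(["a", "b", "a"], dtype="category")
>>> s.cat.codes
0    0
1    1
2    0
dtype: int8
>>> list(s.cat.add_categories(["c"]).cat.categories)
['a', 'b', 'c']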
Sparse-dtype specific methods and attributes are provided under the Series.sparse accessor.
| Function | Description |
|---|---|
| Series.sparse.npoints | The number of non-fill_value points. |
| Series.sparse.density | The percent of non-fill_value points, as decimal. |
| Series.sparse.fill_value | Elements in data that are fill_value are not stored. |
| Series.sparse.sp_values | An ndarray containing the non-fill_value values. |
| Function | Description |
|---|---|
| Series.sparse.from_coo(A[, dense_index]) | Create a Series with sparse values from a scipy.sparse.coo_matrix. |
| Series.sparse.to_coo([row_levels, ...]) | Create a scipy.sparse.coo_matrix from a Series with MultiIndex. |
Arrow list-dtype specific methods and attributes are provided under the Series.list accessor.
| Function | Description |
|---|---|
| Series.list.flatten() | Flatten list values. |
| Series.list.len() | Return the length of each list in the Series. |
| Series.list.__getitem__(key) | Index or slice lists in the Series. |
Arrow struct-dtype specific methods and attributes are provided under the Series.struct accessor.
| Function | Description |
|---|---|
| Series.struct.dtypes | Return the dtype object of each child field of the struct. |
| Function | Description |
|---|---|
| Series.struct.field(name_or_index) | Extract a child field of a struct as a Series. |
| Series.struct.explode() | Extract all child fields of a struct as a DataFrame. |
Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in Series.attrs.
| Function | Description |
|---|---|
| Flags(obj, *, allows_duplicate_labels) | Flags that apply to pandas objects. |
Series.attrs is a dictionary for storing global metadata for this Series.
Warning
Series.attrs is considered experimental and may change without warning.
| Function | Description |
|---|---|
| Series.attrs | Dictionary of global attributes of this dataset. |
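A minimal attrs sketch (the key name here is invented; attrs behaves like a plain dictionary attached to the object):
>>> s = pd.Series([1, 2, 3])
>>> s.attrs["source"] = "sensor A"
>>> s.attrs
{'source': 'sensor A'}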
Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.
| Function | Description |
|---|---|
| Series.plot([kind, ax, figsize, ...]) | Series plotting accessor and method. |
| Function | Description |
|---|---|
| Series.plot.area([x, y, stacked]) | Draw a stacked area plot. |
| Series.plot.bar([x, y, color]) | Vertical bar plot. |
| Series.plot.barh([x, y, color]) | Make a horizontal bar plot. |
| Series.plot.box([by]) | Make a box plot of the DataFrame columns. |
| Series.plot.density([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| Series.plot.hist([by, bins]) | Draw one histogram of the DataFrame's columns. |
| Series.plot.kde([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| Series.plot.line([x, y, color]) | Plot Series or DataFrame as lines. |
| Series.plot.pie([y]) | Generate a pie plot. |
| Function | Description |
|---|---|
| Series.hist([by, ax, grid, xlabelsize, ...]) | Draw histogram of the input series using matplotlib. |
| Function | Description |
|---|---|
| Series.from_arrow(data) | Construct a Series from an array-like Arrow object. |
| Series.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| Series.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| Series.to_dict(*[, into]) | Convert Series to {label -> value} dict or dict-like object. |
| Series.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| Series.to_frame([name]) | Convert Series to DataFrame. |
| Series.to_xarray() | Return an xarray object from the pandas object. |
| Series.to_hdf(path_or_buf, *, key[, mode, ...]) | Write the contained data to an HDF5 file using HDFStore. |
| Series.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
| Series.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| Series.to_string([buf, na_rep, ...]) | Render a string representation of the Series. |
| Series.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| Series.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
| Series.to_markdown([buf, mode, index, ...]) | Print Series in Markdown-friendly format. |
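Two quick conversion sketches with invented data (since pandas 2.1, to_dict returns native Python scalars):
>>> s = pd.Series([1, 2], index=["a", "b"], name="vals")
>>> s.to_dict()
{'a': 1, 'b': 2}
>>> s.to_frame()
   vals
a     1
b     2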
Constructing Series from a dictionary with an Index specified
>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["a", "b", "c"])
>>> ser
a 1
b 2
c 3
dtype: int64
The keys of the dictionary match with the Index values, hence the Index values have no effect.
>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["x", "y", "z"])
>>> ser
x NaN
y NaN
z NaN
dtype: float64
Note that the Index is first built with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.
Constructing Series from a list with copy=False.
>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a copy of the original data even though copy=False, so the data is unchanged.
Constructing Series from a 1d ndarray with copy=False.
>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999, 2])
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a view on the original data, so the data is changed as well.
To create a Series with a custom index and view the index labels:
>>> cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
>>> populations = [14.85, 2.71, 2.93, 0.51]
>>> city_series = pd.Series(populations, index=cities)
>>> city_series.index
Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')
To change the index labels of an existing Series:
>>> city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
>>> city_series.index
Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')
For regular NumPy types like int and float, a NumpyExtensionArray is returned.
>>> pd.Series([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray is returned
>>> ser = pd.Series(pd.Categorical(["a", "b", "a"]))
>>> ser.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
>>> pd.Series([1, 2, 3]).values
array([1, 2, 3])
>>> pd.Series(list("aabc")).values
<ArrowStringArray>
['a', 'a', 'b', 'c']
Length: 4, dtype: str
>>> pd.Series(list("aabc")).astype("category").values
['a', 'a', 'b', 'c']
Categories (3, str): ['a', 'b', 'c']
Timezone aware datetime data is converted to UTC:
>>> pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern")).values
array(['2013-01-01T05:00:00.000000',
'2013-01-02T05:00:00.000000',
'2013-01-03T05:00:00.000000'], dtype='datetime64[us]')
>>> s = pd.Series([1, 2, 3])
>>> s.dtype
dtype('int64')
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
Series name: None
Non-Null Count Dtype
-------------- -----
5 non-null str
dtypes: str(1)
memory usage: 106.0 bytes
Prints a summary excluding information about its values:
>>> s.info(verbose=False)
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
dtypes: str(1)
memory usage: 106.0 bytes
Pipe the output of Series.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
... f.write(s)
260
The memory_usage parameter enables deep introspection mode, especially useful for big Series and for fine-tuning memory optimization:
>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> s = pd.Series(np.random.choice(["a", "b", "c"], 10**6))
>>> s.info()
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null str
dtypes: str(1)
memory usage: 8.6 MB
>>> s.info(memory_usage="deep")
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null str
dtypes: str(1)
memory usage: 8.6 MB
>>> s = pd.Series([1, 2, 3])
>>> s.shape
(3,)
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.nbytes
34
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.ndim
1
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.size
3
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3
For Series:
>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.T
0 Ant
1 Bear
2 Cow
dtype: str
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')
>>> s = pd.Series(range(3))
>>> s.memory_usage()
156
Not including the index gives the size of the rest of the data, which is necessarily smaller:
>>> s.memory_usage(index=False)
24
The memory footprint of object values is ignored by default:
>>> s = pd.Series(["a", "b"])
>>> s.values
<ArrowStringArray>
['a', 'b']
Length: 2, dtype: str
>>> s.memory_usage()
150
>>> s.memory_usage(deep=True)
150
>>> s = pd.Series([1, 2, 3, None])
>>> s
0 1.0
1 2.0
2 3.0
3 NaN
dtype: float64
>>> s.hasnans
True
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True
If we only have NaNs in our data, it is not considered empty!
>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False
>>> s = pd.Series([1, 2, 3])
>>> s.dtypes
dtype('int64')
The Series name can be set initially when calling the constructor.
>>> s = pd.Series([1, 2, 3], dtype=np.int64, name="Numbers")
>>> s
0 1
1 2
2 3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0 1
1 2
2 3
Name: Integers, dtype: int64
The name of a Series within a DataFrame is its column name.
>>> df = pd.DataFrame(
... [[1, 2], [3, 4], [5, 6]], columns=["Odd Numbers", "Even Numbers"]
... )
>>> df
Odd Numbers Even Numbers
0 1 2
1 3 4
2 5 6
>>> df["Even Numbers"].name
'Even Numbers'
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using attribute access:
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
Create a DataFrame:
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype("int32").dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({"col1": "int32"}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype("int64")
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype("category")
0 1
1 2
dtype: category
Categories (2, int32): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[us]
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by np.nan.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: str
Obtain a Series with dtype StringDtype.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
| Function | Description |
|---|---|
| DataFrame([data, index, columns, dtype, copy]) | Two-dimensional, size-mutable, potentially heterogeneous tabular data. |
Axes
| Function | Description |
|---|---|
| DataFrame.index | The index (row labels) of the DataFrame. |
| DataFrame.columns | The column labels of the DataFrame. |
| Function | Description |
|---|---|
| DataFrame.dtypes | Return the dtypes in the DataFrame. |
| DataFrame.info([verbose, buf, max_cols, ...]) | Print a concise summary of a DataFrame. |
| DataFrame.select_dtypes([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. |
| DataFrame.values | Return a Numpy representation of the DataFrame. |
| DataFrame.axes | Return a list representing the axes of the DataFrame. |
| DataFrame.ndim | Return an int representing the number of axes / array dimensions. |
| DataFrame.size | Return an int representing the number of elements in this object. |
| DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame. |
| DataFrame.memory_usage([index, deep]) | Return the memory usage of each column in bytes. |
| DataFrame.empty | Indicator whether Series/DataFrame is empty. |
| DataFrame.set_flags(*[, copy, ...]) | Return a new object with updated flags. |
| Function | Description |
|---|---|
| DataFrame.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype dtype. |
| DataFrame.convert_dtypes([infer_objects, ...]) | Convert columns from numpy dtypes to the best dtypes that support pd.NA. |
| DataFrame.infer_objects([copy]) | Attempt to infer better dtypes for object columns. |
| DataFrame.copy([deep]) | Make a copy of this object's indices and data. |
| DataFrame.to_numpy([dtype, copy, na_value]) | Convert the DataFrame to a NumPy array. |
| Function | Description |
|---|---|
| DataFrame.head([n]) | Return the first n rows. |
| DataFrame.at | Access a single value for a row/column label pair. |
| DataFrame.iat | Access a single value for a row/column pair by integer position. |
| DataFrame.loc | Access a group of rows and columns by label(s) or a boolean array. |
| DataFrame.iloc | Purely integer-location based indexing for selection by position. |
| DataFrame.insert(loc, column, value[, ...]) | Insert column into DataFrame at specified location. |
| DataFrame.__iter__() | Iterate over info axis. |
| DataFrame.items() | Iterate over (column name, Series) pairs. |
| DataFrame.keys() | Get the 'info axis' (see Indexing for more). |
| DataFrame.iterrows() | Iterate over DataFrame rows as (index, Series) pairs. |
| DataFrame.itertuples([index, name]) | Iterate over DataFrame rows as namedtuples. |
| DataFrame.pop(item) | Return item and drop it from DataFrame. |
| DataFrame.tail([n]) | Return the last n rows. |
| DataFrame.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
| DataFrame.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
| DataFrame.isin(values) | Whether each element in the DataFrame is contained in values. |
| DataFrame.where(cond[, other, inplace, ...]) | Replace values where the condition is False. |
| DataFrame.mask(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. |
| DataFrame.query(expr, *[, parser, engine, ...]) | Query the columns of a DataFrame with a boolean expression. |
| DataFrame.isetitem(loc, value) | Set the given value in the column with position loc. |
For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
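As a quick illustrative sketch of DataFrame label- and position-based access (invented data; scalar formatting may vary slightly by version):
>>> df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
>>> df.loc["a", "y"]
3
>>> df.iloc[1]
x    2
y    4
Name: b, dtype: int64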
| Function | Description |
|---|---|
| DataFrame.add(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
| DataFrame.sub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator sub). |
| DataFrame.mul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| DataFrame.div(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.truediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.floordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). |
| DataFrame.mod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). |
| DataFrame.pow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). |
| DataFrame.dot(other) | Compute the matrix multiplication between the DataFrame and other. |
| DataFrame.radd(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). |
| DataFrame.rsub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsub). |
| DataFrame.rmul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). |
| DataFrame.rdiv(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rtruediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rfloordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). |
| DataFrame.rmod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). |
| DataFrame.rpow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). |
| DataFrame.lt(other[, axis, level]) | Get Less than of dataframe and other, element-wise (binary operator lt). |
| DataFrame.gt(other[, axis, level]) | Get Greater than of dataframe and other, element-wise (binary operator gt). |
| DataFrame.le(other[, axis, level]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). |
| DataFrame.ge(other[, axis, level]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). |
| DataFrame.ne(other[, axis, level]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). |
| DataFrame.eq(other[, axis, level]) | Get Equal to of dataframe and other, element-wise (binary operator eq). |
| DataFrame.combine(other, func[, fill_value, ...]) | Perform column-wise combine with another DataFrame. |
| DataFrame.combine_first(other) | Update null elements with value in the same location in other. |
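A minimal arithmetic sketch on invented data (with axis=1 the Series index is aligned against the DataFrame columns):
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df.mul(10)
    a   b
0  10  30
1  20  40
>>> df.sub(pd.Series([1, 1], index=["a", "b"]), axis=1)
   a  b
0  0  2
1  1  3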
| Function | Description |
|---|---|
| DataFrame.apply(func[, axis, raw, ...]) | Apply a function along an axis of the DataFrame. |
| DataFrame.map(func[, na_action]) | Apply a function to a Dataframe elementwise. |
| DataFrame.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. |
| DataFrame.agg([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.transform(func[, axis]) | Call func on self producing a DataFrame with the same axis shape as self. |
| DataFrame.groupby([by, level, as_index, ...]) | Group DataFrame using a mapper or by a Series of columns. |
| DataFrame.rolling(window[, min_periods, ...]) | Provide rolling window calculations. |
| DataFrame.expanding([min_periods, method]) | Provide expanding window calculations. |
| DataFrame.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. |
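A short split-apply-combine sketch with invented data:
>>> df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
>>> df.groupby("key")["val"].sum()
key
a    4
b    2
Name: val, dtype: int64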
| Function | Description |
|---|---|
| DataFrame.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
| DataFrame.all(*[, axis, bool_only, skipna]) | Return whether all elements are True, potentially over an axis. |
| DataFrame.any(*[, axis, bool_only, skipna]) | Return whether any element is True, potentially over an axis. |
| DataFrame.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| DataFrame.corr([method, min_periods, ...]) | Compute pairwise correlation of columns, excluding NA/null values. |
| DataFrame.corrwith(other[, axis, drop, ...]) | Compute pairwise correlation. |
| DataFrame.count([axis, numeric_only]) | Count non-NA cells for each column or row. |
| DataFrame.cov([min_periods, ddof, numeric_only]) | Compute pairwise covariance of columns, excluding NA/null values. |
| DataFrame.cummax([axis, skipna, numeric_only]) | Return cumulative maximum over a DataFrame or Series axis. |
| DataFrame.cummin([axis, skipna, numeric_only]) | Return cumulative minimum over a DataFrame or Series axis. |
| DataFrame.cumprod([axis, skipna, numeric_only]) | Return cumulative product over a DataFrame or Series axis. |
| DataFrame.cumsum([axis, skipna, numeric_only]) | Return cumulative sum over a DataFrame or Series axis. |
| DataFrame.describe([percentiles, include, ...]) | Generate descriptive statistics. |
| DataFrame.diff([periods, axis]) | First discrete difference of element. |
| DataFrame.eval(expr, *[, inplace]) | Evaluate a string describing operations on DataFrame columns. |
| DataFrame.kurt(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| DataFrame.kurtosis(*[, axis, skipna, ...]) | Return unbiased kurtosis over requested axis. |
| DataFrame.max(*[, axis, skipna, numeric_only]) | Return the maximum of the values over the requested axis. |
| DataFrame.mean(*[, axis, skipna, numeric_only]) | Return the mean of the values over the requested axis. |
| DataFrame.median(*[, axis, skipna, numeric_only]) | Return the median of the values over the requested axis. |
| DataFrame.min(*[, axis, skipna, numeric_only]) | Return the minimum of the values over the requested axis. |
| DataFrame.mode([axis, numeric_only, dropna]) | Get the mode(s) of each element along the selected axis. |
| DataFrame.pct_change([periods, fill_method, ...]) | Fractional change between the current and a prior element. |
| DataFrame.prod(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| DataFrame.product(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| DataFrame.quantile([q, axis, numeric_only, ...]) | Return values at the given quantile over requested axis. |
| DataFrame.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| DataFrame.round([decimals]) | Round numeric columns in a DataFrame to a variable number of decimal places. |
| DataFrame.sem(*[, axis, skipna, ddof, ...]) | Return unbiased standard error of the mean over requested axis. |
| DataFrame.skew(*[, axis, skipna, numeric_only]) | Return unbiased skew over requested axis. |
| DataFrame.sum(*[, axis, skipna, ...]) | Return the sum of the values over the requested axis. |
| DataFrame.std(*[, axis, skipna, ddof, ...]) | Return sample standard deviation over requested axis. |
| DataFrame.var(*[, axis, skipna, ddof, ...]) | Return unbiased variance over requested axis. |
| DataFrame.nunique([axis, dropna]) | Count number of distinct elements in specified axis. |
| DataFrame.value_counts([subset, normalize, ...]) | Return a Series containing the frequency of each distinct row in the DataFrame. |
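A minimal sketch of a few of the reductions above (assumes pandas is imported as pd; output spacing may vary slightly by version):
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df.mean()
a    2.0
b    5.0
dtype: float64
>>> df.sum(axis=1)
0    5
1    7
2    9
dtype: int64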
| Function | Description |
|---|---|
| DataFrame.add_prefix(prefix[, axis]) | Prefix labels with string prefix. |
| DataFrame.add_suffix(suffix[, axis]) | Suffix labels with string suffix. |
| DataFrame.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| DataFrame.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| DataFrame.between_time(start_time, end_time) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| DataFrame.drop([labels, axis, index, ...]) | Drop specified labels from rows or columns. |
| DataFrame.drop_duplicates([subset, keep, ...]) | Return DataFrame with duplicate rows removed. |
| DataFrame.duplicated([subset, keep]) | Return boolean Series denoting duplicate rows. |
| DataFrame.equals(other) | Test whether two objects contain the same elements. |
| DataFrame.filter([items, like, regex, axis]) | Subset the DataFrame or Series according to the specified index labels. |
| DataFrame.idxmax([axis, skipna, numeric_only]) | Return index of first occurrence of maximum over requested axis. |
| DataFrame.idxmin([axis, skipna, numeric_only]) | Return index of first occurrence of minimum over requested axis. |
| DataFrame.reindex([labels, index, columns, ...]) | Conform DataFrame to new index with optional filling logic. |
| DataFrame.reindex_like(other[, method, ...]) | Return an object with matching indices as other object. |
| DataFrame.rename([mapper, index, columns, ...]) | Rename columns or index labels. |
| DataFrame.rename_axis([mapper, index, ...]) | Set the name of the axis for the index or columns. |
| DataFrame.reset_index([level, drop, ...]) | Reset the index, or a level of it. |
| DataFrame.sample([n, frac, replace, ...]) | Return a random sample of items from an axis of object. |
| DataFrame.set_axis(labels, *[, axis, copy]) | Assign desired index to given axis. |
| DataFrame.set_index(keys, *[, drop, append, ...]) | Set the DataFrame index using existing columns. |
| DataFrame.take(indices[, axis]) | Return the elements in the given positional indices along an axis. |
| DataFrame.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
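A brief sketch of the relabeling and dropping methods above, on hypothetical data:
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
>>> df.rename(columns={"a": "A"})
   A  b
x  1  3
y  2  4
>>> df.drop(index="x")
   a  b
y  2  4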
| Function | Description |
|---|---|
| DataFrame.bfill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by using the next valid observation to fill the gap. |
| DataFrame.dropna(*[, axis, how, thresh, ...]) | Remove missing values. |
| DataFrame.ffill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by propagating the last valid observation to next valid. |
| DataFrame.fillna(value, *[, axis, inplace, ...]) | Fill NA/NaN values with value. |
| DataFrame.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. |
| DataFrame.isna() | Detect missing values. |
| DataFrame.isnull() | DataFrame.isnull is an alias for DataFrame.isna. |
| DataFrame.notna() | Detect existing (non-missing) values. |
| DataFrame.notnull() | DataFrame.notnull is an alias for DataFrame.notna. |
| DataFrame.replace([to_replace, value, ...]) | Replace values given in to_replace with value. |
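A short sketch of detecting and filling missing values (assumes numpy is imported as np):
>>> df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
>>> df.isna()
       a
0  False
1   True
2  False
>>> df.fillna(0)
     a
0  1.0
1  0.0
2  3.0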
| Function | Description |
|---|---|
| DataFrame.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| DataFrame.pivot(*, columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| DataFrame.pivot_table([values, index, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| DataFrame.reorder_levels(order[, axis]) | Rearrange index or column levels using input order. |
| DataFrame.sort_values(by, *[, axis, ...]) | Sort by the values along either axis. |
| DataFrame.sort_index(*[, axis, level, ...]) | Sort object by labels (along an axis). |
| DataFrame.nlargest(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. |
| DataFrame.nsmallest(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. |
| DataFrame.swaplevel([i, j, axis]) | Swap levels i and j in a MultiIndex. |
| DataFrame.stack([level, dropna, sort, ...]) | Stack the prescribed level(s) from columns to index. |
| DataFrame.unstack([level, fill_value, sort]) | Pivot a level of the (necessarily hierarchical) index labels. |
| DataFrame.melt([id_vars, value_vars, ...]) | Unpivot DataFrame from wide to long format, optionally leaving identifiers set. |
| DataFrame.explode(column[, ignore_index]) | Transform each element of a list-like to a row, replicating index values. |
| DataFrame.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. |
| DataFrame.to_xarray() | Return an xarray object from the pandas object. |
| DataFrame.T | The transpose of the DataFrame. |
| DataFrame.transpose(*args[, copy]) | Transpose index and columns. |
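As an illustration of wide-to-long reshaping with melt (hypothetical data):
>>> df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]})
>>> df.melt(id_vars=["A"], value_vars=["B"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5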
| Function | Description |
|---|---|
| DataFrame.assign(**kwargs) | Assign new columns to a DataFrame. |
| DataFrame.compare(other[, align_axis, ...]) | Compare to another DataFrame and show the differences. |
| DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns of another DataFrame. |
| DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| DataFrame.update(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
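A minimal sketch of a database-style join with merge (hypothetical frames and column names):
>>> left = pd.DataFrame({"key": ["a", "b"], "l": [1, 2]})
>>> right = pd.DataFrame({"key": ["a", "b"], "r": [3, 4]})
>>> left.merge(right, on="key")
  key  l  r
0   a  1  3
1   b  2  4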
| Function | Description |
|---|---|
| DataFrame.asfreq(freq[, method, how, ...]) | Convert time series to specified frequency. |
| DataFrame.asof(where[, subset]) | Return the last row(s) without any NaNs before where. |
| DataFrame.shift([periods, freq, axis, ...]) | Shift index by desired number of periods with an optional time freq. |
| DataFrame.first_valid_index() | Return index for first non-missing value or None, if no value is found. |
| DataFrame.last_valid_index() | Return index for last non-missing value or None, if no value is found. |
| DataFrame.resample(rule[, closed, label, ...]) | Resample time-series data. |
| DataFrame.to_period([freq, axis, copy]) | Convert DataFrame from DatetimeIndex to PeriodIndex. |
| DataFrame.to_timestamp([freq, how, axis, copy]) | Cast PeriodIndex to DatetimeIndex of timestamps, at beginning of period. |
| DataFrame.tz_convert(tz[, axis, level, copy]) | Convert tz-aware axis to target time zone. |
| DataFrame.tz_localize(tz[, axis, level, ...]) | Localize time zone naive index of a Series or DataFrame to target time zone. |
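A short sketch of downsampling with resample (the "min"/"2min" frequency aliases assume a recent pandas; older versions used "T"):
>>> idx = pd.date_range("2000-01-01", periods=4, freq="min")
>>> df = pd.DataFrame({"v": [0, 1, 2, 3]}, index=idx)
>>> df.resample("2min").sum()
                     v
2000-01-01 00:00:00  1
2000-01-01 00:02:00  5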
Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in DataFrame.attrs.
| Function | Description |
|---|---|
| Flags(obj, *, allows_duplicate_labels) | Flags that apply to pandas objects. |
DataFrame.attrs is a dictionary for storing global metadata for this DataFrame.
Warning
DataFrame.attrs is considered experimental and may change without warning.
| Function | Description |
|---|---|
| DataFrame.attrs | Dictionary of global attributes of this dataset. |
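A brief sketch of storing metadata in attrs (the "source" key here is purely a hypothetical example):
>>> df = pd.DataFrame({"a": [1, 2]})
>>> df.attrs["source"] = "sensor-42"
>>> df.attrs
{'source': 'sensor-42'}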
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
| Function | Description |
|---|---|
| DataFrame.plot([x, y, kind, ax, ...]) | DataFrame plotting accessor and method. |
| Function | Description |
|---|---|
| DataFrame.plot.area([x, y, stacked]) | Draw a stacked area plot. |
| DataFrame.plot.bar([x, y, color]) | Vertical bar plot. |
| DataFrame.plot.barh([x, y, color]) | Make a horizontal bar plot. |
| DataFrame.plot.box([by]) | Make a box plot of the DataFrame columns. |
| DataFrame.plot.density([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.hexbin(x, y[, C, ...]) | Generate a hexagonal binning plot. |
| DataFrame.plot.hist([by, bins]) | Draw one histogram of the DataFrame's columns. |
| DataFrame.plot.kde([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.line([x, y, color]) | Plot Series or DataFrame as lines. |
| DataFrame.plot.pie([y]) | Generate a pie plot. |
| DataFrame.plot.scatter(x, y[, s, c]) | Create a scatter plot with varying marker point size and color. |
| Function | Description |
|---|---|
| DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame columns. |
| DataFrame.hist([column, by, grid, ...]) | Make a histogram of the DataFrame's columns. |
Sparse-dtype specific methods and attributes are provided under the DataFrame.sparse accessor.
| Function | Description |
|---|---|
| DataFrame.sparse.density | Ratio of non-sparse points to total (dense) data points. |
| Function | Description |
|---|---|
| DataFrame.sparse.from_spmatrix(data[, ...]) | Create a new DataFrame from a scipy sparse matrix. |
| DataFrame.sparse.to_coo() | Return the contents of the frame as a sparse SciPy COO matrix. |
| DataFrame.sparse.to_dense() | Convert a DataFrame with sparse values to dense. |
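A minimal sketch of the sparse accessor, built from a SparseArray so no SciPy dependency is needed:
>>> df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1, 0])})
>>> df.sparse.density
0.25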
| Function | Description |
|---|---|
| DataFrame.from_arrow(data) | Construct a DataFrame from a tabular Arrow object. |
| DataFrame.from_dict(data[, orient, dtype, ...]) | Construct DataFrame from dict of array-like or dicts. |
| DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame. |
| DataFrame.to_orc([path, engine, index, ...]) | Write a DataFrame to the Optimized Row Columnar (ORC) format. |
| DataFrame.to_parquet([path, engine, ...]) | Write a DataFrame to the binary parquet format. |
| DataFrame.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| DataFrame.to_hdf(path_or_buf, *, key[, ...]) | Write the contained data to an HDF5 file using HDFStore. |
| DataFrame.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
| DataFrame.to_dict([orient, into, index]) | Convert the DataFrame to a dictionary. |
| DataFrame.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
| DataFrame.to_feather(path, **kwargs) | Write a DataFrame to the binary Feather format. |
| DataFrame.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
| DataFrame.to_stata(path, *[, convert_dates, ...]) | Export DataFrame object to Stata dta format. |
| DataFrame.to_records([index, column_dtypes, ...]) | Convert DataFrame to a NumPy record array. |
| DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
| DataFrame.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| DataFrame.to_markdown([buf, mode, index, ...]) | Print DataFrame in Markdown-friendly format. |
| DataFrame.style | Returns a Styler object. |
| DataFrame.__dataframe__([nan_as_null, ...]) | (DEPRECATED) Return the dataframe interchange object implementing the interchange protocol. |
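A short sketch of two of the in-memory exports above (to_json shown with its default orient='columns'):
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.to_dict()
{'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}
>>> df.to_json()
'{"col1":{"0":1,"1":2},"col2":{"0":3,"1":4}}'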
Constructing DataFrame from a dictionary.
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1 int64
col2 int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
Constructing DataFrame from a dictionary including Series:
>>> d = {"col1": [0, 1, 2, 3], "col2": pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
col1 col2
0 0 NaN
1 1 NaN
2 2 2.0
3 3 3.0
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(
... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["a", "b", "c"]
... )
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
Constructing DataFrame from a numpy ndarray that has labeled columns:
>>> data = np.array(
... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
... dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")],
... )
>>> df3 = pd.DataFrame(data, columns=["c", "a"])
>>> df3
c a
0 3 1
1 6 4
2 9 7
Constructing DataFrame from dataclass:
>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
x y
0 0 0
1 0 3
2 2 3
Constructing DataFrame from Series/DataFrame:
>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
0
a 1
c 3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
x
a 1
c 3
>>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
... 'Age': [25, 30, 35],
... 'Location': ['Seattle', 'New York', 'Kona']},
... index=([10, 20, 30]))
>>> df.index
Index([10, 20, 30], dtype='int64')
In this example, we create a DataFrame with 3 rows and 3 columns, including Name, Age, and Location information. We set the index labels to be the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.
>>> df.index = [100, 200, 300]
>>> df
Name Age Location
100 Alice 25 Seattle
200 Bob 30 New York
300 Aritra 35 Kona
In this example, we modify the index labels of the DataFrame by assigning a new list of labels to the index attribute. The DataFrame is then updated with the new labels, and the output shows the modified DataFrame.
>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df
A B
0 1 3
1 2 4
>>> df.columns
Index(['A', 'B'], dtype='str')
>>> df = pd.DataFrame(
... {
... "float": [1.0],
... "int": [1],
... "datetime": [pd.Timestamp("20180310")],
... "string": ["foo"],
... }
... )
>>> df.dtypes
float float64
int int64
datetime datetime64[us]
string str
dtype: object
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame(
... {
... "int_col": int_values,
... "text_col": text_values,
... "float_col": float_values,
... }
... )
>>> df
int_col text_col float_col
0 1 alpha 0.00
1 2 beta 0.25
2 3 gamma 0.50
3 4 delta 0.75
4 5 epsilon 1.00
Print information about all columns:
>>> df.info(verbose=True)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 int_col 5 non-null int64
1 text_col 5 non-null str
2 float_col 5 non-null float64
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes
Print a summary of the column count and dtypes, but not per-column information:
>>> df.info(verbose=False)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes
Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
... f.write(s)
260
The memory_usage parameter enables deep introspection mode, especially useful for big DataFrames and for fine-tuning memory optimization:
>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> df = pd.DataFrame(
... {
... "column_1": np.random.choice(["a", "b", "c"], 10**6),
... "column_2": np.random.choice(["a", "b", "c"], 10**6),
... "column_3": np.random.choice(["a", "b", "c"], 10**6),
... }
... )
>>> df.info()
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null str
1 column_2 1000000 non-null str
2 column_3 1000000 non-null str
dtypes: str(3)
memory usage: 25.7 MB
>>> df.info(memory_usage="deep")
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null str
1 column_2 1000000 non-null str
2 column_3 1000000 non-null str
dtypes: str(3)
memory usage: 25.7 MB
>>> df = pd.DataFrame(
... {"a": [1, 2] * 3, "b": [True, False] * 3, "c": [1.0, 2.0] * 3}
... )
>>> df
a b c
0 1 True 1.0
1 2 False 2.0
2 1 True 1.0
3 2 False 2.0
4 1 True 1.0
5 2 False 2.0
>>> df.select_dtypes(include="bool")
b
0 True
1 False
2 True
3 False
4 True
5 False
>>> df.select_dtypes(include=["float64"])
c
0 1.0
1 2.0
2 1.0
3 2.0
4 1.0
5 2.0
>>> df.select_dtypes(exclude=["int64"])
b c
0 True 1.0
1 False 2.0
2 True 1.0
3 False 2.0
4 True 1.0
5 False 2.0
A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.
>>> df = pd.DataFrame(
... {"age": [3, 29], "height": [94, 170], "weight": [31, 115]}
... )
>>> df
age height weight
0 3 94 31
1 29 170 115
>>> df.dtypes
age int64
height int64
weight int64
dtype: object
>>> df.values
array([[ 3, 94, 31],
[ 29, 170, 115]])
A DataFrame with mixed-type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).
>>> df2 = pd.DataFrame(
... [
... ("parrot", 24.0, "second"),
... ("lion", 80.5, 1),
... ("monkey", np.nan, None),
... ],
... columns=("name", "max_speed", "rank"),
... )
>>> df2.dtypes
name str
max_speed float64
rank object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
['lion', 80.5, 1],
['monkey', nan, None]], dtype=object)
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]
>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.ndim
1
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.ndim
2
>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.size
3
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.size
4
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
>>> df.shape
(2, 3)
>>> dtypes = ["int64", "float64", "complex128", "object", "bool"]
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
int64 float64 complex128 object bool
0 1 1.0 1.0+0.0j 1 True
1 1 1.0 1.0+0.0j 1 True
2 1 1.0 1.0+0.0j 1 True
3 1 1.0 1.0+0.0j 1 True
4 1 1.0 1.0+0.0j 1 True
>>> df.memory_usage()
Index 132
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
>>> df.memory_usage(index=False)
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
The memory footprint of object dtype columns is ignored by default:
>>> df.memory_usage(deep=True)
Index 132
int64 40000
float64 40000
complex128 80000
object 180000
bool 5000
dtype: int64
Use a Categorical for efficient storage of an object-dtype column with many repeated values.
>>> df["object"].astype("category").memory_usage(deep=True)
5140
An example of an actual empty DataFrame. Notice the index is empty:
>>> df_empty = pd.DataFrame({"A": []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True
If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:
>>> df = pd.DataFrame({"A": [np.nan]})
>>> df
A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
>>> ser_empty = pd.Series({"A": []})
>>> ser_empty
A []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
Create a DataFrame:
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype("int32").dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({"col1": "int32"}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype("int64")
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype("category")
0 1
1 2
dtype: category
Categories (2, int32): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[us]
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by np.nan.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: str
Obtain a Series with dtype StringDtype.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
A
1 1
2 2
3 3
>>> df.dtypes
A object
dtype: object
>>> df.infer_objects().dtypes
A int64
dtype: object
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: int64
>>> s_copy = s.copy(deep=True)
>>> s_copy
a 1
b 2
dtype: int64
Due to Copy-on-Write, shallow copies are still protected from data modifications; note that shallow is not modified below.
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> shallow = s.copy(deep=False)
>>> s.iloc[1] = 200
>>> shallow
a 1
b 2
dtype: int64
When the data has object dtype, even a deep copy does not copy the underlying Python objects. Updating a nested data object will be reflected in the deep copy.
>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0 [10, 2]
1 [3, 4]
dtype: object
>>> deep
0 [10, 2]
1 [3, 4]
dtype: object
>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
[2, 4]])
With heterogeneous data, the lowest common type will have to be used.
>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
[2. , 4.5]])
For a mix of numeric and non-numeric types, the output array will have object dtype.
>>> df["C"] = pd.date_range("2000", periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
[2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
>>> df = pd.DataFrame(
... {
... "animal": [
... "alligator",
... "bee",
... "falcon",
... "lion",
... "monkey",
... "parrot",
... "shark",
... "whale",
... "zebra",
... ]
... }
... )
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the first 5 lines
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
Viewing the first n lines (three in this case)
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
For negative values of n
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
| Kind of Data | pandas Data Type | Scalar | Array |
|---|---|---|---|
| TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes |
| Timedeltas | (none) | Timedelta | Timedeltas |
| Period (time spans) | PeriodDtype | Period | Periods |
| Intervals | IntervalDtype | Interval | Intervals |
| Nullable Integer | Int64Dtype, … | (none) | Nullable integer |
| Nullable Float | Float64Dtype, … | (none) | Nullable float |
| Categorical | CategoricalDtype | (none) | Categoricals |
| Sparse | SparseDtype | (none) | Sparse |
| Strings | StringDtype | str | Strings |
| Nullable Boolean | BooleanDtype | bool | Nullable Boolean |
| PyArrow | ArrowDtype | Python Scalars or NA | PyArrow |
pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
| Function | Description |
|---|---|
| array(data[, dtype, copy]) | Create an array. |
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
PyArrow provides array and data type support similar to NumPy's, including first-class nullability for all data types, immutability, and more.
The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas e.g. pd.ArrowDtype(pa.bool_()).
| PyArrow type | pandas extension type | NumPy type |
|---|---|---|
| pyarrow.bool_() | BooleanDtype | np.bool_ |
| pyarrow.int8() | Int8Dtype | np.int8 |
| pyarrow.int16() | Int16Dtype | np.int16 |
| pyarrow.int32() | Int32Dtype | np.int32 |
| pyarrow.int64() | Int64Dtype | np.int64 |
| pyarrow.uint8() | UInt8Dtype | np.uint8 |
| pyarrow.uint16() | UInt16Dtype | np.uint16 |
| pyarrow.uint32() | UInt32Dtype | np.uint32 |
| pyarrow.uint64() | UInt64Dtype | np.uint64 |
| pyarrow.float32() | Float32Dtype | np.float32 |
| pyarrow.float64() | Float64Dtype | np.float64 |
| pyarrow.time32() | (none) | (none) |
| pyarrow.time64() | (none) | (none) |
| pyarrow.timestamp() | DatetimeTZDtype | np.datetime64 |
| pyarrow.date32() | (none) | (none) |
| pyarrow.date64() | (none) | (none) |
| pyarrow.duration() | (none) | np.timedelta64 |
| pyarrow.binary() | (none) | (none) |
| pyarrow.string() | StringDtype | np.str_ |
| pyarrow.decimal128() | (none) | (none) |
| pyarrow.list_() | (none) | (none) |
| pyarrow.map_() | (none) | (none) |
| pyarrow.dictionary() | CategoricalDtype | (none) |
Note
Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as the Python scalar corresponding to the data type, e.g. a PyArrow int64 is returned as a Python int, or NA for missing values.
| Function | Description |
|---|---|
| arrays.ArrowExtensionArray(values) | Pandas ExtensionArray backed by a PyArrow ChunkedArray. |
| Function | Description |
|---|---|
| ArrowDtype(pyarrow_dtype) | An ExtensionDtype for PyArrow data types. |
For more information, please see the PyArrow user guide.
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
| Function | Description |
|---|---|
| Timestamp([ts_input, year, month, day, ...]) | Pandas replacement for python datetime.datetime object. |
| Property | Description |
|---|---|
| Timestamp.asm8 | Return numpy datetime64 format with same precision. |
| Timestamp.day | Return the day of the Timestamp. |
| Timestamp.dayofweek | Return day of the week. |
| Timestamp.day_of_week | Return day of the week. |
| Timestamp.dayofyear | Return the day of the year. |
| Timestamp.day_of_year | Return the day of the year. |
| Timestamp.days_in_month | Return the number of days in the month. |
| Timestamp.daysinmonth | Return the number of days in the month. |
| Timestamp.fold | Return the fold value of the Timestamp. |
| Timestamp.hour | Return the hour of the Timestamp. |
| Timestamp.is_leap_year | Return True if year is a leap year. |
| Timestamp.is_month_end | Check if the date is the last day of the month. |
| Timestamp.is_month_start | Check if the date is the first day of the month. |
| Timestamp.is_quarter_end | Check if date is last day of the quarter. |
| Timestamp.is_quarter_start | Check if the date is the first day of the quarter. |
| Timestamp.is_year_end | Return True if date is last day of the year. |
| Timestamp.is_year_start | Return True if date is first day of the year. |
| Timestamp.max | |
| Timestamp.microsecond | Return the microsecond of the Timestamp. |
| Timestamp.min | |
| Timestamp.minute | Return the minute of the Timestamp. |
| Timestamp.month | Return the month of the Timestamp. |
| Timestamp.nanosecond | Return the nanosecond of the Timestamp. |
| Timestamp.quarter | Return the quarter of the year for the Timestamp. |
| Timestamp.resolution | |
| Timestamp.second | Return the second of the Timestamp. |
| Timestamp.tz | Alias for tzinfo. |
| Timestamp.tzinfo | Returns the timezone info of the Timestamp. |
| Timestamp.unit | The abbreviation associated with self._creso. |
| Timestamp.value | Return the value of the Timestamp. |
| Timestamp.week | Return the week number of the year. |
| Timestamp.weekofyear | Return the week number of the year. |
| Timestamp.year | Return the year of the Timestamp. |
| Method | Description |
|---|---|
| Timestamp.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
| Timestamp.astimezone(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.ceil(freq[, ambiguous, nonexistent]) | Return a new Timestamp ceiled to this resolution. |
| Timestamp.combine(date, time) | Combine a date and time into a single Timestamp object. |
| Timestamp.ctime() | Return a ctime() style string representing the Timestamp. |
| Timestamp.date() | Returns datetime.date with the same year, month, and day. |
| Timestamp.day_name([locale]) | Return the day name of the Timestamp with specified locale. |
| Timestamp.dst() | Return the daylight saving time (DST) adjustment. |
| Timestamp.floor(freq[, ambiguous, nonexistent]) | Return a new Timestamp floored to this resolution. |
| Timestamp.fromordinal(ordinal[, tz]) | Construct a timestamp from a proleptic Gregorian ordinal. |
| Timestamp.fromtimestamp(ts[, tz]) | Create a Timestamp object from a POSIX timestamp. |
| Timestamp.isocalendar() | Return a named tuple containing ISO year, week number, and weekday. |
| Timestamp.isoformat([sep, timespec]) | Return the time formatted according to ISO 8601. |
| Timestamp.isoweekday() | Return the day of the week represented by the date. |
| Timestamp.month_name([locale]) | Return the month name of the Timestamp with specified locale. |
| Timestamp.normalize() | Normalize Timestamp to midnight, preserving tz information. |
| Timestamp.now([tz]) | Return new Timestamp object representing current time local to tz. |
| Timestamp.replace([year, month, day, hour, ...]) | Implements datetime.replace, handles nanoseconds. |
| Timestamp.round(freq[, ambiguous, nonexistent]) | Round the Timestamp to the specified resolution. |
| Timestamp.strftime(format) | Return a formatted string of the Timestamp. |
| Timestamp.strptime(date_string, format) | Convert string argument to datetime. |
| Timestamp.time() | Return time object with same time but with tzinfo=None. |
| Timestamp.timestamp() | Return POSIX timestamp as float. |
| Timestamp.timetuple() | Return time tuple, compatible with time.localtime(). |
| Timestamp.timetz() | Return time object with same time and tzinfo. |
| Timestamp.to_datetime64() | Return a NumPy datetime64 object with same precision. |
| Timestamp.to_numpy([dtype, copy]) | Convert the Timestamp to a NumPy datetime64. |
| Timestamp.to_julian_date() | Convert TimeStamp to a Julian Date. |
| Timestamp.to_period([freq]) | Return a period of which this timestamp is an observation. |
| Timestamp.to_pydatetime([warn]) | Convert a Timestamp object to a native Python datetime object. |
| Timestamp.today([tz]) | Return the current time in the local timezone. |
| Timestamp.toordinal() | Return proleptic Gregorian ordinal. |
| Timestamp.tz_convert(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.tz_localize(tz[, ambiguous, ...]) | Localize the Timestamp to a timezone. |
| Timestamp.tzname() | Return time zone name. |
| Timestamp.utcfromtimestamp(ts) | Construct a timezone-aware UTC datetime from a POSIX timestamp. |
| Timestamp.utcnow() | Return a new Timestamp representing UTC day and time. |
| Timestamp.utcoffset() | Return utc offset. |
| Timestamp.utctimetuple() | Return UTC time tuple, compatible with time.localtime(). |
| Timestamp.weekday() | Return the day of the week represented by the date. |
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
| Function | Description |
|---|---|
| arrays.DatetimeArray(data[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
| Function | Description |
|---|---|
| DatetimeTZDtype([unit, tz]) | An ExtensionDtype for timezone-aware datetime data. |
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
| Function | Description |
|---|---|
| Timedelta([value, unit]) | Represents a duration, the difference between two dates or times. |
| Property | Description |
|---|---|
| Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
| Timedelta.components | Return a components namedtuple-like. |
| Timedelta.days | Returns the days of the timedelta. |
| Timedelta.max | |
| Timedelta.microseconds | Return the number of microseconds (n), where 0 <= n < 1 millisecond. |
| Timedelta.min | |
| Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
| Timedelta.resolution | |
| Timedelta.seconds | Return the total hours, minutes, and seconds of the timedelta as seconds. |
| Timedelta.unit | Return the unit of Timedelta object. |
| Timedelta.value | Return the value of Timedelta object in nanoseconds. |
| Timedelta.view(dtype) | Array view compatibility. |
| Method | Description |
|---|---|
| Timedelta.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
| Timedelta.ceil(freq) | Return a new Timedelta ceiled to this resolution. |
| Timedelta.floor(freq) | Return a new Timedelta floored to this resolution. |
| Timedelta.isoformat() | Format the Timedelta as ISO 8601 Duration. |
| Timedelta.round(freq) | Round the Timedelta to the specified resolution. |
| Timedelta.to_pytimedelta() | Convert a pandas Timedelta object into a python datetime.timedelta object. |
| Timedelta.to_timedelta64() | Return a numpy.timedelta64 object with 'ns' precision. |
| Timedelta.to_numpy([dtype, copy]) | Convert the Timedelta to a NumPy timedelta64. |
| Timedelta.total_seconds() | Total seconds in the duration. |
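A minimal sketch of constructing a Timedelta and using two of the methods above:
>>> td = pd.Timedelta(days=1, hours=2, minutes=30)
>>> td.total_seconds()
95400.0
>>> td.isoformat()
'P1DT2H30M0S'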
A collection of Timedelta may be stored in a TimedeltaArray.
| Function | Description |
|---|---|
| arrays.TimedeltaArray(data[, dtype, freq, copy]) | Pandas ExtensionArray for timedelta data. |
pandas represents spans of times as Period objects.
| Function | Description |
|---|---|
| Period([value, freq, ordinal, year, month, ...]) | Represents a period of time. |
| Property | Description |
|---|---|
| Period.day | Get day of the month that a Period falls on. |
| Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.day_of_week | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.dayofyear | Return the day of the year. |
| Period.day_of_year | Return the day of the year. |
| Period.days_in_month | Get the total number of days in the month that this period falls on. |
| Period.daysinmonth | Get the total number of days of the month that this period falls on. |
| Period.end_time | Get the Timestamp for the end of the period. |
| Period.freq | Return the frequency object for this Period. |
| Period.freqstr | Return a string representation of the frequency. |
| Period.hour | Get the hour of the day component of the Period. |
| Period.is_leap_year | Return True if the period's year is in a leap year. |
| Period.minute | Get minute of the hour component of the Period. |
| Period.month | Return the month this Period falls on. |
| Period.ordinal | Return the integer ordinal for this Period. |
| Period.quarter | Return the quarter this Period falls on. |
| Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| Period.second | Get the second component of the Period. |
| Period.start_time | Get the Timestamp for the start of the period. |
| Period.week | Get the week of the year on the given Period. |
| Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.weekofyear | Get the week of the year on the given Period. |
| Period.year | Return the year this Period falls on. |
| Method | Description |
|---|---|
| Period.asfreq(freq[, how]) | Convert Period to desired frequency, at the start or end of the interval. |
| Period.now(freq) | Return the period of now's date. |
| Period.strftime(fmt) | Returns a formatted string representation of the Period. |
| Period.to_timestamp([freq, how]) | Return the Timestamp representation of the Period. |
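A brief sketch of a monthly Period and frequency conversion:
>>> p = pd.Period("2024-03", freq="M")
>>> p.start_time
Timestamp('2024-03-01 00:00:00')
>>> p.asfreq("D", how="end")
Period('2024-03-31', 'D')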
A collection of Period may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
| Function | Description |
|---|---|
| arrays.PeriodArray(values[, dtype, copy]) | Pandas ExtensionArray for storing Period data. |
| Function | Description |
|---|---|
| PeriodDtype(freq) | An ExtensionDtype for Period data. |
Arbitrary intervals can be represented as Interval objects.
| Function | Description |
|---|---|
| Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
| Property | Description |
|---|---|
| Interval.closed | String describing the inclusive side of the interval. |
| Interval.closed_left | Check if the interval is closed on the left side. |
| Interval.closed_right | Check if the interval is closed on the right side. |
| Interval.is_empty | Indicates if an interval is empty, meaning it contains no points. |
| Interval.left | Left bound for the interval. |
| Interval.length | Return the length of the Interval. |
| Interval.mid | Return the midpoint of the Interval. |
| Interval.open_left | Check if the interval is open on the left side. |
| Interval.open_right | Check if the interval is open on the right side. |
| Interval.overlaps(other) | Check whether two Interval objects overlap. |
| Interval.right | Right bound for the interval. |
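A short sketch of Interval membership and overlap checks (a right-closed interval contains its right endpoint):
>>> iv = pd.Interval(0, 5, closed="right")
>>> iv.mid
2.5
>>> 5 in iv
True
>>> iv.overlaps(pd.Interval(4, 6))
True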
A collection of intervals may be stored in an arrays.IntervalArray.
| Function | Description |
|---|---|
| arrays.IntervalArray(data[, closed, dtype, ...]) | Pandas array for interval data that are closed on the same side. |
| Function | Description |
|---|---|
| IntervalDtype([subtype, closed]) | An ExtensionDtype for Interval data. |
numpy.ndarray cannot natively represent integer-data with missing values. pandas provides this through arrays.IntegerArray.
| Function | Description |
|---|---|
| arrays.IntegerArray(values, mask[, copy]) | Array of integer (optional missing) values. |
| Function | Description |
|---|---|
| Int8Dtype() | An ExtensionDtype for int8 integer data. |
| Int16Dtype() | An ExtensionDtype for int16 integer data. |
| Int32Dtype() | An ExtensionDtype for int32 integer data. |
| Int64Dtype() | An ExtensionDtype for int64 integer data. |
| UInt8Dtype() | An ExtensionDtype for uint8 integer data. |
| UInt16Dtype() | An ExtensionDtype for uint16 integer data. |
| UInt32Dtype() | An ExtensionDtype for uint32 integer data. |
| UInt64Dtype() | An ExtensionDtype for uint64 integer data. |
| Function | Description |
|---|---|
| arrays.FloatingArray(values, mask[, copy]) | Array of floating (optional missing) values. |
| Function | Description |
|---|---|
| Float32Dtype() | An ExtensionDtype for float32 data. |
| Float64Dtype() | An ExtensionDtype for float64 data. |
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
| Function | Description |
|---|---|
| CategoricalDtype([categories, ordered]) | Type for categorical data with the categories and orderedness. |
| Property | Description |
|---|---|
| CategoricalDtype.categories | An Index containing the unique categories allowed. |
| CategoricalDtype.ordered | Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical:
| Function | Description |
|---|---|
| Categorical(values[, categories, ordered, ...]) | Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
| Function | Description |
|---|---|
| Categorical.from_codes(codes[, categories, ...]) | Make a Categorical type from codes and categories or dtype. |
The dtype information is available on the Categorical
| Property | Description |
|---|---|
| Categorical.dtype | The CategoricalDtype for this instance. |
| Categorical.categories | The categories of this categorical. |
| Categorical.ordered | Whether the categories have an ordered relationship. |
| Categorical.codes | The category codes of this categorical index. |
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
| Function | Description |
|---|---|
| Categorical.__array__([dtype, copy]) | The numpy array interface. |
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
More methods are available on Categorical:
| Method | Description |
|---|---|
| Categorical.as_ordered() | Set the Categorical to be ordered. |
| Categorical.as_unordered() | Set the Categorical to be unordered. |
| Categorical.set_categories(new_categories[, ...]) | Set the categories to the specified new categories. |
| Categorical.rename_categories(new_categories) | Rename categories. |
| Categorical.reorder_categories(new_categories) | Reorder categories as specified in new_categories. |
| Categorical.add_categories(new_categories) | Add new categories. |
| Categorical.remove_categories(removals) | Remove the specified categories. |
| Categorical.remove_unused_categories() | Remove categories which are not used. |
| Categorical.map(mapper[, na_action]) | Map categories using an input mapping or function. |
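A minimal sketch of one of these methods (on pandas versions before the default string dtype, the categories display as object rather than str):
>>> cat = pd.Categorical(["a", "b", "a"])
>>> cat.rename_categories({"a": "x", "b": "y"})
['x', 'y', 'x']
Categories (2, str): ['x', 'y']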
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
| Function | Description |
|---|---|
| arrays.SparseArray(data[, sparse_index, ...]) | An ExtensionArray for storing sparse data. |
| Function | Description |
|---|---|
| SparseDtype([dtype, fill_value]) | Dtype for data stored in SparseArray. |
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
| Function | Description |
|---|---|
| arrays.StringArray(values, *[, dtype, copy]) | Extension array for string data. |
| arrays.ArrowStringArray(values, *[, dtype]) | Extension array for string data in a pyarrow.ChunkedArray. |
| Function | Description |
|---|---|
| StringDtype([storage, na_value]) | Extension dtype for string data. |
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
| Function | Description |
|---|---|
| arrays.BooleanArray(values, mask[, copy]) | Array of boolean (True/False) data with missing values. |
| Function | Description |
|---|---|
| BooleanDtype() | Extension dtype for boolean data. |
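A short illustration of nullable boolean data via the "boolean" alias:
>>> pd.array([True, False, None], dtype="boolean")
<BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean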
| Function | Description |
|---|---|
| api.types.union_categoricals(to_union[, ...]) | Combine list-like of Categorical-like, unioning categories. |
| api.types.infer_dtype(value[, skipna]) | Return a string label of the type of the elements in a list-like input. |
| api.types.pandas_dtype(dtype) | Convert input into a pandas only dtype object or a numpy dtype object. |
| Function | Description |
|---|---|
| api.types.is_any_real_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a real number dtype. |
| api.types.is_bool_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a boolean dtype. |
| api.types.is_categorical_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype. |
| api.types.is_complex_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a complex dtype. |
| api.types.is_datetime64_any_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64[ns] dtype. |
| api.types.is_datetime64tz_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype. |
| api.types.is_dtype_equal(source, target) | Check if two dtypes are equal. |
| api.types.is_extension_array_dtype(arr_or_dtype) | Check if an object is a pandas extension array type. |
| api.types.is_float_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a float dtype. |
| api.types.is_int64_dtype(arr_or_dtype) | (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype. |
| api.types.is_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an integer dtype. |
| api.types.is_interval_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype. |
| api.types.is_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a numeric dtype. |
| api.types.is_object_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the object dtype. |
| api.types.is_period_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Period dtype. |
| api.types.is_signed_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a signed integer dtype. |
| api.types.is_string_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the string dtype. |
| api.types.is_timedelta64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the timedelta64 dtype. |
| api.types.is_timedelta64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the timedelta64[ns] dtype. |
| api.types.is_unsigned_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an unsigned integer dtype. |
| api.types.is_sparse(arr) | (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array. |
| Function | Description |
|---|---|
| api.types.is_dict_like(obj) | Check if the object is dict-like. |
| api.types.is_file_like(obj) | Check if the object is a file-like object. |
| api.types.is_list_like(obj[, allow_sets]) | Check if the object is list-like. |
| api.types.is_named_tuple(obj) | Check if the object is a named tuple. |
| api.types.is_iterator(obj) | Check if the object is an iterator. |
| Function | Description |
|---|---|
| api.types.is_bool(obj) | Return True if given object is boolean. |
| api.types.is_complex(obj) | Return True if given object is complex. |
| api.types.is_float(obj) | Return True if given object is float. |
| api.types.is_hashable(obj[, allow_slice]) | Return True if hash(obj) will succeed, False otherwise. |
| api.types.is_integer(obj) | Return True if given object is integer. |
| api.types.is_number(obj) | Check if the object is a number. |
| api.types.is_re(obj) | Check if the object is a regex pattern instance. |
| api.types.is_re_compilable(obj) | Check if the object can be compiled into a regex pattern instance. |
| api.types.is_scalar(val) | Return True if given object is scalar. |
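A brief sketch of the introspection helpers above (note that strings are not considered list-like):
>>> from pandas.api.types import is_list_like, is_scalar
>>> is_list_like([1, 2, 3])
True
>>> is_list_like("abc")
False
>>> is_scalar(3.5)
True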
If a dtype is not specified, pandas will infer the best dtype from the values. See the description of the dtype parameter for the types pandas can infer.
>>> pd.array([1, 2])
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
>>> pd.array([1, 2, np.nan])
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
>>> pd.array([1.1, 2.2])
<FloatingArray>
[1.1, 2.2]
Length: 2, dtype: Float64
>>> pd.array(["a", None, "c"])
<ArrowStringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> with pd.option_context("string_storage", "python"):
... arr = pd.array(["a", None, "c"])
>>> arr
<StringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> pd.array([pd.Period("2000", freq="D"), pd.Period("2000", freq="D")])
<PeriodArray>
['2000-01-01', '2000-01-01']
Length: 2, dtype: period[D]
You can use the string alias for dtype
>>> pd.array(["a", "b", "a"], dtype="category")
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
Or specify the actual dtype
>>> pd.array(
... ["a", "b", "a"], dtype=pd.CategoricalDtype(["a", "b", "c"], ordered=True)
... )
['a', 'b', 'a']
Categories (3, str): ['a' < 'b' < 'c']
If pandas does not infer a dedicated extension type, an arrays.NumpyExtensionArray is returned.
>>> pd.array([1 + 1j, 3 + 2j])
<NumpyExtensionArray>
[(1+1j), (3+2j)]
Length: 2, dtype: complex128
As mentioned in the "Notes" section, new extension types may be added in the future (by pandas or third-party libraries), causing the return value to no longer be an arrays.NumpyExtensionArray. Specify the dtype as a NumPy dtype if you need to ensure there's no future change in behavior.
>>> pd.array([1, 2], dtype=np.dtype("int32"))
<NumpyExtensionArray>
[1, 2]
Length: 2, dtype: int32
data must be 1-dimensional. A ValueError is raised when the input has the wrong dimensionality.
>>> pd.array(1)
Traceback (most recent call last):
...
ValueError: Cannot pass scalar '1' to 'pandas.array'.
Create an ArrowExtensionArray with pandas.array():
>>> pd.array([1, 1, None], dtype="int64[pyarrow]")
<ArrowExtensionArray>
[1, 1, <NA>]
Length: 3, dtype: int64[pyarrow]
>>> import pyarrow as pa
>>> pd.ArrowDtype(pa.int64())
int64[pyarrow]
Types with parameters must be constructed with ArrowDtype.
>>> pd.ArrowDtype(pa.timestamp("s", tz="America/New_York"))
timestamp[s, tz=America/New_York][pyarrow]
>>> pd.ArrowDtype(pa.list_(pa.int64()))
list<item: int64>[pyarrow]
Using the primary calling convention:
This converts a datetime-like string
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
This converts a float representing a Unix epoch in units of seconds
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
This converts an int representing a Unix-epoch in units of weeks
>>> pd.Timestamp(1535, unit='W')
Timestamp('1999-06-03 00:00:00')
This converts an int representing a Unix-epoch in units of seconds and for a particular timezone
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Using the other two forms that mimic the API for datetime.datetime:
>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00')
>>> ts = pd.Timestamp(2020, 3, 14, 15)
>>> ts.asm8
numpy.datetime64('2020-03-14T15:00:00.000000')
>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.day
31
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
>>> ts = pd.Timestamp("2024-11-03 01:30:00")
>>> ts.fold
0
>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.hour
16
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_leap_year
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_month_end
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_start
False
>>> ts = pd.Timestamp(2020, 1, 1)
>>> ts.is_month_start
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_end
False
>>> ts = pd.Timestamp(2020, 3, 31)
>>> ts.is_quarter_end
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_start
False
>>> ts = pd.Timestamp(2020, 4, 1)
>>> ts.is_quarter_start
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_year_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_year_end
True
Many of these methods or variants thereof are available on the objects that contain an index (Series/DataFrame) and those should most likely be used before calling these methods directly.
| Function | Description |
|---|---|
| Index([data, dtype, copy, name, tupleize_cols]) | Immutable sequence used for indexing and alignment. |
| Function | Description |
|---|---|
| Index.values | Return an array representing the data in the Index. |
| Index.is_monotonic_increasing | Return a boolean if the values are equal or increasing. |
| Index.is_monotonic_decreasing | Return a boolean if the values are equal or decreasing. |
| Index.is_unique | Return if the index has unique values. |
| Index.has_duplicates | Check if the Index has duplicate values. |
| Index.hasnans | Return True if there are any NaNs. |
| Index.dtype | Return the dtype object of the underlying data. |
| Index.inferred_type | Return a string of the type inferred from the values. |
| Index.shape | Return a tuple of the shape of the underlying data. |
| Index.name | Return Index or MultiIndex name. |
| Index.names | Get names on index. |
| Index.nbytes | Return the number of bytes in the underlying data. |
| Index.ndim | Number of dimensions of the underlying data, by definition 1. |
| Index.size | Return the number of elements in the underlying data. |
| Index.empty | Indicator whether Index is empty. |
| Index.T | Return the transpose, which is by definition self. |
| Index.memory_usage([deep]) | Memory usage of the values. |
| Index.array | The ExtensionArray of the data backing this Index. |
| Function | Description |
|---|---|
| Index.all(*args, **kwargs) | Return whether all elements are Truthy. |
| Index.any(*args, **kwargs) | Return whether any element is Truthy. |
| Index.argmin([axis, skipna]) | Return int position of the smallest value in the Index. |
| Index.argmax([axis, skipna]) | Return int position of the largest value in the Index. |
| Index.copy([name, deep]) | Make a copy of this object. |
| Index.delete(loc) | Make new Index with passed location(-s) deleted. |
| Index.drop(labels[, errors]) | Make new Index with passed list of labels deleted. |
| Index.drop_duplicates(*[, keep]) | Return Index with duplicate values removed. |
| Index.duplicated([keep]) | Indicate duplicate index values. |
| Index.equals(other) | Determine if two Index object are equal. |
| Index.factorize([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| Index.identical(other) | Similar to equals, but checks that object attributes and types are also equal. |
| Index.insert(loc, item) | Make new Index inserting new item at location. |
| Index.is_(other) | More flexible, faster check like is but that works through views. |
| Index.min([axis, skipna]) | Return the minimum value of the Index. |
| Index.max([axis, skipna]) | Return the maximum value of the Index. |
| Index.reindex(target[, method, level, ...]) | Create index with target's values. |
| Index.rename(name, *[, inplace]) | Alter Index or MultiIndex name. |
| Index.repeat(repeats[, axis]) | Repeat elements of an Index. |
| Index.where(cond[, other]) | Replace values where the condition is False. |
| Index.take(indices[, axis, allow_fill, ...]) | Return a new Index of the values selected by the indices. |
| Index.putmask(mask, value) | Return a new Index of the values set with the mask. |
| Index.unique([level]) | Return unique values in the index. |
| Index.nunique([dropna]) | Return number of unique elements in the object. |
| Index.value_counts([normalize, sort, ...]) | Return a Series containing counts of unique values. |
| Function | Description |
|---|---|
| Index.set_names(names, *[, level, inplace]) | Set Index or MultiIndex name. |
| Index.droplevel([level]) | Return index with requested level(s) removed. |
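A brief renaming sketch (example data is ours):
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx.set_names("y")
Index([1, 2, 3], dtype='int64', name='y')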
| Function | Description |
|---|---|
| Index.fillna(value) | Fill NA/NaN values with the specified value. |
| Index.dropna([how]) | Return Index without NA/NaN values. |
| Index.isna() | Detect missing values. |
| Index.notna() | Detect existing (non-missing) values. |
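A brief sketch of the missing-value helpers (example data is ours):
>>> idx = pd.Index([1.0, np.nan, 3.0])
>>> idx.fillna(0)
Index([1.0, 0.0, 3.0], dtype='float64')
>>> idx.dropna()
Index([1.0, 3.0], dtype='float64')
>>> idx.isna()
array([False,  True, False])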
| Function | Description |
|---|---|
| Index.astype(dtype[, copy]) | Create an Index with values cast to dtypes. |
| Index.infer_objects([copy]) | If we have an object dtype, try to infer a non-object dtype. |
| Index.item() | Return the first element of the underlying data as a Python scalar. |
| Index.map(mapper[, na_action]) | Map values using an input mapping or function. |
| Index.ravel([order]) | Return a view on self. |
| Index.to_list() | Return a list of the values. |
| Index.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| Index.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| Index.to_numpy([dtype, copy, na_value]) | A NumPy ndarray representing the values in this Series or Index. |
| Index.view([cls]) | Return a view of the Index with the specified dtype or a new Index instance. |
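A brief conversion sketch (example data is ours):
>>> idx = pd.Index([1, 2, 3])
>>> idx.astype("float64")
Index([1.0, 2.0, 3.0], dtype='float64')
>>> idx.to_list()
[1, 2, 3]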
| Function | Description |
|---|---|
| Index.argsort(*args, **kwargs) | Return the integer indices that would sort the index. |
| Index.searchsorted(value[, side, sorter]) | Find indices where elements should be inserted to maintain order. |
| Index.sort_values(*[, return_indexer, ...]) | Return a sorted copy of the index. |
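A brief sorting sketch (example data is ours):
>>> idx = pd.Index([3, 1, 2])
>>> idx.sort_values()
Index([1, 2, 3], dtype='int64')
>>> idx.argsort()
array([1, 2, 0])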
| Function | Description |
|---|---|
| Index.shift([periods, freq]) | Shift index by desired number of time frequency increments. |
| Function | Description |
|---|---|
| Index.append(other) | Append a collection of Index options together. |
| Index.join(other, *[, how, level, ...]) | Compute join_index and indexers to conform data structures to the new index. |
| Index.intersection(other[, sort]) | Form the intersection of two Index objects. |
| Index.union(other[, sort]) | Form the union of two Index objects. |
| Index.difference(other[, sort]) | Return a new Index with elements of index not in other. |
| Index.symmetric_difference(other[, ...]) | Compute the symmetric difference of two Index objects. |
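A brief sketch of the set operations (example data is ours):
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')
>>> idx1.union(idx2)
Index([1, 2, 3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Index([1, 2], dtype='int64')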
| Function | Description |
|---|---|
| Index.asof(label) | Return the label from the index, or, if not present, the previous one. |
| Index.asof_locs(where, mask) | Return the locations (indices) of labels in the index. |
| Index.get_indexer(target[, method, limit, ...]) | Compute indexer and mask for new index given the current index. |
| Index.get_indexer_for(target) | Guaranteed return of an indexer even when non-unique. |
| Index.get_indexer_non_unique(target) | Compute indexer and mask for new index given the current index. |
| Index.get_level_values(level) | Return an Index of values for requested level. |
| Index.get_loc(key) | Get integer location, slice or boolean mask for requested label. |
| Index.get_slice_bound(label, side) | Calculate slice bound that corresponds to given label. |
| Index.isin(values[, level]) | Return a boolean array where the index values are in values. |
| Index.slice_indexer([start, end, step]) | Compute the slice indexer for input labels and step. |
| Index.slice_locs([start, end, step]) | Compute slice locations for input labels. |
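A brief label-lookup sketch (example data is ours):
>>> idx = pd.Index(list("abc"))
>>> idx.get_loc("b")
1
>>> idx.slice_locs(start="a", end="b")
(0, 2)
>>> idx.isin(["a", "c"])
array([ True, False,  True])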
| Function | Description |
|---|---|
| RangeIndex([start, stop, step, dtype, copy, ...]) | Immutable Index implementing a monotonic integer range. |
| Function | Description |
|---|---|
| RangeIndex.start | The value of the start parameter (0 if this was not supplied). |
| RangeIndex.stop | The value of the stop parameter. |
| RangeIndex.step | The value of the step parameter (1 if this was not supplied). |
| RangeIndex.from_range(data[, name, dtype]) | Create pandas.RangeIndex from a range object. |
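A brief RangeIndex sketch (example data is ours):
>>> idx = pd.RangeIndex(start=0, stop=10, step=2)
>>> idx
RangeIndex(start=0, stop=10, step=2)
>>> idx.step
2
>>> pd.RangeIndex.from_range(range(5))
RangeIndex(start=0, stop=5, step=1)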
| Function | Description |
|---|---|
| CategoricalIndex([data, categories, ...]) | Index based on an underlying Categorical. |
| Function | Description |
|---|---|
| CategoricalIndex.append(other) | Append a collection of Index options together. |
| CategoricalIndex.codes | The category codes of this categorical index. |
| CategoricalIndex.categories | The categories of this categorical. |
| CategoricalIndex.ordered | Whether the categories have an ordered relationship. |
| CategoricalIndex.rename_categories(...) | Rename categories. |
| CategoricalIndex.reorder_categories(...[, ...]) | Reorder categories as specified in new_categories. |
| CategoricalIndex.add_categories(new_categories) | Add new categories. |
| CategoricalIndex.remove_categories(removals) | Remove the specified categories. |
| CategoricalIndex.remove_unused_categories() | Remove categories which are not used. |
| CategoricalIndex.set_categories(new_categories) | Set the categories to the specified new categories. |
| CategoricalIndex.as_ordered() | Set the Categorical to be ordered. |
| CategoricalIndex.as_unordered() | Set the Categorical to be unordered. |
| Function | Description |
|---|---|
| CategoricalIndex.map(mapper[, na_action]) | Map values using an input mapping or function. |
| CategoricalIndex.equals(other) | Determine if two CategoricalIndex objects contain the same elements. |
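A brief CategoricalIndex sketch (example data is ours):
>>> ci = pd.CategoricalIndex(["a", "b", "a"], categories=["a", "b", "c"])
>>> ci.codes
array([0, 1, 0], dtype=int8)
>>> ci.ordered
False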
| Function | Description |
|---|---|
| IntervalIndex(data[, closed, dtype, copy, ...]) | Immutable index of intervals that are closed on the same side. |
| Function | Description |
|---|---|
| IntervalIndex.from_arrays(left, right[, ...]) | Construct from two arrays defining the left and right bounds. |
| IntervalIndex.from_tuples(data[, closed, ...]) | Construct an IntervalIndex from an array-like of tuples. |
| IntervalIndex.from_breaks(breaks[, closed, ...]) | Construct an IntervalIndex from an array of splits. |
| IntervalIndex.left | Return left bounds of the intervals in the IntervalIndex. |
| IntervalIndex.right | Return right bounds of the intervals in the IntervalIndex. |
| IntervalIndex.mid | Return the midpoint of each interval in the IntervalIndex as an Index. |
| IntervalIndex.closed | String describing the inclusive side of the intervals. |
| IntervalIndex.length | Calculate the length of each interval in the IntervalIndex. |
| IntervalIndex.values | Return an array representing the data in the Index. |
| IntervalIndex.is_empty | Indicates if an interval is empty, meaning it contains no points. |
| IntervalIndex.is_non_overlapping_monotonic | Return a boolean whether the IntervalArray/IntervalIndex is non-overlapping and monotonic. |
| IntervalIndex.is_overlapping | Return True if the IntervalIndex has overlapping intervals, else False. |
| IntervalIndex.get_loc(key) | Get integer location, slice or boolean mask for requested label. |
| IntervalIndex.get_indexer(target[, method, ...]) | Compute indexer and mask for new index given the current index. |
| IntervalIndex.set_closed(closed) | Return an identical IntervalArray closed on the specified side. |
| IntervalIndex.contains(other) | Check elementwise if the Intervals contain the value. |
| IntervalIndex.overlaps(other) | Check elementwise if an Interval overlaps the values in the IntervalArray. |
| IntervalIndex.to_tuples([na_tuple]) | Return an ndarray (if self is IntervalArray) or Index (if self is IntervalIndex) of tuples of the form (left, right). |
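A brief IntervalIndex sketch (example data is ours):
>>> ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
>>> ii
IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
>>> ii.contains(1.5)
array([False,  True, False])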
| Function | Description |
|---|---|
| MultiIndex([levels, codes, sortorder, ...]) | A multi-level, or hierarchical, index object for pandas objects. |
| Function | Description |
|---|---|
| MultiIndex.from_arrays(arrays[, sortorder, ...]) | Convert arrays to MultiIndex. |
| MultiIndex.from_tuples(tuples[, sortorder, ...]) | Convert list of tuples to MultiIndex. |
| MultiIndex.from_product(iterables[, ...]) | Make a MultiIndex from the cartesian product of multiple iterables. |
| MultiIndex.from_frame(df[, sortorder, names]) | Make a MultiIndex from a DataFrame. |
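A brief constructor sketch (example data is ours):
>>> pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])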
| Function | Description |
|---|---|
| MultiIndex.names | Names of levels in MultiIndex. |
| MultiIndex.levels | Levels of the MultiIndex. |
| MultiIndex.codes | Codes of the MultiIndex. |
| MultiIndex.nlevels | Integer number of levels in this MultiIndex. |
| MultiIndex.levshape | A tuple representing the length of each level in the MultiIndex. |
| MultiIndex.dtypes | Return the dtypes as a Series for the underlying MultiIndex. |
| Function | Description |
|---|---|
| MultiIndex.set_levels(levels, *[, level, ...]) | Set new levels on MultiIndex. |
| MultiIndex.set_codes(codes, *[, level, ...]) | Set new codes on MultiIndex. |
| MultiIndex.to_flat_index() | Convert a MultiIndex to an Index of Tuples containing the level values. |
| MultiIndex.to_frame([index, name, ...]) | Create a DataFrame with the levels of the MultiIndex as columns. |
| MultiIndex.sortlevel([level, ascending, ...]) | Sort MultiIndex at the requested level. |
| MultiIndex.droplevel([level]) | Return index with requested level(s) removed. |
| MultiIndex.swaplevel([i, j]) | Swap level i with level j. |
| MultiIndex.reorder_levels(order) | Rearrange levels using input order. |
| MultiIndex.remove_unused_levels() | Create new MultiIndex from current that removes unused levels. |
| MultiIndex.drop(codes[, level, errors]) | Make a new pandas.MultiIndex with the passed list of codes deleted. |
| MultiIndex.copy([names, deep, name]) | Make a copy of this object. |
| MultiIndex.append(other) | Append a collection of Index options together. |
| MultiIndex.truncate([before, after]) | Slice index between two labels / tuples, return new MultiIndex. |
| Function | Description |
|---|---|
| MultiIndex.get_loc(key) | Get location for a label or a tuple of labels. |
| MultiIndex.get_locs(seq) | Get location for a sequence of labels. |
| MultiIndex.get_loc_level(key[, level, ...]) | Get location and sliced index for requested label(s)/level(s). |
| MultiIndex.get_indexer(target[, method, ...]) | Compute indexer and mask for new index given the current index. |
| MultiIndex.get_level_values(level) | Return vector of label values for requested level. |
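A brief lookup sketch (example data is ours):
>>> mi = pd.MultiIndex.from_arrays([["a", "a", "b"], [1, 2, 1]])
>>> mi.get_loc(("a", 2))
1
>>> mi.get_loc("b")
slice(2, 3, None)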
| Function | Description |
|---|---|
| IndexSlice | Create an object to more easily perform multi-index slicing. |
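A short slicing sketch with IndexSlice (the frame is made up; the pattern mirrors the reference examples):
>>> midx = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1", "B2", "B3"]])
>>> dfmi = pd.DataFrame(
...     np.arange(16).reshape((len(midx), 2)), index=midx, columns=["foo", "bar"]
... )
>>> idx = pd.IndexSlice
>>> dfmi.loc[idx[:, "B0":"B1"], :]
       foo  bar
A0 B0    0    1
   B1    2    3
A1 B0    8    9
   B1   10   11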
| Function | Description |
|---|---|
| DatetimeIndex([data, freq, tz, ambiguous, ...]) | Immutable ndarray-like of datetime64 data. |
| Function | Description |
|---|---|
| DatetimeIndex.year | The year of the datetime. |
| DatetimeIndex.month | The month as January=1, December=12. |
| DatetimeIndex.day | The day of the datetime. |
| DatetimeIndex.hour | The hours of the datetime. |
| DatetimeIndex.minute | The minutes of the datetime. |
| DatetimeIndex.second | The seconds of the datetime. |
| DatetimeIndex.microsecond | The microseconds of the datetime. |
| DatetimeIndex.nanosecond | The nanoseconds of the datetime. |
| DatetimeIndex.date | Returns numpy array of python datetime.date objects. |
| DatetimeIndex.time | Returns numpy array of datetime.time objects. |
| DatetimeIndex.timetz | Returns numpy array of datetime.time objects with timezones. |
| DatetimeIndex.dayofyear | The ordinal day of the year. |
| DatetimeIndex.day_of_year | The ordinal day of the year. |
| DatetimeIndex.dayofweek | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.day_of_week | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.weekday | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.quarter | The quarter of the date. |
| DatetimeIndex.tz | Return the timezone. |
| DatetimeIndex.freq | Return the frequency object if it is set, otherwise None. |
| DatetimeIndex.freqstr | Return the frequency object as a string if it's set, otherwise None. |
| DatetimeIndex.is_month_start | Indicates whether the date is the first day of the month. |
| DatetimeIndex.is_month_end | Indicates whether the date is the last day of the month. |
| DatetimeIndex.is_quarter_start | Indicator for whether the date is the first day of a quarter. |
| DatetimeIndex.is_quarter_end | Indicator for whether the date is the last day of a quarter. |
| DatetimeIndex.is_year_start | Indicate whether the date is the first day of a year. |
| DatetimeIndex.is_year_end | Indicate whether the date is the last day of the year. |
| DatetimeIndex.is_leap_year | Boolean indicator if the date belongs to a leap year. |
| DatetimeIndex.inferred_freq | Return the inferred frequency of the index. |
| Function | Description |
|---|---|
| DatetimeIndex.indexer_at_time(time[, asof]) | Return index locations of values at particular time of day. |
| DatetimeIndex.indexer_between_time(...[, ...]) | Return index locations of values between particular times of day. |
| Function | Description |
|---|---|
| DatetimeIndex.normalize() | Convert times to midnight. |
| DatetimeIndex.strftime(date_format) | Convert to Index using specified date_format. |
| DatetimeIndex.snap([freq]) | Snap time stamps to nearest occurring frequency. |
| DatetimeIndex.tz_convert(tz) | Convert tz-aware Datetime Array/Index from one time zone to another. |
| DatetimeIndex.tz_localize(tz[, ambiguous, ...]) | Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index. |
| DatetimeIndex.round(freq[, ambiguous, ...]) | Perform round operation on the data to the specified freq. |
| DatetimeIndex.floor(freq[, ambiguous, ...]) | Perform floor operation on the data to the specified freq. |
| DatetimeIndex.ceil(freq[, ambiguous, ...]) | Perform ceil operation on the data to the specified freq. |
| DatetimeIndex.month_name([locale]) | Return the month names with specified locale. |
| DatetimeIndex.day_name([locale]) | Return the day names with specified locale. |
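A brief sketch (example data is ours; the resolution printed in the dtype can vary by pandas version):
>>> dti = pd.DatetimeIndex(["2024-01-01 10:30", "2024-01-02 23:15"])
>>> dti.normalize()
DatetimeIndex(['2024-01-01', '2024-01-02'], dtype='datetime64[ns]', freq=None)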
| Function | Description |
|---|---|
| DatetimeIndex.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
| DatetimeIndex.to_period([freq]) | Cast to PeriodArray/PeriodIndex at a particular frequency. |
| DatetimeIndex.to_pydatetime() | Return an ndarray of datetime.datetime objects. |
| DatetimeIndex.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| DatetimeIndex.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| DatetimeIndex.to_julian_date() | Convert the timestamps to Julian Dates. |
| Function | Description |
|---|---|
| DatetimeIndex.mean(*[, skipna, axis]) | Return the mean value of the Array. |
| DatetimeIndex.std([axis, dtype, out, ddof, ...]) | Return sample standard deviation over requested axis. |
| Function | Description |
|---|---|
| TimedeltaIndex([data, freq, dtype, copy, name]) | Immutable Index of timedelta64 data. |
| Function | Description |
|---|---|
| TimedeltaIndex.days | Number of days for each element. |
| TimedeltaIndex.seconds | Number of seconds (>= 0 and less than 1 day) for each element. |
| TimedeltaIndex.microseconds | Number of microseconds (>= 0 and less than 1 second) for each element. |
| TimedeltaIndex.nanoseconds | Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
| TimedeltaIndex.components | Return a DataFrame of the individual resolution components of the Timedeltas. |
| TimedeltaIndex.inferred_freq | Return the inferred frequency of the index. |
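A brief sketch (example data is ours):
>>> tdi = pd.to_timedelta(["1 days", "2 days 06:00:00"])
>>> tdi.days
Index([1, 2], dtype='int64')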
| Function | Description |
|---|---|
| TimedeltaIndex.as_unit(unit) | Convert to a dtype with the given unit resolution. |
| TimedeltaIndex.to_pytimedelta() | Return an ndarray of datetime.timedelta objects. |
| TimedeltaIndex.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| TimedeltaIndex.round(freq[, ambiguous, ...]) | Perform round operation on the data to the specified freq. |
| TimedeltaIndex.floor(freq[, ambiguous, ...]) | Perform floor operation on the data to the specified freq. |
| TimedeltaIndex.ceil(freq[, ambiguous, ...]) | Perform ceil operation on the data to the specified freq. |
| TimedeltaIndex.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| Function | Description |
|---|---|
| TimedeltaIndex.mean(*[, skipna, axis]) | Return the mean value of the Array. |
| Function | Description |
|---|---|
| PeriodIndex([data, freq, dtype, copy, name]) | Immutable ndarray holding ordinal values indicating regular periods in time. |
| Function | Description |
|---|---|
| PeriodIndex.day | The days of the period. |
| PeriodIndex.dayofweek | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.day_of_week | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.dayofyear | The ordinal day of the year. |
| PeriodIndex.day_of_year | The ordinal day of the year. |
| PeriodIndex.days_in_month | The number of days in the month. |
| PeriodIndex.daysinmonth | The number of days in the month. |
| PeriodIndex.end_time | Get the Timestamp for the end of the period. |
| PeriodIndex.freq | Return the frequency object if it is set, otherwise None. |
| PeriodIndex.freqstr | Return the frequency object as a string if it's set, otherwise None. |
| PeriodIndex.hour | The hour of the period. |
| PeriodIndex.is_leap_year | Logical indicating if the date belongs to a leap year. |
| PeriodIndex.minute | The minute of the period. |
| PeriodIndex.month | The month as January=1, December=12. |
| PeriodIndex.quarter | The quarter of the date. |
| PeriodIndex.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| PeriodIndex.second | The second of the period. |
| PeriodIndex.start_time | Get the Timestamp for the start of the period. |
| PeriodIndex.week | The week ordinal of the year. |
| PeriodIndex.weekday | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.weekofyear | The week ordinal of the year. |
| PeriodIndex.year | The year of the period. |
| Function | Description |
|---|---|
| PeriodIndex.asfreq([freq, how]) | Convert the PeriodArray to the specified frequency freq. |
| PeriodIndex.strftime(date_format) | Convert to Index using specified date_format. |
| PeriodIndex.to_timestamp([freq, how]) | Cast to DatetimeArray/Index. |
| PeriodIndex.from_fields(*[, year, quarter, ...]) | Construct a PeriodIndex from fields (year, month, day, etc.). |
| PeriodIndex.from_ordinals(ordinals, *, freq) | Construct a PeriodIndex from ordinals. |
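A brief PeriodIndex sketch (example data is ours):
>>> pidx = pd.period_range("2023-01", periods=3, freq="M")
>>> pidx
PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
>>> pidx.to_timestamp()
DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], dtype='datetime64[ns]', freq='MS')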
>>> pd.Index([1, 2, 3])
Index([1, 2, 3], dtype='int64')
>>> pd.Index(list("abc"))
Index(['a', 'b', 'c'], dtype='str')
>>> pd.Index([1, 2, 3], dtype="uint8")
Index([1, 2, 3], dtype='uint8')
For pandas.Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.values
array([1, 2, 3])
For pandas.IntervalIndex:
>>> idx = pd.interval_range(start=0, end=5)
>>> idx.values
<IntervalArray>
[(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]]
Length: 5, dtype: interval[int64, right]
>>> pd.Index([1, 2, 3]).is_monotonic_increasing
True
>>> pd.Index([1, 2, 2]).is_monotonic_increasing
True
>>> pd.Index([1, 3, 2]).is_monotonic_increasing
False
>>> pd.Index([3, 2, 1]).is_monotonic_decreasing
True
>>> pd.Index([3, 2, 2]).is_monotonic_decreasing
True
>>> pd.Index([3, 1, 2]).is_monotonic_decreasing
False
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.is_unique
False
>>> idx = pd.Index([1, 5, 7])
>>> idx.is_unique
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
... "category"
... )
>>> idx.is_unique
False
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.is_unique
True
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
... "category"
... )
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
>>> s = pd.Series([1, 2, 3], index=["a", "b", None])
>>> s
a 1
b 2
None 3
dtype: int64
>>> s.index.hasnans
True
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.dtype
dtype('int64')
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.inferred_type
'integer'
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.shape
(3,)
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx
Index([1, 2, 3], dtype='int64', name='x')
>>> idx.name
'x'
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx.names
FrozenList(['x'])
>>> idx = pd.Index([1, 2, 3], name=("x", "y"))
>>> idx.names
FrozenList([('x', 'y')])
If the index does not have a name set:
>>> idx = pd.Index([1, 2, 3])
>>> idx.names
FrozenList([None])
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.nbytes
34
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.ndim
1
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.size
3
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True
If we only have NaNs in our Index, it is not considered empty!
>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False
For Series:
>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.T
0 Ant
1 Bear
2 Cow
dtype: str
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')
>>> idx = pd.Index([1, 2, 3])
>>> idx.memory_usage()
24
For regular NumPy types like int and float, a NumpyExtensionArray is returned.
>>> pd.Index([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray is returned:
>>> idx = pd.Index(pd.Categorical(["a", "b", "a"]))
>>> idx.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because 0 is considered False.
>>> pd.Index([0, 1, 2]).all()
False
pandas.api.typing.DataFrameGroupBy and pandas.api.typing.SeriesGroupBy instances are returned by the groupby calls pandas.DataFrame.groupby() and pandas.Series.groupby(), respectively.
| Function | Description |
|---|---|
| DataFrameGroupBy.__iter__() | Groupby iterator. |
| SeriesGroupBy.__iter__() | Groupby iterator. |
| DataFrameGroupBy.groups | Dict {group name -> group labels}. |
| SeriesGroupBy.groups | Dict {group name -> group labels}. |
| DataFrameGroupBy.indices | Dict {group name -> group indices}. |
| SeriesGroupBy.indices | Dict {group name -> group indices}. |
| DataFrameGroupBy.get_group(name) | Construct DataFrame from group with provided name. |
| SeriesGroupBy.get_group(name) | Construct DataFrame from group with provided name. |
| Function | Description |
|---|---|
| Grouper(*args, **kwargs) | A Grouper allows the user to specify a groupby instruction for an object. |
| Function | Description |
|---|---|
| NamedAgg(column, aggfunc, *args, **kwargs) | Helper for column specific aggregation with control over output column names. |
| Function | Description |
|---|---|
| SeriesGroupBy.apply(func, *args, **kwargs) | Apply function func group-wise and combine the results together. |
| DataFrameGroupBy.apply(func, *args[, ...]) | Apply function func group-wise and combine the results together. |
| SeriesGroupBy.agg([func, engine, engine_kwargs]) | Aggregate using one or more operations. |
| DataFrameGroupBy.agg([func, engine, ...]) | Aggregate using one or more operations. |
| SeriesGroupBy.aggregate([func, engine, ...]) | Aggregate using one or more operations. |
| DataFrameGroupBy.aggregate([func, engine, ...]) | Aggregate using one or more operations. |
| SeriesGroupBy.transform(func, *args[, ...]) | Call function producing a same-indexed Series on each group. |
| DataFrameGroupBy.transform(func, *args[, ...]) | Call function producing a same-indexed DataFrame on each group. |
| SeriesGroupBy.pipe(func, *args, **kwargs) | Apply a func with arguments to this GroupBy object and return its result. |
| DataFrameGroupBy.pipe(func, *args, **kwargs) | Apply a func with arguments to this GroupBy object and return its result. |
| DataFrameGroupBy.filter(func[, dropna]) | Filter elements from groups that don't satisfy a criterion. |
| SeriesGroupBy.filter(func[, dropna]) | Filter elements from groups that don't satisfy a criterion. |
| Function | Description |
|---|---|
| DataFrameGroupBy.all([skipna]) | Return True if all values in the group are truthful, else False. |
| DataFrameGroupBy.any([skipna]) | Return True if any value in the group is truthful, else False. |
| DataFrameGroupBy.bfill([limit]) | Backward fill the values. |
| DataFrameGroupBy.corr([method, min_periods, ...]) | Compute pairwise correlation of columns, excluding NA/null values. |
| DataFrameGroupBy.corrwith(other[, drop, ...]) | (DEPRECATED) Compute pairwise correlation. |
| DataFrameGroupBy.count() | Compute count of group, excluding missing values. |
| DataFrameGroupBy.cov([min_periods, ddof, ...]) | Compute pairwise covariance of columns, excluding NA/null values. |
| DataFrameGroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| DataFrameGroupBy.cummax([numeric_only]) | Cumulative max for each group. |
| DataFrameGroupBy.cummin([numeric_only]) | Cumulative min for each group. |
| DataFrameGroupBy.cumprod([numeric_only]) | Cumulative product for each group. |
| DataFrameGroupBy.cumsum([numeric_only]) | Cumulative sum for each group. |
| DataFrameGroupBy.describe([percentiles, ...]) | Generate descriptive statistics. |
| DataFrameGroupBy.diff([periods]) | First discrete difference of element. |
| DataFrameGroupBy.ewm([com, span, halflife, ...]) | Return an ewm grouper, providing ewm functionality per group. |
| DataFrameGroupBy.expanding([min_periods, method]) | Return an expanding grouper, providing expanding functionality per group. |
| DataFrameGroupBy.ffill([limit]) | Forward fill the values. |
| DataFrameGroupBy.first([numeric_only, ...]) | Compute the first entry of each column within each group. |
| DataFrameGroupBy.head([n]) | Return first n rows of each group. |
| DataFrameGroupBy.idxmax([skipna, numeric_only]) | Return index of first occurrence of maximum in each group. |
| DataFrameGroupBy.idxmin([skipna, numeric_only]) | Return index of first occurrence of minimum in each group. |
| DataFrameGroupBy.last([numeric_only, ...]) | Compute the last entry of each column within each group. |
| DataFrameGroupBy.max([numeric_only, ...]) | Compute max of group values. |
| DataFrameGroupBy.mean([numeric_only, ...]) | Compute mean of groups, excluding missing values. |
| DataFrameGroupBy.median([numeric_only, skipna]) | Compute median of groups, excluding missing values. |
| DataFrameGroupBy.min([numeric_only, ...]) | Compute min of group values. |
| DataFrameGroupBy.ngroup([ascending]) | Number each group from 0 to the number of groups - 1. |
| DataFrameGroupBy.nth | Take the nth row from each group if n is an int, otherwise a subset of rows. |
| DataFrameGroupBy.nunique([dropna]) | Return DataFrame with counts of unique elements in each position. |
| DataFrameGroupBy.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| DataFrameGroupBy.pct_change([periods, ...]) | Calculate pct_change of each value to previous entry in group. |
| DataFrameGroupBy.prod([numeric_only, ...]) | Compute prod of group values. |
| DataFrameGroupBy.quantile([q, ...]) | Return group values at the given quantile, a la numpy.percentile. |
| DataFrameGroupBy.rank([method, ascending, ...]) | Provide the rank of values within each group. |
| DataFrameGroupBy.resample(rule, *args[, ...]) | Provide resampling when using a TimeGrouper. |
| DataFrameGroupBy.rolling(window[, ...]) | Return a rolling grouper, providing rolling functionality per group. |
| DataFrameGroupBy.sample([n, frac, replace, ...]) | Return a random sample of items from each group. |
| DataFrameGroupBy.sem([ddof, numeric_only, ...]) | Compute standard error of the mean of groups, excluding missing values. |
| DataFrameGroupBy.shift([periods, freq, ...]) | Shift each group by periods observations. |
| DataFrameGroupBy.size() | Compute group sizes. |
| DataFrameGroupBy.skew([skipna, numeric_only]) | Return unbiased skew within groups. |
| DataFrameGroupBy.kurt([skipna, numeric_only]) | Return unbiased kurtosis within groups. |
| DataFrameGroupBy.std([ddof, engine, ...]) | Compute standard deviation of groups, excluding missing values. |
| DataFrameGroupBy.sum([numeric_only, ...]) | Compute sum of group values. |
| DataFrameGroupBy.var([ddof, engine, ...]) | Compute variance of groups, excluding missing values. |
| DataFrameGroupBy.tail([n]) | Return last n rows of each group. |
| DataFrameGroupBy.take(indices, **kwargs) | Return the elements in the given positional indices in each group. |
| DataFrameGroupBy.value_counts([subset, ...]) | Return a Series or DataFrame containing counts of unique rows. |
| Function | Description |
|---|---|
| SeriesGroupBy.all([skipna]) | Return True if all values in the group are truthful, else False. |
| SeriesGroupBy.any([skipna]) | Return True if any value in the group is truthful, else False. |
| SeriesGroupBy.bfill([limit]) | Backward fill the values. |
| SeriesGroupBy.corr(other[, method, min_periods]) | Compute correlation between each group and another Series. |
| SeriesGroupBy.count() | Compute count of group, excluding missing values. |
| SeriesGroupBy.cov(other[, min_periods, ddof]) | Compute covariance between each group and another Series. |
| SeriesGroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| SeriesGroupBy.cummax([numeric_only]) | Cumulative max for each group. |
| SeriesGroupBy.cummin([numeric_only]) | Cumulative min for each group. |
| SeriesGroupBy.cumprod([numeric_only]) | Cumulative product for each group. |
| SeriesGroupBy.cumsum([numeric_only]) | Cumulative sum for each group. |
| SeriesGroupBy.describe([percentiles, ...]) | Generate descriptive statistics. |
| SeriesGroupBy.diff([periods]) | First discrete difference of element. |
| SeriesGroupBy.ewm([com, span, halflife, ...]) | Return an ewm grouper, providing ewm functionality per group. |
| SeriesGroupBy.expanding([min_periods, method]) | Return an expanding grouper, providing expanding functionality per group. |
| SeriesGroupBy.ffill([limit]) | Forward fill the values. |
| SeriesGroupBy.first([numeric_only, ...]) | Compute the first entry of each column within each group. |
| SeriesGroupBy.head([n]) | Return first n rows of each group. |
| SeriesGroupBy.last([numeric_only, ...]) | Compute the last entry of each column within each group. |
| SeriesGroupBy.idxmax([skipna]) | Return the row label of the maximum value. |
| SeriesGroupBy.idxmin([skipna]) | Return the row label of the minimum value. |
| SeriesGroupBy.is_monotonic_increasing | Return whether each group's values are monotonically increasing. |
| SeriesGroupBy.is_monotonic_decreasing | Return whether each group's values are monotonically decreasing. |
| SeriesGroupBy.max([numeric_only, min_count, ...]) | Compute max of group values. |
| SeriesGroupBy.mean([numeric_only, skipna, ...]) | Compute mean of groups, excluding missing values. |
| SeriesGroupBy.median([numeric_only, skipna]) | Compute median of groups, excluding missing values. |
| SeriesGroupBy.min([numeric_only, min_count, ...]) | Compute min of group values. |
| SeriesGroupBy.ngroup([ascending]) | Number each group from 0 to the number of groups - 1. |
| SeriesGroupBy.nlargest([n, keep]) | Return the largest n elements. |
| SeriesGroupBy.nsmallest([n, keep]) | Return the smallest n elements. |
| SeriesGroupBy.nth | Take the nth row from each group if n is an int, otherwise a subset of rows. |
| SeriesGroupBy.nunique([dropna]) | Return number of unique elements in the group. |
| SeriesGroupBy.unique() | Return unique values for each group. |
| SeriesGroupBy.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| SeriesGroupBy.pct_change([periods, ...]) | Calculate pct_change of each value to previous entry in group. |
| SeriesGroupBy.prod([numeric_only, ...]) | Compute prod of group values. |
| SeriesGroupBy.quantile([q, interpolation, ...]) | Return group values at the given quantile, a la numpy.percentile. |
| SeriesGroupBy.rank([method, ascending, ...]) | Provide the rank of values within each group. |
| SeriesGroupBy.resample(rule, *args[, ...]) | Provide resampling when using a TimeGrouper. |
| SeriesGroupBy.rolling(window[, min_periods, ...]) | Return a rolling grouper, providing rolling functionality per group. |
| SeriesGroupBy.sample([n, frac, replace, ...]) | Return a random sample of items from each group. |
| SeriesGroupBy.sem([ddof, numeric_only, skipna]) | Compute standard error of the mean of groups, excluding missing values. |
| SeriesGroupBy.shift([periods, freq, ...]) | Shift each group by periods observations. |
| SeriesGroupBy.size() | Compute group sizes. |
| SeriesGroupBy.skew([skipna, numeric_only]) | Return unbiased skew within groups. |
| SeriesGroupBy.kurt([skipna, numeric_only]) | Return unbiased kurtosis within groups. |
| SeriesGroupBy.std([ddof, engine, ...]) | Compute standard deviation of groups, excluding missing values. |
| SeriesGroupBy.sum([numeric_only, min_count, ...]) | Compute sum of group values. |
| SeriesGroupBy.var([ddof, engine, ...]) | Compute variance of groups, excluding missing values. |
| SeriesGroupBy.tail([n]) | Return last n rows of each group. |
| SeriesGroupBy.take(indices, **kwargs) | Return the elements in the given positional indices in each group. |
| SeriesGroupBy.value_counts([normalize, ...]) | Return a Series or DataFrame containing counts of unique rows. |
| Function | Description |
|---|---|
| DataFrameGroupBy.boxplot([subplots, column, ...]) | Make box plots from DataFrameGroupBy data. |
| DataFrameGroupBy.hist([column, by, grid, ...]) | Make a histogram of the DataFrame's columns. |
| SeriesGroupBy.hist([by, ax, grid, ...]) | Draw histogram for each group's values using Series.hist() API. |
| DataFrameGroupBy.plot | Make plots of groups from a DataFrame. |
| SeriesGroupBy.plot | Make plots of groups from a Series. |
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> for x, y in ser.groupby(level=0):
... print(f"{x}\n{y}\n")
a
a 1
a 2
dtype: int64
b
b 3
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> for x, y in df.groupby(by=["a"]):
... print(f"{x}\n{y}\n")
(1,)
a b c
0 1 2 3
1 1 5 6
(7,)
a b c
2 7 8 9
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> for x, y in ser.resample("MS"):
... print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01 1
2023-01-15 2
dtype: int64
2023-02-01 00:00:00
2023-02-01 3
2023-02-15 4
dtype: int64
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
Timestamp('2023-02-01 00:00:00'): np.int64(4)}
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a 1
a 2
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).get_group((1,))
a b c
owl 1 2 3
toucan 1 5 6
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01 1
2023-01-15 2
dtype: int64
df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')
>>> df = pd.DataFrame(
... {
... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
... "Speed": [100, 5, 200, 300, 15],
... }
... )
>>> df
Animal Speed
0 Falcon 100
1 Parrot 5
2 Falcon 200
3 Falcon 300
4 Parrot 15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
Speed
Animal
Falcon 200.0
Parrot 10.0
Specify a resample operation on the column ‘Publish date’
>>> df = pd.DataFrame(
... {
... "Publish date": [
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-09"),
... pd.Timestamp("2000-01-16"),
... ],
... "ID": [0, 1, 2, 3],
... "Price": [10, 20, 30, 40],
... }
... )
>>> df
Publish date ID Price
0 2000-01-02 0 10
1 2000-01-02 1 20
2 2000-01-09 2 30
3 2000-01-16 3 40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
ID Price
Publish date
2000-01-02 0.5 15.0
2000-01-09 2.0 30.0
2000-01-16 3.0 40.0
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
>>> rng = pd.date_range(start, end, freq="7min")
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min")).sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="2000-01-01")).sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17min, dtype: int64
If you want to adjust the start of the bins with an offset Timedelta, the following two lines are equivalent:
>>> ts.groupby(pd.Grouper(freq="17min", origin="start")).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17min, dtype: int64
To replace the use of the deprecated base argument, you can now use offset; in this example it is equivalent to base=2:
>>> ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17min, dtype: int64
>>> df = pd.DataFrame({"key": [1, 1, 2], "a": [-1, 0, 1], 1: [10, 11, 12]})
>>> agg_a = pd.NamedAgg(column="a", aggfunc="min")
>>> agg_1 = pd.NamedAgg(column=1, aggfunc=lambda x: np.mean(x))
>>> df.groupby("key").agg(result_a=agg_a, result_1=agg_1)
result_a result_1
key
1 -1 10.5
2 1 12.0
>>> def n_between(ser, low, high, **kwargs):
... return ser.between(low, high, **kwargs).sum()
>>> agg_between = pd.NamedAgg("a", n_between, 0, 1)
>>> df.groupby("key").agg(count_between=agg_between)
count_between
key
1 1
2 1
>>> agg_between_kw = pd.NamedAgg("a", n_between, 0, 1, inclusive="both")
>>> df.groupby("key").agg(count_between_kw=agg_between_kw)
count_between_kw
key
1 1
2 1
>>> s = pd.Series([0, 1, 2], index="a a b".split())
>>> g1 = s.groupby(s.index, group_keys=False)
>>> g2 = s.groupby(s.index, group_keys=True)
From s above we can see that there are two groups, a and b. Notice that g1 and g2 both contain these two groups and differ only in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: The function passed to apply takes a Series as its argument and returns a Series. apply combines the result for each group together into a new Series.
The resulting dtype will reflect the return value of the passed func.
>>> g1.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a 0.0
a 2.0
b 1.0
dtype: float64
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a a 0.0
a 2.0
b b 1.0
dtype: float64
Example 2: The function passed to apply takes a Series as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g1.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.
>>> g2.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
>>> df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
>>> g1 = df.groupby("A", group_keys=False)
>>> g2 = df.groupby("A", group_keys=True)
Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: The function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:
>>> g1[["B", "C"]].apply(lambda x: x / x.sum())
B C
0 0.333333 0.4
1 0.666667 0.6
2 1.000000 1.0
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2[["B", "C"]].apply(lambda x: x / x.sum())
B C
A
a 0 0.333333 0.4
1 0.666667 0.6
b 2 1.000000 1.0
Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.
The resulting dtype will reflect the return value of the passed func.
>>> g1[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
>>> g2[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.
Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g1.apply(lambda x: x.C.max() - x.B.min())
A
a 5
b 2
dtype: int64
Example 4: The function passed to apply returns None for one of the groups. This group is filtered from the result:
>>> g1.apply(lambda x: None if x.iloc[0, 0] == 3 else x)
B C
0 1 4
1 2 6
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.groupby([1, 1, 2, 2]).min()
1 1
2 3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg("min")
1 1
2 3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg(["min", "max"])
min max
1 1 2
2 3 4
The output column names can be controlled by passing the desired column names and aggregations as keyword arguments.
>>> s.groupby([1, 1, 2, 2]).agg(
... minimum="min",
... maximum="max",
... )
minimum maximum
1 1 2
2 3 4
The resulting dtype will reflect the return value of the aggregating function.
>>> s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
1 1.0
2 3.0
dtype: float64
>>> data = {
... "A": [1, 1, 2, 2],
... "B": [1, 2, 3, 4],
... "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
>>> df = pd.DataFrame(data)
>>> df
A B C
0 1 1 0.362838
1 1 2 0.227877
2 2 3 1.267767
3 2 4 -0.562860
The aggregation is for each column.
>>> df.groupby("A").agg("min")
B C
A
1 1 0.227877
2 3 -0.562860
Multiple aggregations
>>> df.groupby("A").agg(["min", "max"])
B C
min max min max
A
1 1 2 0.227877 0.362838
2 3 4 -0.562860 1.267767
Select a column for aggregation
>>> df.groupby("A").B.agg(["min", "max"])
min max
A
1 1 2
2 3 4
User-defined function for aggregation
>>> df.groupby("A").agg(lambda x: sum(x) + 2)
B C
A
1 5 2.590715
2 9 2.704907
Different aggregations per column
>>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
B C
min max sum
A
1 1 2 0.590715
2 3 4 0.704907
To control the output names with different aggregations per column, pandas supports “named aggregation”
>>> df.groupby("A").agg(
... b_min=pd.NamedAgg(column="B", aggfunc="min"),
... c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
... )
b_min c_sum
A
1 1 0.590715
2 3 0.704907
- The keywords are the output column names
- The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
See Named aggregation for more.
The resulting dtype will reflect the return value of the aggregating function.
>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
B
A
1 1.0
2 3.0
>>> ser = pd.Series(
... [390.0, 350.0, 30.0, 20.0],
... index=["Falcon", "Falcon", "Parrot", "Parrot"],
... name="Max Speed",
... )
>>> grouped = ser.groupby([1, 1, 2, 2])
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
Falcon 0.707107
Falcon -0.707107
Parrot 0.707107
Parrot -0.707107
Name: Max Speed, dtype: float64
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min())
Falcon 40.0
Falcon 40.0
Parrot 10.0
Parrot 10.0
Name: Max Speed, dtype: float64
>>> grouped.transform("mean")
Falcon 370.0
Falcon 370.0
Parrot 25.0
Parrot 25.0
Name: Max Speed, dtype: float64
The resulting dtype will reflect the return value of the passed func, for example:
>>> grouped.transform(lambda x: x.astype(int).max())
Falcon 390
Falcon 390
Parrot 30
Parrot 30
Name: Max Speed, dtype: int64
>>> df = pd.DataFrame(
... {
... "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
... "B": ["one", "one", "two", "three", "two", "two"],
... "C": [1, 5, 5, 2, 5, 5],
... "D": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
... }
... )
>>> grouped = df.groupby("A")[["C", "D"]]
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
C D
0 -1.154701 -0.577350
1 0.577350 0.000000
2 0.577350 1.154701
3 -1.154701 -1.000000
4 0.577350 -0.577350
5 0.577350 1.000000
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min())
C D
0 4.0 6.0
1 3.0 8.0
2 4.0 6.0
3 3.0 8.0
4 4.0 6.0
5 3.0 8.0
>>> grouped.transform("mean")
C D
0 3.666667 4.0
1 4.000000 5.0
2 3.666667 4.0
3 4.000000 5.0
4 3.666667 4.0
5 4.000000 5.0
The resulting dtype will reflect the return value of the passed func, for example:
>>> grouped.transform(lambda x: x.astype(int).max())
C D
0 5 8
1 5 9
2 5 8
3 5 9
4 5 8
5 5 9
>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
A B
0 a 1
1 b 2
2 a 3
3 b 4
To get the difference between each group's maximum and minimum value in one pass, you can do
>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
B
A
a 2
b 2
>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
A B
0 a 1
1 b 2
2 a 3
3 b 4
To get the difference between each groups maximum and minimum value in one pass, you can do
>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
B
A
a 2
b 2
pandas.api.typing.Rolling instances are returned by .rolling calls: pandas.DataFrame.rolling() and pandas.Series.rolling(). pandas.api.typing.Expanding instances are returned by .expanding calls: pandas.DataFrame.expanding() and pandas.Series.expanding(). pandas.api.typing.ExponentialMovingWindow instances are returned by .ewm calls: pandas.DataFrame.ewm() and pandas.Series.ewm().
| Function | Description |
|---|---|
| Rolling.count([numeric_only]) | Calculate the rolling count of non NaN observations. |
| Rolling.sum([numeric_only, engine, ...]) | Calculate the rolling sum. |
| Rolling.mean([numeric_only, engine, ...]) | Calculate the rolling mean. |
| Rolling.median([numeric_only, engine, ...]) | Calculate the rolling median. |
| Rolling.var([ddof, numeric_only, engine, ...]) | Calculate the rolling variance. |
| Rolling.std([ddof, numeric_only, engine, ...]) | Calculate the rolling standard deviation. |
| Rolling.min([numeric_only, engine, ...]) | Calculate the rolling minimum. |
| Rolling.max([numeric_only, engine, ...]) | Calculate the rolling maximum. |
| Rolling.first([numeric_only]) | Calculate the rolling First (left-most) element of the window. |
| Rolling.last([numeric_only]) | Calculate the rolling Last (right-most) element of the window. |
| Rolling.corr([other, pairwise, ddof, ...]) | Calculate the rolling correlation. |
| Rolling.cov([other, pairwise, ddof, ...]) | Calculate the rolling sample covariance. |
| Rolling.skew([numeric_only]) | Calculate the rolling unbiased skewness. |
| Rolling.kurt([numeric_only]) | Calculate the rolling Fisher's definition of kurtosis without bias. |
| Rolling.apply(func[, raw, engine, ...]) | Calculate the rolling custom aggregation function. |
| Rolling.pipe(func, *args, **kwargs) | Apply a func with arguments to this Rolling object and return its result. |
| Rolling.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Rolling.quantile(q[, interpolation, ...]) | Calculate the rolling quantile. |
| Rolling.sem([ddof, numeric_only]) | Calculate the rolling standard error of mean. |
| Rolling.rank([method, ascending, pct, ...]) | Calculate the rolling rank. |
| Rolling.nunique([numeric_only]) | Calculate the rolling nunique. |
| Function | Description |
|---|---|
| Window.mean([numeric_only]) | Calculate the rolling weighted window mean. |
| Window.sum([numeric_only]) | Calculate the rolling weighted window sum. |
| Window.var([ddof, numeric_only]) | Calculate the rolling weighted window variance. |
| Window.std([ddof, numeric_only]) | Calculate the rolling weighted window standard deviation. |
| Function | Description |
|---|---|
| Expanding.count([numeric_only]) | Calculate the expanding count of non NaN observations. |
| Expanding.sum([numeric_only, engine, ...]) | Calculate the expanding sum. |
| Expanding.mean([numeric_only, engine, ...]) | Calculate the expanding mean. |
| Expanding.median([numeric_only, engine, ...]) | Calculate the expanding median. |
| Expanding.var([ddof, numeric_only, engine, ...]) | Calculate the expanding variance. |
| Expanding.std([ddof, numeric_only, engine, ...]) | Calculate the expanding standard deviation. |
| Expanding.min([numeric_only, engine, ...]) | Calculate the expanding minimum. |
| Expanding.max([numeric_only, engine, ...]) | Calculate the expanding maximum. |
| Expanding.first([numeric_only]) | Calculate the expanding First (left-most) element of the window. |
| Expanding.last([numeric_only]) | Calculate the expanding Last (right-most) element of the window. |
| Expanding.corr([other, pairwise, ddof, ...]) | Calculate the expanding correlation. |
| Expanding.cov([other, pairwise, ddof, ...]) | Calculate the expanding sample covariance. |
| Expanding.skew([numeric_only]) | Calculate the expanding unbiased skewness. |
| Expanding.kurt([numeric_only]) | Calculate the expanding Fisher's definition of kurtosis without bias. |
| Expanding.apply(func[, raw, engine, ...]) | Calculate the expanding custom aggregation function. |
| Expanding.pipe(func, *args, **kwargs) | Apply a func with arguments to this Expanding object and return its result. |
| Expanding.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Expanding.quantile(q[, interpolation, ...]) | Calculate the expanding quantile. |
| Expanding.sem([ddof, numeric_only]) | Calculate the expanding standard error of mean. |
| Expanding.rank([method, ascending, pct, ...]) | Calculate the expanding rank. |
| Expanding.nunique([numeric_only]) | Calculate the expanding nunique. |
| Function | Description |
|---|---|
| ExponentialMovingWindow.mean([numeric_only, ...]) | Calculate the ewm (exponential weighted moment) mean. |
| ExponentialMovingWindow.sum([numeric_only, ...]) | Calculate the ewm (exponential weighted moment) sum. |
| ExponentialMovingWindow.std([bias, numeric_only]) | Calculate the ewm (exponential weighted moment) standard deviation. |
| ExponentialMovingWindow.var([bias, numeric_only]) | Calculate the ewm (exponential weighted moment) variance. |
| ExponentialMovingWindow.corr([other, ...]) | Calculate the ewm (exponential weighted moment) sample correlation. |
| ExponentialMovingWindow.cov([other, ...]) | Calculate the ewm (exponential weighted moment) sample covariance. |
Base classes for defining custom window boundaries:
| Function | Description |
|---|---|
| api.indexers.BaseIndexer([index_array, ...]) | Base class for window bounds calculations. |
| api.indexers.FixedForwardWindowIndexer([...]) | Creates window boundaries for fixed-length windows that include the current row. |
| api.indexers.VariableOffsetWindowIndexer([...]) | Calculate window boundaries based on a non-fixed offset such as a BusinessDay. |
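A minimal sketch of a forward-looking window built with FixedForwardWindowIndexer (the frame below is an assumed example): each window covers the current row and the next one.
>>> df = pd.DataFrame({"values": range(5)})  # illustrative data
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
   values
0     1.0
1     3.0
2     5.0
3     7.0
4     4.0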
>>> s = pd.Series([2, 3, np.nan, 10])
>>> s.rolling(2).count()
0 NaN
1 2.0
2 1.0
3 1.0
dtype: float64
>>> s.rolling(3).count()
0 NaN
1 NaN
2 2.0
3 2.0
dtype: float64
>>> s.rolling(4).count()
0 NaN
1 NaN
2 NaN
3 3.0
dtype: float64
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s
0 1
1 2
2 3
3 4
4 5
dtype: int64
>>> s.rolling(3).sum()
0 NaN
1 NaN
2 6.0
3 9.0
4 12.0
dtype: float64
>>> s.rolling(3, center=True).sum()
0 NaN
1 6.0
2 9.0
3 12.0
4 NaN
dtype: float64
For DataFrame, each sum is computed column-wise.
>>> df = pd.DataFrame({"A": s, "B": s**2})
>>> df
A B
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25
>>> df.rolling(3).sum()
A B
0 NaN NaN
1 NaN NaN
2 6.0 14.0
3 9.0 29.0
4 12.0 50.0
The below examples will show rolling mean calculations with window sizes of two and three, respectively.
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).mean()
0 NaN
1 1.5
2 2.5
3 3.5
dtype: float64
>>> s.rolling(3).mean()
0 NaN
1 NaN
2 2.0
3 3.0
dtype: float64
Compute the rolling median of a series with a window size of 3.
>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.rolling(3).median()
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
dtype: float64
>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).var()
0 NaN
1 NaN
2 0.333333
3 1.000000
4 1.000000
5 1.333333
6 0.000000
dtype: float64
>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).std()
0 NaN
1 NaN
2 0.577350
3 1.000000
4 1.000000
5 1.154701
6 0.000000
dtype: float64
Performing a rolling minimum with a window size of 3.
>>> s = pd.Series([4, 3, 5, 2, 6])
>>> s.rolling(3).min()
0 NaN
1 NaN
2 3.0
3 2.0
4 2.0
dtype: float64
>>> ser = pd.Series([1, 2, 3, 4])
>>> ser.rolling(2).max()
0 NaN
1 2.0
2 3.0
3 4.0
dtype: float64
The example below will show a rolling calculation with a window size of three.
>>> s = pd.Series(range(5))
>>> s.rolling(3).first()
0 NaN
1 NaN
2 0.0
3 1.0
4 2.0
dtype: float64
The example below will show a rolling calculation with a window size of three.
>>> s = pd.Series(range(5))
>>> s.rolling(3).last()
0 NaN
1 NaN
2 2.0
3 3.0
4 4.0
dtype: float64
The below example shows a rolling calculation with a window size of four matching the equivalent function call using numpy.corrcoef().
>>> v1 = [3, 3, 3, 5, 8]
>>> v2 = [3, 4, 4, 4, 8]
>>> np.corrcoef(v1[:-1], v2[:-1])
array([[1. , 0.33333333],
[0.33333333, 1. ]])
>>> np.corrcoef(v1[1:], v2[1:])
array([[1. , 0.9169493],
[0.9169493, 1. ]])
>>> s1 = pd.Series(v1)
>>> s2 = pd.Series(v2)
>>> s1.rolling(4).corr(s2)
0 NaN
1 NaN
2 NaN
3 0.333333
4 0.916949
dtype: float64
The below example shows a similar rolling calculation on a DataFrame using the pairwise option.
>>> matrix = np.array(
... [[51.0, 35.0], [49.0, 30.0], [47.0, 32.0], [46.0, 31.0], [50.0, 36.0]]
... )
>>> np.corrcoef(matrix[:-1, 0], matrix[:-1, 1])
array([[1. , 0.6263001],
[0.6263001, 1. ]])
>>> np.corrcoef(matrix[1:, 0], matrix[1:, 1])
array([[1. , 0.55536811],
[0.55536811, 1. ]])
>>> df = pd.DataFrame(matrix, columns=["X", "Y"])
>>> df
X Y
0 51.0 35.0
1 49.0 30.0
2 47.0 32.0
3 46.0 31.0
4 50.0 36.0
>>> df.rolling(4).corr(pairwise=True)
X Y
0 X NaN NaN
Y NaN NaN
1 X NaN NaN
Y NaN NaN
2 X NaN NaN
Y NaN NaN
3 X 1.000000 0.626300
Y 0.626300 1.000000
4 X 1.000000 0.555368
Y 0.555368 1.000000
>>> ser1 = pd.Series([1, 2, 3, 4])
>>> ser2 = pd.Series([1, 4, 5, 8])
>>> ser1.rolling(2).cov(ser2)
0 NaN
1 1.5
2 0.5
3 1.5
dtype: float64
>>> ser = pd.Series([1, 5, 2, 7, 15, 6])
>>> ser.rolling(3).skew().round(6)
0 NaN
1 NaN
2 1.293343
3 -0.585583
4 0.670284
5 1.652317
dtype: float64
The example below will show a rolling calculation with a window size of four matching the equivalent function call using scipy.stats.
>>> arr = [1, 2, 3, 4, 999]
>>> import scipy.stats
>>> print(f"{scipy.stats.kurtosis(arr[:-1], bias=False):.6f}")
-1.200000
>>> print(f"{scipy.stats.kurtosis(arr[1:], bias=False):.6f}")
3.999946
>>> s = pd.Series(arr)
>>> s.rolling(4).kurt()
0 NaN
1 NaN
2 NaN
3 -1.200000
4 3.999946
dtype: float64
>>> ser = pd.Series([1, 6, 5, 4])
>>> ser.rolling(2).apply(lambda s: s.sum() - s.min())
0 NaN
1 6.0
2 6.0
3 5.0
dtype: float64
>>> df = pd.DataFrame(
... {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
A
2012-08-02 1
2012-08-03 2
2012-08-04 3
2012-08-05 4
To get the difference between each rolling 2-day window’s maximum and minimum value in one pass, you can do
>>> df.rolling("2D").pipe(lambda x: x.max() - x.min())
A
2012-08-02 0.0
2012-08-03 1.0
2012-08-04 1.0
2012-08-05 1.0
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
>>> df
A B C
0 1 4 7
1 2 5 8
2 3 6 9
>>> df.rolling(2).sum()
A B C
0 NaN NaN NaN
1 3.0 9.0 15.0
2 5.0 11.0 17.0
>>> df.rolling(2).agg({"A": "sum", "B": "min"})
A B
0 NaN NaN
1 3.0 4.0
2 5.0 5.0
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).quantile(0.4, interpolation="lower")
0 NaN
1 1.0
2 2.0
3 3.0
dtype: float64
>>> s.rolling(2).quantile(0.4, interpolation="midpoint")
0 NaN
1 1.5
2 2.5
3 3.5
dtype: float64
>>> s = pd.Series([0, 1, 2, 3])
>>> s.rolling(2, min_periods=1).sem()
0 NaN
1 0.5
2 0.5
3 0.5
dtype: float64
>>> s = pd.Series([1, 4, 2, 3, 5, 3])
>>> s.rolling(3).rank()
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.5
dtype: float64
>>> s.rolling(3).rank(method="max")
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 2.0
dtype: float64
>>> s.rolling(3).rank(method="min")
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.0
dtype: float64
pandas.api.typing.Resampler instances are returned by resample calls: pandas.DataFrame.resample(), pandas.Series.resample().
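For instance (a minimal sketch with an assumed datetime-indexed Series):
>>> ser = pd.Series([1, 2, 3, 4], index=pd.date_range("2023-01-01", periods=4, freq="D"))  # illustrative data
>>> r = ser.resample("2D")  # a Resampler instance
>>> r.sum()
2023-01-01    3
2023-01-03    7
Freq: 2D, dtype: int64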
| Function | Description |
|---|---|
| Resampler.__iter__() | Groupby iterator. |
| Resampler.groups | Dict {group name -> group labels}. |
| Resampler.indices | Dict {group name -> group indices}. |
| Resampler.get_group(name) | Construct DataFrame from group with provided name. |
| Function | Description |
|---|---|
| Resampler.apply([func]) | Aggregate using one or more operations over the specified axis. |
| Resampler.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Resampler.transform(arg, *args, **kwargs) | Call function producing a like-indexed Series on each group. |
| Resampler.pipe(func, *args, **kwargs) | Apply a func with arguments to this Resampler object and return its result. |
| Function | Description |
|---|---|
| Resampler.ffill([limit]) | Forward fill the values. |
| Resampler.bfill([limit]) | Backward fill the new missing values in the resampled data. |
| Resampler.nearest([limit]) | Resample by using the nearest value. |
| Resampler.asfreq([fill_value]) | Return the values at the new freq, essentially a reindex. |
| Resampler.interpolate([method, axis, limit, ...]) | Interpolate values between target timestamps according to different methods. |
| Function | Description |
|---|---|
| Resampler.count() | Compute count of group, excluding missing values. |
| Resampler.nunique() | Return number of unique elements in the group. |
| Resampler.first([numeric_only, min_count, ...]) | Compute the first non-null entry of each column. |
| Resampler.last([numeric_only, min_count, skipna]) | Compute the last non-null entry of each column. |
| Resampler.max([numeric_only, min_count]) | Compute max value of group. |
| Resampler.mean([numeric_only]) | Compute mean of groups, excluding missing values. |
| Resampler.median([numeric_only]) | Compute median of groups, excluding missing values. |
| Resampler.min([numeric_only, min_count]) | Compute min value of group. |
| Resampler.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| Resampler.prod([numeric_only, min_count]) | Compute prod of group values. |
| Resampler.size() | Compute group sizes. |
| Resampler.sem([ddof, numeric_only]) | Compute standard error of the mean of groups, excluding missing values. |
| Resampler.std([ddof, numeric_only]) | Compute standard deviation of groups, excluding missing values. |
| Resampler.sum([numeric_only, min_count]) | Compute sum of group values. |
| Resampler.var([ddof, numeric_only]) | Compute variance of groups, excluding missing values. |
| Resampler.quantile([q]) | Return value at the given quantile. |
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> for x, y in ser.groupby(level=0):
... print(f"{x}\n{y}\n")
a
a 1
a 2
dtype: int64
b
b 3
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> for x, y in df.groupby(by=["a"]):
... print(f"{x}\n{y}\n")
(1,)
a b c
0 1 2 3
1 1 5 6
(7,)
a b c
2 7 8 9
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> for x, y in ser.resample("MS"):
... print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01 1
2023-01-15 2
dtype: int64
2023-02-01 00:00:00
2023-02-01 3
2023-02-15 4
dtype: int64
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
Timestamp('2023-02-01 00:00:00'): np.int64(4)}
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a 1
a 2
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).get_group((1,))
a b c
owl 1 2 3
toucan 1 5 6
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01 1
2023-01-15 2
dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4, 5], index=pd.date_range("20130101", periods=5, freq="s")
... )
>>> s
2013-01-01 00:00:00 1
2013-01-01 00:00:01 2
2013-01-01 00:00:02 3
2013-01-01 00:00:03 4
2013-01-01 00:00:04 5
Freq: s, dtype: int64
>>> r = s.resample("2s")
>>> r.agg("sum")
2013-01-01 00:00:00 3
2013-01-01 00:00:02 7
2013-01-01 00:00:04 5
Freq: 2s, dtype: int64
>>> r.agg(["sum", "mean", "max"])
sum mean max
2013-01-01 00:00:00 3 1.5 2
2013-01-01 00:00:02 7 3.5 4
2013-01-01 00:00:04 5 5.0 5
>>> r.agg({"result": lambda x: x.mean() / x.std(), "total": "sum"})
result total
2013-01-01 00:00:00 2.121320 3
2013-01-01 00:00:02 4.949747 7
2013-01-01 00:00:04 NaN 5
>>> r.agg(average="mean", total="sum")
average total
2013-01-01 00:00:00 1.5 3
2013-01-01 00:00:02 3.5 7
2013-01-01 00:00:04 5.0 5
>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
Freq: h, dtype: int64
>>> resampled = s.resample("15min")
>>> resampled.transform(lambda x: (x - x.mean()) / x.std())
2018-01-01 00:00:00 NaN
2018-01-01 01:00:00 NaN
Freq: h, dtype: float64
>>> df = pd.DataFrame(
... {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
A
2012-08-02 1
2012-08-03 2
2012-08-04 3
2012-08-05 4
To get the difference between each 2-day period’s maximum and minimum value in one pass, you can do
>>> df.resample("2D").pipe(lambda x: x.max() - x.min())
A
2012-08-02 1
2012-08-04 1
Here we only create a Series.
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
Example for ffill with downsampling (we have fewer dates after resampling):
>>> ser.resample("MS").ffill()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
Example for ffill with upsampling (fill the new dates with the previous value):
>>> ser.resample("W").ffill()
2023-01-01 1
2023-01-08 1
2023-01-15 2
2023-01-22 2
2023-01-29 2
2023-02-05 3
2023-02-12 3
2023-02-19 4
Freq: W-SUN, dtype: int64
With upsampling and limiting (only fill the first new date with the previous value):
>>> ser.resample("W").ffill(limit=1)
2023-01-01 1.0
2023-01-08 1.0
2023-01-15 2.0
2023-01-22 2.0
2023-01-29 NaN
2023-02-05 3.0
2023-02-12 NaN
2023-02-19 4.0
Freq: W-SUN, dtype: float64
Resampling a Series:
>>> s = pd.Series(
... [1, 2, 3], index=pd.date_range("20180101", periods=3, freq="h")
... )
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
2018-01-01 02:00:00 3
Freq: h, dtype: int64
>>> s.resample("30min").bfill()
2018-01-01 00:00:00 1
2018-01-01 00:30:00 2
2018-01-01 01:00:00 2
2018-01-01 01:30:00 3
2018-01-01 02:00:00 3
Freq: 30min, dtype: int64
>>> s.resample("15min").bfill(limit=2)
2018-01-01 00:00:00 1.0
2018-01-01 00:15:00 NaN
2018-01-01 00:30:00 2.0
2018-01-01 00:45:00 2.0
2018-01-01 01:00:00 2.0
2018-01-01 01:15:00 NaN
2018-01-01 01:30:00 3.0
2018-01-01 01:45:00 3.0
2018-01-01 02:00:00 3.0
Freq: 15min, dtype: float64
Resampling a DataFrame that has missing values:
>>> df = pd.DataFrame(
... {"a": [2, np.nan, 6], "b": [1, 3, 5]},
... index=pd.date_range("20180101", periods=3, freq="h"),
... )
>>> df
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 01:00:00 NaN 3
2018-01-01 02:00:00 6.0 5
>>> df.resample("30min").bfill()
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 00:30:00 NaN 3
2018-01-01 01:00:00 NaN 3
2018-01-01 01:30:00 6.0 5
2018-01-01 02:00:00 6.0 5
>>> df.resample("15min").bfill(limit=2)
a b
2018-01-01 00:00:00 2.0 1.0
2018-01-01 00:15:00 NaN NaN
2018-01-01 00:30:00 NaN 3.0
2018-01-01 00:45:00 NaN 3.0
2018-01-01 01:00:00 NaN 3.0
2018-01-01 01:15:00 NaN NaN
2018-01-01 01:30:00 6.0 5.0
2018-01-01 01:45:00 6.0 5.0
2018-01-01 02:00:00 6.0 5.0
>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
Freq: h, dtype: int64
>>> s.resample("15min").nearest()
2018-01-01 00:00:00 1
2018-01-01 00:15:00 1
2018-01-01 00:30:00 2
2018-01-01 00:45:00 2
2018-01-01 01:00:00 2
Freq: 15min, dtype: int64
Limit the number of upsampled values imputed by the nearest:
>>> s.resample("15min").nearest(limit=1)
2018-01-01 00:00:00 1.0
2018-01-01 00:15:00 1.0
2018-01-01 00:30:00 NaN
2018-01-01 00:45:00 2.0
2018-01-01 01:00:00 2.0
Freq: 15min, dtype: float64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-31", "2023-02-01", "2023-02-28"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-31 2
2023-02-01 3
2023-02-28 4
dtype: int64
>>> ser.resample("MS").asfreq()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
>>> start = "2023-03-01T07:00:00"
>>> timesteps = pd.date_range(start, periods=5, freq="s")
>>> series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)
>>> series
2023-03-01 07:00:00 1
2023-03-01 07:00:01 -1
2023-03-01 07:00:02 2
2023-03-01 07:00:03 1
2023-03-01 07:00:04 3
Freq: s, dtype: int64
Downsample the series to 0.5 Hz by providing a period of 2 s.
>>> series.resample("2s").interpolate("linear")
2023-03-01 07:00:00 1
2023-03-01 07:00:02 2
2023-03-01 07:00:04 3
Freq: 2s, dtype: int64
Upsample the series to 2 Hz by providing a period of 500 ms.
>>> series.resample("500ms").interpolate("linear")
2023-03-01 07:00:00.000 1.0
2023-03-01 07:00:00.500 0.0
2023-03-01 07:00:01.000 -1.0
2023-03-01 07:00:01.500 0.5
2023-03-01 07:00:02.000 2.0
2023-03-01 07:00:02.500 1.5
2023-03-01 07:00:03.000 1.0
2023-03-01 07:00:03.500 2.0
2023-03-01 07:00:04.000 3.0
Freq: 500ms, dtype: float64
Internal reindexing with asfreq() prior to interpolation produces an interpolated time series based on the reindexed timestamps (anchors). All available data points from the original series are guaranteed to become anchors, so interpolation also works for resampling cases that lead to non-aligned timestamps, as in the following example:
>>> series.resample("400ms").interpolate("linear")
2023-03-01 07:00:00.000 1.000000
2023-03-01 07:00:00.400 0.333333
2023-03-01 07:00:00.800 -0.333333
2023-03-01 07:00:01.200 0.000000
2023-03-01 07:00:01.600 1.000000
2023-03-01 07:00:02.000 2.000000
2023-03-01 07:00:02.400 1.666667
2023-03-01 07:00:02.800 1.333333
2023-03-01 07:00:03.200 1.666667
2023-03-01 07:00:03.600 2.333333
2023-03-01 07:00:04.000 3.000000
Freq: 400ms, dtype: float64
Note that the series correctly decreases between two anchors 07:00:00 and 07:00:02.
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").count()
2023-01-01 2
2023-02-01 2
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 3],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 3
dtype: int64
>>> ser.resample("MS").nunique()
2023-01-01 2
2023-02-01 1
Freq: MS, dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> s
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> s.resample("MS").first()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> s
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> s.resample("MS").last()
2023-01-01 2
2023-02-01 4
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").max()
2023-01-01 2
2023-02-01 4
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").mean()
2023-01-01 1.5
2023-02-01 3.5
Freq: MS, dtype: float64
>>> ser = pd.Series(
... [1, 2, 3, 3, 4, 5],
... index=pd.DatetimeIndex(
... [
... "2023-01-01",
... "2023-01-10",
... "2023-01-15",
... "2023-02-01",
... "2023-02-10",
... "2023-02-15",
... ]
... ),
... )
>>> ser.resample("MS").median()
2023-01-01 2.0
2023-02-01 4.0
Freq: MS, dtype: float64
| Function | Description |
|---|---|
| DateOffset | Standard kind of date increment used for a date range. |
| Function | Description |
|---|---|
| DateOffset.freqstr | Return a string representing the frequency. |
| DateOffset.kwds | Return a dict of extra parameters for the offset. |
| DateOffset.name | Return a string representing the base frequency. |
| DateOffset.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| DateOffset.normalize | Return boolean whether the frequency can align with midnight. |
| DateOffset.rule_code | Return a string representing the base frequency. |
| DateOffset.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| DateOffset.copy() | Return a copy of the frequency. |
| DateOffset.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| DateOffset.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| DateOffset.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| DateOffset.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| DateOffset.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| DateOffset.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| DateOffset.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| DateOffset.rollback(dt) | Roll provided date backward to next offset only if not on offset. |
| DateOffset.rollforward(dt) | Roll provided date forward to next offset only if not on offset. |
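As a brief sketch of these attributes and methods (dates chosen for illustration):
>>> offset = pd.DateOffset(months=1)
>>> pd.Timestamp("2023-01-31") + offset  # clamps to the last day of February
Timestamp('2023-02-28 00:00:00')
>>> offset.kwds
{'months': 1}
>>> offset.n
1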
| Function | Description |
|---|---|
| BusinessDay | DateOffset subclass representing possibly n business days. |
Alias:
| Function | Description |
|---|---|
| BDay | alias of BusinessDay |
| Function | Description |
|---|---|
| BusinessDay.freqstr | Return a string representing the frequency. |
| BusinessDay.kwds | Return a dict of extra parameters for the offset. |
| BusinessDay.name | Return a string representing the base frequency. |
| BusinessDay.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessDay.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessDay.rule_code | Return a string representing the base frequency. |
| BusinessDay.n | Return the count of the number of periods. |
| BusinessDay.weekmask | Return the weekmask used for custom business day calculations. |
| BusinessDay.holidays | Return the holidays used for custom business day calculations. |
| BusinessDay.calendar | Return the calendar used for business day calculations. |
| Function | Description |
|---|---|
| BusinessDay.copy() | Return a copy of the frequency. |
| BusinessDay.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessDay.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessDay.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessDay.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessDay.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessDay.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessDay.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
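A minimal sketch (the dates are illustrative):
>>> pd.Timestamp("2024-01-05") + pd.offsets.BDay(1)  # Friday + 1 business day
Timestamp('2024-01-08 00:00:00')
>>> pd.offsets.BDay().is_on_offset(pd.Timestamp("2024-01-06"))  # a Saturday
False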
| Function | Description |
|---|---|
| BusinessHour | DateOffset subclass representing possibly n business hours. |
| Function | Description |
|---|---|
| BusinessHour.freqstr | Return a string representing the frequency. |
| BusinessHour.kwds | Return a dict of extra parameters for the offset. |
| BusinessHour.name | Return a string representing the base frequency. |
| BusinessHour.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessHour.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessHour.rule_code | Return a string representing the base frequency. |
| BusinessHour.n | Return the count of the number of periods. |
| BusinessHour.start | Return the start time(s) of the business hour. |
| BusinessHour.end | Return the end time(s) of the business hour. |
| BusinessHour.weekmask | Return the weekmask used for custom business day calculations. |
| BusinessHour.holidays | Return the holidays used for custom business day calculations. |
| BusinessHour.calendar | Return the calendar used for business day calculations. |
| Function | Description |
|---|---|
| BusinessHour.copy() | Return a copy of the frequency. |
| BusinessHour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessHour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessHour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessHour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessHour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessHour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessHour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
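A minimal sketch (dates illustrative; default business hours run 09:00-17:00):
>>> pd.Timestamp("2024-01-05 16:30") + pd.offsets.BusinessHour(1)  # Friday afternoon rolls into Monday
Timestamp('2024-01-08 09:30:00')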
| Function | Description |
|---|---|
| CustomBusinessDay | DateOffset subclass representing possibly n custom business days. |
Alias:
| Function | Description |
|---|---|
| CDay | alias of CustomBusinessDay |
| Function | Description |
|---|---|
| CustomBusinessDay.freqstr | Return a string representing the frequency. |
| CustomBusinessDay.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessDay.name | Return a string representing the base frequency. |
| CustomBusinessDay.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessDay.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessDay.rule_code | Return a string representing the base frequency. |
| CustomBusinessDay.n | Return the count of the number of periods. |
| CustomBusinessDay.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessDay.calendar | Return the calendar used for business day calculations. |
| CustomBusinessDay.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessDay.copy() | Return a copy of the frequency. |
| CustomBusinessDay.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessDay.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessDay.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessDay.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessDay.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessDay.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessDay.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
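A minimal sketch with an assumed weekmask and holiday list:
>>> offset = pd.offsets.CustomBusinessDay(weekmask="Mon Tue Wed Thu", holidays=["2024-01-04"])  # illustrative calendar
>>> pd.Timestamp("2024-01-03") + offset  # skips the holiday, the masked Friday, and the weekend
Timestamp('2024-01-08 00:00:00')
>>> offset.weekmask
'Mon Tue Wed Thu'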
| Function | Description |
|---|---|
| CustomBusinessHour | DateOffset subclass representing possibly n custom business hours. |
| Function | Description |
|---|---|
| CustomBusinessHour.freqstr | Return a string representing the frequency. |
| CustomBusinessHour.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessHour.name | Return a string representing the base frequency. |
| CustomBusinessHour.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessHour.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessHour.rule_code | Return a string representing the base frequency. |
| CustomBusinessHour.n | Return the count of the number of periods. |
| CustomBusinessHour.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessHour.calendar | Return the calendar used for business day calculations. |
| CustomBusinessHour.holidays | Return the holidays used for custom business day calculations. |
| CustomBusinessHour.start | Return the start time(s) of the business hour. |
| CustomBusinessHour.end | Return the end time(s) of the business hour. |
| Function | Description |
|---|---|
| CustomBusinessHour.copy() | Return a copy of the frequency. |
| CustomBusinessHour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessHour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessHour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessHour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessHour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessHour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessHour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| MonthEnd | DateOffset of one month end. |
| Function | Description |
|---|---|
| MonthEnd.freqstr | Return a string representing the frequency. |
| MonthEnd.kwds | Return a dict of extra parameters for the offset. |
| MonthEnd.name | Return a string representing the base frequency. |
| MonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| MonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| MonthEnd.rule_code | Return a string representing the base frequency. |
| MonthEnd.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| MonthEnd.copy() | Return a copy of the frequency. |
| MonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| MonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| MonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| MonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| MonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| MonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| MonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
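A minimal sketch (dates illustrative):
>>> pd.Timestamp("2023-02-10") + pd.offsets.MonthEnd(1)
Timestamp('2023-02-28 00:00:00')
>>> pd.offsets.MonthEnd().rollforward(pd.Timestamp("2023-02-28"))  # already on offset, unchanged
Timestamp('2023-02-28 00:00:00')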
| Function | Description |
|---|---|
| MonthBegin | DateOffset of one month at beginning. |
| Function | Description |
|---|---|
| MonthBegin.freqstr | Return a string representing the frequency. |
| MonthBegin.kwds | Return a dict of extra parameters for the offset. |
| MonthBegin.name | Return a string representing the base frequency. |
| MonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| MonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| MonthBegin.rule_code | Return a string representing the base frequency. |
| MonthBegin.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| MonthBegin.copy() | Return a copy of the frequency. |
| MonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| MonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| MonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| MonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| MonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| MonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| MonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BusinessMonthEnd | DateOffset increments between the last business day of the month. |
Alias:
| Function | Description |
|---|---|
| BMonthEnd | alias of BusinessMonthEnd |
| Function | Description |
|---|---|
| BusinessMonthEnd.freqstr | Return a string representing the frequency. |
| BusinessMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| BusinessMonthEnd.name | Return a string representing the base frequency. |
| BusinessMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessMonthEnd.rule_code | Return a string representing the base frequency. |
| BusinessMonthEnd.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| BusinessMonthEnd.copy() | Return a copy of the frequency. |
| BusinessMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BusinessMonthBegin | DateOffset of one month at the first business day. |
Alias:
| Function | Description |
|---|---|
| BMonthBegin | alias of BusinessMonthBegin |
| Function | Description |
|---|---|
| BusinessMonthBegin.freqstr | Return a string representing the frequency. |
| BusinessMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| BusinessMonthBegin.name | Return a string representing the base frequency. |
| BusinessMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessMonthBegin.rule_code | Return a string representing the base frequency. |
| BusinessMonthBegin.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| BusinessMonthBegin.copy() | Return a copy of the frequency. |
| BusinessMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd | DateOffset subclass representing custom business month(s). |
Alias:
| Function | Description |
|---|---|
| CBMonthEnd | alias of CustomBusinessMonthEnd |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd.freqstr | Return a string representing the frequency. |
| CustomBusinessMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessMonthEnd.m_offset | Return a MonthBegin or MonthEnd offset. |
| CustomBusinessMonthEnd.name | Return a string representing the base frequency. |
| CustomBusinessMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessMonthEnd.rule_code | Return a string representing the base frequency. |
| CustomBusinessMonthEnd.n | Return the count of the number of periods. |
| CustomBusinessMonthEnd.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessMonthEnd.calendar | Return the calendar used for business day calculations. |
| CustomBusinessMonthEnd.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd.copy() | Return a copy of the frequency. |
| CustomBusinessMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin | DateOffset subclass representing custom business month(s). |
Alias:
| Function | Description |
|---|---|
| CBMonthBegin | alias of CustomBusinessMonthBegin |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin.freqstr | Return a string representing the frequency. |
| CustomBusinessMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessMonthBegin.m_offset | Return a MonthBegin or MonthEnd offset. |
| CustomBusinessMonthBegin.name | Return a string representing the base frequency. |
| CustomBusinessMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessMonthBegin.rule_code | Return a string representing the base frequency. |
| CustomBusinessMonthBegin.n | Return the count of the number of periods. |
| CustomBusinessMonthBegin.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessMonthBegin.calendar | Return the calendar used for business day calculations. |
| CustomBusinessMonthBegin.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin.copy() | Return a copy of the frequency. |
| CustomBusinessMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| SemiMonthEnd | Two DateOffsets per month, repeating on the last day of the month and on day_of_month. |
| Function | Description |
|---|---|
| SemiMonthEnd.freqstr | Return a string representing the frequency. |
| SemiMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| SemiMonthEnd.name | Return a string representing the base frequency. |
| SemiMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| SemiMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| SemiMonthEnd.rule_code | |
| SemiMonthEnd.n | Return the count of the number of periods. |
| SemiMonthEnd.day_of_month | Return the day of the month for the semi-monthly offset. |
| Function | Description |
|---|---|
| SemiMonthEnd.copy() | Return a copy of the frequency. |
| SemiMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| SemiMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| SemiMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| SemiMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| SemiMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| SemiMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| SemiMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
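A minimal sketch (dates illustrative; day_of_month defaults to 15):
>>> pd.Timestamp("2023-01-02") + pd.offsets.SemiMonthEnd(day_of_month=15)
Timestamp('2023-01-15 00:00:00')
>>> pd.offsets.SemiMonthEnd().day_of_month
15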
| Function | Description |
|---|---|
| SemiMonthBegin | Two DateOffsets per month, repeating on the first day of the month and on day_of_month. |
| Function | Description |
|---|---|
| SemiMonthBegin.freqstr | Return a string representing the frequency. |
| SemiMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| SemiMonthBegin.name | Return a string representing the base frequency. |
| SemiMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| SemiMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| SemiMonthBegin.rule_code | |
| SemiMonthBegin.n | Return the count of the number of periods. |
| SemiMonthBegin.day_of_month | Return the day of the month for the semi-monthly offset. |
| Function | Description |
|---|---|
| SemiMonthBegin.copy() | Return a copy of the frequency. |
| SemiMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| SemiMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| SemiMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| SemiMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| SemiMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| SemiMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| SemiMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Week | Weekly offset. |
| Function | Description |
|---|---|
| Week.freqstr | Return a string representing the frequency. |
| Week.kwds | Return a dict of extra parameters for the offset. |
| Week.name | Return a string representing the base frequency. |
| Week.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| Week.normalize | Return boolean whether the frequency can align with midnight. |
| Week.rule_code | Return a string representing the base frequency. |
| Week.n | Return the count of the number of periods. |
| Week.weekday | Return the day of the week on which the offset is applied. |
| Function | Description |
|---|---|
| Week.copy() | Return a copy of the frequency. |
| Week.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Week.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Week.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Week.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Week.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Week.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Week.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
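A minimal sketch (dates illustrative; weekday=0 anchors the offset on Mondays):
>>> pd.Timestamp("2023-01-01") + pd.offsets.Week(weekday=0)  # Sunday rolls to the next Monday
Timestamp('2023-01-02 00:00:00')
>>> pd.offsets.Week(weekday=0).weekday
0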
| Function | Description |
|---|---|
| WeekOfMonth | Describes monthly dates like "the Tuesday of the 2nd week of each month". |
| Function | Description |
|---|---|
| WeekOfMonth.freqstr | Return a string representing the frequency. |
| WeekOfMonth.kwds | Return a dict of extra parameters for the offset. |
| WeekOfMonth.name | Return a string representing the base frequency. |
| WeekOfMonth.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| WeekOfMonth.normalize | Return boolean whether the frequency can align with midnight. |
| WeekOfMonth.rule_code | Return a string representing the base frequency. |
| WeekOfMonth.n | Return the count of the number of periods. |
| WeekOfMonth.week | |
| Function | Description |
|---|---|
| WeekOfMonth.copy() | Return a copy of the frequency. |
| WeekOfMonth.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| WeekOfMonth.weekday | |
| WeekOfMonth.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| WeekOfMonth.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| WeekOfMonth.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| WeekOfMonth.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| WeekOfMonth.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| WeekOfMonth.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
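A minimal sketch (dates illustrative; week=1, weekday=1 selects the Tuesday of the second week):
>>> pd.Timestamp("2023-01-01") + pd.offsets.WeekOfMonth(week=1, weekday=1)
Timestamp('2023-01-10 00:00:00')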
| Function | Description |
|---|---|
| LastWeekOfMonth | Describes monthly dates in last week of month. |
| Function | Description |
|---|---|
| LastWeekOfMonth.freqstr | Return a string representing the frequency. |
| LastWeekOfMonth.kwds | Return a dict of extra parameters for the offset. |
| LastWeekOfMonth.name | Return a string representing the base frequency. |
| LastWeekOfMonth.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| LastWeekOfMonth.normalize | Return boolean whether the frequency can align with midnight. |
| LastWeekOfMonth.rule_code | Return a string representing the base frequency. |
| LastWeekOfMonth.n | Return the count of the number of periods. |
| LastWeekOfMonth.weekday | |
| LastWeekOfMonth.week | |
| Function | Description |
|---|---|
| LastWeekOfMonth.copy() | Return a copy of the frequency. |
| LastWeekOfMonth.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| LastWeekOfMonth.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| LastWeekOfMonth.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| LastWeekOfMonth.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| LastWeekOfMonth.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| LastWeekOfMonth.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| LastWeekOfMonth.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BQuarterEnd | DateOffset increments between the last business day of each Quarter. |
| Function | Description |
|---|---|
| BQuarterEnd.freqstr | Return a string representing the frequency. |
| BQuarterEnd.kwds | Return a dict of extra parameters for the offset. |
| BQuarterEnd.name | Return a string representing the base frequency. |
| BQuarterEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BQuarterEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BQuarterEnd.rule_code | Return a string representing the frequency with month suffix. |
| BQuarterEnd.n | Return the count of the number of periods. |
| BQuarterEnd.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| BQuarterEnd.copy() | Return a copy of the frequency. |
| BQuarterEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BQuarterEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BQuarterEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BQuarterEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BQuarterEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BQuarterEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BQuarterEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BQuarterBegin | DateOffset increments between the first business day of each Quarter. |
| Function | Description |
|---|---|
| BQuarterBegin.freqstr | Return a string representing the frequency. |
| BQuarterBegin.kwds | Return a dict of extra parameters for the offset. |
| BQuarterBegin.name | Return a string representing the base frequency. |
| BQuarterBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BQuarterBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BQuarterBegin.rule_code | Return a string representing the frequency with month suffix. |
| BQuarterBegin.n | Return the count of the number of periods. |
| BQuarterBegin.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| BQuarterBegin.copy() | Return a copy of the frequency. |
| BQuarterBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BQuarterBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BQuarterBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BQuarterBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BQuarterBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BQuarterBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BQuarterBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| QuarterEnd | DateOffset increments between Quarter end dates. |
| Function | Description |
|---|---|
| QuarterEnd.freqstr | Return a string representing the frequency. |
| QuarterEnd.kwds | Return a dict of extra parameters for the offset. |
| QuarterEnd.name | Return a string representing the base frequency. |
| QuarterEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| QuarterEnd.normalize | Return boolean whether the frequency can align with midnight. |
| QuarterEnd.rule_code | Return a string representing the frequency with month suffix. |
| QuarterEnd.n | Return the count of the number of periods. |
| QuarterEnd.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| QuarterEnd.copy() | Return a copy of the frequency. |
| QuarterEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| QuarterEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| QuarterEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| QuarterEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| QuarterEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| QuarterEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| QuarterEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
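A quick sketch of QuarterEnd arithmetic; with the defaults, quarters end in March, June, September, and December:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.QuarterEnd()
Timestamp('2022-03-31 00:00:00')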
| Function | Description |
|---|---|
| QuarterBegin | DateOffset increments between Quarter start dates. |
| Function | Description |
|---|---|
| QuarterBegin.freqstr | Return a string representing the frequency. |
| QuarterBegin.kwds | Return a dict of extra parameters for the offset. |
| QuarterBegin.name | Return a string representing the base frequency. |
| QuarterBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| QuarterBegin.normalize | Return boolean whether the frequency can align with midnight. |
| QuarterBegin.rule_code | Return a string representing the frequency with month suffix. |
| QuarterBegin.n | Return the count of the number of periods. |
| QuarterBegin.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| QuarterBegin.copy() | Return a copy of the frequency. |
| QuarterBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| QuarterBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| QuarterBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| QuarterBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| QuarterBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| QuarterBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| QuarterBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
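And the QuarterBegin counterpart, again assuming the default anchoring:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.QuarterBegin()
Timestamp('2022-03-01 00:00:00')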
| Function | Description |
|---|---|
| BHalfYearEnd | DateOffset increments between the last business day of each half-year. |
| Function | Description |
|---|---|
| BHalfYearEnd.freqstr | Return a string representing the frequency. |
| BHalfYearEnd.kwds | Return a dict of extra parameters for the offset. |
| BHalfYearEnd.name | Return a string representing the base frequency. |
| BHalfYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BHalfYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BHalfYearEnd.rule_code | Return a string representing the frequency with month suffix. |
| BHalfYearEnd.n | Return the count of the number of periods. |
| BHalfYearEnd.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| BHalfYearEnd.copy() | Return a copy of the frequency. |
| BHalfYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BHalfYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BHalfYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BHalfYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BHalfYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BHalfYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BHalfYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BHalfYearBegin | DateOffset increments between the first business day of each half-year. |
| Function | Description |
|---|---|
| BHalfYearBegin.freqstr | Return a string representing the frequency. |
| BHalfYearBegin.kwds | Return a dict of extra parameters for the offset. |
| BHalfYearBegin.name | Return a string representing the base frequency. |
| BHalfYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BHalfYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BHalfYearBegin.rule_code | Return a string representing the frequency with month suffix. |
| BHalfYearBegin.n | Return the count of the number of periods. |
| BHalfYearBegin.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| BHalfYearBegin.copy() | Return a copy of the frequency. |
| BHalfYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BHalfYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BHalfYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BHalfYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BHalfYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BHalfYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BHalfYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| HalfYearEnd | DateOffset increments between half-year end dates. |
| Function | Description |
|---|---|
| HalfYearEnd.freqstr | Return a string representing the frequency. |
| HalfYearEnd.kwds | Return a dict of extra parameters for the offset. |
| HalfYearEnd.name | Return a string representing the base frequency. |
| HalfYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| HalfYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| HalfYearEnd.rule_code | Return a string representing the frequency with month suffix. |
| HalfYearEnd.n | Return the count of the number of periods. |
| HalfYearEnd.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| HalfYearEnd.copy() | Return a copy of the frequency. |
| HalfYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| HalfYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| HalfYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| HalfYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| HalfYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| HalfYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| HalfYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
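A sketch of half-year arithmetic, assuming the defaults anchor half-year ends to June and December; the half-year offsets are recent additions, so availability depends on the pandas version:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.HalfYearEnd()
Timestamp('2022-06-30 00:00:00')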
| Function | Description |
|---|---|
| HalfYearBegin | DateOffset increments between half-year start dates. |
| Function | Description |
|---|---|
| HalfYearBegin.freqstr | Return a string representing the frequency. |
| HalfYearBegin.kwds | Return a dict of extra parameters for the offset. |
| HalfYearBegin.name | Return a string representing the base frequency. |
| HalfYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| HalfYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| HalfYearBegin.rule_code | Return a string representing the frequency with month suffix. |
| HalfYearBegin.n | Return the count of the number of periods. |
| HalfYearBegin.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| HalfYearBegin.copy() | Return a copy of the frequency. |
| HalfYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| HalfYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| HalfYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| HalfYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| HalfYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| HalfYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| HalfYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BYearEnd | DateOffset increments between the last business day of the year. |
| Function | Description |
|---|---|
| BYearEnd.freqstr | Return a string representing the frequency. |
| BYearEnd.kwds | Return a dict of extra parameters for the offset. |
| BYearEnd.name | Return a string representing the base frequency. |
| BYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BYearEnd.rule_code | Return a string representing the base frequency. |
| BYearEnd.n | Return the count of the number of periods. |
| BYearEnd.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| BYearEnd.copy() | Return a copy of the frequency. |
| BYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
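For example, BYearEnd targets the last business day of December by default, and month= retargets the anchor month:
>>> ts = pd.Timestamp('2020-05-24 05:01:15')
>>> ts + pd.offsets.BYearEnd()
Timestamp('2020-12-31 05:01:15')
>>> ts + pd.offsets.BYearEnd(month=11)
Timestamp('2020-11-30 05:01:15')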
| Function | Description |
|---|---|
| BYearBegin | DateOffset increments between the first business day of the year. |
| Function | Description |
|---|---|
| BYearBegin.freqstr | Return a string representing the frequency. |
| BYearBegin.kwds | Return a dict of extra parameters for the offset. |
| BYearBegin.name | Return a string representing the base frequency. |
| BYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BYearBegin.rule_code | Return a string representing the base frequency. |
| BYearBegin.n | Return the count of the number of periods. |
| BYearBegin.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| BYearBegin.copy() | Return a copy of the frequency. |
| BYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| YearEnd([n, normalize, month]) | DateOffset increments between calendar year end dates. |
| Function | Description |
|---|---|
| YearEnd.freqstr | Return a string representing the frequency. |
| YearEnd.kwds | Return a dict of extra parameters for the offset. |
| YearEnd.name | Return a string representing the base frequency. |
| YearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| YearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| YearEnd.rule_code | Return a string representing the base frequency. |
| YearEnd.n | Return the count of the number of periods. |
| YearEnd.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| YearEnd.copy() | Return a copy of the frequency. |
| YearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| YearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| YearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| YearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| YearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| YearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| YearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
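For example, with the default month=12:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.YearEnd()
Timestamp('2022-12-31 00:00:00')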
| Function | Description |
|---|---|
| YearBegin | DateOffset increments between calendar year begin dates. |
| Function | Description |
|---|---|
| YearBegin.freqstr | Return a string representing the frequency. |
| YearBegin.kwds | Return a dict of extra parameters for the offset. |
| YearBegin.name | Return a string representing the base frequency. |
| YearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| YearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| YearBegin.rule_code | Return a string representing the base frequency. |
| YearBegin.n | Return the count of the number of periods. |
| YearBegin.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| YearBegin.copy() | Return a copy of the frequency. |
| YearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| YearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| YearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| YearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| YearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| YearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| YearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
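A YearBegin sketch; note that a timestamp already on the offset still advances to the next anchor:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.YearBegin()
Timestamp('2023-01-01 00:00:00')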
| Function | Description |
|---|---|
| FY5253 | Describes 52-53 week fiscal year. |
| Function | Description |
|---|---|
| FY5253.freqstr | Return a string representing the frequency. |
| FY5253.kwds | Return a dict of extra parameters for the offset. |
| FY5253.name | Return a string representing the base frequency. |
| FY5253.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| FY5253.normalize | Return boolean whether the frequency can align with midnight. |
| FY5253.rule_code | Return a string representing the base frequency. |
| FY5253.n | Return the count of the number of periods. |
| FY5253.startingMonth | Return the month in which the fiscal year ends. |
| FY5253.variation | Return the year-end convention, "nearest" or "last". |
| FY5253.weekday | Return the weekday used by the fiscal year. |
| Function | Description |
|---|---|
| FY5253.copy() | Return a copy of the frequency. |
| FY5253.get_rule_code_suffix() | Return the suffix component of the rule code. |
| FY5253.get_year_end(dt) | Return the fiscal year-end date for the year containing dt. |
| FY5253.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| FY5253.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| FY5253.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| FY5253.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| FY5253.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| FY5253.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| FY5253.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
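With the defaults (weekday=0, startingMonth=1, variation='nearest'), the fiscal year ends on the Monday nearest to January 31; a sketch:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.FY5253()
Timestamp('2022-01-31 00:00:00')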
| Function | Description |
|---|---|
| FY5253Quarter | DateOffset increments between business quarter dates for 52-53 week fiscal year. |
| Function | Description |
|---|---|
| FY5253Quarter.freqstr | Return a string representing the frequency. |
| FY5253Quarter.kwds | Return a dict of extra parameters for the offset. |
| FY5253Quarter.name | Return a string representing the base frequency. |
| FY5253Quarter.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| FY5253Quarter.normalize | Return boolean whether the frequency can align with midnight. |
| FY5253Quarter.rule_code | Return a string representing the base frequency. |
| FY5253Quarter.n | Return the count of the number of periods. |
| FY5253Quarter.qtr_with_extra_week | Return which quarter (1-4) absorbs the extra week in a 53-week year. |
| FY5253Quarter.startingMonth | Return the month in which the fiscal year ends. |
| FY5253Quarter.variation | Return the year-end convention, "nearest" or "last". |
| FY5253Quarter.weekday | Return the weekday used by the fiscal year. |
| Function | Description |
|---|---|
| FY5253Quarter.copy() | Return a copy of the frequency. |
| FY5253Quarter.get_rule_code_suffix() | Return the suffix component of the rule code. |
| FY5253Quarter.get_weeks(dt) | Return the number of weeks in each quarter of the fiscal year containing dt. |
| FY5253Quarter.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| FY5253Quarter.year_has_extra_week(dt) | Return boolean whether the fiscal year containing dt has an extra (53rd) week. |
| FY5253Quarter.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| FY5253Quarter.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| FY5253Quarter.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| FY5253Quarter.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| FY5253Quarter.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| FY5253Quarter.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
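A hedged sketch with all defaults, where the first fiscal-quarter end after the timestamp coincides with the fiscal year end:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.FY5253Quarter()
Timestamp('2022-01-31 00:00:00')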
| Function | Description |
|---|---|
| Easter | DateOffset for the Easter holiday using logic defined in dateutil. |
| Function | Description |
|---|---|
| Easter.freqstr | Return a string representing the frequency. |
| Easter.kwds | Return a dict of extra parameters for the offset. |
| Easter.name | Return a string representing the base frequency. |
| Easter.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| Easter.normalize | Return boolean whether the frequency can align with midnight. |
| Easter.rule_code | Return a string representing the base frequency. |
| Easter.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Easter.copy() | Return a copy of the frequency. |
| Easter.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Easter.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Easter.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Easter.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Easter.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Easter.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Easter.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
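For example, rolling forward to the next Easter Sunday:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.Easter()
Timestamp('2022-04-17 00:00:00')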
| Function | Description |
|---|---|
| Tick | Base class for fixed frequency offsets (Milli, Micro, Second, Minute, Hour). |
| Function | Description |
|---|---|
| Tick.freqstr | Return a string representing the frequency. |
| Tick.kwds | Return a dict of extra parameters for the offset. |
| Tick.name | Return a string representing the base frequency. |
| Tick.nanos | Returns an integer of the total number of nanoseconds. |
| Tick.normalize | Return boolean whether the frequency can align with midnight. |
| Tick.rule_code | Return a string representing the base frequency. |
| Tick.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Tick.copy() | Return a copy of the frequency. |
| Tick.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Tick.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Tick.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Tick.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Tick.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Tick.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Tick.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
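Tick subclasses behave like fixed timedeltas, so nanos is always defined for them; for instance:
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.Hour(5)
Timestamp('2022-12-09 20:00:00')
>>> pd.offsets.Hour(5).nanos
18000000000000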
| Function | Description |
|---|---|
| Day | Offset n days. |
| Function | Description |
|---|---|
| Day.freqstr | Return a string representing the frequency. |
| Day.kwds | Return a dict of extra parameters for the offset. |
| Day.name | Return a string representing the base frequency. |
| Day.nanos | Returns an integer of the total number of nanoseconds. |
| Day.normalize | Return boolean whether the frequency can align with midnight. |
| Day.rule_code | Return a string representing the base frequency. |
| Day.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Day.copy() | Return a copy of the frequency. |
| Day.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Day.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Day.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Day.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Day.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Day.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Day.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Hour | Offset n hours. |
| Function | Description |
|---|---|
| Hour.freqstr | Return a string representing the frequency. |
| Hour.kwds | Return a dict of extra parameters for the offset. |
| Hour.name | Return a string representing the base frequency. |
| Hour.nanos | Returns an integer of the total number of nanoseconds. |
| Hour.normalize | Return boolean whether the frequency can align with midnight. |
| Hour.rule_code | Return a string representing the base frequency. |
| Hour.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Hour.copy() | Return a copy of the frequency. |
| Hour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Hour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Hour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Hour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Hour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Hour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Hour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Minute | Offset n minutes. |
| Function | Description |
|---|---|
| Minute.freqstr | Return a string representing the frequency. |
| Minute.kwds | Return a dict of extra parameters for the offset. |
| Minute.name | Return a string representing the base frequency. |
| Minute.nanos | Returns an integer of the total number of nanoseconds. |
| Minute.normalize | Return boolean whether the frequency can align with midnight. |
| Minute.rule_code | Return a string representing the base frequency. |
| Minute.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Minute.copy() | Return a copy of the frequency. |
| Minute.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Minute.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Minute.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Minute.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Minute.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Minute.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Minute.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Second | Offset n seconds. |
| Function | Description |
|---|---|
| Second.freqstr | Return a string representing the frequency. |
| Second.kwds | Return a dict of extra parameters for the offset. |
| Second.name | Return a string representing the base frequency. |
| Second.nanos | Returns an integer of the total number of nanoseconds. |
| Second.normalize | Return boolean whether the frequency can align with midnight. |
| Second.rule_code | Return a string representing the base frequency. |
| Second.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Second.copy() | Return a copy of the frequency. |
| Second.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Second.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Second.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Second.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Second.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Second.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Second.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Milli | Offset n milliseconds. |
| Function | Description |
|---|---|
| Milli.freqstr | Return a string representing the frequency. |
| Milli.kwds | Return a dict of extra parameters for the offset. |
| Milli.name | Return a string representing the base frequency. |
| Milli.nanos | Returns an integer of the total number of nanoseconds. |
| Milli.normalize | Return boolean whether the frequency can align with midnight. |
| Milli.rule_code | Return a string representing the base frequency. |
| Milli.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Milli.copy() | Return a copy of the frequency. |
| Milli.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Milli.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Milli.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Milli.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Milli.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Milli.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Milli.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Micro | Offset n microseconds. |
| Function | Description |
|---|---|
| Micro.freqstr | Return a string representing the frequency. |
| Micro.kwds | Return a dict of extra parameters for the offset. |
| Micro.name | Return a string representing the base frequency. |
| Micro.nanos | Returns an integer of the total number of nanoseconds. |
| Micro.normalize | Return boolean whether the frequency can align with midnight. |
| Micro.rule_code | Return a string representing the base frequency. |
| Micro.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Micro.copy() | Return a copy of the frequency. |
| Micro.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Micro.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Micro.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Micro.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Micro.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Micro.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Micro.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Nano | Offset n nanoseconds. |
| Function | Description |
|---|---|
| Nano.freqstr | Return a string representing the frequency. |
| Nano.kwds | Return a dict of extra parameters for the offset. |
| Nano.name | Return a string representing the base frequency. |
| Nano.nanos | Returns an integer of the total number of nanoseconds. |
| Nano.normalize | Return boolean whether the frequency can align with midnight. |
| Nano.rule_code | Return a string representing the base frequency. |
| Nano.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Nano.copy() | Return a copy of the frequency. |
| Nano.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Nano.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Nano.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Nano.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Nano.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Nano.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Nano.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| to_offset(freq[, is_period]) | Return DateOffset object from string or datetime.timedelta object. |
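to_offset() parses a frequency string into one of the offset objects above; a quick sketch:
>>> from pandas.tseries.frequencies import to_offset
>>> to_offset('5min')
<5 * Minutes>
>>> to_offset('1D')
<Day>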
>>> import pandas as pd
>>> from pandas.tseries.offsets import DateOffset
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=3)
Timestamp('2017-04-01 09:10:11')
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=2)
Timestamp('2017-03-01 09:10:11')
>>> ts + DateOffset(day=31)
Timestamp('2017-01-31 09:10:11')
>>> ts + pd.DateOffset(hour=8)
Timestamp('2017-01-01 08:10:11')
>>> pd.DateOffset(5).freqstr
'<5 * DateOffsets>'
>>> pd.offsets.BusinessHour(2).freqstr
'2bh'
>>> pd.offsets.Nano().freqstr
'ns'
>>> pd.offsets.Nano(-3).freqstr
'-3ns'
>>> pd.DateOffset(5).kwds
{}
>>> pd.offsets.FY5253Quarter().kwds
{'weekday': 0,
'startingMonth': 1,
'qtr_with_extra_week': 1,
'variation': 'nearest'}
>>> pd.offsets.Hour().name
'h'
>>> pd.offsets.Hour(5).name
'h'
>>> pd.offsets.Week(n=1).nanos
Traceback (most recent call last):
  ...
ValueError: Week: weekday=None is a non-fixed frequency
>>> pd.offsets.Hour(5).normalize
False
>>> pd.offsets.Day(5).normalize
False
>>> pd.offsets.Hour().rule_code
'h'
>>> pd.offsets.Week(5).rule_code
'W'
>>> pd.offsets.Hour(5).n
5
>>> pd.offsets.Day(3).n
3
>>> freq = pd.DateOffset(1)
>>> freq_copy = freq.copy()
>>> freq is freq_copy
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Day(1)
>>> freq.is_on_offset(ts)
True
>>> ts = pd.Timestamp(2022, 8, 6)
>>> ts.day_name()
'Saturday'
>>> freq = pd.offsets.BusinessDay(1)
>>> freq.is_on_offset(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_end(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_end(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_end(ts)
False
>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()
Timestamp is not on the offset (not a month end), so it rolls backward:
>>> offset.rollback(ts)
Timestamp('2024-12-31 00:00:00')
If the timestamp is already on the offset, it remains unchanged:
>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollback(ts_on_offset)
Timestamp('2025-01-31 00:00:00')
>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()
The timestamp is not on the offset (not a month end), so rollforward moves it to the next month end:
>>> offset.rollforward(ts)
Timestamp('2025-01-31 00:00:00')
If the timestamp is already on the offset, it remains unchanged:
>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollforward(ts_on_offset)
Timestamp('2025-01-31 00:00:00')
For BusinessDay, the parameter n sets the number of business days to shift.
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts.strftime('%a %d %b %Y %H:%M')
'Fri 09 Dec 2022 15:00'
>>> (ts + pd.offsets.BusinessDay(n=5)).strftime('%a %d %b %Y %H:%M')
'Fri 16 Dec 2022 15:00'
Passing normalize=True normalizes the shifted timestamp to midnight, here the start of the next business day.
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.BusinessDay(normalize=True)
Timestamp('2022-12-12 00:00:00')