Complete API reference with examples for the pandas library
Extracted from: https://pandas.pydata.org/docs/reference/
- Input/output
- General functions
- Series
- DataFrame
- pandas arrays, scalars, and data types
- Index objects
- Date offsets
- Window
- GroupBy
- Resampling
- Style
- Plotting
| Function | Description |
|---|---|
| read_pickle(filepath_or_buffer[, ...]) | Load pickled pandas object (or any object) from file and return unpickled object. |
| DataFrame.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| Function | Description |
|---|---|
| read_table(filepath_or_buffer, *[, sep, ...]) | Read general delimited file into DataFrame. |
| read_csv(filepath_or_buffer, *[, sep, ...]) | Read a comma-separated values (csv) file into DataFrame. |
| DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| read_fwf(filepath_or_buffer, *[, colspecs, ...]) | Read a table of fixed-width formatted lines into DataFrame. |
| Function | Description |
|---|---|
| read_clipboard([sep, dtype_backend]) | Read text from clipboard and pass to read_csv(). |
| DataFrame.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| Function | Description |
|---|---|
| read_excel(io[, sheet_name, header, names, ...]) | Read an Excel file into a DataFrame. |
| DataFrame.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| ExcelFile(path_or_buffer[, engine, ...]) | Class for parsing tabular Excel sheets into DataFrame objects. |
| ExcelFile.book | Gets the Excel workbook. |
| ExcelFile.sheet_names | Names of the sheets in the document. |
| ExcelFile.parse([sheet_name, header, names, ...]) | Parse specified sheet(s) into a DataFrame. |
| Function | Description |
|---|---|
| Styler.to_excel(excel_writer[, sheet_name, ...]) | Write Styler to an Excel sheet. |
| Function | Description |
|---|---|
| ExcelWriter(path[, engine, date_format, ...]) | Class for writing DataFrame objects into Excel sheets. |
| Function | Description |
|---|---|
| read_json(path_or_buf, *[, orient, typ, ...]) | Convert a JSON string to pandas object. |
| json_normalize(data[, record_path, meta, ...]) | Normalize semi-structured JSON data into a flat table. |
| DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| Function | Description |
|---|---|
| build_table_schema(data[, index, ...]) | Create a Table schema from data. |
| Function | Description |
|---|---|
| read_html(io, *[, match, flavor, header, ...]) | Read HTML tables into a list of DataFrame objects. |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
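As a quick illustration, a hedged round-trip sketch (read_html needs an HTML parser such as lxml or beautifulsoup4 installed; the DataFrame contents are illustrative):
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
html = df.to_html(index=False)            # returns an HTML string when buf is None
tables = pd.read_html(io.StringIO(html))  # parses every <table> into a list of DataFrames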
| Function | Description |
|---|---|
| Styler.to_html([buf, table_uuid, ...]) | Write Styler to a file, buffer or string in HTML-CSS format. |
| Function | Description |
|---|---|
| read_xml(path_or_buffer, *[, xpath, ...]) | Read XML document into a DataFrame object. |
| DataFrame.to_xml([path_or_buffer, index, ...]) | Render a DataFrame to an XML document. |
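A hedged round-trip sketch using the stdlib etree parser (lxml is the default backend; parser="etree" avoids that dependency):
import io
import pandas as pd

df = pd.DataFrame({"shape": ["square", "circle"], "sides": [4, 0]})
xml = df.to_xml(parser="etree")                       # XML string when path_or_buffer is None
back = pd.read_xml(io.StringIO(xml), parser="etree")  # parse row elements back to a DataFrame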
| Function | Description |
|---|---|
| DataFrame.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
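A minimal sketch (recent pandas renders to_latex via Styler, which needs jinja2 installed; the DataFrame is illustrative):
import pandas as pd

df = pd.DataFrame({"name": ["Raphael", "Donatello"], "mask": ["red", "purple"]})
print(df.to_latex(index=False))  # emits a LaTeX tabular environment as a string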
| Function | Description |
|---|---|
| Styler.to_latex([buf, column_format, ...]) | Write Styler to a file, buffer or string in LaTeX format. |
| Function | Description |
|---|---|
| read_hdf(path_or_buf[, key, mode, errors, ...]) | Read from the store, close it if we opened it. |
| HDFStore.put(key, value[, format, index, ...]) | Store object in HDFStore. |
| HDFStore.append(key, value[, format, axes, ...]) | Append to Table in file. |
| HDFStore.get(key) | Retrieve pandas object stored in file. |
| HDFStore.select(key[, where, start, stop, ...]) | Retrieve pandas object stored in file, optionally based on where criteria. |
| HDFStore.info() | Print detailed information on the store. |
| HDFStore.keys([include]) | Return a list of keys corresponding to objects stored in HDFStore. |
| HDFStore.groups() | Return a list of all the top-level nodes. |
| HDFStore.walk([where]) | Walk the pytables group hierarchy for pandas objects. |
Warning
One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.
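A minimal round-trip sketch, assuming the optional PyTables dependency is installed (the file name and data are illustrative):
import pandas as pd

df = pd.DataFrame({"a": range(3), "b": list("xyz")})

# format="table" plus data_columns=True makes columns queryable via where expressions.
with pd.HDFStore("store.h5", mode="w") as store:
    store.put("df", df, format="table", data_columns=True)
    store.append("df", df)                      # append more rows under the same key
    subset = store.select("df", where="a > 0")  # filter on a data column
    print(store.keys())                         # ['/df']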
| Function | Description |
|---|---|
| read_feather(path[, columns, use_threads, ...]) | Load a feather-format object from the file path. |
| DataFrame.to_feather(path, **kwargs) | Write a DataFrame to the binary Feather format. |
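A hedged round-trip sketch, assuming pyarrow is installed (the path is illustrative):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_feather("data.feather")                          # write a binary Feather file
back = pd.read_feather("data.feather", columns=["a"])  # read back a column subset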
| Function | Description |
|---|---|
| read_parquet(path[, engine, columns, ...]) | Load a parquet object from the file path, returning a DataFrame. |
| DataFrame.to_parquet([path, engine, ...]) | Write a DataFrame to the binary parquet format. |
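A hedged round-trip sketch, assuming pyarrow (or fastparquet) is installed:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_parquet("data.parquet", compression="snappy")      # engine is chosen automatically
subset = pd.read_parquet("data.parquet", columns=["b"])  # column-pruned read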
| Function | Description |
|---|---|
| read_iceberg(table_identifier[, ...]) | Read an Apache Iceberg table into a pandas DataFrame. |
| DataFrame.to_iceberg(table_identifier[, ...]) | Write a DataFrame to an Apache Iceberg table. |
Warning
read_iceberg is experimental and may change without warning.
| Function | Description |
|---|---|
| read_orc(path[, columns, dtype_backend, ...]) | Load an ORC object from the file path, returning a DataFrame. |
| DataFrame.to_orc([path, engine, index, ...]) | Write a DataFrame to the Optimized Row Columnar (ORC) format. |
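A hedged round-trip sketch, assuming pyarrow is installed (ORC support is pyarrow-only):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_orc("data.orc")                          # write an Optimized Row Columnar file
back = pd.read_orc("data.orc", columns=["a"])  # read back selected columns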
| Function | Description |
|---|---|
| read_sas(filepath_or_buffer, *[, format, ...]) | Read SAS files stored as either XPORT or SAS7BDAT format files. |
| Function | Description |
|---|---|
| read_spss(path[, usecols, ...]) | Load an SPSS file from the file path, returning a DataFrame. |
| Function | Description |
|---|---|
| read_sql_table(table_name, con[, schema, ...]) | Read SQL database table into a DataFrame. |
| read_sql_query(sql, con[, index_col, ...]) | Read SQL query into a DataFrame. |
| read_sql(sql, con[, index_col, ...]) | Read SQL query or database table into a DataFrame. |
| DataFrame.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
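A minimal sketch using the stdlib sqlite3 driver (an in-memory database, so nothing touches disk; the table and column names are illustrative):
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["foo", "bar"], "value": [1, 2]})
df.to_sql("my_table", conn, index=False)  # write records to a new table
back = pd.read_sql("SELECT * FROM my_table WHERE value > 1", conn)
conn.close()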
| Function | Description |
|---|---|
| read_stata(filepath_or_buffer, *[, ...]) | Read Stata file into DataFrame. |
| DataFrame.to_stata(path, *[, convert_dates, ...]) | Export DataFrame object to Stata dta format. |
| Function | Description |
|---|---|
| StataReader.data_label | Return data label of Stata file. |
| StataReader.value_labels() | Return a nested dict associating each variable name to its value and label. |
| StataReader.variable_labels() | Return a dict associating each variable name with corresponding label. |
| StataWriter.write_file() | Export DataFrame object to Stata dta format. |
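A hedged round-trip sketch (the file name is illustrative):
import pandas as pd

df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350, 18]})
df.to_stata("animals.dta", write_index=False)  # export to Stata .dta format
back = pd.read_stata("animals.dta")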
>>> original_df = pd.DataFrame(
... {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> pd.to_pickle(original_df, "./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df = pd.DataFrame(
... {{"foo": range(5), "bar": range(5, 10)}}
... )
>>> original_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> pd.read_table("data.csv")
Name Value
0 foo 1
1 bar 2
2 #baz 3
Index and header can be specified via the index_col and header arguments.
>>> pd.read_table("data.csv", header=None)
0 1
0 Name Value
1 foo 1
2 bar 2
3 #baz 3
>>> pd.read_table("data.csv", index_col="Value")
Name
Value
1 foo
2 bar
3 #baz
Column types are inferred but can be explicitly specified using the dtype argument.
>>> pd.read_table("data.csv", dtype={{"Value": float}})
Name Value
0 foo 1.0
1 bar 2.0
2 #baz 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_table("data.csv", na_values=["foo", "bar"])
Name Value
0 NaN 1
1 NaN 2
2 #baz 3
Comment lines in the input file can be skipped using the comment argument.
>>> pd.read_table("data.csv", comment="#")
Name Value
0 foo 1
1 bar 2
By default, columns with dates will be read as object rather than datetime.
>>> df = pd.read_table("tmp.csv")
>>> df
col 1 col 2 col 3
0 10 10/04/2018 Sun 15 Jan 2023
1 20 15/04/2018 Fri 12 May 2023
>>> df.dtypes
col 1 int64
col 2 object
col 3 object
dtype: object
Specific columns can be parsed as dates by using the parse_dates and date_format arguments.
>>> df = pd.read_table(
... "tmp.csv",
... parse_dates=[1, 2],
... date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1 int64
col 2 datetime64[ns]
col 3 datetime64[ns]
dtype: object
>>> pd.read_csv("data.csv")
Name Value
0 foo 1
1 bar 2
2 #baz 3
Index and header can be specified via the index_col and header arguments.
>>> pd.read_csv("data.csv", header=None)
0 1
0 Name Value
1 foo 1
2 bar 2
3 #baz 3
>>> pd.read_csv("data.csv", index_col="Value")
Name
Value
1 foo
2 bar
3 #baz
Column types are inferred but can be explicitly specified using the dtype argument.
>>> pd.read_csv("data.csv", dtype={{"Value": float}})
Name Value
0 foo 1.0
1 bar 2.0
2 #baz 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_csv("data.csv", na_values=["foo", "bar"])
Name Value
0 NaN 1
1 NaN 2
2 #baz 3
Comment lines in the input file can be skipped using the comment argument.
>>> pd.read_csv("data.csv", comment="#")
Name Value
0 foo 1
1 bar 2
By default, columns with dates will be read as object rather than datetime.
>>> df = pd.read_csv("tmp.csv")
>>> df
col 1 col 2 col 3
0 10 10/04/2018 Sun 15 Jan 2023
1 20 15/04/2018 Fri 12 May 2023
>>> df.dtypes
col 1 int64
col 2 object
col 3 object
dtype: object
Specific columns can be parsed as dates by using the parse_dates and date_format arguments.
>>> df = pd.read_csv(
... "tmp.csv",
... parse_dates=[1, 2],
... date_format={"col 2": "%d/%m/%Y", "col 3": "%a %d %b %Y"},
... )
>>> df.dtypes
col 1 int64
col 2 datetime64[ns]
col 3 datetime64[ns]
dtype: object
Create ‘out.csv’ containing ‘df’ without indices
>>> df = pd.DataFrame(
... [["Raphael", "red", "sai"], ["Donatello", "purple", "bo staff"]],
... columns=["name", "mask", "weapon"],
... )
>>> df.to_csv("out.csv", index=False)
Create ‘out.zip’ containing ‘out.csv’ (calling to_csv without a path first returns the CSV as a string):
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
>>> compression_opts = dict(
... method="zip", archive_name="out.csv"
... )
>>> df.to_csv(
... "out.zip", index=False, compression=compression_opts
... )
To write a csv file to a new folder or nested folder you will first need to create it using either pathlib or os:
>>> from pathlib import Path
>>> filepath = Path("folder/subfolder/out.csv")
>>> filepath.parent.mkdir(parents=True, exist_ok=True)
>>> df.to_csv(filepath)
>>> import os
>>> os.makedirs("folder/subfolder", exist_ok=True)
>>> df.to_csv("folder/subfolder/out.csv")
Format floats to two decimal places:
>>> df.to_csv("out1.csv", float_format="%.2f")
Format floats using scientific notation:
>>> df.to_csv("out2.csv", float_format="{{:.2e}}".format)
>>> pd.read_fwf("data.csv")
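Since the example above shows no output, here is a hedged, self-contained sketch with explicit column boundaries (the inline data is illustrative):
import io
import pandas as pd

data = "id  name\n1   foo\n2   bar\n"
# colspecs gives half-open (start, end) intervals for each fixed-width field
df = pd.read_fwf(io.StringIO(data), colspecs=[(0, 2), (4, 8)])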
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard()
>>> pd.read_clipboard()
A B C
0 1 2 3
1 4 5 6
Copy the contents of a DataFrame to the clipboard.
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_clipboard(sep=",")
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
We can omit the index by passing the keyword index and setting it to false.
>>> df.to_clipboard(sep=",", index=False)
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
Using the original pyperclip package for any string output format.
import pyperclip
html = df.style.to_html()
pyperclip.copy(html)
The file can be read using the file name as string or an open file object:
>>> pd.read_excel("tmp.xlsx", index_col=0)
Name Value
0 string1 1
1 string2 2
2 #Comment 3
>>> pd.read_excel(open("tmp.xlsx", "rb"), sheet_name="Sheet3")
Unnamed: 0 Name Value
0 0 string1 1
1 1 string2 2
2 2 #Comment 3
Index and header can be specified via the index_col and header arguments
>>> pd.read_excel("tmp.xlsx", index_col=None, header=None)
0 1 2
0 NaN Name Value
1 0.0 string1 1
2 1.0 string2 2
3 2.0 #Comment 3
Column types are inferred but can be explicitly specified
>>> pd.read_excel(
... "tmp.xlsx", index_col=0, dtype={"Name": str, "Value": float}
... )
Name Value
0 string1 1.0
1 string2 2.0
2 #Comment 3.0
True, False, and NA values, and thousands separators have defaults, but can be explicitly specified, too. Supply the values you would like as strings or lists of strings!
>>> pd.read_excel(
... "tmp.xlsx", index_col=0, na_values=["string1", "string2"]
... )
Name Value
0 NaN 1
1 NaN 2
2 #Comment 3
Comment lines in the Excel input file can be skipped using the comment kwarg.
>>> pd.read_excel("tmp.xlsx", index_col=0, comment="#")
Name Value
0 string1 1.0
1 string2 2.0
2 None NaN
Create, write to and save a workbook:
>>> df1 = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> df1.to_excel("output.xlsx")
To specify the sheet name:
>>> df1.to_excel("output.xlsx", sheet_name="Sheet_name_1")
If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:
>>> df2 = df1.copy()
>>> with pd.ExcelWriter("output.xlsx") as writer:
... df1.to_excel(writer, sheet_name="Sheet_name_1")
... df2.to_excel(writer, sheet_name="Sheet_name_2")
ExcelWriter can also be used to append to an existing Excel file:
>>> with pd.ExcelWriter("output.xlsx", mode="a") as writer:
... df1.to_excel(writer, sheet_name="Sheet_name_3")
To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):
>>> df1.to_excel("output1.xlsx", engine="xlsxwriter")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> with pd.ExcelFile("myfile.xls") as xls:
... df1 = pd.read_excel(xls, "Sheet1")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.book
<openpyxl.workbook.workbook.Workbook object at 0x11eb5ad70>
>>> file.book.path
'/xl/workbook.xml'
>>> file.book.active
<openpyxl.worksheet._read_only.ReadOnlyWorksheet object at 0x11eb5b370>
>>> file.book.sheetnames
['Sheet1', 'Sheet2']
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.sheet_names
["Sheet1", "Sheet2"]
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
>>> df.to_excel("myfile.xlsx")
>>> file = pd.ExcelFile("myfile.xlsx")
>>> file.parse()
Styler.to_excel shares the same usage as DataFrame.to_excel; see the workbook examples above.
Default usage:
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df.to_excel(writer)
To write to separate sheets in a single file:
>>> df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"])
>>> df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df1.to_excel(writer, sheet_name="Sheet1")
... df2.to_excel(writer, sheet_name="Sheet2")
You can set the date format or datetime format:
>>> from datetime import date, datetime
>>> df = pd.DataFrame(
... [
... [date(2014, 1, 31), date(1999, 9, 24)],
... [datetime(1998, 5, 26, 23, 33, 4), datetime(2014, 2, 28, 13, 5, 13)],
... ],
... index=["Date", "Datetime"],
... columns=["X", "Y"],
... )
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... date_format="YYYY-MM-DD",
... datetime_format="YYYY-MM-DD HH:MM:SS",
... ) as writer:
... df.to_excel(writer)
You can also append to an existing Excel file:
>>> with pd.ExcelWriter("path_to_file.xlsx", mode="a", engine="openpyxl") as writer:
... df.to_excel(writer, sheet_name="Sheet3")
Here, the if_sheet_exists parameter can be set to replace a sheet if it already exists:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="replace",
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet1")
You can also write multiple DataFrames to a single sheet. Note that the if_sheet_exists parameter needs to be set to overlay:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="overlay",
... ) as writer:
... df1.to_excel(writer, sheet_name="Sheet1")
... df2.to_excel(writer, sheet_name="Sheet1", startcol=3)
You can store the Excel file in RAM:
>>> import io
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> buffer = io.BytesIO()
>>> with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can pack the Excel file into a zip archive:
>>> import zipfile
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> with zipfile.ZipFile("path_to_file.zip", "w") as zf:
... with zf.open("filename.xlsx", "w") as buffer:
... with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can specify additional arguments to the underlying engine:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="xlsxwriter",
... engine_kwargs={{"options": {{"nan_inf_to_errors": True}}}},
... ) as writer:
... df.to_excel(writer)
In append mode, engine_kwargs are passed through to openpyxl’s load_workbook:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="openpyxl",
... mode="a",
... engine_kwargs={{"keep_vba": True}},
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet2")
>>> from io import StringIO
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
Encoding/decoding a Dataframe using 'split' formatted JSON:
>>> df.to_json(orient="split")
'{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'
>>> pd.read_json(StringIO(_), orient="split") # noqa: F821
col 1 col 2
row 1 a b
row 2 c d
Encoding/decoding a Dataframe using 'index' formatted JSON:
>>> df.to_json(orient="index")
'{"row 1":{"col 1":"a","col 2":"b"},"row 2":{"col 1":"c","col 2":"d"}}'
>>> pd.read_json(StringIO(_), orient="index") # noqa: F821
col 1 col 2
row 1 a b
row 2 c d
Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.
>>> df.to_json(orient="records")
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
>>> pd.read_json(StringIO(_), orient="records") # noqa: F821
col 1 col 2
0 a b
1 c d
Encoding with Table Schema
>>> df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string","extDtype":"str"},{"name":"col 1","type":"string","extDtype":"str"},{"name":"col 2","type":"string","extDtype":"str"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":"row 1","col 1":"a","col 2":"b"},{"index":"row 2","col 1":"c","col 2":"d"}]}'
The following example uses dtype_backend="numpy_nullable"
>>> data = '''{"index": {"0": 0, "1": 1},
... "a": {"0": 1, "1": null},
... "b": {"0": 2.5, "1": 4.5},
... "c": {"0": true, "1": false},
... "d": {"0": "a", "1": "b"},
... "e": {"0": 1577.2, "1": 1577.1}}'''
>>> pd.read_json(StringIO(data), dtype_backend="numpy_nullable")
index a b c d e
0 0 1 2.5 True a 1577.2
1 1 <NA> 4.5 False b 1577.1
>>> data = [
... {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
... {"name": {"given": "Mark", "family": "Regner"}},
... {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)
id name.first name.last name.given name.family name
0 1.0 Coleen Volk NaN NaN NaN
1 NaN NaN NaN Mark Regner NaN
2 2.0 NaN NaN NaN NaN Faye Raker
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=0)
id name fitness
0 1.0 Cole Volk {'height': 130, 'weight': 60}
1 NaN Mark Reg {'height': 130, 'weight': 60}
2 2.0 Faye Raker {'height': 130, 'weight': 60}
Normalizes nested data up to level 1.
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> pd.json_normalize(data, max_level=1)
id name fitness.height fitness.weight
0 1.0 Cole Volk 130 60
1 NaN Mark Reg 130 60
2 2.0 Faye Raker 130 60
>>> data = [
... {
... "id": 1,
... "name": "Cole Volk",
... "fitness": {"height": 130, "weight": 60},
... },
... {"name": "Mark Reg", "fitness": {"height': 130, "weight": 60}},
... {
... "id": 2,
... "name": "Faye Raker",
... "fitness": {"height": 130, "weight": 60},
... },
... ]
>>> series = pd.Series(data, index=pd.Index(["a", "b", "c"]))
>>> pd.json_normalize(series)
id name fitness.height fitness.weight
a 1.0 Cole Volk 130 60
b NaN Mark Reg 130 60
c 2.0 Faye Raker 130 60
>>> data = [
... {
... "state": "Florida",
... "shortname": "FL",
... "info": {"governor": "Rick Scott"},
... "counties": [
... {"name": "Dade", "population": 12345},
... {"name": "Broward", "population": 40000},
... {"name": "Palm Beach", "population": 60000},
... ],
... },
... {
... "state": "Ohio",
... "shortname": "OH",
... "info": {"governor": "John Kasich"},
... "counties": [
... {"name": "Summit", "population": 1234},
... {"name": "Cuyahoga", "population": 1337},
... ],
... },
... ]
>>> result = pd.json_normalize(
... data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
Prefix.0
0 1
1 2
Returns normalized data with columns prefixed with the given string.
>>> from json import loads, dumps
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}
Encoding/decoding a Dataframe using 'records' formatted JSON. Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]
Encoding/decoding a Dataframe using 'index' formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}
Encoding/decoding a Dataframe using 'columns' formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}
Encoding/decoding a Dataframe using 'values' formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}
>>> from pandas.io.json._table_schema import build_table_schema
>>> df = pd.DataFrame(
... {'A': [1, 2, 3],
... 'B': ['a', 'b', 'c'],
... 'C': pd.date_range('2016-01-01', freq='D', periods=3),
... }, index=pd.Index(range(3), name='idx'))
>>> build_table_schema(df)
{'fields': [{'name': 'idx', 'type': 'integer'}, {'name': 'A', 'type': 'integer'}, {'name': 'B', 'type': 'string', 'extDtype': 'str'}, {'name': 'C', 'type': 'datetime'}], 'primaryKey': ['idx'], 'pandas_version': '1.4.0'}
| Function | Description |
|---|---|
| melt(frame[, id_vars, value_vars, var_name, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
| pivot(data, *, columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| pivot_table(data[, values, index, columns, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| crosstab(index, columns[, values, rownames, ...]) | Compute a simple cross tabulation of two (or more) factors. |
| cut(x, bins[, right, labels, retbins, ...]) | Bin values into discrete intervals. |
| qcut(x, q[, labels, retbins, precision, ...]) | Quantile-based discretization function. |
| merge(left, right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| merge_ordered(left, right[, on, left_on, ...]) | Perform a merge for ordered data with optional filling/interpolation. |
| merge_asof(left, right[, on, left_on, ...]) | Perform a merge by key distance. |
| concat(objs, *[, axis, join, ignore_index, ...]) | Concatenate pandas objects along a particular axis. |
| get_dummies(data[, prefix, prefix_sep, ...]) | Convert categorical variable into dummy/indicator variables. |
| from_dummies(data[, sep, default_category]) | Create a categorical DataFrame from a DataFrame of dummy variables. |
| factorize(values[, sort, use_na_sentinel, ...]) | Encode the object as an enumerated type or categorical variable. |
| unique(values) | Return unique values based on a hash table. |
| lreshape(data, groups[, dropna]) | Reshape wide-format data to long. |
| wide_to_long(df, stubnames, i, j[, sep, suffix]) | Unpivot a DataFrame from wide to long format. |
| Function | Description |
|---|---|
| isna(obj) | Detect missing values for an array-like object. |
| isnull(obj) | Detect missing values for an array-like object. |
| notna(obj) | Detect non-missing values for an array-like object. |
| notnull(obj) | Detect non-missing values for an array-like object. |
| Function | Description |
|---|---|
| to_numeric(arg[, errors, downcast, ...]) | Convert argument to a numeric type. |
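A short sketch of the common error-handling and downcasting options:
import pandas as pd

s = pd.Series(["1.0", "2", "-3", "apple"])
pd.to_numeric(s, errors="coerce")                        # unparseable values become NaN
pd.to_numeric(pd.Series([1, 2, 3]), downcast="integer")  # smallest sufficient integer dtype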
| Function | Description |
|---|---|
| to_datetime(arg[, errors, dayfirst, ...]) | Convert argument to datetime. |
| to_timedelta(arg[, unit, errors]) | Convert argument to timedelta. |
| date_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex. |
| bdate_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex with business day as the default. |
| period_range([start, end, periods, freq, name]) | Return a fixed frequency PeriodIndex. |
| timedelta_range([start, end, periods, freq, ...]) | Return a fixed frequency TimedeltaIndex with day as the default. |
| infer_freq(index) | Infer the most likely frequency given the input index. |
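A few representative calls (outputs elided; the dates are illustrative):
import pandas as pd

pd.to_datetime(["2023-01-15", "2023-02-20"])  # DatetimeIndex
pd.to_timedelta(["1 days", "2 hours"])        # TimedeltaIndex
idx = pd.date_range(start="2023-01-01", periods=5, freq="D")
pd.infer_freq(idx)                            # 'D'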
| Function | Description |
|---|---|
| interval_range([start, end, periods, freq, ...]) | Return a fixed frequency IntervalIndex. |
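A hedged sketch of numeric and datetime interval ranges:
import pandas as pd

pd.interval_range(start=0, end=5)  # five right-closed integer intervals (0, 1] ... (4, 5]
pd.interval_range(start=pd.Timestamp("2023-01-01"), periods=3, freq="MS")  # month-start steps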
| Function | Description |
|---|---|
| col(col_name) | Generate deferred object representing a column of a DataFrame. |
| eval(expr[, parser, engine, local_dict, ...]) | Evaluate a Python expression as a string using various backends. |
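A minimal sketch of pd.eval with explicit variable bindings (the local_dict keys are illustrative names):
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
# Evaluates the string expression elementwise against the supplied Series.
result = pd.eval("a + b", local_dict={"a": df["a"], "b": df["b"]})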
| Function | Description |
|---|---|
| tseries.api.guess_datetime_format(dt_str[, ...]) | Guess the datetime format of a given datetime string. |
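A short sketch (the format strings in the comments are what such inputs are expected to yield):
import pandas as pd

pd.tseries.api.guess_datetime_format("2023-01-15")                 # '%Y-%m-%d'
pd.tseries.api.guess_datetime_format("15/01/2023", dayfirst=True)  # '%d/%m/%Y'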
| Function | Description |
|---|---|
| util.hash_array(vals[, encoding, hash_key, ...]) | Given a 1d array, return an array of deterministic integers. |
| util.hash_pandas_object(obj[, index, ...]) | Return a data hash of the Index/Series/DataFrame. |
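A short sketch; both helpers return deterministic uint64 values, one per element or row:
import numpy as np
import pandas as pd

pd.util.hash_array(np.array([1, 2, 3]))     # per-element hashes of a 1d array
s = pd.Series(["a", "b", "c"])
pd.util.hash_pandas_object(s)               # the index is hashed in by default
pd.util.hash_pandas_object(s, index=False)  # hash the values only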
| Function | Description |
|---|---|
| api.interchange.from_dataframe(df[, allow_copy]) | Build a pd.DataFrame from any DataFrame supporting the interchange protocol. |
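A minimal sketch; any object implementing __dataframe__ can be passed, and here a pandas DataFrame itself stands in for a third-party one:
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
result = pd.api.interchange.from_dataframe(df, allow_copy=True)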
>>> df = pd.DataFrame(
... {
... "A": {0: "a", 1: "b", 2: "c"},
... "B": {0: 1, 1: 3, 2: 5},
... "C": {0: 2, 1: 4, 2: 6},
... }
... )
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, id_vars=["A"], value_vars=["B"])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of ‘variable’ and ‘value’ columns can be customized:
>>> pd.melt(
... df,
... id_vars=["A"],
... value_vars=["B"],
... var_name="myVarname",
... value_name="myValname",
... )
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
Original index values can be kept around:
>>> pd.melt(df, id_vars=["A"], value_vars=["B", "C"], ignore_index=False)
A variable value
0 a B 1
1 b B 3
2 c B 5
0 a C 2
1 b C 4
2 c C 6
If you have multi-index columns:
>>> df.columns = [list("ABC"), list("DEF")]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> pd.melt(df, col_level=0, id_vars=["A"], value_vars=["B"])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> pd.melt(df, id_vars=[("A", "D")], value_vars=[("B", "E")])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
>>> df = pd.DataFrame(
... {
... "foo": ["one", "one", "one", "two", "two", "two"],
... "bar": ["A", "B", "C", "A", "B", "C"],
... "baz": [1, 2, 3, 4, 5, 6],
... "zoo": ["x", "y", "z", "q", "w", "t"],
... }
... )
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index="foo", columns="bar", values="baz")
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index="foo", columns="bar")["baz"]
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index="foo", columns="bar", values=["baz", "zoo"])
baz zoo
bar A B C A B C
foo
one 1 2 3 x y z
two 4 5 6 q w t
You could also assign a list of column names or a list of index names.
>>> df = pd.DataFrame(
... {
... "lev1": [1, 1, 1, 2, 2, 2],
... "lev2": [1, 1, 2, 1, 1, 2],
... "lev3": [1, 2, 1, 2, 1, 2],
... "lev4": [1, 2, 3, 4, 5, 6],
... "values": [0, 1, 2, 3, 4, 5],
... }
... )
>>> df
lev1 lev2 lev3 lev4 values
0 1 1 1 1 0
1 1 1 2 2 1
2 1 2 1 3 2
3 2 1 2 4 3
4 2 1 1 5 4
5 2 2 2 6 5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values")
lev2 1 2
lev3 1 2 1 2
lev1
1 0.0 1.0 2.0 NaN
2 4.0 3.0 NaN 5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
lev3 1 2
lev1 lev2
1 1 0.0 1.0
2 2.0 NaN
2 1 4.0 3.0
2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame(
... {
... "foo": ["one", "one", "two", "two"],
... "bar": ["A", "A", "B", "C"],
... "baz": [1, 2, 3, 4],
... }
... )
>>> df
foo bar baz
0 one A 1
1 one A 2
2 two B 3
3 two C 4
Notice that the first two rows are the same for our index and columns arguments.
>>> df.pivot(index="foo", columns="bar", values="baz")
Traceback (most recent call last):
...
ValueError: Index contains duplicate entries, cannot reshape
>>> df = pd.DataFrame(
... {
... "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
... "C": [
... "small",
... "large",
... "large",
... "small",
... "small",
... "large",
... "small",
... "small",
... "large",
... ],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
... }
... )
>>> df
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(
... df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum"
... )
>>> table
C large small
A B
bar one 4.0 5.0
two 7.0 6.0
foo one 4.0 1.0
two NaN 6.0
We can also fill missing values using the fill_value parameter.
>>> table = pd.pivot_table(
... df, values="D", index=["A", "B"], columns=["C"], aggfunc="sum", fill_value=0
... )
>>> table
C large small
A B
bar one 4 5
two 7 6
foo one 4 1
two 0 6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(
... df, values=["D", "E"], index=["A", "C"], aggfunc={"D": "mean", "E": "mean"}
... )
>>> table
D E
A C
bar large 5.500000 7.500000
small 5.500000 8.500000
foo large 2.000000 4.500000
small 2.333333 4.333333
We can also calculate multiple types of aggregations for any given value column.
>>> table = pd.pivot_table(
... df,
... values=["D", "E"],
... index=["A", "C"],
... aggfunc={"D": "mean", "E": ["min", "max", "mean"]},
... )
>>> table
D E
mean max mean min
A C
bar large 5.500000 9 7.500000 6
small 5.500000 9 8.500000 8
foo large 2.000000 5 4.500000 4
small 2.333333 6 4.333333 2
>>> a = np.array(
... [
... "foo",
... "foo",
... "foo",
... "foo",
... "bar",
... "bar",
... "bar",
... "bar",
... "foo",
... "foo",
... "foo",
... ],
... dtype=object,
... )
>>> b = np.array(
... [
... "one",
... "one",
... "one",
... "two",
... "one",
... "one",
... "one",
... "two",
... "two",
... "two",
... "one",
... ],
... dtype=object,
... )
>>> c = np.array(
... [
... "dull",
... "dull",
... "shiny",
... "dull",
... "dull",
... "shiny",
... "shiny",
... "dull",
... "shiny",
... "shiny",
... "shiny",
... ],
... dtype=object,
... )
>>> pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"])
b one two
c dull shiny dull shiny
a
bar 1 2 1 0
foo 2 2 1 2
Here ‘c’ and ‘f’ are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.
>>> foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
>>> bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])
>>> pd.crosstab(foo, bar)
col_0 d e
row_0
a 1 0
b 0 1
>>> pd.crosstab(foo, bar, dropna=False)
col_0 d e f
row_0
a 1 0 0
b 0 1 0
c 0 0 0
Discretize into three equal-sized bins.
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
...
[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
...
([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
Categories (3, interval[float64, right]): [(0.994, 3.0] < (3.0, 5.0] ...
array([0.994, 3. , 5. , 7. ]))
Discovers the same bins, but assigns them specific labels. Notice that the returned Categorical's categories are the labels, and it is ordered.
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["bad", "medium", "good"])
['bad', 'good', 'medium', 'medium', 'good', 'bad']
Categories (3, str): ['bad' < 'medium' < 'good']
ordered=False will result in unordered categories when labels are passed. This parameter can be used to allow non-unique labels:
>>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, labels=["B", "A", "B"], ordered=False)
['B', 'B', 'A', 'A', 'B', 'B']
Categories (2, str): ['A', 'B']
labels=False implies you just want the bins back.
>>> pd.cut([0, 1, 1, 2], bins=4, labels=False)
array([0, 1, 1, 3])
Passing a Series as an input returns a Series with categorical dtype:
>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, 3)
...
a (1.992, 4.667]
b (1.992, 4.667]
c (4.667, 7.333]
d (7.333, 10.0]
e (7.333, 10.0]
dtype: category
Categories (3, interval[float64, right]): [(1.992, 4.667] < (4.667, ...
Passing a Series as input returns a Series with the mapped values. It can be used to map values numerically to intervals based on bins.
>>> s = pd.Series(np.array([2, 4, 6, 8, 10]), index=["a", "b", "c", "d", "e"])
>>> pd.cut(s, [0, 2, 4, 6, 8, 10], labels=False, retbins=True, right=False)
...
(a 1.0
b 2.0
c 3.0
d 4.0
e NaN
dtype: float64,
array([ 0, 2, 4, 6, 8, 10]))
Use the duplicates="drop" option when the bins are not unique:
>>> pd.cut(
... s,
... [0, 2, 4, 6, 10, 10],
... labels=False,
... retbins=True,
... right=False,
... duplicates="drop",
... )
...
(a 1.0
b 2.0
c 3.0
d 3.0
e NaN
dtype: float64,
array([ 0, 2, 4, 6, 10]))
Passing an IntervalIndex for bins results in those categories exactly. Notice that values not covered by the IntervalIndex are set to NaN. 0 is to the left of the first bin (which is closed on the right), and 1.5 falls between two bins.
>>> bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
>>> pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
[NaN, (0.0, 1.0], NaN, (2.0, 3.0], (4.0, 5.0]]
Categories (3, interval[int64, right]): [(0, 1] < (2, 3] < (4, 5]]
Using np.histogram_bin_edges with cut
>>> pd.cut(
... np.array([1, 7, 5, 4]),
... bins=np.histogram_bin_edges(np.array([1, 7, 5, 4]), bins="auto"),
... )
...
[NaN, (5.0, 7.0], (3.0, 5.0], (3.0, 5.0]]
Categories (3, interval[float64, right]): [(1.0, 3.0] < (3.0, 5.0] < (5.0, 7.0]]
>>> pd.qcut(range(5), 4)
...
[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
Categories (4, interval[float64, right]): [(-0.001, 1.0] < (1.0, 2.0] ...
>>> pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
...
[good, good, medium, bad, bad]
Categories (3, str): [good < medium < bad]
>>> pd.qcut(range(5), 4, labels=False)
array([0, 0, 1, 2, 3])
>>> df1 = pd.DataFrame(
... {"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 5]}
... )
>>> df2 = pd.DataFrame(
... {"rkey": ["foo", "bar", "baz", "foo"], "value": [5, 6, 7, 8]}
... )
>>> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
Merge df1 and df2 on the lkey and rkey columns. The value columns have the default suffixes, _x and _y, appended.
>>> df1.merge(df2, left_on="lkey", right_on="rkey")
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 bar 2 bar 6
3 baz 3 baz 7
4 foo 5 foo 5
5 foo 5 foo 8
Merge DataFrames df1 and df2 with specified left and right suffixes appended to any overlapping columns.
>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=("_left", "_right"))
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 bar 2 bar 6
3 baz 3 baz 7
4 foo 5 foo 5
5 foo 5 foo 8
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have any overlapping columns.
>>> df1.merge(df2, left_on="lkey", right_on="rkey", suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='str')
>>> df1 = pd.DataFrame({"a": ["foo", "bar"], "b": [1, 2]})
>>> df2 = pd.DataFrame({"a": ["foo", "baz"], "c": [3, 4]})
>>> df1
a b
0 foo 1
1 bar 2
>>> df2
a c
0 foo 3
1 baz 4
>>> df1.merge(df2, how="inner", on="a")
a b c
0 foo 1 3
>>> df1.merge(df2, how="left", on="a")
a b c
0 foo 1 3.0
1 bar 2 NaN
>>> df1 = pd.DataFrame({"left": ["foo", "bar"]})
>>> df2 = pd.DataFrame({"right": [7, 8]})
>>> df1
left
0 foo
1 bar
>>> df2
right
0 7
1 8
>>> df1.merge(df2, how="cross")
left right
0 foo 7
1 foo 8
2 bar 7
3 bar 8
>>> from pandas import merge_ordered
>>> df1 = pd.DataFrame(
... {
... "key": ["a", "c", "e", "a", "c", "e"],
... "lvalue": [1, 2, 3, 1, 2, 3],
... "group": ["a", "a", "a", "b", "b", "b"],
... }
... )
>>> df1
key lvalue group
0 a 1 a
1 c 2 a
2 e 3 a
3 a 1 b
4 c 2 b
5 e 3 b
>>> df2 = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [1, 2, 3]})
>>> df2
key rvalue
0 b 1
1 c 2
2 d 3
>>> merge_ordered(df1, df2, fill_method="ffill", left_by="group")
key lvalue group rvalue
0 a 1 a NaN
1 b 1 a 1.0
2 c 2 a 2.0
3 d 2 a 3.0
4 e 3 a 3.0
5 a 1 b NaN
6 b 1 b 1.0
7 c 2 b 2.0
8 d 2 b 3.0
9 e 3 b 3.0
>>> left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
>>> left
a left_val
0 1 a
1 5 b
2 10 c
>>> right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})
>>> right
a right_val
0 1 1
1 2 2
2 3 3
3 6 6
4 7 7
>>> pd.merge_asof(left, right, on="a")
a left_val right_val
0 1 a 1
1 5 b 3
2 10 c 7
>>> pd.merge_asof(left, right, on="a", allow_exact_matches=False)
a left_val right_val
0 1 a NaN
1 5 b 3.0
2 10 c 7.0
>>> pd.merge_asof(left, right, on="a", direction="forward")
a left_val right_val
0 1 a 1.0
1 5 b 6.0
2 10 c NaN
>>> pd.merge_asof(left, right, on="a", direction="nearest")
a left_val right_val
0 1 a 1
1 5 b 6
2 10 c 7
We can use indexed DataFrames as well.
>>> left = pd.DataFrame({"left_val": ["a", "b", "c"]}, index=[1, 5, 10])
>>> left
left_val
1 a
5 b
10 c
>>> right = pd.DataFrame({"right_val": [1, 2, 3, 6, 7]}, index=[1, 2, 3, 6, 7])
>>> right
right_val
1 1
2 2
3 3
6 6
7 7
>>> pd.merge_asof(left, right, left_index=True, right_index=True)
left_val right_val
1 a 1
5 b 3
10 c 7
Here is a real-world time-series example:
>>> quotes = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.030"),
... pd.Timestamp("2016-05-25 13:30:00.041"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.049"),
... pd.Timestamp("2016-05-25 13:30:00.072"),
... pd.Timestamp("2016-05-25 13:30:00.075"),
... ],
... "ticker": [
... "GOOG",
... "MSFT",
... "MSFT",
... "MSFT",
... "GOOG",
... "AAPL",
... "GOOG",
... "MSFT",
... ],
... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
... )
>>> quotes
time ticker bid ask
0 2016-05-25 13:30:00.023 GOOG 720.50 720.93
1 2016-05-25 13:30:00.023 MSFT 51.95 51.96
2 2016-05-25 13:30:00.030 MSFT 51.97 51.98
3 2016-05-25 13:30:00.041 MSFT 51.99 52.00
4 2016-05-25 13:30:00.048 GOOG 720.50 720.93
5 2016-05-25 13:30:00.049 AAPL 97.99 98.01
6 2016-05-25 13:30:00.072 GOOG 720.50 720.88
7 2016-05-25 13:30:00.075 MSFT 52.01 52.03
>>> trades = pd.DataFrame(
... {
... "time": [
... pd.Timestamp("2016-05-25 13:30:00.023"),
... pd.Timestamp("2016-05-25 13:30:00.038"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... pd.Timestamp("2016-05-25 13:30:00.048"),
... ],
... "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
... "price": [51.95, 51.95, 720.77, 720.92, 98.0],
... "quantity": [75, 155, 100, 100, 100],
... }
... )
>>> trades
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 GOOG 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
By default we take the asof of the quotes:
>>> pd.merge_asof(trades, quotes, on="time", by="ticker")
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only merge asof within 2ms between the quote time and the trade time:
>>> pd.merge_asof(
... trades, quotes, on="time", by="ticker", tolerance=pd.Timedelta("2ms")
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN
2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
We only merge asof within 10ms between the quote time and the trade time, and we exclude exact matches on time. However, prior data will propagate forward:
>>> pd.merge_asof(
... trades,
... quotes,
... on="time",
... by="ticker",
... tolerance=pd.Timedelta("10ms"),
... allow_exact_matches=False,
... )
time ticker price quantity bid ask
0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN
1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN
3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN
4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN
Combine two Series.
>>> s1 = pd.Series(["a", "b"])
>>> s2 = pd.Series(["c", "d"])
>>> pd.concat([s1, s2])
0 a
1 b
0 c
1 d
dtype: str
Clear the existing index and reset it in the result by setting the ignore_index option to True.
>>> pd.concat([s1, s2], ignore_index=True)
0 a
1 b
2 c
3 d
dtype: str
Add a hierarchical index at the outermost level of the data with the keys option.
>>> pd.concat([s1, s2], keys=["s1", "s2"])
s1 0 a
1 b
s2 0 c
1 d
dtype: str
Label the index keys you create with the names option.
>>> pd.concat([s1, s2], keys=["s1", "s2"], names=["Series name", "Row ID"])
Series name Row ID
s1 0 a
1 b
s2 0 c
1 d
dtype: str
Combine two DataFrame objects with identical columns.
>>> df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
>>> df1
letter number
0 a 1
1 b 2
>>> df2 = pd.DataFrame([["c", 3], ["d", 4]], columns=["letter", "number"])
>>> df2
letter number
0 c 3
1 d 4
>>> pd.concat([df1, df2])
letter number
0 a 1
1 b 2
0 c 3
1 d 4
Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with NaN values.
>>> df3 = pd.DataFrame(
... [["c", 3, "cat"], ["d", 4, "dog"]], columns=["letter", "number", "animal"]
... )
>>> df3
letter number animal
0 c 3 cat
1 d 4 dog
>>> pd.concat([df1, df3], sort=False)
letter number animal
0 a 1 NaN
1 b 2 NaN
0 c 3 cat
1 d 4 dog
Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.
>>> pd.concat([df1, df3], join="inner")
letter number
0 a 1
1 b 2
0 c 3
1 d 4
Combine DataFrame objects horizontally along the x axis by passing in axis=1.
>>> df4 = pd.DataFrame(
... [["bird", "polly"], ["monkey", "george"]], columns=["animal", "name"]
... )
>>> pd.concat([df1, df4], axis=1)
letter number animal name
0 a 1 bird polly
1 b 2 monkey george
Prevent the result from including duplicate index values with the verify_integrity option.
>>> df5 = pd.DataFrame([1], index=["a"])
>>> df5
0
a 1
>>> df6 = pd.DataFrame([2], index=["a"])
>>> df6
0
a 2
>>> pd.concat([df5, df6], verify_integrity=True)
Traceback (most recent call last):
...
ValueError: Indexes have overlapping values: ['a']
Append a single row to the end of a DataFrame object.
>>> df7 = pd.DataFrame({"a": 1, "b": 2}, index=[0])
>>> df7
a b
0 1 2
>>> new_row = pd.Series({"a": 3, "b": 4})
>>> new_row
a 3
b 4
dtype: int64
>>> pd.concat([df7, new_row.to_frame().T], ignore_index=True)
a b
0 1 2
1 3 4
>>> s = pd.Series(list("abca"))
>>> pd.get_dummies(s)
a b c
0 True False False
1 False True False
2 False False True
3 True False False
>>> s1 = ["a", "b", np.nan]
>>> pd.get_dummies(s1)
a b
0 True False
1 False True
2 False False
>>> pd.get_dummies(s1, dummy_na=True)
a b NaN
0 True False False
1 False True False
2 False False True
>>> df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["b", "a", "c"], "C": [1, 2, 3]})
>>> pd.get_dummies(df, prefix=["col1", "col2"])
C col1_a col1_b col2_a col2_b col2_c
0 1 True False False True False
1 2 False True True False False
2 3 True False False False True
>>> pd.get_dummies(pd.Series(list("abcaa")))
a b c
0 True False False
1 False True False
2 False False True
3 True False False
4 True False False
>>> pd.get_dummies(pd.Series(list("abcaa")), drop_first=True)
b c
0 False False
1 True False
2 False True
3 False False
4 False False
>>> pd.get_dummies(pd.Series(list("abc")), dtype=float)
a b c
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
>>> df = pd.DataFrame({"a": [1, 0, 0, 1], "b": [0, 1, 0, 0], "c": [0, 0, 1, 0]})
>>> df
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
>>> pd.from_dummies(df)
0 a
1 b
2 c
3 a
>>> df = pd.DataFrame(
... {
... "col1_a": [1, 0, 1],
... "col1_b": [0, 1, 0],
... "col2_a": [0, 1, 0],
... "col2_b": [1, 0, 0],
... "col2_c": [0, 0, 1],
... }
... )
>>> df
col1_a col1_b col2_a col2_b col2_c
0 1 0 0 1 0
1 0 1 1 0 0
2 1 0 0 0 1
>>> pd.from_dummies(df, sep="_")
col1 col2
0 a b
1 b a
2 a c
>>> df = pd.DataFrame(
... {
... "col1_a": [1, 0, 0],
... "col1_b": [0, 1, 0],
... "col2_a": [0, 1, 0],
... "col2_b": [1, 0, 0],
... "col2_c": [0, 0, 0],
... }
... )
>>> df
col1_a col1_b col2_a col2_b col2_c
0 1 0 0 1 0
1 0 1 1 0 0
2 0 0 0 0 0
>>> pd.from_dummies(df, sep="_", default_category={"col1": "d", "col2": "e"})
col1 col2
0 a b
1 b a
2 d e
These examples all show factorize as a top-level method like pd.factorize(values). The results are identical for methods like Series.factorize().
>>> codes, uniques = pd.factorize(np.array(["b", "b", "a", "c", "b"], dtype="O"))
>>> codes
array([0, 0, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With sort=True, the uniques will be sorted, and codes will be shuffled so that the relationship is maintained.
>>> codes, uniques = pd.factorize(
... np.array(["b", "b", "a", "c", "b"], dtype="O"), sort=True
... )
>>> codes
array([1, 1, 0, 2, 1])
>>> uniques
array(['a', 'b', 'c'], dtype=object)
When use_na_sentinel=True (the default), missing values are indicated in the codes with the sentinel value -1 and missing values are not included in uniques.
>>> codes, uniques = pd.factorize(np.array(["b", None, "a", "c", "b"], dtype="O"))
>>> codes
array([ 0, -1, 1, 2, 0])
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.
>>> cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
['a', 'c']
Categories (3, str): ['a', 'b', 'c']
Notice that 'b' is in uniques.categories, despite not being present in cat.values.
For all other pandas objects, an Index of the appropriate type is returned.
>>> cat = pd.Series(["a", "a", "c"])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1])
>>> uniques
Index(['a', 'c'], dtype='str')
If NaN is in the values, and we want to include NaN in the uniques of the values, it can be achieved by setting use_na_sentinel=False.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: use_na_sentinel=True
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, use_na_sentinel=False)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
>>> pd.unique(pd.Series([2, 1, 3, 3]))
array([2, 1, 3])
>>> pd.unique(pd.Series([2] + [1] * 5))
array([2, 1])
>>> pd.unique(pd.Series([pd.Timestamp("20160101"), pd.Timestamp("20160101")]))
array(['2016-01-01T00:00:00.000000'], dtype='datetime64[us]')
>>> pd.unique(
... pd.Series(
... [
... pd.Timestamp("20160101", tz="US/Eastern"),
... pd.Timestamp("20160101", tz="US/Eastern"),
... ],
... dtype="M8[ns, US/Eastern]",
... )
... )
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]
>>> pd.unique(
... pd.Index(
... [
... pd.Timestamp("20160101", tz="US/Eastern"),
... pd.Timestamp("20160101", tz="US/Eastern"),
... ],
... dtype="M8[ns, US/Eastern]",
... )
... )
DatetimeIndex(['2016-01-01 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]',
freq=None)
>>> pd.unique(np.array(list("baabc"), dtype="O"))
array(['b', 'a', 'c'], dtype=object)
An unordered Categorical will return categories in the order of appearance.
>>> pd.unique(pd.Series(pd.Categorical(list("baabc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']
>>> pd.unique(pd.Series(pd.Categorical(list("baabc"), categories=list("abc"))))
['b', 'a', 'c']
Categories (3, str): ['a', 'b', 'c']
An ordered Categorical preserves the category ordering.
>>> pd.unique(
... pd.Series(
... pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
... )
... )
['b', 'a', 'c']
Categories (3, str): ['a' < 'b' < 'c']
An array of tuples
>>> pd.unique(pd.Series([("a", "b"), ("b", "a"), ("a", "c"), ("b", "a")]).values)
array([('a', 'b'), ('b', 'a'), ('a', 'c')], dtype=object)
A NumpyExtensionArray of complex
>>> pd.unique(pd.array([1 + 1j, 2, 3]))
<NumpyExtensionArray>
[(1+1j), (2+0j), (3+0j)]
Length: 3, dtype: complex128
>>> data = pd.DataFrame(
... {
... "hr1": [514, 573],
... "hr2": [545, 526],
... "team": ["Red Sox", "Yankees"],
... "year1": [2007, 2007],
... "year2": [2008, 2008],
... }
... )
>>> data
hr1 hr2 team year1 year2
0 514 545 Red Sox 2007 2008
1 573 526 Yankees 2007 2008
>>> pd.lreshape(data, {"year": ["year1", "year2"], "hr": ["hr1", "hr2"]})
team year hr
0 Red Sox 2007 514
1 Yankees 2007 573
2 Red Sox 2008 545
3 Yankees 2008 526
>>> np.random.seed(123)
>>> df = pd.DataFrame(
... {
... "A1970": {0: "a", 1: "b", 2: "c"},
... "A1980": {0: "d", 1: "e", 2: "f"},
... "B1970": {0: 2.5, 1: 1.2, 2: 0.7},
... "B1980": {0: 3.2, 1: 1.3, 2: 0.1},
... "X": dict(zip(range(3), np.random.randn(3), strict=True)),
... }
... )
>>> df["id"] = df.index
>>> df
A1970 A1980 B1970 B1980 X id
0 a d 2.5 3.2 -1.085631 0
1 b e 1.2 1.3 0.997345 1
2 c f 0.7 0.1 0.282978 2
>>> pd.wide_to_long(df, ["A", "B"], i="id", j="year")
...
X A B
id year
0 1970 -1.085631 a 2.5
1 1970 0.997345 b 1.2
2 1970 0.282978 c 0.7
0 1980 -1.085631 d 3.2
1 1980 0.997345 e 1.3
2 1980 0.282978 f 0.1
With multiple id columns
>>> df = pd.DataFrame(
... {
... "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
... "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
... "ht1": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
... "ht2": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
... }
... )
>>> df
famid birth ht1 ht2
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
>>> long_format = pd.wide_to_long(df, stubnames="ht", i=["famid", "birth"], j="age")
>>> long_format
...
ht
famid birth age
1 1 1 2.8
2 3.4
2 1 2.9
2 3.8
3 1 2.2
2 2.9
2 1 1 2.0
2 3.2
2 1 1.8
2 2.8
3 1 1.9
2 2.4
3 1 1 2.2
2 3.3
2 1 2.3
2 3.4
3 1 2.1
2 2.9
Going from long back to wide just takes some creative use of unstack
>>> wide_format = long_format.unstack()
>>> wide_format.columns = wide_format.columns.map("{0[0]}{0[1]}".format)
>>> wide_format.reset_index()
famid birth ht1 ht2
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
Less wieldy column names are also handled
>>> np.random.seed(0)
>>> df = pd.DataFrame(
... {
... "A(weekly)-2010": np.random.rand(3),
... "A(weekly)-2011": np.random.rand(3),
... "B(weekly)-2010": np.random.rand(3),
... "B(weekly)-2011": np.random.rand(3),
... "X": np.random.randint(3, size=3),
... }
... )
>>> df["id"] = df.index
>>> df
A(weekly)-2010 A(weekly)-2011 B(weekly)-2010 B(weekly)-2011 X id
0 0.548814 0.544883 0.437587 0.383442 0 0
1 0.715189 0.423655 0.891773 0.791725 1 1
2 0.602763 0.645894 0.963663 0.528895 1 2
>>> pd.wide_to_long(df, ["A(weekly)", "B(weekly)"], i="id", j="year", sep="-")
...
X A(weekly) B(weekly)
id year
0 2010 0 0.548814 0.437587
1 2010 1 0.715189 0.891773
2 2010 1 0.602763 0.963663
0 2011 0 0.544883 0.383442
1 2011 1 0.423655 0.791725
2 2011 1 0.645894 0.528895
If we have many columns, we could also use a regex to find our stubnames and pass that list on to wide_to_long
>>> stubnames = sorted(
... set(
... [
... match[0]
... for match in df.columns.str.findall(r"[A-B]\(.*\)").values
... if match != []
... ]
... )
... )
>>> list(stubnames)
['A(weekly)', 'B(weekly)']
All of the above examples have integers as suffixes. It is possible to have non-integers as suffixes.
>>> df = pd.DataFrame(
... {
... "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
... "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
... "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
... "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
... }
... )
>>> df
famid birth ht_one ht_two
0 1 1 2.8 3.4
1 1 2 2.9 3.8
2 1 3 2.2 2.9
3 2 1 2.0 3.2
4 2 2 1.8 2.8
5 2 3 1.9 2.4
6 3 1 2.2 3.3
7 3 2 2.3 3.4
8 3 3 2.1 2.9
>>> long_format = pd.wide_to_long(
... df, stubnames="ht", i=["famid", "birth"], j="age", sep="_", suffix=r"\w+"
... )
>>> long_format
...
ht
famid birth age
1 1 one 2.8
two 3.4
2 one 2.9
two 3.8
3 one 2.2
two 2.9
2 1 one 2.0
two 3.2
2 one 1.8
two 2.8
3 one 1.9
two 2.4
3 1 one 2.2
two 3.3
2 one 2.3
two 3.4
3 one 2.1
two 2.9
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.isna("dog")
False
>>> pd.isna(pd.NA)
True
>>> pd.isna(np.nan)
True
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.isna(array)
array([[False, True, False],
[False, False, True]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[us]', freq=None)
>>> pd.isna(index)
array([False, False, True, False])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
0 1 2
0 ant bee cat
1 dog NaN fly
>>> pd.isna(df)
0 1 2
0 False False False
1 False True False
>>> pd.isna(df[1])
0 False
1 True
Name: 1, dtype: bool
Scalar arguments (including strings) result in a scalar boolean.
>>> pd.notna("dog")
True
>>> pd.notna(pd.NA)
False
>>> pd.notna(np.nan)
False
ndarrays result in an ndarray of booleans.
>>> array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
>>> array
array([[ 1., nan, 3.],
[ 4., 5., nan]])
>>> pd.notna(array)
array([[ True, False, True],
[ True, True, False]])
For indexes, an ndarray of booleans is returned.
>>> index = pd.DatetimeIndex(["2017-07-05", "2017-07-06", None, "2017-07-08"])
>>> index
DatetimeIndex(['2017-07-05', '2017-07-06', 'NaT', '2017-07-08'],
dtype='datetime64[us]', freq=None)
>>> pd.notna(index)
array([ True, True, False, True])
For Series and DataFrame, the same type is returned, containing booleans.
>>> df = pd.DataFrame([["ant", "bee", "cat"], ["dog", None, "fly"]])
>>> df
0 1 2
0 ant bee cat
1 dog NaN fly
>>> pd.notna(df)
0 1 2
0 True True True
1 True False True
>>> pd.notna(df[1])
0 True
1 False
Name: 1, dtype: bool
| Function | Description |
|---|---|
| Series([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
Axes
| Function | Description |
|---|---|
| Series.index | The index (axis labels) of the Series. |
| Series.array | The ExtensionArray of the data backing this Series or Index. |
| Series.values | Return Series as ndarray or ndarray-like depending on the dtype. |
| Series.dtype | Return the dtype object of the underlying data. |
| Series.info([verbose, buf, max_cols, ...]) | Print a concise summary of a Series. |
| Series.shape | Return a tuple of the shape of the underlying data. |
| Series.nbytes | Return the number of bytes in the underlying data. |
| Series.ndim | Number of dimensions of the underlying data, by definition 1. |
| Series.size | Return the number of elements in the underlying data. |
| Series.T | Return the transpose, which is by definition self. |
| Series.memory_usage([index, deep]) | Return the memory usage of the Series. |
| Series.hasnans | Return True if there are any NaNs. |
| Series.empty | Indicator whether Series/DataFrame is empty. |
| Series.dtypes | Return the dtype object of the underlying data. |
| Series.name | Return the name of the Series. |
| Series.flags | Get the properties associated with this pandas object. |
| Series.set_flags(*[, copy, ...]) | Return a new object with updated flags. |
| Function | Description |
|---|---|
| Series.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype dtype. |
| Series.convert_dtypes([infer_objects, ...]) | Convert columns from numpy dtypes to the best dtypes that support pd.NA. |
| Series.infer_objects([copy]) | Attempt to infer better dtypes for object columns. |
| Series.copy([deep]) | Make a copy of this object's indices and data. |
| Series.to_numpy([dtype, copy, na_value]) | A NumPy ndarray representing the values in this Series or Index. |
| Series.to_period([freq, copy]) | Convert Series from DatetimeIndex to PeriodIndex. |
| Series.to_timestamp([freq, how, copy]) | Cast to DatetimeIndex of Timestamps, at beginning of period. |
| Series.to_list() | Return a list of the values. |
| Series.__array__([dtype, copy]) | Return the values as a NumPy array. |
| Function | Description |
|---|---|
| Series.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
| Series.at | Access a single value for a row/column label pair. |
| Series.iat | Access a single value for a row/column pair by integer position. |
| Series.loc | Access a group of rows and columns by label(s) or a boolean array. |
| Series.iloc | Purely integer-location based indexing for selection by position. |
| Series.__iter__() | Return an iterator of the values. |
| Series.items() | Lazily iterate over (index, value) tuples. |
| Series.keys() | Return alias for index. |
| Series.pop(item) | Return item and drop it from the series. |
| Series.item() | Return the first element of the underlying data as a Python scalar. |
| Series.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
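As a quick illustrative sketch of label-based versus position-based access (the Series and its values are invented for this example; exact scalar formatting can vary slightly across pandas/NumPy versions):
>>> s = pd.Series([10, 20, 30], index=["a", "b", "c"])
>>> s.loc["b"]  # label-based
20
>>> s.iloc[-1]  # position-based
30
>>> s.at["a"]  # fast scalar access by label
10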
| Function | Description |
|---|---|
| Series.add(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator add). |
| Series.sub(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator sub). |
| Series.mul(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator mul). |
| Series.div(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| Series.truediv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| Series.floordiv(other[, level, fill_value, axis]) | Return Integer division of series and other, element-wise (binary operator floordiv). |
| Series.mod(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator mod). |
| Series.pow(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator pow). |
| Series.radd(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator radd). |
| Series.rsub(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator rsub). |
| Series.rmul(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator rmul). |
| Series.rdiv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| Series.rtruediv(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| Series.rfloordiv(other[, level, fill_value, ...]) | Return Integer division of series and other, element-wise (binary operator rfloordiv). |
| Series.rmod(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator rmod). |
| Series.rpow(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator rpow). |
| Series.combine(other, func[, fill_value]) | Combine the Series with a Series or scalar according to func. |
| Series.combine_first(other) | Update null elements with value in the same location in 'other'. |
| Series.round([decimals]) | Round each value in a Series to the given number of decimals. |
| Series.lt(other[, level, fill_value, axis]) | Return Less than of series and other, element-wise (binary operator lt). |
| Series.gt(other[, level, fill_value, axis]) | Return Greater than of series and other, element-wise (binary operator gt). |
| Series.le(other[, level, fill_value, axis]) | Return Less than or equal to of series and other, element-wise (binary operator le). |
| Series.ge(other[, level, fill_value, axis]) | Return Greater than or equal to of series and other, element-wise (binary operator ge). |
| Series.ne(other[, level, fill_value, axis]) | Return Not equal to of series and other, element-wise (binary operator ne). |
| Series.eq(other[, level, fill_value, axis]) | Return Equal to of series and other, element-wise (binary operator eq). |
| Series.product(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| Series.dot(other) | Compute the dot product between the Series and the columns of other. |
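A minimal sketch of how fill_value changes aligned arithmetic (invented data; a missing position on one side is treated as 0 before adding, while positions missing on both sides stay missing):
>>> a = pd.Series([1, 2, np.nan], index=["x", "y", "z"])
>>> b = pd.Series([10, np.nan, 30], index=["x", "y", "z"])
>>> a.add(b, fill_value=0)
x    11.0
y     2.0
z    30.0
dtype: float64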
| Function | Description |
|---|---|
| Series.apply(func[, args, by_row]) | Invoke function on values of Series. |
| Series.agg([func, axis]) | Aggregate using one or more operations over the specified axis. |
| Series.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. |
| Series.transform(func[, axis]) | Call func on self producing a Series with the same axis shape as self. |
| Series.map([func, na_action, engine]) | Map values of Series according to an input mapping or function. |
| Series.groupby([by, level, as_index, sort, ...]) | Group Series using a mapper or by a Series of columns. |
| Series.rolling(window[, min_periods, ...]) | Provide rolling window calculations. |
| Series.expanding([min_periods, method]) | Provide expanding window calculations. |
| Series.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. |
| Series.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. |
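A short sketch contrasting apply and map on invented data; both accept a callable over the values:
>>> s = pd.Series([1, 2, 3])
>>> s.apply(lambda x: x ** 2)
0    1
1    4
2    9
dtype: int64
>>> s.map(lambda x: x * 10)
0    10
1    20
2    30
dtype: int64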
| Function | Description |
|---|---|
| Series.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
| Series.all(*[, axis, bool_only, skipna]) | Return whether all elements are True, potentially over an axis. |
| Series.any(*[, axis, bool_only, skipna]) | Return whether any element is True, potentially over an axis. |
| Series.autocorr([lag]) | Compute the lag-N autocorrelation. |
| Series.between(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
| Series.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| Series.corr(other[, method, min_periods]) | Compute correlation with other Series, excluding missing values. |
| Series.count() | Return number of non-NA/null observations in the Series. |
| Series.cov(other[, min_periods, ddof]) | Compute covariance with Series, excluding missing values. |
| Series.cummax([axis, skipna]) | Return cumulative maximum over a Series. |
| Series.cummin([axis, skipna]) | Return cumulative minimum over a Series. |
| Series.cumprod([axis, skipna]) | Return cumulative product over a Series. |
| Series.cumsum([axis, skipna]) | Return cumulative sum over a Series. |
| Series.describe([percentiles, include, exclude]) | Generate descriptive statistics. |
| Series.diff([periods]) | First discrete difference of Series elements. |
| Series.factorize([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| Series.kurt(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| Series.max(*[, axis, skipna, numeric_only]) | Return the maximum of the values over the requested axis. |
| Series.mean(*[, axis, skipna, numeric_only]) | Return the mean of the values over the requested axis. |
| Series.median(*[, axis, skipna, numeric_only]) | Return the median of the values over the requested axis. |
| Series.min(*[, axis, skipna, numeric_only]) | Return the minimum of the values over the requested axis. |
| Series.mode([dropna]) | Return the mode(s) of the Series. |
| Series.nlargest([n, keep]) | Return the largest n elements. |
| Series.nsmallest([n, keep]) | Return the smallest n elements. |
| Series.pct_change([periods, fill_method, freq]) | Fractional change between the current and a prior element. |
| Series.prod(*[, axis, skipna, numeric_only, ...]) | Return the product of the values over the requested axis. |
| Series.quantile([q, interpolation]) | Return value at the given quantile. |
| Series.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| Series.sem(*[, axis, skipna, ddof, numeric_only]) | Return unbiased standard error of the mean over requested axis. |
| Series.skew(*[, axis, skipna, numeric_only]) | Return unbiased skew over requested axis. |
| Series.std(*[, axis, skipna, ddof, numeric_only]) | Return sample standard deviation. |
| Series.sum(*[, axis, skipna, numeric_only, ...]) | Return the sum of the values over the requested axis. |
| Series.var(*[, axis, skipna, ddof, numeric_only]) | Return unbiased variance over requested axis. |
| Series.kurtosis(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| Series.unique() | Return unique values of Series object. |
| Series.nunique([dropna]) | Return number of unique elements in the object. |
| Series.is_unique | Return True if values in the object are unique. |
| Series.is_monotonic_increasing | Return True if values in the object are monotonically increasing. |
| Series.is_monotonic_decreasing | Return True if values in the object are monotonically decreasing. |
| Series.value_counts([normalize, sort, ...]) | Return a Series containing counts of unique values. |
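For illustration, value_counts on an invented Series (since pandas 2.0 the resulting Series is named "count"):
>>> s = pd.Series(["a", "b", "a", "c", "a"])
>>> s.value_counts()
a    3
b    1
c    1
Name: count, dtype: int64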
| Function | Description |
|---|---|
| Series.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| Series.case_when(caselist) | Replace values where the conditions are True. |
| Series.drop([labels, axis, index, columns, ...]) | Return Series with specified index labels removed. |
| Series.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| Series.drop_duplicates(*[, keep, inplace, ...]) | Return Series with duplicate values removed. |
| Series.duplicated([keep]) | Indicate duplicate Series values. |
| Series.equals(other) | Test whether two objects contain the same elements. |
| Series.head([n]) | Return the first n rows. |
| Series.idxmax([axis, skipna]) | Return the row label of the maximum value. |
| Series.idxmin([axis, skipna]) | Return the row label of the minimum value. |
| Series.isin(values) | Whether elements in Series are contained in values. |
| Series.reindex([index, axis, method, copy, ...]) | Conform Series to new index with optional filling logic. |
| Series.reindex_like(other[, method, copy, ...]) | Return an object with matching indices as other object. |
| Series.rename([index, axis, copy, inplace, ...]) | Alter Series index labels or name. |
| Series.rename_axis([mapper, index, axis, ...]) | Set the name of the axis for the index. |
| Series.reset_index([level, drop, name, ...]) | Generate a new DataFrame or Series with the index reset. |
| Series.sample([n, frac, replace, weights, ...]) | Return a random sample of items from an axis of object. |
| Series.set_axis(labels, *[, axis, copy]) | (DEPRECATED) Assign desired index to given axis. |
| Series.take(indices[, axis]) | Return the elements in the given positional indices along an axis. |
| Series.tail([n]) | Return the last n rows. |
| Series.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
| Series.where(cond[, other, inplace, axis, level]) | Replace values where the condition is False. |
| Series.mask(cond[, other, inplace, axis, level]) | Replace values where the condition is True. |
| Series.add_prefix(prefix[, axis]) | Prefix labels with string prefix. |
| Series.add_suffix(suffix[, axis]) | Suffix labels with string suffix. |
| Series.filter([items, like, regex, axis]) | Subset the DataFrame or Series according to the specified index labels. |
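A minimal reindexing sketch (invented labels; a label absent from the original is introduced as NaN, which upcasts the dtype to float64):
>>> s = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> s.reindex(["c", "b", "d"])
c    3.0
b    2.0
d    NaN
dtype: float64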
| Function | Description |
|---|---|
| Series.bfill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by using the next valid observation to fill the gap. |
| Series.dropna(*[, axis, inplace, how, ...]) | Return a new Series with missing values removed. |
| Series.ffill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by propagating the last valid observation to next valid. |
| Series.fillna(value, *[, axis, inplace, limit]) | Fill NA/NaN values with value. |
| Series.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. |
| Series.isna() | Detect missing values. |
| Series.isnull() | Series.isnull is an alias for Series.isna. |
| Series.notna() | Detect existing (non-missing) values. |
| Series.notnull() | Series.notnull is an alias for Series.notna. |
| Series.replace([to_replace, value, inplace, ...]) | Replace values given in to_replace with value. |
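A small sketch of value-based versus propagation-based filling on invented data:
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s.fillna(0)
0    1.0
1    0.0
2    3.0
dtype: float64
>>> s.ffill()
0    1.0
1    1.0
2    3.0
dtype: float64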
| Function | Description |
|---|---|
| Series.argsort([axis, kind, order, stable]) | Return the integer indices that would sort the Series values. |
| Series.argmin([axis, skipna]) | Return int position of the smallest value in the Series. |
| Series.argmax([axis, skipna]) | Return int position of the largest value in the Series. |
| Series.reorder_levels(order) | Rearrange index levels using input order. |
| Series.sort_values(*[, axis, ascending, ...]) | Sort by the values. |
| Series.sort_index(*[, axis, level, ...]) | Sort Series by index labels. |
| Series.swaplevel([i, j, copy]) | Swap levels i and j in a MultiIndex. |
| Series.unstack([level, fill_value, sort]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
| Series.explode([ignore_index]) | Transform each element of a list-like to a row. |
| Series.searchsorted(value[, side, sorter]) | Find indices where elements should be inserted to maintain order. |
| Series.repeat(repeats[, axis]) | Repeat elements of a Series. |
| Series.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. |
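As an illustrative sketch of explode on invented data (empty lists become NaN and index labels repeat for each list element):
>>> s = pd.Series([[1, 2], [3], []])
>>> s.explode()
0      1
0      2
1      3
2    NaN
dtype: object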
| Function | Description |
|---|---|
| Series.compare(other[, align_axis, ...]) | Compare to another Series and show the differences. |
| Series.update(other) | Modify Series in place using values from passed Series. |
| Function | Description |
|---|---|
| Series.asfreq(freq[, method, how, ...]) | Convert time series to specified frequency. |
| Series.asof(where[, subset]) | Return the last row(s) without any NaNs before where. |
| Series.shift([periods, freq, axis, ...]) | Shift index by desired number of periods with an optional time freq. |
| Series.first_valid_index() | Return index for first non-missing value or None, if no value is found. |
| Series.last_valid_index() | Return index for last non-missing value or None, if no value is found. |
| Series.resample(rule[, closed, label, ...]) | Resample time-series data. |
| Series.tz_convert(tz[, axis, level, copy]) | Convert tz-aware axis to target time zone. |
| Series.tz_localize(tz[, axis, level, copy, ...]) | Localize time zone naive index of a Series or DataFrame to target time zone. |
| Series.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| Series.between_time(start_time, end_time[, ...]) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
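A minimal time-series sketch using shift on an invented daily index (shifting introduces NaN, upcasting to float64):
>>> s = pd.Series([1, 2, 3], index=pd.date_range("2024-01-01", periods=3))
>>> s.shift(1)
2024-01-01    NaN
2024-01-02    1.0
2024-01-03    2.0
Freq: D, dtype: float64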
pandas provides dtype-specific methods under various accessors. These are separate namespaces within Series that only apply to specific data types.
| Function | Description |
|---|---|
| Series.str | alias of StringMethods |
| Series.cat | alias of CategoricalAccessor |
| Series.dt | alias of CombinedDatetimelikeProperties |
| Series.sparse | alias of SparseAccessor |
| DataFrame.sparse | alias of SparseFrameAccessor |
| Index.str | alias of StringMethods |
| Data Type | Accessor |
|---|---|
| Datetime, Timedelta, Period | dt |
| String | str |
| Categorical | cat |
| Sparse | sparse |
Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.
| Function | Description |
|---|---|
| Series.dt.date | Returns numpy array of python datetime.date objects. |
| Series.dt.time | Returns numpy array of datetime.time objects. |
| Series.dt.timetz | Returns numpy array of datetime.time objects with timezones. |
| Series.dt.year | The year of the datetime. |
| Series.dt.month | The month as January=1, December=12. |
| Series.dt.day | The day of the datetime. |
| Series.dt.hour | The hours of the datetime. |
| Series.dt.minute | The minutes of the datetime. |
| Series.dt.second | The seconds of the datetime. |
| Series.dt.microsecond | The microseconds of the datetime. |
| Series.dt.nanosecond | The nanoseconds of the datetime. |
| Series.dt.dayofweek | The day of the week with Monday=0, Sunday=6. |
| Series.dt.day_of_week | The day of the week with Monday=0, Sunday=6. |
| Series.dt.weekday | The day of the week with Monday=0, Sunday=6. |
| Series.dt.dayofyear | The ordinal day of the year. |
| Series.dt.day_of_year | The ordinal day of the year. |
| Series.dt.days_in_month | The number of days in the month. |
| Series.dt.quarter | The quarter of the date. |
| Series.dt.is_month_start | Indicates whether the date is the first day of the month. |
| Series.dt.is_month_end | Indicates whether the date is the last day of the month. |
| Series.dt.is_quarter_start | Indicator for whether the date is the first day of a quarter. |
| Series.dt.is_quarter_end | Indicator for whether the date is the last day of a quarter. |
| Series.dt.is_year_start | Indicate whether the date is the first day of a year. |
| Series.dt.is_year_end | Indicate whether the date is the last day of the year. |
| Series.dt.is_leap_year | Boolean indicator if the date belongs to a leap year. |
| Series.dt.daysinmonth | The number of days in the month. |
| Series.dt.days_in_month | The number of days in the month. |
| Series.dt.tz | Return the timezone. |
| Series.dt.freq | Tries to return a string representing a frequency generated by infer_freq. |
| Series.dt.unit | The precision unit of the datetime data. |
| Function | Description |
|---|---|
| Series.dt.isocalendar() | Calculate year, week, and day according to the ISO 8601 standard. |
| Series.dt.to_period([freq]) | Cast to PeriodArray/PeriodIndex at a particular frequency. |
| Series.dt.to_pydatetime() | Return the data as a Series of datetime.datetime objects. |
| Series.dt.tz_localize(tz[, ambiguous, ...]) | Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index. |
| Series.dt.tz_convert(tz) | Convert tz-aware Datetime Array/Index from one time zone to another. |
| Series.dt.normalize() | Convert times to midnight. |
| Series.dt.strftime(date_format) | Convert to Index using specified date_format. |
| Series.dt.round(freq[, ambiguous, nonexistent]) | Perform round operation on the data to the specified freq. |
| Series.dt.floor(freq[, ambiguous, nonexistent]) | Perform floor operation on the data to the specified freq. |
| Series.dt.ceil(freq[, ambiguous, nonexistent]) | Perform ceil operation on the data to the specified freq. |
| Series.dt.month_name([locale]) | Return the month names with specified locale. |
| Series.dt.day_name([locale]) | Return the day names with specified locale. |
| Series.dt.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
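A short sketch of the datetime accessor on invented dates (in recent pandas versions the integer dt properties come back as int32):
>>> s = pd.Series(pd.to_datetime(["2024-01-15", "2024-06-30"]))
>>> s.dt.year
0    2024
1    2024
dtype: int32
>>> s.dt.is_leap_year
0    True
1    True
dtype: bool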
| Function | Description |
|---|---|
| Series.dt.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| Series.dt.start_time | Get the Timestamp for the start of the period. |
| Series.dt.end_time | Get the Timestamp for the end of the period. |
| Function | Description |
|---|---|
| Series.dt.days | Number of days for each element. |
| Series.dt.seconds | Number of seconds (>= 0 and less than 1 day) for each element. |
| Series.dt.microseconds | Number of microseconds (>= 0 and less than 1 second) for each element. |
| Series.dt.nanoseconds | Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
| Series.dt.components | Return a Dataframe of the components of the Timedeltas. |
| Series.dt.unit | The precision unit of the datetime data. |
| Function | Description |
|---|---|
| Series.dt.to_pytimedelta() | Return an array of native datetime.timedelta objects. |
| Series.dt.total_seconds() | Return total duration of each element expressed in seconds. |
| Series.dt.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.
| Function | Description |
|---|---|
| Series.str.capitalize() | Convert strings in the Series/Index to be capitalized. |
| Series.str.casefold() | Convert strings in the Series/Index to be casefolded. |
| Series.str.cat([others, sep, na_rep, join]) | Concatenate strings in the Series/Index with given separator. |
| Series.str.center(width[, fillchar]) | Pad left and right side of strings in the Series/Index. |
| Series.str.contains(pat[, case, flags, na, ...]) | Test if pattern or regex is contained within a string of a Series or Index. |
| Series.str.count(pat[, flags]) | Count occurrences of pattern in each string of the Series/Index. |
| Series.str.decode(encoding[, errors, dtype]) | Decode character string in the Series/Index using indicated encoding. |
| Series.str.encode(encoding[, errors]) | Encode character string in the Series/Index using indicated encoding. |
| Series.str.endswith(pat[, na]) | Test if the end of each string element matches a pattern. |
| Series.str.extract(pat[, flags, expand]) | Extract capture groups in the regex pat as columns in a DataFrame. |
| Series.str.extractall(pat[, flags]) | Extract capture groups in the regex pat as columns in DataFrame. |
| Series.str.find(sub[, start, end]) | Return lowest indexes in each strings in the Series/Index. |
| Series.str.findall(pat[, flags]) | Find all occurrences of pattern or regular expression in the Series/Index. |
| Series.str.fullmatch(pat[, case, flags, na]) | Determine if each string entirely matches a regular expression. |
| Series.str.get(i) | Extract element from each component at specified position or with specified key. |
| Series.str.index(sub[, start, end]) | Return lowest indexes in each string in Series/Index. |
| Series.str.isascii() | Check whether all characters in each string are ascii. |
| Series.str.join(sep) | Join lists contained as elements in the Series/Index with passed delimiter. |
| Series.str.len() | Compute the length of each element in the Series/Index. |
| Series.str.ljust(width[, fillchar]) | Pad right side of strings in the Series/Index. |
| Series.str.lower() | Convert strings in the Series/Index to lowercase. |
| Series.str.lstrip([to_strip]) | Remove leading characters. |
| Series.str.match(pat[, case, flags, na]) | Determine if each string starts with a match of a regular expression. |
| Series.str.normalize(form) | Return the Unicode normal form for the strings in the Series/Index. |
| Series.str.pad(width[, side, fillchar]) | Pad strings in the Series/Index up to width. |
| Series.str.partition([sep, expand]) | Split the string at the first occurrence of sep. |
| Series.str.removeprefix(prefix) | Remove a prefix from an object series. |
| Series.str.removesuffix(suffix) | Remove a suffix from an object series. |
| Series.str.repeat(repeats) | Duplicate each string in the Series or Index. |
| Series.str.replace(pat[, repl, n, case, ...]) | Replace each occurrence of pattern/regex in the Series/Index. |
| Series.str.rfind(sub[, start, end]) | Return highest indexes in each strings in the Series/Index. |
| Series.str.rindex(sub[, start, end]) | Return highest indexes in each string in Series/Index. |
| Series.str.rjust(width[, fillchar]) | Pad left side of strings in the Series/Index. |
| Series.str.rpartition([sep, expand]) | Split the string at the last occurrence of sep. |
| Series.str.rstrip([to_strip]) | Remove trailing characters. |
| Series.str.slice([start, stop, step]) | Slice substrings from each element in the Series or Index. |
| Series.str.slice_replace([start, stop, repl]) | Replace a positional slice of a string with another value. |
| Series.str.split([pat, n, expand, regex]) | Split strings around given separator/delimiter. |
| Series.str.rsplit([pat, n, expand]) | Split strings around given separator/delimiter. |
| Series.str.startswith(pat[, na]) | Test if the start of each string element matches a pattern. |
| Series.str.strip([to_strip]) | Remove leading and trailing characters. |
| Series.str.swapcase() | Convert strings in the Series/Index to be swapcased. |
| Series.str.title() | Convert strings in the Series/Index to titlecase. |
| Series.str.translate(table) | Map all characters in the string through the given mapping table. |
| Series.str.upper() | Convert strings in the Series/Index to uppercase. |
| Series.str.wrap(width[, expand_tabs, ...]) | Wrap strings in Series/Index at specified line width. |
| Series.str.zfill(width) | Pad strings in the Series/Index by prepending '0' characters. |
| Series.str.isalnum() | Check whether all characters in each string are alphanumeric. |
| Series.str.isalpha() | Check whether all characters in each string are alphabetic. |
| Series.str.isdigit() | Check whether all characters in each string are digits. |
| Series.str.isspace() | Check whether all characters in each string are whitespace. |
| Series.str.islower() | Check whether all characters in each string are lowercase. |
| Series.str.isupper() | Check whether all characters in each string are uppercase. |
| Series.str.istitle() | Check whether all characters in each string are titlecase. |
| Series.str.isnumeric() | Check whether all characters in each string are numeric. |
| Series.str.isdecimal() | Check whether all characters in each string are decimal. |
| Series.str.get_dummies([sep, dtype]) | Return DataFrame of dummy/indicator variables for Series. |
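A minimal string-accessor sketch on invented data (the dtype: str repr follows the conventions used elsewhere in this reference; older pandas versions print object):
>>> s = pd.Series(["apple pie", "banana split"])
>>> s.str.upper()
0       APPLE PIE
1    BANANA SPLIT
dtype: str
>>> s.str.contains("pie")
0     True
1    False
dtype: bool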
Categorical-dtype specific methods and attributes are available under the Series.cat accessor.
| Function | Description |
|---|---|
| Series.cat.categories | The categories of this categorical. |
| Series.cat.ordered | Whether the categories have an ordered relationship. |
| Series.cat.codes | Return Series of codes as well as the index. |
| Function | Description |
|---|---|
| Series.cat.rename_categories(new_categories) | Rename categories. |
| Series.cat.reorder_categories(new_categories) | Reorder categories as specified in new_categories. |
| Series.cat.add_categories(new_categories) | Add new categories. |
| Series.cat.remove_categories(removals) | Remove the specified categories. |
| Series.cat.remove_unused_categories() | Remove categories which are not used. |
| Series.cat.set_categories(new_categories[, ...]) | Set the categories to the specified new categories. |
| Series.cat.as_ordered() | Set the Categorical to be ordered. |
| Series.cat.as_unordered() | Set the Categorical to be unordered. |
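For illustration, a small categorical sketch on invented data (codes are the int8 positions of each value within the categories):
>>> s = pd.Series(["a", "b", "a"], dtype="category")
>>> s.cat.codes
0    0
1    1
2    0
dtype: int8
>>> list(s.cat.add_categories(["c"]).cat.categories)
['a', 'b', 'c']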
Sparse-dtype specific methods and attributes are provided under the Series.sparse accessor.
| Function | Description |
|---|---|
| Series.sparse.npoints | The number of non-fill_value points. |
| Series.sparse.density | The percent of non-fill_value points, as decimal. |
| Series.sparse.fill_value | Elements in data that are fill_value are not stored. |
| Series.sparse.sp_values | An ndarray containing the non-fill_value values. |
| Function | Description |
|---|---|
| Series.sparse.from_coo(A[, dense_index]) | Create a Series with sparse values from a scipy.sparse.coo_matrix. |
| Series.sparse.to_coo([row_levels, ...]) | Create a scipy.sparse.coo_matrix from a Series with MultiIndex. |
Arrow list-dtype specific methods and attributes are provided under the Series.list accessor.
| Function | Description |
|---|---|
| Series.list.flatten() | Flatten list values. |
| Series.list.len() | Return the length of each list in the Series. |
| Series.list.__getitem__(key) | Index or slice lists in the Series. |
Arrow struct-dtype specific methods and attributes are provided under the Series.struct accessor.
| Function | Description |
|---|---|
| Series.struct.dtypes | Return the dtype object of each child field of the struct. |
| Function | Description |
|---|---|
| Series.struct.field(name_or_index) | Extract a child field of a struct as a Series. |
| Series.struct.explode() | Extract all child fields of a struct as a DataFrame. |
Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in Series.attrs.
| Function | Description |
|---|---|
| Flags(obj, *, allows_duplicate_labels) | Flags that apply to pandas objects. |
Series.attrs is a dictionary for storing global metadata for this Series.
Warning
Series.attrs is considered experimental and may change without warning.
| Function | Description |
|---|---|
| Series.attrs | Dictionary of global attributes of this dataset. |
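A minimal attrs sketch (the key name here is invented; attrs behaves like a plain dictionary attached to the object):
>>> s = pd.Series([1, 2, 3])
>>> s.attrs["source"] = "sensor A"
>>> s.attrs
{'source': 'sensor A'}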
Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.
| Function | Description |
|---|---|
| Series.plot([kind, ax, figsize, ...]) | Series plotting accessor and method. |
| Function | Description |
|---|---|
| Series.plot.area([x, y, stacked]) | Draw a stacked area plot. |
| Series.plot.bar([x, y, color]) | Vertical bar plot. |
| Series.plot.barh([x, y, color]) | Make a horizontal bar plot. |
| Series.plot.box([by]) | Make a box plot of the DataFrame columns. |
| Series.plot.density([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| Series.plot.hist([by, bins]) | Draw one histogram of the DataFrame's columns. |
| Series.plot.kde([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| Series.plot.line([x, y, color]) | Plot Series or DataFrame as lines. |
| Series.plot.pie([y]) | Generate a pie plot. |
| Function | Description |
|---|---|
| Series.hist([by, ax, grid, xlabelsize, ...]) | Draw histogram of the input series using matplotlib. |
| Function | Description |
|---|---|
| Series.from_arrow(data) | Construct a Series from an array-like Arrow object. |
| Series.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| Series.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| Series.to_dict(*[, into]) | Convert Series to {label -> value} dict or dict-like object. |
| Series.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| Series.to_frame([name]) | Convert Series to DataFrame. |
| Series.to_xarray() | Return an xarray object from the pandas object. |
| Series.to_hdf(path_or_buf, *, key[, mode, ...]) | Write the contained data to an HDF5 file using HDFStore. |
| Series.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
| Series.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| Series.to_string([buf, na_rep, ...]) | Render a string representation of the Series. |
| Series.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| Series.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
| Series.to_markdown([buf, mode, index, ...]) | Print Series in Markdown-friendly format. |
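Two quick conversion sketches with invented data (since pandas 2.1, to_dict returns native Python scalars):
>>> s = pd.Series([1, 2], index=["a", "b"], name="vals")
>>> s.to_dict()
{'a': 1, 'b': 2}
>>> s.to_frame()
   vals
a     1
b     2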
Constructing Series from a dictionary with an Index specified
>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["a", "b", "c"])
>>> ser
a 1
b 2
c 3
dtype: int64
The keys of the dictionary match with the Index values, hence the Index values have no effect.
>>> d = {"a": 1, "b": 2, "c": 3}
>>> ser = pd.Series(data=d, index=["x", "y", "z"])
>>> ser
x NaN
y NaN
z NaN
dtype: float64
Note that the Index is first built with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.
Constructing Series from a list with copy=False.
>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a copy of the original data even though copy=False, so the data is unchanged.
Constructing Series from a 1d ndarray with copy=False.
>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999, 2])
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a view on the original data, so the data is changed as well.
To create a Series with a custom index and view the index labels:
>>> cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
>>> populations = [14.85, 2.71, 2.93, 0.51]
>>> city_series = pd.Series(populations, index=cities)
>>> city_series.index
Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')
To change the index labels of an existing Series:
>>> city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
>>> city_series.index
Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')
For regular NumPy types like int and float, a NumpyExtensionArray is returned.
>>> pd.Series([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray is returned
>>> ser = pd.Series(pd.Categorical(["a", "b", "a"]))
>>> ser.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
>>> pd.Series([1, 2, 3]).values
array([1, 2, 3])
>>> pd.Series(list("aabc")).values
<ArrowStringArray>
['a', 'a', 'b', 'c']
Length: 4, dtype: str
>>> pd.Series(list("aabc")).astype("category").values
['a', 'a', 'b', 'c']
Categories (3, str): ['a', 'b', 'c']
Timezone aware datetime data is converted to UTC:
>>> pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern")).values
array(['2013-01-01T05:00:00.000000',
'2013-01-02T05:00:00.000000',
'2013-01-03T05:00:00.000000'], dtype='datetime64[us]')
>>> s = pd.Series([1, 2, 3])
>>> s.dtype
dtype('int64')
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
Series name: None
Non-Null Count Dtype
-------------- -----
5 non-null str
dtypes: str(1)
memory usage: 106.0 bytes
Prints a summary excluding information about its values:
>>> s.info(verbose=False)
<class 'pandas.Series'>
Index: 5 entries, 1 to 5
dtypes: str(1)
memory usage: 106.0 bytes
Pipe the output of Series.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
... f.write(s)
260
The memory_usage parameter enables deep introspection mode, especially useful for big Series and for fine-tuning memory optimization:
>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> s = pd.Series(np.random.choice(["a", "b", "c"], 10**6))
>>> s.info()
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null str
dtypes: str(1)
memory usage: 8.6 MB
>>> s.info(memory_usage="deep")
<class 'pandas.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null str
dtypes: str(1)
memory usage: 8.6 MB
>>> s = pd.Series([1, 2, 3])
>>> s.shape
(3,)
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.nbytes
34
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.ndim
1
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.size
3
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3
For Series:
>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.T
0 Ant
1 Bear
2 Cow
dtype: str
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')
>>> s = pd.Series(range(3))
>>> s.memory_usage()
156
Not including the index gives the size of the rest of the data, which is necessarily smaller:
>>> s.memory_usage(index=False)
24
The memory footprint of object values is ignored by default:
>>> s = pd.Series(["a", "b"])
>>> s.values
<ArrowStringArray>
['a', 'b']
Length: 2, dtype: str
>>> s.memory_usage()
150
>>> s.memory_usage(deep=True)
150
>>> s = pd.Series([1, 2, 3, None])
>>> s
0 1.0
1 2.0
2 3.0
3 NaN
dtype: float64
>>> s.hasnans
True
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True
If we only have NaNs in our data, it is not considered empty!
>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False
>>> s = pd.Series([1, 2, 3])
>>> s.dtypes
dtype('int64')
The Series name can be set initially when calling the constructor.
>>> s = pd.Series([1, 2, 3], dtype=np.int64, name="Numbers")
>>> s
0 1
1 2
2 3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0 1
1 2
2 3
Name: Integers, dtype: int64
The name of a Series within a DataFrame is its column name.
>>> df = pd.DataFrame(
... [[1, 2], [3, 4], [5, 6]], columns=["Odd Numbers", "Even Numbers"]
... )
>>> df
Odd Numbers Even Numbers
0 1 2
1 3 4
2 5 6
>>> df["Even Numbers"].name
'Even Numbers'
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using attribute access:
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
Create a DataFrame:
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype("int32").dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({"col1": "int32"}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype("int64")
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype("category")
0 1
1 2
dtype: category
Categories (2, int32): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[us]
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by np.nan.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: str
Obtain a Series with dtype StringDtype.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
| Function | Description |
|---|---|
| DataFrame([data, index, columns, dtype, copy]) | Two-dimensional, size-mutable, potentially heterogeneous tabular data. |
Axes
| Function | Description |
|---|---|
| DataFrame.index | The index (row labels) of the DataFrame. |
| DataFrame.columns | The column labels of the DataFrame. |
| Function | Description |
|---|---|
| DataFrame.dtypes | Return the dtypes in the DataFrame. |
| DataFrame.info([verbose, buf, max_cols, ...]) | Print a concise summary of a DataFrame. |
| DataFrame.select_dtypes([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. |
| DataFrame.values | Return a Numpy representation of the DataFrame. |
| DataFrame.axes | Return a list representing the axes of the DataFrame. |
| DataFrame.ndim | Return an int representing the number of axes / array dimensions. |
| DataFrame.size | Return an int representing the number of elements in this object. |
| DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame. |
| DataFrame.memory_usage([index, deep]) | Return the memory usage of each column in bytes. |
| DataFrame.empty | Indicator whether Series/DataFrame is empty. |
| DataFrame.set_flags(*[, copy, ...]) | Return a new object with updated flags. |
| Function | Description |
|---|---|
| DataFrame.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype dtype. |
| DataFrame.convert_dtypes([infer_objects, ...]) | Convert columns from numpy dtypes to the best dtypes that support pd.NA. |
| DataFrame.infer_objects([copy]) | Attempt to infer better dtypes for object columns. |
| DataFrame.copy([deep]) | Make a copy of this object's indices and data. |
| DataFrame.to_numpy([dtype, copy, na_value]) | Convert the DataFrame to a NumPy array. |
| Function | Description |
|---|---|
| DataFrame.head([n]) | Return the first n rows. |
| DataFrame.at | Access a single value for a row/column label pair. |
| DataFrame.iat | Access a single value for a row/column pair by integer position. |
| DataFrame.loc | Access a group of rows and columns by label(s) or a boolean array. |
| DataFrame.iloc | Purely integer-location based indexing for selection by position. |
| DataFrame.insert(loc, column, value[, ...]) | Insert column into DataFrame at specified location. |
| DataFrame.__iter__() | Iterate over info axis. |
| DataFrame.items() | Iterate over (column name, Series) pairs. |
| DataFrame.keys() | Get the 'info axis' (see Indexing for more). |
| DataFrame.iterrows() | Iterate over DataFrame rows as (index, Series) pairs. |
| DataFrame.itertuples([index, name]) | Iterate over DataFrame rows as namedtuples. |
| DataFrame.pop(item) | Return item and drop it from DataFrame. |
| DataFrame.tail([n]) | Return the last n rows. |
| DataFrame.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
| DataFrame.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
| DataFrame.isin(values) | Whether each element in the DataFrame is contained in values. |
| DataFrame.where(cond[, other, inplace, ...]) | Replace values where the condition is False. |
| DataFrame.mask(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. |
| DataFrame.query(expr, *[, parser, engine, ...]) | Query the columns of a DataFrame with a boolean expression. |
| DataFrame.isetitem(loc, value) | Set the given value in the column with position loc. |
For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.
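As a quick illustrative sketch of DataFrame label- and position-based access (invented data; scalar formatting may vary slightly by version):
>>> df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
>>> df.loc["a", "y"]
3
>>> df.iloc[1]
x    2
y    4
Name: b, dtype: int64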
| Function | Description |
|---|---|
| DataFrame.add(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
| DataFrame.sub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator sub). |
| DataFrame.mul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| DataFrame.div(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.truediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.floordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). |
| DataFrame.mod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). |
| DataFrame.pow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). |
| DataFrame.dot(other) | Compute the matrix multiplication between the DataFrame and other. |
| DataFrame.radd(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). |
| DataFrame.rsub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsub). |
| DataFrame.rmul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). |
| DataFrame.rdiv(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rtruediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rfloordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). |
| DataFrame.rmod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). |
| DataFrame.rpow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). |
| DataFrame.lt(other[, axis, level]) | Get Less than of dataframe and other, element-wise (binary operator lt). |
| DataFrame.gt(other[, axis, level]) | Get Greater than of dataframe and other, element-wise (binary operator gt). |
| DataFrame.le(other[, axis, level]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). |
| DataFrame.ge(other[, axis, level]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). |
| DataFrame.ne(other[, axis, level]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). |
| DataFrame.eq(other[, axis, level]) | Get Equal to of dataframe and other, element-wise (binary operator eq). |
| DataFrame.combine(other, func[, fill_value, ...]) | Perform column-wise combine with another DataFrame. |
| DataFrame.combine_first(other) | Update null elements with value in the same location in other. |
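A minimal arithmetic sketch on invented data (with axis=1 the Series index is aligned against the DataFrame columns):
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> df.mul(10)
    a   b
0  10  30
1  20  40
>>> df.sub(pd.Series([1, 1], index=["a", "b"]), axis=1)
   a  b
0  0  2
1  1  3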
| Function | Description |
|---|---|
| DataFrame.apply(func[, axis, raw, ...]) | Apply a function along an axis of the DataFrame. |
| DataFrame.map(func[, na_action]) | Apply a function to a Dataframe elementwise. |
| DataFrame.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. |
| DataFrame.agg([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.transform(func[, axis]) | Call func on self producing a DataFrame with the same axis shape as self. |
| DataFrame.groupby([by, level, as_index, ...]) | Group DataFrame using a mapper or by a Series of columns. |
| DataFrame.rolling(window[, min_periods, ...]) | Provide rolling window calculations. |
| DataFrame.expanding([min_periods, method]) | Provide expanding window calculations. |
| DataFrame.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. |
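A short split-apply-combine sketch with invented data:
>>> df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
>>> df.groupby("key")["val"].sum()
key
a    4
b    2
Name: val, dtype: int64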
| Function | Description |
|---|---|
| DataFrame.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
| DataFrame.all(*[, axis, bool_only, skipna]) | Return whether all elements are True, potentially over an axis. |
| DataFrame.any(*[, axis, bool_only, skipna]) | Return whether any element is True, potentially over an axis. |
| DataFrame.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| DataFrame.corr([method, min_periods, ...]) | Compute pairwise correlation of columns, excluding NA/null values. |
| DataFrame.corrwith(other[, axis, drop, ...]) | Compute pairwise correlation. |
| DataFrame.count([axis, numeric_only]) | Count non-NA cells for each column or row. |
| DataFrame.cov([min_periods, ddof, numeric_only]) | Compute pairwise covariance of columns, excluding NA/null values. |
| DataFrame.cummax([axis, skipna, numeric_only]) | Return cumulative maximum over a DataFrame or Series axis. |
| DataFrame.cummin([axis, skipna, numeric_only]) | Return cumulative minimum over a DataFrame or Series axis. |
| DataFrame.cumprod([axis, skipna, numeric_only]) | Return cumulative product over a DataFrame or Series axis. |
| DataFrame.cumsum([axis, skipna, numeric_only]) | Return cumulative sum over a DataFrame or Series axis. |
| DataFrame.describe([percentiles, include, ...]) | Generate descriptive statistics. |
| DataFrame.diff([periods, axis]) | First discrete difference of element. |
| DataFrame.eval(expr, *[, inplace]) | Evaluate a string describing operations on DataFrame columns. |
| DataFrame.kurt(*[, axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| DataFrame.kurtosis(*[, axis, skipna, ...]) | Return unbiased kurtosis over requested axis. |
| DataFrame.max(*[, axis, skipna, numeric_only]) | Return the maximum of the values over the requested axis. |
| DataFrame.mean(*[, axis, skipna, numeric_only]) | Return the mean of the values over the requested axis. |
| DataFrame.median(*[, axis, skipna, numeric_only]) | Return the median of the values over the requested axis. |
| DataFrame.min(*[, axis, skipna, numeric_only]) | Return the minimum of the values over the requested axis. |
| DataFrame.mode([axis, numeric_only, dropna]) | Get the mode(s) of each element along the selected axis. |
| DataFrame.pct_change([periods, fill_method, ...]) | Fractional change between the current and a prior element. |
| DataFrame.prod(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| DataFrame.product(*[, axis, skipna, ...]) | Return the product of the values over the requested axis. |
| DataFrame.quantile([q, axis, numeric_only, ...]) | Return values at the given quantile over requested axis. |
| DataFrame.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| DataFrame.round([decimals]) | Round numeric columns in a DataFrame to a variable number of decimal places. |
| DataFrame.sem(*[, axis, skipna, ddof, ...]) | Return unbiased standard error of the mean over requested axis. |
| DataFrame.skew(*[, axis, skipna, numeric_only]) | Return unbiased skew over requested axis. |
| DataFrame.sum(*[, axis, skipna, ...]) | Return the sum of the values over the requested axis. |
| DataFrame.std(*[, axis, skipna, ddof, ...]) | Return sample standard deviation over requested axis. |
| DataFrame.var(*[, axis, skipna, ddof, ...]) | Return unbiased variance over requested axis. |
| DataFrame.nunique([axis, dropna]) | Count number of distinct elements in specified axis. |
| DataFrame.value_counts([subset, normalize, ...]) | Return a Series containing the frequency of each distinct row in the DataFrame. |
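A minimal sketch of a few of the reductions above (assumes pandas is imported as pd; output spacing may vary slightly by version):
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> df.mean()
a    2.0
b    5.0
dtype: float64
>>> df.sum(axis=1)
0    5
1    7
2    9
dtype: int64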
| Function | Description |
|---|---|
| DataFrame.add_prefix(prefix[, axis]) | Prefix labels with string prefix. |
| DataFrame.add_suffix(suffix[, axis]) | Suffix labels with string suffix. |
| DataFrame.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| DataFrame.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| DataFrame.between_time(start_time, end_time) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| DataFrame.drop([labels, axis, index, ...]) | Drop specified labels from rows or columns. |
| DataFrame.drop_duplicates([subset, keep, ...]) | Return DataFrame with duplicate rows removed. |
| DataFrame.duplicated([subset, keep]) | Return boolean Series denoting duplicate rows. |
| DataFrame.equals(other) | Test whether two objects contain the same elements. |
| DataFrame.filter([items, like, regex, axis]) | Subset the DataFrame or Series according to the specified index labels. |
| DataFrame.idxmax([axis, skipna, numeric_only]) | Return index of first occurrence of maximum over requested axis. |
| DataFrame.idxmin([axis, skipna, numeric_only]) | Return index of first occurrence of minimum over requested axis. |
| DataFrame.reindex([labels, index, columns, ...]) | Conform DataFrame to new index with optional filling logic. |
| DataFrame.reindex_like(other[, method, ...]) | Return an object with matching indices as other object. |
| DataFrame.rename([mapper, index, columns, ...]) | Rename columns or index labels. |
| DataFrame.rename_axis([mapper, index, ...]) | Set the name of the axis for the index or columns. |
| DataFrame.reset_index([level, drop, ...]) | Reset the index, or a level of it. |
| DataFrame.sample([n, frac, replace, ...]) | Return a random sample of items from an axis of object. |
| DataFrame.set_axis(labels, *[, axis, copy]) | Assign desired index to given axis. |
| DataFrame.set_index(keys, *[, drop, append, ...]) | Set the DataFrame index using existing columns. |
| DataFrame.take(indices[, axis]) | Return the elements in the given positional indices along an axis. |
| DataFrame.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
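A brief sketch of the relabeling and dropping methods above, on hypothetical data:
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
>>> df.rename(columns={"a": "A"})
   A  b
x  1  3
y  2  4
>>> df.drop(index="x")
   a  b
y  2  4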
| Function | Description |
|---|---|
| DataFrame.bfill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by using the next valid observation to fill the gap. |
| DataFrame.dropna(*[, axis, how, thresh, ...]) | Remove missing values. |
| DataFrame.ffill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by propagating the last valid observation to next valid. |
| DataFrame.fillna(value, *[, axis, inplace, ...]) | Fill NA/NaN values with value. |
| DataFrame.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. |
| DataFrame.isna() | Detect missing values. |
| DataFrame.isnull() | DataFrame.isnull is an alias for DataFrame.isna. |
| DataFrame.notna() | Detect existing (non-missing) values. |
| DataFrame.notnull() | DataFrame.notnull is an alias for DataFrame.notna. |
| DataFrame.replace([to_replace, value, ...]) | Replace values given in to_replace with value. |
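A short sketch of detecting and filling missing values (assumes numpy is imported as np):
>>> df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
>>> df.isna()
       a
0  False
1   True
2  False
>>> df.fillna(0)
     a
0  1.0
1  0.0
2  3.0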
| Function | Description |
|---|---|
| DataFrame.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| DataFrame.pivot(*, columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| DataFrame.pivot_table([values, index, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| DataFrame.reorder_levels(order[, axis]) | Rearrange index or column levels using input order. |
| DataFrame.sort_values(by, *[, axis, ...]) | Sort by the values along either axis. |
| DataFrame.sort_index(*[, axis, level, ...]) | Sort object by labels (along an axis). |
| DataFrame.nlargest(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. |
| DataFrame.nsmallest(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. |
| DataFrame.swaplevel([i, j, axis]) | Swap levels i and j in a MultiIndex. |
| DataFrame.stack([level, dropna, sort, ...]) | Stack the prescribed level(s) from columns to index. |
| DataFrame.unstack([level, fill_value, sort]) | Pivot a level of the (necessarily hierarchical) index labels. |
| DataFrame.melt([id_vars, value_vars, ...]) | Unpivot DataFrame from wide to long format, optionally leaving identifiers set. |
| DataFrame.explode(column[, ignore_index]) | Transform each element of a list-like to a row, replicating index values. |
| DataFrame.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. |
| DataFrame.to_xarray() | Return an xarray object from the pandas object. |
| DataFrame.T | The transpose of the DataFrame. |
| DataFrame.transpose(*args[, copy]) | Transpose index and columns. |
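As an illustration of wide-to-long reshaping with melt (hypothetical data):
>>> df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]})
>>> df.melt(id_vars=["A"], value_vars=["B"])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5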
| Function | Description |
|---|---|
| DataFrame.assign(**kwargs) | Assign new columns to a DataFrame. |
| DataFrame.compare(other[, align_axis, ...]) | Compare to another DataFrame and show the differences. |
| DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns of another DataFrame. |
| DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| DataFrame.update(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
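A minimal sketch of a database-style join with merge (hypothetical frames and column names):
>>> left = pd.DataFrame({"key": ["a", "b"], "l": [1, 2]})
>>> right = pd.DataFrame({"key": ["a", "b"], "r": [3, 4]})
>>> left.merge(right, on="key")
  key  l  r
0   a  1  3
1   b  2  4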
| Function | Description |
|---|---|
| DataFrame.asfreq(freq[, method, how, ...]) | Convert time series to specified frequency. |
| DataFrame.asof(where[, subset]) | Return the last row(s) without any NaNs before where. |
| DataFrame.shift([periods, freq, axis, ...]) | Shift index by desired number of periods with an optional time freq. |
| DataFrame.first_valid_index() | Return index for first non-missing value or None, if no value is found. |
| DataFrame.last_valid_index() | Return index for last non-missing value or None, if no value is found. |
| DataFrame.resample(rule[, closed, label, ...]) | Resample time-series data. |
| DataFrame.to_period([freq, axis, copy]) | Convert DataFrame from DatetimeIndex to PeriodIndex. |
| DataFrame.to_timestamp([freq, how, axis, copy]) | Cast PeriodIndex to DatetimeIndex of timestamps, at beginning of period. |
| DataFrame.tz_convert(tz[, axis, level, copy]) | Convert tz-aware axis to target time zone. |
| DataFrame.tz_localize(tz[, axis, level, ...]) | Localize time zone naive index of a Series or DataFrame to target time zone. |
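A short sketch of downsampling with resample (the "min"/"2min" frequency aliases assume a recent pandas; older versions used "T"):
>>> idx = pd.date_range("2000-01-01", periods=4, freq="min")
>>> df = pd.DataFrame({"v": [0, 1, 2, 3]}, index=idx)
>>> df.resample("2min").sum()
                     v
2000-01-01 00:00:00  1
2000-01-01 00:02:00  5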
Flags refer to attributes of the pandas object. Properties of the dataset (like the date it was recorded, the URL it was accessed from, etc.) should be stored in DataFrame.attrs.
| Function | Description |
|---|---|
| Flags(obj, *, allows_duplicate_labels) | Flags that apply to pandas objects. |
DataFrame.attrs is a dictionary for storing global metadata for this DataFrame.
Warning
DataFrame.attrs is considered experimental and may change without warning.
| Function | Description |
|---|---|
| DataFrame.attrs | Dictionary of global attributes of this dataset. |
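A brief sketch of storing metadata in attrs (the "source" key here is purely a hypothetical example):
>>> df = pd.DataFrame({"a": [1, 2]})
>>> df.attrs["source"] = "sensor-42"
>>> df.attrs
{'source': 'sensor-42'}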
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
| Function | Description |
|---|---|
| DataFrame.plot([x, y, kind, ax, ...]) | DataFrame plotting accessor and method. |
| Function | Description |
|---|---|
| DataFrame.plot.area([x, y, stacked]) | Draw a stacked area plot. |
| DataFrame.plot.bar([x, y, color]) | Vertical bar plot. |
| DataFrame.plot.barh([x, y, color]) | Make a horizontal bar plot. |
| DataFrame.plot.box([by]) | Make a box plot of the DataFrame columns. |
| DataFrame.plot.density([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.hexbin(x, y[, C, ...]) | Generate a hexagonal binning plot. |
| DataFrame.plot.hist([by, bins]) | Draw one histogram of the DataFrame's columns. |
| DataFrame.plot.kde([bw_method, ind, weights]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.line([x, y, color]) | Plot Series or DataFrame as lines. |
| DataFrame.plot.pie([y]) | Generate a pie plot. |
| DataFrame.plot.scatter(x, y[, s, c]) | Create a scatter plot with varying marker point size and color. |
| Function | Description |
|---|---|
| DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame columns. |
| DataFrame.hist([column, by, grid, ...]) | Make a histogram of the DataFrame's columns. |
Sparse-dtype specific methods and attributes are provided under the DataFrame.sparse accessor.
| Function | Description |
|---|---|
| DataFrame.sparse.density | Ratio of non-sparse points to total (dense) data points. |
| Function | Description |
|---|---|
| DataFrame.sparse.from_spmatrix(data[, ...]) | Create a new DataFrame from a scipy sparse matrix. |
| DataFrame.sparse.to_coo() | Return the contents of the frame as a sparse SciPy COO matrix. |
| DataFrame.sparse.to_dense() | Convert a DataFrame with sparse values to dense. |
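A minimal sketch of the sparse accessor, built from a SparseArray so no SciPy dependency is needed:
>>> df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1, 0])})
>>> df.sparse.density
0.25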
| Function | Description |
|---|---|
| DataFrame.from_arrow(data) | Construct a DataFrame from a tabular Arrow object. |
| DataFrame.from_dict(data[, orient, dtype, ...]) | Construct DataFrame from dict of array-like or dicts. |
| DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame. |
| DataFrame.to_orc([path, engine, index, ...]) | Write a DataFrame to the Optimized Row Columnar (ORC) format. |
| DataFrame.to_parquet([path, engine, ...]) | Write a DataFrame to the binary parquet format. |
| DataFrame.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| DataFrame.to_hdf(path_or_buf, *, key[, ...]) | Write the contained data to an HDF5 file using HDFStore. |
| DataFrame.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
| DataFrame.to_dict([orient, into, index]) | Convert the DataFrame to a dictionary. |
| DataFrame.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
| DataFrame.to_feather(path, **kwargs) | Write a DataFrame to the binary Feather format. |
| DataFrame.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
| DataFrame.to_stata(path, *[, convert_dates, ...]) | Export DataFrame object to Stata dta format. |
| DataFrame.to_records([index, column_dtypes, ...]) | Convert DataFrame to a NumPy record array. |
| DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
| DataFrame.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| DataFrame.to_markdown([buf, mode, index, ...]) | Print DataFrame in Markdown-friendly format. |
| DataFrame.style | Returns a Styler object. |
| DataFrame.__dataframe__([nan_as_null, ...]) | (DEPRECATED) Return the dataframe interchange object implementing the interchange protocol. |
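A short sketch of two of the in-memory exports above (to_json shown with its default orient='columns'):
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.to_dict()
{'col1': {0: 1, 1: 2}, 'col2': {0: 3, 1: 4}}
>>> df.to_json()
'{"col1":{"0":1,"1":2},"col2":{"0":3,"1":4}}'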
Constructing DataFrame from a dictionary.
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1 int64
col2 int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
Constructing DataFrame from a dictionary including Series:
>>> d = {"col1": [0, 1, 2, 3], "col2": pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
col1 col2
0 0 NaN
1 1 NaN
2 2 2.0
3 3 3.0
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(
... np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["a", "b", "c"]
... )
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
Constructing DataFrame from a numpy ndarray that has labeled columns:
>>> data = np.array(
... [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
... dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")],
... )
>>> df3 = pd.DataFrame(data, columns=["c", "a"])
>>> df3
c a
0 3 1
1 6 4
2 9 7
Constructing DataFrame from dataclass:
>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
x y
0 0 0
1 0 3
2 2 3
Constructing DataFrame from Series/DataFrame:
>>> ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> df = pd.DataFrame(data=ser, index=["a", "c"])
>>> df
0
a 1
c 3
>>> df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
>>> df2 = pd.DataFrame(data=df1, index=["a", "c"])
>>> df2
x
a 1
c 3
>>> df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
... 'Age': [25, 30, 35],
... 'Location': ['Seattle', 'New York', 'Kona']},
... index=([10, 20, 30]))
>>> df.index
Index([10, 20, 30], dtype='int64')
In this example, we create a DataFrame with 3 rows and 3 columns, including Name, Age, and Location information. We set the index labels to be the integers 10, 20, and 30. We then access the index attribute of the DataFrame, which returns an Index object containing the index labels.
>>> df.index = [100, 200, 300]
>>> df
Name Age Location
100 Alice 25 Seattle
200 Bob 30 New York
300 Aritra 35 Kona
In this example, we modify the index labels of the DataFrame by assigning a new list of labels to the index attribute. The DataFrame is then updated with the new labels, and the output shows the modified DataFrame.
>>> df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> df
A B
0 1 3
1 2 4
>>> df.columns
Index(['A', 'B'], dtype='str')
>>> df = pd.DataFrame(
... {
... "float": [1.0],
... "int": [1],
... "datetime": [pd.Timestamp("20180310")],
... "string": ["foo"],
... }
... )
>>> df.dtypes
float float64
int int64
datetime datetime64[us]
string str
dtype: object
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ["alpha", "beta", "gamma", "delta", "epsilon"]
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame(
... {
... "int_col": int_values,
... "text_col": text_values,
... "float_col": float_values,
... }
... )
>>> df
int_col text_col float_col
0 1 alpha 0.00
1 2 beta 0.25
2 3 gamma 0.50
3 4 delta 0.75
4 5 epsilon 1.00
Print information about all columns:
>>> df.info(verbose=True)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 int_col 5 non-null int64
1 text_col 5 non-null str
2 float_col 5 non-null float64
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes
Print a summary of the column count and dtypes, but not per-column information:
>>> df.info(verbose=False)
<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), str(1)
memory usage: 278.0 bytes
Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content, and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w", encoding="utf-8") as f:
... f.write(s)
260
The memory_usage parameter enables deep introspection mode, especially useful for big DataFrames and for fine-tuning memory optimization:
>>> random_strings_array = np.random.choice(["a", "b", "c"], 10**6)
>>> df = pd.DataFrame(
... {
... "column_1": np.random.choice(["a", "b", "c"], 10**6),
... "column_2": np.random.choice(["a", "b", "c"], 10**6),
... "column_3": np.random.choice(["a", "b", "c"], 10**6),
... }
... )
>>> df.info()
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null str
1 column_2 1000000 non-null str
2 column_3 1000000 non-null str
dtypes: str(3)
memory usage: 25.7 MB
>>> df.info(memory_usage="deep")
<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null str
1 column_2 1000000 non-null str
2 column_3 1000000 non-null str
dtypes: str(3)
memory usage: 25.7 MB
>>> df = pd.DataFrame(
... {"a": [1, 2] * 3, "b": [True, False] * 3, "c": [1.0, 2.0] * 3}
... )
>>> df
a b c
0 1 True 1.0
1 2 False 2.0
2 1 True 1.0
3 2 False 2.0
4 1 True 1.0
5 2 False 2.0
>>> df.select_dtypes(include="bool")
b
0 True
1 False
2 True
3 False
4 True
5 False
>>> df.select_dtypes(include=["float64"])
c
0 1.0
1 2.0
2 1.0
3 2.0
4 1.0
5 2.0
>>> df.select_dtypes(exclude=["int64"])
b c
0 True 1.0
1 False 2.0
2 True 1.0
3 False 2.0
4 True 1.0
5 False 2.0
A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.
>>> df = pd.DataFrame(
... {"age": [3, 29], "height": [94, 170], "weight": [31, 115]}
... )
>>> df
age height weight
0 3 94 31
1 29 170 115
>>> df.dtypes
age int64
height int64
weight int64
dtype: object
>>> df.values
array([[ 3, 94, 31],
[ 29, 170, 115]])
A DataFrame with mixed-type columns (e.g., str/object, int64, float32) results in an ndarray of the broadest type that accommodates these mixed types (e.g., object).
>>> df2 = pd.DataFrame(
... [
... ("parrot", 24.0, "second"),
... ("lion", 80.5, 1),
... ("monkey", np.nan, None),
... ],
... columns=("name", "max_speed", "rank"),
... )
>>> df2.dtypes
name str
max_speed float64
rank object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
['lion', 80.5, 1],
['monkey', nan, None]], dtype=object)
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'], dtype='str')]
>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.ndim
1
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.ndim
2
>>> s = pd.Series({"a": 1, "b": 2, "c": 3})
>>> s.size
3
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.size
4
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})
>>> df.shape
(2, 3)
>>> dtypes = ["int64", "float64", "complex128", "object", "bool"]
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t)) for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
int64 float64 complex128 object bool
0 1 1.0 1.0+0.0j 1 True
1 1 1.0 1.0+0.0j 1 True
2 1 1.0 1.0+0.0j 1 True
3 1 1.0 1.0+0.0j 1 True
4 1 1.0 1.0+0.0j 1 True
>>> df.memory_usage()
Index 132
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
>>> df.memory_usage(index=False)
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
The memory footprint of object dtype columns is ignored by default:
>>> df.memory_usage(deep=True)
Index 132
int64 40000
float64 40000
complex128 80000
object 180000
bool 5000
dtype: int64
Use a Categorical for efficient storage of an object-dtype column with many repeated values.
>>> df["object"].astype("category").memory_usage(deep=True)
5140
An example of an actual empty DataFrame. Notice the index is empty:
>>> df_empty = pd.DataFrame({"A": []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True
If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:
>>> df = pd.DataFrame({"A": [np.nan]})
>>> df
A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
>>> ser_empty = pd.Series({"A": []})
>>> ser_empty
A []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
Create a DataFrame:
>>> d = {"col1": [1, 2], "col2": [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype("int32").dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({"col1": "int32"}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype="int32")
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype("int64")
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype("category")
0 1
1 2
dtype: category
Categories (2, int32): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range("20200101", periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[us]
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by np.nan.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: str
Obtain a Series with dtype StringDtype.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
A
1 1
2 2
3 3
>>> df.dtypes
A object
dtype: object
>>> df.infer_objects().dtypes
A int64
dtype: object
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: int64
>>> s_copy = s.copy(deep=True)
>>> s_copy
a 1
b 2
dtype: int64
Due to Copy-on-Write, shallow copies are still protected from data modifications; note that shallow is not modified below.
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> shallow = s.copy(deep=False)
>>> s.iloc[1] = 200
>>> shallow
a 1
b 2
dtype: int64
When the data has object dtype, even a deep copy does not copy the underlying Python objects. Updating a nested data object will be reflected in the deep copy.
>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0 [10, 2]
1 [3, 4]
dtype: object
>>> deep
0 [10, 2]
1 [3, 4]
dtype: object
>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
[2, 4]])
With heterogeneous data, the lowest common type will have to be used.
>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
[2. , 4.5]])
For a mix of numeric and non-numeric types, the output array will have object dtype.
>>> df["C"] = pd.date_range("2000", periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
[2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
>>> df = pd.DataFrame(
... {
... "animal": [
... "alligator",
... "bee",
... "falcon",
... "lion",
... "monkey",
... "parrot",
... "shark",
... "whale",
... "zebra",
... ]
... }
... )
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the first 5 lines
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
Viewing the first n lines (three in this case)
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
For negative values of n
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
| Kind of Data | pandas Data Type | Scalar | Array |
|---|---|---|---|
| TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes |
| Timedeltas | (none) | Timedelta | Timedeltas |
| Period (time spans) | PeriodDtype | Period | Periods |
| Intervals | IntervalDtype | Interval | Intervals |
| Nullable Integer | Int64Dtype, … | (none) | Nullable integer |
| Nullable Float | Float64Dtype, … | (none) | Nullable float |
| Categorical | CategoricalDtype | (none) | Categoricals |
| Sparse | SparseDtype | (none) | Sparse |
| Strings | StringDtype | str | Strings |
| Nullable Boolean | BooleanDtype | bool | Nullable Boolean |
| PyArrow | ArrowDtype | Python Scalars or NA | PyArrow |
pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
| Function | Description |
|---|---|
| array(data[, dtype, copy]) | Create an array. |
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
PyArrow provides array and data type support similar to NumPy's, including first-class nullability for all data types, immutability, and more.
The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas e.g. pd.ArrowDtype(pa.bool_()).
| PyArrow type | pandas extension type | NumPy type |
|---|---|---|
| pyarrow.bool_() | BooleanDtype | np.bool_ |
| pyarrow.int8() | Int8Dtype | np.int8 |
| pyarrow.int16() | Int16Dtype | np.int16 |
| pyarrow.int32() | Int32Dtype | np.int32 |
| pyarrow.int64() | Int64Dtype | np.int64 |
| pyarrow.uint8() | UInt8Dtype | np.uint8 |
| pyarrow.uint16() | UInt16Dtype | np.uint16 |
| pyarrow.uint32() | UInt32Dtype | np.uint32 |
| pyarrow.uint64() | UInt64Dtype | np.uint64 |
| pyarrow.float32() | Float32Dtype | np.float32 |
| pyarrow.float64() | Float64Dtype | np.float64 |
| pyarrow.time32() | (none) | (none) |
| pyarrow.time64() | (none) | (none) |
| pyarrow.timestamp() | DatetimeTZDtype | np.datetime64 |
| pyarrow.date32() | (none) | (none) |
| pyarrow.date64() | (none) | (none) |
| pyarrow.duration() | (none) | np.timedelta64 |
| pyarrow.binary() | (none) | (none) |
| pyarrow.string() | StringDtype | np.str_ |
| pyarrow.decimal128() | (none) | (none) |
| pyarrow.list_() | (none) | (none) |
| pyarrow.map_() | (none) | (none) |
| pyarrow.dictionary() | CategoricalDtype | (none) |
Note
Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as the Python scalar corresponding to the data type, e.g. a PyArrow int64 is returned as a Python int, or NA for missing values.
| Function | Description |
|---|---|
| arrays.ArrowExtensionArray(values) | Pandas ExtensionArray backed by a PyArrow ChunkedArray. |
| Function | Description |
|---|---|
| ArrowDtype(pyarrow_dtype) | An ExtensionDtype for PyArrow data types. |
For more information, please see the PyArrow user guide.
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
| Function | Description |
|---|---|
| Timestamp([ts_input, year, month, day, ...]) | Pandas replacement for python datetime.datetime object. |
| Property | Description |
|---|---|
| Timestamp.asm8 | Return numpy datetime64 format with same precision. |
| Timestamp.day | Return the day of the Timestamp. |
| Timestamp.dayofweek | Return day of the week. |
| Timestamp.day_of_week | Return day of the week. |
| Timestamp.dayofyear | Return the day of the year. |
| Timestamp.day_of_year | Return the day of the year. |
| Timestamp.days_in_month | Return the number of days in the month. |
| Timestamp.daysinmonth | Return the number of days in the month. |
| Timestamp.fold | Return the fold value of the Timestamp. |
| Timestamp.hour | Return the hour of the Timestamp. |
| Timestamp.is_leap_year | Return True if year is a leap year. |
| Timestamp.is_month_end | Check if the date is the last day of the month. |
| Timestamp.is_month_start | Check if the date is the first day of the month. |
| Timestamp.is_quarter_end | Check if date is last day of the quarter. |
| Timestamp.is_quarter_start | Check if the date is the first day of the quarter. |
| Timestamp.is_year_end | Return True if date is last day of the year. |
| Timestamp.is_year_start | Return True if date is first day of the year. |
| Timestamp.max | |
| Timestamp.microsecond | Return the microsecond of the Timestamp. |
| Timestamp.min | |
| Timestamp.minute | Return the minute of the Timestamp. |
| Timestamp.month | Return the month of the Timestamp. |
| Timestamp.nanosecond | Return the nanosecond of the Timestamp. |
| Timestamp.quarter | Return the quarter of the year for the Timestamp. |
| Timestamp.resolution | |
| Timestamp.second | Return the second of the Timestamp. |
| Timestamp.tz | Alias for tzinfo. |
| Timestamp.tzinfo | Returns the timezone info of the Timestamp. |
| Timestamp.unit | The abbreviation associated with self._creso. |
| Timestamp.value | Return the value of the Timestamp. |
| Timestamp.week | Return the week number of the year. |
| Timestamp.weekofyear | Return the week number of the year. |
| Timestamp.year | Return the year of the Timestamp. |
| Method | Description |
|---|---|
| Timestamp.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
| Timestamp.astimezone(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.ceil(freq[, ambiguous, nonexistent]) | Return a new Timestamp ceiled to this resolution. |
| Timestamp.combine(date, time) | Combine a date and time into a single Timestamp object. |
| Timestamp.ctime() | Return a ctime() style string representing the Timestamp. |
| Timestamp.date() | Returns datetime.date with the same year, month, and day. |
| Timestamp.day_name([locale]) | Return the day name of the Timestamp with specified locale. |
| Timestamp.dst() | Return the daylight saving time (DST) adjustment. |
| Timestamp.floor(freq[, ambiguous, nonexistent]) | Return a new Timestamp floored to this resolution. |
| Timestamp.fromordinal(ordinal[, tz]) | Construct a timestamp from a proleptic Gregorian ordinal. |
| Timestamp.fromtimestamp(ts[, tz]) | Create a Timestamp object from a POSIX timestamp. |
| Timestamp.isocalendar() | Return a named tuple containing ISO year, week number, and weekday. |
| Timestamp.isoformat([sep, timespec]) | Return the time formatted according to ISO 8601. |
| Timestamp.isoweekday() | Return the day of the week represented by the date. |
| Timestamp.month_name([locale]) | Return the month name of the Timestamp with specified locale. |
| Timestamp.normalize() | Normalize Timestamp to midnight, preserving tz information. |
| Timestamp.now([tz]) | Return new Timestamp object representing current time local to tz. |
| Timestamp.replace([year, month, day, hour, ...]) | Implements datetime.replace, handles nanoseconds. |
| Timestamp.round(freq[, ambiguous, nonexistent]) | Round the Timestamp to the specified resolution. |
| Timestamp.strftime(format) | Return a formatted string of the Timestamp. |
| Timestamp.strptime(date_string, format) | Convert string argument to datetime. |
| Timestamp.time() | Return time object with same time but with tzinfo=None. |
| Timestamp.timestamp() | Return POSIX timestamp as float. |
| Timestamp.timetuple() | Return time tuple, compatible with time.localtime(). |
| Timestamp.timetz() | Return time object with same time and tzinfo. |
| Timestamp.to_datetime64() | Return a NumPy datetime64 object with same precision. |
| Timestamp.to_numpy([dtype, copy]) | Convert the Timestamp to a NumPy datetime64. |
| Timestamp.to_julian_date() | Convert TimeStamp to a Julian Date. |
| Timestamp.to_period([freq]) | Return a period of which this timestamp is an observation. |
| Timestamp.to_pydatetime([warn]) | Convert a Timestamp object to a native Python datetime object. |
| Timestamp.today([tz]) | Return the current time in the local timezone. |
| Timestamp.toordinal() | Return proleptic Gregorian ordinal. |
| Timestamp.tz_convert(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.tz_localize(tz[, ambiguous, ...]) | Localize the Timestamp to a timezone. |
| Timestamp.tzname() | Return time zone name. |
| Timestamp.utcfromtimestamp(ts) | Construct a timezone-aware UTC datetime from a POSIX timestamp. |
| Timestamp.utcnow() | Return a new Timestamp representing UTC day and time. |
| Timestamp.utcoffset() | Return utc offset. |
| Timestamp.utctimetuple() | Return UTC time tuple, compatible with time.localtime(). |
| Timestamp.weekday() | Return the day of the week represented by the date. |
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
| Function | Description |
|---|---|
| arrays.DatetimeArray(data[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
| Function | Description |
|---|---|
| DatetimeTZDtype([unit, tz]) | An ExtensionDtype for timezone-aware datetime data. |
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
| Function | Description |
|---|---|
| Timedelta([value, unit]) | Represents a duration, the difference between two dates or times. |
| Property | Description |
|---|---|
| Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
| Timedelta.components | Return a components namedtuple-like. |
| Timedelta.days | Returns the days of the timedelta. |
| Timedelta.max | |
| Timedelta.microseconds | Return the number of microseconds (n), where 0 <= n < 1 millisecond. |
| Timedelta.min | |
| Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
| Timedelta.resolution | |
| Timedelta.seconds | Return the total hours, minutes, and seconds of the timedelta as seconds. |
| Timedelta.unit | Return the unit of Timedelta object. |
| Timedelta.value | Return the value of Timedelta object in nanoseconds. |
| Timedelta.view(dtype) | Array view compatibility. |
| Method | Description |
|---|---|
| Timedelta.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
| Timedelta.ceil(freq) | Return a new Timedelta ceiled to this resolution. |
| Timedelta.floor(freq) | Return a new Timedelta floored to this resolution. |
| Timedelta.isoformat() | Format the Timedelta as ISO 8601 Duration. |
| Timedelta.round(freq) | Round the Timedelta to the specified resolution. |
| Timedelta.to_pytimedelta() | Convert a pandas Timedelta object into a python datetime.timedelta object. |
| Timedelta.to_timedelta64() | Return a numpy.timedelta64 object with 'ns' precision. |
| Timedelta.to_numpy([dtype, copy]) | Convert the Timedelta to a NumPy timedelta64. |
| Timedelta.total_seconds() | Total seconds in the duration. |
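A minimal sketch of constructing a Timedelta and using two of the methods above:
>>> td = pd.Timedelta(days=1, hours=2, minutes=30)
>>> td.total_seconds()
95400.0
>>> td.isoformat()
'P1DT2H30M0S'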
A collection of Timedelta may be stored in a TimedeltaArray.
| Function | Description |
|---|---|
| arrays.TimedeltaArray(data[, dtype, freq, copy]) | Pandas ExtensionArray for timedelta data. |
pandas represents spans of times as Period objects.
| Function | Description |
|---|---|
| Period([value, freq, ordinal, year, month, ...]) | Represents a period of time. |
| Property | Description |
|---|---|
| Period.day | Get day of the month that a Period falls on. |
| Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.day_of_week | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.dayofyear | Return the day of the year. |
| Period.day_of_year | Return the day of the year. |
| Period.days_in_month | Get the total number of days in the month that this period falls on. |
| Period.daysinmonth | Get the total number of days of the month that this period falls on. |
| Period.end_time | Get the Timestamp for the end of the period. |
| Period.freq | Return the frequency object for this Period. |
| Period.freqstr | Return a string representation of the frequency. |
| Period.hour | Get the hour of the day component of the Period. |
| Period.is_leap_year | Return True if the period's year is in a leap year. |
| Period.minute | Get minute of the hour component of the Period. |
| Period.month | Return the month this Period falls on. |
| Period.ordinal | Return the integer ordinal for this Period. |
| Period.quarter | Return the quarter this Period falls on. |
| Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| Period.second | Get the second component of the Period. |
| Period.start_time | Get the Timestamp for the start of the period. |
| Period.week | Get the week of the year on the given Period. |
| Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.weekofyear | Get the week of the year on the given Period. |
| Period.year | Return the year this Period falls on. |
| Method | Description |
|---|---|
| Period.asfreq(freq[, how]) | Convert Period to desired frequency, at the start or end of the interval. |
| Period.now(freq) | Return the period of now's date. |
| Period.strftime(fmt) | Returns a formatted string representation of the Period. |
| Period.to_timestamp([freq, how]) | Return the Timestamp representation of the Period. |
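A brief sketch of a monthly Period and frequency conversion:
>>> p = pd.Period("2024-03", freq="M")
>>> p.start_time
Timestamp('2024-03-01 00:00:00')
>>> p.asfreq("D", how="end")
Period('2024-03-31', 'D')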
A collection of Period may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
| Function | Description |
|---|---|
| arrays.PeriodArray(values[, dtype, copy]) | Pandas ExtensionArray for storing Period data. |
| Function | Description |
|---|---|
| PeriodDtype(freq) | An ExtensionDtype for Period data. |
Arbitrary intervals can be represented as Interval objects.
| Function | Description |
|---|---|
| Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
| Property | Description |
|---|---|
| Interval.closed | String describing the inclusive side of the interval. |
| Interval.closed_left | Check if the interval is closed on the left side. |
| Interval.closed_right | Check if the interval is closed on the right side. |
| Interval.is_empty | Indicates if an interval is empty, meaning it contains no points. |
| Interval.left | Left bound for the interval. |
| Interval.length | Return the length of the Interval. |
| Interval.mid | Return the midpoint of the Interval. |
| Interval.open_left | Check if the interval is open on the left side. |
| Interval.open_right | Check if the interval is open on the right side. |
| Interval.overlaps(other) | Check whether two Interval objects overlap. |
| Interval.right | Right bound for the interval. |
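A short sketch of Interval membership and overlap checks (a right-closed interval contains its right endpoint):
>>> iv = pd.Interval(0, 5, closed="right")
>>> iv.mid
2.5
>>> 5 in iv
True
>>> iv.overlaps(pd.Interval(4, 6))
True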
A collection of intervals may be stored in an arrays.IntervalArray.
| Function | Description |
|---|---|
| arrays.IntervalArray(data[, closed, dtype, ...]) | Pandas array for interval data that are closed on the same side. |
| Function | Description |
|---|---|
| IntervalDtype([subtype, closed]) | An ExtensionDtype for Interval data. |
numpy.ndarray cannot natively represent integer-data with missing values. pandas provides this through arrays.IntegerArray.
| Function | Description |
|---|---|
| arrays.IntegerArray(values, mask[, copy]) | Array of integer (optional missing) values. |
| Function | Description |
|---|---|
| Int8Dtype() | An ExtensionDtype for int8 integer data. |
| Int16Dtype() | An ExtensionDtype for int16 integer data. |
| Int32Dtype() | An ExtensionDtype for int32 integer data. |
| Int64Dtype() | An ExtensionDtype for int64 integer data. |
| UInt8Dtype() | An ExtensionDtype for uint8 integer data. |
| UInt16Dtype() | An ExtensionDtype for uint16 integer data. |
| UInt32Dtype() | An ExtensionDtype for uint32 integer data. |
| UInt64Dtype() | An ExtensionDtype for uint64 integer data. |
| Function | Description |
|---|---|
| arrays.FloatingArray(values, mask[, copy]) | Array of floating (optional missing) values. |
| Function | Description |
|---|---|
| Float32Dtype() | An ExtensionDtype for float32 data. |
| Float64Dtype() | An ExtensionDtype for float64 data. |
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
| Function | Description |
|---|---|
| CategoricalDtype([categories, ordered]) | Type for categorical data with the categories and orderedness. |
| Property | Description |
|---|---|
| CategoricalDtype.categories | An Index containing the unique categories allowed. |
| CategoricalDtype.ordered | Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical:
| Function | Description |
|---|---|
| Categorical(values[, categories, ordered, ...]) | Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
| Function | Description |
|---|---|
| Categorical.from_codes(codes[, categories, ...]) | Make a Categorical type from codes and categories or dtype. |
The dtype information is available on the Categorical
| Property | Description |
|---|---|
| Categorical.dtype | The CategoricalDtype for this instance. |
| Categorical.categories | The categories of this categorical. |
| Categorical.ordered | Whether the categories have an ordered relationship. |
| Categorical.codes | The category codes of this categorical index. |
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
| Function | Description |
|---|---|
| Categorical.__array__([dtype, copy]) | The numpy array interface. |
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
More methods are available on Categorical:
| Method | Description |
|---|---|
| Categorical.as_ordered() | Set the Categorical to be ordered. |
| Categorical.as_unordered() | Set the Categorical to be unordered. |
| Categorical.set_categories(new_categories[, ...]) | Set the categories to the specified new categories. |
| Categorical.rename_categories(new_categories) | Rename categories. |
| Categorical.reorder_categories(new_categories) | Reorder categories as specified in new_categories. |
| Categorical.add_categories(new_categories) | Add new categories. |
| Categorical.remove_categories(removals) | Remove the specified categories. |
| Categorical.remove_unused_categories() | Remove categories which are not used. |
| Categorical.map(mapper[, na_action]) | Map categories using an input mapping or function. |
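A minimal sketch of one of these methods (on pandas versions before the default string dtype, the categories display as object rather than str):
>>> cat = pd.Categorical(["a", "b", "a"])
>>> cat.rename_categories({"a": "x", "b": "y"})
['x', 'y', 'x']
Categories (2, str): ['x', 'y']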
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
| Function | Description |
|---|---|
| arrays.SparseArray(data[, sparse_index, ...]) | An ExtensionArray for storing sparse data. |
| Function | Description |
|---|---|
| SparseDtype([dtype, fill_value]) | Dtype for data stored in SparseArray. |
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
| Function | Description |
|---|---|
| arrays.StringArray(values, *[, dtype, copy]) | Extension array for string data. |
| arrays.ArrowStringArray(values, *[, dtype]) | Extension array for string data in a pyarrow.ChunkedArray. |
| Function | Description |
|---|---|
| StringDtype([storage, na_value]) | Extension dtype for string data. |
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
| Function | Description |
|---|---|
| arrays.BooleanArray(values, mask[, copy]) | Array of boolean (True/False) data with missing values. |
| Function | Description |
|---|---|
| BooleanDtype() | Extension dtype for boolean data. |
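A short illustration of nullable boolean data via the "boolean" alias:
>>> pd.array([True, False, None], dtype="boolean")
<BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean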
| Function | Description |
|---|---|
| api.types.union_categoricals(to_union[, ...]) | Combine list-like of Categorical-like, unioning categories. |
| api.types.infer_dtype(value[, skipna]) | Return a string label of the type of the elements in a list-like input. |
| api.types.pandas_dtype(dtype) | Convert input into a pandas only dtype object or a numpy dtype object. |
| Function | Description |
|---|---|
| api.types.is_any_real_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a real number dtype. |
| api.types.is_bool_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a boolean dtype. |
| api.types.is_categorical_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype. |
| api.types.is_complex_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a complex dtype. |
| api.types.is_datetime64_any_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64[ns] dtype. |
| api.types.is_datetime64tz_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype. |
| api.types.is_dtype_equal(source, target) | Check if two dtypes are equal. |
| api.types.is_extension_array_dtype(arr_or_dtype) | Check if an object is a pandas extension array type. |
| api.types.is_float_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a float dtype. |
| api.types.is_int64_dtype(arr_or_dtype) | (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype. |
| api.types.is_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an integer dtype. |
| api.types.is_interval_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype. |
| api.types.is_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a numeric dtype. |
| api.types.is_object_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the object dtype. |
| api.types.is_period_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Period dtype. |
| api.types.is_signed_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a signed integer dtype. |
| api.types.is_string_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the string dtype. |
| api.types.is_timedelta64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the timedelta64 dtype. |
| api.types.is_timedelta64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the timedelta64[ns] dtype. |
| api.types.is_unsigned_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an unsigned integer dtype. |
| api.types.is_sparse(arr) | (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array. |
| Function | Description |
|---|---|
| api.types.is_dict_like(obj) | Check if the object is dict-like. |
| api.types.is_file_like(obj) | Check if the object is a file-like object. |
| api.types.is_list_like(obj[, allow_sets]) | Check if the object is list-like. |
| api.types.is_named_tuple(obj) | Check if the object is a named tuple. |
| api.types.is_iterator(obj) | Check if the object is an iterator. |
| Function | Description |
|---|---|
| api.types.is_bool(obj) | Return True if given object is boolean. |
| api.types.is_complex(obj) | Return True if given object is complex. |
| api.types.is_float(obj) | Return True if given object is float. |
| api.types.is_hashable(obj[, allow_slice]) | Return True if hash(obj) will succeed, False otherwise. |
| api.types.is_integer(obj) | Return True if given object is integer. |
| api.types.is_number(obj) | Check if the object is a number. |
| api.types.is_re(obj) | Check if the object is a regex pattern instance. |
| api.types.is_re_compilable(obj) | Check if the object can be compiled into a regex pattern instance. |
| api.types.is_scalar(val) | Return True if given object is scalar. |
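A brief sketch of the introspection helpers above (note that strings are not considered list-like):
>>> from pandas.api.types import is_list_like, is_scalar
>>> is_list_like([1, 2, 3])
True
>>> is_list_like("abc")
False
>>> is_scalar(3.5)
True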
If a dtype is not specified, pandas will infer the best dtype from the values. See the description of the dtype parameter for the types pandas can infer.
>>> pd.array([1, 2])
<IntegerArray>
[1, 2]
Length: 2, dtype: Int64
>>> pd.array([1, 2, np.nan])
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
>>> pd.array([1.1, 2.2])
<FloatingArray>
[1.1, 2.2]
Length: 2, dtype: Float64
>>> pd.array(["a", None, "c"])
<ArrowStringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> with pd.option_context("string_storage", "python"):
... arr = pd.array(["a", None, "c"])
>>> arr
<StringArray>
['a', <NA>, 'c']
Length: 3, dtype: string
>>> pd.array([pd.Period("2000", freq="D"), pd.Period("2000", freq="D")])
<PeriodArray>
['2000-01-01', '2000-01-01']
Length: 2, dtype: period[D]
You can use the string alias for dtype
>>> pd.array(["a", "b", "a"], dtype="category")
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
Or specify the actual dtype
>>> pd.array(
... ["a", "b", "a"], dtype=pd.CategoricalDtype(["a", "b", "c"], ordered=True)
... )
['a', 'b', 'a']
Categories (3, str): ['a' < 'b' < 'c']
If pandas does not infer a dedicated extension type, an arrays.NumpyExtensionArray is returned.
>>> pd.array([1 + 1j, 3 + 2j])
<NumpyExtensionArray>
[(1+1j), (3+2j)]
Length: 2, dtype: complex128
As mentioned in the "Notes" section, new extension types may be added in the future (by pandas or third-party libraries), causing the return value to no longer be an arrays.NumpyExtensionArray. Specify the dtype as a NumPy dtype if you need to ensure there's no future change in behavior.
>>> pd.array([1, 2], dtype=np.dtype("int32"))
<NumpyExtensionArray>
[1, 2]
Length: 2, dtype: int32
data must be 1-dimensional. A ValueError is raised when the input has the wrong dimensionality.
>>> pd.array(1)
Traceback (most recent call last):
...
ValueError: Cannot pass scalar '1' to 'pandas.array'.
Create an ArrowExtensionArray with pandas.array():
>>> pd.array([1, 1, None], dtype="int64[pyarrow]")
<ArrowExtensionArray>
[1, 1, <NA>]
Length: 3, dtype: int64[pyarrow]
>>> import pyarrow as pa
>>> pd.ArrowDtype(pa.int64())
int64[pyarrow]
Types with parameters must be constructed with ArrowDtype.
>>> pd.ArrowDtype(pa.timestamp("s", tz="America/New_York"))
timestamp[s, tz=America/New_York][pyarrow]
>>> pd.ArrowDtype(pa.list_(pa.int64()))
list<item: int64>[pyarrow]
Using the primary calling convention:
This converts a datetime-like string
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
This converts a float representing a Unix epoch in units of seconds
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
This converts an int representing a Unix-epoch in units of weeks
>>> pd.Timestamp(1535, unit='W')
Timestamp('1999-06-03 00:00:00')
This converts an int representing a Unix-epoch in units of seconds and for a particular timezone
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Using the other two forms that mimic the API for datetime.datetime:
>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00')
>>> ts = pd.Timestamp(2020, 3, 14, 15)
>>> ts.asm8
numpy.datetime64('2020-03-14T15:00:00.000000')
>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.day
31
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
>>> ts = pd.Timestamp("2024-11-03 01:30:00")
>>> ts.fold
0
>>> ts = pd.Timestamp("2024-08-31 16:16:30")
>>> ts.hour
16
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_leap_year
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_month_end
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_start
False
>>> ts = pd.Timestamp(2020, 1, 1)
>>> ts.is_month_start
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_end
False
>>> ts = pd.Timestamp(2020, 3, 31)
>>> ts.is_quarter_end
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_start
False
>>> ts = pd.Timestamp(2020, 4, 1)
>>> ts.is_quarter_start
True
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_year_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_year_end
True
Many of these methods or variants thereof are available on the objects that contain an index (Series/DataFrame) and those should most likely be used before calling these methods directly.
| Function | Description |
|---|---|
| Index([data, dtype, copy, name, tupleize_cols]) | Immutable sequence used for indexing and alignment. |
| Function | Description |
|---|---|
| Index.values | Return an array representing the data in the Index. |
| Index.is_monotonic_increasing | Return a boolean if the values are equal or increasing. |
| Index.is_monotonic_decreasing | Return a boolean if the values are equal or decreasing. |
| Index.is_unique | Return if the index has unique values. |
| Index.has_duplicates | Check if the Index has duplicate values. |
| Index.hasnans | Return True if there are any NaNs. |
| Index.dtype | Return the dtype object of the underlying data. |
| Index.inferred_type | Return a string of the type inferred from the values. |
| Index.shape | Return a tuple of the shape of the underlying data. |
| Index.name | Return Index or MultiIndex name. |
| Index.names | Get names on index. |
| Index.nbytes | Return the number of bytes in the underlying data. |
| Index.ndim | Number of dimensions of the underlying data, by definition 1. |
| Index.size | Return the number of elements in the underlying data. |
| Index.empty | Indicator whether Index is empty. |
| Index.T | Return the transpose, which is by definition self. |
| Index.memory_usage([deep]) | Memory usage of the values. |
| Index.array | The ExtensionArray of the data backing this Index. |
| Function | Description |
|---|---|
| Index.all(*args, **kwargs) | Return whether all elements are Truthy. |
| Index.any(*args, **kwargs) | Return whether any element is Truthy. |
| Index.argmin([axis, skipna]) | Return int position of the smallest value in the Index. |
| Index.argmax([axis, skipna]) | Return int position of the largest value in the Index. |
| Index.copy([name, deep]) | Make a copy of this object. |
| Index.delete(loc) | Make new Index with passed location(-s) deleted. |
| Index.drop(labels[, errors]) | Make new Index with passed list of labels deleted. |
| Index.drop_duplicates(*[, keep]) | Return Index with duplicate values removed. |
| Index.duplicated([keep]) | Indicate duplicate index values. |
| Index.equals(other) | Determine if two Index object are equal. |
| Index.factorize([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| Index.identical(other) | Similar to equals, but checks that object attributes and types are also equal. |
| Index.insert(loc, item) | Make new Index inserting new item at location. |
| Index.is_(other) | More flexible, faster check like is but that works through views. |
| Index.min([axis, skipna]) | Return the minimum value of the Index. |
| Index.max([axis, skipna]) | Return the maximum value of the Index. |
| Index.reindex(target[, method, level, ...]) | Create index with target's values. |
| Index.rename(name, *[, inplace]) | Alter Index or MultiIndex name. |
| Index.repeat(repeats[, axis]) | Repeat elements of an Index. |
| Index.where(cond[, other]) | Replace values where the condition is False. |
| Index.take(indices[, axis, allow_fill, ...]) | Return a new Index of the values selected by the indices. |
| Index.putmask(mask, value) | Return a new Index of the values set with the mask. |
| Index.unique([level]) | Return unique values in the index. |
| Index.nunique([dropna]) | Return number of unique elements in the object. |
| Index.value_counts([normalize, sort, ...]) | Return a Series containing counts of unique values. |
| Function | Description |
|---|---|
| Index.set_names(names, *[, level, inplace]) | Set Index or MultiIndex name. |
| Index.droplevel([level]) | Return index with requested level(s) removed. |
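A brief renaming sketch (example data is ours):
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx.set_names("y")
Index([1, 2, 3], dtype='int64', name='y')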
| Function | Description |
|---|---|
| Index.fillna(value) | Fill NA/NaN values with the specified value. |
| Index.dropna([how]) | Return Index without NA/NaN values. |
| Index.isna() | Detect missing values. |
| Index.notna() | Detect existing (non-missing) values. |
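A brief sketch of the missing-value helpers (example data is ours):
>>> idx = pd.Index([1.0, np.nan, 3.0])
>>> idx.fillna(0)
Index([1.0, 0.0, 3.0], dtype='float64')
>>> idx.dropna()
Index([1.0, 3.0], dtype='float64')
>>> idx.isna()
array([False,  True, False])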
| Function | Description |
|---|---|
| Index.astype(dtype[, copy]) | Create an Index with values cast to dtypes. |
| Index.infer_objects([copy]) | If we have an object dtype, try to infer a non-object dtype. |
| Index.item() | Return the first element of the underlying data as a Python scalar. |
| Index.map(mapper[, na_action]) | Map values using an input mapping or function. |
| Index.ravel([order]) | Return a view on self. |
| Index.to_list() | Return a list of the values. |
| Index.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| Index.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| Index.to_numpy([dtype, copy, na_value]) | A NumPy ndarray representing the values in this Series or Index. |
| Index.view([cls]) | Return a view of the Index with the specified dtype or a new Index instance. |
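A brief conversion sketch (example data is ours):
>>> idx = pd.Index([1, 2, 3])
>>> idx.astype("float64")
Index([1.0, 2.0, 3.0], dtype='float64')
>>> idx.to_list()
[1, 2, 3]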
| Function | Description |
|---|---|
| Index.argsort(*args, **kwargs) | Return the integer indices that would sort the index. |
| Index.searchsorted(value[, side, sorter]) | Find indices where elements should be inserted to maintain order. |
| Index.sort_values(*[, return_indexer, ...]) | Return a sorted copy of the index. |
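A brief sorting sketch (example data is ours):
>>> idx = pd.Index([3, 1, 2])
>>> idx.sort_values()
Index([1, 2, 3], dtype='int64')
>>> idx.argsort()
array([1, 2, 0])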
| Function | Description |
|---|---|
| Index.shift([periods, freq]) | Shift index by desired number of time frequency increments. |
| Function | Description |
|---|---|
| Index.append(other) | Append a collection of Index options together. |
| Index.join(other, *[, how, level, ...]) | Compute join_index and indexers to conform data structures to the new index. |
| Index.intersection(other[, sort]) | Form the intersection of two Index objects. |
| Index.union(other[, sort]) | Form the union of two Index objects. |
| Index.difference(other[, sort]) | Return a new Index with elements of index not in other. |
| Index.symmetric_difference(other[, ...]) | Compute the symmetric difference of two Index objects. |
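A brief sketch of the set operations (example data is ours):
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')
>>> idx1.union(idx2)
Index([1, 2, 3, 4, 5, 6], dtype='int64')
>>> idx1.difference(idx2)
Index([1, 2], dtype='int64')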
| Function | Description |
|---|---|
| Index.asof(label) | Return the label from the index, or, if not present, the previous one. |
| Index.asof_locs(where, mask) | Return the locations (indices) of labels in the index. |
| Index.get_indexer(target[, method, limit, ...]) | Compute indexer and mask for new index given the current index. |
| Index.get_indexer_for(target) | Guaranteed return of an indexer even when non-unique. |
| Index.get_indexer_non_unique(target) | Compute indexer and mask for new index given the current index. |
| Index.get_level_values(level) | Return an Index of values for requested level. |
| Index.get_loc(key) | Get integer location, slice or boolean mask for requested label. |
| Index.get_slice_bound(label, side) | Calculate slice bound that corresponds to given label. |
| Index.isin(values[, level]) | Return a boolean array where the index values are in values. |
| Index.slice_indexer([start, end, step]) | Compute the slice indexer for input labels and step. |
| Index.slice_locs([start, end, step]) | Compute slice locations for input labels. |
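A brief label-lookup sketch (example data is ours):
>>> idx = pd.Index(list("abc"))
>>> idx.get_loc("b")
1
>>> idx.slice_locs(start="a", end="b")
(0, 2)
>>> idx.isin(["a", "c"])
array([ True, False,  True])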
| Function | Description |
|---|---|
| RangeIndex([start, stop, step, dtype, copy, ...]) | Immutable Index implementing a monotonic integer range. |
| Function | Description |
|---|---|
| RangeIndex.start | The value of the start parameter (0 if this was not supplied). |
| RangeIndex.stop | The value of the stop parameter. |
| RangeIndex.step | The value of the step parameter (1 if this was not supplied). |
| RangeIndex.from_range(data[, name, dtype]) | Create pandas.RangeIndex from a range object. |
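A brief RangeIndex sketch (example data is ours):
>>> idx = pd.RangeIndex(start=0, stop=10, step=2)
>>> idx
RangeIndex(start=0, stop=10, step=2)
>>> idx.step
2
>>> pd.RangeIndex.from_range(range(5))
RangeIndex(start=0, stop=5, step=1)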
| Function | Description |
|---|---|
| CategoricalIndex([data, categories, ...]) | Index based on an underlying Categorical. |
| Function | Description |
|---|---|
| CategoricalIndex.append(other) | Append a collection of Index options together. |
| CategoricalIndex.codes | The category codes of this categorical index. |
| CategoricalIndex.categories | The categories of this categorical. |
| CategoricalIndex.ordered | Whether the categories have an ordered relationship. |
| CategoricalIndex.rename_categories(...) | Rename categories. |
| CategoricalIndex.reorder_categories(...[, ...]) | Reorder categories as specified in new_categories. |
| CategoricalIndex.add_categories(new_categories) | Add new categories. |
| CategoricalIndex.remove_categories(removals) | Remove the specified categories. |
| CategoricalIndex.remove_unused_categories() | Remove categories which are not used. |
| CategoricalIndex.set_categories(new_categories) | Set the categories to the specified new categories. |
| CategoricalIndex.as_ordered() | Set the Categorical to be ordered. |
| CategoricalIndex.as_unordered() | Set the Categorical to be unordered. |
| Function | Description |
|---|---|
| CategoricalIndex.map(mapper[, na_action]) | Map values using an input mapping or function. |
| CategoricalIndex.equals(other) | Determine if two CategoricalIndex objects contain the same elements. |
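A brief CategoricalIndex sketch (example data is ours):
>>> ci = pd.CategoricalIndex(["a", "b", "a"], categories=["a", "b", "c"])
>>> ci.codes
array([0, 1, 0], dtype=int8)
>>> ci.ordered
False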
| Function | Description |
|---|---|
| IntervalIndex(data[, closed, dtype, copy, ...]) | Immutable index of intervals that are closed on the same side. |
| Function | Description |
|---|---|
| IntervalIndex.from_arrays(left, right[, ...]) | Construct from two arrays defining the left and right bounds. |
| IntervalIndex.from_tuples(data[, closed, ...]) | Construct an IntervalIndex from an array-like of tuples. |
| IntervalIndex.from_breaks(breaks[, closed, ...]) | Construct an IntervalIndex from an array of splits. |
| IntervalIndex.left | Return left bounds of the intervals in the IntervalIndex. |
| IntervalIndex.right | Return right bounds of the intervals in the IntervalIndex. |
| IntervalIndex.mid | Return the midpoint of each interval in the IntervalIndex as an Index. |
| IntervalIndex.closed | String describing the inclusive side of the intervals. |
| IntervalIndex.length | Calculate the length of each interval in the IntervalIndex. |
| IntervalIndex.values | Return an array representing the data in the Index. |
| IntervalIndex.is_empty | Indicates if an interval is empty, meaning it contains no points. |
| IntervalIndex.is_non_overlapping_monotonic | Return a boolean whether the IntervalArray/IntervalIndex is non-overlapping and monotonic. |
| IntervalIndex.is_overlapping | Return True if the IntervalIndex has overlapping intervals, else False. |
| IntervalIndex.get_loc(key) | Get integer location, slice or boolean mask for requested label. |
| IntervalIndex.get_indexer(target[, method, ...]) | Compute indexer and mask for new index given the current index. |
| IntervalIndex.set_closed(closed) | Return an identical IntervalArray closed on the specified side. |
| IntervalIndex.contains(other) | Check elementwise if the Intervals contain the value. |
| IntervalIndex.overlaps(other) | Check elementwise if an Interval overlaps the values in the IntervalArray. |
| IntervalIndex.to_tuples([na_tuple]) | Return an ndarray (if self is IntervalArray) or Index (if self is IntervalIndex) of tuples of the form (left, right). |
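A brief IntervalIndex sketch (example data is ours):
>>> ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
>>> ii
IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
>>> ii.contains(1.5)
array([False,  True, False])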
| Function | Description |
|---|---|
| MultiIndex([levels, codes, sortorder, ...]) | A multi-level, or hierarchical, index object for pandas objects. |
| Function | Description |
|---|---|
| MultiIndex.from_arrays(arrays[, sortorder, ...]) | Convert arrays to MultiIndex. |
| MultiIndex.from_tuples(tuples[, sortorder, ...]) | Convert list of tuples to MultiIndex. |
| MultiIndex.from_product(iterables[, ...]) | Make a MultiIndex from the cartesian product of multiple iterables. |
| MultiIndex.from_frame(df[, sortorder, names]) | Make a MultiIndex from a DataFrame. |
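A brief constructor sketch (example data is ours):
>>> pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["letter", "number"])
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])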
| Function | Description |
|---|---|
| MultiIndex.names | Names of levels in MultiIndex. |
| MultiIndex.levels | Levels of the MultiIndex. |
| MultiIndex.codes | Codes of the MultiIndex. |
| MultiIndex.nlevels | Integer number of levels in this MultiIndex. |
| MultiIndex.levshape | A tuple representing the length of each level in the MultiIndex. |
| MultiIndex.dtypes | Return the dtypes as a Series for the underlying MultiIndex. |
| Function | Description |
|---|---|
| MultiIndex.set_levels(levels, *[, level, ...]) | Set new levels on MultiIndex. |
| MultiIndex.set_codes(codes, *[, level, ...]) | Set new codes on MultiIndex. |
| MultiIndex.to_flat_index() | Convert a MultiIndex to an Index of Tuples containing the level values. |
| MultiIndex.to_frame([index, name, ...]) | Create a DataFrame with the levels of the MultiIndex as columns. |
| MultiIndex.sortlevel([level, ascending, ...]) | Sort MultiIndex at the requested level. |
| MultiIndex.droplevel([level]) | Return index with requested level(s) removed. |
| MultiIndex.swaplevel([i, j]) | Swap level i with level j. |
| MultiIndex.reorder_levels(order) | Rearrange levels using input order. |
| MultiIndex.remove_unused_levels() | Create new MultiIndex from current that removes unused levels. |
| MultiIndex.drop(codes[, level, errors]) | Make a new pandas.MultiIndex with the passed list of codes deleted. |
| MultiIndex.copy([names, deep, name]) | Make a copy of this object. |
| MultiIndex.append(other) | Append a collection of Index options together. |
| MultiIndex.truncate([before, after]) | Slice index between two labels / tuples, return new MultiIndex. |
| Function | Description |
|---|---|
| MultiIndex.get_loc(key) | Get location for a label or a tuple of labels. |
| MultiIndex.get_locs(seq) | Get location for a sequence of labels. |
| MultiIndex.get_loc_level(key[, level, ...]) | Get location and sliced index for requested label(s)/level(s). |
| MultiIndex.get_indexer(target[, method, ...]) | Compute indexer and mask for new index given the current index. |
| MultiIndex.get_level_values(level) | Return vector of label values for requested level. |
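A brief lookup sketch (example data is ours):
>>> mi = pd.MultiIndex.from_arrays([["a", "a", "b"], [1, 2, 1]])
>>> mi.get_loc(("a", 2))
1
>>> mi.get_loc("b")
slice(2, 3, None)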
| Function | Description |
|---|---|
| IndexSlice | Create an object to more easily perform multi-index slicing. |
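A short slicing sketch with IndexSlice (the frame is made up; the pattern mirrors the reference examples):
>>> midx = pd.MultiIndex.from_product([["A0", "A1"], ["B0", "B1", "B2", "B3"]])
>>> dfmi = pd.DataFrame(
...     np.arange(16).reshape((len(midx), 2)), index=midx, columns=["foo", "bar"]
... )
>>> idx = pd.IndexSlice
>>> dfmi.loc[idx[:, "B0":"B1"], :]
       foo  bar
A0 B0    0    1
   B1    2    3
A1 B0    8    9
   B1   10   11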
| Function | Description |
|---|---|
| DatetimeIndex([data, freq, tz, ambiguous, ...]) | Immutable ndarray-like of datetime64 data. |
| Function | Description |
|---|---|
| DatetimeIndex.year | The year of the datetime. |
| DatetimeIndex.month | The month as January=1, December=12. |
| DatetimeIndex.day | The day of the datetime. |
| DatetimeIndex.hour | The hours of the datetime. |
| DatetimeIndex.minute | The minutes of the datetime. |
| DatetimeIndex.second | The seconds of the datetime. |
| DatetimeIndex.microsecond | The microseconds of the datetime. |
| DatetimeIndex.nanosecond | The nanoseconds of the datetime. |
| DatetimeIndex.date | Returns numpy array of python datetime.date objects. |
| DatetimeIndex.time | Returns numpy array of datetime.time objects. |
| DatetimeIndex.timetz | Returns numpy array of datetime.time objects with timezones. |
| DatetimeIndex.dayofyear | The ordinal day of the year. |
| DatetimeIndex.day_of_year | The ordinal day of the year. |
| DatetimeIndex.dayofweek | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.day_of_week | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.weekday | The day of the week with Monday=0, Sunday=6. |
| DatetimeIndex.quarter | The quarter of the date. |
| DatetimeIndex.tz | Return the timezone. |
| DatetimeIndex.freq | Return the frequency object if it is set, otherwise None. |
| DatetimeIndex.freqstr | Return the frequency object as a string if it's set, otherwise None. |
| DatetimeIndex.is_month_start | Indicates whether the date is the first day of the month. |
| DatetimeIndex.is_month_end | Indicates whether the date is the last day of the month. |
| DatetimeIndex.is_quarter_start | Indicator for whether the date is the first day of a quarter. |
| DatetimeIndex.is_quarter_end | Indicator for whether the date is the last day of a quarter. |
| DatetimeIndex.is_year_start | Indicate whether the date is the first day of a year. |
| DatetimeIndex.is_year_end | Indicate whether the date is the last day of the year. |
| DatetimeIndex.is_leap_year | Boolean indicator if the date belongs to a leap year. |
| DatetimeIndex.inferred_freq | Return the inferred frequency of the index. |
| Function | Description |
|---|---|
| DatetimeIndex.indexer_at_time(time[, asof]) | Return index locations of values at particular time of day. |
| DatetimeIndex.indexer_between_time(...[, ...]) | Return index locations of values between particular times of day. |
| Function | Description |
|---|---|
| DatetimeIndex.normalize() | Convert times to midnight. |
| DatetimeIndex.strftime(date_format) | Convert to Index using specified date_format. |
| DatetimeIndex.snap([freq]) | Snap time stamps to nearest occurring frequency. |
| DatetimeIndex.tz_convert(tz) | Convert tz-aware Datetime Array/Index from one time zone to another. |
| DatetimeIndex.tz_localize(tz[, ambiguous, ...]) | Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index. |
| DatetimeIndex.round(freq[, ambiguous, ...]) | Perform round operation on the data to the specified freq. |
| DatetimeIndex.floor(freq[, ambiguous, ...]) | Perform floor operation on the data to the specified freq. |
| DatetimeIndex.ceil(freq[, ambiguous, ...]) | Perform ceil operation on the data to the specified freq. |
| DatetimeIndex.month_name([locale]) | Return the month names with specified locale. |
| DatetimeIndex.day_name([locale]) | Return the day names with specified locale. |
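A brief sketch (example data is ours; the resolution printed in the dtype can vary by pandas version):
>>> dti = pd.DatetimeIndex(["2024-01-01 10:30", "2024-01-02 23:15"])
>>> dti.normalize()
DatetimeIndex(['2024-01-01', '2024-01-02'], dtype='datetime64[ns]', freq=None)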
| Function | Description |
|---|---|
| DatetimeIndex.as_unit(unit[, round_ok]) | Convert to a dtype with the given unit resolution. |
| DatetimeIndex.to_period([freq]) | Cast to PeriodArray/PeriodIndex at a particular frequency. |
| DatetimeIndex.to_pydatetime() | Return an ndarray of datetime.datetime objects. |
| DatetimeIndex.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| DatetimeIndex.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| DatetimeIndex.to_julian_date() | Convert the timestamps to Julian Dates. |
| Function | Description |
|---|---|
| DatetimeIndex.mean(*[, skipna, axis]) | Return the mean value of the Array. |
| DatetimeIndex.std([axis, dtype, out, ddof, ...]) | Return sample standard deviation over requested axis. |
| Function | Description |
|---|---|
| TimedeltaIndex([data, freq, dtype, copy, name]) | Immutable Index of timedelta64 data. |
| Function | Description |
|---|---|
| TimedeltaIndex.days | Number of days for each element. |
| TimedeltaIndex.seconds | Number of seconds (>= 0 and less than 1 day) for each element. |
| TimedeltaIndex.microseconds | Number of microseconds (>= 0 and less than 1 second) for each element. |
| TimedeltaIndex.nanoseconds | Number of nanoseconds (>= 0 and less than 1 microsecond) for each element. |
| TimedeltaIndex.components | Return a DataFrame of the individual resolution components of the Timedeltas. |
| TimedeltaIndex.inferred_freq | Return the inferred frequency of the index. |
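A brief sketch (example data is ours):
>>> tdi = pd.to_timedelta(["1 days", "2 days 06:00:00"])
>>> tdi.days
Index([1, 2], dtype='int64')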
| Function | Description |
|---|---|
| TimedeltaIndex.as_unit(unit) | Convert to a dtype with the given unit resolution. |
| TimedeltaIndex.to_pytimedelta() | Return an ndarray of datetime.timedelta objects. |
| TimedeltaIndex.to_series([index, name]) | Create a Series with both index and values equal to the index keys. |
| TimedeltaIndex.round(freq[, ambiguous, ...]) | Perform round operation on the data to the specified freq. |
| TimedeltaIndex.floor(freq[, ambiguous, ...]) | Perform floor operation on the data to the specified freq. |
| TimedeltaIndex.ceil(freq[, ambiguous, ...]) | Perform ceil operation on the data to the specified freq. |
| TimedeltaIndex.to_frame([index, name]) | Create a DataFrame with a column containing the Index. |
| Function | Description |
|---|---|
| TimedeltaIndex.mean(*[, skipna, axis]) | Return the mean value of the Array. |
| Function | Description |
|---|---|
| PeriodIndex([data, freq, dtype, copy, name]) | Immutable ndarray holding ordinal values indicating regular periods in time. |
| Function | Description |
|---|---|
| PeriodIndex.day | The days of the period. |
| PeriodIndex.dayofweek | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.day_of_week | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.dayofyear | The ordinal day of the year. |
| PeriodIndex.day_of_year | The ordinal day of the year. |
| PeriodIndex.days_in_month | The number of days in the month. |
| PeriodIndex.daysinmonth | The number of days in the month. |
| PeriodIndex.end_time | Get the Timestamp for the end of the period. |
| PeriodIndex.freq | Return the frequency object if it is set, otherwise None. |
| PeriodIndex.freqstr | Return the frequency object as a string if it's set, otherwise None. |
| PeriodIndex.hour | The hour of the period. |
| PeriodIndex.is_leap_year | Logical indicating if the date belongs to a leap year. |
| PeriodIndex.minute | The minute of the period. |
| PeriodIndex.month | The month as January=1, December=12. |
| PeriodIndex.quarter | The quarter of the date. |
| PeriodIndex.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| PeriodIndex.second | The second of the period. |
| PeriodIndex.start_time | Get the Timestamp for the start of the period. |
| PeriodIndex.week | The week ordinal of the year. |
| PeriodIndex.weekday | The day of the week with Monday=0, Sunday=6. |
| PeriodIndex.weekofyear | The week ordinal of the year. |
| PeriodIndex.year | The year of the period. |
| Function | Description |
|---|---|
| PeriodIndex.asfreq([freq, how]) | Convert the PeriodArray to the specified frequency freq. |
| PeriodIndex.strftime(date_format) | Convert to Index using specified date_format. |
| PeriodIndex.to_timestamp([freq, how]) | Cast to DatetimeArray/Index. |
| PeriodIndex.from_fields(*[, year, quarter, ...]) | Construct a PeriodIndex from fields (year, month, day, etc.). |
| PeriodIndex.from_ordinals(ordinals, *, freq) | Construct a PeriodIndex from ordinals. |
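A brief PeriodIndex sketch (example data is ours):
>>> pidx = pd.period_range("2023-01", periods=3, freq="M")
>>> pidx
PeriodIndex(['2023-01', '2023-02', '2023-03'], dtype='period[M]')
>>> pidx.to_timestamp()
DatetimeIndex(['2023-01-01', '2023-02-01', '2023-03-01'], dtype='datetime64[ns]', freq='MS')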
>>> pd.Index([1, 2, 3])
Index([1, 2, 3], dtype='int64')
>>> pd.Index(list("abc"))
Index(['a', 'b', 'c'], dtype='str')
>>> pd.Index([1, 2, 3], dtype="uint8")
Index([1, 2, 3], dtype='uint8')
For pandas.Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.values
array([1, 2, 3])
For pandas.IntervalIndex:
>>> idx = pd.interval_range(start=0, end=5)
>>> idx.values
<IntervalArray>
[(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]]
Length: 5, dtype: interval[int64, right]
>>> pd.Index([1, 2, 3]).is_monotonic_increasing
True
>>> pd.Index([1, 2, 2]).is_monotonic_increasing
True
>>> pd.Index([1, 3, 2]).is_monotonic_increasing
False
>>> pd.Index([3, 2, 1]).is_monotonic_decreasing
True
>>> pd.Index([3, 2, 2]).is_monotonic_decreasing
True
>>> pd.Index([3, 1, 2]).is_monotonic_decreasing
False
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.is_unique
False
>>> idx = pd.Index([1, 5, 7])
>>> idx.is_unique
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
... "category"
... )
>>> idx.is_unique
False
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.is_unique
True
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple", "Watermelon"]).astype(
... "category"
... )
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple", "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
>>> s = pd.Series([1, 2, 3], index=["a", "b", None])
>>> s
a 1
b 2
None 3
dtype: int64
>>> s.index.hasnans
True
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.dtype
dtype('int64')
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.inferred_type
'integer'
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.shape
(3,)
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx
Index([1, 2, 3], dtype='int64', name='x')
>>> idx.name
'x'
>>> idx = pd.Index([1, 2, 3], name="x")
>>> idx.names
FrozenList(['x'])
>>> idx = pd.Index([1, 2, 3], name=("x", "y"))
>>> idx.names
FrozenList([('x', 'y')])
If the index does not have a name set:
>>> idx = pd.Index([1, 2, 3])
>>> idx.names
FrozenList([None])
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.nbytes
34
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.nbytes
24
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.ndim
1
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.ndim
1
For Series:
>>> s = pd.Series(["Ant", "Bear", "Cow"])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.size
3
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.size
3
>>> idx = pd.Index([1, 2, 3])
>>> idx
Index([1, 2, 3], dtype='int64')
>>> idx.empty
False
>>> idx_empty = pd.Index([])
>>> idx_empty
Index([], dtype='object')
>>> idx_empty.empty
True
If we only have NaNs in our Index, it is not considered empty!
>>> idx = pd.Index([np.nan, np.nan])
>>> idx
Index([nan, nan], dtype='float64')
>>> idx.empty
False
For Series:
>>> s = pd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: str
>>> s.T
0 Ant
1 Bear
2 Cow
dtype: str
For Index:
>>> idx = pd.Index([1, 2, 3])
>>> idx.T
Index([1, 2, 3], dtype='int64')
>>> idx = pd.Index([1, 2, 3])
>>> idx.memory_usage()
24
For regular NumPy types like int and float, a NumpyExtensionArray is returned.
>>> pd.Index([1, 2, 3]).array
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray is returned:
>>> idx = pd.Index(pd.Categorical(["a", "b", "a"]))
>>> idx.array
['a', 'b', 'a']
Categories (2, str): ['a', 'b']
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because 0 is considered False.
>>> pd.Index([0, 1, 2]).all()
False
pandas.api.typing.DataFrameGroupBy and pandas.api.typing.SeriesGroupBy instances are returned by the groupby calls pandas.DataFrame.groupby() and pandas.Series.groupby(), respectively.
| Function | Description |
|---|---|
| DataFrameGroupBy.__iter__() | Groupby iterator. |
| SeriesGroupBy.__iter__() | Groupby iterator. |
| DataFrameGroupBy.groups | Dict {group name -> group labels}. |
| SeriesGroupBy.groups | Dict {group name -> group labels}. |
| DataFrameGroupBy.indices | Dict {group name -> group indices}. |
| SeriesGroupBy.indices | Dict {group name -> group indices}. |
| DataFrameGroupBy.get_group(name) | Construct DataFrame from group with provided name. |
| SeriesGroupBy.get_group(name) | Construct DataFrame from group with provided name. |
| Function | Description |
|---|---|
| Grouper(*args, **kwargs) | A Grouper allows the user to specify a groupby instruction for an object. |
| Function | Description |
|---|---|
| NamedAgg(column, aggfunc, *args, **kwargs) | Helper for column specific aggregation with control over output column names. |
| Function | Description |
|---|---|
| SeriesGroupBy.apply(func, *args, **kwargs) | Apply function func group-wise and combine the results together. |
| DataFrameGroupBy.apply(func, *args[, ...]) | Apply function func group-wise and combine the results together. |
| SeriesGroupBy.agg([func, engine, engine_kwargs]) | Aggregate using one or more operations. |
| DataFrameGroupBy.agg([func, engine, ...]) | Aggregate using one or more operations. |
| SeriesGroupBy.aggregate([func, engine, ...]) | Aggregate using one or more operations. |
| DataFrameGroupBy.aggregate([func, engine, ...]) | Aggregate using one or more operations. |
| SeriesGroupBy.transform(func, *args[, ...]) | Call function producing a same-indexed Series on each group. |
| DataFrameGroupBy.transform(func, *args[, ...]) | Call function producing a same-indexed DataFrame on each group. |
| SeriesGroupBy.pipe(func, *args, **kwargs) | Apply a func with arguments to this GroupBy object and return its result. |
| DataFrameGroupBy.pipe(func, *args, **kwargs) | Apply a func with arguments to this GroupBy object and return its result. |
| DataFrameGroupBy.filter(func[, dropna]) | Filter elements from groups that don't satisfy a criterion. |
| SeriesGroupBy.filter(func[, dropna]) | Filter elements from groups that don't satisfy a criterion. |
| Function | Description |
|---|---|
| DataFrameGroupBy.all([skipna]) | Return True if all values in the group are truthful, else False. |
| DataFrameGroupBy.any([skipna]) | Return True if any value in the group is truthful, else False. |
| DataFrameGroupBy.bfill([limit]) | Backward fill the values. |
| DataFrameGroupBy.corr([method, min_periods, ...]) | Compute pairwise correlation of columns, excluding NA/null values. |
| DataFrameGroupBy.corrwith(other[, drop, ...]) | (DEPRECATED) Compute pairwise correlation. |
| DataFrameGroupBy.count() | Compute count of group, excluding missing values. |
| DataFrameGroupBy.cov([min_periods, ddof, ...]) | Compute pairwise covariance of columns, excluding NA/null values. |
| DataFrameGroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| DataFrameGroupBy.cummax([numeric_only]) | Cumulative max for each group. |
| DataFrameGroupBy.cummin([numeric_only]) | Cumulative min for each group. |
| DataFrameGroupBy.cumprod([numeric_only]) | Cumulative product for each group. |
| DataFrameGroupBy.cumsum([numeric_only]) | Cumulative sum for each group. |
| DataFrameGroupBy.describe([percentiles, ...]) | Generate descriptive statistics. |
| DataFrameGroupBy.diff([periods]) | First discrete difference of element. |
| DataFrameGroupBy.ewm([com, span, halflife, ...]) | Return an ewm grouper, providing ewm functionality per group. |
| DataFrameGroupBy.expanding([min_periods, method]) | Return an expanding grouper, providing expanding functionality per group. |
| DataFrameGroupBy.ffill([limit]) | Forward fill the values. |
| DataFrameGroupBy.first([numeric_only, ...]) | Compute the first entry of each column within each group. |
| DataFrameGroupBy.head([n]) | Return first n rows of each group. |
| DataFrameGroupBy.idxmax([skipna, numeric_only]) | Return index of first occurrence of maximum in each group. |
| DataFrameGroupBy.idxmin([skipna, numeric_only]) | Return index of first occurrence of minimum in each group. |
| DataFrameGroupBy.last([numeric_only, ...]) | Compute the last entry of each column within each group. |
| DataFrameGroupBy.max([numeric_only, ...]) | Compute max of group values. |
| DataFrameGroupBy.mean([numeric_only, ...]) | Compute mean of groups, excluding missing values. |
| DataFrameGroupBy.median([numeric_only, skipna]) | Compute median of groups, excluding missing values. |
| DataFrameGroupBy.min([numeric_only, ...]) | Compute min of group values. |
| DataFrameGroupBy.ngroup([ascending]) | Number each group from 0 to the number of groups - 1. |
| DataFrameGroupBy.nth | Take the nth row from each group if n is an int, otherwise a subset of rows. |
| DataFrameGroupBy.nunique([dropna]) | Return DataFrame with counts of unique elements in each position. |
| DataFrameGroupBy.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| DataFrameGroupBy.pct_change([periods, ...]) | Calculate pct_change of each value to previous entry in group. |
| DataFrameGroupBy.prod([numeric_only, ...]) | Compute prod of group values. |
| DataFrameGroupBy.quantile([q, ...]) | Return group values at the given quantile, a la numpy.percentile. |
| DataFrameGroupBy.rank([method, ascending, ...]) | Provide the rank of values within each group. |
| DataFrameGroupBy.resample(rule, *args[, ...]) | Provide resampling when using a TimeGrouper. |
| DataFrameGroupBy.rolling(window[, ...]) | Return a rolling grouper, providing rolling functionality per group. |
| DataFrameGroupBy.sample([n, frac, replace, ...]) | Return a random sample of items from each group. |
| DataFrameGroupBy.sem([ddof, numeric_only, ...]) | Compute standard error of the mean of groups, excluding missing values. |
| DataFrameGroupBy.shift([periods, freq, ...]) | Shift each group by periods observations. |
| DataFrameGroupBy.size() | Compute group sizes. |
| DataFrameGroupBy.skew([skipna, numeric_only]) | Return unbiased skew within groups. |
| DataFrameGroupBy.kurt([skipna, numeric_only]) | Return unbiased kurtosis within groups. |
| DataFrameGroupBy.std([ddof, engine, ...]) | Compute standard deviation of groups, excluding missing values. |
| DataFrameGroupBy.sum([numeric_only, ...]) | Compute sum of group values. |
| DataFrameGroupBy.var([ddof, engine, ...]) | Compute variance of groups, excluding missing values. |
| DataFrameGroupBy.tail([n]) | Return last n rows of each group. |
| DataFrameGroupBy.take(indices, **kwargs) | Return the elements in the given positional indices in each group. |
| DataFrameGroupBy.value_counts([subset, ...]) | Return a Series or DataFrame containing counts of unique rows. |
| Function | Description |
|---|---|
| SeriesGroupBy.all([skipna]) | Return True if all values in the group are truthful, else False. |
| SeriesGroupBy.any([skipna]) | Return True if any value in the group is truthful, else False. |
| SeriesGroupBy.bfill([limit]) | Backward fill the values. |
| SeriesGroupBy.corr(other[, method, min_periods]) | Compute correlation between each group and another Series. |
| SeriesGroupBy.count() | Compute count of group, excluding missing values. |
| SeriesGroupBy.cov(other[, min_periods, ddof]) | Compute covariance between each group and another Series. |
| SeriesGroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| SeriesGroupBy.cummax([numeric_only]) | Cumulative max for each group. |
| SeriesGroupBy.cummin([numeric_only]) | Cumulative min for each group. |
| SeriesGroupBy.cumprod([numeric_only]) | Cumulative product for each group. |
| SeriesGroupBy.cumsum([numeric_only]) | Cumulative sum for each group. |
| SeriesGroupBy.describe([percentiles, ...]) | Generate descriptive statistics. |
| SeriesGroupBy.diff([periods]) | First discrete difference of element. |
| SeriesGroupBy.ewm([com, span, halflife, ...]) | Return an ewm grouper, providing ewm functionality per group. |
| SeriesGroupBy.expanding([min_periods, method]) | Return an expanding grouper, providing expanding functionality per group. |
| SeriesGroupBy.ffill([limit]) | Forward fill the values. |
| SeriesGroupBy.first([numeric_only, ...]) | Compute the first entry of each column within each group. |
| SeriesGroupBy.head([n]) | Return first n rows of each group. |
| SeriesGroupBy.last([numeric_only, ...]) | Compute the last entry of each column within each group. |
| SeriesGroupBy.idxmax([skipna]) | Return the row label of the maximum value. |
| SeriesGroupBy.idxmin([skipna]) | Return the row label of the minimum value. |
| SeriesGroupBy.is_monotonic_increasing | Return whether each group's values are monotonically increasing. |
| SeriesGroupBy.is_monotonic_decreasing | Return whether each group's values are monotonically decreasing. |
| SeriesGroupBy.max([numeric_only, min_count, ...]) | Compute max of group values. |
| SeriesGroupBy.mean([numeric_only, skipna, ...]) | Compute mean of groups, excluding missing values. |
| SeriesGroupBy.median([numeric_only, skipna]) | Compute median of groups, excluding missing values. |
| SeriesGroupBy.min([numeric_only, min_count, ...]) | Compute min of group values. |
| SeriesGroupBy.ngroup([ascending]) | Number each group from 0 to the number of groups - 1. |
| SeriesGroupBy.nlargest([n, keep]) | Return the largest n elements. |
| SeriesGroupBy.nsmallest([n, keep]) | Return the smallest n elements. |
| SeriesGroupBy.nth | Take the nth row from each group if n is an int, otherwise a subset of rows. |
| SeriesGroupBy.nunique([dropna]) | Return number of unique elements in the group. |
| SeriesGroupBy.unique() | Return unique values for each group. |
| SeriesGroupBy.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| SeriesGroupBy.pct_change([periods, ...]) | Calculate pct_change of each value to previous entry in group. |
| SeriesGroupBy.prod([numeric_only, ...]) | Compute prod of group values. |
| SeriesGroupBy.quantile([q, interpolation, ...]) | Return group values at the given quantile, a la numpy.percentile. |
| SeriesGroupBy.rank([method, ascending, ...]) | Provide the rank of values within each group. |
| SeriesGroupBy.resample(rule, *args[, ...]) | Provide resampling when using a TimeGrouper. |
| SeriesGroupBy.rolling(window[, min_periods, ...]) | Return a rolling grouper, providing rolling functionality per group. |
| SeriesGroupBy.sample([n, frac, replace, ...]) | Return a random sample of items from each group. |
| SeriesGroupBy.sem([ddof, numeric_only, skipna]) | Compute standard error of the mean of groups, excluding missing values. |
| SeriesGroupBy.shift([periods, freq, ...]) | Shift each group by periods observations. |
| SeriesGroupBy.size() | Compute group sizes. |
| SeriesGroupBy.skew([skipna, numeric_only]) | Return unbiased skew within groups. |
| SeriesGroupBy.kurt([skipna, numeric_only]) | Return unbiased kurtosis within groups. |
| SeriesGroupBy.std([ddof, engine, ...]) | Compute standard deviation of groups, excluding missing values. |
| SeriesGroupBy.sum([numeric_only, min_count, ...]) | Compute sum of group values. |
| SeriesGroupBy.var([ddof, engine, ...]) | Compute variance of groups, excluding missing values. |
| SeriesGroupBy.tail([n]) | Return last n rows of each group. |
| SeriesGroupBy.take(indices, **kwargs) | Return the elements in the given positional indices in each group. |
| SeriesGroupBy.value_counts([normalize, ...]) | Return a Series or DataFrame containing counts of unique rows. |
| Function | Description |
|---|---|
| DataFrameGroupBy.boxplot([subplots, column, ...]) | Make box plots from DataFrameGroupBy data. |
| DataFrameGroupBy.hist([column, by, grid, ...]) | Make a histogram of the DataFrame's columns. |
| SeriesGroupBy.hist([by, ax, grid, ...]) | Draw histogram for each group's values using Series.hist() API. |
| DataFrameGroupBy.plot | Make plots of groups from a DataFrame. |
| SeriesGroupBy.plot | Make plots of groups from a Series. |
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> for x, y in ser.groupby(level=0):
... print(f"{x}\n{y}\n")
a
a 1
a 2
dtype: int64
b
b 3
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> for x, y in df.groupby(by=["a"]):
... print(f"{x}\n{y}\n")
(1,)
a b c
0 1 2 3
1 1 5 6
(7,)
a b c
2 7 8 9
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> for x, y in ser.resample("MS"):
... print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01 1
2023-01-15 2
dtype: int64
2023-02-01 00:00:00
2023-02-01 3
2023-02-15 4
dtype: int64
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
Timestamp('2023-02-01 00:00:00'): np.int64(4)}
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a 1
a 2
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).get_group((1,))
a b c
owl 1 2 3
toucan 1 5 6
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01 1
2023-01-15 2
dtype: int64
df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')
>>> df = pd.DataFrame(
... {
... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
... "Speed": [100, 5, 200, 300, 15],
... }
... )
>>> df
Animal Speed
0 Falcon 100
1 Parrot 5
2 Falcon 200
3 Falcon 300
4 Parrot 15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
Speed
Animal
Falcon 200.0
Parrot 10.0
Specify a resample operation on the column ‘Publish date’
>>> df = pd.DataFrame(
... {
... "Publish date": [
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-09"),
... pd.Timestamp("2000-01-16"),
... ],
... "ID": [0, 1, 2, 3],
... "Price": [10, 20, 30, 40],
... }
... )
>>> df
Publish date ID Price
0 2000-01-02 0 10
1 2000-01-02 1 20
2 2000-01-09 2 30
3 2000-01-16 3 40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
ID Price
Publish date
2000-01-02 0.5 15.0
2000-01-09 2.0 30.0
2000-01-16 3.0 40.0
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00"
>>> rng = pd.date_range(start, end, freq="7min")
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min")).sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", origin="2000-01-01")).sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17min, dtype: int64
If you want to adjust the start of the bins with an offset Timedelta, the following two lines are equivalent:
>>> ts.groupby(pd.Grouper(freq="17min", origin="start")).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17min, dtype: int64
>>> ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17min, dtype: int64
To replace the use of the deprecated base argument, you can now use offset; in this example it is equivalent to base=2:
>>> ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17min, dtype: int64
>>> df = pd.DataFrame({"key": [1, 1, 2], "a": [-1, 0, 1], 1: [10, 11, 12]})
>>> agg_a = pd.NamedAgg(column="a", aggfunc="min")
>>> agg_1 = pd.NamedAgg(column=1, aggfunc=lambda x: np.mean(x))
>>> df.groupby("key").agg(result_a=agg_a, result_1=agg_1)
result_a result_1
key
1 -1 10.5
2 1 12.0
>>> def n_between(ser, low, high, **kwargs):
... return ser.between(low, high, **kwargs).sum()
>>> agg_between = pd.NamedAgg("a", n_between, 0, 1)
>>> df.groupby("key").agg(count_between=agg_between)
count_between
key
1 1
2 1
>>> agg_between_kw = pd.NamedAgg("a", n_between, 0, 1, inclusive="both")
>>> df.groupby("key").agg(count_between_kw=agg_between_kw)
count_between_kw
key
1 1
2 1
>>> s = pd.Series([0, 1, 2], index="a a b".split())
>>> g1 = s.groupby(s.index, group_keys=False)
>>> g2 = s.groupby(s.index, group_keys=True)
From s above we can see that there are two groups, a and b. Notice that g1 and g2 both contain these two groups and differ only in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: The function passed to apply takes a Series as its argument and returns a Series. apply combines the result for each group together into a new Series.
The resulting dtype will reflect the return value of the passed func.
>>> g1.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a 0.0
a 2.0
b 1.0
dtype: float64
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2.apply(lambda x: x * 2 if x.name == "a" else x / 2)
a a 0.0
a 2.0
b b 1.0
dtype: float64
Example 2: The function passed to apply takes a Series as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g1.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.
>>> g2.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
>>> df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
>>> g1 = df.groupby("A", group_keys=False)
>>> g2 = df.groupby("A", group_keys=True)
Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: The function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:
>>> g1[["B", "C"]].apply(lambda x: x / x.sum())
B C
0 0.333333 0.4
1 0.666667 0.6
2 1.000000 1.0
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2[["B", "C"]].apply(lambda x: x / x.sum())
B C
A
a 0 0.333333 0.4
1 0.666667 0.6
b 2 1.000000 1.0
Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.
The resulting dtype will reflect the return value of the passed func.
>>> g1[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
>>> g2[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.
Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g1.apply(lambda x: x.C.max() - x.B.min())
A
a 5
b 2
dtype: int64
Example 4: The function passed to apply returns None for one of the groups. This group is filtered from the result:
>>> g1.apply(lambda x: None if x.iloc[0, 0] == 3 else x)
B C
0 1 4
1 2 6
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.groupby([1, 1, 2, 2]).min()
1 1
2 3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg("min")
1 1
2 3
dtype: int64
>>> s.groupby([1, 1, 2, 2]).agg(["min", "max"])
min max
1 1 2
2 3 4
The output column names can be controlled by passing the desired column names and aggregations as keyword arguments.
>>> s.groupby([1, 1, 2, 2]).agg(
... minimum="min",
... maximum="max",
... )
minimum maximum
1 1 2
2 3 4
The resulting dtype will reflect the return value of the aggregating function.
>>> s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
1 1.0
2 3.0
dtype: float64
>>> data = {
... "A": [1, 1, 2, 2],
... "B": [1, 2, 3, 4],
... "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
>>> df = pd.DataFrame(data)
>>> df
A B C
0 1 1 0.362838
1 1 2 0.227877
2 2 3 1.267767
3 2 4 -0.562860
The aggregation is for each column.
>>> df.groupby("A").agg("min")
B C
A
1 1 0.227877
2 3 -0.562860
Multiple aggregations
>>> df.groupby("A").agg(["min", "max"])
B C
min max min max
A
1 1 2 0.227877 0.362838
2 3 4 -0.562860 1.267767
Select a column for aggregation
>>> df.groupby("A").B.agg(["min", "max"])
min max
A
1 1 2
2 3 4
User-defined function for aggregation
>>> df.groupby("A").agg(lambda x: sum(x) + 2)
B C
A
1 5 2.590715
2 9 2.704907
Different aggregations per column
>>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
B C
min max sum
A
1 1 2 0.590715
2 3 4 0.704907
To control the output names with different aggregations per column, pandas supports “named aggregation”
>>> df.groupby("A").agg(
... b_min=pd.NamedAgg(column="B", aggfunc="min"),
... c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
... )
b_min c_sum
A
1 1 0.590715
2 3 0.704907
- The keywords are the output column names
- The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
See Named aggregation for more.
The resulting dtype will reflect the return value of the aggregating function.
>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
B
A
1 1.0
2 3.0
>>> ser = pd.Series(
... [390.0, 350.0, 30.0, 20.0],
... index=["Falcon", "Falcon", "Parrot", "Parrot"],
... name="Max Speed",
... )
>>> grouped = ser.groupby([1, 1, 2, 2])
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
Falcon 0.707107
Falcon -0.707107
Parrot 0.707107
Parrot -0.707107
Name: Max Speed, dtype: float64
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min())
Falcon 40.0
Falcon 40.0
Parrot 10.0
Parrot 10.0
Name: Max Speed, dtype: float64
>>> grouped.transform("mean")
Falcon 370.0
Falcon 370.0
Parrot 25.0
Parrot 25.0
Name: Max Speed, dtype: float64
The resulting dtype will reflect the return value of the passed func, for example:
>>> grouped.transform(lambda x: x.astype(int).max())
Falcon 390
Falcon 390
Parrot 30
Parrot 30
Name: Max Speed, dtype: int64
>>> df = pd.DataFrame(
... {
... "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
... "B": ["one", "one", "two", "three", "two", "two"],
... "C": [1, 5, 5, 2, 5, 5],
... "D": [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
... }
... )
>>> grouped = df.groupby("A")[["C", "D"]]
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
C D
0 -1.154701 -0.577350
1 0.577350 0.000000
2 0.577350 1.154701
3 -1.154701 -1.000000
4 0.577350 -0.577350
5 0.577350 1.000000
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min())
C D
0 4.0 6.0
1 3.0 8.0
2 4.0 6.0
3 3.0 8.0
4 4.0 6.0
5 3.0 8.0
>>> grouped.transform("mean")
C D
0 3.666667 4.0
1 4.000000 5.0
2 3.666667 4.0
3 4.000000 5.0
4 3.666667 4.0
5 4.000000 5.0
The resulting dtype will reflect the return value of the passed func, for example:
>>> grouped.transform(lambda x: x.astype(int).max())
C D
0 5 8
1 5 9
2 5 8
3 5 9
4 5 8
5 5 9
>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
A B
0 a 1
1 b 2
2 a 3
3 b 4
To get the difference between each group's maximum and minimum value in one pass, you can do
>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
B
A
a 2
b 2
>>> df = pd.DataFrame({"A": "a b a b".split(), "B": [1, 2, 3, 4]})
>>> df
A B
0 a 1
1 b 2
2 a 3
3 b 4
To get the difference between each groups maximum and minimum value in one pass, you can do
>>> df.groupby("A").pipe(lambda x: x.max() - x.min())
B
A
a 2
b 2
pandas.api.typing.Rolling instances are returned by .rolling calls: pandas.DataFrame.rolling() and pandas.Series.rolling(). pandas.api.typing.Expanding instances are returned by .expanding calls: pandas.DataFrame.expanding() and pandas.Series.expanding(). pandas.api.typing.ExponentialMovingWindow instances are returned by .ewm calls: pandas.DataFrame.ewm() and pandas.Series.ewm().
| Function | Description |
|---|---|
| Rolling.count([numeric_only]) | Calculate the rolling count of non NaN observations. |
| Rolling.sum([numeric_only, engine, ...]) | Calculate the rolling sum. |
| Rolling.mean([numeric_only, engine, ...]) | Calculate the rolling mean. |
| Rolling.median([numeric_only, engine, ...]) | Calculate the rolling median. |
| Rolling.var([ddof, numeric_only, engine, ...]) | Calculate the rolling variance. |
| Rolling.std([ddof, numeric_only, engine, ...]) | Calculate the rolling standard deviation. |
| Rolling.min([numeric_only, engine, ...]) | Calculate the rolling minimum. |
| Rolling.max([numeric_only, engine, ...]) | Calculate the rolling maximum. |
| Rolling.first([numeric_only]) | Calculate the rolling First (left-most) element of the window. |
| Rolling.last([numeric_only]) | Calculate the rolling Last (right-most) element of the window. |
| Rolling.corr([other, pairwise, ddof, ...]) | Calculate the rolling correlation. |
| Rolling.cov([other, pairwise, ddof, ...]) | Calculate the rolling sample covariance. |
| Rolling.skew([numeric_only]) | Calculate the rolling unbiased skewness. |
| Rolling.kurt([numeric_only]) | Calculate the rolling Fisher's definition of kurtosis without bias. |
| Rolling.apply(func[, raw, engine, ...]) | Calculate the rolling custom aggregation function. |
| Rolling.pipe(func, *args, **kwargs) | Apply a func with arguments to this Rolling object and return its result. |
| Rolling.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Rolling.quantile(q[, interpolation, ...]) | Calculate the rolling quantile. |
| Rolling.sem([ddof, numeric_only]) | Calculate the rolling standard error of mean. |
| Rolling.rank([method, ascending, pct, ...]) | Calculate the rolling rank. |
| Rolling.nunique([numeric_only]) | Calculate the rolling nunique. |
| Function | Description |
|---|---|
| Window.mean([numeric_only]) | Calculate the rolling weighted window mean. |
| Window.sum([numeric_only]) | Calculate the rolling weighted window sum. |
| Window.var([ddof, numeric_only]) | Calculate the rolling weighted window variance. |
| Window.std([ddof, numeric_only]) | Calculate the rolling weighted window standard deviation. |
| Function | Description |
|---|---|
| Expanding.count([numeric_only]) | Calculate the expanding count of non NaN observations. |
| Expanding.sum([numeric_only, engine, ...]) | Calculate the expanding sum. |
| Expanding.mean([numeric_only, engine, ...]) | Calculate the expanding mean. |
| Expanding.median([numeric_only, engine, ...]) | Calculate the expanding median. |
| Expanding.var([ddof, numeric_only, engine, ...]) | Calculate the expanding variance. |
| Expanding.std([ddof, numeric_only, engine, ...]) | Calculate the expanding standard deviation. |
| Expanding.min([numeric_only, engine, ...]) | Calculate the expanding minimum. |
| Expanding.max([numeric_only, engine, ...]) | Calculate the expanding maximum. |
| Expanding.first([numeric_only]) | Calculate the expanding First (left-most) element of the window. |
| Expanding.last([numeric_only]) | Calculate the expanding Last (right-most) element of the window. |
| Expanding.corr([other, pairwise, ddof, ...]) | Calculate the expanding correlation. |
| Expanding.cov([other, pairwise, ddof, ...]) | Calculate the expanding sample covariance. |
| Expanding.skew([numeric_only]) | Calculate the expanding unbiased skewness. |
| Expanding.kurt([numeric_only]) | Calculate the expanding Fisher's definition of kurtosis without bias. |
| Expanding.apply(func[, raw, engine, ...]) | Calculate the expanding custom aggregation function. |
| Expanding.pipe(func, *args, **kwargs) | Apply a func with arguments to this Expanding object and return its result. |
| Expanding.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Expanding.quantile(q[, interpolation, ...]) | Calculate the expanding quantile. |
| Expanding.sem([ddof, numeric_only]) | Calculate the expanding standard error of mean. |
| Expanding.rank([method, ascending, pct, ...]) | Calculate the expanding rank. |
| Expanding.nunique([numeric_only]) | Calculate the expanding nunique. |
| Function | Description |
|---|---|
| ExponentialMovingWindow.mean([numeric_only, ...]) | Calculate the ewm (exponential weighted moment) mean. |
| ExponentialMovingWindow.sum([numeric_only, ...]) | Calculate the ewm (exponential weighted moment) sum. |
| ExponentialMovingWindow.std([bias, numeric_only]) | Calculate the ewm (exponential weighted moment) standard deviation. |
| ExponentialMovingWindow.var([bias, numeric_only]) | Calculate the ewm (exponential weighted moment) variance. |
| ExponentialMovingWindow.corr([other, ...]) | Calculate the ewm (exponential weighted moment) sample correlation. |
| ExponentialMovingWindow.cov([other, ...]) | Calculate the ewm (exponential weighted moment) sample covariance. |
Base classes for defining custom window boundaries:
| Function | Description |
|---|---|
| api.indexers.BaseIndexer([index_array, ...]) | Base class for window bounds calculations. |
| api.indexers.FixedForwardWindowIndexer([...]) | Creates window boundaries for fixed-length windows that include the current row. |
| api.indexers.VariableOffsetWindowIndexer([...]) | Calculate window boundaries based on a non-fixed offset such as a BusinessDay. |
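A minimal sketch of a forward-looking window built with FixedForwardWindowIndexer (the frame below is an assumed example): each window covers the current row and the next one.
>>> df = pd.DataFrame({"values": range(5)})  # illustrative data
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
   values
0     1.0
1     3.0
2     5.0
3     7.0
4     4.0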
>>> s = pd.Series([2, 3, np.nan, 10])
>>> s.rolling(2).count()
0 NaN
1 2.0
2 1.0
3 1.0
dtype: float64
>>> s.rolling(3).count()
0 NaN
1 NaN
2 2.0
3 2.0
dtype: float64
>>> s.rolling(4).count()
0 NaN
1 NaN
2 NaN
3 3.0
dtype: float64
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s
0 1
1 2
2 3
3 4
4 5
dtype: int64
>>> s.rolling(3).sum()
0 NaN
1 NaN
2 6.0
3 9.0
4 12.0
dtype: float64
>>> s.rolling(3, center=True).sum()
0 NaN
1 6.0
2 9.0
3 12.0
4 NaN
dtype: float64
For DataFrame, each sum is computed column-wise.
>>> df = pd.DataFrame({"A": s, "B": s**2})
>>> df
A B
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25
>>> df.rolling(3).sum()
A B
0 NaN NaN
1 NaN NaN
2 6.0 14.0
3 9.0 29.0
4 12.0 50.0
The below examples will show rolling mean calculations with window sizes of two and three, respectively.
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).mean()
0 NaN
1 1.5
2 2.5
3 3.5
dtype: float64
>>> s.rolling(3).mean()
0 NaN
1 NaN
2 2.0
3 3.0
dtype: float64
Compute the rolling median of a series with a window size of 3.
>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.rolling(3).median()
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
dtype: float64
>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).var()
0 NaN
1 NaN
2 0.333333
3 1.000000
4 1.000000
5 1.333333
6 0.000000
dtype: float64
>>> s = pd.Series([5, 5, 6, 7, 5, 5, 5])
>>> s.rolling(3).std()
0 NaN
1 NaN
2 0.577350
3 1.000000
4 1.000000
5 1.154701
6 0.000000
dtype: float64
Performing a rolling minimum with a window size of 3.
>>> s = pd.Series([4, 3, 5, 2, 6])
>>> s.rolling(3).min()
0 NaN
1 NaN
2 3.0
3 2.0
4 2.0
dtype: float64
>>> ser = pd.Series([1, 2, 3, 4])
>>> ser.rolling(2).max()
0 NaN
1 2.0
2 3.0
3 4.0
dtype: float64
The example below will show a rolling calculation with a window size of three.
>>> s = pd.Series(range(5))
>>> s.rolling(3).first()
0 NaN
1 NaN
2 0.0
3 1.0
4 2.0
dtype: float64
The example below will show a rolling calculation with a window size of three.
>>> s = pd.Series(range(5))
>>> s.rolling(3).last()
0 NaN
1 NaN
2 2.0
3 3.0
4 4.0
dtype: float64
The below example shows a rolling calculation with a window size of four matching the equivalent function call using numpy.corrcoef().
>>> v1 = [3, 3, 3, 5, 8]
>>> v2 = [3, 4, 4, 4, 8]
>>> np.corrcoef(v1[:-1], v2[:-1])
array([[1. , 0.33333333],
[0.33333333, 1. ]])
>>> np.corrcoef(v1[1:], v2[1:])
array([[1. , 0.9169493],
[0.9169493, 1. ]])
>>> s1 = pd.Series(v1)
>>> s2 = pd.Series(v2)
>>> s1.rolling(4).corr(s2)
0 NaN
1 NaN
2 NaN
3 0.333333
4 0.916949
dtype: float64
The below example shows a similar rolling calculation on a DataFrame using the pairwise option.
>>> matrix = np.array(
... [[51.0, 35.0], [49.0, 30.0], [47.0, 32.0], [46.0, 31.0], [50.0, 36.0]]
... )
>>> np.corrcoef(matrix[:-1, 0], matrix[:-1, 1])
array([[1. , 0.6263001],
[0.6263001, 1. ]])
>>> np.corrcoef(matrix[1:, 0], matrix[1:, 1])
array([[1. , 0.55536811],
[0.55536811, 1. ]])
>>> df = pd.DataFrame(matrix, columns=["X", "Y"])
>>> df
X Y
0 51.0 35.0
1 49.0 30.0
2 47.0 32.0
3 46.0 31.0
4 50.0 36.0
>>> df.rolling(4).corr(pairwise=True)
X Y
0 X NaN NaN
Y NaN NaN
1 X NaN NaN
Y NaN NaN
2 X NaN NaN
Y NaN NaN
3 X 1.000000 0.626300
Y 0.626300 1.000000
4 X 1.000000 0.555368
Y 0.555368 1.000000
>>> ser1 = pd.Series([1, 2, 3, 4])
>>> ser2 = pd.Series([1, 4, 5, 8])
>>> ser1.rolling(2).cov(ser2)
0 NaN
1 1.5
2 0.5
3 1.5
dtype: float64
>>> ser = pd.Series([1, 5, 2, 7, 15, 6])
>>> ser.rolling(3).skew().round(6)
0 NaN
1 NaN
2 1.293343
3 -0.585583
4 0.670284
5 1.652317
dtype: float64
The example below will show a rolling calculation with a window size of four matching the equivalent function call using scipy.stats.
>>> arr = [1, 2, 3, 4, 999]
>>> import scipy.stats
>>> print(f"{scipy.stats.kurtosis(arr[:-1], bias=False):.6f}")
-1.200000
>>> print(f"{scipy.stats.kurtosis(arr[1:], bias=False):.6f}")
3.999946
>>> s = pd.Series(arr)
>>> s.rolling(4).kurt()
0 NaN
1 NaN
2 NaN
3 -1.200000
4 3.999946
dtype: float64
>>> ser = pd.Series([1, 6, 5, 4])
>>> ser.rolling(2).apply(lambda s: s.sum() - s.min())
0 NaN
1 6.0
2 6.0
3 5.0
dtype: float64
>>> df = pd.DataFrame(
... {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
A
2012-08-02 1
2012-08-03 2
2012-08-04 3
2012-08-05 4
To get the difference between each rolling 2-day window’s maximum and minimum value in one pass, you can do
>>> df.rolling("2D").pipe(lambda x: x.max() - x.min())
A
2012-08-02 0.0
2012-08-03 1.0
2012-08-04 1.0
2012-08-05 1.0
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
>>> df
A B C
0 1 4 7
1 2 5 8
2 3 6 9
>>> df.rolling(2).sum()
A B C
0 NaN NaN NaN
1 3.0 9.0 15.0
2 5.0 11.0 17.0
>>> df.rolling(2).agg({"A": "sum", "B": "min"})
A B
0 NaN NaN
1 3.0 4.0
2 5.0 5.0
>>> s = pd.Series([1, 2, 3, 4])
>>> s.rolling(2).quantile(0.4, interpolation="lower")
0 NaN
1 1.0
2 2.0
3 3.0
dtype: float64
>>> s.rolling(2).quantile(0.4, interpolation="midpoint")
0 NaN
1 1.5
2 2.5
3 3.5
dtype: float64
>>> s = pd.Series([0, 1, 2, 3])
>>> s.rolling(2, min_periods=1).sem()
0 NaN
1 0.5
2 0.5
3 0.5
dtype: float64
>>> s = pd.Series([1, 4, 2, 3, 5, 3])
>>> s.rolling(3).rank()
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.5
dtype: float64
>>> s.rolling(3).rank(method="max")
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 2.0
dtype: float64
>>> s.rolling(3).rank(method="min")
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.0
dtype: float64
pandas.api.typing.Resampler instances are returned by resample calls: pandas.DataFrame.resample(), pandas.Series.resample().
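For instance (a minimal sketch with an assumed datetime-indexed Series):
>>> ser = pd.Series([1, 2, 3, 4], index=pd.date_range("2023-01-01", periods=4, freq="D"))  # illustrative data
>>> r = ser.resample("2D")  # a Resampler instance
>>> r.sum()
2023-01-01    3
2023-01-03    7
Freq: 2D, dtype: int64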
| Function | Description |
|---|---|
| Resampler.__iter__() | Groupby iterator. |
| Resampler.groups | Dict {group name -> group labels}. |
| Resampler.indices | Dict {group name -> group indices}. |
| Resampler.get_group(name) | Construct DataFrame from group with provided name. |
| Function | Description |
|---|---|
| Resampler.apply([func]) | Aggregate using one or more operations over the specified axis. |
| Resampler.aggregate([func]) | Aggregate using one or more operations over the specified axis. |
| Resampler.transform(arg, *args, **kwargs) | Call function producing a like-indexed Series on each group. |
| Resampler.pipe(func, *args, **kwargs) | Apply a func with arguments to this Resampler object and return its result. |
| Function | Description |
|---|---|
| Resampler.ffill([limit]) | Forward fill the values. |
| Resampler.bfill([limit]) | Backward fill the new missing values in the resampled data. |
| Resampler.nearest([limit]) | Resample by using the nearest value. |
| Resampler.asfreq([fill_value]) | Return the values at the new freq, essentially a reindex. |
| Resampler.interpolate([method, axis, limit, ...]) | Interpolate values between target timestamps according to different methods. |
| Function | Description |
|---|---|
| Resampler.count() | Compute count of group, excluding missing values. |
| Resampler.nunique() | Return number of unique elements in the group. |
| Resampler.first([numeric_only, min_count, ...]) | Compute the first non-null entry of each column. |
| Resampler.last([numeric_only, min_count, skipna]) | Compute the last non-null entry of each column. |
| Resampler.max([numeric_only, min_count]) | Compute max value of group. |
| Resampler.mean([numeric_only]) | Compute mean of groups, excluding missing values. |
| Resampler.median([numeric_only]) | Compute median of groups, excluding missing values. |
| Resampler.min([numeric_only, min_count]) | Compute min value of group. |
| Resampler.ohlc() | Compute open, high, low and close values of a group, excluding missing values. |
| Resampler.prod([numeric_only, min_count]) | Compute prod of group values. |
| Resampler.size() | Compute group sizes. |
| Resampler.sem([ddof, numeric_only]) | Compute standard error of the mean of groups, excluding missing values. |
| Resampler.std([ddof, numeric_only]) | Compute standard deviation of groups, excluding missing values. |
| Resampler.sum([numeric_only, min_count]) | Compute sum of group values. |
| Resampler.var([ddof, numeric_only]) | Compute variance of groups, excluding missing values. |
| Resampler.quantile([q]) | Return value at the given quantile. |
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> for x, y in ser.groupby(level=0):
... print(f"{x}\n{y}\n")
a
a 1
a 2
dtype: int64
b
b 3
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> for x, y in df.groupby(by=["a"]):
... print(f"{x}\n{y}\n")
(1,)
a b c
0 1 2 3
1 1 5 6
(7,)
a b c
2 7 8 9
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> for x, y in ser.resample("MS"):
... print(f"{x}\n{y}\n")
2023-01-01 00:00:00
2023-01-01 1
2023-01-15 2
dtype: int64
2023-02-01 00:00:00
2023-02-01 3
2023-02-15 4
dtype: int64
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).groups
{'a': ['a', 'a'], 'b': ['b']}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"])
>>> df
a b c
0 1 2 3
1 1 5 6
2 7 8 9
>>> df.groupby(by="a").groups
{1: [0, 1], 7: [2]}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").groups
{Timestamp('2023-01-01 00:00:00'): np.int64(2),
Timestamp('2023-02-01 00:00:00'): np.int64(4)}
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).indices
{'a': array([0, 1]), 'b': array([2])}
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).indices
{np.int64(1): array([0, 1]), np.int64(7): array([2])}
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").indices
defaultdict(<class 'list'>, {Timestamp('2023-01-01 00:00:00'): [0, 1],
Timestamp('2023-02-01 00:00:00'): [2, 3]})
For SeriesGroupBy:
>>> lst = ["a", "a", "b"]
>>> ser = pd.Series([1, 2, 3], index=lst)
>>> ser
a 1
a 2
b 3
dtype: int64
>>> ser.groupby(level=0).get_group("a")
a 1
a 2
dtype: int64
For DataFrameGroupBy:
>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = pd.DataFrame(
... data, columns=["a", "b", "c"], index=["owl", "toucan", "eagle"]
... )
>>> df
a b c
owl 1 2 3
toucan 1 5 6
eagle 7 8 9
>>> df.groupby(by=["a"]).get_group((1,))
a b c
owl 1 2 3
toucan 1 5 6
For Resampler:
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").get_group("2023-01-01")
2023-01-01 1
2023-01-15 2
dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4, 5], index=pd.date_range("20130101", periods=5, freq="s")
... )
>>> s
2013-01-01 00:00:00 1
2013-01-01 00:00:01 2
2013-01-01 00:00:02 3
2013-01-01 00:00:03 4
2013-01-01 00:00:04 5
Freq: s, dtype: int64
>>> r = s.resample("2s")
>>> r.agg("sum")
2013-01-01 00:00:00 3
2013-01-01 00:00:02 7
2013-01-01 00:00:04 5
Freq: 2s, dtype: int64
>>> r.agg(["sum", "mean", "max"])
sum mean max
2013-01-01 00:00:00 3 1.5 2
2013-01-01 00:00:02 7 3.5 4
2013-01-01 00:00:04 5 5.0 5
>>> r.agg({"result": lambda x: x.mean() / x.std(), "total": "sum"})
result total
2013-01-01 00:00:00 2.121320 3
2013-01-01 00:00:02 4.949747 7
2013-01-01 00:00:04 NaN 5
>>> r.agg(average="mean", total="sum")
average total
2013-01-01 00:00:00 1.5 3
2013-01-01 00:00:02 3.5 7
2013-01-01 00:00:04 5.0 5
>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
Freq: h, dtype: int64
>>> resampled = s.resample("15min")
>>> resampled.transform(lambda x: (x - x.mean()) / x.std())
2018-01-01 00:00:00 NaN
2018-01-01 01:00:00 NaN
Freq: h, dtype: float64
>>> df = pd.DataFrame(
... {"A": [1, 2, 3, 4]}, index=pd.date_range("2012-08-02", periods=4)
... )
>>> df
A
2012-08-02 1
2012-08-03 2
2012-08-04 3
2012-08-05 4
To get the difference between each 2-day period’s maximum and minimum value in one pass, you can do
>>> df.resample("2D").pipe(lambda x: x.max() - x.min())
A
2012-08-02 1
2012-08-04 1
Here we only create a Series.
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
Example for ffill with downsampling (we have fewer dates after resampling):
>>> ser.resample("MS").ffill()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
Example for ffill with upsampling (fill the new dates with the previous value):
>>> ser.resample("W").ffill()
2023-01-01 1
2023-01-08 1
2023-01-15 2
2023-01-22 2
2023-01-29 2
2023-02-05 3
2023-02-12 3
2023-02-19 4
Freq: W-SUN, dtype: int64
With upsampling and limiting (only fill the first new date with the previous value):
>>> ser.resample("W").ffill(limit=1)
2023-01-01 1.0
2023-01-08 1.0
2023-01-15 2.0
2023-01-22 2.0
2023-01-29 NaN
2023-02-05 3.0
2023-02-12 NaN
2023-02-19 4.0
Freq: W-SUN, dtype: float64
Resampling a Series:
>>> s = pd.Series(
... [1, 2, 3], index=pd.date_range("20180101", periods=3, freq="h")
... )
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
2018-01-01 02:00:00 3
Freq: h, dtype: int64
>>> s.resample("30min").bfill()
2018-01-01 00:00:00 1
2018-01-01 00:30:00 2
2018-01-01 01:00:00 2
2018-01-01 01:30:00 3
2018-01-01 02:00:00 3
Freq: 30min, dtype: int64
>>> s.resample("15min").bfill(limit=2)
2018-01-01 00:00:00 1.0
2018-01-01 00:15:00 NaN
2018-01-01 00:30:00 2.0
2018-01-01 00:45:00 2.0
2018-01-01 01:00:00 2.0
2018-01-01 01:15:00 NaN
2018-01-01 01:30:00 3.0
2018-01-01 01:45:00 3.0
2018-01-01 02:00:00 3.0
Freq: 15min, dtype: float64
Resampling a DataFrame that has missing values:
>>> df = pd.DataFrame(
... {"a": [2, np.nan, 6], "b": [1, 3, 5]},
... index=pd.date_range("20180101", periods=3, freq="h"),
... )
>>> df
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 01:00:00 NaN 3
2018-01-01 02:00:00 6.0 5
>>> df.resample("30min").bfill()
a b
2018-01-01 00:00:00 2.0 1
2018-01-01 00:30:00 NaN 3
2018-01-01 01:00:00 NaN 3
2018-01-01 01:30:00 6.0 5
2018-01-01 02:00:00 6.0 5
>>> df.resample("15min").bfill(limit=2)
a b
2018-01-01 00:00:00 2.0 1.0
2018-01-01 00:15:00 NaN NaN
2018-01-01 00:30:00 NaN 3.0
2018-01-01 00:45:00 NaN 3.0
2018-01-01 01:00:00 NaN 3.0
2018-01-01 01:15:00 NaN NaN
2018-01-01 01:30:00 6.0 5.0
2018-01-01 01:45:00 6.0 5.0
2018-01-01 02:00:00 6.0 5.0
>>> s = pd.Series([1, 2], index=pd.date_range("20180101", periods=2, freq="1h"))
>>> s
2018-01-01 00:00:00 1
2018-01-01 01:00:00 2
Freq: h, dtype: int64
>>> s.resample("15min").nearest()
2018-01-01 00:00:00 1
2018-01-01 00:15:00 1
2018-01-01 00:30:00 2
2018-01-01 00:45:00 2
2018-01-01 01:00:00 2
Freq: 15min, dtype: int64
Limit the number of upsampled values imputed by the nearest:
>>> s.resample("15min").nearest(limit=1)
2018-01-01 00:00:00 1.0
2018-01-01 00:15:00 1.0
2018-01-01 00:30:00 NaN
2018-01-01 00:45:00 2.0
2018-01-01 01:00:00 2.0
Freq: 15min, dtype: float64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-31", "2023-02-01", "2023-02-28"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-31 2
2023-02-01 3
2023-02-28 4
dtype: int64
>>> ser.resample("MS").asfreq()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
>>> start = "2023-03-01T07:00:00"
>>> timesteps = pd.date_range(start, periods=5, freq="s")
>>> series = pd.Series(data=[1, -1, 2, 1, 3], index=timesteps)
>>> series
2023-03-01 07:00:00 1
2023-03-01 07:00:01 -1
2023-03-01 07:00:02 2
2023-03-01 07:00:03 1
2023-03-01 07:00:04 3
Freq: s, dtype: int64
Downsample the series to 0.5 Hz by providing a period of 2 s.
>>> series.resample("2s").interpolate("linear")
2023-03-01 07:00:00 1
2023-03-01 07:00:02 2
2023-03-01 07:00:04 3
Freq: 2s, dtype: int64
Upsample the series to 2 Hz by providing a period of 500 ms.
>>> series.resample("500ms").interpolate("linear")
2023-03-01 07:00:00.000 1.0
2023-03-01 07:00:00.500 0.0
2023-03-01 07:00:01.000 -1.0
2023-03-01 07:00:01.500 0.5
2023-03-01 07:00:02.000 2.0
2023-03-01 07:00:02.500 1.5
2023-03-01 07:00:03.000 1.0
2023-03-01 07:00:03.500 2.0
2023-03-01 07:00:04.000 3.0
Freq: 500ms, dtype: float64
Internal reindexing with asfreq() prior to interpolation produces an interpolated time series based on the reindexed timestamps (anchors). All available data points from the original series are guaranteed to become anchors, so interpolation also works for resampling cases that lead to non-aligned timestamps, as in the following example:
>>> series.resample("400ms").interpolate("linear")
2023-03-01 07:00:00.000 1.000000
2023-03-01 07:00:00.400 0.333333
2023-03-01 07:00:00.800 -0.333333
2023-03-01 07:00:01.200 0.000000
2023-03-01 07:00:01.600 1.000000
2023-03-01 07:00:02.000 2.000000
2023-03-01 07:00:02.400 1.666667
2023-03-01 07:00:02.800 1.333333
2023-03-01 07:00:03.200 1.666667
2023-03-01 07:00:03.600 2.333333
2023-03-01 07:00:04.000 3.000000
Freq: 400ms, dtype: float64
Note that the series correctly decreases between two anchors 07:00:00 and 07:00:02.
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").count()
2023-01-01 2
2023-02-01 2
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 3],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 3
dtype: int64
>>> ser.resample("MS").nunique()
2023-01-01 2
2023-02-01 1
Freq: MS, dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> s
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> s.resample("MS").first()
2023-01-01 1
2023-02-01 3
Freq: MS, dtype: int64
>>> s = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> s
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> s.resample("MS").last()
2023-01-01 2
2023-02-01 4
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").max()
2023-01-01 2
2023-02-01 4
Freq: MS, dtype: int64
>>> ser = pd.Series(
... [1, 2, 3, 4],
... index=pd.DatetimeIndex(
... ["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]
... ),
... )
>>> ser
2023-01-01 1
2023-01-15 2
2023-02-01 3
2023-02-15 4
dtype: int64
>>> ser.resample("MS").mean()
2023-01-01 1.5
2023-02-01 3.5
Freq: MS, dtype: float64
>>> ser = pd.Series(
... [1, 2, 3, 3, 4, 5],
... index=pd.DatetimeIndex(
... [
... "2023-01-01",
... "2023-01-10",
... "2023-01-15",
... "2023-02-01",
... "2023-02-10",
... "2023-02-15",
... ]
... ),
... )
>>> ser.resample("MS").median()
2023-01-01 2.0
2023-02-01 4.0
Freq: MS, dtype: float64
| Function | Description |
|---|---|
| DateOffset | Standard kind of date increment used for a date range. |
| Function | Description |
|---|---|
| DateOffset.freqstr | Return a string representing the frequency. |
| DateOffset.kwds | Return a dict of extra parameters for the offset. |
| DateOffset.name | Return a string representing the base frequency. |
| DateOffset.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| DateOffset.normalize | Return boolean whether the frequency can align with midnight. |
| DateOffset.rule_code | Return a string representing the base frequency. |
| DateOffset.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| DateOffset.copy() | Return a copy of the frequency. |
| DateOffset.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| DateOffset.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| DateOffset.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| DateOffset.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| DateOffset.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| DateOffset.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| DateOffset.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| DateOffset.rollback(dt) | Roll provided date backward to next offset only if not on offset. |
| DateOffset.rollforward(dt) | Roll provided date forward to next offset only if not on offset. |
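As a brief sketch of these attributes and methods (dates chosen for illustration):
>>> offset = pd.DateOffset(months=1)
>>> pd.Timestamp("2023-01-31") + offset  # clamps to the last day of February
Timestamp('2023-02-28 00:00:00')
>>> offset.kwds
{'months': 1}
>>> offset.n
1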
| Function | Description |
|---|---|
| BusinessDay | DateOffset subclass representing possibly n business days. |
Alias:
| Function | Description |
|---|---|
| BDay | alias of BusinessDay |
| Function | Description |
|---|---|
| BusinessDay.freqstr | Return a string representing the frequency. |
| BusinessDay.kwds | Return a dict of extra parameters for the offset. |
| BusinessDay.name | Return a string representing the base frequency. |
| BusinessDay.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessDay.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessDay.rule_code | Return a string representing the base frequency. |
| BusinessDay.n | Return the count of the number of periods. |
| BusinessDay.weekmask | Return the weekmask used for custom business day calculations. |
| BusinessDay.holidays | Return the holidays used for custom business day calculations. |
| BusinessDay.calendar | Return the calendar used for business day calculations. |
| Function | Description |
|---|---|
| BusinessDay.copy() | Return a copy of the frequency. |
| BusinessDay.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessDay.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessDay.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessDay.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessDay.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessDay.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessDay.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
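A minimal sketch (the dates are illustrative):
>>> pd.Timestamp("2024-01-05") + pd.offsets.BDay(1)  # Friday + 1 business day
Timestamp('2024-01-08 00:00:00')
>>> pd.offsets.BDay().is_on_offset(pd.Timestamp("2024-01-06"))  # a Saturday
False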
| Function | Description |
|---|---|
| BusinessHour | DateOffset subclass representing possibly n business hours. |
| Function | Description |
|---|---|
| BusinessHour.freqstr | Return a string representing the frequency. |
| BusinessHour.kwds | Return a dict of extra parameters for the offset. |
| BusinessHour.name | Return a string representing the base frequency. |
| BusinessHour.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessHour.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessHour.rule_code | Return a string representing the base frequency. |
| BusinessHour.n | Return the count of the number of periods. |
| BusinessHour.start | Return the start time(s) of the business hour. |
| BusinessHour.end | Return the end time(s) of the business hour. |
| BusinessHour.weekmask | Return the weekmask used for custom business day calculations. |
| BusinessHour.holidays | Return the holidays used for custom business day calculations. |
| BusinessHour.calendar | Return the calendar used for business day calculations. |
| Function | Description |
|---|---|
| BusinessHour.copy() | Return a copy of the frequency. |
| BusinessHour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessHour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessHour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessHour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessHour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessHour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessHour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
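A minimal sketch (dates illustrative; default business hours run 09:00-17:00):
>>> pd.Timestamp("2024-01-05 16:30") + pd.offsets.BusinessHour(1)  # Friday afternoon rolls into Monday
Timestamp('2024-01-08 09:30:00')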
| Function | Description |
|---|---|
| CustomBusinessDay | DateOffset subclass representing possibly n custom business days. |
Alias:
| Function | Description |
|---|---|
| CDay | alias of CustomBusinessDay |
| Function | Description |
|---|---|
| CustomBusinessDay.freqstr | Return a string representing the frequency. |
| CustomBusinessDay.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessDay.name | Return a string representing the base frequency. |
| CustomBusinessDay.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessDay.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessDay.rule_code | Return a string representing the base frequency. |
| CustomBusinessDay.n | Return the count of the number of periods. |
| CustomBusinessDay.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessDay.calendar | Return the calendar used for business day calculations. |
| CustomBusinessDay.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessDay.copy() | Return a copy of the frequency. |
| CustomBusinessDay.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessDay.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessDay.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessDay.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessDay.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessDay.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessDay.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
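A minimal sketch with an assumed weekmask and holiday list:
>>> offset = pd.offsets.CustomBusinessDay(weekmask="Mon Tue Wed Thu", holidays=["2024-01-04"])  # illustrative calendar
>>> pd.Timestamp("2024-01-03") + offset  # skips the holiday, the masked Friday, and the weekend
Timestamp('2024-01-08 00:00:00')
>>> offset.weekmask
'Mon Tue Wed Thu'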
| Function | Description |
|---|---|
| CustomBusinessHour | DateOffset subclass representing possibly n custom business hours. |
| Function | Description |
|---|---|
| CustomBusinessHour.freqstr | Return a string representing the frequency. |
| CustomBusinessHour.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessHour.name | Return a string representing the base frequency. |
| CustomBusinessHour.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessHour.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessHour.rule_code | Return a string representing the base frequency. |
| CustomBusinessHour.n | Return the count of the number of periods. |
| CustomBusinessHour.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessHour.calendar | Return the calendar used for business day calculations. |
| CustomBusinessHour.holidays | Return the holidays used for custom business day calculations. |
| CustomBusinessHour.start | Return the start time(s) of the business hour. |
| CustomBusinessHour.end | Return the end time(s) of the business hour. |
| Function | Description |
|---|---|
| CustomBusinessHour.copy() | Return a copy of the frequency. |
| CustomBusinessHour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessHour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessHour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessHour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessHour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessHour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessHour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| MonthEnd | DateOffset of one month end. |
| Function | Description |
|---|---|
| MonthEnd.freqstr | Return a string representing the frequency. |
| MonthEnd.kwds | Return a dict of extra parameters for the offset. |
| MonthEnd.name | Return a string representing the base frequency. |
| MonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| MonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| MonthEnd.rule_code | Return a string representing the base frequency. |
| MonthEnd.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| MonthEnd.copy() | Return a copy of the frequency. |
| MonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| MonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| MonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| MonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| MonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| MonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| MonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
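A minimal sketch (dates illustrative):
>>> pd.Timestamp("2023-02-10") + pd.offsets.MonthEnd(1)
Timestamp('2023-02-28 00:00:00')
>>> pd.offsets.MonthEnd().rollforward(pd.Timestamp("2023-02-28"))  # already on offset, unchanged
Timestamp('2023-02-28 00:00:00')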
| Function | Description |
|---|---|
| MonthBegin | DateOffset of one month at beginning. |
| Function | Description |
|---|---|
| MonthBegin.freqstr | Return a string representing the frequency. |
| MonthBegin.kwds | Return a dict of extra parameters for the offset. |
| MonthBegin.name | Return a string representing the base frequency. |
| MonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| MonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| MonthBegin.rule_code | Return a string representing the base frequency. |
| MonthBegin.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| MonthBegin.copy() | Return a copy of the frequency. |
| MonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| MonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| MonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| MonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| MonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| MonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| MonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BusinessMonthEnd | DateOffset increments between the last business day of the month. |
Alias:
| Function | Description |
|---|---|
| BMonthEnd | alias of BusinessMonthEnd |
| Function | Description |
|---|---|
| BusinessMonthEnd.freqstr | Return a string representing the frequency. |
| BusinessMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| BusinessMonthEnd.name | Return a string representing the base frequency. |
| BusinessMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessMonthEnd.rule_code | Return a string representing the base frequency. |
| BusinessMonthEnd.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| BusinessMonthEnd.copy() | Return a copy of the frequency. |
| BusinessMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BusinessMonthBegin | DateOffset of one month at the first business day. |
Alias:
| Function | Description |
|---|---|
| BMonthBegin | alias of BusinessMonthBegin |
| Function | Description |
|---|---|
| BusinessMonthBegin.freqstr | Return a string representing the frequency. |
| BusinessMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| BusinessMonthBegin.name | Return a string representing the base frequency. |
| BusinessMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BusinessMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BusinessMonthBegin.rule_code | Return a string representing the base frequency. |
| BusinessMonthBegin.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| BusinessMonthBegin.copy() | Return a copy of the frequency. |
| BusinessMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BusinessMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BusinessMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BusinessMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BusinessMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BusinessMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BusinessMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd | DateOffset subclass representing custom business month(s). |
Alias:
| Function | Description |
|---|---|
| CBMonthEnd | alias of CustomBusinessMonthEnd |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd.freqstr | Return a string representing the frequency. |
| CustomBusinessMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessMonthEnd.m_offset | Return a MonthBegin or MonthEnd offset. |
| CustomBusinessMonthEnd.name | Return a string representing the base frequency. |
| CustomBusinessMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessMonthEnd.rule_code | Return a string representing the base frequency. |
| CustomBusinessMonthEnd.n | Return the count of the number of periods. |
| CustomBusinessMonthEnd.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessMonthEnd.calendar | Return the calendar used for business day calculations. |
| CustomBusinessMonthEnd.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessMonthEnd.copy() | Return a copy of the frequency. |
| CustomBusinessMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin | DateOffset subclass representing custom business month(s). |
Alias:
| Function | Description |
|---|---|
| CBMonthBegin | alias of CustomBusinessMonthBegin |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin.freqstr | Return a string representing the frequency. |
| CustomBusinessMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| CustomBusinessMonthBegin.m_offset | Return a MonthBegin or MonthEnd offset. |
| CustomBusinessMonthBegin.name | Return a string representing the base frequency. |
| CustomBusinessMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| CustomBusinessMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| CustomBusinessMonthBegin.rule_code | Return a string representing the base frequency. |
| CustomBusinessMonthBegin.n | Return the count of the number of periods. |
| CustomBusinessMonthBegin.weekmask | Return the weekmask used for custom business day calculations. |
| CustomBusinessMonthBegin.calendar | Return the calendar used for business day calculations. |
| CustomBusinessMonthBegin.holidays | Return the holidays used for custom business day calculations. |
| Function | Description |
|---|---|
| CustomBusinessMonthBegin.copy() | Return a copy of the frequency. |
| CustomBusinessMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| CustomBusinessMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| CustomBusinessMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| CustomBusinessMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| CustomBusinessMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| CustomBusinessMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| CustomBusinessMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| SemiMonthEnd | Two DateOffsets per month, repeating on the last day of the month and on day_of_month. |
| Function | Description |
|---|---|
| SemiMonthEnd.freqstr | Return a string representing the frequency. |
| SemiMonthEnd.kwds | Return a dict of extra parameters for the offset. |
| SemiMonthEnd.name | Return a string representing the base frequency. |
| SemiMonthEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| SemiMonthEnd.normalize | Return boolean whether the frequency can align with midnight. |
| SemiMonthEnd.rule_code | |
| SemiMonthEnd.n | Return the count of the number of periods. |
| SemiMonthEnd.day_of_month | Return the day of the month for the semi-monthly offset. |
| Function | Description |
|---|---|
| SemiMonthEnd.copy() | Return a copy of the frequency. |
| SemiMonthEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| SemiMonthEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| SemiMonthEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| SemiMonthEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| SemiMonthEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| SemiMonthEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| SemiMonthEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
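A minimal sketch (dates illustrative; day_of_month defaults to 15):
>>> pd.Timestamp("2023-01-02") + pd.offsets.SemiMonthEnd(day_of_month=15)
Timestamp('2023-01-15 00:00:00')
>>> pd.offsets.SemiMonthEnd().day_of_month
15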
| Function | Description |
|---|---|
| SemiMonthBegin | Two DateOffsets per month, repeating on the first day of the month and on day_of_month. |
| Function | Description |
|---|---|
| SemiMonthBegin.freqstr | Return a string representing the frequency. |
| SemiMonthBegin.kwds | Return a dict of extra parameters for the offset. |
| SemiMonthBegin.name | Return a string representing the base frequency. |
| SemiMonthBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| SemiMonthBegin.normalize | Return boolean whether the frequency can align with midnight. |
| SemiMonthBegin.rule_code | |
| SemiMonthBegin.n | Return the count of the number of periods. |
| SemiMonthBegin.day_of_month | Return the day of the month for the semi-monthly offset. |
| Function | Description |
|---|---|
| SemiMonthBegin.copy() | Return a copy of the frequency. |
| SemiMonthBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| SemiMonthBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| SemiMonthBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| SemiMonthBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| SemiMonthBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| SemiMonthBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| SemiMonthBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Week | Weekly offset. |
| Function | Description |
|---|---|
| Week.freqstr | Return a string representing the frequency. |
| Week.kwds | Return a dict of extra parameters for the offset. |
| Week.name | Return a string representing the base frequency. |
| Week.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| Week.normalize | Return boolean whether the frequency can align with midnight. |
| Week.rule_code | Return a string representing the base frequency. |
| Week.n | Return the count of the number of periods. |
| Week.weekday | Return the day of the week on which the offset is applied. |
| Function | Description |
|---|---|
| Week.copy() | Return a copy of the frequency. |
| Week.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Week.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Week.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Week.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Week.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Week.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Week.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
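A minimal sketch (dates illustrative; weekday=0 anchors the offset on Mondays):
>>> pd.Timestamp("2023-01-01") + pd.offsets.Week(weekday=0)  # Sunday rolls to the next Monday
Timestamp('2023-01-02 00:00:00')
>>> pd.offsets.Week(weekday=0).weekday
0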
| Function | Description |
|---|---|
| WeekOfMonth | Describes monthly dates like "the Tuesday of the 2nd week of each month". |
| Function | Description |
|---|---|
| WeekOfMonth.freqstr | Return a string representing the frequency. |
| WeekOfMonth.kwds | Return a dict of extra parameters for the offset. |
| WeekOfMonth.name | Return a string representing the base frequency. |
| WeekOfMonth.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| WeekOfMonth.normalize | Return boolean whether the frequency can align with midnight. |
| WeekOfMonth.rule_code | Return a string representing the base frequency. |
| WeekOfMonth.n | Return the count of the number of periods. |
| WeekOfMonth.week | |
| Function | Description |
|---|---|
| WeekOfMonth.copy() | Return a copy of the frequency. |
| WeekOfMonth.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| WeekOfMonth.weekday | |
| WeekOfMonth.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| WeekOfMonth.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| WeekOfMonth.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| WeekOfMonth.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| WeekOfMonth.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| WeekOfMonth.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
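A minimal sketch (dates illustrative; week=1, weekday=1 selects the Tuesday of the second week):
>>> pd.Timestamp("2023-01-01") + pd.offsets.WeekOfMonth(week=1, weekday=1)
Timestamp('2023-01-10 00:00:00')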
| Function | Description |
|---|---|
| LastWeekOfMonth | Describes monthly dates in last week of month. |
| Function | Description |
|---|---|
| LastWeekOfMonth.freqstr | Return a string representing the frequency. |
| LastWeekOfMonth.kwds | Return a dict of extra parameters for the offset. |
| LastWeekOfMonth.name | Return a string representing the base frequency. |
| LastWeekOfMonth.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| LastWeekOfMonth.normalize | Return boolean whether the frequency can align with midnight. |
| LastWeekOfMonth.rule_code | Return a string representing the base frequency. |
| LastWeekOfMonth.n | Return the count of the number of periods. |
| LastWeekOfMonth.weekday | |
| LastWeekOfMonth.week | |
| Function | Description |
|---|---|
| LastWeekOfMonth.copy() | Return a copy of the frequency. |
| LastWeekOfMonth.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| LastWeekOfMonth.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| LastWeekOfMonth.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| LastWeekOfMonth.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| LastWeekOfMonth.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| LastWeekOfMonth.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| LastWeekOfMonth.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BQuarterEnd | DateOffset increments between the last business day of each Quarter. |
| Function | Description |
|---|---|
| BQuarterEnd.freqstr | Return a string representing the frequency. |
| BQuarterEnd.kwds | Return a dict of extra parameters for the offset. |
| BQuarterEnd.name | Return a string representing the base frequency. |
| BQuarterEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BQuarterEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BQuarterEnd.rule_code | Return a string representing the frequency with month suffix. |
| BQuarterEnd.n | Return the count of the number of periods. |
| BQuarterEnd.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| BQuarterEnd.copy() | Return a copy of the frequency. |
| BQuarterEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BQuarterEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BQuarterEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BQuarterEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BQuarterEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BQuarterEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BQuarterEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BQuarterBegin | DateOffset increments between the first business day of each Quarter. |
| Function | Description |
|---|---|
| BQuarterBegin.freqstr | Return a string representing the frequency. |
| BQuarterBegin.kwds | Return a dict of extra parameters for the offset. |
| BQuarterBegin.name | Return a string representing the base frequency. |
| BQuarterBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BQuarterBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BQuarterBegin.rule_code | Return a string representing the frequency with month suffix. |
| BQuarterBegin.n | Return the count of the number of periods. |
| BQuarterBegin.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| BQuarterBegin.copy() | Return a copy of the frequency. |
| BQuarterBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BQuarterBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BQuarterBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BQuarterBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BQuarterBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BQuarterBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BQuarterBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| QuarterEnd | DateOffset increments between Quarter end dates. |
| Function | Description |
|---|---|
| QuarterEnd.freqstr | Return a string representing the frequency. |
| QuarterEnd.kwds | Return a dict of extra parameters for the offset. |
| QuarterEnd.name | Return a string representing the base frequency. |
| QuarterEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| QuarterEnd.normalize | Return boolean whether the frequency can align with midnight. |
| QuarterEnd.rule_code | Return a string representing the frequency with month suffix. |
| QuarterEnd.n | Return the count of the number of periods. |
| QuarterEnd.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| QuarterEnd.copy() | Return a copy of the frequency. |
| QuarterEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| QuarterEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| QuarterEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| QuarterEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| QuarterEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| QuarterEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| QuarterEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
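A quick sketch of QuarterEnd arithmetic; with the defaults, quarters end in March, June, September, and December:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.QuarterEnd()
Timestamp('2022-03-31 00:00:00')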
| Function | Description |
|---|---|
| QuarterBegin | DateOffset increments between Quarter start dates. |
| Function | Description |
|---|---|
| QuarterBegin.freqstr | Return a string representing the frequency. |
| QuarterBegin.kwds | Return a dict of extra parameters for the offset. |
| QuarterBegin.name | Return a string representing the base frequency. |
| QuarterBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| QuarterBegin.normalize | Return boolean whether the frequency can align with midnight. |
| QuarterBegin.rule_code | Return a string representing the frequency with month suffix. |
| QuarterBegin.n | Return the count of the number of periods. |
| QuarterBegin.startingMonth | Return the month of the year from which quarters start. |
| Function | Description |
|---|---|
| QuarterBegin.copy() | Return a copy of the frequency. |
| QuarterBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| QuarterBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| QuarterBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| QuarterBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| QuarterBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| QuarterBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| QuarterBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
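And the QuarterBegin counterpart, again assuming the default anchoring:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.QuarterBegin()
Timestamp('2022-03-01 00:00:00')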
| Function | Description |
|---|---|
| BHalfYearEnd | DateOffset increments between the last business day of each half-year. |
| Function | Description |
|---|---|
| BHalfYearEnd.freqstr | Return a string representing the frequency. |
| BHalfYearEnd.kwds | Return a dict of extra parameters for the offset. |
| BHalfYearEnd.name | Return a string representing the base frequency. |
| BHalfYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BHalfYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BHalfYearEnd.rule_code | Return a string representing the frequency with month suffix. |
| BHalfYearEnd.n | Return the count of the number of periods. |
| BHalfYearEnd.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| BHalfYearEnd.copy() | Return a copy of the frequency. |
| BHalfYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BHalfYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BHalfYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BHalfYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BHalfYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BHalfYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BHalfYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BHalfYearBegin | DateOffset increments between the first business day of each half-year. |
| Function | Description |
|---|---|
| BHalfYearBegin.freqstr | Return a string representing the frequency. |
| BHalfYearBegin.kwds | Return a dict of extra parameters for the offset. |
| BHalfYearBegin.name | Return a string representing the base frequency. |
| BHalfYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BHalfYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BHalfYearBegin.rule_code | Return a string representing the frequency with month suffix. |
| BHalfYearBegin.n | Return the count of the number of periods. |
| BHalfYearBegin.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| BHalfYearBegin.copy() | Return a copy of the frequency. |
| BHalfYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BHalfYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BHalfYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BHalfYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BHalfYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BHalfYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BHalfYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| HalfYearEnd | DateOffset increments between half-year end dates. |
| Function | Description |
|---|---|
| HalfYearEnd.freqstr | Return a string representing the frequency. |
| HalfYearEnd.kwds | Return a dict of extra parameters for the offset. |
| HalfYearEnd.name | Return a string representing the base frequency. |
| HalfYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| HalfYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| HalfYearEnd.rule_code | Return a string representing the frequency with month suffix. |
| HalfYearEnd.n | Return the count of the number of periods. |
| HalfYearEnd.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| HalfYearEnd.copy() | Return a copy of the frequency. |
| HalfYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| HalfYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| HalfYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| HalfYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| HalfYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| HalfYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| HalfYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
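A sketch of half-year arithmetic, assuming the defaults anchor half-year ends to June and December; the half-year offsets are recent additions, so availability depends on the pandas version:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.HalfYearEnd()
Timestamp('2022-06-30 00:00:00')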
| Function | Description |
|---|---|
| HalfYearBegin | DateOffset increments between half-year start dates. |
| Function | Description |
|---|---|
| HalfYearBegin.freqstr | Return a string representing the frequency. |
| HalfYearBegin.kwds | Return a dict of extra parameters for the offset. |
| HalfYearBegin.name | Return a string representing the base frequency. |
| HalfYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| HalfYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| HalfYearBegin.rule_code | Return a string representing the frequency with month suffix. |
| HalfYearBegin.n | Return the count of the number of periods. |
| HalfYearBegin.startingMonth | Return the month of the year from which half-years start. |
| Function | Description |
|---|---|
| HalfYearBegin.copy() | Return a copy of the frequency. |
| HalfYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| HalfYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| HalfYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| HalfYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| HalfYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| HalfYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| HalfYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| BYearEnd | DateOffset increments between the last business day of the year. |
| Function | Description |
|---|---|
| BYearEnd.freqstr | Return a string representing the frequency. |
| BYearEnd.kwds | Return a dict of extra parameters for the offset. |
| BYearEnd.name | Return a string representing the base frequency. |
| BYearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BYearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| BYearEnd.rule_code | Return a string representing the base frequency. |
| BYearEnd.n | Return the count of the number of periods. |
| BYearEnd.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| BYearEnd.copy() | Return a copy of the frequency. |
| BYearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BYearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BYearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BYearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BYearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BYearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BYearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
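For example, BYearEnd targets the last business day of December by default, and month= retargets the anchor month:
>>> ts = pd.Timestamp('2020-05-24 05:01:15')
>>> ts + pd.offsets.BYearEnd()
Timestamp('2020-12-31 05:01:15')
>>> ts + pd.offsets.BYearEnd(month=11)
Timestamp('2020-11-30 05:01:15')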
| Function | Description |
|---|---|
| BYearBegin | DateOffset increments between the first business day of the year. |
| Function | Description |
|---|---|
| BYearBegin.freqstr | Return a string representing the frequency. |
| BYearBegin.kwds | Return a dict of extra parameters for the offset. |
| BYearBegin.name | Return a string representing the base frequency. |
| BYearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| BYearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| BYearBegin.rule_code | Return a string representing the base frequency. |
| BYearBegin.n | Return the count of the number of periods. |
| BYearBegin.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| BYearBegin.copy() | Return a copy of the frequency. |
| BYearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| BYearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| BYearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| BYearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| BYearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| BYearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| BYearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| YearEnd([n, normalize, month]) | DateOffset increments between calendar year end dates. |
| Function | Description |
|---|---|
| YearEnd.freqstr | Return a string representing the frequency. |
| YearEnd.kwds | Return a dict of extra parameters for the offset. |
| YearEnd.name | Return a string representing the base frequency. |
| YearEnd.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| YearEnd.normalize | Return boolean whether the frequency can align with midnight. |
| YearEnd.rule_code | Return a string representing the base frequency. |
| YearEnd.n | Return the count of the number of periods. |
| YearEnd.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| YearEnd.copy() | Return a copy of the frequency. |
| YearEnd.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| YearEnd.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| YearEnd.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| YearEnd.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| YearEnd.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| YearEnd.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| YearEnd.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
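For example, with the default month=12:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.YearEnd()
Timestamp('2022-12-31 00:00:00')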
| Function | Description |
|---|---|
| YearBegin | DateOffset increments between calendar year begin dates. |
| Function | Description |
|---|---|
| YearBegin.freqstr | Return a string representing the frequency. |
| YearBegin.kwds | Return a dict of extra parameters for the offset. |
| YearBegin.name | Return a string representing the base frequency. |
| YearBegin.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| YearBegin.normalize | Return boolean whether the frequency can align with midnight. |
| YearBegin.rule_code | Return a string representing the base frequency. |
| YearBegin.n | Return the count of the number of periods. |
| YearBegin.month | Return the month of the year on which this offset applies. |
| Function | Description |
|---|---|
| YearBegin.copy() | Return a copy of the frequency. |
| YearBegin.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| YearBegin.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| YearBegin.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| YearBegin.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| YearBegin.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| YearBegin.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| YearBegin.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
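A YearBegin sketch; note that a timestamp already on the offset still advances to the next anchor:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.YearBegin()
Timestamp('2023-01-01 00:00:00')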
| Function | Description |
|---|---|
| FY5253 | Describes 52-53 week fiscal year. |
| Function | Description |
|---|---|
| FY5253.freqstr | Return a string representing the frequency. |
| FY5253.kwds | Return a dict of extra parameters for the offset. |
| FY5253.name | Return a string representing the base frequency. |
| FY5253.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| FY5253.normalize | Return boolean whether the frequency can align with midnight. |
| FY5253.rule_code | Return a string representing the base frequency. |
| FY5253.n | Return the count of the number of periods. |
| FY5253.startingMonth | Return the month in which the fiscal year ends. |
| FY5253.variation | Return the year-end convention, "nearest" or "last". |
| FY5253.weekday | Return the weekday used by the fiscal year. |
| Function | Description |
|---|---|
| FY5253.copy() | Return a copy of the frequency. |
| FY5253.get_rule_code_suffix() | Return the suffix component of the rule code. |
| FY5253.get_year_end(dt) | Return the fiscal year-end date for the year containing dt. |
| FY5253.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| FY5253.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| FY5253.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| FY5253.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| FY5253.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| FY5253.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| FY5253.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
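With the defaults (weekday=0, startingMonth=1, variation='nearest'), the fiscal year ends on the Monday nearest to January 31; a sketch:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.FY5253()
Timestamp('2022-01-31 00:00:00')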
| Function | Description |
|---|---|
| FY5253Quarter | DateOffset increments between business quarter dates for 52-53 week fiscal year. |
| Function | Description |
|---|---|
| FY5253Quarter.freqstr | Return a string representing the frequency. |
| FY5253Quarter.kwds | Return a dict of extra parameters for the offset. |
| FY5253Quarter.name | Return a string representing the base frequency. |
| FY5253Quarter.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| FY5253Quarter.normalize | Return boolean whether the frequency can align with midnight. |
| FY5253Quarter.rule_code | Return a string representing the base frequency. |
| FY5253Quarter.n | Return the count of the number of periods. |
| FY5253Quarter.qtr_with_extra_week | Return which quarter (1-4) absorbs the extra week in a 53-week year. |
| FY5253Quarter.startingMonth | Return the month in which the fiscal year ends. |
| FY5253Quarter.variation | Return the year-end convention, "nearest" or "last". |
| FY5253Quarter.weekday | Return the weekday used by the fiscal year. |
| Function | Description |
|---|---|
| FY5253Quarter.copy() | Return a copy of the frequency. |
| FY5253Quarter.get_rule_code_suffix() | Return the suffix component of the rule code. |
| FY5253Quarter.get_weeks(dt) | Return the number of weeks in each quarter of the fiscal year containing dt. |
| FY5253Quarter.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| FY5253Quarter.year_has_extra_week(dt) | Return boolean whether the fiscal year containing dt has an extra (53rd) week. |
| FY5253Quarter.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| FY5253Quarter.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| FY5253Quarter.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| FY5253Quarter.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| FY5253Quarter.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| FY5253Quarter.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
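A hedged sketch with all defaults, where the first fiscal-quarter end after the timestamp coincides with the fiscal year end:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.FY5253Quarter()
Timestamp('2022-01-31 00:00:00')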
| Function | Description |
|---|---|
| Easter | DateOffset for the Easter holiday using logic defined in dateutil. |
| Function | Description |
|---|---|
| Easter.freqstr | Return a string representing the frequency. |
| Easter.kwds | Return a dict of extra parameters for the offset. |
| Easter.name | Return a string representing the base frequency. |
| Easter.nanos | Returns an integer of the total number of nanoseconds for fixed frequencies. |
| Easter.normalize | Return boolean whether the frequency can align with midnight. |
| Easter.rule_code | Return a string representing the base frequency. |
| Easter.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Easter.copy() | Return a copy of the frequency. |
| Easter.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Easter.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Easter.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Easter.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Easter.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Easter.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Easter.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
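For example, rolling forward to the next Easter Sunday:
>>> ts = pd.Timestamp(2022, 1, 1)
>>> ts + pd.offsets.Easter()
Timestamp('2022-04-17 00:00:00')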
| Function | Description |
|---|---|
| Tick | Base class for fixed frequency offsets (Milli, Micro, Second, Minute, Hour). |
| Function | Description |
|---|---|
| Tick.freqstr | Return a string representing the frequency. |
| Tick.kwds | Return a dict of extra parameters for the offset. |
| Tick.name | Return a string representing the base frequency. |
| Tick.nanos | Returns an integer of the total number of nanoseconds. |
| Tick.normalize | Return boolean whether the frequency can align with midnight. |
| Tick.rule_code | Return a string representing the base frequency. |
| Tick.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Tick.copy() | Return a copy of the frequency. |
| Tick.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Tick.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Tick.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Tick.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Tick.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Tick.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Tick.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
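Tick subclasses behave like fixed timedeltas, so nanos is always defined for them; for instance:
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.Hour(5)
Timestamp('2022-12-09 20:00:00')
>>> pd.offsets.Hour(5).nanos
18000000000000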
| Function | Description |
|---|---|
| Day | Offset n days. |
| Function | Description |
|---|---|
| Day.freqstr | Return a string representing the frequency. |
| Day.kwds | Return a dict of extra parameters for the offset. |
| Day.name | Return a string representing the base frequency. |
| Day.nanos | Returns an integer of the total number of nanoseconds. |
| Day.normalize | Return boolean whether the frequency can align with midnight. |
| Day.rule_code | Return a string representing the base frequency. |
| Day.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Day.copy() | Return a copy of the frequency. |
| Day.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Day.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Day.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Day.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Day.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Day.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Day.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Hour | Offset n hours. |
| Function | Description |
|---|---|
| Hour.freqstr | Return a string representing the frequency. |
| Hour.kwds | Return a dict of extra parameters for the offset. |
| Hour.name | Return a string representing the base frequency. |
| Hour.nanos | Returns an integer of the total number of nanoseconds. |
| Hour.normalize | Return boolean whether the frequency can align with midnight. |
| Hour.rule_code | Return a string representing the base frequency. |
| Hour.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Hour.copy() | Return a copy of the frequency. |
| Hour.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Hour.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Hour.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Hour.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Hour.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Hour.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Hour.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Minute | Offset n minutes. |
| Function | Description |
|---|---|
| Minute.freqstr | Return a string representing the frequency. |
| Minute.kwds | Return a dict of extra parameters for the offset. |
| Minute.name | Return a string representing the base frequency. |
| Minute.nanos | Returns an integer of the total number of nanoseconds. |
| Minute.normalize | Return boolean whether the frequency can align with midnight. |
| Minute.rule_code | Return a string representing the base frequency. |
| Minute.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Minute.copy() | Return a copy of the frequency. |
| Minute.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Minute.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Minute.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Minute.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Minute.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Minute.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Minute.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Second | Offset n seconds. |
| Function | Description |
|---|---|
| Second.freqstr | Return a string representing the frequency. |
| Second.kwds | Return a dict of extra parameters for the offset. |
| Second.name | Return a string representing the base frequency. |
| Second.nanos | Returns an integer of the total number of nanoseconds. |
| Second.normalize | Return boolean whether the frequency can align with midnight. |
| Second.rule_code | Return a string representing the base frequency. |
| Second.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Second.copy() | Return a copy of the frequency. |
| Second.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Second.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Second.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Second.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Second.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Second.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Second.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Milli | Offset n milliseconds. |
| Function | Description |
|---|---|
| Milli.freqstr | Return a string representing the frequency. |
| Milli.kwds | Return a dict of extra parameters for the offset. |
| Milli.name | Return a string representing the base frequency. |
| Milli.nanos | Returns an integer of the total number of nanoseconds. |
| Milli.normalize | Return boolean whether the frequency can align with midnight. |
| Milli.rule_code | Return a string representing the base frequency. |
| Milli.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Milli.copy() | Return a copy of the frequency. |
| Milli.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Milli.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Milli.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Milli.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Milli.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Milli.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Milli.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Micro | Offset n microseconds. |
| Function | Description |
|---|---|
| Micro.freqstr | Return a string representing the frequency. |
| Micro.kwds | Return a dict of extra parameters for the offset. |
| Micro.name | Return a string representing the base frequency. |
| Micro.nanos | Returns an integer of the total number of nanoseconds. |
| Micro.normalize | Return boolean whether the frequency can align with midnight. |
| Micro.rule_code | Return a string representing the base frequency. |
| Micro.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Micro.copy() | Return a copy of the frequency. |
| Micro.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Micro.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Micro.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Micro.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Micro.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Micro.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Micro.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| Nano | Offset n nanoseconds. |
| Function | Description |
|---|---|
| Nano.freqstr | Return a string representing the frequency. |
| Nano.kwds | Return a dict of extra parameters for the offset. |
| Nano.name | Return a string representing the base frequency. |
| Nano.nanos | Returns an integer of the total number of nanoseconds. |
| Nano.normalize | Return boolean whether the frequency can align with midnight. |
| Nano.rule_code | Return a string representing the base frequency. |
| Nano.n | Return the count of the number of periods. |
| Function | Description |
|---|---|
| Nano.copy() | Return a copy of the frequency. |
| Nano.is_on_offset(dt) | Return boolean whether a timestamp intersects with this frequency. |
| Nano.is_month_start(ts) | Return boolean whether a timestamp occurs on the month start. |
| Nano.is_month_end(ts) | Return boolean whether a timestamp occurs on the month end. |
| Nano.is_quarter_start(ts) | Return boolean whether a timestamp occurs on the quarter start. |
| Nano.is_quarter_end(ts) | Return boolean whether a timestamp occurs on the quarter end. |
| Nano.is_year_start(ts) | Return boolean whether a timestamp occurs on the year start. |
| Nano.is_year_end(ts) | Return boolean whether a timestamp occurs on the year end. |
| Function | Description |
|---|---|
| to_offset(freq[, is_period]) | Return DateOffset object from string or datetime.timedelta object. |
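to_offset() parses a frequency string into one of the offset objects above; a quick sketch:
>>> from pandas.tseries.frequencies import to_offset
>>> to_offset('5min')
<5 * Minutes>
>>> to_offset('1D')
<Day>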
>>> import pandas as pd
>>> from pandas.tseries.offsets import DateOffset
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=3)
Timestamp('2017-04-01 09:10:11')
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=2)
Timestamp('2017-03-01 09:10:11')
>>> ts + DateOffset(day=31)
Timestamp('2017-01-31 09:10:11')
>>> ts + pd.DateOffset(hour=8)
Timestamp('2017-01-01 08:10:11')
>>> pd.DateOffset(5).freqstr
'<5 * DateOffsets>'
>>> pd.offsets.BusinessHour(2).freqstr
'2bh'
>>> pd.offsets.Nano().freqstr
'ns'
>>> pd.offsets.Nano(-3).freqstr
'-3ns'
>>> pd.DateOffset(5).kwds
{}
>>> pd.offsets.FY5253Quarter().kwds
{'weekday': 0,
'startingMonth': 1,
'qtr_with_extra_week': 1,
'variation': 'nearest'}
>>> pd.offsets.Hour().name
'h'
>>> pd.offsets.Hour(5).name
'h'
>>> pd.offsets.Week(n=1).nanos
Traceback (most recent call last):
  ...
ValueError: Week: weekday=None is a non-fixed frequency
>>> pd.offsets.Hour(5).normalize
False
>>> pd.offsets.Day(5).normalize
False
>>> pd.offsets.Hour().rule_code
'h'
>>> pd.offsets.Week(5).rule_code
'W'
>>> pd.offsets.Hour(5).n
5
>>> pd.offsets.Day(3).n
3
>>> freq = pd.DateOffset(1)
>>> freq_copy = freq.copy()
>>> freq is freq_copy
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Day(1)
>>> freq.is_on_offset(ts)
True
>>> ts = pd.Timestamp(2022, 8, 6)
>>> ts.day_name()
'Saturday'
>>> freq = pd.offsets.BusinessDay(1)
>>> freq.is_on_offset(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_month_end(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_quarter_end(ts)
False
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_start(ts)
True
>>> ts = pd.Timestamp(2022, 1, 1)
>>> freq = pd.offsets.Hour(5)
>>> freq.is_year_end(ts)
False
>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()
Timestamp is not on the offset (not a month end), so it rolls backward:
>>> offset.rollback(ts)
Timestamp('2024-12-31 00:00:00')
If the timestamp is already on the offset, it remains unchanged:
>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollback(ts_on_offset)
Timestamp('2025-01-31 00:00:00')
>>> ts = pd.Timestamp("2025-01-15 09:00:00")
>>> offset = pd.tseries.offsets.MonthEnd()
The timestamp is not on the offset (not a month end), so rollforward moves it to the next month end:
>>> offset.rollforward(ts)
Timestamp('2025-01-31 00:00:00')
If the timestamp is already on the offset, it remains unchanged:
>>> ts_on_offset = pd.Timestamp("2025-01-31")
>>> offset.rollforward(ts_on_offset)
Timestamp('2025-01-31 00:00:00')
For BusinessDay, the parameter n sets the number of business days to shift.
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts.strftime('%a %d %b %Y %H:%M')
'Fri 09 Dec 2022 15:00'
>>> (ts + pd.offsets.BusinessDay(n=5)).strftime('%a %d %b %Y %H:%M')
'Fri 16 Dec 2022 15:00'
Passing normalize=True normalizes the shifted timestamp to midnight, here the start of the next business day.
>>> ts = pd.Timestamp(2022, 12, 9, 15)
>>> ts + pd.offsets.BusinessDay(normalize=True)
Timestamp('2022-12-12 00:00:00')