From f3c9b3707e818a1386ccaad6ede20de5b49d6181 Mon Sep 17 00:00:00 2001
From: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Mon, 31 Jul 2023 08:05:54 -1000
Subject: [PATCH] DOC: Use more executed instead of static code blocks (#54282)

* DOC: Use more executed instead of static code blocks
* change to except
* Convert more code blocks
* convert even more
* okexcept
* More fixes
* Address more
* another okexcept
* Fix okexcept
* address again
* more fixes
* fix merging
---
 doc/source/user_guide/advanced.rst    |  43 +++-----
 doc/source/user_guide/basics.rst      |  43 +++-----
 doc/source/user_guide/categorical.rst |  11 +-
 doc/source/user_guide/dsintro.rst     |   4 +-
 doc/source/user_guide/indexing.rst    | 151 ++++++++------------------
 doc/source/user_guide/io.rst          | 137 ++++++++++-------------
 doc/source/user_guide/merging.rst     |  19 ++--
 doc/source/user_guide/text.rst        |   6 +-
 doc/source/user_guide/timeseries.rst  |  36 +++---
 doc/source/whatsnew/v0.21.0.rst       |   1 -
 10 files changed, 169 insertions(+), 282 deletions(-)

diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst
index 41b0c98e339da..682fa4c9b4fcc 100644
--- a/doc/source/user_guide/advanced.rst
+++ b/doc/source/user_guide/advanced.rst
@@ -620,31 +620,23 @@ inefficient (and show a ``PerformanceWarning``). It will also return a copy of
 the data rather than a view:
 
 .. ipython:: python
+   :okwarning:
 
    dfm = pd.DataFrame(
        {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": np.random.rand(4)}
    )
    dfm = dfm.set_index(["jim", "joe"])
    dfm
-
-.. code-block:: ipython
-
-   In [4]: dfm.loc[(1, 'z')]
-   PerformanceWarning: indexing past lexsort depth may impact performance.
-
-   Out[4]:
-              jolie
-   jim joe
-   1   z    0.64094
+   dfm.loc[(1, 'z')]
 
 .. _advanced.unsorted:
 
 Furthermore, if you try to index something that is not fully lexsorted, this can raise:
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [5]: dfm.loc[(0, 'y'):(1, 'z')]
-   UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
+   dfm.loc[(0, 'y'):(1, 'z')]
 
 The :meth:`~MultiIndex.is_monotonic_increasing` method on a ``MultiIndex`` shows if the
 index is sorted:
@@ -836,10 +828,10 @@ values **not** in the categories, similarly to how you can reindex **any** panda
    df5 = df5.set_index("B")
    df5.index
 
-   .. code-block:: ipython
+   .. ipython:: python
+      :okexcept:
 
-      In [1]: pd.concat([df4, df5])
-      TypeError: categories must match existing categories when appending
+      pd.concat([df4, df5])
 
 .. _advanced.rangeindex:
@@ -921,11 +913,10 @@ Selecting using an ``Interval`` will only return exact matches.
 Trying to select an ``Interval`` that is not exactly contained in the
 ``IntervalIndex`` will raise a ``KeyError``.
 
-.. code-block:: python
+.. ipython:: python
+   :okexcept:
 
-   In [7]: df.loc[pd.Interval(0.5, 2.5)]
-   ---------------------------------------------------------------------------
-   KeyError: Interval(0.5, 2.5, closed='right')
+   df.loc[pd.Interval(0.5, 2.5)]
 
 Selecting all ``Intervals`` that overlap a given ``Interval`` can be performed
 using the :meth:`~IntervalIndex.overlaps` method to create a boolean indexer.
@@ -1062,15 +1053,14 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
 
    # OK because 2 and 4 are in the index
    df.loc[2:4, :]
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
    # 0 is not in the index
-   In [9]: df.loc[0:4, :]
-   KeyError: 0
+   df.loc[0:4, :]
 
    # 3 is not a unique label
-   In [11]: df.loc[2:3, :]
-   KeyError: 'Cannot get right slice bound for non-unique label: 3'
+   df.loc[2:3, :]
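+
+If unique, in-range slice bounds are not available, one possible remedy
+(sketched briefly here with the same ``df``) is to sort the index first;
+a monotonic index again accepts bounds that are not present:
+
+.. ipython:: python
+
+   # after sorting, the index is monotonic and inexact bounds are allowed
+   df.sort_index().loc[0:4, :]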
 
 ``Index.is_monotonic_increasing`` and ``Index.is_monotonic_decreasing`` only check that
 an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
@@ -1109,7 +1099,8 @@ accomplished as such:
 However, if you only had ``c`` and ``e``, determining the next element in the
 index can be somewhat complicated. For example, the following does not work:
 
-::
+.. ipython:: python
+   :okexcept:
 
    s.loc['c':'e' + 1]
 
diff --git a/doc/source/user_guide/basics.rst b/doc/source/user_guide/basics.rst
index 06e52d8713409..2e299da5e5794 100644
--- a/doc/source/user_guide/basics.rst
+++ b/doc/source/user_guide/basics.rst
@@ -322,24 +322,21 @@ You can test if a pandas object is empty, via the :attr:`~DataFrame.empty` prope
 
 .. warning::
 
-   You might be tempted to do the following:
+   Asserting the truthiness of a pandas object will raise an error, as it is
+   ambiguous whether the test refers to emptiness or to the element values.
 
-   .. code-block:: python
-
-      >>> if df:
-      ...     pass
-
-   Or
-
-   .. code-block:: python
+   .. ipython:: python
+      :okexcept:
 
-      >>> df and df2
+      if df:
+          print(True)
 
-   These will both raise errors, as you are trying to compare multiple values.::
+   .. ipython:: python
+      :okexcept:
 
-      ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
+      df and df2
 
-See :ref:`gotchas` for a more detailed discussion.
+   See :ref:`gotchas` for a more detailed discussion.
 
 .. _basics.equals:
@@ -404,13 +401,12 @@ objects of the same length:
 Trying to compare ``Index`` or ``Series`` objects of different lengths will
 raise a ``ValueError``:
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [55]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar'])
-   ValueError: Series lengths must match to compare
+   pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar'])
 
-   In [56]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo'])
-   ValueError: Series lengths must match to compare
+   pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo'])
 
 Note that this is different from the NumPy behavior where a comparison can
 be broadcast:
@@ -910,18 +906,15 @@ maximum value for each column occurred:
 
    tsdf.apply(lambda x: x.idxmax())
 
 You may also pass additional arguments and keyword arguments to the :meth:`~DataFrame.apply`
-method. For instance, consider the following function you would like to apply:
+method.
 
-.. code-block:: python
+.. ipython:: python
 
    def subtract_and_divide(x, sub, divide=1):
       return (x - sub) / divide
 
-You may then apply this function as follows:
-
-.. code-block:: python
-
-   df.apply(subtract_and_divide, args=(5,), divide=3)
+   df_udf = pd.DataFrame(np.ones((2, 2)))
+   df_udf.apply(subtract_and_divide, args=(5,), divide=3)
 
 Another useful feature is the ability to pass Series methods to carry out some
 Series operation on each column or row:
diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index 61ecbff96ac7d..9efa7df3ff669 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -873,13 +873,12 @@ categoricals of the same categories and order information
 The below raises ``TypeError`` because the categories are ordered and not identical.
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [1]: a = pd.Categorical(["a", "b"], ordered=True)
-   In [2]: b = pd.Categorical(["a", "b", "c"], ordered=True)
-   In [3]: union_categoricals([a, b])
-   Out[3]:
-   TypeError: to union ordered Categoricals, all categories must be the same
+   a = pd.Categorical(["a", "b"], ordered=True)
+   b = pd.Categorical(["a", "b", "c"], ordered=True)
+   union_categoricals([a, b])
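+
+By contrast, ordered categoricals whose categories and ordering are identical
+combine without error (a brief illustration):
+
+.. ipython:: python
+
+   # same categories and order as ``a`` above
+   c = pd.Categorical(["a", "b"], ordered=True)
+   union_categoricals([a, c])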
 
 Ordered categoricals with different categories or orderings can be combined by
 using the ``ignore_order=True`` argument.
diff --git a/doc/source/user_guide/dsintro.rst b/doc/source/user_guide/dsintro.rst
index 4b0829e4a23b9..d60532f5f4027 100644
--- a/doc/source/user_guide/dsintro.rst
+++ b/doc/source/user_guide/dsintro.rst
@@ -31,9 +31,9 @@ Series
 type (integers, strings, floating point numbers, Python objects, etc.). The axis
 labels are collectively referred to as the **index**. The basic method to create a
 :class:`Series` is to call:
 
-::
+.. code-block:: python
 
-   >>> s = pd.Series(data, index=index)
+   s = pd.Series(data, index=index)
 
 Here, ``data`` can be many different things:
diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst
index b574ae9cb12c7..52bc43f52b1d3 100644
--- a/doc/source/user_guide/indexing.rst
+++ b/doc/source/user_guide/indexing.rst
@@ -244,17 +244,13 @@ You can use attribute access to modify an existing element of a Series or column
 if you try to use attribute access to create a new column, it creates a new
 attribute rather than a new column and will thus raise a ``UserWarning``:
 
-.. code-block:: ipython
-
-   In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
-   In [2]: df.two = [4, 5, 6]
-   UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access
-   In [3]: df
-   Out[3]:
-      one
-   0  1.0
-   1  2.0
-   2  3.0
+.. ipython:: python
+   :okwarning:
+
+   df_new = pd.DataFrame({'one': [1., 2., 3.]})
+   df_new.two = [4, 5, 6]
+   df_new
+
 Slicing ranges
 --------------
@@ -304,17 +300,14 @@ Selection by label
    ``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
    using integers in a ``DatetimeIndex``. These will raise a ``TypeError``.
 
-   .. ipython:: python
-
-      dfl = pd.DataFrame(np.random.randn(5, 4),
-                         columns=list('ABCD'),
-                         index=pd.date_range('20130101', periods=5))
-      dfl
-
-   .. code-block:: ipython
+   .. ipython:: python
+      :okexcept:
 
-      In [4]: dfl.loc[2:3]
-      TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <class 'int'>
+      dfl = pd.DataFrame(np.random.randn(5, 4),
+                         columns=list('ABCD'),
+                         index=pd.date_range('20130101', periods=5))
+      dfl
+      dfl.loc[2:3]
 
 String likes in slicing *can* be convertible to the type of the index and lead to natural slicing.
@@ -542,13 +535,15 @@ A single indexer that is out of bounds will raise an ``IndexError``. A list of
 indexers where any element is out of bounds will raise an ``IndexError``.
 
-.. code-block:: python
+.. ipython:: python
+   :okexcept:
+
+   dfl.iloc[[4, 5, 6]]
 
-   >>> dfl.iloc[[4, 5, 6]]
-   IndexError: positional indexers are out-of-bounds
+.. ipython:: python
+   :okexcept:
 
-   >>> dfl.iloc[:, 4]
-   IndexError: single positional indexer is out-of-bounds
+   dfl.iloc[:, 4]
 
 .. _indexing.callable:
@@ -618,59 +613,6 @@ For getting *multiple* indexers, using ``.get_indexer``:
 
    dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]
 
-.. _deprecate_loc_reindex_listlike:
-.. _indexing.deprecate_loc_reindex_listlike:
-
-Indexing with list with missing labels is deprecated
-----------------------------------------------------
-
-In prior versions, using ``.loc[list-of-labels]`` would work as long as *at least 1* of the keys was found (otherwise it
-would raise a ``KeyError``). This behavior was changed and will now raise a ``KeyError`` if at least one label is missing.
-The recommended alternative is to use ``.reindex()``.
-
-For example.
-
-.. ipython:: python
-
-   s = pd.Series([1, 2, 3])
-   s
-
-Selection with all keys found is unchanged.
-
-.. ipython:: python
-
-   s.loc[[1, 2]]
-
-Previous behavior
-
-.. code-block:: ipython
-
-   In [4]: s.loc[[1, 2, 3]]
-   Out[4]:
-   1    2.0
-   2    3.0
-   3    NaN
-   dtype: float64
-
-
-Current behavior
-
-.. code-block:: ipython
-
-   In [4]: s.loc[[1, 2, 3]]
-   Passing list-likes to .loc with any non-matching elements will raise
-   KeyError in the future, you can use .reindex() as an alternative.
-
-   See the documentation here:
-   https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
-
-   Out[4]:
-   1    2.0
-   2    3.0
-   3    NaN
-   dtype: float64
-
-
 Reindexing
 ~~~~~~~~~~
 
 The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``.
 
 .. ipython:: python
 
+   s = pd.Series([1, 2, 3])
    s.reindex([1, 2, 3])
 
 Alternatively, if you want to select only *valid* keys, the following is idiomatic and efficient; it is guaranteed to preserve the dtype of the selection.
 
 Having a duplicated index will raise for a ``.reindex()``:
 
 .. ipython:: python
+   :okexcept:
 
    s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
    labels = ['c', 'd']
-
-.. code-block:: ipython
-
-   In [17]: s.reindex(labels)
-   ValueError: cannot reindex on an axis with duplicate labels
+   s.reindex(labels)
 
 Generally, you can intersect the desired labels with the current
 axis, and then reindex.
 
 However, this would *still* raise if your resulting index is duplicated.
 
-.. code-block:: ipython
-
-   In [41]: labels = ['a', 'd']
+.. ipython:: python
+   :okexcept:
 
-   In [42]: s.loc[s.index.intersection(labels)].reindex(labels)
-   ValueError: cannot reindex on an axis with duplicate labels
+   labels = ['a', 'd']
+   s.loc[s.index.intersection(labels)].reindex(labels)
 
 .. _indexing.basics.partial_setting:
@@ -1757,11 +1696,13 @@ discards the index, instead of putting index values in the DataFrame's columns.
 
 Adding an ad hoc index
 ~~~~~~~~~~~~~~~~~~~~~~
 
-If you create an index yourself, you can just assign it to the ``index`` field:
+You can assign a custom index to the ``index`` attribute:
 
-.. code-block:: python
+.. ipython:: python
 
-   data.index = index
+   df_idx = pd.DataFrame(range(4))
+   df_idx.index = pd.Index([10, 20, 30, 40], name="a")
+   df_idx
 
 .. _indexing.view_versus_copy:
@@ -1892,15 +1833,12 @@ chained indexing expression, you can set the :ref:`option <options>`
 
 This however is operating on a copy and will not work.
 
-::
+.. ipython:: python
+   :okwarning:
+   :okexcept:
 
-   >>> pd.set_option('mode.chained_assignment','warn')
-   >>> dfb[dfb['a'].str.startswith('o')]['c'] = 42
-   Traceback (most recent call last)
-        ...
-   SettingWithCopyWarning:
-        A value is trying to be set on a copy of a slice from a DataFrame.
-        Try using .loc[row_index,col_indexer] = value instead
+   with pd.option_context('mode.chained_assignment', 'warn'):
+       dfb[dfb['a'].str.startswith('o')]['c'] = 42
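+
+A warning-free way to perform this assignment is a single ``.loc`` call, which
+operates on the original frame (a brief illustration, reusing ``dfb`` from above):
+
+.. ipython:: python
+
+   # one .loc call: select rows and column, then assign
+   dfb.loc[dfb['a'].str.startswith('o'), 'c'] = 42
+   dfb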
 
 A chained assignment can also crop up in setting in a mixed dtype frame.
@@ -1937,15 +1875,12 @@ The following *can* work at times, but it is not guaranteed to, and therefore sh
 
 Last, the subsequent example will **not** work at all, and so should be avoided:
 
-::
+.. ipython:: python
+   :okwarning:
+   :okexcept:
 
-   >>> pd.set_option('mode.chained_assignment','raise')
-   >>> dfd.loc[0]['a'] = 1111
-   Traceback (most recent call last)
-        ...
-   SettingWithCopyError:
-        A value is trying to be set on a copy of a slice from a DataFrame.
-        Try using .loc[row_index,col_indexer] = value instead
+   with pd.option_context('mode.chained_assignment', 'raise'):
+       dfd.loc[0]['a'] = 1111
 
 .. warning::
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 3f986cd803b10..006ab5c49e24c 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -849,8 +849,8 @@ column names:
 
    with open("tmp.csv", "w") as fh:
        fh.write(data)
 
-    df = pd.read_csv("tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]])
-    df
+   df = pd.read_csv("tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]])
+   df
 
 By default the parser removes the component date columns, but you can choose
 to retain them via the ``keep_date_col`` keyword:
@@ -1103,10 +1103,10 @@ By default, numbers with a thousands separator will be parsed as strings:
 
    with open("tmp.csv", "w") as fh:
        fh.write(data)
 
-    df = pd.read_csv("tmp.csv", sep="|")
-    df
+   df = pd.read_csv("tmp.csv", sep="|")
+   df
 
-    df.level.dtype
+   df.level.dtype
 
 The ``thousands`` keyword allows integers to be parsed correctly:
@@ -1212,81 +1212,55 @@ too many fields will raise an error by default:
 
 You can elect to skip bad lines:
 
-.. code-block:: ipython
+.. ipython:: python
 
-   In [29]: pd.read_csv(StringIO(data), on_bad_lines="warn")
-   Skipping line 3: expected 3 fields, saw 4
+   data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"
+   pd.read_csv(StringIO(data), on_bad_lines="skip")
 
-   Out[29]:
-      a  b   c
-   0  1  2   3
-   1  8  9  10
+.. versionadded:: 1.4.0
 
 Or pass a callable function to handle the bad line if ``engine="python"``.
 The bad line will be a list of strings that was split by the ``sep``:
 
-.. code-block:: ipython
-
-   In [29]: external_list = []
-
-   In [30]: def bad_lines_func(line):
-      ...:     external_list.append(line)
-      ...:     return line[-3:]
-
-   In [31]: pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")
-   Out[31]:
-      a  b   c
-   0  1  2   3
-   1  5  6   7
-   2  8  9  10
+.. ipython:: python
 
-   In [32]: external_list
-   Out[32]: [4, 5, 6, 7]
+   external_list = []
+   def bad_lines_func(line):
+       external_list.append(line)
+       return line[-3:]
+   pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")
+   external_list
 
-   .. versionadded:: 1.4.0
+.. note::
 
-Note that the callable function will handle only a line with too many fields.
-Bad lines caused by other errors will be silently skipped.
+   The callable function will handle only a line with too many fields.
+   Bad lines caused by other errors will be silently skipped.
 
-For example:
-
-.. code-block:: ipython
+   .. ipython:: python
 
-      def bad_lines_func(line):
-          print(line)
+      bad_lines_func = lambda line: print(line)
 
-      data = 'name,type\nname a,a is of type a\nname b,"b\" is of type b"'
-      data
-      pd.read_csv(data, on_bad_lines=bad_lines_func, engine="python")
+      data = 'name,type\nname a,a is of type a\nname b,"b\" is of type b"'
+      data
+      pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")
 
-The line was not processed in this case, as a "bad line" here is caused by an escape character.
+   The line was not processed in this case, as a "bad line" here is caused by an escape character.
 
 You can also use the ``usecols`` parameter to eliminate extraneous column
 data that appear in some lines but not others:
 
-.. code-block:: ipython
-
-   In [33]: pd.read_csv(StringIO(data), usecols=[0, 1, 2])
+.. ipython:: python
+   :okexcept:
 
-   Out[33]:
-      a  b  c
-   0  1  2  3
-   1  4  5  6
-   2  8  9  10
+   pd.read_csv(StringIO(data), usecols=[0, 1, 2])
 
 In case you want to keep all data including the lines with too many fields, you can
 specify a sufficient number of ``names``. This ensures that lines with not enough
 fields are filled with ``NaN``.
 
-.. code-block:: ipython
-
-   In [34]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])
+.. ipython:: python
 
-   Out[34]:
-      a  b   c    d
-   0  1  2   3  NaN
-   1  4  5   6    7
-   2  8  9  10  NaN
+   pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])
 
 .. _io.dialect:
@@ -4301,12 +4275,16 @@ This format is specified by default when using ``put`` or ``to_hdf`` or by ``for
 
    A ``fixed`` format will raise a ``TypeError`` if you try to retrieve using a ``where``:
 
-   .. code-block:: python
+   .. ipython:: python
+      :okexcept:
 
-      >>> pd.DataFrame(np.random.randn(10, 2)).to_hdf("test_fixed.h5", "df")
-      >>> pd.read_hdf("test_fixed.h5", "df", where="index>5")
-      TypeError: cannot pass a where specification when reading a fixed format.
-                 this store must be selected in its entirety
+      pd.DataFrame(np.random.randn(10, 2)).to_hdf("test_fixed.h5", "df")
+      pd.read_hdf("test_fixed.h5", "df", where="index>5")
+
+   .. ipython:: python
+      :suppress:
+
+      os.remove("test_fixed.h5")
 
 .. _io.hdf5-table:
@@ -4397,16 +4375,15 @@ will yield a tuple for each group key along with the relative keys of its conten
 
    Hierarchical keys cannot be retrieved as dotted (attribute) access as described above for items stored under the root node.
 
-   .. code-block:: ipython
+   .. ipython:: python
+      :okexcept:
 
-      In [8]: store.foo.bar.bah
-      AttributeError: 'HDFStore' object has no attribute 'foo'
+      store.foo.bar.bah
+
+   .. ipython:: python
 
       # you can directly access the actual PyTables node by using the root node
-      In [9]: store.root.foo.bar.bah
-      Out[9]:
-      /foo/bar/bah (Group) ''
-        children := ['block0_items' (Array), 'block0_values' (Array), 'axis0' (Array), 'axis1' (Array)]
+      store.root.foo.bar.bah
 
 Instead, use explicit string based keys:
@@ -4466,19 +4443,19 @@ storing/selecting from homogeneous index ``DataFrames``.
 
 .. ipython:: python
 
-      index = pd.MultiIndex(
-         levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
-         codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
-         names=["foo", "bar"],
-      )
-      df_mi = pd.DataFrame(np.random.randn(10, 3), index=index, columns=["A", "B", "C"])
-      df_mi
+   index = pd.MultiIndex(
+       levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
+       codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
+       names=["foo", "bar"],
+   )
+   df_mi = pd.DataFrame(np.random.randn(10, 3), index=index, columns=["A", "B", "C"])
+   df_mi
 
-      store.append("df_mi", df_mi)
-      store.select("df_mi")
+   store.append("df_mi", df_mi)
+   store.select("df_mi")
 
-      # the levels are automatically included as data columns
-      store.select("df_mi", "foo=bar")
+   # the levels are automatically included as data columns
+   store.select("df_mi", "foo=bar")
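+
+Terms on the level names can also select several values at once, using the
+list syntax of ``where`` expressions (a brief sketch, assuming the query
+grammar documented in the querying section):
+
+.. ipython:: python
+
+   # select rows where the "foo" level is either bar or qux
+   store.select("df_mi", "foo=['bar', 'qux']")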
 
 .. note::
 
    The ``index`` keyword is reserved and cannot be used as a level name.
@@ -4559,7 +4536,7 @@ The right-hand side of the sub-expression (after a comparison operator) can be:
 
    instead of this
 
-   .. code-block:: ipython
+   .. code-block:: python
 
      string = "HolyMoly'"
      store.select('df', f'index == {string}')
diff --git a/doc/source/user_guide/merging.rst b/doc/source/user_guide/merging.rst
index 962de385a08c5..10793a6973f8a 100644
--- a/doc/source/user_guide/merging.rst
+++ b/doc/source/user_guide/merging.rst
@@ -155,10 +155,10 @@ functionality below.
    reusing this function can create a significant performance hit. If you need
    to use the operation over several datasets, use a list comprehension.
 
-   ::
+   .. code-block:: python
 
-      frames = [ process_your_file(f) for f in files ]
-      result = pd.concat(frames)
+      frames = [process_your_file(f) for f in files]
+      result = pd.concat(frames)
 
 .. note::
@@ -732,17 +732,12 @@ In the following example, there are duplicate values of ``B`` in the right
 ``DataFrame``. As this is not a one-to-one merge -- as specified in the
 ``validate`` argument -- an exception will be raised.
 
-
 .. ipython:: python
+   :okexcept:
 
-   left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
-   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
-
-.. code-block:: ipython
-
-   In [53]: result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")
-   ...
-   MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
+   left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
+   right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})
+   result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")
 
 If the user is aware of the duplicates in the right ``DataFrame`` but wants to
 ensure there are no duplicates in the left DataFrame, one can use the
diff --git a/doc/source/user_guide/text.rst b/doc/source/user_guide/text.rst
index c193df5118926..cf27fc8385223 100644
--- a/doc/source/user_guide/text.rst
+++ b/doc/source/user_guide/text.rst
@@ -574,10 +574,10 @@ returns a ``DataFrame`` if ``expand=True``.
 
 It raises ``ValueError`` if ``expand=False``.
 
-.. code-block:: python
+.. ipython:: python
+   :okexcept:
 
-   s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
-   ValueError: only one regex group is supported with Index
+   s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
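+
+With ``expand=True``, the same pattern on an ``Index`` succeeds and returns a
+``DataFrame`` (a brief illustration with the same ``s``):
+
+.. ipython:: python
+
+   s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=True)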
 
 The table below summarizes the behavior of ``extract(expand=False)``
 (input subject in first column, number of groups in regex in
 first row)
diff --git a/doc/source/user_guide/timeseries.rst b/doc/source/user_guide/timeseries.rst
index a0754ba0d2995..bc6a3926188f1 100644
--- a/doc/source/user_guide/timeseries.rst
+++ b/doc/source/user_guide/timeseries.rst
@@ -289,10 +289,10 @@ Invalid data
 
 The default behavior, ``errors='raise'``, is to raise when unparsable:
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [2]: pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
-   ValueError: Unknown datetime string format
+   pd.to_datetime(['2009/07/31', 'asd'], errors='raise')
 
 Pass ``errors='ignore'`` to return the original input when unparsable:
@@ -2016,12 +2016,11 @@ If ``Period`` freq is daily or higher (``D``, ``H``, ``T``, ``S``, ``L``, ``U``,
 
    p + datetime.timedelta(minutes=120)
    p + np.timedelta64(7200, "s")
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
+
+   p + pd.offsets.Minute(5)
 
-   In [1]: p + pd.offsets.Minute(5)
-   Traceback
-      ...
-   ValueError: Input has different freq from Period(freq=H)
 
 If ``Period`` has other frequencies, only the same ``offsets`` can be added. Otherwise, ``ValueError`` will be raised.
@@ -2030,12 +2029,11 @@ If ``Period`` has other frequencies, only the same ``offsets`` can be added. Oth
 
    p = pd.Period("2014-07", freq="M")
    p + pd.offsets.MonthEnd(3)
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
+
+   p + pd.offsets.MonthBegin(3)
 
-   In [1]: p + pd.offsets.MonthBegin(3)
-   Traceback
-      ...
-   ValueError: Input has different freq from Period(freq=M)
 
 Taking the difference of ``Period`` instances with the same frequency will
 return the number of frequency units between them:
@@ -2564,10 +2562,10 @@ twice within one day ("clocks fall back"). The following options are available:
 
 This will fail as there are ambiguous times (``'11/06/2011 01:00'``)
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [2]: rng_hourly.tz_localize('US/Eastern')
-   AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument
+   rng_hourly.tz_localize('US/Eastern')
 
 Handle these ambiguous times by specifying the following.
@@ -2599,10 +2597,10 @@ can be controlled by the ``nonexistent`` argument. The following options are ava
 
 Localization of nonexistent times will raise an error by default.
 
-.. code-block:: ipython
+.. ipython:: python
+   :okexcept:
 
-   In [2]: dti.tz_localize('Europe/Warsaw')
-   NonExistentTimeError: 2015-03-29 02:30:00
+   dti.tz_localize('Europe/Warsaw')
 
 Transform nonexistent times to ``NaT`` or shift the times.
diff --git a/doc/source/whatsnew/v0.21.0.rst b/doc/source/whatsnew/v0.21.0.rst
index 1dae2e8463c27..b45ea8a2b522c 100644
--- a/doc/source/whatsnew/v0.21.0.rst
+++ b/doc/source/whatsnew/v0.21.0.rst
@@ -440,7 +440,6 @@ Indexing with a list with missing labels is deprecated
 
 Previously, selecting with a list of labels, where one or more labels were missing
 would always succeed, returning ``NaN`` for missing labels. This will now show a
 ``FutureWarning``. In the future this will raise a ``KeyError`` (:issue:`15747`).
 This warning will trigger on a ``DataFrame`` or a ``Series`` for using ``.loc[]``
 or ``[[]]`` when passing a list-of-labels with at least 1 missing label.
-See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.
 
 .. ipython:: python