Slicing and extracting from a DataFrame

August 22, 2023

I have a DataFrame like the one below:

     time  power  speed  state
1   14.00     29      3      1
2   14.01     30      3      2
3   14.02     29      3      3
4   14.03     30      3      4
5   14.04     29      3      5
6   14.05     30      3      6
7   14.06     29      3      6
8   14.07     30      3      6
9   14.08     29      3      6
10  14.09     30      3      5
11  14.10     29      3      5
12  14.11     30      3      5
13  14.12     29      3      5
14  14.13     30      3      6
15  14.14     31      4      6
16  14.15     32      4      6
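To make the question reproducible, the sample table can be rebuilt as below. This is a sketch that assumes the unnamed first column is the index and that `time` is stored as a float, as it appears in the table.

```python
import pandas as pd

# Rebuild the sample table from the question (index starts at 1, as shown)
df = pd.DataFrame(
    {
        "time": [14.00, 14.01, 14.02, 14.03, 14.04, 14.05, 14.06, 14.07,
                 14.08, 14.09, 14.10, 14.11, 14.12, 14.13, 14.14, 14.15],
        "power": [29, 30] * 7 + [31, 32],  # alternates 29/30, then 31, 32
        "speed": [3] * 14 + [4, 4],
        "state": [1, 2, 3, 4, 5, 6, 6, 6, 6, 5, 5, 5, 5, 6, 6, 6],
    },
    index=range(1, 17),
)
print(df)
```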

Each cycle starts at a state 5 that comes after a state 6 (row 10; row 5 does not count because no 6 precedes it) and ends just before state 6 returns (row 13). So cycle 1 spans rows 10 to 13.
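To make the rule concrete, here is a minimal vectorized sketch that marks which rows belong to a cycle, assuming `state` holds integers. It numbers consecutive runs of 6 / non-6 rows, ignores everything before the first 6, and keeps each non-6 run from its first 5 onward. The variable names are illustrative only. A trailing run that is not yet closed by a 6 would also be kept.

```python
import pandas as pd

# Only "state" matters for finding cycles; values taken from the table above
df = pd.DataFrame(
    {"state": [1, 2, 3, 4, 5, 6, 6, 6, 6, 5, 5, 5, 5, 6, 6, 6]},
    index=range(1, 17),
)

is6 = df["state"].eq(6)
# Consecutive runs of "6" / "not 6" get their own run number
run = (is6 != is6.shift()).cumsum()
# True from the first 6 onward, so rows before any 6 (e.g. row 5) are ignored
after_first_6 = is6.astype(int).cumsum() > 0
# Within each run, True from the first 5 onward
from_first_5 = df["state"].eq(5).astype(int).groupby(run).cumsum() > 0

in_cycle = ~is6 & after_first_6 & from_first_5
cycles = [g for _, g in df[in_cycle].groupby(run[in_cycle])]

for i, cycle in enumerate(cycles, start=1):
    print(f"Cycle {i}: rows {cycle.index.min()} to {cycle.index.max()}")
```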

The data is large and contains multiple cycles. I want to extract each cycle as a separate DataFrame.
I tried iterating over the rows, but it didn't work:

charge_cycles = []
current_charge_start = None
current_drive_start = None
total_energy_consumed = 0
drive_data = []

for index, row in data.iterrows():
    if row['state'] == '6':
        if current_drive_start is not None:
            energy_during_drive = total_energy_consumed
            charge_cycles.append(energy_during_drive)
            drive_data.append(data.loc[current_drive_start:index])
            current_drive_start = None
            total_energy_consumed = 0
        current_charge_start = row['time']
    elif row['state'] == '5':
        if current_charge_start is not None and current_drive_start is None:
            current_drive_start = index
        if current_drive_start is not None:
            total_energy_consumed += row['power'] * (row['time'] - data.loc[current_drive_start, 'time'])
            current_drive_start = index

# Print the energy consumption during driving between each charge cycle
for i, energy in enumerate(charge_cycles, start=1):
    print(f"Charge Cycle {i}: Energy Consumed During Driving = {energy} units")

# Display the DataFrames for each driving cycle
for i, drive_df in enumerate(drive_data, start=1):
    print(f"Driving Cycle {i}:\n{drive_df}")

This gives me the whole DataFrame as a result. Can anyone help me with the Python code for this problem?

Solution:

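If I understand correctly (IIUC), you can try the following, shown here on a sample "state" column that contains several cycles: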

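import pandas as pd

# sample data: a single "state" column containing several cycles
df = pd.DataFrame(
    {
        "state": list(
            "6666665555555555555543555555512555666666666666666655555555412344666666666"
        )
    }
)
df["state"] = df["state"].astype(int)


# drop the leading rows before the first 6
df = df.loc[df["state"].eq(6).idxmax() :]

mask = df["state"].eq(6)
# label each run of consecutive 6 / non-6 rows, then look at one run at a time
for _, g in df.groupby((mask != mask.shift()).cumsum()):
    # a run containing a 5 is a cycle that starts at its first 5 (":=" needs Python 3.8+)
    if (eq5 := g["state"].eq(5)).any():
        g = g.loc[eq5.idxmax() :]
        print(g)
        print("-" * 80)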
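Since the goal is to get each cycle as its own DataFrame rather than print it, the same grouping can be wrapped in a function that returns a list. This is a sketch; the name `extract_cycles` is hypothetical, and the early return for data without any 6 is an added guard (without it, `idxmax` on an all-False mask would return the first label). A trailing run that is not closed by a 6 is still returned as a cycle.

```python
from typing import List

import pandas as pd


def extract_cycles(df: pd.DataFrame) -> List[pd.DataFrame]:
    """Return one DataFrame per cycle, using the run-grouping idea above."""
    is6 = df["state"].eq(6)
    if not is6.any():
        return []  # no 6 at all, so no cycle can start
    # drop the leading rows before the first 6
    df = df.loc[is6.idxmax():]
    mask = df["state"].eq(6)
    cycles = []
    # label each run of consecutive 6 / non-6 rows
    for _, g in df.groupby((mask != mask.shift()).cumsum()):
        eq5 = g["state"].eq(5)
        if eq5.any():
            # keep the run from its first 5 to its last row (just before the next 6)
            cycles.append(g.loc[eq5.idxmax():])
    return cycles


# Question's data (only "power" and "state"; any other columns are kept as-is)
sample = pd.DataFrame(
    {
        "power": [29, 30] * 7 + [31, 32],
        "state": [1, 2, 3, 4, 5, 6, 6, 6, 6, 5, 5, 5, 5, 6, 6, 6],
    },
    index=range(1, 17),
)

for i, cycle in enumerate(extract_cycles(sample), start=1):
    print(f"Cycle {i}:\n{cycle}\n")
```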

Prints:

    state
6       5
7       5
8       5
9       5
10      5
11      5
12      5
13      5
14      5
15      5
16      5
17      5
18      5
19      5
20      4
21      3
22      5
23      5
24      5
25      5
26      5
27      5
28      5
29      1
30      2
31      5
32      5
33      5
--------------------------------------------------------------------------------
    state
50      5
51      5
52      5
53      5
54      5
55      5
56      5
57      5
58      4
59      1
60      2
61      3
62      4
63      4
--------------------------------------------------------------------------------
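Alternatively, the loop-based attempt from the question can be made to work. Two things in it look like the main problems: it compares `state` with the strings '6' and '5', which never match if the column holds integers, and `current_drive_start = index` at the end of the `elif` branch moves the cycle start forward on every state-5 row. Below is a minimal corrected loop, assuming integer states and a unique, sorted index. It leaves out the energy calculation, and unlike the grouping approach it only keeps cycles that are closed by a 6.

```python
import pandas as pd

df = pd.DataFrame(
    {"state": [1, 2, 3, 4, 5, 6, 6, 6, 6, 5, 5, 5, 5, 6, 6, 6]},
    index=range(1, 17),
)

cycles = []        # one DataFrame per completed cycle
seen_six = False   # a cycle can only start after at least one 6
start = None       # index label of the first 5 of the current cycle
prev_label = None  # index label of the previous row

for label, state in df["state"].items():
    if state == 6:  # compare with the integer 6, not the string '6'
        if start is not None:
            # the cycle ends on the row just before this 6 (.loc slicing is inclusive)
            cycles.append(df.loc[start:prev_label])
            start = None
        seen_six = True
    elif state == 5 and seen_six and start is None:
        start = label  # set once per cycle, never overwritten inside it
    prev_label = label

for i, cycle in enumerate(cycles, start=1):
    print(f"Driving Cycle {i}:\n{cycle}")
```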