Xlwings Notes
xlwings Functions
examples_write_to_sheet()examples_read_from_sheet()example_get_user_input_text()example_get_user_input_range_as_dataframe()example_get_user_input_range_as_array()load_timeseries(kind: str, curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], date_format: Optional[str], force_reload: Optional[bool])load_timeseries_excel(curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], force_reload: Optional[bool])load_timeseries_volue(curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], force_reload: Optional[bool])load_timeseries_datacenter(curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], force_reload: Optional[bool])load_timeseries_csv(curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], force_reload: Optional[bool])load_timeseries_enmacc(curve_name: xw.Range, start_date: Optional[datetime], end_date: Optional[datetime], force_reload: Optional[bool])load_requests_enmacc()load_concluded_requests_enmacc(concluded_from: datetime, concluded_to: datetime)write_timeseries_to_sheet(rng: xw.Range, n: Optional[int], start_date: Optional[datetime], end_date: Optional[datetime])describe_data(rng: xw.Range)list_cached_sources()clear_cache(return_value_in_cell: Optional[str]) -> stradd_price_timeseries(ts1_id: xw.Range, ts2_id: xw.Range, floor: Optional[bool])subtract_price_timeseries(ts1_id: xw.Range, ts2_id: xw.Range, floor: Optional[bool])multiply_price_timeseries(ts1_id: xw.Range, ts2_id: xw.Range)concat_price_timeseries(ts1_id: xw.Range, ts2_id: xw.Range, duplicate_strategy: str)CORREL2(df)
Why is new_df = new_df.sort_index(kind='mergesort') necessary?
Short answer: itโs there to make keep="first" / keep="last" deterministic and time-correct.
๐น What problem it solves
When you do:
new_df = pd.concat([self.df, other.df], axis=0)
the resulting order is:
- all rows from
self.df - then all rows from
other.df
๐ This is not guaranteed to be time-ordered
๐ฅ Why that matters
Your duplicate handling depends on order:
new_df.index.duplicated(keep="last")
๐ โlastโ means:
last in the current row order, not last in time
โ Without sorting (buggy behavior)
Imagine:
self.df:
2024-01-02 โ A
other.df:
2024-01-01 โ B
2024-01-02 โ C
After concat:
2024-01-02 โ A
2024-01-01 โ B
2024-01-02 โ C
Now:
keep="last"
keeps C, but:
- ordering is inconsistent
- time series is not sorted
- logic becomes fragile
โ With sorting
new_df = new_df.sort_index(kind="mergesort")
Result:
2024-01-01 โ B
2024-01-02 โ A
2024-01-02 โ C
Now:
- duplicates are grouped
- โlastโ means latest occurrence within same timestamp group
- behavior is consistent and predictable
๐น Why mergesort specifically?
sort_index(kind="mergesort")
Because:
- โ stable sort
- preserves original order within equal indices
๐ This is critical:
If two rows share the same timestamp:
- their relative order (self vs other) is preserved
- so
keep="last"reliably prefersother
โ Without stable sort (e.g. quicksort)
Pandas might reorder equal timestamps unpredictably:
๐ then:
- sometimes
selfwins - sometimes
otherwins ๐ฅ exactly the issue you described earlier
๐ Key insight
This line ensures:
โFor equal timestamps, preserve concat order โ so
otheroverwritesselfโ
๐ง Summary
new_df = new_df.sort_index(kind="mergesort")
is necessary because it:
- Sorts time series chronologically
- Groups duplicate timestamps together
- Preserves order between duplicates (critical)
- Makes
keep="last"deterministic
๐ Alternative (if you want to avoid sorting dependency)
If your intent is strictly:
โother overrides selfโ
You could skip sorting entirely and do:
new_df = self.df.copy()
new_df.update(other.df)
But:
- โ doesnโt handle non-overlapping rows the same way
- โ not equivalent to concat