User Tools

Site Tools


vd

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
vd [2021/08/06 13:19] – [Case Study: Exported CSV from PG&E] dblumevd [2024/05/13 11:23] (current) – [Process Data] dblume
Line 25: Line 25:
 | ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' | | ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' |
 | %%^%% | rename the column. Might have to be "product_id" or "platform_id" | | %%^%% | rename the column. Might have to be "product_id" or "platform_id" |
 +| = | Use Python function to create new column. Ex, hex to dec: ''int(curcol,16)'' |
 +| : | Split column by regex |
 | - | Hide column | | - | Hide column |
 | S | Go to "Sheets" sheet, to select another sheet to format. | | S | Go to "Sheets" sheet, to select another sheet to format. |
Line 34: Line 36:
 | " | Open duplicate sheet with only selected rows | | " | Open duplicate sheet with only selected rows |
  
-==== Case Study: Exported CSV from PG&====+===== Inspecting Columnar Data =====
  
-PG&E CSVs come with 5 rows of metadata followed by TypeDateStart TimeEnd TimeUsageUnitsCostNotes columnsDelete the five rows of metadata in text editor, or use ''tail'' to remove them like so:+^ Key ^ Meaning ^ 
 +| I | Describe all columnserrorsdistinctmodemeanmedianstdevetc
 +| i | Add column of incrementing numbers (useful for '.charts) | 
 +| . | Requires an "important" numeric column for row data. Make chart. | 
 +| O | Options to enable "numeric_binning" and set number of "histogram_bins" (use 'e') | 
 +| F | Frequency table of row counts, or histogram if numeric_binning is true |
  
-    tail +6 pge_electric_interval_data.csv | vd -f csv - +Calculating a percentage-of-total column for a numeric column:
- +
-Then prepare your PG&E data like so:+
  
 ^ Key ^ Meaning ^ ^ Key ^ Meaning ^
-Hide columns TYPE, END TIME, UNITS and NOTES | +Set column type to "int". | 
-| C | Go to column mode and... | +Describe all columns. (Highlight the "sum" cell for the column of interest.) 
-Select the DATE and START TIME columns +Convert that column to "text" so it'll be copied correctly for pasting later. 
-Make a new column that merges them +zy Yank the value of the sum. | 
-q, - Quit the Column mode, hide DATE and START TIME columns | +| q | Quit the Describe sheet. | 
-| O | Go to options mode and..+| = | New column. Enter ''curcol/'', use Ctrl+y to paste the column sum value. | 
-| e | Set ''disp_date_fmt'' to ''%Y-%m-%d %H:%M'' +====== Case Study Link: Exported CSV from PG&E ====== 
-| q | Quit options mode+ 
-| @!, %, $ | Set DATE_START_TIME to date format and important, USAGE to float, COST to currency +Visit [[vd-pge]]
-| = | Add a column, enter ''COST/USAGE'', (Make it float with ''%'') | + 
-| %%^%% | Rename COST/USAGE to ''kWh cost''+====== Cast Study: Merging Two Tables, logs and metadata ======
-| . or g. | Select columns to graph them. Notice rate changes. Notice times of high use. | +
-| +, - | Navigate with hjkl, zoom in and out ([[https://www.visidata.org/docs/graph/|more]]) |+
  
 ==== Protip: Use column view to set multiple columns at once ==== ==== Protip: Use column view to set multiple columns at once ====
Line 125: Line 128:
   $ vd --play=my_cmdlog.vd --replay-wait=0.5   $ vd --play=my_cmdlog.vd --replay-wait=0.5
  
 +====== Lists in Cells for Frequency Tables ======
  
 +Sometimes you want one of the columns in a Frequency Table to be a list of unique values. Let's say the column title is "my_column", then:
 +
 +^ Key ^ Meaning ^
 +| + | Set the aggregator to "List" |
 +| F | Make a Frequency Table for the selected column. (gF for selected columns) |
 +| =, ','.join(set(my_column)) | Create a new column of a comma delimited Python Set of cell entries. |
vd.1628281188.txt.gz · Last modified: 2023/04/12 20:44 (external edit)